20 The art of computer generation of random variables

20 The art of computer generation of random variables

C. R. Rao, ed., Handbook of Statistics, Vol. 9 © 1993 Elsevier Science Publishers B.V. All rights reserved. t,,tf ~ LU The Art of Computer Generati...

3MB Sizes 51 Downloads 75 Views

C. R. Rao, ed., Handbook of Statistics, Vol. 9 © 1993 Elsevier Science Publishers B.V. All rights reserved.

t,,tf ~

LU

The Art of Computer Generation of Random Variables

M. T. Boswell, S. D. Gore, G. P. Patil and C. Taillie

I. Introduction

The title of this paper has been chosen to be similar to The Art of Computer Programming, a multi-volume work by Donald E. Knuth. The second volume of Knuth's treatise includes a comprehensive discussion of the design and testing of uniform random number generators. The descriptor, art, is appropriate in connection with the generation of uniform random numbers since the algorithms in use today are the result of a long history of trial and error evaluation of proposed generators. In fact, the search for an ultimate uniform generator is never-ending. This is because the uniform 'random' number streams produced on computers are actually deterministic and, accordingly, each generator violates some aspect of 'randomness' and would fail a suitably devised test of uniformity. For this reason, the term 'pseudo-random' has been coined, but we will forego this bit of pedantry and omit the qualifier 'pseudo'. The present paper deals exclusively with random variable generation. Section 2 surveys some of the issues that arise in connection with uniform generators. This is a vast area and we can only touch upon a few of the main points. The rest of the paper assumes that a suitable uniform generator is available as a system resource, and is concerned with using this resource to generate observations from non-uniform distributions. Section 3 discusses univariate distributions and Section 4 deals with multivariate distributions. Our emphasis is upon general methods rather than specific distributional families, although the latter are used for illustrative examples and for motivational purposes. In contrast with the case of uniform generators, the criteria for evaluating non-uniform generators are more clearcut, although these criteria often conflict so that the final choice of generator involves tradeoffs among factors such as programming complexity, portability, storage requirements, and execution time. Important topics not covered in the present survey include: variance reduction (Nelson, 1987; Wilson, 1984); importance sampling, Gibbs sampling, and data augmentation (Gelfand and Smith, 1990; Hahn and Jeruchim, 1987; Tanner, 1991; Tanner and Wong, 1987); large deviation methods (Bucklew, 661

662

M. T. Boswell, S. D. Gore, G. P. Patil and C. TaiUie

1990, Chapter VIII); permutation generators (Oommen and Ng, 1990); Monte Carlo tests (Besag and Clifford, 1989; Hall and Titterington, 1989); simulation of time series (Barone, 1987), spatial processes and other stochastic processes; numerical quadrature (Flournoy and Tsutakawa, 1991); general simulation and other applications. The subject of random variable generation has an enormous literature. Bibliographies have been prepared by Deak and Bene (1987), Nance and Overstreet (1972), Sahai (1979), and Sowey (1972, 1978). See also Holst (1979). Survey articles include Banks, Carson and Goldsman (1989), Patil, Boswell and Friday (1975), Ripley (1983a), and Schmeiser (1980b, 1990). There are also a number of recent monographs with at least one chapter on the topic of random variable generation. These include Banks and Carson (1984), Dagpunar (1988b), Deak (1990), Devroye (1986), Dudewicz and Ralley (1981), Fishman (1978), Johnson (1987), Kennedy and Gentle (1980), Kleijnen (1988), Knuth (1981), Law and Kelton (1982), Lewis and Orav (1989), Lewis (1975), Morgan (1984), and Ripley (1987).

2. Generating uniform random numbers on a computer

Historically, simulation was of interest even before the advent of electronic computers, and physical devices such as dice, coins, and radiation detectors were used to inject randomness into the processes under study. W. S. Gossett's investigations of the 'student' t-distribution are an early and famous example of the systematic use of simulation. The RAND Corporation (1955) produced a list of one million random digits and made the list available on tape so it would be usable on computers. But, such lists are difficult to maintain and timeconsuming to read electronically. Physical generators also suffer from several shortcomings: the results are not reproducible (unless stored in toto) and can be sensitive to extraneous events. Experience indicates that only relatively short sequences of high-quality random numbers can be produced by physical means. Interest has therefore turned to methods of generation that directly employ the computer's arithmetical capabilities. Perhaps the earliest such method is due to von Neumann who suggested that an N-digit number be squared and that the middle N digits of the result be used as the next number in the sequence. Von Neumann's 'middle-square' method has proved unsatisfactory in general; see Knuth (1981, p. 4) for discussion on this point. But the method is broadly representative of today's uniform generators which (internally, at least) produce a stream of integers with each member of the stream being calculated by applying a deterministic mathematical formula to a small number of immediate predecessors in the stream. Generally, these integers are subsequently converted to a floating point format and returned as real numbers in the unit interval (0, 1). The conversion from integer to floating point format is not entirely trivial and will be discussed in greater detail below. A well-designed uniform generator should never return

Computer generation of random variables

663

the values 0.0 or 1.0. We will use the term 'uniform' generator to refer to either the internal stream of integers or the external stream of floating point values, letting the context determine which is intended. An enormous variety of tests, both theoretical and statistical, have been proposed for evaluating the quality of a uniform generator. These tests will not be enumerated or discussed here; interested readers may refer to Bright and Enison (1979), Dudewicz and Ralley (1981), and Knuth (1981). MacLaren (1989) discusses between-sequence performance of multiple generators on parallel computers. Almost all uniform generators in use today are based upon a linear congruential recurrence relation of the form X~+ 1 = (aoX n + a l X , _ 1 + . . . + a~X,,_ k + c) mod m .

Here ao, a l , . . . ,a~, c, and m are specified integers that characterize the generator. The user must also provide a seed vector (X0, X 1 , . . . , Xk) to get the recurrence started. Then Xk+ , can be generated, and the seed for Xk+ 2 becomes ( X 1 , X z , . . . , X k + l ) , etc. See Eichenauer and Lehn (1986) for discussion of nonlinear congruential generators. Most often, k = 1 and the recurrence becomes X . +1 = (aXn + c) mod m ,

a form proposed by Lehmer (1951). Here, the parameters are the multiplier a, the additive constant c, and the modulus m. Additionally, a seed X 0 is required. When c = 0, the generator is called a multiplicative congruential generator and, when c C 0 , the generator is called a mixed congruential generator. The characterizing parameters must be chosen with care if the generator is to give satisfactory results. It has become a cliche to point out t h a t the generator R A N D U , based on Am+1 = (216 + 3 ) X n mod (231 - 1) and provided for use on IBM 360/370 and P D P l l machines, is a very poor generator. Borosh and Niederreiter (1983) discuss the choice of multipliers in linear congruential generators. Fishman and Moore (1986) study the multiplicative congruential generators with m = 231 - 1. A given generator may also restrict the allowed seed values for satisfactory performance and sufficiently long period. A multiplicative congruential generator, for example, must obviously disallow X 0 = 0 as a seed. Heath and Sanchez (1986) discuss the period of uniform generators, pointing out that the minimum acceptable period-length depends upon the application. One of the better known multiplicative generators is based on the relation

Xn+1 (aXn) mod (2 31 - i ) , =

where a = 14 z9 and has been shown to possess good statistical properties. See Carta (1990), Lewis et al. (1969), Park and Miller (1988) and Payne et al.

664

M. T. Boswell, S. D. Gore, G. P. Patil and C. Taillie

(1969). See also Atkinson (1980), Borosh and Niederreiter (1983), Peskun (1980), and Ripley (1983a). It should be emphasized that the integer arithmetic in the recurrence relation needs to be carried out in full precision. An amusing example where this dictum was overlooked is the random number generator built into the Master Library of the Texas Instruments TI-59 programmable calculator. The library manual provides a detailed description of the mixed congruential generator employed (one of those listed by Knuth, 1981), but the recurrence is calculated using the calculator's floating point arithmetic with the result that after three or four iterations the generated uniforms bear no resemblance to what should come out of a correct implementation. When m = 2, the linear congruential recurrence generates a sequence b n of bits. Tausworthe (1965) generators are of this type with recurrence relation given by bn+ 1 = (bn_ p + bn_(p_q) ) mod 2,

where the integers p > q > 0 and a seed (b0, b x , . . . ,bp) determine the generator. The right-hand side can be calculated efficiently by the 'mutually exclusive or' function, b,+ 1 = b n _ p X O R b n _ ( p _ q ) . The parameters p and q should be chosen to make the trinomial x P + x ~ + 1 a primitive polynomial modulo 2. When this is done, the period of the bit stream is 2 p - 1. Zierler (1969) published a list of 23 pairs (p, q) that make x P + x q + 1 a primitive polynomial modulo 2 (see also Stahnke, 1973, and Zierler and Brilhart, 1968). In the Tausworthe scheme, random integers are not formed by concatenating consecutive bits from the bit stream. Instead, separate streams are used for each bit position. Suppose, for example, that 32-bit integers are desired. Then 32 separate bit streams b~i),

i=1,2,...,32,

would be generated, each with its own starting seed, and the stream of random integers X n would be formed by the following concatenation of bits, Xn = b ,(1) b~(2) . . . b ~(32) .

Notice that the bit streams can be generated simultaneously as X , + 1 = X , _ p X O R X,_(p_q) .

Tausworthe generators of this type are also called generalized feedback shift register (GFSR) generators. Selecting starting seeds ( X o, X a, . . . , Xp) that will ensure 'independence' of the different bit streams is a nontrivial matter involving a lengthy initialization (see Lewis and Payne, 1973, and Collings and Hembree, 1986). Kirkpatriek and Stoll (1981) suggest choosing the seed according to a certain random process in order to avoid this initialization. GFSR generators have been studied extensively both statistically and theoretically. Bright and Enison (1979) have shown that with an appropriate

Computer generation of random variables

665

seed vector, Tausworthe sequences are uniform over the period of the generator. See also Altman (1988), Boyar (1989), Collings and Hembree (1986), Fellen (1969), Grube (1973), Koopman (1986), Lewis and Payne (1973), Neuman and Martin (1976), Niederreiter (1976, 1978, 1984, 1987), Pavlov and Pohodzei (1979), Romero (1980), Tezuka (1984, 1987a,b), Tootill et al. (1971, 1973), and Whittlesey (1968, 1969). The first author of this paper considers Tausworthe generators to be superior to the more common and simpler Lehmer generators, and he has developed a package of FORTRAN-callable random variable generators based on the GFSR generator with p = 98 and q = 27 proposed by Lewis and Payne (1973). A high-quality generator should exhibit little serial correlation when k-tuples ( X 1 , . . . , Xk) of observations are examined. This is known as k-distributivity. Tausworthe generators have been found to have excellent k-distributivity properties (see Fushimi and Tezuka, 1983). Various proposals have been made to improve the k-distributivity of other generators, notably shuffling and/or combinations of generators. See, for example, Brown and Solomon (1979), Collings (1987), L'Ecuyer (1988), Gebhardt (1967), MacLaren and Marsaglia (1965), Nance and Overstreet (1978), Westlake (1967), and Wichmann and Hill (1982, 1984, 1987). As pointed out above, the stream of integers produced internally by a uniform generator has to be converted to floating point values between zero and one. Also, a well-designed uniform generator should never return 0.0 or 1.0. Routines that are coded in Assembly language can sometimes manipulate the bits directly into a floating point format but higher level routines usually employ division. Suppose, for example, that the possible values in the stream of integers are J = 0, 1, 2 . . . . , M - 1. Then the transformation from J to a uniform variable U is described by the following algorithm: Step 1. If J = 0, fetch the next J and repeat Step 1. Step 2. Deliver U = J/M.

Some generators, including multiplicative congruential, never yield J = 0 and, for these, Step 1 may be omitted. Other generators may restrict the values of J to even or to odd integers, but these are minor complications. More fundamental is the fact that successive values J / M of U typically have a finer resolution than can be accommodated in single precision floating point format. To see the consequences of this suppose the possible values of J are J = 1 , 2 , . . . , 998,999, and suppose the floating point representation carries two decimal digits of precision. The first 99 values of J / M can be accommodated exactly: 0.001, 0 . 0 0 2 , . . . , 0.098, 0.099, but at the upper end some rounding or truncation will occur. Thus, J / M = 0.999 will either be rounded toward to U = 0.99 or upward to the undesirable

666

M. T. Boswell, S. D. Gore, G. P. Patil and C. Taillie

value U = 1.0. Even if these undesirable values are screened out, the resulting generator violates one of the fundamental properties of a uniform distribution, namely that U and 1 - U should be identically distributed. When M is approximately a power of the computer's radix (as it is above), simple and efficient remedies are available in Assembly language to force U and 1 - U to have nearly the same distribution. Implementing these remedies in higher level languages tends to be awkward, and the solution is often to ignore the difficulty. Our own preference is to screen out the values U = 0.0 and U = 1.0 as well as those values of U that are so small that 1 - U = 1 in floating point representation. Another solution is to return U as double precision. While a strong case can be made for implementing uniform generators in Assembly language, the need for portability may dictate the use of a higher level language. L'Ecuyer (1986, 1988), Marsaglia, Zaman, and Tsang (1990), Marse and Roberts (1983), Sehrage (1979) and Wichmann and Hill (1982, 1984; see also McLeod, 1985, and Zeisel, 1986) describe portable uniform generators. Park and Miller (1988) discuss implementation aspects and present several examples of unsatisfactory generators. The remainder of this paper assumes that a well-tested uniform generator is available to the user.

3. Univariate generation methods 3.1. Introduction

This section reviews various computational techniques that have appeared in the literature on computer generation of random variables from non-uniform probability distributions. Examples are given for the purpose of motivating and clarifying the discussion and illustrating the techniques under consideration but may not represent the most efficient method of generating observations from the given distribution. We assume that the reader of this paper has a foundation in basic mathematical probability. He should be familiar with the standard discrete and continuous distributions, with the concept of conditional probability, and with change of variable techniques. The more he knows about various probability distributions, their properties and interrelationships, the easier it will be for him to follow the methods presented in this paper and the easier it will be for him to develop new methods. A good collection of facts regarding distributional properties and interrelationships is contained in the three-volume Dictionary and Classified Bibliography of Statistical Distributions in Scientific Work by Patil et al. (1984). Our discussion is concerned with the generation of random deviates from specified distributions. But in simulation studies of robustness or power of statistical procedures, it is often convenient to organize the results in terms of characteristics such as skewness or kurtosis instead of particular distributions.

Computer generation of random variables

667

Here the use of standardized systems such as the Johnson family may prove convenient. See Tadikamalla (1980) for discussion of several such systems of distributions. In this connection, Johnson, Tietjen and Beckman (1980) propose a four-parameter system of symmetric distributions. Fleishman (1978) points out that specified skewness and kurtosis can be achieved by a suitable cubic polynomial-transformation of a standard normal deviate.

3.2. Notation If X denotes a random variable, the corresponding lower case letter, X = x, indicates a realized value. Cumulative distribution functions and density functions are denoted by F(x) and f(x), respectively. A subscript, as in Fx and fx, will be appended as needed to avoid ambiguity. Algorithms are presented in an outline form for generating one observation. When this observation has been obtained we say 'deliver X'. At this point, the routine terminates if only one observation is desired, or loops back to the beginning if another observation is desired. It is also convenient to have symbolic names for some frequently occurring distributions: U(a, b) uniform distribution on the interval (a, b), U(0, 1) uniform distribution on the unit interval, N(/z, o-2) normal distribution with mean/z and variance o-2, N(0, 1) standard normal distribution, EXP(/x) exponential distribution with mean/z, EXP(1) standard exponential distribution, P(A) Poisson distribution with mean A, P(1) standard Poisson distribution. Insofar as possible, the letters U, V and W will be reserved for uniformly distributed variables. When an algorithm states 'generate U1, U2, . . . , Un from U(0, 1)', it is understood that the U~ are independent.

3.3. Transformations of random variables If it is possible to generate realizations of a random variable Y, then observations can also be generated for any transformation X = g ( Y ) of Y, provided the transformation g(.) is computable. Multivariate transformations of the form X = g(Yx, Y 2 , . . . , Y,) are also useful in random variable generation. Often, but not always, I11, • • •, Yn are independent identically distributed realizations from some single distribution. EXAMPLE 3.3.1 (The chi-square distribution). If Y has a standard normal distribution, then X = y2 has a chi-square distribution with one degree of freedom. So, to generate observations from X, one can generate realizations of the standard normal distribution and square those realizations. More generally, if Y1. . . . . Yn are independent standard normal variates, then X = y2 + . . . +

668

M. T. Boswell, S. D. Gore, G. P. Patil and C. Taillie

y2 has a chi-square distribution with n degrees of freedom. In a similar fashion, the F- and t-distributions can be obtained by suitably transforming normal and chi-square variates. EXAMPLE 3.3.2 (The Cauchy distribution). If I11 and Y2 are independent realizations of the standard normal distribution, then the standard Cauchy distribution can be generated by X = IIi/Y2. Michael, Schucany and Haas (1976), consider applications in which the transformation X = g(Y) is not single-valued a n d the different values need to be selected with appropriate probabilities.

3.3.1. Inverse distribution functions: Discrete case Consider a discrete random variable taking values x 1 < X 2 < ' ° " . The distribution function F(x) is a step function with jumps at xl, x 2, . . . . The inverse distribution function is defined by F-l(u)=xi

if F ( x i _ ~ ) < u < ~ F ( x i ) , i = 1 , 2 , . . . ,

where x0 = - ~ . Now let U be a random variable uniformly distributed on (0, 1), and set X = F-I(U). Then P[X ~
F(x),

so that the generated X has F(x) as its distribution function. Implementation details are given in the following algorithm. ALGORITHM 3.3.1. Step 1. Generate U = u from U(0, 1). Step 2. Initialize i = 1 and f = F(xl). Step 3. If u ~
F(xi).

Much of the execution time in this algorithm may be spent in making comparisons at Step 3, and performance can be improved by employing efficient search strategies. If nothing else, the search order could be changed to hold down the expected number of comparisons. For example, K e m p (1986) describes a binomial generator with the search moving away from the modal value. In general, let p / = F(xi) - F(xi_l), i = 1, 2 , . . . , and order these so that Ph ~>PJ2 ~> " " " " In fact, the reordered sequence does not have to be monotonic; simply moving large p to the beginning of the sequence will improve performance. Now, the improved algorithm becomes as follows. ALGORITHM 3.3.2. Step 1. Generate U = u from U(0, 1).

Computergenerationof random variables Step Step Step Step

2. 3. 4. 5.

669

Initialize i = 1 and f = Ph" If u ~
EXAMPLE 3.3.3 (The Poisson distribution). Suppose we desire an observation from the standard Poisson distribution P(1). The probability function is Pi = e-i/i!, i = 0, 1 , . . . . Since pi is already a non-increasing function of i, we make -1 comparisons in their natural order. To avoid repeated multiplication by e , we multiply the uniform random variable U by e. The m e t h o d is summarized by the following algorithm. ALGORITHM 3.3.3.

Step Step Step Step Step Step

1. 2. 3. 4. 5. 6.

Generate U = u from U(0, 1). Initialize: i = 0, a = 1, z = 1. (N.B., a = eF(i) and z = 1~ft.) Set y = eu. If y ~< a then deliver X = i. Increment i to i + 1 and then replace z by z/i and a by a + z. Go to Step 4.

The method could be further improved by a faster search routine which combines various intervals in a suitable manner. Index tables (Chen and Asau, 1977; Fishman and Moore, 1986) could also be employed to speed up the searching. See Kachitvichyanukul et al. (1988), Kemp (1982, 1988, 1990) and Kemp and Kemp (1990, 1991) for other Poisson generation methods. Kemp (1981) combines the inverse distribution function with other transformations to generate observations from the logseries distribution. Floating point comparisons can be avoided through the use of table-lookup methods, as illustrated in the following example. The m e t h o d requires that subscripts (or indices) be selected at random from among a finite set {1, 2 , . . . , N } of index values. This is easily accomplished by the floor (or truncation) operator, I = 1 + [NU]. Then t(I) is a random selection from the stored table t ( 1 ) , . . . , t(N). EXAMPLE 3.3.4 (The hypergeometric distribution). The hypergeometric distribution results from sampling without replacement from an urn containing r red balls and b blue balls. If n balls are drawn at random without replacement from the urn and X is the number of red balls, then

for max{0, n - b} ~
670

M. T. Boswell, S. D. Gore, G. P. Patil and C. Taillie

suppose we wish to sample three times without replacement from an urn containing 6 red and 4 blue balls to get the number of red balls, X. Let t(1) . . . . . t(6) = 1 and t(7) . . . . . t(10) = 0. Next generate U from U(0, 1), and put I = [10U] + 1. Then set X = t(I) to get the result after one draw. We now want to sample from the 9 remaining tabled values. We interchange t(I) and t(10); if I = 10, the table is unchanged. We generate a new value of U, set I = [9U] + 1, and replace X by X + t(I). This value of X represents the n u m b e r of red balls after two draws, etc. The following algorithm describes this procedure for n draws without replacement when there are r red balls and b blue balls initially. ALGORITHM 3.3.4.

Step 1. Initialize x = 0, t(1) . . . . . t(r) = 1, t(r + 1) . . . . . ]=r+b, k=n. Step 2. Generate U = u from U(0, 1). Step 3. Set i = [ju] + 1. Step 4. Replace x by x + t(i) and decrement k. Step 5. If k = 0 then deliver X = x. Step 6. Interchange t(i) and t(j), then decrement j. Step 7. Go to Step 2.

t(r + b) = O,

The table still contains the original r + b values but in different order. Consequently, the algorithm can be repeated to get a second observation of X without re-initializing the table. Also notice that if n >i r or n I> b, then efficiency can be improved slightly by terminating the routine when all the red or all the blue balls have been drawn. In Algorithm 3.3.4, sampling is without replacement from a population of two distinct types. A simple modification for sampling with replacement would result in the binomial distribution; merely omit Step 6. On the other hand, if three (or more) distinct types are required then the values stored in the table (say, 1, 2, 3) can be used as subscripts on the variables. That is, if the sampled value is t(i) =j then replace X(j) by X(]) + 1, j = 1, 2, 3. The delivered pair of values (X(1), X(2)) would comprise an observation from the multivariate hypergeometric distribution (or from the multinomial distribution if sampling were with replacement). EXAMPLE 3.3.5 (Table-lookup method, Marsaglia, 1963). Suppose we wish to generate observations of a discrete random variable X with density P[X = xi] = Pi, i = 1, 2 , . . . , with E p / = 1. The inverse distribution function m e t h o d generates U from U(0, 1) and returns X = x~ where i is the integer satisfying

po+...+pi_l
with p0 =-0.

A large number of comparisons may be necessary. Instead, for suitable probabilities, a table-lookup procedure can be used. Suppose observations

Computer generation of random variables

671

from an approximation to the standard Poisson distribution P(1) are desired. The distribution to three decimal places is

x

f(x)

Total

0

1

2

3

4

5

6

0.368

0.368

0.184

0.061

0.015

0.003

0.001

1.000

Analogous to Example 3.3.4, we initialize a table t ( 1 ) , . . . , t(1000) so that the values 0 and 1 each occur 368 times, the value 2 occurs 184 times, etc. T h e n X is generated as X = t(I), where I = [1000U] + 1. This lookup procedure is very fast but does require moderate storage. If we were interested in the tail values, the distribution would have to be calculated to more decimal places. For example, five digits of accuracy would require a table with 100 000 entries. The mixture of distributions m e t h o d to be described in Section 3.4 can be used to reduce the storage requirements. See Example 3.4.3 for a continuation of this example.

3.3.2. Inverse distribution functions: General case If F(x) is a distribution function, then the inverse distribution function is defined by F - l ( u ) = inf{x [ u ~< F(x)}. Suppose U has a uniform distribution over (0, 1). Using the same argument as given in Section 3.1, the variate X = F I(U) has F(x) as its distribution function. Approximations for selected inverse distribution functions may be found in Kennedy and Gentle (1980) and Thisted (1988). Hora (1983) and Ramberg et al. (1979) discuss some methods for estimating F -1 when data are available. EXAMPLE 3.3.6 (The exponential distribution). If Y has the standard exponential distribution EXP(1), then F(y) = 1 - e -y and F - l ( u ) = - I n ( 1 - u). Thus, Y can be generated by the transformation Y = - l n ( 1 - U). Further, 1 - U can be replaced by U, since U and 1 - U are identically distributed. Also, notice that X = / z Y = - / z In(U) has an exponential distribution with m e a n / z . The following algorithm summarizes the method of generating an exponential random variable with m e a n / z . ALGORITHM 3.3.5. Step 1. Generate U = u from U(0, 1). Step 2. Deliver X = -/~ ln(u). This algorithm involves evaluating the transcendental function In(u). Transcendental functions, in general, take comparatively more computer time than arithmetic operations.

672

M. T. Boswell, S. D. Gore, G. P. Patil and C. Taillie

EXAMPLE 3.3.7 (The WeibuU d&tribution). The Weibull distribution is a powerfunction transformation of the exponential distribution and has its distribution function given by F(x) = 1 - e -axb ,

x>0,

where a, b > 0 are parameters of the distribution. Setting U = 1 - e -ax~ and solving gives

Here again 1 - U can be replaced by U, giving X=

1 ] 1/b - a ln(U)] .

EXAMPLE 3.3.8 (The normal distribution). Using the inverse distribution function technique, a standard normal variate Z can be generated as Z = - I ( u ) where 1

z f_~ exp(-t2/2) dt

@(z) = ~

is the standard normal distribution function. Accurate routines for q~-1 should be available on most systems. If not, Abramowitz and Stegun (1972, p. 933) give some crude approximations. Blair et al. (1976), Guirguis (1991), Moshier (1989, p. 258), Strecok (1968), and Wichura (1988) give approximations accurate to 16+ digits. Also, see the discussion in Van der Laan and Temme (1984, Section 7.1). EXAMPLE 3.3.9 (The B o x - M u l l e r method, Box and Muller, 1958). The normalizing constant ~ = f-~o~e x p ( - t 2 / 2 ) d t for the standard normal distribution is usually obtained by combining the product of two such integrals into a bivariate integral and then transforming to polar coordinates. A similar idea can be employed to generate N(0, 1) observations. Suppose that X and Y are independent standard normal variables. Then the transformation defined by X = R cos O and Y = R sin O gives independent random variables R and O with distribution functions that have closed forms. Specifically, r

fRo(r, O) = ~-~

e_r2/2

where 0 < r < m and 0 < 9 < 2-rr. Clearly 8 is uniformly distributed on the interval (0, 2-rr); R has the Rayleigh distribution with FR(r) = Then

fo

FRI(U) = ~/--2

u e -u2/2 du = 1 - e -r2/2

0

< r<

ln(1 - u), 0 < u < 1. Replacing 1 - U with U and using the

Computergeneration of random variables

673

inverse distribution function technique results in generating R by R = ~ - - 2 In U. Note that this is a special case of the Weibull distribution as discussed in Example 3.3.7. A realization of O can be generated by O = 27rV, where V is a uniform variate independent of U. Then to get observations of X and Y we set X = R cos 0 = ~

In U cos(2"rrV),

Y = R sin 0 = V - 2 In U sin(2~rV). So X and Y are two (independent) observations from the desired normal distribution. There are faster methods for generating normal variables, and this method has been criticized because the generated uniform variates U and V are not truly independent. The serial dependence among the uniform random variables imparts dependence among the X and Y. For a discussion, see Golder and Settle (1976) and Neave (1973). Ripley (1987, pp. 56-58) plots pairs of points generated by this method. A modification of the Box-Muller method is called the polar method and will be discussed in Section 3.5 on rejection methods. EXAMPLE 3.3.10 (The Cauchy distribution). If X = Y~/Y2 where Y1 and Y2 have independent standard normal distributions, then X has the standard Cauchy distribution. If I11 and Y2 are obtained by the Box-Muller method of Example 3.3.9, then X is given by

X

X/Z--2In U sin(2"rrV) X/Z21n U cos(2"rrV) = tan(27rV).

Thus, only a single observation from the uniform distribution is needed to give an observation from the Cauchy. The inverse distribution function technique gives X = -cotan(wU) to be standard Cauchy. A glance at the graphs of the tangent and cotangent functions shows that the two methods are equivalent. EXAMPLE 3.3.11 (The chi-square distribution). In Example 3.3.1 we saw that a chi-square variate X with n degrees of freedom could be generated as the sum of squares of n independent standard normal random variables. If n = 2 and if the normals are generated by the Box-Muller method, then a simple calculation shows that X = - 2 1 n ( U ) . The relation with Example 3.3.6 should be apparent since a chi-square with two degrees of freedom is an exponential with mean equal to 2. To generate a random variable X having a chi-square distribution with 2n degrees of freedom, generate independent uniform random variables

674

M. T. Boswell, S. D. Gore, G. P. Patil and C. Taillie

U 1 , . . . , U n on the interval (0, 1) and set X = - 2 In U 1 . . . . .

2 In U n = - 2 ln(Ul" • -Un).

To generate X with 2n + 1 degrees of freedom we need to add the square of a standard normal r a n d o m variable. F r o m E x a m p l e 3.3.9, we can generate two independent uniforms Un+ 1 and U,+ 2 and add to X above the value of - 2 ln(U,+l) sinZ(ZTrUn+2) •

3.4. Mixtures method or composition method If F1, F2, . . . are distribution functions and ( p l , P2, • • .) is a probability vector, then

F(x) = Z piF~(x) is a distribution function known as the mixture distribution of the F i with the pi as mixing proportions. Often the F / a n d corresponding Pi are finite in n u m b e r . The following is an example of a two-fold mixture. EXAMPLE 3.4.1 (Mixture distribution). The height X of a person selected at r a n d o m from the workers at a certain manufacturing plant is measured. Suppose a fraction p of the workers are men and 1 - p are women. If Fm(X) is the height distribution (function) of men and Fw(x ) is that of w o m e n , then the overall height distribution is the mixture distribution Fx(x) = pFm(x) +

(1 - p)Fw(x)



Mixture distributions can arise naturally in applications because of a physical mixing of subpopulations, as illustrated in the preceding example. For purposes of r a n d o m variable generation, the decomposition into a mixture is m o r e often done on mathematical grounds. Suppose F(x)= E pfj(x) is a mixture decomposition and define a discrete r a n d o m variable I by P ( I = i ) = P i , i= 1, 2 , . . . . If an algorithm is available for generating I, and if realizations of each F~(x) can also be generated, then observations from the mixture F(x) can be obtained as follows:

Step 1. Generate an observation I = i. Step 2. Generate an observation x from F/(x). Step 3. Deliver X = x. EXAMPLE 3.4.2 (The normal distribution). Let X have a standard normal distribution with distribution function @. The distribution of IX I is known as the half-normal distribution. F r o m the symmetry of the normal distribution, it is clear that realizations of X can be obtained by randomly attaching signs to

Computer generationof random variables

675

realizations of [X[. This is actually a mixture generation method since

4~(x) = p[x < 0]~(x rx < 0) + P[X/> 0]~(x Ix/> 0), where ¢b(x ]X > 0) is the distribution function of IXI, while @(x IX < 0) is that of -IX]. It is clear that, in this case, P[X < 0] = PIXy> 0] = ½. EXAMPLE 3.4.3 (Modified table-lookup method). Example 3.3.5 considered the following approximation to the standard Poisson distribution:

x

fx(x)

Total

0

1

2

3

4

5

6

0.368

0.368

0.184

0.061

0.015

0.003

0.001

1.000

Adding the first digits gives 0.3 + 0.3 + 0.1 + 0.0 + 0.0 + 0.0 = 0.7; adding the second digits gives 0.06 + 0.06 + 0.08 + 0.06 + 0.01 + 0.00 + 0.00 = 0.27; adding the last digits gives 0.008 + 0.008 + 0.004 + 0.001 + 0.005 + 0.003 + 0.001 = 0.030. Now P [ X = o] = 3 ( 0 . 7 ) + ~7 (0.27) + ~ ( 0 . 0 3 ) ;

v[x =

11 = ~ ( 0 . 7 ) + ~7 (0.27) + ~ ( 0 . 0 3 ) ;

P [ X = 2] = 4 ( 0 . 7 ) + ~7 (0.27) + ~ ( 0 . 0 3 ) ; P [ X = 31 = ° ( 0 . 7 ) + ~7 (0.27) + ~ ( 0 . 0 3 ) ; P [ X = 41 = ° ( 0 . 7 ) + ~7 (0.27) + ~ ( 0 . 0 3 ) ; P [ X = 5] = ° ( 0 . 7 ) + ° ( 0 . 2 7 ) + ~ ( 0 . 0 3 ) ;

P[X = 61 = °(0.7) + ° ( 0 . 2 7 ) + ~(o.o3). That is, the distribution of X can be decomposed as the mixture

fx(X) = f~(x)(0.7) + fz(X)(0.27) + f3(x)(0.03), with fl(x), fz(X), f3(x) given by

x

f~(x) fz(x) f3(x)

Total

0

1

2

3

4

5

6

3/7 6/27 8/30

3/7 6/27 8/30

1/7 8/27 4/30

0 6/27 1/30

0 1/27 5/30

0 0 3/30

0 0 1/30

1 1 1

So 70 percent of the time we generate an observation from fl(x), 27 percent of

676

M. T. Boswell, S. D. Gore, G. P. Patil and C. TaiUie

the time from f2(x), and the other 3 percent of the time f r o m f3(x). E a c h of these distributions can be generated by a table-lookup procedure. Instead of the 1000 storage locations as necessary in E x a m p l e 3.3.5, this p r o c e d u r e requires only 7 + 27 + 30 = 64 locations. For other examples, the n u m b e r of locations will depend on the probabilities involved, but in any case should result in a large saving of storage space. The three tables can be c o m b i n e d into a single table with 64 entries: For fl(x) set t(1) = t(2) = t(3) = 0; t(4) = t(5) = t(6) = 1; and t(7) = 2. For fz(X) set t(8) . . . . . t(13) = 0; t(14) . . . . . t(19) = 1; t(20) . . . . . t(27) = 2; t(28) . . . . . t(33) = 3; and t(34) - 4. For f3(x) set t(35) . . . . . t(42) = 0; t(43) . . . . . t(50) = 1; t(51) = t(52) = t(53) = t(54) = 2; t(55) = 3; t(56) . . . . . t(60) = 4; t(61) = t(62) = t(63) = 5; and t(64) = 6. We need to select one of the first seven tabled values with probability 0.7, and conditional on this, the choice from a m o n g the seven should be uniform. Thus generate U from U(0, 1) and set I = [10U] + 1. N o w P[I ~< 7] = 0.7 and P[I = i [I ~<7] = ~ for i = 1 , . . . , 7. Thus, if I ~< 7 then deliver X = t(I). ( H e r e we have used the fact that the conditional distribution of a uniform r a n d o m variable restricted to a subset of its original sample space is still uniform.) If I > 7, then we have not delivered an observation of X. Using the same U generated above, let 1 = [100U] + 1. We know that I > 70. Thus, P[I ~< 97 11 > 70] = 0.27, and P[I - 63 = i ] I > 70] = ~ for i = 8 , . . . , 34. T h e result is summarized in the following algorithm. ALCORITHN 3.4.1. Step 1. Initialize tabled values t ( 1 ) , . . . , t(64) as indicated above. Step 2. G e n e r a t e U from U(0, 1). Step 3. Set I = [10U] + 1. Step 4. If I ~< 7 then deliver X = t(I). Step 5. Replace I by [100U] + 1. Step 6. If I ~< 97 then deliver X = t(I - 63). Step 7. Replace I by [1000U] + 1. Step 8. Deliver X = t(I - 936). We observe in Step 6 that I takes values 7 1 , . . . , 97 and I - 63 takes values 8 , . . . , 34 as necessary to look up the tabled values for f2. In Step 8, I takes values 9 7 1 , . . . , 1000 and I - 936 takes values 3 5 , . . . , 64 as required for f3. This algorithm is fast and requires relatively little storage space. Only one uniform r a n d o m variable and few other operations are necessary. Unfortunately, the required storage, as well as the details of the algorithm, do depend on the numerical probabilities as well as the n u m b e r of digits used. For application to other discrete distributions, see N o r m a n and C a n n o n (1972). An important case of the mixture m e t h o d occurs when the region below the density fx(X) of a continuous r a n d o m variable X can be partitioned into two subregions by a nonnegative function f(x) with O<-g(x)<~fx(x). (See Figure

Computer generation of random variables

(×)

677

"-,~ X

Fig. 3.4.1. The partitioning into two parts of the region below a density function function g(x).

fx(x) by a

define G(x)=f~_=g(u)du, then the area under g(x) is and g(x)/G(~) is a probability density function with corresponding distribution function G(x)/G(~)=F~(x), say. Also the area below fx(X)-g(x) is 1 - G(~); thus [fx(X)-g(x)]/[1- G(~)] is a density function with corresponding distribution function [Fx(x ) - G(x)]/[1 - G(oo)] = F2(x ), say. Thus 3.4.1.)

If we

limx_,=G(x)=G(~ )

F x ( x ) = p , F , ( x ) + p F (x) ,

where Pl = G(~) and P2 = 1 - G(~). This result is easily generalized by partitioning the region under fx(X) into an arbitrary n u m b e r of subregions. T h e curve bounding each region can be changed into a probability density by dividing by the area of the region. EXAMPLE 3.4.4 (The rectangle, wedge, tail method of Marsaglia). Suppose it is desired to generate observations from the standard exponential distribution with density fx(x)= e -x, x > 0. As shown in Figure 3.4.2 (ignore the dotted line), the region under fx(X) can be split into three pieces: a rectangular piece; a wedge piece above the rectangle; and the tail piece. The areas of the regions and the corresponding densities are: The rectangle. The region below y = e -1 from x = 0 to x = 1 has area Pl = e - l ; the density is fl(x) -- 1 if 0 < x < 1 and f l ( X ) = 0 otherwise. The wedge. The region below y = e -x and above y = e -1 f r o m x = 0 to x = 1 has area P2 = 1 - 2e-1; the density is f2(x) = [e -x - e-~]/[1 - 2e -1] if 0 < x < 1 and f2(x) = 0 otherwise. The tail. The region below y = e -~ for x t> 1 has area P3 e - t ; the density is =

fa(X) = e - X / e -1 = e - ( x - l ) i f x ~> 1 a n d f 3 ( x ) = 0 o t h e r w i s e .

678

M. T. Boswell, S. D. Gore, G. P. Patil and C. Taillie

%×)=Z ×

2g' ,,.,...,~(x) =(2_×)~-~

"""''"N

e-~

1 x Fig. 3.4.2. The partitioning of the region below the standard exponential density function.

The mixture decomposition then becomes

fx(x) =p, fl(x) + p2L(x) + p3L(x) , where Pl =P3 is approximately 0.368 and P2 is approximately 0.264. Notice that tail density f3(x) is simply a translate of the exponential distribution so that realizations of f3(x ) can be generated as 1 + Y where Y has a standard exponential distribution. (Y may be generated as - I n ( U ) ; see Example 3.3.6.) The method also requires that observations be generated from the wedge f2(x) about 26 percent of the time. However, the distribution function F2(x ) = [1 - e -x - x

e-1]/[1 -

2 e -11

cannot be inverted in closed form. A method of generating from the wedge will be given in Exanaple 3.4.6. This method can be extended by decomposing the region under fx into many rectangles, many wedges, and a tail. Because of the lack of memory property of the exponential distribution, the wedge distributions all have a distribution function analogous to F2(x). For details see MacLaren, Marsaglia and Bray (1964). Another modification is to partition off a triangular region from the wedge as indicated by the dotted line in Figure 3.4.2. The area of the triangle is maximized by taking the slope equal to f~(1) = - e -1. The area of the triangle is then P21 = e - 1/2 ~ 0.184 and the area of the new wedge is P22 = P2 - e - 1/2 0.08. The density corresponding to the triangle is fz~(X) = 2(1 - x ) if 0 < x < 1 and fz~(X) = 0 otherwise. Observations from this density can be obtained as the

Computer generation of random variables

679

minimum, min(U1, U2) , of two independent uniform r a n d o m variables on (0, 1). The (new) wedge has density fzz(X) = [e -x - (2 - x) e-1]/pz2 if 0 < x < 1 and f 2 2 ( X ) ~-- 0 otherwise. Since this distribution needs to be used only 8 percent of the time, a somewhat inefficient algorithm could be employed here without adversely affecting the overall performance of the method. EXAMPLE 3.4.5 ( R a n d o m s u m s as mixtures). A randomly stopped sum, or a random sum, is a random variable X which can be written as the sum of a random number N of random variables, say N

z=Exi, i=1

where N is a random variable assumed to be independent of all the X i. Typically, the X~ are themselves independent and identically distributed (iid). If N has the Poisson distribution and the X i are iid, then Z is said to be a Poisson sum of the Xi and Z is said to have a compound Poisson distribution. If the X~ are iid discrete random variables with density f ( x ) , then the distribution of Z is given by fz(Z) = e ( z ) . P [ N = 0] + P[X 1 = z]P[N = 1] +P[XI +Xz=z]P[N=2]+...,

z=0,1,2,...,

where e(z) = 1 if z = 0 and e(z) = 0 if z > 0. An analogous formula holds if the X~ are iid continuous random variables. That is, the distribution of Z is a mixture of an infinite number of distributions. To generate an observation of Z we first generate an observation of N = n and then generate n observations from the common distribution of the X i. Finally, we add these n observations to get Z. In specific cases, there my be efficient methods for generating the convolutions X1, X 1 + X2, X 1 + X z + X3, . . . directly. For example, if N is P(A1) and each X~ is P(A2) , then Z has the Thomas distribution with parameters & and A2. Example 3.3.3 can be modified to generate an observation N = n from P ( & ) and then to generate Z = X I + . . . + X n from P(nA2) , the desired observation from the Thomas distribution. EXAMPLE 3.4.6 ( R a n d o m m i n i m u m s , continuation o f E x a m p l e 3.4.4). Instead of adding the random variables X 1 , . . . , X u as in Example 3.4.5, we can take their minimum. Let the X / b e iid random variables with distribution function F(x), and let N be a positive integer-valued random variable independent of

M. T. Boswell, S. D. Gore, G. P. Patil and C. Taillie

680

XN} is given by

the X i. T h e n the distribution of Z = m i n { X 1 , . . . , ce

Fz(z ) = 1 - P [ Z > z] = 1 - ~ P [ m i n { X 1. . . .

, Xn} > z ] P [ N = n]

n=l c~

= 1- ~

P[X 1> z ....

, X , > z ] P [ N = n]

n=l c~

[1 - F(z)]np[N = n].

=ln=l

In the special case where the X i are u n i f o r m on (0, 1), we have F ( x ) = x , 0
Fz(z ) = 1 - E ( 1 - z ) n P [ N = n ] ,

0
n=l

N o w suppose that N has the Poisson distribution, with m e a n k, t r u n c a t e d to values 2, 3 , . . . . T h e n An

a n e-a/n! P [ N = n] = 1 - e - a ( 1 + k)

n=2,3,...

m

n!(e a -

1 -

k)

'

and oo

Fz(z ) = 1 - ~ n=2

[k(1 - z)l" 1 n! (e* -- 1 -- k)

= 1 + [1 + k(1 -- z) -- ea(1-z)]/(ea -- 1 -- k ) ,

0
T h e density of Z is given by

fz(Z) =

- k ( 1 - e a(1-z)) eX_l_k ,

0
Recall that in E x a m p l e 3.4.4 we w a n t e d to g e n e r a t e an o b s e r v a t i o n f r o m a w e d g e distribution with density

f(x)-

e -x - e -1 1-2e- 1

e 1-x - 1 e-------2-'

0
We n o w see that this is exactly the distribution of Z w h e n k = 1. T h u s , to g e n e r a t e observations f r o m f(x), we m a y g e n e r a t e an o b s e r v a t i o n N = n f r o m the Poisson distribution with unit m e a n truncated to 2, 3 , . . . , and then g e n e r a t e U 1 , . . . , U n f r o m U ( 0 , 1). Finally, we deliver X = m i n { U 1 , . . . , Un}. E x a m p l e 3.3.3 can easily be modified to g e n e r a t e the t r u n c a t e d Poisson distribution, or the truncated distribution can be o b t a i n e d by the rejection technique discussed in Section 3.5. Also notice that if the Poisson is t r u n c a t e d to {1,2, 3 . . . . } then realizations of X c o r r e s p o n d i n g to N = 1 yield the rectangle part of Marsaglia's decomposition. See E x a m p l e 3.11.1 for a n o t h e r application of r a n d o m minimums.

Computer generation of random variables

681

There are two reasons why the mixture of distributions technique is useful. The first is that mixtures sometimes occur in a natural manner and the method permits observations to be generated from the desired distribution. The second is to provide a fast routine. The decomposition in this case is often arbitrary and is made up of two general types of distributions, those for which fast generating routines are available, and these with only slow routines. In order for the overall method to be efficient, the slow routines should not be called very often. In the rectangle (triangle), wedge, tail method, the rectangle parts are generated as linear functions of uniform variates and are very fast. The triangle parts, if any, are generated as a linear functions of the minimum of two uniforms and are also fast. The wedge and tail distributions are usually complicated and the generating routines can be slow. Therefore, the region below the density should be divided into enough rectangles/triangles to make the wedge and tail parts occur infrequently. Although it is usually desirable to make the area of the rectangle as large as possible, smaller rectangles are sometimes used in order to improve the rest of the decomposition or to simplify other aspects of the implementation. The stochastic selection from among the various subregions is often made by comparing a uniform random variable with a set of 'cut off' numbers determined by the mixing probabilities. These comparisons are slow and a table-lookup procedure can sometimes improve the overall efficiency. Examples are given by Marsaglia and Tsang (1984) for distributions with decreasing density functions or with symmetric unimodal density functions. In order to generate observations from the Poisson distribution, Kemp and Kemp (1990) use the inverse distribution function technique but first decompose the Poisson distribution into three pieces.

3.5. Rejection method or envelope rejection method The odd-man-out example from elementary probability courses illustrates this method. Three persons go to a restaurant for coffee. It is decided one person will pay for all three. To decide who will pay, each person tosses a coin and if the result is two heads and one tail, or vice versa, then the 'odd-man' pays. Otherwise, the experiment is repeated. Suppose we want to generate an observation X from a distribution Fx(x), but we have only a method for generating observations Y from a distribution Fv(y ). It seems reasonable, if the two distributions are not too different, to correct the distribution of Y by sometimes rejecting Y where Y tends to occur more frequently than X. The relative frequency would be lowered in such regions and raised elsewhere. If Y -- y is the observation generated from Fv(y), then we will either 'accept' this value and deliver X = y with a suitable probability a(y), say, or we will 'reject' the value and start over by generating a new observation on Y. For this reason, the method is often called the acceptance-rejection technique. Of

M. T. Boswell, S. D. Gore, G. P. Patil and C. Taillie

682

course, the hard part is choosing a(y) so that the accepted observations follow the desired distribution Fx(x ). The number of trials needed to achieve an acceptance has the Pascal distribution given by P[No. of t r i a l s = n ] = [ 1 - P(A)] n - I P ( A ) ,

n = 1,2,...,

where A is the event {a trial is a success (accept y)}. The expected n u m b e r of trials is 1/P(A). In order to determine the correct form for a(y), we note that the event { X ~ x } can be partitioned into the disjoint union of events En, n = 0, 1 , 2 , . . . , where E n = {n rejections, then Y ~
Fx(x ) = P[X ~
= P[Y<~x and accept Y]/P(A). Now, P[Y<~x and accept Y]=fXo~a(y) dFr(y). differencing in the discrete case, we obtain

After differentiating, or

fx(x) = a(x)fr(X ) / P(A) so that a(x) =

fx(x) P(A)

That is, the acceptance probability for Y = x is proportional to the likelihood ratio of the two distributions. If we choose a(x) according to a(x) = Cfx(X )/fr(X) for a specified constant C, then it will automatically happen that C - - P ( A ) , because P(A) = E[a(Y)]. For reasons of efficiency, we would like a(x), and hence C, to be as large as possible, subject to the constraint a(x)~< 1. The optimal value of C is then Copt = inf{ fr(X)/fx(X)}, where the infinum is taken over all x in the support of X, i.e., over those x such that fx(X) ¢ O. Notice that if there are x such that fx(X) ¢ 0 but fy(x) = O, then it is not possible to generate X = x by rejection on Y, and the above formula gives C o p t = 0 confirming that the method cannot be applied. A suboptimal value of C may be used in situations where exact calculation o f Copt is inconvenient. The actual acceptance/rejection decision requires a separate r a n d o m experiment. Typically, a uniform random variable U is generated and the observed value Y = y is accepted if U ~< a(y). The overall procedure, then, is to generate observations Y = y and U = u from fr(Y) and from U(O, 1), respectively,

Computer generation of random variables

683

calculate a(y) and deliver X = y if u ~< a ( y ) ; otherwise, repeat the process by generating new observations of Y and U. EXAMPLE 3.5.1 (The exponential distribution truncated to (0, 1)). It is clear how we would proceed to generate an exponential r a n d o m variable truncated to take values in (0, 1) from an untruncated exponential r a n d o m variable. We now apply the above theory to formalize the result. Let fx(X) = e - * / ( 1 - e -1) if 0 < x < 1 and fx(X) = 0, otherwise. Let fy(y) = e -y if y > 0 and fy(y) = O, otherwise. Then fy(X)/fx(X) = 1 - e - 1 if 0 < x < 1 and is infinite if x >/1. The ratio is undefined if x ~< 0; however, this is of no consequence because such values are outside the support of X and Y. Now Copt = P ( A ) = inf,>o{fy(X)/ f x ( x ) } = l - e -1, and a ( x ) = 1 if 0 < x < l and a ( x ) = 0 , otherwise. Thus, the acceptance rate is 100 percent for 0 < Y < 1, and zero otherwise. T h e procedure is summarized in the following algorithm with Y being generated as in E x a m p l e 3.3.6. ALGORITHM 3.5.1. Step 1. G e n e r a t e U = u from U(0, 1). Step 2. If u > e -1 then deliver X = - l n ( u ) . Step 3. Go to Step 1. For this procedure, the expected n u m b e r of trials per delivered observation is 1/[P(A) = 1 - e -1] = 1.58. But note that the logarithm is only calculated on the final trial that leads to acceptance. EXAMPLE 3.5.2 (The half-normal distribution). Suppose we wish to generate an observation X from the half-normal distribution with density '~ ~

fx(X) = v ~ - e

--x2/2

if x > 0 ,

using rejection on a standard exponential r a n d o m variable Y with density

fy(y) = e -y

if y > 0.

Then Cop t =

.

P(A) = infx>o{fr(X ) Ifx(X)} = lnfx>o{%/~72 e

x2/2--x

}

= V'rr/2e ~ 0.76, and

a(x) = [fx(X) Ifr(x)]P(A) =

e-(X-1)2/2 .

In E x a m p l e 3.5.1, the acceptance probability is either 0 or 1; thus, a separate uniform variate is not needed to decide if the observation should be accepted or not. H e r e , however, we need to accept with probability a(x), so we generate

M. T. Boswell, S. D. Gore, G. P. Patil and C. Taillie

684

a uniform random variable U = u on (0, 1) and accept X = y if u <- a(y). This is equivalent to accepting if U

~
or

--In(u)/> ( y - - 1)2/2.

Since, by Example 3.3.6, - I n ( U ) has the standard exponential distribution, we use the exponential generator a second time to make the decision to accept Y or to start over. The procedure is summarized in the following algorithm. This method occurs in Butcher (1960) and Kahn (1954). ALCOIUTnM 3.5.2. Step 1. Generate two standard exponential variables Ya = Yl and Y2 = Y2. Step 2. If Yl > (Y2 - 1) 2/2, then deliver X = Y2. Step 3. Go to Step 1. The expected number of trials per delivered observation is 1 / P ( A ) = ~ 1.32.

3.5.1. Use of non-uniform variates to accept or reject The steps given below outline a procedure in which the acceptance/rejection decision is based upon a continuous non-uniform variate Z. As before, the variable of interest is X for which observations are generated by accepting or rejecting realizations of Y. The function t(.) occurring in Step 3 is to be chosen so that the delivered values have the prescribed distribution F x. The choice of t(.) is discussed below. Step Step Step Step Step

1. 2. 3. 4. 5.

Generate an observation Y = y with density fy(y). Generate an observation Z = z with distribution function Fz(z ). Calculate t(y) where t is a suitably chosen function. If z <- t(y) then deliver X = y. Go to Step 1.

The probability of accepting a value Y = y is P[Z ~ t(y)] = Vz(t(y)) = a(y) = P ( A ) f x ( y ) / f v ( Y ) . Consequently,

t(x) = F za(p(A)fx(x) /fv(x)) =- V zl(a(x)) . The optimum procedure occurs, as before, when P ( A ) = inf{fv(x)/fx(X)}. Using this expression for t(.), we see that the acceptance decision in Step 4 is equivalent to

Fz(z) <~P(A)fx(y)/fy(y) =- a(y) . Since Fz(Z ) is distributed as U(0, 1), the method is not fundamentally different from making the decision on the basis of a uniform variate. It is simply a matter of computational convenience and feasibility.

Computer generation of random variables

685

EXAMPLE 3.5.2 (Revisited). Recall that we wished to generate an observation from the half-normal density fx()X = 2X/~-~e -x2/2 using observations Y = y from the exponential density fy(y) = e -y. We calculated a(x) = e -(x-1)2/2 and ended up using a second observation Z = z from the exponential distribution in order to decide whether to accept Y = y. With the above notation, we find that

X=x

t(x) = F zl(a(x)) = - I n ( 1 - a(x)) = ln[1-e-(X-1)2/2],

x/>0.

This function t(x) will work, but it is not very nice. Now there is no reason why we have to use the left tail of the distribution of Z as the acceptance region. If the right tail is used instead, then Step 4 of the above algorithm becomes:

Step 4'. If z > t-(y) then deliver X = y, where t-(x)= F z l ( 1 - a(x)). For the present example, this results in t-(x) = --ln(e -(x-1)2/2) = (x -- 1)2/2, in agreement with Algorithm 3.5.2. An important special case of the rejection m e t h o d occurs when the support of X is a subset of the unit interval and Y is taken to have a uniform distribution on (0, 1). Then fy(y) = 1 if 0 < y < 1, P(A) = info
(The exponential distribution truncated to (0, 1)). Let fx(X) = e - X / ( 1 - e -1) if 0 < x < l , and fx(x)=O otherwise. Then P ( A ) = inf0
3.5.2. A geometric interpretation When observations X = x are to be generated by accepting or rejecting observations Y = y from the uniform distribution then the acceptance probabilities a(y) =fx(y)P(A) must be less than or equal to one. So P(A) is chosen as large as possible to make fx(x)P(A) ~< 1. This results in scaling fx(X) down so that it is contained entirely within the unit square as shown in Figure 3.5.1. If the acceptance/rejection decision is based on an observation U = u from the uniform distribution, then the pair (y, u) is a point chosen at random from the unit square and if (y, u) falls in the region below the graph of a(x), then X = y is delivered. This technique is sometimes called the envelope-rejection method. An envelope function g(y) is found that 'encases' or bounds the density function from above. When g(y) is divided by the area under g(y), it becomes the density function of Y.

686

M. T. Boswell, S. D. Gore, G. P. Patil and C. Taillie

1

0 0 1(

Fig. 3.5.1. Scaling an acceptance probability P(A) by multiplication of the density function fx(X) to fit within a unit square.

3.5.3. Bounds for the rejection method, or rejection with squeezing Supposing an observation Y = y is generated and is to be accepted if an independent r a n d o m variable Z satisfies Z <-t(y), as in Section 3.5.1. Often the function t(y) is slow to evaluate. If a simple bound s(y) <~t(y) exists then one can check to see if Z<~s(y) in which we accept and deliver X = y ; otherwise, we need check to see if Z <~t(y). This squeeze m e t h o d is very useful if the event {Z ~
a(v) = [fx(y(v))/fr(y(v))lP(A) . For a < l

andfi
P(A) =O
-

~#r(~)r(~) F(a + fl)

Computer generation of random variables

687

and vl/_~ a(v)= [l_(l_vl/~)l/,~]

.] 1-~ .

Boswell and DeAngelis (1981) give the bounds

ol 1-~ [ a + ( 1 - a ) v ] a-t3 then u>a(v) and Y is rejected. If al-t3 a + (1 - a)v, then go to Step 2. Step 6. Set vt~ = v 1/~ and y = (1 - v¢) TM. Step 7. If (1 - y)u¢ <~vt~ then deliver X = y. Step 8. Go to Step 2. See Ahrens and Dieter (1980), Atkinson (1982), Devroye (1984), Kachitvichyanukul and Schmeiser (1988), and Tadikamalla (1978a,b) for other examples using the rejection technique. See Devroye (1981) for an example involving the Kolmogorov-Smirnov distribution; Schmeiser and Lal (1980a) for the gamma distribution; Schmeiser and Shalaby (1980) for the beta distribution; and Schmeiser (1980a) for the distribution from the tails of the normal, gamma, Weibull and beta distributions.

3.5.4. Band rejection Payne (I977) introduced this technique to improve the efficiency of the rejection methods in the case of bounded random variables. His suggestion not only reduces the average number of trials before delivery but also reduces the variability in the number of trials. The idea is to recycle rejected values by using them to generate symmetrical values of the random variable of interest. Suppose the variable of interest X has density fx(x) and bounded support (0, a). Let Y be the variable whose realizations will be accepted or rejected. ALGORITHM 3.5.4. Step 1. Calculate P(A) = info
688

M. T. Boswell, S. D. Gore, G. P. Patil and C. Taillie

Step 4. If 1 - u
Step Step Step Step

1. 2. 3. 4.

Generate U = u and V = v from U(0, 1) and put y = 2v. If y2 < 1 - 2 ln(2u), then deliver X = y. If (2 - y)2 < 1 - 2 ln(2(1 - u)), then deliver X = 2 - y. Go to Step 1.

This band rejection algorithm has an acceptance probability of 2 P ( A ) = 0.9862.

3.6. Alias method The alias method, devised by Walker (1974, 1977), is an efficient m e t h o d of generating observations from discrete distributions with finite support. Without loss of generality, let X be a random variable taking values 1, 2 , . . . , n with respective probabilities Pl, P : , . . . , Pn. For i = 1, 2 , . . . , n, let ai, called an alias value, be an integer from 1 to n. The alias m e t h o d first selects an integer I = i uniformly from { 1 , . . . , n} and then either delivers X = i with a suitable probability or else delivers the alias value. The alias method ensures that an observation is delivered at each trial. This method bears some formal similarity to band rejection except that band rejection may require several trials to deliver an observation. The alias method is implemented in two stages. The first stage sets up the alias values and the acceptance probabilities. This stage is based on a theorem

Computer generation of random variables

689

(Kronmal and Peterson, 1979) that any discrete distribution with a finite support can be expressed as an equiprobable mixture of (possibly degenerate) two-point distributions. Let fn(X) denote a probability mass function f,(x) = p,, x=l,2,...,n. Then for i = l , 2 , . . . , n , and for ai in { 1 , 2 , . . . , n } , there exist functions

!i gi(x) =

if x = i, - qi if x = ai , otherwise,

such that 0 ~< qi ~< 1 and f,(x) = ( l / n ) ~in=l gi(x), The qi are called cut-offs. The following algorithm describes the procedure for setting up the cut-offs and aliases needed for implementing the alias method. The input to the algorithm is the probability function Px, x = 1 , . . . , n. Also G and S are two sets of integers that are used for deciding when to terminate the algorithm. The letters 'G' and 'S' stand for 'greater' and 'smaller' respectively. The integers x E G are characterized by the relation qx > 1. ALGORITHM 3.6.1. Step 1. Initialize G and S as the empty sets. Step 2. For i = 1 , 2 , . . . , n set qi =nPi and a i = 1. If qi~
xESUG

qx=n-l,

where 1 is the number of times that Step 9 has been executed. Certainly this relationship is true initially (since 2 P x - - 1 ) . But the right-hand side is decremented at each execution of Step 9 and, from Steps 6 and 8, we see that the left-hand side is also changed by ( - 1 + q j ) - qj = - 1 during each iteration of the loop. Let IAI stand for the number of elements in a set A. If S were ever empty in Step 4, then we would have

IGI=ISUGI=n-I since qx > 1 for all x E G.

= ~

xCG

Qx>IGI

690

M. T. Boswell, S. D. Gore, G. P. Patil and C. Taillie

To see why Algorithm 3.6.1 works, observe that

npx=q~+

~] ( 1 - q i ) ,

x=l,2,...,n.

i: aj=x

This relationship holds true initially (a slight argument is required here) and, from Step 6, is seen to hold after each iteration of the loop. When the algorithm terminates, each q~ ~< 1 and the above relationship is the defining relation for the alias values a~ and the cut-offs qx. The second stage of the procedure is quite simple and is given in the following algorithm. AL6ORITHM 3.6.2 (The alias method). Step 1. Generate U = u and V = v from U(0, 1). Step 2. Let i = [nv] + 1. Step 3. If u <~ qi then deliver X = i, else deliver X = a~. In this method, the probabilities qi and the alias value a;, i = 1, 2 , . . . , n, are tabled. To generate one observation of the random variable X, it is necessary to look up qi and sometimes a i. Either one or two table-accesses are required. The expected number of table accesses is l+--

1

ni=l

( 1 - qi)

One could attempt to choose the q so as to minimize this expression. H o w e v e r , table lookups are 'cheap' enough that the effort is seldom justified. Ahrens and Dieter (1982) generate observations from a Poisson distribution using a combination of the alias method and a rejection technique. Schmeiser (1990) points out that index tables (Chen and Asau, 1977; Fishman and Moore, 1984) can be more efficient and easier to set up than the alias method.

3.7. Alias urn method Peterson and Kronmal (1982) suggest the following generalization of the alias method which reduces the number of table accesses: Consider the probability distribution on n points 1, 2 , . . . , n with respective probabilities Pl, P2, • • •, Pn and extend this to a distribution on 1, 2 , . . . , n* (where n* > n ) by putting Pi = 0 for i = n + 1 . . . . . n*. However, the cut-offs are determined, it is clear that qi must vanish for i > n, and for these values of i, only one table access is necessary (namely, look up ai). The method is summarized in the following algorithm. ALGORITHM 3.7.1. Step 1. Generate X = x uniformly on 1 , . . . , n*. Step 2. If x > n then deliver X = a x. Step 3. Generate U = u uniformly on (0, 1).

Computer generationof random variables

691

Step 4. If u <~qx then deliver X = x else deliver X = a x. In this method, the expected number of table accesses is 1+~-

( 1 - q i ) ~ < 1 + ~ -, i=1

which can be made to approach unity by taking n* large. Compared with the alias method, this technique requires a second comparison (in Step 2), but this comparison will be fast since it involves integers. With probability 1 - n/n*, the alias-urn method avoids the generation of U and the floating point comparisons in Steps 3 and 4. The principal disadvantages are the heavier storage demands and the possibly more time-consuming setup and initialization. However, it is possible to develop a streamlined version of the setup routine especially adapted for the alias-urn method.

3.8. Acceptance-complement method The acceptance-complement method is the analogue of the alias method for sampling from continuous distributions (Kronmal and Peterson, 1981). Suppose observations from fx(X) are desired and the density can be decomposed into fx(X) =pfl(x) + (1 -p)f2(x), 0 ~


fl(x) = min{fx(X ), fy(X)} /p , where p is the normalizing constant needed to make fl(x) a density function. The density f2(x) is then given by

f2(x) = [fx(X) - fy(x)] ÷/(1 - p ) . The only computational difficulty in applying this method could be in

M. T. Boswell, S. D. Gore, G. P. Patil and C. Taillie

692

determining the set on which fx(X)>fr(x). Some acceleration and simplification has been suggested in certain special cases. For more details on this acceleration and other generalizations, see the discussion in Deak (1981, 1986), Devroye (1986), Kachitvichyanukul and Schmeiser (1985), and Stadlober (1981).

3.9. Mixture rejection method The mixture rejection method, sometimes called the composition-rejection technique, is written up in almost every paper that uses it. The writeups can be so different that it sometimes appears that the authors are describing different methods. Suppose the density of interest, fx(X), can be written as a mixture fx(X) = P l f l ( X )

+ Pzf2(x) + . . .

+ Pnfn(X) .

Then X can be generated by the mixture method, and the needed observations from f ( y ) can be generated by applying the rejection technique to observations from fvi(Y) with acceptance probabilities ai(y ) and P(Ai). Then

fi(x) = f r i ( X ) a i ( x ) / P ( A i ) ,

i =

1,..., n,

and

PlfYl(X)al(x) fx(X) --

P(A1 )

Pzfr2(X)a2(x) +

P(A2 )

+"" +

Pnfy,(X)an(x) p(An )

EXAMPLE 3.9.1 (The half-normal distribution). Butcher (1960) decomposes the half-normal distribution into a mixture of the standard normal distribution truncated to (0, 1) and the standard normal distribution truncated to (1, ~). The mixing probabilities are p = 2P[0 < Z < 1] and 1 - p . The (0, 1) truncated normal is generated by applying the rejection technique to the standard uniform distribution, and the (1, ~) truncated normal is generated by rejecting on Y = 1 + T where T - EXP(0.5). As in Example 3.3.6, Y can be generated as Y = 1 - (In V)/2 where V is uniform on (0, 1). The following algorithm uses a uniform variable W to adjudicate between the two components of the mixture. ALGORITHM 3.9.1. Step 1. Set p = 2P[0 < Z < 1] = 0.683. Step 2. Generate W = w from U(0, 1). Step 3. If w ~


Computer generation of random variables

693

Step 8. Generate U = u and V = v from U(0, 1). Step 9. If u ~ e-vZ/z, then deliver X = v. Step 10. Go to Step 8. When Butcher presented the above decomposition, his routine branched in Steps 7 and 10 to Step 2. That is, a rejection was done first and then a mixture was done. This was incorrect for the problem at hand, but has the potential of being developed into a new technique. We also remark that the acceptance rate for generating the (1, o~) truncated normal can be maximized by taking T EXP(Ix) w h e r e / x = 2/(1 + X/5) = 0.62. Steps 5 and 6 should then be replaced by:

Step 5': Set y = 1 - ~ ln(v). Step 6'. If u ~< e x p { - ( y - 1//z)2/2], then deliver X = y . For the general mixture-rejection method, it is clear that methods other than rejection could be used for some of the components of the mixture.

3.10. Exact-approximation method 3.10.1. The basic method The random variable of interest X can be represented as X = F x a ( U ) where U is uniform on (0, 1). In the exact-approximation method, which is due to Marsaglia (1984), the transformation F x 1 is replaced by an approximating function g and, in order to adjust for the approximation, U is replaced by a suitably chosen variate V whose distribution will depend upon g. Making a change of variable, we see that V must have density

fv(v) =fx[g(v)]g'(v),

O < v < 1,

in order that X = g(V) have the desired distribution. The m e t h o d requires that fv be bounded away from zero on the unit interval. A value of p is then chosen to satisfy 0 < p •Popt =--inf{fv(V): 0 < v < 1} , and the density of V is decomposed into the following mixture of a uniform density and a residual density:

fv(v) = p + (1 - p){fx[g(v)]g'(v) - p }/(1 - p ) . The optimal choice of p is Popt" Examples 3.10.1 and 3.10.2 illustrate ways of finding g and p. Once suitable g and p have been determined, the m e t h o d is summarized by the following algorithm. Note that the algorithm assumes that a method is available for generating observations from the residual density. ALGORITHM 3.10.1.

694

M. T. Boswell, S. D. Gore, G. P. Patil and C. Taillie

Step 1. Generate a uniform random variable U. Step 2. If U ~


F(t) = [ 1 - exp(-t2/c)] 1/2 , t>~O, where c is a suitable constant, so that

g(v) = F-I(v) = [ - c ln(1

- v 2 ) ] 1/2 ,

0 < o

< 1.

If we let ~ be the standard normal density function, then the density of V is given by

fv(V) = 2~o[g(v)]g'(v) =(~/2)-l/2cv(1-vZ)C/z-1/g(v),

0
Polya takes c = ~/2 = 1.57; but Marsaglia chooses c to maximize Popt and finds that c = 1.55 and p = Popt = 0.985. Thus, observations from the residual density are needed only 1.5 percent of the time.

3.10.2: The general exact-approximation method The basic method described in the preceding section can be generalized by (i) allowing the transformation g to be defined on an arbitrary (possibly unbounded) interval and, correspondingly, dropping the requirement that V have the unit interval for its support, and (ii) replacing the uniform density in the mixture decomposition of fv by some other density fw for which observations are easy to generate. As before, fv(v)=fx(g(v))g'(v), but now the ratio fv(v)/fw(V), v E support(V), is required to be bounded away from zero, and a value o f p is chosen so that 0 < p ~ P o p t ~ inf{fv(v)/fw(V): v ~ support(V)}. Then the density of V may be decomposed as

fv(V) = pfw(V) + (1 --p)fR(V), where fR is the 'residual' density. The general algorithm is given below. In order for the method to be reasonably efficient, either p should be close to unity or observations from the residual density should be easy to generate. ALGORITHM 3 . 1 0 . 2 .

Step 1. Generate U from U(0, 1). Step 2. If U ~
Computergenerationof random variables

695

Step 3. Generate V from fR and deliver X = g(V). EXAMPLE 3.10.2 (The t-distribution). In order to generate an observation X = x from the t-distribution with n degrees of freedom, Marsaglia (1984) considers the cubic transformation

x=g(v)=v +(v +v3)/4n,

-~
When the degrees of freedom are large, X is approximately standard normal and g(v)-~ v, so that V is also approximately standard normal. This suggests that we take the density fw to be the standard normal density function q~. In fact, Marsaglia shows that a decomposition

fv(V) =pq~(v) + (1 -p)fR(V) is possible provided p ~< 1 - 0.176/n.

3.11. Waiting time method The rejection method (discussed in Section 3.5) consists of generating r a n d o m variables Y1, Y2,. • •, Yn until an acceptable value is obtained; then X = Yn is delivered. The waiting time method is a generalization where sampling continues until some specified event occurs, and then X = g(Y~,..., Yn) is delivered where g is a suitably chosen transformation. EXAMPLE 3.11.1 (The exponential distribution). Marsaglia (1961) generates an observation X from the standard exponential distribution by decompositing X = Y + Z into the integer part Y = IX] and the fractional part Z. Now X can be looked upon as the waiting time to the first event in a Poisson process with intensity parameter A = 1 (the intensity is the mean number of events per unit time). If we watch the process through the consecutive time intervals, 0 ~< t < 1, 1 ~< t < 2 , . . . , i ~< t < i + 1, then Y (the integral part of the waiting time) is the left-hand endpoint of the first interval in which an event occurs. Since the Poisson process has independent increments, Y follows a geometric distribution with parameter p = 1 - e -1. Marsaglia generates Y by a table-lookup method. Also, by independent increments Y and Z are independent. If Y = i, then the first event occurs in the interval i ~< t < i + 1 and, given the n u m b e r N of events in this interval, their locations are uniformly distributed within the interval. Since Z is the increment to the first event, it can be generated as the random minimum (see Example 3.4.6), Z = m i n { U 1 , . . . , UN}, where N has a standard Poisson distribution, truncated to take values 1, 2 , . . . . The following algorithm implements this procedure. Section 3.12 gives a m e t h o d for generating Z directly. ALGORITHM 3.11.1. Step 1. Initialize y = 0.

696

M. T. Boswell, S. D. Gore, G. P. Patil and C. Taillie

Step 2. Generate X = x from the standard Poisson distribution. Step 3. If x = 0, increment y and go to Step 2. Step 4. Generate Ua = u~, . . . , Ux = ux from U(0, 1) and set

z =min{ul,...,

u~}.

Step 5. Deliver X = y + z.

Using the fact that the waiting time to the k-th event has a gamma distribution with k as shape parameter, Marsaglia modifies this algorithm to generate gamma variates with integral shape parameters. 3.11.1. Forsythe's m e t h o d

Forsythe (1972) gives a method for generating realizations of bounded random variables whose density has the form f x ( X ) = c e -g(x) where a < x < b and 0 <~g(x) ~< 1. Here a and b must be finite. The method is a rejection technique that uses a waiting time random variable to decide whether to reject or accept. Let U0, U1, U 2 , . . . be a sequence of U(0, 1) variates. Then the waiting time variable is the first subscript k for which monotonicity is violated in the following sequence: g(a + (b - a)Uo) >! UI >! U 2 > ! . . . . The m e t h o d is summarized in the following algorithm. ALaORITHM 3.11.2. Step 1. Generate W = w from U(0, 1) and initialize k = 1. Step 2. Set x = a + (b - a)w and calculate g = g(x). Step 3. Generate U = u from U(0, 1). Step 4. If g/> u then put g = u, increment k, and go to Step 3. Step 5. If k is odd, then deliver X = x. Step 6. Go to Step 1. The distribution of the delivered X is fx(X) = e - g ( x ) / p ( A ) , a <-x < b, where P(A) = fab e -g~x) dx is the normalizing constant (see Section 3.5). The e v e n / o d d bookkeeping in this algorithm can be avoided if the uniforms in Step 3 are generated in pairs (U, V). Then k is odd for U and even for V. The revised algorithm is as follows: ALGORITHM 3.11.3. Step 1. Generate W = w from U(0, 1). Step 2. Set x = a + (b - a)w and calculate g = g(x). Step 3. Generate U = u from U(0, 1). Step 4. If g < u, then deliver X = x. (At this point we will not actually put g = u; but the reader should mentally substitute g for u in the comparison in Step 6.) Step 5. Generate V = v from U(0, 1). Step 6. If u / > v , then put g = v and go to Step 3. Step 7. Go to Step 1.

Computer generation of random variables

697

Forsythe's method has two limitations: (i) X is required to be bounded, and (ii) the requirement that 0 <~g(x)<~ 1 means that fx(X), a < x < b, must be bounded away from zero and that

1 <- sup{fx(X): a < x < b} inf{fx(x): a < x < b} ~< e = 2.7, thereby limiting the possible variability in the density fx(x). Either limitation can sometimes be overcome by partitioning the support of X into sufficiently small subintervals and employing what might be termed the 'mixture-Forsythe' method. This is illustrated in the following two examples from Forsythe (1972). See also Ahrens and Dieter (1973) and Dagpunar (1990). EXAMPLE 3.11.2 (The standard exponential distribution). The mixture-Forsythe method is applied by subdividing the positive real axis into subintervals of unit length. A simplification is possible since the standard exponential truncated to [i, i + 1] is a translate of the standard exponential truncated to [0, 1). Thus, as before, we decompose the variable of interest X = Y + Z into its integer part Y = [X] and its fractional part Z. Forsythe's m e t h o d is used to generate Z. Further, we recall that Y and Z are independent and that Y has a geometric distribution with parameter p = 1 - e -1. But the number of rejections in generating Z by Forsythe's method is also geometric with parameter P ( A ) = f~ e-Xdx = 1 - e -1 and is clearly independent of Z. Thus, we obtain the desired X by tallying the number of rejections that occur in generating Z and adding this quantity to the delivered Z. This procedure is summarized by the following algorithm, adapted from Ahrens and Dieter (1972). Compare with Algorithm 3.11.3. ALGORITHM 3.11.4. Step 1. Initialize y = 0. Step 2. Generate W = w from U(0, 1) and set g = z = w. Step 3. Generate U = u from U(0, 1). Step 4. If g < u then deliver X = y + z. Step 5. Generate V = v from U(0, 1). Step 6. If u I> v, then put g = v and go to Step 3. Step 7. Increment y and to to Step 2. EXAMPLE 3.11.3 (The half-normal distribution). Forsythe's method is applied with the positive real axis partitioned into subintervals with 0 = a 0 < a 1 < a 2 < • .. as partition points. When truncated to the interval I k = [ak, ak+l), the half-normal has its density function proportional to e x p [ - ( x 2 - a~)/2]. Setting g(x) = (x 2 - a2)/2, the requirement that 0 <-g(x)~< 1 for x E 1 k implies that 2

2

ak+ 1 ~ 2 + a k . One possible choice (Forsythe, 1972) is to take the above inequality to be an

M. T. Boswell, S. D. Gore, G. P. Patil and C. Taillie

698

equality, which gives

0,1, as the sequence of partition points. Ahrens and Dieter (1973) examine a partitioning in which P[Ik] = 1/2 ~+1 under the half-normal distribution.

3.12. Ratio-of-uniforms method The idea behind this method, due to Kinderman and Monahan (1977), is to generate the nonnegative variable of interest X by the relation

X=V/U, where U and V are the coordinates of a point chosen at random (i.e., uniformly) from a suitable planar region R. The area of R must be finite. If R is in fact bounded, then the required point (U, V) is easily generated by enclosing R in a rectangle, using a uniform generator to randomly select a point from the rectangle, and delivering that point if it falls within R and rejecting that point otherwise. Some other method of generation must be used when R is unbounded (but of finite area); see Section 3.12.1 below. Now suppose the region has the form R = {(u, v): 0 < u, 0 < v, and u < g(v/u)} , where g(x), x > 0, is a specified nonnegative function. (The region R has finite area exactly when g is square-integrable; see below.) Letting A be the area of R, the density of (U,V) is fu.v(U, v)= 1/A if (u, v ) E R and is 0 otherwise. Making the change of variables, x = v / u and y = u , gives the density fx, r ( x , y ) = y / A on { 0 < x , 0 < y , y~
fx(x) = ~

1

I(x) .

The function g is to be chosen proportional to the square-root of fx, and it is convenient to choose the constant of proportionality so that

g(x) = I'/2(x) =- I(x) .

Computer generation of random variables

699

T h e region R is R = {(u, v): 0 < u, 0 < v, and u < I(v/u)} ,

=

{(u, v): O < u < l , a < v / u < b }

,

which is a triangle b o u n d e d by the lines u = 1, v = au, and v = bu (Figure 3.12.1(a)). Since R can be enclosed in the rectangle {(u, v): 0 < u < 1, 0 < v < b} a generating algorithm is given by:

Step 1. G e n e r a t e U = u and V = v f r o m U ( 0 , 1) and U ( 0 , b), respectively. Step 2. Calculate x = v/u. Step 3. If a < x < b, deliver X = x.

U----1

v--1

R .~...~'~

(a)

'-'1'~

(b)

,-(a/o) 'n

(c)

(d)

Fig. 3.12.1. The acceptance region R for the ratio of uniforms technique: (a) to generate a uniform random variable on (a, b); (b) to generate the reciprocal of a uniform random variable on (a, b); (c) to generate an exponential random variable with mean one; and (d) to generate a half-normal random varible.

M. T. Boswell, S. D. Gore, G. P. Patil and C. Taillie

700

Step 4. Go to Step 1. The overall acceptance probability is the ratio of the area of R to the area of the enclosing rectangle; here, the acceptance probability is ( b - a)/2b. Curiously, for a given interval-length b - a, the acceptance probability decreases as the interval moves to the right. Clearly this is not a good way to generate uniforms. But, the converse will be used in Section 3.12.1: to generate observations according to the above mechanism, simply produce a uniform on (a, b). EXAMPLE 3.12.2 (The reciprocal of a uniform (a, b) variate). Let the variable of interest be Y = 1/X where X is uniform on (a, b), 0 ~
v): 0 < u, 0 < v and u
= {(u,

v): 0 < u, 0 < v and v < I(u/v)}

= {(u,

v): O < v < 1 a n d a < u / v < b } .

Noting that the transformation (u, v)¢:>(v, u) maps the region R into the corresponding region of the previous example, we see that the algorithm delivers the reciprocal of the result delivered in the previous example. In particular, the overall acceptance probability is the same, i.e., ( b - a)/2b. The transformation invariance illustrated by this pair of examples is not typical of the ratio-of-uniforms method. If X and Y are positive variates, and Y is a strictly monotonic transformation Y = T(X) of X, then there are two options for generating Y: (i) generate X by the ratio-of-uniforms method and apply the transformation T, and (ii) generate Y directly by the ratio-ofuniforms method. The two algorithms are quite different unless T(x)= cx or

T(x) = c/x. EXAMPLE 3.12.3 (The standard exponential distribution). Here, we take g(x)= e x p ( - x / 2 ) so that the region R is R = {(u,v): 0 < v a n d 0 < u < e

-v/(2u)}

= {(u, v): 0 < u < 1 and 0 < v < - 2 u In(u)} . The region R, shown in Figure 3.12.1(c), has area j-1 - 2 u In(u)du = 0.5, and can be enclosed by the rectangle {(u, v): 0 < u < 1 and 0 < v < 2 e-l}. The method is to generate U and V as independent uniforms taking values 0 < U < 1, 0 < V < 2 e -a. If V ~<-2U In U, then X = V/U is delivered; otherwise, the

701

Computer generation of random variables

procedure is repeated by generating another pair (U, V). The acceptance probability, the ratio of the two areas, is e/4 or about 0.68. EXAMPLE 3.12.4 (The half-normal distribution). Here, we take g(x) = e x p ( - x 2 / 4) so that the region R is R = {(u, v): 0 < v, 0 < u

<-

e -02/(4u2)}

= {(u, v): 0 < u < 1, 0 < v < 2uVrL-i-n-(u)} . The area of R is J'~ 2 u l / - l n ( u ) du = fo ~ enclosed in the rectangle, {(u,v): 0 < u <

land0
e x p ( - x ) dx = X / ~ .

Since R is

<~},

the method generates pairs of uniforms (U, V) from this rectangle until the condition 0 < V ~<2 U ~ / - I n ( U ) is satisfied; then X = V~ U is delivered. The acceptance probability, the ratio of the two areas, is ~ / 4 or about 0.73. An observation from the full normal distribution can be obtained by randomly attaching a sign to the half-normal observation. This is neatly accomplished by 'doubling the rectangle' as follows (see Figure 3.12.1(d)): generate pairs of uniforms (U, V) from the rectangle {(u, v): 0 < u < 1 and -Xf2/e < v < X/2/e} until the condition V 2 ~ - 4 U 2 In(U) is satisfied; then deliver X = V~ U. Kinderman and Monahan (1977) give some bounds which can speed up the floating point comparisons. The bounds are (1 + In c) - cu <~ - I n u <~ 1/(cu) - (1 - In c)

when 0 < u < 1

for any c > 0 . Equality occurs when u = l / c , so c = 2 can be used as a compromise value. The acceptance/rejection decision can usually be reached by comparison with the lower a n d / o r upper bound, thereby avoiding an evaluation of ln(u). The procedure is summarized in the following algorithm. ALGORITHM 3.12.1. Step 1. Calculate 2 ~ , (1 + In 2), and (1 - In 2). Step 2. Generate U = u and U 1 = u 1 from U(0, 1). Step 3. Set v = 2 2X/~-ee(uI - 0.5). Step 4. S e t v s = v 2, w = 2 u , a n d w s = w 2.

M. T. Boswell, S. D. Gore, G. P. Patil and C. Taillie

702

Step Step Step Step

5. 6. 7. 8.

If vs < ws[(1 + In 2) - w], then deliver X = v/u. If v s >i ws[1/w - (1 - In 2)], then go to Step 2. If vs ~< - w s In u, then deliver X = v/u. Go to Step 2.

Kinderman and Monahan (1977, 1980) also describe routines based on the ratio of two uniforms for the Cauchy distribution, chi-square distribution with 3 degrees of freedom, and the t-distribution with 3 degrees of freedom. For other examples see Cheng and Feast (1980), Dagpunar (1988a, 1989), and Young, Seman and Turner (1990).

3.12.1. Mixture ratio-of-uniforms method This method also goes by the name 'composition ratio-of-uniforms'. If the region R is unbounded, or is not well approximated by an enclosing rectangle, then one can decompose R into a countable (possible finite) disjoint union R = R~ t_JR 2 U • • • of non-overlapping and bounded subregions. Letting Pi be the ratio of the area of R i to that of R, the random point (U, V) E R can be obtained by generating (U, V) uniformly o v e r R i with probability Pi. One can usually arrange matters so that some of the Ri have geometrically convenient shapes. In particular, if R~ has the triangular shape of Example 3.12.1 or Example 3.12.2 then the algorithm need only deliver the appropriate uniform or reciprocal of uniform with probability pi. See Kemp (1990) for an example.

3.13. Methods for generating order statistics The obvious way of obtaining order statistics is to generate a random sample of the appropriate size from the desired distribution and then sort the sample values.

3.13.1. Methods using inverse distribution functions If observations are to be generated by the inverse distribution function technique, then a simplification can be achieved. Since the inverse distribution function is an increasing function, the overall procedure of generating U 1. . . . , U,, transforming these into Y1 . . . . , I1, by the inverse distribution function technique, and then ordering to obtain the order statistics Y¢I)< Y(2) < " " < I1(,) is the same as generating U 1 , . . . , Un, ordering to get the order statistics U(a ) < U(z) < - • • < U(.) of the n uniforms, and then transforming these into the order statistics Y(1) < Y¢z) < " " " < Y(n). The latter m e t h o d has the advantage that if only the first k order statistics are desired then the inverse distribution function transformation needs to be applied only k times. It is also possible to generate order statistics sequentially. Let X~, X 2 , . . . , X, be independent identically distributed random variables with density f(x) and distribution function F(x). Let )((1) < . . - < X(.) be the order

Computer generation of random variables

703

statistics with joint density function f(1) . . . . . ( n ) ( X l , . . . ,

Xn) = n!f(Xl)'''

f(x,).

Then

F(1)(x) = P[X(1 ) ~ x ] = 1 - [1 - F(x)]". T h u s f(1)(x) = n i l --

F(X)]n-lf(x), and

f(2) . . . . . (n)(X2, " " • , X n I X ( l ) = X l ) = f(1) . . . . . (n)(Xl . . . .

= (n - 1 ) ! f ( x 2 ) . . .

, Xn)/f(1)(Xx)

f(x,)/[1 - F ( X l ) ] " - ' .

T h a t is, given )((1 ) = X l , the r e m a i n i n g o r d e r statistics are distributed as the o r d e r statistics of a r a n d o m sample of size n - 1 f r o m the t r u n c a t e d density f(x)/[1 - V(xl) ]. EXAMPLE 3.13.1 (Generation of uniform order statistics in ascending order). Lurie and H a r t l e y (1972) obtain the distribution function for the (] + 1)-th o r d e r statistic conditional u p o n the j-th o r d e r statistic for a r a n d o m s a m p l e f r o m U ( 0 , 1). T h e inverse of this distribution function is t h e n used to g e n e r a t e realizations of the o r d e r statistic. L e t U 1 , . . . , U n be the n i n d e p e n d e n t u n i f o r m deviates with c o r r e s p o n d i n g o r d e r statistics U ( 1 ) , . . . , U(n). T h e distribution function of the smallest o r d e r statistic is F(1)(u ) = 1 - [1 - F(u)] n -- 1 - (1 - u) n so that F~-11)(v)= 1 - (1 v) ~/n Thus, the smallest o r d e r statistic can be g e n e r a t e d as U(1 ) = 1 - v 1/~ w h e r e V~ has a u n i f o r m distribution on (0, 1). A s usual, we h a v e used the fact that V I and 1 - 1/1 are identically distributed. N o w , conditional u p o n UO) = Ul, the variables (U(2) - Ul)/(1 - u l ) , . . . , (U(~) - u l ) / ( 1 - u l ) act like the o r d e r statistics of n - 1 i n d e p e n d e n t u n i f o r m r a n d o m variables on (0, 1). So /_](2~ given U(1 ) = u I can be g e n e r a t e d by the relation (U(2) - - U l ) / ( 1 - U l ) = 1 V12/("-1) w h e r e V2 is a second u n i f o r m r a n d o m variable i n d e p e n d e n t of Va. Solving gives U(2 ) = u 1 +

(1 - U l ) ( 1

-V12/(n-l))

~- 1 -

(1 - " I A 1"~171/(n-1) Jv 2

Continuing in this m a n n e r we see that, given U(i ) = ui, the next o r d e r statistic 1~(n-i) • • can be g e n e r a t e d by U(~+A) = 1 - (1 - u~)V~+~ . T h e following a l g o r i t h m uses this m e t h o d to g e n e r a t e the first k o r d e r statistics f r o m a s a m p l e of n u n i f o r m r a n d o m variables on (0, 1). This m e t h o d , as well as o t h e r sequential m e t h o d s , should be applied with caution w h e n n is large since the r o u n d i n g errors can a c c u m u l a t e quickly• ALGORITHM 3.13.1. Step 1. Initialize t e m p = 0.0 and i - - 0 .

M. T. Boswell, S. D. Gore, G. P. Pan'l and C. Taillie

704

Step 2. Generate V from U(0, 1).

Step 3. Step 4. Step 5. Step 6. Step 7.

Set U(i+l ) = 1 - (1 - temp)V 1/("-0. If i = k - 1 then deliver U(1), U ( 2 ) , . . . , U(k). Replace temp by U(i+l). Replace i by i + 1. Go to Step 2.

The method generates the uniform order statistics without doing any sorting or making comparisons, but it does require the extraction of integral roots. With the uniform order statistics in hand, the inverse distribution function technique can be employed to generate order statistics from nonuniform distributions. Also notice that the k largest order statistics from a uniform distribution can be generated as the one's complement of the k smallest order statistics. That is, U(,_~+~) has the same distribution as 1 - U(~). EXAMPLE 3.13.2 (Generation of uniform order statistics in descending order). For independent uniform random variables, U 1 , . . . ,/_In, on (0, 1), F(,,)(u)= Fn(U) = u n and F(n)(v) --1 = v 1 I n . Using the inverse distribution function technique vrl/n I gives U(n) = V 1 , where 111 has a uniform distribution on (0, 1). Given U(,) = Un, then U(1)/u ~, U ( 2 ) / u , , . . . , U(n_i)/u, act like the order statistics of n - 1 independent uniform random variables on (0, 1). Thus, to generate an • ~T vrl/(n--1) observation U(n_l) given U(,)= u,, we set u(~_l)= UnV 2 . Continuing in this manner we see that given U(,_ 0 = u,_g, then U(n_i_~)=-b t n _ i V ixrx/(,-;-1) + 2 Schucany (1972) used this approach to generate the largest order statistics. Taking the one's complement then gives the k smallest order statistics as outlined in the following algorithm. •

r r

ALGORITHM 3.13.2. Step 1. Initialize temp = 1.0 and i = 0. Step 2. Generate V from U(0, 1). Step 3. Set U(n_i)= t e m p V 1/(n-°. Step 4. If i = k - 1, then deliver 1 - U(~), 1 - U ( , _ I ) , . . . , 1 - U(n_g+l)Step 5. Replace temp by U(,_ 0. Step 6. Replace i by i + 1. Step 7. Go to Step 2. Looking at Step 3 and Step 4 we see that this routine is slightly simpler than that of Algorithm 3.13.1; the literature reports that it is also faster (Laurie and Mason, 1973; Reeder, 1972). For generating consecutive central order statistics, Ramberg and Tadikamalla (1978) advocate the use of Schucany's descending method after generating the largest required order statistic directly from the appropriate beta distribution. They report that this can be twice as fast as descending all the way from Un:n.

Computer generation of random variables

705

EXAMPLE 3.13.3 (Sequential generation of exponential order statistics). Let X 1 , . . . , X n be a r a n d o m sample f r o m the exponential distribution with m e a n /z. The smallest order statistic has distribution function F(1)(x ) = 1 - [1 - F(x)]" = 1 - e -x/(tL/n) ,

x > O.

T h a t is, X(1 ) a l s o has an exponential distribution but with m e a n tx/n. Conditional upon X o ) = xl, the second order statistic is x 1 plus the first order statistic from a sample of n - 1 exponential r a n d o m variables. The following algorithm uses the inverse distribution function technique to generate the exponential r a n d o m variables (see E x a m p l e 3.3.6), but any m e t h o d of generating exponential r a n d o m variables could be used. ALGORITHM 3.13.3. Step 1. Initialize i = 0, X(0 ) = 0. Step 2. G e n e r a t e U from U(0, 1). Step 3. Set X(i+i ) = X(i > - [il~/(n - i)1 In(U). Step 4. If i = k - 1, then deliver X ( x ) , . . . , X@). Step 5. Replace i by i + 1. Step 6. G o to Step 2. EXAMPLE 3.13.4 (Uniform order statistics via exponential order statistics). If a fast exponential generator exists, then it can be used as in E x a m p l e 3.13.3 to generate the order statistics from the exponential distribution. These in turn can be transformed into order statistics from the uniform distribution on (0, 1) by the exponential distribution function. T h a t is, if X(1 ) < - - • < X(n ) are order statistics from the exponential distribution with m e a n 1, then 1 - e -x(x) < • • • < 1 - e -X(n) have the same distribution as the order statistics from the uniform distribution on (0, 1). It is interesting to observe that if the exponentials are generated by the inverse distribution function technique, then this algorithm is the same as that given in E x a m p l e 3.13.1. Lurie and Hartley (1972) give another m e t h o d of transforming exponential r a n d o m variables into uniform order statistics. We can obtain their results f r o m properties of the Poisson process. It is well known that the times between successive events are independent exponential r a n d o m variables with m e a n 1/Z where )t is the intensity p a r a m e t e r of the Poisson process. Conditional on the time T,+ 1 = t of the (n + 1)-th event, the times of the first n events are distributed as the order statistics from a uniform distribution on (0, t). Therefore, if X 1. . . . , X , + 1 are independent identically distributed exponential r a n d o m variables, then

u(,) = X , l ( X , + . . . + x . + , ) , u~) = (x, + x~)i(Xx + . . . + x . + , ) ,

706

M. T. Boswell, S. D. Gore, G. P. Patil and C. Taillie

u<.> = ( x l + . . . + x . ) / ( x , + . . . + x . + , ) ,

are the order statistics from n uniform random variables on (0, 1). 3.13.2. Methods using hazard functions

This method for generating order statistics, due to Newby (1979), is essentially the same as the inverse distribution function technique described above, except that the inverse distribution function is expressed in terms of the hazard function, and it is not given explicitly here. 3.13.3. Mixture method with sorts

Gerontidis and Smith (1982) give a method of obtaining order statistics for a sample of size n from a general distribution. The support of the distribution is partitioned into k equally-likely intervals (any partition will work; in the discrete case, it may not be possible to allocate the probabilities uniformly). A vector ( m l , . . . , mk) is generated from the k-celled multinomial distribution with parameters n and Pi = 1/k, i = 1 , . . . , k. (See Examples 3.3.4 and 4.2.1 for multinomial generators.) Finally, m i observations are generated from the original distribution truncated to the i-th partition interval, i = 1, 2 , . . . , k. The observations in each interval have to be sorted to arrive at the final order statistics. The method gains efficiency as the sample size n increases. Gerontidis and Smith give comparisons of the direct method (sorting) and this grouping method as applied to the uniform distribution, the normal distribution, and several members of the beta family of distributions.

4. Multivariate generation methods 4.1. Introduction

Computer generation of observations from multivariate distributions is more complex than from univariate distributions, largely because the multivariate dependence structure forces one to generate the various components collectively instead of individually. In general, one can expect the algorithmic and computational complexity to increase, sometimes markedly, with increasing dimension. In addition, the univariate techniques do not always generalize readily to the multivariate setting. For instance, there is no obvious analogue of the inverse distribution function method. Multivariate generation is discussed by Johnson (1987), Nachtsheim and Johnson (1988), Schmeiser (1990), and Schmeiser and Lal (1980b). Barone (1987) considers the generation of multivariate time series. Simulation studies sometimes employ variance reduction techniques that require the generation of correlated random variables; see Hammersley and Handscomb (1964), Kleij-

Computer generation of random variables

707

nen (1974, 1975), and Sim and Lee (1989) for examples. In what follows, we describe and illustrate three important general methods for multivariate random variable generation. These are: (i) the conditional distributions method, (ii) the transformation method, and (iii) the rejection method. Additionally, we describe an approximate method that has been used to generate observations from arbitrary multivariate distributions. This m e t h o d simulates the evolution of a Markov chain until equilibrium is nearly achieved.

4.2. Conditional distributions method A k-dimensional distribution can be represented as a product of k univariate conditional distributions. Correspondingly, random variable generation from the multivariate distribution can be accomplished by generating an observation from each of the univariate conditional distributions in sequence. This idea was previously encountered when we discussed the sequential generation of order statistics. Let the joint density of the random vector X of interest be factored as follows: f(x

, x2, . . . ,

= L ( x l ) L ( x 2 I xO'"L(x

Ix1 . . . .

,

The generation method is described by the following algorithm. ALGORITHM 4.2.1.

Step 1. Generate X 1 = x I from the marginal density fa(x~). Step 2. Set i = 2. Step 3. Generate X i = x i from the univariate conditional density (xi

Step 4. If i = k, then deliver X = ( X l , . . . , x~)'. Step 5. Increment i to i + 1. Step 6. Go to Step 3. The main obstacles in implementing this algorithm are: (i) determination of the conditional distributions, and (ii) identification of a suitable univariate generate technique for each of the conditional distributions. Note that the magnitude of the difficulties here can be markedly affected by even such simple transformations as permutations of the components of X. EXAMPLE 4.2.1 (The multinomial distribution). See also Example 3.3.4. Suppose X = ( X 1 , . . . , Xk)' follows the multinomial distribution with parameters n and p l , . . . , Pk, so that 0~
M. T. Boswell, S. D. Gore, G. P. Patil and C. Taillie

708

with parameters n~=n-2x:

and

p*=pi

j=l

/[

1-~,pi



j=l

We thus obtain the following algorithm for random vector generation from the multinomial distribution. Loukas and Kemp (1986) use similar techniques to generate realizations of the bivariate binomial and bivariate negative binomial distributions. ALGORITHM 4.2.2. Step 1. S e t i = l , N = n , P = p ~ . Step 2. Generate X i = x i from the binomial distribution with parameters N and P. Step 3. If i = k, then deliver X = (xl, x 2, . . . , x~)'. Step 4. Increment i. Step 5. Compute N = n ~ and P = p * . Step 6. Go to Step 2. EXAMPLE 4.2.2 (The multivariate normal distribution). Suppose X follows the nonsingular multivariate normal distribution with mean vector /t = (/zl . . . . . /*k)' and positive definite dispersion matrix ~ = (%). We also write the diagonal entries of ,~ as o- 2 = ~ r For i = 1 , . . . , k , define X(i )= ( X 1 , . . . , X 'i) as the vector of the first i components of X. The mean vector of X(O is/,(e) = (/*a, • • •,/*i)' and the dispersion matrix of X(o, given by

"~ii :

o-ll

"""

0.1i )

i \ °'a

" ". "'"

:. 0.ii

,

is the leading i by i submatrix of X. The vector of covariances of X i + 1 with X 1 , . . . , X i is given by %0 = (%,i+1, • • • , °'/,g+l)'The marginal distribution of X 1 is univariate normal with mean /xI and variance o-2; whereas, for i = 1 , . . . , k - 1, the conditional distribution of Xi+ > given X I , . . . , X~, is univariate normal with mean

=

+

ioX

l(x(o - t*(i))

and variance -2 2 p -1 O"i+1 "~- or/+1 -- O'(i)•ii lOt(i) "

This gives the following algorithm for generating the k-dimensional multivariate normal distribution with m e a n / x and dispersion matrix X. ALGORITHM 4.2.3. Step 1. Set i = 1, /x =/zl, and

0. 2 =

Or~.

Computer generation of random variables

Step Step Step Step Step

2. 3. 4. 5. 6.

Generate X i = x i from N(/x, 0-2). If i = k, then deliver X = (Xa, x2, Increment i. Compute/x =/2 i and 0-2= ~/2. Go to Step 2.

.

.

.

,

709

Xk)'.

4.3. Transformation method If X and Y are random vectors satisfying a functional relationship ¥ = h(X), then observations on ¥ can be generated by first generating observations on X and then applying the transformation Y = h(X). Nonlinear transformations will be illustrated later in this section. Here, we consider linear (or affine) transformations. An important application of linear transformations is in generating multivariate observations whose mean vector /~ and dispersion matrix X are specified, but whose full distribution is unknown or unspecified. Such observations are needed, for example, in studying the properties of least squares procedures, including the kriging of spatial data (Alabert, 1987; Davis, 1987a,b). Suppose, then, that the k-dimensional random vector ¥ is specified to have mean ~ and dispersion matrix X. For simplicity, we take X to be positive definite. According to the Cholesky decomposition, X can be written as

=HH', where H is a lower triangular k by k matrix; for example, see George and Liu (1981) or Golub (1969). Now let X be any k-dimensional random vector whose components are independent with zero means and unit variances. Then E[X] = 0 and E[XX'] = I, from which it is seen that the affine transformation,

Y=HX+p, will generate Y with the desired moments. Of course, X is relatively easy to generate since its components are independent. Hull (1977) describes a method (involving nonlinear transformations) in which the generated random vector has prescribed marginals and has (approximately) a prescribed dispersion matrix. If we recall that the multivariate normal family of distributions is mapped into itself by affine transformations, then the above transformation provides an alternative method for generating multivariate normal observations: simply take the components of X to be standard normal. Actually a close study of the Cholesky decomposition would reveal that this method of generating the multivariate normal is not fundamentally different from the method using conditional distributions discussed earlier. See also Chung, Kailath and LevAri (1987) and Cheng (1985).

710

M. T. Boswell, S. D. Gore, G. P. Patil and C. Taillie

EXAMPLE 4.3.1 (The multivariate Poisson distribution). If X, X 1 , . . . , X k are independent Poisson random variates with respective means A, A ~ , . . . , Ak, then the random vector Y = ( Y ~ , . . . , Yk)' defined by Y~= X + X ~ ,

i=l,...,k,

is said to have the multivariate Poisson distribution with parameters A, A~, A 2 , . . . , Ak (Johnson and Kotz, 1969, p. 317). In order to generate an observation Y = y on the multivariate Poisson, we can generate the k + 1 independent Poisson variates X,)(1, X Z , . . . , X k , and then make the transformation defined above. The following examples illustrate the use of nonlinear transformations to generate multivariate distributions. EXAMPLE 4.3.2 (The Wishart distribution). If X follows k-variate normal distribution, N~(0, X), and if X 1 , . . . , X n are independent realizations of X then the k by k random matrix W= ET=1XiX~ is said to have the Wishart distribution Wk(n,X ) with parameters n and ,~. Generation of a single observation on Wk(n, ~ ) is accomplished by applying the above quadratic transformation to the n independent observations X 1. . . . , Xn from Nk(0, 2 ) . EXAMPLE 4.3.3 (The Dirichlet distribution). Suppose X 1 , . . . , X g+1 are independent random variables from the gamma distribution, X i ~ F(ai, 1), with parameters ai, i = 1 , . . . , k + 1. Define X = ~i=lrk+a~i. Y Then the random vector Y= (Y1 . . . . , Yk)' with Y~= X J X , i= 1 , . . . ,k, follows the k-dimensional Dirichlet distribution with parameters a l , . . . , ak+ 1. Note that the univariate beta distribution is the special case when k = 1. Again observations can be generated directly by applying the defining transformation. Multivariate distributions can also be obtained using Khintchine's machine (Bryson and Johnson, 1982). The transformation for this method is the product X i = ZiU~ where {U~} are dependent uniform random variables. Loukas and Kemp (1986) give transformations of three random variables into bivariate binomial and also bivariate negative binomial random variables. Deak (1980) gives a multidimensional polar transformation for generating multivariate normal random variables. Kocherlakota, Kocherlakota and Balakrishnan (1986) transform two independent random variables into a bivariate random variable with the Edgeworth series distribution. See Anderson, Olkin and Underhill (1987) for the use of transformations in generating random orthogonal matrices.

4.4. Rejection method Rejection methods have been found to be very powerful for univariate generation, and it is clear that this method is not restricted to one dimension. In principle, multivariate rejection is a straightforward generalization of the

Computer generation of random variables

711

univariate rejection method and is presented in the skeletal form in the following algorithm. The random vector of interest is X with multivariate density f, and Y with density g is the random vector whose realizations are to be screened (i.e., accepted or rejected) to produce the desired realizations of X. ALGORITHM 4.4.1.

Step Step Step Step Step

1. 2. 3. 4. 5.

Compute m = supx[f(x)/g(x)]. Generate an observation y from g. Generate U = u from U(0, 1). If u
However, there are some significant practical difficulties in implementing this algorithm. One major difficulty is in finding a density function g that is suitably close to the multivariate density of interest. In order to simplify the generation from g in Step 2, it is attractive to choose the distribution of Y in such a way that its components are independently distributed. One possible solution is to let ¥ have independent components Yi with the same marginal distributions as the corresponding components of X. Even this solution is not without its difficulties, however. First, there are no assurances that the constant m appearing in Step 1 is finite. Also, with independent components, the support of ¥ is a hypercube in R k and may be substantially larger than the support of X, leading to significant inefficiencies in the algorithm. Finally, if the dependence among the components of X is strong, then one cannot expect its density to be well approximated by a density having independent marginals. One further general comment regarding multivariate rejection methods may be in order here. Univariate implementations of these techniques often partition regions into subregions in order to achieve reasonable efficiency. In principle, such partitioning can also be done for high-dimensional regions, but the practical difficulties can be overwhelming, especially if the algorithm has to be capable of generating observations from each member of a parametric family of distributions. EXAMPLE 4.4.1 (Bingham's distribution on the unit sphere, Bingham, 1974). Johnson (1987, p. 47) describes this method which illustrates that multivariate distributions require individual approximating functions for the rejection method. Suppose observations (~9, @) = (0, q~) are desired from the density

f(O, q~I k~, k2) = c exp{[k I COS2~ Jr- k 2 sin2q~] sin20) sin 0 , 0<0<7

and 0
where c is the normalizing constant. The pair of angles (0, q~) represent a point of the unit sphere. Since the density f(O, q~ [kl, k2) is symmetric about 0 = ~/2, we need only generate observations on 0 < 0 <'r r/2 and either deliver that observation or the symmetric point with equal probability.

712

M. T. Boswell, S. D. Gore, G. P. Patil and C. Taillie

Johnson splits the interval 0 < 0 < 7r/2 into 0 < 0 ~ -n/3 and ~ / 3 < 0 < -rr/2 and finds an approximating function g(O[k), where k = m a x { k 1, k 2 } , with different forms on the two pieces: ~[exp(k sin20)] sine0

g(O ]k) = [.[exp(k)] sin 0

if 0 < 0 ~< 7r/3, if ~ / 3 < 0 < "rr/2.

This approximating function does not depend on q~. ALGORITHM 4.4.2. Step 1. Set k = min{k 1, k2}, d 1 = 2~r[exp(3k/4) - 1]/k, d 2 = "rr, S 1 = 1, and S 2 = exp(k). Step 2. G e n e r a t e U = u and V = v from U(0, 1). Step 3. Set ~ = 2"rru and 0 = s i n - l { k -1 ln[v(exp(3k/4) - 1]} 1/2. Step 4. G e n e r a t e U = u from U(0, 1). Step 5. If u <~(dl/S1)/(dl/S ~ + d 2 / 8 2 ) , then deliver (9 = 0 and ¢b = q~. Step 6. Deliver (9 = sin-l[v/exp(k)] and @ = 2"rru. See A h m a d (1987) for examples where rejection methods are combined with transformations to generate bivariate r a n d o m walk distributions.

4.5. A general method for generating multivariate random variables This section describes a technique for generating observations f r o m an arbitrary multivariate distribution. The distribution of interest is regarded as the equilibrium distribution of a Markov chain and the m e t h o d samples f r o m this chain until equilibrium is approached. To the extent that equilibrium is never fully achieved, the m e t h o d is only approximate. A useful feature of the technique is that the normalizing constant for the multivariate distribution does not have to be known. See Lele and Boswell (1986) for a m o r e rigorous discussion of the method, and Ripley (1979) for applications to the sampling of spatial processes.

4.5.1. Description of the algorithm The method applies to both continuous and discrete distributions. In the continuous case, the Markov chain is unusual in that the state space is continuous, but the transition function has an atom of probability corresponding to no change of state. Let f(r), r in R k, be the density from which observations are desired. The method starts with a density function g(r[s) approximating f(r) in a neighborhood of s. Transitions from s to a potential new state r are generated from g(r[s). Thus, the support S of g(r[s) must contain R k. The transition from s to r is allowed to occur with a certain probability a(r[s); otherwise, the M a r k o v chain remains at s. For the process to have f(r) as equilibrium distribution, it is necessary and sufficient that

a(r Is) = min{g(s [r)f(r)/[g(r[s)f(s)], 1}


(see Lele and Boswell, 1986, for details). Further, the initial state s₀ is generated from some initial density g(s), which may be degenerate. Once g(s), g(r | s), and a(r | s) are known, the following algorithm summarizes the method. Equilibrium is assumed to have been reached after NEQ steps.

ALGORITHM 4.5.1.
Step 1. Set COUNTER equal to 0.
Step 2. Generate an observation S = s from g(s).
Step 3. Generate an observation R = r from g(r | s).
Step 4. Increment COUNTER.
Step 5. Calculate a = g(s | r) f(r) / [g(r | s) f(s)].
Step 6. If a ≥ 1, then set s = r and go to Step 9.
Step 7. Generate U from U(0, 1).
Step 8. If U ≤ a, then set s = r.
Step 9. If COUNTER < NEQ, then go to Step 3.
Step 10. Deliver X = s.
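A minimal Python sketch of Algorithm 4.5.1 is given below for a bivariate target. The particular transition density g(r | s) (a symmetric normal random walk centred at s), the initial density g(s), the step size, and the value of NEQ are illustrative assumptions; f need only be known up to a normalizing constant.

    import math
    import random

    def sample_equilibrium(log_f, initial, step=1.0, neq=10):
        # Algorithm 4.5.1: run the chain for NEQ transitions and deliver the final
        # state, which approximately follows the target density f.
        #   log_f(x) : log f(x), known only up to an additive constant
        #   initial(): draws the starting state s0 from the initial density g(s)
        # The transition density g(r | s) is taken here to be a symmetric normal
        # random walk, so g(s | r)/g(r | s) = 1 and a(r | s) reduces to min{f(r)/f(s), 1}.
        s = initial()
        for _ in range(neq):
            r = [x + random.gauss(0.0, step) for x in s]    # candidate from g(r | s)
            log_a = log_f(r) - log_f(s)
            if log_a >= 0.0 or random.random() < math.exp(log_a):
                s = r                                        # accept the transition
            # otherwise the chain stays at s
        return s

    # Illustration: a standardized bivariate normal with correlation 0.8.
    def log_f(r):
        x, y = r
        return -(x * x - 1.6 * x * y + y * y) / (2.0 * (1.0 - 0.64))

    observation = sample_equilibrium(log_f, initial=lambda: [0.0, 0.0], neq=50)

Because the proposal here is symmetric, only f enters the acceptance ratio; with an asymmetric g(r | s), the full expression of Step 5 would be needed.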
4.5.2. Performance of the algorithm

The process is terminated after finitely many transitions to yield an observation which has approximately the equilibrium distribution. Since the rate of convergence for Markov chains is geometric, relatively few transitions are required. Lele and Boswell (1986) have simulated the process for 5 to 15 transitions for the bivariate Dirichlet, the bivariate binomial, the bivariate log-series, and the bivariate normal distributions, and have found that for these distributions ten transitions are enough to reach equilibrium. Bhavsar and Isaac (1987) point out certain Monte Carlo methods which, like the above method, converge with expected error inversely proportional to the square root of the sample size (often taken as k = 10³ or k = 10⁵). Thus, for the above algorithm, it would be prudent to use a larger number of steps than ten.

Lele and Boswell report two difficulties that were encountered during the testing of this algorithm. Both are of a computational character.
(a) If the distribution g(r | s) has its center too far from the center of the desired distribution, then a large number of transitions may be needed to reach equilibrium. In selecting g(r | s), its location (central tendency) is more important than its shape.


(b) When a is calculated in Step 5, underflow/overflow sometimes occurs. Lele and Boswell suggest working with logarithms in such cases.
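For example, the quantity a of Step 5 can be assembled on the logarithmic scale and exponentiated only at the last moment. In the sketch below, log_f and log_g are hypothetical user-supplied functions returning log f(·) and log g(· | ·).

    import math
    import random

    def accept_transition(log_f, log_g, s, r):
        # Steps 5-8 of Algorithm 4.5.1 on the log scale, avoiding
        # underflow/overflow in the ratio g(s | r) f(r) / [g(r | s) f(s)].
        # log_g(x, y) is understood as log g(x | y).
        log_a = log_g(s, r) + log_f(r) - log_g(r, s) - log_f(s)
        return log_a >= 0.0 or random.random() < math.exp(log_a)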

Acknowledgement

The subject of random variable generation has been of interest at the Penn State Center for Statistical Ecology and Environmental Statistics for quite some time in its cooperative research with the National Oceanic and Atmospheric Administration and the Environmental Protection Agency. These random variable generation methods have important applications in statistical ecology and in environmental statistics. Our thanks are due to Drs. Herbert Lacayo and John Warren of the EPA Statistical Policy Branch for their continuing interest and support. We are grateful to Dr. J. S. Dagpunar of Dundee Institute of Technology for his careful reading of the paper and for making several useful suggestions. We would also like to acknowledge the comments and suggestions received from Drs. Mark E. Johnson, Bruce Schmeiser, William Smith, and P. R. Tadikamalla on an earlier version of the manuscript. Our interest in the subject continues, and we will be glad to hear from the interested reader.

References

Abramowitz, M. and I. A. Stegun (1972). Handbook of Mathematical Functions. Dover, New York.
Ahmad, K. E. (1987). Generating random vectors from a bivariate random walk density function. J. Statist. Comput. Simulation 27, 243-250.
Ahrens, J. H. and U. Dieter (1972). Computer methods for sampling from the exponential and normal distributions. Comm. ACM 15, 873-881.
Ahrens, J. H. and U. Dieter (1973). Extensions of Forsythe's method for random sampling from the normal distribution. Math. Comp. 27, 927-937.
Ahrens, J. H. and U. Dieter (1980). Sampling from binomial and Poisson distributions: A method with bounded computation times. Computing 25, 193-208.
Ahrens, J. H. and U. Dieter (1981). Generating gamma variates by comparison of probability densities. Institute of Statistics, Technical University of Graz, Graz, Austria.
Ahrens, J. H. and U. Dieter (1982). Computer generation of Poisson deviates from modified normal distributions. ACM Trans. Math. Software 8, 163-179.
Ahrens, J. H., K. D. Kohrt and U. Dieter (1983). Algorithm 599. Sampling from gamma and Poisson distributions. ACM Trans. Math. Software 9, 255-257.
Alabert, F. (1987). The practice of fast conditional simulations through the LU decomposition of the covariance matrix. Math. Geol. 19, 369-386.
Altman, N. S. (1988). Bit-wise behavior of random number generators. SIAM J. Sci. Statist. Comput. 9, 941-949.
Anderson, T. W., I. Olkin and L. G. Underhill (1987). Generation of random orthogonal matrices. SIAM J. Sci. Statist. Comput. 8, 625-629.
Atkinson, A. C. (1980). Tests of pseudo-random numbers. Appl. Statist. 29, 164-171.
Atkinson, A. C. (1982). The simulation of generalized inverse Gaussian and hyperbolic random variables. SIAM J. Sci. Statist. Comput. 3, 502-515.


Banks, J. and J. S. Carson (1984). Discrete-Event System Simulation. Prentice-Hall, Englewood Cliffs, NJ.
Banks, J., J. S. Carson and D. Goldsman (1989). Computer simulation. In: H. Wadsworth, ed., Handbook of Statistical Methods for Engineers and Physical Scientists. McGraw-Hill, New York.
Barone, P. (1987). A method of generating independent realizations of a multivariate normal stationary and invertible ARMA(p, q) process. J. Time Ser. Anal. 8, 125-130.
Besag, J. and P. Clifford (1989). Generalized Monte Carlo significance tests. Biometrika 76, 633-642.
Bhavsar, V. C. and J. R. Isaac (1987). Design and analysis of parallel Monte Carlo algorithms. SIAM J. Sci. Statist. Comput. 8, s73-s95.
Bingham, C. (1974). An antipodally symmetric distribution on the sphere. Ann. Statist. 2, 1201-1225.
Blair, J. M., C. A. Edwards and J. H. Johnson (1976). Rational Chebyshev approximations for the inverse of the error function. Math. Comp. 30, 827-830. (Note that the numerical coefficients are not included in the paper, but are available on microfiche.)
Borosh, I. and H. Niederreiter (1983). Optimal multipliers for pseudorandom number generation by the linear congruential method. BIT 23, 65-74.
Boswell, M. T. and R. J. DeAngelis (1981). A rejection technique for the generation of random variables with the beta distributions. In: C. Taillie, G. P. Patil and B. Baldessari, eds., Statistical Distributions in Scientific Work, Vol. 4: Models, Structures, and Characterizations. Reidel, Dordrecht, 305-312.
Box, G. E. P. and M. E. Muller (1958). A note on the generation of random normal deviates. Ann. Math. Statist. 29, 610-611.
Boyar, J. (1989). Inferring sequences produced by pseudo random number generators. J. ACM 36, 129-141.
Bright, H. S. and R. L. Enison (1979). Quasirandom Number Sequences From a Long-period TLP Generator with Remarks on Application to Cryptography. ACM, New York.
Brown, M. and H. Solomon (1979). On combining pseudorandom number generators. Ann. Statist. 7, 691-695.
Bryson, M. C. and M. E. Johnson (1982). Constructing and simulating multivariate distributions using Khintchine's theorem. J. Statist. Comput. Simulation 16, 129-137.
Bucklew, J. A. (1990). Large Deviation Techniques in Decision, Simulation, and Estimation. Wiley, New York.
Butcher, J. C. (1960). Random sampling from the normal distribution. Comput. 3, 251-253.
Carta, D. C. (1990). Two fast implementations of the 'minimal standard' random number generators. Comm. ACM 33, 87-88.
Chen, H. C. and Y. Asau (1977). On generating random variates from an empirical distribution. AIIE Trans. 6, 163-166.
Cheng, R. C. H. (1985). Generation of multivariate normal samples with given sample mean and covariance matrix. J. Statist. Comput. Simulation 21, 39-50.
Cheng, R. C. H. and G. M. Feast (1980). Gamma variate generators with increased shape parameter range. Comm. ACM 23, 389-394.
Chung, J., T. Kailath and H. Lev-Ari (1987). Fast parallel algorithms for QR and triangular factorization. SIAM J. Sci. Statist. Comput. 8, 899-913.
Collings, B. J. (1987). Compound random number generators. J. Amer. Statist. Assoc. 82, 525-527.
Collings, B. J. and G. B. Hembree (1986). Initializing generalized feedback shift register pseudo-random generators. J. ACM 33, 706-711.
Dagpunar, J. S. (1988a). Computer generation of random variates from the tail of t and normal distributions. Comm. Statist. Simulation Comput. 17, 653-661.
Dagpunar, J. S. (1988b). Principles of Random Variate Generation. Clarendon Press, Oxford.
Dagpunar, J. S. (1989). An easily implemented inverse Gaussian generator. Comm. Statist. Simulation Comput. 18, 703-710.


Dagpunar, J. S. (1990). Sampling from the von Mises distribution via a comparison of random numbers. J. Appl. Statist. 17, 165-168.
Davis, M. V. (1987a). Production of conditional simulations via the LU triangular decomposition of the covariance matrix. Math. Geol. 19, 91-98.
Davis, M. V. (1987b). Generating large stochastic simulations - The matrix polynomial approximation method. Math. Geol. 19, 99-107.
Deak, I. (1980). Fast procedures for generating stationary normal vectors. J. Statist. Comput. Simulation 10, 225-242.
Deak, I. (1981). An economical method for random number generation and a normal generator. Computing 27, 113-121.
Deak, I. (1986). The economical method for generating random samples from discrete distributions. ACM Trans. Math. Software 12, 34-36.
Deak, I. (1990). Random Number Generators and Simulation. Akademiai Kiado, Budapest.
Deak, I. and B. Bene (1987). Random number generation: A bibliography. MTA SZTAKI Working Paper MO-5, Budapest.
Devroye, L. (1981). The series method for random variate generation and its application to the Kolmogorov-Smirnov distribution. Amer. J. Math. Management Sci. 1, 359-379.
Devroye, L. (1984). Random variable generation for unimodal and monotone densities. Computing 32, 43-68.
Devroye, L. (1986). Non-Uniform Random Variate Generation. Springer, New York.
Dudewicz, E. J. and T. G. Ralley (1981). The Handbook of Random Number Generation and Testing with TESTRAND Computer Code. American Sciences Press, Columbus, OH.
Eichenauer, J. and J. Lehn (1986). A non-linear congruential pseudo random number generator. Statist. Hefte 27, 315-326.
Fellen, B. M. (1969). An implementation of the Tausworthe generator. Comm. ACM 12, 413.
Fishman, G. S. (1978). Principles of Discrete Event Simulation. Wiley, New York.
Fishman, G. S. and L. S. Moore (1984). Sampling from a discrete distribution while preserving monotonicity. Amer. Statist. 38, 219-223.
Fishman, G. S. and L. S. Moore (1986). An exhaustive analysis of multiplicative congruential random number generators with modulus 2^31 − 1. SIAM J. Sci. Statist. Comput. 7, 24-45.
Fleishman, A. I. (1978). A method for simulating non-normal distributions. Psychometrika 43, 521-532.
Flournoy, N. and R. K. Tsutakawa (1991). Statistical Multiple Integration. Contemp. Math. Ser., Vol. 115. American Mathematical Society, Providence, RI.
Forsythe, G. E. (1972). Von Neumann's comparison method for random sampling from the normal and other distributions. Math. Comp. 26, 817-826.
Fushimi, M. and S. Tezuka (1983). The k-distribution of generalized feedback shift register pseudorandom numbers. Comm. ACM 26, 516-523.
Gebhardt, F. (1967). Generating pseudo-random numbers by shuffling a Fibonacci sequence. Math. Comp. 21, 708-709.
Gelfand, A. E. and A. F. M. Smith (1990). Sampling-based approaches to calculating marginal densities. J. Amer. Statist. Assoc. 85, 398-409.
George, A. and J. W. H. Liu (1981). Computer Solution of Large Sparse Positive Definite Systems. Prentice-Hall, Englewood Cliffs, NJ.
Gerontidis, I. and R. L. Smith (1982). Monte Carlo generation of order statistics from general distributions. Appl. Statist. 31, 238-243.
Golder, E. R. and J. G. Settle (1976). The Box-Muller method for generating pseudo-random normal deviates. Appl. Statist. 25, 12-20.
Golub, G. H. (1969). Matrix decompositions and statistical calculations. In: R. C. Milton and J. A. Nelder, eds., Statistical Computation. Academic Press, New York.
Grube, A. (1973). Mehrfach rekursiv-erzeugte Pseudo-Zufallszahlen. Z. Angew. Math. Mech. 53, T223-T225.
Guirguis, G. H. (1991). A rational approximation of the inverse normal probability function. Comput. Statist. Data Anal. 11, 199-200.


Hahn, P. and M. Jeruchim (1987). Developments in the theory and applications of importance sampling. IEEE Trans. Commun. 35, 706-714.
Hall, P. and D. M. Titterington (1989). The effect of simulation order on level accuracy and power of Monte Carlo tests. J. Roy. Statist. Soc. Ser. B 51, 459-467.
Hammersley, J. M. and D. C. Handscomb (1964). Monte Carlo Methods. Methuen, London.
Heath, D. and P. Sanchez (1986). On the period of pseudorandom number generators (or: How big a period do we need?). Oper. Res. Lett. 5, 3-6.
Holst, P. (1979). Computer Simulation 1951-1976. An Index to the Literature. Mansell, London.
Hora, S. C. (1983). Estimation of the inverse function for random variate generation. Comm. ACM 26, 590-594.
Hull, J. C. (1977). Dealing with dependence in risk simulations. Oper. Res. Quart. 28, 201-213.
Johnson, M. E. (1987). Multivariate Statistical Simulation. Wiley, New York.
Johnson, N. L. and S. Kotz (1969). Distributions in Statistics: Discrete Distributions. Houghton Mifflin, Boston.
Johnson, M. E., G. L. Tietjen and R. J. Beckman (1980). A new family of probability distributions with applications to Monte Carlo studies. J. Amer. Statist. Assoc. 75, 276-279.
Kachitvichyanukul, V. and B. W. Schmeiser (1985). Computer generation of hypergeometric random variates. J. Statist. Comput. Simulation 22, 127-146.
Kachitvichyanukul, V. and B. W. Schmeiser (1988). Binomial random variate generation. Comm. ACM 31, 216-222.
Kemp, A. W. (1981). Efficient generation of logarithmically distributed pseudo-random variables. Appl. Statist. 30, 249-253.
Kemp, A. W. (1988). Simple algorithms for the Poisson model cumulative probability. Comm. Statist. Simulation Comput. 17, 1495-1508.
Kemp, A. W. (1990). Patchwork rejection algorithms. J. Comput. Appl. Math. 31, 127-131.
Kemp, A. W. and C. D. Kemp (1990). A composition search algorithm for low-parameter Poisson generation. J. Statist. Comput. Simulation 35, 239-244.
Kemp, C. D. (1982). Low-storage Poisson generators for microcomputers. In: Proc. Compstat. 1982, Vol. 4, 145-146.
Kemp, C. D. (1986). A model method for generating binomial variables. Comm. Statist. Theory Methods 15, 805-813.
Kemp, C. D. (1990). New algorithms for generating Poisson variables. J. Comput. Appl. Math. 31, 133-137.
Kemp, C. D. and A. W. Kemp (1991). Poisson random variate generation. Appl. Statist. 40, 143-158.
Kennedy, W. J. Jr and J. E. Gentle (1980). Statistical Computing. Marcel Dekker, New York.
Kinderman, A. J. and J. F. Monahan (1977). Computer generation of random variables using the ratio of uniform deviates. ACM Trans. Math. Software 3, 257-260.
Kinderman, A. J. and J. F. Monahan (1980). New methods for generating Student's t and gamma variables. Computing 25, 369-377.
Kirkpatrick, S. and E. P. Stoll (1981). A very fast shift-register sequence random number generator. J. Comput. Phys. 40, 517-526.
Kleijnen, J. P. C. (1974). Statistical Techniques in Simulation. Part 1. Marcel Dekker, New York.
Kleijnen, J. P. C. (1975). Statistical Techniques in Simulation. Part 2. Marcel Dekker, New York.
Kleijnen, J. P. C. (1988). Statistical Tools for Simulation Practitioners. Marcel Dekker, New York.
Knuth, D. E. (1981). The Art of Computer Programming. Volume 2: Seminumerical Algorithms. Addison-Wesley, Reading, MA.
Kocherlakota, K., S. Kocherlakota and N. Balakrishnan (1986). Random number generation from a bivariate Edgeworth series distribution. Comput. Statist. 2, 97-105.
Koopman, R. F. (1986). The orders of equidistribution of subsequences of some asymptotically random sequences. Comm. ACM 29, 802-806.
Kronmal, R. A. and A. V. Peterson (1979). On the alias method for generating random variables from a discrete distribution. Amer. Statist. 33, 214-218.


Kronmal, R. A. and A. V. Peterson (1981). A variant of the acceptance-rejection method for computer generation of random variables. J. Amer. Statist. Assoc. 76, 446-451.
L'Ecuyer, P. (1986). Efficient and portable 32-bit random variate generators. In: J. R. Wilson, J. O. Henriksen and S. D. Roberts, eds., 1986 Winter Simulation Conference Proceedings, December 8-10. Sponsored by the American Statistical Association [et al.], Institute of Electrical and Electronics Engineers, New York.
L'Ecuyer, P. (1988). Efficient and portable combined random number generators. Comm. ACM 31, 742-751.
Law, A. M. and W. D. Kelton (1982). Simulation Modeling and Analysis. McGraw-Hill, New York.
Lehmer, D. H. (1951). Mathematical methods in large-scale computing units. In: Proc. 2nd Symp. on Large-Scale Digital Calculating Machinery. Harvard Univ. Press, Cambridge, MA, 141-146.
Lele, S. and M. T. Boswell (1986). A general technique for generating pseudo multivariate random variables: A Markov process approach. In: Computer Science & Statistics: Proc. 18th Sympos. on the Interface. American Statistical Association, Alexandria, VA, 377-380.
Lewis, P. A. W., A. S. Goodman and J. M. Miller (1969). A pseudo-random number generator for the System/360. IBM Systems J. 8, 136-146.
Lewis, P. A. W. and E. J. Orav (1989). Simulation Methodology for Statisticians, Operations Analysts, and Engineers, Vol. 1. Wadsworth & Brooks/Cole, Pacific Grove, CA.
Lewis, T. G. (1975). Distribution Sampling for Computer Simulation. Heath, Lexington, MA.
Lewis, T. G. and W. H. Payne (1973). Generalized feedback shift register pseudorandom number algorithm. J. ACM 20, 456-468.
Loukas, S. and C. D. Kemp (1986). The computer generation of bivariate binomial and negative binomial random variables. Comm. Statist. Simulation Comput. 15, 15-25.
Lurie, D. and H. O. Hartley (1972). Machine-generation of order statistics for Monte Carlo computations. Amer. Statist. 26, 26-27.
Lurie, D. and R. L. Mason (1973). Empirical investigation of several techniques for computer generation of order statistics. Comm. Statist. 2, 363-371.
MacLaren, N. M. (1989). The generation of multiple independent sequences of pseudorandom numbers. Appl. Statist. 38, 351-359.
MacLaren, M. D. and G. Marsaglia (1965). Uniform random number generators. J. ACM 12, 83-89.
MacLaren, M. D., G. Marsaglia and T. A. Bray (1964). A fast procedure for generating exponential random variables. Comm. ACM 7, 298-300.
Marsaglia, G. (1961). Expressing a random variable in terms of uniform random variables. Ann. Math. Statist. 32, 894-898.
Marsaglia, G. (1963). Generating discrete random variables in a computer. Comm. ACM 6, 37-38.
Marsaglia, G. (1964). Generating a variable from the tail of the normal distribution. Technometrics 6, 101-102.
Marsaglia, G. (1984). The exact-approximation method for generating random variables in a computer. J. Amer. Statist. Assoc. 79, 218-221.
Marsaglia, G. and W. W. Tsang (1984). A fast, easily implemented method for sampling from decreasing or symmetric unimodal density functions. SIAM J. Sci. Statist. Comput. 5, 349-359.
Marsaglia, G., A. Zaman and W. W. Tsang (1990). Toward a universal random number generator. Statist. Probab. Lett. 8, 35-39.
Marse, K. and S. D. Roberts (1983). Implementing a portable FORTRAN uniform (0, 1) generator. Simulation 41, 135-139.
McLeod, A. I. (1985). A remark on algorithm AS 183: An efficient and portable pseudo-random number generator. Appl. Statist. 34, 198-200.
Michael, J. R., W. R. Schucany and R. W. Haas (1976). Generating random variates using transformations with multiple roots. Amer. Statist. 30, 88-90.
Morgan, B. (1984). The Elements of Simulation. Chapman and Hall, London.


Moshier, S. L. (1989). Methods and Programs for Mathematical Functions. Halstead Press (Wiley), New York.
Nachtsheim, C. J. and M. E. Johnson (1988). A new family of multivariate distributions with applications to Monte Carlo studies. J. Amer. Statist. Assoc. 83, 984-989.
Nance, R. E. and C. Overstreet (1972). A bibliography of random number generation. Comput. Rev. 13, 495-508.
Nance, R. E. and C. Overstreet (1978). Some observations on the behavior of composite random number generators. Oper. Res. 26, 915-935.
Neave, H. R. (1973). On using the Box-Muller transformation with multiplicative congruential pseudo-random number generators. Appl. Statist. 22, 92-97.
Nelson, B. L. (1987). A perspective on variance reduction in dynamic simulation experiments. Comm. Statist. Simulation Comput. 16, 385-426.
Neuman, F. and C. F. Martin (1976). The autocorrelation structure of Tausworthe pseudorandom number generators. IEEE Trans. Comput. 25, 460-464.
Newby, M. J. (1979). The simulation of order statistics from life distributions. Appl. Statist. 28, 298-301.
Niederreiter, H. (1976). On the distribution of pseudorandom numbers generated by the linear congruential method III. Math. Comp. 30, 571-597.
Niederreiter, H. (1978). Quasi-Monte Carlo methods and pseudo-random numbers. Bull. Amer. Math. Soc. 84, 957-1041.
Niederreiter, H. (1984). The performance of k-step pseudo-random number generators under the uniform test. SIAM J. Sci. Statist. Comput. 5, 798-810.
Niederreiter, H. (1987). A statistical analysis of generalized feedback shift register pseudo-random number generators. SIAM J. Sci. Statist. Comput. 8, 1035-1051.
Norman, J. E. and L. E. Cannon (1972). A complete program for the generation of random variables from any discrete distribution. J. Statist. Comput. Simulation 1, 331-343.
Oommen, B. J. and D. T. H. Ng (1990). On generating random permutations with arbitrary distributions. Comput. J. 33, 368-374.
Park, S. K. and K. W. Miller (1988). Random number generators: Good ones are hard to find. Comm. ACM 31, 1192-1201.
Patil, G. P., M. T. Boswell and D. S. Friday (1975). Chance mechanisms in computer generation of random variables. In: G. P. Patil et al., eds., Statistical Distributions in Scientific Work, Vol. 2. Reidel, Dordrecht, 37-50.
Patil, G. P., M. T. Boswell, S. W. Joshi and M. V. Ratnaparkhi (1984). Dictionary and Classified Bibliography of Statistical Distributions in Scientific Work, Volume 1: Discrete Models. International Co-operative Publishing House, Fairland, MD.
Patil, G. P., M. T. Boswell and M. V. Ratnaparkhi (1984). Dictionary and Classified Bibliography of Statistical Distributions in Scientific Work, Volume 2: Continuous Univariate Models. International Co-operative Publishing House, Fairland, MD.
Patil, G. P., M. T. Boswell, M. V. Ratnaparkhi and J. J. J. Roux (1984). Dictionary and Classified Bibliography of Statistical Distributions in Scientific Work, Volume 3: Multivariate Models. International Co-operative Publishing House, Fairland, MD.
Pavlov, A. I. and B. B. Pohodzei (1979). Pseudorandom numbers generated by linear recurrence relations over a finite field. Zh. Vychisl. Mat. i Mat. Fiz. 19, 836-842. (In Russian.)
Payne, W. H. (1977). Normal random numbers: Using machine analysis to choose the best algorithm. ACM Trans. Math. Software 3, 346-358.
Payne, W. H., J. R. Rabung and T. P. Bogyo (1969). Coding the Lehmer pseudo-random number generator. Comm. ACM 12, 85-86.
Peskun, P. H. (1980). Theoretical tests for choosing the parameters of the general mixed linear congruential pseudorandom number generator. J. Statist. Comput. Simulation 11, 281-305.
Peterson, A. V. and R. A. Kronmal (1982). On mixture methods for the computer generation of random variables. Amer. Statist. 36, 184-191.
Ramberg, J. S. and P. R. Tadikamalla (1978). On the generation of subsets of order statistics. J. Statist. Comput. Simulation 6, 239-241.


Ramberg, J. S., P. R. Tadikamalla, E. J. Dudewicz and E. F. Mykytka (1979). A probability distribution and its uses in fitting data. Technometrics 21, 201-214.
RAND Corporation (1955). A Million Random Digits with 100,000 Normal Deviates. Free Press, Glencoe, IL.
Reeder, H. A. (1972). Machine generation of order statistics. Amer. Statist. 26, 56-57.
Ripley, B. D. (1979). Algorithm AS 137. Simulating spatial patterns: Dependent samples from a multivariate density. Appl. Statist. 28, 109-112.
Ripley, B. D. (1983a). Computer generation of random variables: A tutorial. Internat. Statist. Rev. 51, 301-319.
Ripley, B. D. (1983b). The lattice structure of pseudo-random number generators. Proc. Roy. Soc. Ser. A 389, 197-204.
Ripley, B. D. (1987). Stochastic Simulation. Wiley, New York.
Romero, L. A. (1980). On the generation of uniform pseudorandom numbers. Investigación Oper. 2-3, 17-27. (In Spanish.)
Sahai, H. (1979). A supplement to Sowey's bibliography on random number generation and related topics. J. Statist. Comput. Simulation 10, 31-52.
Schmeiser, B. W. (1980a). Generation of variates from distribution tails. Oper. Res. 28, 1012-1017.
Schmeiser, B. W. (1980b). Random variate generation: A survey. In: T. I. Oren, C. M. Shub and P. F. Roth, eds., Simulation with Discrete Models: A State of the Art Survey. IEEE, New York, 79-104.
Schmeiser, B. W. (1990). Simulation experiments. In: D. P. Heyman and M. J. Sobel, eds., Stochastic Models. Elsevier, Amsterdam, Chapter 7, 295-330.
Schmeiser, B. W. and R. Lal (1980a). Squeeze methods for generating gamma variates. J. Amer. Statist. Assoc. 75, 679-682.
Schmeiser, B. W. and R. Lal (1980b). Multivariate modeling in simulation: A survey. In: 23rd Ann. Technical Conf. Trans. American Society for Quality Control, 252-261.
Schmeiser, B. W. and M. A. Shalaby (1980). Acceptance/rejection methods for beta variate generation. J. Amer. Statist. Assoc. 75, 673-678.
Schucany, R. W. (1972). Order statistics in simulation. J. Statist. Comput. Simulation 1, 281-286.
Schrage, L. E. (1979). A more portable Fortran random number generator. ACM Trans. Math. Software 5, 132-138.
Sim, C. H. and P. A. Lee (1989). Simulation of negative binomial processes. J. Statist. Comput. Simulation 34, 29-42.
Sowey, E. R. (1972). A chronological and classified bibliography on random number generation and testing. Internat. Statist. Rev. 40, 355-371.
Sowey, E. R. (1978). A second classified bibliography on random number generation and testing. Internat. Statist. Rev. 46, 89-101.
Stadlober, E. (1981). Generating Student's t variates by a modified rejection method. In: Proc. 2nd Pannonian Sympos. on Math. Statist. Reidel, Dordrecht.
Stahnke, W. (1973). Primitive binary polynomials. Math. Comp. 27, 977-980.
Strecok, A. J. (1968). On the calculation of the inverse of the error function. Math. Comp. 22, 144-158.
Tadikamalla, P. R. (1978a). Computer generation of gamma random variables. Comm. ACM 21, 419-422.
Tadikamalla, P. R. (1978b). Computer generation of gamma random variables - II. Comm. ACM 21, 925-928.
Tadikamalla, P. R. (1980). On simulating non-normal distributions. Psychometrika 45, 273-279.
Tanner, M. A. (1991). Tools for Statistical Inference. Springer, New York.
Tanner, M. A. and W. H. Wong (1987). The calculation of posterior distributions by data augmentation (with discussion). J. Amer. Statist. Assoc. 82, 528-550.
Tausworthe, R. C. (1965). Random numbers generated by linear recurrence modulo two. Math. Comp. 19, 201-209.
Tezuka, S. (1984). The theoretical comparison between congruential methods and GFSR algorithms. JSI Res. Rep. TR87-0001, IBM, Tokyo, Japan.


Tezuka, S. (1987a). Walsh-spectral test for GFSR pseudorandom numbers. Comm. ACM 30, 731-734.
Tezuka, S. (1987b). On the discrepancy of GFSR pseudo-random numbers. J. ACM 34, 939-949.
Thisted, R. A. (1988). Elements of Statistical Computing: Numerical Computation. Chapman and Hall, New York.
Tootill, J. P. R., W. D. Robinson and A. G. Adams (1971). The runs up-and-down performance of Tausworthe pseudo-random number generators. J. ACM 18, 381-399.
Tootill, J. P. R., W. D. Robinson and D. J. Egole (1973). An asymptotically random Tausworthe sequence. J. ACM 20, 469-481.
Van der Laan, C. G. and N. M. Temme (1984). Calculation of Special Functions: The Gamma Function, the Exponential Integrals and Error-like Functions. CWI Tract, Vol. 10, Centre for Mathematics and Computer Science, Amsterdam, The Netherlands.
Walker, A. J. (1974). New fast method for generating discrete random numbers with arbitrary frequency distributions. Electron. Lett. 10, 127-128.
Walker, A. J. (1977). An efficient method for generating discrete random variables with general distributions. ACM Trans. Math. Software 3, 253-256.
Westlake, W. J. (1967). A uniform random number generator based on the combination of two congruential generators. J. ACM 14, 337-340.
Whittlesey, J. R. B. (1968). A comparison of the correlational behavior of random number generators for the IBM 360. Comm. ACM 11, 641-644.
Whittlesey, J. R. B. (1969). On the multidimensional uniformity of pseudorandom generators. Comm. ACM 12, 247.
Wichmann, B. A. and I. D. Hill (1982). Algorithm AS 183: An efficient and portable pseudo-random number generator. Appl. Statist. 31, 188-190.
Wichmann, B. A. and I. D. Hill (1984). Correction to algorithm AS 183: An efficient and portable pseudo-random number generator. Appl. Statist. 33, 123.
Wichmann, B. A. and I. D. Hill (1987). Building a random-number generator. BYTE 12, 127-128.
Wichura, M. J. (1988). The percentage points of the normal distribution. Appl. Statist. 37, 477-484.
Wilson, J. R. (1984). Variance reduction techniques for digital simulation. Amer. J. Math. Management Sci. 4, 277-312.
Young, D. M., J. W. Seaman Jr and D. W. Turner (1990). A ratio-of-uniforms method for generating exponential power variates. J. Statist. Comput. Simulation 35, 13-18.
Zeisel, H. (1986). A remark on algorithm AS 183: An efficient and portable pseudo-random number generator. Appl. Statist. 35, 89.
Zierler, N. (1969). Primitive trinomials whose degree is a Mersenne exponent. Inform. Control 15, 67-69.
Zierler, N. and J. Brillhart (1968). On primitive trinomials (mod 2). Inform. Control 13, 541-554.