The Art of Computer Generation of Random Variables
M. T. Boswell, S. D. Gore, G. P. Patil and C. Taillie
1. Introduction
The title of this paper has been chosen to be similar to The Art of Computer Programming, a multi-volume work by Donald E. Knuth. The second volume of Knuth's treatise includes a comprehensive discussion of the design and testing of uniform random number generators. The descriptor, art, is appropriate in connection with the generation of uniform random numbers since the algorithms in use today are the result of a long history of trial-and-error evaluation of proposed generators. In fact, the search for an ultimate uniform generator is never-ending. This is because the uniform 'random' number streams produced on computers are actually deterministic and, accordingly, each generator violates some aspect of 'randomness' and would fail a suitably devised test of uniformity. For this reason, the term 'pseudo-random' has been coined, but we will forego this bit of pedantry and omit the qualifier 'pseudo'.

The present paper deals exclusively with random variable generation. Section 2 surveys some of the issues that arise in connection with uniform generators. This is a vast area and we can only touch upon a few of the main points. The rest of the paper assumes that a suitable uniform generator is available as a system resource, and is concerned with using this resource to generate observations from non-uniform distributions. Section 3 discusses univariate distributions and Section 4 deals with multivariate distributions. Our emphasis is upon general methods rather than specific distributional families, although the latter are used for illustrative examples and for motivational purposes. In contrast with the case of uniform generators, the criteria for evaluating non-uniform generators are more clear-cut, although these criteria often conflict so that the final choice of generator involves tradeoffs among factors such as programming complexity, portability, storage requirements, and execution time.

Important topics not covered in the present survey include: variance reduction (Nelson, 1987; Wilson, 1984); importance sampling, Gibbs sampling, and data augmentation (Gelfand and Smith, 1990; Hahn and Jeruchim, 1987; Tanner, 1991; Tanner and Wong, 1987); large deviation methods (Bucklew,
1990, Chapter VIII); permutation generators (Oommen and Ng, 1990); Monte Carlo tests (Besag and Clifford, 1989; Hall and Titterington, 1989); simulation of time series (Barone, 1987), spatial processes, and other stochastic processes; numerical quadrature (Flournoy and Tsutakawa, 1991); general simulation and other applications.

The subject of random variable generation has an enormous literature. Bibliographies have been prepared by Deak and Bene (1987), Nance and Overstreet (1972), Sahai (1979), and Sowey (1972, 1978). See also Holst (1979). Survey articles include Banks, Carson and Goldsman (1989), Patil, Boswell and Friday (1975), Ripley (1983a), and Schmeiser (1980b, 1990). There are also a number of recent monographs with at least one chapter on the topic of random variable generation. These include Banks and Carson (1984), Dagpunar (1988b), Deak (1990), Devroye (1986), Dudewicz and Ralley (1981), Fishman (1978), Johnson (1987), Kennedy and Gentle (1980), Kleijnen (1988), Knuth (1981), Law and Kelton (1982), Lewis and Orav (1989), Lewis (1975), Morgan (1984), and Ripley (1987).
2. Generating uniform random numbers on a computer
Historically, simulation was of interest even before the advent of electronic computers, and physical devices such as dice, coins, and radiation detectors were used to inject randomness into the processes under study. W. S. Gossett's investigations of the 'Student' t-distribution are an early and famous example of the systematic use of simulation. The RAND Corporation (1955) produced a list of one million random digits and made the list available on tape so it would be usable on computers. But such lists are difficult to maintain and time-consuming to read electronically. Physical generators also suffer from several shortcomings: the results are not reproducible (unless stored in toto) and can be sensitive to extraneous events. Experience indicates that only relatively short sequences of high-quality random numbers can be produced by physical means. Interest has therefore turned to methods of generation that directly employ the computer's arithmetical capabilities.

Perhaps the earliest such method is due to von Neumann, who suggested that an N-digit number be squared and that the middle N digits of the result be used as the next number in the sequence. Von Neumann's 'middle-square' method has proved unsatisfactory in general; see Knuth (1981, p. 4) for discussion on this point. But the method is broadly representative of today's uniform generators which (internally, at least) produce a stream of integers with each member of the stream being calculated by applying a deterministic mathematical formula to a small number of immediate predecessors in the stream. Generally, these integers are subsequently converted to a floating point format and returned as real numbers in the unit interval (0, 1). The conversion from integer to floating point format is not entirely trivial and will be discussed in greater detail below. A well-designed uniform generator should never return
the values 0.0 or 1.0. We will use the term 'uniform' generator to refer to either the internal stream of integers or the external stream of floating point values, letting the context determine which is intended. An enormous variety of tests, both theoretical and statistical, have been proposed for evaluating the quality of a uniform generator. These tests will not be enumerated or discussed here; interested readers may refer to Bright and Enison (1979), Dudewicz and Ralley (1981), and Knuth (1981). MacLaren (1989) discusses between-sequence performance of multiple generators on parallel computers.

Almost all uniform generators in use today are based upon a linear congruential recurrence relation of the form

X_{n+1} = (a_0 X_n + a_1 X_{n-1} + ··· + a_k X_{n-k} + c) mod m.
Here a_0, a_1, ..., a_k, c, and m are specified integers that characterize the generator. The user must also provide a seed vector (X_0, X_1, ..., X_k) to get the recurrence started. Then X_{k+1} can be generated, and the seed for X_{k+2} becomes (X_1, X_2, ..., X_{k+1}), etc. See Eichenauer and Lehn (1986) for discussion of nonlinear congruential generators. Most often, only the immediately preceding value enters the recurrence, which becomes

X_{n+1} = (a X_n + c) mod m,
a form proposed by Lehmer (1951). Here, the parameters are the multiplier a, the additive constant c, and the modulus m. Additionally, a seed X_0 is required. When c = 0, the generator is called a multiplicative congruential generator and, when c ≠ 0, the generator is called a mixed congruential generator. The characterizing parameters must be chosen with care if the generator is to give satisfactory results. It has become a cliche to point out that the generator RANDU, based on X_{n+1} = (2^16 + 3) X_n mod 2^31 and provided for use on IBM 360/370 and PDP-11 machines, is a very poor generator. Borosh and Niederreiter (1983) discuss the choice of multipliers in linear congruential generators. Fishman and Moore (1986) study the multiplicative congruential generators with m = 2^31 − 1. A given generator may also restrict the allowed seed values for satisfactory performance and sufficiently long period. A multiplicative congruential generator, for example, must obviously disallow X_0 = 0 as a seed. Heath and Sanchez (1986) discuss the period of uniform generators, pointing out that the minimum acceptable period length depends upon the application. One of the better known multiplicative generators is based on the relation
X_{n+1} = (a X_n) mod (2^31 − 1),
where a = 7^5 = 16807 and which has been shown to possess good statistical properties. See Carta (1990), Lewis et al. (1969), Park and Miller (1988) and Payne et al. (1969). See also Atkinson (1980), Borosh and Niederreiter (1983), Peskun (1980), and Ripley (1983a).
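As an aside not in the original text, this recurrence is simple enough to sketch in a few lines of Python. The multiplier 7^5 = 16807 and modulus 2^31 − 1 are the values just cited; the function name and seed are our own choices. Python's arbitrary-precision integers keep the product a·X_n exact, which anticipates the full-precision requirement discussed next.

```python
# A minimal sketch of a multiplicative congruential (Lehmer) generator.
# Python integers have unlimited precision, so a*x never overflows; in C or
# FORTRAN a device such as Schrage's decomposition would be needed instead.

M = 2**31 - 1       # modulus
A = 16807           # multiplier, 7**5

def lehmer_stream(seed, n):
    """Yield n uniform variates in (0, 1) starting from seed (1 <= seed < M)."""
    x = seed
    for _ in range(n):
        x = (A * x) % M          # X_{n+1} = a X_n mod m
        yield x / M              # convert the integer to a value in (0, 1)

print(list(lehmer_stream(seed=12345, n=5)))
```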
It should be emphasized that the integer arithmetic in the recurrence relation needs to be carried out in full precision. An amusing example where this dictum was overlooked is the random number generator built into the Master Library of the Texas Instruments TI-59 programmable calculator. The library manual provides a detailed description of the mixed congruential generator employed (one of those listed by Knuth, 1981), but the recurrence is calculated using the calculator's floating point arithmetic with the result that after three or four iterations the generated uniforms bear no resemblance to what should come out of a correct implementation.

When m = 2, the linear congruential recurrence generates a sequence b_n of bits. Tausworthe (1965) generators are of this type with recurrence relation given by

b_{n+1} = (b_{n−p} + b_{n−(p−q)}) mod 2,
where the integers p > q > 0 and a seed (b_0, b_1, ..., b_p) determine the generator. The right-hand side can be calculated efficiently by the 'exclusive or' (XOR) function, b_{n+1} = b_{n−p} XOR b_{n−(p−q)}. The parameters p and q should be chosen to make the trinomial x^p + x^q + 1 a primitive polynomial modulo 2. When this is done, the period of the bit stream is 2^p − 1. Zierler (1969) published a list of 23 pairs (p, q) that make x^p + x^q + 1 a primitive polynomial modulo 2 (see also Stahnke, 1973, and Zierler and Brilhart, 1968).

In the Tausworthe scheme, random integers are not formed by concatenating consecutive bits from the bit stream. Instead, separate streams are used for each bit position. Suppose, for example, that 32-bit integers are desired. Then 32 separate bit streams b_n^(i), i = 1, 2, ..., 32, would be generated, each with its own starting seed, and the stream of random integers X_n would be formed by the following concatenation of bits,

X_n = b_n^(1) b_n^(2) ··· b_n^(32).

Notice that the bit streams can be generated simultaneously as

X_{n+1} = X_{n−p} XOR X_{n−(p−q)}.
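The word-wise recurrence lends itself to a short sketch. The following illustration (ours, not from the original paper) uses the pair (p, q) = (98, 27) mentioned below and simply draws the seed words at random, in the spirit of Kirkpatrick and Stoll (1981); a production generator would use the careful initialization discussed in the next paragraph.

```python
# A rough sketch of a GFSR (Tausworthe) generator using the word-wise
# recurrence X_n = X_{n-p} XOR X_{n-(p-q)}, here with (p, q) = (98, 27).
# Seeding is done crudely with Python's own generator, so this is
# illustrative only.
import random

P, Q = 98, 27
BITS = 32

def gfsr_stream(n, seed=1):
    rng = random.Random(seed)
    state = [rng.getrandbits(BITS) for _ in range(P)]   # p seed words
    out = []
    for _ in range(n):
        new = state[-P] ^ state[-(P - Q)]   # XOR of words p and p-q places back
        state.append(new)
        out.append(new / 2**BITS)           # scale the word to [0, 1)
    return out

print(gfsr_stream(5))
```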
Tausworthe generators of this type are also called generalized feedback shift register (GFSR) generators. Selecting starting seeds (X_0, X_1, ..., X_p) that will ensure 'independence' of the different bit streams is a nontrivial matter involving a lengthy initialization (see Lewis and Payne, 1973, and Collings and Hembree, 1986). Kirkpatrick and Stoll (1981) suggest choosing the seed according to a certain random process in order to avoid this initialization. GFSR generators have been studied extensively both statistically and theoretically. Bright and Enison (1979) have shown that with an appropriate
seed vector, Tausworthe sequences are uniform over the period of the generator. See also Altman (1988), Boyar (1989), Collings and Hembree (1986), Fellen (1969), Grube (1973), Koopman (1986), Lewis and Payne (1973), Neuman and Martin (1976), Niederreiter (1976, 1978, 1984, 1987), Pavlov and Pohodzei (1979), Romero (1980), Tezuka (1984, 1987a,b), Tootill et al. (1971, 1973), and Whittlesey (1968, 1969). The first author of this paper considers Tausworthe generators to be superior to the more common and simpler Lehmer generators, and he has developed a package of FORTRAN-callable random variable generators based on the GFSR generator with p = 98 and q = 27 proposed by Lewis and Payne (1973).

A high-quality generator should exhibit little serial correlation when k-tuples (X_1, ..., X_k) of observations are examined. This is known as k-distributivity. Tausworthe generators have been found to have excellent k-distributivity properties (see Fushimi and Tezuka, 1983). Various proposals have been made to improve the k-distributivity of other generators, notably shuffling and/or combinations of generators. See, for example, Brown and Solomon (1979), Collings (1987), L'Ecuyer (1988), Gebhardt (1967), MacLaren and Marsaglia (1965), Nance and Overstreet (1978), Westlake (1967), and Wichmann and Hill (1982, 1984, 1987).

As pointed out above, the stream of integers produced internally by a uniform generator has to be converted to floating point values between zero and one. Also, a well-designed uniform generator should never return 0.0 or 1.0. Routines that are coded in Assembly language can sometimes manipulate the bits directly into a floating point format but higher level routines usually employ division. Suppose, for example, that the possible values in the stream of integers are J = 0, 1, 2, ..., M − 1. Then the transformation from J to a uniform variable U is described by the following algorithm:

Step 1. If J = 0, fetch the next J and repeat Step 1.
Step 2. Deliver U = J/M.
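A hedged sketch of this two-step conversion, with an arbitrary toy value of M and names of our own choosing, might look as follows.

```python
# A small sketch of the conversion from the integer stream J to uniforms U
# in the open interval (0, 1); the toy integer stream below is ours.

M = 1000  # number of possible integer values, J = 0, 1, ..., M - 1

def to_uniform(int_stream):
    """Convert integers 0 <= J < M to uniform variates in (0, 1)."""
    for j in int_stream:
        if j == 0:          # Step 1: skip J = 0 and fetch the next integer
            continue
        yield j / M         # Step 2: deliver U = J/M

print(list(to_uniform([0, 250, 999, 1, 500])))
```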
Some generators, including multiplicative congruential generators, never yield J = 0 and, for these, Step 1 may be omitted. Other generators may restrict the values of J to even or to odd integers, but these are minor complications. More fundamental is the fact that successive values J/M of U typically have a finer resolution than can be accommodated in single precision floating point format. To see the consequences of this, suppose the possible values of J are J = 1, 2, ..., 998, 999, and suppose the floating point representation carries two decimal digits of precision. The first 99 values of J/M can be accommodated exactly: 0.001, 0.002, ..., 0.098, 0.099, but at the upper end some rounding or truncation will occur. Thus, J/M = 0.999 will either be rounded downward to U = 0.99 or upward to the undesirable
value U = 1.0. Even if these undesirable values are screened out, the resulting generator violates one of the fundamental properties of a uniform distribution, namely that U and 1 − U should be identically distributed. When M is approximately a power of the computer's radix (as it is above), simple and efficient remedies are available in Assembly language to force U and 1 − U to have nearly the same distribution. Implementing these remedies in higher level languages tends to be awkward, and the solution is often to ignore the difficulty. Our own preference is to screen out the values U = 0.0 and U = 1.0 as well as those values of U that are so small that 1 − U = 1 in floating point representation. Another solution is to return U as double precision.

While a strong case can be made for implementing uniform generators in Assembly language, the need for portability may dictate the use of a higher level language. L'Ecuyer (1986, 1988), Marsaglia, Zaman, and Tsang (1990), Marse and Roberts (1983), Schrage (1979) and Wichmann and Hill (1982, 1984; see also McLeod, 1985, and Zeisel, 1986) describe portable uniform generators. Park and Miller (1988) discuss implementation aspects and present several examples of unsatisfactory generators. The remainder of this paper assumes that a well-tested uniform generator is available to the user.
3. Generating univariate random variables

3.1. Introduction

This section reviews various computational techniques that have appeared in the literature on computer generation of random variables from non-uniform probability distributions. Examples are given for the purpose of motivating and clarifying the discussion and illustrating the techniques under consideration, but they may not represent the most efficient method of generating observations from the given distribution.

We assume that the reader of this paper has a foundation in basic mathematical probability. He should be familiar with the standard discrete and continuous distributions, with the concept of conditional probability, and with change of variable techniques. The more he knows about various probability distributions, their properties and interrelationships, the easier it will be for him to follow the methods presented in this paper and the easier it will be for him to develop new methods. A good collection of facts regarding distributional properties and interrelationships is contained in the three-volume Dictionary and Classified Bibliography of Statistical Distributions in Scientific Work by Patil et al. (1984).

Our discussion is concerned with the generation of random deviates from specified distributions. But in simulation studies of robustness or power of statistical procedures, it is often convenient to organize the results in terms of characteristics such as skewness or kurtosis instead of particular distributions.
Here the use of standardized systems such as the Johnson family may prove convenient. See Tadikamalla (1980) for discussion of several such systems of distributions. In this connection, Johnson, Tietjen and Beckman (1980) propose a four-parameter system of symmetric distributions. Fleishman (1978) points out that specified skewness and kurtosis can be achieved by a suitable cubic polynomial transformation of a standard normal deviate.
3.2. Notation

If X denotes a random variable, the corresponding lower case letter, X = x, indicates a realized value. Cumulative distribution functions and density functions are denoted by F(x) and f(x), respectively. A subscript, as in F_X and f_X, will be appended as needed to avoid ambiguity. Algorithms are presented in an outline form for generating one observation. When this observation has been obtained we say 'deliver X'. At this point, the routine terminates if only one observation is desired, or loops back to the beginning if another observation is desired. It is also convenient to have symbolic names for some frequently occurring distributions:

U(a, b)    uniform distribution on the interval (a, b),
U(0, 1)    uniform distribution on the unit interval,
N(μ, σ²)   normal distribution with mean μ and variance σ²,
N(0, 1)    standard normal distribution,
EXP(μ)     exponential distribution with mean μ,
EXP(1)     standard exponential distribution,
P(λ)       Poisson distribution with mean λ,
P(1)       standard Poisson distribution.

Insofar as possible, the letters U, V and W will be reserved for uniformly distributed variables. When an algorithm states 'generate U_1, U_2, ..., U_n from U(0, 1)', it is understood that the U_i are independent.
3.3. Transformations of random variables

If it is possible to generate realizations of a random variable Y, then observations can also be generated for any transformation X = g(Y) of Y, provided the transformation g(·) is computable. Multivariate transformations of the form X = g(Y_1, Y_2, ..., Y_n) are also useful in random variable generation. Often, but not always, Y_1, ..., Y_n are independent identically distributed realizations from some single distribution.

EXAMPLE 3.3.1 (The chi-square distribution). If Y has a standard normal distribution, then X = Y² has a chi-square distribution with one degree of freedom. So, to generate observations from X, one can generate realizations of the standard normal distribution and square those realizations. More generally, if Y_1, ..., Y_n are independent standard normal variates, then X = Y_1² + ··· + Y_n² has a chi-square distribution with n degrees of freedom. In a similar fashion, the F- and t-distributions can be obtained by suitably transforming normal and chi-square variates.

EXAMPLE 3.3.2 (The Cauchy distribution). If Y_1 and Y_2 are independent realizations of the standard normal distribution, then the standard Cauchy distribution can be generated by X = Y_1/Y_2. Michael, Schucany and Haas (1976) consider applications in which the transformation X = g(Y) is not single-valued and the different values need to be selected with appropriate probabilities.
3.3.1. Inverse distribution functions: Discrete case

Consider a discrete random variable taking values x_1 < x_2 < ···. The distribution function F(x) is a step function with jumps at x_1, x_2, .... The inverse distribution function is defined by

F^{-1}(u) = x_i   if F(x_{i−1}) < u ≤ F(x_i),  i = 1, 2, ...,

where x_0 = −∞. Now let U be a random variable uniformly distributed on (0, 1), and set X = F^{-1}(U). Then

P[X ≤ x] = P[U ≤ F(x)] = F(x),

so that the generated X has F(x) as its distribution function. Implementation details are given in the following algorithm.

ALGORITHM 3.3.1.
Step 1. Generate U = u from U(0, 1).
Step 2. Initialize i = 1 and f = F(x_1).
Step 3. If u ≤ f then deliver X = x_i.
Step 4. Increment i to i + 1 and then replace f by F(x_i).
Step 5. Go to Step 3.
Much of the execution time in this algorithm may be spent in making comparisons at Step 3, and performance can be improved by employing efficient search strategies. If nothing else, the search order could be changed to hold down the expected number of comparisons. For example, Kemp (1986) describes a binomial generator with the search moving away from the modal value. In general, let p_i = F(x_i) − F(x_{i−1}), i = 1, 2, ..., and order these so that p_{j_1} ≥ p_{j_2} ≥ ···. In fact, the reordered sequence does not have to be monotonic; simply moving large p_i to the beginning of the sequence will improve performance. Now, the improved algorithm becomes as follows.

ALGORITHM 3.3.2.
Step 1. Generate U = u from U(0, 1).
Step 2. Initialize i = 1 and f = p_{j_1}.
Step 3. If u ≤ f then deliver X = x_{j_i}.
Step 4. Increment i to i + 1 and then replace f by f + p_{j_i}.
Step 5. Go to Step 3.
EXAMPLE 3.3.3 (The Poisson distribution). Suppose we desire an observation from the standard Poisson distribution P(1). The probability function is p_i = e^{−1}/i!, i = 0, 1, .... Since p_i is already a non-increasing function of i, we make comparisons in their natural order. To avoid repeated multiplication by e^{−1}, we multiply the uniform random variable U by e. The method is summarized by the following algorithm.

ALGORITHM 3.3.3.
Step 1. Generate U = u from U(0, 1).
Step 2. Initialize i = 0, a = 1, z = 1. (N.B., a = e·F(i) and z = 1/i!.)
Step 3. Set y = e·u.
Step 4. If y ≤ a then deliver X = i.
Step 5. Increment i to i + 1 and then replace z by z/i and a by a + z.
Step 6. Go to Step 4.
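A minimal Python sketch of Algorithm 3.3.3 (ours, with illustrative names; Python's built-in generator stands in for the U(0, 1) source) is:

```python
# Inversion by sequential search for the standard Poisson distribution P(1),
# multiplying U by e once instead of repeatedly multiplying by e**-1.
import math
import random

def poisson_unit_mean(u=None):
    if u is None:
        u = random.random()
    i, a, z = 0, 1.0, 1.0      # a = e*F(i), z = 1/i!
    y = math.e * u             # compare e*u against e*F(i)
    while y > a:
        i += 1
        z /= i
        a += z
    return i

print([poisson_unit_mean() for _ in range(10)])
```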
The method could be further improved by a faster search routine which combines various intervals in a suitable manner. Index tables (Chen and Asau, 1977; Fishman and Moore, 1986) could also be employed to speed up the searching. See Kachitvichyanukul et al. (1988), Kemp (1982, 1988, 1990) and Kemp and Kemp (1990, 1991) for other Poisson generation methods. Kemp (1981) combines the inverse distribution function with other transformations to generate observations from the logseries distribution.

Floating point comparisons can be avoided through the use of table-lookup methods, as illustrated in the following example. The method requires that subscripts (or indices) be selected at random from among a finite set {1, 2, ..., N} of index values. This is easily accomplished by the floor (or truncation) operator, I = 1 + [NU]. Then t(I) is a random selection from the stored table t(1), ..., t(N).

EXAMPLE 3.3.4 (The hypergeometric distribution). The hypergeometric distribution results from sampling without replacement from an urn containing r red balls and b blue balls. If n balls are drawn at random without replacement from the urn and X is the number of red balls, then
P[X = x] = C(r, x) C(b, n − x)/C(r + b, n), where C(·, ·) denotes a binomial coefficient, for max{0, n − b} ≤ x ≤ min{n, r}. For example,
suppose we wish to sample three times without replacement from an urn containing 6 red and 4 blue balls to get the number of red balls, X. Let t(1) = ··· = t(6) = 1 and t(7) = ··· = t(10) = 0. Next generate U from U(0, 1), and put I = [10U] + 1. Then set X = t(I) to get the result after one draw. We now want to sample from the 9 remaining tabled values. We interchange t(I) and t(10); if I = 10, the table is unchanged. We generate a new value of U, set I = [9U] + 1, and replace X by X + t(I). This value of X represents the number of red balls after two draws, etc. The following algorithm describes this procedure for n draws without replacement when there are r red balls and b blue balls initially.

ALGORITHM 3.3.4.
Step 1. Initialize x = 0, t(1) = ··· = t(r) = 1, t(r + 1) = ··· = t(r + b) = 0, j = r + b, k = n.
Step 2. Generate U = u from U(0, 1).
Step 3. Set i = [ju] + 1.
Step 4. Replace x by x + t(i) and decrement k.
Step 5. If k = 0 then deliver X = x.
Step 6. Interchange t(i) and t(j), then decrement j.
Step 7. Go to Step 2.
The table still contains the original r + b values but in different order. Consequently, the algorithm can be repeated to get a second observation of X without re-initializing the table. Also notice that if n ≥ r or n ≥ b, then efficiency can be improved slightly by terminating the routine when all the red or all the blue balls have been drawn.

In Algorithm 3.3.4, sampling is without replacement from a population of two distinct types. A simple modification for sampling with replacement would result in the binomial distribution; merely omit Step 6. On the other hand, if three (or more) distinct types are required then the values stored in the table (say, 1, 2, 3) can be used as subscripts on the variables. That is, if the sampled value is t(i) = j then replace X(j) by X(j) + 1, j = 1, 2, 3. The delivered pair of values (X(1), X(2)) would comprise an observation from the multivariate hypergeometric distribution (or from the multinomial distribution if sampling were with replacement).
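The following sketch (ours, not from the original paper) mirrors Algorithm 3.3.4 with zero-based indexing; Python's built-in uniform generator stands in for the U(0, 1) source assumed throughout.

```python
# Number of red balls in n draws without replacement, by sampling from a
# 0/1 table and swapping used entries past the end of the active range.
import random

def hypergeometric(r, b, n):
    t = [1] * r + [0] * b              # r red balls (1) and b blue balls (0)
    j = r + b
    x = 0
    for _ in range(n):
        i = int(j * random.random())   # random index 0, ..., j-1
        x += t[i]
        t[i], t[j - 1] = t[j - 1], t[i]  # move the drawn ball out of play
        j -= 1
    return x

print([hypergeometric(r=6, b=4, n=3) for _ in range(10)])
```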
EXAMPLE 3.3.5 (Table-lookup method, Marsaglia, 1963). Suppose we wish to generate observations of a discrete random variable X with density P[X = x_i] = p_i, i = 1, 2, ..., with Σ p_i = 1. The inverse distribution function method generates U from U(0, 1) and returns X = x_i where i is the integer satisfying

p_0 + ··· + p_{i−1} < u ≤ p_0 + ··· + p_i,

with p_0 ≡ 0.
A large number of comparisons may be necessary. Instead, for suitable probabilities, a table-lookup procedure can be used. Suppose observations from an approximation to the standard Poisson distribution P(1) are desired. The distribution to three decimal places is

x       0      1      2      3      4      5      6      Total
f(x)    0.368  0.368  0.184  0.061  0.015  0.003  0.001  1.000
Analogous to Example 3.3.4, we initialize a table t(1), ..., t(1000) so that the values 0 and 1 each occur 368 times, the value 2 occurs 184 times, etc. Then X is generated as X = t(I), where I = [1000U] + 1. This lookup procedure is very fast but does require moderate storage. If we were interested in the tail values, the distribution would have to be calculated to more decimal places. For example, five digits of accuracy would require a table with 100 000 entries. The mixture of distributions method to be described in Section 3.4 can be used to reduce the storage requirements. See Example 3.4.3 for a continuation of this example.
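As an illustration (ours, not from the original), the 1000-entry table and the lookup step might be coded as follows.

```python
# Table-lookup generation of the three-decimal approximation to P(1).
import random

probs = {0: 0.368, 1: 0.368, 2: 0.184, 3: 0.061, 4: 0.015, 5: 0.003, 6: 0.001}
table = []
for value, p in probs.items():
    table += [value] * round(1000 * p)   # 368 zeros, 368 ones, 184 twos, ...
assert len(table) == 1000

def poisson_lookup():
    return table[int(1000 * random.random())]   # random index into the table

print([poisson_lookup() for _ in range(15)])
```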
3.3.2. Inverse distribution functions: General case

If F(x) is a distribution function, then the inverse distribution function is defined by F^{-1}(u) = inf{x : u ≤ F(x)}. Suppose U has a uniform distribution over (0, 1). Using the same argument as given in Section 3.3.1, the variate X = F^{-1}(U) has F(x) as its distribution function. Approximations for selected inverse distribution functions may be found in Kennedy and Gentle (1980) and Thisted (1988). Hora (1983) and Ramberg et al. (1979) discuss some methods for estimating F^{-1} when data are available.

EXAMPLE 3.3.6 (The exponential distribution). If Y has the standard exponential distribution EXP(1), then F(y) = 1 − e^{−y} and F^{-1}(u) = −ln(1 − u). Thus, Y can be generated by the transformation Y = −ln(1 − U). Further, 1 − U can be replaced by U, since U and 1 − U are identically distributed. Also, notice that X = μY = −μ ln(U) has an exponential distribution with mean μ. The following algorithm summarizes the method of generating an exponential random variable with mean μ.

ALGORITHM 3.3.5.
Step 1. Generate U = u from U(0, 1).
Step 2. Deliver X = −μ ln(u).

This algorithm involves evaluating the transcendental function ln(u). Transcendental functions, in general, take comparatively more computer time than arithmetic operations.
EXAMPLE 3.3.7 (The Weibull distribution). The Weibull distribution is a power-function transformation of the exponential distribution and has its distribution function given by

F(x) = 1 − e^{−a x^b},  x > 0,

where a, b > 0 are parameters of the distribution. Setting U = 1 − e^{−aX^b} and solving gives

X = [−(1/a) ln(1 − U)]^{1/b}.

Here again 1 − U can be replaced by U, giving

X = [−(1/a) ln(U)]^{1/b}.
EXAMPLE 3.3.8 (The normal distribution). Using the inverse distribution function technique, a standard normal variate Z can be generated as Z = Φ^{-1}(U), where

Φ(z) = (1/√(2π)) ∫_{−∞}^{z} exp(−t²/2) dt
is the standard normal distribution function. Accurate routines for Φ^{-1} should be available on most systems. If not, Abramowitz and Stegun (1972, p. 933) give some crude approximations. Blair et al. (1976), Guirguis (1991), Moshier (1989, p. 258), Strecok (1968), and Wichura (1988) give approximations accurate to 16+ digits. Also, see the discussion in Van der Laan and Temme (1984, Section 7.1).

EXAMPLE 3.3.9 (The Box-Muller method, Box and Muller, 1958). The normalizing constant √(2π) = ∫_{−∞}^{∞} exp(−t²/2) dt for the standard normal distribution is usually obtained by combining the product of two such integrals into a bivariate integral and then transforming to polar coordinates. A similar idea can be employed to generate N(0, 1) observations. Suppose that X and Y are independent standard normal variables. Then the transformation defined by X = R cos Θ and Y = R sin Θ gives independent random variables R and Θ with distribution functions that have closed forms. Specifically,

f_{R,Θ}(r, θ) = (r/2π) e^{−r²/2},

where 0 < r < ∞ and 0 < θ < 2π. Clearly Θ is uniformly distributed on the interval (0, 2π); R has the Rayleigh distribution with

F_R(r) = ∫_0^r u e^{−u²/2} du = 1 − e^{−r²/2},  0 < r < ∞.

Then F_R^{-1}(u) = √(−2 ln(1 − u)), 0 < u < 1. Replacing 1 − U with U and using the inverse distribution function technique results in generating R by R = √(−2 ln U). Note that this is a special case of the Weibull distribution as discussed in Example 3.3.7. A realization of Θ can be generated by Θ = 2πV, where V is a uniform variate independent of U. Then to get observations of X and Y we set

X = R cos Θ = √(−2 ln U) cos(2πV),
Y = R sin Θ = √(−2 ln U) sin(2πV).

So X and Y are two (independent) observations from the desired normal distribution. There are faster methods for generating normal variables, and this method has been criticized because the generated uniform variates U and V are not truly independent. The serial dependence among the uniform random variables imparts dependence among the X and Y. For a discussion, see Golder and Settle (1976) and Neave (1973). Ripley (1987, pp. 56-58) plots pairs of points generated by this method. A modification of the Box-Muller method is called the polar method and will be discussed in Section 3.5 on rejection methods.
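A short sketch of the Box-Muller transformation (ours; Python's built-in generator supplies the uniforms) is given below.

```python
# Box-Muller: a pair of independent N(0, 1) observations from two uniforms.
import math
import random

def box_muller():
    u = 1.0 - random.random()           # 1 - U avoids a zero uniform
    v = random.random()
    r = math.sqrt(-2.0 * math.log(u))   # Rayleigh radius R
    theta = 2.0 * math.pi * v           # uniform angle on (0, 2*pi)
    return r * math.cos(theta), r * math.sin(theta)

print([box_muller() for _ in range(3)])
```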
EXAMPLE 3.3.10 (The Cauchy distribution). If X = Y_1/Y_2 where Y_1 and Y_2 have independent standard normal distributions, then X has the standard Cauchy distribution. If Y_1 and Y_2 are obtained by the Box-Muller method of Example 3.3.9, then X is given by

X = [√(−2 ln U) sin(2πV)] / [√(−2 ln U) cos(2πV)] = tan(2πV).

Thus, only a single observation from the uniform distribution is needed to give an observation from the Cauchy. The inverse distribution function technique gives X = −cotan(πU) to be standard Cauchy. A glance at the graphs of the tangent and cotangent functions shows that the two methods are equivalent.

EXAMPLE 3.3.11 (The chi-square distribution). In Example 3.3.1 we saw that a chi-square variate X with n degrees of freedom could be generated as the sum of squares of n independent standard normal random variables. If n = 2 and if the normals are generated by the Box-Muller method, then a simple calculation shows that X = −2 ln(U). The relation with Example 3.3.6 should be apparent since a chi-square with two degrees of freedom is an exponential with mean equal to 2. To generate a random variable X having a chi-square distribution with 2n degrees of freedom, generate independent uniform random variables
U_1, ..., U_n on the interval (0, 1) and set

X = −2 ln U_1 − ··· − 2 ln U_n = −2 ln(U_1 ··· U_n).

To generate X with 2n + 1 degrees of freedom we need to add the square of a standard normal random variable. From Example 3.3.9, we can generate two independent uniforms U_{n+1} and U_{n+2} and add to X above the value of −2 ln(U_{n+1}) sin²(2πU_{n+2}).
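The two chi-square recipes can be sketched as follows (our code; taking 1 − U avoids the zero uniform that the library generator can return).

```python
# Chi-square via -2 ln(U_1 ... U_n) for 2n degrees of freedom, plus one
# squared Box-Muller normal term for 2n + 1 degrees of freedom.
import math
import random

def chi_square_even(two_n):                 # two_n = 2n degrees of freedom
    prod = 1.0
    for _ in range(two_n // 2):
        prod *= 1.0 - random.random()
    return -2.0 * math.log(prod)

def chi_square_odd(two_n_plus_1):           # 2n + 1 degrees of freedom
    u1 = 1.0 - random.random()
    u2 = random.random()
    extra = -2.0 * math.log(u1) * math.sin(2.0 * math.pi * u2) ** 2
    return chi_square_even(two_n_plus_1 - 1) + extra

print(chi_square_even(10), chi_square_odd(11))
```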
3.4. Mixtures method or composition method

If F_1, F_2, ... are distribution functions and (p_1, p_2, ...) is a probability vector, then

F(x) = Σ p_i F_i(x)

is a distribution function known as the mixture distribution of the F_i with the p_i as mixing proportions. Often the F_i and corresponding p_i are finite in number. The following is an example of a two-fold mixture.

EXAMPLE 3.4.1 (Mixture distribution). The height X of a person selected at random from the workers at a certain manufacturing plant is measured. Suppose a fraction p of the workers are men and 1 − p are women. If F_m(x) is the height distribution (function) of men and F_w(x) is that of women, then the overall height distribution is the mixture distribution F_X(x) = p F_m(x) + (1 − p) F_w(x).

Mixture distributions can arise naturally in applications because of a physical mixing of subpopulations, as illustrated in the preceding example. For purposes of random variable generation, the decomposition into a mixture is more often done on mathematical grounds. Suppose F(x) = Σ p_i F_i(x) is a mixture decomposition and define a discrete random variable I by P(I = i) = p_i, i = 1, 2, .... If an algorithm is available for generating I, and if realizations of each F_i(x) can also be generated, then observations from the mixture F(x) can be obtained as follows:
Step 1. Generate an observation I = i.
Step 2. Generate an observation x from F_i(x).
Step 3. Deliver X = x.
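A minimal sketch of this three-step procedure (ours; the component distributions in the usage line are arbitrary illustrations) is:

```python
# Choose a component index I with probabilities p_i, then sample from that
# component distribution.
import random

def sample_mixture(probs, generators):
    u = random.random()
    cum = 0.0
    for p, gen in zip(probs, generators):   # Step 1: generate I = i
        cum += p
        if u <= cum:
            return gen()                    # Steps 2-3: deliver X from F_i
    return generators[-1]()                 # guard against rounding error

# Usage: an equal mixture of U(0, 1) and U(2, 3).
print([sample_mixture([0.5, 0.5],
                      [random.random, lambda: 2.0 + random.random()])
       for _ in range(5)])
```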
EXAMPLE 3.4.2 (The normal distribution). Let X have a standard normal distribution with distribution function Φ. The distribution of |X| is known as the half-normal distribution. From the symmetry of the normal distribution, it is clear that realizations of X can be obtained by randomly attaching signs to realizations of |X|. This is actually a mixture generation method since

Φ(x) = P[X < 0] Φ(x | X < 0) + P[X ≥ 0] Φ(x | X ≥ 0),

where Φ(x | X ≥ 0) is the distribution function of |X|, while Φ(x | X < 0) is that of −|X|. It is clear that, in this case, P[X < 0] = P[X ≥ 0] = 1/2.

EXAMPLE 3.4.3 (Modified table-lookup method). Example 3.3.5 considered the following approximation to the standard Poisson distribution:
x        0      1      2      3      4      5      6      Total
f_X(x)   0.368  0.368  0.184  0.061  0.015  0.003  0.001  1.000
Adding the first digits gives 0.3 + 0.3 + 0.1 + 0.0 + 0.0 + 0.0 + 0.0 = 0.7; adding the second digits gives 0.06 + 0.06 + 0.08 + 0.06 + 0.01 + 0.00 + 0.00 = 0.27; adding the last digits gives 0.008 + 0.008 + 0.004 + 0.001 + 0.005 + 0.003 + 0.001 = 0.030. Now

P[X = 0] = (3/7)(0.7) + (6/27)(0.27) + (8/30)(0.03);
...
P[X = 6] = (0/7)(0.7) + (0/27)(0.27) + (1/30)(0.03).

That is, the distribution of X can be decomposed as the mixture

f_X(x) = f_1(x)(0.7) + f_2(x)(0.27) + f_3(x)(0.03),

with f_1(x), f_2(x), f_3(x) given by

x        0     1     2     3     4     5     6     Total
f_1(x)   3/7   3/7   1/7   0     0     0     0     1
f_2(x)   6/27  6/27  8/27  6/27  1/27  0     0     1
f_3(x)   8/30  8/30  4/30  1/30  5/30  3/30  1/30  1
So 70 percent of the time we generate an observation from f_1(x), 27 percent of
the time from f_2(x), and the other 3 percent of the time from f_3(x). Each of these distributions can be generated by a table-lookup procedure. Instead of the 1000 storage locations as necessary in Example 3.3.5, this procedure requires only 7 + 27 + 30 = 64 locations. For other examples, the number of locations will depend on the probabilities involved, but in any case should result in a large saving of storage space.

The three tables can be combined into a single table with 64 entries: For f_1(x) set t(1) = t(2) = t(3) = 0; t(4) = t(5) = t(6) = 1; and t(7) = 2. For f_2(x) set t(8) = ··· = t(13) = 0; t(14) = ··· = t(19) = 1; t(20) = ··· = t(27) = 2; t(28) = ··· = t(33) = 3; and t(34) = 4. For f_3(x) set t(35) = ··· = t(42) = 0; t(43) = ··· = t(50) = 1; t(51) = t(52) = t(53) = t(54) = 2; t(55) = 3; t(56) = ··· = t(60) = 4; t(61) = t(62) = t(63) = 5; and t(64) = 6.

We need to select one of the first seven tabled values with probability 0.7, and conditional on this, the choice from among the seven should be uniform. Thus generate U from U(0, 1) and set I = [10U] + 1. Now P[I ≤ 7] = 0.7 and P[I = i | I ≤ 7] = 1/7 for i = 1, ..., 7. Thus, if I ≤ 7 then deliver X = t(I). (Here we have used the fact that the conditional distribution of a uniform random variable restricted to a subset of its original sample space is still uniform.) If I > 7, then we have not delivered an observation of X. Using the same U generated above, let I = [100U] + 1. We know that I > 70. Thus, P[70 < I ≤ 97] = 0.27, and P[I − 63 = i | 70 < I ≤ 97] = 1/27 for i = 8, ..., 34. The result is summarized in the following algorithm.

ALGORITHM 3.4.1.
Step 1. Initialize tabled values t(1), ..., t(64) as indicated above.
Step 2. Generate U from U(0, 1).
Step 3. Set I = [10U] + 1.
Step 4. If I ≤ 7 then deliver X = t(I).
Step 5. Replace I by [100U] + 1.
Step 6. If I ≤ 97 then deliver X = t(I − 63).
Step 7. Replace I by [1000U] + 1.
Step 8. Deliver X = t(I − 936).

We observe in Step 6 that I takes values 71, ..., 97 and I − 63 takes values 8, ..., 34 as necessary to look up the tabled values for f_2. In Step 8, I takes values 971, ..., 1000 and I − 936 takes values 35, ..., 64 as required for f_3. This algorithm is fast and requires relatively little storage space. Only one uniform random variable and few other operations are necessary. Unfortunately, the required storage, as well as the details of the algorithm, do depend on the numerical probabilities as well as the number of digits used. For application to other discrete distributions, see Norman and Cannon (1972).

An important case of the mixture method occurs when the region below the density f_X(x) of a continuous random variable X can be partitioned into two subregions by a nonnegative function g(x) with 0 ≤ g(x) ≤ f_X(x). (See Figure 3.4.1.)
Fig. 3.4.1. The partitioning into two parts of the region below a density function f_X(x) by a function g(x).
If we define G(x) = ∫_{−∞}^{x} g(u) du, then the area under g(x) is lim_{x→∞} G(x) = G(∞), and g(x)/G(∞) is a probability density function with corresponding distribution function G(x)/G(∞) = F_1(x), say. Also the area below f_X(x) − g(x) is 1 − G(∞); thus [f_X(x) − g(x)]/[1 − G(∞)] is a density function with corresponding distribution function [F_X(x) − G(x)]/[1 − G(∞)] = F_2(x), say. Thus

F_X(x) = p_1 F_1(x) + p_2 F_2(x),

where p_1 = G(∞) and p_2 = 1 − G(∞). This result is easily generalized by partitioning the region under f_X(x) into an arbitrary number of subregions. The curve bounding each region can be changed into a probability density by dividing by the area of the region.

EXAMPLE 3.4.4 (The rectangle, wedge, tail method of Marsaglia). Suppose it is desired to generate observations from the standard exponential distribution with density f_X(x) = e^{−x}, x > 0. As shown in Figure 3.4.2 (ignore the dotted line), the region under f_X(x) can be split into three pieces: a rectangular piece; a wedge piece above the rectangle; and the tail piece. The areas of the regions and the corresponding densities are:

The rectangle. The region below y = e^{−1} from x = 0 to x = 1 has area p_1 = e^{−1}; the density is f_1(x) = 1 if 0 < x < 1 and f_1(x) = 0 otherwise.

The wedge. The region below y = e^{−x} and above y = e^{−1} from x = 0 to x = 1 has area p_2 = 1 − 2e^{−1}; the density is f_2(x) = [e^{−x} − e^{−1}]/[1 − 2e^{−1}] if 0 < x < 1 and f_2(x) = 0 otherwise.

The tail. The region below y = e^{−x} for x ≥ 1 has area p_3 = e^{−1}; the density is f_3(x) = e^{−x}/e^{−1} = e^{−(x−1)} if x ≥ 1 and f_3(x) = 0 otherwise.
Fig. 3.4.2. The partitioning of the region below the standard exponential density function.
The mixture decomposition then becomes

f_X(x) = p_1 f_1(x) + p_2 f_2(x) + p_3 f_3(x),

where p_1 = p_3 is approximately 0.368 and p_2 is approximately 0.264. Notice that the tail density f_3(x) is simply a translate of the exponential distribution so that realizations of f_3(x) can be generated as 1 + Y where Y has a standard exponential distribution. (Y may be generated as −ln(U); see Example 3.3.6.) The method also requires that observations be generated from the wedge f_2(x) about 26 percent of the time. However, the distribution function

F_2(x) = [1 − e^{−x} − x e^{−1}]/[1 − 2 e^{−1}]

cannot be inverted in closed form. A method of generating from the wedge will be given in Example 3.4.6. This method can be extended by decomposing the region under f_X into many rectangles, many wedges, and a tail. Because of the lack of memory property of the exponential distribution, the wedge distributions all have a distribution function analogous to F_2(x). For details see MacLaren, Marsaglia and Bray (1964).

Another modification is to partition off a triangular region from the wedge as indicated by the dotted line in Figure 3.4.2. The area of the triangle is maximized by taking the slope equal to f_X'(1) = −e^{−1}. The area of the triangle is then p_21 = e^{−1}/2 ≈ 0.184 and the area of the new wedge is p_22 = p_2 − e^{−1}/2 ≈ 0.08. The density corresponding to the triangle is f_21(x) = 2(1 − x) if 0 < x < 1 and f_21(x) = 0 otherwise. Observations from this density can be obtained as the
minimum, min(U_1, U_2), of two independent uniform random variables on (0, 1). The (new) wedge has density f_22(x) = [e^{−x} − (2 − x) e^{−1}]/p_22 if 0 < x < 1 and f_22(x) = 0 otherwise. Since this distribution needs to be used only 8 percent of the time, a somewhat inefficient algorithm could be employed here without adversely affecting the overall performance of the method.

EXAMPLE 3.4.5 (Random sums as mixtures). A randomly stopped sum, or a random sum, is a random variable Z which can be written as the sum of a random number N of random variables, say

Z = Σ_{i=1}^{N} X_i,

where N is a random variable assumed to be independent of all the X_i. Typically, the X_i are themselves independent and identically distributed (iid). If N has the Poisson distribution and the X_i are iid, then Z is said to be a Poisson sum of the X_i and Z is said to have a compound Poisson distribution. If the X_i are iid discrete random variables with density f(x), then the distribution of Z is given by

f_Z(z) = e(z) P[N = 0] + P[X_1 = z] P[N = 1] + P[X_1 + X_2 = z] P[N = 2] + ···,  z = 0, 1, 2, ...,

where e(z) = 1 if z = 0 and e(z) = 0 if z > 0. An analogous formula holds if the X_i are iid continuous random variables. That is, the distribution of Z is a mixture of an infinite number of distributions. To generate an observation of Z we first generate an observation of N = n and then generate n observations from the common distribution of the X_i. Finally, we add these n observations to get Z. In specific cases, there may be efficient methods for generating the convolutions X_1, X_1 + X_2, X_1 + X_2 + X_3, ... directly. For example, if N is P(λ_1) and each X_i is P(λ_2), then Z has the Thomas distribution with parameters λ_1 and λ_2. Example 3.3.3 can be modified to generate an observation N = n from P(λ_1) and then to generate Z = X_1 + ··· + X_n from P(nλ_2), the desired observation from the Thomas distribution.
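The random-sum construction is easy to sketch. The code below (ours, not from the original) keeps the Poisson-number-of-Poissons setting of the text; the sequential-search Poisson generator is Algorithm 3.3.3 adapted to a general mean.

```python
# A Poisson number N of iid summands gives a compound Poisson variable Z.
import math
import random

def poisson(mean):
    """Inversion by sequential search (compare Algorithm 3.3.3)."""
    u, i, p = random.random(), 0, math.exp(-mean)
    f = p
    while u > f:
        i += 1
        p *= mean / i
        f += p
    return i

def random_sum(lambda1, lambda2):
    n = poisson(lambda1)                        # number of summands
    return sum(poisson(lambda2) for _ in range(n))

print([random_sum(1.0, 2.0) for _ in range(10)])
```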
EXAMPLE 3.4.6 (Random minimums, continuation of Example 3.4.4). Instead of adding the random variables X_1, ..., X_N as in Example 3.4.5, we can take their minimum. Let the X_i be iid random variables with distribution function F(x), and let N be a positive integer-valued random variable independent of the X_i. Then the distribution of Z = min{X_1, ..., X_N} is given by

F_Z(z) = 1 − P[Z > z] = 1 − Σ_{n=1}^{∞} P[min{X_1, ..., X_n} > z] P[N = n]
       = 1 − Σ_{n=1}^{∞} P[X_1 > z, ..., X_n > z] P[N = n]
       = 1 − Σ_{n=1}^{∞} [1 − F(z)]^n P[N = n].
In the special case where the X_i are uniform on (0, 1), we have F(x) = x, 0 < x < 1, so that

F_Z(z) = 1 − Σ_{n=1}^{∞} (1 − z)^n P[N = n],  0 < z < 1.

Now suppose that N has the Poisson distribution, with mean λ, truncated to values 2, 3, .... Then P[N = n] = (λ^n/n!) e^{−λ}/(1 − e^{−λ} − λ e^{−λ}) for n = 2, 3, ..., and

F_Z(z) = 1 − [e^{−λz} − e^{−λ} − λ(1 − z) e^{−λ}]/[1 − e^{−λ} − λ e^{−λ}],  0 < z < 1.
Recall that in Example 3.4.4 we wanted to generate an observation from a wedge distribution with density

f(x) = (e^{−x} − e^{−1})/(1 − 2 e^{−1}) = (e^{1−x} − 1)/(e − 2),  0 < x < 1.
We now see that this is exactly the distribution of Z when λ = 1. Thus, to generate observations from f(x), we may generate an observation N = n from the Poisson distribution with unit mean truncated to 2, 3, ..., and then generate U_1, ..., U_n from U(0, 1). Finally, we deliver X = min{U_1, ..., U_n}. Example 3.3.3 can easily be modified to generate the truncated Poisson distribution, or the truncated distribution can be obtained by the rejection technique discussed in Section 3.5. Also notice that if the Poisson is truncated to {1, 2, 3, ...} then realizations of X corresponding to N = 1 yield the rectangle part of Marsaglia's decomposition. See Example 3.11.1 for another application of random minimums.
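A sketch of this random-minimum recipe for the wedge (ours; the truncation of the Poisson is handled here by simple rejection, one of the two options just mentioned) is:

```python
# Wedge density of Example 3.4.4 as the minimum of N uniforms, where N is
# Poisson with unit mean truncated to the values 2, 3, ....
import math
import random

def truncated_poisson_unit_mean():
    while True:                       # reject N = 0 and N = 1
        u, i, p = random.random(), 0, math.exp(-1.0)
        f = p
        while u > f:
            i += 1
            p /= i
            f += p
        if i >= 2:
            return i

def wedge():
    n = truncated_poisson_unit_mean()
    return min(random.random() for _ in range(n))

print([round(wedge(), 3) for _ in range(10)])
```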
There are two reasons why the mixture of distributions technique is useful. The first is that mixtures sometimes occur in a natural manner and the method permits observations to be generated from the desired distribution. The second is to provide a fast routine. The decomposition in this case is often arbitrary and is made up of two general types of distributions: those for which fast generating routines are available, and those with only slow routines. In order for the overall method to be efficient, the slow routines should not be called very often. In the rectangle (triangle), wedge, tail method, the rectangle parts are generated as linear functions of uniform variates and are very fast. The triangle parts, if any, are generated as linear functions of the minimum of two uniforms and are also fast. The wedge and tail distributions are usually complicated and the generating routines can be slow. Therefore, the region below the density should be divided into enough rectangles/triangles to make the wedge and tail parts occur infrequently. Although it is usually desirable to make the area of the rectangle as large as possible, smaller rectangles are sometimes used in order to improve the rest of the decomposition or to simplify other aspects of the implementation.

The stochastic selection from among the various subregions is often made by comparing a uniform random variable with a set of 'cut off' numbers determined by the mixing probabilities. These comparisons are slow and a table-lookup procedure can sometimes improve the overall efficiency. Examples are given by Marsaglia and Tsang (1984) for distributions with decreasing density functions or with symmetric unimodal density functions. In order to generate observations from the Poisson distribution, Kemp and Kemp (1990) use the inverse distribution function technique but first decompose the Poisson distribution into three pieces.
3.5. Rejection method or envelope rejection method

The odd-man-out example from elementary probability courses illustrates this method. Three persons go to a restaurant for coffee. It is decided one person will pay for all three. To decide who will pay, each person tosses a coin and if the result is two heads and one tail, or vice versa, then the 'odd man' pays. Otherwise, the experiment is repeated.

Suppose we want to generate an observation X from a distribution F_X(x), but we have only a method for generating observations Y from a distribution F_Y(y). It seems reasonable, if the two distributions are not too different, to correct the distribution of Y by sometimes rejecting Y where Y tends to occur more frequently than X. The relative frequency would be lowered in such regions and raised elsewhere. If Y = y is the observation generated from F_Y(y), then we will either 'accept' this value and deliver X = y with a suitable probability a(y), say, or we will 'reject' the value and start over by generating a new observation on Y. For this reason, the method is often called the acceptance-rejection technique. Of
course, the hard part is choosing a(y) so that the accepted observations follow the desired distribution F_X(x). The number of trials needed to achieve an acceptance has the Pascal distribution given by

P[No. of trials = n] = [1 − P(A)]^{n−1} P(A),  n = 1, 2, ...,

where A is the event {a trial is a success (accept y)}. The expected number of trials is 1/P(A).

In order to determine the correct form for a(y), we note that the event {X ≤ x} can be partitioned into the disjoint union of events E_n, n = 0, 1, 2, ..., where E_n = {n rejections, then Y ≤ x and Y is accepted}. Summing over n gives

F_X(x) = P[X ≤ x] = Σ_{n=0}^{∞} [1 − P(A)]^n P[Y ≤ x and accept Y] = P[Y ≤ x and accept Y]/P(A).

Now, P[Y ≤ x and accept Y] = ∫_{−∞}^{x} a(y) dF_Y(y). After differentiating, or differencing in the discrete case, we obtain

f_X(x) = a(x) f_Y(x)/P(A),  so that  a(x) = f_X(x) P(A)/f_Y(x).
That is, the acceptance probability for Y = x is proportional to the likelihood ratio of the two distributions. If we choose a(x) according to a(x) = C f_X(x)/f_Y(x) for a specified constant C, then it will automatically happen that C = P(A), because P(A) = E[a(Y)]. For reasons of efficiency, we would like a(x), and hence C, to be as large as possible, subject to the constraint a(x) ≤ 1. The optimal value of C is then C_opt = inf{f_Y(x)/f_X(x)}, where the infimum is taken over all x in the support of X, i.e., over those x such that f_X(x) ≠ 0. Notice that if there are x such that f_X(x) ≠ 0 but f_Y(x) = 0, then it is not possible to generate X = x by rejection on Y, and the above formula gives C_opt = 0, confirming that the method cannot be applied. A suboptimal value of C may be used in situations where exact calculation of C_opt is inconvenient.

The actual acceptance/rejection decision requires a separate random experiment. Typically, a uniform random variable U is generated and the observed value Y = y is accepted if U ≤ a(y). The overall procedure, then, is to generate observations Y = y and U = u from f_Y(y) and from U(0, 1), respectively,
calculate a(y) and deliver X = y if u ≤ a(y); otherwise, repeat the process by generating new observations of Y and U.

EXAMPLE 3.5.1 (The exponential distribution truncated to (0, 1)). It is clear how we would proceed to generate an exponential random variable truncated to take values in (0, 1) from an untruncated exponential random variable. We now apply the above theory to formalize the result. Let f_X(x) = e^{−x}/(1 − e^{−1}) if 0 < x < 1 and f_X(x) = 0, otherwise. Let f_Y(y) = e^{−y} if y > 0 and f_Y(y) = 0, otherwise. Then f_Y(x)/f_X(x) = 1 − e^{−1} if 0 < x < 1 and is infinite if x ≥ 1. The ratio is undefined if x ≤ 0; however, this is of no consequence because such values are outside the support of X and Y. Now C_opt = P(A) = inf_{x>0}{f_Y(x)/f_X(x)} = 1 − e^{−1}, and a(x) = 1 if 0 < x < 1 and a(x) = 0, otherwise. Thus, the acceptance rate is 100 percent for 0 < Y < 1, and zero otherwise. The procedure is summarized in the following algorithm with Y being generated as in Example 3.3.6.

ALGORITHM 3.5.1.
Step 1. Generate U = u from U(0, 1).
Step 2. If u > e^{−1} then deliver X = −ln(u).
Step 3. Go to Step 1.

For this procedure, the expected number of trials per delivered observation is 1/P(A) = 1/(1 − e^{−1}) ≈ 1.58. But note that the logarithm is only calculated on the final trial that leads to acceptance.
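A generic sketch of the rejection recipe (ours, not from the original), applied here to the truncated exponential of Example 3.5.1, is given below; the function names and the value of P(A) follow the formulas above.

```python
# Accept Y = y with probability a(y) = P(A) f_X(y) / f_Y(y).
import math
import random

def rejection(sample_y, f_x, f_y, p_accept):
    while True:
        y = sample_y()
        u = random.random()
        if u <= p_accept * f_x(y) / f_y(y):   # accept with probability a(y)
            return y

# Target: exponential truncated to (0, 1); envelope: standard exponential.
f_x = lambda x: math.exp(-x) / (1.0 - math.exp(-1.0)) if 0 < x < 1 else 0.0
f_y = lambda y: math.exp(-y)
sample_y = lambda: -math.log(1.0 - random.random())
p_a = 1.0 - math.exp(-1.0)                    # C_opt = inf f_Y / f_X

print([round(rejection(sample_y, f_x, f_y, p_a), 3) for _ in range(5)])
```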
fx(X) = v ~ - e
--x2/2
if x > 0 ,
using rejection on a standard exponential r a n d o m variable Y with density
fy(y) = e -y
if y > 0.
Then Cop t =
.
P(A) = infx>o{fr(X ) Ifx(X)} = lnfx>o{%/~72 e
x2/2--x
}
= V'rr/2e ~ 0.76, and
a(x) = [fx(X) Ifr(x)]P(A) =
e-(X-1)2/2 .
In E x a m p l e 3.5.1, the acceptance probability is either 0 or 1; thus, a separate uniform variate is not needed to decide if the observation should be accepted or not. H e r e , however, we need to accept with probability a(x), so we generate
a uniform random variable U = u on (0, 1) and accept X = y if u ≤ a(y). This is equivalent to accepting if

u ≤ e^{−(y−1)²/2},  or  −ln(u) ≥ (y − 1)²/2.
Since, by Example 3.3.6, −ln(U) has the standard exponential distribution, we use the exponential generator a second time to make the decision to accept Y or to start over. The procedure is summarized in the following algorithm. This method occurs in Butcher (1960) and Kahn (1954).

ALGORITHM 3.5.2.
Step 1. Generate two standard exponential variables Y_1 = y_1 and Y_2 = y_2.
Step 2. If y_1 > (y_2 − 1)²/2, then deliver X = y_2.
Step 3. Go to Step 1.

The expected number of trials per delivered observation is 1/P(A) = √(2e/π) ≈ 1.32.
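A sketch of Algorithm 3.5.2 (ours), together with the random sign of Example 3.4.2 to recover a full N(0, 1) variate, is:

```python
# Half-normal by rejection on a standard exponential, with a second
# exponential making the accept/reject decision; a random sign then gives
# a standard normal observation.
import math
import random

def half_normal():
    while True:
        y1 = -math.log(1.0 - random.random())   # standard exponential
        y2 = -math.log(1.0 - random.random())
        if y1 > (y2 - 1.0) ** 2 / 2.0:
            return y2

def standard_normal():
    x = half_normal()
    return x if random.random() < 0.5 else -x

print([round(standard_normal(), 3) for _ in range(5)])
```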
3.5.1. Use of non-uniform variates to accept or reject

The steps given below outline a procedure in which the acceptance/rejection decision is based upon a continuous non-uniform variate Z. As before, the variable of interest is X for which observations are generated by accepting or rejecting realizations of Y. The function t(·) occurring in Step 3 is to be chosen so that the delivered values have the prescribed distribution F_X. The choice of t(·) is discussed below.

Step 1. Generate an observation Y = y with density f_Y(y).
Step 2. Generate an observation Z = z with distribution function F_Z(z).
Step 3. Calculate t(y) where t is a suitably chosen function.
Step 4. If z ≤ t(y) then deliver X = y.
Step 5. Go to Step 1.
The probability of accepting a value Y = y is P[Z ≤ t(y)] = F_Z(t(y)) = a(y) = P(A) f_X(y)/f_Y(y). Consequently,

t(x) = F_Z^{-1}(P(A) f_X(x)/f_Y(x)) ≡ F_Z^{-1}(a(x)).

The optimum procedure occurs, as before, when P(A) = inf{f_Y(x)/f_X(x)}. Using this expression for t(·), we see that the acceptance decision in Step 4 is equivalent to

F_Z(z) ≤ P(A) f_X(y)/f_Y(y) ≡ a(y).

Since F_Z(Z) is distributed as U(0, 1), the method is not fundamentally different from making the decision on the basis of a uniform variate. It is simply a matter of computational convenience and feasibility.
EXAMPLE 3.5.2 (Revisited). Recall that we wished to generate an observation X = x from the half-normal density f_X(x) = √(2/π) e^{−x²/2} using observations Y = y from the exponential density f_Y(y) = e^{−y}. We calculated a(x) = e^{−(x−1)²/2} and ended up using a second observation Z = z from the exponential distribution in order to decide whether to accept Y = y. With the above notation, we find that

t(x) = F_Z^{-1}(a(x)) = −ln(1 − a(x)) = −ln[1 − e^{−(x−1)²/2}],  x ≥ 0.

This function t(x) will work, but it is not very nice. Now there is no reason why we have to use the left tail of the distribution of Z as the acceptance region. If the right tail is used instead, then Step 4 of the above algorithm becomes:

Step 4'. If z > t̄(y) then deliver X = y,

where t̄(x) = F_Z^{-1}(1 − a(x)). For the present example, this results in t̄(x) = −ln(e^{−(x−1)²/2}) = (x − 1)²/2, in agreement with Algorithm 3.5.2.

An important special case of the rejection method occurs when the support of X is a subset of the unit interval and Y is taken to have a uniform distribution on (0, 1). Then f_Y(y) = 1 if 0 < y < 1, P(A) = inf_{0<x<1}{1/f_X(x)} = 1/sup f_X(x) and a(x) = f_X(x) P(A) = f_X(x)/sup f_X(x).
(The exponential distribution truncated to (0, 1)). Let f_X(x) = e^{−x}/(1 − e^{−1}) if 0 < x < 1, and f_X(x) = 0 otherwise. Then P(A) = inf_{0<x<1}{1/f_X(x)} = 1 − e^{−1} and a(x) = f_X(x) P(A) = e^{−x}, so an observation Y = y from U(0, 1) is accepted and delivered as X = y with probability e^{−y}.
3.5.2. A geometric interpretation

When observations X = x are to be generated by accepting or rejecting observations Y = y from the uniform distribution, then the acceptance probabilities a(y) = f_X(y) P(A) must be less than or equal to one. So P(A) is chosen as large as possible to make f_X(x) P(A) ≤ 1. This results in scaling f_X(x) down so that it is contained entirely within the unit square as shown in Figure 3.5.1. If the acceptance/rejection decision is based on an observation U = u from the uniform distribution, then the pair (y, u) is a point chosen at random from the unit square and, if (y, u) falls in the region below the graph of a(x), then X = y is delivered.

This technique is sometimes called the envelope-rejection method. An envelope function g(y) is found that 'encases' or bounds the density function from above. When g(y) is divided by the area under g(y), it becomes the density function of Y.
Fig. 3.5.1. Scaling an acceptance probability P(A) by multiplication of the density function f_X(x) to fit within a unit square.
3.5.3. Bounds for the rejection method, or rejection with squeezing

Suppose an observation Y = y is generated and is to be accepted if an independent random variable Z satisfies Z ≤ t(y), as in Section 3.5.1. Often the function t(y) is slow to evaluate. If a simple bound s(y) ≤ t(y) exists, then one can check to see if Z ≤ s(y), in which case we accept and deliver X = y; otherwise, we need to check whether Z ≤ t(y). This squeeze method is very useful if the event {Z ≤
a(v) = [f_X(y(v))/f_Y(y(v))] P(A). For a < 1