Determining simulation run length with the runs test


Simulation Modelling Practice and Theory 11 (2003) 237–250 www.elsevier.com/locate/simpat

E. Jack Chen *, W. David Kelton
BASF Corporation, Mathematical Modeling Group, 3000 Continental Drive–North, Mount Olive, NJ 07828-1234, USA
Department of Quantitative Analysis and Operations Management, University of Cincinnati, Cincinnati, OH 45221-0130, USA
Received 12 September 2002; received in revised form 17 March 2003; accepted 21 March 2003

* Corresponding author. Address: BASF Corporation, Mathematical Modeling Group, 3000 Continental Drive–North, Mount Olive, NJ 07828-1234, USA. E-mail addresses: [email protected] (E.J. Chen), [email protected] (W.D. Kelton).

Abstract

This paper discusses implementation of a sequential procedure to determine the simulation run length and construct a confidence interval for the mean of a steady-state simulation. The quasi-independent (QI) procedure increases the simulation run length progressively until a certain number of essentially independent and identically distributed systematic samples are obtained. We estimate the variance of the sample mean through an empirical distribution (histogram). Several experimental performance evaluations demonstrate the validity of the QI procedure and histogram approximation. © 2003 Elsevier B.V. All rights reserved.

Keywords: Runs test; Histogram; Statistical analysis

1. Introduction

When estimating a steady-state performance measure, for example the mean μ, of some discrete-time stochastic output process {X_i : i ≥ 1} via simulation, we would like an algorithm to determine the simulation run length N so that the mean estimator (the sample mean X̄(N) = (1/N) Σ_{i=1}^{N} X_i) is approximately unbiased (X̄(N) is asymptotically unbiased in most circumstances), the confidence interval (CI) is of a pre-specified width, and the actual coverage probability of the CI is close to the nominal coverage probability 1 − α.

1569-190X/03/$ - see front matter © 2003 Elsevier B.V. All rights reserved. doi:10.1016/S1569-190X(03)00048-0


Since simulation output data are generally correlated, the usual method of constructing a CI based on classical statistics, which assumes independent and identically distributed (i.i.d.) observations, is not directly applicable. Several methods of determining the simulation run length and estimating the variance of the sample mean have been proposed; for an overview of existing methods, see Law and Kelton [5].

Chen and Kelton [1] propose a Quasi-Independent (QI) algorithm to determine the simulation run length. The QI procedure is asymptotically valid as the systematic samples (the QI subsequence) appear to become independent, as determined by the runs test (see Section 2). The only condition required is that the correlations of the stochastic output sequence die off as the lag between observations increases. This mild assumption is satisfied in virtually all practical settings, and such weakly dependent processes typically obey a central limit theorem for dependent processes. Furthermore, to serve as a general-purpose procedure, the method should be easy to interpret and highly portable.

In this paper, we present the theoretical basis for using the runs test to determine the simulation run length, show the interdependence between the strength of the autocorrelation of the underlying output sequence and the lag l at which the systematic samples (see Section 3.1) pass the runs test, and propose using an empirical distribution (histogram) to estimate the variance of the sample mean.

In Section 2, we provide the background necessary for the proposed procedure. In Section 3, we present the proposed procedure for determining the simulation run length of stationary processes and the histogram approximation. In Section 4, we give some empirical results. In Section 5, we give concluding remarks.

2. Background

We use the runs test to check whether a sequence appears to be independent. We examine the sequence of output data for unbroken subsequences of maximal length within which the sequence increases (decreases) monotonically; such a subsequence is called a run up (run down), and the length of the subsequence is called the run length. Note that this "run length" is different from the "simulation run length." The "run length" used by the runs test serves to determine whether a sequence appears to be independent. The "simulation run length," on the other hand, is the total length of a particular simulation execution, measured either in the number of observations collected (for a discrete-time output process) or in the amount of simulated time (for a continuous-time output process); we focus on discrete-time processes. The runs test looks solely for independence and has been shown to be very powerful [4].

When we examine the run lengths of consecutive subsequences, a long run will tend to be followed by a short run, and conversely, because adjacent runs are not independent. For example, if the run length of a runs-up subsequence is r, then for some constant i, X_{i+j-1} < X_{i+j} for j = 1, 2, ..., r−1, and X_{i+r−1} > X_{i+r}. If r is large, then X_{i+r−1} is likely to be large. Therefore, even though X_{i+r−1} > X_{i+r}, X_{i+r} could be


a relatively large number, which will tend to result in a short run. This lack of independence is enough to invalidate a straightforward χ² test.

Knuth [4] suggests a vastly simpler and practical runs-up test. The procedure "throws away" the element that immediately follows a run, so that when X_j is greater than X_{j+1} we start the next run with X_{j+2}. Hence, if the output data are i.i.d., then the run lengths are independent, because the last element of the first subsequence, X_j, and the first element of the adjacent subsequence, X_{j+2}, are independent. Consequently, a simple χ² test may be used. For this simple runs-up test, the probability of a run of length r is r/(r+1)!, under the null hypothesis that the output data are i.i.d. random variables (see Appendix A for a proof). If we choose to record six run lengths, 1 through 5 and 6+ (run length of six or longer), then for a large sample size n the test statistic has an approximate χ² distribution with five degrees of freedom (df). Knuth [4] suggests that n be at least 4000. That is, define the statistic

    V_1 = \sum_{1 \le i \le 6} \frac{(C[i] - S P[i])^2}{S P[i]},

where C[i], i = 1, ..., 6, is the number of subsequences with run length i (or ≥ i in the case of i = 6), S = \sum_{1 \le i \le 6} C[i], and P[i] = i/(i+1)!. Then under the null hypothesis that the output data are i.i.d. random variables, the statistic V_1 has a χ² distribution with five df when n is large (≥ 4000).

The runs test requires the data to be absolutely continuous and may not work properly if this condition is not satisfied. The probability of a run of length 1 will be unusually high at a discontinuity point, where there is a jump in the cumulative distribution function. This "over mixing" will cause the subsequence to fail the runs test. To correct this problem, Chen and Kelton [2] propose to increase the run length with probability 1/(r+1) when two elements are equal, where r is the current run length, because if there are r+1 observations, the probability of the (r+1)th observation being the largest is 1/(r+1). For example, if X_i = X_{i+1} and the runs-up run length r up to X_i is 2, we generate a uniform (0,1) random variable u; if u < 1/3, we treat X_i ≤ X_{i+1} and increase the run length by one; otherwise we treat X_i ≥ X_{i+1} and start a new runs-up sequence.

Knuth [4] suggests another runs-up test that takes into consideration the dependence between adjacent runs instead of "throwing away" observations. Let

    V_2 = \frac{1}{n} \sum_{1 \le i, j \le 6} (C[i] - n b_i)(C[j] - n b_j) a_{ij},

where the a_{ij} and b_j are pre-computed coefficients; see [4] or [5] for their values. Then under the null hypothesis that the output data are i.i.d. random variables, V_2 has a χ² distribution with six df when n is large (≥ 4000). Based on our experimental results, this runs-up test has greater power in detecting dependence; however, it has a larger Type I error (see Section 4.3 and Appendix B) when compared to the runs-up test using V_1.
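To make the mechanics of the simple runs-up test concrete, here is a minimal Python sketch (not the authors' implementation) of the throw-away variant and the V_1 statistic. It assumes absolutely continuous data (no tie adjustment as in Chen and Kelton [2]), counts the final, possibly unterminated run, and takes the probability of the 6+ category as the complementary tail mass; the default critical value 11.07 is the 0.95 quantile of the χ² distribution with five df.

```python
import math
import random

def runs_up_test_v1(data, crit=11.07):
    """Simple (throw-away) runs-up test: the observation that breaks a run is
    discarded so that successive run lengths are independent, and a plain
    chi-square statistic with five df can be used.  Returns (V1, passed)."""
    # P[r] = r/(r+1)! for r = 1..5; the 6+ category gets the remaining tail mass.
    p = [r / math.factorial(r + 1) for r in range(1, 6)]
    p.append(1.0 - sum(p))
    counts = [0] * 6          # counts[k] = runs of length k+1 (k = 5: length >= 6)

    i, n = 0, len(data)
    while i < n - 1:
        r = 1                 # length of the run up starting at data[i]
        while i + r < n and data[i + r - 1] < data[i + r]:
            r += 1
        counts[min(r, 6) - 1] += 1
        i += r + 1            # skip the element that broke the run

    s = sum(counts)           # total number of recorded runs
    v1 = sum((counts[k] - s * p[k]) ** 2 / (s * p[k]) for k in range(6))
    return v1, v1 <= crit

if __name__ == "__main__":
    random.seed(1)
    iid = [random.random() for _ in range(4000)]
    print(runs_up_test_v1(iid))   # an i.i.d. sequence should usually pass
```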


In the non-overlapping batch-means method, the simulation output sequence {X_i : i = 1, 2, ..., N} is divided into R adjacent non-overlapping batches, each of size m. For simplicity, we assume that N is a multiple of m, so that N = Rm. The sample mean μ̂_j of the jth batch is

    \hat{\mu}_j = \frac{1}{m} \sum_{i = m(j-1)+1}^{mj} X_i   for j = 1, 2, ..., R.

Then the grand mean of the individual batch means, \hat{\mu} = (1/R) \sum_{j=1}^{R} \hat{\mu}_j, is used as a point estimator for μ. Here μ̂ = X̄(N), the sample mean of all N individual X_i's, and we seek to construct a CI based on this point estimator. The batch-means variance estimator is simply the sample variance computed from the means of these consecutive, non-overlapping subsamples.
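For comparison with the histogram approach developed below, the following is a short Python sketch (not the paper's code) of the non-overlapping batch-means point estimate and CI half-width; the default z = 1.645 corresponds to a 90% CI, and any observations beyond R·m are simply ignored.

```python
import math
import statistics

def batch_means_ci(x, num_batches, z=1.645):
    """Split x into R adjacent, non-overlapping batches of size m, estimate the
    mean by the grand mean of the batch means, and build a CI from the sample
    variance of the batch means."""
    m = len(x) // num_batches                     # batch size
    means = [sum(x[j * m:(j + 1) * m]) / m for j in range(num_batches)]
    grand_mean = sum(means) / num_batches         # = overall mean when len(x) = R * m
    half_width = z * math.sqrt(statistics.variance(means) / num_batches)
    return grand_mean, half_width
```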

3. Methodologies

Let F_X denote the steady-state distribution function of the process under study and W a property of F_X, such as a mean, variance, or quantile. The natural point estimator for W, denoted by Ŵ, is typically the sample mean, the sample variance, the sample quantile, or a simple function of the relevant order statistics, chosen to imitate the performance measure W. Furthermore, the natural estimators are appropriate for estimating any W, regardless of dependency, which follows from the empirical distribution function converging to F_X. The natural estimators are computationally reasonable and perform well statistically [3]. Although asymptotic results are often applicable when the amount of data is "large enough," the point at which the asymptotic results become valid generally depends on unknown factors. We propose using the runs test to estimate the required sample size sequentially and using a histogram approximation to estimate the variance of the sample mean.

3.1. Determining the simulation run length

The QI procedure increases the simulation run length progressively until a subsequence of n samples (taken from the original output sequence) appears to be independent, as determined by the runs test. We accomplish this by systematic sampling: select a number l between 1 and L, choose that observation, and then take every lth observation thereafter. For example, n lag-5 systematic samples are the observations x_{5k+1} for k = 0, 1, 2, ..., n−1. Here, the chosen l must be sufficiently large that the systematic samples appear to be independent. This is possible because we assume the autocorrelations of the output sequence die off as the lag increases. To avoid incorrectly declaring a correlated sequence independent because it possesses some special structure, we use both the runs-up and runs-down tests simultaneously. Thus, if the probability of an independent sequence passing the runs-up test is p_1 and passing the runs-down test is p_2, the probability of an independent sequence passing the combined runs test is p = p_1 p_2. We compute the lag l at which the systematic samples appear to be independent, as determined by the runs test. The minimum required sample size is then N = nl, i.e., the minimum required simulation


run length computed from the data, which we call the computation run length. The power of a test is the probability of rejecting the null hypothesis when it is false and is generally greater with a larger n. We set n = 4000, the minimum recommended sample size for the runs tests, in our procedure.

The procedure iterates repeatedly, increasing the lag l until it has obtained n systematic samples that appear to be independent, as determined by the runs test. Ideally, we would like to search for the minimum l such that the systematic samples pass the runs test of independence. However, in order to process the data in one pass, we double the lag l every two iterations. Thus, the lag l is determined dynamically as the simulation proceeds and is likely to be much larger than the minimum lag needed to pass the test of independence. Consequently, the empirical simulation run length will likely be longer than the computation run length. Since the lag l grows rapidly at later iterations, we set L (the limit of l) to 2^10, based on our experience. We allocate a buffer of size t = 3n to store our systematic samples y_i, 1 ≤ i ≤ t. Note that "lag l' of the systematic samples" refers to the systematic samples y_{kl'+1}, for k = 0, 1, 2, ..., n−1. The following table shows the total number of observations at each iteration:

Iteration            0     1A    1B    2A    2B    3A    3B    4A     4B     ...  kA (k > 0)   kB (k > 0)
Total observations   n     2n    3n    4n    6n    8n    12n   16n    24n    ...  2^k n        2^(k-1) 3n
Samples in buffer    n     2n    3n    2n    3n    2n    3n    2n     3n     ...  2n           3n
Lag of samples       2^0   2^0   2^0   2^1   2^1   2^2   2^2   2^3    2^3    ...  2^(k-1)      2^(k-1)

The "Total observations" row shows the total number of observations at a given iteration. The "Samples in buffer" row shows the number of systematic samples stored in the buffer. The "Lag of samples" row shows the lag l used to obtain the systematic samples stored in the buffer. For example, at the end of iteration 1B, the total number of observations is 3n, there are 3n samples in the buffer, and each systematic sample is a lag-1 observation. At the beginning of iteration 2A, we reduce the number of systematic samples in the buffer from 3n to 3n/2 by discarding every other systematic sample; consequently, each systematic sample becomes a lag-2 observation. Thus, the procedure doubles the lag l (and the number of observations) every two iterations. At iteration 2A, we generate another n observations and store n/2 systematic samples that are lag-2 observations in the buffer. Therefore, at the end of iteration 2A, the total number of observations is 4n, there are 2n systematic samples in the buffer, and each systematic sample is a lag-2 observation. Note that all observations are used to construct a histogram of the parameter under study. We discard systematic samples in the buffer so that the number of systematic samples never exceeds t. The systematic samples in the buffer are used by the runs test to determine the lag l at which they appear to be independent. A sketch of this sampling schedule follows.
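The following Python sketch mimics the sampling schedule in the table above; it is an illustration rather than the authors' code. Here gen_obs(count) is an assumed callable that returns count fresh simulation observations, and runs_test(seq) is an assumed callable returning True when seq appears independent (for example, a combined runs-up/runs-down test).

```python
def qi_sample_size(gen_obs, runs_test, n=4000, itmax=10):
    """Increase the lag until n systematic samples pass the runs test.
    Keeps at most t = 3n systematic samples and doubles the lag every
    two sub-iterations (kA, kB).  Returns (total_observations, lag)."""
    buf = list(gen_obs(n))                 # iteration 0: n lag-1 samples
    total, lag = n, 1
    for k in range(0, itmax + 1):
        for sub in ("A", "B"):
            if k == 0 and sub == "B":
                continue                   # iteration 0 has no sub-iterations
            if k == 1:                     # iterations 1A, 1B: same lag, grow buffer
                buf.extend(gen_obs(n))
                total += n
            elif k > 1 and sub == "A":     # kA: double the lag
                buf = buf[::2]             # discard every other sample (3n -> 3n/2)
                lag *= 2
                fresh = list(gen_obs(lag * (n // 2)))
                buf.extend(fresh[lag - 1::lag])   # n/2 new lag-`lag` samples
                total += lag * (n // 2)
            elif k > 1:                    # kB
                fresh = list(gen_obs(lag * n))
                buf.extend(fresh[lag - 1::lag])   # n new lag-`lag` samples
                total += lag * n
            # Test every 2nd (after kA) or 3rd (after kB) buffered sample,
            # so that exactly n samples are tested.
            stride = max(1, len(buf) // n)
            if runs_test(buf[::stride][:n]):
                return total, lag * stride  # computation run length = n * lag * stride
    return total, lag                       # gave up after itmax iterations
```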


3.2. Histogram approximation

Chen and Kelton [2] propose using the QI algorithm to determine the simulation run length and using grid points to construct a histogram (multiple quantiles). We modified the procedure to estimate the variance of a mean estimator. The total number of grid points is fixed at G = 203. The values of the grid points g_0, g_1, ..., g_202 are constructed as follows. g_0 and g_202 are set to −∞ and ∞ (in practice, the minimum and maximum values representable on the underlying computer), respectively. If the analyst knows the minimum or maximum values of the distribution, those values should be used instead; for example, the waiting time of any queueing system cannot be negative, so the analyst should enter 0 as the minimum. Grid point g_51 is set to the minimum of the initial n, 2n, or 3n observations, depending on the degree of correlation of the sequence. The number of main grids, G_m, is set to 100. Grid points g_{i+51}, i = 1, 2, ..., G_m, are set to the i/G_m quantile of the initial n, 2n, or 3n observations. We set grid points g_1 through g_50 and g_152 through g_201 to appropriate values so that g_1 through g_52 have the same segment length and g_150 through g_201 have the same segment length. Therefore, the grids are dense where the estimated probability density function is high and sparse where it is low. Furthermore, each grid cell generally contains no more than a 1/G_m proportion of the distribution.

A corresponding array n_1, n_2, ..., n_202 is used to store the number of observations between two consecutive grid points; for example, the number of observations between g_{i−1} and g_i is stored in n_i. The information in the grid is updated each time a new observation x_i is generated: the count n_i is increased by one if g_{i−1} < x_i ≤ g_i. Once the QI algorithm has determined that the sample size is large enough for the asymptotic approximation to become valid, we compute the mean and variance based on the estimated histogram. The mean is estimated by X̄(N). Theoretically, the sample variance can be estimated by

    S^2 = \frac{1}{N-1} \sum_{j=1}^{G-1} \left( \frac{g_{j-1} + g_j}{2} - \bar{X}(N) \right)^2 n_j \approx \sum_{j=1}^{G-1} \left( \frac{g_{j-1} + g_j}{2} - \bar{X}(N) \right)^2 P_j.

Note that N = nl = \sum_{j=1}^{G-1} n_j and P_j = n_j / N. However, since the output sequences may be correlated and it is likely that we have not captured the true minimum and maximum of the underlying distribution, we conservatively estimate the sample variance by

    S^2 = \sum_{j=1}^{G-1} \max\left( (g_{j-1} - \bar{X}(N))^2, (g_j - \bar{X}(N))^2 \right) P_j.    (1)
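A Python sketch of the histogram bookkeeping and the conservative variance estimate of Eq. (1) follows; it is illustrative, not the authors' code. For simplicity the grid here consists only of the 100 main cells plus two outer bounds (rather than the full 203-point layout), and the outer bounds are assumed to be finite, analyst-supplied limits so that the conservative estimate stays finite.

```python
import bisect

def build_grid(initial_obs, g_min, g_max, gm=100):
    """Main grid points are set to the i/Gm quantiles of the initial
    observations, bracketed by analyst-supplied lower and upper bounds."""
    xs = sorted(initial_obs)
    q = [xs[min(len(xs) - 1, (i * len(xs)) // gm)] for i in range(gm + 1)]
    return [g_min] + q + [g_max]

def update_histogram(grid, counts, x):
    """Increment the count of the cell (grid[i-1], grid[i]] that contains x."""
    i = bisect.bisect_left(grid, x)              # smallest i with grid[i] >= x
    counts[min(i, len(grid) - 1)] += 1           # clamp anything above the top bound

def conservative_variance(grid, counts, xbar):
    """Eq. (1): for every cell, weight the squared distance from the sample
    mean xbar to the farther of the two cell endpoints by P_j = n_j / N."""
    n_total = sum(counts)
    s2 = 0.0
    for j in range(1, len(grid)):
        if counts[j] == 0:
            continue
        pj = counts[j] / n_total
        s2 += max((grid[j - 1] - xbar) ** 2, (grid[j] - xbar) ** 2) * pj
    return s2
```

Here counts is assumed to be a list of zeros with the same length as grid; the CI of Eq. (2) then combines the returned S² with the runs-test sample size n.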

Eq. (1) then leads to the 100(1−α)% CI for μ,

    \bar{X}(N) \pm z_{1-\alpha/2} \frac{S}{\sqrt{n}},    (2)

where z_{1−α/2} is the 1 − α/2 quantile of the N(0,1) distribution, and N(μ, σ²) denotes the normal distribution with mean μ and variance σ². Here n is the sample size used for


the runs test. Even though X̄(N) is the sample mean of N samples, n is used to compute the variance of the sample mean in order to adjust for the autocorrelation of the data. Let the half-width be

    H = z_{1-\alpha/2} \frac{S}{\sqrt{n}}.

The final step in the procedure is to determine whether the CI meets the user's half-width requirement, either a maximum absolute half-width ε or a maximum relative fraction γ of the magnitude of the final point estimator X̄(N). If the relevant precision requirement, H ≤ ε or H ≤ γ|X̄(N)|, is satisfied, then the procedure terminates and we return the point estimator X̄(N) and the CI with half-width H. If the precision requirement is not satisfied, then the procedure increases the sample size to n'l, where n' = (H/ε)² n or n' = (H/(γX̄(N)))² n. Furthermore, the half-width is then computed by

    H = z_{1-\alpha/2} \frac{S}{\sqrt{n'}}.

The histogram-approximation algorithm:

(1) Remark: t is the size of the buffer, which is used to store the systematic samples, and k is the index of iterations. Each iteration k contains two sub-iterations kA and kB. Set the maximum number of iterations ITMAX to 10.
(2) Set n = 4000, t = 3n, and k = 0. Generate n observations as our initial samples.
(3) Carry out a runs test to determine whether the sequence appears to be independent. If this is the initial iteration, use lag 1 of the systematic samples in the buffer. If this is a kA iteration, use lag 2 of the systematic samples in the buffer. If this is a kB iteration, use lag 3 of the systematic samples in the buffer.
(4) If the tested systematic samples appear to be independent or the sample size has reached t, compute the values of the grid points as discussed in Section 3.2 to ensure that the segments between two consecutive main grid points contain about the same proportion of the distribution.
(5) If the tested systematic samples appear to be independent, go to step (11).
(6) If the current iteration is a kA iteration, start a kB iteration. If the current iteration is the initial or a kB iteration, set k = k + 1 and start a kA iteration.
(7) If this is iteration 1A or 1B, generate n observations, store those values in the buffer after the ones already there, and go to step (3).
(8) If this is a kA iteration (k > 1), discard every other systematic sample in the buffer and reindex the remaining 3n/2 samples in the first half of the buffer. Generate 2^{k−1} n/2 observations and store the resulting n/2 lag-2^{k−1} systematic samples in the later portion of the buffer.
(9) If this is a kB iteration (k > 1), generate 2^{k−1} n observations and store the resulting n lag-2^{k−1} systematic samples in the later portion of the buffer.
(10) If k < ITMAX, go to step (3).
(11) Compute the variance estimator according to (1) and the confidence interval according to (2).


(12) Let ε be the desired absolute half-width and γ|X̄(N)| the desired relative half-width. If the half-width of the CI is less than ε or γ|X̄(N)|, terminate the algorithm.
(13) Increase the sample size as discussed above. Go to step (11).

The proposed procedure uses the runs test to determine the simulation run length; it has a strong theoretical basis and is relatively easy to implement. However, this method also suffers from one of the problems that afflicts any sequential stopping rule: the run length of the simulation may be inappropriately short or long. By specifying an appropriate stopping rule, we can reduce the chance that the simulation terminates early. To avoid the simulation's running longer than necessary, though, is more difficult. For example, we can increase the α₁ value of the runs test to increase the probability that correlated sequences fail the runs test and thus increase the simulation run length; however, a larger α₁ also increases the probability that independent sequences fail the runs test and causes the simulation to run longer than necessary.
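As an illustration of steps (11)-(13), here is a small Python sketch of the precision check and the sample-size increase; it is not the paper's code. Passing neither eps nor gamma reproduces the "default precision" behaviour, in which the first CI is accepted.

```python
import math

def precision_check(xbar, s2, n, z=1.645, eps=float("inf"), gamma=float("inf")):
    """Return (stop, n_prime, half_width): stop is True when the CI half-width
    meets the absolute (eps) or relative (gamma * |xbar|) requirement; otherwise
    n_prime = (H / target)^2 * n is the enlarged runs-test sample size and the
    half-width is recomputed with it."""
    h = z * math.sqrt(s2 / n)
    target = min(eps, gamma * abs(xbar))      # the binding precision requirement
    if h <= target:
        return True, n, h
    n_prime = math.ceil((h / target) ** 2 * n)
    return False, n_prime, z * math.sqrt(s2 / n_prime)
```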

4. Empirical experiments

In this section, we present some empirical results from simulation experiments using the proposed procedure. We use n = 4000 samples for the runs test and test the procedure with six stochastic processes (sketches of generators for the last two appear after this list):

• Observations are i.i.d. uniform between 0 and 1, denoted U(0,1).
• Observations are i.i.d. normal with mean 0 and variance 1, denoted N(0,1).
• Observations are i.i.d. exponential with mean 1, denoted expon(1).
• The steady state of the first-order moving-average process, generated by the recurrence X_i = μ + ε_i + θ ε_{i−1} for i = 1, 2, ..., where the ε_i are i.i.d. N(0,1) and 0 < θ < 1. μ is set to 0 in our experiments. This process is denoted MA(θ).
• The steady state of the first-order autoregressive process, generated by the recurrence X_i = μ + φ(X_{i−1} − μ) + ε_i for i = 1, 2, ..., where the ε_i are i.i.d. N(0,1) and 0 < φ < 1. μ is set to 0 and X_0 is set to a random variate drawn from the steady-state distribution in our experiments. This process is denoted AR(φ).
• The steady state of the M/M/1 delay-in-queue process with arrival rate λ and service rate ν = 1. This process is denoted MM1(ρ), where ρ = λ/ν is the traffic intensity.
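For concreteness, here are minimal Python generators (not the authors' code) for two of these processes: the AR(φ) process started in steady state, and the M/M/1 delays in queue via Lindley's recursion W_{i+1} = max(0, W_i + S_i − A_{i+1}). Unlike the experiments in Section 4.3, the M/M/1 sketch starts from an empty system (W_0 = 0) rather than from the steady-state delay distribution.

```python
import math
import random

def ar1(phi, n, rng=random):
    """AR(phi): X_i = phi * X_{i-1} + eps_i with mu = 0 and eps_i ~ N(0, 1),
    where X_0 is drawn from the steady-state N(0, 1 / (1 - phi^2)) distribution."""
    x = rng.gauss(0.0, 1.0 / math.sqrt(1.0 - phi * phi))
    out = []
    for _ in range(n):
        x = phi * x + rng.gauss(0.0, 1.0)
        out.append(x)
    return out

def mm1_delays(rho, n, rng=random):
    """MM1(rho): successive delays in queue with service rate 1 and arrival
    rate rho, generated by Lindley's recursion (starting from an empty system)."""
    w, out = 0.0, []
    for _ in range(n):
        out.append(w)
        w = max(0.0, w + rng.expovariate(1.0) - rng.expovariate(rho))
    return out
```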


We carry out experiments to evaluate the performance of the runs test in determining whether a sequence appears to be independent, to check the interdependence between the average lag at which the systematic samples appear to be independent and the strength of the autocorrelation of the output sequence, and to compare the performance of the histogram approximation with the batch-means method of Chen and Kelton [1]. This batch-means procedure uses the runs-up test and an initial sample size of 4000 to determine the required sample size sequentially, and uses the same iteration algorithm described in Section 3.2. The entire output sequence is divided into three or four non-overlapping batches when no absolute or relative precision is specified. Schmeiser [6] discusses batch-size effects in the analysis of simulation output once the sample size is fixed. Since batch-means procedures are often used to obtain i.i.d. normal data for other procedures, such as ranking and selection, we would like to divide the output sequence into as many batches as possible. We therefore divide the entire output sequence into 100 batches, so the batch size is nl/100. Since the number of batches is many times larger than in the original procedure, the variance of the batch-means mean estimator from the modified procedure will be much smaller. We also modify this batch-means procedure to use the stopping rule proposed in this paper, i.e., we use both the runs-up and runs-down tests to determine whether a sequence appears to be independent. Therefore, the only difference between the histogram-approximation and batch-means approaches is how the variance of the sample mean is estimated.

4.1. Experiment 1

In this experiment, we use the runs test (statistic V_1 in Section 2) to determine whether a sequence appears to be independent. Table 1 lists the experimental results; each design point is based on 10,000 independent simulation runs.

Table 1
Runs test with sample size 4000

Distribution    P(81%)     P(90.25%)   P(98.01%)
U(0,1)          81.13%     89.99%      97.63%
N(0,1)          81.17%     89.97%      97.61%
Expon(1)        81.13%     89.99%      97.63%
MA(0.15)        95.96%     91.07%      74.72%
MA(0.25)        100.00%    100.00%     99.97%
MA(0.35)        100.00%    100.00%     100.00%
MA(0.50)        100.00%    100.00%     100.00%
AR(0.15)        89.91%     81.87%      59.79%
AR(0.25)        99.96%     99.83%      98.73%
AR(0.35)        100.00%    100.00%     100.00%
AR(0.50)        100.00%    100.00%     100.00%
MM1(0.15)       38.69%     25.30%      9.07%
MM1(0.25)       74.35%     63.19%      42.89%
MM1(0.35)       98.78%     97.39%      92.96%
MM1(0.50)       100.00%    100.00%     100.00%


The P(100(1−α₁)%) columns list the observed percentage of correct decisions when the nominal probability of the runs test is set to 1 − α₁, i.e., the percentage of these 10,000 runs passing the runs (runs-up and runs-down) test for the U(0,1), N(0,1), and expon(1) design points and failing the runs test for all other design points. We set the α level of the underlying runs-up and runs-down tests to 0.1, 0.05, and 0.01; thus, the corresponding α₁ levels of the combined runs test are 0.19, 0.0975, and 0.0199, respectively.

A Type I error of a hypothesis test is the event that we reject the null hypothesis when it is true, i.e., that independent sequences fail the runs test. The runs test performs very well in terms of Type I error, which is around the specified α₁ level. These results confirm that the runs test is a valid tool for detecting independence. A Type II error is the event that we accept the null hypothesis when it is false, i.e., that correlated sequences pass the runs test. When the sequence is only slightly correlated, the Type II error is high, for example for the MM1(0.15) and MM1(0.25) processes. Note that we can use a larger n to increase the power of the test and reduce the Type II error. If we mistakenly treat slightly correlated sequences as being i.i.d., the performance measurements should still be fairly accurate; thus, the low probability of a correct decision for those slightly correlated sequences should not pose a problem. On the other hand, when the output sequences are mildly to highly correlated, the runs test detects the dependence almost all the time. Moreover, the discontinuity adjustment allows the runs test to handle the discontinuity point of the M/M/1 delay-in-queue output distribution. Consequently, the runs test can be used to determine the simulation run length. Furthermore, among the three different α₁'s we used, the test statistic of the runs test is most consistent between independent and correlated sequences with α₁ = 0.0975. See Appendix B for the experimental results of the runs test using statistic V_2.

4.2. Experiment 2

In this section, we check the interdependence between the average lag at which the systematic samples appear to be independent and the strength of the autocorrelation of the output sequence. We use the runs test (statistic V_1) with n = 4000 samples and set α₁ = 0.0975. Table 2 lists the experimental results.

Table 2
Average lag l at which the output sequences appear to be independent

                MA(θ)               AR(φ)               MM1(ρ)
θ, φ, ρ         l       stdv(l)     l       stdv(l)     l        stdv(l)
0.15            2.18    0.52        2.19    0.65        1.79     1.41
0.25            2.22    0.54        2.57    0.75        3.50     3.62
0.35            2.23    0.53        3.09    0.82        4.90     5.00
0.50            2.23    0.52        4.20    0.91        5.70     1.83
0.75            2.21    0.51        8.78    1.38        24.23    3.70
0.90            2.22    0.51        20.84   2.64        157.56   17.92


Each design point is based on 1000 independent simulation runs. The θ, φ, ρ column lists the coefficient value of the corresponding stochastic process. The MA(θ) columns list the results for the moving-average output sequences, and similarly for AR(φ) and MM1(ρ). The l columns list the average lag at which the systematic samples appear to be independent; hence, the average computation run length for that particular stochastic process is nl. The stdv(l) columns list the standard deviation of the lag l.

In general, the average lag at which the systematic samples appear to be independent increases as the autocorrelation increases. The MA(θ) processes are only slightly correlated, even with θ as large as 0.9; therefore, lag-2 observations of the MA(0.90) output sequences generally appear to be independent. On the other hand, the MM1(0.90) output sequences are highly correlated, and the average lag at which the systematic samples appear to be independent is as large as 157. The results from this experiment indicate a strong correlation between the average lag l at which the systematic samples appear to be independent and the strength of the autocorrelation. We also performed the experiment with different values of n; the average lag at which the systematic samples appear to be independent generally increases with a larger n.

4.3. Experiment 3

In this section, we present some empirical results of using the histogram approximation to estimate the variance of the sample mean. We tested the MM1(ρ) delay-in-queue processes. In these experiments, no relative or absolute precision was specified, so the half-width of the CI is the result of the default precision. We use a sample size of 4000 for the runs test (statistic V_1), and the α₁ value is set to 0.0975. Note that α₁ is the probability that independent sequences fail the runs test, so a higher α₁ will result in a longer simulation run. We tested the model with ρ set to 0.50, 0.75, and 0.90. The confidence level of the CI is set to 90%. In order to eliminate initial bias, the initial value of these stochastic processes is set to a random variate drawn from their steady-state distribution. Table 3 lists the experimental results; each design point is based on 10,000 independent simulation runs.

Table 3
Coverage of 90% confidence intervals for the MM1(ρ) delay-in-queue process

ρ                0.50                     0.75                     0.90
μ                1.00                     3.00                     9.00
X̄(N)             0.998645                 2.997878                 8.990395
avg. rp          0.026065                 0.019128                 0.014904
sdev rp          0.022053                 0.015136                 0.011755
avg. samp        83301                    161309                   1386611
sdev samp        415054                   69116                    570948
Method           Histogram   Batch        Histogram   Batch        Histogram   Batch
avg. hw          0.056226    0.052819     0.114511    0.116756     0.279314    0.275205
sdev hw          0.004691    0.016155     0.006546    0.023592     0.011606    0.054653
Coverage         89.83%      89.27%       88.16%      89.17%       89.85%      89.64%


The ρ row lists the traffic intensity of the underlying MM1(ρ) process. The μ row lists the true mean. The X̄(N) row lists the grand sample mean, i.e., the sample mean over these 10,000 independent runs. The avg. rp row lists the average of the relative precision of the estimators, where the relative precision is defined as rp = |X̄(N) − μ|/X̄(N). The sdev rp row lists the standard deviation of the relative precision of the estimators. The avg. samp row lists the average sample size, and the sdev samp row lists the standard deviation of the sample size across the independent replications. The Method row lists the method used to obtain the estimates; Histogram and Batch denote the histogram-approximation approach and the batch-means approach, respectively. The avg. hw row lists the average CI half-width, the sdev hw row lists the standard deviation of the CI half-width, and the Coverage row lists the percentage of the CIs that cover the true mean.

The computation run lengths for these three MM1(ρ) processes are 22,800, 96,920, and 630,240, respectively, for ρ = 0.5, 0.75, and 0.90. The simulation run length, which is much longer than the computation run length, also increases as the strength of the autocorrelation increases. Clearly, we could improve the efficiency of the procedure by slowing down the rate of growth of l at later iterations. The CI coverages of both approaches are all near the specified 90% confidence level. In all cases, the average CI half-widths are approximately the same for the histogram-approximation and batch-means approaches; however, the standard deviation of the CI half-width from the histogram approximation is much smaller than from batch means. When estimating the M/M/1 delay-in-queue process with ρ = 0.5 and 0.9, there are, respectively, 25 and 2 independent runs that reach the limit l = 2^10. We believe this is caused by "over mixing," i.e., too many zeros, when ρ = 0.5. Consequently, the variance of the simulation run length is very large for the MM1(0.5) process.

5. Conclusions

We have demonstrated that the runs test can be used to determine an appropriate simulation run length when the autocorrelations of the output sequence die off as the lag between observations increases. If the confidence level of the runs test is too large (>0.95), the simulation run length may be shorter than necessary. The experimental results indicate that there is a strong correlation between the average lag l at which the systematic samples appear to be independent and the strength of the autocorrelation of the output sequence.

We have presented an algorithm to estimate the mean and histogram of a stationary process. Some histogram estimates require more observations than others before the asymptotics necessary for valid simulation estimates take effect. The proposed QI algorithm works well in determining the simulation run length required for the asymptotic approximation to become valid. Although it is heuristic, the QI procedure has a strong theoretical basis, and the experimental results show that the procedure produces a valid CI for the mean. Since the procedure stops only when the systematic samples appear to be independent, it obtains high precision and a small CI half-width


with a long simulation run length by default, i.e., without specifying any absolute or relative precision. For example, in our experiments the achieved relative half-width using the histogram-approximation approach is within 5.7% of the sample mean for the M/M/1 delay-in-queue processes by default. We do not think this is a major practical drawback of the QI procedure, because a large relative half-width (larger than 10%) may provide little useful information. Furthermore, if the entire output sequence is stored and can be read in repeatedly, then in subsequent runs the procedure can stop once it has reached the computation run length.

The proposed histogram-approximation algorithm computes quantiles only at grid points and generates an empirical distribution (histogram) of the output sequence, which can provide insight into the underlying stochastic process. The main advantage of this approach is that, by using a straightforward runs test to determine the simulation run length and obtaining quantiles at grid points, we can apply classical statistical techniques directly and do not require more advanced statistical theory, making the method easy to understand. Consequently, the procedure can easily be implemented in any programming language or simulation-modeling software, and so should be highly portable. Because the histogram is constructed as an empirical distribution of the underlying process, it is also possible to estimate other characteristics of the distribution, such as a proportion, quantile, or derivative, within the same framework.

Appendix A. Proof of the probability of run length r

Corollary 1 (Run length r). The probability of a run of length r is r/(r+1)!, under the null hypothesis that the data are i.i.d. random variables.

For the run length to be r, we need exactly r + 1 observations in a subsequence (with probability zero that any two values are equal). Because these are i.i.d. random variables, by the permutation rule there are exactly (r+1)! equally likely orderings of the r + 1 observations. To have run length equal to r, the observations x_1 to x_r must be in ascending order and x_r must be greater than x_{r+1}. There are exactly r such orderings: with the r + 1 observations sorted into ascending order, each run-length-r subsequence is created by removing the ith smallest observation (for i = 1 to r) and concatenating it to the end of the remaining sorted subsequence. Hence the probability of a run of length r is r/(r+1)!.

It can be shown by induction that

    \sum_{r=1}^{n} \frac{r}{(r+1)!} = 1 - \frac{1}{(n+1)!}.

Furthermore,

    \sum_{r=1}^{\infty} \frac{r}{(r+1)!} = \lim_{n \to \infty} \sum_{r=1}^{n} \frac{r}{(r+1)!} = 1.

This completes our proof.
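As an independent check of Corollary 1 (not part of the original paper), the following Python snippet enumerates all (r+1)! orderings of r+1 distinct values and verifies that exactly r of them give a run of length r, i.e., that the probability is r/(r+1)!.

```python
import itertools
import math
from fractions import Fraction

def run_length_probability(r):
    """Exact probability that a runs-up run has length exactly r, obtained by
    enumerating all (r+1)! equally likely orderings of r+1 distinct values."""
    favourable = 0
    for perm in itertools.permutations(range(r + 1)):
        ascending = all(perm[i] < perm[i + 1] for i in range(r - 1))
        if ascending and perm[r - 1] > perm[r]:
            favourable += 1
    return Fraction(favourable, math.factorial(r + 1))

for r in range(1, 6):
    assert run_length_probability(r) == Fraction(r, math.factorial(r + 1))
```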


Appendix B. Experimental results of the second runs test

Table 4 lists the experimental results of using the second runs test (statistic V_2 in Section 2) to determine whether a sequence appears to be independent.

Table 4
Second runs test with sample size 4000

Distribution    P(81%)     P(90.25%)   P(98.01%)
U(0,1)          40.75%     52.82%      72.81%
N(0,1)          40.90%     53.52%      73.23%
Expon(1)        40.75%     52.82%      72.81%
MA(0.15)        99.69%     99.26%      96.78%
MA(0.25)        100.00%    100.00%     100.00%
MA(0.35)        100.00%    100.00%     100.00%
MA(0.50)        100.00%    100.00%     100.00%
AR(0.15)        98.93%     97.73%      92.85%
AR(0.25)        100.00%    100.00%     100.00%
AR(0.35)        100.00%    100.00%     100.00%
AR(0.50)        100.00%    100.00%     100.00%
MM1(0.15)       100.00%    100.00%     100.00%
MM1(0.25)       100.00%    100.00%     100.00%
MM1(0.35)       100.00%    100.00%     100.00%
MM1(0.50)       100.00%    100.00%     100.00%

Even though this runs test has a small Type II error, i.e., it detects the autocorrelation almost all the time, it has a large Type I error. Thus, it is not suitable for our purposes.

References

[1] E.J. Chen, W.D. Kelton, A stopping procedure based on phi-mixing conditions, in: Proceedings of the 2000 Winter Simulation Conference, 2000, pp. 617–626.
[2] E.J. Chen, W.D. Kelton, Quantile and histogram estimation, in: Proceedings of the 2001 Winter Simulation Conference, 2001, pp. 451–459.
[3] D. Goldsman, B.W. Schmeiser, Computational efficiency of batching methods, in: Proceedings of the 1997 Winter Simulation Conference, 1997, pp. 202–207.
[4] D.E. Knuth, The Art of Computer Programming, third ed., vol. 2, Addison-Wesley, Reading, MA, 1998.
[5] A.M. Law, W.D. Kelton, Simulation Modeling and Analysis, third ed., McGraw-Hill, New York, 2000.
[6] B. Schmeiser, Batch-size effects in the analysis of simulation output, Operations Research 30 (1982) 556–568.