206
European Journal of Operational Research 43 (1989) 206-215 North-Holland
Theory and Methodology
On coverage probability, independence and normality in batch means algorithms * Nabil R. A D A M Rutgers, The State University of New Jersey, 92 New Street, Newark, NJ 07102, U.S.A. Steven W. K L E I N Department of Decision Sciences and Computers, Rider College, Lawrenceville, NJ 08648, U.S.A.
Abstract: We consider the batching approach to interval estimation of the mean of a stochastic process
based upon simulation output. In particular, using batch means processes generated from underlying AR(1) processes and M / M / 1 queueing systems, we study the effects of undetected autocorrelation on both the probability of coverage and the performance of the Shapiro-Wilk test of normality. We also conduct power comparisons of two tests used to detect first-order autocorrelation in batch means processes: the von Neumann ratio test and a rank version thereof proposed by Bartels (1982). Our results indicate that a large number of batches (perhaps at least 100) be used when testing for the presence of autocorrelation. Keywords: Simulation, output analysis, batch means
1. Introduction
The batch means approach to forming a confidence interval for the first moment of a stationary stochastic process based upon simulation data has been given considerable attention in the literature (see, for example, Mechanic and McKay (1966), Fishman (1978a), Law and Carson (1979) and Adam (1983)). Particularities notwithstanding, these algorithms all seek to regroup a sample of dependent observations so as to form a set of subsample (' batch') means which are independent (or nearly so in some sense). This permits the use of Student's classical interval estimator of a normal population's mean based upon a simple random sample, a procedure known to be robust against modest distributional departures from normality. However, the interval's potential for exhibiting considerable degradation in coverage probability in the presence of merely moderate dependency makes the power of any test of randomness among batch means and its sensitivity to the number of batches an important issue. The von Neumann ratio test, a popular choice among such procedures, formally requires normality and is typically employed by fixing the significance level a and tolerating associated implicit power. The Shapiro-Wilk test of normality is * This project was supported by a research grant to N.R. Adam from the Research Resources Committee of Rutgers Graduate School of Management. Received July 1988; revised November 1988 0377-2217/89/$3.50 © 1989, Elsevier Science Publishers B.V. (North-Holland)
N.R. Adam, S. W. Klein / On coverage probability in batch means algorithms
207
often preliminarily suggested to justify use of the von Neumann ratio test. A rank version of the von Neumann test is proposed by Bartels (1982) as competing favorably with the ratio test in detecting first-order autocorrelation in a general time series setting involving certain nonnormal autoregressive processes. The purpose of this paper is to examine the effect of undetected first-order autocorrelation in the batch means sequence on the interval estimator's coverage probability and on the power function of the Shapiro-Wilk test, as well as to compare the power of the rank and parametric versions of von Neumann's test in a batch means framework. We also study the impact of the number of batches and the batch size on the aforementioned. For specificity, we consider ARMA(1, 1) batch means processes generated from an underlying AR(1) process with various error distributions, as well as batch means processes generated from an underlying M / M / 1 queueing system with a range of utilization rates. The relevant theoretical background for the batching approach is given in Section 2. Monte Carlo results on coverage probability sensitivity to autocorrelation are presented in Section 3. In Section 4 we present comparisons of the rank and ratio tests for autocorrelation for processes other than those studied by Bartels (1982). The influence of autocorrelation on the Shapiro-Wilk test's power is studied in Section 5 and conclusions are given in Section 6.
2. Theoretical framework
Let { Y,, i = 1, 2,... } be a covariance stationary process and Ya, Y2. . . . . Yn a truncated realization of the process. The mean, variance and lag-/ autocorrelation of the process will be denoted by /z, o 2, and Or, respectively. Let
Y= ~ ~/n, i=1
~1/2
and let G.f denote the 100(1 - a) percentile of the t-distribution with f degrees of freedom. It is well known that E ( Y ) =/x and that Var(Y) = (o2/n) 1 + 2 •
(1 -jn-1)Oj
j=l
]
.
(1)
Moreover, it is easy to show (Anderson, 1971) that 1)
•
(2)
Thus, if the { Y, } are uncorrelated, then sy2 and s2/n are unbiased for 02 and Var(Y), respectively, while Y is in general unbiased for/~. If, moreover, the { Y, } are marginally normal, then Y and s~ are independent and ( Y - / ~ ) / ( s Z y / n ) 1/2 is distributed as a t random variable with n - 1 degrees of freedom. This leads to the classical interval estimator of t~, Y q-t~/2.n_l(Sy/n 2 ) l/z , which has probability 1 - a of covering ~t and has minimum probability of covering false values, among all unbiased 1 - a confidence intervals (Lehmann, 1959). However, when the { Y, } are positively correlated, then (1) and (2) indicate that E(s~/n) < o2/n < Var(Y). Thus, while one might anticipate a wider interval to accommodate a larger variance in the point estimator, one finds oneself using an interval of smaller expected width than in the independent observations case. It is not surprising that the resulting true coverage probability may be less than the nominal 1 - a value.
208
N.R. Adam, S. IV.. Klein / On coverage probability in batch means algorithms
With a fixed sample size, the batching solution to the above dilemma is to form k nonoverlappin__g batches, each of assumedly integer length m = n / k (any noninteger problems are easily handled). Let Yj k -- - Y)2/(k - 1) denote the sample variance denote the mean of batch j ( j = 1, 2 . . . . . k) and s k2 = Ej=1(Yj of the ~'s, where Y = E~=l~./k. Suppose that we have been successful in choosing k so as to make the batch means marginally normal and independent (this may not be strictly possible for a given process). Since E(~.) = # for all j, the classical 1 - a confidence interval for # based upon the batch means is given by Y + ' ~'ct/2,k-lk~'k/ r~2/k'O/2 1 • It is apparent that a key aspect of the batching approach is the rule for determining the number of batches from which to form the interval estimator of/~. It is customary to begin with a small batch size (say m = 1) and successively increase m by some multiple until a test of the hypothesis of zero first-order autocorrelation among batch means does not reject. This is reasonable since one can show that for a broad class of processes the autocorrelation between any two batch means approaches zero as rn gets large (see, for example, Law and Carson (1979), and Kleijnen (1975)). Furthermore, a central limit theorem for dependent observations gives asymptotic normality of ( ~ } as rn increases. However, as pointed out by Schmeiser (1982), for larger m, the associated smaller number of batches k results in a confidence interval of substantially increased expected width. Thus, the batch size m cannot be increased with impunity in order to attain approximate independence and normality in the { ~ }. Moreover, since the expected width diminishes only negligibly once k exceeds moderate levels, Schmeiser recommends using between 10 and 30 batches in forming the confidence interval.
3. Coverage probability considerations In order to study the effect on coverage probability of erroneously stopping with a batch size too small to eliminate most autocorrelation among batch means, we consider two often occurring underlying processes for which it is possible to determine relevant parameters of the resulting batch means process.
3.1. The AR(1) underlying process Several studies have modeled simulation output as an autoregressive process (for the purpose of tractability in some cases), including those of Fishman (1972), Andrews and Schriber (1978), and Snell and Schruben (1985). We here assume that the original observations constitute a stationary AR(1) process as follows: Y~+ ~Y~_~ = c~ for i>~ 1,
(3)
where 0 ~< [~,[ < 1 and {c i, i >/1} are i.i.d, with mean 0 and variance o~. Then Pl = - ~ and Var(Yi) = R0 = o , 2 / ( 1 - ~,2). Kang and Schmeiser (1986) show that the batch means process for batch size m is ARMA(1, 1), i.e., Yj + q~Yj_, = ~j + 0~j_,
for j>~ 1,
(4)
with ~ = (--1)m+'@ m
(5)
~= ((2(~ 1 3t" ~2..~ 1)--3I- [(1- (~2)(1- (2Pl -]- (~)2)] 1/2'~/[2(plr . ..~ (~)] ,
(6)
~1 = pl( 1 +~)2/[mc(1 +~)2]
(7)
and
where
N.R. Adam, S. IV,, Klein / On coverage probability in batch means algorithms
209
and
<:1+
+
++)-1]/[m(1
+ +)'1 ).
(8)
Furthermore, ~j has mean 0 and variance given by o? = Ro(1 - ~ 2 ) / ( 0 2 - 2 ~ 0 + 1),
(9)
where
k o = cRo/m.
(10)
Now, if o2 is chosen according to o? = (1 - 8 2 ) / ( 0 2 -
2 8 0 + 1),
(11)
then { ~ } as defined in (4) will clearly be a process with mean 0 and unit variance. For comparative purposes, all ARMA(1, 1) batch means processes considered in this paper have the same mean and variance. In order to simulate batch means processes for various batch sizes and with different levels of first-order autocorrelation, we choose q~= 0.1, 0.3, 0.5, and 0.7 with m = 2, 4, and 8. For given q~ and m, the batch means process' autoregressive parameter ~ is obtained from (5); the moving average parameter is obtained by solving for c in (8), for 01 in (7) and, finally, for 0 in (6). Precisely one of the two roots in (6) is in ( - 1, 1) (see Kang and Schmeiser, 1985) and we choose this root for invertibility purposes, i.e., the current batch mean can then be expressed as a linear combination of all prior batch means with weights whose sum converges. The ( ~ } are then generated via (4), with ij taken to have mean 0 and variance given in (11) for j >/1. Initially, we choose the error term to be normally distributed with Y0 = ~0 taken to be normal with mean 0 and variance (1 - 0 2 ) / ( 0 - ~ ) 2 Thus, ~ has mean 0 and variance 1 for j >/1. The simulation is performed for 10, 30, 50, 70, and 100 batches. We note that computer time is saved by simulating the ARMA(1, 1) batch means process directly, rather than batching after generating the underlying AR(1) process. The 95% confidence interval for/~ is then computed. For each case, 5000 replications are performed and the proportion of intervals covering the mean of 0 is calculated as an estimate of the coverage probability P(O E Y + -- t a / 2 , k - l \ ~sk2/t'~/z~ ~ ~') J" The estimator has a standard error of at most [(0.5 × 0.5)/5000] ~/2, or 0.007. Values of coverage probability are given in Table 1. We note the significant drop in coverage probability from the nominal 0.95 value for b~ as small as 0.3 or 0.4. For example, for m = 4 and 0a = 0.414, the coverage probability is about 0.83 (dependency of estimated coverage upon k within the given range of k is negligible). Degradation in coverage probability is also observed for other processes having mean zero and unit variance, but with considerably different error distributions. The two processes chosen involve distributions which are skewed in the one case (the exponential with mean 1) and heavy-tailed in the other (the t with three degrees of freedom) and are generated as follows: (i) Let { ~ , j>~0} be i.i.d, exponential ( m e a n = 1), # as in (11), , j = o,(Uj-1) for j>~ 1, and Y0 = ~0 = [(1 z a 2 ) ~ / 2 / l0 - ~ I](U0 - 1). Then, ~ as defined in (4) has mean 0 and variance 1, for j >/1. (ii) Let { Wj, j >/0} be i.i.d, t random variables with three degrees of freedom (this is the smallest number of freedom for which the t distribution has finite variance). Define o~ as in (11), ij = (oJ31/2)Wj, for j>~ 1 and Y0 = i0 = [(½(1- o 2 ) ) ~ / 2 / 1 0 - q , IlW0. Then, ~ as in (4) has mean 0 and variance 1, for
j~l. 3.2. The M / M / 1 queueing system We now consider the underlying process { Y, i = 1, 2 . . . . } to be an M / M / 1 represents the waiting time of the ith customer (excluding service time).
queueing system, where Y~
N.R. Adam, S. IV..Klein / On coverageprobability in batch means algorithms
210
Table 1 Estimated coverage probability as a function of autocorrelation--AR(1) process (0.95 nominal confidence coefficient) Error distribution
k
Batch size = 2 ff~:
Batch size = 4 if1:
Batch size = 8 Pl :
0.055
0.195
0.375
0.595
0.027
0.097
0.213
0.414
0.013
0.045
0.099
0.225
Normal
10 30 50 70 100
0.9380 0.9426 0.9334 0.9328 0.9328
0.9032 0.9058 0.9008 0.8976 0.8966
0.8394 0.8436 0.8364 0.8358 0.8346
0.7160 0.7202 0.7236 0.7236 0.7216
0.9450 0.9492 0.9402 0.9388 0.9404
0.9290 0.9318 0.9248 0.9224 0.9230
0.9006 0.9006 0.8982 0.8956 0.8940
0.8274 0.8330 0.8270 0.8262 0.8256
0.9478 0.9516 0.9424 0.9412 0.9428
0.9398 0.9454 0.9370 0.9350 0.9356
0.9278 0.9314 0.9244 0.9220 0.9228
0.8978 0.8978 0.8944 0.8916 0.8928
Exponential
10 30 50 70 100
0.8814 0.9074 0.9216 0.9244 0.9256
0.8480 0.8742 0.8866 0.8886 0.8904
0.7926 0.8104 0.8248 0.8248 0.8288
0.6780 0.6944 0.7122 0.7000 0.7098
0.8880 0.9128 0.9276 0.9306 0.9332
0.8724 0.9000 0.9112 0.9174 0.9174
0.8458 0.8714 0.8848 0.8836 0.8870
0.7836 0.8000 0.8138 0.8150 0.8202
0.8906 0.9166 0.9300 0.9332 0.9366
0.8844 0.9092 0.9240 0.9270 0.9294
0.8724 0.8990 0.9110 0.9162 0.9172
0.8420 0.8694 0.9922 0.8802 0.8842
t
10 30 50 70 100
0.9468 0.9442 0.9380 0.9368 0.9392
0.9142 0.9092 0,9038 0.8984 0.9012
0.8516 0.8428 0.8346 0.8292 0.8294
0.7158 0.7140 0.7146 0.7054 0,7168
0.9532 0.9510 0.9438 0.9448 0.9468
0.9382 0.9340 0.9280 0.9280 0.9304
0.9110 0.9072 0.9012 0.8950 0.8970
0.8396 0.8300 0.8246 0.8184 0.8190
0.9558 0.9528 0.9470 0.9484 0.9500
0.9492 0.9476 0.9406 0.9396 0.9418
0.9378 0.9338 0.9274 0.9274 0.9304
0.9092 0.9052 0.8986 0.8932 0.8924
Kleijnen (1975, p. 458) shows that the first-order autocorrelation among the batch means arising from a stationary underlying process is given by
if,=
(l/m)p,+ 1=1
~ (l/rn)p2m_ , 2
(m-l)/rnp,+
(12)
l .
1~1
We determine the lag-/autocorrelation among the Y/'s needed in (12) from Blomqvist (1967), where it is shown that the lag-/autocovariance (for l = 0, 1, 2 . . . . ) for an M / M / 1 process is given by
C(')=[(1-d2)(yd)
-2]
~.. i=l+3
[d/(1
+d)2]i[(2i-3)'//i'(i-2)'](i-I
-
1)(i-1-2),
(13)
where V is the average service rate = 1 / m e a n service time, ~ the average arrival rate = 1 / m e a n interarrival time, and d the average utilization = a/V. Thus, the lag-/ autocorrelation among the individual Yi's is Pt = C ( I ) / C ( O ) . We suitably approximate the infinite series in (13) for arbitrary d and l. In order to simulate batch means processes with different levels of first-order autocorrelation, we choose m = 16, 32, and 64 and d = 0.6, 0.7, and 0.8, resulting in the following values of Pl, as computed from (12) and (13): 0.38, 0.57, 0.77 for m = 16 and d = 0.6, 0.7, 0.8 respectively, 0.23, 0.44, 0.70 for m = 32 and d = 0.6, 0.7, 0.8 respectively, 0.11, 0.23, 0.57 for m = 64 and d = 0.6, 0.7, 0.8 respectively. We note that to obtain the small to moderate values of Pl (among those above) at realistic utilization levels requires larger batch sizes than those used in Section 3.1. Samples of size k - - 10, 30, 50, 70, and 100 are generated by simulating an M / M / 1 queueing system with mean service rate taken to be 1 and ~, therefore, equal to d. For each case, we collect observations on the resulting batch means sequence for the particular choice of m. In order to overcome transient conditions, the first 50000 observations are discarded. The 95% confidence interval for the process mean /x is then calculated. For each case, 1000 replications are performed and the proportion of intervals covering the mean/~ is computed as an estimate
N.R. Adam, S. IlK Klein / On coverage probability in batch means algorithms
211
Table 2 Coverage probability as a function of a u t o c o r r e l a t i o n - - M / M / 1 queue (0.95 nominal confidence coefficient) Batch size = 16
10 30 50 70 100
Batch size = 32
Batch size = 64
0.38
0.57
0.77
0.23
0.44
0.70
0.11
0.23
0.57
0.776 0.785 0.787 0.804 0.825
0.730 0.724 0.730 0.748 0.735
0.592 0.592 0.582 0.587 0.584
0.839 0.869 0.877 0.873 0.882
0.785 0.810 0.828 0.834 0.843
0.666 0.670 0.671 0.695 0.712
0.876 0.896 0.909 0.913 0.909
0.815 0.863 0.882 0.877 0.876
0.776 0.800 0.815 0.803 0.808
of the coverage probability P ( t t ~ Y + t ~ / 2 , k l ( s 2 / k ) l / 2 } . S i n c e 7 = 1, we have I~ = d / ( 1 - d) = 1.5, 2.33, and 4.0 for d = 0.6, 0.7, and 0.8, respectively. Results are summarized in Table 2. We observe that, for moderate levels of autocorrelation, coverage is significantly reduced from the nominal level of 0.95. The conclusion is that for both the M / M / 1 queueing system and the underlying AR(1) processes, it is important from a coverage point of view to detect even moderate levels of first-order autocorrelation in the batch means sequence.
4. The test for first-order autocorrelation among batch means
In most batching algorithms, the test of 'independence' among batch means variously employs the sample first-order (or higher) serial correlation coefficient in the { ~ }, ~31, given by k-1
k
f3, = Y" ( ~ -
Y)(~.+,-
Y ) / Y" ( ~ -
j~l
~)2.
(14)
j=l
Fishman (1978a) suggests a test of zero first-order autocorrelation due to von Neumann (1941) which essentially considers the ratio of the sum of squared successive differences in a sampled time series to the sample variance. The test statistic, C~, is defined by Ck=l--
Y" ( g - - g + l )
/ 2
(~_~)2
,
(15)
j=l
where, for suitably large k, C k is approximately normal with mean zero and variance (k - 2 ) / ( k z _ 1) when Pl = 0 (Fishman, 1978a). Thus, an approximately level a test rejects H 0 : Pl = 0 against H a : Pl > 0 if C k > Z ~ ( ( k - 2)(k z - 1)) 1/2, where Z~ is the 100(1 - a) percentile of a standard normal random variable. The statistic used in (15) is easily seen to equal t3a except for two end-effect terms whose influence shrinks as k increases (Fishman, 1978b, p. 239). If the data (i.e., batch means in our setting) are marginally normal, the von Neumann ratio (VNR) test exhibits larger power in detecting first-order autocorrelation than various competitors (Bartels, 1982). The magnitude of the power, however, may not be great for moderate values of autocorrelation. Moreover, in the presence of nonnormality of { ~ ) , the V N R test may not be superior at all. As Bartels (1977) points out, t31 and the V N R statistic have their distributions somewhat distorted by departures from normality, this distortion being more pronounced with heavy-tailedness of the parent family and small sample size (the number of batches k in our case). As a result, the power under moderate autocorrelation of the VNR test may be of a low order of magnitude when the usual significance levels are chosen and k is not very large. This implies a substantial probability of falsely concluding independence of ( ~ ) (i.e., stopping with too small a batch size), with resulting degradation in coverage probability as discussed in the previous
212
N.R, Adam, S.W. Klein / On coverage probability in batch means algorithms
section. T h i s s c e n a r i o m a y h e l p e x p l a i n s o m e o f t h e r e l a t i v e l y p o o r c o v e r a g e r e s u l t s r e p o r t e d b y c e r t a i n i n v e s t i g a t o r s (see, e.g., F i s h m a n (1978a), a n d A d a m (1983)). I n a t t e m p t i n g to i m p r o v e u p o n the p o w e r o f t h e v o n N e u m a n n r a t i o test, o n e c a n s i m p l y i n c r e a s e the l e v e l o f s i g n i f i c a n c e o f t h e test. A s e c o n d s o l u t i o n is to r e q u i r e a ' l a r g e ' n u m b e r o f b a t c h e s , s u f f i c i e n t to y i e l d p o w e r v a l u e s o f d e s i r a b l e m a g n i t u d e . W e d i s c u s s this a p p r o a c h s u b s e q u e n t l y . A t h i r d a l t e r n a t i v e is to e m p l o y a d i s t r i b u t i o n - f r e e test o f z e r o f i r s t - o r d e r a u t o c o r r e l a t i o n . U s e o f the test c o u l d b e u n e q u i v o c a l o r c o n t i n g e n t u p o n r e j e c t i o n o f n o r m a l i t y b y t h e S h a p i r o - W i l k (or o t h e r ) test o f n o r m a l i t y . A r a n k v e r s i o n o f t h e v o n N e u m a n n r a t i o test is p r o p o s e d b y B a r t e l s (1982) in a g e n e r a l t i m e series s e t t i n g n o t d i r e c t e d at s i m u l a t i o n o u t p u t analysis. T h e r a n k test s t a t i s t i c s u g g e s t e d is s i m p l y a l i n e a r t r a n s f o r m a t i o n o f t h e r a n k serial c o r r e l a t i o n c o e f f i c i e n t , w h i c h h a s h i g h a s y m p t o t i c e f f i c i e n c y , r e l a t i v e to t h e serial c o r r e l a t i o n c o e f f i c i e n t , a g a i n s t f i r s t - o r d e r a u t o c o r r e l a t i o n . I n p a r t i c u l a r , l e t t i n g R j b e t h e r a n k o f in { ~ , j = 1, 2 . . . . . k }, the r a n k test s t a t i s t i c is g i v e n b y k-I E (Rjj=l
k Rj+ 1)2/E (Rjj=l
~)2,
(16)
w h e r e R is t h e m e a n r a n k a n d is g i v e n b y l ( k + 1). C r i t i c a l v a l u e s o f t h e test s t a t i s t i c in (16) are b a s e d
Table 3 Estimated power of the rank and VNR tests of randomness--AR(1) process (significance level of test is 0.05) Error distribution
k
Test
Batch size = 2 01:
Normal
10 RVN VNR 30 RVN VNR 50 RVN VNR 70 RVN VNR 100 RVN VNR
Exponential
10 RVN 0.0658 VNR 0.0654 30 RVN 0.0932 VNR 0.0818 50 RVN 0.1260 VNR 0.0978 70 RVN 0.1484 VNR 0.1066 100 RVN 0.1858 VNR 0.1300
t
10 RVN VNR 30 RVN VNR 50 RVN VNR 70 RVN VNR 100 RVN VNR
0.055
0.195
0.0612 0.1118 0.0638 0.1138 0.0800 0.2386 0.0850 0.2548 0.0978 0.3540 0.1018 0.3752 0.1156 0.4440 0.1140 0.4740 0.1350 0.5868 0.1346 0.6184 0.1244 0.1128 0.3264 0.2330 0.4954 0.3392 0.6274 0.4580 0.7782 0.5920
0.0640 0.1208 0.0596 0.1090 0.0914 0.2798 0.0786 0.2360 0.1124 0.4134 0.0960 0.3660 0.1296 0.5422 0.1096 0.4810 0.1564 0.6840 0.1280 0.6234
Batch size = 4 Pl:
Batch size = 8 Pl:
0,375
0,595
0.027
0.097
0.213
0.414
0.013
0.045
0.2130 0.2358 0.5828 0.6230 0.7976 0.8266 0.9014 0.9274 0.9730 0.9784
0,4094 0.4720 0.9124 0.9402 0.9892 0.9944 0.9986 0.9994 1.0000 1.0000
0.0552 0.0556 0.0652 0.0652 0.0708 0.0714 0.0780 0.0782 0.0852 0.0834
0.0748 0.0762 0.1104 0.1192 0.1460 0.1642 0.1866 0.1924 0.2342 0.2456
0.1184 0.1230 0.2660 0.2814 0.4000 0.4274 0.5068 0.5430 0.6524 0.6890
0.2422 0.2782 0.6630 0.7120 0.8642 0.8988 0.9498 0.9638 0.9908 0.9940
0.0534 0.0536 0.0574 0.0580 0.0620 0.0594 0.0658 0.0628 0.0686 0.0676
0.0592 0.0752 0.0596 0.0770 0.0736 0.1122 0.0778 0.1220 0.0880 0.1496 0.0980 0.1666 0,0982 0.1914 0,0966 0.1978 0,1152 0.2410 0.1178 0.2502
0.1224 0.1318 0.2862 0.3050 0.4284 0.4612 0.5496 0.5852 0.6938 0.7308
0.2520 0.2258 0.6916 0.6222 0.8990 0.8470 0.9696 0.9442 0.9964 0.9896
0.4264 0.4692 0.9506 0.9582 0.9988 0.9984 1,0000 1.0000 1.0000 1.0000
0.0560 0.0594 0.0636 0.0640 0,0808 0.0732 0.0862 0.0752 0.1080 0.0810
0.0834 0.0786 0.1460 0.1126 0.2120 0.1450 0.2724 0.1728 0.3506 0.2254
0.1342 0.1200 0.3602 0.2660 0.5454 0.3958 0.6882 0.5192 0.8308 0.6696
0.2776 0.2684 0.7608 0.7168 0.9418 0.9190 0.9843 0.9774 0.9986 0.9976
0.0520 0.0554 0.0544 0.0572 0.0628 0.0646 0.0654 0.0638 0.0774 0.0680
0.0620 0.0630 0.0830 0.0742 0.1086 0.0906 0.1228 0.0930 0.1576 0.1106
0.1418 0.1268 0.3810 0.2850 0.5794 0.4336 0.7162 0.5636 0.8626 0.7204
0.0540 0.0514 0.0692 0.0616 0.0768 0.0724 0.0822 0.0768 0.0906 0,0794
0.0784 0.0736 0.1308 0.1110 0.1808 0,1496 0,2200 0.1820 0,2868 0.2304
0.1294 0.1198 0.3120 0.2636 0.4622 0.4188 0.6036 0.5520 0.7476 0.7020
0.2658 0.2672 0.7158 0.7194 0.9128 0.9114 0.9738 0.9698 0.9962 0.9956
0.0496 0.0482 0.0590 0.0546 0.0646 0.0624 0.0680 0.0626 0.0714 0.0618
0.0606 0.0790 0.0568 0.0740 0.0820 0.1326 0.0724 0.1128 0.1004 0.1858 0.0862 0.1534 0.1100 0.2256 0.0950 0.1858 0.1312 0.2940 0.1082 0.2354
0.2336 0.4172 0.2324 0.4530 0.6426 0.9370 0.6308 0.9456 0.8530 0.9952 0.8442 0.9946 0.9434 0.9994 0.9354 0.9996 0.9898 1.0000 0.9846 1.0000
0.099
0.0838 0.0790 0.1494 0.1140 0.2178 0.1492 0.2806 0.1788 0.3604 0,2312
0,225
0.1354 0.1260 0.3338 0.2874 0.5018 0.4574 0.6416 0.5984 0.7866 0.7406
N.R. Adam, S.W. Klein / On coverageprobability in batch means algorithms
213
Table 4 Estimated power of the rank and VNR tests of randonmess--M/M/1 queue (significance level of test is 0.05)
k
Test
Batch size = 16
Batch size = 32
Batch size = 64
0.38
0.57
0.77
0.23
0.44
0.70
0.11
0.23
0.57
10
RVN VNR
0.117 0.142
0.194 0.232
0.347 0.426
0.085 0.091
0.138 0,145
0.232 0.266
0.058 0.073
0.080 0.090
0.146 0.170
30
RVN VNR
0,234 0,335
0.474 0,613
0.779 0.871
0.140 0.183
0,277 0,380
0.557 0.664
0.088 0.114
0.141 0.190
0.336 0.460
50
RVN VNR
0.370 0.522
0.678 0.807
0.928 0.971
0.210 0.277
0.416 0.556
0.736 0.848
0,115 0,135
0.201 0.283
0.489 0.642
70
RVN VNR
0.491 0.655
0.801 0.924
0.978 0.995
0.256 0.358
0,851 0.678
0.140 0.944
0.250 0.172
0.601 0.336
0.768
RVN VNR
0,614 0.799
0.901 0,975
1.000 1.000
0.312 0.449
0.627 0.784
0.941 0.983
0.166 0.210
0.338 0.456
0.739 0.885
100
u p o n approximating its distribution under H 0 : Pl = 0 by Beta distributions with support on (0, 4) (see Bartels (1982) for details). We are interested in c o m p a r i n g the V N R test and its rank version ( R V N ) in testing H 0 : 01 = 0 versus H1 : Pl > 0 for the processes described in Section 3. Bartels (1982) c o m p a r e s the tests for an AR(1) process in a n o n b a t c h i n g setting and his results favor the rank test for a variety of n o n n o r m a l error processes therein considered. Tables 3 and 4 present the results of M o n t e Carlo p o w e r studies (based u p o n 5000 replications for the A R M A ( 1 , 1) batch means process and 1000 replications for the underlying M / M / 1 process), with the significance level of all tests taken to be 0.05. W e note that, in studying an A R M A ( 1 , 1) batch means process, we are c o m p a r i n g the R V N and V N R for a different class of processes than does Bartels (1982). We observe that, for the A R M A ( 1 , 1) batch means processes with n o n n o r m a l error terms, the power comparisons again favor the rank test. As expected in the n o r m a l case, the rank test is o u t p e r f o r m e d b y the ratio test, although the margin of superiority is small. However, the situation is reversed for the M / M / 1 underlying process, i.e., comparisons favor the V N R test. The conclusion is that while the rank test does relatively well (vis-~t-vis the ratio test) in detecting autocorrelation in the batch means sequence for certain processes, it does relatively p o o r l y for others. I n addition, the ability of either test to detect modest levels of autocorrelation (for which the coverage probability is significantly reduced from nominal levels) is not large unless the n u m b e r of batches is large.
5. Testing for normality As mentioned earlier, some b a t c h i n g algorithms suggest incorporating a test of normality of { Yj } in conjunction with use of the V N R test. F i s h m a n (1978b), and Bratley et al. (1987) prefer the S h a p i r o - W i l k (SW) test (Shapiro and Wilk, 1965), which is considered a powerful procedure for detecting n o n n o r m a l i t y (Royston, 1982). The test statistic is given b y
]2/k
[ Ik/2l
Wk =
E j=l
(Y(k-j+~)-- Y(j))aj.k
E (Y(j)- ~)2, j=l
where ~ j ) is the j t h order statistic f r o m Yx, Y2,..., Yk, for j -- 1, 2 _. . . . . k, the a j, k s are suitable constants and [-] denotes the greatest integer function. T h e hypothesis that { Yj } are n o r m a l is rejected at the a level if W a < Wk,* where tables of w k* and the abk ' S are provided in, for example, F i s h m a n (1978b) and Bratley
N.R. Adam, S. W. Klein / On coverage probabifity in batch means algorithms
214
Table 5 Rejection probabilities for the Shapiro-Wilk t e s t - - A R ( 1 ) process (significance level of test is 0.05) Distribution
k
Batch size = 2
Batch size = 4
Batch size = 8
~: 0.055
0.195
0.375
0.595
0.027
0.097
0.213
0.414
0.013
0.045
0.099
0.225
Exponential
10 30 50
0.4218 0.9632 0.9994
0.3470 0.9272 0.9980
0,3226 0.8146 0.9658
0.1144 0.5552 0.8158
0.4268 0.9646 0.9996
0.4066 0.9566 0.9992
0.3324 0.9184 0.9972
0.2072 0.7732 0.9562
0.4308 0.9654 0.9996
0.4246 0.9642 0.9994
0.4054 0.9560 0.9992
0.3248 0.9134 0.9964
t
10 30 50
0.1870 0.4306 0.5488
0.1748 0.4058 0.5190
0.1440 0.3486 0.4466
0.1132 0.2908 0.3670
0,1886 0,4324 0.5500
0.1878 0.4280 0.5414
0.1730 0.3996 0.5126
0.1378 0.3372 0.4296
0.1886 0.4326 0.5490
0.1872 0.4318 0.5478
0.1878 0.4276 0.5410
0.1710 0.3970 0.5060
Normal
10 30 50
0.0436 0.0458 0.0392
0.0448 0.0452 0.0396
0.0444 0.0480 0.0458
0.0428 0.0656 0.0766
0.0428 0.0438 0.0386
0.0476 0.0476 0.0380
0.0442 0.0440 0.0384
0.0440 0.0494 0.0488
0.0428 0.0444 0.0390
0.0436 0.0448 0.0384
0.0474 0.0476 0.0376
0.0458 0.0446 0.0390
Table 6 Rejection probability for the Shapiro-Wilk t e s t - - M / M / 1 Batch size = 16
10 30 50
queue (significance level of test is 0.05) Batch size = 32
Batch size = 64
0,38
0.57
0.77
0.23
0.44
0.70
0.11
0.23
0.57
0.487 0.984 1.000
0.482 0.986 1.000
0.377 0.977 1.000
0,445 0.946 0.999
0.471 0.970 0.999
0.428 0.978 1.000
0.342 0.894 1.000
0.425 0.944 0.989
0.471 0.975 0.999
et al. (1987). However, the SW test assumes the independence of observations precisely for which a test such as the V N R or RVN is being conducted. One important concern, therefore, is the reliability of the SW test when applied to correlated observations. To investigate this effect, we apply the SW test to the processes described in Section 3. The rejection probabilities are presented in Tables 5 and 6. For the underlying AR(1) process, the power of the SW test diminishes significantly when there is moderate autocorrelation in the resulting ARMA(1, 1) batch means process and the number of batches is small (10-30). This is especially true when the test is attempting to detect the t distribution, an unsurprising result in view of the similarities between the t and normal families. Even for the (skewed) exponential error distribution, the power is significantly diminished for moderate autocorrelation. For the underlying M / M / 1 queueing system, however, the power is high unless k is small (say 10).
6. Conclusions
As applied to simulation output from underlying AR(1) and M / M / 1 queueing processes, our results concerning batch means algorithms are summarized as follows: (a) Coverage probability for the process mean decreases significantly at merely moderate levels of autocorrelation in the batch means process. It is, therefore, important that any test used to detect dependency among batch means have high power at moderate autocorrelation. (b) Neither the yon Neumann ratio test, nor its rank version, enjoys uniform superiority as a test of zero autocorrelation. For both tests, the power at moderate autocorrelation may not be sufficiently large to prevent coverage degradation (as mentioned above), unless the number of batches is large. (c) The Shapiro-Wilk test to detect nonnormality in the batch means sequence has significantly reduced power when applied to certain autocorrelated batch means processes, unless the number of batches is large. Therefore, its use to validate application of the von Neumann ratio test is questionable.
N.R. Adam, S. W. Klein / On coverage probability in batch means algorithms
215
(d) O u r results lend s u p p o r t to the r e c o m m e n d a t i o n of K l e i j n e n et al. (1982) that a large n u m b e r of batches (perhaps at least 100) b e used i n b a t c h i n g algorithms w h e n testing for first-order autocorrelation. U p o n acceptance of the hypothesis of zero first-order a u t o c o r r e l a t i o n , one m a y t h e n conservatively opt to reduce the n u m b e r of batches to a r o u n d 30. T h e larger resulting b a t c h size tends to further reduce any d e p e n d e n c y within b a t c h means, w i t h o u t significantly i n c r e a s i n g the expected w i d t h of the confidence interval for the process m e a n (Schmeiser, 1982). F u r t h e r m o r e , the b a t c h m e a n s are likely to be more n o r m a l i n d i s t r i b u t i o n for the larger b a t c h size, a n a d v a n t a g e for those u s i n g the v o n N e u m a n n ratio test.
Acknowledgment W e wish to t h a n k J.P.C. Kleijnen, H.M. M a r k o w i t z a n d L. S c h r u b e n for their constructive c o m m e n t s a n d suggestions o n a n earlier draft of this paper.
References Adam, N.R. (1983), "Achieving a confidence interval for parameters stimated by simulation", Management Science 29, 856-866. Anderson, T.W. (1971), The Statistical Analysis of Time Series, Wiley, New York. Andrews, R.W., and Schriber, T.J. (1978), "Interactive analysis of output from GPSS-based simulation", Winter Simulation Conference Proceedings, 267-278. Bartels, R. (1977), "Estimation in a first order autoregressive scheme with non-normal stable disturbances", Journal of Statistical Computation and Simulation 6, 35-48. Barrels, R. (1982), "The rank version of von Neumann's ratio test or randomness", Journal of the American Statistical Association 77, 40-46. Blomqvist, N. (1967), "The covariance function of the M / G / 1 queueing system", Skandinavisk Aktuarietidskrift, 154-174. Bratley, F., Fox, B.L., and Schrage, L.E. (1987), A Guide to Simulation, Springer, New York. Fishman, G.S. (1972), "Bias considerations in simulation experiments", Operations Research 20, 685-790. Fishman, G.S. (1978a), "Grouping observations in digital simulation", Management Science 24, 510-521. Fishman, G.S. (1978b), Principles of Discrete Event Simulation, Wiley, New York. Kang, L., and Schmeiser, B. (1987), "Properties of batch means from stationary ARMA time series, Operations Research Letters 6, 19-24. Kleijnen, J.P.C., van der Ven, R., and Sanders, B. (1982), "Testing independence of simulation subruns: A note on the power of the Von Neumann test", European Journal of Operational Research 9, 92-93. Kleijnen, J.P.C. (1975), Statistical Techniques in Simulation, Part II, Marcel Dekker Inc., New York. Law, A.M., and Carson, J.S. (1979), "A sequential procedure for determining the length of a steady-state simulation", Operations Research 27, 1011-1025. Lehmann, E.L. (1959), Testing Statistical Hypotheses, Wiley, New York. Mechanic, H., and McKay, W. (1966), "Confidence intervals for average of dependant data in simulation II", Technical Report ASDD 17-202, IBM Corporation, Yorktown Heights, NY. Royston, J.P. (1982), "An extension of Shapiro and Wilk's W test for normality to large samples", Applied Statistics 31, 115-124. Schmeiser, B. (1982), "Batch size effects in the analysis of simulation output", Operations Research 30, 556-568. Shapiro, S.S., and Wilk, M.B. (1965), "An analysis of variance test for normality (complete samples)", Biometrika 65, 591-611. Snell, M., and Schruben, L. (1985), "Weighting simulation data to reduce initialization effects", l i E Transactions 17, 354-362. Von Neumann, J. (1941), "Distribution of the ratio of the mean square successive differences to the variance", Annals of Mathematical Statistics 12, 367-395.