Student t-tests and compound tests to detect transients in simulated time series

Student t-tests and compound tests to detect transients in simulated time series

European Journal of Operational Research 116 (1999) 681±691 Theory and Methodology Student t-tests and compound tests to detect transients in simula...

2MB Sizes 0 Downloads 9 Views

European Journal of Operational Research 116 (1999) 681±691

Theory and Methodology

Student t-tests and compound tests to detect transients in simulated time series Daniel H. Ockerman b

a,1

, David Goldsman

b,*

a Manhattan Associates, 2300 Windy Ridge Pkwy., Atlanta, GA 30339, USA School of Industrial and Systems Engineering, Georgia Institute of Technology, Atlanta, GA 30332-0205, USA

Received 1 November 1996; accepted 1 March 1998

Abstract Given a stationary stochastic process, with unknown mean l, interest often lies in developing estimates and con®dence intervals for l. Unfortunately, if the process is actually only asymptotically stationary, the transient behavior can a€ect the validity of these estimates and con®dence intervals. We present a test for detecting transients in simulation output based on the familiar Student t-test. We also demonstrate and exploit the independence of this t-test and certain other initialization bias tests to construct ``compound'' initialization bias tests. We demonstrate and evaluate our tests using Monte Carlo simulation. Ó 1999 Published by Elsevier Science B.V. All rights reserved. Keywords: Simulation; Initialization bias test; Transient; Student t-test

1. Introduction Important issues in simulation output analysis include estimating steady-state parameters such as the mean, constructing con®dence intervals for these parameters, and ranking and selecting among various alternative systems. Given stationary simulation output Y1 ; Y2 ; . . . ; Yn (e.g., the waiting times from a candidate queueing network), a typical ®rst step in output data analysis is to compute the sample mean,

* Corresponding author. Tel.: +1 404 894 2300; fax: +1 404 894 2301; e-mail: [email protected] 1 E-mail: [email protected]

Y n  nÿ1

n X

Yi ;

iˆ1

as an obvious point estimator for the unknown population mean, l. Additionally, as interest often lies in obtaining con®dence intervals for l, as estimate is usually required of the variance parameter (Alexopoulos, 1995; Law and Kelton, 1991), r2  lim n Var…Y n †: n!1

This type of steady-state analysis implicitly assumes that the simulation output is representative of the steady-state behavior of the system. Unfortunately, this is frequently not the case, and although transient behavior is sometimes dicult to detect, it can signi®cantly impact estimates for the

0377-2217/99/$ ± see front matter Ó 1999 Published by Elsevier Science B.V. All rights reserved. PII: S 0 3 7 7 - 2 2 1 7 ( 9 8 ) 0 0 2 3 3 - 1

682

D.H. Ockerman, D. Goldsman / European Journal of Operational Research 116 (1999) 681±691

mean and variance. Typically this issue is dealt with by either specifying initial conditions, providing truncation rules, and/or testing for initialization bias (Chance, 1993; Goldsman et al., 1994; Kimbler and Knight, 1987; Wilson and Pritsker, 1978). In some cases, enough is known about the system to be able to start the simulation in a condition that is representative of steady state. If possible, this is usually the desired approach. Otherwise, the simulation can be started in an arbitrary condition and the output data from an initial time period discarded (i.e., ignored in the output analysis process). Ideally, such an initial time period should extend until the simulation has reached a condition that is representative of steady state. However, it is usually not obvious how long a simulation must run before steady state is (at least, approximately) attained. Truncation rules can be used to help determine what portion of the initial data to throw out (Glynn and Iglehart, 1987; Heidelberger and Welch, 1983; Kelton and Law, 1983; Snell and Schruben, 1985). After we believe that the transient portion of the output data has been removed, an initialization bias test can be performed to determine if, statistically, this is indeed the case. The current paper presents new initialization bias tests to check for the presence of transient behavior, i.e., to test the null hypothesis, H0 : no initialization bias present, against the alternative, H1 : initialization bias present. It is organized as follows. Previous initialization bias tests are described in Section 2. In Section 3 we present a test for detecting transients in simulation output based on the familiar t-test. In this section we also demonstrate and exploit the independence of this t-test and certain other initialization bias tests to construct ``compound'' initialization bias tests. Experimental results from Monte Carlo simulations for several stochastic processes and bias functions are presented in Section 4, while Section 5 provides discussion and conclusions. 2. Previous tests for initialization bias The literature contains a number of initialization bias tests. Many of these tests are constructed

as ratios of variance parameter estimators. We refer to this class of tests as variance ratio tests. The motivation for this class of tests is clear. Suppose a process Y1 ; Y2 ; . . . ; Yn is partitioned into two contiguous, non-overlapping portions. For a given realization of the process, we calculate an estimate of the variance parameter based solely on the ®rst portion of the output, and an estimate based on the latter portion. A large di€erence between these two estimates is unlikely under H0 . If the two estimates are deemed to be signi®cantly di€erent, then we reject H0 in favor of H1 . To construct various benchmark initialization bias tests (which are representative of previous tests), we must ®rst de®ne a number of families of estimators for r2 . To this end, suppose that the data has been divided into b contiguous nonoverlapping batches, each consisting of m observations (we make the convenient assumption that n ˆ mb). Thus batch i; i ˆ 1; 2; . . . ; b; consists of the observations Y…iÿ1†m‡1 ; Y…iÿ1†m‡2 ; . . . ; Yim . The batch-means estimator is a simple and popular estimator, well studied in the literature (see Goldsman et al., 1994). It treats the sample means from each batch as approximately independent and identically distributed (i.i.d.) random variables, yielding the obvious estimator for r2 , 2 Pb ÿ m iˆ1 Y n ÿ Y i;m ; VBM;b ˆ bÿ1 P where Y i;m  mÿ1 mlˆ1 Y…iÿ1†m‡l ; i ˆ 1; 2; . . . ; b, are the batch means. (For the current experiments we used ``reasonable'' values for the number of batches, namely b ˆ 4; 8, and 16.) On the way to deriving other estimators for r2 , Schruben (1983) de®nes the standardized time series of a stochastic process as ÿ  bnuc Y n ÿ Y bnuc p for 0 6 u 6 1; T …u†  r n where bc is the ¯oor function. This standardized time series gives rise to additional variance estimators. For instance, the Lp -norm estimators for r2 are      p 2=p n 1X i rT i sgn rT k Vp;k  cÿ1 p;k n iˆ1 n n …1 6 p < 1; k ˆ 0; 1†;

D.H. Ockerman, D. Goldsman / European Journal of Operational Research 116 (1999) 681±691 Table 1 Lp -norm cp;k values

683

Table 2 Quantiles for Lp -norm estimator ratios

p

cp;0

cp;1

p

k

q;;0:1

1 2 10 1

0.117 0.167 0.385

0.083 0.143 0.380

1 2 10 1 2 10

0 0 0 1 1 1

4.42 4.10 3.37 40.0 9.48 3.58 2.9

0.822

2 2 VLpN ;1 ˆ cÿ1 …p ˆ 1†; 1 sup r T …u† 06u61

where the cp;k are scaling factors (as provided in Table 1 and ®rst tabulated in Tokol et al. (1998)) and   ‡1 if x P 0 sgn…x†  : ÿ1 if x < 0 These estimators are functionals that are related to the area under the standardized time series (see Tokol et al., 1998). (For the current experiments we used reasonable values for p, namely 1, 2, 10, and 1.) Further, Schruben's (1982) ``maximum'' estimator (which is a functional of the standardized time series' maximum and its location) is VSch;max ˆ

^s2 ; 3^t…1 ÿ ^t†

where ^s ˆ supu rT …u† and ^t ˆ arg maxu rT …u†. To construct the initialization bias tests, pick a variance estimator and let the generic notation …1† …2† …n† ‰V; …n†Š represent the application of the V; estimator to the ®rst (second) half of the data series. Under H0 , as the length of the series increases (and the number of batches, b, remains ®xed), the individual estimators independently converge to …2† their limiting distributions, say G…1† ; ‰G; Š (the above references give convergence conditions and independence justi®cation). Thus,  D  …1† …2† …2† …n†; V; …n† ! G…1† ; G V; ; ; ; D

where ``!'' denotes convergence in distribution as n ! 1. Hence, via the continuous mapping theorem (Theorem 5.1, Billingsley, 1968), with mapping h…x; y† ˆ x=y, R…n† ; 

…1† V; …n† …2†

V; …n†

D

!

G…1† ; …2†

G;

 R; :

1

Table 3 Quantiles for other estimator ratios Estimator

q;;0:1

Batch-means (b ˆ 4) Batch-means (b ˆ 8) Batch-means (b ˆ 16) Schruben's Max

5.39 2.79 1.97 5.39

See Ockerman and Goldsman (1996) for details on Rp;k and RLpN ;1 , Schruben (1982) for RSch;max , and Goldsman et al. (1994) for RBM;b . Tables 2 and 3 provide the upper 0.1 quantiles (which we denote as q;;wˆ0:1 ) for these distributions. The entries of Table 2 were ®rst derived in Ockerman and Goldsman (1996); the entries of Table 3 are based upon appropriate v2 distribution quantiles, as discussed in the relevant references. Using the data from these tables, an initialization bias test would reject H0 at the 100w% level if R…n† ; > q;;w . 3. New tests We describe three new initialization bias tests that are based upon the familiar Student t-test. 3.1. Student t-test While previous tests have exploited the impact of initialization bias on various variance parameter estimators, our new test is designed to directly exploit changes in the sample mean (an estimator for l). For instance, negative initialization bias (such as that encountered by the waiting times in an initially empty queueing network; see Schruben (1982)) should reduce the sample mean of the ®rst

684

D.H. Ockerman, D. Goldsman / European Journal of Operational Research 116 (1999) 681±691

half of a time series. Thus, a test could be implemented to determine if the sample mean of the ®rst half of a time series is statistically signi®cantly lower than the sample mean from the second. If this is the case, the test would conclude that there is negative initialization bias present in the ®rst half of the time series. The proper test for this framework is the t-test as described in most elementary statistics books. To construct this statistic we use Y 1;n=2 and Y 2;n=2 (the sample means of the ®rst and second halves of the data series, respectively) and an asymptotically of Y 2;n=2 independent v2 estimate of the variance ÿ ÿ Y1;n=2 . ÿ Asymptotically, nVar Y ÿ Y 2;n=2 ÿ 1;n=2   ÿ  n Var Y 2;n=2 ‡ Var Y 1;n=2 ˆ 2n Var Y 2;n=2 ˆ 2nr2 =…n=2† ˆ 4r2 . We initially consider using …2† VBM;8 …n† to estimate r2 . This estimator has the desired independence properties (i.e., it is asymptotically independent of Y 2;n=2 ÿ Y 1;n=2 ; see Tokol et al. (1998)), it converges to a v2 distribution, and it is not a€ected by initialization bias in the ®rst half of the data series. (We chose eight batches because …2† VBM;8 …n† performed well in preliminary tests.) Therefore, the quantity Y 2;n=2 ÿ Y 1;n=2 QBM ˆ q …2† …4=n†VBM;8 …n† converges to t…7† (a t-distributed random variable with 7 degrees of freedom) under H0 . In other words, this test would reject H0 at the 100w% level if QBM > t7;w where t7;w is the upper-w quantile from the t-distribution with 7 degrees of freedom. Note that this test is constructed to be one-sided based on the assumption of H1 : negative initialization bias is present. This was considered reasonable, because in general it would seem that the overall sign of the bias function would be known; the entire process could be multiplied by ÿ1 if the suspected initialization bias were positive (see Schruben, 1982). 3.2. Compound initialization bias tests We also developed compound initialization bias tests that take advantage of certain independence properties between the t-test of Section 3.1 and the

previous tests of Section 2. Suppose we have a collection of z independent initialization bias tests with individual sizes: ak ; k ˆ 1; 2; . . . ; z (i.e., for the kth test, the probability of rejecting the null hypothesis, given that it is true, is ak and is independent of the results of all of the other tests). Consider aA , the probability of rejecting the null hypothesis given that it is true, for a speci®c subset, A, of the tests and not rejecting the null hypothesis for the others: ! ! Y Y ai …1 ÿ aj † : aA ˆ i2A

j62A

We investigated ways to combine two independent tests, each with size a. In this case, there are two possibilities for constructing compound tests. The ®rst is to reject the null hypothesis only if both of the individual tests indicate rejection. The second is to reject the null hypothesis if either of the individual tests indicates rejection. The size of the ®rst compound test is a2 , and the size of the 2 second is 1 ÿ …1 ÿ a† . While variance ratio initialization bias tests are designed to exploit the impact of initialization bias on the sample variance, and the t-test is designed to exploit the impact on the sample mean, the compound tests described below allow for simultaneously exploiting both. We combine the following two individual initialization bias tests. The ®rst component of the compound tests is another t-test. The di€erence between this t-test and the one described in Section 3.1 is that in this case the variance estimator used in the denominator of the ratio cannot be the batch-means estimator. Since the batch-means estimator will be used in the second component of the compound tests, and since we require independence between the two components, we need a di€erent variance estimator. We use the batched-area estimator (i.e., a batched version of the Lp -norm estimator with p ˆ k ˆ 1) because it is asymptotically independent of the batch-means estimator if the same batches are used (Tokol et al., 1998). As in Goldsman et al. (1994), de®ne …2†

VBA;8 …n† ˆ

8 2 2 1X D r v8 V1;1 ‰iŠ ! 8 8 iˆ1

D.H. Ockerman, D. Goldsman / European Journal of Operational Research 116 (1999) 681±691

as the average of eight area estimators from the second half of the data series (i.e., split the second half of the data series into eight batches, generate V1;1 ‰iŠ  V1;1 from each batch i, and average the estimates). Proceeding as with the previous t-test, split the data series into two halves, and generate Y 1;n=2 and Y 2;n=2 . Consider Y 2;n=2 ÿ Y 1;n=2 QBA ˆ q …2† …4=n†VBA;8 …n† which converges to t…8† under H0 . The ®rst component of the compound tests would reject H0 at the 100w% level if QBA > t8;w . The second component of the compound tests is the ratio of the batch-means estimator test, with eight batches in each half, as described in Section 2. This component of the compound tests would reject H0 at the 100w% level if …n† RBM;8 > qBM;8;w . …n†

Since the batch-means test statistic RBM;8 (con…1† …2† structed from VBM;8 …n† and VBM;8 …n†) and the t-test …2† statistic QBA (constructed from VBA;8 …n†; Y 1;n=2 and Y 2;n=2 ) are asymptotically independent (Tokol et al., 1998), the two tests are aymptotically independent; so it is valid to construct compound tests of either of the following two forms. Compound test #1: Reject the claim of no initialization bias only if both the t-test and the batchmeans test would have individually rejected the claim. We constructed individual tests each with size 0.316, to obtain a compound test with size 0.1. Compound test #2: Reject the claim of no initialization bias if either the t-test or the batchmeans test would have individually rejected the claim. We constructed individual tests each with size 0.051, to obtain a compound test with size 0.1. 4. Results In order to gain insight into the performance of the new tests, we investigated their size (i.e., probability of type I error) and power (i.e., one minus the probability of type II error) using Monte Carlo simulation.

685

4.1. Size of tests The size of a test is de®ned to be P fReject H0 jH0 is trueg. As all of our 100w% tests were constructed to be asymptotically valid as n ! 1 (i.e., the quantiles were chosen to obtain the desired size), the asymptotic sizes of the tests are w. However, the manner in which the sizes approach their limits is important when considering ®nite streams of data. For a given run length, if the size of a test is signi®cantly larger than w, then the test is too likely to reject the null hypothesis when it is true. Thus, we are interested in characterizing and comparing the minimum run lengths for which each of the tests can be considered valid. We estimated the true sizes of the tests for four simple stationary (if properly initialized) processes for run lengths varying between 128 and 32, 768 data points. Each estimate is based upon 10, 000 Monte Carlo replications. The four processes we considered are: A step process [STEP], Yj ; j ˆ 1; 2; . . . ; where the Yj 's are i.i.d. 1, each with probability 0.5. For this process, r2 ˆ 1. A ®rst-order autoregressive process [AR(1)], Yj‡1 ˆ /Yj ‡ Zj‡1 ; j ˆ 1; 2; . . . ; where the Zj 's are i.i.d. n…0; 1 ÿ /2 † with ÿ1 < / < 1. (We consider / ˆ 0:9, in which case r2 ˆ 19.) A ®rst-order moving average process [MA(1)], Yj‡1 ˆ /Zj ‡ Zj‡1 ; j ˆ 1; 2; . . . ; where the Zj 's are i.i.d. n…0; 1†. (We take / ˆ ÿ0:9, so that r2 ˆ 0:01.) The waiting time process for an M/M/1 queueing system. (We consider an arrival rate of 0.8, and a service rate of 1.0, so that r2 ˆ 1976; Whitt, 1989.) The estimated sizes of the 10% tests for the processes are given in Figs. 1±4. In these ®gures, the x-axis represents the individual tests (from the left, the ®rst seven are various Lp -norm tests, the eighth represents Schruben's Max test, the ninth through eleventh are the ratio of batchmeans estimator tests with four, eight, and sixteen batches, while the last three (i.e., those to the right of the dashed line) represent our new tests: the t-test using the batch-means estimator and the two compound tests). The y-axis represents the size of the tests. The symbols represent

686

D.H. Ockerman, D. Goldsman / European Journal of Operational Research 116 (1999) 681±691

Fig. 1. 10% test sizes for STEP process.

Fig. 2. 10% test sizes for AR(1) process.

size results for a particular stochastic process for di€erent run lengths ranging from n ˆ 128 to 32, 768. Ideally, for a 10% test, the size of each of the tests should be about 10% (written as 0.10 in the ®gures). Anything signi®cantly di€erent indicates that the test has not properly converged for the given run length of the particular process.

For each of these processes, both the benchmark tests and the new tests converge properly as n becomes large. For the simple STEP process (see Fig. 1), all of the initialization bias tests demonstrate proper size for all tested values of n. For the AR(1) process (see Fig. 2), most of the tests require fairly large values of n for proper sizes to be obtained (although all of the tests eventually ob-

D.H. Ockerman, D. Goldsman / European Journal of Operational Research 116 (1999) 681±691

687

Fig. 3. 10% test sizes for MA (1) process.

Fig. 4. 10% test sizes for M/M/1 queue.

tain the proper size). For the MA(1) process, for the run lengths considered, most of the tests provide too little size due to this MA(1)'s negative autocorrelation (see Fig. 3). However, the batchmeans tests give proper coverage since the negative autocorrelation washes out in the batches. For the M/M/1 queue (see Fig. 4), most of the tests con-

verge very slowly due to the long-lasting autocorrelation of the process. Of particular interest is the di€erence between the convergence patterns of the compound tests due to the decaying ®nite-sample negative correlation between their two components (i.e., for small samples, the t-test and the batch-means test can be correlated).

688

D.H. Ockerman, D. Goldsman / European Journal of Operational Research 116 (1999) 681±691

4.2. Power of tests The power of a test is P fReject H0 jH0 is falseg. We studied the power of initialization bias tests by observing simulated processes containing known initialization bias. We considered a host of processes including the previously de®ned four processes, a failing network of M/M/1 queues, and several types of reentrant lines. We discuss the results from an AR(1), an M/M/1, and a stable reentrant line. 4.2.1. AR(1) process The ®rst test process is an AR(1) process for which initialization bias is injected via the initial condition. We used two initial conditions (Y1 ˆ ÿ10 and ÿ100) to generate the negative initialization bias. We compared these non-stationary processes to the stationary AR(1) process obtained when Y1  n…0; 1†. Table 4 gives the power results for the 10% tests for n ˆ 512; 2048 and 32,768 and Y1  n…0; 1†; Y1 ˆ ÿ10, and Y1 ˆ ÿ100. For this table, as well as Tables 5 and 6, the ®rst two columns represent the initialization bias tests (in the same order they were presented in the previous charts), while the next three groupings, of three columns each, represent run lengths. Each of the three columns for a given run length

refers to a particular initial condition (e.g., in this case, the three initial conditions include starting the process from n…0; 1†, always starting from ÿ10, and always from ÿ100). The entries in these columns represent the power of an initialization bias test to detect the bias for a given run length. Ideally, the tests would have more power in detecting the two non-stationary processes than in falsely ``detecting'' the stationary process. Thus, the desirable result would be that the power results for the second two columns from each grouping would be larger than those from the ®rst. This appears to be the case for the AR(1) process for all of the tests. Typically, the more powerful tests also demonstrated the worst sizes, especially Schruben's Max test (as can be seen from the ®rst column in each grouping in Table 4). Note that all of the tests demonstrated reduced power for large n because the bias is actually washed out early in the ®rst half of the run for this process (i.e., the run is now big enough to overwhelm the bias). 4.2.2. M/M/1 queue Table 5 presents the results obtained by starting an M/M/1 queue (arrival rate of 0.8, service rate of 1.0) in steady state, empty, and overloaded (ten times steady state).

Table 4 AR(1) power results for 10% tests p

k

n ˆ 512

n ˆ 2048

n ˆ 32768

Y1  n…0; 1†

Y1 ˆ ÿ10

Y1 ˆ ÿ100

Y1  n…0; 1†

Y1 ˆ ÿ10

Y1 ˆ ÿ100

Y1  n…0; 1†

Y1 ˆ ÿ10

Y1 ˆ ÿ100

1 0 1 1 2 0 2 1 10 0 10 1 Lp -norm …p ˆ 1† Sch. Max BM (b ˆ 4) BM (b ˆ 8) BM (b ˆ 16)

0.130 0.101 0.133 0.106 0.145 0.141 0.162 0.219 0.100 0.099 0.121

0.650 0.300 0.660 0.451 0.683 0.662 0.707 0.803 0.386 0.713 0.936

1.000 1.000 1.000 1.000 1.000 1.000 1.000 1.000 1.000 1.000 1.000

0.115 0.102 0.115 0.107 0.121 0.119 0.134 0.163 0.102 0.102 0.110

0.301 0.169 0.287 0.206 0.266 0.255 0.281 0.514 0.169 0.232 0.357

1.000 0.949 1.000 1.000 1.000 1.000 1.000 1.000 1.000 1.000 1.000

0.099 0.096 0.099 0.097 0.098 0.098 0.103 0.120 0.102 0.099 0.101

0.109 0.104 0.107 0.103 0.103 0.104 0.109 0.174 0.105 0.105 0.112

0.772 0.371 0.800 0.579 0.848 0.828 0.863 0.954 0.447 0.764 0.971

t-test Compound #1 Compound #2

0.139 0.122 0.197

0.453 0.729 0.771

1.000 1.000 1.000

0.108 0.106 0.122

0.223 0.288 0.275

1.000 1.000 1.000

0.106 0.100 0.100

0.127 0.119 0.116

0.458 0.754 0.726

D.H. Ockerman, D. Goldsman / European Journal of Operational Research 116 (1999) 681±691

689

Table 5 M/M/1 power results for 10% tests for various initial conditions p

k

n ˆ 512

n ˆ 2048

n ˆ 32768

Steady state

Empty

Overloaded

Steady state

Empty

Overloaded

Steady state

Empty

Overloaded

1 0 1 1 2 0 2 1 10 0 10 1 Lp -norm (p ˆ 1) Sch. Max BM (b ˆ 4) BM (b ˆ 8) BM (b ˆ 16)

0.246 0.131 0.254 0.181 0.275 0.269 0.292 0.282 0.214 0.270 0.315

0.211 0.108 0.216 0.153 0.236 0.231 0.252 0.314 0.184 0.235 0.276

0.511 0.340 0.515 0.438 0.532 0.525 0.545 0.542 0.463 0.533 0.577

0.219 0.124 0.225 0.162 0.243 0.237 0.262 0.248 0.186 0.258 0.305

0.205 0.118 0.212 0.152 0.230 0.223 0.249 0.260 0.178 0.246 0.291

0.568 0.431 0.572 0.510 0.586 0.580 0.598 0.615 0.517 0.592 0.632

0.143 0.111 0.146 0.124 0.157 0.153 0.170 0.165 0.126 0.164 0.213

0.142 0.111 0.145 0.123 0.156 0.151 0.168 0.167 0.126 0.163 0.212

0.359 0.254 0.361 0.310 0.373 0.367 0.385 0.447 0.294 0.365 0.429

t-test Compound #1 Compound #2

0.126 0.021 0.451

0.150 0.021 0.452

0.638 0.606 0.684

0.070 0.021 0.298

0.077 0.023 0.295

0.539 0.625 0.597

0.073 0.042 0.140

0.075 0.043 0.141

0.313 0.405 0.360

For this M/M/1 process, although all of the tests are powerful at detecting bias when the system is started overloaded, the majority of the tests detect bias more often when the system is started from steady-state conditions then when it is started empty. This is not the case for Schruben's Max test and the t-test. These results are not entirely surprising due to the impact of the initial ``variance'' reduction caused by starting empty. In other words, when a queue is started empty, initially the waiting times are small, which leads to low vari-

ance estimates and hence ine€ective variance ratio tests. In the literature, this initial reduction in the variance is referred to as a ``second-order e€ect''. Obviously for the M/M/I queue, these e€ects are very signi®cant. The motivation for the t-test is now clear. 4.2.3. Two-station reentrant line We also considered the cycle times from the two-station reentrant line as depicted in Fig. 5. The arrival process is Poisson with unit rate a ˆ 1.

Fig. 5. A two-station reentrant line.

690

D.H. Ockerman, D. Goldsman / European Journal of Operational Research 116 (1999) 681±691

Table 6 Reentrant line power results for 10% tests for various initial conditions p

1 1 2 2 10 10 Lp -norm (p ˆ 1) Schruben's Max BM (b ˆ 4) BM (b ˆ 8) BM (b ˆ 16) t-test Compound # 1 Compound # 2

k

0 1 0 1 0 1

n ˆ 512

n ˆ 2048

n ˆ 32768

Steady state

Empty

Overloaded

Steady state

Empty

Overloaded

Steady state

Empty

Overloaded

0.223 0.110 0.226 0.156 0.244 0.235 0.260 0.341 0.144 0.205 0.265

0.191 0.090 0.191 0.123 0.203 0.194 0.220 0.404 0.120 0.173 0.235

0.276 0.112 0.279 0.185 0.296 0.285 0.315 0.275 0.205 0.295 0.362

0.219 0.117 0.223 0.161 0.239 0.232 0.255 0.298 0.171 0.213 0.260

0.189 0.097 0.194 0.129 0.210 0.203 0.223 0.341 0.148 0.192 0.234

0.591 0.366 0.596 0.496 0.608 0.599 0.625 0.588 0.528 0.612 0.668

0.156 0.110 0.162 0.127 0.178 0.173 0.195 0.196 0.135 0.185 0.239

0.143 0.105 0.151 0.119 0.166 0.160 0.181 0.201 0.139 0.177 0.234

0.648 0.480 0.650 0.578 0.663 0.657 0.675 0.739 0.565 0.655 0.715

0.285 0.096 0.528

0.397 0.123 0.617

0.854 0.407 0.901

0.171 0.036 0.441

0.206 0.043 0.465

0.847 0.713 0.882

0.075 0.041 0.174

0.079 0.040 0.178

0.576 0.681 0.654

Each station has one server which can only work on one job at a time. Each job must pass through the entire system, requiring an exponential amount of time (with mean mi ) at stage i. The quantity of interest is the average total time a job spends in the system as a function of the service times and station service disciplines. The service times and disciplines were chosen such that a known stationary distribution was available. Speci®cally, if m1 ˆ m3 ˆ m5 ˆ 0:3; m2 ˆ m4 ˆ 0:45, and both stations service customers on a ®rst come ®rst served basis, then the resulting Kelly network has a product-form stationary distribution (Kelly, 1975). At steady state, the number of customers at each station is distributed geometrically with parameter 0.9, the class of each customer is uniformly distributed among the possibilities, and the average time in the system is 18. Therefore, it is possible to simulate the system starting from steady-state conditions. Additionally, the system was simulated starting from empty, and the system was simulated starting from an overloaded condition in which the distribution of the number of customers was scaled by a factor of 10. The power results for each of the initialization bias tests for each of these initialization conditions are presented in Table 6.

Once again, with the exceptions of the t-test, the compound tests, and Schruben's Max test, the tests detected bias more often when the system was started from steady-state conditions than when it was started empty. 5. Conclusions The new tests developed herein are serious competitors with previous tests. While the e€ect of initialization bias on higher-order moments is frequently ignored in the literature, we demonstrated it to be signi®cant in a host of practical problems. It would appear that for stochastic processes that have boundaries (e.g., the number of customers waiting in a queue cannot be negative), if the process is started near one of the boundaries, and the variance at the boundary is lower than the variance at steady state, then many current initialization bias tests will not be successful in detecting the bias. We believe that the ttest and compound tests are steps towards a solution. In addition, we have no reason to believe that the compound tests that we constructed are in any sense optimal. Compound tests can be constructed with di€erent individual components,

D.H. Ockerman, D. Goldsman / European Journal of Operational Research 116 (1999) 681±691

with components of unequal sizes (e.g., a test with size 0.5 can be combined with one of size 0.2 to construct a compound test with size 0.1 ± is this better than combining two tests each with size 0.316?), or even with varying numbers of components (e.g., can we ®nd three independent initialization bias tests to combine?) References Alexopoulos, C., 1995. Advanced simulation output analysis for a single system. in: Alexopoulos, C., Kang, K., Lilegdon, W.R., Goldsman, D. (Eds.), Proceedings of the 1995 Winter Simulation Conference, Institute of Electrical and Electronics Engineers, Piscataway, NJ, pp. 101±109. Billingsley, P., 1968. Convergence of Probability Measures. Wiley, New York. Chance, F., 1993. The indi€erence approach to the analysis of transient e€ects in queueing networks and simulations. Ph.D. Dissertation, School of Operations Research and Industrial Engineering, Cornell University, Ithaca, NY. Glynn, P.W., Iglehart, D.L., 1987. A new initial bias deletion rule. in: Thesen, A., Grant, H., Kelton, W.D. (Eds.), Proceedings of the 1987 Winter Simulation Conference, Institute of Electrical and Electronics Engineers, Piscataway, NJ, pp. 318±319. Goldsman, D., Schruben, L.W., Swain, J.J., 1994. Tests for transient means in simulated time series. Naval Research Logistics 41, 171±187.

691

Heidelberger, P., Welch, P.D., 1983. Simulation run length control in the presence of an initial transient. Operations Research 31, 1109±1145. Kelly, F.P., 1975. Networks of queues with customers of di€erent types. Journal of Applied Probability 12, 542±554. Kelton, W.D., Law, A.M., 1983. A new approach for dealing with the startup problem in discrete event simulation. Naval Research Logistics Quarterly 30, 641±658. Kimbler, D.L., Knight, B.D., 1987. A survey of current methods for the elimination of initialization bias in digital simulation. Annual Simulation Symposium 20, 133±142. Law, A.M., Kelton, W.D., 1991. Simulation Modeling and Analysis, 2nd ed. McGraw-Hill, New York. Ockerman, D.H., Goldsman, D., 1996. Tests for transients in simulated time series based on Lp -norm variance estimators, Technical Report, School of Industrial and Systems Engineering, Georgia Institute of Technology, Atlanta, GA. Schruben, L.W., 1982. Detecting initialization bias in simulation output. Operations Research 30, 569±590. Schruben, L.W., 1983. Con®dence interval estimation using standardized time series. Operations Research 31, 1090± 1108. Snell, M., Schruben, L.W., 1985. Weighting simulation data to reduce initialization e€ects. IIE Transactions 17, 354±363. Tokol, G., Goldsman, D., Ockerman, D.H., Swain, J.J., 1998. Standardized time series Lp -norm variance estimators for simulations. Management Science 44, 234±245. Whitt, W., 1989. Planning queueing simulations. Management Science 35, 1341±1366. Wilson, J.R., Pritsker, A.A.B., 1978. A survey of research on the simulation start-up problem. Simulation 31, 55±58.