Quantile and tolerance-interval estimation in simulation


European Journal of Operational Research 168 (2006) 520–540 www.elsevier.com/locate/ejor

Stochastics and Statistics

E. Jack Chen a, W. David Kelton b,*

a BASF Corporation, Mount Olive, NJ 07828, USA
b Department of Quantitative Analysis and Operations Management, University of Cincinnati, Cincinnati, OH 45221-0130, USA

Received 12 October 2000; accepted 19 April 2004 Available online 4 July 2004

Abstract

This paper discusses two sequential procedures to construct proportional half-width confidence intervals for a simulation estimator of the steady-state quantile and tolerance intervals for a stationary stochastic process having the (reasonable) property that the autocorrelation of the underlying process approaches zero with increasing lag. At each quantile to be estimated, the marginal cumulative distribution function must be absolutely continuous in some neighborhood of that quantile with a positive, continuous probability density function. These algorithms sequentially increase the simulation run length so that the quantile and tolerance-interval estimates satisfy pre-specified precision requirements. An experimental performance evaluation demonstrates the validity of these procedures. © 2004 Elsevier B.V. All rights reserved.

Keywords: Simulation; Output analysis; Quantile estimation; Tolerance interval; Nonparametric

1. Introduction

A properly selected set of quantiles (percentiles) reveals all the essential distributional features of output random variables analyzed by simulation. For 0 < p < 1, the p quantile of a distribution is the value at or below which 100p percent of the distribution lies. Quantile estimators are also more robust to outliers than are the mean and standard deviation: a few exceptional observations do not affect quantile estimators as heavily as they affect the mean and standard deviation. However, quantiles have seldom been used in simulation studies. We believe that the reason for this lies in the complexity of quantile estimation. Estimating quantiles is computationally more difficult than estimating the mean and variance. One of the basic problems in estimation of quantiles is that the whole output sequence must be stored and sorted if the

* Corresponding author. Tel.: +1 513 556 6834; fax: +1 513 556 5499.
E-mail addresses: [email protected] (E.J. Chen), [email protected] (W.D. Kelton).

0377-2217/$ - see front matter © 2004 Elsevier B.V. All rights reserved. doi:10.1016/j.ejor.2004.04.040


estimates are based on order statistics. In the past, because of limited computer capability, the quantile-estimation procedures of Iglehart (1976) and Seila (1982a,b) could only estimate quantiles of regenerative processes. For general processes, special algorithms, such as the maximum transformation of Heidelberger and Lewis (1984), the P² algorithm of Jain and Chlamtac (1985), the extended P² algorithm of Raatikainen (1987a,b, 1990), and the histogram quantile estimation of Hurley and Modarres (1995), were needed to obtain estimates. Hsu and Nelson (1990) and Hesterberg and Nelson (1998) explore the control-variate variance-reduction technique in bivariate quantile estimation. Avramidis and Wilson (1998) explore the antithetic-variates variance-reduction technique in finite-horizon quantile estimation. However, existing techniques are either too restrictive to be applied to general cases, or too complicated to be implemented for practical use. Therefore, there is a need for robust quantile-estimation procedures that can be applied to relatively generic processes and are easy to implement. An essential part of developing such procedures is to determine dynamically a workable finite simulation run length that will produce quantile estimates that are sufficiently precise, and finding a method for doing so is our primary focus in this paper.

Related to quantiles, tolerance intervals are intervals that contain a specified proportion of the values of the random variable X (X represents an output performance measure of the system under study) with a certain prescribed confidence level. We propose zoom-in (ZI) and quasi-independent (QI) procedures (see Section 3) for estimating quantiles and tolerance intervals of discrete-time, covariance-stationary stochastic processes having a continuous marginal distribution with the (reasonable) property that the autocorrelation of the underlying process approaches zero (or ‘‘dies off’’) with increasing lag.
Furthermore, at each quantile to be estimated, the marginal cumulative distribution function must be absolutely continuous in some neighborhood of that quantile with a positive, continuous probability density function. The proposed procedures will sequentially determine the simulation run length based entirely on generated output data so that the estimates will satisfy a pre-specified precision requirement. In Section 2, we discuss the rationale for using order statistics to estimate quantiles. In Section 3, we propose the proportional half-width idea, and present our methodologies and proposed procedures for quantile and tolerance-interval estimation. In Section 4, we give empirical results of applying our procedures. In Section 5, we make some concluding remarks. Parts of this paper appeared in an earlier form in Chen and Kelton (1999).

2. Rationale

This section presents the rationale for our quantile estimation: order-statistics quantile estimators. Let {X_i: i = 1, 2, . . ., n} be a sequence of i.i.d. (independent and identically distributed) random variables from a continuous distribution F(x) with pdf (probability density function) f(x). Let $x_p$ (0 < p < 1) denote the 100pth percentile, or the p quantile, which has the property that $F(x_p) = \Pr[X \le x_p] = p$. If {Y_i: i = 1, 2, . . ., n} are the order statistics corresponding to the X_i's from n independent observations (i.e., Y_i is the ith smallest of X_1, X_2, . . ., X_n), then a point estimator for $x_p$ based on the order statistics is the sample p quantile

$$\hat{x}_p = Y_{\lceil np \rceil}, \qquad (1)$$

where $\lceil y \rceil$ denotes the integer ceiling (round-up function) of the real number y. For data that are i.i.d. and satisfy certain conditions (see van Zwet, 1964 or Avramidis and Wilson, 1998), the following properties of $\hat{x}_p$ are well known (David, 1981):

$$E(\hat{x}_p) = x_p - \frac{p(1-p)f'(x_p)}{2(n+2)f^3(x_p)} + O(1/n^2),$$

$$\mathrm{Var}(\hat{x}_p) = \frac{p(1-p)}{(n+2)f^2(x_p)} + O(1/n^2),$$

where $O(a_n)$ denotes a quantity that converges to zero at least as fast as does the sequence $\{a_n\}$, as $n \to \infty$.
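As a concrete illustration of the order-statistic estimator (1), the following Python sketch (function and variable names are our own, not the paper's) estimates a quantile of an i.i.d. exponential sample; with a large n the estimate lands close to the true quantile, consistent with the bias and variance expansions above:

```python
import math
import random

def order_statistic_quantile(data, p):
    """Sample p quantile: the ceil(n*p)-th order statistic, as in Eq. (1)."""
    y = sorted(data)                 # order statistics Y_1 <= ... <= Y_n
    n = len(y)
    return y[math.ceil(n * p) - 1]   # -1 converts the 1-based rank to an index

random.seed(1)
sample = [random.expovariate(1.0) for _ in range(100000)]
# For Exp(1) the true 0.9 quantile is -ln(0.1) ~ 2.3026; the estimate is close
print(order_statistic_quantile(sample, 0.9))
```

Here Var$(\hat{x}_p) \approx p(1-p)/(n f^2(x_p)) \approx 9 \times 10^{-5}$, so the printed estimate should differ from 2.3026 by only a few hundredths.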


For the case of correlated sequences, quantile estimation is much more difficult than in the independent case. Sen (1972) shows that the usual order-statistic point estimator, $\hat{x}_p$, is still asymptotically unbiased, but its variance is inflated by autocorrelation when the sequences are φ-mixing and satisfy certain conditions. Roughly speaking, X_1, X_2, . . ., X_n is φ-mixing if $X_i$ and $X_{i+j}$ become essentially independent as j becomes large; see Billingsley (1999) for an exact definition. Heidelberger and Lewis (1984) and Raatikainen (1987a,b, 1990) have used this property to establish the validity of their methods.

3. Methodologies

The results from Sen (1972) provide a motivating rationale for using classical statistical techniques to develop quantile estimators based on order statistics, i.e., they are asymptotically both unbiased and normal. But while asymptotic results are often applicable when the amount of data is ‘‘large enough,’’ the point at which the asymptotic results become valid generally depends on unknown factors. An important practical decision must be made regarding the run length n required to achieve the desired precision. We propose two heuristic sequential algorithms to determine the run length for estimating quantiles: the zoom-in algorithm and the quasi-independent algorithm.

The zoom-in algorithm, which is an enclosure method, gives upper and lower bounds on the quantile $x_p$ at every step. It is used with combined-precision stopping rules and uses order statistics to estimate quantiles. In our algorithm, the default lower and upper bounds are set to the minimum and maximum floating-point values representable on the host computer. However, any a priori knowledge regarding the value of a quantile could be used for the lower and upper bounds to reduce the computational effort.

The quasi-independent procedure will process a subsequence of n samples (taken from the original output sequence) that appear to be independent, instead of processing the entire sequence of correlated samples directly. We accomplish this by systematic sampling, i.e., choosing an observation and every lth observation thereafter. Here, the chosen l will be sufficiently large that samples are effectively uncorrelated. This is possible because we assume that the autocorrelation of the underlying process approaches zero with increasing lag. In order to ensure that the samples in the subsequence correctly represent the states of the system, we will take samples every $\lceil l/2 \rceil$th observation.
Therefore, l must be at least 2 and we will set l to 2 for sequences that statistically appear to be independent. We compute n (the required number of independent samples) and l for our systematic sampling. The minimum required population size is then N = 2nl, i.e., the final run length.

Since the run length is often somewhat arbitrarily determined, it may be inappropriately short or long. The simulation may terminate inappropriately early due to statistical variability, which may cause difficulties, such as the asymptotic approximations not yet being valid. However, by specifying an appropriate stopping rule, we can reduce the chance that the simulation terminates early. To avoid running the simulations longer than necessary, though, is somewhat difficult. One reason is that we double the run length at each step in our QI algorithm. Furthermore, the variance is estimated directly from several quantile estimators rather than from the asymptotic expansions for the moments of the quantile estimator; thus, the underlying condition for these procedures to be valid is that the autocorrelation dies off with increasing lag.

3.1. Linearized order-statistics quantile estimator

The linearized order-statistics p quantile estimator $\hat{x}_p$ is given by

$$\hat{x}_p = (1-w)Y_{\lfloor np \rfloor} + wY_{\lceil np \rceil}, \qquad (2)$$

where $\lfloor y \rfloor$ denotes the integer floor (round-down function) of the real number y. The convex combination of the $\lfloor np \rfloor$th and $\lceil np \rceil$th order statistics, with the weight $w = np - \lfloor np \rfloor$, is chosen to reduce the first-order bias
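The linearized estimator (2) can be sketched in a few lines of Python (the helper name is ours; the sketch assumes np ≥ 1):

```python
import math

def linearized_quantile(data, p):
    """Linearized order-statistics p quantile, Eq. (2):
    (1 - w) * Y_floor(np) + w * Y_ceil(np), with weight w = np - floor(np)."""
    y = sorted(data)
    n = len(y)
    r = n * p
    lo, hi = math.floor(r), math.ceil(r)   # assumes n * p >= 1
    w = r - lo
    if lo == hi:                           # n * p is an integer; no interpolation
        return y[lo - 1]
    return (1 - w) * y[lo - 1] + w * y[hi - 1]

data = list(range(1, 11))                  # order statistics are 1, 2, ..., 10
print(linearized_quantile(data, 0.5))      # n*p = 5, w = 0, so Y_5 = 5
print(linearized_quantile(data, 0.64))     # interpolates between Y_6 and Y_7
```

When np is an integer the weight w vanishes and (2) collapses to the conventional estimator (1); otherwise it interpolates between the two adjacent order statistics.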


of the quantile estimate. This linearized quantile estimator should be closer to the true quantile than is the conventional quantile estimator (1). Moreover, if we are estimating the upper tolerance limit (i.e., the estimator should cover more than the specified p value), the upper bound of the confidence interval (CI) of the sample quantiles will be used as a point estimator. If we are estimating the lower tolerance limit (i.e., the estimator should cover no more than the specified p value), the lower bound of the CI of the sample quantile will be used as a point estimator.

3.2. Proportional half-width

Usually, the stopping rule of absolute-precision simulation procedures will ensure that

$$\Pr[x_p \in \hat{x}_p \pm \epsilon] \ge 1 - \alpha, \qquad (3)$$

where $\hat{x}_p$ is the estimated quantile, $x_p$ is the true (but unknown) quantile, $\epsilon$ is the maximum allowed half-width of the confidence interval, and $1-\alpha$ is the confidence level (we assume throughout that $0 < \alpha < 1$). Since we are estimating quantiles, however, a different precision requirement can be used. We should control the precision by ensuring that

$$\Pr\!\left[x_p \in \hat{x}_{[p \mp \epsilon']_0^1}\right] \ge 1 - \alpha, \qquad (4)$$

where

$$[P]_0^1 = \begin{cases} P & \text{if } 0 \le P \le 1, \\ 0 & \text{if } P < 0, \\ 1 & \text{if } P > 1. \end{cases}$$

This implies that we have $1-\alpha$ confidence that the true p quantile $x_p$ is between the estimated quantiles $\hat{x}_{[p-\epsilon']_0^1}$ and $\hat{x}_{[p+\epsilon']_0^1}$, i.e.,

$$\Pr\!\left[\hat{x}_{[p-\epsilon']_0^1} \le x_p \le \hat{x}_{[p+\epsilon']_0^1}\right] \ge 1 - \alpha,$$

where $\epsilon'$ is the maximum proportional half-width of the CI. The absolute half-width $\epsilon$ has the same measurement unit as the variate under investigation and can be any positive value. However, the proportional half-width $\epsilon'$ is dimensionless; it is a proportional value with no measurement unit and must be between 0 and max(p, 1 − p), 0 < p < 1. The CI $[\hat{x}_{[p-\epsilon']_0^1}, \hat{x}_{[p+\epsilon']_0^1}]$ is a random interval whose endpoints are observable statistics that can be calculated from the simulation-generated output {X_i: i = 1, 2, . . ., n}. Proposition 1 provides a run-length formula that will guarantee a proportional half-width of $\epsilon'$ when the output sequence is i.i.d.; the proof is in Appendix A.

Proposition 1. Let $x_p$ be the p quantile of an i.i.d. sequence $X_i$ for i = 1, . . ., n, let

$$c_i = \begin{cases} 1 & \text{if } X_i \le x_p, \\ 0 & \text{otherwise}, \end{cases}$$

and define the random variable $C_n = \sum_{i=1}^{n} c_i$. Then a sufficient run length $n_p$ for a fixed-run-length procedure to achieve the precision $\Pr[|C_n/n - p| < \epsilon'] \ge 1 - \alpha$ is

$$n_p = \left\lceil \frac{p(1-p)}{\alpha\,\epsilon'^2} \right\rceil, \qquad (5)$$

where $\epsilon'$ is the maximum proportional half-width of the confidence interval and $1-\alpha$ is the confidence level ($0 < \alpha < 1$).


While (5) guarantees the desired precision, it could be larger than necessary since the proof of Proposition 1 relies on Chebyshev's inequality, which can be rather loose. As an alternative approximate required run length, note that

$$\frac{C_n - np}{\sqrt{np(1-p)}}$$

has a limiting $N(0,1)$ distribution ($N(m, s^2)$ denotes the normal distribution with mean m and variance $s^2$). So for large n,

$$1 - \alpha \approx \Pr\!\left[\left|\frac{C_n - np}{\sqrt{np(1-p)}}\right| \le z_{1-\alpha/2}\right] = \Pr\!\left[|C_n/n - p| \le z_{1-\alpha/2}\sqrt{p(1-p)/n}\right],$$

where $z_{1-\alpha/2}$ is the point below which lies probability $1-\alpha/2$ in the $N(0,1)$ distribution. Setting $z_{1-\alpha/2}\sqrt{p(1-p)/n}$ in the last expression to $\epsilon'$, solving for n, and rounding up yields

$$n_p = \left\lceil \frac{z_{1-\alpha/2}^2\, p(1-p)}{\epsilon'^2} \right\rceil. \qquad (6)$$

A common rule of thumb for the conditions under which this normal approximation is accurate is that np and n(1 − p) are both more than 5. The proposed procedure will require the run length to be at least 4000, and the allocated run length is generally much larger than this. We choose the value of p for the desired quantile, so the rule of thumb indicates that we are safe so long as 4000p and 4000(1 − p) are both more than 5, which is equivalent to choosing p between 0.00125 and 0.99875. Thus, we can safely use the normal approximation for all but the most extreme values of p. Since $z_{1-\alpha/2}^2 \le (4/9)(1/\alpha)$ (see Appendix B for a proof), the run lengths based on (6) will be over 55% smaller than those based on (5) (ignoring rounding). Numerically comparing $z_{1-\alpha/2}^2$ and $1/\alpha$ indicates realized run-length savings of at least 67%; on the ‘‘usual’’ range of α ≤ 0.10 the savings are at least 73% and increase toward 100% as α decreases to zero. So we will use (6) instead of (5) to compute the required run length; however, conservative users could still use (5).

The argument in the preceding paragraph implies that, for n large enough (i.e., larger than $n_p$, approximately as in (6) or, conservatively, as in (5)), $\Pr[x_p \in [Y_{n(p-\epsilon')}, Y_{n(p+\epsilon')}]] \ge 1 - \alpha$. If $n(p \pm \epsilon')$ is not an integer, then linear interpolation can be used between $Y_{\lfloor n(p \pm \epsilon') \rfloor}$ and $Y_{\lceil n(p \pm \epsilon') \rceil}$. Moreover, if $n(p - \epsilon') < 1$ or $n(p + \epsilon') > n$, then the default minimum or maximum should be used, respectively. Thus, if we use the proportional precision requirement (i.e., (4)), the required run length $n_p$ for a fixed-run-length procedure for estimating the p quantile of an i.i.d. sequence is approximately the minimum $n_p$ that satisfies (6).
For example, suppose the data are independent and we would like to have 90% confidence that the true 0.5 quantile $x_{0.5}$ lies between $\hat{x}_{0.5-\epsilon'}$ and $\hat{x}_{0.5+\epsilon'}$, the estimated quantiles of order $0.5-\epsilon'$ and $0.5+\epsilon'$, respectively, where we take $\epsilon' = 0.005$. The required run length is $n_p \ge 1.645^2 \times 0.5(1-0.5)/0.005^2 = 27{,}060.25$. So if $n \ge n_p = 27{,}061$, then $\Pr[x_{0.5} \in [\hat{x}_{0.495}, \hat{x}_{0.505}]] \ge 0.9$, and $\Pr[|F(\hat{x}_{0.5}) - 0.5| \le 0.005] \ge 0.9$. Note that $\hat{x}_{0.495} = Y_{\lceil 0.495n \rceil}$ and $\hat{x}_{0.505} = Y_{\lceil 0.505n \rceil}$ can be calculated directly from the order statistics {Y_i: i = 1, 2, . . ., n} and provide information about how accurately the unknown quantile $x_{0.5}$ is being estimated.
3.3. The zoom-in algorithm

In this section we propose a zoom-in algorithm and use order statistics to estimate quantiles. A likelihood buffer, an array of variables of type double (in the C language), is allocated at the beginning of the algorithm. The likelihood buffer is used to store and sort the observations deemed most likely to be the quantile at any particular iteration. By iteration we mean a run-length-extending step in the zoom-in algorithm described below; note that this algorithm is invoked in only the first run,


with its results used in subsequent runs. The procedure starts with an initial run length and iterates repeatedly to increase the run length until the stopping rules are met. At each iteration, it increases the run length and computes the sample quantile, as well as estimates of the lower and upper bounds for the p quantile. The size of the likelihood buffer bfsize is set to 100,000 initially. We compute the sample quantile according to the linearized order-statistics quantile estimator, i.e., (2). If one is more concerned that the p quantile cover at least 100p% of the distribution, instead of being primarily concerned with a tolerance interval, then the original quantile estimator, i.e., (1), can be used. Let n be the current run length (n = bfsize at the initial iteration), and let $n_l = \lfloor n(p - dp) \rfloor$ and $n_u = \lceil n(p + dp) \rceil$, where 0 < dp < 0.5 (we use an initial value of dp = 0.4). The sample quantile lower and upper bounds are $\hat{x}_l$ and $\hat{x}_u$, initially set to $-\infty$ and $\infty$, respectively. At each iteration they will be recomputed as follows:

$$\text{if } n_l > 0 \text{ and the current } \hat{x}_l < Y_{n_l}, \text{ then set } \hat{x}_l \leftarrow Y_{n_l}, \qquad (7)$$

and

$$\text{if } n_u \le n \text{ and the current } \hat{x}_u > Y_{n_u}, \text{ then set } \hat{x}_u \leftarrow Y_{n_u}. \qquad (8)$$

If $n_l \le 0$ or $Y_{n_l}$ is smaller than the current value of $\hat{x}_l$, then the lower bound is not changed. If $n_u > n$ or $Y_{n_u}$ is greater than the current value of $\hat{x}_u$, then the upper bound is not changed. Once we obtain estimates of the lower and upper bounds of the quantile, the sample values $Y_i$ for i = 1, . . ., $n_l$ − 1 and i = $n_u$ + 1, . . ., n are no longer needed. More samples can then be generated to fill in the available slots in the likelihood buffer. We will store the newly generated sample values in the likelihood buffer only when they fall within the lower and upper bounds inclusively. If a sample value is less than the lower bound, we increase the counter $n_l$ by one; if it is larger than the upper bound, we increase the counter $n_u$ by one. The total run length n will be increased by I% (we use 10%) at subsequent iterations. New sample quantiles and lower and upper bounds are recomputed at each iteration. We tighten the lower and upper bounds of the quantile estimate at each iteration by multiplying the gap dp by dfactor (dfactor < 1; we use 0.90) at subsequent iterations. A larger dfactor means slower convergence, but it increases the possibility that the lower and upper bounds will include the true quantile value. A smaller dfactor gives faster convergence, but it also runs the risk of having the true quantile value fall outside the lower and upper bounds. The likelihood buffer will be expanded at each iteration to store all necessary observations to ensure that our bounds actually cover the true quantile values. The process will be repeated iteratively until the pre-specified stopping criteria are satisfied. If the sample quantile is outside the range of the likelihood buffer, then the simulation needs to be restarted with a larger likelihood-buffer size bfsize or a larger value of dfactor.
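The bound updates (7) and (8) can be sketched as follows (a minimal illustration with our own function name; y holds the sorted sample, and dp = 0.25 is chosen only for the toy example):

```python
import math

def update_bounds(y, n, p, dp, x_l, x_u):
    """One zoom-in recomputation of the bounds, per Eqs. (7) and (8):
    the lower bound only ever moves up, the upper bound only ever moves down."""
    n_l = math.floor(n * (p - dp))
    n_u = math.ceil(n * (p + dp))
    if n_l > 0 and x_l < y[n_l - 1]:   # Eq. (7)
        x_l = y[n_l - 1]
    if n_u <= n and x_u > y[n_u - 1]:  # Eq. (8)
        x_u = y[n_u - 1]
    return x_l, x_u

y = list(range(1, 101))                # toy sorted sample Y_1 = 1, ..., Y_100 = 100
x_l, x_u = update_bounds(y, 100, 0.5, 0.25, float("-inf"), float("inf"))
print(x_l, x_u)                        # 25 75: Y_25 and Y_75 bracket the median
```

Because the lower bound can only rise and the upper bound can only fall, the interval shrinks monotonically as dp is multiplied by dfactor at each iteration.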
The lower and upper bounds estimated in the current run can be used for later runs so that the program can start from where it terminated instead of from the very beginning. The estimator can also be approximated by linear extrapolation, but the estimator will not be the exact sample quantile in this case. Once the buffer size and bounds are determined, we will execute another R − 1 runs. However, we don't iterate repeatedly to estimate the run length in subsequent runs. Instead, we store and sort only those observations that are between the lower and upper bounds of the quantile estimated from the first run, and tally the number of observations that are less than the lower bound. The simulation continues until the number of observations between the lower- and upper-bound estimates reaches bfsize, which has been determined from the first run. For example, consider an M/M/1 queue with the output process being the individual customer delays in queue. In the first run, we start with bfsize = 100,000 (delays in queue), $\hat{x}_l = -\infty$ ($\hat{x}_l = 0$ can be used to reduce the computational effort when users know 0 is the minimum), and $\hat{x}_u = \infty$. In each iteration we increase the run length (delays in queue) by 10%, recompute bfsize, $\hat{x}_l$, and $\hat{x}_u$, and check whether the stopping rules are satisfied. If the stopping rules are met at iteration r, with run length s (i.e., we have generated s delays in queue), bfsize = b, $\hat{x}_l = b_l$, and $\hat{x}_u = b_u$, then the subsequent runs will estimate quantiles


based on those values. In each subsequent run, the statistical accumulators are reset to the same state. The procedure will continue until it has obtained b observations that are between $b_l$ and $b_u$ inclusively. Note that the run length in subsequent runs may differ from that in the first run. If we make R − 1 subsequent runs this way, we have R quantile estimators in total. Even though carrying estimates from the first run into subsequent runs introduces some correlation across the runs (so they are not independent), the nature of this correlation is not clearly systematic in any way. It is also possible that, while the quantile estimator is between the lower and upper bounds obtained from the first run, some of the quantile estimators in subsequent runs could be outside that range. In these unlikely cases, such estimators will be treated as outliers and the lower or upper bounds of the sample quantiles will be used as estimators. Thus, our procedures have several approximations inherent in them, so empirical testing is required, on which we report in Section 4.

The zoom-in algorithm

1. The size of the likelihood buffer, bfsize, is set to 100,000 initially. The run-length incremental rate, I, is set to 0.1. The sample quantile lower and upper bounds ($\hat{x}_l$ and $\hat{x}_u$) are initially set to $-\infty$ and $\infty$, respectively. The gap used to compute the lower and upper bounds, dp, is set to 0.4 initially. The gap-tightening factor, dfactor, is set to 0.9.
2. Generate n = bfsize observations. Use (2), (7), and (8) to compute $\hat{x}_p$, $\hat{x}_l$, and $\hat{x}_u$.
3. If the observations appear to be i.i.d., go to step 9. We use a runs-up test (Knuth, 1998) to check whether a sequence appears to be independent.
4. Remove observations that are not within the lower and upper bounds from the buffer. Compute $n_l$, the number of observations smaller than $\hat{x}_l$, and $n_u$, the number of observations larger than $\hat{x}_u$.
5. Generate I·n more observations.
Increase bfsize as necessary to store all newly generated observations that are between the lower and upper bounds.
6. Set n = (1 + I)n and dp = dfactor · dp, and use (7) and (8) to compute $\hat{x}_l$ and $\hat{x}_u$.
7. If the stopping criteria are satisfied, compute $\hat{x}_p$ and go to step 9.
8. Otherwise, go to step 4.
9. Make another R − 1 runs, using the values of bfsize, $\hat{x}_l$, and $\hat{x}_u$ computed from the first run. That is, each simulation run continues until the buffer is full, and we store samples in the buffer only if they are between $\hat{x}_l$ and $\hat{x}_u$ inclusively.
10. Set the quantile point estimator to the mean of the observed R quantile estimators.

If the initial sequence appears to be independent, as determined by the runs-up test, the procedure uses the initial sequence to estimate the quantile without ensuring that the stopping criteria are met. This is because the initial sample size of 100,000 (> 27,061) is large enough to satisfy the precision requirement of (4) with $\epsilon' = 0.005$ and α = 0.1 under the hypothesis that the sequence is independent. If the initial sequence appears to be dependent, the procedure needs to iterate at least three times before the stopping criteria can be met. Since we are estimating quantiles of stochastic processes, we will make R = 3 runs to get R quantile estimators. Using a larger R can achieve greater stability of the CI estimator, but would require longer simulation run lengths. We use a small number for R to reduce the required simulation run length. Users can increase the number progressively to achieve the required precision if the initial estimator does not satisfy the precision requirement. Let $\hat{x}_{p,m}$ denote the estimator of $x_p$ in the mth run. We use

$$\bar{\hat{x}}_p = \frac{1}{R}\sum_{m=1}^{R} \hat{x}_{p,m} \qquad (9)$$


as a final point estimator of $x_p$. Assuming that the run length determined by our procedures is large enough that each $\hat{x}_{p,m}$ has a limiting normal distribution, a CI for $x_p$ using the $\hat{x}_{p,m}$'s can be approximated by

$$\bar{\hat{x}}_p \pm t_{R-1,1-\alpha/2}\,\frac{S}{\sqrt{R}}, \qquad (10)$$

where $t_{R-1,1-\alpha/2}$ is the upper $1-\alpha/2$ quantile of the t distribution with R − 1 d.f. (R ≥ 2) and

$$S^2 = \frac{1}{R-1}\sum_{m=1}^{R}\left(\hat{x}_{p,m} - \bar{\hat{x}}_p\right)^2$$

is the usual estimator of $\sigma_p^2(n)$, the variance of $\hat{x}_p$. As noted earlier, our runs are not strictly independent, since the first one sets bounds that are used by subsequent ones, though it is not clear that the nature of this dependence is systematic; nevertheless, this fact renders (10) an approximation, mandating empirical testing, on which we report in Section 4. Even though order-statistic quantile estimators are asymptotically unbiased, we don't know whether the run length determined by our procedures is long enough for this asymptotic approximation to become valid. Thus, this is another reason why the coverage of this CI cannot be guaranteed, i.e., the proposed procedures cannot guarantee $\Pr[x_p \in \bar{\hat{x}}_p \pm t_{R-1,1-\alpha/2}\,S/\sqrt{R}] \ge 1-\alpha$. The precision of our quantile-estimation procedures is the proportional half-width CI (4) in Section 3.2. However, we heuristically use (10) to approximate the confidence limits of the tolerance limits when we construct tolerance intervals; see Section 3.5.
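A sketch of the across-run point estimator (9) and approximate CI (10); the function name and the three estimates are ours for illustration, and the t critical value is the standard table constant for R = 3 and α = 0.10:

```python
from math import sqrt
from statistics import mean, stdev

def quantile_ci(estimates, t_crit):
    """Eq. (9): point estimate = mean of the R per-run quantile estimators;
    Eq. (10): CI = mean +/- t_{R-1,1-alpha/2} * S / sqrt(R)."""
    R = len(estimates)
    x_bar = mean(estimates)            # Eq. (9)
    S = stdev(estimates)               # uses the R - 1 denominator, as in S^2
    hw = t_crit * S / sqrt(R)
    return x_bar, x_bar - hw, x_bar + hw

# R = 3 runs; t_{2, 0.95} = 2.920 (t table) gives an approximate 90% CI
x_bar, lo, hi = quantile_ci([2.28, 2.31, 2.35], 2.920)
print(x_bar, lo, hi)
```

With only R = 3 runs the t critical value is large, which is why the paper notes that a larger R stabilizes the CI at the cost of longer run lengths.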

3.3.1. Properties of termination rules

For the iterative process of sequential procedures to terminate properly, a stopping mechanism is required. Note that these stopping rules are used only in the first run. The proposed stopping criterion includes six parts:

1. The proportion of observations between $\hat{x}_l$ and $\hat{x}_u$ inclusively is less than $\epsilon'$. In this case, the tolerance interval constructed by the lower and upper bounds of the quantile estimator contains less than $\epsilon'$ of the population, i.e., $dp < \epsilon'/2$. Thus, the quantile estimator should be within tolerance.
2. The difference of consecutive quantile estimators has changed sign at least K times, and the last L consecutive quantile estimators do not increase or decrease monotonically. Based on the results of our empirical studies, we set both K and L to 4. The conjecture is that if the consecutive quantile estimators increase or decrease monotonically, then the run length is not long enough.
3. There is no new maximum or minimum as we increase the run length in the first run. That is, as we iteratively increase the run length, the observations remain within the minimum and maximum that have already occurred in previous iterations of the first run. The underlying reason is that the maximum or minimum of the output data should have occurred if the run length is sufficiently long.
4. Let $\hat{x}_l$, $\hat{x}_u$, and $\beta_{lu} = [p + dp]_0^1 - [p - dp]_0^1$ be the estimates of the lower bound, upper bound, and their corresponding coverage obtained at iteration r. Since we set $\hat{x}_l = Y_{n_l}$ and $\hat{x}_u = Y_{n_u}$ at iteration r, the coverage between them is $\beta_{lu}$. Let $\hat{\beta}_{lu}$ be the proportion of all observations at iteration r + 1 that lie between the current $\hat{x}_l$ and $\hat{x}_u$ (obtained from iteration r). The difference between the theoretical coverage $\beta_{lu}$ and the observed coverage $\hat{\beta}_{lu}$ is less than $\epsilon'$. If the run length is long enough, then the tolerance interval formed by $\hat{x}_u$ and $\hat{x}_l$ (obtained at iteration r) should contain approximately the corresponding proportion of the population (at iteration r + 1).


5. Let $\beta_l = [p - dp]_0^1$ and $\beta_u = 1 - [p + dp]_0^1$ be the tail coverages of the lower and upper bounds, respectively, obtained at iteration r. Let $\hat{\beta}_l$ be the proportion of observations less than $\hat{x}_l$ and let $\hat{\beta}_u$ be the proportion of observations greater than $\hat{x}_u$ at iteration r + 1. The difference between $\beta_l$ and $\hat{\beta}_l$ ($\beta_u$ and $\hat{\beta}_u$) is less than $\epsilon'$. This rule is basically the same as the previous one except that we have a one-tailed instead of a two-tailed tolerance interval.
6. The relative or absolute difference of the consecutive quantile estimators $\hat{x}_p$ of the sequential procedure is within tolerance, i.e., less than $\tau$ or $\epsilon$, respectively, where $\tau$ and $\epsilon$ have the same numerical value as $\epsilon'$. Let $\hat{x}_{p,i}$ denote the p-quantile estimator at the ith iteration; the rule requires

$$\frac{|\hat{x}_{p,i+1} - \hat{x}_{p,i}|}{|\hat{x}_{p,i}|} \le \tau \quad \text{or} \quad |\hat{x}_{p,i+1} - \hat{x}_{p,i}| \le \epsilon.$$

Even though this stopping rule is somewhat arbitrary, the underlying reason is that if the run length is long enough for a valid asymptotic approximation, then the value of subsequent quantile estimates should not change much. Because $\tau$ is dimensionless, we use the relative precision as a gauge to check whether the quantile estimates have stabilized. However, the relative precision usually results in an overestimated value of n when the quantile estimates are near zero. For example, if the true quantile is zero and the quantile estimates are −0.001 and 0.001, then the relative precision is 2 even though both quantile estimates are very good. To avoid excessive run lengths in this situation, we accompany the relative precision with a corresponding absolute precision. We conveniently set $\tau = \epsilon = \epsilon'$, which works well empirically; however, users can use different values.

The first stopping rule is connected to the others by ‘‘or,’’ and the last five stopping rules (2–6) are connected by ‘‘and.’’ That is, the procedure will terminate as soon as either stopping rule 1 alone is satisfied, or all of stopping rules 2–6 are satisfied, whichever comes first.
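Stopping rule 6 is straightforward to express in code; a minimal sketch (the function name is ours) showing how the absolute precision rescues near-zero estimates:

```python
def rule6_satisfied(x_prev, x_curr, tau, eps):
    """Rule 6: consecutive quantile estimates agree to within either the
    relative precision tau or the absolute precision eps (tau = eps = eps')."""
    diff = abs(x_curr - x_prev)
    rel_ok = x_prev != 0 and diff / abs(x_prev) <= tau
    abs_ok = diff <= eps
    return rel_ok or abs_ok

# Near zero the relative check fails (relative precision 2), but the
# absolute check passes, avoiding an excessive run length
print(rule6_satisfied(-0.001, 0.001, 0.005, 0.005))  # True
print(rule6_satisfied(2.30, 2.50, 0.005, 0.005))     # False
```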
All the rules above are only ‘‘necessary conditions.’’ Therefore, it is possible that all the sequential stopping rules are satisfied by chance, yet the run length is still not long enough. To avoid premature stopping of the sequential procedure, we stop it only when stopping rules 3–6 have been satisfied in three consecutive iterations, a choice based on our experimental results. On the other hand, the sequential procedure will stop immediately whenever stopping rule 1 is satisfied, so the zoom-in algorithm will always terminate gracefully. This is because once the value of $\epsilon'$ has been decided, the maximum number of iterations $n_i$ can be computed as $n_i = \lceil \ln(\epsilon'/(2\,dp))/\ln(\text{dfactor}) \rceil$ (because $\text{dfactor}^{n_i}\,dp < \epsilon'/2$). Therefore, if the number of iterations for which stopping rules 3–6 must be satisfied were set too large, the sequential procedure would always be terminated by stopping rule 1. In that case, the sequential procedure behaves like a fixed-run-length procedure; the run length determined by those stopping rules will still be long enough for highly correlated data, but will be longer than required for data that are only slightly correlated. Sees and Shortle (2002) have experimented with other stopping rules for the ZI algorithm and are able to obtain good estimates of extreme quantiles (e.g., the 0.99 quantile) of M/G/1 queues with heavy-tailed service times.

3.4. Quasi-independent procedure

Our second method is the quasi-independent (QI) algorithm. The asymptotic validity of the QI procedure arises as the subsequence comes to appear independent, as determined by the runs-up test. Chen and Kelton (2003) list some empirical results of using the runs test to check for independence. The required number of independent samples n is computed from (6). Let lag l be the minimum lag for which the

E.J. Chen, W.D. Kelton / European Journal of Operational Research 168 (2006) 520–540


subsequence will appear to be independent. The first step of the algorithm is to determine the lag l sequentially. For continuous-time simulations, we would need to sample observations at discrete intervals in order to discretize the process. We conservatively take two samples every lth observation. This is accomplished by allocating a buffer to store 2n samples that are lag l′ = 2^k apart, where k = 0, 1, 2, . . . is the number of iterations. However, we use only the odd samples in the buffer, i.e., observations lag l = 2l′ apart, to check for independence. The simulator generates 2n observations and stores those values as samples x1, x2, x3, . . ., x2n. We carry out a runs-up test with the first 4000 odd samples (the minimum recommended sample size for the runs-up test), namely samples x1, x3, x5, . . ., x7999. If the subsequence appears to be independent, as determined by the runs-up test, we compute the sample quantile of the subsequence according to (2). If the input sequence appears to be dependent, samples x2, x4, x6, . . ., x2n are discarded. Samples x1, x3, x5, . . ., x2n−1 are reindexed in the buffer as x1, x2, x3, . . ., xn. We then generate another n samples to fill in the buffer, namely xn+1, xn+2, xn+3, . . ., x2n. However, starting with the first observation x1, we store observations separated by lag l = 2^k, so that we store the data set {x1+jl: j = 0, 1, . . .}. Thus the buffer stores 2n samples, with each sample taken 2^k − 1 observations apart at the kth iteration. This process is repeated iteratively until the subsequence appears to be independent, as determined by the runs-up test.

The quasi-independent algorithm

1. The size of the buffer used to store and sort samples is denoted bfsize, and k is the number of iterations.
2. Compute the initial run length n from (6), with p conservatively set to 0.5. Set bfsize = 2n and k = 0. Generate bfsize observations as our initial samples.
3. Carry out a runs-up test to determine whether the sequence appears to be independent.
4. If the samples appear to be i.i.d., go to step 8.
5. Discard every other sample in the buffer. Reindex the remaining n samples in the first half of the buffer.
6. Obtain another n samples that are lag 2^k observations apart (i.e., separated by 2^k − 1 intervening observations) at the kth iteration and store them in the second half of the buffer.
7. Set k = k + 1. If k < 15, go to step 3.
8. Make another R − 1 runs.
9. Compute the quantile estimator according to (9).

The QI algorithm can be easily implemented and can estimate multiple quantiles simultaneously without further modification, because the QI sequence can be used to construct an empirical distribution of the parameter under study. However, the QI algorithm wastes many observations when the output sequences are strongly correlated. Based on our empirical study, we set the maximum number of iterations to 15.

3.5. Simulation-based tolerance-interval estimation

In many types of processes, it is customary to think of tolerance intervals as intervals that contain a certain fraction, say 100β%, of the distribution. In this section we present some approaches to estimating tolerance intervals of a process. We employ the terminology "a 100γ% tolerance interval contains 100β% of the population" to describe a tolerance interval that one can claim with 100γ% confidence encloses 100β% of the population. We compute the probability that a certain random interval includes (or covers) a pre-assigned percentage of the probability for the distribution under consideration.

Let X be a random variable with distribution function F(X) of the continuous type. The random variable Z = F(X) has a uniform (0, 1) distribution. Let {Xi: i = 1, 2, . . ., n} be an i.i.d. sequence of random variables with distribution F and let {Yi: i = 1, 2, . . ., n} be the order statistics corresponding to the Xi's from these n observations.
Consider the difference Zj − Zi = F(Yj) − F(Yi), for i < j and Yi < Yj. Since Pr[X = yi] = Pr[X = yj] = 0, the difference F(yj) − F(yi) is the fractional part of the probability for the distribution of X that is between yi and yj. Let β denote a proportion that is


less than 1. If F(Yj) − F(Yi) ≥ β, then at least 100β% of the probability for the distribution of X is between yi and yj. Let γ = Pr[F(Yj) − F(Yi) ≥ β], or equivalently γ = Pr[PrX[Yi ≤ X ≤ Yj] ≥ β] (PrX[·] denotes that the probability is with respect to the random variable X). Then the random interval (Yi, Yj) has probability γ of containing at least 100β% of the probability for the distribution of X. If yi and yj denote, respectively, realizations of Yi and Yj, the interval (yi, yj) either does or does not contain at least 100β% of the probability for the distribution of X. However, we refer to the interval (yi, yj) as a 100γ% tolerance interval for 100β% of the probability for the distribution of X. Likewise, yi and yj are called 100γ% tolerance limits for 100β% of the probability for the distribution of X (Hogg and Craig, 1995).

Let PL be the desired maximum proportion of the population in the lower tail of the distribution, and let PU be the desired maximum proportion of the population in the upper tail of the distribution. Furthermore, let the desired coverage be β = 1 − (PL + PU). Assume that the tolerance limits TPL and TPU have limiting normal distributions and that F(TPL) → PL and F(TPU) → 1 − PU in probability as n → ∞. If the estimators of the tolerance limits satisfy the precision requirements

Pr[|F(TPL) − PL| ≤ ε′] ≥ γ and Pr[|F(TPU) − (1 − PU)| ≤ ε′] ≥ γ,

then the tolerance interval formed by these two tolerance limits satisfies

Pr[F(TPU) − F(TPL) ≥ β − 2ε′] ≥ γ.

A 100γ% tolerance interval to control both tails of the distribution will be constructed as [T̲PL, T̄PU], where T̲PL is the 100γ′% (we use 95%) confidence lower bound for the tolerance limit TPL, and T̄PU is the 100γ′% confidence upper bound for the tolerance limit TPU. Since the run length, determined by the proposed procedures, does not guarantee asymptotic validity of the quantile estimates, the validity of using (10) to construct an absolute half-width CI cannot be guaranteed.
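The probability γ = Pr[F(Yj) − F(Yi) ≥ β] defined above can be evaluated exactly: the coverage F(Yj) − F(Yi) of two order statistics from an i.i.d. sample follows a Beta(j − i, n − j + i + 1) distribution, whose tail probability equals a binomial sum. A minimal sketch, assuming this classical distribution-free identity (the function name is ours):

```python
from math import comb

def tolerance_confidence(n, i, j, beta):
    """gamma = Pr[F(Y_j) - F(Y_i) >= beta] for order statistics Y_i < Y_j
    of an i.i.d. sample of size n. Since F(Y_j) - F(Y_i) ~ Beta(j-i, n-j+i+1),
    this equals Pr[Binomial(n, beta) <= j - i - 1]."""
    return sum(comb(n, k) * beta**k * (1 - beta)**(n - k) for k in range(j - i))
```

For example, `tolerance_confidence(93, 1, 93, 0.95)` is just above 0.95, recovering the classical result that 93 observations suffice for the interval (min, max) to be a 95% tolerance interval for 95% of the population.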
However, we use this CI to estimate the lower and upper tolerance limits to approximate

Pr[F(T̄PU) − F(T̲PL) ≥ β] ≥ γ.

In the case of a one-tailed upper (or lower) tolerance bound, i.e., T̲PL = −∞ or T̄PU = ∞, the data are used to construct a bound to exceed (or be exceeded by) a specified population proportion. A one-tailed tolerance bound is equivalent to a one-sided confidence bound on a distribution quantile. Depending on which tail the user needs to control, the tolerance interval can be (−∞, T̄PU] or [T̲PL, ∞). For example, if both TPL and TPU are specified as 0.5 quantiles, the first tolerance interval (−∞, T̄PU] will ensure that at least 50% of the observations lie on or below T̄PU. On the other hand, the second tolerance interval [T̲PL, ∞) will ensure that at least 50% of the observations lie on or above T̲PL, i.e., no more than 50% of the observations lie below T̲PL. Thus, T̄PU ≥ T̲PL.

3.6. Discussion of the zoom-in and quasi-independent algorithms

For long run lengths, it becomes impractical to store and sort the entire sequence. These limitations can be overcome by using the zoom-in algorithm, which requires storing and sorting only a sequence of most likely values, and by using the quasi-independent algorithm, which requires storing and sorting only a sequence of sample values that can approximately represent the underlying distribution. Savings in storage and sorting are substantial for these methods. The advantages of the zoom-in approach are that we are able to obtain the exact sample quantiles in one pass, the estimated lower and upper bounds of the quantile are available at each iteration, and the incremental run length can be adjusted easily. However, the ZI algorithm uses heuristic stopping rules


and needs to allocate multiple likelihood buffers when estimating multiple quantiles simultaneously. On the other hand, the QI algorithm uses a runs-up test to determine the simulation run length, which has a stronger theoretical basis. The main advantage of the QI approach is that by using a subsequence of QI samples to represent the underlying distribution, we tremendously reduce the amount of data we need to process. The QI procedure computes the number of required independent samples at the beginning of the procedure, making implementation a relatively simple task. The QI algorithm generally requires a shorter run time because it sorts only the final subsequence, and only once. Theoretically, the QI quantile estimator is statistically the same as the sample quantile of all observations. However, if the whole sequence of observations can be read in again in a second phase, we can use the algorithm of the second phase of the zoom-in algorithm to get the exact sample quantile. That is, we store and sort only the observations that fall between the lower and upper bounds of the quantile estimate and tally the number of observations that are less than the lower bound. For example, if there are N observations in the whole sequence and we are estimating the p quantile, the exact sample quantile will be the kth order statistic of the subsequence that contains only those observations between the lower and upper confidence limits of the quantile, where k = ⌈Np⌉ − nl and nl is the number of observations that are less than the lower bound of the quantile. However, we will need to write the whole sequence to a file and read it in again. Another advantage of the QI algorithm is that it can estimate multiple quantiles simultaneously without much extra effort.
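The buffer-halving scheme of the QI algorithm (Section 3.4) can be sketched as follows. This is a minimal illustration, not the paper's implementation: `appears_independent` stands in for the runs-up test and `next_obs` for the simulator's output stream, both assumptions of this sketch:

```python
def qi_sample(next_obs, appears_independent, n, max_iter=15):
    """Collect 2n quasi-independent samples: halve the buffer and double the
    sampling lag until the independence test passes (or max_iter is reached)."""
    lag, buf = 1, [next_obs() for _ in range(2 * n)]
    for _ in range(max_iter):
        if appears_independent(buf[::2]):   # test the odd-indexed samples
            break
        buf = buf[::2]                      # discard every other sample
        lag *= 2                            # double the sampling lag
        for _ in range(n):                  # refill the second half of the buffer
            x = next_obs()
            for _ in range(lag - 1):
                next_obs()                  # skip the intervening observations
            buf.append(x)
    return sorted(buf)                      # sorted once, for quantile lookup
```

The final sort is performed only once, on the 2n retained samples, which is the source of the storage and sorting savings discussed above.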

4. Empirical results

In this section we present some empirical results obtained from simulations using the zoom-in and quasi-independent procedures proposed in this paper. We tested four stochastic models: AR(1), MA(1), as well as M/M/1 and M/M/2 delay-in-queue processes. In all cases, the confidence level of the runs-up test is set to 0.90, the number of runs R is set to 3, the maximum allowable proportional half-width ε′ is set to 0.005, and the confidence levels of the tolerance limits (γ′) and tolerance intervals (γ) are both set to 0.95. Each design point is based on 100 simulation runs.

4.1. The AR(1) processes

A stochastic model that is covariance-stationary and admits an exact analysis of performance criteria is the first-order auto-regressive (AR(1)) process, generated by the recurrence relation

Xi = μ + φ(Xi−1 − μ) + εi for i = 1, 2, . . .,

where 0 < φ < 1, E(εi) = 0,

E(εiεj) = σ² if i = j, and 0 otherwise,

and X0 is specified to be some constant x0. The εi's are commonly called error terms. If we make the additional assumption that the εi's are i.i.d. N(0, σ²), then it can be shown that X has an asymptotic N(μ, σ²/(1 − φ²)) distribution. We tested two AR(1) models with φ set to 0.75 and 0.90, respectively. We set μ = 0 and σ² = 1 for both models. In order to eliminate the initial bias, X0 is set to a random variate drawn from the steady-state distribution. Table 1 lists the true quantile values of these AR(1) processes.

Table 2 lists the AR(1) experimental results of the ZI algorithm. The φ column lists the correlation coefficient of the AR(1) process. The p column lists the quantile we want to estimate. The x̂p column lists the global average of the p quantile estimators. The ĉ column for quantile estimators lists the proportion of the CIs containing the true quantile.
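The AR(1) setup above can be illustrated in a few lines; this is a sketch with names of our choosing, drawing X0 from the steady-state N(μ, σ²/(1 − φ²)) distribution as described to avoid initialization bias:

```python
import math
import random

def ar1_path(phi, n, mu=0.0, sigma=1.0, seed=12345):
    """Generate an AR(1) path X_i = mu + phi*(X_{i-1} - mu) + eps_i,
    with X_0 drawn from the steady-state N(mu, sigma^2/(1 - phi^2))."""
    rng = random.Random(seed)
    x = mu + rng.gauss(0.0, sigma / math.sqrt(1.0 - phi * phi))
    path = []
    for _ in range(n):
        x = mu + phi * (x - mu) + rng.gauss(0.0, sigma)
        path.append(x)
    return path
```

For φ = 0.75, the empirical 0.95 quantile of a long path should approach the true value 2.487326 = 1.645·√(1/(1 − 0.75²)) listed in Table 1.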


Table 1
True quantile values of the AR(1) process

| φ \ Quantile | 0.05      | 0.10      | 0.25      | 0.75     | 0.90     | 0.95     |
|--------------|-----------|-----------|-----------|----------|----------|----------|
| 0.75         | −2.487326 | −1.937792 | −1.019278 | 1.019278 | 1.937792 | 2.487326 |
| 0.90         | −3.774373 | −2.940487 | −1.546696 | 1.546696 | 2.940487 | 3.774373 |

Table 2
Observed coverage of nominal 95% confidence ZI quantile estimators for the AR(1) process (lower quantile; upper quantile; tolerance interval)

| φ    | p    | x̂p        | ĉ    | SSMSE    | p    | x̂p       | ĉ    | SSMSE    | ĉ (tol.) | β̂        |
|------|------|-----------|------|----------|------|----------|------|----------|----------|----------|
| 0.75 | 0.05 | −2.486921 | 1.00 | 0.002072 | 0.95 | 2.486750 | 1.00 | 0.001815 | 1.00     | 0.901684 |
| 0.75 | 0.10 | −1.937546 | 1.00 | 0.002489 | 0.90 | 1.937536 | 1.00 | 0.002325 | 1.00     | 0.802772 |
| 0.75 | 0.25 | −1.019632 | 1.00 | 0.004685 | 0.75 | 1.019623 | 1.00 | 0.004391 | 1.00     | 0.505018 |
| 0.90 | 0.05 | −3.772297 | 1.00 | 0.001997 | 0.95 | 3.774020 | 1.00 | 0.001698 | 0.99     | 0.901516 |
| 0.90 | 0.10 | −2.939411 | 1.00 | 0.001979 | 0.90 | 2.940075 | 1.00 | 0.002368 | 1.00     | 0.802528 |
| 0.90 | 0.25 | −1.547076 | 1.00 | 0.004483 | 0.75 | 1.547905 | 1.00 | 0.005077 | 1.00     | 0.504957 |

The ĉ column for tolerance intervals lists the proportion of the estimators that cover at least 100β% of the distribution. The SSMSE column lists the standardized square roots of the mean square errors. The standardization is done by dividing the square root of the mean square error by the absolute value of the true quantile; this quantity measures the mean relative variation of the estimates from the true value. The β̂ column lists the average of the portion of the distribution covered by the tolerance interval [T̲PL, T̄PU]; see Section 3.5 for the definitions of T̲PL and T̄PU. All quantile estimators from the zoom-in algorithm satisfy the requirement that xp ∈ [x̂p−ε′, x̂p+ε′]. All but one of the tolerance intervals cover more than the specified β proportion of the distribution. The SSMSEs of the 0.25 and 0.75 quantiles are larger than the others because the absolute values of the true quantiles are smaller.

Table 3 lists the AR(1) experimental results of the QI algorithm; the column headings are as described previously. The QI quantile and tolerance-interval estimators are not as good as the ZI estimators. Four of these twelve quantile design points have coverage less than the specified 0.95 nominal level. The coverage of the QI quantile estimators deteriorates where the pdf f(xp) has large values: with the same amount of deviation in xp, the deviation in F(xp) is large when f(xp) is large. Furthermore, the QI SSMSEs are larger than those obtained with the ZI algorithm. The proportion of tolerance intervals obtained through the QI algorithm that cover the specified β proportion of the distribution is as good as for the ZI algorithm; however, β̂ is larger than in the zoom-in algorithm. This is an indication that the variance of the QI quantile estimators is larger than that of the ZI estimators. Table 4 lists the average run length of each simulation run of the ZI and QI procedures.
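The SSMSE reported throughout the tables can be computed as follows (a minimal sketch; the function name is ours):

```python
import math

def ssmse(estimates, true_quantile):
    """Standardized square root of the mean square error: the RMSE of the
    quantile estimates across runs, divided by |true quantile|."""
    mse = sum((e - true_quantile) ** 2 for e in estimates) / len(estimates)
    return math.sqrt(mse) / abs(true_quantile)
```

Because of the division by |xp|, quantiles near zero (such as the 0.25 and 0.75 quantiles of the AR(1) processes) naturally show larger SSMSE for the same absolute error.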

4.2. The MA(1) processes

Another stochastic model that is covariance-stationary and admits an exact analysis of performance criteria is the first-order moving-average (MA(1)) process, generated by the recurrence


Table 3
Observed coverage of nominal 95% confidence QI quantile estimators for the AR(1) process (lower quantile; upper quantile; tolerance interval)

| φ    | p    | x̂p        | ĉ    | SSMSE    | p    | x̂p       | ĉ    | SSMSE    | ĉ (tol.) | β̂        |
|------|------|-----------|------|----------|------|----------|------|----------|----------|----------|
| 0.75 | 0.05 | −2.486790 | 1.00 | 0.002772 | 0.95 | 2.486790 | 1.00 | 0.006909 | 1.00     | 0.902603 |
| 0.75 | 0.10 | −1.937424 | 1.00 | 0.003323 | 0.90 | 1.938287 | 1.00 | 0.002986 | 1.00     | 0.803643 |
| 0.75 | 0.25 | −1.019266 | 0.91 | 0.005087 | 0.75 | 1.020238 | 0.87 | 0.005559 | 1.00     | 0.505292 |
| 0.90 | 0.05 | −3.773878 | 1.00 | 0.002685 | 0.95 | 3.773905 | 1.00 | 0.003070 | 1.00     | 0.902650 |
| 0.90 | 0.10 | −2.940186 | 0.99 | 0.002881 | 0.90 | 2.940705 | 0.98 | 0.003168 | 0.99     | 0.803568 |
| 0.90 | 0.25 | −1.540503 | 0.87 | 0.004748 | 0.75 | 1.547373 | 0.83 | 0.005673 | 1.00     | 0.505570 |

Table 4
Average run lengths of estimating the AR(1) process

| φ    | ZI      | QI      |
|------|---------|---------|
| 0.75 | 368852  | 470008  |
| 0.90 | 1606316 | 1325505 |

Table 5
True quantile values of the MA(1) process

| θ \ Quantile | 0.05      | 0.10      | 0.25      | 0.75     | 0.90     | 0.95     |
|--------------|-----------|-----------|-----------|----------|----------|----------|
| 0.75         | −2.056514 | −1.602161 | −0.842736 | 0.842736 | 1.602161 | 2.056514 |
| 0.90         | −2.213405 | −1.724389 | −0.907029 | 0.907029 | 1.724389 | 2.213405 |

Xi = μ + εi + θεi−1 for i = 1, 2, . . .,

where the εi values are i.i.d. N(0, 1) and 0 < θ < 1. It can be shown that the random variable Xi of the MA(1) process has an asymptotic N(0, 1 + θ²) distribution. The initial ε0 is randomly selected from the distribution N(0, 1 + θ²). We tested two MA(1) models with μ = 0 and θ set to 0.75 and 0.90. Table 5 lists the true quantile values of these MA(1) processes.

Tables 6 and 7 list the MA(1) experimental results of the ZI and QI algorithms. The θ column gives the correlation coefficient of the MA(1) process. The rest of the columns are as defined in Table 2. Table 8 lists the average run length of each simulation run of the ZI and QI procedures. Like the AR(1) output sequence, the MA(1) output sequence is also normally distributed, but the autocorrelation of the output sequence of the MA(1) process is not as strong as that of the AR(1) process with the same correlation coefficient. Thus, the average run lengths for the MA(1) process are smaller than for the AR(1) process with the same correlation coefficient; see Tables 4 and 8. The MA(1) sequences are only weakly correlated even with θ as large as 0.95. Therefore, the subsequence with samples taken every other observation appears to be independent, as determined by the runs-up test. Consequently, the difference in run lengths for different θ's is insignificant. Furthermore, the run lengths determined by the QI algorithm are smaller than those determined by the ZI


Table 6
Observed coverage of nominal 95% confidence ZI quantile estimators for the MA(1) process (lower quantile; upper quantile; tolerance interval)

| θ    | p    | x̂p        | ĉ    | SSMSE    | p    | x̂p       | ĉ    | SSMSE    | ĉ (tol.) | β̂        |
|------|------|-----------|------|----------|------|----------|------|----------|----------|----------|
| 0.75 | 0.05 | −2.056113 | 1.00 | 0.001481 | 0.95 | 2.056225 | 1.00 | 0.001531 | 0.99     | 0.901322 |
| 0.75 | 0.10 | −1.602087 | 1.00 | 0.001407 | 0.90 | 1.601771 | 1.00 | 0.001679 | 1.00     | 0.801911 |
| 0.75 | 0.25 | −0.843117 | 1.00 | 0.003029 | 0.75 | 0.843039 | 1.00 | 0.002703 | 1.00     | 0.503412 |
| 0.90 | 0.05 | −2.212860 | 1.00 | 0.001427 | 0.95 | 2.213224 | 1.00 | 0.001517 | 1.00     | 0.901258 |
| 0.90 | 0.10 | −1.724156 | 1.00 | 0.001616 | 0.90 | 1.723885 | 1.00 | 0.001502 | 1.00     | 0.801900 |
| 0.90 | 0.25 | −0.907213 | 1.00 | 0.003035 | 0.75 | 0.907356 | 1.00 | 0.002824 | 1.00     | 0.503371 |

Table 7
Observed coverage of nominal 95% confidence QI quantile estimators for the MA(1) process (lower quantile; upper quantile; tolerance interval)

| θ    | p    | x̂p        | ĉ    | SSMSE    | p    | x̂p       | ĉ    | SSMSE    | ĉ (tol.) | β̂        |
|------|------|-----------|------|----------|------|----------|------|----------|----------|----------|
| 0.75 | 0.05 | −2.057191 | 1.00 | 0.003012 | 0.95 | 2.056810 | 0.99 | 0.003286 | 1.00     | 0.902881 |
| 0.75 | 0.10 | −1.602056 | 1.00 | 0.003258 | 0.90 | 1.602405 | 1.00 | 0.003339 | 0.99     | 0.803762 |
| 0.75 | 0.25 | −0.843148 | 0.87 | 0.005573 | 0.75 | 0.843995 | 0.87 | 0.005393 | 1.00     | 0.505779 |
| 0.90 | 0.05 | −2.214062 | 1.00 | 0.003021 | 0.95 | 2.213626 | 1.00 | 0.003372 | 0.99     | 0.902834 |
| 0.90 | 0.10 | −1.724642 | 0.99 | 0.003485 | 0.90 | 1.724551 | 0.98 | 0.003276 | 1.00     | 0.803841 |
| 0.90 | 0.25 | −0.907151 | 0.84 | 0.907029 | 0.75 | 0.908344 | 0.88 | 0.005272 | 1.00     | 0.505791 |

Table 8
Average run lengths of estimating the MA(1) process

| θ    | ZI     | QI    |
|------|--------|-------|
| 0.75 | 316473 | 87611 |
| 0.90 | 334932 | 86838 |

algorithm when the sequences are only slightly correlated. If a sequence appears to be dependent, as determined by the runs-up test, the ZI procedure will iterate at least four times before it terminates. On the other hand, the QI procedure can terminate at the first step. The rest of the results have the same implications as those from the AR(1) process.

4.3. The M/M/1 delay-in-queue processes

We tested two M/M/1 queuing models. The service rate (μ) is set to 1.0 per period for all models. The arrival rate (λ) is set to 0.75 and 0.90 per period, respectively. Because the steady-state waiting-time distribution function F(x) of the M/M/1 delay in queue has a jump from 0 to 1 − ρ (where ρ = λ/μ < 1 is the traffic intensity) at 0, any quantile less than 1 − ρ cannot be estimated. Therefore, it is useful to know whether a desired quantile is attainable before conducting an informative experiment. Table 9 lists the true quantile values of these M/M/1 processes.
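The Table 9 values follow from inverting the steady-state M/M/1 waiting-time distribution F(x) = 1 − ρe^(−μ(1−ρ)x) for x ≥ 0. A minimal sketch (the function name is ours; for p below the atom 1 − ρ the quantile is 0, i.e., not attainable as a distinct value):

```python
import math

def mm1_wait_quantile(p, lam, mu=1.0):
    """p quantile of the steady-state M/M/1 delay in queue:
    F(x) = 1 - rho*exp(-mu*(1 - rho)*x) for x >= 0, with rho = lam/mu."""
    rho = lam / mu
    if p < 1.0 - rho:
        return 0.0  # F jumps from 0 to 1 - rho at x = 0
    return math.log(rho / (1.0 - p)) / (mu * (1.0 - rho))
```

For example, `mm1_wait_quantile(0.90, 0.75)` reproduces 8.059611 and `mm1_wait_quantile(0.15, 0.90)` reproduces 0.571584 from Table 9.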


Table 9
True quantile values of the M/M/1 process

| ρ \ Quantile | 0.15     | 0.30     | 0.40     | 0.90      | 0.95      |
|--------------|----------|----------|----------|-----------|-----------|
| 0.75         | –        | 0.275972 | 0.892574 | 8.059611  | 10.832200 |
| 0.90         | 0.571584 | 2.513144 | –        | 21.972238 | 28.903708 |

Table 10
Observed coverage of nominal 95% confidence ZI quantile estimators for the M/M/1 delay-in-queue process (lower quantile; upper quantile; tolerance interval)

| ρ    | p    | x̂p       | ĉ    | SSMSE    | p    | x̂p        | ĉ    | SSMSE    | ĉ (tol.) | β̂        |
|------|------|----------|------|----------|------|-----------|------|----------|----------|----------|
| 0.75 | 0.30 | 0.276019 | 1.00 | 0.005750 | 0.90 | 8.060307  | 1.00 | 0.001548 | 1.00     | 0.601401 |
| 0.75 | 0.40 | 0.892661 | 1.00 | 0.002517 | 0.95 | 10.831500 | 1.00 | 0.001936 | 1.00     | 0.551487 |
| 0.90 | 0.15 | 0.571839 | 1.00 | 0.007517 | 0.90 | 21.973193 | 1.00 | 0.003472 | 0.99     | 0.752862 |
| 0.90 | 0.30 | 2.513848 | 1.00 | 0.003628 | 0.95 | 28.921074 | 1.00 | 0.004263 | 0.99     | 0.653056 |

Table 11
Observed coverage of nominal 95% confidence QI quantile estimators for the M/M/1 delay-in-queue process (lower quantile; upper quantile; tolerance interval)

| ρ    | p    | x̂p       | ĉ    | SSMSE    | p    | x̂p        | ĉ    | SSMSE    | ĉ (tol.) | β̂        |
|------|------|----------|------|----------|------|-----------|------|----------|----------|----------|
| 0.75 | 0.30 | 0.276849 | 0.89 | 0.022524 | 0.90 | 8.063126  | 0.91 | 0.004930 | 1.00     | 0.605129 |
| 0.75 | 0.40 | 0.893522 | 0.83 | 0.008269 | 0.95 | 10.833309 | 1.00 | 0.005577 | 1.00     | 0.554908 |
| 0.90 | 0.15 | 0.573569 | 0.95 | 0.015814 | 0.90 | 21.990627 | 0.95 | 0.004293 | 0.99     | 0.754609 |
| 0.90 | 0.30 | 2.516554 | 0.92 | 0.006136 | 0.95 | 28.913703 | 1.00 | 0.004909 | 0.99     | 0.654343 |

Table 12
Average run lengths of estimating the M/M/1 process

| ρ    | ZI       | QI       |
|------|----------|----------|
| 0.75 | 11427120 | 1898586  |
| 0.90 | 12463922 | 14792893 |

Tables 10 and 11 list the M/M/1 process experimental results of the ZI and QI algorithms. The ρ column lists the traffic intensity. The rest of the columns are as defined previously. The results are generally similar to those from the AR(1) process. The quality of the QI quantile estimators worsens as the correlation becomes stronger. The QI SSMSEs of lower quantiles are large since the true quantile values are small and the variance of sample quantiles of the QI subsequence is large around lower quantiles. It is a drawback of the QI algorithm that it discards many observations when the sequences are strongly correlated. Table 12 lists the average run length of each simulation run of the ZI and QI procedures. The M/M/1 queuing process is strongly correlated, so the run lengths are longer than those for the AR(1) and MA(1) processes.


Table 13
True quantile values of the M/M/2 process

| ρ \ Quantile | 0.40     | 0.45     | 0.90     | 0.95     |
|--------------|----------|----------|----------|----------|
| 0.75         | 0.068993 | 0.156004 | 1.860752 | 2.553899 |

Table 14
Observed coverage of nominal 95% confidence ZI quantile estimators for the M/M/2 delay-in-queue process (lower quantile; upper quantile; tolerance interval)

| ρ    | p    | x̂p       | ĉ    | SSMSE    | p    | x̂p       | ĉ    | SSMSE    | ĉ (tol.) | β̂        |
|------|------|----------|------|----------|------|----------|------|----------|----------|----------|
| 0.75 | 0.40 | 0.069069 | 1.00 | 0.010638 | 0.90 | 1.861253 | 1.00 | 0.002624 | 0.99     | 0.502220 |
| 0.75 | 0.45 | 0.155927 | 1.00 | 0.006220 | 0.95 | 2.552710 | 1.00 | 0.003398 | 1.00     | 0.502273 |

4.4. The M/M/2 delay-in-queue processes

Finally, we tested an M/M/2 queuing model. The arrival rate (λ) is set to 3.0 per period and the service rate (μ) is set to 2.0 per period. For this queuing model, with s = 2 servers, ρ = λ/(sμ) = 0.75, and the theoretical steady-state waiting-time distribution is F(x) = 1 − (9/14)e^(−x) for x ≥ 0 (Hillier and Lieberman, 2001). Therefore, for this M/M/2 process quantiles less than 5/14 are not attainable. Table 13 lists the true quantile values of this M/M/2 process. Tables 14 and 15 list the M/M/2 process experimental results of the ZI and QI algorithms. The results are generally similar to those of the M/M/1 process. Since the pdf of the steady-state distribution of this M/M/2 process generally has a greater rate of descent than that of the M/M/1 process with the same traffic intensity, the quality of the estimators for this M/M/2 process is not as good as for the M/M/1 process. Table 16 lists the average run lengths.

4.5. Discussion of empirical experiments

The general conclusion is that the achieved accuracy is excellent. The majority of the estimated quantiles are within the maximum allowed proportional half-width of ε′. For each experiment, the confidence level of

Table 15
Observed coverage of nominal 95% confidence QI quantile estimators for the M/M/2 delay-in-queue process (lower quantile; upper quantile; tolerance interval)

| ρ    | p    | x̂p       | ĉ    | SSMSE    | p    | x̂p       | ĉ    | SSMSE    | ĉ (tol.) | β̂        |
|------|------|----------|------|----------|------|----------|------|----------|----------|----------|
| 0.75 | 0.40 | 0.069037 | 0.83 | 0.025720 | 0.90 | 1.861068 | 0.94 | 0.005157 | 1.00     | 0.505321 |
| 0.75 | 0.45 | 0.156025 | 0.84 | 0.013258 | 0.95 | 2.554835 | 1.00 | 0.005575 | 1.00     | 0.504778 |

Table 16
Average run lengths of estimating the M/M/2 process

| ρ    | ZI      | QI      |
|------|---------|---------|
| 0.75 | 6298125 | 1923323 |
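The Table 13 values can be checked by inverting the M/M/2 waiting-time distribution F(x) = 1 − (9/14)e^(−x) quoted in Section 4.4 (a sketch; the function name is ours, and it is hard-wired to this λ = 3, μ = 2 model):

```python
import math

def mm2_wait_quantile(p):
    """p quantile of the steady-state M/M/2 delay in queue with lambda = 3,
    mu = 2: F(x) = 1 - (9/14)*exp(-x), so x_p = ln((9/14)/(1 - p))."""
    if p < 5.0 / 14.0:
        return 0.0  # quantiles below 5/14 are not attainable
    return math.log((9.0 / 14.0) / (1.0 - p))
```

For example, `mm2_wait_quantile(0.90)` reproduces the Table 13 value 1.860752.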


the tolerance intervals with nominal coverage of at least 100β% is greater than the specified value of γ = 0.95. About 10% of the tolerance intervals contain more than 100β% of the distribution, but with either the lower or the upper tail area greater than desired. Among those 10%, half of the tolerance intervals overestimated both the lower and upper tolerance limits (i.e., P̂L > PL, PU > P̂U, and P̂L − PL < PU − P̂U) and the other half underestimated both (i.e., PL > P̂L, P̂U > PU, and PL − P̂L > P̂U − PU). Since the stopping rules of the ZI and QI algorithms are different, the run lengths may differ greatly. The quality of the QI quantile estimators is not as good as that of the ZI quantile estimators, especially when the sequences are strongly correlated and the pdf f(xp) has large values. However, the QI algorithm is simpler and easier to implement, and its performance can be improved if we can read through the entire sequence again.

To get an idea of the relative efficiency of the ZI and QI algorithms, we compare our results with some of the quantile-estimation results of the extended P² procedure from Raatikainen (1987a). We estimated the 0.25, 0.50, 0.75, and 0.90 quantiles for the waiting time of an M/M/1 queuing process with μ = 1.1 and λ = 1.0. The extended P² procedure is controlled by a relative-precision criterion, while our procedures are controlled by proportional precision. In Raatikainen's experiments the required relative precision is set to 0.224, while the proportional precision ε′ is set to 0.005 in the ZI and QI procedures. Note also that we are using different random-number streams. Table 17 lists the run lengths determined by each procedure. These results are from 100 independent experiments, each consisting of just one run to determine a run length for that experiment, as was done in the ZI and QI algorithms. The run lengths determined by the ZI algorithm are different for different quantiles of this M/M/1 delay in queue.
Lower sample quantiles stabilize quickly and require less data. In contrast, higher sample quantiles are volatile and require more data. Therefore, we list the run lengths of the zoom-in algorithm for estimating the 0.25 and 0.90 quantiles. Moreover, since the run length increases by 10% at each iteration in the ZI algorithm, we list the proportion of the run lengths in certain ranges instead of each individual run length. For example, 31% of the run lengths are between 2,554,923 and 4,666,181. Table 18 lists the SSMSE for these three procedures. The SSMSE of the extended P² algorithm deteriorated when estimating high quantiles, in this case the 0.90 quantile. Because the run lengths used in the ZI and QI

Table 17
Summary of run-length characteristics for the M/M/1 queuing process with μ = 1.1 and λ = 1.0 (entries are proportions of the 100 experiments)

| Run length               | Extended P² | ZI(0.25) | ZI(0.90) | QI         |
|--------------------------|-------------|----------|----------|------------|
| 133,103                  |             | 0.01     |          |            |
| 262,144                  | 0.06        |          |          |            |
| 345,243 to 505,473       |             | 0.02     |          |            |
| 524,288                  | 0.64        |          |          |            |
| 611,624 to 985,030       |             | 0.07     |          |            |
| 1,048,576                | 0.30        |          |          |            |
| 1,083,534 to 1,191,888   |             | 0.04     |          |            |
| 1,311,077 to 2,322,657   |             | 0.19     |          |            |
| 2,554,923 to 4,666,181   |             | 0.31     |          |            |
| 4,947,456                |             |          |          | 0.08       |
| 4,978,827 to 6,706,185   |             | 0.36     |          |            |
| 9,894,912                |             |          |          | 0.62       |
| 10,753,433 to 12,960,651 |             |          | 1.00     |            |
| 19,789,824               |             |          |          | 0.26       |
| 39,579,648               |             |          |          | 0.03       |
| 79,159,296               |             |          |          | 0.01       |
| Average                  | 619,180     | 3,750,881| 11,482,572| 13,654,978|


Table 18
Standardized square roots of mean square error

| Quantile | Extended P² | ZI     | QI     |
|----------|-------------|--------|--------|
| 0.25     | 0.0207      | 0.0163 | 0.0134 |
| 0.50     | 0.0225      | 0.0070 | 0.0070 |
| 0.75     | 0.0250      | 0.0065 | 0.0068 |
| 0.90     | 0.0405      | 0.0069 | 0.0080 |

algorithms are larger than in the extended P² algorithm, their SSMSEs are smaller. The run length of this M/M/1 delay-in-queue process determined by the ZI algorithm increases as the estimated quantile increases; thus, the ZI quantile estimators have smaller SSMSE when estimating high quantiles. On the other hand, the extended P² and QI algorithms use the same run length when estimating different quantiles of the same stochastic process.

5. Concluding remarks

We have proposed a proportional half-width CI for estimating quantiles and have presented two algorithms for estimating the quantile xp and tolerance interval of a stationary process. Some quantile estimates require more observations than others before the asymptotics necessary for quantile estimation become valid. Our zoom-in and quasi-independent algorithms work well in determining the required run length for a valid asymptotic approximation. The results from our empirical experiments show that the procedures are excellent in achieving the pre-specified accuracy. However, the variance of the run length from our sequential procedures is large. This is not only because of the randomness of the output sequence, but also because we double the lag length l at each step in our QI procedure. Because the run length grows rapidly at later steps, further research is needed to develop new algorithms that slow the rate of growth of the run length at later steps.

The ZI algorithm requires storing and sorting only a set of values that the quantile is most likely to assume, and the QI algorithm requires storing and sorting only a subsequence of observations most likely to represent the underlying distribution. Savings in storage and sorting are substantial for these methods. Our approaches have the desirable property of being sequential procedures that do not require users to have a priori knowledge of values that the data might assume. This allows the user to apply these methods without having to execute a separate pilot run to determine the range of values to be expected, or to guess and risk having to re-run the simulation. Both of these options represent potentially large costs to the user because many realistic simulations are time-consuming to run. The ZI algorithm can obtain exact sample quantiles in one pass and has better performance in terms of CI coverage and SSMSE.
However, it is computationally intensive to sort observations in the likelihood buffer at each iteration. The advantage of the QI approach is that by using a straightforward runs-up test to determine the run length and obtain independent samples that can approximate the underlying distribution, we can apply classical statistical techniques directly instead of resorting to advanced statistical theory, making it easy to understand and simple to implement. Even though the QI procedure generally has the longest run length, only the observations in the QI subsequence are used to estimate quantiles. Thus, the performance is not as good when the sequences are strongly correlated. To eliminate the drawback of discarding observations, further research would be needed to develop quantile-estimation procedures that take into account all observations while using the QI subsequence to determine the run length.


Acknowledgements

We would like to thank the Ohio Supercomputer Center for the Origin 2000 time grant, Marty Levy, Bruce Schmeiser, Ying-Chieh Yeh, Yan Yu, Yong Liu, and the anonymous referees for their valuable comments.

Appendix A. Proof of Proposition 1

$C_n$ has a binomial distribution with parameters $n$ and $p$, so $E(C_n) = np$ and $\mathrm{Var}(C_n) = np(1-p)$. By Chebyshev's inequality, for any $k > 0$,

$$\Pr\bigl[\,|C_n - np| < k\sqrt{np(1-p)}\,\bigr] \ge 1 - 1/k^2.$$

Let $k = \epsilon'\sqrt{n/[p(1-p)]}$, which is clearly positive. Then

$$\Pr\bigl[\,|C_n/n - p| < \epsilon'\,\bigr] = \Pr\bigl[\,|C_n - np| < n\epsilon'\,\bigr] = \Pr\bigl[\,|C_n - np| < k\sqrt{np(1-p)}\,\bigr] \ge 1 - \frac{1}{k^2} = 1 - \frac{p(1-p)}{n\epsilon'^2}.$$

Setting the right-hand side of this to $1 - \alpha$ and solving for $n$ yields

$$n = \frac{p(1-p)}{\alpha\,\epsilon'^2},$$

and we round up to the next integer to obtain $n_p$. □
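The sample-size bound of Proposition 1 is easy to evaluate numerically. In the sketch below the function name is ours, and eps plays the role of the precision ε′ in the proof:

```python
import math

def required_sample_size(p, eps, alpha):
    """Chebyshev-based n_p such that Pr(|C_n/n - p| < eps) >= 1 - alpha."""
    return math.ceil(p * (1 - p) / (alpha * eps ** 2))

# Estimating the median (p = 0.5) to within eps = 0.1 with 90% confidence
# requires n_p = 250 observations under this (conservative) bound.
n_p = required_sample_size(0.5, 0.1, 0.1)
```

Because Chebyshev's inequality is distribution-free, this bound is conservative; the Gauss-inequality refinement of Appendix B tightens the constant.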

Appendix B. Proof that $z_{1-\alpha/2}^2 \le (4/9)(1/\alpha)$

Let $Z$ be a $N(0,1)$ random variable, so $\Pr(|Z| < z_{1-\alpha/2}) = 1 - \alpha$. Also, since $N(0,1)$ is unimodal and symmetric, we have from Gauss's inequality that for $k > 0$,

$$\Pr(|Z| < k) \ge 1 - \frac{4}{9k^2}$$

(see, for example, Weisstein, 2003, http://mathworld.wolfram.com/GausssInequality.html). Since $0 < \alpha < 1$, it is clear that $z_{1-\alpha/2} > 0$, so we can take $k = z_{1-\alpha/2}$ in Gauss's inequality. Then

$$1 - \alpha = \Pr(|Z| < z_{1-\alpha/2}) \ge 1 - \frac{4}{9\,z_{1-\alpha/2}^2}.$$

Rearranging this yields the result. □

References

Avramidis, A.N., Wilson, J.R., 1998. Correlation-induction techniques for estimating quantiles in simulation experiments. Operations Research 46, 574–591.
Billingsley, P., 1999. Convergence of Probability Measures, second ed. John Wiley, New York.
Chen, E.J., Kelton, W.D., 1999. Simulation-based estimation of quantiles. In: Farrington, P.A., Nembhard, H.B., Sturrock, D.T., Evans, G.W. (Eds.), Proceedings of the 1999 Winter Simulation Conference, pp. 428–434.
Chen, E.J., Kelton, W.D., 2003. Determining simulation run length with the runs test. Simulation Modelling Practice and Theory 11 (3–4), 237–250.
David, H.A., 1981. Order Statistics, second ed. John Wiley, New York.
Heidelberger, P., Lewis, P.A.W., 1984. Quantile estimation in dependent sequences. Operations Research 32, 185–209.


Hesterberg, T.C., Nelson, B.L., 1998. Control variates for probability and quantile estimation. Management Science 44, 1295–1312.
Hillier, F.S., Lieberman, G.J., 2001. Introduction to Operations Research, seventh ed. McGraw-Hill, Boston.
Hogg, R.V., Craig, A.T., 1995. Introduction to Mathematical Statistics, fifth ed. Prentice-Hall, Englewood Cliffs, NJ.
Hsu, J.C., Nelson, B.L., 1990. Control variates for quantile estimation. Management Science 36, 835–851.
Hurley, C., Modarres, R., 1995. Low-storage quantile estimation. Computational Statistics 10, 311–325.
Iglehart, D.L., 1976. Simulating stable stochastic systems, VI: Quantile estimation. Journal of the Association for Computing Machinery 23, 347–360.
Jain, R., Chlamtac, I., 1985. The P² algorithm for dynamic calculation of quantiles and histograms without storing observations. Communications of the ACM 28, 1076–1085.
Knuth, D.E., 1998. The Art of Computer Programming, vol. 2, third ed. Addison-Wesley, Reading, MA.
Raatikainen, K.E.E., 1987a. Run length control for simultaneous estimation of several percentiles in dependent sequences. In: Proceedings of the Conference on Methodology and Validation, Orlando, Society for Computer Simulation.
Raatikainen, K.E.E., 1987b. Simultaneous estimation of several percentiles. Simulation 49, 159–164.
Raatikainen, K.E.E., 1990. Sequential procedure for simultaneous estimation of several percentiles. Transactions of the Society for Computer Simulation 7, 21–44.
Sees, J.C., Jr., Shortle, J., 2002. Simulating M/G/1 queues with heavy-tailed service. In: Yücesan, E., Chen, C.-H., Snowdon, J.L., Charnes, J.M. (Eds.), Proceedings of the 2002 Winter Simulation Conference, pp. 433–438.
Seila, A.F., 1982a. A batching approach to quantile estimation in regenerative simulations. Management Science 28, 573–581.
Seila, A.F., 1982b. Estimation of percentiles in discrete event simulation. Simulation 39, 193–200.
Sen, P.K., 1972. On the Bahadur representation of sample quantiles for sequences of φ-mixing random variables. Journal of Multivariate Analysis 2, 77–95.
van Zwet, W.R., 1964. Convex transformations of random variables. Mathematical Centre Tracts 7, Amsterdam, The Netherlands.
Weisstein, E., 2003. Eric Weisstein's World of Mathematics. Wolfram Research, Inc., http://mathworld.wolfram.com/.