Computational Statistics and Data Analysis 55 (2011) 3183–3196
Some variants of adaptive sampling procedures and their applications

Raghu Nandan Sengupta a,∗, Angana Sengupta b

a Department of Industrial & Management Engineering, Indian Institute of Technology Kanpur, Kanpur - 208 016, India
b Delhi Public School Kalyanpur Kanpur, Kanpur - 208 017, India
Article history: Received 19 November 2009; received in revised form 24 May 2011; accepted 29 May 2011; available online 12 June 2011.

Keywords: Decision analysis; Normal; Exponential; Gamma; Extreme value; Sequential sampling; Loss functions; Squared error loss; Linear exponential loss; Bounded risk; Simulation; Business applications
Abstract: As a sampling technique, sequential analysis facilitates efficient statistical inference by requiring fewer observations than the fixed-sample method. The optimal stopping rule dictates the sample size and hence the statistical inference deduced thereafter. In this research we propose three variants of the existing multistage sampling procedures, which we name the (i) Jump and Crawl (JC), (ii) Batch Crawl and Jump (BCJ) and (iii) Batch Jump and Crawl (BJC) sequential sampling methods. We use the (i) normal, (ii) exponential, (iii) gamma and (iv) extreme value distributions for point estimation problems under bounded risk conditions. We highlight the efficacy of choosing the right adaptive sampling plan for the bounded risk problems for these four distributions under two different loss functions, namely the (i) squared error loss (SEL) and (ii) linear exponential (LINEX) loss functions. We compare and analyze our proposed methods against existing sequential sampling techniques, and the importance of this study is highlighted using extensive simulation runs. Crown Copyright © 2011 Published by Elsevier B.V. All rights reserved.
1. Introduction

Multistage sampling, or sequential analysis, was first developed as a tool by Wald along with Wolfowitz (Wald and Wolfowitz, 1945) for more efficient industrial quality control. The same approach was independently developed at about the same time by Alan Turing, to test the hypothesis of whether different messages coded by German Enigma machines should be connected and analyzed together. Wald (1947) developed this branch of statistics using the concept of the sequential probability ratio test (SPRT). In the course of time, different variations of multistage sampling techniques have been developed, such as those proposed by Hall (1981, 1983), Liu (1997), Mukhopadhyay (1990), Mukhopadhyay and Solanky (1991), Ray (1957) and Stein (1945). As a sampling technique this method is efficient, as it requires fewer observations than the fixed-sample method. A few good references are Ghosh et al. (1997), Ghosh and Sen (1991), Govindarajulu (2004), Mukhopadhyay et al. (2004), Mukhopadhyay and de Silva (2009), Mukhopadhyay and Solanky (1994), Schmitz (1993), Siegmund (1985), Wald (1947) and Zacks (2009). In recent times sequential analysis has found wide application in areas like regime switching models (Carvalhoa and Lopes, 2007; Bollen et al., 2000), the study of optimal clinical trials and novel therapies (Cui et al., 2009; Salvan, 1990; Orawo and Christen, 2009;
∗ Corresponding author. Tel.: +91 512 259 6607; fax: +91 512 259 7553. E-mail address: [email protected] (R.N. Sengupta).
doi:10.1016/j.csda.2011.05.020
Prado, in press; Todd et al., in press), and the analysis of financial time series (Bai and Perron, 2002; Alp and Demetrescu, 2010). In this paper we propose three different variants of the existing multistage sampling procedures available in the literature, and name them the (i) Jump and Crawl (JC), (ii) Batch Crawl and Jump (BCJ) and (iii) Batch Jump and Crawl (BJC) sequential sampling methodologies. We consider the normal, exponential, gamma and extreme value distributions separately to show the advantage of using the right adaptive sampling plan. We solve the bounded risk point estimation problems for (i) µ, the location parameter, (ii) λ, again the location parameter, (iii) α, the shape parameter and finally (iv) E(X) = µ + γ_E σ (where µ and σ are the location and scale parameters respectively, while γ_E is Euler's constant) for the (i) normal, (ii) exponential, (iii) gamma and (iv) extreme value distributions respectively. The two loss functions used for the bounded risk problem formulations are the squared error loss (SEL) and the linear exponential (LINEX) loss functions. To solve these bounded risk point estimation problems we use our proposed sequential sampling plans, under some specified constraints for these distributions. Finally we test the validity of our models using extensive simulation runs. The paper is organized as follows. In Section 2 we describe the concepts of a loss function and bounded risk, along with brief descriptions of a few existing sequential sampling schemes. Our proposed models are discussed in Section 3, while Section 4 deals with the background for the simulation runs conducted for the four distributions. Section 5 gives the detailed analysis related to all four sets of simulations, and we conclude in Section 6.

2. Loss functions, bounded risk, sequential sampling methodologies

2.1. Loss functions

In a point estimation problem our aim is to find an estimator, T_n, based on a sample of observations, X_1, X_2, ..., X_n, to estimate the parameter θ. T_n should be unbiased as well as consistent, but this may be unlikely because of the randomness of the sample, and imposing consistency and unbiasedness may not always lead to a unique estimate. To overcome this problem we utilize a non-negative metric, called the loss function, L(T_n, θ), where L(T_n, θ) = f(∆) is usually a function of ∆ = (T_n − θ), the error of estimation. The accuracy of an estimator under a loss function is measured by the corresponding risk function, R(T_n, θ) = E[L(T_n, θ)], and our main concern is to minimize this risk by a proper choice of the estimator T_n. Ideally we try to find the optimal estimator, T_n*, for which R(T_n*, θ) ≤ R(T_n, θ) ∀ θ ∈ Θ. The optimal estimator thus obtained is called the minimum risk estimator (MRE) with respect to that particular loss function only. Practically, for a particular loss function we may not be able to find T_n*, as the value of R(T_n, θ) usually depends on the sample size n and the parameter θ. The general risk may thus be expressed as E[L(T_n, θ) + c(n)], a sum of two terms: the first, L(T_n, θ), is non-increasing in n, while the second, the cost of sampling c(n), is non-decreasing in n. Hence it may become difficult to find the optimal value of n such that the risk E[L(T_n, θ) + c(n)] is minimized or made as low as possible. Before we discuss the method to find the optimal estimate corresponding to the minimum risk, we give examples of different types of loss functions available in the literature.
Loss functions are of various types, a few examples being the (i) squared error loss (SEL) function, (ii) weighted squared error loss function, (iii) linear loss function, (iv) non-uniform linear loss function, (v) 0-1 loss function, (vi) balanced loss function (BLF), Zellner (1994), (vii) squared exponential loss function and (viii) linear exponential (LINEX) loss function, Zellner (1986). Without going into the detail of each, we discuss the SEL and LINEX loss functions only, as our analysis for all four distributions is based on the risks associated with these two. Squared error loss (SEL) function: The SEL is of the form L(T_n, θ) = (T_n − θ)² and is the most widely used loss function. It is used in estimation problems when unbiased estimators of θ are considered, since the risk, R(T_n, θ) = E[L(T_n, θ)] = E[(T_n − θ)²], is the mean square error (MSE) of T_n, which reduces to the variance of T_n under unbiasedness. The corresponding optimal estimator, if it exists, is called the minimum variance unbiased (MVU) estimator. It must be remembered that the weighted squared error loss, w(θ)(θ − T_n)², is a variant of the SEL, where the choice of the weight w(θ) depends on the specific value of θ ∈ Θ (Θ being the parameter space). Linear exponential (LINEX) loss function: An important point which the SEL ignores is that overestimation and underestimation of θ may be of unequal importance in many situations. A loss function which takes care of this is the linear exponential (LINEX) loss function (Zellner, 1986), an asymmetric convex loss function given by L(T_n, θ) = b[e^{a(T_n−θ)} − a(T_n − θ) − 1], where a and b are the shape and scale parameters respectively. One can easily see that for a > 0 the convex loss increases almost exponentially (linearly) for positive (negative) values of the error ∆.
Therefore overestimation is of more serious concern than underestimation, while for a < 0 the trend is just the opposite. It is quite interesting to note that as a → 0, L(∆) = b[1 + a∆/1! + a²∆²/2! + ⋯ − a∆ − 1] ≈ b(a²/2)∆² + o(a²), i.e., the LINEX loss reduces to the SEL function for small values of a. The value of b does not affect the property of the loss function; it only scales the LINEX loss without affecting its shape.
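As a quick numeric illustration of these two properties (asymmetry for a > 0, and the SEL limit as a → 0), the following minimal sketch may help; it is our own code, not part of the paper, and the function name `linex_loss` and the chosen values of a, b and ∆ are illustrative assumptions:

```python
import math

def linex_loss(delta, a, b=1.0):
    """LINEX loss L(Delta) = b*(e^{a*Delta} - a*Delta - 1) for estimation error Delta."""
    return b * (math.exp(a * delta) - a * delta - 1.0)

# Asymmetry: with a > 0, overestimation (Delta > 0) is penalised far more
# heavily than underestimation of the same magnitude.
over = linex_loss(2.0, a=1.0)    # grows roughly exponentially in Delta
under = linex_loss(-2.0, a=1.0)  # grows roughly linearly in |Delta|

# SEL limit: for small a, L(Delta) is approximately b*(a^2/2)*Delta^2.
a_small, delta = 1e-4, 0.5
sel_like = 1.0 * (a_small ** 2 / 2.0) * delta ** 2
exact = linex_loss(delta, a=a_small)
```

Here `over` evaluates to e² − 3 while `under` is e⁻² + 1, confirming the asymmetry, and `exact` agrees with the quadratic approximation `sel_like` to high precision.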
2.2. Bounded risk

Let us illustrate the concept of bounded risk and its implication with a simple example. Consider a normal distribution with probability density function (p.d.f.) f(x; µ, σ) = (1/(σ√(2π))) e^{−(x−µ)²/(2σ²)}, −∞ < x < ∞, −∞ < µ < ∞, 0 < σ < ∞, with both the mean (µ) and the variance (σ²) unknown. Suppose one is interested in the point estimate of µ (the location parameter) subject to the SEL function L(T_n, θ) = (T_n − θ)². After recording the observations X_1, X_2, ..., X_n, we find the sample mean X̄_n = (1/n) Σ_{i=1}^n X_i, which is the estimator of µ. The corresponding risk is R(X̄_n, µ) = E[L(X̄_n, µ)] = σ²/n. Next suppose we require this risk not to exceed a pre-assigned known value, w (> 0). We immediately see that if σ² (the square of the scale parameter) is known, then the optimal sample size n is D = σ²/w. But with σ² unknown, the problem cannot be solved by any fixed sampling technique.
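With σ² known, the fixed-sample solution is immediate; a small sketch (our own, with illustrative numbers) computes the smallest n meeting the bound:

```python
import math

def fixed_sample_size(sigma_sq, w):
    """Smallest n with SEL risk sigma^2/n <= w, when sigma^2 is known."""
    return math.ceil(sigma_sq / w)

# Example: sigma^2 = 1 and w = 0.008 give D = 125, since 1/125 = 0.008 <= w.
D = fixed_sample_size(1.0, 0.008)
```

The whole difficulty addressed by the sequential methods below is that σ² is unknown, so this one-line computation is unavailable.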
2.3. Sequential sampling methodologies

From the previous section one is aware that, as σ² is unknown, D = σ²/w is also unknown. Hence a fixed sampling rule will not help us solve the problem of finding the minimum sample size, and one has to take recourse to some multistage or adaptive sampling technique. In order to retain continuity and keep the paper crisp, we restrict our discussion to a few of the multistage or adaptive sampling methodologies used in the literature to circumvent such a bounded risk estimation problem.

Two-stage sampling procedure: Stein (1945) considered a two-stage sampling procedure, where at the first stage a sample of size m (≥2) is drawn to estimate the unknown quantity D by N. Here N = max{m, ⌈S_m²/w⌉} is the estimate of the number of observations needed to satisfy the bound placed by w. The methodology works as follows. Start with X_1, X_2, ..., X_m observations in a single batch and determine N. If N = m, then we stop and do not take any more observations in the second stage. However, if N > m, then one samples an additional (N − m) observations in the second stage. Based on the total observations X_1, X_2, ..., X_N, the estimator X̄_N = (1/N) Σ_{i=1}^N X_i is calculated. We must remember that since S_m² is the result of a random procedure, the sample size N is also a random number.

Purely sequential sampling procedure: Ray (1957) considered a purely sequential methodology, which starts with a sample of size m (≥2) and continues to take one observation at a time until N = inf{n ≥ m : n ≥ S_n²/w}, where S_n² = (1/(n−1)) Σ_{i=1}^n (X_i − X̄_n)² is the best estimate of σ², recalculated each time the sample size n changes. In other words, the estimator is updated at each stage with the arrival of each new observation, until the stopping rule is met for the very first time. Once sampling stops, µ is estimated by X̄_N. Asymptotic results establish the superiority of the purely sequential sampling procedure over Stein's two-stage procedure from a statistical asymptotic viewpoint, though not necessarily from the practical perspective.

Three-stage sampling procedure: Even though, from the theoretical standpoint, the purely sequential procedure satisfies the asymptotic second-order efficiency property, one immediately realizes that taking one observation at a time, as is done in the purely sequential scheme, is practically inconvenient. Hence, in order to save sampling operations and at the same time maintain the second-order property, Hall (1981) and Mukhopadhyay (1990) considered the three-stage sampling procedure. The methodology is as follows. Let m = O(D^{1/r}), for some r > 1. That is, the starting sample size m is allowed to grow, but in such a manner that m/D → 0 as w → 0, which implies that m is allowed to increase at a slower rate than D as w becomes smaller. After having fixed 0 < ρ < 1 and with the starting sample size of m (≥2), let

T = max{m, ⌈ρ S_m²/w⌉},
N = max{T, ⌈S_T²/w⌉}.
Here T estimates ρ × D, a fraction of D. If T = m, then we do not sample any more in the second stage, but if T > m, one samples the difference (T − m) in one single batch. Based on the observations {X_1, X_2, ..., X_T} one now finds N, which is the estimator of D. If N = T, then we take no more samples in the third stage, but if N > T, the remaining (N − T) observations are taken in the third stage. After the sampling procedure terminates, the estimator X̄_N = (1/N) Σ_{i=1}^N X_i determines the estimated value of µ (the location parameter of the normal distribution). One must remember that even if there is a huge amount of variability in the last (N − T) observations, we are still certain to terminate the sampling procedure under the same stopping criterion. If the variability in the last (N − T) observations is appreciable, the number of observations one needs to take in the third, i.e., the last, stage would be quite
high. It is observed that such a three-stage procedure, apart from obeying the asymptotic consistency and asymptotic first-order efficiency properties, also obeys the asymptotic second-order efficiency property (Hall, 1981).

Accelerated sequential sampling procedure: Another variation of the purely sequential methodology has been considered by Hall (1983) and Mukhopadhyay and Solanky (1991), which cuts down on cost by accelerating the original sequential procedure. Here also one starts with a sample size of m (≥2) and, after having fixed 0 < ρ < 1, defines S_n*² = (n/(n−1)) S_n² and
R = inf{n ≥ m : n ≥ ρ S_n*²/w},
T = ⌈S_R*²/w⌉,
N = max{R, T}.

Thus one first samples purely sequentially, obtaining X_1, X_2, ..., X_R such that R estimates ρ × D, and then proceeds to estimate D by N. If T = R, then we take no more samples, but if T > R, then one samples (T − R) observations in one single batch, thus curtailing sampling operations, and comes up with the estimate of the location parameter µ. The asymptotic second-order properties of such accelerated sequential procedures have been developed by Hall (1983) and Mukhopadhyay and Solanky (1991).

Batch sequential sampling procedure: Liu (1997) proposed the batch sequential sampling procedure, where we first consider 0 < ρ_1 < ρ_2 < ⋯ < ρ_{k−1} < 1. We also specify r_1 ≥ r_2 ≥ ⋯ ≥ r_k ≥ 1 and the t_i's, where r_i (i = 1, 2, ..., k) denotes the minimum number of observations one takes at each and every step in the ith batch, while t_i is the number of such steps one is required to take in that ith batch. The minimum number of observations means the number of observations one takes at one go. Thus if we have k batches, then for the ith (i = 1, 2, ..., k) batch the number of observations one would take is t_i × r_i, and for the whole batch sequential sampling procedure it is m + Σ_{i=1}^k t_i × r_i, where m (≥2) is the number of observations required to initiate the procedure. Remember, these m (≥2) observations are taken at one go in the first step, which is literally the zeroth batch. One should also remember that r_1, r_2, ..., r_k and t_i ∈ Z⁺. The procedure works as follows. Start with a sample size of m (≥2) and for each batch follow the sampling rule given below:
R_1 = inf{n ≥ (m + r_1 × t_1) : n ≥ ρ_1 S_n²/w}   (batch #1, estimates ρ_1 D)
R_2 = inf{n ≥ (R_1 + r_2 × t_2) : n ≥ ρ_2 S_n²/w}   (batch #2, estimates ρ_2 D)
...
R_{k−1} = inf{n ≥ (R_{k−2} + r_{k−1} × t_{k−1}) : n ≥ ρ_{k−1} S_n²/w}   (batch #(k − 1), estimates ρ_{k−1} D)
N = inf{n ≥ (R_{k−1} + r_k × t_k) : n ≥ S_n²/w}   (batch #k, estimates D).
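To make the stopping rules of this section concrete, here is a minimal sketch, written by us and not taken from the paper, of Stein's two-stage rule and Ray's purely sequential rule for the normal-mean problem of Section 2.2; the function names and simulation settings are our own illustrative choices:

```python
import math
import random

def sample_var(xs):
    """Unbiased sample variance S_n^2."""
    n, mean = len(xs), sum(xs) / len(xs)
    return sum((x - mean) ** 2 for x in xs) / (n - 1)

def stein_two_stage(draw, m, w):
    """Pilot of size m, then N = max(m, ceil(S_m^2/w)) taken in one second batch."""
    xs = [draw() for _ in range(m)]
    n_final = max(m, math.ceil(sample_var(xs) / w))
    xs += [draw() for _ in range(n_final - m)]
    return xs

def purely_sequential(draw, m, w):
    """One observation at a time until n >= S_n^2/w, re-estimating S_n^2 each step."""
    xs = [draw() for _ in range(m)]
    while len(xs) < sample_var(xs) / w:
        xs.append(draw())
    return xs

rng = random.Random(11)
draw = lambda: rng.gauss(10.0, 1.0)  # N(mu = 10, sigma^2 = 1), as in Section 5
xs = purely_sequential(draw, m=10, w=0.008)
N, mu_hat = len(xs), sum(xs) / len(xs)  # N should land near D = sigma^2/w = 125
```

The three-stage, accelerated and batch rules follow the same pattern, merely changing when observations are taken singly and when in batches.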
3. Proposed models of multistage sampling methodologies

In line with the different multistage sampling procedures discussed in Section 2.3, we propose three variants of the sequential sampling plan, viz., the (i) Jump and Crawl (JC), (ii) Batch Crawl and Jump (BCJ) and (iii) Batch Jump and Crawl (BJC) sequential sampling procedures. These are hybrid or modified multistage sampling techniques similar to Hall (1981), Mukhopadhyay (1990) and Liu (1997).

3.1. Jump Crawl (JC) sequential sampling technique

To illustrate the Jump Crawl sequential sampling methodology, we first give the schematic diagram of the procedure in Fig. 1. The methodology works as follows. We start with an initial sample of size m (≥2) and also choose a value of ρ (0 < ρ < 1). After that we jump, i.e., collect a large sample of observations at one go, to estimate ρ × D (by R), keeping in mind that R = max{m, ⌈ρ S_m²/w⌉}. Once ρ × D is estimated we check whether we have completed our sampling procedure. If not, we proceed purely sequentially, i.e., take one observation at a time (we crawl), following the rule T = inf{n ≥ R : n ≥ S_n²/w}. Finally the random sample size N = max{R, T} is found, which estimates D. One should be aware that the choice of ρ depends on the compromise one makes between efficiency of the result and ease of sampling. A high value of ρ means we reduce our sampling effort, but at the cost of a larger value of the estimated sample size N. On the other hand, for a low ρ the reverse holds true: N as well as the estimated results are close to the optimal values, but the effort one spends in conducting the experiment is high.

Fig. 1. Jump Crawl (JC) sequential sampling technique.

3.2. Batch Crawl and Jump (BCJ) sequential sampling technique

This proposed Batch Crawl and Jump (BCJ) sequential sampling methodology follows from Liu (1997) and, for convenience of understanding, is illustrated in Fig. 2. The basic notion of this procedure differs from that of Liu (1997) in that for each individual batch we first proceed purely sequentially (crawl) and then literally jump after a certain number of stages to estimate the values of the ρ_i × D's. In order to explain the scheme of sampling, one first needs to specify γ_i, ρ_{i−1} and k, where i = 1, 2, ..., k. Here k is the number of batches, which is fixed at the beginning of the experiment depending on the experimenter's choice of the sampling scheme. Now 0 = ρ_0 < ρ_1 < ρ_2 < ⋯ < ρ_{k−1} < 1 are the fractions of the total actual sample size D, which is unknown and which one needs to estimate. The values of the ρ_{i−1}'s are also specified before the experiment. So if we consider BCJ1 as illustrated in Fig. 2, then we need to find ρ_1 × D, using its estimate R_1. The γ_i's (0 < γ_i < 1) are the corresponding fractions of the ith batch itself, i.e., they fix the stage in the ith batch up to which we continue sampling one observation at a time, i.e., continue using the purely sequential methodology. This effectively means we continue the purely sequential methodology till the (γ_i × ρ_i × D)th stage in each ith batch.

Fig. 2. Batch Crawl and Jump (BCJ) sequential sampling technique.
After that, one jumps at one go to estimate ρ_i × D. Thus the procedure works as follows. Start with a sample size of m (≥2) and for each batch follow the crawl-and-jump sampling rule according to the scheme given below:
T_1 = inf{n ≥ m : n ≥ γ_1 × ρ_1 S_n²/w},  U_1 = ⌈ρ_1 S_{T_1}²/w⌉,  R_1 = max{T_1, U_1}   (batch #1)
T_2 = inf{n ≥ R_1 : n ≥ γ_2 × ρ_2 S_n²/w},  U_2 = ⌈ρ_2 S_{T_2}²/w⌉,  R_2 = max{T_2, U_2}   (batch #2)
...
T_{k−1} = inf{n ≥ R_{k−2} : n ≥ γ_{k−1} × ρ_{k−1} S_n²/w},  U_{k−1} = ⌈ρ_{k−1} S_{T_{k−1}}²/w⌉,  R_{k−1} = max{T_{k−1}, U_{k−1}}   (batch #(k − 1))
T_k = inf{n ≥ R_{k−1} : n ≥ γ_k S_n²/w},  U_k = ⌈S_{T_k}²/w⌉,  N = max{T_k, U_k}   (batch #k).

Once sampling stops, we calculate the sample estimator X̄_N = (1/N) Σ_{i=1}^N X_i. We must remember that R_1, R_2, ..., R_{k−1} and N estimate the values of ρ_1 × D, ρ_2 × D, ..., ρ_{k−1} × D and D respectively. In choosing the values of ρ_1, ..., ρ_{k−1}, the basic trade-off between efficiency of the sampling results and cost of sampling is similar to that mentioned in Section 3.1 for the Jump Crawl (JC) sequential sampling technique.

Fig. 3. Batch Jump and Crawl (BJC) sequential sampling technique.
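The BCJ rule above admits a compact sketch; this is our own code, not the paper's, and for concreteness we take ρ_k = 1 for the final batch and the ceiling of each jump target:

```python
import math
import random

def sample_var(xs):
    n, mean = len(xs), sum(xs) / len(xs)
    return sum((x - mean) ** 2 for x in xs) / (n - 1)

def bcj(draw, m, w, rhos, gammas):
    """Batch Crawl-and-Jump: in batch i, crawl one observation at a time until
    n >= gamma_i * rho_i * S_n^2 / w (stage T_i), then jump in one go to
    ceil(rho_i * S^2 / w) (stage U_i); rhos[-1] should be 1 so N estimates D."""
    xs = [draw() for _ in range(m)]
    for rho, gamma in zip(rhos, gammas):
        while len(xs) < gamma * rho * sample_var(xs) / w:   # crawl phase
            xs.append(draw())
        target = math.ceil(rho * sample_var(xs) / w)        # jump phase
        xs += [draw() for _ in range(max(0, target - len(xs)))]
    return xs

rng = random.Random(5)
draw = lambda: rng.gauss(10.0, 1.0)
# Two intermediate batches plus a final one with rho = 1, loosely echoing
# the paper's rho_1 = 0.7, rho_2 = 0.8 and gamma choices for the normal case.
xs = bcj(draw, m=4, w=0.008, rhos=[0.7, 0.8, 1.0], gammas=[0.5, 0.6, 0.7])
N, mu_hat = len(xs), sum(xs) / len(xs)
```

Each pass through the loop performs one crawl-then-jump batch, so the number of sampling operations is roughly the number of crawled observations plus one jump per batch.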
3.3. Batch Jump and Crawl (BJC) sequential sampling technique

The Batch Jump and Crawl (BJC) sequential sampling methodology (Fig. 3) is a different variant of the methodology explained in Section 3.2, whereby for each batch we follow the jump-crawl sampling scheme rather than the crawl-jump procedure. The nomenclature of the BJC scheme is exactly the same as for the BCJ procedure (Section 3.2). Hence we give the procedure without a detailed explanation:
T_1 = ⌈γ_1 × ρ_1 S_m²/w⌉,  U_1 = max{m, T_1},  R_1 = inf{n ≥ U_1 : n ≥ ρ_1 S_n²/w}   (batch #1)
T_2 = ⌈γ_2 × ρ_2 S_{R_1}²/w⌉,  U_2 = max{R_1, T_2},  R_2 = inf{n ≥ U_2 : n ≥ ρ_2 S_n²/w}   (batch #2)
...
T_{k−1} = ⌈γ_{k−1} × ρ_{k−1} S_{R_{k−2}}²/w⌉,  U_{k−1} = max{R_{k−2}, T_{k−1}},  R_{k−1} = inf{n ≥ U_{k−1} : n ≥ ρ_{k−1} S_n²/w}   (batch #(k − 1))
T_k = ⌈γ_k S_{R_{k−1}}²/w⌉,  U_k = max{R_{k−1}, T_k},  N = inf{n ≥ U_k : n ≥ S_n²/w}   (batch #k).
After the sampling stops we find the estimator of µ using X̄_N = (1/N) Σ_{i=1}^N X_i. One should note that the value of ρ (for the JC method) or of the ρ_i and γ_i (for BCJ or BJC) would depend on two important factors: (i) the expected regret, i.e., the difference between the estimated risk and its actual value (which is denoted by w) and (ii) the number of sampling operations. Hence finding the optimal values of ρ, ρ_i and γ_i is a research problem in itself, and is not the focus of this paper.
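The BJC display above can be sketched in the same style (again our own code, not the paper's). Note that with a single batch and ρ_1 = 1, the jump-then-crawl step essentially reduces to the JC rule of Section 3.1, with γ_1 playing the role of ρ:

```python
import math
import random

def sample_var(xs):
    n, mean = len(xs), sum(xs) / len(xs)
    return sum((x - mean) ** 2 for x in xs) / (n - 1)

def bjc(draw, m, w, rhos, gammas):
    """Batch Jump-and-Crawl: in batch i, jump in one go to
    ceil(gamma_i * rho_i * S^2 / w) (stages T_i, U_i), then crawl one
    observation at a time until n >= rho_i * S_n^2 / w (stage R_i);
    rhos[-1] should be 1 so the final crawl returns N, the estimate of D."""
    xs = [draw() for _ in range(m)]
    for rho, gamma in zip(rhos, gammas):
        target = max(len(xs), math.ceil(gamma * rho * sample_var(xs) / w))  # jump
        xs += [draw() for _ in range(target - len(xs))]
        while len(xs) < rho * sample_var(xs) / w:                           # crawl
            xs.append(draw())
    return xs

rng = random.Random(7)
draw = lambda: rng.gauss(10.0, 1.0)
# Single batch with rho = 1: jump to about gamma*D, then crawl -- effectively JC.
xs = bjc(draw, m=10, w=0.008, rhos=[1.0], gammas=[0.7])
N, mu_hat = len(xs), sum(xs) / len(xs)
```

Multi-batch BJC runs would simply pass longer `rhos` and `gammas` lists, e.g. `rhos=[0.7, 0.8, 1.0]`.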
4. Background for the simulation study

The emphasis of this research work is to highlight the efficacy of the proposed models. Thus, for a better appreciation of this work, we illustrate the general framework of simulation employed for the proposed models for all four distributions. The results of these simulation runs are then discussed in Section 5.

Normal distribution: Suppose X ∼ N(µ, σ²), and one is interested in estimating the location parameter µ. In order to compare the efficiencies of the multistage sampling procedures we consider the bounded risk estimation problem of µ for both the (i) SEL and (ii) LINEX loss functions. The estimate of µ is µ̂ = X̄_n = (1/n) Σ_{i=1}^n X_i, and the bounded SEL and LINEX risk problems may be written as R(X̄_n, µ) = E[(X̄_n − µ)²] = σ²/n ≤ w and R(X̄_n, µ) = E[e^{a(X̄_n−µ)} − a(X̄_n − µ) − 1] = e^{a²σ²/(2n)} − 1 ≤ w, respectively. The corresponding optimal sample sizes, with a known value of the scale parameter σ, are then D ≥ σ²/w and D ≥ a²σ²/(2 log_e(1 + w)) respectively. In case σ is unknown, both bounded risk problems may be solved using different adaptive sampling procedures, some of which have been discussed in Section 3. In this work we determine the empirical efficiency of the proposed sequential sampling estimation procedures for the bounded risk problems for the SEL and LINEX loss functions separately. For some interesting theoretical results as well as practical examples of the bounded risk estimation problem of µ under the LINEX loss function, one may check Chattopadhyay et al. (2000), Chattopadhyay and Sengupta (2006), Sengupta (2008), Takada (2006), etc.

Exponential distribution: Consider the exponential distribution, X ∼ E(σ, λ), with location parameter λ and scale parameter σ. We highlight the advantages of using sequential sampling procedures when one is interested in estimating λ. If (X_1, X_2, ..., X_n) is the set of sample observations, then the best estimate of λ is X_(1) = min{X_1, X_2, ..., X_n}, for both known and unknown σ. Now the bounded SEL risk problem may be formulated as R(X_(1), λ) = E[(X_(1) − λ)²] = 2σ²/n² ≤ w, while the counterpart for the LINEX loss is R(X_(1), λ) = E[e^{a(X_(1)−λ)} − a(X_(1) − λ) − 1] = (1 − aσ/n)^{−1} − aσ/n − 1 ≤ w. The corresponding optimal sample sizes are D ≥ (2σ²/w)^{1/2} and D ≥ (1/2)[a + |a|(1 + 4/w)^{1/2}]σ. So with σ unknown, one may solve both these bounded risk problems using different multistage sampling methodologies. An interested reader may refer to Basu (1971) and Chattopadhyay (2000) for a good idea of the use of sequential sampling methods for estimating the location parameter λ under the SEL and the LINEX loss functions respectively.

Gamma distribution: Next assume X ∼ G(α, β, γ), i.e., the gamma distribution, with α, β and γ as the shape, scale and location parameters respectively. For simplicity assume β = 1, γ = 0; then one can easily see that α = E(X), for which the best estimate is α̂ = (1/n) Σ_{i=1}^n X_i. So the bounded SEL and LINEX risk problems for the gamma distribution are R(α̂, α) = E[(α̂ − α)²] = α/n ≤ w and R(α̂, α) = E[e^{a(α̂−α)} − a(α̂ − α) − 1] = e^{−aα}(1 − a/n)^{−nα} − 1 ≤ w (provided −1 < a < n) respectively. Furthermore, the corresponding optimal sample sizes are given by D ≥ α/w and D ≥ a²α/(2 log_e(1 + w)) (with −1 < a < n). Thus with unknown α one may solve both these bounded risk problems using any one of the sequential
analysis methods discussed earlier.

Extreme value distribution: Finally consider X ∼ EVD(µ, σ), the extreme value distribution (EVD), with µ as its location parameter and σ as the scale parameter. The estimate of σ is σ̂ = [(6/π²) × (1/(n−1)) × Σ_{i=1}^n (X_i − X̄_n)²]^{1/2}, where X̄_n = (1/n) Σ_{i=1}^n X_i, while that of µ is µ̂ = X̄_n − γ_E σ̂. Hence the SEL bounded risk problem formulation for µ is R(µ̂, µ) = E[(µ̂ − µ)²] = π²σ²/(6n) ≤ w, for which the optimal sample size is D ≥ π²σ²/(6w). In case one considers the LINEX loss function, the bounded risk problem takes the form R(µ̂, µ) = E[e^{a(µ̂−µ)} − a(µ̂ − µ) − 1] = Γ(1 − aσ/n)^n − aγ_E σ − 1 ≤ w, under the conditions that |a| ≤ n/σ and a ≠ 0. The optimal sample size for this bounded LINEX risk is D ≥ −a²σ²/(2{log_e(1 + w + aγ_E σ) + aσ}), when |a| ≤ n/σ and a ≠ 0. Hence, to find the optimal estimate of µ with unknown σ, both these bounded risk problems may be solved employing any one of the sequential sampling methods discussed. Furthermore, this estimate of µ will be used to calculate E(X): one first finds the bounded risk estimate of µ and then calculates E(X) using Ê(X) = µ̂_N + γ_E σ̂, where µ̂_N is the estimate of µ found using any one of the sequential sampling
plans.

5. Results and discussions

Consider X as the general random variable of the distribution used in our simulation study; thus X denotes the (i) normal, (ii) exponential, (iii) gamma and (iv) extreme value cases. Our study highlights the efficiencies of the three proposed sequential sampling schemes, and the exhaustive simulation results for all four distributions adequately emphasize this. As a first step we simulate 1,000,000 realized values of X from (i) X ∼ N(µ = 10, σ² = 1) (for both the SEL and LINEX cases), (ii) X ∼ E(σ = 10, λ = 4) (for both the SEL and LINEX instances), (iii) X ∼ G(α = 5, β = 1, γ = 0) (for both the SEL and LINEX runs) and (iv) X ∼ EVD(µ = 3, σ = 1)/X ∼ EVD(µ = 5, σ = 95) (for the SEL/LINEX examples). These 1,000,000 generated realized values may be considered to constitute the respective populations for the four distributions under study. Apart from treating the relevant parameter values present in the risk function for all four cases as unknown, we also face the added constraint that the level of risk, w, is bounded from above. This bound on w signifies resource restrictions, as we may have limitations on our budgetary allocations. So, to estimate the required value of the parameter of interest, one employs multistage sampling methodologies under the bounded risk consideration.

To understand the efficacies of the different sequential sampling plans, we compare the average performances (efficiencies) of the adaptive sampling procedures by simulating each metric value a total of 100,000 times (for any distribution) and then finding the average values of the metrics of interest. Using different choices of w, we simulate K (=1000) runs, i = 1, 2, ..., K, and repeat each of them B (=100) times, j = 1, 2, ..., B. Thus, with B × K simulations for different choices of w, we use a bootstrap-style methodology with B = 100 and K = 1000, the idea being to get the best values of the different estimates we wish to use for comparison. Hence, for each of the four distribution examples, our motivation is to highlight the efficiency of the proposed multistage sampling procedures by comparing the simulated (grand average) risk, R̄(θ̂_N, θ) = (1/(B × K)) Σ_{j=1}^B Σ_{i=1}^K R_{i,j}(θ̂_{N_{i,j}}, θ), the average sample size, N̄ = (1/(B × K)) Σ_{j=1}^B Σ_{i=1}^K N_{i,j}, and the average estimate, θ̂̄_N = (1/(B × K)) Σ_{j=1}^B Σ_{i=1}^K θ̂_{N_{i,j}}, with the corresponding values of w, D and θ respectively. Apart from this, one would also like to comment on how the given procedures compare among themselves. One must remember that the theoretical values are the benchmark against which any particular sequential sampling methodology should be judged. Two other useful metrics of comparison are (i) the average number of sampling operations, S̄O = (1/(B × K)) Σ_{j=1}^B Σ_{i=1}^K SO_{i,j}, and (ii) the CPU time, in seconds, required to complete a particular adaptive sampling procedure. While S̄O gives the average number of sampling operations one would need to perform for any sampling method, the CPU time provides the average time one would need to invest to conduct such an experiment. The highlight of this study is the generic nature of the analysis and the deductions one draws from all four sets of simulation runs pertaining to the four distributions. Keeping that in mind, we first state the different parameter values used for the four examples (Section 5.1) and then analyze the results and state our observations (Section 5.2).

5.1. Parameter values for different distributions

Normal distribution: Table 1(a) shows the first set of simulation runs, for the SEL function, for the Accelerated Sequential (AS) vs. JC sequential sampling procedures with the combinations w = (0.008, 0.010), ρ = (0.6, 0.7, 0.8) and m = 10. Table 1(b) contains the values of the simulation runs for the SEL case for the BCJ vs. BJC sequential sampling schemes with the combination set w = (0.008, 0.010), γ = (γ_1 = 0.5, γ_2 = 0.6, γ_3 = 0.7), ρ = (ρ_1 = 0.7, ρ_2 = 0.8) and m = 4. We know that the LINEX loss highlights the significance of overestimation vis-à-vis underestimation, depending on the value of a. So, with a = +0.8 (overestimation being penalized more), the corresponding results for (i) AS vs. JC and (ii) BCJ vs. BJC are shown in Table 2(a) and (b) respectively. The parameter combinations used for these runs (i.e., Table 2(a) and (b)) are (i) w = (0.004, 0.006), ρ = (0.6, 0.7, 0.8), m = 10 and (ii) w = (0.004, 0.006), γ = (γ_1 = 0.3, γ_2 = 0.5, γ_3 = 0.7), ρ = (ρ_1 = 0.7, ρ_2 = 0.9), m = 4, respectively. For the underestimation case (a = −1.0) the simulation results for (i) AS vs. JC and (ii) BCJ vs. BJC are depicted in Table 3(a) and (b).
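The B × K averaging scheme described above can be mimicked on a small scale; the following harness is our own sketch, with a deliberately reduced B and K so that it runs quickly, reporting the simulated risk and average sample size for the purely sequential rule under the SEL bound:

```python
import random

def one_replication(rng, m, w, mu=10.0, sigma=1.0):
    """One run of the purely sequential rule for N(mu, sigma^2);
    returns (N, squared estimation error)."""
    xs = [rng.gauss(mu, sigma) for _ in range(m)]
    def var():
        c = sum(xs) / len(xs)
        return sum((x - c) ** 2 for x in xs) / (len(xs) - 1)
    while len(xs) < var() / w:
        xs.append(rng.gauss(mu, sigma))
    err = sum(xs) / len(xs) - mu
    return len(xs), err * err

# Scaled-down stand-in for the paper's B = 100, K = 1000 design.
rng, B, K, w = random.Random(3), 5, 40, 0.01   # D = sigma^2/w = 100
reps = [one_replication(rng, 10, w) for _ in range(B * K)]
n_bar = sum(n for n, _ in reps) / len(reps)      # average sample size N-bar
risk_bar = sum(se for _, se in reps) / len(reps) # simulated risk R-bar
```

Here `n_bar` should land near the optimal D = 100 and `risk_bar` near the bound w, which is exactly the kind of comparison the tables below report for the AS, JC, BCJ and BJC procedures.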
Table 1
(a) and (b): Simulation results for X ∼ N(µ = 10, σ² = 1), i.e., the normal distribution, with the SEL function, B = 100, K = 1000, considering (a) Accelerated Sequential (AS) and Jump and Crawl (JC) with m = 10 and (b) Batch Crawl and Jump (BCJ) and Batch Jump and Crawl (BJC) with ρ1 = 0.7, ρ2 = 0.8, γ1 = 0.5, γ2 = 0.6, γ3 = 0.7, m = 4 sequential sampling procedures. Here µ̂_N = X̄_N; in (b), SO = SO1 + SO2 + SO3.

(a) AS / JC
w      D    ρ    N          SO         R̄                    µ̂_N                      CPU time (s)
0.008  126  0.6  74 / 124   64 / 114   0.013172 / 0.007946  9.996341 / 9.998960      522.83 / 851.86
0.008  126  0.7  86 / 124   76 / 114   0.011313 / 0.007946  10.001254 / 10.000606    599.20 / 848.00
0.008  126  0.8  99 / 124   89 / 114   0.009913 / 0.007946  9.999738 / 10.000087     680.16 / 853.53
0.010  101  0.6  59 / 99    49 / 89    0.016408 / 0.009915  9.998820 / 10.001149     493.45 / 678.27
0.010  101  0.7  69 / 99    59 / 89    0.014102 / 0.009915  10.002488 / 9.998018     482.06 / 675.34
0.010  101  0.8  79 / 99    69 / 89    0.012361 / 0.009914  9.998432 / 9.999236      544.48 / 689.47

(b) BCJ / BJC
w      D    N          SO (BCJ)         SO (BJC)          R̄                    µ̂_N                   CPU time (s)
0.008  126  85 / 124   75 = 35+15+25    120 = 77+14+29    0.011780 / 0.007946  9.995904 / 9.998626   184.89 / 960.05
0.010  101  67 / 99    57 = 26+11+20    95 = 61+12+22     0.014879 / 0.009914  9.997748 / 9.999248   175.08 / 753.06
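The two summary measures reported in all the tables, the average number of sampling operations over the B batches of K replications each and the CPU time, can be collected with a harness along the following lines. This is only a sketch: `run_procedure` is a purely illustrative stub standing in for one run of any of the four sampling rules, and all names are assumptions.

```python
import time
import random

def run_procedure(rng):
    """Illustrative stub for one run of a sampling rule (AS, JC, BCJ
    or BJC); returns the number of sampling operations in that run."""
    return rng.randint(50, 150)

def average_so_and_cpu(B=100, K=1000, seed=1):
    """Average sampling operations over B batches of K replications,
    i.e. (1/(B*K)) * sum_j sum_i SO_{i,j}, plus elapsed CPU seconds."""
    rng = random.Random(seed)
    start = time.process_time()
    total_so = sum(run_procedure(rng) for _ in range(B * K))
    cpu = time.process_time() - start
    return total_so / (B * K), cpu

avg_so, cpu_secs = average_so_and_cpu()
```

In the actual study the stub would be replaced by the stopping rule under test, so that SO and CPU time are measured on the same B × K replications.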
Table 2
(a) and (b): Simulation results for X ∼ N(µ = 10, σ² = 1), i.e., the normal distribution, with the LINEX loss function, B = 100, K = 1000, a = +0.8, considering (a) Accelerated Sequential (AS) and Jump and Crawl (JC) with m = 10 and (b) Batch Crawl and Jump (BCJ) and Batch Jump and Crawl (BJC) with ρ1 = 0.7, ρ2 = 0.9, γ1 = 0.3, γ2 = 0.5, γ3 = 0.7, m = 4 sampling procedures. Here µ̂_N = X̄_N; in (b), SO = SO1 + SO2 + SO3.

(a) AS / JC
w      D   ρ    N         SO        R̄                    µ̂_N                      CPU time (s)
0.004  81  0.6  47 / 79   37 / 69   0.000025 / 0.000017  10.001017 / 10.000107    347.36 / 568.03
0.004  81  0.7  55 / 79   45 / 69   0.000023 / 0.000017  9.998713 / 9.999260      401.17 / 568.13
0.004  81  0.8  63 / 79   53 / 69   0.000021 / 0.000017  10.000788 / 10.000520    461.73 / 567.50
0.006  54  0.6  31 / 52   21 / 42   0.000038 / 0.000025  10.000929 / 9.997193     242.39 / 376.36
0.006  54  0.7  36 / 52   26 / 42   0.000034 / 0.000025  10.000178 / 9.999643     272.16 / 381.64
0.006  54  0.8  41 / 52   31 / 42   0.000031 / 0.000026  10.002220 / 10.001867    313.63 / 380.84

(b) BCJ / BJC
w      D   N         SO (BCJ)        SO (BJC)         R̄                    µ̂_N                     CPU time (s)
0.004  81  53 / 79   43 = 8+16+19    75 = 46+18+11    0.000022 / 0.000017  9.999058 / 10.000127    160.11 / 633.58
0.006  54  35 / 52   25 = 3+10+12    48 = 27+12+9     0.000033 / 0.000026  9.999793 / 10.001627    152.83 / 438.73
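The bound w and the optimal sample size D reported in Tables 1–3 are linked, to first order, by the classical fixed-sample relation for the normal-mean SEL problem, where the risk of X̄_n is σ²/n. A sketch, treating D ≈ ⌈σ²/w⌉ and noting as an assumption that the paper's D values may include small second-order correction terms:

```python
import math

def first_order_optimal_n(sigma2, w):
    """Smallest fixed-sample n with SEL risk sigma^2 / n <= w for the
    normal-mean problem, i.e. the first-order optimal sample size."""
    return math.ceil(sigma2 / w)

# With sigma^2 = 1 these land near the D values of Tables 1-3, which
# may additionally include small second-order correction terms.
n_008 = first_order_optimal_n(1.0, 0.008)
n_010 = first_order_optimal_n(1.0, 0.010)
```

Since σ² is unknown in practice, the sequential procedures replace it by a running estimate, which is precisely why the stopping rules are adaptive.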
Table 3
(a) and (b): Simulation results for X ∼ N(µ = 10, σ² = 1), i.e., the normal distribution, with the LINEX loss function, B = 100, K = 1000, a = −1.0, considering (a) Accelerated Sequential (AS) and Jump and Crawl (JC) with m = 10 and (b) Batch Crawl and Jump (BCJ) and Batch Jump and Crawl (BJC) with ρ1 = 0.7, ρ2 = 0.9, γ1 = 0.3, γ2 = 0.5, γ3 = 0.7, m = 4 sequential sampling procedures. Here µ̂_N = X̄_N; in (b), SO = SO1 + SO2 + SO3.

(a) AS / JC
w      D    ρ    N          SO         R̄                    µ̂_N                      CPU time (s)
0.004  126  0.6  74 / 124   64 / 114   0.000025 / 0.000017  10.001221 / 9.996907     536.88 / 906.17
0.004  126  0.7  87 / 125   77 / 114   0.000023 / 0.000017  10.001031 / 10.000459    629.58 / 907.88
0.004  126  0.8  99 / 124   89 / 114   0.000021 / 0.000017  10.000290 / 9.997936     727.95 / 910.55
0.006  84   0.6  49 / 83    39 / 73    0.000038 / 0.000025  10.000313 / 10.000111    357.80 / 599.24
0.006  84   0.7  57 / 83    47 / 73    0.000034 / 0.000025  9.998714 / 9.995758      418.66 / 600.52
0.006  84   0.8  66 / 82    56 / 72    0.000031 / 0.000025  10.002332 / 10.003020    483.52 / 599.86

(b) BCJ / BJC
w      D    N          SO (BCJ)        SO (BJC)          R̄                    µ̂_N                     CPU time (s)
0.004  126  84 / 124   74 = 17+25+32   120 = 77+27+16    0.000022 / 0.000018  10.002424 / 9.999272    173.74 / 965.34
0.006  84   55 / 82    45 = 9+16+20    78 = 48+18+12     0.000033 / 0.000026  10.003427 / 10.002528   158.03 / 655.22
respectively. The parameter combinations of w, ρ, γ and m for a = −1.0 (Table 3(a) and (b)) are the same as those used for the run results shown in Table 2(a) and (b) (a = +0.8).

Exponential distribution: With the SEL function, Table 4(a) and (b) highlight the simulated estimates of the different values of interest for the AS vs. JC (Table 4(a)) and BCJ vs. BJC (Table 4(b)) methods, respectively. The parameter combinations used in Table 4(a) are w = (0.009, 0.010), ρ = (0.6, 0.7, 0.8), m = 10, while Table 4(b) shows the runs for the combination w = (0.009, 0.010), γ = (γ1 = 0.7, γ2 = 0.8, γ3 = 0.9), ρ = (ρ1 = 0.7, ρ2 = 0.8), m = 4. The corresponding estimates for the overestimation case (a = +0.8) under LINEX loss are depicted for the sequential sampling plans AS vs. JC in Table 5(a), for which the parameter combinations are w = (0.002, 0.003),
Table 4
(a) and (b): Simulation results for X ∼ E(σ = 10, λ = 4), i.e., the exponential distribution, with the SEL function, B = 100, K = 1000, considering (a) Accelerated Sequential (AS) and Jump and Crawl (JC) with m = 10 and (b) Batch Crawl and Jump (BCJ) and Batch Jump and Crawl (BJC) with ρ1 = 0.7, ρ2 = 0.8, γ1 = 0.7, γ2 = 0.8, γ3 = 0.9, m = 4 sequential sampling procedures. Here λ̂_N = min_{1≤i≤N} X_i; in (b), SO = SO1 + SO2 + SO3.

(a) AS / JC
w      D    ρ    N           SO          R̄                    λ̂_N                    CPU time (s)
0.009  106  0.6  89 / 148    79 / 138    0.024618 / 0.008920  4.000008 / 4.000019    660.753 / 1036.199
0.009  106  0.7  104 / 148   94 / 138    0.018127 / 0.008920  4.000028 / 4.000011    758.332 / 1053.812
0.009  106  0.8  119 / 148   109 / 138   0.013902 / 0.008920  4.000003 / 4.000016    853.055 / 1042.204
0.010  101  0.6  85 / 141    75 / 131    0.027332 / 0.009907  4.000029 / 4.000006    627.432 / 990.366
0.010  101  0.7  99 / 141    89 / 131    0.020128 / 0.009906  4.000005 / 4.000012    735.079 / 989.601
0.010  101  0.8  113 / 141   103 / 131   0.015438 / 0.009907  4.000021 / 4.000010    813.555 / 993.127

(b) BCJ / BJC
w      D    N           SO (BCJ)         SO (BJC)          R̄                    λ̂_N                    CPU time (s)
0.009  106  133 / 148   123 = 63+23+37   144 = 95+16+33    0.011112 / 0.008920  4.000014 / 4.000003    270.614 / 1112.202
0.010  101  126 / 141   116 = 59+21+36   137 = 92+16+29    0.012351 / 0.009907  4.000011 / 4.000003    269.163 / 1062.828
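For the two-parameter exponential E(σ, λ) used here, the natural (maximum-likelihood) point estimator of the location parameter λ is the sample minimum, which is consistent with the λ̂_N values sitting just above 4 throughout Tables 4–6. A minimal sketch, with the σ = 10, λ = 4 configuration of Table 4 assumed and all function names illustrative:

```python
import random

def sample_two_param_exponential(n, scale=10.0, loc=4.0, rng=None):
    """Draw n observations X = loc + scale * Exp(1) from a
    two-parameter exponential distribution."""
    rng = rng or random.Random(7)
    return [loc + rng.expovariate(1.0 / scale) for _ in range(n)]

def location_mle(xs):
    """MLE of the location parameter: the sample minimum."""
    return min(xs)

xs = sample_two_param_exponential(1000)
lam_hat = location_mle(xs)  # slightly above the true loc = 4
```

Since min X_i ≥ λ always, the estimator is biased upward by roughly σ/n, which explains the systematic tiny excess over 4 in the tables.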
Table 5
(a) and (b): Simulation results for X ∼ E(σ = 10, λ = 4), i.e., the exponential distribution, with the LINEX loss function, B = 100, K = 1000, a = +0.8, considering (a) Accelerated Sequential (AS) and Jump and Crawl (JC) with m = 10 and (b) Batch Crawl and Jump (BCJ) and Batch Jump and Crawl (BJC) with ρ1 = 0.7, ρ2 = 0.8, γ1 = 0.7, γ2 = 0.8, γ3 = 0.9, m = 4 sequential sampling procedures. Here λ̂_N = min_{1≤i≤N} X_i; in (b), SO = SO1 + SO2 + SO3.

(a) AS / JC
w      D    ρ    N           SO          R̄                    λ̂_N                    CPU time (s)
0.002  183  0.6  109 / 182   99 / 172    0.005645 / 0.001983  4.000003 / 4.000007    961.584 / 1617.564
0.002  183  0.7  127 / 182   117 / 172   0.004111 / 0.001983  4.000009 / 4.000000    1123.606 / 1617.564
0.002  183  0.8  146 / 182   136 / 172   0.003127 / 0.001983  4.000001 / 4.000024    1295.689 / 1621.090
0.003  151  0.6  89 / 151    79 / 141    0.008501 / 0.002969  4.000008 / 4.000002    790.686 / 1313.239
0.003  151  0.7  104 / 149   94 / 139    0.006177 / 0.002969  4.000004 / 4.000006    922.662 / 1310.743
0.003  151  0.8  119 / 149   109 / 139   0.004691 / 0.002969  4.000008 / 4.000004    1054.965 / 1314.612

(b) BCJ / BJC
w      D    N           SO (BCJ)          SO (BJC)           R̄                    λ̂_N                    CPU time (s)
0.001  258  230 / 256   220 = 104+51+65   252 = 172+27+53    0.001249 / 0.000994  4.000005 / 4.000009    359.362 / 2466.001
0.002  183  162 / 182   152 = 72+35+45    178 = 118+20+40    0.002514 / 0.001983  4.000005 / 4.000003    315.666 / 1704.051
Table 6
(a) and (b): Simulation results for X ∼ E(σ = 10, λ = 4), i.e., the exponential distribution, with the LINEX loss function, B = 100, K = 1000, a = −1.0, considering (a) Accelerated Sequential (AS) and Jump and Crawl (JC) with m = 10 and (b) Batch Crawl and Jump (BCJ) and Batch Jump and Crawl (BJC) with ρ1 = 0.7, ρ2 = 0.8, γ1 = 0.7, γ2 = 0.8, γ3 = 0.9, m = 4 sequential sampling procedures. Here λ̂_N = min_{1≤i≤N} X_i; in (b), SO = SO1 + SO2 + SO3.

(a) AS / JC
w      D    ρ    N           SO          R̄                    λ̂_N                    CPU time (s)
0.002  219  0.6  130 / 218   120 / 208   0.005335 / 0.001986  4.000007 / 4.000002    1168.581 / 1959.376
0.002  219  0.7  152 / 218   142 / 208   0.003966 / 0.001987  4.000005 / 4.000009    1347.637 / 1958.830
0.002  219  0.8  174 / 218   164 / 208   0.003064 / 0.001986  4.000009 / 4.000002    1563.136 / 1956.240
0.003  178  0.6  106 / 177   96 / 167    0.007934 / 0.002975  4.000008 / 4.000007    944.144 / 1560.436
0.003  178  0.7  124 / 177   114 / 167   0.005913 / 0.002975  4.000005 / 4.000009    1086.961 / 1569.314
0.003  178  0.8  141 / 177   131 / 167   0.004577 / 0.002975  4.000003 / 4.000031    1247.766 / 1556.194

(b) BCJ / BJC
w      D    N           SO (BCJ)          SO (BJC)           R̄                    λ̂_N                    CPU time (s)
0.001  312  279 / 311   269 = 128+62+79   307 = 208+33+66    0.001238 / 0.000995  4.000002 / 4.000009    402.075 / 3065.088
0.002  219  195 / 218   185 = 87+43+55    214 = 144+24+46    0.002480 / 0.001987  4.000031 / 4.000009    331.547 / 2065.394
ρ = (0.6, 0.7, 0.8), m = 10. On the other hand, for the BCJ vs. BJC sequential sampling plans, depicted in Table 5(b), the corresponding parameter set is w = (0.001, 0.002), γ = (γ1 = 0.7, γ2 = 0.8, γ3 = 0.9), ρ = (ρ1 = 0.7, ρ2 = 0.8), m = 4. Furthermore, Table 6(a) and (b) show a synopsis of the simulation results when a = −1.0 for the multistage sampling methodologies AS vs. JC and BCJ vs. BJC, respectively. The w, γ, ρ and m values for these are the same as those for the LINEX loss overestimation instance (a = +0.8), i.e., Table 5(a) and (b).
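The asymmetry between the a = +0.8 and a = −1.0 runs follows directly from the standard LINEX form L(Δ) = b(exp(aΔ) − aΔ − 1), where Δ is the estimation error. A small sketch; the scale b = 1 is an assumption, and this is the generic LINEX form rather than the paper's exact risk expressions:

```python
import math

def linex_loss(delta, a, b=1.0):
    """Generic LINEX loss b*(exp(a*delta) - a*delta - 1), where
    delta is the estimation error (estimate minus true value)."""
    return b * (math.exp(a * delta) - a * delta - 1.0)

# With a = +0.8, an overestimate (delta > 0) is penalized more
# heavily than an underestimate of the same magnitude; a < 0
# (e.g. a = -1.0) reverses the asymmetry.
over = linex_loss(+0.5, a=+0.8)
under = linex_loss(-0.5, a=+0.8)
```

For small |aΔ| the loss behaves like (b·a²/2)Δ², so LINEX reduces approximately to a squared error loss near the true value, with the asymmetry appearing only for larger errors.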
Table 7
(a) and (b): Simulation results for X ∼ G(α = 5, β = 1, γ = 0), i.e., the gamma distribution, with the SEL function, B = 100, K = 1000, considering (a) Accelerated Sequential (AS) and Jump and Crawl (JC) with m = 10 and (b) Batch Crawl and Jump (BCJ) and Batch Jump and Crawl (BJC) with ρ1 = 0.7, ρ2 = 0.8, γ1 = 0.6, γ2 = 0.7, γ3 = 0.8, m = 4 sequential sampling procedures. Here α̂_N = X̄_N; in (b), SO = SO1 + SO2 + SO3.

(a) AS / JC
w     D    ρ    N           SO          R̄                    α̂_N                    CPU time (s)
0.01  501  0.7  350 / 501   340 / 491   0.014298 / 0.009988  4.991384 / 4.996036    809.984 / 1181.718
0.01  501  0.8  400 / 501   390 / 491   0.012510 / 0.009988  4.999725 / 4.996925    935.907 / 1188.703
0.01  501  0.9  449 / 501   439 / 491   0.011151 / 0.009988  4.997953 / 4.998963    1076.484 / 1183.906
0.02  251  0.7  175 / 251   165 / 241   0.028630 / 0.019952  4.992816 / 4.998004    404.203 / 578.156
0.02  251  0.8  200 / 251   190 / 241   0.025047 / 0.019952  4.993773 / 4.997757    459.969 / 574.734
0.02  251  0.9  225 / 251   215 / 241   0.022260 / 0.019952  4.994360 / 4.992071    517.390 / 573.282

(b) BCJ / BJC
w     D    N           SO (BCJ)           SO (BJC)           R̄                    α̂_N                    CPU time (s)
0.01  501  404 / 501   394 = 203+71+120   497 = 342+52+103   0.012406 / 0.009988  5.003337 / 4.999364    178.828 / 1240.812
0.02  251  201 / 251   191 = 96+35+60     247 = 167+27+53    0.024956 / 0.019953  4.994732 / 4.998398    114.406 / 613.516
Table 8
(a) and (b): Simulation results for X ∼ G(α = 5, β = 1, γ = 0), i.e., the gamma distribution, with the LINEX loss function, B = 100, K = 1000, a = +0.8, considering (a) Accelerated Sequential (AS) and Jump and Crawl (JC) with m = 10 and (b) Batch Crawl and Jump (BCJ) and Batch Jump and Crawl (BJC) with ρ1 = 0.7, ρ2 = 0.8, γ1 = 0.7, γ2 = 0.8, γ3 = 0.9, m = 10 sequential sampling procedures. Here α̂_N = X̄_N; in (b), SO = SO1 + SO2 + SO3.

(a) AS / JC
w     D    ρ    N           SO          R̄                        α̂_N                    CPU time (s)
0.01  322  0.7  227 / 322   217 / 312   −0.999273 / −0.999658    4.899971 / 4.996633    570.893 / 821.9422
0.01  322  0.8  257 / 322   247 / 312   −0.999657 / −0.999659    4.998107 / 4.999112    663.940 / 797.3391
0.01  322  0.9  288 / 322   278 / 312   −0.999657 / −0.999661    4.997129 / 5.001849    710.5567 / 815.2728
0.02  162  0.7  113 / 162   103 / 152   −0.999646 / −0.999650    4.993756 / 4.991776    309.0546 / 419.6498
0.02  162  0.8  129 / 162   119 / 152   −0.999647 / −0.999651    4.992525 / 4.994285    340.6197 / 414.6433
0.02  162  0.9  145 / 162   135 / 152   −0.999651 / −0.999649    4.995651 / 4.990083    383.6076 / 412.4438

(b) BCJ / BJC
w     D    N           SO (BCJ)          SO (BJC)           R̄                        α̂_N                    CPU time (s)
0.01  322  290 / 322   280 = 149+49+82   312 = 216+32+64    −0.999657 / −0.999660    4.996630 / 5.001139    156.4205 / 821.8930
0.02  162  146 / 162   136 = 70+24+42    152 = 103+17+32    −0.999649 / −0.999650    4.992363 / 4.992775    112.2666 / 418.4169
Gamma distribution: Tables 7(a), (b) and 8(a), (b) summarize the results when one takes into account the gamma distribution for the AS vs. JC (SEL), BCJ vs. BJC (SEL), AS vs. JC (LINEX, a = +0.8) and BCJ vs. BJC (LINEX, a = +0.8) instances, respectively. For the SEL example (AS vs. JC, i.e., Table 7(a)) the assumed parameter values are w = (0.01, 0.02), ρ = (0.7, 0.8, 0.9), m = 10, while for BCJ vs. BJC under SEL (Table 7(b)) the corresponding values are w = (0.01, 0.02), γ = (γ1 = 0.6, γ2 = 0.7, γ3 = 0.8), ρ = (ρ1 = 0.7, ρ2 = 0.8), m = 4. When we switch over to the LINEX loss case with a = +0.8, Table 8(a) summarizes the AS vs. JC comparison, for which we consider the parameter values w = (0.01, 0.02), ρ = (0.7, 0.8, 0.9), m = 10. Finally, the BCJ vs. BJC comparison results are highlighted in Table 8(b) with w = (0.01, 0.02), γ = (γ1 = 0.7, γ2 = 0.8, γ3 = 0.9), ρ = (ρ1 = 0.7, ρ2 = 0.8), m = 10 as the parameter set for a = +0.8.

Extreme value distribution: The final two sets of tables, i.e., Tables 9(a), (b) and 10(a), (b), highlight the findings of the simulation runs when the distribution is extreme valued. For the SEL bounded risk example, one considers the combinations (i) w = (0.007, 0.008), ρ = (0.6, 0.7, 0.8), m = 10 and (ii) w = (0.007, 0.008), γ = (γ1 = 0.5, γ2 = 0.6, γ3 = 0.7), ρ = (ρ1 = 0.6, ρ2 = 0.7), m = 10 to evaluate the performance of (i) AS vs. JC (Table 9(a)) and (ii) BCJ vs. BJC (Table 9(b)), respectively. For the LINEX case we assume the asymmetric loss function with a fixed level of bounded risk, w = +0.05. Our aim is to study the effect of a change in a, the shape parameter of the LINEX loss, on the sample size as well as on the estimate of E(X) (which is of interest to us). The results are highlighted in Table 10(a) (AS vs. JC) and Table 10(b) (BCJ vs. BJC). For the AS vs. JC simulation runs the parameter vectors are a = (+0.5, +1.0), ρ = (0.5, 0.7, 0.9), m = 10, while a = (+0.5, +1.0), γ = (γ1 = 0.5, γ2 = 0.6, γ3 = 0.7), ρ = (ρ1 = 0.6, ρ2 = 0.7), m = 10 are the corresponding parameter values for the BCJ vs. BJC sampling procedures.

5.2. Results, general findings and discussions

A closer look at all the tables conveys a few general as well as some specific findings. We first highlight the general results and then discuss the specific analysis pertaining to a few individual instances.
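As a rough schematic only, and not the paper's exact stopping rules, a jump-and-crawl style run for the normal mean under a SEL risk bound can be sketched as follows: take a pilot of size m, "jump" in one batch to a fraction ρ of the currently estimated optimal size, then "crawl" one observation at a time until the estimated risk σ̂²/n falls below w. All names and the simplified stopping criterion here are illustrative assumptions.

```python
import math
import random
import statistics

def jump_then_crawl(w, m=10, rho=0.7, mu=10.0, sigma=1.0, seed=3):
    """Schematic jump-and-crawl run for a normal mean under a SEL
    risk bound: stop once the estimated risk var_hat / n <= w."""
    rng = random.Random(seed)
    xs = [rng.gauss(mu, sigma) for _ in range(m)]  # pilot sample

    def n_hat():
        # current plug-in estimate of the optimal sample size
        return math.ceil(statistics.variance(xs) / w)

    # Jump: one batch up to a fraction rho of the estimated size.
    target = max(m, math.ceil(rho * n_hat()))
    xs += [rng.gauss(mu, sigma) for _ in range(target - len(xs))]

    # Crawl: one observation at a time until the bound is met.
    while len(xs) < n_hat():
        xs.append(rng.gauss(mu, sigma))
    return len(xs), statistics.fmean(xs)

n_final, mu_hat = jump_then_crawl(w=0.008)
```

Varying ρ in such a sketch shows the trade-off the tables document: a larger jump reduces the number of sampling operations but risks stopping on a cruder variance estimate.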
Table 9
(a) and (b): Simulation results for X ∼ EVD(µ = 3, σ = 1), i.e., the extreme value distribution, with the SEL function, B = 100, K = 1000, considering (a) Accelerated Sequential (AS) and Jump and Crawl (JC) with m = 10 and (b) Batch Crawl and Jump (BCJ) and Batch Jump and Crawl (BJC) with ρ1 = 0.6, ρ2 = 0.7, γ1 = 0.5, γ2 = 0.6, γ3 = 0.7, m = 10 sequential sampling procedures. Here Ê(X) = µ + γ* σ̂_N; in (b), SO = SO1 + SO2 + SO3.

(a) AS / JC
w      D    ρ    N           SO          R̄                    Ê(X)                   CPU time (s)
0.007  235  0.6  228 / 384   218 / 374   0.011672 / 0.006985  3.731245 / 3.735142    838.719 / 1467.141
0.007  235  0.7  266 / 384   256 / 374   0.010004 / 0.006984  3.732031 / 3.735685    987.579 / 1471.593
0.007  235  0.8  306 / 383   296 / 373   0.008753 / 0.006984  3.734643 / 3.736172    1142.735 / 1465.781
0.008  206  0.6  199 / 334   189 / 324   0.013340 / 0.007980  3.729973 / 3.733215    734.719 / 1265.578
0.008  206  0.7  233 / 335   223 / 325   0.011434 / 0.007980  3.731583 / 3.733679    860.344 / 1264.047
0.008  206  0.8  267 / 337   257 / 327   0.010005 / 0.007980  3.733409 / 3.735939    991.250 / 1272.703

(b) BCJ / BJC
w      D    N           SO (BCJ)           SO (BJC)            R̄                    Ê(X)                   CPU time (s)
0.007  235  268 / 383   258 = 107+56+95    373 = 213+42+118    0.010149 / 0.006985  3.731585 / 3.734956    152.047 / 1501.125
0.008  206  234 / 335   224 = 93+48+83     325 = 190+35+100    0.011676 / 0.007980  3.731271 / 3.733650    140.000 / 1286.406
Table 10
(a) and (b): Simulation results for X ∼ EVD(µ = 5, σ = 95), i.e., the extreme value distribution, with the LINEX loss function, B = 100, K = 1000, w = +0.05, considering (a) Accelerated Sequential (AS) and Jump and Crawl (JC) with m = 10 and (b) Batch Crawl and Jump (BCJ) and Batch Jump and Crawl (BJC) with ρ1 = 0.6, ρ2 = 0.7, γ1 = 0.5, γ2 = 0.6, γ3 = 0.7, m = 10 sequential sampling procedures. Here Ê(X) = µ + γ* σ̂_N; in (b), SO = SO1 + SO2 + SO3.

(a) AS / JC
a     D   ρ    N         SO        Ê(X)                      CPU time (s)
+0.5  36  0.5  22 / 44   12 / 34   70.944758 / 72.993731     921.214 / 2474.927
+0.5  36  0.7  31 / 44   21 / 34   72.267342 / 73.143197     1550.071 / 2431.346
+0.5  36  0.9  40 / 44   30 / 34   72.934863 / 72.998988     2162.724 / 2436.554
+1.0  62  0.5  37 / 74   27 / 64   72.975407 / 74.213237     1948.064 / 4677.936
+1.0  62  0.7  52 / 74   42 / 64   73.672788 / 74.155536     3112.121 / 4588.755
+1.0  62  0.9  67 / 74   57 / 64   74.089739 / 74.150556     4024.412 / 4792.011

(b) BCJ / BJC
a     D   N         SO (BCJ)        SO (BJC)         Ê(X)                      CPU time (s)
+0.5  36  30 / 44   20 = 4+5+11     34 = 16+5+13     72.166052 / 73.122859     528.010 / 2745.722
+1.0  62  51 / 74   41 = 12+9+20    64 = 34+8+22     73.643506 / 74.146519     592.107 / 4895.464
General findings: As w decreases (i.e., as we deploy more resources for our sampling estimation problem) the value of N increases for both the AS and BCJ sequential sampling procedures. This holds irrespective of whether one uses the SEL or the LINEX loss function, and for each of the four underlying distributions; the only exception is the EVD bounded-risk LINEX case for any fixed value of w. For all these runs, with decreasing w, the corresponding values of the estimated risk (denoted by R̄(θ̂_N, θ)) show a decreasing trend, while the sampling time (denoted by CPU time) shows an increasing tendency. On the other hand, on an average, for the other two methods, where we first jump and then crawl (i.e., JC and BJC), the value of N almost always estimates the optimal sample size, D, accurately, and the values of the estimated risk and sampling time show trends similar to those witnessed for the AS and BCJ sampling procedures. One interesting fact worth mentioning is the pronounced effect of part estimation (through ρ and γ) in the AS and BCJ cases, which is almost negligible for both the JC and BJC methods. This may be because in both AS and BCJ, for increasing values of ρ, we complete more of the sampling scheme using one big chunk of observations, after which we switch over to one observation at a time until the end, while for JC and BJC one is already in overdrive, undertaking a big step after initially sampling one observation at a time. Hence, using AS and BCJ we slowly approach the limiting case, while utilizing JC and BJC we are there almost instantly. Thus we do get an accurate estimate of D (using the JC and BJC methods), but there are costs associated with it, namely (i) an increase in CPU time and (ii) lower values of R̄(θ̂_N, θ), which is not desirable for all practical purposes.

Lower simulated values of risk, R̄(θ̂_N, θ), or higher CPU time imply that more effort/time is required in terms of the resources/days one needs to invest/spend to estimate the parameters of interest. This may not be warranted considering resource constraints and bottlenecks. Another interesting point worth mentioning is the values of the sampling operations (SO). If one wants to ensure a more efficient sample survey plan in terms of a higher budget outlay (smaller value of w) and better estimation, then the experimenter invariably needs to consider more sets of observations at any stage, for any of the four adaptive sampling procedures discussed. The JC and BJC sampling plans will invariably have more sampling/sub-sampling operations, as these two procedures come closer to estimating the optimal sample size, D, than either the AS or BCJ sampling schemes. But it would cost us more time (CPU time)
and entail the allocation of a higher amount of resources/time for the JC and BJC multistage sampling rules. This is clearly visible if one concentrates on decreasing values of w, i.e., w → 0. Another interesting fact emerging from these exhaustive simulation runs is that, with a decrease in the value of w (an increase of the budgetary outlay), the value of R̄(θ̂_N, θ) obtained using JC, irrespective of the distribution, tracks the corresponding theoretical value (i.e., w) more efficiently than the corresponding value found using the AS sampling method, even though the value of R̄(θ̂_N, θ) estimated using the AS method is able to predict w on an average with a smaller percentage error. The same findings apply when we compare the results of the estimated risk for the BCJ and BJC sampling procedures. This is not immediately apparent, as the combinations of γ = (γ1, γ2, γ3) and ρ = (ρ1, ρ2) shown in the tables are inadequate in number, owing to page limitations and to the fact that the highlight of this research is the study of four different sequential sampling schemes for four different distributions under two different loss functions. On the other hand, the cost component, for which the best proxy is the CPU time in seconds, conveys the fact that a gain in efficiency for the estimation process invariably leads to an increase in costs, i.e., CPU time. The simulated value of risk also shows very clearly that more time and effort are required to obtain a higher level of precision in estimation; whether that is warranted depends on the level of resources one is willing to spend on such work. Last but not least, the average values of the different sampling operations, i.e., SO, SO1, SO2, SO3, for BCJ and BJC clearly highlight that a proper sampling plan, in terms of the efficiency of the final results (i.e., better estimates of D, w, etc.), is of importance. A higher budget outlay, i.e., more CPU time usage, for better estimation would definitely mean that the experimenter will consider more sets of observations at any stage. This may be regulated using different practical combinations of γ = (γ1, γ2, γ3) and ρ = (ρ1, ρ2). This fact is also corroborated by noting that decreasing w, i.e., w → 0, increases the estimation accuracy, but at a cost.

Specific findings: Under the general findings we mentioned that for the JC and BJC methods the N values are almost always equal to D. But from Tables 4(a), (b) and 9(a), (b) we get a different picture, which may be due to the specific combinations of ρ, γ = (γ1, γ2, γ3) and ρ = (ρ1, ρ2) used, with which we are unable to highlight the estimation power of the JC and BJC methods. The second specific finding relates to Table 8(a) and (b): due to the choice of w, the estimated risk is negative, which is an anomaly. The authors ran exhaustive simulations with many different combinations of w and a, with fixed values of (α, β, γ) as shown in the tables. One did obtain positive estimates of risk, R̄(θ̂_N, θ), but the optimal sample size in that case is always 1, which is practically unappealing to study, given the main motivation of the paper. One may refer to the gamma distribution under Section 4 for a better understanding of this aberration. Runs with different negative values of a (i.e., the underestimation case) and of w do result in positive estimates of risk, but here again the optimal sample size, D, is 1. Run results for G(α, β, γ) with negative values of a (LINEX loss function) are therefore also left out, as they too gave N = 1, which is not relevant for us. Finally, the presence of the gamma function in the formulation of the LINEX risk for the EVD case prevents us from presenting any simulation results highlighting the effect of underestimation (negative a), since the gamma function, which appears as an integral part of the LINEX risk for the EVD, does not exist for the resulting negative arguments.

6. Conclusions and future scope of research

The basic idea behind this extensive simulation-based research work was to highlight the efficacy of different sequential sampling methodologies. Using extensive simulation runs we verified the efficiency of different multistage sampling methodologies; this becomes apparent in how the samples are taken, keeping in mind that one wants to ensure the adherence of these different sampling schemes to a few theoretical properties. In this work it is noticed that decreasing the level of the bound on the risk invariably results in more precise estimation. But the cost and time required to complete the sampling plan are two of the most important issues one should keep in mind when designing any new sequential sampling procedure.
It is true that first undertaking a jump and then crawling leads to better estimation, but whether the asymptotic properties hold and whether it is at all practical (from a time and resource angle) are two very important points which need to be considered in all theoretical and practical situations. A future extension of this study may be to consider other types of distributions and check how well these different sampling techniques perform for them. Moreover, emphasis can be laid on designing new sampling schemes such that one may achieve greater accuracy of the population estimates without sacrificing too much on the cost front. In these cases it should be ensured that the estimates of the population parameters, sample size, risk, etc., are close to the actual values; apart from this, they should also adhere to the asymptotic properties of sequential sampling rules.

Acknowledgments

The authors would like to thank the Editor, Associate Editor and two anonymous referees for their critical and valuable comments/suggestions on the final version of this paper.