Journal of Statistical Planning and Inference 137 (2007) 3672 – 3686 www.elsevier.com/locate/jspi
Bounded risk estimation of linear combinations of the location and scale parameters in exponential distributions under two-stage sampling

N. Mukhopadhyay (a,*), S. Zacks (b)

(a) Department of Statistics, University of Connecticut, Storrs, CT 06269-4120, USA
(b) Department of Mathematical Sciences, Binghamton University, Binghamton, NY 13902-6000, USA
Available online 30 March 2007
Abstract Two-stage sampling is proposed for estimating linear combinations of the location and scale parameters of exponential distributions with bounded quadratic risk functions. Exact formulae for the expected values and risks of the estimators are derived, and the performance of estimators is studied. Illustrations with real data are included. © 2007 Elsevier B.V. All rights reserved. MSC: 62L05; 62L12 Keywords: Bounded risk; Mean; Quantiles; Two-parameter exponential distributions; Two-stage sampling
1. Introduction

Sequentially estimating the mean time to failure (MTTF) in an exponential distribution is an important problem. This broad area of practical importance has attracted attention from authors over the years, including Basu (1971), Ghosh and Mukhopadhyay (1976), Ghurye (1958), Isogai and Uno (1994), Kubokawa (1989), Mukhopadhyay (1974, 1994), Mukhopadhyay and Datta (1995), Starr and Woodroofe (1972), Swanepoel and van Wyk (1982), Takada (1986), and Woodroofe (1977). These largely addressed an exponential distribution involving a single parameter or a negative exponential distribution involving two parameters. Recently, Zacks and Mukhopadhyay (2006a) gave a unified treatment for the exact evaluation of risks associated with various purely sequential point estimators of MTTF, the hazard rate, and the reliability function in an exponential distribution depending on a scale parameter. Earlier, Mukhopadhyay and Cicconetti (2000, 2005) and Mukhopadhyay and Duggan (2000) introduced classes of estimators for the reliability function after sequential procedures had terminated. Ghosh and Mukhopadhyay (1989, 1990) and Mukhopadhyay and Hilton (1986) investigated interesting problems in a two-parameter exponential distribution. The area was broadly reviewed by Mukhopadhyay (1988, 1995), and the book by Ghosh et al. (1997) provides an overall appraisal of the area of sequential estimation.

In a different direction, Mukhopadhyay and Pepe (2006) constructed a two-stage exact bounded risk estimation methodology for the mean in an exponential distribution involving a single parameter. Even though a preassigned

* Corresponding author. Tel.: +1 860 486 6144; fax: +1 860 486 4113.
E-mail addresses:
[email protected] (N. Mukhopadhyay),
[email protected] (S. Zacks). 0378-3758/$ - see front matter © 2007 Elsevier B.V. All rights reserved. doi:10.1016/j.jspi.2007.03.042
risk-bound was achieved through their methodology, the two-stage procedure oversampled by a large margin. Later, Zacks and Mukhopadhyay (2006b) showed ways to maintain a preassigned risk-bound more tightly with a substantial reduction in the average sample size. In the present paper, our aim is to produce two-stage methodologies for a class of exact bounded risk estimation problems in a two-parameter exponential distribution in the light of Mukhopadhyay and Pepe (2006) and Zacks and Mukhopadhyay (2006b).

Suppose that we have a sequence of independent random variables $X_1, X_2, \ldots$ having a common exponential distribution with the density function

\[
f(x; \mu, \sigma) = \begin{cases} \sigma^{-1}\exp(-(x-\mu)/\sigma) & \text{if } x > \mu, \\ 0 & \text{elsewhere}, \end{cases} \tag{1.1}
\]

involving two unknown parameters $\mu$ and $\sigma$ with $-\infty < \mu < \infty$ and $0 < \sigma < \infty$. Here, $\mu$ and $\sigma$, respectively, refer to a location (or threshold) parameter and a scale parameter. We abbreviate this distribution by Expo($\mu, \sigma$).

The Expo($\mu, \sigma$) model (1.1) has been used widely in many reliability and lifetesting experiments to describe, for example, the failure times of complex equipment, vacuum tubes, and small electrical components. One is referred to Johnson and Kotz (1970), Bain (1978), Lawless and Singhal (1980), Grubbs (1971) and other sources including Balakrishnan and Basu (1995). The Expo($\mu, \sigma$) model has also been used in the areas of soil science and weed propagation; in such applications, the threshold parameter may stand for the minimum number of days a weed variety takes to germinate or spread in some area. Zou (1998) mentioned these applications. Nearly 40 years ago, the Expo($\mu, \sigma$) model was also recommended by Zelen (1966) in some clinical trials for studying the behavior of certain types of tumor systems in animals. Zelen showed the usefulness of this distribution in modeling the growth of certain cancerous tumors in the analysis of survival data in cancer research. This 1966 paper was a landmark in this area.
Our objective is to estimate a linear combination of $\mu$ and $\sigma$, namely $\theta \equiv \theta(c,d) = c\mu + d\sigma$, for given coefficients $c \ge 0$ and $d > 0$, with a bounded quadratic risk. Notice that the case $c = 0$, $d > 0$ is essentially the problem of estimating the scale parameter $\sigma$, and is a special case of the problem treated here. The case $c > 0$, $d = 0$ generally requires a smaller sample size and can be developed in a similar fashion. Interesting linear combinations of $\mu$ and $\sigma$ include: the expected value $\mu + \sigma$; the $p$th quantile $x_p = \mu - \sigma\ln(1-p)$, $0 < p < 1$; and the standard deviation $\sigma$.

For a fixed sample of size $n$, it is well known (see Zacks, 1971, p. 31) that $X_{n:1} = \min_{1\le i\le n}(X_i)$ and $T_n = \sum_{i=1}^{n}(X_i - X_{n:1})$ are independent, $(X_{n:1}, T_n)$ is minimal sufficient, and $X_{n:1} \sim \mu + (\sigma/n)G(1,1)$ and $T_n \sim \sigma G(n-1,1)$, where $G(k,1)$ is a gamma (Erlang) random variable with scale parameter 1 and shape parameter $k \ge 1$. The minimum variance unbiased estimator of $\theta(c,d)$ is $\hat{\theta}_n \equiv \hat{\theta}_n(c,d) = c\hat{\mu}_n + d\hat{\sigma}_n$, where $\hat{\mu}_n = X_{n:1} - \hat{\sigma}_n/n$ and $\hat{\sigma}_n = T_n/(n-1)$. The quadratic risk of $\hat{\theta}_n(c,d)$, for some fixed coefficient $A > 0$, is

\[
R(\hat{\theta}_n(c,d), \sigma) = A E_{\mu,\sigma}\{(\hat{\theta}_n - \theta)^2\}
= \frac{A\sigma^2}{n^2}\left(c^2 + \frac{(nd-c)^2}{n-1}\right)
= \frac{Ad^2\sigma^2}{n} + O\!\left(\frac{1}{n^2}\right). \tag{1.2}
\]

Note that the dominant term in (1.2) is independent of $c$ since the variance of $X_{n:1}$ is of order $O(1/n^2)$. Our goal is to obtain a risk bounded by a constant $\omega$, $0 < \omega < \infty$. As shown by Takada (1986), there is no fixed sample size procedure which can yield a preassigned bounded risk for all $0 < \sigma < \infty$. We prove in the present paper that the two-stage procedure specified in Section 2 achieves this goal. In Section 3 we derive the distribution and the moments of the (random) sample size, $N$, of the two-stage procedure. In Section 4 we present explicit formulae for the expected value and quadratic risk of the estimator $\hat{\theta}_N(c,d)$ based on the two-stage sample. The quadratic risk of $\hat{\theta}_N$ is

\[
R(\hat{\theta}_N(c,d), \sigma) = A E_{\mu,\sigma}\{(\hat{\theta}_N - \theta)^2\}. \tag{1.3}
\]
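The fixed-sample risk formula (1.2) is easy to verify by simulation. The sketch below uses the notation reconstructed above ($\mu$ location, $\sigma$ scale, $\theta = c\mu + d\sigma$); the particular parameter values are illustrative only and not from the paper.

```python
import numpy as np

# Monte Carlo check of the fixed-sample risk (1.2):
#   R = (A*sigma^2/n^2) * (c^2 + (n*d - c)^2/(n - 1)),
# for the MVUE  theta_hat_n = c*mu_hat_n + d*sigma_hat_n.
rng = np.random.default_rng(1)
mu, sigma, n, c, d, A = 2.0, 3.0, 25, 1.0, 1.0, 1.0
reps = 200_000

x = mu + sigma * rng.exponential(size=(reps, n))
x1 = x.min(axis=1)                    # X_{n:1}
t = x.sum(axis=1) - n * x1            # T_n = sum_i (X_i - X_{n:1})
sig_hat = t / (n - 1)                 # sigma_hat_n = T_n / (n - 1)
mu_hat = x1 - sig_hat / n             # mu_hat_n = X_{n:1} - sigma_hat_n / n
theta_hat = c * mu_hat + d * sig_hat
theta = c * mu + d * sigma

risk_mc = A * np.mean((theta_hat - theta) ** 2)
risk_exact = A * sigma**2 / n**2 * (c**2 + (n * d - c) ** 2 / (n - 1))
print(risk_mc, risk_exact)  # the two agree up to Monte Carlo error
```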
This risk function does not depend on $\mu$, since $\hat{\theta}_N$ is a location and scale equivariant estimator. After stopping, we use the estimator

\[
\hat{\theta}_N = \sum_{n=m}^{\infty} I\{N = n\}\,(c\hat{\mu}_n + d\hat{\sigma}_n), \tag{1.4}
\]

where $I\{\cdot\}$ is an indicator variable. We prove that if the initial sample size, $m$, is at least 4 then for all $0 < \omega < \infty$

\[
\limsup_{\sigma\to\infty} R(\hat{\theta}_N(c,d), \sigma) \le \frac{(m-1)^2 A\,\omega}{B(m-2)(m-3)}, \tag{1.5}
\]

where $0 < B < \infty$ is a design constant specified in Section 2. Moreover, the risk $R(\hat{\theta}_N(c,d), \sigma)$ is an analytic function of $\sigma$, converging to zero when $\sigma$ approaches zero. Thus, the risk of $\hat{\theta}_N$ under two-stage sampling is bounded for each fixed value of $B$. Moreover, for each $\omega$, $0 < \omega < \infty$, there exists $B \equiv B(\omega)$ for which $\sup_{0<\sigma<\infty} R(\hat{\theta}_N(c,d), \sigma) = \omega$. The question is how to determine $B(\omega)$. In Section 2, we present a formula (2.4) for $B$ for which $R(\hat{\theta}_N(c,d), \sigma)$ is uniformly bounded by $\omega$. With the aid of an exact formula for the risk, we present in Section 5 an algorithm for a numerical search of $B(\omega)$. In Section 6, we illustrate the procedure with real data. Proofs of some theorems are given in the Appendix.

We remark the following about notation: we write $E_{\mu,\sigma}\{\cdot\}$ if the expected value depends on the two parameters; if it depends only on $\sigma$, we write $E_{\sigma}\{\cdot\}$.

2. The two-stage sampling procedure

The following is the two-stage sampling procedure in the light of Stein (1945, 1949), Mukhopadhyay and Pepe (2006), and Zacks and Mukhopadhyay (2006b).

Stage one: (i) Sample $m$ ($\ge 4$) i.i.d. random variables $X_1, X_2, \ldots, X_m$ from Expo($\mu, \sigma$). (ii) Compute

\[
\hat{\sigma}_m = \frac{1}{m-1}\sum_{i=1}^{m}(X_i - X_{m:1}).
\]

(iii) Compute the stopping variable

\[
N = \max\{m, \lfloor Bd^2\omega^{-1}\hat{\sigma}_m^2\rfloor + 1\} \tag{2.1}
\]

for some $B > 0$, where $\lfloor a\rfloor$ denotes the largest integer smaller than $a$. If $N = m$, stop sampling.

Stage two: (i) If $N > m$, take additional $(N - m)$ observations $X_1^*, X_2^*, \ldots, X_{N-m}^*$ from i.i.d. Expo($\mu, \sigma$), which are conditionally, given $N$, independent of $X_1, \ldots, X_m$. (ii) For the combined $N$ observations, compute

\[
T_N = \sum_{i=1}^{N}(X_i - X_{N:1}), \qquad \hat{\sigma}_N = \frac{T_N}{N-1}, \qquad \hat{\mu}_N = X_{N:1} - \frac{\hat{\sigma}_N}{N}, \tag{2.2}
\]

and $\hat{\theta}_N = c\hat{\mu}_N + d\hat{\sigma}_N$.

Notice that the distribution of $N$ depends only on $\sigma$, since $\hat{\sigma}_m \sim (\sigma/(m-1))G(m-1,1)$. It is independent of $\mu$, and from (2.1) we have

\[
\lim_{B\to\infty} N = \infty \ \text{a.s.} \qquad\text{and}\qquad \lim_{\omega\to 0} N = \infty \ \text{a.s.} \tag{2.3}
\]
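The two-stage rule (2.1)-(2.2) can be sketched in a few lines. This is a minimal simulation in the reconstructed notation ($\mu$, $\sigma$, and $\omega$ for the preassigned risk-bound); the function name and parameter values are ours, not the paper's.

```python
import math
import numpy as np

rng = np.random.default_rng(7)

def two_stage(mu, sigma, m, B, c, d, omega):
    # Stage one: m pilot observations and sigma_hat_m.
    x = mu + sigma * rng.exponential(size=m)
    sig_m = (x - x.min()).sum() / (m - 1)
    # Stopping variable (2.1).
    n = max(m, math.floor(B * d**2 * sig_m**2 / omega) + 1)
    # Stage two: N - m further observations, if needed.
    if n > m:
        x = np.concatenate([x, mu + sigma * rng.exponential(size=n - m)])
    # Terminal estimators (2.2).
    t_n = (x - x.min()).sum()
    sig_n = t_n / (n - 1)
    mu_n = x.min() - sig_n / n
    return n, c * mu_n + d * sig_n   # (N, theta_hat_N)

N, est = two_stage(mu=0.0, sigma=2.0, m=10, B=3.5, c=1.0, d=1.0, omega=2.0)
```

Averaged over many replications, the terminal estimator is close to $\theta = c\mu + d\sigma$, with a small negative bias in the scale component as discussed in Section 4.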
Lemma 2.1. For each fixed $\omega$, $d$ and $\sigma$, the distribution of $N$ is stochastically non-decreasing in $B$.

Proof. If $B_1 < B_2$ then, according to (2.1), for any $t > 0$, $\{N_1 > t\} \subset \{N_2 > t\}$, where $\{N_i > t\} = \{\max\{m, \lfloor B_i d^2\omega^{-1}\hat{\sigma}_m^2\rfloor + 1\} > t\}$ ($i = 1, 2$). Hence $P\{N_1 > t\} \le P\{N_2 > t\}$. □

Let $\rho = c/d$. In Appendix A.1 we prove the following result.

Theorem 2.1. Under the two-stage sampling procedure (2.1), for each fixed $c$, $d$ and $m \ge 4$, if

\[
B \equiv B_m(c,d) = \frac{(2m^3 + (m^2+1)\rho^2)A}{m(m-2)(m-3)}, \tag{2.4}
\]

then $R(\hat{\theta}_N(c,d), \sigma) \le \omega$ for all $0 < \sigma < \infty$.

3. The distribution of N

Define the function $j \mapsto \lambda_{m+j}$, where

\[
\lambda_{m+j} = \frac{m-1}{d\sigma}\sqrt{\frac{\omega(m+j)}{B}}, \qquad j = 0, 1, \ldots. \tag{3.1}
\]
According to (2.1),

\[
P\{N = m\} = P\{G(m-1,1) \le \lambda_m\}, \tag{3.2}
\]

and for $j \ge 1$

\[
P\{N = m+j\} = P\{\lambda_{m+j-1} < G(m-1,1) \le \lambda_{m+j}\}. \tag{3.3}
\]

Let $P(m; \lambda)$ denote the c.d.f. of a Poisson distribution with mean $\lambda$. From the Poisson-gamma relationship (see Kao, 1997, p. 50) we obtain from (3.1) to (3.3):

Theorem 3.1. For each fixed $m$, $\omega$, $B$, $d$ and $\sigma$,

\[
P\{N = m\} = 1 - P(m-2; \lambda_m), \tag{3.4}
\]

and for $j \ge 1$,

\[
P\{N = m+j\} = P(m-2; \lambda_{m+j-1}) - P(m-2; \lambda_{m+j}). \tag{3.5}
\]

Proof. Since $T_m \sim \sigma G(m-1,1)$, we obtain from (3.2) $P\{N = m\} = P\{G(m-1,1) \le \lambda_m\} = 1 - P(m-2; \lambda_m)$. In a similar way, we can prove (3.5). □

From (3.5) we immediately obtain

\[
P\{N \ge m+j\} = P(m-2; \lambda_{m+j-1}). \tag{3.6}
\]
Table 1
Expected values and quantiles of $N$, when $m = 10$, $d = 1$, $B = 3.5$, $\omega = 2$

$\sigma$   $E_\sigma\{N\}$   $p = 0.25$   $p = 0.50$   $p = 0.75$   $p = 0.95$
5          49.18             26           41           64           113
10         194.84            102          163          253          451
15         437.45            228          366          568          1013
Notice that $B \mapsto \lambda_{m+j-1}$ is a decreasing function of $B$. Thus, since the Poisson c.d.f. is a decreasing function of $\lambda$, Eqs. (3.1) and (3.6) also imply that $N$ is stochastically increasing in $B$, as in Lemma 2.1. The following is an immediate result:

Theorem 3.2. The $r$th moment of $N$ is

\[
E_\sigma\{N^r\} = m^r + \sum_{j=0}^{\infty}\left((m+j+1)^r - (m+j)^r\right)P(m-2; \lambda_{m+j}). \tag{3.7}
\]

As a special case, we immediately obtain the formula for the expected sample size, that is,

\[
E_\sigma\{N\} = m + \sum_{j=0}^{\infty} P(m-2; \lambda_{m+j}). \tag{3.8}
\]

Since $(\partial/\partial\lambda)P(m-2; \lambda) = -p(m-2; \lambda)$, where $p(j; \lambda)$ is the p.m.f. of the Poisson distribution with mean $\lambda$, we obtain

\[
\frac{\partial}{\partial B}E_\sigma\{N\} = \frac{m-1}{2B}\sum_{j=0}^{\infty} p(m-1; \lambda_{m+j}). \tag{3.9}
\]

Since (3.9) is positive, $E_\sigma\{N\}$ is a strictly increasing function of $B$. This also follows from Lemma 2.1, in view of the monotone convergence theorem. Finally, from (3.6) we obtain that the $p$th quantile of $N$ is

\[
N_p = \text{least } n \ge m \ \text{ such that } \ P(m-2; \lambda_n) \le 1 - p. \tag{3.10}
\]
In Table 1 we present the expected values and quantiles of $N$ for some special cases, which illustrate the positive skewness of the distribution of $N$.

4. Expected values and risks of estimators

As before, let us denote by $X_1, \ldots, X_m$ the first stage observed random variables and by $X_1^*, \ldots, X_{N-m}^*$ the second stage ones. Let $X_{m:1} < X_{m:2} < \cdots < X_{m:m}$ denote the order statistics of the first stage sample, and $X_{N-m:1}^* < \cdots < X_{N-m:N-m}^*$ those of the second stage. Let $U_i = X_i - X_{m:1}$, $i = 1, \ldots, m$, and let $\mathcal{U}_m = \sigma(U_1, \ldots, U_m)$ be the $\sigma$-field generated by $(U_1, \ldots, U_m)$. Furthermore, let $X_{N:1} = \min(X_{m:1}, X_{N-m:1}^*)$ and $T_{N-m}^* = \sum_{j=1}^{N-m}(X_j^* - X_{N-m:1}^*)$. As before, we continue to denote by $T_N$ the statistic $\sum_{i=1}^{N}(X_i - X_{N:1})$.

Lemma 4.1. Given $\{N = n\}$, $X_{N:1}$ and $\hat{\sigma}_N$ are conditionally independent, and

\[
X_{N:1} \mid \mathcal{U}_m \sim \mu + \frac{\sigma}{N}G(1,1), \tag{4.1}
\]
and

\[
\hat{\sigma}_N \mid \mathcal{U}_m \sim I(N=m)\,\frac{T_m}{m-1} + I(N>m)\,\frac{1}{N-1}\left(T_m + \sigma G^*(N-m, 1)\right), \tag{4.2}
\]

where $G^*(N-m, 1)$ is a random variable having an Erlang $(N-m)$ distribution, independently of $\mathcal{U}_m$.

Proof. Recall that $X_{N:1} = \min(X_{m:1}, X_{N-m:1}^*)$, $X_{m:1}$ is independent of $\mathcal{U}_m$, and $X_{N-m:1}^*$ is conditionally independent of $\mathcal{U}_m$. Also,

\[
X_{m:1} \mid \mathcal{U}_m \sim \mu + \frac{\sigma}{m}G_1(1,1) \qquad\text{and}\qquad X_{N-m:1}^* \mid \mathcal{U}_m \sim \mu + \frac{\sigma}{N-m}G_2(1,1),
\]

where $G_1(1,1)$ and $G_2(1,1)$ are two independent standard exponential random variables. This implies (4.1). To show (4.2), we use an embedding technique of Lombard and Swanepoel (1978), Swanepoel and van Wyk (1982) and Mukhopadhyay (1982). □

Notice that the distribution of $\hat{\sigma}_N$ depends only on $\sigma$.

4.1. The expected value and risk of $\hat{\sigma}_N$ ($c = 0$, $d = 1$)

According to (4.2),

\[
E_\sigma\{\hat{\sigma}_N \mid \mathcal{U}_m\} = \sigma\left[I(N=m)\,\frac{G(m-1,1)}{m-1} + I(N>m)\,\frac{1}{N-1}\left(G(m-1,1) + (N-m)\right)\right]. \tag{4.3}
\]

Similarly,

\[
E_\sigma\{\hat{\sigma}_N^2 \mid \mathcal{U}_m\} = \sigma^2\left[I(N=m)\,\frac{G^2(m-1,1)}{(m-1)^2} + I(N>m)\,\frac{1}{(N-1)^2}\left(G^2(m-1,1) + 2(N-m)G(m-1,1) + (N-m)(N-m+1)\right)\right]. \tag{4.4}
\]

We note at this point that Eqs. (4.3) and (4.4) can also be obtained in the following manner. In the case of $\{N > m\}$, one can write

\[
T_N = T_m + T_{N-m}^* + m(X_{m:1} - X_{N-m:1}^*)I\{X_{m:1} > X_{N-m:1}^*\} + (N-m)(X_{N-m:1}^* - X_{m:1})I\{X_{m:1} < X_{N-m:1}^*\}.
\]

Since $X_{m:1}$, $X_{N-m:1}^*$ and $T_{N-m}^*$ are conditionally independent given $\mathcal{U}_m$, $X_{N-m:1}^* \mid \mathcal{U}_m \sim \mu + (\sigma/(N-m))G(1,1)$, and $T_{N-m}^* \mid \mathcal{U}_m \sim \sigma G(N-m-1, 1)$, we obtain after some tedious computations formulae (4.3) and (4.4), from which we obtain the expected value and second moment of $\hat{\sigma}_N$. From these we get the following theorem:

Theorem 4.1. The expected value of $\hat{\sigma}_N$ is

\[
E_\sigma\{\hat{\sigma}_N\} = \sigma\left(1 - \sum_{j=0}^{\infty}\frac{m-1}{(m+j-1)(m+j)}\,p(m-1; \lambda_{m+j})\right). \tag{4.5}
\]
Table 2
Expected value of $N$ and $R(\hat{\sigma}_N; \sigma)$ for $m = 6$, $d = 1$, $A = 1$, $\omega = 2$

$\sigma$   $E_\sigma\{N\}$ ($B = 2.604$)   Risk ($B = 2.604$)   $E_\sigma\{N\}$ ($B = 2.68$)   Risk ($B = 2.68$)
5          39.67                           1.7591               40.81                          1.7264
10         156.34                          2.0597               160.83                         1.9983
15         334.42                          1.9134               342.74                         1.8528
20         518.49                          1.7761               528.26                         1.7188
The quadratic risk of $\hat{\sigma}_N$ ($c = 0$, $d = 1$) is

\[
R(\hat{\sigma}_N, \sigma) = A\sigma^2\Bigg[\sum_{j=0}^{\infty}\frac{1}{(m+j-1)(m+j)}\left(1 - P(m-2; \lambda_{m+j})\right)
+ \sum_{j=0}^{\infty}\frac{(m-1)(m-2)(2(m+j)+1)}{(m+j-1)^2(m+j)^2}\,p(m-1; \lambda_{m+j})
- \sum_{j=0}^{\infty}\frac{m(m-1)(2(m+j)+1)}{(m+j-1)^2(m+j)^2}\,p(m; \lambda_{m+j})\Bigg]. \tag{4.6}
\]

Proof. See Appendix A.2. □

The bias of $\hat{\sigma}_N$, $\mathrm{Bias}(\hat{\sigma}_N, \sigma) = E_\sigma\{\hat{\sigma}_N\} - \sigma$, is negative. Also, one obtains from (4.5) that

\[
\lim_{\sigma\to\infty}\mathrm{Bias}(\hat{\sigma}_N, \sigma) = 0 \qquad\text{if } m > 2. \tag{4.7}
\]
Let

\[
S_1(m, \sigma) = A\sigma^2\sum_{j=0}^{\infty}\frac{1}{(m+j-1)(m+j)}\left(1 - P(m-2; \lambda_{m+j})\right) \tag{4.8}
\]

be the first term on the r.h.s. of (4.6). We prove in Appendix A.3 the following:

Theorem 4.2.

\[
\lim_{\sigma\to\infty} S_1(m, \sigma) \le \frac{(m-1)^2 A\,\omega}{(m-2)(m-3)B}. \tag{4.9}
\]

The sum of the other two terms of (4.6) generally contributes very little to the risk, and converges to zero as $\sigma\to\infty$. If we set the r.h.s. of (4.9) equal to $\omega$, we get

\[
B^* = \frac{(m-1)^2 A}{(m-2)(m-3)}. \tag{4.10}
\]

This is less than $\frac{1}{2}$ of the value of (2.4) when $\rho = 0$ and $d = 1$. As we show in Section 5, for estimating $\sigma$ alone ($c = 0$, $d = 1$), if we take $B$ slightly larger than $1.25B^*$, the risk $R(\hat{\sigma}_N, \sigma)$ is uniformly bounded by $\omega$. For example, if $A = 1$, $d = 1$, $m = 6$, we have $B^* = 2.083$ and $1.25B^* = 2.604$. In Table 2 we present the expected value of $N$ and the risk of $\hat{\sigma}_N$ for $B = 2.604$ and for $B = 2.68$. We see that the correct value of $B$ for this case is $B = 2.68$. The value of $B$ according to (2.4) is $B = 6$, which is way too large.
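The two constants just compared are easy to reproduce for the scale-only case (function names are ours):

```python
def B_star(m, A):
    # (4.10): B* = (m-1)^2 * A / ((m-2)(m-3)).
    return (m - 1) ** 2 * A / ((m - 2) * (m - 3))

def B_design(m, c, d, A):
    # (2.4) with rho = c/d.
    rho = c / d
    return (2 * m**3 + (m**2 + 1) * rho**2) * A / (m * (m - 2) * (m - 3))

# m = 6, A = 1, the case discussed above:
print(round(B_star(6, 1), 3))         # 2.083
print(round(1.25 * B_star(6, 1), 3))  # 2.604
print(B_design(6, 0, 1, 1))           # 6.0, the (2.4) value called "way too large"
```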
4.2. The expected value and risk of $\hat{\theta}_N$ ($c > 0$, $d > 0$)

We have seen that $\hat{\theta}_N = c\hat{\mu}_N + d\hat{\sigma}_N = cX_{N:1} + (d - \frac{c}{N})\hat{\sigma}_N$. We prove in Appendix A.4 that the following result holds.

Theorem 4.3. For given $c > 0$ and $d > 0$,

\[
E_{\mu,\sigma}\{\hat{\theta}_N\} = c\mu + d\sigma - (m-1)\sigma\sum_{j=0}^{\infty}\frac{d(m+j+1) - 2c}{(m+j-1)(m+j)(m+j+1)}\,p(m-1; \lambda_{m+j}) \tag{4.11}
\]

and

\[
R(\hat{\theta}_N(c,d), \sigma) = Ad^2\sigma^2\Bigg[\sum_{j=0}^{\infty}\frac{(m+j-1) + 2(\rho-1)^2}{(m+j-1)(m+j)(m+j+1)}\left(1 - P(m-2; \lambda_{m+j})\right)
\]
\[
\qquad + (m-1)(m-2)\sum_{j=0}^{\infty}\frac{(m+j+1-2\rho)\left(2(m+j)(m+j-\rho) + (m+j-1)\right)}{(m+j-1)^2(m+j)^2(m+j+1)^2}\,p(m-1; \lambda_{m+j})
\]
\[
\qquad - m(m-1)\sum_{j=0}^{\infty}\frac{(m+j+1-2\rho)\left(2(m+j)(m+j-\rho) + (m+j-1)\right)}{(m+j-1)^2(m+j)^2(m+j+1)^2}\,p(m; \lambda_{m+j})\Bigg], \tag{4.12}
\]

where $\rho = c/d$. Notice that if we substitute $\rho = 0$ ($c = 0$) and $d = 1$ in (4.11) and (4.12), we obtain (4.5) and (4.6). Substituting $c = 1$ and $d = 0$ in (4.11), we get the expected value of $\hat{\mu}_N$, which is

\[
E_{\mu,\sigma}\{\hat{\mu}_N\} = \mu + 2(m-1)\sigma\sum_{j=0}^{\infty}\frac{1}{(m+j-1)(m+j)(m+j+1)}\,p(m-1; \lambda_{m+j}). \tag{4.13}
\]

The mean-squared error of $\hat{\mu}_N$ is

\[
E_\sigma\{(\hat{\mu}_N - \mu)^2\} = 2\sigma^2\Bigg[\sum_{j=0}^{\infty}\frac{1}{(m+j-1)(m+j)(m+j+1)}\left(1 - P(m-2; \lambda_{m+j})\right)
+ 2(m-1)(m-2)\sum_{j=0}^{\infty}\frac{1}{(m+j-1)^2(m+j)(m+j+1)^2}\,p(m-1; \lambda_{m+j})
- 2m(m-1)\sum_{j=0}^{\infty}\frac{1}{(m+j-1)^2(m+j)(m+j+1)^2}\,p(m; \lambda_{m+j})\Bigg]. \tag{4.14}
\]
5. A numerical search for $B(\omega)$

Given $\omega$, we propose the following numerical algorithm to search for $B(\omega)$.

Step 1. Compute $B \equiv B_m(c,d)$ according to (2.4).
Step 2. Start the search with $B_0 = B_m(c,d)/3$.
Step 3. Compute $R_0(\sigma) = R(\hat{\theta}_N(c,d), \sigma)$ according to (4.12), substituting $B_0$ in (3.1) for $\lambda_{m+j}$, and find $\sigma^*$ for which $R_0(\sigma)$ is maximized. That is, $R_0^* = R_0(\sigma^*)$.
Step 4. If $R_0^* < \omega$, decrease $B_0$ and go back to Step 3. If $R_0^* > \omega$, increase $B_0$ and go back to Step 3. If $R_0^* \approx \omega$, stop.

We illustrate this numerical algorithm with several examples.

Example 5.1. We wished to estimate $\theta = \mu + \sigma$. In this case $\hat{\theta}_N = \hat{\mu}_N + \hat{\sigma}_N$. With $m = 10$, $A = 2$ and $\omega = 2$, we had from (2.4) $B_{10}(1,1) = 7.5036$. We started with $B_0 = 2.5$. For $\sigma_0^* = 8$ we obtained $R_0^* = 2.760$. Thus, $B$ had to be increased. We set $B_1 = 3.5$ and found $\sigma_1^* = 6$, $R_1^* = 1.9676$. This was close enough to $\omega = 2$, and we stopped.

Example 5.2. We wished to estimate the median, $\mu + \sigma\ln(2)$. Here $c = 1$ and $d = \ln(2) = 0.693$. With $A = 2$, $m = 10$ and $\omega = 2$, we obtained from (2.4) $B_{10}(1, 0.693) = 8.2523$. We started with $B_0 = 2.75$ and found $R_0^* = 2.477$. We set $B_1 = 3.1$ and got $R_1^* = 2.275$. We thus increased $B$ to $B_2 = 3.5$ and found $R_2^* = 1.948$. Finally, $B_2$ was slightly decreased. For $B_3 = 3.44$, we had $R_3^* = 1.9824$. This was close enough.

Example 5.3. We wished to estimate $\theta = 5\mu + \sigma$. Here $c = 5$, $d = 1$. With $A = 2$, $m = 10$, $\omega = 2$, we had $B_{10}(5,1) = 16.16$. Starting with $B_0 = 5.387$ we got $R_0^* = 1.9005$. With $B_1 = 5.0$ we found $R_1^* = 2.0266$. With $B_2 = 5.2$, we got $R_2^* = 1.9664$. This was close enough to $\omega$, and we stopped.

6. Illustrations with real data

In this section, we provide some illustrations with real data sets, employing the two-stage procedure (2.1) and $B$ according to (2.4). However, we have refrained from giving every detail of our analysis. As an illustration, we went back to a celebrated data set on failure times (in hours) for a fleet of Boeing 720 jet airplanes that was reported in Table 1 of Proschan (1963).
We especially considered the data reported for the plane #7909, for which 29 observations were reported. As we fitted the exponential model Expo($\mu, \sigma$), we found $\hat{\mu} = 9.8$, $\hat{\sigma} = 76.125$ and mean 83.5. We went on to implement the proposed methodology on the permuted data set.

Example 6.1. We first decided to estimate the mean ($\equiv \mu + \sigma$) of the distribution with $L_n(\hat{\theta}_n, \theta) \equiv A(\hat{\theta}_n - \theta)^2$, where $\hat{\theta}_n$ reduced to $\bar{X}_n$ since we had $c = d = 1$. We fixed $A = 0.01$, $\omega = 10$, and $m = 5$. From (2.4) we found $B = B_5(1,1) = 0.092$. By the way, the pilot observations were 60, 44, 130, 61, 26, which gave the average 64.2 and the minimum 26.0. Thus, we had $\hat{\sigma}_5 = \frac{5}{4}(64.2 - 26.0) \approx 47.75$, which led to $Bd^2\omega^{-1}\hat{\sigma}_m^2 = (0.092)(47.75)^2/10 \approx 20.977 \Rightarrow N = 21$. We needed 16 new observations, which were 25, 186, 10, 90, 44, 79, 70, 208, 118, 62, 208, 76, 29, 49, 101, 84. The combined data set from both stages with 21 observations gave the sample mean 83.8, which was the observed value of $\hat{\theta}_N$. So, we estimated the average failure time as 83.8 h.

Example 6.2. Again, in order to estimate the third or upper quartile ($\equiv \mu + \sigma\ln(4)$) with $L_n(\hat{\theta}_n, \theta) \equiv A(\hat{\theta}_n - \theta)^2$, we had $c = 1$, $d = \ln(4)$. We fixed $A = 0.01$, $\omega = 50$, and $m = 5$. From (2.4), we had found $B = 0.092$. The same pilot observations from Example 6.1 led to $N = 21$. Then, the same 16 additional observations obtained in Example 6.1, combined with the pilot observations, gave the sample average 83.8 and the minimum 10.0. Thus, we had $\hat{\sigma}_{21} = \frac{21}{20}(83.8 - 10.0) \approx 77.49$, so that the observed value of $\hat{\theta}_N$ was $10.0 + (\ln(4) - \frac{1}{21}) \times 77.49 \approx 113.73$ h.

Acknowledgments

We are grateful to the referee for helpful suggestions.
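The arithmetic of Examples 6.1 and 6.2 can be reproduced directly from the pilot data quoted above (this script is ours, not the paper's):

```python
import math

# Example 6.1: pilot sample and stopping variable (2.1).
pilot = [60, 44, 130, 61, 26]
m = 5
sig5 = sum(x - min(pilot) for x in pilot) / (m - 1)      # sigma_hat_5 = 47.75
B, d, omega = 0.092, 1.0, 10.0
N = max(m, math.floor(B * d**2 * sig5**2 / omega) + 1)   # N = 21
print(sig5, N)

# Example 6.2: combined sample of 21 observations, mean 83.8, minimum 10.0.
sig21 = (21 / 20) * (83.8 - 10.0)                 # sigma_hat_21 = 77.49
theta = 10.0 + (math.log(4) - 1 / 21) * sig21     # c = 1, d = ln 4
print(round(theta, 2))                            # about 113.73 h
```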
Appendix A

A.1. Proof of Theorem 2.1

Since

\[
\hat{\theta}_N - \theta = c(X_{N:1} - \mu) + \left(d - \frac{c}{N}\right)(\hat{\sigma}_N - \sigma) - \frac{c\sigma}{N},
\]

we obtain

\[
(\hat{\theta}_N - \theta)^2 = c^2(X_{N:1} - \mu)^2 + \left(d - \frac{c}{N}\right)^2(\hat{\sigma}_N - \sigma)^2 + \frac{c^2\sigma^2}{N^2}
+ 2c\left(d - \frac{c}{N}\right)(X_{N:1} - \mu)(\hat{\sigma}_N - \sigma)
- \frac{2c^2\sigma}{N}(X_{N:1} - \mu) - \frac{2c\sigma}{N}\left(d - \frac{c}{N}\right)(\hat{\sigma}_N - \sigma). \tag{A.1}
\]

We have seen that $X_{N:1}$ and $\hat{\sigma}_N$ are conditionally independent, given $\mathcal{U}_m$. Thus, by (4.1) and iterated expectation we obtain

\[
E\{(\hat{\theta}_N - \theta)^2\} = E\left\{\left(d - \frac{c}{N}\right)^2(\hat{\sigma}_N - \sigma)^2\right\} + c^2\sigma^2 E\left\{\frac{1}{N^2}\right\}. \tag{A.2}
\]

Also, since $(d - c/N)^2 \le d^2 + c^2/N^2$, we have

\[
E\{(\hat{\theta}_N - \theta)^2\} \le E\left\{\left(d^2 + \frac{c^2}{N^2}\right)(\hat{\sigma}_N - \sigma)^2\right\} + c^2\sigma^2 E\left\{\frac{1}{N^2}\right\}. \tag{A.3}
\]

According to (4.2) we can write

\[
\hat{\sigma}_N = \frac{m-1}{N-1}\bar{Z}_{m-1} + \frac{N-m}{N-1}\bar{Z}_{N-m}^*, \tag{A.4}
\]

where $\bar{Z}_{m-1}$ is the mean of $(m-1)$ i.i.d. $\sigma G(1,1)$ random variables, and $\bar{Z}_{N-m}^*$ is the mean of $(N-m)$ i.i.d. $\sigma G(1,1)$ random variables. Moreover, $\bar{Z}_{m-1}$ and $\bar{Z}_{N-m}^*$ are conditionally independent, given $\mathcal{U}_m$. Thus, since $\mathrm{Var}_\sigma\{\bar{Z}_{N-m}^* \mid \mathcal{U}_m\} = \sigma^2/(N-m)$,

\[
R(\hat{\theta}_N(c,d), \sigma) \le A E\left\{\left(d^2 + \frac{c^2}{N^2}\right)\left[\left(\frac{m-1}{N-1}\right)^2(\bar{Z}_{m-1} - \sigma)^2 + \frac{(N-m)\sigma^2}{(N-1)^2}\right]\right\} + Ac^2\sigma^2 E\left\{\frac{1}{N^2}\right\}
= A\{\mathrm{I} + \mathrm{II} + \mathrm{III}\}, \ \text{say}. \tag{A.5}
\]

Notice that $(m-1)/(N-1) < m/N \le 1$ a.s., and also $N \ge Bd^2\omega^{-1}\bar{Z}_{m-1}^2$ a.s., according to (2.1). According to (A.5) and the above,

\[
\mathrm{I} \le E\left\{\left(d^2 + \frac{c^2}{N^2}\right)\frac{m^2}{N^2}(\bar{Z}_{m-1} - \sigma)^2\right\}
\le E\left\{\left(d^2 m + \frac{c^2}{m}\right)\frac{1}{N}(\bar{Z}_{m-1} - \sigma)^2\right\}
\le \left(d^2 m + \frac{c^2}{m}\right)\frac{\omega}{Bd^2}\,E\left\{\bar{Z}_{m-1}^{-2}(\bar{Z}_{m-1} - \sigma)^2\right\}. \tag{A.6}
\]
Moreover, −2 (Z¯ m−1 E {Z¯ m−1
− ) } = E 2
1−
= 1 − 2E =1−2
2
Z¯ m−1
m−1 G(m − 1, 1)
+E
(m − 1)2 G2 (m − 1, 1)
(m − 1)2 m+1 m−1 + = . m − 2 (m − 2)(m − 3) (m − 2)(m − 3)
Thus, from (A.6) to (A.7), c2 (m + 1) . I d 2m + m Bd 2 (m − 2)(m − 3)
(A.7)
(A.8)
Similarly,
N −m c2 II E d + 2 N (N − 1)2 c2 (m − 1)2 2 E d m+ m (m − 1)Bd 2 G2 (m − 1, 1) c2 (m − 1) 2 . = d m+ 2 m Bd (m − 2)(m − 3) 2
2
(A.9)
Finally, c2 2 1 E m N c2 (m − 1)2 E G2 (m − 1) mBd 2
III
=
c2 (m − 1)2 . mBd 2 (m − 2)(m − 3)
(A.10)
Combining (A.8)–(A.10) we get A(I + II + III)
A(2m3 + (m2 + 1) 2 ) . Bm(m − 2)(m − 3)
(A.11)
Equating the r.h.s. of (A.11) to $\omega$ and solving for $B$, we get (2.4).

A.2. Proof of Theorem 4.1

From (4.3) and iterated expectation,

\[
E_\sigma\{\hat{\sigma}_N\} = \sigma\Bigg[\frac{1}{(m-1)(m-2)!}\int_0^{\lambda_m} x^{m-1}e^{-x}\,dx
+ \sum_{j=1}^{\infty}\frac{1}{(m+j-1)(m-2)!}\int_{\lambda_{m+j-1}}^{\lambda_{m+j}} x^{m-1}e^{-x}\,dx
+ \sum_{j=1}^{\infty}\frac{j}{(m+j-1)(m-2)!}\int_{\lambda_{m+j-1}}^{\lambda_{m+j}} x^{m-2}e^{-x}\,dx\Bigg]. \tag{A.12}
\]
According to the Poisson-gamma relationship,

\[
\frac{1}{(m-1)!}\int_{\lambda_{m+j-1}}^{\lambda_{m+j}} x^{m-1}e^{-x}\,dx = P(m-1; \lambda_{m+j-1}) - P(m-1; \lambda_{m+j}). \tag{A.13}
\]

Thus,

\[
E_\sigma\{\hat{\sigma}_N\} = \sigma\Bigg[\left(1 - P(m-1; \lambda_m)\right) + (m-1)\sum_{j=1}^{\infty}\frac{1}{m+j-1}\left(P(m-1; \lambda_{m+j-1}) - P(m-1; \lambda_{m+j})\right)
+ \sum_{j=1}^{\infty}\frac{j}{m+j-1}\left(P(m-2; \lambda_{m+j-1}) - P(m-2; \lambda_{m+j})\right)\Bigg]. \tag{A.14}
\]
The first term of (A.14) is $\mathrm{I} = 1 - P(m-1; \lambda_m)$. The second term of (A.14) is equivalent, by summation by parts, to

\[
\mathrm{II} = \frac{m-1}{m}P(m-1; \lambda_m) - \sum_{j=1}^{\infty}\frac{m-1}{(m+j-1)(m+j)}\,P(m-1; \lambda_{m+j}). \tag{A.15}
\]

The third term is equal to

\[
\mathrm{III} = \frac{1}{m}P(m-2; \lambda_m) + \sum_{j=1}^{\infty}\frac{m-1}{(m+j-1)(m+j)}\,P(m-2; \lambda_{m+j}). \tag{A.16}
\]

Finally, since $P(m-1; \lambda_{m+j}) = P(m-2; \lambda_{m+j}) + p(m-1; \lambda_{m+j})$, adding $\mathrm{I} + \mathrm{II} + \mathrm{III}$ we get (4.5). □

To determine $E_\sigma\{(\hat{\sigma}_N - \sigma)^2\} = E_\sigma\{\hat{\sigma}_N^2\} - 2\sigma E_\sigma\{\hat{\sigma}_N\} + \sigma^2$, we first compute the second moment of $\hat{\sigma}_N$. According to (4.4),

\[
E_\sigma\{\hat{\sigma}_N^2\} = \sigma^2\left[\mathrm{I} + \mathrm{II} + \mathrm{III} + \mathrm{IV}\right],
\]

where

\[
\mathrm{I} = E\left\{I(N=m)\,\frac{G^2(m-1,1)}{(m-1)^2}\right\} = \frac{m}{m-1}\left(1 - P(m; \lambda_m)\right). \tag{A.17}
\]
∞
2(m + j ) + 1
j =1
(m + j − 1)2 (m + j )2
P (m; m+j ).
(A.18)
The third term is N −m G(m − 1, 1) III = E 2I (N > m) (N − 1)2 ∞
=2
(m − 1)(j (j + 1) − (m − 1)2 ) m−1 P (m − 1;
) − 2 P (m − 1; m+j ). m m2 (m + j − 1)2 (m + j )2 j =1
(A.19)
Finally,

\[
\mathrm{IV} = E\left\{I(N>m)\,\frac{(N-m-1)(N-m+2) + 2}{(N-1)^2}\right\}
= \frac{2}{m^2}P(m-2; \lambda_m) + \sum_{j=1}^{\infty}\frac{(j+1)\left(j(2m-3) + 2(m-1)^2\right)}{(m+j-1)^2(m+j)^2}\,P(m-2; \lambda_{m+j}). \tag{A.20}
\]

Adding the coefficients of (A.17)-(A.20) we get the coefficients of $(1 - P(m-2; \lambda_{m+j}))$ as in (4.6). The coefficients of $p(m-1; \lambda_{m+j})$ are obtained by adding those of (A.17)-(A.19). The coefficients of $p(m; \lambda_{m+j})$ are obtained by adding those of (A.17)-(A.18). □

A.3. Proof of Theorem 4.2

According to (4.8),

\[
\frac{1}{A\sigma^2}\,S_1(m, \sigma) = \sum_{j=0}^{\infty}\frac{1}{(m+j-1)(m+j)(m-2)!}\int_0^{\lambda_{m+j}} x^{m-2}e^{-x}\,dx. \tag{A.21}
\]
Notice that

\[
\sum_{j=0}^{\infty}\frac{1}{(m+j-1)(m+j)} = \sum_{j=0}^{\infty}\left(\frac{1}{m+j-1} - \frac{1}{m+j}\right) = \frac{1}{m-1}.
\]

Let $\delta = (m-1)\omega^{1/2}/B^{1/2}$ (recall $d = 1$ here). Then $\lambda_{m+j} = (\delta/\sigma)(m+j)^{1/2}$. It follows that $x \le \lambda_{m+j}$ if, and only if, $j \ge x^2\sigma^2/\delta^2 - m$. Thus,

\[
S_1(m, \sigma) = \frac{A\sigma^2}{(m-2)!}\int_0^{\infty}\Bigg(\sum_{\{j:\,\lambda_{m+j} \ge x\}}\frac{1}{(m+j-1)(m+j)}\Bigg)x^{m-2}e^{-x}\,dx
\le \frac{A\sigma^2}{(m-2)!}\int_0^{\infty}\frac{x^{m-2}e^{-x}}{x^2\sigma^2/\delta^2 - 1}\,dx. \tag{A.22}
\]

From (A.22) we obtain, for fixed $m$,

\[
\lim_{\sigma\to\infty} S_1(m, \sigma) \le \frac{A\delta^2(m-4)!}{(m-2)!} = \frac{A(m-1)^2\,\omega}{(m-2)(m-3)B}. \tag{A.23}
\]

This proves (4.9). □

A.4. Proof of Theorem 4.3

\[
R(\hat{\theta}_N(c,d), \sigma) = c^2 A\,E_\sigma\{(\hat{\mu}_N - \mu)^2\} + 2cdA\,E_\sigma\{(\hat{\mu}_N - \mu)(\hat{\sigma}_N - \sigma)\} + d^2 A\,E_\sigma\{(\hat{\sigma}_N - \sigma)^2\}. \tag{A.24}
\]

The mean-squared error of $\hat{\sigma}_N$ is given in (4.6). We have to determine only the first and second terms on the r.h.s. of (A.24). Notice that

\[
E\{(\hat{\mu}_N - \mu)^2\} = E\{(X_{N:1} - \mu)^2\} - 2E\left\{(X_{N:1} - \mu)\frac{\hat{\sigma}_N}{N}\right\} + E\left\{\frac{\hat{\sigma}_N^2}{N^2}\right\}. \tag{A.25}
\]
Furthermore,

\[
E\{(X_{N:1} - \mu)^2\} = 2\sigma^2 E\left\{\frac{1}{N^2}\right\}. \tag{A.26}
\]

Along the lines of the method in Appendix A.2, we can prove

\[
E\left\{\frac{1}{N^2}\right\} = \sum_{j=0}^{\infty}\frac{2(m+j)+1}{(m+j)^2(m+j+1)^2}\left(1 - P(m-2; \lambda_{m+j})\right). \tag{A.27}
\]

Also, since $X_{N:1}$ is conditionally independent of $\hat{\sigma}_N$,

\[
2E\left\{(X_{N:1} - \mu)\frac{\hat{\sigma}_N}{N}\right\}
= 2\sigma^2 E\left\{I(N=m)\,\frac{G(m-1,1)}{m^2(m-1)} + I(N>m)\,\frac{1}{N^2(N-1)}\left(G(m-1,1) + (N-m)\right)\right\}
\]
\[
= 2\sigma^2\Bigg[\sum_{j=0}^{\infty}\frac{2(m+j)+1}{(m+j)^2(m+j+1)^2}\left(1 - P(m-2; \lambda_{m+j})\right)
- \sum_{j=0}^{\infty}\frac{(3(m+j)+1)(m-1)}{(m+j-1)(m+j)^2(m+j+1)^2}\,p(m-1; \lambda_{m+j})\Bigg]. \tag{A.28}
\]
Finally,

\[
E\left\{\frac{\hat{\sigma}_N^2}{N^2}\right\} = \sigma^2\Bigg[\frac{1}{m(m-1)} - 2\sum_{j=0}^{\infty}\frac{1}{(m+j-1)(m+j)(m+j+1)}\,P(m-2; \lambda_{m+j})
\]
\[
\qquad - 2(m-1)\sum_{j=0}^{\infty}\frac{(m+2j)^2 - (j-1)^2 + 2m}{(m+j-1)^2(m+j)^2(m+j+1)^2}\,p(m-1; \lambda_{m+j})
- 4m(m-1)\sum_{j=0}^{\infty}\frac{1}{(m+j-1)^2(m+j)(m+j+1)^2}\,p(m; \lambda_{m+j})\Bigg]. \tag{A.29}
\]

Adding all these results we obtain (4.14). In a similar fashion we prove that

\[
2E\{(\hat{\mu}_N - \mu)(\hat{\sigma}_N - \sigma)\} = -2\sigma^2\Bigg[2\sum_{j=0}^{\infty}\frac{1}{(m+j-1)(m+j)(m+j+1)}\left(1 - P(m-2; \lambda_{m+j})\right)
\]
\[
\qquad + \sum_{j=0}^{\infty}\frac{(m-1)(m-2)(3(m+j)-1)}{(m+j-1)^2(m+j)^2(m+j+1)}\,p(m-1; \lambda_{m+j})
- \sum_{j=0}^{\infty}\frac{m(m-1)(3(m+j)-1)}{(m+j-1)^2(m+j)^2(m+j+1)}\,p(m; \lambda_{m+j})\Bigg]. \tag{A.30}
\]

These expressions lead to (4.12). □
References

Bain, L.J., 1978. Statistical Analysis of Reliability and Life Testing Models. Marcel Dekker, New York.
Balakrishnan, N., Basu, A.P., 1995. The Exponential Distribution, edited volume. Gordon and Breach, Amsterdam.
Basu, A.P., 1971. On a sequential rule for estimating the location parameter of an exponential distribution. Naval Res. Logist. Quarterly 18, 329-337.
Ghosh, M., Mukhopadhyay, N., 1976. On two fundamental problems of sequential estimation. Sankhya Ser. B 38, 203-218.
Ghosh, M., Mukhopadhyay, N., 1989. Sequential estimation of the percentiles of exponential and normal distributions. South African Statist. J. 23, 251-268.
Ghosh, M., Mukhopadhyay, N., 1990. Sequential estimation of the location parameter of an exponential distribution. Sankhya Ser. A 52, 302-312.
Ghosh, M., Mukhopadhyay, N., Sen, P.K., 1997. Sequential Estimation. Wiley, New York.
Ghurye, S.G., 1958. Note on sufficient statistics and two-stage procedures. Ann. Math. Statist. 29, 155-166.
Grubbs, F.E., 1971. Approximate fiducial bounds on reliability for the two parameter negative exponential distribution. Technometrics 13, 873-876.
Isogai, E., Uno, C., 1994. Sequential estimation of a parameter of an exponential distribution. Ann. Inst. Statist. Math. 46, 77-82.
Johnson, N.L., Kotz, S., 1970. Continuous Univariate Distributions, vol. 2. Wiley, New York.
Kao, E.P.C., 1997. An Introduction to Stochastic Processes. Duxbury Press, New York.
Kubokawa, T., 1989. Improving on two-stage estimators for scale families. Metrika 36, 7-13.
Lawless, J.F., Singhal, K., 1980. Analysis of data from lifetest experiments under an exponential model. Naval Res. Logist. Quarterly 27, 323-334.
Lombard, F., Swanepoel, J.W.H., 1978. On finite and infinite confidence sequences. South African Statist. J. 12, 1-24.
Mukhopadhyay, N., 1974. Sequential estimation of location parameter of an exponential distribution. Calcutta Statist. Assoc. Bull. 23, 85-93.
Mukhopadhyay, N., 1982. A study of the asymptotic regret while estimating the location of an exponential distribution. Calcutta Statist. Assoc. Bull. 31, 201-213.
Mukhopadhyay, N., 1988. Sequential estimation problems for negative exponential populations. Comm. Statist. Theory Methods 17, 2471-2506.
Mukhopadhyay, N., 1994. Improved sequential estimation of means of exponential distributions. Ann. Inst. Statist. Math. 46, 509-519.
Mukhopadhyay, N., 1995. Two-stage and multi-stage estimation. In: Balakrishnan, N., Basu, A.P. (Eds.), The Exponential Distribution: Theory, Methods and Application. Gordon and Breach Publishers, Amsterdam, pp. 429-452 (Chapter 26).
Mukhopadhyay, N., Cicconetti, G., 2000. Estimation of the reliability function after sequential experimentation. In: Nikulin, M., Limnios, N. (Eds.), Second International Conference on Mathematical Methods in Reliability, Abstract Book 2. Universite Victor Segalen, Bordeaux, pp. 788-791.
Mukhopadhyay, N., Cicconetti, G., 2005. Estimating reliabilities following purely sequential sampling from exponential populations. In: Balakrishnan, N., Kannan, N., Nagaraja, H.N. (Eds.), Advances in Ranking and Selection, Multiple Comparisons, and Reliability: S. Panchapakesan's 70th Birthday Festschrift. Birkhauser, Boston, pp. 303-332.
Mukhopadhyay, N., Datta, S., 1995. On fine-tuned bounded risk sequential point estimation of the mean of an exponential distribution. South African Statist. J. 29, 9-27.
Mukhopadhyay, N., Duggan, W.T., 2000. New results on two-stage estimation of the mean of an exponential distribution. In: Basu, A.K., Ghosh, J.K., Sen, P.K., Sinha, B.K. (Eds.), Perspectives in Statistical Science. Oxford University Press, New Delhi, pp. 219-231 (Chapter 19).
Mukhopadhyay, N., Hilton, G.F., 1986. Two-stage and sequential procedures for estimating the location parameter of a negative exponential distribution. South African Statist. J. 20, 117-136.
Mukhopadhyay, N., Pepe, W., 2006. Exact bounded risk estimation when the terminal sample size and estimator are dependent: The exponential case. Sequential Anal. 25, 85-101.
Proschan, F., 1963. Theoretical explanation of observed decreasing failure rate. Technometrics 5, 375-383.
Starr, N., Woodroofe, M., 1972. Further remarks on sequential estimation: The exponential case. Ann. Math. Statist. 43, 1147-1154.
Stein, C., 1945. A two sample test for a linear hypothesis whose power is independent of the variance. Ann. Math. Statist. 16, 243-258.
Stein, C., 1949. Some problems in sequential estimation (abstract). Econometrica 17, 77-78.
Swanepoel, J.W.H., van Wyk, J.W.J., 1982. Fixed-width confidence intervals for the location parameter of an exponential distribution. Comm. Statist. Theory Methods 11, 1279-1289.
Takada, Y., 1986. Non-existence of fixed sample size procedures for scale families. Sequential Anal. 5, 93-101.
Woodroofe, M., 1977. Second-order approximations for sequential point and interval estimation. Ann. Statist. 5, 984-995.
Zacks, S., 1971. The Theory of Statistical Inference. Wiley, New York.
Zacks, S., Mukhopadhyay, N., 2006a. Exact risks of sequential point estimators of the exponential parameters. Sequential Anal., Milton Sobel Memorial Issue 25, 203-226.
Zacks, S., Mukhopadhyay, N., 2006b. Bounded risk estimation of the exponential parameter in a two-stage sampling. Sequential Anal. 25, 437-452.
Zelen, M., 1966. Application of exponential models to problems in cancer research. J. Roy. Statist. Soc. 129, 368-398.
Zou, G., 1998. Weed population sequential sampling plan and weed seedling emergence pattern prediction. Ph.D. Thesis, Department of Plant Science, University of Connecticut, Storrs.