Journal of Statistical Planning and Inference 129 (2005) 3–18
On empirical Bayes procedures for selecting good populations in a positive exponential family

Shanti S. Gupta, Jianjun Li*

Department of Statistics, Purdue University, West Lafayette, IN 47907, USA

Available online 21 August 2004
Abstract

This paper deals with the problem of selecting good populations with respect to a control from $k\,(\geq 2)$ populations. The random variable associated with population $i$ is assumed to be positive-valued and to have a density of the form $f(x_i|\theta_i) = c(\theta_i)\exp(-x_i/\theta_i)h(x_i)$ with unknown parameter $\theta_i$, for each $i = 1, \ldots, k$. A nonparametric empirical Bayes approach is used to construct the selection procedures. The performance of the procedures is investigated by studying their regret Bayes risks $r_n$. Under the simple assumption on the prior $G_i$ that $\int_0^\infty \theta^3\,dG_i(\theta) < \infty$, we show that $n r_n \to m$, where $m$ is a constant that can be calculated explicitly. The result indicates that the rate $O(n^{-1})$ can be easily obtained for nonparametric procedures under weak conditions.

MSC: primary 62F07; secondary 62C12

Keywords: Empirical Bayes; Selection procedures; Positive exponential family; Regret Bayes risk; Rate of convergence
1. Introduction and formulation

In this paper, we are interested in the problem of simultaneous inference and selection from among $k\,(\geq 2)$ populations in comparison with a standard or control.

This research was supported in part by US Army Research Office, Grant DAAH 04-95-1-0165 and Grant DAAD 19-00-1-0502 at Purdue University. The authors are very thankful to Professor Friedrich Liese and Professor Tachen Liang for their helpful comments which led to improvements in the paper.
* Corresponding author. Merck Research Labs, UN-A102, 785 Jolly Road, Bldg C, Blue Bell, PA 19422, USA. Tel.: +1-484-344-2101; fax: +1-484-344-7105. E-mail address:
[email protected] (J. Li).
The populations are denoted by $\pi_1, \ldots, \pi_k$. The random variable $X_i$ associated with $\pi_i$ has the density $f(x_i|\theta_i) = c(\theta_i)e^{-x_i/\theta_i}h(x_i)$, with $h(x) > 0$ on $(0, \infty)$ and $h(x) = 0$ on $(-\infty, 0)$, where the unknown parameter $\theta_i$ characterizes population $\pi_i$. Let $\theta_0$ be a known control. In practical situations, we wish to differentiate between good and bad populations, selecting the good ones and excluding the bad ones. Here a population $\pi_i$ is said to be good if $\theta_i \geq \theta_0$ and bad otherwise. This type of decision problem has been considered by many authors; see, for example, the early papers of Paulson (1952), Dunnett (1955), Gupta and Sobel (1958) and Lehmann (1961), later Gupta and Hsiao (1983), and more recently Gupta and Liang (1999), among others.

Let $\Omega = \{\theta = (\theta_1, \ldots, \theta_k) : \theta_i > 0,\ i = 1, \ldots, k\}$ be the parameter space. Let $A = \{a = (a_1, \ldots, a_k) : a_i = 0 \text{ or } 1,\ i = 1, \ldots, k\}$ be the action space, where $a_i = 1$ means that population $\pi_i$ is selected as good and $a_i = 0$ means that $\pi_i$ is excluded as bad. The loss function we use is

$L(\theta, a) = \sum_{i=1}^{k} l(\theta_i, a_i)$  (1.1)

with $l(\theta_i, a_i) = a_i\theta_i(\theta_0-\theta_i)I_{[\theta_i<\theta_0]} + (1-a_i)\theta_i(\theta_i-\theta_0)I_{[\theta_i\geq\theta_0]}$.
We also assume that $\theta_i$ is a realization of a random variable $\Theta_i$, and $\Theta_1, \ldots, \Theta_k$ are independently distributed with priors $G_1, \ldots, G_k$, respectively. Let $G = \prod_{i=1}^{k} G_i(\theta_i)$. Let $\mathbf{X} = (X_1, \ldots, X_k)$ and let $\mathcal{X}$ be the sample space of $\mathbf{X}$. Here $X_i$ may be thought of as a sufficient statistic based on several i.i.d. samples. A selection procedure is $\delta = (\delta_1, \ldots, \delta_k)$, where $\delta_i(\mathbf{x})$ is the probability of selecting population $\pi_i$ as good when $\mathbf{X} = \mathbf{x}$ is observed. To ensure that the Bayes rule exists, we assume $\int_0^\infty \theta^2\,dG_i(\theta) < \infty$ for $i = 1, \ldots, k$. Based on the above assumptions, a straightforward computation shows that

$R(G, \delta) = \sum_{i=1}^{k} R_i(G, \delta_i)$  (1.2)

and

$R_i(G, \delta_i) = \int_{\mathcal{X}} \delta_i(\mathbf{x})\prod_{j\neq i} f_j(x_j)\,w_i(x_i)h(x_i)\,d\mathbf{x} + T_i,$  (1.3)

where

$f_i(x_i) = \int_0^\infty c(\theta)e^{-x_i/\theta}h(x_i)\,dG_i(\theta),$

$w_i(x_i) = \int_0^\infty \theta(\theta_0-\theta)c(\theta)e^{-x_i/\theta}\,dG_i(\theta),$

$T_i = \int_0^\infty \theta(\theta-\theta_0)I_{[\theta\geq\theta_0]}\,dG_i(\theta).$
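For a concrete prior, $w_i$ can be evaluated numerically; the minimal sketch below locates the point where $w_i$ changes sign, which determines the Bayes rule (1.4)–(1.5) derived next. Everything specific here is an illustrative assumption, not the paper's setting: the exp($\theta$) family of Example 4.1 (so $h \equiv 1$ and $c(\theta) = 1/\theta$), a hypothetical two-point prior, and $\theta_0 = 1.5$.

```python
# Sketch of the Bayes rule (1.4)-(1.5) below, for the exp(theta) family
# (h(x) = 1, c(theta) = 1/theta) and a hypothetical two-point prior G_i.
import numpy as np
from scipy.optimize import brentq

theta0 = 1.5                            # known control (assumed value)
thetas = np.array([0.5, 2.0])           # support of the assumed prior G_i
probs  = np.array([0.5, 0.5])           # prior masses

def c(theta):                           # normalizer of f(x|theta); 1/theta when h = 1
    return 1.0 / theta

def phi(x):                             # phi_i(x) = int theta c(theta) e^{-x/theta} dG_i
    return np.sum(thetas * c(thetas) * np.exp(-x / thetas) * probs)

def psi(x):                             # psi_i(x) = int theta^2 c(theta) e^{-x/theta} dG_i
    return np.sum(thetas**2 * c(thetas) * np.exp(-x / thetas) * probs)

def w(x):                               # w_i(x) = theta0 * phi_i(x) - psi_i(x)
    return theta0 * phi(x) - psi(x)

def bayes_rule(x):                      # select (a_i = 1) iff w_i(x) <= 0
    return 1 if w(x) <= 0 else 0

# rho_i = psi_i/phi_i is increasing, so w_i has a single sign change at c_i:
c_i = brentq(w, 1e-6, 50.0)
print(f"select population i as good iff x_i >= c_i = {c_i:.4f}")
print(bayes_rule(0.2), bayes_rule(2.0))  # -> 0, 1
```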
Here $f_i(x_i)$ is the marginal density of $X_i$, and $T_i$ is independent of the selection rule $\delta$. Clearly, a Bayes selection procedure $\delta_G = (\delta_{G1}, \ldots, \delta_{Gk})$ is given by

$\delta_{Gi} = 1$ if $w_i(x_i) \leq 0$; $\delta_{Gi} = 0$ if $w_i(x_i) > 0$.  (1.4)

Let $\varphi_i(x_i) = \int_0^\infty \theta c(\theta)e^{-x_i/\theta}\,dG_i(\theta)$ and $\psi_i(x_i) = \int_0^\infty \theta^2 c(\theta)e^{-x_i/\theta}\,dG_i(\theta)$. Denote $\rho_i(x_i) = \psi_i(x_i)/\varphi_i(x_i)$. Then $\delta_{Gi}$ can be expressed as

$\delta_{Gi} = 1$ if $\rho_i(x_i) \geq \theta_0$; $\delta_{Gi} = 0$ if $\rho_i(x_i) < \theta_0$.  (1.5)

If $G_i$ is unknown, the Bayes rule cannot be applied and the selection cannot be made. The empirical Bayes approach is a way to make the decision when past data are available. Since Robbins (1956, 1964) introduced the empirical Bayes approach, it has become a powerful tool in decision-making. Empirical Bayes methodologies have been developed by many researchers, for example by Singh and Wei (1992), van Houwelingen and Stijnen (1993), Pensky (1998), Pensky and Singh (1999), Liang (1999, 2000), and Li and Gupta (2001, 2002).

For each $i = 1, \ldots, k$, let $(X_{ij}, \Theta_{ij})$, $j = 1, 2, \ldots$ be random vectors associated with population $\pi_i$ and stage $j$, where $X_{ij}$ is observable while $\Theta_{ij}$ is unobservable. It is assumed that $\Theta_{ij}$ has prior distribution $G_i$ for all $j = 1, 2, \ldots$; that, conditional on $\Theta_{ij} = \theta_{ij}$, $X_{ij}$ follows a distribution with density $f(x_{ij}|\theta_{ij})$; and that $(X_{ij}, \Theta_{ij})$, $i = 1, \ldots, k$, $j = 1, 2, \ldots$ are mutually independent. At the present stage, say stage $n+1$, we have observed $\mathbf{X} = \mathbf{x}$. The past accumulated observations are denoted by $(\mathbf{X}_1, \ldots, \mathbf{X}_n) = \mathbf{X}^{(n)}$, where $\mathbf{X}_j = (X_{1j}, \ldots, X_{kj})$ is the observation at stage $j$. These past observations, although they have been observed, are kept random and thus in upper case for technical reasons. Based on $\mathbf{X}^{(n)}$ and $\mathbf{x}$, we wish to construct an empirical Bayes rule to select all good populations and to exclude all bad populations. Such an empirical Bayes rule can be expressed as

$\delta_n(\mathbf{x}, \mathbf{X}^{(n)}) = (\delta_{n1}(\mathbf{x}, \mathbf{X}^{(n)}), \ldots, \delta_{nk}(\mathbf{x}, \mathbf{X}^{(n)})),$

where $\delta_{ni}(\mathbf{x}, \mathbf{X}^{(n)})$ is the probability of selecting $\pi_i$ as good if $\mathbf{X}^{(n)}$ and $\mathbf{x}$ are observed. Let $R(G, \delta_n)$ denote the overall Bayes risk of $\delta_n$. Then
$R(G, \delta_n) = \sum_{i=1}^{k} R_i(G, \delta_{ni}),$  (1.6)
where

$R_i(G, \delta_{ni}) = \int_{\mathcal{X}} E[\delta_{ni}(\mathbf{x}, \mathbf{X}^{(n)})]\prod_{j\neq i} f_j(x_j)\,w_i(x_i)h(x_i)\,d\mathbf{x} + T_i.$  (1.7)
The regret Bayes risk is defined as $r_n = R(G, \delta_n) - R(G, \delta_G)$, which is used to measure the performance of the empirical Bayes rule $\delta_n$. If $r_n = o(1)$, $\delta_n$ is said to be asymptotically optimal (a.o.) in the literature (Robbins, 1956, 1964). If $r_n = O(\varepsilon_n)$ for some positive $\varepsilon_n$
such that $\lim_{n\to\infty}\varepsilon_n = 0$, $\delta_n$ is said to be asymptotically optimal at a rate of $O(\varepsilon_n)$ (Gupta and Liang, 1999).

The aim of this paper is to construct an empirical Bayes rule $\delta_n$ for the selection problem described above. For most distributions in the family $f(x_i|\theta_i)$, under the above general setting or, in some cases, with the one additional condition $\int_0^\infty \theta^3\,dG(\theta) < \infty$, we show that $n r_n \to m$, where $m$ is a computable constant depending on $G$. The result indicates that the rate $O(n^{-1})$ can be easily obtained for nonparametric procedures under quite weak conditions.

It should be pointed out that Gupta and Liang (1999) first studied the selection problem for gamma$(x|\theta, s)$ populations, a special case of the above problem, through an empirical Bayes approach. They constructed an empirical Bayes rule $\tilde\delta_n^*$ and established its convergence rate $O(n^{-1})$ under some regularity conditions. A rate of $O(n^{-1}\log n)$ was obtained there under the condition that the $\theta_i$'s are bounded. Gupta and Liese (2000) showed that the limiting distribution of $nR_n$ is a linear combination of independent $\chi^2$ random variables, where $R_n$ is the conditional regret of a modified version of the rule $\tilde\delta_n^*$.

The remaining part of this paper is organized as follows. In Section 2, an empirical Bayes selection rule $\delta_n$ is constructed. The asymptotic behavior of $\delta_n$ is investigated in Section 3. In Section 4, we provide a few typical examples as applications of our results. The proofs of our results are given in Section 5.

2. Construction of the empirical Bayes selection procedure $\delta_n$

The construction of $\delta_n$ can be divided into three steps. First, we construct an estimator of $w_i(x)$. Then we localize the Bayes rule. Finally, we complete the construction by mimicking the Bayes rule using the estimator of $w_i(x)$.

The construction of an estimator of $w_i(x)$ follows the idea of Gupta and Liang (1999). For the loss function (1.1), an unbiased and consistent estimator of $w_i(x)$ can be obtained. For each $i = 1, \ldots, k$, $j = 1, \ldots, n$, and $x > 0$, define

$V_{ij}(x) = \frac{\theta_0 + x - X_{ij}}{h(X_{ij})}\,I_{[X_{ij}\geq x]}.$  (2.1)

Through a standard calculation, we have $E[V_{ij}(x)] = w_i(x)$. Based on this nice property, an unbiased and consistent estimator of $w_i(x)$ can be constructed as

$W_{ni}(x) = \frac{1}{n}\sum_{j=1}^{n} V_{ij}(x)$  (2.2)

for each $i = 1, \ldots, k$ and $x \in (0, \infty)$.
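The unbiasedness $E[V_{ij}(x)] = w_i(x)$ is easy to check by simulation. A minimal sketch, under the same illustrative assumptions as the Section 1 sketch (the two-point prior, $\theta_0 = 1.5$, and the exp($\theta$) family, none of which come from the paper):

```python
# Monte Carlo sanity check that V_ij(x) in (2.1) is unbiased for w_i(x).
import numpy as np

rng = np.random.default_rng(0)
theta0, x, n = 1.5, 0.8, 200_000
thetas, probs = np.array([0.5, 2.0]), np.array([0.5, 0.5])

# past data: theta_j ~ G_i, then X_ij | theta_j ~ exponential with mean theta_j
theta_draws = rng.choice(thetas, size=n, p=probs)
X = rng.exponential(theta_draws)

V = (theta0 + x - X) * (X >= x)         # V_ij(x); h(X_ij) = 1 for this family
W_ni = V.mean()                         # the estimator W_ni(x) of (2.2)

# closed form: theta*c(theta) = 1 here, so w_i(x) = E_G[(theta0 - theta)e^{-x/theta}]
w_true = np.sum((theta0 - thetas) * np.exp(-x / thetas) * probs)
print(f"W_ni(x) = {W_ni:.4f}, w_i(x) = {w_true:.4f}")
```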
We call the next step a localization of the Bayes test. Examining the Bayes selection rule $\delta_G$, we see that $\delta_{Gi}$ depends on $\mathbf{x}$ only through $x_i$. Also, $\rho_i(x)$ is increasing for $i = 1, \ldots, k$. If $x_i$ is large, so that $\rho_i(x_i) \geq \theta_0$, we have $\delta_{Gi} = 1$; if $x_i$ is small, so that $\rho_i(x_i) < \theta_0$, we have $\delta_{Gi} = 0$. Since $G$ is unknown, we do not know at which point we should accept $H_0$ or reject it. But one will be more likely to take action $a_i = 1$ if the observation $X_i = x_i$ is quite large, and to take action $a_i = 0$ if it is quite small. Knowing this, we want to find two
numbers $B_n$ and $L_n$ such that we select $\pi_i$ as good if we observe $x_i > L_n$ and exclude it as bad if $x_i < B_n$. Here both $B_n$ and $L_n$ depend on $n$. This can be understood as follows: as $n$ increases, we have more information from the accumulated data, and we should adopt new $B_n$ and $L_n$ so that our decision can be made more precisely. Certainly, the exact form of $f(x|\theta)$ and the distribution $G$ affect the choice of $B_n$ and $L_n$. Since we have no knowledge about $G$ except that $\int_0^\infty \theta^2\,dG(\theta) < \infty$, we rely on $f(x|\theta)$ itself. If $\lim_{x\downarrow 0} h(x) > 0$, let $B_n = 1/L_n$ and $L_n = \theta_0\log n/4$. If $\lim_{x\downarrow 0} h(x) = 0$, let $H_n$ and $L_n$ be two sequences of positive numbers such that $H_n e^{L_n/\theta_0} = n^{1/4}$ and $H_n \to \infty$, $L_n \to \infty$ as $n \to \infty$; for example, $H_n = n^{1/6}$ and $L_n = \theta_0\log n/12$. Then define $B_n = \inf\{x < 1 : h(x) \geq 1/H_n\} \vee (1/L_n)$. It follows that $B_n \to 0$ since $H_n \to \infty$ as $n \to \infty$.

According to what we mentioned at the beginning of this section, we propose the following empirical Bayes procedure: for each $i = 1, \ldots, k$ and $x_i$,

$\delta_{ni}(x_i) = 1$ if ($x_i > L_n$) or ($B_n \leq x_i \leq L_n$ and $W_{ni}(x_i) \leq 0$); $\delta_{ni}(x_i) = 0$ if ($x_i < B_n$) or ($B_n \leq x_i \leq L_n$ and $W_{ni}(x_i) > 0$).  (2.3)

This empirical Bayes procedure says that, at stage $n+1$, if the present observation $x_i$ from $\pi_i$ is relatively big or small, a decision is made based on $x_i$ only. If it is neither too small nor too big, we resort to the past data and use $W_{ni}(x)$, the estimator of $w_i(x)$, to make the decision.
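The rule (2.3) is directly implementable. A sketch for the exp($\theta$) family (so $h \equiv 1$, giving $B_n = 1/L_n$ and $L_n = \theta_0\log n/4$); the helper name `eb_select` and the array `X_past` of the $n$ past observations from population $i$ are illustrative, not from the paper:

```python
# Sketch of the empirical Bayes rule (2.3) for the exp(theta) family.
import numpy as np

def eb_select(x, X_past, theta0):
    n = len(X_past)
    L_n = theta0 * np.log(n) / 4.0
    B_n = 1.0 / L_n
    if x > L_n:                         # large x: select as good using x alone
        return 1
    if x < B_n:                         # small x: exclude as bad using x alone
        return 0
    W = np.mean((theta0 + x - X_past) * (X_past >= x))   # W_ni(x); h = 1
    return 1 if W <= 0 else 0           # mimic the Bayes rule (1.4) via W_ni
```

For $B_n \leq x \leq L_n$ this is exactly the Bayes rule (1.4) with $w_i$ replaced by its estimate, as (2.3) prescribes.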
3. Asymptotic optimality of $\delta_n$

In this section, the asymptotic behavior of $\delta_n$ is investigated. We derive the regret Bayes risk first. From (1.2) and (1.3), the Bayes risk of $\delta_G$ is $R(G, \delta_G) = \sum_{i=1}^{k} R_i(G, \delta_{Gi})$ with

$R_i(G, \delta_{Gi}) = \int_0^\infty \delta_{Gi}(x_i)w_i(x_i)h(x_i)\,dx_i + T_i.$
From (1.6) and (1.7), the Bayes risk of $\delta_n$ is $R(G, \delta_n) = \sum_{i=1}^{k} R_i(G, \delta_{ni})$ with

$R_i(G, \delta_{ni}) = \int_0^\infty E[\delta_{ni}(x_i)]w_i(x_i)h(x_i)\,dx_i + T_i.$
Thus the regret Bayes risk of $\delta_n$, the difference between $R(G, \delta_n)$ and $R(G, \delta_G)$, is

$r_n = \sum_{i=1}^{k} r_{ni},$  (3.1)

where

$r_{ni} = \int_{B_n}^{L_n} P(W_{ni}(x) \leq 0)\,w_i(x)I_{[w_i(x)>0]}h(x)\,dx + \int_{B_n}^{L_n} P(W_{ni}(x) > 0)\,|w_i(x)|I_{[w_i(x)<0]}h(x)\,dx.$  (3.2)
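The regret (3.2) can also be approximated numerically. A rough sketch (the prior, $\theta_0$ and family are the same illustrative assumptions as in the Section 1 sketch; the error probabilities are estimated from replicated past samples and the integral is a Riemann sum on $[B_n, L_n]$), which can be used to watch $n\,r_{ni}$ stabilize, anticipating Theorem 3.2 below:

```python
# Rough numerical sketch of the regret r_ni in (3.2) for the exp(theta) family.
import numpy as np

rng = np.random.default_rng(3)
theta0 = 1.5
thetas, probs = np.array([0.5, 2.0]), np.array([0.5, 0.5])

def w(x):                                   # w_i(x); note theta*c(theta) = 1 here
    return np.sum((theta0 - thetas) * np.exp(-x / thetas) * probs)

def regret(n, reps=1000, grid=100):
    L_n = theta0 * np.log(n) / 4.0
    B_n = 1.0 / L_n
    xs = np.linspace(B_n, L_n, grid)
    th = rng.choice(thetas, size=(reps, n), p=probs)
    X = rng.exponential(th)                 # reps independent past samples of size n
    r = 0.0
    for x in xs:
        W = ((theta0 + x - X) * (X >= x)).mean(axis=1)   # reps copies of W_ni(x)
        wx = w(x)
        p_err = np.mean(W <= 0) if wx > 0 else np.mean(W > 0)
        r += p_err * abs(wx)                # h(x) = 1 for this family
    return r * (L_n - B_n) / grid

for n in (100, 400, 1600):
    print(n, n * regret(n))                 # n * r_ni should stabilize
```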
Under the assumption $\int_0^\infty \theta^2\,dG_i(\theta) < \infty$, we have $\int_0^\infty |w_i(x)|h(x)\,dx < \infty$ from the inequality

$\int_0^\infty |w_i(x)|h(x)\,dx \leq \theta_0\int_0^\infty \varphi_i(x)h(x)\,dx + \int_0^\infty \psi_i(x)h(x)\,dx \leq \theta_0\int_0^\infty \theta\,dG_i(\theta) + \int_0^\infty \theta^2\,dG_i(\theta).$

Since $W_{ni}(x)$ is a consistent estimator of $w_i(x)$, $P(W_{ni}(x) \leq 0) \to 0$ if $w_i(x) > 0$, and $P(W_{ni}(x) > 0) \to 0$ if $w_i(x) < 0$. Applying the dominated convergence theorem, we have $r_{ni} = o(1)$. Thus we have the following theorem.

Theorem 3.1. Assume that $\int_0^\infty \theta^2\,dG_i(\theta) < \infty$ for each $i = 1, 2, \ldots, k$. Then $\delta_n$, as defined by (2.3), is asymptotically optimal.

Besides asymptotic optimality, the convergence rate of an empirical Bayes procedure is also an important factor to be considered when the procedure is applied. The following discussion shows that the procedure $\delta_n$ achieves the rate of convergence of order $O(n^{-1})$.

From now on, we consider only those members of the family $f(x|\theta)$ for which $\lim_{x\uparrow\infty} h(x)/x$ exists and $h(x)$ is bounded from below on any inner closed subset of $(0, \infty)$. These members belong to one of the following cases:

Case 1: $\lim_{x\uparrow\infty} h(x)/x > 0$ and $\lim_{x\downarrow 0} h(x) > 0$.
Case 2: $\lim_{x\uparrow\infty} h(x)/x > 0$ and $\lim_{x\downarrow 0} h(x) = 0$.
Case 3: $\lim_{x\uparrow\infty} h(x)/x = 0$ and $\lim_{x\downarrow 0} h(x) > 0$.
Case 4: $\lim_{x\uparrow\infty} h(x)/x = 0$ and $\lim_{x\downarrow 0} h(x) = 0$.

Before presenting the main results, we introduce the following definition. If $\delta_{Gi} = 1$ for all $x \in (0, \infty)$ or $\delta_{Gi} = 0$ for all $x \in (0, \infty)$, we say that $G_i$ is degenerate; otherwise we say that $G_i$ is non-degenerate. If $G_i$ is non-degenerate, i.e., $\lim_{x\downarrow 0}\rho_i(x) < \theta_0 < \lim_{x\uparrow\infty}\rho_i(x)$, then, since $\rho_i(x)$ is strictly increasing, there exists a point $c_i \in (0, \infty)$ such that $\rho_i(c_i) = \theta_0$, $\rho_i(x) > \theta_0$ for $x > c_i$ and $\rho_i(x) < \theta_0$ for $x < c_i$.

Theorem 3.2. Assume that $\int_0^\infty \theta^2\,dG_i(\theta) < \infty$ for $i = 1, \ldots, k$. In Case 3 and Case 4, we also assume that $\int_1^\infty \theta^4 c(\theta)\,dG_i(\theta) < \infty$ for $i = 1, \ldots, k$. Then
$\lim_{n\to\infty} n r_n = \sum_{i=1}^{k} m_i,$  (3.3)

where

$m_i = \begin{cases} 0 & \text{if } G_i \text{ is degenerate}, \\ \dfrac{h(c_i)\,\mathrm{Var}(V_{i1}(c_i))}{2|w_i'(c_i)|} & \text{if } G_i \text{ is non-degenerate},\end{cases}$  (3.4)

$\mathrm{Var}(V_{i1}(c_i))$ is the variance of $V_{i1}(c_i)$, $w_i'(c_i)$ is the derivative of $w_i(x)$ at $c_i$, and $w_i'(c_i) \neq 0$ if $G_i$ is non-degenerate.
Proof. The proof is given in Section 5.
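For a concrete prior, the limit constant (3.4) can be computed numerically. A sketch under the same assumed two-point prior and exp($\theta$) family used in the earlier sketches (not the paper's setting): $\mathrm{Var}(V_{i1}(c_i))$ is estimated by Monte Carlo, while $w_i'(c_i)$ is in closed form.

```python
# Numerical illustration (sketch) of the constant m_i in (3.4).
import numpy as np

rng = np.random.default_rng(1)
theta0 = 1.5
thetas, probs = np.array([0.5, 2.0]), np.array([0.5, 0.5])
c_i = np.log(2) / 1.5               # zero of w_i for this prior (Section 1 sketch)

theta_draws = rng.choice(thetas, size=1_000_000, p=probs)
X = rng.exponential(theta_draws)
var_V = ((theta0 + c_i - X) * (X >= c_i)).var()   # Var(V_i1(c_i)), h = 1

# w_i'(x) = -sum over the prior of ((theta0-theta)/theta) e^{-x/theta} (theta*c = 1)
w_prime = -np.sum((theta0 - thetas) / thetas * np.exp(-c_i / thetas) * probs)
m_i = 1.0 * var_V / (2.0 * abs(w_prime))          # h(c_i) = 1 for this family
print(f"m_i = {m_i:.3f}  (the limit of n * r_ni)")
```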
In Case 3 and Case 4, the assumptions $\int_0^\infty \theta^2\,dG_i(\theta) < \infty$ and $\int_1^\infty \theta^4 c(\theta)\,dG_i(\theta) < \infty$ can be replaced by the single stronger assumption $\int_0^\infty \theta^3\,dG_i(\theta) < \infty$. So we have the following corollary.

Corollary 3.3. In Case 3 and Case 4, if $\int_0^\infty \theta^3\,dG_i(\theta) < \infty$ for $i = 1, \ldots, k$, then (3.3) as well as (3.4) holds.

Proof. Note that $c(\theta) = \left[\int_0^\infty \exp(-x/\theta)h(x)\,dx\right]^{-1}$ and, for $\theta > 1$,

$\int_0^\infty e^{-x/\theta}h(x)\,dx = \theta\int_0^\infty e^{-y}h(y\theta)\,dy \geq \theta\int_1^2 e^{-y}h(y\theta)\,dy > e^{-2}\min_{t\geq 1} h(t).$  (3.5)

It follows that $c(\theta)$ is bounded for $\theta > 1$. Thus from $\int_0^\infty \theta^3\,dG_i(\theta) < \infty$ we have both $\int_0^\infty \theta^2\,dG_i(\theta) < \infty$ and $\int_1^\infty \theta^4 c(\theta)\,dG_i(\theta) < \infty$. Then Corollary 3.3 follows from Theorem 3.2.

From Theorem 3.2, one sees that a rate of order $O(n^{-1})$ is obtained under a quite weak condition. We only require $\int_0^\infty \theta^2\,dG(\theta) < \infty$ in Case 1 and Case 2. This assumption guarantees the existence of the Bayes rule, so it is natural and not very stringent. In Case 3 and Case 4, we require one more moment condition, $\int_0^\infty \theta^3\,dG(\theta) < \infty$.

Liang (1999) considered the selection problem in a testing formulation. The rate $O(n^{-1})$ was obtained there under the assumption
that $0 < \theta_i \leq M < \infty$ and for functions $h(x)$ such that (a) $h(x)$ is non-decreasing in $x$ for $x > 0$, (b) $x/h(x)$ is non-increasing in $x$ for $x > 0$, and (c) $\lim_{x\to\infty} h'(x)/h(x) = 0$. For the exp($\theta$)-family (see Example 4.1 below), $h(x) \equiv 1$, and this $h(x)$ does not satisfy (b). Our results are applicable to the exp($\theta$)-family while his are not. So our results are more general and, especially, are obtained under a much weaker condition.

We consider the case where the control $\theta_0$ is known and the loss function (1.1) is additive. With independence among the populations and independence among the priors, the selection problem can be split into $k$ independent testing problems, and the construction of the empirical Bayes selection procedure can be focused on each individual problem. In situations where the control $\theta_0$ is unknown, extra observations associated with the unknown control have to be utilized. As a result, the selection problem becomes complicated (Miescke, 1981; Gupta and Miescke, 1985). Further research on empirical Bayes procedures for this type of selection problem is desirable.

The applications of our results to a few typical distributions are presented in the following section. This includes the construction of $\delta_n$ and the statement of the convergence rate for each distribution there.
4. Examples and results

We select a few distributions as examples.

Example 4.1 (exp($\theta$)-family). Consider the exponential populations having density

$f(x_i|\theta_i) = \frac{1}{\theta_i}e^{-x_i/\theta_i}, \quad x_i > 0,\ \theta_i > 0,\ i = 1, \ldots, k.$  (4.1)

Here $h(x) \equiv 1$. This family belongs to Case 3. Take $B_n = 1/L_n$, $L_n = \theta_0\log n/4$ and construct $\delta_n$ as

$\delta_{ni}(x_i) = 1$ if ($x_i > L_n$) or ($B_n \leq x_i \leq L_n$ and $W_{ni}(x_i) \leq 0$); $\delta_{ni}(x_i) = 0$ if ($x_i < B_n$) or ($B_n \leq x_i \leq L_n$ and $W_{ni}(x_i) > 0$).  (4.2)

Then applying Corollary 3.3, we have the following result.

Result 4.1. If $X_i$ has density $f(x_i|\theta_i)$ given in (4.1) and $\int_0^\infty \theta^3\,dG_i(\theta) < \infty$ for all $i = 1, \ldots, k$, then $\delta_n$, as constructed in (4.2), has a rate of convergence of order $O(n^{-1})$.

Example 4.2 (Gamma($\theta$, $s$)-family with known $s > 1$). Consider the gamma populations having density
$f(x_i|\theta_i) = \frac{x_i^{s-1}}{\Gamma(s)\theta_i^s}e^{-x_i/\theta_i}, \quad x_i > 0,\ \theta_i > 0,\ i = 1, \ldots, k.$  (4.3)
Here $h(x) = x^{s-1}$. This family belongs to Case 2. Let $H_n = n^{1/6}$ and $L_n = \theta_0\log n/12$. Then $B_n = n^{-1/[6(s-1)]} \vee L_n^{-1}$. Construct $\delta_n$ as

$\delta_{ni}(x_i) = 1$ if ($x_i > L_n$) or ($B_n \leq x_i \leq L_n$ and $W_{ni}(x_i) \leq 0$); $\delta_{ni}(x_i) = 0$ if ($x_i < B_n$) or ($B_n \leq x_i \leq L_n$ and $W_{ni}(x_i) > 0$).  (4.4)

Then applying Theorem 3.2, we have the following result.

Result 4.2. If $X_i$ has density $f(x_i|\theta_i)$ given in (4.3) and $\int_0^\infty \theta^2\,dG_i(\theta) < \infty$ for all $i = 1, \ldots, k$, then $\delta_n$, as constructed in (4.4), has a rate of convergence of order $O(n^{-1})$.
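The cutoffs of Example 4.2 are explicit; a small sketch (the numerical inputs are arbitrary):

```python
# Cutoffs for the gamma family of Example 4.2 (Case 2). With H_n = n^{1/6} and
# L_n = theta0*log(n)/12, B_n = inf{x<1 : h(x) >= 1/H_n} v (1/L_n)
# = n^{-1/[6(s-1)]} v (1/L_n), since h(x) = x^{s-1}.
import numpy as np

def gamma_cutoffs(n, theta0, s):
    L_n = theta0 * np.log(n) / 12.0
    B_n = max(n ** (-1.0 / (6.0 * (s - 1.0))), 1.0 / L_n)
    return B_n, L_n

print(gamma_cutoffs(n=10_000, theta0=1.0, s=2.0))
```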
−xi /i
∞
(l + 1)I[l
xi > 0, i > 0, i = 1, . . . , k.(4.5)
l=0
Here h(x) = ∞ l=0 (l + 1)I[l
Ln ) or (Bn xi Ln and Wni (xi ) 0), (4.6) ni (xi ) = 0 if (xi < Bn ) or (Bn xi Ln and Wni (xi ) > 0). Then applying Theorem 3.2, we have the following result.
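For this non-constant $h$, only the factor $1/h(X_{ij})$ in (2.1) changes; a minimal sketch of the estimator (2.2), using that $\sum_l (l+1)I_{[l<t\leq l+1]} = \lceil t\rceil$ for $t > 0$:

```python
# W_ni(x) of (2.2) for Example 4.3, where h(t) = ceil(t) for t > 0.
import numpy as np

def W_ni(x, X_past, theta0):
    h = np.ceil(X_past)                 # h(X_ij) for Example 4.3
    return np.mean((theta0 + x - X_past) * (X_past >= x) / h)
```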
Then applying Theorem 3.2, we have the following result.

Result 4.3. If $X_i$ has density $f(x_i|\theta_i)$ given in (4.5) and $\int_0^\infty \theta^2\,dG_i(\theta) < \infty$ for all $i = 1, \ldots, k$, then $\delta_n$, as constructed in (4.6), has a rate of convergence of order $O(n^{-1})$.

Remark. Gupta and Liang (1999) considered the same selection problem for the gamma population (4.3). In that paper, an empirical Bayes rule was constructed as

$\delta_{ni}^*(x_i) = 1$ if $W_{ni}(x_i) \leq 0$; $\delta_{ni}^*(x_i) = 0$ if $W_{ni}(x_i) > 0$.

The convergence rate of $\delta_n^* = (\delta_{n1}^*, \ldots, \delta_{nk}^*)$ is affected by the tail probability of the underlying distributions. In our paper, we split the interval $(0, \infty)$ into three parts, $(0, B_n)$, $[B_n, L_n]$ and $(L_n, \infty)$, by localizing the Bayes test, and then construct the empirical Bayes rule as in (4.4). So the influence of the tail probability of the underlying distributions is controlled, and a rate of $O(n^{-1})$ is obtained under quite weak conditions, as shown in Result 4.2.
5. Proof of Theorem 3.2

The main idea of the proof is to use a classical result on the pointwise upper bound for the difference between the normal distribution and the distribution of a sum of i.i.d. random variables. We give the proof in the following two subsections, according to whether all $G_i$ are non-degenerate or not.

5.1. All $G_i$ are non-degenerate

In this subsection we prove Theorem 3.2 assuming that all $G_i$ are non-degenerate. Then there exists a point $c_i \in (0, \infty)$ such that $\rho_i(c_i) = \theta_0$, $\rho_i(x) > \theta_0$ for $x > c_i$ and $\rho_i(x) < \theta_0$ for $x < c_i$. Since we consider the asymptotic behavior of $\delta_n$, we assume $c_i \in (B_n, L_n)$ for all $n$ without loss of generality.

Lemma 5.1. For each $i = 1, \ldots, k$, $w_i'(c_i) < 0$, and further there is a neighborhood of $c_i$, denoted by $N(c_i, \varepsilon_i)$, such that $N(c_i, \varepsilon_i) \subset (B_1, L_1)$ and

$A_i = \min_{x\in N(c_i,\varepsilon_i)} |w_i'(x)| > 0.$  (5.1)

Denote $c_{i1} = c_i - \varepsilon_i$, $c_{i2} = c_i + \varepsilon_i$. Then for all $x \in [B_n, c_{i1}] \cup [c_{i2}, L_n]$, there exists an $M_i > 0$ such that

$|w_i(x)| \geq M_i e^{-L_n/\theta_0}.$  (5.2)
Proof. For $x > 0$, the derivative of $w_i(x)$ exists and can be expressed as

$w_i'(x) = -\theta_0\int_0^\infty c(\theta)e^{-x/\theta}\,dG_i(\theta) + \int_0^\infty \theta c(\theta)e^{-x/\theta}\,dG_i(\theta).$

From Jensen's inequality, we see that for $x > 0$

$\frac{\int_0^\infty \theta c(\theta)e^{-x/\theta}\,dG_i(\theta)}{\int_0^\infty c(\theta)e^{-x/\theta}\,dG_i(\theta)} < \frac{\int_0^\infty \theta^2 c(\theta)e^{-x/\theta}\,dG_i(\theta)}{\int_0^\infty \theta c(\theta)e^{-x/\theta}\,dG_i(\theta)} = \rho_i(x).$

Plugging $c_i$ in for $x$ in the above inequality, we have

$\frac{\int_0^\infty \theta c(\theta)e^{-c_i/\theta}\,dG_i(\theta)}{\int_0^\infty c(\theta)e^{-c_i/\theta}\,dG_i(\theta)} < \theta_0.$

This implies that $w_i'(c_i) < 0$. Note that $w_i'(x)$ is continuous on $(0, \infty)$. So an $N(c_i, \varepsilon_i)$ can be found such that $N(c_i, \varepsilon_i) \subset (B_1, L_1)$ and $A_i = \min_{x\in N(c_i,\varepsilon_i)}|w_i'(x)| > 0$. Then (5.1) is proved.

On the other hand, rewrite $w_i(x)$ as $w_i(x) = \varphi_i(x)[\theta_0 - \rho_i(x)]$. Noting that $\rho_i(x)$ is strictly increasing in $x$ and $\rho_i(c_i) = \theta_0$, for $x \in [B_n, c_{i1}] \cup [c_{i2}, L_n]$ we have

$|\theta_0 - \rho_i(x)| \geq (\theta_0 - \rho_i(c_{i1})) \wedge (\rho_i(c_{i2}) - \theta_0).$

For $x \leq L_n$,

$\varphi_i(x) \geq \int_{\theta_0}^\infty \theta c(\theta)e^{-x/\theta}\,dG_i(\theta) \geq e^{-L_n/\theta_0}\int_{\theta_0}^\infty \theta c(\theta)\,dG_i(\theta).$

Thus

$|w_i(x)| \geq [(\theta_0 - \rho_i(c_{i1})) \wedge (\rho_i(c_{i2}) - \theta_0)]\,e^{-L_n/\theta_0}\int_{\theta_0}^\infty \theta c(\theta)\,dG_i(\theta).$

This completes the proof of Lemma 5.1.
The next lemma deals with bounds on the moments of $W_{ni}(x)$. In Case 1 and Case 3, $\min_{0<x\leq L_n} h(x) > 0$. Let $S_n \equiv 1/[\min_{B_n\leq x\leq L_n} h(x)]$; then $h(x) \geq S_n^{-1}$ for $x \in [B_n, L_n]$ in all four cases. Recall that $L_n = \theta_0\log n/4$ in Case 1 and Case 3 and that $H_n e^{L_n/\theta_0} = n^{1/4}$ in Case 2 and Case 4. Then we have $S_n e^{L_n/\theta_0} \sim n^{1/4}$. In Case 3 and Case 4, we know $\int_1^\infty \theta^4 c(\theta)\,dG_i(\theta) < \infty$; let $C_i = \int_1^\infty \theta^4 c(\theta)\,dG_i(\theta)$. Without loss of generality, we assume $h(x) \geq x$ for $x > 1$ in Case 1 and Case 2.

Lemma 5.2. Let $\sigma_i^2(x) = E[(V_{ij}(x) - w_i(x))^2]$ and $\gamma_i(x) = E[|V_{ij}(x) - w_i(x)|^3]$. Then for $x \in [B_n, L_n]$,

$\sigma_i^2(x) \leq \begin{cases}[2S_n(\theta_0+1)+1]^2 & \text{for Case 1 and Case 2},\\ S_n[(\theta_0+1)^2\psi_i(x)+2(\theta_0+1)C_i] & \text{for Case 3 and Case 4},\end{cases}$  (5.3)

and

$\gamma_i(x) \leq \begin{cases}4[2S_n(\theta_0+1)+1]^3 + 4|w_i(x)|^3 & \text{for Case 1 and Case 2},\\ 16S_n^2[(\theta_0^3+6)\psi_i(x)+6C_i] + 4|w_i(x)|^3 & \text{for Case 3 and Case 4}.\end{cases}$  (5.4)
For $x \in [c_{i1}, c_{i2}]$, there exists a constant $C_i' > 0$ such that

$\gamma_i(x) \leq C_i'.$  (5.5)

For $x \in [B_n, c_{i1}] \cup [c_{i2}, L_n]$ and large $n$,

$n^{3/8}|w_i(x)|/\sigma_i(x) \geq 1.$  (5.6)

Proof. Consider $x \in [B_n, L_n]$ and note that $h(x) \geq S_n^{-1}$. In Case 1 and Case 2, if $x \geq 1$ then $h(x) \geq x$ and

$|V_{ij}(x)| \leq \theta_0 I_{[X_{ij}\geq x]}/h(X_{ij}) + I_{[X_{ij}\geq x]}(X_{ij}-x)/h(X_{ij}) \leq \theta_0 S_n + 1;$

if $B_n \leq x < 1$, it can be shown that $|V_{ij}(x)| \leq 2S_n(\theta_0+1)+1$. Thus
$\sigma_i^2(x) \leq E[|V_{ij}(x)|^2] \leq [2S_n(\theta_0+1)+1]^2.$

For $\gamma_i(x)$, using $|a+b|^3 \leq 4|a|^3 + 4|b|^3$, we have

$\gamma_i(x) \leq 4E[|V_{ij}(x)|^3] + 4|w_i(x)|^3 \leq 4[2S_n(\theta_0+1)+1]^3 + 4|w_i(x)|^3.$

Then (5.3) and (5.4) are proved for Case 1 and Case 2.

In Case 3 and Case 4, a simple calculation shows that

$\sigma_i^2(x) \leq S_n\left[\theta_0^2\varphi_i(x) + 2\theta_0\psi_i(x) + 2\int_0^\infty \theta^3 c(\theta)e^{-x/\theta}\,dG_i(\theta)\right].$

By breaking the interval $(0, \infty)$ into $(0, 1)$ and $[1, \infty)$, we have $\varphi_i(x) \leq C_i + \psi_i(x)$ and $\int_0^\infty \theta^3 c(\theta)e^{-x/\theta}\,dG_i(\theta) \leq C_i + \psi_i(x)$. Thus

$\sigma_i^2(x) \leq S_n[(\theta_0+1)^2\psi_i(x) + 2(\theta_0+1)C_i].$

Similarly,

$\gamma_i(x) \leq 16S_n^2[(\theta_0^3+6)\psi_i(x) + 6C_i] + 4|w_i(x)|^3.$

Now consider $x \in [c_{i1}, c_{i2}]$. It is easy to see that

$\gamma_i(x) \leq \begin{cases}4\big[[1+2(\theta_0+1)]/\min_{x\geq c_{i1}} h(x)\big]^3 + 4\max_{c_{i1}\leq x\leq c_{i2}}|w_i(x)|^3 \equiv C_i' & \text{in Case 1 and Case 2},\\ 16\big[(\theta_0^3+6)\psi_i(c_{i1})+6C_i\big]/[\min_{x\geq c_{i1}} h(x)]^2 + 4\max_{c_{i1}\leq x\leq c_{i2}}|w_i(x)|^3 \equiv C_i' & \text{in Case 3 and Case 4}.\end{cases}$

Then (5.5) holds.

Next we prove (5.6). From (5.2), $|w_i(x)| \geq M_i e^{-L_n/\theta_0}$ for $x \in [B_n, c_{i1}] \cup [c_{i2}, L_n]$. In Case 1 and Case 2,

$\frac{|w_i(x)|}{\sigma_i(x)} \geq \frac{M_i e^{-L_n/\theta_0}}{2S_n(\theta_0+1)+1} \sim S_n^{-1}e^{-L_n/\theta_0} \sim n^{-1/4}.$

In Case 3 and Case 4,

$\frac{|w_i(x)|}{\sigma_i(x)} \geq \frac{|\theta_0-\rho_i(x)|}{S_n^{1/2}\big[(\theta_0+1)^2/\varphi_i(x) + 2(\theta_0+1)C_i/\varphi_i^2(x)\big]^{1/2}}.$  (5.7)
It is easy to see that $|\theta_0-\rho_i(x)| \geq \min\{|\theta_0-\rho_i(c_{i1})|, |\theta_0-\rho_i(c_{i2})|\}$. We know from the proof of Lemma 5.1 that $\varphi_i(x) \geq e^{-L_n/\theta_0}\int_{\theta_0}^\infty \theta c(\theta)\,dG_i(\theta)$. Then

$S_n^{1/2}\big[(\theta_0+1)^2/\varphi_i(x) + 2(\theta_0+1)C_i/\varphi_i^2(x)\big]^{1/2} \sim S_n^{1/2}e^{L_n/\theta_0}.$

Thus $|w_i(x)/\sigma_i(x)|$ is bounded below by a quantity of order $S_n^{-1/2}e^{-L_n/\theta_0} = S_n^{1/2}n^{-1/4}$, so that $n^{3/8}|w_i(x)|/\sigma_i(x) \geq 1$ for large $n$, and (5.6) follows. This completes the proof of Lemma 5.2.
Note that, for fixed $x$, $V_{ij}(x)$, $j = 1, \ldots, n$, are i.i.d. random variables. For large $n$, the central limit theorem tells us that $\sum_{j=1}^n [V_{ij}(x) - w_i(x)]/[\sigma_i(x)\sqrt{n}]$ is close to $N(0, 1)$ in distribution. Furthermore, we have the following pointwise upper bound on the difference between the normal distribution and the distribution of a sum of i.i.d. random variables; this result can be found in Petrov (1995, p. 168).

Fact. Let $X_1, X_2, \ldots, X_n$ be i.i.d. random variables with $EX_1 = 0$, $EX_1^2 = \sigma^2 > 0$ and $E|X_1|^3 < \infty$. Then for all $x$,

$|F_n(x) - \Phi(x)| \leq \frac{A\rho}{\sqrt{n}(1+|x|)^3}.$  (5.8)

Here $\Phi(x)$ is the c.d.f. of $N(0, 1)$, and $F_n(x)$ and $\rho$ are given by

$F_n(x) = P\left(\frac{1}{\sigma\sqrt{n}}\sum_{j=1}^n X_j \leq x\right), \qquad \rho = \frac{E|X_1|^3}{\sigma^3}.$
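To see the normal approximation behind the Fact at work in our setting, here is a quick simulation sketch (same assumed two-point prior and exp($\theta$) family as in the earlier sketches) comparing $P(W_{ni}(x) < 0)$ with $\Phi(-\sqrt{n}\,w_i(x)/\sigma_i(x))$:

```python
# Simulation sketch: P(W_ni(x) < 0) vs its normal approximation.
import numpy as np
from scipy.stats import norm

rng = np.random.default_rng(2)
theta0, x, n, reps = 1.5, 0.8, 200, 5000
thetas, probs = np.array([0.5, 2.0]), np.array([0.5, 0.5])

def draw_V(size):                       # i.i.d. copies of V_ij(x); h = 1
    th = rng.choice(thetas, size=size, p=probs)
    X = rng.exponential(th)
    return (theta0 + x - X) * (X >= x)

W = draw_V((reps, n)).mean(axis=1)      # reps independent copies of W_ni(x)
V = draw_V(1_000_000)                   # large sample to estimate w_i and sigma_i
w, sigma = V.mean(), V.std()
print(f"P(W_ni < 0) = {np.mean(W < 0):.3f}  vs  "
      f"Phi(-sqrt(n)w/sigma) = {norm.cdf(-np.sqrt(n) * w / sigma):.3f}")
```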
Now we are ready to prove our main result.

Proof of Theorem 3.2. Rewrite $P(W_{ni}(x) < 0)$ as

$P\left(\frac{1}{\sigma_i(x)\sqrt{n}}\sum_{j=1}^n [V_{ij}(x) - w_i(x)] < -\frac{\sqrt{n}\,w_i(x)}{\sigma_i(x)}\right).$

Then applying (5.8), we have

$\left|P(W_{ni}(x) < 0) - \Phi\left(-\frac{\sqrt{n}\,w_i(x)}{\sigma_i(x)}\right)\right| \leq \frac{A\gamma_i(x)}{\sqrt{n}\,(\sigma_i(x)+\sqrt{n}\,|w_i(x)|)^3}.$

Similarly,

$P(W_{ni}(x) > 0) \leq 1 - \Phi\left(\frac{\sqrt{n}\,|w_i(x)|}{\sigma_i(x)}\right) + \frac{A\gamma_i(x)}{\sqrt{n}\,(\sigma_i(x)+\sqrt{n}\,|w_i(x)|)^3}.$
Plugging the above two inequalities into (3.2), we obtain

$r_{ni} \leq \int_{B_n}^{c_i}\left[\Phi\left(-\frac{\sqrt{n}\,w_i(x)}{\sigma_i(x)}\right) + \frac{A\gamma_i(x)}{\sqrt{n}\,(\sigma_i(x)+\sqrt{n}\,|w_i(x)|)^3}\right]w_i(x)h(x)\,dx + \int_{c_i}^{L_n}\left[1-\Phi\left(\frac{\sqrt{n}\,|w_i(x)|}{\sigma_i(x)}\right) + \frac{A\gamma_i(x)}{\sqrt{n}\,(\sigma_i(x)+\sqrt{n}\,|w_i(x)|)^3}\right]|w_i(x)|h(x)\,dx \equiv I + II.$

From (5.3), (5.4), (5.5) and (5.6), we see that $w_i(x)$, $\sigma_i^2(x)$ and $\gamma_i(x)$ behave differently for different $x$. So we decompose $I$ into four parts:

$I = \int_{B_n}^{c_{i1}}\Phi\left(-\frac{\sqrt{n}\,w_i(x)}{\sigma_i(x)}\right)w_i(x)h(x)\,dx + \int_{c_{i1}}^{c_i}\Phi\left(-\frac{\sqrt{n}\,w_i(x)}{\sigma_i(x)}\right)w_i(x)h(x)\,dx + \int_{B_n}^{c_{i1}}\frac{A\gamma_i(x)w_i(x)h(x)}{\sqrt{n}\,(\sigma_i(x)+\sqrt{n}\,|w_i(x)|)^3}\,dx + \int_{c_{i1}}^{c_i}\frac{A\gamma_i(x)w_i(x)h(x)}{\sqrt{n}\,(\sigma_i(x)+\sqrt{n}\,|w_i(x)|)^3}\,dx \equiv I_1 + I_2 + I_3 + I_4.$  (5.9)

Consider $I_1$ first. According to (5.6), for large $n$, $w_i(x)/\sigma_i(x) \geq n^{-3/8}$ for $x \in [B_n, c_{i1}]$; it follows that $\sqrt{n}\,w_i(x)/\sigma_i(x) \geq n^{1/8}$. Applying this to $I_1$, we have

$I_1 \leq \Phi(-n^{1/8})\int_{B_n}^{c_{i1}} w_i(x)h(x)\,dx = o(n^{-1}).$  (5.10)

For $I_2$, let $\bar h_i$ and $\bar\sigma_i$ be the maximum values of $h(x)$ and $\sigma_i(x)$ on $[c_{i1}, c_{i2}]$. Thus

$I_2 \leq \bar h_i\int_{c_{i1}}^{c_i}\Phi\left(-\frac{\sqrt{n}\,w_i(x)}{\bar\sigma_i}\right)w_i(x)\,dx.$

Using (5.1) and substituting $y = \sqrt{n}\,w_i(x)/\bar\sigma_i$,

$\int_{c_{i1}}^{c_i}\Phi\left(-\frac{\sqrt{n}\,w_i(x)}{\bar\sigma_i}\right)w_i(x)\,dx \leq \frac{\bar\sigma_i^2}{A_i n}\int_0^\infty \Phi(-y)\,y\,dy = \frac{\bar\sigma_i^2}{4A_i n}$

(the last equality holds since, by Fubini's theorem, $\int_0^\infty y\,\Phi(-y)\,dy = \int_0^\infty \phi(t)(t^2/2)\,dt = 1/4$, where $\phi$ is the standard normal density). Then

$\limsup_{n\to\infty} nI_2 \leq \frac{\bar h_i\bar\sigma_i^2}{4A_i}.$  (5.11)

Next we consider $I_3$. From (5.2), $|w_i(x)| \geq M_i e^{-L_n/\theta_0}$ for $x \in [B_n, c_{i1}]$. In Case 1 and Case 2, applying (5.4), we have

$I_3 \leq \int_{B_n}^{c_{i1}}\frac{4[2S_n(\theta_0+1)+1]^3 + 4|w_i(x)|^3}{\sqrt{n}\,(\sigma_i(x)+\sqrt{n}\,|w_i(x)|)^3}\,w_i(x)h(x)\,dx \leq \frac{4[2S_n(\theta_0+1)+1]^3}{n^2 M_i^3 e^{-3L_n/\theta_0}}\int_{B_n}^{c_{i1}} w_i(x)h(x)\,dx + \frac{4}{n^2}\int_{B_n}^{c_{i1}} w_i(x)h(x)\,dx = o(n^{-1}).$  (5.12)
In Case 3 and Case 4, using (5.4) again,

$I_3 \leq \int_{B_n}^{c_{i1}}\frac{16S_n^2[(\theta_0^3+6)\psi_i(x)+6C_i] + 4|w_i(x)|^3}{\sqrt{n}\,(\sigma_i(x)+\sqrt{n}\,|w_i(x)|)^3}\,w_i(x)h(x)\,dx \leq \frac{16S_n^2(\theta_0^3+6)}{n^2 M_i^2 e^{-2L_n/\theta_0}}\int_{B_n}^{c_{i1}}\psi_i(x)h(x)\,dx + \frac{96S_n^2 C_i}{n^2 M_i^3 e^{-3L_n/\theta_0}}\int_{B_n}^{c_{i1}} w_i(x)h(x)\,dx + \frac{4}{n^2}\int_{B_n}^{c_{i1}} w_i(x)h(x)\,dx = o(n^{-1}).$  (5.13)

For $x \in [c_{i1}, c_{i2}]$, $\gamma_i(x) \leq C_i'$. Let $\underline\sigma_i$ be the minimum value of $\sigma_i(x)$ on $[c_{i1}, c_{i2}]$. It is easy to see that $\sigma_i(x) > 0$ for each $x \in [c_{i1}, c_{i2}] \subset [B_1, L_1]$; noting that $\sigma_i(x)$ is a continuous function of $x$, we have $\underline\sigma_i > 0$. It follows that

$I_4 \leq AC_i'\bar h_i\int_{c_{i1}}^{c_i}\frac{w_i(x)}{\sqrt{n}\,(\underline\sigma_i+\sqrt{n}\,|w_i(x)|)^3}\,dx \leq \frac{AC_i'\bar h_i}{n^{3/2}A_i}\int_0^{\sqrt{n}\,w_i(c_{i1})}\frac{y}{(\underline\sigma_i+y)^3}\,dy = o\left(\frac{1}{n}\right).$  (5.14)

Combining (5.9) to (5.14), we have $\limsup_{n\to\infty} nI \leq \bar h_i\bar\sigma_i^2/(4A_i)$. Note that $I$ is independent of $\varepsilon_i$, while $\bar h_i \to h(c_i)$, $\bar\sigma_i^2 \to \mathrm{Var}(V_{i1}(c_i))$ and $A_i \to |w_i'(c_i)|$ as $\varepsilon_i \to 0$. Then

$\limsup_{n\to\infty} nI \leq \frac{h(c_i)\mathrm{Var}(V_{i1}(c_i))}{4|w_i'(c_i)|} = \frac{m_i}{2}.$

Similarly, $\limsup_{n\to\infty} n\,II \leq m_i/2$. Therefore

$\limsup_{n\to\infty} n r_{ni} \leq m_i.$

Note that

$r_{ni} \geq \int_{B_n}^{c_i}\left[\Phi\left(-\frac{\sqrt{n}\,w_i(x)}{\sigma_i(x)}\right) - \frac{A\gamma_i(x)}{\sqrt{n}\,(\sigma_i(x)+\sqrt{n}\,|w_i(x)|)^3}\right]w_i(x)h(x)\,dx + \int_{c_i}^{L_n}\left[1-\Phi\left(\frac{\sqrt{n}\,|w_i(x)|}{\sigma_i(x)}\right) - \frac{A\gamma_i(x)}{\sqrt{n}\,(\sigma_i(x)+\sqrt{n}\,|w_i(x)|)^3}\right]|w_i(x)|h(x)\,dx.$

We have proved that $\int_{B_n}^{L_n} A\gamma_i(x)|w_i(x)|h(x)/[\sqrt{n}\,(\sigma_i(x)+\sqrt{n}\,|w_i(x)|)^3]\,dx = o(n^{-1})$, $\int_{B_n}^{c_{i1}}\Phi(-\sqrt{n}\,w_i(x)/\sigma_i(x))w_i(x)h(x)\,dx = o(n^{-1})$ and $\int_{c_{i2}}^{L_n}[1-\Phi(\sqrt{n}\,|w_i(x)|/\sigma_i(x))]|w_i(x)|h(x)\,dx = o(n^{-1})$. Then

$\liminf_{n\to\infty} n r_{ni} \geq \liminf_{n\to\infty} n\left[\int_{c_{i1}}^{c_i}\Phi\left(-\frac{\sqrt{n}\,w_i(x)}{\sigma_i(x)}\right)w_i(x)h(x)\,dx + \int_{c_i}^{c_{i2}}\left(1-\Phi\left(\frac{\sqrt{n}\,|w_i(x)|}{\sigma_i(x)}\right)\right)|w_i(x)|h(x)\,dx\right].$
Using the same idea as applied to $I_2$, it is easy to prove that the right-hand side of the above inequality is not smaller than $m_i$. So the proof of (3.3) and (3.4) is complete.

5.2. Some components of $G$ are degenerate

We now prove Theorem 3.2 assuming that some components of $G$ are degenerate. For simplicity, and without loss of generality, we assume that only $G_1$ is degenerate. We need to show $r_{n1} = o(n^{-1})$. If $P(\Theta_1 = \theta_0) = 1$, then $w_1(x) \equiv 0$ and $r_{n1} = 0$. So assume $P(\Theta_1 = \theta_0) < 1$ in the following.

From the proof in Section 5.1, we only need to prove that (5.2)–(5.4) and (5.6) hold for all $x \in [B_n, L_n]$. Notice that (5.3) and (5.4) do not depend on the non-degeneracy of $G_1$, so we only need to show (5.2) and (5.6). If $G_1$ is concentrated at a single point, (5.2) and (5.6) are obvious. Next we assume that $G_1$ is not concentrated at a single point; then $\lim_{x\downarrow 0}\rho_1(x) \geq \theta_0$ or $\lim_{x\uparrow\infty}\rho_1(x) \leq \theta_0$. Denote $\rho_1(0+) = \lim_{x\downarrow 0}\rho_1(x)$ and $\rho_1(\infty-) = \lim_{x\uparrow\infty}\rho_1(x)$.

We shall show (5.2) first. If $\rho_1(0+) \geq \theta_0$ and $G_1(\theta_0-) = 0$, then for $x \in [B_n, L_n]$, $|w_1(x)| \geq e^{-L_n/\theta_0}\int_{\theta_0}^\infty \theta(\theta-\theta_0)c(\theta)\,dG_1(\theta)$ and $\varphi_1(x) \geq e^{-L_n/\theta_0}\int_{\theta_0}^\infty \theta c(\theta)\,dG_1(\theta)$. Then (5.2) holds.

If $\rho_1(0+) \geq \theta_0$ and $G_1(\theta_0-) > 0$, then $\varphi_1(0) < \infty$ and $\psi_1(0) < \infty$. The reason is the following. Since $G_1(\theta_0-) > 0$, we can find $\varepsilon > 0$ such that $G_1(\theta_0-\varepsilon) > 0$. From (3.5), we know $c(\theta)$ is bounded on $[\theta_0-\varepsilon, \infty)$. Then $\int_{\theta_0-\varepsilon}^\infty \theta c(\theta)\,dG_1(\theta) < \infty$. Therefore, if $\varphi_1(0) = \infty$, then $\int_0^{\theta_0-\varepsilon}\theta c(\theta)e^{-x/\theta}\,dG_1(\theta) \to \infty$ as $x \to 0$. And since

$\rho_1(x) \leq \frac{(\theta_0-\varepsilon)\int_0^{\theta_0-\varepsilon}\theta c(\theta)e^{-x/\theta}\,dG_1(\theta) + \int_{\theta_0-\varepsilon}^\infty \theta^2 c(\theta)e^{-x/\theta}\,dG_1(\theta)}{\int_0^{\theta_0-\varepsilon}\theta c(\theta)e^{-x/\theta}\,dG_1(\theta) + \int_{\theta_0-\varepsilon}^\infty \theta c(\theta)e^{-x/\theta}\,dG_1(\theta)},$

we would have $\rho_1(0+) \leq \theta_0-\varepsilon$. This contradicts $\rho_1(0+) \geq \theta_0$. Thus $\varphi_1(0) < \infty$ and $\psi_1(0) < \infty$. Then we have $w_1'(0+) < 0$. Let $c_{11} > 0$ be such that $-w_1'(x) > d_1 > 0$ for $x \in (0, c_{11})$. When $n$ is large, $B_n < c_{11}$. Then for $x \in [B_n, c_{11}]$, $x > 1/L_n$ and $|w_1(x)| \geq |w_1(x) - w_1(0)| = x|w_1'(x^*)| \geq d_1/L_n$, where $x^* \in (0, x)$. It is easy to see that (5.2) is true for $x \in [c_{11}, L_n]$. Then (5.2) holds for all $x \in [B_n, L_n]$ in this case.

If $\rho_1(\infty-) \leq \theta_0$, we must have $G_1(\theta_0) = 1$. The reason is the following. If $G_1(\theta_0) < 1$, let $\varepsilon > 0$ satisfy $G_1(\theta_0+\varepsilon) < 1$. Then
$\rho_1(x) \geq \frac{\int_0^{\theta_0+\varepsilon}\theta^2 c(\theta)e^{-x[\theta^{-1}-(\theta_0+\varepsilon)^{-1}]}\,dG_1(\theta) + (\theta_0+\varepsilon)\int_{\theta_0+\varepsilon}^\infty \theta c(\theta)e^{-x[\theta^{-1}-(\theta_0+\varepsilon)^{-1}]}\,dG_1(\theta)}{\int_0^{\theta_0+\varepsilon}\theta c(\theta)e^{-x[\theta^{-1}-(\theta_0+\varepsilon)^{-1}]}\,dG_1(\theta) + \int_{\theta_0+\varepsilon}^\infty \theta c(\theta)e^{-x[\theta^{-1}-(\theta_0+\varepsilon)^{-1}]}\,dG_1(\theta)}.$

Therefore $\lim_{x\uparrow\infty}\rho_1(x) \geq \theta_0+\varepsilon$. This contradicts $\rho_1(\infty-) \leq \theta_0$. Since $G_1(\theta_0) = 1$,

$w_1(x) = \int_0^{\theta_0}\theta(\theta_0-\theta)c(\theta)e^{-x/\theta}\,dG_1(\theta) \geq \int_0^{\theta_0}\theta(\theta_0-\theta)c(\theta)e^{-L_n/\theta}\,dG_1(\theta)$ for $x \in [B_n, L_n]$.

Next we shall prove (5.6). It is obvious for Case 1 and Case 2 from (5.2) and (5.3); we only prove it for Case 3 and Case 4. If $\rho_1(0+) \geq \theta_0$ and $G_1(\theta_0-) = 0$, then $\psi_1(x) \leq \int_0^\infty \theta^2 c(\theta)\,dG_1(\theta) < \infty$ for all $x > 0$, and (5.6) follows from (5.2) and (5.3). If $\rho_1(0+) \geq \theta_0$ and $G_1(\theta_0-) > 0$, then $\psi_1(0) < \infty$ by the previous result, and (5.6) again follows from (5.2) and (5.3).
If $\rho_1(\infty-) \leq \theta_0$, then $G_1(\theta_0) = 1$ and $\rho_1(x) < \theta_0$ for all $x > 0$. We know $B_n < 1$ for large $n$. For $x \in [1, L_n]$, $\psi_1(x)$ is bounded, and (5.6) follows from (5.2) and (5.3). For $x \in [B_n, 1]$, $\varphi_1(x)$ and $\theta_0-\rho_1(x)$ are bounded from below, and (5.6) follows from (5.7). Now the proof of Theorem 3.2 is complete.

References

Dunnett, C.W., 1955. A multiple comparison procedure for comparing several treatments with a control. J. Amer. Statist. Assoc. 50, 1096–1121.
Gupta, S.S., Hsiao, P., 1983. Empirical Bayes rules for selecting good populations. J. Statist. Plann. Inference 8, 87–101.
Gupta, S.S., Liang, T., 1999. Selecting good exponential populations compared with a control: a nonparametric empirical Bayes approach. Sankhya Ser. B 61, 289–304.
Gupta, S.S., Liese, F., 2000. Asymptotic distribution of the conditional regret risk for selecting good exponential populations. Kybernetika 36, 571–588.
Gupta, S.S., Miescke, K.J., 1985. Minimax multiple t-tests for comparing k normal populations with a control. J. Statist. Plann. Inference 12, 161–169.
Gupta, S.S., Sobel, M., 1958. On selecting a subset which contains all populations better than a standard. Ann. Math. Statist. 29, 235–244.
Lehmann, E.L., 1961. Some model I problems of selection. Ann. Math. Statist. 32, 990–1012.
Li, J., Gupta, S.S., 2001. Monotone empirical Bayes tests with optimal rate of convergence for a truncation parameter. Statist. Decisions 19, 223–237.
Li, J., Gupta, S.S., 2002. Monotone empirical Bayes tests based on kernel sequences estimation. Statist. Sinica 12, 1061–1072.
Liang, T., 1999. On Singh's conjecture on rate of convergence of empirical Bayes tests. Nonparametric Statist. 12, 1–16.
Liang, T., 2000. On empirical Bayes tests in a positive exponential family. J. Statist. Plann. Inference 83, 169–181.
Miescke, K.J., 1981. Gamma-minimax selection procedures in simultaneous testing problems. Ann. Statist. 9, 215–220.
Paulson, E., 1952. An optimum solution to the k-sample slippage problem for the normal distribution. Ann. Math. Statist. 23, 610–616.
Pensky, M., 1998. Empirical Bayes estimation based on wavelets. Sankhya Ser. A 60, 214–231.
Pensky, M., Singh, R.S., 1999. Empirical Bayes estimation of reliability characteristics for an exponential family. Canad. J. Statist. 27, 127–136.
Petrov, V.V., 1995. Limit Theorems of Probability Theory. Clarendon Press, Oxford.
Robbins, H., 1956. An empirical Bayes approach to statistics. In: Proceedings of the Third Berkeley Symposium on Mathematical Statistics and Probability, Vol. 1. Univ. of California Press, Berkeley, pp. 157–163.
Robbins, H., 1964. The empirical Bayes approach to statistical decision problems. Ann. Math. Statist. 35, 1–20.
Singh, R.S., Wei, L.S., 1992. Empirical Bayes with rates and best rates of convergence in $u(x)C(\theta)\exp(-x/\theta)$-family: estimation case. Ann. Inst. Statist. Math. 44, 435–449.
van Houwelingen, H., Stijnen, T., 1993. Monotone empirical Bayes estimators based on more informative samples. J. Amer. Statist. Assoc. 88, 1438–1443.