Selection of the best Bernoulli population provided it is better than a control: An empirical Bayes approach


Journal of Statistical Planning and Inference 138 (2008) 2366–2379, www.elsevier.com/locate/jspi, doi:10.1016/j.jspi.2007.10.003

Selection of the best Bernoulli population provided it is better than a control: An empirical Bayes approach

Lee-Shen Chen
Department of Applied Statistics and Information Science, Ming Chuan University, Taoyuan County 333, Taiwan, ROC
E-mail address: [email protected]

Received 2 February 2007; received in revised form 8 October 2007; accepted 24 October 2007; available online 1 November 2007

Abstract

We investigate an empirical Bayes procedure δ*_n for selecting the best Bernoulli population from among k competitors, provided it is better than a control, based on negative binomial sampling. The performance of δ*_n is evaluated in terms of its associated Bayes risk. Under some conditions, it is shown that δ*_n is asymptotically optimal, and the associated Bayes risk converges to the minimum Bayes risk at an exponential decay rate exp(−cn) for some positive number c, where n is the number of historical data at hand when the present selection problem is considered. A simulation is also carried out to study the performance of δ*_n for small to moderate values of n.

© 2007 Elsevier B.V. All rights reserved.

MSC: 62F07; 62C12

Keywords: Best population; Better than a control; Bayes procedure; Rate of convergence; Regret; Asymptotically optimal

1. Introduction

In many situations, an experimenter is confronted with choosing a system or a treatment which is best in some sense from among several competitors. For example, suppose that there are k different competing operation systems; one would like to select the best among them in the sense that it has the smallest failure probability. Or one may consider several competing drugs for a certain ailment and wish to select the one with the highest probability of curing the ailment. This kind of selection problem occurs in many fields, such as medicine, engineering and sociology. The monograph of Bechhofer et al. (1995) provides a detailed review of the goals and procedures of such selection problems.

Consider k independent Bernoulli populations π_1, ..., π_k, where π_i is characterized by a probability parameter p_i, i = 1, ..., k. Let p_{[1]} ≤ ··· ≤ p_{[k]} denote the ordered values of p_1, ..., p_k. It is assumed that the exact pairing between the ordered and the unordered parameters is unknown. Selection procedures for Bernoulli populations have been extensively studied in the literature. Chapter 7 of Bechhofer et al. (1995) provides an overview of selection procedures for Bernoulli populations based on binomial sampling and sequential sampling. Gupta and Nagel (1971) studied some selection procedures using negative binomial sampled data. Also, Vit (1974) and Salvia (1984) considered testing the hypothesis of equality of failure probabilities based on negative binomial sampled data. In this paper, we let p_i denote the failure probability for each trial in π_i. The population π_i with p_i = p_{[1]} is called the best population


among the k competitors. Let p_0 be a specified standard or control level. Population π_i is said to be better than the control p_0, and acceptable, if p_i < p_0; otherwise, π_i is said to be bad and should be excluded. In many practical situations, an experimenter makes a selection only when the best population is better than the control p_0. For example, let p_0 be the failure probability of the currently used operation system. An experimenter may make a selection from among the k new operation systems and replace the current system by the newly selected one provided that it is better than the control level p_0. Otherwise, the experimenter may select none and continue using the current operation system.

We consider a situation in which one is dealing with repeated, independent selection problems. In such instances, it is reasonable to formulate the component problem in the sequence as a Bayes decision problem with respect to an unknown prior distribution on the parameter space. One then uses the accumulated historical data to improve the decision procedure at each stage. This is the so-called empirical Bayes approach due to Robbins (1956). Empirical Bayes procedures have been derived for Bernoulli populations by Gupta and Liang (1986, 1988, 1989) based on binomial sampled data and by Gupta and Liang (1992) and Liang (1991) based on negative binomial sampled data. Readers are referred to Huang and Chang (2006) and the references cited there for recent developments of empirical Bayes procedures in the area of ranking and selection. In this paper, we study the problem of selecting the best Bernoulli population, provided that it is better than the control, based on negative binomial sampling, using the nonparametric empirical Bayes approach.

We organize the paper in the following way. The selection problem is formulated in Section 2, where, for an error loss, we derive a Bayes selection procedure. The empirical Bayes framework of the selection problem is introduced in Section 3. By mimicking the behavior of the Bayes selection procedure, an empirical Bayes selection procedure δ*_n is constructed. We study the asymptotic optimality of δ*_n in Section 4: it is shown that under certain conditions, δ*_n is asymptotically optimal and its associated Bayes risk converges to the minimum Bayes risk at an exponential decay rate exp(−cn) for some positive number c, where n is the number of accumulated historical observations at hand when the present selection problem is considered. Finally, a simulation study is carried out to investigate the performance of δ*_n for small to moderate values of n. The simulation results are reported in Section 5. The proofs regarding the asymptotic optimality of δ*_n are given in the Appendix.

2. Formulation of the selection problem and a Bayes procedure

Consider k (≥ 2) independent Bernoulli populations π_1, ..., π_k, where π_i is characterized by a failure probability parameter p_i; that is, p_i is the failure probability for each Bernoulli trial in π_i. Let X_i denote the number of failures before observing the rth success in π_i. It is assumed that the trials are mutually independent. Thus, given p_i, X_i has a negative binomial distribution with probability function f_i(x|p_i), where

    f_i(x \mid p_i) = \binom{r+x-1}{r-1} p_i^x (1-p_i)^r, \quad x = 0, 1, 2, \ldots.    (2.1)
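The parameterization in (2.1) counts failures before the rth success, with p_i the per-trial failure probability. As a quick numerical sanity check (a minimal sketch, assuming SciPy is available; the values of r and p below are illustrative, not from the paper), the pmf agrees with SciPy's nbinom once the latter is handed the success probability 1 − p:

```python
# Check Eq. (2.1) against scipy.stats.nbinom; here p is the per-trial FAILURE
# probability, so nbinom (parameterized by the success probability) gets 1 - p.
from math import comb
from scipy.stats import nbinom

r, p = 3, 0.25                                               # illustrative values
for x in range(6):
    pmf_paper = comb(r + x - 1, r - 1) * p**x * (1 - p)**r   # Eq. (2.1)
    pmf_scipy = nbinom.pmf(x, r, 1 - p)
    assert abs(pmf_paper - pmf_scipy) < 1e-12
```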

Let f(x|p) = \prod_{i=1}^{k} f_i(x_i|p_i) for x = (x_1, ..., x_k) and p = (p_1, ..., p_k). For each p, let p_{[1]} ≤ ··· ≤ p_{[k]} denote the ordered values of the parameters p_1, ..., p_k. It is assumed that the exact pairing between the ordered and the unordered parameters is unknown. A population π_i with the smallest failure probability, that is, with p_i = p_{[1]}, is referred to as a best population. Let p_0 denote a known control, 0 < p_0 < 1. A population π_i with p_i < p_0 is said to be better than the control p_0 and is acceptable; otherwise, π_i is said to be bad and not acceptable. Our goal is to derive empirical Bayes procedures for selecting the best Bernoulli population, which should also be better than the control p_0. If there is no such population, we select none and exclude all competitors as bad.

Let Ω = {p = (p_1, ..., p_k) | 0 < p_i < 1, i = 1, ..., k} be the parameter space. It is assumed that p_i is a realization of a random probability parameter P_i, which has an unknown prior distribution G_i over the interval (0, 1). It is also assumed that P_1, ..., P_k are mutually independent. Thus, P = (P_1, ..., P_k) has the joint probability distribution G(p) = \prod_{i=1}^{k} G_i(p_i). Let A = {a = (a_0, a_1, ..., a_k) | a_i = 0 or 1, i = 0, 1, ..., k, and \sum_{i=0}^{k} a_i = 1} be the action space. For each action a, when a_i = 1 for some i = 1, ..., k, it means that population π_i is selected as the best and considered to be better than the control p_0; when a_0 = 1, it means that no selection is made and all populations are considered to be bad.


Let θ_i = p_i/(1 − p_i), i = 0, 1, ..., k, and θ_{[1]} = min(θ_1, ..., θ_k). We consider the following error loss function L(p, a): for each p in Ω and action a in A,

    L(p, a) = \sum_{i=0}^{k} a_i \theta_i - \min(\theta_{[1]}, \theta_0).    (2.2)

Note that one may consider the loss function directly based on the failure probabilities p_i; namely,

    L^*(p, a) = \sum_{i=0}^{k} a_i p_i - \min(p_{[1]}, p_0).    (2.2a)
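As a small worked example of the two losses (the numbers are illustrative only): take k = 2, p_1 = 0.1, p_2 = 0.3 and p_0 = 0.2, so that

    \theta_1 = \tfrac{0.1}{0.9} \approx 0.111, \quad \theta_2 = \tfrac{0.3}{0.7} \approx 0.429, \quad \theta_0 = \tfrac{0.2}{0.8} = 0.25, \quad \min(\theta_{[1]}, \theta_0) = \theta_1.

Selecting π_1 (a_1 = 1) incurs zero loss under both (2.2) and (2.2a); selecting none (a_0 = 1) costs θ_0 − θ_1 ≈ 0.139 under (2.2) and p_0 − p_1 = 0.1 under (2.2a); selecting π_2 (a_2 = 1) costs θ_2 − θ_1 ≈ 0.317 under (2.2) and p_2 − p_1 = 0.2 under (2.2a).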

Obviously, the loss function in (2.2) is unbounded, while the loss function in (2.2a) is bounded above by 1. The penalty for selecting a very poor population can be very large when the loss (2.2) is applied, but it is always bounded above by 1 under the loss (2.2a).

Let X be the sample space generated by X = (X_1, ..., X_k). A selection procedure δ = (δ_0, δ_1, ..., δ_k) is defined to be a measurable mapping from the sample space X into the product space [0, 1]^{k+1} such that, for each x in X, the function δ(x) = (δ_0(x), ..., δ_k(x)) satisfies 0 ≤ δ_i(x) ≤ 1, i = 0, 1, ..., k, and \sum_{i=0}^{k} δ_i(x) = 1. That is, for each i = 1, ..., k, δ_i(x) is the probability of selecting π_i as the best population and considering π_i to be better than the control p_0, and δ_0(x) is the probability of excluding all populations as bad, and hence selecting none.

Let D be the class of all selection procedures. For each δ in D, let R(G, δ) denote its associated Bayes risk. Then R(G) = inf_{δ∈D} R(G, δ) is the minimum Bayes risk over the class D. A selection procedure, say δ_G, such that R(G, δ_G) = R(G) is called a Bayes selection procedure. We consider only those priors for which \int_0^1 [p/(1 − p)] dG_i(p) < ∞, i = 1, ..., k, to ensure the finiteness of R(G, δ) for each selection procedure δ in D. Let

    f_i(x_i) = \int_0^1 f_i(x_i \mid p)\, dG_i(p) = \int_0^1 \binom{r+x_i-1}{r-1} p^{x_i} (1-p)^r \, dG_i(p),

the marginal probability function of X_i,

    f(x) = \prod_{i=1}^{k} f_i(x_i), \quad x = (x_1, \ldots, x_k),

    \varphi_i(x_i) = \int_0^1 [p/(1-p)]\, f_i(x_i \mid p)\, dG_i(p),    (2.3)

    \psi_i(x_i) = \varphi_i(x_i)/f_i(x_i) = E_{G_i}[P_i/(1-P_i) \mid X_i = x_i],

the posterior expectation of P_i/(1 − P_i) given X_i = x_i,

    h_i(x_i) = (1-p_0)\varphi_i(x_i) - p_0 f_i(x_i),

    h_{ij}(x_i, x_j) = \varphi_i(x_i) f_j(x_j) - \varphi_j(x_j) f_i(x_i).

A straightforward calculation shows that

    \varphi_i(x_i) = \sum_{y=x_i+1}^{\infty} \Bigl[\prod_{l=x_i+1}^{y} \frac{l}{l+r-1}\Bigr] f_i(y).    (2.4)

Also, note that ψ_i(x_i) is nondecreasing in x_i. Based on the preceding model, the Bayes risk of a selection procedure δ is

    R(G, \delta) = \sum_{x \in X} \Bigl[\sum_{i=1}^{k} \delta_i(x) \frac{\varphi_i(x_i)}{f_i(x_i)} + \delta_0(x)\theta_0\Bigr] f(x) - C
                 = \sum_{x \in X} \Bigl[\sum_{i=1}^{k} \delta_i(x)\psi_i(x_i) + \delta_0(x)\theta_0\Bigr] f(x) - C,    (2.5)

where C = \int_{\Omega} \min(\theta_{[1]}, \theta_0)\, dG(p).
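The calculation behind (2.4) can be sketched as follows. Since

    \binom{r+y-1}{r-1} \prod_{l=x_i+1}^{y} \frac{l}{l+r-1} = \binom{r+x_i-1}{r-1} \quad \text{for every } y \ge x_i + 1,

we have

    \sum_{y=x_i+1}^{\infty} \Bigl[\prod_{l=x_i+1}^{y} \frac{l}{l+r-1}\Bigr] f_i(y)
      = \binom{r+x_i-1}{r-1} \int_0^1 \sum_{y=x_i+1}^{\infty} p^{y} (1-p)^{r}\, dG_i(p)
      = \binom{r+x_i-1}{r-1} \int_0^1 p^{x_i+1} (1-p)^{r-1}\, dG_i(p)
      = \varphi_i(x_i).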


From (2.5), a Bayes selection procedure δ_G = (δ_{G0}, ..., δ_{Gk}) can be obtained as follows. For each x in X, let

    S(x) = \{i = 1, \ldots, k \mid \psi_i(x_i) = \min_{1 \le j \le k} \psi_j(x_j) \text{ and } \psi_i(x_i) < \theta_0\}
         = \{i = 1, \ldots, k \mid h_{ij}(x_i, x_j) \le 0 \text{ for all } j \text{ and } h_i(x_i) < 0\}.    (2.6)

Let

    i_G(x) = \begin{cases} 0 & \text{if } S(x) = \emptyset, \\ \min\{i \mid i \in S(x)\} & \text{if } S(x) \ne \emptyset, \end{cases}    (2.7)

where ∅ denotes the empty set. For each i = 0, 1, ..., k, define

    \delta_{Gi}(x) = 1\ (0) \quad \text{if } i = i_G(x)\ \text{(otherwise)}.    (2.8)

From (2.5)–(2.8), one sees that δ_G is a Bayes selection procedure. The minimum Bayes risk of the selection problem is R(G, δ_G), where

    R(G, \delta_G) = \sum_{x \in X} \Bigl[\sum_{i=1}^{k} \delta_{Gi}(x)\psi_i(x_i) + \delta_{G0}(x)\theta_0\Bigr] f(x) - C
                   = \sum_{x \in X} \psi_{i_G(x)}(x_{i_G(x)})\, f(x) - C,    (2.9)

where ψ_0(x_0) ≡ θ_0.

An alternative form of the Bayes selection procedure δ_G: let A_i = {x_i | ψ_i(x_i) < θ_0} and B_i = {x_i | ψ_i(x_i) > θ_0}. Define

    a_i = \begin{cases} \sup A_i & \text{if } A_i \ne \emptyset, \\ -1 & \text{if } A_i = \emptyset; \end{cases}
    \qquad
    b_i = \begin{cases} \inf B_i & \text{if } B_i \ne \emptyset, \\ \infty & \text{if } B_i = \emptyset. \end{cases}    (2.10)

Since ψ_i(x_i) is increasing in x_i, the set A_i always lies to the left-hand side of the set B_i if both A_i and B_i are nonempty. Thus a_i ≤ b_i, and a_i < b_i if B_i ≠ ∅. Note that ψ_i(x_i) < θ_0 for x_i ≤ a_i, ψ_i(x_i) > θ_0 for x_i ≥ b_i, and ψ_i(x_i) = θ_0 for a_i < x_i < b_i; also b_i = a_i + 1 if ψ_i(a_i + 1) > θ_0, or b_i = a_i + 2 if ψ_i(a_i + 1) = θ_0. Hence h_i(x_i) = (1 − p_0)φ_i(x_i) − p_0 f_i(x_i) = (1 − p_0)[ψ_i(x_i) − θ_0] f_i(x_i) < 0 for x_i ≤ a_i, and h_i(x_i) > 0 for x_i ≥ b_i. Let w(x) = [\binom{r+x-1}{r-1} 2^x]^{-1}. Let H_i(−1) = 0, and let H_i(x) = \sum_{z=0}^{x} h_i(z) w(z) for x ≥ 0. Define

    M(H_i) = \min\Bigl\{x \,\Big|\, H_i(x) = \min_{-1 \le y < \infty} H_i(y)\Bigr\}.    (2.11)

Since h_i(x_i) < 0 for x_i ≤ a_i and h_i(x_i) ≥ 0 for x_i > a_i, H_i(x) attains its minimum at a_i (and also at a_i + 1 if ψ_i(a_i + 1) = θ_0). Therefore a_i = M(H_i). In terms of a_i, the set S(x) of (2.6) can be expressed as

    S(x) = \{i = 1, \ldots, k \mid x_i \le a_i \text{ and } h_{ij}(x_i, x_j) \le 0 \text{ for all } j \text{ for which } x_j \le a_j\}.    (2.12)
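To make the construction concrete, the following minimal sketch (in Python) implements the Bayes selection rule (2.6)–(2.8) under the assumption that each prior G_i is a Beta(α_i, β_i) distribution, in which case ψ_i(x_i) has the closed form (x_i + α_i)/(r + β_i − 1) used in Section 5; the numerical values in the example call are illustrative only.

```python
# A minimal sketch of the Bayes rule (2.6)-(2.8), assuming Beta(alpha_i, beta_i)
# priors so that psi_i(x_i) = (x_i + alpha_i)/(r + beta_i - 1) in closed form.
def psi(x, alpha, beta, r):
    # posterior expectation of P/(1-P) given X = x under a Beta(alpha, beta) prior
    return (x + alpha) / (r + beta - 1.0)

def bayes_selection(x, alphas, betas, r, p0):
    """Return i_G(x): 1-based index of the selected population, or 0 if no
    candidate beats the control odds theta_0 = p0/(1 - p0)."""
    theta0 = p0 / (1.0 - p0)
    psis = [psi(xi, a, b, r) for xi, a, b in zip(x, alphas, betas)]
    best = min(range(len(x)), key=lambda i: psis[i])   # smallest index attaining the minimum
    return best + 1 if psis[best] < theta0 else 0

# Illustrative call: k = 3 populations, r = 2, control p0 = 0.2
print(bayes_selection([1, 4, 0], alphas=[0.5, 0.5, 0.5], betas=[5.0, 5.0, 5.0], r=2, p0=0.2))
```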

3. Empirical Bayes selection procedures

When G is unknown, it may not be possible to implement the Bayes selection procedure δ_G for the underlying selection problem. In the following, it is assumed that certain historical data from each of the k populations are available. In such a situation, the empirical Bayes approach of Robbins (1956) is adopted.

3.1. Empirical Bayes framework

For each i = 1, ..., k, let (X_{ij}, P_{ij}), j = 1, 2, ..., be iid copies of (X_i, P_i), associated with the population π_i, where X_{ij}, j = 1, 2, ..., are observable, but P_{ij}, j = 1, 2, ..., are not observable. Here, P_{ij} stands for the failure


probability of the Bernoulli population π_i at stage j, and X_{ij} is the number of failures before observing the rth success at stage j. At the present stage n + 1, let X_i(n) = (X_{i1}, ..., X_{in}) denote the accumulated historical data associated with π_i, let X_i ≡ X_{i,n+1} be the present random observation arising from π_i, and let p_i ≡ p_{i,n+1} be a realization of P_{i,n+1}, i = 1, ..., k. Denote X(n) = (X_1(n), ..., X_k(n)) and X = (X_1, ..., X_k). At stage n + 1, we are interested in selecting the population associated with p_{[1]} = min(p_1, ..., p_k) = min(p_{1,n+1}, ..., p_{k,n+1}), provided that p_{[1]} < p_0, using the error loss L(p, a) of (2.2).

An empirical Bayes selection procedure δ_n = (δ_{n0}, ..., δ_{nk}) is a measurable function of the present observation X = x and the historical data X(n) such that 0 ≤ δ_{ni}(x, X(n)) ≡ δ_{ni}(x) ≤ 1, i = 0, 1, ..., k, and \sum_{i=0}^{k} δ_{ni}(x) = 1. For each i = 1, ..., k, δ_{ni}(x) is the probability of selecting π_i as the best and considering π_i as better than the control p_0, and δ_{n0}(x) is the probability of excluding all the k populations as bad and selecting none.

Given X(n), the conditional Bayes risk of an empirical Bayes selection procedure δ_n is

    R(G, \delta_n \mid X(n)) = \sum_{x \in X} \Bigl[\sum_{i=1}^{k} \delta_{ni}(x)\psi_i(x_i) + \delta_{n0}(x)\theta_0\Bigr] f(x) - C,

and the overall Bayes risk of δ_n is

    R(G, \delta_n) = E_n R(G, \delta_n \mid X(n))
                   = \sum_{x \in X} \Bigl[\sum_{i=1}^{k} E_n[\delta_{ni}(x)]\psi_i(x_i) + E_n[\delta_{n0}(x)]\theta_0\Bigr] f(x) - C,

where the expectation E_n is taken with respect to the probability measure generated by X(n). Since δ_G is a Bayes selection procedure, R(G, δ_n | X(n)) − R(G, δ_G) ≥ 0 for all X(n) and for all n. Thus R(G, δ_n) − R(G, δ_G) ≥ 0 for all n. The nonnegative regret R(G, δ_n) − R(G, δ_G) is often used as a measure of performance of the empirical Bayes selection procedure δ_n. δ_n is said to be asymptotically optimal, relative to the prior distribution G, at a convergence rate ε_n if R(G, δ_n) − R(G, δ_G) = O(ε_n), where {ε_n} is a sequence of positive decreasing numbers such that lim_{n→∞} ε_n = 0. In the following, we shall construct empirical Bayes selection procedures possessing the desired asymptotic optimality.

3.2. Construction of empirical Bayes selection procedures

In order to construct empirical Bayes selection procedures by mimicking the behavior of the Bayes selection procedure δ_G, we first have to estimate the functions f_i(x_i), φ_i(x_i), h_i(x_i) and h_{ij}(x_i, x_j). For each i = 1, ..., k and x_i, define

    f_{in}(x_i) = \frac{1}{n} \sum_{j=1}^{n} I_{\{x_i\}}(X_{ij}),

    V(X_{ij} \mid x_i) = \Bigl[\prod_{l=x_i+1}^{X_{ij}} \frac{l}{l+r-1}\Bigr] I_{[0,\infty)}(X_{ij} - x_i - 1),

    \varphi_{in}(x_i) = \frac{1}{n} \sum_{j=1}^{n} V(X_{ij} \mid x_i),

    h_{in}(x_i) = (1-p_0)\varphi_{in}(x_i) - p_0 f_{in}(x_i),

    h_{ijn}(x_i, x_j) = \varphi_{in}(x_i) f_{jn}(x_j) - \varphi_{jn}(x_j) f_{in}(x_i),

    H_{in}(x_i) = \sum_{z=0}^{x_i} h_{in}(z) w(z), \qquad H_{in}(-1) \equiv 0,    (3.1)

where I_A(t) = 1 (0) if t ∈ A (t ∉ A).
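As an illustration of (3.1), the following sketch computes the empirical quantities from the historical vector X_i(n); it assumes the history is held as a NumPy array, and the function names are ours, not the paper's.

```python
import numpy as np

def f_hat(x, hist):
    """f_{in}(x): empirical frequency of the value x in the history X_i(n)."""
    return float(np.mean(hist == x))

def V(xij, x, r):
    """V(X_ij | x) of (3.1); zero unless X_ij >= x + 1."""
    if xij <= x:
        return 0.0
    ls = np.arange(x + 1, xij + 1, dtype=float)
    return float(np.prod(ls / (ls + r - 1.0)))

def phi_hat(x, hist, r):
    """phi_{in}(x): average of V(X_ij | x) over the history."""
    return float(np.mean([V(xij, x, r) for xij in hist]))

def h_hat(x, hist, r, p0):
    """h_{in}(x) = (1 - p0) * phi_{in}(x) - p0 * f_{in}(x)."""
    return (1.0 - p0) * phi_hat(x, hist, r) - p0 * f_hat(x, hist)
```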


Define

    a_{in} = M(H_{in}) = \min\Bigl\{x \,\Big|\, H_{in}(x) = \min_{-1 \le y < \infty} H_{in}(y)\Bigr\}.    (3.2)

a_{in} is used as an estimator of a_i; note that a_i = M(H_i), see (2.10)–(2.11) and the related discussion given in Section 2. Define

    S_n(x) = \{i = 1, \ldots, k \mid x_i \le a_{in} \text{ and } h_{ijn}(x_i, x_j) \le 0 \text{ for all } j \text{ for which } x_j \le a_{jn}\},    (3.3)

and

    i_n^*(x) = \begin{cases} 0 & \text{if } S_n(x) = \emptyset, \\ \min\{i \mid i \in S_n(x)\} & \text{if } S_n(x) \ne \emptyset. \end{cases}    (3.4)

Note that the definitions of S_n(x) and i_n^*(x) are mimicries of S(x) and i_G(x), respectively (see (2.12) and (2.8)). We now propose an empirical Bayes selection procedure δ*_n = (δ*_{n0}, ..., δ*_{nk}), which is defined as follows: for each x in X and i = 0, 1, ..., k,

    \delta^*_{ni}(x) = \begin{cases} 1 & \text{if } i = i_n^*(x), \\ 0 & \text{otherwise.} \end{cases}    (3.5)

The conditional Bayes risk of δ*_n is

    R(G, \delta^*_n \mid X(n)) = \sum_{x \in X} \Bigl[\sum_{i=1}^{k} \delta^*_{ni}(x)\psi_i(x_i) + \delta^*_{n0}(x)\theta_0\Bigr] f(x) - C,    (3.6)

and the Bayes risk of δ*_n is

    R(G, \delta^*_n) = \sum_{x \in X} \Bigl[\sum_{i=1}^{k} E_n[\delta^*_{ni}(x)]\psi_i(x_i) + E_n[\delta^*_{n0}(x)]\theta_0\Bigr] f(x) - C.    (3.7)
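Continuing the sketch started after (3.1), the empirical Bayes rule δ*_n of (3.2)–(3.5) can be coded as below. a_hat truncates the search for the minimizer of H_{in} at the largest observed value, which is enough because h_{in}(x) = 0 beyond it; the weight w(x) = [\binom{r+x-1}{r-1} 2^x]^{-1} is the one used in Section 5 and the Appendix, and all function names are ours.

```python
from math import comb

def w(x, r):
    # weight w(x) = 1 / (binom(r+x-1, r-1) * 2**x)
    return 1.0 / (comb(r + x - 1, r - 1) * 2.0**x)

def a_hat(hist, r, p0):
    """a_{in} = M(H_{in}): smallest minimizer of H_{in} over {-1, 0, 1, ...}."""
    H, best_x, best_H = 0.0, -1, 0.0                 # H_{in}(-1) = 0
    for x in range(int(max(hist)) + 1):              # h_{in}(x) = 0 past max(hist)
        H += h_hat(x, hist, r, p0) * w(x, r)
        if H < best_H:
            best_x, best_H = x, H
    return best_x

def eb_selection(x, hists, r, p0):
    """Return i*_n(x) of (3.4): 1-based index of the selection, or 0 for none."""
    k = len(x)
    a = [a_hat(hists[i], r, p0) for i in range(k)]
    cand = [i for i in range(k) if x[i] <= a[i]]     # indices with x_i <= a_{in}
    S = []
    for i in cand:
        phi_i = phi_hat(x[i], hists[i], r)
        f_i = f_hat(x[i], hists[i])
        # require h_{ijn}(x_i, x_j) <= 0 for every candidate j
        if all(phi_i * f_hat(x[j], hists[j]) - phi_hat(x[j], hists[j], r) * f_i <= 0
               for j in cand):
            S.append(i)
    return min(S) + 1 if S else 0
```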

4. Asymptotic optimality of δ*_n

In this section, we evaluate the asymptotic optimality of the empirical Bayes selection procedure δ*_n. Let T_i = {x ∈ X | i_G(x) = i}, i = 0, 1, ..., k. From (2.8)–(2.9) and (3.5)–(3.6), the regret of δ*_n can be expressed as

    R(G, \delta^*_n) - R(G, \delta_G)
      = \sum_{x \in X} \sum_{i=0}^{k} \sum_{j=0}^{k} P_n\{i_n^*(x) = i,\ i_G(x) = j\}\,[\psi_i(x_i) - \psi_j(x_j)]\, f(x)
      = \sum_{x \in T_0} \sum_{i=1}^{k} P_n\{i_n^*(x) = i,\ i_G(x) = 0\}\,[\psi_i(x_i) - \theta_0]\, f(x)
        + \sum_{j=1}^{k} \sum_{x \in T_j} P_n\{i_n^*(x) = 0,\ i_G(x) = j\}\,[\theta_0 - \psi_j(x_j)]\, f(x)
        + \sum_{j=1}^{k} \sum_{i=1,\, i \ne j}^{k} \sum_{x \in T_j} P_n\{i_n^*(x) = i,\ i_G(x) = j\}\,[\psi_i(x_i) - \psi_j(x_j)]\, f(x)
      \equiv I_n + II_n + III_n.    (4.1)


Thus, to study the asymptotic optimality of δ*_n, it suffices to investigate the asymptotic behavior of each of the three terms on the right-hand side of (4.1). We have the following lemma.

Lemma 4.1. Suppose that for each i = 1, ..., k, the prior distribution G_i satisfies the following conditions:

(1) \int_0^1 [p/(1 − p)] dG_i(p) < ∞,
(2) B_i ≡ {x_i | ψ_i(x_i) > θ_0} ≠ ∅.

Then the following inequalities hold:

(a) I_n ≤ (N + 1) exp(−n ε_1) \sum_{i=1}^{k} \int [p/(1 − p)]\, dG_i(p), where

    \varepsilon_1 = \frac{1}{2} \min_{1 \le i \le k} \min\Bigl\{\Bigl(\frac{\varepsilon_i}{2(N - b_i + 1) w(z)}\Bigr)^2 \,\Big|\, b_i \le z \le N\Bigr\},

b_i = inf B_i, ε_i = min(h_i(b_i) w(b_i), |h_i(a_i)| w(a_i)) > 0, and N ≡ N(ε_1, ..., ε_k) is a fixed positive value given in Lemma A.2.

(b) II_n ≤ (a + 1) exp(−n ε_2) k θ_0, where a = max(a_1, ..., a_k) and

    \varepsilon_2 = \frac{1}{2} \min_{1 \le i \le k} \min\Bigl\{\Bigl(\frac{\varepsilon_i}{(a_i + 1) w(z)}\Bigr)^2 \,\Big|\, 0 \le z \le a_i\Bigr\}.

(c) III_n ≤ [(N + 1) exp(−n ε_1) + (a + 1) exp(−n ε_2) + 4 exp(−n ε_3)] \sum_{i=1}^{k} \int [p/(1 − p)]\, dG_i(p), where ε_3 = min{ε_{ij} | i, j = 1, ..., k and j ≠ i},

    \varepsilon_{ij} = \min\Bigl\{\tfrac{1}{32}\, h_{ij}^2(x_i, x_j) \,\Big|\, x \in T_{ji2}\Bigr\},

and T_{ji2} = {x ∈ T_j | ψ_i(x_i) > ψ_j(x_j) and x_i < b_i}.

From (4.1) and Lemma 4.1, we conclude that the proposed empirical Bayes selection procedure δ*_n possesses the following asymptotic optimality.

Theorem 4.1. Let δ*_n be the empirical Bayes selection procedure constructed in Section 3. Suppose that the prior distributions G_i, i = 1, ..., k, fulfill the conditions of Lemma 4.1. Then δ*_n is asymptotically optimal in the sense that

    R(G, \delta^*_n) - R(G, \delta_G) \le [2(N + 1)\exp(-n\varepsilon_1) + 2(a + 1)\exp(-n\varepsilon_2) + 4\exp(-n\varepsilon_3)] \sum_{i=1}^{k} \int [p/(1-p)]\, dG_i(p) = O(\exp(-cn)),

where c = min(ε_1, ε_2, ε_3) > 0.

5. Simulation study

A Monte Carlo simulation study was designed to investigate the performance of the proposed empirical Bayes (EB) selection procedure δ*_n for small to moderate values of n. For the (n + 1)-st component selection problem, let X(n) denote the historical data, and let X_{n+1} stand for the present random observation. From (2.7) and (3.4), one can define

    D_{n+1}(x_{n+1}) = \psi_{i_n^*(x_{n+1})}(x_{i_n^*}) - \psi_{i_G(x_{n+1})}(x_{i_G})    (5.1)

(recall that ψ_0(x_0) ≡ θ_0).

Then E[D_{n+1}(X_{n+1})] = R(G, δ*_n) − R(G, δ_G), where δ_G is the Bayes selection procedure based only on the present observation. Since D_{n+1}(x_{n+1}) is an unbiased estimator of the regret R(G, δ*_n) − R(G, δ_G), the sample mean of D_{n+1}(x_{n+1}) based on iid copies of (X(n), X_{n+1}) can be used as an estimator of the regret R(G, δ*_n) − R(G, δ_G).


Table 1
The regrets (R) and percentage relative regrets (RR) of the EB selection procedure δ*_n for k = 5 populations, with p_0 = 0.2 and various values of n and r

          r = 2                          r = 3
  n       R                  RR          R                  RR
  10      0.01257 (0.00114)  134.46173   0.02516 (0.00160)  291.54102
  20      0.00562 (0.00075)   60.14492   0.01921 (0.00141)  222.59817
  30      0.00381 (0.00060)   40.77714   0.01261 (0.00116)  146.16044
  40      0.00278 (0.00049)   29.76270   0.01122 (0.00109)  130.02624
  50      0.00247 (0.00045)   26.40010   0.01008 (0.00104)  116.79774
  60      0.00145 (0.00034)   15.48585   0.00867 (0.00098)  100.50565
  70      0.00086 (0.00024)    9.23518   0.00622 (0.00082)   72.09207
  80      0.00099 (0.00028)   10.57299   0.00442 (0.00070)   51.24523
  90      0.00057 (0.00018)    6.13483   0.00393 (0.00065)   45.49890
  100     0.00057 (0.00015)    6.06710   0.00352 (0.00060)   40.84065
  110     0.00074 (0.00025)    7.92678   0.00309 (0.00060)   35.82500
  120     0.00062 (0.00019)    6.65857   0.00273 (0.00055)   31.60391
  130     0.00041 (0.00014)    4.35452   0.00281 (0.00054)   32.52814
  140     0.00067 (0.00023)    7.11511   0.00193 (0.00044)   22.38191
  150     0.00057 (0.00018)    6.13483   0.00233 (0.00048)   26.98361
  160     0.00045 (0.00017)    4.76202   0.00268 (0.00054)   31.08851
  170     0.00053 (0.00019)    5.64236   0.00212 (0.00045)   24.51749
  180     0.00020 (0.00013)    2.16911   0.00142 (0.00038)   16.48529

Estimated standard deviations of regrets are in parentheses.

Let N be a fixed positive integer, and let N1 = N + 1. The simulation scheme is described as follows.

(1) For each i = 1, ..., k, generate P_{ij}, j = 1, ..., N1, from the beta distribution Be(α_i, β_i), where α_i > 0 and β_i > 0, i = 1, ..., k, are given values.
(2) Given P_{ij} = p_{ij}, generate X_{ij} from the negative binomial distribution NB(r, p_{ij}), where X_{ij} denotes the number of failures prior to the rth success and p_{ij} is the failure probability on each trial. Then x_{1j}, ..., x_{kj} denote the observations at stage j from the populations π_1, ..., π_k, respectively, and x_j = (x_{1j}, ..., x_{kj}) forms the jth observed vector, j = 1, ..., N1. Note that from (1) and (2) one can easily derive ψ_i(x_i) = (x_i + α_i)/(r + β_i − 1), i = 1, ..., k, since the posterior distribution of P_i given X_i = x_i is Be(x_i + α_i, r + β_i).

Based on the generated values of the random vectors x_1, ..., x_{N1} and the selected θ_0 = p_0/(1 − p_0), do the following computation for each n = 10(10)180.

(3) Compute i_G(x_{n+1}) in (2.7) and the Bayes selection rule δ_G = (δ_{G0}, δ_{G1}, ..., δ_{Gk}) in (2.8).
(4) Compute i_n^*(x_{n+1}) in (3.4) and the EB selection rule δ*_n from (3.5), with the weight function w(x) = [\binom{r+x-1}{r-1} 2^x]^{-1}.
(5) Compute the regret D_{n+1}(x_{n+1}) = ψ_{i_n^*(x_{n+1})}(x_{i_n^*}) − ψ_{i_G}(x_{i_G}).
(6) Repeat step (1) through step (5) 2000 times. Then take the average, denoted by D̄_{n+1}, which is used as an estimator of the difference R(G, δ*_n) − R(G, δ_G). In addition, we also compute the estimated standard error of the corresponding estimate D̄_{n+1}.
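A single Monte Carlo replication of steps (1)–(5) can be sketched as follows, reusing psi and bayes_selection from the sketch in Section 2 and eb_selection from the sketch in Section 3; k, r, p_0, n, the Beta parameters and the number of replications in the example call are illustrative choices (the paper uses 2000 replications).

```python
import numpy as np

def one_replication(rng, n, k=5, r=2, p0=0.2):
    alphas = [0.1 * (i + 1) for i in range(k)]        # alpha_i = i * 0.1
    betas = [10.0 - 0.1 * (i + 1) for i in range(k)]  # beta_i  = 10 - i * 0.1
    theta0 = p0 / (1.0 - p0)

    # steps (1)-(2): n historical stages plus the present stage n + 1
    P = rng.beta(alphas, betas, size=(n + 1, k))      # failure probabilities P_{ij}
    X = rng.negative_binomial(r, 1.0 - P)             # failures before the r-th success
    hists = [X[:n, i] for i in range(k)]              # X_i(n)
    x = X[n]                                          # present observation x_{n+1}

    # steps (3)-(5): Bayes and EB selections and the regret estimate D_{n+1}
    i_G = bayes_selection(x, alphas, betas, r, p0)
    i_EB = eb_selection(x, hists, r, p0)
    psi_of = lambda i: theta0 if i == 0 else psi(x[i - 1], alphas[i - 1], betas[i - 1], r)
    return psi_of(i_EB) - psi_of(i_G)                 # D_{n+1}(x_{n+1}) of (5.1)

rng = np.random.default_rng(0)
D = [one_replication(rng, n=50) for _ in range(200)]
print(np.mean(D), np.std(D, ddof=1) / np.sqrt(len(D)))   # regret estimate and its s.e.
```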

Simulation studies for the given k, p_0, and various values of n and r were executed. For k = 5, we take the parameters α_i = i × 0.1, β_i = 10 − i × 0.1, i = 1, ..., 5, and for k = 10 we take α_i = i × 0.05, β_i = 10 − i × 0.05, i = 1, ..., 10. In Tables 1 and 2, we find that the regret of the EB selection rule for k = 5 decreases globally as n increases when p_0 and r are fixed, and decreases as r decreases when p_0 and n are fixed. Those tables also show that the regrets of the EB selection rule for fixed p_0 and r oscillate locally while they decrease globally as n increases. As another way to evaluate the EB selection rules for small to moderate sample sizes, one may consider the percentage relative regret (RR) of the EB procedure δ*_n with respect to the minimum Bayes risk R(G, δ_G),


Table 2
The regrets (R) and percentage relative regrets (RR) of the EB selection procedure δ*_n for k = 5 populations, with p_0 = 0.4 and various values of n and r

          r = 4                          r = 5
  n       R                  RR          R                  RR
  10      0.01038 (0.00159)  129.56358   0.01927 (0.00231)  256.95535
  20      0.00654 (0.00129)   81.59073   0.01114 (0.00178)  148.52616
  30      0.00247 (0.00063)   30.89188   0.00562 (0.00117)   74.87842
  40      0.00181 (0.00041)   22.60359   0.00506 (0.00107)   67.46801
  50      0.00144 (0.00028)   17.96659   0.00443 (0.00102)   59.05385
  60      0.00130 (0.00027)   16.28588   0.00356 (0.00091)   47.42900
  70      0.00155 (0.00031)   19.37223   0.00270 (0.00065)   36.03860
  80      0.00125 (0.00025)   15.59550   0.00238 (0.00061)   31.75459
  90      0.00111 (0.00021)   13.89261   0.00166 (0.00040)   22.19482
  100     0.00104 (0.00020)   12.94177   0.00115 (0.00020)   15.31564
  110     0.00106 (0.00035)   13.18363   0.00143 (0.00039)   19.05172
  120     0.00081 (0.00016)   10.08447   0.00124 (0.00021)   16.56193
  130     0.00108 (0.00023)   13.42411   0.00135 (0.00022)   17.95007
  140     0.00072 (0.00020)    9.02893   0.00135 (0.00039)   17.98890
  150     0.00086 (0.00017)   10.77103   0.00165 (0.00040)   22.06573
  160     0.00088 (0.00018)   10.95240   0.00127 (0.00021)   16.97606
  170     0.00092 (0.00019)   11.53692   0.00121 (0.00020)   16.13445
  180     0.00081 (0.00016)   10.08289   0.00106 (0.00019)   14.15679

Estimated standard deviations of regrets are in parentheses.

which is defined by RR(δ*_n) = 100[R(G, δ*_n) − R(G, δ_G)]/R(G, δ_G). The RR expresses the regret of δ*_n as a percentage of the minimum Bayes risk R(G, δ_G). We also display the RR's in Tables 1 and 2. Those tables show that for each case the RR's oscillate and decrease as n increases. For k = 10, analogous results occur and are omitted for brevity. Overall, the proposed EB selection rule performs satisfactorily in the simulation study.

Acknowledgments

I wish to thank two anonymous referees for constructive comments which led to an improved version of the original paper.

Appendix

To present a precise proof of Lemma 4.1, we first introduce some useful lemmas.

Lemma A.1. For each i = 1, ..., k,

(a) For 0 ≤ x ≤ a_i, \sum_{z=x}^{a_i} h_i(z) w(z) \le h_i(a_i) w(a_i) < 0.
(b) For x ≥ b_i, \sum_{z=b_i}^{x} h_i(z) w(z) \ge h_i(b_i) w(b_i) > 0.

The lemma follows by noting that w(z) > 0 for all z ≥ 0, and that h_i(z) < 0 for z ≤ a_i and h_i(z) > 0 for z ≥ b_i.

Let ε_i = min(h_i(b_i) w(b_i), |h_i(a_i)| w(a_i)). Note that w(x) = [\binom{r+x-1}{r-1} 2^x]^{-1}, x ≥ 0. Thus, we have the following result.

Lemma A.2. There exists a positive integer N ≡ N(ε_1, ..., ε_k) such that N > max(a_1, ..., a_k) and \sum_{z=N+1}^{\infty} w(z) < ε_i/4 for each i = 1, ..., k.

From (3.1),

    h_{in}(x) = (1 - p_0)\varphi_{in}(x) - p_0 f_{in}(x)
              = \frac{1}{n} \sum_{j=1}^{n} [(1 - p_0) V(X_{ij} \mid x) - p_0 I_{\{x\}}(X_{ij})]
              = \frac{1}{n} \sum_{j=1}^{n} U(X_{ij}, x),    (A.1)

where U(X_{ij}, x) = (1 − p_0) V(X_{ij} | x) − p_0 I_{\{x\}}(X_{ij}), j = 1, ..., n, are iid, such that

    -p_0 \le U(X_{ij}, x) \le 1 - p_0, \qquad E_n U(X_{ij}, x) = (1 - p_0)\varphi_i(x) - p_0 f_i(x) = h_i(x).    (A.2)

Thus, by the inequality in Hoeffding (1963), the following lemma holds. Note that although we apply the same Hoeffding inequality as Liang (1988) does, the analysis of this paper is quite different from that of Liang (1988).

Lemma A.3. For c > 0 and x = 0, 1, 2, ...,

(a) P_n{h_{in}(x) − h_i(x) < −c} ≤ exp(−nc^2/2),
(b) P_n{h_{in}(x) − h_i(x) > c} ≤ exp(−nc^2/2).

Lemma A.4.

(a) \bigcup_{y=b_i}^{\infty} \Bigl\{\sum_{z=b_i}^{y} [h_{in}(z) - h_i(z)] w(z) < -\varepsilon_i\Bigr\} \subseteq \bigcup_{z=b_i}^{N} \{h_{in}(z) - h_i(z) < -\varepsilon_i/[2(N - b_i + 1) w(z)]\}.

(b) P_n\Bigl\{\bigcup_{y=b_i}^{\infty} \Bigl\{\sum_{z=b_i}^{y} [h_{in}(z) - h_i(z)] w(z) < -\varepsilon_i\Bigr\}\Bigr\}
      \le \sum_{z=b_i}^{N} P_n\{h_{in}(z) - h_i(z) < -\varepsilon_i/[2(N - b_i + 1) w(z)]\}
      \le \sum_{z=b_i}^{N} \exp\{-n[\varepsilon_i/(2(N - b_i + 1) w(z))]^2/2\}.

Proof. Note that for z ≥ b_i, 0 < h_i(z) = (1 − p_0)φ_i(z) − p_0 f_i(z) ≤ 1 − p_0. Also, −p_0 ≤ h_{in}(z) ≤ 1 − p_0. Thus |h_{in}(z) − h_i(z)| ≤ 1. For y ≥ N + 1, from Lemma A.2,

    \Bigl|\sum_{z=N+1}^{y} [h_{in}(z) - h_i(z)] w(z)\Bigr| \le \sum_{z=N+1}^{y} |h_{in}(z) - h_i(z)|\, w(z) \le \sum_{z=N+1}^{y} w(z) < \frac{\varepsilon_i}{4}.    (A.3)

Hence

    \Bigl\{\sum_{z=N+1}^{y} [h_{in}(z) - h_i(z)] w(z) < -\varepsilon_i/2\Bigr\} = \emptyset.


Therefore, for y ≥ N + 1,

    \Bigl\{\sum_{z=b_i}^{y} [h_{in}(z) - h_i(z)] w(z) < -\varepsilon_i\Bigr\}
      \subseteq \Bigl\{\sum_{z=b_i}^{N} [h_{in}(z) - h_i(z)] w(z) < -\varepsilon_i/2\Bigr\} \cup \Bigl\{\sum_{z=N+1}^{y} [h_{in}(z) - h_i(z)] w(z) < -\varepsilon_i/2\Bigr\}
      = \Bigl\{\sum_{z=b_i}^{N} [h_{in}(z) - h_i(z)] w(z) < -\varepsilon_i/2\Bigr\}
      \subseteq \bigcup_{z=b_i}^{N} \{h_{in}(z) - h_i(z) < -\varepsilon_i/[2(N - b_i + 1) w(z)]\}.    (A.4)

Also, for b_i ≤ y ≤ N,

    \Bigl\{\sum_{z=b_i}^{y} [h_{in}(z) - h_i(z)] w(z) < -\varepsilon_i\Bigr\} \subseteq \bigcup_{z=b_i}^{y} \{h_{in}(z) - h_i(z) < -\varepsilon_i/[2(N - b_i + 1) w(z)]\}.    (A.5)

Combining (A.4) and (A.5) yields the result of part (a). Part (b) of the lemma follows immediately from the result of part (a), Lemma A.3(a) and an application of Bonferroni's inequality.

Proof of Lemma 4.1(a). For each x in T_0, x_i > a_i, i = 1, ..., k. Since ψ_i(x_i) − θ_0 = 0 for a_i < x_i < b_i, it suffices to consider the case where x_i ≥ b_i. For x in T_0 and x_i ≥ b_i, by the definition of i_n^*, Lemma A.1, the definition of ε_i, and Lemmas A.3 and A.4,

    P_n\{i_n^*(x) = i,\ i_G(x) = 0\} \le P_n\{b_i \le x_i \le a_{in}\} \le P_n\{H_{in}(y) < H_{in}(b_i - 1) \text{ for some } y \ge b_i\}
      = P_n\Bigl\{H_{in}(y) - H_{in}(b_i - 1) = \sum_{z=b_i}^{y} h_{in}(z) w(z) < 0 \text{ for some } y \ge b_i\Bigr\}
      = P_n\Bigl\{\sum_{z=b_i}^{y} [h_{in}(z) - h_i(z)] w(z) < -\sum_{z=b_i}^{y} h_i(z) w(z) \text{ for some } y \ge b_i\Bigr\}
      \le P_n\Bigl\{\sum_{z=b_i}^{y} [h_{in}(z) - h_i(z)] w(z) < -\varepsilon_i \text{ for some } y \ge b_i\Bigr\}
      = P_n\Bigl\{\bigcup_{y=b_i}^{\infty} \Bigl\{\sum_{z=b_i}^{y} [h_{in}(z) - h_i(z)] w(z) < -\varepsilon_i\Bigr\}\Bigr\}
      \le \sum_{z=b_i}^{N} \exp\Bigl\{-\frac{n}{2}[\varepsilon_i/(2(N - b_i + 1) w(z))]^2\Bigr\}
      \le (N + 1) \exp(-n\varepsilon_1).    (A.6)


Substituting (A.6) into I_n, it follows that

    I_n \le \sum_{x \in T_0} \sum_{i=1}^{k} (N + 1) \exp(-n\varepsilon_1)[\psi_i(x_i) - \theta_0] f(x)
         = (N + 1) \exp(-n\varepsilon_1) \sum_{x \in T_0} \sum_{i=1}^{k} [\psi_i(x_i) - \theta_0] f(x)
         \le (N + 1) \exp(-n\varepsilon_1) \sum_{i=1}^{k} \int [p/(1-p)]\, dG_i(p).

Proof of Lemma 4.1(b). For x in T_j, j = 1, ..., k, x_j ≤ a_j and h_j(x_j) < 0. By the definition of δ*_n, following a discussion analogous to (A.6), we have

    P_n\{i_n^*(x) = 0,\ i_G(x) = j\} \le P_n\{a_{jn} < x_j \le a_j\} \le P_n\{H_{jn}(y) < H_{jn}(a_j) \text{ for some } y < a_j\}
      = P_n\Bigl\{H_{jn}(a_j) - H_{jn}(y) = \sum_{z=y+1}^{a_j} h_{jn}(z) w(z) > 0 \text{ for some } y < a_j\Bigr\}
      \le P_n\Bigl\{\sum_{z=y+1}^{a_j} [h_{jn}(z) - h_j(z)] w(z) > \varepsilon_j \text{ for some } y < a_j\Bigr\}
      = P_n\Bigl\{\bigcup_{y=-1}^{a_j - 1} \Bigl\{\sum_{z=y+1}^{a_j} [h_{jn}(z) - h_j(z)] w(z) > \varepsilon_j\Bigr\}\Bigr\}
      \le \sum_{z=0}^{a_j} P_n\{h_{jn}(z) - h_j(z) > \varepsilon_j/[(a_j + 1) w(z)]\}
      \le \sum_{z=0}^{a_j} \exp\Bigl\{-\frac{n}{2}\{\varepsilon_j/[(a_j + 1) w(z)]\}^2\Bigr\}
      \le (a_j + 1) \exp(-n\varepsilon_2) \le (a + 1) \exp(-n\varepsilon_2).    (A.7)

Substituting (A.7) into II_n, it follows that

    II_n \le \sum_{j=1}^{k} \sum_{x \in T_j} (a + 1) \exp(-n\varepsilon_2)[\theta_0 - \psi_j(x_j)] f(x) \le (a + 1) \exp(-n\varepsilon_2)\, k\theta_0.

Proof of Lemma 4.1(c). Let

    T_{ji1} = \{x \in T_j \mid x_i \ge b_i\}, \quad
    T_{ji2} = \{x \in T_j \mid \psi_i(x_i) > \psi_j(x_j),\ x_i < b_i\}, \quad
    T_{ji3} = \{x \in T_j \mid \psi_i(x_i) = \psi_j(x_j)\}.


Note that for x in T_{ji1}, ψ_j(x_j) < θ_0 < ψ_i(x_i), while for x in T_{ji3} the difference ψ_i(x_i) − ψ_j(x_j) vanishes. Thus III_n can be written as

    III_n = \sum_{j=1}^{k} \sum_{i \ne j} \sum_{x \in T_{ji1}} P_n\{i_n^*(x) = i,\ i_G(x) = j\}[\psi_i(x_i) - \psi_j(x_j)] f(x)
            + \sum_{j=1}^{k} \sum_{i \ne j} \sum_{x \in T_{ji2}} P_n\{i_n^*(x) = i,\ i_G(x) = j\}[\psi_i(x_i) - \psi_j(x_j)] f(x)
          \equiv III_{n1} + III_{n2}.    (A.8)

For x in T_{ji1}, x_i ≥ b_i. Analogous to (A.6), we have P_n{i_n^*(x) = i, i_G(x) = j} ≤ P_n{b_i ≤ x_i ≤ a_{in}} ≤ (N + 1) exp(−n ε_1). Thus,

    III_{n1} \le \sum_{j=1}^{k} \sum_{i \ne j} \sum_{x \in T_{ji1}} (N + 1)\exp(-n\varepsilon_1)[\psi_i(x_i) - \psi_j(x_j)] f(x)
             = (N + 1)\exp(-n\varepsilon_1) \sum_{j=1}^{k} \sum_{i \ne j} \sum_{x \in T_{ji1}} [\psi_i(x_i) - \psi_j(x_j)] f(x)
             \le (N + 1)\exp(-n\varepsilon_1) \sum_{i=1}^{k} \int [p/(1-p)]\, dG_i(p).    (A.9)

For x in T_{ji2},

    P_n\{i_n^*(x) = i,\ i_G(x) = j\} = P_n\{i_n^*(x) = i,\ a_{jn} < x_j \le a_j,\ i_G(x) = j\} + P_n\{i_n^*(x) = i,\ x_j \le a_{jn},\ i_G(x) = j\},    (A.10)

where, analogous to (A.7),

    P_n\{i_n^*(x) = i,\ a_{jn} < x_j \le a_j,\ i_G(x) = j\} \le (a + 1)\exp(-n\varepsilon_2).    (A.11)

Note that for x in T_{ji2}, h_{ij}(x_i, x_j) = φ_i(x_i) f_j(x_j) − φ_j(x_j) f_i(x_i) > 0. Also, 0 ≤ φ_{in}(x_i), f_{in}(x_i), φ_i(x_i), f_i(x_i) ≤ 1. Thus,

    P_n\{i_n^*(x) = i,\ x_j \le a_{jn},\ i_G(x) = j\} \le P_n\{\varphi_{in}(x_i) f_{jn}(x_j) - \varphi_{jn}(x_j) f_{in}(x_i) \le 0\}
      = P_n\bigl\{\varphi_{in}(x_i)[f_{jn}(x_j) - f_j(x_j)] + f_j(x_j)[\varphi_{in}(x_i) - \varphi_i(x_i)]
                  - \varphi_{jn}(x_j)[f_{in}(x_i) - f_i(x_i)] - f_i(x_i)[\varphi_{jn}(x_j) - \varphi_j(x_j)] \le -h_{ij}(x_i, x_j)\bigr\}
      \le P_n\{f_{jn}(x_j) - f_j(x_j) < -h_{ij}(x_i, x_j)/4\} + P_n\{\varphi_{in}(x_i) - \varphi_i(x_i) < -h_{ij}(x_i, x_j)/4\}
        + P_n\{f_{in}(x_i) - f_i(x_i) > h_{ij}(x_i, x_j)/4\} + P_n\{\varphi_{jn}(x_j) - \varphi_j(x_j) > h_{ij}(x_i, x_j)/4\}
      \le 4\exp\Bigl\{-\frac{n}{2}[h_{ij}(x_i, x_j)/4]^2\Bigr\} \le 4\exp(-n\varepsilon_{ij}) \le 4\exp(-n\varepsilon_3).    (A.12)


Combining (A.10)–(A.12) yields

    III_{n2} \le \sum_{j=1}^{k} \sum_{i \ne j} \sum_{x \in T_{ji2}} [(a + 1)\exp(-n\varepsilon_2) + 4\exp(-n\varepsilon_3)][\psi_i(x_i) - \psi_j(x_j)] f(x)
             = [(a + 1)\exp(-n\varepsilon_2) + 4\exp(-n\varepsilon_3)] \sum_{j=1}^{k} \sum_{i \ne j} \sum_{x \in T_{ji2}} [\psi_i(x_i) - \psi_j(x_j)] f(x)
             \le [(a + 1)\exp(-n\varepsilon_2) + 4\exp(-n\varepsilon_3)] \sum_{i=1}^{k} \int [p/(1-p)]\, dG_i(p).    (A.13)

Finally, combining (A.8), (A.9) and (A.13) leads to the result of Lemma 4.1(c). Hence, the proof is complete.

References

Bechhofer, R.E., Santner, T.J., Goldsman, D.M., 1995. Design and Analysis of Experiments for Statistical Selection, Screening, and Multiple Comparisons. Wiley, New York.
Gupta, S.S., Liang, T., 1986. Empirical Bayes rules for selecting good binomial populations. In: Van Ryzin, J. (Ed.), Adaptive Statistical Procedures and Related Topics. Institute of Mathematical Statistics, Hayward, CA, pp. 110–128.
Gupta, S.S., Liang, T., 1988. Empirical Bayes rules for selecting the best binomial population. In: Gupta, S.S., Berger, J.O. (Eds.), Statistical Decision Theory and Related Topics IV. Springer, New York, pp. 213–224.
Gupta, S.S., Liang, T., 1989. Selecting the best binomial population: parametric empirical Bayes approach. J. Statist. Plann. Inference 23, 21–31.
Gupta, S.S., Liang, T., 1992. On empirical Bayes selection rules for negative binomial population. In: Goel, P.K., Iyengar, S. (Eds.), Bayesian Analysis in Statistics and Econometrics. Springer, New York, pp. 127–146 (Chapter 7).
Gupta, S.S., Nagel, K., 1971. On some contributions to multiple decision theory. In: Gupta, S.S., Yackel, J. (Eds.), Statistical Decision Theory and Related Topics. Academic Press, New York, pp. 79–102.
Hoeffding, W., 1963. Probability inequalities for sums of bounded random variables. J. Amer. Statist. Assoc. 58, 13–30.
Huang, W.T., Chang, Y.P., 2006. Some empirical Bayes rules for selecting the best population with multiple criteria. J. Statist. Plann. Inference 136, 2129–2143.
Liang, T., 1988. On the convergence rate of empirical Bayes rules for two-action problems: discrete case. Ann. Statist. 16, 1635–1642.
Liang, T., 1991. Empirical Bayes selection for the highest success probability in Bernoulli process with negative binomial sampling. Statist. Decisions 9, 213–234.
Robbins, H., 1956. An empirical Bayes approach to statistics. In: Proceedings of the Third Berkeley Symposium on Mathematical Statistics and Probability, vol. 1. University of California Press, Berkeley, CA, pp. 157–163.
Salvia, A.A., 1984. Testing equality of binomial parameter based on inverse sampling. IEEE Trans. Reliability R-33, 377–378.
Vit, P., 1974. Test for homogeneity: the geometric distribution. Biometrika 61, 565–568.