On selecting the largest success probability under unequal sample sizes


Journal of Statistical Planning and Inference 21 (1989) 53-68
North-Holland

ON SELECTING THE LARGEST SUCCESS PROBABILITY UNDER UNEQUAL SAMPLE SIZES*

Mansour M. ABUGHALOUS and Klaus J. MIESCKE

Department of Mathematics, Statistics, and Computer Science, University of Illinois at Chicago, Box 4348, Chicago, IL 60680, U.S.A.

Received 23 October 1987
Recommended by S.S. Gupta

Abstract: Let π_1, ..., π_k be k ≥ 3 independent binomial populations, from which X_i ~ B(n_i, p_i), i = 1, ..., k, respectively, have been observed. The problem under concern is to find that population which is associated with the largest of the unknown 'success probabilities' p_1, ..., p_k. Under the '0-1' loss, some linear loss which occurs in gambling, and a general monotone, permutation invariant loss, interesting properties of Bayes rules are studied for priors which are permutation invariant, as well as for priors which are not invariant but have a (DT)-posterior density with respect to some symmetric measure. Examples of independent beta-priors are included.

AMS Subject Classification: Primary 62F07; Secondary 62F15.

Key words: Selecting the best binomial population; ranking Bernoulli trials with unequal sample sizes; Bayes selection rules.

1. Introduction

Let π_1, ..., π_k be k ≥ 3 independent binomial populations from which X_i ~ B(n_i, p_i), i = 1, ..., k, respectively, have been observed. The problem considered in this paper is to find that population which is associated with the largest of the unknown parameters p_1, ..., p_k. Interpreting X_i as the number of successes in a Bernoulli sequence of length n_i with success probability p_i, i = 1, ..., k, the population to be selected will be called the 'best population'.

Selecting the best of k binomial populations in case of n_1 = n_2 = ... = n_k has been the topic of many papers in the past. Sobel and Huyett (1957) employed the indifference zone approach to implement the natural rule dN, say, which selects in terms of the largest p̂_i = X_i/n_i, i = 1, ..., k, where ties are broken at random. A subset selection procedure has been introduced by Gupta and Sobel (1960), and

* This research was supported by the Air Force Office of Scientific Research Contract AFOSR-85-0347 at the University of Illinois at Chicago.

0378-3758/89/$3.50 © 1989, Elsevier Science Publishers B.V. (North-Holland)


further examined by Gupta and McDonald (1986). Moreover, several multi-stage selection procedures with vector-at-a-time sampling schemes can be found in the literature, where this special situation arises at some points. For an overview, the reader is referred to Gupta and Panchapakesan (1979).

It is well known that the natural rule dN is the uniformly best permutation invariant procedure if the n_i's are equal. The most general version of this fact is presented in Gupta and Miescke (1984), where the risk function of multi-stage selection rules is studied under a general permutation invariant loss structure. The situation becomes much more complicated when the assumption of n_1 = n_2 = ... = n_k is dropped. Besides its asymptotic consistency, which is a must for every reasonable decision rule, no further optimality properties of dN are known to this point. The present work parallels that of Gupta and Miescke (1987), where selection of the largest normal mean under heteroscedasticity is considered. In his list of unsolved problems, Bechhofer (1985) presumes that the Bernoulli problem is more difficult due to its lack of location invariance.

Two types of loss functions will be of primary interest in the following. The first is the so-called '0-1' loss, which is zero if the best population is selected, and one otherwise. It is the loss function most frequently used in the literature of ranking and selection, despite the fact that it leads to rather pathological situations whenever the distance between p_[k-1] and p_[k] is small, where p_[1] ≤ p_[2] ≤ ...

function are made in Section 4. For suitable choices of n_1, ..., n_k, which depend on the given prior, the posterior is decreasing in transposition and leads to a simple selection rule which is Bayes for a general class of loss functions. This is shown in Section 5 and illustrated by an example of beta priors.

2. Minimaxity

The independent observations are X_i ~ B(n_i, p_i), i = 1, ..., k, with probability functions, respectively,

f(x_i | n_i, p_i) = P{X_i = x_i} = (n_i choose x_i) p_i^{x_i} (1 - p_i)^{n_i - x_i},
    x_i ∈ {0, 1, ..., n_i},  p_i ∈ [0, 1],  i = 1, ..., k.                          (1)

To represent the natural rule dN in a concise form, let us introduce the following randomization scheme U = (U_1, ..., U_k), say, which is independent of X = (X_1, ..., X_k), to split ties among X_1, ..., X_k at random, whenever they occur. U_1, ..., U_k are assumed to be i.i.d., uniformly distributed on [0, ε], where ε > 0 is sufficiently small to avoid that the ordering of distinct p̂_i's could be reversed. To be more specific, any ε < 1/(n_[k] n_[k-1]) is appropriate. The natural rule dN can now be written (a.e.) as

dN(X, U) = i   iff   p̂_i + U_i = max_{j=1,...,k} {p̂_j + U_j},   i = 1, ..., k,       (2)

where p̂_i = X_i/n_i, i = 1, ..., k.
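For concreteness, the tie-splitting scheme in (2) can be sketched in code. This is a minimal illustration with hypothetical counts, not an implementation from the paper; the function name and data are chosen for exposition only.

```python
import random

def natural_rule(x, n, rng=random):
    """Natural rule d_N of (2): select the population with the largest
    sample proportion x_i/n_i, splitting ties via i.i.d. uniform
    perturbations U_i on [0, eps] with eps < 1/(n_[k] * n_[k-1])."""
    ns = sorted(n)
    eps = 0.5 / (ns[-1] * ns[-2])   # strictly below the bound in the text
    scores = [x[i] / n[i] + rng.uniform(0.0, eps) for i in range(len(x))]
    return max(range(len(x)), key=scores.__getitem__)

# hypothetical data: distinct proportions 0.50, 0.40, 0.30 -> no tie, selects 0
print(natural_rule([5, 4, 3], [10, 10, 10]))
```

With proportions 0.45, 0.45, 0.40 (e.g. x = (9, 18, 40), n = (20, 40, 100)), the perturbation is too small to promote the third population, so the selection is split at random between the first two.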

For the problem under concern, a minimax rule minimizes the maximum possible risk, i.e. the expected loss, among all selection procedures. Under the linear loss function, the situation turns out to be trivial. Since the risk of every rule assumes its maximum value of v at p_1 = ... = p_k = 0, all of the selection procedures are in fact minimax, and the minimax value of the selection problem is equal to the worst possible loss.

In the remainder of this section, the case of a '0-1' loss will be studied. Let E_t = {p̂_i + U_i < p̂_t + U_t, i ≠ t}, t = 1, ..., k, denote the event that dN selects population π_t. Then, at every parameter configuration p with p_t = p_[k],

R(p, dN) = 1 - P_p(E_t).                                                           (3)

Thus, all decision theoretic formulations in terms of the risk can be translated immediately into the P(CS)-language used in the area of ranking and selection. Since every binomial family B(n_i, p_i) is stochastically increasing in p_i ∈ [0, 1], i = 1, ..., k, the maximum risk of dN, and thus its minimum probability of a correct

selection, occurs at some parameter configuration of the type p_1 = ... = p_k. Therefore, let us first examine the behavior of (3) at this particular situation. For this purpose, let M_t be the auxiliary function M_t(p) = P_{p,...,p}(E_t), p ∈ [0, 1], t = 1, ..., k. Moreover, let n = n_1 + ... + n_k, and n̄ = n/k. Then the following can be shown.

Lemma 1. Let t ∈ {1, ..., k} be fixed. At the endpoints of the unit interval [0, 1], M_t assumes the value 1/k, and its one-sided derivatives are equal to

(M_t)'_+(0) = n_t - n̄   and   (M_t)'_-(1) = (k - 1)^{-1} (n_t - n̄).                 (4)

Thus, if n_t ≠ n̄, M_t assumes the value 1/k at least once in (0, 1), the interior of the unit interval.

Proof. Apparently, for every fixed t, M_t(0) = M_t(1) = 1/k. To find the derivative on the left of M_t at p = 1, one can see that for p close to one,

M,(p)=k-‘p”+(k-1))‘~”

on

(5)

C p-“~(l-p”l)+o(l-p). j+t

Hereby, the first term on the right hand side of (5) refers to the outcome Xi= ni, i=l ,..., k, whereas the second one refers to Xj
(1) = lim P&(l)

-M(PWU

-P)

p-1

=k-‘n-(k-

1))’ j;i

nj=(k-

1))‘(nt-A).

(6)

The derivative on the right of MI at p = 0 can be found in a similar to zero, one can see that

Mt(p)=k-‘(1

-p)“+n,(l

way. For p close

-~)~-‘p+o@),

(7)

where the first term on the right hand side refers to the outcome and the second one to X, = 1 and Xj = 0, for all j# t. Thus,

(M,) ‘+(0) = lim [M((p) - M,(O)]/p = n, - A.

Xt = 1.. =X,=0,

(8)

p-0

The last assertion follows from the fact that in case of n,fn, same sign. This completes the proof.

(6) and (8) have the
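The endpoint behavior described in Lemma 1 can be checked numerically by computing M_t(p) exactly for a small configuration; the sample sizes below are hypothetical, and the enumeration is a sketch under the tie-splitting convention of (2).

```python
from itertools import product
from math import comb

def M(t, p, n):
    """M_t(p) = P_{p,...,p}{d_N selects population t}: sum the joint
    binomial pmf over all outcomes, splitting ties in the sample
    proportions equally among the tied populations."""
    total = 0.0
    for xs in product(*(range(ni + 1) for ni in n)):
        prob = 1.0
        for xi, ni in zip(xs, n):
            prob *= comb(ni, xi) * p**xi * (1 - p)**(ni - xi)
        phat = [xi / ni for xi, ni in zip(xs, n)]
        winners = [i for i, v in enumerate(phat) if v == max(phat)]
        if t in winners:
            total += prob / len(winners)
    return total

n = [2, 3, 7]                      # hypothetical unequal sizes, n_bar = 4
print(round(M(0, 0.0, n), 6), round(M(0, 1.0, n), 6))   # both 1/3, as in Lemma 1
print(M(2, 0.05, n) > 1 / 3)       # n_3 > n_bar: above 1/3 near p = 0
print(M(0, 0.05, n) < 1 / 3)       # n_1 < n_bar: below 1/3 near p = 0
```

The signs near p = 0 agree with (8): populations with larger-than-average sample size are favored when all success probabilities are small.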

Corollary 1. If p_1, ..., p_k are sufficiently close to zero (one), and π_t is the best population, then the probability of making a correct selection with dN is increasing (decreasing) in n_t, and decreasing (increasing) in n_i for i ≠ t.

Proof. This follows from the fact that the one-sided derivatives of M_t at the endpoints of [0, 1], which are given in (4), are increasing in n_t, and decreasing in n_i for i ≠ t. Thus, standard continuity argumentation completes the proof.


Not much can be said about the case where p_1, ..., p_k are close to some point p ∈ (0, 1), except for large n_1, ..., n_k. If the n_i's tend to infinity in such a way that (n_i/n_t)^{1/2} → γ_i, i = 1, ..., k, where of course γ_t = 1, then M_t(p) tends to

H_t(γ_1, ..., γ_k) = ∫_R Π_{i≠t} Φ(γ_i z) φ(z) dz,                                   (9)
where Φ and φ are the cumulative distribution function and density, respectively, of the standard normal distribution N(0, 1). This follows directly from the asymptotic normality of X_i, i = 1, ..., k. One interesting point now is that the function H_t in (9) is strictly increasing in γ_i, i ≠ t. This has been shown in two different approaches by Tong and Wetzell (1979) and by Gupta and Miescke (1987). Another point is that (9) implies that M_t(p) is in general not equal to 1/k at p = 1/2, despite the fact that the binomial distributions B(n_i, p_i), i = 1, ..., k, are all symmetric at p_1 = ... = p_k = 1/2. Of course, at γ_1 = ... = γ_k = 1, H_t is equal to 1/k, but this holds already in the non-asymptotic case, as it is stated in the first part of the following lemma.

Lemma 2. If n_1 = ... = n_k, then M_t(p) = 1/k for all p ∈ [0, 1] and t = 1, ..., k. On the other hand, if n_[1] < n_[k], then

min_{t=1,...,k} min_{p∈(0,1)} M_t(p) < 1/k < max_{t=1,...,k} max_{p∈(0,1)} M_t(p).    (10)

Proof. The first assertion follows from the fact that under p_1 = ... = p_k and n_1 = ... = n_k, the random variables p̂_i + U_i, i = 1, ..., k, are i.i.d., and have a density with respect to the Lebesgue measure on the real line. The second can be seen to hold by considering (4) in Lemma 1 for some t with n_t ≠ n̄, and standard argumentation completes the proof.

The main result of this section is in accordance with the findings of Gupta and Miescke (1987) for the normal case.

Theorem 1. Under the '0-1' loss, the natural rule dN is minimax if and only if n_1 = ... = n_k. Moreover, the minimax value of the selection problem is 1 - 1/k.

Proof. Consider first the no-data rule d0, say, which selects each population π_i with the same probability 1/k, i = 1, ..., k. Obviously, it has the constant risk 1 - 1/k. On the other hand, in view of (3) and subsequent considerations, the maximum risk of dN is found, cf. Gupta and Huang (1976), to be equal to

max_{p∈[0,1]^k} R(p, dN) = 1 - min_{t=1,...,k} min_{p∈[0,1]} M_t(p).                 (11)


Since, by Lemma 2, this is equal to 1 - 1/k for n_1 = ... = n_k, and strictly greater than 1 - 1/k for n_[1] < n_[k], the natural rule dN can only be minimax if n_1 = ... = n_k.

To complete the proof, it has to be shown that the minimax value of the selection problem is equal to 1 - 1/k. Since there is an equalizer rule d0 with constant risk 1 - 1/k, it suffices to find, in the Bayesian approach, a sequence of prior distributions for Θ = (Θ_1, ..., Θ_k), i.e. the success probabilities p = (p_1, ..., p_k) treated now as random variables, such that the sequence of associated Bayes risks tends to this value of 1 - 1/k, cf. Berger (1985), p. 354. Such a sequence can in fact be found in the usual class of conjugate priors, i.e. the beta-distributions Be(α, β), α > 0, β > 0, with densities

b(p | α, β) = Γ(α + β) Γ(α)^{-1} Γ(β)^{-1} p^{α-1} (1 - p)^{β-1},   p ∈ [0, 1].      (12)

Let, a priori, Θ_1, ..., Θ_k be i.i.d. Be(α, α), for some α > 0. Then, a posteriori, given that X = x, Θ_1, ..., Θ_k are independent, with respective distributions Be(α + x_i, α + n_i - x_i), i = 1, ..., k. Thus, the Bayes rule dB(X), say, which minimizes the posterior expected loss, yields the posterior risk

min_{i=1,...,k} [1 - P{Θ_i = Θ_[k] | X = x}] = 1 - max_{i=1,...,k} P{B_i = B_[k]},    (13)

where B_1, ..., B_k are generic random variables, which are independent, B_i ~ Be(α + x_i, α + n_i - x_i), with expectation μ_i(α) = (α + x_i)/(2α + n_i) and variance

σ_i^2(α) = (α + x_i)(α + n_i - x_i) / [(2α + n_i)^2 (2α + n_i + 1)],   i = 1, ..., k.

Now it is well known, cf. Johnson and Kotz (1970), p. 41, that a standardized beta-distribution tends to N(0, 1) if its variance tends to zero while its expectation is held fixed. The latter condition can be relaxed to the requirement that the expectations tend to a fixed value in (0, 1), cf. Bickel and Doksum (1977), p. 53. If α = 1, 2, ... tends to infinity, one can see that μ_i(α) tends to 1/2, 8α σ_i^2(α) tends to 1, and that |μ_i(α) - 1/2| σ_i^{-1}(α) tends to zero, which implies that the random variable (8α)^{1/2}(B_i - 1/2) has the limiting distribution N(0, 1), i = 1, ..., k. Therefore, (13) tends to the value 1 - 1/k at every outcome X = x. The marginal distribution of X has the limit

lim_{α→∞} P{X = x} = Π_{i=1}^k f(x_i | n_i, 1/2),   x_i = 0, 1, ..., n_i,            (14)

i = 1, ..., k, and thus the Bayes risk of dB tends to 1 - 1/k for large α, which completes the proof of the theorem.

As a final note, let us point out what might go wrong with the natural rule dN under the '0-1' loss. If the n_i's are not all equal, and if p_1, ..., p_k are close to some p ∈ [0, 1], then dN can in fact perform 'worse than at random'. This follows from (11) and (10), where explicit results for p = 0 and p = 1 are given in Corollary 1, and asymptotic ones subsequently by means of (9).


3. Bayes rules under the '0-1' loss

Throughout this section, it is assumed that the '0-1' loss is given. As long as Bayes rules are under concern, we may restrict considerations to nonrandomized selection rules d, which can be represented simply by functions d(x) with range {1, ..., k}, where d(x) = i means that at X = x, d selects population π_i, x_i ∈ {0, 1, ..., n_i}, i = 1, ..., k. For any fixed prior distribution τ of Θ = (Θ_1, ..., Θ_k), which are the success probabilities treated now as random variables, the Bayes rules dB, say, which minimize the posterior risks P{Θ_i ≠ Θ_[k] | X = x}, i = 1, ..., k, pointwise at every given X = x, are characterized by

dB(x) ∈ {i | G(i|x) = max_{j=1,...,k} G(j|x),  i = 1, ..., k},                       (15)

where G(i|x) = P{Θ_i = Θ_[k] | X = x}, which is proportional to the integral of Π_{j=1}^k f(x_j | n_j, p_j) over {p | p_i = p_[k]} with respect to τ, and f is given by (1). Although not always unique, we will talk about the Bayes rule dB in the sequel, thereby tacitly assuming that one particular choice of (15) has been made for each x where the Bayes rule is not unique.

Finding the minimum posterior risk can be done through pairwise comparisons of G(s|x) = P{Θ_s = Θ_[k] | X = x}, s = 1, ..., k. Hereby, it is natural to say that the Bayes rule dB prefers, at X = x, population π_i over π_j if P{Θ_i = Θ_[k] | X = x} is larger than P{Θ_j = Θ_[k] | X = x}, and that it selects one of the most preferred populations. In many applications, there is no initial knowledge available as to how the ordered success probabilities p_[1], ..., p_[k] are associated with the populations π_1, ..., π_k, i.e. the a priori belief is that each of the populations may equally likely have the largest success probability. The class of priors which represent this situation are the permutation invariant priors, which are also called permutation symmetric or exchangeable priors.

To find out under which conditions one population is preferred over another one if τ is permutation invariant, let us compare without loss of generality G(2|x) with G(1|x), say, to keep the notation simple. Although this is basically the same approach used by Gupta and Miescke (1987) in the normal case, the results presented below for the binomial case are rather different in nature. After exchanging the variables p_1 and p_2 in the integral representation of G(2|x) given in (15), we can see that if the support of τ is contained in (0, 1)^k, an assumption being made temporarily to get a simpler representation,

G(2|x) - G(1|x) = ∫_{{p | p_1 = p_[k]}} [M_{2,1}(x, p) - 1] Π_{j=1}^k f(x_j | n_j, p_j) dτ(p),     (16)

where

M_{2,1}(x, p) = [p_1/p_2]^{x_2 - x_1} [(1 - p_2)/(1 - p_1)]^{(n_1 - x_1) - (n_2 - x_2)}.
(16)


The Bayes rule, derived from (15), depends on the particular prior τ and may be rather cumbersome to evaluate in concrete applications. However, in some interesting special cases, its behavior can be seen to be quite robust to variations in the prior, as long as exchangeability is preserved, as well as to be rather intuitive.

Theorem 2. For every permutation invariant prior τ which satisfies τ{(p, ..., p) | 0 ≤ p ≤ 1} < 1, the (every) Bayes rule dB prefers, at X = x, population π_i over π_j if one of the two conditions (a), (b) is fulfilled, which do not involve x_s, n_s, for s ≠ i, j:
(a) x_i > x_j and n_i - x_i ≤ n_j - x_j,
(b) x_i ≥ x_j and n_i - x_i < n_j - x_j.

Proof. Without loss of generality, let i = 2 and j = 1. On the range of integration in (16), where p_2 ≤ p_1, conditions (a) and (b) imply M_{2,1}(x, p) ≥ 1, with M_{2,1}(x, p) > 1 for p_2 < p_1, no matter what x_s, p_s, for s ≥ 3, might actually be. A careful examination of all situations where p_2 = 0 or p_1 = 1 occurs shows that the function which is integrated with respect to τ in (16), after cancelling out powers of p_2 or 1 - p_1, respectively, is positive under (a) and (b), except for some cases where it is equal to zero. Thus, in view of the permutation invariance of τ, and the fact that τ is not concentrated on the diagonal of the unit cube, the proof is completed.

Theorem 2 provides a very natural partial ordering of preference among the populations π_1, ..., π_k at every outcome X = x. Population π_i is preferred over π_j if (a) or (b) is fulfilled, i.e. if it is as good as π_j in terms of number of successes and number of failures, and strictly better in at least one of them. Two special cases worth to be mentioned are considered below. The proof is straightforward and thus omitted.

Corollary 2. Under the assumptions of Theorem 2, the following holds. If n_i = n_j, then dB prefers π_i over π_j if and only if x_i > x_j. On the other hand, if x_i = x_j, then dB prefers π_i over π_j if and only if n_i < n_j.
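Conditions (a) and (b) of Theorem 2, together with the two special cases of Corollary 2, amount to a simple combinatorial test. The following sketch, with hypothetical counts, encodes that preference; it is an illustration, not code from the paper.

```python
def prefers(xi, ni, xj, nj):
    """Partial order of Theorem 2: prefer i over j if i has at least as
    many successes and at most as many failures, strictly better in one."""
    cond_a = xi > xj and (ni - xi) <= (nj - xj)
    cond_b = xi >= xj and (ni - xi) < (nj - xj)
    return cond_a or cond_b

print(prefers(7, 10, 5, 10))   # equal n: decided by successes (Corollary 2)
print(prefers(4, 8, 4, 12))    # equal x: the smaller sample is preferred
print(prefers(5, 10, 4, 6))    # incomparable: neither (a) nor (b) holds
```

The last case shows that the ordering is only partial: with more successes but also more failures, the preference depends on the prior.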

A third and last special case of interest is that one where p̂_i = p̂_j occurs. Such a situation was described in the example of a gambling problem considered in the Introduction. It was stated under the assumption of a linear loss, and it will be discussed in this form later in Section 4. At this point, let us see what the gambler should do if the loss were actually a '0-1' loss. His Bayes rule dB is of course given by (15), and it can be determined through (16) if the prior τ is permutation invariant. However, more can be said if additional information about τ is available.

Theorem 3. Under the assumptions of Theorem 2, suppose that at X = x, p̂_i = p̂_j = κ ∈ [0, 1] holds. If the prior τ is known to satisfy τ([0, κ]^k) = 1, then the (every) Bayes rule dB prefers π_i over π_j if and only if n_i > n_j. On the other hand, if it is known that the prior satisfies τ([κ, 1]^k) = 1, then π_i is preferred over π_j if and only if n_i < n_j.
Bayes rule dB prefers rt; over ~j if and only if n;>nj. On the other hand, if it is known that the prior satisfies T([x, ilk) = 1, then rti is preferred over nj if and only if Tli
Proof. From (16) it follows that for p̂_1 = p̂_2 = κ and p_1, p_2 ∈ (0, 1),

M_{2,1}(x, p) = [h_κ(p_1)/h_κ(p_2)]^{n_2 - n_1},   where  h_κ(p) = p^κ (1 - p)^{1-κ}.    (17)

Since the function h_κ(p) = p^κ (1 - p)^{1-κ} is, for every fixed κ ∈ [0, 1], increasing in p for p ∈ (0, κ), and decreasing in p for p ∈ (κ, 1), the proof can be completed in the same way as it was done for Theorem 2.
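The driving fact in this proof is the unimodality of h_κ, which a quick numerical check illustrates; κ = 0.45 is an arbitrary hypothetical choice, matching the gambling example discussed below.

```python
def h(p, kappa):
    """h_kappa(p) = p^kappa * (1-p)^(1-kappa): the factor whose ratio,
    raised to the power n_2 - n_1, gives M_{2,1} in (17)."""
    return p**kappa * (1 - p)**(1 - kappa)

kappa = 0.45
print(h(0.30, kappa) < h(0.40, kappa) < h(0.45, kappa))  # increasing on (0, kappa)
print(h(0.45, kappa) > h(0.60, kappa) > h(0.90, kappa))  # decreasing on (kappa, 1)
```

Hence, when all success probabilities are known to lie below κ, a common sample proportion κ is stronger evidence in the larger sample, and conversely above κ.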

Besides the cases of κ = 0 and κ = 1, where such a required knowledge about the prior τ is obviously at hand, situations are possible where at some κ ∈ (0, 1) enough is known about τ to be able to apply Theorem 3. In the gambling example it may very well be known that by some existing law, the winning probabilities p_1, ..., p_k have to be at least 0.45, in which case the gambler decides that the first type of game, which was played only 20 times, is the best. On the other hand, if such a law would require a lower bound of 0.40, then one can be pretty sure that the casino does not offer gambling with winning chances of 0.45 or higher, in which case the gambler decides that the third type of game, which was played most often, is the best.

If, instead of the '0-1' loss, the linear loss is considered, stronger results in the same direction can be seen to hold. This will be discussed in the next section. To conclude this section, let us mention that in the general case of any given prior, which may be permutation invariant or not, the Bayes rules select in terms of the largest P{Θ_i = Θ_[k] | X = x}, i = 1, ..., k, under the '0-1' loss. The use of such Bayes rules typically requires numerical integration, as it is exemplified by Bratcher and Bland (1979), where priors are considered under which Θ_1, ..., Θ_k are independent with Θ_i ~ Be(α_i, β_i), i = 1, ..., k.

4. Bayes rules under the linear loss

In this section, it is assumed that the linear loss function

L(p, j) = v - (v + w) p_j,   v, w > 0 fixed,                                        (18)

is given, which is the loss incurred if at p = (p_1, ..., p_k), population π_j is selected, j = 1, ..., k. This loss function has been discussed already in the Introduction, and it should be pointed out that it is different from the loss function considered by Bratcher and Bland (1979), which is called there linear, too. As before in Section 3, we simply talk about the Bayes rule, although it may not always be unique. By doing so, we tacitly assume that one particular version has been chosen. Since, under the linear loss, the choices at X = x are

dB(x) ∈ {i | E{Θ_i | X = x} = max_{j=1,...,k} E{Θ_j | X = x},  i = 1, ..., k},       (19)

we can simply say that the Bayes rule dB selects in terms of the largest posterior expectations of Θ_1, ..., Θ_k. Also, as before, the Bayes rule can be found through pairwise comparisons, this time of E{Θ_i | X = x}, i = 1, ..., k, and we can say that the Bayes rule prefers population π_i over π_j if the posterior expectation of Θ_i is larger than the one of Θ_j.

The most natural priors are the permutation invariant priors, as has been justified in Section 3, and thus let us consider them first. A thorough analysis shows that Theorem 2, Corollary 2, and Theorem 3 not only hold under the linear loss, but even under any loss function which is monotone and permutation invariant. These results can thus be called 'universal laws of binomial selections', and their proofs are postponed to Section 5, where this natural class of loss functions is considered. For specific priors, more can be said about the Bayes rules. This will be done in the remainder of this section.

To begin with a permutation invariant prior, let us assume that a priori, Θ_1, ..., Θ_k are independent with Θ_i ~ Be(α, β), i = 1, ..., k, where α, β > 0 are fixed. In this case, a posteriori at X = x, Θ_1, ..., Θ_k are independent with respective marginal distributions Be(α + x_i, β + n_i - x_i), and

E{Θ_i | X = x} = E{Θ_i | X_i = x_i} = (α + x_i)/(α + β + n_i),   i = 1, ..., k.

Therefore, the Bayes rule dB selects in terms of the largest (α + x_i)/(α + β + n_i), i = 1, ..., k. Reformulated into pairwise comparisons, one can see that at X = x, dB prefers population π_i over π_j if and only if

n_i [(1 + n_j/(α+β)) p̂_i - α/(α+β)] > n_j [(1 + n_i/(α+β)) p̂_j - α/(α+β)],           (20)

where α/(α+β) is the common a priori mean of Θ_1, ..., Θ_k. From this, the following result can be derived, where its second part deals with the situation considered in Theorem 3.

Theorem 4. Let the prior be given as stated above, and let X = x be observed. If α and β are sufficiently large, then the (every) Bayes rule dB selects in terms of the largest n_i [p̂_i - α/(α+β)], i = 1, ..., k. Also, if p̂_i = p̂_j = κ ∈ [0, 1], then the (every) Bayes rule dB prefers population π_i over π_j if and only if (n_i - n_j)[κ - α/(α+β)] > 0.

The condition in the first part of the theorem is met for example if for some fixed prior mean α/(α+β), the prior variance αβ/[(α+β)^2 (α+β+1)] is known to be small. And the condition in the second part can be relaxed at suitable situations to p̂_i and p̂_j being close to some κ. It should be noted that the rule considered in Theorem 4 has its analogue in the normal case, cf. Gupta and Miescke (1987).

The cases of α = β = 1 and α = β = 1/2 provide noninformative priors, cf. Berger (1985), p. 230. In the first, the Bayes rule selects in terms of the largest (x_i + 1)/(n_i + 2), i = 1, ..., k, whereas in the second, it selects in terms of the largest (x_i + 1/2)/(n_i + 1), i = 1, ..., k. And for smaller values of α and β, one can see that the Bayes rule can be approximated by the natural rule dN, which is also analogous to the normal case.

A natural class of priors which are not permutation invariant results from the


conjugate family of beta-distributions. Suppose that a priori, Θ_i ~ Be(α_i, β_i), i = 1, ..., k, are independent, where the α's and β's are fixed positive numbers. Then the Bayes rule dB selects in terms of the largest (α_i + x_i)/(α_i + β_i + n_i), i = 1, ..., k, or equivalently, in terms of the largest (α_i + x_i)/(β_i + n_i - x_i), i = 1, ..., k, where x_i and n_i - x_i are the number of successes and failures, respectively, which occurred in population π_i, i = 1, ..., k. This situation will be further considered in the next section.

To conclude this section, let us look again at the case of α_1 = ... = α_k = α > 0 and β_1 = ... = β_k = β > 0. If for some fixed prior mean α/(α+β) ∈ (0, 1), α and β are sufficiently large, then the Bayes rule dB given by Theorem 4, which selects in terms of the largest of the values n_i [p̂_i - α/(α+β)], i = 1, ..., k, can be approximated by the rule d*, say, which selects in terms of the largest

n_i [p̂_i - p̄],   i = 1, ..., k,                                                    (21)

where p̄ = (x_1 + ... + x_k)/(n_1 + ... + n_k) denotes the pooled proportion of successes. This can be justified by noting that for large α and β, the prior variance is small. The obvious advantage of using d* is that it does not depend on α and β, a sometimes desirable robustness property. Applying d* to the data presented by Berger and Deely (1988), which are observed batting averages of k = 12 baseball teams, one can see that d* produces the same a posteriori rank order, except for an exchange of ranks 7 and 8, and an exchange of ranks 9 and 10, which are explainable through closeness of performances.
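Both the conjugate-prior Bayes rule and the approximation d* can be sketched in a few lines. The counts below are hypothetical, and reading p̄ in (21) as the pooled proportion is an assumption here, since its printed definition is garbled in the source.

```python
def bayes_select(x, n, alpha, beta):
    """Bayes rule under linear loss with independent Be(alpha_i, beta_i)
    priors: select the largest posterior mean (a_i + x_i)/(a_i + b_i + n_i)."""
    mean = [(a + xi) / (a + b + ni) for xi, ni, a, b in zip(x, n, alpha, beta)]
    return max(range(len(x)), key=mean.__getitem__)

def d_star(x, n):
    """Approximate rule d* of (21): largest n_i*(phat_i - pbar), with pbar
    taken (assumption) as the pooled proportion sum(x)/sum(n)."""
    pbar = sum(x) / sum(n)
    scores = [xi - ni * pbar for xi, ni in zip(x, n)]   # equals n_i*(phat_i - pbar)
    return max(range(len(x)), key=scores.__getitem__)

x, n = [9, 45, 90], [20, 100, 200]           # equal sample proportions 0.45
print(bayes_select(x, n, [1]*3, [1]*3))      # prior mean 0.5 > 0.45: smallest n wins
print(bayes_select(x, n, [4]*3, [6]*3))      # prior mean 0.4 < 0.45: largest n wins
print(d_star([12, 48, 88], [20, 100, 200]))  # scores 2.75, 1.75, -4.5 -> selects 0
```

The first two calls illustrate the second part of Theorem 4: with equal sample proportions, the side of the prior mean relative to the common proportion decides whether the smallest or the largest sample is selected.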

5. Bayes rules under monotone, permutation invariant loss

In this section, it is assumed that the loss function L(p, j), which represents the loss if population π_j is selected at p = (p_1, ..., p_k), satisfies the following two conditions:

L(p, i) < L(p, j)  if p_i > p_j,   and   L(σ(p), σ(i)) = L(p, i)  for every permutation σ,     (22)

where σ(p) = (p_{σ(1)}, ..., p_{σ(k)}), p ∈ [0, 1]^k, i, j = 1, ..., k. Also assumed is that L is properly integrable, whenever it is needed. It should be noted that by the permutation invariance of L, L(p, i) = L(p, j) holds whenever p_i = p_j occurs. The linear loss function given by (18) satisfies (22), whereas the '0-1' loss function satisfies only L(p, i) ≤ L(p, j) under p_i > p_j, besides the permutation invariance.

As mentioned in Section 4, let us first show that Theorem 2, Corollary 2, and Theorem 3 remain valid if, instead of a '0-1' loss, a monotone, permutation


invariant loss function is assumed. The key for the proof was (16) in the case of '0-1' loss. Below, a similar pairwise comparison of two posterior expected losses is derived, from which this fact follows easily, since the comparison involves the auxiliary function M_{2,1}(x, p) in basically the same way as before in (16).

Lemma 3. Under every monotone, permutation invariant loss function L, and for every permutation invariant prior τ which satisfies τ{(p, ..., p) | 0 ≤ p ≤ 1} < 1, the difference of the posterior expected losses for selecting populations π_1 and π_2, at X = x, has the representation

E{L(Θ, 1) | X = x} - E{L(Θ, 2) | X = x}
    = ∫_{{p | p_2 < p_1}} [L(p, 2) - L(p, 1)] [M_{2,1}(x, p) - 1] Π_{j=1}^k f(x_j | n_j, p_j) dτ(p),     (23)

where L(p, 2) - L(p, 1) > 0 over the entire range of integration.

Proof. Let E{L(Θ, 1) - L(Θ, 2) | X = x} = I_1 + I_2 + I_3, say, where I_1, I_2, I_3 are the corresponding integrals over the ranges {p | p_1 < p_2}, {p | p_1 = p_2}, and {p | p_1 > p_2}, respectively. Exchanging variables p_1 and p_2 in the integral I_1 results in

I_1 = ∫_{{p | p_2 < p_1}} [L((p_2, p_1, p_3, ..., p_k), 1) - L((p_2, p_1, p_3, ..., p_k), 2)]
          f(x_1 | n_1, p_2) f(x_2 | n_2, p_1) Π_{j≥3} f(x_j | n_j, p_j) dτ(p)
    = ∫_{{p | p_2 < p_1}} [L(p, 2) - L(p, 1)] M_{2,1}(x, p) Π_{j=1}^k f(x_j | n_j, p_j) dτ(p).           (24)

The first equation hereby follows from the invariance of τ, and the second follows from the invariance of L. Since now I_2 is obviously zero, and I_3 is simply

I_3 = ∫_{{p | p_2 < p_1}} [L(p, 1) - L(p, 2)] Π_{j=1}^k f(x_j | n_j, p_j) dτ(p),                        (25)

the proof is completed by noting that M_{2,1}(x, p) can be written as

M_{2,1}(x, p) = f(x_1 | n_1, p_2) f(x_2 | n_2, p_1) / [f(x_1 | n_1, p_1) f(x_2 | n_2, p_2)].

To summarize, it has been shown that the following holds.

Theorem 5. The results of Theorem 2, Corollary 2, and Theorem 3 are also valid if, instead of a '0-1' loss, any monotone, permutation invariant loss function is given.
5. The results of Theorem 2, Corollary 2, and Theorem 3 are also valid if, instead of a ‘O-l’ loss, any monotone, permutation invariant loss function is given. Theorem

The remainder of this section deals with the question of how to choose n_1, ..., n_k properly, under a given prior, to get a Bayes rule of simple structure. This situation seems to be quite realistic since in many applications, it is the design of n_1, ..., n_k rather than the prior τ that can be controlled by the experimenter. It will be seen below that the Bayes rule dB is in fact very simple, as it selects in terms of the largest x_i = n_i p̂_i, i = 1, ..., k, if the posterior distribution has a density τ(p|x), say, with respect to a permutation invariant, sigma-finite measure μ on the Borel sets of [0, 1]^k, which is decreasing in transposition (DT), cf. Hollander, Proschan, and Sethuraman (1977), i.e. which satisfies the following two conditions:

τ(p|x) ≥ τ(p | (x_2, x_1, x_3, ..., x_k))   if p_1 ≤ p_2, x_1 ≤ x_2,
τ(σ(p) | σ(x)) = τ(p|x)   for all p, x, and permutations σ.                         (26)

For such a prior, the following can be shown to hold.

Lemma 4. Under a monotone, permutation invariant loss function, suppose that the posterior distribution has a density with respect to a permutation invariant, sigma-finite measure μ on the Borel sets of [0, 1]^k, which is (DT). Then the (every) Bayes rule dB selects in terms of the largest x_i = n_i p̂_i, i = 1, ..., k.

Proof. Under the assumptions stated in the lemma, the posterior expected loss at X = x, for selecting population π_i, i ∈ {1, ..., k}, is given by

ℒ(x, i) = E{L(Θ, i) | X = x} = ∫_{[0,1]^k} L(p, i) τ(p|x) dμ(p),                     (27)

where x_j ∈ {0, 1, ..., n_j}, j = 1, ..., k. By Lemma 3 of Gupta and Miescke (1984), it follows that ℒ(x, i) satisfies (22), i.e. it has all the properties of a monotone, permutation invariant loss function, where now x plays the role of p. Since the Bayes rule selects in terms of the smallest ℒ(x, i), i = 1, ..., k, the proof follows from (22).

Let us now find sufficient conditions under which the posterior density τ(p|x) is (DT). Naturally, for the remainder of this section, we assume that the prior has a density τ(p), say, with respect to μ. And, for simplicity of exposition, since {Θ_i = 1 and X_i > 0} is an impossible event, i = 1, ..., k, we can represent τ(p|x) by

τ(p|x) = c(n, x) d(x, p) e(p, n),                                                   (28)

where

d(x, p) = Π_{i=1}^k [p_i/(1 - p_i)]^{x_i},   e(p, n) = Π_{i=1}^k (1 - p_i)^{n_i} τ(p),

and

c(n, x)^{-1} = ∫_{[0,1]^k} d(x, p) e(p, n) dμ(p).

From this representation, the following can be concluded.

From this representation, the following can be concluded.

Theorem 6. Under a monotone, permutation invariant loss function, suppose that the posterior distribution has a density t(p|x) with respect to a permutation invariant, sigma-finite measure μ on the Borel sets of [0, 1]^k. If, in the representation of t(p|x) given by (28), e(p, n) = e(σ(p), n), for every p ∈ [0, 1]^k and every permutation σ, then t(p|x) is (DT), and thus the Bayes rule selects in terms of the largest x_i = n_i p̂_i, i = 1, ..., k.

Proof. Apparently, d(x, p) is (DT). Therefore, t(p|x) is (DT) if c(n, x) = c(n, σ(x)), for every x and every permutation σ. The latter can be shown as follows:

    1/c(n, σ^{-1}(x)) = ∫_{[0,1]^k} d(x, σ(p)) e(p, n) dμ(p)
                      = ∫_{[0,1]^k} d(x, σ(p)) e(σ(p), n) dμ(σ(p))
                      = ∫_{[0,1]^k} d(x, p) e(p, n) dμ(p) = 1/c(n, x),    (29)

where the first equation follows from d(σ^{-1}(x), p) = d(x, σ(p)), the second from the assumption made on e(p, n) and the invariance of μ, and the third from a simple change of variables in the integration. The second claim of the theorem follows from Lemma 4. Therefore, the proof is completed.
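The invariance c(n, x) = c(n, σ(x)) established in (29) can be checked numerically. The sketch below (my example, not the paper's) takes k = 2, n = (2, 4), and a prior r(p) proportional to (1 - p_1)^{5-n_1} (1 - p_2)^{5-n_2}, so that e(p, n) is proportional to (1 - p_1)^5 (1 - p_2)^5, which is permutation symmetric as Theorem 6 requires:

```python
import numpy as np

def inv_norm_const(x, grid=2000):
    """Midpoint-rule approximation of 1/c(n, x) = integral of d(x, p) e(p, n)
    over [0, 1]^k, with d(x, p) = prod_i [p_i/(1-p_i)]^{x_i} and, for this
    illustration, e(p, n) = prod_i (1-p_i)^5; the integral factorizes."""
    p = (np.arange(grid) + 0.5) / grid   # midpoint grid on (0, 1)
    return float(np.prod([np.sum((p / (1 - p)) ** xi * (1 - p) ** 5) / grid
                          for xi in x]))

a = inv_norm_const((1, 3))
b = inv_norm_const((3, 1))   # permuted observation vector
print(abs(a - b) < 1e-12)    # c(n, x) = c(n, sigma(x)), as in (29): True
```

Here the sample sizes n = (2, 4) enter only through the assumed prior, which is exactly the mechanism of condition (a) in the corollary below.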

The following result is an immediate consequence of (28).

Corollary 3. For a given prior density r(p), each of the following two conditions on n_1, ..., n_k is sufficient to apply Theorem 6:
(a) r(p) = ∏_{i=1}^{k} r_i(p_i), where r_i(p_i) = c_i (1 - p_i)^{-n_i} h(p_i), p_i ∈ [0, 1], h is non-negative, and c_i is a normalizing constant, i = 1, ..., k;
(b) r(p) = ∏_{i=1}^{k} (1 - p_i)^{-n_i} g(p), where g(σ(p)) = g(p) ≥ 0, for every p ∈ [0, 1]^k and every permutation σ.

To conclude this section, let us demonstrate that the concept of posterior densities with the (DT) property can be extended to cases where (DT) is given with respect to p and y, say, where y is some transformation of x. The following result shows clearly how this technique can be used for specific types of priors, as is done here for the conjugate family of beta priors.

Theorem 7. Under a monotone, permutation invariant loss function, suppose that a priori, Θ_i ~ Be(α_i, β_i), where α_i > 0 and β_i > 0 are fixed, i = 1, ..., k, are independent. If n_1, ..., n_k can be chosen in such a way that for some δ > 0, α_i + β_i + n_i = δ, i = 1, ..., k, then the (every) Bayes rule d^B selects in terms of the largest α_i + X_i, i = 1, ..., k.

Proof. Under the assumptions of the theorem, suppose that α_i + β_i + n_i = δ, i = 1, ..., k, for some δ > 0. Then one can see that the posterior density t(p|x) is proportional to

    g(p, y) = ∏_{i=1}^{k} [p_i/(1 - p_i)]^{y_i - 1} ∏_{j=1}^{k} (1 - p_j)^{δ - 2},

where p ∈ [0, 1)^k, and y is the transformation of x given by y_i = α_i + x_i, i = 1, ..., k. Since, apparently, g(p, y) is (DT), the proof proceeds along the lines of the proof of Lemma 4, where now y plays the role of x there.

To conclude this section, let us mention that in many situations, n_1, ..., n_k with the required condition can in fact be found. This is the case whenever α_i + β_i, i = 1, ..., k, have a common fractional part, and thus especially, whenever α_i + β_i, i = 1, ..., k, are integers. The case of α_1 = ... = α_k and β_1 + n_1 = ... = β_k + n_k provides an example where Theorem 6 applies.
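Theorem 7 can be illustrated numerically (a sketch under assumed values of my own, not from the paper): with beta priors Be(α_i, β_i) and sample sizes chosen so that α_i + β_i + n_i = δ, the Bayes rule under '0-1' loss picks the population with the largest y_i = α_i + x_i, even though the n_i are unequal.

```python
import numpy as np

rng = np.random.default_rng(1)

# Assumed example: delta = 6, so n_i = delta - alpha_i - beta_i are integers.
alpha = np.array([0.5, 1.5, 2.5])
beta = np.array([1.5, 2.5, 0.5])
n = (6 - alpha - beta).astype(int)      # n = [4, 2, 3], unequal sample sizes

def bayes_select(x, n_draws=200_000):
    """Under '0-1' loss, select the i maximizing P(p_i is largest | X = x);
    the posteriors are independent Be(alpha_i + x_i, beta_i + n_i - x_i)."""
    x = np.asarray(x)
    draws = rng.beta(alpha + x, beta + n - x, size=(n_draws, len(x)))
    prob_best = (draws == draws.max(axis=1, keepdims=True)).mean(axis=0)
    return int(np.argmax(prob_best))

for x in ([0, 1, 3], [4, 0, 0]):
    # Theorem 7: the Bayes selection agrees with argmax of alpha_i + x_i.
    print(bayes_select(x) == int(np.argmax(alpha + np.asarray(x))))  # True
```

Since all posteriors are Be(y_i, δ - y_i) with the same δ, they are stochastically ordered by y_i, which is the intuition behind the theorem.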

References

Bechhofer, R.E. (1985). Selection and ranking procedures - Some personal reminiscences, and thoughts about its past, present, and future. Amer. J. Math. Manag. Sci. 5, 201-234.
Berger, J.O. (1985). Statistical Decision Theory and Bayesian Analysis, 2nd ed. Springer-Verlag, New York.
Berger, J.O. and J. Deely (1988). A Bayesian approach to ranking and selection of related means with alternatives to AOV methodology. J. Amer. Statist. Assoc. 83, 364-373.
Bickel, P.J. and K.A. Doksum (1977). Mathematical Statistics. Holden-Day, San Francisco, CA.
Bratcher, T.L. and R.P. Bland (1975). On comparing binomial probabilities from a Bayesian viewpoint. Commun. Statist. 4, 975-985.
Gupta, S.S. and D.Y. Huang (1976). Selection procedures for the entropy function associated with binomial populations. Sankhya Ser. A 38, 153-173.
Gupta, S.S. and G.C. McDonald (1986). A statistical approach to binomial models. J. Quality Technology 18, 103-115.
Gupta, S.S. and K.J. Miescke (1984). Sequential selection procedures - A decision-theoretic approach. Ann. Statist. 12, 336-350.
Gupta, S.S. and K.J. Miescke (1987). On the problem of finding the largest normal mean under heteroscedasticity. In: S.S. Gupta and J.O. Berger, Eds., Statistical Decision Theory and Related Topics IV, Vol. 2. Springer-Verlag, Berlin-New York, 37-49.
Gupta, S.S. and S. Panchapakesan (1979). Multiple Decision Procedures. Wiley, New York.
Gupta, S.S. and M. Sobel (1960). Selecting a subset containing the best of several binomial populations. In: I. Olkin et al., Eds., Contributions to Probability and Statistics. Stanford University Press, Stanford, CA, 224-248.
Hollander, M., F. Proschan and J. Sethuraman (1977). Functions decreasing in transposition and their applications in ranking problems. Ann. Statist. 5, 722-733.
Johnson, N.L. and S. Kotz (1970). Continuous Univariate Distributions, Vol. 2. Houghton Mifflin, Boston, MA.
Sobel, M. and M.J. Huyett (1957). Selecting the best one of several binomial populations. Bell System Tech. J. 36, 537-576.
Tong, Y.L. and D.E. Wetzell (1979). On the behavior of the probability function for selecting the best normal population. Biometrika 66, 174-176.