Journal of Statistical Planning and Inference 21 (1989) 53-68
North-Holland

ON SELECTING THE LARGEST SUCCESS PROBABILITY UNDER UNEQUAL SAMPLE SIZES*

Mansour M. ABUGHALOUS and Klaus J. MIESCKE

Department of Mathematics, Statistics, and Computer Science, University of Illinois at Chicago, Box 4348, Chicago, IL 60680, U.S.A.

Received 23 October 1987
Recommended by S.S. Gupta
Abstract: Let π_1, ..., π_k be k ≥ 3 independent binomial populations, from which X_i ~ B(n_i, p_i), i = 1, ..., k, respectively, have been observed. The problem under concern is to find that population which is associated with the largest of the unknown 'success probabilities' p_1, ..., p_k. Under the '0-1' loss, some linear loss which occurs in gambling, and a general monotone, permutation invariant loss, interesting properties of Bayes rules are studied for priors which are permutation invariant, as well as for priors which are not invariant but have a (DT)-posterior density with respect to some symmetric measure. Examples of independent beta-priors are included.
AMS Subject Classification: Primary 62F07; Secondary 62F15.

Key words: Selecting the best binomial population; ranking; Bernoulli trials with unequal sample sizes; Bayes selection rules.
1. Introduction

Let π_1, ..., π_k be k ≥ 3 independent binomial populations from which X_i ~ B(n_i, p_i), i = 1, ..., k, respectively, have been observed. The problem considered in this paper is to find that population which is associated with the largest of the unknown parameters p_1, ..., p_k. Interpreting X_i as the number of successes in a Bernoulli sequence of length n_i with success probability p_i, i = 1, ..., k, the population to be selected will be called the 'best population'.

Selecting the best of k binomial populations in case of n_1 = n_2 = ... = n_k has been the topic of many papers in the past. Sobel and Huyett (1957) employed the indifference zone approach to implement the natural rule d_N, say, which selects in terms of the largest \bar{x}_i = x_i/n_i, i = 1, ..., k, where ties are broken at random. A subset selection procedure has been introduced by Gupta and Sobel (1960), and
* This research was supported by the Air Force Office of Scientific Research Contract AFOSR-85-0347 at the University of Illinois at Chicago.
further examined by Gupta and McDonald (1986). Moreover, several multi-stage selection procedures with vector-at-a-time sampling schemes can be found in the literature, where this special situation arises at some points. For an overview, the reader is referred to Gupta and Panchapakesan (1979). It is well known that the natural rule d_N is the uniformly best permutation invariant procedure if the n_i's are equal. The most general version of this fact is presented in Gupta and Miescke (1984), where the risk function of multi-stage selection rules is studied under a general permutation invariant loss structure.

The situation becomes much more complicated when the assumption of n_1 = n_2 = ... = n_k is dropped. Besides its asymptotic consistency, which is a must for every reasonable decision rule, no further optimality properties of d_N are known to this point. The present work parallels that of Gupta and Miescke (1987), where selection of the largest normal mean under heteroscedasticity is considered. In his list of unsolved problems, Bechhofer (1985) presumes that the Bernoulli problem is more difficult due to its lack of location invariance.

Two types of loss functions will be of primary interest in the following. The first is the so-called '0-1' loss, which is zero if the best population is selected, and one otherwise. It is the loss function most frequently used in the literature of ranking and selection, despite the fact that it leads to rather pathological situations whenever the distance between p_[k-1] and p_[k] is small, where p_[1] ≤ p_[2] ≤ ... ≤ p_[k] denote the ordered success probabilities.
Further remarks on the linear loss function are made in Section 4. For suitable choices of n_1, ..., n_k, which depend on the given prior, the posterior is decreasing in transposition and leads to a simple selection rule which is Bayes for a general class of loss functions. This is shown in Section 5 and illustrated by an example of beta priors.
2. Minimaxity

The independent observations are X_i ~ B(n_i, p_i), i = 1, ..., k, with probability functions, respectively,

    f(x_i | n_i, p_i) = P{X_i = x_i} = \binom{n_i}{x_i} p_i^{x_i} (1 - p_i)^{n_i - x_i},   x_i ∈ {0, 1, ..., n_i},  p_i ∈ [0, 1],  i = 1, ..., k.     (1)
To represent the natural rule d_N in a concise form, let us introduce the following randomization scheme U = (U_1, ..., U_k), say, which is independent of X = (X_1, ..., X_k), to split ties among \bar{X}_1, ..., \bar{X}_k at random, whenever they occur. U_1, ..., U_k are assumed to be i.i.d., uniformly distributed on [0, ε], where ε > 0 is sufficiently small to avoid that the ordering of distinct \bar{X}_i's could be reversed. To be more specific, any ε < 1/(n_[k] n_[k-1]) is appropriate. The natural rule d_N can now be written (a.e.) as

    d_N(X, U) = i   iff   \bar{X}_i + U_i = max_{j=1,...,k} {\bar{X}_j + U_j},   i = 1, ..., k.     (2)
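To make the randomized tie-breaking in (2) concrete, the following sketch (an illustration added here, not part of the original derivation) implements d_N in Python; the sample sizes and counts used in the demonstration are hypothetical.

import random

def natural_rule(x, n, seed=None):
    # Natural rule d_N of (2): select the population with the largest sample
    # proportion x_i/n_i, breaking ties at random via i.i.d. uniform
    # perturbations U_i on [0, eps].
    rng = random.Random(seed)
    k = len(x)
    n_sorted = sorted(n)
    # any eps below 1/(n_[k] n_[k-1]) cannot reverse the order of distinct proportions
    eps = 0.5 / (n_sorted[-1] * n_sorted[-2])
    u = [rng.uniform(0.0, eps) for _ in range(k)]
    scores = [x[i] / n[i] + u[i] for i in range(k)]
    return max(range(k), key=lambda i: scores[i])   # 0-based index of the selected population

# hypothetical data: three populations with tied sample proportions 0.45
print(natural_rule([9, 18, 27], [20, 40, 60], seed=1))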
For the problem under concern, a minimax rule minimizes the maximum possible risk, i.e. the expected loss, among all selection procedures. Under the linear loss function, the situation turns out to be trivial. Since the risk of every rule assumes its maximum value of v at p_1 = ... = p_k = 0, all of the selection procedures are in fact minimax, and the minimax value of the selection problem is equal to the worst possible loss.

In the remainder of this section, the case of a '0-1' loss will be studied. Let E_t = {\bar{X}_i + U_i ≤ \bar{X}_t + U_t, i = 1, ..., k} denote the event that d_N selects population π_t, t = 1, ..., k. Then, under the '0-1' loss,

    R(p, d_N) = 1 - P_p(E_t),   where π_t is associated with p_[k].     (3)

Thus, all decision theoretic formulations in terms of the risk can be translated immediately into the P(CS)-language used in the area of ranking and selection. Since every binomial family B(n_i, p_i) is stochastically increasing in p_i ∈ [0, 1], i = 1, ..., k, the maximum risk of d_N, and thus its minimum probability of a correct selection, occurs at some parameter configuration of the type p_1 = ... = p_k. Therefore, let us first examine the behavior of (3) at this particular situation. For this purpose, let M_t be the auxiliary function M_t(p) = P_{p,...,p}(E_t), p ∈ [0, 1], t = 1, ..., k. Moreover, let n = n_1 + ... + n_k, and n̄ = n/k. Then the following can be shown.

Lemma 1. Let t ∈ {1, ..., k} be fixed. At the endpoints of the unit interval [0, 1], M_t assumes the value 1/k, and its one-sided derivatives are equal to

    (M_t)'_+(0) = n_t - n̄   and   (M_t)'_-(1) = (k-1)^{-1}(n_t - n̄).     (4)
Thus, if n_t ≠ n̄, M_t assumes the value 1/k at least once in (0, 1), the interior of the unit interval.

Proof. Apparently, for every fixed t, M_t(0) = M_t(1) = 1/k. To find the derivative on the left of M_t at p = 1, one can see that for p close to one,

    M_t(p) = k^{-1} p^n + (k-1)^{-1} p^n Σ_{j≠t} p^{-n_j}(1 - p^{n_j}) + o(1-p).     (5)

Hereby, the first term on the right hand side of (5) refers to the outcome X_i = n_i, i = 1, ..., k, whereas the second one refers to the outcomes where X_j < n_j for exactly one j ≠ t, and X_i = n_i otherwise. Thus,

    (M_t)'_-(1) = lim_{p→1} [M_t(1) - M_t(p)]/(1-p) = k^{-1} n - (k-1)^{-1} Σ_{j≠t} n_j = (k-1)^{-1}(n_t - n̄).     (6)

The derivative on the right of M_t at p = 0 can be found in a similar way. For p close to zero, one can see that

    M_t(p) = k^{-1}(1-p)^n + n_t (1-p)^{n-1} p + o(p),     (7)

where the first term on the right hand side refers to the outcome X_1 = ... = X_k = 0, and the second one to X_t = 1 and X_j = 0, for all j ≠ t. Thus,

    (M_t)'_+(0) = lim_{p→0} [M_t(p) - M_t(0)]/p = n_t - n̄.     (8)

The last assertion follows from the fact that in case of n_t ≠ n̄, (6) and (8) have the same sign. This completes the proof.
Corollary 1. If p_1, ..., p_k are sufficiently close to zero (one), and π_t is the best population, then the probability of making a correct selection with d_N is increasing (decreasing) in n_t, and decreasing (increasing) in n_i for i ≠ t.

Proof. This follows from the fact that the one-sided derivatives of M_t at the endpoints of [0, 1], which are given in (4), are increasing in n_t, and decreasing in n_i for i ≠ t. Thus, standard continuity argumentation completes the proof.
Not much can be said about the case where p_1, ..., p_k are close to some point p ∈ (0, 1), except for large n_1, ..., n_k. If the n_i's tend to infinity in such a way that (n_i/n_t)^{1/2} → γ_i, i = 1, ..., k, where of course γ_t = 1, then M_t(p) tends to

    H_t(γ_1, ..., γ_k) = ∫_R \prod_{i≠t} Φ(γ_i z) φ(z) dz,     (9)

where Φ and φ are the cumulative distribution function and density, respectively, of the standard normal distribution N(0, 1). This follows directly from the asymptotic normality of X_i, i = 1, ..., k. One interesting point now is that the function H_t in (9) is strictly increasing in γ_i, i ≠ t. This has been shown in two different approaches by Tong and Wetzell (1979) and by Gupta and Miescke (1987). Another point is that (9) implies that M_t(p) is in general not equal to 1/k at p = 1/2, despite the fact that the binomial distributions B(n_i, p_i), i = 1, ..., k, are all symmetric at p_1 = ... = p_k = 1/2. Of course, at γ_1 = ... = γ_k = 1, H_t is equal to 1/k, but this holds already in the non-asymptotic case, as it is stated in the first part of the following lemma.

Lemma 2. If n_1 = ... = n_k, then M_t(p) = 1/k for all p ∈ [0, 1] and t = 1, ..., k. On the other hand, if n_[1] < n_[k], then

    min_{t=1,...,k} min_{p∈(0,1)} M_t(p) < 1/k < max_{t=1,...,k} max_{p∈(0,1)} M_t(p).     (10)

Proof. The first assertion follows from the fact that under p_1 = ... = p_k and n_1 = ... = n_k, the random variables \bar{X}_i + U_i, i = 1, ..., k, are i.i.d., and have a density with respect to the Lebesgue measure on the real line. The second can be seen to hold by considering (4) in Lemma 1 for some t with n_t ≠ n̄, and standard argumentation completes the proof.
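Lemmas 1 and 2 can be checked numerically. The following sketch (not part of the paper; the design n = (2, 4, 6) is chosen only for illustration) computes M_t(p) exactly by enumerating all outcomes and averaging over the tie-breaking.

from itertools import product
from math import comb

def M(t, p, n):
    # Exact M_t(p) = P_{p,...,p}{d_N selects population t} (t is 0-based):
    # enumerate all outcomes, weight by the joint binomial probability, and
    # split ties uniformly at random as in (2).
    total = 0.0
    for xs in product(*(range(ni + 1) for ni in n)):
        prob = 1.0
        for xi, ni in zip(xs, n):
            prob *= comb(ni, xi) * p**xi * (1 - p)**(ni - xi)
        props = [xi / ni for xi, ni in zip(xs, n)]
        winners = [i for i, q in enumerate(props) if q == max(props)]
        if t in winners:
            total += prob / len(winners)
    return total

n = (2, 4, 6)   # hypothetical unequal design, n_bar = 4
for p in (0.1, 0.5, 0.9):
    print(p, [round(M(t, p, n), 4) for t in range(len(n))])
# In line with Lemma 1, M_t(0) = M_t(1) = 1/3 and each M_t with n_t != n_bar
# crosses 1/3 inside (0, 1); in line with Lemma 2, values both below and
# above 1/3 appear for this design.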
The main result of this section is in accordance with the findings of Gupta and Miescke (1987) for the normal case.

Theorem 1. Under the '0-1' loss, the natural rule d_N is minimax if and only if n_1 = ... = n_k. Moreover, the minimax value of the selection problem is 1 - 1/k.
Proof. Consider first the no-data rule d_0, say, which selects each population π_i with the same probability 1/k, i = 1, ..., k. Obviously, it has the constant risk 1 - 1/k. On the other hand, in view of (3) and subsequent considerations, the maximum risk of d_N is found, cf. Gupta and Huang (1976), to be equal to

    max_{p∈[0,1]^k} R(p, d_N) = 1 - min_{t=1,...,k} min_{p∈[0,1]} M_t(p).     (11)

Since, by Lemma 2, this is equal to 1 - 1/k for n_1 = ... = n_k, and strictly greater than 1 - 1/k for n_[1] < n_[k], the natural rule d_N can only be minimax if n_1 = ... = n_k.

To complete the proof, it has to be shown that the minimax value of the selection problem is equal to 1 - 1/k. Since there is an equalizer rule d_0 with constant risk 1 - 1/k, it suffices to find, in the Bayesian approach, a sequence of prior distributions for Θ = (Θ_1, ..., Θ_k), i.e. the success probabilities p = (p_1, ..., p_k) treated now as random variables, such that the sequence of associated Bayes risks tends to this value of 1 - 1/k, cf. Berger (1985), p. 354. Such a sequence can in fact be found in the usual class of conjugate priors, i.e. the beta-distributions Be(α, β), α > 0, β > 0, with densities

    b(p | α, β) = Γ(α+β) Γ(α)^{-1} Γ(β)^{-1} p^{α-1} (1-p)^{β-1},   p ∈ [0, 1].     (12)

Let, a priori, Θ_1, ..., Θ_k be i.i.d. Be(α, α), for some α > 0. Then, a posteriori, given that X = x, Θ_1, ..., Θ_k are independent, with respective distributions Be(α + x_i, α + n_i - x_i), i = 1, ..., k. Thus, the Bayes rule d_α(X), say, which minimizes the posterior expected loss, yields the posterior risk

    min_{i=1,...,k} [1 - P{Θ_i = Θ_[k] | X = x}] = 1 - max_{i=1,...,k} P{B_i = B_[k]},     (13)

where B_1, ..., B_k are generic random variables, which are independent, B_i ~ Be(α + x_i, α + n_i - x_i), with expectation μ_i(α) = (α + x_i)/(2α + n_i) and variance

    σ_i²(α) = (α + x_i)(α + n_i - x_i)/[(2α + n_i)²(2α + n_i + 1)],   i = 1, ..., k.

Now it is well known, cf. Johnson and Kotz (1970), p. 41, that a standardized beta-distribution tends to N(0, 1) if its variance tends to zero while its expectation is held fixed. The latter condition can be relaxed to the requirement that the expectations tend to a fixed value in (0, 1), cf. Bickel and Doksum (1977), p. 53. If α = 1, 2, ... tends to infinity, one can see that μ_i(α) tends to 1/2, 8α σ_i²(α) tends to 1, and that |μ_i(α) - 1/2| σ_i^{-1}(α) tends to zero, which implies that the random variable (8α)^{1/2}(B_i - 1/2) has the limiting distribution N(0, 1), i = 1, ..., k. Therefore, (13) tends to the value 1 - 1/k at every outcome X = x. The marginal distribution of X has the limit

    lim_{α→∞} P{X = x} = \prod_{i=1}^{k} f(x_i | n_i, 1/2),   x_i = 0, 1, ..., n_i,  i = 1, ..., k,     (14)

and thus the Bayes risk of d_α tends to 1 - 1/k for large α, which completes the proof of the theorem.
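The limiting behavior used in the proof can be illustrated by simulation. The sketch below (an added illustration with hypothetical data, not taken from the paper) estimates max_i P{B_i = B_[k]} in (13) by Monte Carlo and shows it approaching 1/k as α grows.

import random

def max_prob_best(x, n, a, reps=50_000, seed=0):
    # Monte Carlo estimate of max_i P{B_i = B_[k]}, where the B_i are
    # independent Be(a + x_i, a + n_i - x_i) posterior variables under
    # i.i.d. Be(a, a) priors, as in (13).
    rng = random.Random(seed)
    wins = [0] * len(x)
    for _ in range(reps):
        draws = [rng.betavariate(a + xi, a + ni - xi) for xi, ni in zip(x, n)]
        wins[draws.index(max(draws))] += 1
    return max(wins) / reps

x, n = [8, 15, 30], [10, 20, 40]          # hypothetical outcome, k = 3
for a in (1, 10, 100, 1000):
    print(a, round(max_prob_best(x, n, a), 3))
# the estimates decrease toward 1/k = 1/3, so the posterior risk in (13)
# tends to 1 - 1/k, as used in the proof of Theorem 1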
As a final note, let us point out what might go wrong with the natural rule d_N under the '0-1' loss. If the n_i's are not all equal, and if p_1, ..., p_k are close to some p ∈ [0, 1], then d_N can in fact perform 'worse than at random'. This follows from (11) and (10), where explicit results for p = 0 and p = 1 are given in Corollary 1, and asymptotic ones subsequently by means of (9).
3. Bayes rules under the '0-1' loss

Throughout this section, it is assumed that the '0-1' loss is given. As long as Bayes rules are under concern, we may restrict considerations to nonrandomized selection rules d, which can be represented simply by functions d(x) with range {1, ..., k}, where d(x) = i means that at X = x, d selects population π_i, x_i ∈ {0, 1, ..., n_i}, i = 1, ..., k. For any fixed prior distribution τ of Θ = (Θ_1, ..., Θ_k), which are the success probabilities treated now as random variables, the Bayes rules d_B, say, which minimize the posterior risks P{Θ_i ≠ Θ_[k] | X = x}, i = 1, ..., k, pointwise at every given X = x, are characterized by

    d_B(x) ∈ {i | 𝒢(i|x) = max_{j=1,...,k} 𝒢(j|x), i = 1, ..., k},     (15)

where

    𝒢(i|x) = ∫_{{p: p_i = p_[k]}} \prod_{j=1}^{k} f(x_j | n_j, p_j) dτ(p),   i = 1, ..., k,

and f is given by (1). Although not always unique, we will talk about the Bayes rule d_B in the sequel, thereby tacitly assuming that one particular choice of (15) has been made for each x where the Bayes rule is not unique.

Finding the minimum posterior risk can be done through pairwise comparisons of the 𝒢(s|x), which are proportional to P{Θ_s = Θ_[k] | X = x}, s = 1, ..., k. Hereby, it is natural to say that the Bayes rule d_B prefers, at X = x, population π_i over π_j if P{Θ_i = Θ_[k] | X = x} is larger than P{Θ_j = Θ_[k] | X = x}, and that it selects one of the most preferred populations.

In many applications, there is no initial knowledge available as to how the ordered success probabilities p_[1], ..., p_[k] are associated with the populations π_1, ..., π_k, i.e. the a priori belief is that each of the populations may equally likely have the largest success probability. The class of priors which represent this situation are the permutation invariant priors, which are also called permutation symmetric or exchangeable priors. To find out under which conditions one population is preferred over another one if τ is permutation invariant, let us compare without loss of generality 𝒢(2|x) with 𝒢(1|x), say, to keep the notation simple. Although this is basically the same approach used by Gupta and Miescke (1987) in the normal case, the results presented below for the binomial case are rather different in nature. After exchanging the variables p_1 and p_2 in the integral representation of 𝒢(2|x) given in (15), we can see that if the support of τ is contained in (0, 1)^k, an assumption being made temporarily to get a simpler representation,

    𝒢(2|x) - 𝒢(1|x) = ∫_{{p: p_1 = p_[k]}} [M_{2,1}(x, p) - 1] \prod_{j=1}^{k} f(x_j | n_j, p_j) dτ(p),     (16)

where

    M_{2,1}(x, p) = [p_1/p_2]^{x_2 - x_1} [(1 - p_2)/(1 - p_1)]^{(n_1 - x_1) - (n_2 - x_2)}.
The Bayes rule, derived from (15), depends on the particular prior τ and may be rather cumbersome to evaluate in concrete applications. However, in some interesting special cases, its behavior can be seen to be quite robust to variations in the prior, as long as exchangeability is preserved, as well as to be rather intuitive.

Theorem 2. For every permutation invariant prior τ which satisfies τ{(p, ..., p) | 0 ≤ p ≤ 1} < 1, the (every) Bayes rule d_B prefers, at X = x, population π_i over π_j if one of the two conditions (a), (b) is fulfilled, which do not involve x_s, n_s, for s ≠ i, j:
(a) x_i > x_j and n_i - x_i ≤ n_j - x_j,
(b) x_i ≥ x_j and n_i - x_i < n_j - x_j.

Proof. Without loss of generality, let i = 2 and j = 1. On the set {p | p_1 = p_[k]} one has p_2 ≤ p_1, so that for 0 < p_2 ≤ p_1 < 1 both bases in M_{2,1}(x, p) are at least one, and under (a) as well as under (b) both exponents are nonnegative with at least one of them positive. Hence M_{2,1}(x, p) ≥ 1, with strict inequality for p_2 < p_1, no matter what x_s, p_s, for s ≥ 3, might actually be. A careful examination of all situations where p_2 = 0 or p_1 = 1 occurs shows that the function which is integrated with respect to τ in (16), after cancelling out powers of p_2 or 1 - p_1, respectively, is positive under (a) and (b), except for some cases where it is equal to zero. Thus, in view of the permutation invariance of τ, and the fact that τ is not concentrated on the diagonal of the unit cube, the proof is completed.

Theorem 2 provides a very natural partial ordering of preference among the populations π_1, ..., π_k at every outcome X = x. Population π_i is preferred over π_j if (a) or (b) is fulfilled, i.e. if it is as good as π_j in terms of number of successes and number of failures, and strictly better in at least one of them. Two special cases worth to be mentioned are considered below. The proof is straightforward and thus omitted.

Corollary 2. Under the assumptions of Theorem 2, the following holds. If n_i = n_j, then d_B prefers π_i over π_j if and only if x_i > x_j. On the other hand, if x_i = x_j, then d_B prefers π_i over π_j if and only if n_i < n_j.

A third and last special case of interest is that one where \bar{x}_i = \bar{x}_j occurs. Such a situation was described in the example of a gambling problem considered in the Introduction. It was stated under the assumption of a linear loss, and it will be discussed in this form later in Section 4. At this point, let us see what the gambler should do if the loss were actually a '0-1' loss. His Bayes rule d_B is of course given by (15), and it can be determined through (16) if the prior τ is permutation invariant. However, more can be said if additional information about τ is available.

Theorem 3. Under the assumptions of Theorem 2, suppose that at X = x, \bar{x}_i = \bar{x}_j = κ ∈ [0, 1] holds. If the prior τ is known to satisfy τ([0, κ]^k) = 1, then the (every) Bayes rule d_B prefers π_i over π_j if and only if n_i > n_j. On the other hand, if it is known that the prior satisfies τ([κ, 1]^k) = 1, then π_i is preferred over π_j if and only if n_i < n_j.
Proof. From (16) it follows that for \bar{x}_1 = \bar{x}_2 = κ and p_1, p_2 ∈ (0, 1),

    M_{2,1}(x, p) = [h_κ(p_1)/h_κ(p_2)]^{n_2 - n_1}.     (17)

Since the function h_κ(p) = p^κ (1 - p)^{1-κ} is, for every fixed κ ∈ [0, 1], increasing in p for p ∈ (0, κ), and decreasing in p for p ∈ (κ, 1), the proof can be completed in the same way as it was done for Theorem 2.
Besides the cases of κ = 0 and κ = 1, where such a required knowledge about the prior τ is obviously at hand, situations are possible where at some κ ∈ (0, 1) enough is known about τ to be able to apply Theorem 3. In the gambling example it may very well be known that by some existing law, the winning probabilities p_1, ..., p_k have to be at least 0.45, in which case the gambler decides that the first type of game, which was played only 20 times, is the best. On the other hand, if such a law would require a lower bound of 0.40, then one can be pretty sure that the casino does not offer gambling with winning chances of 0.45 or higher, in which case the gambler decides that the third type of game, which was played most often, is the best. If, instead of the '0-1' loss, the linear loss is considered, stronger results in the same direction can be seen to hold. This will be discussed in the next section.

To conclude this section, let us mention that in the general case of any given prior, which may be permutation invariant or not, the Bayes rules select in terms of the largest P{Θ_i = Θ_[k] | X = x}, i = 1, ..., k, under the '0-1' loss. The use of such Bayes rules typically requires numerical integration, as it is exemplified by Bratcher and Bland (1975), where priors are considered under which Θ_1, ..., Θ_k are independent with Θ_i ~ Be(α_i, β_i), i = 1, ..., k.
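As a numerical illustration of such a computation (this sketch and its data are added here for illustration and are not from Bratcher and Bland), the posterior probabilities P{Θ_i = Θ_[k] | X = x} under independent beta priors can be approximated by Monte Carlo, since a posteriori the Θ_i are independent Be(α_i + x_i, β_i + n_i - x_i):

import random

def bayes_rule_01(x, n, alpha, beta, reps=100_000, seed=0):
    # Monte Carlo version of the Bayes rule (15) under the '0-1' loss for
    # independent Be(alpha_i, beta_i) priors: estimate P{Theta_i = Theta_[k] | X = x}
    # for each i and select a population with the largest estimate.
    rng = random.Random(seed)
    wins = [0] * len(x)
    for _ in range(reps):
        draws = [rng.betavariate(alpha[i] + x[i], beta[i] + n[i] - x[i])
                 for i in range(len(x))]
        wins[draws.index(max(draws))] += 1
    probs = [w / reps for w in wins]
    return probs.index(max(probs)), probs

# hypothetical 'gambling' data: equal numbers of successes, unequal numbers of
# trials, uniform Be(1, 1) priors; by Corollary 2 the smallest n_i is preferred
print(bayes_rule_01([9, 9, 9], [20, 30, 50], [1, 1, 1], [1, 1, 1]))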
4. Bayes rules under the linear loss

In this section, it is assumed that the linear loss function

    L(p, j) = v - (v + w) p_j,   v, w > 0 fixed,     (18)

is given, which is the loss incurred if at p = (p_1, ..., p_k), population π_j is selected, j = 1, ..., k. This loss function has been discussed already in the Introduction, and it should be pointed out that it is different from the loss function considered by Bratcher and Bland (1975), which is called there linear, too. As before in Section 3, we simply talk about the Bayes rule, although it may not always be unique. By doing so, we tacitly assume that one particular version has been chosen. Since, under the linear loss, the choices at X = x are

    d_B(x) ∈ {i | E{Θ_i | X = x} = max_{j=1,...,k} E{Θ_j | X = x},  i = 1, ..., k},     (19)

we can simply say that the Bayes rule d_B selects in terms of the largest posterior expectations of Θ_1, ..., Θ_k. Also, as before, the Bayes rule can be found through
pairwise comparisons, this time of E{Θ_i | X = x}, i = 1, ..., k, and we can say that the Bayes rule prefers population π_i over π_j if the posterior expectation of Θ_i is larger than the one of Θ_j.

The most natural priors are the permutation invariant priors, as it has been justified in Section 3, and thus let us consider them first. A thorough analysis shows that Theorem 2, Corollary 2, and Theorem 3 do not only hold also under the linear loss, but even under any loss function which is monotone and permutation invariant. These results can thus be called 'universal laws of binomial selections', and their proofs are postponed to Section 5, where this natural class of loss functions is considered.

For specific priors, more can be said about the Bayes rules. This will be done in the remainder of this section. To begin with a permutation invariant prior, let us assume that a priori, Θ_1, ..., Θ_k are independent with Θ_i ~ Be(α, β), i = 1, ..., k, where α, β > 0 are fixed. In this case, a posteriori at X = x, Θ_1, ..., Θ_k are independent with respective marginal distributions Be(α + x_i, β + n_i - x_i), and

    E{Θ_i | X = x} = E{Θ_i | X_i = x_i} = (α + x_i)/(α + β + n_i),   i = 1, ..., k.

Therefore, the Bayes rule d_B selects in terms of the largest (α + x_i)/(α + β + n_i), i = 1, ..., k. Reformulated into pairwise comparisons, one can see that at X = x, d_B prefers population π_i over π_j if and only if

    n_i[(1 + n_j/(α+β)) \bar{x}_i - α/(α+β)] > n_j[(1 + n_i/(α+β)) \bar{x}_j - α/(α+β)],     (20)

where α/(α+β) is the common a priori mean of Θ_1, ..., Θ_k. From this, the following result can be derived, where its second part deals with the situation considered in Theorem 3.

Theorem 4. Let the prior be given as stated above, and let X = x be observed. If α and β are sufficiently large, then the (every) Bayes rule d_B selects in terms of the largest n_i[\bar{x}_i - α/(α+β)], i = 1, ..., k. Also, if \bar{x}_i = \bar{x}_j = κ ∈ [0, 1], then the (every) Bayes rule d_B prefers population π_i over π_j if and only if (n_i - n_j)[κ - α/(α+β)] > 0.

The condition in the first part of the theorem is met for example if, for some fixed prior mean α/(α+β), the prior variance αβ/[(α+β)²(α+β+1)] is known to be small. And the condition in the second part can be relaxed at suitable situations to \bar{x}_i and \bar{x}_j being close to some common value. It should be noted that the rule considered in Theorem 4 has its analogue in the normal case, cf. Gupta and Miescke (1987). The cases of α = β = 1 and α = β = 1/2 provide noninformative priors, cf. Berger (1985), p. 230. In the first, the Bayes rule selects in terms of the largest (x_i + 1)/(n_i + 2), i = 1, ..., k, whereas in the second, it selects in terms of the largest (x_i + 1/2)/(n_i + 1), i = 1, ..., k. And for smaller values of α and β, one can see that the Bayes rule can be approximated by the natural rule d_N, which is also analogous to the normal case.
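Under the linear loss the Bayes rule reduces to comparing posterior means, so no integration is needed; the following sketch (added for illustration, with hypothetical data) implements the rule for a common Be(α, β) prior:

def bayes_rule_linear(x, n, a, b):
    # Bayes rule (19) for i.i.d. Be(a, b) priors: select the population with
    # the largest posterior mean (a + x_i)/(a + b + n_i).
    means = [(a + xi) / (a + b + ni) for xi, ni in zip(x, n)]
    return means.index(max(means)), means

# a = b = 1 is the uniform prior, under which the rule selects the largest
# (x_i + 1)/(n_i + 2); with equal sample proportions 0.45 below the prior mean
# 1/2, the smallest sample size is preferred, in line with Theorem 4
print(bayes_rule_linear([9, 18, 27], [20, 40, 60], 1, 1))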
A natural class of priors which are not permutation invariant results from the conjugate family of beta-distributions. Suppose that a priori, Θ_i ~ Be(α_i, β_i), i = 1, ..., k, are independent, where the α's and β's are fixed positive numbers. Then the Bayes rule d_B selects in terms of the largest (α_i + x_i)/(α_i + β_i + n_i), i = 1, ..., k, or equivalently, in terms of the largest (α_i + x_i)/(β_i + n_i - x_i), i = 1, ..., k, where x_i and n_i - x_i are the numbers of successes and failures, respectively, which occurred in population π_i, i = 1, ..., k. This situation will be further considered in the next section.

To conclude this section, let us look again at the case of α_1 = ... = α_k = α > 0 and β_1 = ... = β_k = β > 0. If for some fixed prior mean α/(α+β) ∈ (0, 1), α and β are sufficiently large, then the Bayes rule d_B given by Theorem 4, which selects in terms of the largest of the values n_i[\bar{x}_i - α/(α+β)], i = 1, ..., k, can be approximated by the rule d*, say, which selects in terms of the largest

    n_i[\bar{x}_i - \bar{x}],   i = 1, ..., k,     (21)

where \bar{x} is an overall estimate of the common prior mean α/(α+β) which does not involve α and β. This can be justified by noting that for large α and β, the prior variance is small. The obvious advantage of using d* is that it does not depend on α and β, a sometimes desirable robustness property. Applying d* to the data presented by Berger and Deely (1988), which are observed batting averages of k = 12 baseball teams, one can see that d* produces the same a posteriori rank order, except for an exchange of ranks 7 and 8, and an exchange of ranks 9 and 10, which are explainable through closeness of performances.
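For completeness, here is a small sketch of the approximate rule d* of (21); it is not from the paper, and it takes the pooled proportion as the overall estimate \bar{x}, which is an assumption made only for this illustration:

def d_star(x, n):
    # Rule d* of (21): select the population with the largest n_i (xbar_i - xbar).
    # Here xbar is taken to be the pooled proportion sum(x)/sum(n) -- an assumed
    # choice of the overall estimate of the common prior mean.
    xbar = sum(x) / sum(n)
    scores = [ni * (xi / ni - xbar) for xi, ni in zip(x, n)]
    return scores.index(max(scores)), scores

print(d_star([12, 20, 33], [30, 50, 80]))   # hypothetical data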
5. Bayes rules under monotone, permutation invariant loss

In this section, it is assumed that the loss function L(p, j), which represents the loss if population π_j is selected at p = (p_1, ..., p_k), satisfies the following two conditions:

    L(p, i) < L(p, j)  if p_i > p_j,   and   L(σ(p), σ(i)) = L(p, i)  for every permutation σ,     (22)

where σ(p) = (p_{σ(1)}, ..., p_{σ(k)}), p ∈ [0, 1]^k, i, j = 1, ..., k. Also assumed is that L is properly integrable, whenever it is needed. It should be noted that by the permutation invariance of L, L(p, i) = L(p, j) holds whenever p_i = p_j occurs. The linear loss function given by (18) satisfies (22), whereas the '0-1' loss function satisfies only L(p, i) ≤ L(p, j) under p_i > p_j, besides the permutation invariance.

As mentioned in Section 4, let us first show that Theorem 2, Corollary 2, and Theorem 3 remain valid if, instead of a '0-1' loss, a monotone, permutation
invariant loss function is assumed. The key for the proof was (16) in the case of the '0-1' loss. Below, a similar pairwise comparison of two posterior expected losses is derived, from which this fact follows easily, since the comparison involves the auxiliary function M_{2,1}(x, p) in basically the same way as before in (16).

Lemma 3. Under every monotone, permutation invariant loss function L, and for every permutation invariant prior τ which satisfies τ{(p, ..., p) | 0 ≤ p ≤ 1} < 1, the difference of the posterior expected losses for selecting populations π_1 and π_2, at X = x, has the representation

    E{L(Θ, 1) | X = x} - E{L(Θ, 2) | X = x}
        = ∫_{{p: p_2 < p_1}} [L(p, 2) - L(p, 1)] [M_{2,1}(x, p) - 1] \prod_{j=1}^{k} f(x_j | n_j, p_j) dτ(p),     (23)

where L(p, 2) - L(p, 1) > 0 over the entire range of integration.

Proof. Let E{L(Θ, 1) - L(Θ, 2) | X = x} = I_1 + I_2 + I_3, say, where I_1, I_2, I_3 are the corresponding integrals over the ranges {p | p_1 < p_2}, {p | p_1 = p_2}, and {p | p_1 > p_2}, respectively. Exchanging the variables p_1 and p_2 in the integral I_1 results in

    I_1 = ∫_{{p: p_2 < p_1}} [L((p_2, p_1, p_3, ..., p_k), 1) - L((p_2, p_1, p_3, ..., p_k), 2)] f(x_1 | n_1, p_2) f(x_2 | n_2, p_1) \prod_{j≥3} f(x_j | n_j, p_j) dτ(p)
        = ∫_{{p: p_2 < p_1}} [L(p, 2) - L(p, 1)] f(x_1 | n_1, p_2) f(x_2 | n_2, p_1) \prod_{j≥3} f(x_j | n_j, p_j) dτ(p).     (24)

The first equation hereby follows from the invariance of τ, and the second follows from the invariance of L. Since now I_2 is obviously zero, and I_3 is simply

    I_3 = ∫_{{p: p_2 < p_1}} [L(p, 1) - L(p, 2)] \prod_{j=1}^{k} f(x_j | n_j, p_j) dτ(p),     (25)

the proof is completed by noting that M_{2,1}(x, p) can be written as

    M_{2,1}(x, p) = f(x_1 | n_1, p_2) f(x_2 | n_2, p_1) / [f(x_1 | n_1, p_1) f(x_2 | n_2, p_2)].

To summarize, it has been shown that the following holds.
Theorem 5. The results of Theorem 2, Corollary 2, and Theorem 3 are also valid if, instead of a '0-1' loss, any monotone, permutation invariant loss function is given.
The remainder of this section deals with the question of how to choose n_1, ..., n_k properly, under a given prior, to get a Bayes rule of simple structure. This situation seems to be quite realistic since in many applications, it is the design of n_1, ..., n_k rather than the prior τ that can be controlled by the experimenter. It will be seen below that the Bayes rule d_B is in fact very simple, as it selects in terms of the largest x_i = n_i \bar{x}_i, i = 1, ..., k, if the posterior distribution has a density τ(p|x), say, with respect to a permutation invariant, sigma-finite measure μ on the Borel sets of [0, 1]^k, which is decreasing in transposition (DT), cf. Hollander, Proschan, and Sethuraman (1977), i.e. which satisfies the following two conditions:

    τ(p|x) ≥ τ(p | (x_2, x_1, x_3, ..., x_k))   if p_1 ≤ p_2 and x_1 ≤ x_2,
    τ(σ(p) | σ(x)) = τ(p|x)   for all p, x, and permutations σ.     (26)

For such a prior, the following can be shown to hold.
Lemma 4. Under a monotone, permutation invariant loss function, suppose that the posterior distribution has a density with respect to a permutation invariant, sigma-finite measure μ on the Borel sets of [0, 1]^k, which is (DT). Then the (every) Bayes rule d_B selects in terms of the largest x_i = n_i \bar{x}_i, i = 1, ..., k.

Proof. Under the assumptions stated in the lemma, the posterior expected loss at X = x, for selecting population π_i, i ∈ {1, ..., k}, is given by

    ℒ(x, i) = E{L(Θ, i) | X = x} = ∫_{[0,1]^k} L(p, i) τ(p|x) dμ(p),     (27)

where x_j ∈ {0, 1, ..., n_j}, j = 1, ..., k. By Lemma 3 of Gupta and Miescke (1984), it follows that ℒ(x, i) satisfies (22), i.e. it has all the properties of a monotone, permutation invariant loss function, where now x plays the role of p. Since the Bayes rule selects in terms of the smallest ℒ(x, i), i = 1, ..., k, the proof follows from (22).

Let us now find sufficient conditions under which the posterior density τ(p|x) is (DT). Naturally, for the remainder of this section, we assume that the prior has a density r(p), say, with respect to μ. And, for simplicity of exposition, since {Θ_i = 1 and n_i - X_i > 0} is an impossible event, i = 1, ..., k, we can represent τ(p|x) by

    τ(p|x) = c(n, x) d(x, p) e(p, n),     (28)

where

    d(x, p) = \prod_{i=1}^{k} [p_i/(1 - p_i)]^{x_i}   and   e(p, n) = \prod_{i=1}^{k} (1 - p_i)^{n_i} r(p),
and

    c(n, x) = 1 / ∫_{[0,1]^k} d(x, p) e(p, n) dμ(p).

From this representation, the following can be concluded.
Theorem 6. Under a monotone, permutation invariant loss function, suppose that the posterior distribution has a density τ(p|x) with respect to a permutation invariant, sigma-finite measure μ on the Borel sets of [0, 1]^k. If, in the representation of τ(p|x) given by (28), e(p, n) = e(σ(p), n) for every p ∈ [0, 1]^k and every permutation σ, then τ(p|x) is (DT), and thus the Bayes rule selects in terms of the largest x_i = n_i \bar{x}_i, i = 1, ..., k.

Proof. Apparently, d(x, p) is (DT). Therefore, τ(p|x) is (DT) if c(n, x) = c(n, σ(x)) for every x and every permutation σ. The latter can be shown as follows:

    1/c(n, σ^{-1}(x)) = ∫_{[0,1]^k} d(x, σ(p)) e(p, n) dμ(p)
                      = ∫_{[0,1]^k} d(x, σ(p)) e(σ(p), n) dμ(σ(p))
                      = ∫_{[0,1]^k} d(x, p) e(p, n) dμ(p) = 1/c(n, x),     (29)

where the first equation follows from d(σ^{-1}(x), p) = d(x, σ(p)), the second from the assumption made on e(p, n) and the invariance of μ, and the third from a simple change of variables in the integration. The second claim of the theorem follows from Lemma 4. Therefore, the proof is completed.

The following result is an immediate consequence of (28).
Corollary 3. For a given prior density r(p), each of the following two conditions on n_1, ..., n_k is sufficient to apply Theorem 6.
(a) r(p) = \prod_{i=1}^{k} r_i(p_i), where r_i(p_i) = c_i (1 - p_i)^{-n_i} h(p_i), p_i ∈ [0, 1], h is non-negative, and c_i is a normalizing constant, i = 1, ..., k.
(b) r(p) = \prod_{i=1}^{k} (1 - p_i)^{-n_i} g(p), where g(σ(p)) = g(p) ≥ 0, for every p ∈ [0, 1]^k and every permutation σ.

To conclude this section, let us demonstrate that the concept of posterior densities with the (DT) property can be extended to cases where (DT) is given with respect to p and y, say, where y is some transformation of x. The following result shows clearly how this technique can be used for specific types of priors, similar as it is done here for the conjugate family of beta priors.

Theorem 7. Under a monotone, permutation invariant loss function, suppose that a priori, Θ_i ~ Be(α_i, β_i), where α_i > 0 and β_i > 0 are fixed, i = 1, ..., k, are independent. If n_1, ..., n_k can be chosen in such a way that for some δ > 0, α_i + β_i + n_i = δ, i = 1, ..., k, then the (every) Bayes rule d_B selects in terms of the largest α_i + x_i, i = 1, ..., k.

Proof. Under the assumptions of the theorem, suppose that α_i + β_i + n_i = δ, i = 1, ..., k, for some δ > 0. Then one can see that the posterior density τ(p|x) is proportional to

    g(p, y) = \prod_{i=1}^{k} [p_i/(1 - p_i)]^{y_i - 1} \prod_{j=1}^{k} (1 - p_j)^{δ - 2},

where p ∈ [0, 1)^k, and y is a transformation of x, given by y_i = α_i + x_i, i = 1, ..., k. Since, apparently, g(p, y) is (DT), the proof proceeds along the lines of the proof of Lemma 4, where now y plays the role of x there.

To conclude this section, let us mention that in many situations, n_1, ..., n_k with the required condition can in fact be found. This is the case whenever α_i + β_i, i = 1, ..., k, have a common fractional part, and thus especially, whenever α_i + β_i, i = 1, ..., k, are integers. The case of α_1 = ... = α_k and β_1 + n_1 = ... = β_k + n_k provides an example where Theorem 6 applies.
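The design condition of Theorem 7 is easy to use in practice. The following sketch (added as an illustration; the prior parameters, budget δ, and data are hypothetical) chooses the sample sizes and applies the resulting Bayes rule:

def design_sample_sizes(alpha, beta, delta):
    # Theorem 7: choose n_i = delta - alpha_i - beta_i, which requires that
    # every n_i turns out to be a positive integer.
    n = [delta - a - b for a, b in zip(alpha, beta)]
    assert all(ni == int(ni) and ni > 0 for ni in n)
    return n

def bayes_rule_theorem7(x, alpha):
    # Under this design the Bayes rule selects the population with the largest
    # alpha_i + x_i, for any monotone, permutation invariant loss.
    scores = [a + xi for a, xi in zip(alpha, x)]
    return scores.index(max(scores)), scores

alpha, beta = [2, 4, 6], [3, 3, 3]          # hypothetical Be(alpha_i, beta_i) priors
n = design_sample_sizes(alpha, beta, 30)    # -> [25, 23, 21]
x = [14, 12, 13]                            # hypothetical observed successes
print(n, bayes_rule_theorem7(x, alpha))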
References

Bechhofer, R.E. (1985). Selection and ranking procedures - Some personal reminiscences, and thoughts about its past, present, and future. Amer. J. Math. Manag. Sci. 5, 201-234.
Berger, J.O. (1985). Statistical Decision Theory and Bayesian Analysis, 2nd ed. Springer-Verlag, New York.
Berger, J.O. and J. Deely (1988). A Bayesian approach to ranking and selection of related means with alternatives to AOV methodology. J. Amer. Statist. Assoc. 83, 364-373.
Bickel, P.J. and K.A. Doksum (1977). Mathematical Statistics. Holden-Day, San Francisco, CA.
Bratcher, T.L. and R.P. Bland (1975). On comparing binomial probabilities from a Bayesian viewpoint. Commun. Statist. 4, 975-985.
Gupta, S.S. and D.Y. Huang (1976). Selection procedures for the entropy function associated with the binomial populations. Sankhya Ser. A 38, 153-173.
Gupta, S.S. and G.C. McDonald (1986). A statistical approach to binomial models. J. Quality Technology 18, 103-115.
Gupta, S.S. and K.J. Miescke (1984). Sequential selection procedures - A decision-theoretic approach. Ann. Statist. 12, 336-350.
Gupta, S.S. and K.J. Miescke (1987). On the problem of finding the largest normal mean under heteroscedasticity. In: S.S. Gupta and J.O. Berger, Eds., Statistical Decision Theory and Related Topics IV, Vol. 2. Springer-Verlag, Berlin-New York, 37-49.
Gupta, S.S. and S. Panchapakesan (1979). Multiple Decision Procedures. Wiley, New York.
Gupta, S.S. and M. Sobel (1960). Selecting a subset containing the best of several binomial populations. In: I. Olkin et al., Eds., Contributions to Probability and Statistics. Stanford University Press, Stanford, CA, 224-248.
Hollander, M., F. Proschan and J. Sethuraman (1977). Functions decreasing in transposition and their applications in ranking problems. Ann. Statist. 5, 722-733.
Johnson, N.L. and S. Kotz (1970). Continuous Univariate Distributions, Vol. 2. Houghton Mifflin, Boston, MA.
Sobel, M. and M.J. Huyett (1957). Selecting the best one of several binomial populations. Bell System Tech. J. 36, 537-576.
Tong, Y.L. and D.E. Wetzell (1979). On the behavior of the probability function for selecting the best normal population. Biometrika 66, 174-176.