On combining selection and estimation in the search for the largest binomial parameter

Journal of Statistical Planning and Inference 36 (1993) 129-140
North-Holland

Shanti S. Gupta
Purdue University, West Lafayette, IN, USA

Klaus J. Miescke
University of Illinois at Chicago, Chicago, IL, USA

Received 1 July 1990; accepted 27 April 1992

Correspondence to: Prof. S.S. Gupta, Department of Statistics, Purdue University, West Lafayette, IN 47907, USA.

* Research supported in part by the Office of Naval Research Contract N00014-88-K-0170 and NSF Grants DMS-8923071, DMS-8717799 at Purdue University.

Abstract: For k ≥ 2 independent binomial populations, from which X_i ~ B(n_i, θ_i), i = 1, ..., k, have been observed, the problem of selecting the population with the largest θ-value and simultaneously estimating the θ-parameter of the selected population is considered. Under several loss functions, Bayes decision rules are derived and studied for independent Beta-priors. A fixed sample size look ahead procedure is also considered. A numerical example is given to illustrate the performance of the procedures.

AMS Subject Classification: 62F07; 62F15

Key words and phrases: Selection and estimation; largest binomial parameter; Bayes procedures.

1. Introduction

Let k ≥ 2 binomial populations be given, from which independent observations X_i ~ B(n_i, θ_i), i = 1, ..., k, have been drawn, where n_1, ..., n_k are assumed to be known. Suppose we want to find the population with the largest success probability, i.e., θ-value, and simultaneously estimate the parameter θ of the selected population. All results in the vast literature on ranking and selection are separate treatments of either one of the two decision problems, except two. Cohen and Sackrowitz (1988) have presented in their paper a decision-theoretic framework, but derived results only for k = 2 normal and uniform distributions and n_1 = n_2.


Gupta and Miescke (1990) have extended results for normal populations to k > 2, not necessarily equal sample sizes n_1, ..., n_k, and to a larger class of loss functions. Estimating the mean of the selected population has been treated in the literature so far only under the assumption that the 'natural' selection rule is employed, which selects in terms of the largest sample mean, i.e., in the present framework in terms of the largest X_i/n_i, i = 1, ..., k. Abughalous and Miescke (1989) have shown that in the binomial case the 'natural' selection rule is not optimum if the sample sizes are not equal. Further discussions and references can be found in Gupta and Miescke (1990). It is well known by now that the 'natural' selection rule does not always perform satisfactorily under nonsymmetric models. It is more reasonable to incorporate loss due to selection and loss due to estimation in one loss function and then let both types of decision, selection and estimation, be subject to risk evaluation. Rather than 'estimating after selection', the decision theoretic treatment leads to 'selecting after estimation', as has been pointed out by Cohen and Sackrowitz (1988). This will be shown in Section 2 in a general framework. Bayes rules for independent Beta-priors will be derived and studied in Section 3, and a fixed sample size look ahead procedure is the topic of Section 4. A numerical example from Lehmann (1975) will be reconsidered, under the present situation, at the ends of Sections 3 and 4 to illustrate the performance of the procedures derived.

2. General framework

Let X = (X_1, ..., X_k) be a random vector of observations where X_i ~ B(n_i, θ_i), i = 1, ..., k, are independently binomially distributed with known n_1, ..., n_k and unknown parameters θ_1, ..., θ_k in the unit interval. The likelihood function is thus given by

\[
f(x \mid \theta) = \prod_{i=1}^{k} f_i(x_i \mid \theta_i) = \prod_{i=1}^{k} \binom{n_i}{x_i}\, \theta_i^{x_i}(1-\theta_i)^{n_i-x_i}, \qquad (1)
\]

where x_i ∈ {0, 1, ..., n_i}, θ_i ∈ [0, 1], i = 1, ..., k. The goal is to select that population, i.e., coordinate, which is associated with θ_[k] = max{θ_1, ..., θ_k}, and to simultaneously estimate the θ-value of the selected population. Since Bayes rules are the main topic of this paper, only nonrandomized decision rules need to be considered, which can be represented by

\[
d(x) = \bigl(s(x),\, \ell_{s(x)}(x)\bigr), \qquad x_i \in \{0, 1, \dots, n_i\}, \quad i = 1, \dots, k, \qquad (2)
\]

where s(x) ∈ {1, 2, ..., k} is the selection rule, and where l_i(x) ∈ [0, 1], i = 1, ..., k, is a collection of k estimates of θ_i, i = 1, ..., k, respectively, available at selection. The loss function is assumed to be a member of the following class

\[
L(\theta, (s, \ell_s)) = A(\theta, s) + B(\theta, s)\,[\theta_s - \ell_s]^2, \qquad (3)
\]

which represents the combined loss at θ if population (i.e., coordinate) s is selected and l_s is used as an estimate of θ_s. Two special types of loss functions will be considered later on in connection with conjugate Beta-priors. The first is called

Additive Type:
\[
A_1(\theta, s) = \theta_{[k]} - \theta_s \quad \text{or} \quad A_2(\theta, s) = \theta_s^{-c}(1-\theta_s)^d, \qquad c, d \ge 0,
\]
\[
B_1(\theta, s) = \varrho \quad \text{or} \quad B_2(\theta, s) = \varrho\,[\theta_s(1-\theta_s)]^{-1}, \qquad \varrho \ge 0.
\]

Hereby, any choice of A_1 or A_2 represents loss due to selection, B_1 controls the relative importance of selection and estimation, and B_2 also adjusts the precision of the estimate l_s to the position of θ_s in [0, 1]. A justification of the latter will be given later. The second type is called

Multiplicative Type:
\[
A_3(\theta, s) \equiv 0 \quad \text{and} \quad B_3(\theta, s) = \theta_s^{-c}(1-\theta_s)^d, \qquad c, d \ge 0.
\]

Hereby, any choice of loss due to selection, of the relative importance of selection and estimation, and of the adjustment (or non-adjustment) of the precision of the estimate to the position of the parameter is represented by the two parameters c and d. In the Bayes approach, let the vector of k unknown parameters be random and denoted by Θ. The prior is assumed to have a density π(θ), θ ∈ [0, 1]^k, with respect to the Lebesgue measure, with posterior density denoted by π(θ | x) and marginal posterior densities π_i(θ_i | x), i = 1, ..., k. In the latter, the index i at π_i will be suppressed for simplicity whenever it is clear from the context what is meant. As has been mentioned in the Introduction, the decision theoretic treatment of the combined selection-estimation problem leads to 'selection after estimation', which was first pointed out by Cohen and Sackrowitz (1988). Similar to Lemma 1 in Gupta and Miescke (1990), the following extension can be seen to hold.

Lemma 1. Let l_i*(x) minimize E{B(Θ, i)[Θ_i − l_i]^2 | X = x} for l_i ∈ [0, 1], i = 1, ..., k. Furthermore, let s*(x) minimize E{A(Θ, i) + B(Θ, i)[Θ_i − l_i*(x)]^2 | X = x} over i = 1, ..., k. Then the Bayes rule, at X = x, is d*(x) = (s*(x), l*_{s*(x)}(x)).

We can get some steps further ahead toward finding the Bayes rule explicitly under a loss of the additive or multiplicative type and independent Beta-priors, if we restrict considerations to those situations X = x where B(θ, i)π(θ | x) is integrable on [0, 1]^k and has second moments of θ_i, i = 1, ..., k. Cases where this does not hold will not cause any major problems. They occur, if at all, at the lower ends of the ranges of X_1, ..., X_k. For i = 1, ..., k, let

\[
\tilde{\pi}_i(\theta_i \mid x) = r_i(\theta_i \mid x) \Big/ \int_0^1 r_i(h \mid x)\, dh,
\quad \text{where} \quad
r_i(\theta_i \mid x) = \int_{[0,1]^{k-1}} B(\theta, i)\,\pi(\theta \mid x)\, d\bar{\theta}_i, \qquad (4)
\]

and \bar{\theta}_i = (θ_1, ..., θ_{i−1}, θ_{i+1}, ..., θ_k). Then we can state the following result.

Theorem 1. At every X = x for which π̃_i(θ_i | x) exists and has second moments, i = 1, ..., k, the Bayes rule d*(x) satisfies l_i*(x) = E^{π̃_i(· | x)}(Θ_i), i = 1, ..., k, and s*(x) minimizes, over i = 1, ..., k,

\[
E^{\pi(\cdot \mid x)}\bigl(A(\Theta, i)\bigr) + E^{\pi(\cdot \mid x)}\bigl(B(\Theta, i)\bigr)\, \mathrm{var}^{\tilde{\pi}_i(\cdot \mid x)}(\Theta_i).
\]

Proof. Suppose that at X = x the i-th population is selected. Then, by Lemma 1, l_i*(x) has to minimize

\[
E\{B(\Theta, i)[\Theta_i - \ell_i]^2 \mid X = x\}
= \int_{[0,1]^k} B(\theta, i)\,[\theta_i - \ell_i]^2\, \pi(\theta \mid x)\, d\theta \qquad (5)
\]

as a function of l_i ∈ [0, 1]. If now π̃_i(θ_i | x) exists, the conditional expectation (5) can be written as

\[
\left[\int_{[0,1]} [\theta_i - \ell_i]^2\, \tilde{\pi}_i(\theta_i \mid x)\, d\theta_i\right]
\left[\int_{[0,1]^k} B(\theta, i)\, \pi(\theta \mid x)\, d\theta\right]. \qquad (6)
\]

Thus, if π̃_i(· | x) has a second moment, the minimum of (6) as a function of l_i ∈ [0, 1] occurs at

\[
\ell_i^*(x) = \int_{[0,1]} \theta_i\, \tilde{\pi}_i(\theta_i \mid x)\, d\theta_i = E^{\tilde{\pi}_i(\cdot \mid x)}(\Theta_i), \qquad (7)
\]

and the minimum of (6) turns out to be the product of the variance of π̃_i(· | x) with the expectation of B(Θ, i) under π(· | x). Once the best estimate l_i*(x) has been found for a possible use in connection with selection of the i-th population, i = 1, ..., k, the optimum selection s*(x) minimizes the sum of the expectation of A(Θ, i) under π(· | x) and the product of the variance of π̃_i(· | x) with the expectation of B(Θ, i) under π(· | x), i = 1, ..., k. This completes the proof of the theorem.
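To make the two-step structure of Theorem 1 concrete, the following sketch (an illustration added here, not part of the original paper) approximates l_i*(x) and the selection criterion by Monte Carlo, using the fact that π̃_i(· | x) is simply the B-weighted posterior marginal of Θ_i. The particular choices A = A_1 and B = B_2 with ϱ = 1, and the independent Beta posteriors, are assumptions made for concreteness and anticipate Section 3:

    import random

    # Illustrative Monte Carlo sketch of Theorem 1 (not from the paper).
    # Assumed setup: independent Beta posteriors, A(theta, i) = max(theta) - theta_i,
    # and B(theta, i) = 1/(theta_i*(1 - theta_i)), i.e. losses A_1 and B_2 with rho = 1.

    def bayes_rule_mc(alpha_post, beta_post, n_draws=100_000, seed=1):
        random.seed(seed)
        k = len(alpha_post)
        draws = [[random.betavariate(alpha_post[i], beta_post[i]) for i in range(k)]
                 for _ in range(n_draws)]
        crit, est = [], []
        for i in range(k):
            w = [1.0 / (th[i] * (1.0 - th[i])) for th in draws]      # B(theta, i)
            a = [max(th) - th[i] for th in draws]                     # A(theta, i)
            sw = sum(w)
            m1 = sum(wi * th[i] for wi, th in zip(w, draws)) / sw     # mean of pi~_i, eq. (7)
            m2 = sum(wi * th[i] ** 2 for wi, th in zip(w, draws)) / sw
            est.append(m1)                                            # l_i*(x)
            # criterion of Theorem 1: E[A] + E[B] * variance under pi~_i
            crit.append(sum(a) / n_draws + (sw / n_draws) * (m2 - m1 * m1))
        s_star = min(range(k), key=lambda i: crit[i])
        return s_star + 1, est                                        # selected population (1-based)

    # usage with hypothetical posterior parameters for k = 3 populations
    print(bayes_rule_mc([8.0, 6.0, 10.0], [4.0, 4.0, 7.0]))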

Remark 1. As we have seen, the proof of Theorem 1 proceeds componentwise. Thus, this approach can be used also in other situations where the assumptions of the theorem are fulfilled only for some i ∈ M(x) ⊂ {1, ..., k}, say. For these populations i ∈ M(x), one may just proceed as in the proof. On the other hand, for every j ∉ M(x), one has to find l_j*(x) by minimizing, as a function of l_j ∈ [0, 1],

\[
\int_{[0,1]^k} [\theta_j - \ell_j]^2\, B(\theta, j)\, \pi(\theta \mid x)\, d\theta, \qquad (8)
\]

which gives l_j*(x), and to use its minimum value as a substitute for the non-existing product of the variance of π̃_j(· | x) with the expectation of B(Θ, j) under π(· | x) in the final minimization step that leads to s*(x), i.e., the optimum selection.

.i 10,ilk which gives G*(x), and to use its minimum value as a substitute for the non-existing product of the variance of Ej(. I x) with the expectation of B(O,j) under n(. 1x) in the final minimization step that leads to s*(x), i.e., the optimum selection. Remark 2. It should

be pointed

out that all optimum

estimates

l,*(x), i= 1, . . . , k,

S.S. Gupta, K. J. Miescke / Largest binomial parameter

considered

are the usual

restricted

to one population

Bayes estimates

if selection

is ignored

133

and estimation

is

at a time.

3. Bayes rules for Beta-priors

In this section, we will derive the Bayes rules d*(x) explicitly and discuss their properties, assuming that the loss (3) is of the additive or multiplicative type and that a priori, Θ_1, ..., Θ_k are independent and follow k given Beta distributions. To recall briefly some well-known facts, a random variable Θ is Beta-distributed with parameters α, β > 0 if it has the density

\[
\pi(\theta) = \frac{\Gamma(\alpha+\beta)}{\Gamma(\alpha)\Gamma(\beta)}\, \theta^{\alpha-1}(1-\theta)^{\beta-1}, \qquad \theta \in [0, 1]. \qquad (9)
\]

Its expectation and variance are given, respectively, by

\[
E^{\pi}(\Theta) = \frac{\alpha}{\alpha+\beta}
\quad \text{and} \quad
\mathrm{var}^{\pi}(\Theta) = \frac{\alpha\beta}{(\alpha+\beta)^2(\alpha+\beta+1)}. \qquad (10)
\]

This family of Be(α, β), α > 0, β > 0, is conjugate to the binomial family, since the posterior distributions are again of the Beta type. More precisely, if Θ ~ Be(α, β) and X, given Θ = θ, is B(n, θ), then Θ, given X = x, follows a Be(α + x, β + n − x) distribution, which is called the posterior of Θ at X = x, x ∈ {0, 1, ..., n}. If finding the Bayes rule is our only concern, the marginal distribution of X does not need to be considered. It is only relevant for averaging posterior expected loss, i.e., posterior risk. Although this will not be done before Section 4, let us give it already here for the sake of simplicity and completeness. The probability that X is equal to x is given by

\[
m(x) = \binom{n}{x} \frac{\Gamma(\alpha+\beta)}{\Gamma(\alpha)\Gamma(\beta)}\,
\frac{\Gamma(\alpha+x)\Gamma(\beta+n-x)}{\Gamma(\alpha+\beta+n)}, \qquad x = 0, 1, \dots, n, \qquad (11)
\]

which is a Polya-Eggenberger distribution with the four parameters n, α, β, and 1, cf. Johnson and Kotz (1969, p. 230). Its expectation and variance are given, respectively, by

\[
E(X) = \frac{n\alpha}{\alpha+\beta}
\quad \text{and} \quad
\mathrm{var}(X) = \frac{n\alpha\beta(\alpha+\beta+n)}{(\alpha+\beta)^2(\alpha+\beta+1)}. \qquad (12)
\]

Finally, it should be mentioned that three of the four noninformative priors presented in Berger (1985, p. 89) fit into our present framework: the uniform distribution Be(1, 1), which makes the marginal distribution of X uniform as well, the proper prior Be(1/2, 1/2), and the improper prior which one gets as a limit of Be(α, β) as α and β tend to zero, i.e., the function π(θ) = [θ(1−θ)]^{-1}, which is not integrable on the unit interval but can be used to derive generalized Bayes rules. After these preliminary considerations, we are now ready to derive and study the Bayes rules in the given framework.
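As a small computational aside (a sketch added for illustration, not from the paper), the posterior updating and the marginal probability (11) can be evaluated with standard log-gamma functions:

    from math import lgamma, exp, comb

    def posterior_params(alpha, beta, x, n):
        # Be(alpha, beta) prior and x successes out of n give the posterior Be(alpha + x, beta + n - x)
        return alpha + x, beta + n - x

    def marginal_pmf(x, n, alpha, beta):
        # P{X = x} from eq. (11), the Polya-Eggenberger (beta-binomial) distribution
        log_p = (lgamma(alpha + beta) - lgamma(alpha) - lgamma(beta)
                 + lgamma(alpha + x) + lgamma(beta + n - x) - lgamma(alpha + beta + n))
        return comb(n, x) * exp(log_p)

    # the uniform prior Be(1, 1) makes the marginal of X uniform on {0, ..., n}
    print([round(marginal_pmf(x, 4, 1.0, 1.0), 3) for x in range(5)])   # each equals 0.2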


Let the likelihood function be given by (1), and assume that a priori Θ_i ~ Be(α_i, β_i), i = 1, ..., k, are independent, where the α's and β's are all known. It follows then that a posteriori, given X = x, Θ_i ~ Be(α_i + x_i, β_i + n_i − x_i), i = 1, ..., k, are independent. Moreover, in the marginal distribution of X, X_1, ..., X_k are independent and P{X_i = x_i} is given by (11) with n, x, α, and β indexed by i, i = 1, ..., k. First, let us study the Bayes rules for losses of the additive type. Among other interesting facts we shall see that B_1 has an undesirable effect for large values of ϱ, which makes then B_2 preferable, and that A_1 leads to the same Bayes rules as A_2 with c = 0 and d = 1. It is natural to consider the simplest situation at the beginning, which is the case of A = A_1 and B = B_1 in (3), i.e., the loss function

\[
L(\theta, (s, \ell_s)) = \theta_{[k]} - \theta_s + \varrho\,[\theta_s - \ell_s]^2. \qquad (13)
\]

Lemma 1 is sufficient to find here the Bayes rule d*(x) conveniently since B(θ, i) ≡ ϱ. The optimum estimates are found to be l_i*(x) = (α_i + x_i)/(α_i + β_i + n_i), i = 1, ..., k, the usual Bayes estimates for the single component estimation problems under squared error loss, and s*(x) minimizes, for i = 1, ..., k,

\[
E\{\Theta_{[k]} \mid X = x\} - \ell_i^*(x) + \frac{\varrho}{\alpha_i+\beta_i+n_i+1}\, \ell_i^*(x)\,[1 - \ell_i^*(x)]. \qquad (14)
\]

The undesirable effect mentioned above comes from the fact that the posterior variance of Θ_i, i ∈ {1, ..., k},

\[
\mathrm{var}^{\pi(\cdot \mid x)}(\Theta_i) = (\alpha_i+\beta_i+n_i+1)^{-1}\, \ell_i^*(x)\,[1 - \ell_i^*(x)], \qquad (15)
\]

decreases as l_i*(x) moves away from 0.5 in either direction. This causes a similar behavior of (14) if ϱ > α_i + β_i + n_i + 1. In such a case, if all k estimates are close to zero, s*(x) would favor smaller estimates because of a smaller posterior risk due to estimation. From (14) one can see also that the term E{Θ_[k] | X = x} has no influence at all on the determination of the Bayes rule. It could as well be replaced by 1, i.e., A_1 could be replaced by A_2 with c = 0 and d = 1 in the loss function without any change in the Bayes rule d*(x). The next case to be considered is a loss function which combines A = A_1 and B = B_2 in (3), i.e.,

\[
L(\theta, (s, \ell_s)) = \theta_{[k]} - \theta_s + \varrho\,[\theta_s(1-\theta_s)]^{-1}\,[\theta_s - \ell_s]^2. \qquad (16)
\]
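For the loss (13) everything above is in closed form; the following short sketch (added for illustration, assuming the independent Beta-priors of this section) computes the estimates and the selection criterion (14), with the common term E{Θ_[k] | X = x} omitted since it does not affect the minimization:

    def additive_B1_rule(x, n, alpha, beta, rho):
        # Bayes rule under loss (13): l_i* = (alpha_i + x_i)/(alpha_i + beta_i + n_i),
        # s* minimizes (14); the common term E{Theta_[k] | X = x} is dropped.
        est, crit = [], []
        for xi, ni, ai, bi in zip(x, n, alpha, beta):
            s = ai + bi + ni
            l = (ai + xi) / s
            est.append(l)
            crit.append(-l + rho / (s + 1) * l * (1.0 - l))
        i_star = min(range(len(x)), key=lambda i: crit[i])
        return i_star + 1, est[i_star]          # selected population (1-based) and its estimate

    # usage with hypothetical data: three populations, rho = 10
    print(additive_B1_rule([7, 5, 9], [10, 8, 15], [1, 1, 1], [1, 1, 1], rho=10.0))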

We do not yet replace θ_[k] by 1 since in Section 4 this loss function will be used to derive a fixed sample size look ahead procedure, where it is not obvious from the beginning that A = A_2 with c = 0 and d = 1 gives the same rule as A = A_1 does. To simplify the presentation, let us first look at the Bayes rules for α_i > 1 and β_i > 1, i = 1, ..., k. To apply Theorem 1, one can see from its definition (4) that π̃_i(· | x) is a Be(α_i + x_i − 1, β_i + n_i − x_i − 1) density, and therefore the optimum estimates are l_i*(x) = (α_i + x_i − 1)/(α_i + β_i + n_i − 2), i = 1, ..., k. The variance of Θ_i under π̃_i(· | x) is readily available from (10), and the expectation of [Θ_i(1 − Θ_i)]^{-1} under π(· | x) is found from (9) by manipulating the normalizing factor of the associated Beta density. Finally, s*(x) is found to minimize, for i = 1, ..., k,

\[
E\{\Theta_{[k]} \mid X = x\} - \frac{\alpha_i + x_i}{\alpha_i+\beta_i+n_i} + \frac{\varrho}{\alpha_i+\beta_i+n_i-2}. \qquad (17)
\]

Adjustments for the general case of positive α's and β's are to be made as follows. If α_i ≤ 1 and x_i = 0, then l_i*(x) = 0, and the last summand in (17) for that particular i changes to ϱα_i/(β_i + n_i − 1). Similarly, if β_i ≤ 1 and x_i = n_i, then l_i*(x) = 1, and the last summand in (17) changes to ϱβ_i/(α_i + n_i − 1). The last case of an additive type loss function is a choice of A = A_2 in (3), combined with B = B_1 or B = B_2. In view of Theorem 1 and the results derived so far concerning B, what remains to be found is that, for i = 1, ..., k,

\[
E\{A_2(\Theta, i) \mid X = x\}
= \frac{\Gamma(\alpha_i + x_i - c)\,\Gamma(\beta_i + n_i - x_i + d)\,\Gamma(\alpha_i+\beta_i+n_i)}
{\Gamma(\alpha_i+\beta_i+n_i+d-c)\,\Gamma(\alpha_i + x_i)\,\Gamma(\beta_i + n_i - x_i)}
\quad \text{if } \alpha_i + x_i > c, \qquad (18)
\]

whereas it is infinity if α_i + x_i ≤ c. Under the loss function

\[
L(\theta, (s, \ell_s)) = \theta_s^{-1}(1-\theta_s) + \varrho\,[\theta_s(1-\theta_s)]^{-1}\,[\theta_s - \ell_s]^2, \qquad (19)
\]

the Bayes rule employs the estimates l_i*(x), i = 1, ..., k, given below (16), and s*(x) minimizes, for i = 1, ..., k,

\[
\frac{\beta_i + n_i - x_i}{\alpha_i + x_i - 1} + \frac{\varrho}{\alpha_i+\beta_i+n_i-2}, \qquad (20)
\]

whereas under the loss function

\[
L(\theta, (s, \ell_s)) = (1-\theta_s)^2 + \varrho\,[\theta_s - \ell_s]^2, \qquad (21)
\]

the estimates l_i*(x), i = 1, ..., k, are those given below (13), and s*(x) minimizes, for i = 1, ..., k,

\[
\frac{(\beta_i + n_i - x_i)(\beta_i + n_i - x_i + 1)}{(\alpha_i+\beta_i+n_i)(\alpha_i+\beta_i+n_i+1)}
+ \frac{\varrho}{\alpha_i+\beta_i+n_i+1}\, \ell_i^*(x)\,[1 - \ell_i^*(x)]. \qquad (22)
\]
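The Gamma-function ratio in (18) is conveniently evaluated on the log scale; the following sketch (an added illustration, not from the paper) does this and checks it against the closed form that appears as the first summand of (20):

    from math import lgamma, exp

    def expected_A2(x_i, n_i, alpha_i, beta_i, c, d):
        # E{A_2(Theta, i) | X = x} from eq. (18); infinite when alpha_i + x_i <= c
        if alpha_i + x_i <= c:
            return float("inf")
        log_val = (lgamma(alpha_i + x_i - c) + lgamma(beta_i + n_i - x_i + d)
                   + lgamma(alpha_i + beta_i + n_i)
                   - lgamma(alpha_i + beta_i + n_i + d - c)
                   - lgamma(alpha_i + x_i) - lgamma(beta_i + n_i - x_i))
        return exp(log_val)

    # c = d = 1 gives E{(1 - Theta_i)/Theta_i | x} = (beta_i + n_i - x_i)/(alpha_i + x_i - 1);
    # with the data of the numerical example at the end of this section:
    print(expected_A2(377, 565, 1, 1, 1, 1), 189 / 377)
    print(expected_A2(417, 649, 1, 1, 1, 1), 233 / 417)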

It is interesting to note that the undesirable effect of (14) mentioned there applies also in this case. At the end of this section, Bayes rules for loss functions of the multiplicative type will be studied. Let

\[
L(\theta, (s, \ell_s)) = \theta_s^{-c}(1-\theta_s)^d\,[\theta_s - \ell_s]^2, \qquad c, d \ge 0. \qquad (23)
\]


To apply Theorem 1, with A ≡ 0 and B(θ, s) = θ_s^{-c}(1−θ_s)^d, one can see from (4) that π̃_i(· | x) is a Be(α_i + x_i − c, β_i + n_i − x_i + d) density whenever α_i + x_i > c, i = 1, ..., k. Thus from (10) it follows that l_i*(x) = (α_i + x_i − c)/(α_i + β_i + n_i + d − c) if α_i + x_i > c, and one can see easily that l_i*(x) = 0 otherwise, i = 1, ..., k. For α_i + x_i > c, the expectation of B(Θ, i) under π(· | x) can be found by manipulating normalizing factors of Beta densities, and the variance of Θ_i under π̃_i(· | x) is provided by (10). Finally, the product of the two, which enters the minimization step of s*(x), turns out to be

\[
\frac{1}{\alpha_i+\beta_i+n_i+d-c} \times
\frac{\Gamma(\alpha_i+\beta_i+n_i)\,\Gamma(\alpha_i + x_i - c + 1)\,\Gamma(\beta_i + n_i - x_i + d + 1)}
{\Gamma(\alpha_i + x_i)\,\Gamma(\beta_i + n_i - x_i)\,\Gamma(\alpha_i+\beta_i+n_i+d-c+2)}. \qquad (24)
\]

If for some i ∈ {1, ..., k}, α_i + x_i ≤ c, then, following Remark 1, one has to replace (24) by the minimum value of (8), which is attained at l_i*(x) = 0 and equals

\[
E\{\Theta_i^{2-c}(1-\Theta_i)^d \mid X = x\}. \qquad (25)
\]
Only one special case will be considered for brevity. For c = 0 and d = 2, we have l_i*(x) = (α_i + x_i)/(α_i + β_i + n_i + 2), i = 1, ..., k. And since α_i + x_i > c holds, (24) is used in the minimization step of s*(x) for all i = 1, ..., k. Selecting the largest of k success probabilities, without estimating the selected parameter θ_s, has been treated previously by Abughalous and Miescke (1989). Some fundamental properties of the Bayes selection rule have been shown there to hold under all permutation symmetric priors and for all monotone, permutation invariant loss functions. Among others, one is that population i is preferred over population j if x_i ≥ x_j and n_i − x_i ≤ n_j − x_j.
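A sketch of the multiplicative-type rule just described (added for illustration, not from the paper), using (24) with (25) as the fallback of Remark 1:

    from math import lgamma, exp

    def multiplicative_rule(x, n, alpha, beta, c, d):
        # Bayes rule under loss (23); assumes c < alpha_i + x_i + 2 so that (25) stays finite
        est, crit = [], []
        for xi, ni, ai, bi in zip(x, n, alpha, beta):
            a, b = ai + xi, bi + ni - xi                       # posterior Be(a, b)
            if a > c:
                est.append((a - c) / (a + b + d - c))          # mean of Be(a - c, b + d)
                log_val = (lgamma(a + b) + lgamma(a - c + 1) + lgamma(b + d + 1)
                           - lgamma(a) - lgamma(b) - lgamma(a + b + d - c + 2))
                crit.append(exp(log_val) / (a + b + d - c))    # eq. (24)
            else:
                est.append(0.0)                                # l_i*(x) = 0, see Remark 1
                log_val = (lgamma(a + b) + lgamma(a + 2 - c) + lgamma(b + d)
                           - lgamma(a) - lgamma(b) - lgamma(a + b + d - c + 2))
                crit.append(exp(log_val))                      # eq. (25)
        i_star = min(range(len(x)), key=lambda i: crit[i])
        return i_star + 1, est[i_star]

    # c = 0, d = 2 with the data of the numerical example below: l_1*(x) = 378/569 = 0.6643
    print(multiplicative_rule([377, 417, 160], [565, 649, 250], [1, 1, 1], [1, 1, 1], 0, 2))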
Table 1.

Reply          Age 20-34   35-54   Over 54   Total
Unfavorable          377     417       160     954
No opinion            35      50        25     110
Favorable            153     182        65     400
Total                565     649       250    1464

The numerical example to be reconsidered here is taken from Lehmann (1975); Table 1 cross-classifies the replies to a question by the age group of the respondents. If the sizes of the age groups in the sampled population are known, which appears to be a realistic assumption, then that part of the likelihood relevant for Bayesian decision making turns out to be exactly the same as in the case of stratified sampling, where the sample sizes of the three age groups are fixed in advance. Therefore, one realizes that there are various situations in empirical studies where independent samples of unequal sizes have to be examined. Let us consider now the problem where we want to find that age group which is associated with the largest proportion of persons having an unfavorable opinion, and simultaneously estimate the selected proportion. By assuming that, a priori, the breakdown of the persons who do not have an unfavorable opinion into those with no opinion and those being favorable has no influence on the tendency of persons to be unfavorable, we can combine the second and the third reply class into one single reply class. Then one can see that for the Bayes decisions the relevant part of the likelihood is the same as that of k = 3 independent binomial samples with sample sizes n_1 = 565, n_2 = 649, n_3 = 250, numbers of successes x_1 = 377, x_2 = 417, x_3 = 160, and success rates x_1/n_1 = 0.6673, x_2/n_2 = 0.6425, x_3/n_3 = 0.6400, respectively. The three standard 95% confidence intervals for the proportions, using the normal approximation, all have half-widths larger than 0.0368, and even a more refined paired comparison approach, as described in Hochberg and Tamhane (1987, p. 275), would lead to the conclusion that the rates of unfavorable persons in the three age groups do not differ. Suppose that the parameters of the priors are α_i = β_i = 1, i = 1, 2, 3. Then we make the following decisions. Under loss (13), s*(x) = 1 and l_1*(x) = 0.6667 whenever ϱ ≤ 634.005, whereas s*(x) = 2 and l_2*(x) = 0.6421 otherwise. Under loss (16), s*(x) = 1 and l_1*(x) = 0.6673 whenever ϱ < 107.290, whereas s*(x) = 2 and l_2*(x) = 0.6425 otherwise. Under loss (19), s*(x) = 1 and l_1*(x) = 0.6673 whenever ϱ < 250.689, whereas s*(x) = 2 and l_2*(x) = 0.6425 otherwise. Under loss (21), s*(x) = 1 and l_1*(x) = 0.6667 whenever ϱ < 437.255, and s*(x) = 2 and l_2*(x) = 0.6421 otherwise. Finally, under loss (23) with c = 0 and d = 2, one gets s*(x) = 1 and l_1*(x) = 0.6643. At the end of the next section, where a fixed sample size look ahead procedure is derived, this example will be considered again.
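The comparison between age groups 1 and 2, the only competitive ones, can be recomputed with a short sketch (added for illustration; it evaluates the criteria (14) and (17) without the common term E{Θ_[k] | X = x}, which is consistent with the thresholds quoted above):

    def criteria(x, n, alpha, beta, rho):
        # selection criteria (14) and (17) with the common term E{Theta_[k] | X = x} dropped
        out = []
        for xi, ni, ai, bi in zip(x, n, alpha, beta):
            s = ai + bi + ni
            l13 = (ai + xi) / s                               # estimate below (13)
            c14 = -l13 + rho / (s + 1) * l13 * (1 - l13)      # criterion (14), loss (13)
            c17 = -(ai + xi) / s + rho / (s - 2)              # criterion (17), loss (16)
            out.append((c14, c17))
        return out

    x, n = [377, 417, 160], [565, 649, 250]
    ones = [1, 1, 1]
    for rho in (100.0, 200.0, 700.0):
        c = criteria(x, n, ones, ones, rho)
        pick13 = 1 + min(range(3), key=lambda i: c[i][0])
        pick16 = 1 + min(range(3), key=lambda i: c[i][1])
        print(rho, pick13, pick16)
    # population 1 wins under (13) for rho <= 634.005 and under (16) for rho < 107.290,
    # population 2 otherwise; population 3 is never selected for these data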


4. A fixed sample size look ahead procedure

The question considered in this section is whether it is worthwhile to take additional observations after having observed X_i ~ B(n_i, θ_i), i = 1, ..., k, if the loss is of the type (16), augmented by costs for sampling. Let

\[
L(\theta, (s, \ell_s)) = \theta_{[k]} - \theta_s + \varrho\,[\theta_s(1-\theta_s)]^{-1}\,[\theta_s - \ell_s]^2 + \gamma N, \qquad (26)
\]

where N = n_1 + ... + n_k and γ is the cost of observing one Bernoulli variable. Let a priori Θ_i ~ Be(α_i, β_i), i = 1, ..., k, be independent, where for simplicity of presentation α_i > 1 and β_i > 1, i = 1, ..., k, is assumed. If no further observations are taken, the Bayes decision is the one described below (16), and the posterior Bayes risk is the minimum of (17) for i = 1, ..., k, i.e., with δ_i = 1 − 2(α_i + β_i + n_i)^{-1},

\[
E\{\Theta_{[k]} \mid X = x\} - \max_{i=1,\dots,k} \frac{\delta_i(\alpha_i + x_i) - \varrho}{\alpha_i+\beta_i+n_i-2} + \gamma N. \qquad (27)
\]

Suppose now that we consider taking additional observations Y_i ~ B(m_i, θ_i), i = 1, ..., k, which are mutually independent and independent of X_1, ..., X_k. The posterior expected risk, at X = x, is seen to be

\[
E\{E(\Theta_{[k]} \mid X, Y) \mid X = x\}
- E\Bigl\{\max_{i=1,\dots,k} \frac{\tilde{\delta}_i(\alpha_i + x_i + Y_i) - \varrho}{\alpha_i+\beta_i+n_i+m_i-2} \Bigm| X = x\Bigr\}
+ \gamma(N + M), \qquad (28)
\]

where δ̃_i = 1 − 2(α_i + β_i + n_i + m_i)^{-1} and M = m_1 + ... + m_k. Since the first term in (28), i.e. the iterated conditional expectation, is simply E{Θ_[k] | X = x}, θ_[k] could be replaced by 1 in (26) without changing any result in this section. The following is seen now to hold.

Theorem 2. At X = x, it is worthwhile taking these additional observations Y_1, ..., Y_k if

\[
\max_{i=1,\dots,k} \frac{\delta_i(\alpha_i + x_i) - \varrho}{\alpha_i+\beta_i+n_i-2} + \gamma M
\;\le\;
E\Bigl\{\max_{i=1,\dots,k} \frac{\tilde{\delta}_i(\alpha_i + x_i + Y_i) - \varrho}{\alpha_i+\beta_i+n_i+m_i-2} \Bigm| X = x\Bigr\}. \qquad (29)
\]
This result can be used in several ways, depending on the sampling scheme adopted. First, one could search through all possible m = (m_1, ..., m_k) to determine whether it is worth at all taking more observations. This fixed sample size look ahead procedure is due to Amster (1963) and is discussed in Berger (1985). It is useful in situations like the present one where a fully sequential Bayes procedure is not feasible. Second, if M = m_1 + ... + m_k is fixed and predetermined, one could find the optimum allocation m*, say, which maximizes the conditional expectation shown in the theorem, and go ahead with additional observations using allocation m* if the inequality is met. This procedure can be called an adaptive look ahead M procedure.

Other possible applications of Theorem 2 are reasonable but omitted for brevity. All that is needed to find these procedures is the conditional distribution of Y, given X = x. Since a priori Θ_1, ..., Θ_k are independent, we have

\[
P\{Y = y \mid X = x\} = \prod_{i=1}^{k} P\{Y_i = y_i \mid X_i = x_i\}, \qquad (30)
\]

and the conditional distribution of Y_i, given X_i = x_i, is the same as the marginal distribution of Y_i with respect to the 'updated' prior Be(α_i + x_i, β_i + n_i − x_i), i.e., in view of (11), for i = 1, ..., k, it follows that

\[
P\{Y_i = y_i \mid X_i = x_i\}
= \binom{m_i}{y_i}
\frac{\Gamma(\alpha_i+\beta_i+n_i)\,\Gamma(\alpha_i + x_i + y_i)\,\Gamma(\beta_i + n_i - x_i + m_i - y_i)}
{\Gamma(\alpha_i + x_i)\,\Gamma(\beta_i + n_i - x_i)\,\Gamma(\alpha_i+\beta_i+n_i+m_i)}, \qquad (31)
\]
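The predictive probabilities (31) are of the same beta-binomial form as (11) and can be evaluated in the same way; a sketch added for illustration:

    from math import lgamma, exp, comb

    def predictive_pmf(y, m, x, n, alpha, beta):
        # P{Y_i = y | X_i = x} from eq. (31): beta-binomial with 'updated' prior Be(alpha + x, beta + n - x)
        a, b = alpha + x, beta + n - x
        log_p = (lgamma(a + b) - lgamma(a) - lgamma(b)
                 + lgamma(a + y) + lgamma(b + m - y) - lgamma(a + b + m))
        return comb(m, y) * exp(log_p)

    # sanity check with hypothetical small numbers: the probabilities sum to one
    print(sum(predictive_pmf(y, 5, x=3, n=10, alpha=1.0, beta=1.0) for y in range(6)))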

where x_i ∈ {0, 1, ..., n_i} and y_i ∈ {0, 1, ..., m_i}. This can be used quite easily in a computer program to evaluate the conditional expectation in the criterion given in Theorem 2. An upper bound to the latter is provided by replacing Y_i by m_i, i = 1, ..., k, in it. Thus, the search through all possible m in the first described fixed sample size look ahead procedure is actually limited to a finite, typically small, collection of m's, as the cost of additional sampling, i.e., γM, becomes prohibitive as M increases. To conclude this section, let us continue the treatment of our numerical example considered at the end of the previous section. For α_i = β_i = 1, i = 1, ..., k, (31) can be written as

\[
P\{Y_i = y_i \mid X_i = x_i\}
= \binom{x_i + y_i}{x_i} \binom{n_i - x_i + m_i - y_i}{n_i - x_i} \Big/ \binom{n_i + m_i + 1}{n_i + 1}, \qquad (32)
\]

which can be computed with a subroutine that provides hypergeometric probabilities.

Recall that we have observed three independent samples from binomial populations (i.e., age groups 20-34, 35-54, over 54) with sample sizes (n_1, n_2, n_3) = (565, 649, 250) and with numbers of successes (x_1, x_2, x_3) = (377, 417, 160). For 0 ≤ m_i ≤ 30, i = 1, 2, 3, ϱ = 100, and γ = 0.0001, the inequality (29) is achieved for various allocations (m_1, m_2, m_3). The ten largest differences between the right-hand side and the left-hand side of (29) occur, in ascending order, at (30,3,0), (29,0,1), (29,1,0), (30,0,2), (30,1,1), (30,2,0), (29,0,0), (30,0,1), (30,1,0), and finally (30,0,0) as the optimum allocation. In the same setting, if ϱ is replaced by ϱ = 400, i.e., if emphasis is shifted away from selection toward estimation, then the ten largest differences occur, in ascending order, at (0,30,3), (3,30,0), (1,30,2), (2,30,1), (2,30,0), (0,30,2), (1,30,1), (1,30,0), (0,30,1), and finally (0,30,0) as the optimum allocation.
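The criterion (29) for a given allocation can be evaluated by exact summation over the predictive distribution (31); the following sketch (added for illustration with the example's data; the function names are not from the paper) computes the difference between the right-hand and left-hand sides of (29):

    from itertools import product
    from math import lgamma, exp, comb

    def pred_pmf(y, m, x, n, a0, b0):
        # eq. (31): beta-binomial predictive probability with updated prior Be(a0 + x, b0 + n - x)
        a, b = a0 + x, b0 + n - x
        return comb(m, y) * exp(lgamma(a + b) - lgamma(a) - lgamma(b)
                                + lgamma(a + y) + lgamma(b + m - y) - lgamma(a + b + m))

    def g(xi, ni, ai, bi, rho):
        # the quantity maximized in (27) and (29)
        s = ai + bi + ni
        return ((1 - 2 / s) * (ai + xi) - rho) / (s - 2)

    def look_ahead_gain(m, x, n, alpha, beta, rho, gamma):
        # right-hand side minus left-hand side of (29); a positive value means that
        # the additional observations Y_i ~ B(m_i, theta_i) are worthwhile
        k = len(x)
        lhs = max(g(x[i], n[i], alpha[i], beta[i], rho) for i in range(k)) + gamma * sum(m)
        rhs = 0.0
        for y in product(*(range(mi + 1) for mi in m)):
            p = 1.0
            for i in range(k):
                p *= pred_pmf(y[i], m[i], x[i], n[i], alpha[i], beta[i])
            rhs += p * max(g(x[i] + y[i], n[i] + m[i], alpha[i], beta[i], rho)
                           for i in range(k))
        return rhs - lhs

    x, n = [377, 417, 160], [565, 649, 250]
    ones = [1.0, 1.0, 1.0]
    # with rho = 100 and gamma = 0.0001, (30, 0, 0) should come out best among these
    for m in [(30, 0, 0), (0, 30, 0), (0, 0, 30)]:
        print(m, round(look_ahead_gain(list(m), x, n, ones, ones, 100.0, 1e-4), 5))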


References

Abughalous, M.M. and K.J. Miescke (1989). On selecting the largest success probability under unequal sample sizes. Journ. Statist. Planning Inference 21, 53-68.
Amster, S.J. (1963). A modified Bayes stopping rule. Ann. Math. Statist. 34, 1404-1413.
Berger, J.O. (1985). Statistical Decision Theory and Bayesian Analysis. Springer Verlag, New York.
Cohen, A. and H.B. Sackrowitz (1988). A decision theory formulation for population selection followed by estimating the mean of the selected population. In: S.S. Gupta and J.O. Berger, eds., Statistical Decision Theory and Related Topics - IV, Vol. 2. Springer Verlag, New York, pp. 33-36.
Gupta, S.S. and K.J. Miescke (1990). On finding the largest normal mean and estimating the selected mean. Sankhyā, Ser. B, 52, 144-157.
Hochberg, Y. and A.C. Tamhane (1987). Multiple Comparison Procedures. John Wiley, New York.
Johnson, N.L. and S. Kotz (1969). Discrete Distributions. Houghton Mifflin Company, Boston.
Lehmann, E.L. (1975). Nonparametrics: Statistical Methods Based on Ranks. Holden-Day, San Francisco.
Walker and Lev (1953). Statistical Inference. Henry Holt, New York.