Statistics & Probability Letters 25 (1995) 341-349

On selecting the best natural exponential families with quadratic variance function

Mansour M. Abughalous a, Naveen K. Bansal b,*

a Department of Mathematics, University of Wisconsin-Platteville, Platteville, WI 53818, USA
b Department of Mathematics, Statistics and Computer Science, Marquette University, Katharine Reed Cudahy Hall Bldg., Milwaukee, WI 53233, USA

Received June 1994; revised October 1994

* Corresponding author.

Abstract

We consider the problem of selecting, from $k$ one-parameter natural exponential families with quadratic variance functions, the population associated with the largest mean. It is shown that the minimax value under the "0-1" loss function is $1 - 1/k$. Bayes rules are also discussed under the "0-1" loss and other loss functions.

Keywords: Natural exponential families with quadratic variance function; Minimax value; Bayes selection rules; Selection

1. Introduction

The one-parameter natural exponential families (NEF) with quadratic variance functions (QVF) include many of the most useful and widely used distributions, namely the normal, Poisson, gamma, binomial and negative binomial; indeed, these are five of the six basic NEF-QVF distributions studied by Morris (1982, 1983). Let $\pi_1, \ldots, \pi_k$ be $k \ge 2$ independent one-parameter NEF-QVF populations with unknown means $\mu_1, \ldots, \mu_k$, where $\mu_i$ is the mean of the $i$th population. Selecting the population associated with $\mu_{[k]}$, i.e., the largest of $\mu_1, \ldots, \mu_k$, was studied by Gupta and Miescke (1988) for normal populations, by Abughalous and Miescke (1989) for binomial populations, and by Abughalous and Bansal (1994) and Abughalous et al. (1994) for gamma populations. In this paper, we consider this selection problem for the more general class of NEF-QVF distributions. Lehmann (1966) and Eaton (1967) also considered this problem for a more general class of families of distributions, but their results are restricted to the symmetric case, i.e., when the sample sizes of the $k$ populations are the same.

Minimaxity under the "0-1" loss function is the topic of Section 2. It is clear from Eaton (1967) that the minimax value when the sample sizes are the same is $1 - 1/k$. We show in that section that the same is true even when the sample sizes are not the same. In Section 3, Bayes rules for permutation invariant priors are studied under the "0-1" loss function, the Bayes rule for conjugate priors is derived under the regret loss function, and, under monotone permutation invariant losses, Bayes rules are studied for (1) permutation invariant priors and (2) priors which are not permutation invariant but whose posterior densities are decreasing in transposition with respect to some symmetric measure.

2. Minimaxity under the "0-1" loss function

Let $\pi_1, \ldots, \pi_k$ be $k \ge 2$ independent one-parameter NEF-QVF populations with respective densities

f(x_i \mid \theta_i) = \exp\{\theta_i x_i - \Psi(\theta_i) + S(x_i)\}\, I_A(x_i), \qquad i = 1, \ldots, k,     (1)

with $\mu_i = E(X_i) = \Psi'(\theta_i)$ and variance function $V(\mu_i) = \operatorname{Var}(X_i \mid \theta_i)$, which is the quadratic function $v_0 + v_1\mu_i + v_2\mu_i^2$; here $I_A(\cdot)$ is an indicator function, and the support $A$ does not depend on the parameter $\theta_i$. Let $\theta_i = \phi(\mu_i) = (\Psi')^{-1}(\mu_i)$, and let independent observations $X_{ij}$, $j = 1, \ldots, n_i$, be taken from the $i$th population $\pi_i$, $i = 1, \ldots, k$. The density function of the sample mean $\bar{X}_i$ of the $i$th population is given by

f(\bar{x}_i \mid \mu_i) = \exp\{n_i\phi(\mu_i)\bar{x}_i - n_i\Psi(\phi(\mu_i))\}\, h(\bar{x}_i)\, I_{A_i^*}(\bar{x}_i),     (2)

where $h(\bar{x}_i)$ is some function of $\bar{x}_i$ that does not depend on $\mu_i$, and $A_i^*$ is the image of $A^{n_i} = \{(x_{i1}, x_{i2}, \ldots, x_{in_i}) : x_{ij} \in A,\ j = 1, \ldots, n_i\}$ under $\bar{x}_i = (1/n_i)\sum_{j=1}^{n_i} x_{ij}$. Denote $\bar{X} = (\bar{X}_1, \bar{X}_2, \ldots, \bar{X}_k)^{\mathrm{T}}$. We consider a sequence of priors, proposed by Morris (1983), on $\mu_i$ with densities

g_m(\mu_i) = K(m, \mu_0) \exp\{m\mu_0\phi(\mu_i) - m\Psi(\phi(\mu_i))\}/V(\mu_i),     (3)

where $K(m, \mu_0)$ is such that $g_m(\cdot)$ is a density function. The posterior density of $\mu_i$ given $\bar{X}_i = \bar{x}_i$ is

g_m(\mu_i \mid \bar{x}_i) = K_0 \exp\{(n_i\bar{x}_i + m\mu_0)\phi(\mu_i) - (n_i + m)\Psi(\phi(\mu_i))\}/V(\mu_i),     (4)

where $K_0 = K(n_i + m, (n_i\bar{x}_i + m\mu_0)/(n_i + m))$.
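As a concrete instance of (1)-(4) (an added illustration, not part of the original exposition), take the Poisson family: $\Psi(\theta) = e^{\theta}$, so $\mu = \Psi'(\theta) = e^{\theta}$, $\phi(\mu) = \log\mu$ and $V(\mu) = \mu$ (i.e., $v_0 = v_2 = 0$, $v_1 = 1$). The prior (3) then reduces to

g_m(\mu_i) = K(m, \mu_0)\, \mu_i^{m\mu_0 - 1} e^{-m\mu_i},

a gamma density with shape $m\mu_0$ and rate $m$, and the posterior (4) is a gamma density with shape $n_i\bar{x}_i + m\mu_0$ and rate $n_i + m$, whose mean is $(n_i\bar{x}_i + m\mu_0)/(n_i + m)$.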

Theorem 1. Under the "0-1" loss function, the minimax value of the selection problem is equal to $1 - 1/k$.

Proof. Using a technique similar to that of Gupta and Miescke (1988), consider first the no-data rule $d^0$, say, which selects each population with the same probability $1/k$. Clearly, it has constant risk $1 - 1/k$, i.e., $d^0$ is an equalizer rule. To show that the minimax value is equal to $1 - 1/k$, it suffices to find a sequence of proper priors such that the sequence of associated Bayes risks converges to this value; cf. Berger (1985, p. 350). Let $\pi_m$ denote the prior under which $\mu_1, \ldots, \mu_k$ are independent with densities (3). The Bayes rule $d_m(\bar{X})$, which minimizes the posterior expected loss, yields the posterior risk

\min_{i=1,\ldots,k}\, \big[\,1 - P\{\mu_i = \mu_{[k]} \mid \bar{X} = \bar{x}\}\,\big].

The posterior density of $\mu_i$ given $\bar{X} = \bar{x}$ is given by (4). Thus, the Bayes risk is

r(\pi_m) = E\Big[\,1 - \max_{i=1,\ldots,k} P\{\mu_i = \mu_{[k]} \mid \bar{X}\}\,\Big],     (5)

the expectation being taken with respect to the marginal distribution of $\bar{X}$.

Thus, if

\lim_{m\to\infty}\Big[\,1 - \max_{i=1,\ldots,k} P\{\mu_i = \mu_{[k]} \mid \bar{X} = \bar{x}\}\,\Big] = 1 - \frac{1}{k}     (6)

for all $\bar{x}$, then $\lim_{m\to\infty} r(\pi_m) = 1 - 1/k$. The proof of Theorem 1 will therefore be established if (6) holds. Note that

P\{\mu_i = \mu_{[k]} \mid \bar{X} = \bar{x}\} = P\{H_i^{(m)} = H_{[k]}^{(m)}\},

where

H_i^{(m)} = \frac{G_i^{(m)} - \mu_0}{\sqrt{V(\mu_0)/(m - v_2)}}     (7)

and G~") denotes the random variable with density function given by (4). Thus, (6) holds using the symmetry argument and from the following Lemma. Lemma 1. H~ m) converges in distribution to N(0, 1) as m tends to infinity. ~(m) , (Glrn)_ Yl o ) ) / ~ / V ( y l O } ) / ( m + n i Proof. Let ,-ate A,t~°) denote the fth moment of normalized -i

v2), w h e r e

yl °) = E(GI m~ ) = ni/(m + ni)£i + m/(m + ni)#o, see Eq. (5.11) of Morris (1983). From Theorem 5.4 of Morris (1983), the central moments M 7 of GI n) are given by the recursive formula Mr+l_

_r m +

ni

[V'(y}°))M~ + V(y}°))Mr_l],

--

rv2 L "

~

r >~1,

J

Mg-----1, MI* = O. Thus, the recursive formula for M~°) is given by

M_{r+1}^{(0)} = \frac{r}{n_i + m - r v_2}\left[\left(\frac{n_i + m - v_2}{V(y_i^{(0)})}\right)^{1/2} V'(y_i^{(0)})\, M_r^{(0)} + (n_i + m - v_2)\, M_{r-1}^{(0)}\right], \qquad r \ge 1,

with $M_0^{(0)} = 1$ and $M_1^{(0)} = 0$. It can be shown that the limiting moments $\tilde{M}_r$ of $M_r^{(0)}$, as $m$ tends to infinity, are given by

\tilde{M}_2 = 1, \qquad \tilde{M}_{2\ell+2} = (2\ell + 1)\,\tilde{M}_{2\ell}, \quad \ell = 1, 2, \ldots, \qquad \text{and} \qquad \tilde{M}_{2\ell+1} = 0, \quad \ell = 0, 1, \ldots,

and, from this,

\tilde{M}_{2\ell} = \frac{(2\ell)!}{2^{\ell}\,\ell!}, \qquad \tilde{M}_{2\ell+1} = 0, \qquad \ell = 0, 1, 2, \ldots

Thus, the moments of the limiting distribution of the normalized $G_i^{(m)}$ as $m \to \infty$ are the same as the moments of $N(0,1)$. Hence, by Theorem 6.45 of Chung (1974, p. 158), the normalized random variable $G_i^{(m)}$ converges in distribution to $N(0,1)$ as $m$ tends to infinity.
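To make the limiting-moment argument concrete, the following sketch (our own numerical check, with arbitrarily chosen values of $n_i$, $\bar{x}_i$ and $\mu_0$) iterates the normalized recursion above for the Poisson case ($v_0 = v_2 = 0$, $v_1 = 1$) and compares the resulting moments with those of $N(0,1)$ as $m$ grows.

import math

def normalized_moments(m, n, xbar, mu0, v=(0.0, 1.0, 0.0), r_max=8):
    """Normalized central moments M_r^(0) of G_i^(m), computed from the
    recursion above (based on Morris, 1983, Theorem 5.4)."""
    v0, v1, v2 = v
    y = (n * xbar + m * mu0) / (m + n)      # y_i^(0), the posterior mean
    V = v0 + v1 * y + v2 * y * y            # variance function at y_i^(0)
    Vp = v1 + 2.0 * v2 * y                  # V'(y_i^(0))
    M = [1.0, 0.0]                          # M_0^(0) = 1, M_1^(0) = 0
    for r in range(1, r_max):
        M.append((r / (n + m - r * v2)) * (
            math.sqrt((n + m - v2) / V) * Vp * M[r] + (n + m - v2) * M[r - 1]))
    return M

def normal_moment(r):
    # E Z^r for Z ~ N(0,1): 0 for odd r, (r - 1)!! for even r
    return 0.0 if r % 2 else float(math.prod(range(1, r, 2)))

for m in (10, 1_000, 1_000_000):
    diffs = [round(x - normal_moment(r), 4)
             for r, x in enumerate(normalized_moments(m, n=5, xbar=2.3, mu0=1.7))]
    print(m, diffs)

Under these assumed values the printed differences shrink toward zero as $m$ increases, in line with Lemma 1.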

Note that, from (7),

H_i^{(m)} = \frac{G_i^{(m)} - y_i^{(0)}}{\sqrt{V(y_i^{(0)})/(m + n_i - v_2)}}\left(\frac{m - v_2}{m + n_i - v_2}\right)^{1/2}\left(\frac{V(y_i^{(0)})}{V(\mu_0)}\right)^{1/2} + \frac{y_i^{(0)} - \mu_0}{\big(V(\mu_0)/(m - v_2)\big)^{1/2}}.

By Slutsky's theorem (cf. Bickel and Doksum, 1977), $H_i^{(m)}$ converges in distribution to $N(0,1)$ as $m$ tends to infinity, which completes the proof.  □

The natural decision rule can be written as

d^N(\bar{X}) = i \quad \text{if and only if} \quad \bar{X}_i = \max(\bar{X}_1, \ldots, \bar{X}_k),     (8)

with ties broken at random. Now we would like to evaluate the maximum risk of the natural decision rule $d^N$ over the parameter space $\Omega_\mu = \{(\mu_1, \ldots, \mu_k) : \mu_i = \Psi'(\theta_i)\}$. Let $\Omega_\mu^{(i)} = \{\mu \in \Omega_\mu : \mu_i = \max_{1\le j\le k}\mu_j\}$; then $\Omega_\mu = \bigcup_i \Omega_\mu^{(i)}$ and

\sup_{\mu\in\Omega_\mu} R(\mu, d^N) = 1 - \min_{i=1,\ldots,k}\,\inf_{\mu\in\Omega_\mu^{(i)}} P_\mu\{d^N(\bar{X}) = i\}
= 1 - \min_{i=1,\ldots,k}\,\inf_{\mu\in\Omega_\mu^{(i)}} P_\mu\{\bar{X}_j \le \bar{X}_i \ \text{for all } j \ne i\}
= 1 - \min_{i=1,\ldots,k}\,\inf_{\mu\in\Omega_\mu^{(i)}} \int_{\mathbb{R}} \prod_{j\ne i} F_{\bar{X}_j\mid\mu_j}(x)\, f_{\bar{X}_i\mid\mu_i}(x)\, dx,     (9)

where $F_{\bar{X}_i\mid\mu_i}$ and $f_{\bar{X}_i\mid\mu_i}$ denote the distribution and density functions of $\bar{X}_i$ given $\mu_i$.

Corollary 1. If $\phi(\cdot)$ is an increasing function and $n_i = n$ for all $i = 1, \ldots, k$, then the natural decision rule $d^N$ is a minimax rule.

Proof. If $\phi(\cdot)$ is an increasing function, then $f_{\bar{X}_i\mid\mu_i}$ has the MLR property in $\bar{x}_i$, and since $n_i = n$ for all $i = 1, \ldots, k$, we have $F_{\bar{X}_i\mid\mu_i}(x) \le F_{\bar{X}_j\mid\mu_j}(x)$ for $\mu_i \ge \mu_j$; cf. Lehmann (1986, p. 85). Hence the infimum in (9) is attained when $\mu_1 = \cdots = \mu_k$, so that, from (9),

\sup_{\mu\in\Omega_\mu} R(\mu, d^N) = 1 - \min_{i=1,\ldots,k} \int_{\mathbb{R}} \big[F_{\bar{X}_i\mid\mu_i}(x)\big]^{k-1}\, f_{\bar{X}_i\mid\mu_i}(x)\, dx = 1 - \frac{1}{k},

which, together with Theorem 1, completes the proof.  □
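The following minimal sketch (our own illustration, using normal populations with unit variance, one member of the NEF-QVF class) implements the natural rule (8) and estimates its "0-1" risk by Monte Carlo; with equal sample sizes and a largest mean that is only barely the largest, the estimated risk approaches the minimax value $1 - 1/k$ of Theorem 1.

import numpy as np

rng = np.random.default_rng(0)

def d_N(xbar):
    """Natural decision rule (8): select the index of the largest sample mean,
    breaking ties at random."""
    xbar = np.asarray(xbar, dtype=float)
    winners = np.flatnonzero(xbar == xbar.max())
    return rng.choice(winners)

def zero_one_risk(mu, n, reps=50_000):
    """Monte Carlo '0-1' risk of d_N for k normal populations N(mu_i, 1),
    each sampled n times (equal sample sizes, as in Corollary 1)."""
    mu = np.asarray(mu, dtype=float)
    best = np.flatnonzero(mu == mu.max())          # population(s) with the largest mean
    xbars = rng.normal(loc=mu, scale=1.0 / np.sqrt(n), size=(reps, mu.size))
    picks = np.array([d_N(x) for x in xbars])
    return np.mean(~np.isin(picks, best))

k, n = 4, 10
print(zero_one_risk([0.0] * (k - 1) + [0.01], n))   # close to 1 - 1/k = 0.75
print(zero_one_risk([0.0] * (k - 1) + [2.0], n))    # clearly separated means: much smaller risk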

3. Bayes rules

3.1. Bayes rules under the "0-1" loss

Under the "0-1" loss function, the Bayes rule $d^B$, which minimizes the posterior risk $P\{\mu_i \ne \mu_{[k]} \mid \bar{X} = \bar{x}\}$ with respect to $i = 1, \ldots, k$ at every given $\bar{X} = \bar{x}$, is given by

d^B(\bar{x}) \in \Big\{\, i : W(i \mid \bar{x}) = \max_{j=1,\ldots,k} W(j \mid \bar{x}),\ i = 1, \ldots, k \,\Big\},

where

W(s \mid \bar{x}) = \int_{\{\mu :\, \mu_s = \mu_{[k]}\}} \prod_{j=1}^{k} f(\bar{x}_j \mid \mu_j)\, d\tau(\mu),

with $\tau(\cdot)$ a permutation invariant prior. Finding the best population can be done through pairwise comparison of the $W(s \mid \bar{x})$, $s = 1, \ldots, k$, as was done in Gupta and Miescke (1988). For simplicity, let us consider the comparison of $W(2 \mid \bar{x})$ and $W(1 \mid \bar{x})$:

W(2 \mid \bar{x}) - W(1 \mid \bar{x}) = \int_{\{\mu :\, \mu_2 = \mu_{[k]}\}} \prod_{j=1}^{k} f(\bar{x}_j \mid \mu_j)\, d\tau(\mu) - \int_{\{\mu :\, \mu_1 = \mu_{[k]}\}} \prod_{j=1}^{k} f(\bar{x}_j \mid \mu_j)\, d\tau(\mu).

Exchanging $\mu_1$ and $\mu_2$ in the first integral and combining the integrals, one gets

W(2 \mid \bar{x}) - W(1 \mid \bar{x}) = \int_{\{\mu :\, \mu_1 = \mu_{[k]}\}} \big[M_{2,1}(\bar{x}, \mu) - 1\big] \prod_{j=1}^{k} f(\bar{x}_j \mid \mu_j)\, d\tau(\mu),

where

M_{2,1}(\bar{x}, \mu) = \frac{\exp\{n_1\bar{x}_1\phi(\mu_2) - n_1\Psi(\phi(\mu_2))\}\exp\{n_2\bar{x}_2\phi(\mu_1) - n_2\Psi(\phi(\mu_1))\}}{\exp\{n_1\bar{x}_1\phi(\mu_1) - n_1\Psi(\phi(\mu_1))\}\exp\{n_2\bar{x}_2\phi(\mu_2) - n_2\Psi(\phi(\mu_2))\}}
= \exp\{(n_1\bar{x}_1 - n_2\bar{x}_2)(\phi(\mu_2) - \phi(\mu_1))\}\exp\{(n_2 - n_1)[\Psi(\phi(\mu_2)) - \Psi(\phi(\mu_1))]\}.

Assume that $\phi(\cdot)$ and $\Psi(\cdot)$ are increasing functions. This is a reasonable assumption; it holds for the normal, Poisson, gamma, binomial and negative binomial families. On $\{\mu : \mu_1 = \mu_{[k]}\}$ we have $\mu_2 \le \mu_1$, so both exponents above are nonnegative, and hence $M_{2,1}(\bar{x}, \mu) \ge 1$, whenever $n_1\bar{x}_1 \le n_2\bar{x}_2$ and $n_1 \ge n_2$.

Case 1: If $n_1 = n_2$, then $d^B$ prefers Population 2 over Population 1 if and only if $\bar{x}_2 \ge \bar{x}_1$. Thus, in general, if $n_i = n_j$, then $d^B$ prefers the $i$th population over the $j$th population if and only if $\bar{x}_i \ge \bar{x}_j$.

Case 2: If $n_2\bar{x}_2 \ge n_1\bar{x}_1$ and $n_1 \ge n_2$, then $d^B$ prefers Population 2 over Population 1. Thus, in general, if $n_i\bar{x}_i \ge n_j\bar{x}_j$ and $n_j \ge n_i$, then $d^B$ prefers the $i$th population over the $j$th population. The sketch below illustrates both cases for two normal populations.
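A small numerical illustration of this pairwise comparison (our own example, assuming two normal populations with unit variance and independent $N(\mu_0, \tau^2)$ priors, which form a permutation invariant prior; the posterior probabilities of being the best are then available in closed form):

from math import sqrt
from scipy.stats import norm

def posterior_prob_best(xbar, n, mu0=0.0, tau2=1.0):
    """For k = 2 normal populations N(mu_i, 1) with independent N(mu0, tau2)
    priors, return (P{mu_1 >= mu_2 | data}, P{mu_2 >= mu_1 | data})."""
    post_mean = [(n[i] * xbar[i] + mu0 / tau2) / (n[i] + 1.0 / tau2) for i in range(2)]
    post_var = [1.0 / (n[i] + 1.0 / tau2) for i in range(2)]
    p2 = norm.cdf((post_mean[1] - post_mean[0]) / sqrt(post_var[0] + post_var[1]))
    return 1.0 - p2, p2

# Case 1: equal sample sizes -- the larger sample mean is preferred.
print(posterior_prob_best(xbar=[1.0, 1.2], n=[5, 5]))
# Case 2: n_2*xbar_2 >= n_1*xbar_1 and n_1 >= n_2 -- Population 2 is preferred.
print(posterior_prob_best(xbar=[1.0, 5.1], n=[10, 2]))
# Unequal sample sizes: xbar_2 > xbar_1, yet Population 1 is preferred (cf. the remark below).
print(posterior_prob_best(xbar=[1.0, 1.5], n=[100, 1]))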

Remark. If $n_i = n$ for all $i$, then $d^B$ is the same as the natural decision rule $d^N$. In the case of unequal $n_i$'s, $d^B$ does not select in terms of the largest sample mean.

3.2. Bayes rule under regret loss

We consider the regret loss $L(\mu, i) = \mu_{[k]} - \mu_i$. The Bayes rule is given by

d^B(\bar{x}) = i \quad \text{if} \quad E[\mu_i \mid \bar{X} = \bar{x}] = \max_{j=1,\ldots,k} E[\mu_j \mid \bar{X} = \bar{x}].

If, a priori, $\mu_1, \ldots, \mu_k$ are independent with densities

g_{m_i}(\mu_i) = K(m_i, \mu_0) \exp\{m_i\mu_0\phi(\mu_i) - m_i\Psi(\phi(\mu_i))\}/V(\mu_i),

then, a posteriori, given $\bar{X} = \bar{x}$, $\mu_1, \mu_2, \ldots, \mu_k$ are independent with respective densities

g_{m_i}(\mu_i \mid \bar{x}_i) = K_i \exp\{(n_i\bar{x}_i + m_i\mu_0)\phi(\mu_i) - (n_i + m_i)\Psi(\phi(\mu_i))\}/V(\mu_i), \qquad i = 1, \ldots, k,

where $K_i$ is a normalizing constant. Also,

E(\mu_i \mid \bar{X}_i = \bar{x}_i) = \frac{n_i\bar{x}_i + m_i\mu_0}{n_i + m_i};

cf. Morris (1983). So the Bayes rule is

d^B(\bar{x}) = i \quad \text{if} \quad \frac{n_i\bar{x}_i + m_i\mu_0}{n_i + m_i} = \max_{j=1,\ldots,k} \frac{n_j\bar{x}_j + m_j\mu_0}{n_j + m_j}.

Remark. If each $n_i$ is the same scalar multiple of $m_i$, then $d^B$ is the same as the natural decision rule $d^N$. In other words, if the $n_i$'s are selected proportionally to the prior weights $m_i$ given to the $\mu_i$'s, then the Bayes rule is the same as the natural decision rule.
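A minimal sketch of this rule (our own illustration; the numbers below are hypothetical):

import numpy as np

def bayes_regret_rule(xbar, n, m, mu0):
    """Bayes rule under regret loss: select the population with the largest
    posterior mean (n_i*xbar_i + m_i*mu0) / (n_i + m_i)."""
    xbar, n, m = map(np.asarray, (xbar, n, m))
    post_mean = (n * xbar + m * mu0) / (n + m)
    return int(np.argmax(post_mean))

xbar = [1.2, 1.5, 0.9]
# n_i proportional to m_i: the rule reduces to selecting the largest sample mean (d^N).
print(bayes_regret_rule(xbar, n=[4, 8, 12], m=[1, 2, 3], mu0=0.0))   # -> 1, the largest xbar
# Otherwise the prior weights can overturn the ordering of the sample means.
print(bayes_regret_rule(xbar, n=[4, 8, 12], m=[12, 2, 1], mu0=5.0))  # -> 0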

3.3. Bayes rule under monotone permutation invariant loss

Definition. The loss function $L(\mu, j)$ is called a monotone, permutation invariant loss if it satisfies the following conditions:
(a) $L(\mu, i) < L(\mu, j)$ if $\mu_i > \mu_j$.
(b) $L(\sigma(\mu), i) = L(\mu, \sigma(i))$ for every permutation $\sigma$, where $\sigma(\mu) = (\mu_{\sigma(1)}, \ldots, \mu_{\sigma(k)})$.

Lemma 2. Under every monotone, permutation invariant loss function $L$, and for every permutation invariant prior $\tau(\cdot)$, the following holds:

E[L(\mu, 1) \mid \bar{X} = \bar{x}] - E[L(\mu, 2) \mid \bar{X} = \bar{x}] = \int_{\{\mu :\, \mu_2 \le \mu_1\}} \big[L(\mu, 2) - L(\mu, 1)\big]\big[M_{2,1}(\bar{x}, \mu) - 1\big] \prod_{j=1}^{k} \frac{f(\bar{x}_j \mid \mu_j)}{g(\bar{x}_j)}\, d\tau(\mu),

where $g(\bar{x}_j)$ is the marginal density of $\bar{X}_j$ and

M_{2,1}(\bar{x}, \mu) = \frac{f(\bar{x}_1 \mid \mu_2)\, f(\bar{x}_2 \mid \mu_1)}{f(\bar{x}_1 \mid \mu_1)\, f(\bar{x}_2 \mid \mu_2)}.

Proof.

E[L(\mu, 1) \mid \bar{X} = \bar{x}] - E[L(\mu, 2) \mid \bar{X} = \bar{x}] = I_1 + I_2 + I_3,

where

I_1 = \int_{\{\mu :\, \mu_1 < \mu_2\}} [L(\mu, 1) - L(\mu, 2)] \prod_{j=1}^{k} \frac{f(\bar{x}_j \mid \mu_j)}{g(\bar{x}_j)}\, d\tau(\mu),

I_2 = \int_{\{\mu :\, \mu_1 = \mu_2\}} [L(\mu, 1) - L(\mu, 2)] \prod_{j=1}^{k} \frac{f(\bar{x}_j \mid \mu_j)}{g(\bar{x}_j)}\, d\tau(\mu),

I_3 = \int_{\{\mu :\, \mu_2 < \mu_1\}} [L(\mu, 1) - L(\mu, 2)] \prod_{j=1}^{k} \frac{f(\bar{x}_j \mid \mu_j)}{g(\bar{x}_j)}\, d\tau(\mu).

Note that $I_2 = 0$, because $L$ and $\tau$ are permutation invariant.

By interchanging $\mu_1$ and $\mu_2$ in $I_1$, since $L$ and $\tau$ are permutation invariant, we get

I_1 = \int_{\{\mu :\, \mu_2 < \mu_1\}} [L(\mu, 2) - L(\mu, 1)]\, \frac{f(\bar{x}_1 \mid \mu_2)\, f(\bar{x}_2 \mid \mu_1)\, \prod_{j=3}^{k} f(\bar{x}_j \mid \mu_j)}{\prod_{j=1}^{k} g(\bar{x}_j)}\, d\tau(\mu).

Now the result follows by adding $I_1$ and $I_3$. Since $L$ is monotone,

E[L(\mu, 1) \mid \bar{X} = \bar{x}] - E[L(\mu, 2) \mid \bar{X} = \bar{x}] \ge 0 \quad \text{provided that} \quad M_{2,1}(\bar{x}, \mu) - 1 \ge 0.

Thus, the results of Section 3.1 remain valid if, instead of the "0-1" loss function, any monotone, permutation invariant loss function is used; for example, the regret loss $L(\mu, i) = \mu_{[k]} - \mu_i$ of Section 3.2 is monotone and permutation invariant.

The remainder of this section deals with the question of how to choose $n_1, \ldots, n_k$ properly, under a given prior, so as to obtain a Bayes rule of simple structure. This situation seems quite realistic, since in many applications it is the design of $n_1, \ldots, n_k$, rather than the prior $\tau$, that can be controlled by the experimenter. It will be seen below that the Bayes rule is in fact very simple, as it selects in terms of the largest $n_i\bar{x}_i$, $i = 1, \ldots, k$, if the posterior distribution has a density $\tau(\mu \mid (n_1\bar{x}_1, \ldots, n_k\bar{x}_k))$, with respect to a permutation invariant, sigma-finite measure $\nu$, which is decreasing in transposition (DT).

Definition. The posterior distribution with density $\tau(\mu \mid x)$ is said to be decreasing in transposition (DT) if it satisfies the following two conditions:
(a) $\tau(\mu \mid x) \le \tau(\mu \mid (x_2, x_1, x_3, \ldots, x_k))$ if $\mu_1 \le \mu_2$ and $x_1 \ge x_2$;
(b) $\tau(\sigma(\mu) \mid \sigma(x)) = \tau(\mu \mid x)$ for all $\mu$, $x$ and every permutation $\sigma$.
For more discussion of the DT property, see Hollander et al. (1977).

Lemma 3. Under a monotone, permutation invariant loss function, suppose that the posterior distribution has a density, with respect to a permutation invariant sigma-finite measure, which is DT. Then every Bayes rule $d^B$ selects in terms of the largest $n_i\bar{x}_i$, $i = 1, \ldots, k$.

For a proof, see Abughalous and Miescke (1989). Let us find a sufficient condition under which the posterior density $\tau(\mu \mid s)$, which depends only on $s = (n_1\bar{x}_1, \ldots, n_k\bar{x}_k)$, is DT. Let $\tau(\mu)$ be the density function of the prior distribution with respect to a permutation invariant sigma-finite measure $\nu(\mu)$. Then the posterior density is given by

\tau(\mu \mid s) = \prod_{i=1}^{k} \exp\{s_i\phi(\mu_i) - n_i\Psi(\phi(\mu_i))\}\, \frac{\tau(\mu)}{f(s, n)},

where

f(s, n) = \int \exp\Big\{\sum_{i=1}^{k} \big[s_i\phi(\mu_i) - n_i\Psi(\phi(\mu_i))\big]\Big\}\, \tau(\mu)\, d\nu(\mu).

It can be seen that

\tau(\mu \mid s) = c(s, n)\, d(s, \mu)\, e(\mu, n),     (10)

where

d(s, \mu) = \exp\Big\{\sum_{i=1}^{k} s_i\phi(\mu_i)\Big\},     (11)

e(\mu, n) = \exp\Big\{-\sum_{i=1}^{k} n_i\Psi(\phi(\mu_i))\Big\}\, \tau(\mu),     (12)

and

c(s, n) = \frac{1}{f(s, n)}.     (13)
Theorem 2. Under a monotone, permutation invariant loss function, suppose that the posterior distribution has density $\tau(\mu \mid x)$ with respect to a permutation invariant sigma-finite measure $\nu(\cdot)$. If the prior is such that $e(\mu, n) = e(\sigma(\mu), n)$ for every permutation $\sigma$, and $\phi(\cdot)$ is an increasing function, then $\tau(\mu \mid s)$ is DT, and thus the Bayes rule selects in terms of the largest $n_i\bar{x}_i$, $i = 1, \ldots, k$.

Proof. Clearly, from (11), $d(\sigma(s), \sigma(\mu)) = d(s, \mu)$ for every permutation $\sigma$, and

d(s, \mu) \ge d((s_2, s_1, s_3, \ldots, s_k), \mu) \quad \text{for } \mu_1 \ge \mu_2,\ s_1 \ge s_2,

since the ratio of the two sides equals $\exp\{(s_1 - s_2)(\phi(\mu_1) - \phi(\mu_2))\} \ge 1$ when $\phi(\cdot)$ is increasing. Since $e(\mu, n) = e(\sigma(\mu), n)$ for every permutation $\sigma$, to show that the posterior $\tau(\mu \mid s)$ has the DT property it suffices to prove that

c(\sigma(s), n) = c(s, n).

Since $d(s, \mu)$ is DT and from the given condition, we get

\frac{1}{c(\sigma(s), n)} = \int d(\sigma(s), \mu)\, e(\mu, n)\, d\nu(\mu) = \int d(\sigma(s), \sigma(\mu))\, e(\sigma(\mu), n)\, d\nu(\mu) = \int d(s, \mu)\, e(\mu, n)\, d\nu(\mu) = \frac{1}{c(s, n)}.

This completes the proof.  □

Corollary 2. For a given prior density $\tau(\mu)$, each of the following two conditions on $n_1, n_2, \ldots, n_k$ is sufficient for the above theorem to apply:
(a) $\tau(\mu) = \prod_{i=1}^{k} c_i \exp\{n_i\Psi(\phi(\mu_i))\}\, h(\mu_i)$, where $h(\cdot)$ is a nonnegative function and $c_i$ is a normalizing constant;
(b) $\nu(\mu) = \prod_{i=1}^{k} \exp\{n_i\Psi(\phi(\mu_i))\}\, g(\mu)$, where $g(\cdot)$ is a permutation invariant function.

To illustrate how Corollary 2 can be applied, let us consider the special case of Poisson$(\mu_i)$ populations, $i = 1, \ldots, k$. If, a priori, $\mu_i \sim$ exponential$(\lambda_i)$, $i = 1, \ldots, k$, are independent, then $\tau(\mu)$ can be written as

\tau(\mu) = \prod_{i=1}^{k} c_i \exp\{n_i\Psi(\phi(\mu_i))\}\exp\{-(n_i + \lambda_i)\mu_i\},

where $\Psi(\phi(\mu_i)) = \mu_i$. Then, to satisfy condition (a) of Corollary 2, the sample sizes $n_i$, $i = 1, \ldots, k$, can be chosen in such a way that $n_i + \lambda_i = \delta$, $i = 1, \ldots, k$, for some $\delta > 0$.
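With this choice the posterior of $\mu_i$ is a gamma distribution with shape $n_i\bar{x}_i + 1$ and common rate $\delta$, so the population with the largest $n_i\bar{x}_i$ is stochastically largest a posteriori, in line with Lemma 3. A minimal simulation sketch of this (our own illustration; the totals $s_i = n_i\bar{x}_i$ and the value of $\delta$ are hypothetical):

import numpy as np

rng = np.random.default_rng(1)

def posterior_prob_largest(s, delta, draws=200_000):
    """Poisson(mu_i) samples with exponential(lambda_i) priors and n_i + lambda_i = delta:
    the posterior of mu_i is Gamma(shape=s_i + 1, rate=delta), where s_i = n_i*xbar_i.
    Estimate P{mu_i = max_j mu_j | data} by simulation."""
    s = np.asarray(s, dtype=float)
    samples = rng.gamma(shape=s + 1.0, scale=1.0 / delta, size=(draws, s.size))
    return np.bincount(np.argmax(samples, axis=1), minlength=s.size) / draws

# The largest s_i = n_i*xbar_i receives the largest posterior probability of being best,
# so the Bayes rule selects in terms of the largest n_i*xbar_i, as Lemma 3 asserts.
print(posterior_prob_largest(s=[18, 25, 22], delta=7.0))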

Acknowledgements

The authors are indebted to Professor G.G. Hamedani for reviewing an earlier version of this paper and providing comments that led to improvements in the paper.

References

Abughalous, M.M. and N.K. Bansal (1994), On the problem of selecting the best population in life testing models, Commun. Statist. Theory Methods 23, 1471-1481.
Abughalous, M.M., N.K. Bansal and G.G. Hamedani (1994), On the invariance minimaxity of the natural decision rule for selecting the normal population with the smallest variance, J. Appl. Statist. Sci. 1, 289-298.
Abughalous, M.M. and K.J. Miescke (1989), On selecting the largest success probability under unequal sample sizes, J. Statist. Plan. Inference 21, 53-68.
Berger, J.O. (1985), Statistical Decision Theory and Bayesian Analysis (Springer, New York, 2nd ed.).
Bickel, P.J. and K.A. Doksum (1977), Mathematical Statistics: Basic Ideas and Selected Topics (Holden-Day, San Francisco).
Billingsley, P. (1986), Probability and Measure (Wiley, New York, 2nd ed.).
Chung, K.L. (1974), A Course in Probability Theory (Academic Press, New York, 2nd ed.).
Eaton, M.L. (1967), Some optimum properties of ranking procedures, Ann. Math. Statist. 38, 124-137.
Gupta, S.S. and K.J. Miescke (1988), On the problem of finding the largest normal mean under heteroscedasticity, in: S.S. Gupta and J.O. Berger, eds., Statistical Decision Theory and Related Topics IV, Vol. 2 (Springer, Berlin, New York) 37-49.
Hollander, M., F. Proschan and J. Sethuraman (1977), Functions decreasing in transposition and their applications in ranking problems, Ann. Statist. 5, 722-733.
Lehmann, E.L. (1966), On a theorem of Bahadur and Goodman, Ann. Math. Statist. 37, 1-6.
Lehmann, E.L. (1986), Testing Statistical Hypotheses (Wiley, New York, 2nd ed.).
Morris, C.N. (1982), Natural exponential families with quadratic variance functions, Ann. Statist. 10, 65-80.
Morris, C.N. (1983), Natural exponential families with quadratic variance functions: statistical theory, Ann. Statist. 11, 515-529.