Statistics & Probability Letters 44 (1999) 309–318

On improving standard estimators via linear empirical Bayes methods

Francisco J. Samaniego*, Eric Vestrup¹

Division of Statistics, University of California, Davis, CA 95616-8705, USA

Received June 1998; received in revised form November 1998

* Corresponding author. E-mail address: [email protected] (F.J. Samaniego).
¹ Now with Dept. of Mathematics, DePaul University, Chicago, IL 60614, USA.

Abstract

Suppose one wishes to estimate the parameter $\theta$ in a current experiment when one also has in hand data from k past experiments satisfying empirical Bayes sampling assumptions. It has long been known that, for a variety of models, empirical Bayes estimators tend to outperform, asymptotically, standard estimators based on the current experiment alone. Much less is known about the superiority of empirical Bayes estimators over standard estimators when k is fixed; what is known in that regard is largely the product of Monte Carlo studies. Conditions are given here under which certain linear empirical Bayes estimators are superior to the standard estimator for arbitrary $k \geq 1$. © 1999 Elsevier Science B.V. All rights reserved.

Keywords: Empirical Bayes; Bayes risk; Linear decision rules; Parametric empirical Bayes problems

1. Introduction

The empirical Bayes (EB) framework, as introduced by Robbins (1955), presumes the existence of a sequence of independent but similar experiments. More precisely, it is assumed that the parameter values governing these experiments may vary, with that assumption quantified as

$$ \theta_1, \ldots, \theta_k, \theta_{k+1} \stackrel{\text{i.i.d.}}{\sim} G, \qquad (1) $$

where the prior distribution G is unknown. It is further assumed that

$$ X_i \mid \theta_i \sim F_{\theta_i}, \qquad i = 1, \ldots, k+1, \qquad (2) $$

where, typically, $F_{\theta}$ is assumed to be a member of some fixed parametric family. The independence of the random pairs $(X_i, \theta_i)$ and $(X_j, \theta_j)$ for all $i \neq j$ is tacitly assumed. Robbins' stated goal, and ours, is that of estimating the parameter value $\theta_{k+1}$ in the current experiment based on the current datum $X_{k+1}$ and, possibly, on data $X_1, \ldots, X_k$ from past experiments.


Following Robbins, we will restrict attention to squared error as a loss criterion, and employ the (expected) Bayes risk $r(G, d)$ of a rule d relative to the true prior G, that is,

$$ r(G, d) = E_{\underline\theta} E_{\underline X \mid \underline\theta}\,(d(\underline X) - \theta_{k+1})^2, \qquad (3) $$

as our measure of its performance. Since the "true prior" G is unknown, the Bayes rule $d_G$ for estimating $\theta_{k+1}$, that is, the rule which minimizes $r(G, \cdot)$ in Eq. (3), is unavailable as an estimator of $\theta_{k+1}$. Robbins noted that one could effectively approximate $d_G$ using data from past experiments. For example, when F is the Poisson model, $d_G$ may be expressed as

$$ d_G(x) = \frac{(x+1)\, p_G(x+1)}{p_G(x)}, \qquad (4) $$

where $p_G(\cdot)$ is the marginal probability mass function (pmf) of X, so that one might utilize the decision rule

$$ d_k(x_{k+1}) = \frac{(x_{k+1}+1)\, p_k(x_{k+1}+1)}{p_k(x_{k+1})} \qquad (5) $$

as an EB estimator of $\theta_{k+1}$, where $p_k(\cdot)$ is the empirical pmf based on $X_1, \ldots, X_k$. It is clear from Eqs. (4) and (5) that, for all integers $x \geq 0$, $d_k(x) \to d_G(x)$ as $k \to \infty$. Johns (1956) showed that $d_k$ was indeed "asymptotically optimal", i.e. that

$$ r(G, d_k) \to r(G, d_G) \qquad (6) $$

as $k \to \infty$, provided that the prior G has finite second moment.
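For illustration, a minimal Python sketch of the rule in Eq. (5), under an assumed setup that is not part of the paper (a Gamma(2, 1) prior and k = 500 past experiments), builds the empirical pmf $p_k$ from the past data and evaluates Robbins' rule at the current observation:

```python
import numpy as np
from collections import Counter

rng = np.random.default_rng(0)

# Hypothetical EB setup: theta_i ~ Gamma(2, 1), X_i | theta_i ~ Poisson(theta_i).
k = 500
thetas = rng.gamma(shape=2.0, scale=1.0, size=k + 1)
xs = rng.poisson(thetas)
past, x_current = xs[:k], int(xs[k])

# Empirical pmf p_k based on the k past observations.
counts = Counter(int(x) for x in past)
p_k = lambda x: counts.get(x, 0) / k

# Robbins' rule (Eq. (5)); the standard (MLE) estimate is x_current itself.
if p_k(x_current) > 0:
    d_k = (x_current + 1) * p_k(x_current + 1) / p_k(x_current)
else:
    d_k = float(x_current)
print("x_{k+1} =", x_current, " Robbins estimate:", round(d_k, 3), " MLE:", x_current)
```

With k this large the rule typically tracks the Bayes rule (4) closely; for small k the empirical pmf is noisy and the rule can behave erratically, which is the issue taken up below.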

Aside from demonstrations of the asymptotic optimality of EB rules in various settings, there has been rather little analytical work on the comparative performance of EB rules. One issue that has generated considerable interest, but no definitive results, is the comparison of various EB rules based on a fixed number of past experiments with a standard estimator (the MLE or UMVUE, e.g.) based on the current experiment alone. What is generally known is that most "reasonable" EB rules outperform the standard rule if k is sufficiently large. If $d^*$ represents the standard estimator, with $d_k$ representing an EB rule of interest, then the question we wish to shed light on here pertains to the nature of the set $K = \{k : r(G, d_k) \leq r(G, d^*)\}$. Let $k^*$ be the smallest member of the set K, i.e., let $k^*$ be the smallest number of past experiments needed for $d_k$ to dominate the standard rule. What is presently known about $k^*$ has been derived, primarily, from simulation studies. Maritz and Lwin (1989), e.g., compared the performance of seven EB rules in the problem of estimating a Poisson parameter $\theta_{51}$ in the current (51st) experiment when the true prior was modeled as a specific gamma distribution. They found that Robbins' estimator $d_{50}$ in Eq. (5) performed very poorly relative to the MLE $d^*(x_{51}) = x_{51}$, but that certain other EB rules, especially those involving some smoothing, performed better than the MLE. In that particular study, it is evident that, for the right EB estimator, $k^*$ is substantially smaller than 50. In other comparative simulation studies, Canavos (1973) and Bennett (1977) showed that, for certain classes of prior distributions, smooth EB estimators of exponential or Weibull failure rates were generally superior to the MLE in the (k+1)st experiment when k was quite small. To our knowledge, however, there have been no theoretical results which characterize or bound the threshold value $k^*$. Indeed, Maritz and Lwin (1989, p. 87) state: "Obtaining analytical results for $r_k(G, \mathrm{EB})$, where EB here stands for any empirical Bayes estimator ... seems virtually impossible, except as approximations for large k".

In this paper, we study the performance of a certain class of convex empirical Bayes estimators (CEBEs) of $\theta_{k+1}$, and demonstrate that, in the context studied, there is always a subclass of such estimators whose performance is superior to that of "the standard estimator" of $\theta_{k+1}$ based on the current experiment alone. While, in general, the best convex empirical Bayes estimator (BCEBE) depends formally on the first two moments of the true prior G, examples are given of parametric empirical Bayes problems in which the BCEBE can be computed explicitly.


In such problems, we are thus able to derive a most satisfying answer to the question: what is the value of the constant $k^*$? In Section 2, we identify a class of problems in which $k^* = 1$. Thus, even with only one past experiment, there are EB rules of a very specific form that dominate the standard rule based solely on the current experiment. In Section 3, we treat the general empirical Bayes problem, showing that, for arbitrary values of k, a certain class of EB rules provides uniform improvement over the standard rule.

2. The case of one past experiment

We proceed directly to our main result. The subscripts on the expectations below will be subsumed when clarity is not compromised thereby; variances will be denoted by V. A bold E represents expectation with respect to the joint distribution of the random vectors $\underline X$ and $\underline\theta$. The empirical Bayes framework in Eqs. (1) and (2) is broadened slightly in the theorem below: the model for the conditional distribution of X is permitted to vary from one experiment to the next.

Theorem 1. Let $(\theta_1, X_1)$ and $(\theta_2, X_2)$ be independent random pairs satisfying

$$ \theta_1, \theta_2 \stackrel{\text{i.i.d.}}{\sim} G \qquad (7) $$

and

$$ X_i \mid \theta_i \sim F^{(i)}_{\theta_i}, \qquad i = 1, 2, \qquad (8) $$

where G and the $F^{(i)}_{\theta_i}$ are distributions having finite second moments. If $X_i$ is an unbiased estimator of $\theta_i$ for $i = 1, 2$, then $r(G, cX_1 + (1-c)X_2) < r(G, X_2)$ for any constant c satisfying

$$ 0 < c < \frac{2\,EV(X_2 \mid \theta_2)}{EV(X_1 \mid \theta_1) + EV(X_2 \mid \theta_2) + 2V(\theta)}, \qquad (9) $$

where $\theta$ is a generic random variable having distribution G.

Proof. We may write

$$ r(c) = r(G, cX_1 + (1-c)X_2) = E(\theta_2 - cX_1 - (1-c)X_2)^2 \qquad (10) $$

$$ = E\big(c(\theta_2 - X_1) + (1-c)(\theta_2 - X_2)\big)^2 \qquad (11) $$

$$ = c^2 E(X_1 - \theta_2)^2 + (1-c)^2 E(X_2 - \theta_2)^2, \qquad (12) $$

since the cross product term obtained in expanding the quadratic in Eq. (11) vanishes on account of the unbiasedness of $X_2$. The Bayes risk in Eq. (12), being quadratic in c, is uniquely minimized by the positive value

$$ c^* = \frac{E(X_2 - \theta_2)^2}{E(X_1 - \theta_2)^2 + E(X_2 - \theta_2)^2}, \qquad (13) $$

and, in fact, $r(c) < r(0)$ for any $c \in (0, 2c^*)$. We complete the proof by showing that $2c^*$ may be rewritten as the right-hand side of Eq. (9). Clearly,

$$ E(X_2 - \theta_2)^2 = E_{\theta_2} E_{X_2 \mid \theta_2}(X_2 - \theta_2)^2 \qquad (14) $$

$$ = EV(X_2 \mid \theta_2), \qquad (15) $$


while, by the unbiasedness of $X_1$, we obtain

$$ E(X_1 - \theta_2)^2 = E_{\underline\theta} E_{\underline X \mid \underline\theta}(X_1 - \theta_1)^2 + E_{\underline\theta} E_{\underline X \mid \underline\theta}(\theta_1 - \theta_2)^2 \qquad (16) $$

$$ = EV(X_1 \mid \theta_1) + V(\theta_1 - \theta_2) \qquad (17) $$

$$ = EV(X_1 \mid \theta_1) + 2V(\theta). \qquad (18) $$

Substituting Eqs. (15) and (18) into Eq. (13) yields the desired expression in (9).

The inequality in Eq. (9) provides information about the mixing constant c in a readily interpretable form. It indicates, for example, that only in the case that $X_2$ is degenerate at $\theta_2$ is it impossible to improve upon $X_2$ as an estimator of $\theta_2$. Further, it indicates that the size of c, that is, the weight one would wish to place on the past datum $X_1$, depends on the variability of $X_1$, $X_2$ and $\theta$. One should place substantial weight on $X_1$ when (i) $\theta$ is not highly variable, and (ii) $X_2$ is much more variable than $X_1$. These observations, extracted quite easily from Eq. (9), agree with one's native intuition on this problem since, together, conditions (i) and (ii) imply that $\theta_1$ and $\theta_2$ are close, so that the past experiment does provide useful information about the current parameter, and that, under such a circumstance, a precise estimate of $\theta_1$ is more useful as an estimator of $\theta_2$ than an imprecise estimator of $\theta_2$. Note that since

$$ EV(X_i \mid \theta_i) = V(X_i) - V(\theta_i) \qquad \text{for } i = 1, 2, \qquad (19) $$

the constant $c^*$ in Eq. (13) may be rewritten as

$$ c^* = \frac{V(X_2) - V_G(\theta)}{V(X_1) + V(X_2)}. \qquad (20) $$

When $V_G(\theta) = 0$, that is, when G is degenerate at the (unknown) point $\theta_0$, and the model $F_\theta$ is the same in each experiment (as in Eq. (2)), then $c^*$ in Eq. (20) reduces to the familiar $c^* = 1/2$. When $V_G(\theta)$ is large, borrowing strength from the past is still beneficial, but the weight assigned to $X_1$ in the convex combination $cX_1 + (1-c)X_2$ should be suitably small. In many problems of interest, a rough upper bound for c may be obtained by examining plausible priors in some parametric class. It should be noted that a conservative choice of c, smaller than an approximated optimal value $c^*$, might be reasonable when one's intuition about the variability of $\theta$ is a bit fuzzy.

As it happens, there are situations in which the optimal EB rule of the form $cX_1 + (1-c)X_2$ may be identified explicitly. We give two examples of such situations below, both in the context of empirical Bayes estimation with parametric, but not completely specified, priors. See Morris (1983) for a detailed discussion of the parametric empirical Bayes approach.
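A quick Monte Carlo check of Theorem 1 can be sketched as follows; the distributional choices and parameter values below are hypothetical (a $N(0, \tau^2)$ prior with normal sampling models of different variances) and are ours, not the paper's. The script estimates the two Bayes risks for a c inside the interval in (9):

```python
import numpy as np

rng = np.random.default_rng(1)
n = 200_000
tau2, s1, s2 = 0.5, 1.0, 4.0   # hypothetical V(theta), E V(X1|theta1), E V(X2|theta2)

theta1 = rng.normal(0.0, np.sqrt(tau2), n)
theta2 = rng.normal(0.0, np.sqrt(tau2), n)
x1 = rng.normal(theta1, np.sqrt(s1))
x2 = rng.normal(theta2, np.sqrt(s2))

# Upper bound from Eq. (9); the risk-minimizing c* of Eq. (13) is half the bound.
bound = 2 * s2 / (s1 + s2 + 2 * tau2)
c = bound / 2

risk = lambda est: np.mean((est - theta2) ** 2)
print("r(G, c X1 + (1-c) X2) =", round(risk(c * x1 + (1 - c) * x2), 4))
print("r(G, X2)              =", round(risk(x2), 4))   # larger, as Theorem 1 predicts
```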

Example 1. Consider a "parametric empirical Bayes" treatment of Robbins' Poisson problem. Assume that, for $i = 1, 2$, $X_i \mid \theta_i \sim P(\theta_i)$ and $\theta_1, \theta_2 \stackrel{\text{i.i.d.}}{\sim} \Gamma(\alpha, 1)$, where $\Gamma(\alpha, 1)$ is the gamma distribution with density

$$ f(\theta \mid \alpha) = \frac{1}{\Gamma(\alpha)}\, \theta^{\alpha-1} e^{-\theta}\, I_{(0,\infty)}(\theta). $$

The posterior distribution of $\theta_2$, given $X_2 = x_2$, is the $\Gamma(\alpha + x_2, 1/2)$ distribution, so that the Bayes estimate of $\theta_2$, were $\alpha$ known, would be $d_G(x_2) = \tfrac{1}{2}(\alpha + x_2)$.

F.J. Samaniego, E. Vestrup / Statistics & Probability Letters 44 (1999) 309 – 318

313

In the parametric EB approach, one would seek to estimate $\alpha$ from the data. Given that the mean of the marginal distribution of X is $\alpha$, the estimator $\hat\alpha = (x_1 + x_2)/2$ seems reasonable, and yields the EB estimator

$$ d(x) = \tfrac{1}{4} x_1 + \tfrac{3}{4} x_2. $$

Interestingly, the EB estimator above is, in fact, a convex combination of $X_1$ and $X_2$. Does it outperform the standard estimator $d^*(x_2) = x_2$? From Eq. (9), we see that the convex combination $cX_1 + (1-c)X_2$ improves upon $X_2$ alone whenever

$$ 0 < c < \frac{2\,EV(X_2 \mid \theta_2)}{EV(X_1 \mid \theta_1) + EV(X_2 \mid \theta_2) + 2V_G(\theta)} = \frac{2E\theta_2}{E\theta_1 + E\theta_2 + 2V(\theta)} = \frac{2\alpha}{4\alpha} = \frac{1}{2}. $$

In particular, the optimal constant $c^*$ is equal to $\tfrac{1}{4}$. Thus, in this case, the parametric EB rule is not only better than the standard estimator $X_2$ but is, in fact, best in a class of linear EB rules that dominate $X_2$.
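A short simulation of the comparison in Example 1 might look as follows; the value $\alpha = 3$ is hypothetical, and the Bayes rule is included only as a benchmark, since $\alpha$ would be unknown in practice:

```python
import numpy as np

rng = np.random.default_rng(2)
alpha, n = 3.0, 500_000        # alpha is a hypothetical prior shape parameter

theta1 = rng.gamma(alpha, 1.0, n)   # Gamma(alpha, 1) prior
theta2 = rng.gamma(alpha, 1.0, n)
x1 = rng.poisson(theta1).astype(float)
x2 = rng.poisson(theta2).astype(float)

risk = lambda est: np.mean((est - theta2) ** 2)
print("parametric EB rule x1/4 + 3 x2/4 :", round(risk(0.25 * x1 + 0.75 * x2), 3))
print("standard rule (MLE) x2           :", round(risk(x2), 3))
print("Bayes rule (alpha + x2)/2        :", round(risk((alpha + x2) / 2), 3))
```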

Example 2. Consider two binomial experiments, with

$$ X_i \mid p_i \sim B(n_i, p_i), \qquad i = 1, 2, $$

so that $X_1/n_1$ and $X_2/n_2$ play the roles of $X_1$ and $X_2$ in the theorem. Suppose that the unknown prior G is assumed to belong to the subclass of Beta distributions with unknown mean $\mu \in (0, 1)$ having density function

$$ f(p \mid \mu) = \frac{\Gamma(K)}{\Gamma(\mu K)\,\Gamma((1-\mu)K)}\, p^{\mu K - 1} (1-p)^{(1-\mu)K - 1}\, I_{(0,1)}(p), $$

where K is a fixed, known positive constant.

Given the model above, the upper bound in inequality (9) reduces to

$$ \frac{2/n_2}{1/n_1 + 1/n_2 + 2/K}. $$

It thus follows that the EB estimator

$$ d_1(x_2) = c\,\frac{x_1}{n_1} + (1-c)\,\frac{x_2}{n_2} $$

is superior to the standard estimator $\hat p_2 = x_2/n_2$ for any c satisfying

$$ 0 < c < \frac{2 n_1 K}{n_1 K + n_2 K + 2 n_1 n_2}, $$

with the optimal choice of c equal to

$$ c^* = \frac{n_1 K}{n_1 K + n_2 K + 2 n_1 n_2}. $$

When K and $n_1$ are both considerably larger than $n_2$, we note that $c^* \approx 1$.
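For concreteness, with hypothetical values of $n_1$, $n_2$ and K (they are illustrative only), the admissible interval and the optimal weight of Example 2 can be computed directly:

```python
# Hypothetical sample sizes and Beta "precision" K for Example 2.
n1, n2, K = 20, 10, 30

denom = n1 * K + n2 * K + 2 * n1 * n2
upper = 2 * n1 * K / denom   # admissible values of c lie in (0, upper)
c_opt = n1 * K / denom       # the optimal weight c*

print(f"c may be taken in (0, {upper:.3f}); the best choice is c* = {c_opt:.3f}")
```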

3. The general case

Consider now a general empirical Bayes problem in which there are $k \geq 1$ past experiments. Theorem 1 implies the existence of linear EB rules which dominate the estimator $X_{k+1}$ of $\theta_{k+1}$; indeed, one can improve on $X_{k+1}$ by constructing appropriate linear combinations of $X_{k+1}$ and any single outcome of a past experiment. But one expects to do substantially better by employing a linear EB rule which exploits all the past data.


In this section, we obtain the general form of the convex combination $\sum_{i=1}^{k+1} c_i X_i$ with the smallest possible Bayes risk. Our restriction to convex estimators, that is, to linear estimators whose coefficients satisfy $\sum_{i=1}^{k+1} c_i = 1$, is motivated by the fact that, under that restriction, $\sum_{i=1}^{k+1} c_i X_i$ is an unbiased estimator of the mean of G. While the optimal coefficient vector $\mathbf{c}$ will, in general, depend on parameters of the prior G, we show by example that the best linear EB estimator can be obtained explicitly in certain problems. We begin with a derivation of a useful representation of the Bayes risk of a linear EB rule. We then provide formulae corresponding to the best rule in this class, and we discuss a special subclass of linear EB rules which dominate $X_{k+1}$. We close this section with an example based on the normal EB paradigm.

Theorem 2. Let $(X_1, \theta_1), \ldots, (X_{k+1}, \theta_{k+1})$ be independent real-valued random pairs satisfying
(i) $\theta_1, \ldots, \theta_{k+1} \stackrel{\text{i.i.d.}}{\sim} G$, where G has finite second moment;
(ii) $X_i \mid \theta_i \sim F^{(i)}_{\theta_i}$ for $i = 1, 2, \ldots, k+1$, where $F^{(i)}_{\theta_i}$ has finite second moment; and
(iii) $E(X_i \mid \theta_i) = \theta_i$ for $i = 1, 2, \ldots, k+1$.
Then the Bayes risk of the empirical Bayes rule $\hat\theta_{k+1} = \sum_{i=1}^{k+1} c_i X_i$, with $\sum_{i=1}^{k+1} c_i = 1$, is given by

$$ r\Big(G, \sum_{i=1}^{k+1} c_i X_i\Big) = \sum_{i=1}^{k} c_i^2\,\big(EV(X_i \mid \theta_i) + V(\theta)\big) + c_{k+1}^2\, EV(X_{k+1} \mid \theta_{k+1}) + (1 - c_{k+1})^2\, V(\theta), \qquad (21) $$

where $\theta$ is a generic random variable with distribution G.

Proof. Let $S_{k+1} = \{(c_1, \ldots, c_{k+1}) \in \mathbb{R}^{k+1} : \sum_{i=1}^{k+1} c_i = 1 \text{ and } c_1, \ldots, c_{k+1} \geq 0\}$. We may write, for $\mathbf{c} \in S_{k+1}$,

$$ r(\mathbf{c}) = r\Big(G, \sum_{i=1}^{k+1} c_i X_i\Big) = E\Big(\sum_{i=1}^{k+1} c_i X_i - \theta_{k+1}\Big)^2 = E\Big(\sum_{i=1}^{k+1} c_i (X_i - \theta_{k+1})\Big)^2 $$

$$ = \sum_{i=1}^{k+1} c_i^2\, E(X_i - \theta_{k+1})^2 + \sum_{i=1}^{k+1} \sum_{\substack{j=1 \\ j \neq i}}^{k+1} c_i c_j\, E[(X_i - \theta_{k+1})(X_j - \theta_{k+1})]. $$

Now,

$$ E(X_i - \theta_{k+1})^2 = E(X_i - \theta_i)^2 + E(\theta_i - \theta_{k+1})^2 + 2E(\theta_i - \theta_{k+1})(X_i - \theta_i) $$
$$ = EV(X_i \mid \theta_i) + V(\theta_i - \theta_{k+1}) + 2E_{\underline\theta}\,(\theta_i - \theta_{k+1})\, E_{X_i \mid \theta_i}(X_i - \theta_i) $$
$$ = \begin{cases} EV(X_i \mid \theta_i) + 2V(\theta), & \text{for } i = 1, \ldots, k, \\ EV(X_{k+1} \mid \theta_{k+1}), & \text{for } i = k+1. \end{cases} $$

Also, if $1 \leq i \neq j \leq k$,

$$ E(X_i - \theta_{k+1})(X_j - \theta_{k+1}) = E_{\underline\theta}\, E_{X_i \mid \theta_i} E_{X_j \mid \theta_j}(X_i - \theta_{k+1})(X_j - \theta_{k+1}) $$
$$ = E_{\underline\theta}\,(\theta_i - \theta_{k+1})(\theta_j - \theta_{k+1}) $$
$$ = E_{\theta_{k+1}}\big[E_{\theta_i}(\theta_i - \theta_{k+1})\, E_{\theta_j}(\theta_j - \theta_{k+1})\big] = E(\theta_{k+1} - E\theta_{k+1})^2 = V(\theta). $$


If $i \neq j$ and either $i = k+1$ or $j = k+1$, then $E(X_i - \theta_{k+1})(X_j - \theta_{k+1}) = 0$. Hence

$$ r(\mathbf{c}) = \sum_{i=1}^{k} c_i^2\,\big(EV(X_i \mid \theta_i) + 2V(\theta)\big) + c_{k+1}^2\, EV(X_{k+1} \mid \theta_{k+1}) + \sum_{i=1}^{k} \sum_{\substack{j=1 \\ j \neq i}}^{k} c_i c_j\, V(\theta) $$

$$ = \sum_{i=1}^{k} c_i^2\,\big(EV(X_i \mid \theta_i) + 2V(\theta)\big) + c_{k+1}^2\, EV(X_{k+1} \mid \theta_{k+1}) + \Big((1 - c_{k+1})^2 - \sum_{i=1}^{k} c_i^2\Big) V(\theta) $$

$$ = \sum_{i=1}^{k} c_i^2\,\big(EV(X_i \mid \theta_i) + V(\theta)\big) + c_{k+1}^2\, EV(X_{k+1} \mid \theta_{k+1}) + (1 - c_{k+1})^2\, V(\theta). $$
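The representation (21) is easy to evaluate numerically. The sketch below (the function and variable names are ours, not the paper's, and the inputs are hypothetical) computes $r(G, \sum c_i X_i)$ from the conditional-variance terms $EV(X_i \mid \theta_i)$ and the prior variance $V(\theta)$:

```python
import numpy as np

def bayes_risk(c, ev_cond, v_theta):
    """Eq. (21): c has length k+1 and sums to 1; ev_cond[i] = E V(X_i | theta_i)."""
    c, ev_cond = np.asarray(c, float), np.asarray(ev_cond, float)
    assert np.isclose(c.sum(), 1.0)
    past = np.sum(c[:-1] ** 2 * (ev_cond[:-1] + v_theta))
    current = c[-1] ** 2 * ev_cond[-1] + (1.0 - c[-1]) ** 2 * v_theta
    return past + current

# Sanity check with hypothetical inputs (k = 2 past experiments plus the current one):
# putting all weight on X_{k+1} recovers r(G, X_{k+1}) = E V(X_{k+1} | theta_{k+1}).
ev = [2.0, 3.0, 1.5]
print(bayes_risk([0.0, 0.0, 1.0], ev, v_theta=1.0))   # 1.5
print(bayes_risk([0.2, 0.2, 0.6], ev, v_theta=1.0))   # 0.98, i.e. an improvement
```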

In our next result, we identify the precise linear combination of the $X$s that minimizes the Bayes risk $r(\mathbf{c})$.

Theorem 3. Assume that conditions (i)–(iii) of Theorem 2 hold. For $i = 1, \ldots, k$, let $a_i = EV(X_i \mid \theta_i) + V(\theta)$; let $a_{k+1} = EV(X_{k+1} \mid \theta_{k+1})$; and let $V = V(\theta)$. Then the value of $\mathbf{c} \in S_{k+1}$ that minimizes $r(\mathbf{c}) \equiv r(G, \sum_{i=1}^{k+1} c_i X_i)$ is $\mathbf{c}^* = (c_1^*, \ldots, c_{k+1}^*)$ such that

$$ c_i^* = \frac{a_{k+1}}{a_i\big[1 + (a_{k+1} + V) \sum_{j=1}^{k} \tfrac{1}{a_j}\big]}, \qquad i = 1, \ldots, k, \qquad (22) $$

$$ c_{k+1}^* = \frac{1 + V \sum_{j=1}^{k} \tfrac{1}{a_j}}{1 + (a_{k+1} + V) \sum_{j=1}^{k} \tfrac{1}{a_j}}. \qquad (23) $$

Proof. We will minimize $r(\mathbf{c})$ subject to $\sum_{i=1}^{k+1} c_i = 1$. Let the Lagrange multiplier be denoted by $\lambda \in \mathbb{R}$. We seek solutions to the system of equations

$$ \nabla r(\mathbf{c}) = \lambda\, \nabla(c_1 + c_2 + \cdots + c_{k+1}), \qquad (24) $$

$$ c_1 + c_2 + \cdots + c_{k+1} = 1. \qquad (25) $$

Eqs. (24) and (25) reduce to

$$ 2 c_i a_i = \lambda, \qquad i = 1, \ldots, k, \qquad (26) $$

$$ 2\big[c_{k+1}(a_{k+1} + V) - V\big] = \lambda, \qquad (27) $$

$$ c_1 + \cdots + c_{k+1} = 1; \qquad (28) $$

we may rewrite Eqs. (26) and (27) as

$$ c_i = \frac{\lambda}{2 a_i}, \qquad i = 1, \ldots, k, \qquad (29) $$

$$ c_{k+1} = \frac{\lambda + 2V}{2(a_{k+1} + V)}. \qquad (30) $$

Utilizing Eqs. (29) and (30) in Eq. (28), we have

$$ \frac{\lambda}{2} \sum_{j=1}^{k} \frac{1}{a_j} + \frac{\lambda + 2V}{2(a_{k+1} + V)} = 1, $$


from which we obtain

$$ \lambda = \frac{2 a_{k+1}}{1 + (a_{k+1} + V) \sum_{j=1}^{k} (1/a_j)}. \qquad (31) $$

Using Eq. (31), the unique solutions to the system in Eqs. (24) and (25) are thus identified as

$$ c_i^* = \frac{\lambda}{2 a_i} = \frac{a_{k+1}}{a_i\big[1 + (a_{k+1} + V) \sum_{j=1}^{k} \tfrac{1}{a_j}\big]}, \qquad i = 1, \ldots, k, $$

$$ c_{k+1}^* = \frac{\lambda + 2V}{2(a_{k+1} + V)} = \frac{1 + V \sum_{j=1}^{k} \tfrac{1}{a_j}}{1 + (a_{k+1} + V) \sum_{j=1}^{k} \tfrac{1}{a_j}}. $$

It is easy to demonstrate that $\mathbf{c}^*$ corresponds to an absolute minimum of $r(\mathbf{c})$ among $\mathbf{c} \in S_{k+1}$.
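A direct implementation of Eqs. (22) and (23) is straightforward. The sketch below uses our own names and hypothetical inputs; the resulting weight vector can be checked against the risk formula (21):

```python
import numpy as np

def optimal_weights(ev_cond, v_theta):
    """Eqs. (22)-(23): ev_cond[i] = E V(X_i | theta_i), i = 1, ..., k+1; returns c*."""
    ev_cond = np.asarray(ev_cond, float)
    a = ev_cond[:-1] + v_theta             # a_i, i = 1, ..., k
    a_cur = ev_cond[-1]                    # a_{k+1}
    s = np.sum(1.0 / a)
    denom = 1.0 + (a_cur + v_theta) * s
    c = np.empty(len(ev_cond))
    c[:-1] = a_cur / (a * denom)           # Eq. (22)
    c[-1] = (1.0 + v_theta * s) / denom    # Eq. (23)
    return c

c_star = optimal_weights([2.0, 3.0, 1.5], v_theta=1.0)   # hypothetical inputs
print(c_star, c_star.sum())   # roughly [0.203, 0.153, 0.644], summing to 1
```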

In the form above, our result is an existence theorem which demonstrates that there always exists a collection of estimators in the class of convex combinations of $X_1, \ldots, X_{k+1}$ which will dominate $X_{k+1}$ alone as an estimator of $\theta_{k+1}$. From a practical point of view, it is clear that some guidance on the selection of the mixing constants $c_1, \ldots, c_{k+1}$, or on the identification of the optimal vector $\mathbf{c}^*$, would be of use. The following result shows that the estimator $X_{k+1}$ of $\theta_{k+1}$ is dominated by the class of linear EB estimators which place sufficiently large weight on the observation $X_{k+1}$ while reserving some positive weight for observations from past experiments.

Theorem 4. Assume that conditions (i)–(iii) of Theorem 2 hold. For $i = 1, \ldots, k$, let $a_i = EV(X_i \mid \theta_i) + V(\theta)$; let $a_{k+1} = EV(X_{k+1} \mid \theta_{k+1})$; and let $V = V(\theta)$. Let $a^* = \max\{a_1, \ldots, a_k\}$. If $c_{k+1} \in \big(\tfrac{a^* + V - a_{k+1}}{a^* + V + a_{k+1}},\, 1\big)$, then

$$ r((c_1, \ldots, c_{k+1})) < r(G, X_{k+1}). \qquad (32) $$

Proof. The inequality $r(\mathbf{c}) < r(G, X_{k+1})$ may be written as

$$ \sum_{i=1}^{k} c_i^2 a_i + c_{k+1}^2 a_{k+1} + (1 - c_{k+1})^2 V < a_{k+1}. \qquad (33) $$

Since $c_i \geq 0$ for each i, we have $\sum_{i=1}^{k} c_i^2 \leq \big(\sum_{i=1}^{k} c_i\big)^2 = (1 - c_{k+1})^2$. Thus

$$ (1 - c_{k+1})^2 a^* + c_{k+1}^2 a_{k+1} + (1 - c_{k+1})^2 V < a_{k+1} \qquad (34) $$

will imply Eq. (33). Now Eq. (34) is equivalent to

$$ c_{k+1}^2 (a^* + a_{k+1} + V) - 2 c_{k+1} (a^* + V) + a^* + V - a_{k+1} < 0. \qquad (35) $$

Since $a^* + a_{k+1} + V > 0$, the inequality (35) is satisfied for $c_{k+1}$ between the two roots of the quadratic equation in (35). It is easily verified that these roots are

$$ \frac{a^* + V \pm a_{k+1}}{a^* + V + a_{k+1}}. $$

Thus, the inequality (32) holds for all $\mathbf{c}$ for which $c_{k+1}$ satisfies

$$ 1 - \frac{2 a_{k+1}}{a^* + V + a_{k+1}} < c_{k+1} < 1. $$
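The interval in Theorem 4 is easy to compute once $a^*$, $a_{k+1}$ and V are available or approximated. The sketch below uses hypothetical values (the same illustrative inputs as above) and verifies the claimed domination through the risk formula (21):

```python
import numpy as np

# Hypothetical inputs: E V(X_i | theta_i) for k = 2 past experiments and the current one.
ev, v_theta = np.array([2.0, 3.0, 1.5]), 1.0
a = ev[:-1] + v_theta                # a_1, ..., a_k
a_cur, a_max = ev[-1], a.max()       # a_{k+1} and a*

low = (a_max + v_theta - a_cur) / (a_max + v_theta + a_cur)    # lower endpoint in Theorem 4
c_cur = (low + 1.0) / 2.0                                      # any value in (low, 1) will do
c = np.append(np.full(len(a), (1.0 - c_cur) / len(a)), c_cur)  # spread the rest equally

# Risk from Eq. (21); it falls below r(G, X_{k+1}) = a_{k+1}, as Theorem 4 asserts.
risk = np.sum(c[:-1] ** 2 * a) + c[-1] ** 2 * a_cur + (1.0 - c[-1]) ** 2 * v_theta
print(f"c_k+1 = {c_cur:.3f} in ({low:.3f}, 1);  r(c) = {risk:.3f} < {a_cur}")
```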

Example 3. The classic problem of estimating the mean of a multivariate normal distribution $N(\underline\theta, I)$ has been examined from an empirical Bayes viewpoint by Efron and Morris (1973). Here, we will consider a related problem, that of estimating a single component $\theta_{k+1}$ of a $(k+1)$-dimensional mean vector, treating the information on the other components of $\underline\theta$ as data from k past experiments. Specifically, suppose that the $(k+1)$ random pairs $(X_i, \theta_i)$ are independent, with

$$ \theta_1, \theta_2, \ldots, \theta_k, \theta_{k+1} \stackrel{\text{i.i.d.}}{\sim} N(\mu_0, 1), $$

$$ X_i \mid \theta_i \sim N(\theta_i, 1), \qquad \text{for } i = 1, \ldots, k+1. $$

Using the notation of Theorem 3, we have $V = V(\theta) = 1$ and

$$ a_i = EV(X_i \mid \theta_i) + V(\theta) = E(1) + 1 = 2, \qquad \text{for } i = 1, \ldots, k, $$

$$ a_{k+1} = EV(X_{k+1} \mid \theta_{k+1}) = 1. $$

It follows that the best convex EB estimator (BCEBE) of $\theta_{k+1}$ is given by

$$ \hat\theta_{k+1} = \sum_{i=1}^{k+1} c_i^* X_i, $$

where

$$ c_i^* = \frac{1}{2\big[1 + 2\sum_{1}^{k}\big(\tfrac{1}{2}\big)\big]} = \frac{1}{2k+2}, \qquad i = 1, \ldots, k, $$

$$ c_{k+1}^* = \frac{1 + 1\cdot\sum_{1}^{k}\big(\tfrac{1}{2}\big)}{1 + 2\sum_{1}^{k}\big(\tfrac{1}{2}\big)} = \frac{k+2}{2k+2}. $$

Thus, the BCEBE in this problem is the estimator

$$ \hat\theta_{k+1} = \frac{k}{2k+2}\, \bar X_k + \frac{k+2}{2k+2}\, X_{k+1}, $$

where

$$ \bar X_k = \frac{1}{k} \sum_{i=1}^{k} X_i. $$

In the case of one past experiment, that is, when $k = 1$, the best linear EB rule places weight 3/4 on the current observation and weight 1/4 on the past observation.
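A Monte Carlo check of Example 3 might look like the sketch below; the prior mean $\mu_0$ and the value of k are hypothetical, and, as the form of the BCEBE shows, $\mu_0$ need not be known to apply the estimator:

```python
import numpy as np

rng = np.random.default_rng(3)
k, n, mu0 = 5, 200_000, 2.0     # k past experiments; mu0 is a hypothetical prior mean

theta = rng.normal(mu0, 1.0, size=(n, k + 1))   # theta_i ~ N(mu0, 1)
x = rng.normal(theta, 1.0)                      # X_i | theta_i ~ N(theta_i, 1)

xbar_k = x[:, :k].mean(axis=1)
bcebe = (k / (2 * k + 2)) * xbar_k + ((k + 2) / (2 * k + 2)) * x[:, k]

risk = lambda est: np.mean((est - theta[:, k]) ** 2)
print("r(G, BCEBE)   =", round(risk(bcebe), 4))    # theory: (k+2)/(2k+2) = 7/12
print("r(G, X_{k+1}) =", round(risk(x[:, k]), 4))  # theory: 1
```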

4. Discussion

The primary domain of application of these results is to problems in which the standard estimator in the current experiment is unbiased. This tends to be the case in many common problems involving exponential families, and includes the problems of estimating a binomial proportion or the mean of a Poisson, normal or exponential distribution. It must be borne in mind, of course, that the empirical Bayes assumptions are essential and are stringent enough to eliminate many settings in which combining data from disparate sources might be contemplated. As is well recognized, Robbins' empirical Bayes approach is a frequentist theory of inference which utilizes no prior modeling regarding unknown parameters. Bayes empirical Bayes methods have been treated by, among others, Deely and Lindley (1981) and Walter and Hamedani (1991). For an analysis complementary to the present one, which compares Bayes estimators in an empirical Bayes setting, see Samaniego and Neath (1996).


References

Bennett, G.K., 1977. Basic concepts of empirical Bayes methods with some results for the Weibull distribution. In: Tsokos, C., Shimi, I.N. (Eds.), The Theory and Applications of Reliability, vol. II. Academic Press, New York, pp. 181–202.
Canavos, G.C., 1973. An empirical Bayes approach for the Poisson life distribution. IEEE Trans. Reliab. R-22, 91–96.
Deely, J., Lindley, D., 1981. Bayes empirical Bayes. J. Amer. Statist. Assoc. 76, 833–841.
Efron, B., Morris, C., 1973. Stein's estimation rule and its competitors – an empirical Bayes approach. J. Amer. Statist. Assoc. 68, 117–130.
Johns, M.V. Jr., 1956. Contributions to the theory of non-parametric empirical Bayes procedures in statistics. Unpublished Ph.D. Dissertation, Columbia University.
Maritz, J.S., Lwin, T., 1989. Empirical Bayes Methods. Chapman and Hall, London.
Morris, C., 1983. Parametric empirical Bayes inference: theory and applications. J. Amer. Statist. Assoc. 78, 47–65 (with discussion).
Robbins, H., 1955. An empirical Bayes approach to statistics. In: Proc. 3rd Berkeley Symp. on Mathematical Statistics and Probability. University of California Press, Berkeley.
Samaniego, F.J., Neath, A.A., 1996. How to be a better Bayesian. J. Amer. Statist. Assoc. 91, 733–742.
Walter, G., Hamedani, G., 1991. Bayes empirical Bayes estimation for natural exponential families with quadratic variance functions. Ann. Statist. 19, 1191–1224.