Journal of Statistical Planning and Inference 24 (1990) 81-86
North-Holland

ESTIMATION OF PRIOR DISTRIBUTION AND EMPIRICAL BAYES ESTIMATION IN A NONEXPONENTIAL FAMILY

B. PRASAD
Directorate of Economics & Statistics, Telhan Bhavan, Hyderabad, India

Radhey S. SINGH
Department of Mathematics & Statistics, University of Guelph, Guelph, Ontario, Canada

Received 3 February 1987; revised manuscript received 14 November 1988
Recommended by V.P. Godambe
Abstract: Empirical Bayes squared error loss estimation for a nonexponential family with densities $f(x \mid \theta) = \exp(-(x-\theta))\, I(x > \theta)$ for $\theta \in \Theta$, a subset of the real line, is considered. An almost sure consistent estimator of the prior distribution $G$, whatever it may be, on $\Theta$, is proposed. Based on this estimator an empirical Bayes estimator consistent for the minimum risk optimal estimator is exhibited. Asymptotic optimality of this estimator is further established.

AMS Subject Classifications: Primary 62C12; secondary 62F05.

Key words and phrases: Prior distribution; empirical Bayes; square error loss; consistency; asymptotic optimality.
1. Introduction

The empirical Bayes (EB) approach to statistical problems was pioneered by Robbins (1955). It was later studied in detail in the context of various statistical estimation and/or hypothesis testing problems by a number of authors including Robbins (1963, 1964), Johns and Van Ryzin (1971, 1972), Fox (1978), Susarla and O'Bryan (1975) and Singh (1977, 1979, 1985). For example, Johns and Van Ryzin (1972), Singh (1977, 1979) and Lin (1975) studied the EB approach to square error loss estimation (SELE) in certain exponential families of probability densities, and Fox (1978) studied EB SELE in some nonexponential families. The approach taken in these works is to estimate the Bayes estimator directly. In this paper EB SELE in a useful nonexponential family is considered, and the approach adopted is to use the Bayes estimator w.r.t. an almost sure consistent estimator of the prior, whatever it may be. The advantage of this approach is that it also provides, separately, a consistent estimator of the prior distribution, which is not available by the direct approach.

0378-3758/90/$3.50 © 1990, Elsevier Science Publishers B.V. (North-Holland)
B. Prasad, R.S. Singh / Estimation of prior distribution
In the EB context, the component problem considered here is the SELE of $\theta$ based on an observation from the density $f(x \mid \theta) = \exp(-(x-\theta))\, I(x > \theta)$, where the parameter $\theta$ is a random variable with an unknown prior distribution $G$ on $\Theta$, a subset of the real line. Based on observations $X_1, \ldots, X_n$ from the past $n$ independent repetitions of the component problem, where $X_i \sim f(\cdot \mid \theta_i)$ and the $\theta_i$'s are unobservable random variables i.i.d. according to $G$, an estimator of $G$ that is consistent with probability one is presented. This estimator of $G$ and the observation $X$ from the present problem are then used to exhibit an EB estimator, which is asymptotically optimal in the sense of Robbins (1955).
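The sampling scheme just described can be sketched in a few lines of Python. This is only an illustration under stated assumptions: the prior $G$ is taken to be Uniform$(-a, a)$ with $a = 1$, which is a choice made here and not in the paper, and the name `sample_component` is hypothetical. It uses the fact that, given $\theta$, $X - \theta$ is a standard exponential variable.

```python
import random

def sample_component(rng, a=1.0):
    """Draw one (theta, X) pair: theta ~ G, then X | theta ~ f(. | theta)."""
    theta = rng.uniform(-a, a)          # assumed prior G: Uniform(-a, a), illustration only
    x = theta + rng.expovariate(1.0)    # X - theta is standard exponential given theta
    return theta, x

rng = random.Random(0)
pairs = [sample_component(rng) for _ in range(5)]
past_x = [x for _, x in pairs]   # only the X_i are observed; the theta_i stay hidden
print(past_x)
```

In the EB setting only `past_x` would be available to the statistician; the `theta` values are retained here solely so that a simulation can be checked against the truth.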
2. The probability model and a consistent estimator of the prior distribution function

As mentioned in Section 1, the random observation $X$ of interest in the component problem is distributed according to the conditional probability density function $f(x \mid \theta) = \exp(-(x-\theta))\, I(x > \theta)$, where $\theta$ is an unobservable random variable with an unknown prior distribution function $G$ on $\Theta \subset (-\infty, \infty)$. The conditional cumulative distribution function (c.d.f.) of $X$ given $\theta$ at a point $t$ is therefore

$$F_X(t \mid \theta) = F(t \mid \theta) = P[X \le t \mid \theta] = \int_{-\infty}^{t} f(x \mid \theta)\, dx = I(t > \theta) - f(t \mid \theta).$$

Since $G$ is the unconditional c.d.f. of $\theta$, the marginal p.d.f. of $X$ at $x$ is given by $f(x) = \int f(x \mid \theta)\, dG(\theta)$ and the marginal c.d.f. of $X$ at $x$ is given by

$$F(x) = \int F(x \mid \theta)\, dG(\theta) = \int [I(x > \theta) - f(x \mid \theta)]\, dG(\theta) = G(x) - f(x). \tag{2.1}$$

Thus we can write the c.d.f. $G$ of $\theta$ at a point $x$ as

$$G(x) = F(x) + f(x), \tag{2.2}$$
which can be estimated by estimating $F$ and $f$.

Let $X_1, X_2, \ldots, X_n$ be observations obtained from $n$ independent past experiences of the component problem in our empirical Bayes framework, where $X_i \mid \theta_i$ has p.d.f. $f(\cdot \mid \theta_i)$ and $\theta_1, \ldots, \theta_n$ are i.i.d. random variables with common c.d.f. $G$. Then the empirical distribution function of $X_1, \ldots, X_n$ at a point $x$ is

$$F_n(x) = n^{-1} \sum_{j=1}^{n} I(X_j \le x),$$

and, since $X_1, \ldots, X_n$ are i.i.d. according to the common c.d.f. $F$, by the Glivenko-Cantelli theorem, $\sup_x |F_n(x) - F(x)| \to 0$ as $n \to \infty$ w.p. 1. Further, since by the definition of a probability density function,

$$f(x) = \lim_{h \to 0} \frac{F(x+h) - F(x-h)}{2h},$$

we estimate $f(x)$ by

$$f_n(x) = \frac{F_n(x+h_n) - F_n(x-h_n)}{2h_n} \tag{2.3}$$

where $0 < h_n \to 0$ as $n \to \infty$.

Theorem 1. The estimator $G_n$ defined by

$$G_n(x) = F_n(x) + f_n(x) \tag{2.4}$$
with $h_n \sim n^{-[\ldots]}$ is a strongly consistent estimator of the prior distribution $G(x)$ for almost all $x$, whatever $G$ may be on $\Theta$.
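A minimal Python sketch of the estimator in (2.3) and (2.4) follows. Several ingredients are assumptions made for illustration and are not taken from the paper: the prior is Uniform$(-1, 1)$ so that the true $G$ is known, the bandwidth is set to $n^{-1/5}$, and the name `make_prior_estimator` is hypothetical.

```python
import bisect
import random

def make_prior_estimator(xs, h):
    """Return G_n(x) = F_n(x) + f_n(x) built from past observations xs, as in (2.3)-(2.4)."""
    xs = sorted(xs)
    n = len(xs)

    def F_n(x):
        # empirical c.d.f.: fraction of X_j <= x
        return bisect.bisect_right(xs, x) / n

    def G_n(x):
        f_n = (F_n(x + h) - F_n(x - h)) / (2.0 * h)   # difference-quotient density estimate
        return F_n(x) + f_n

    return G_n

rng = random.Random(1)
n = 20000
# assumed prior for the simulation: theta ~ Uniform(-1, 1); X = theta + Exp(1)
xs = [rng.uniform(-1.0, 1.0) + rng.expovariate(1.0) for _ in range(n)]
G_n = make_prior_estimator(xs, h=n ** -0.2)   # bandwidth rate chosen for illustration only
for x in (-0.5, 0.0, 0.5):
    true_G = (x + 1.0) / 2.0                  # c.d.f. of Uniform(-1, 1)
    print(f"x={x:+.1f}  G_n={G_n(x):.3f}  G={true_G:.3f}")
```

Note that $G_n$ need not be monotone or confined to $[0, 1]$ in finite samples, since $f_n$ is added to $F_n$ without any constraint; the sketch makes no attempt to correct this.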
3. Proposed empirical Bayes estimator

Throughout the remainder of this paper we consider a square error loss function for EB estimation of $\theta$ for the model under our study and restrict our attention to the parameter space $\Theta$ which is a subset of $(-a, a)$ for an $0 < a < \infty$. The Bayes optimal estimator of $\theta$ is

$$d_G(X) = E(\theta \mid X) = \frac{\int \theta f(X \mid \theta)\, dG(\theta)}{\int f(X \mid \theta)\, dG(\theta)} \tag{3.1}$$

which is not available to us for use since $G$ is unknown. Since $G_n$ consistently estimates $G$, a natural EB estimator of $\theta$ in the $(n+1)$st component problem, evaluated at $X_{n+1} = X$, would be

$$Q_n(X) = \frac{\int \theta f(X \mid \theta)\, dG_n(\theta)}{\int f(X \mid \theta)\, dG_n(\theta)}. \tag{3.2}$$

However, since the parameter space under study is in $(-a, a)$ and hence $-a < E(\theta \mid X) = d_G(X) < a$ for all values of $X$, we propose to restrict (3.2) to

$$d_n(X) = \begin{cases} -a, & \text{if } Q_n(X) \le -a, \\ Q_n(X), & \text{if } -a < Q_n(X) < a, \\ a, & \text{if } Q_n(X) \ge a. \end{cases} \tag{3.3}$$

We will now express $d_n(X)$ explicitly in terms of $X_1, X_2, \ldots, X_n$ and $X$ for computational purposes. Notice that
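$$\int f(x \mid \theta)\, dF_n(\theta) = n^{-1} \sum_{j=1}^{n} f(x \mid X_j) = \exp(-x)\, n^{-1} \sum_{j=1}^{n} \exp(X_j)\, I(X_j < x),$$

$$\int f(x \mid \theta)\, dF_n(\theta + h_n) = \int f(x \mid (u - h_n))\, dF_n(u) = n^{-1} \sum_{j=1}^{n} f(x \mid (X_j - h_n)) = \exp(-x - h_n)\, n^{-1} \sum_{j=1}^{n} \exp(X_j)\, I(X_j < x + h_n).$$

Similarly,

$$\int f(x \mid \theta)\, dF_n(\theta - h_n) = \int f(x \mid (u + h_n))\, dF_n(u) = \exp(-x + h_n)\, n^{-1} \sum_{j=1}^{n} \exp(X_j)\, I(X_j < x - h_n).$$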
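Thus we have from (2.4), (2.3) and the above expressions,

$$\int f(x \mid \theta)\, dG_n(\theta) = \int f(x \mid \theta)\, d(F_n(\theta) + f_n(\theta)) = \exp(-x)\, n^{-1} \sum_{j=1}^{n} A_j(x)$$

where

$$A_j(x) = \exp(X_j) \left[ I(X_j < x) + (2h_n)^{-1} \left\{ \exp(-h_n)\, I(X_j < x + h_n) - \exp(h_n)\, I(X_j < x - h_n) \right\} \right]. \tag{3.4}$$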
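As a numerical sanity check on (3.4), the sketch below evaluates $\int f(x \mid \theta)\, dG_n(\theta)$ in two ways: term by term from the three integrals against $F_n(\theta)$, $F_n(\theta + h_n)$ and $F_n(\theta - h_n)$, and through the closed form $\exp(-x)\, n^{-1} \sum_j A_j(x)$. The function names, the simulated Uniform$(-1, 1)$ prior, and the values of `x` and `h` are assumptions chosen only for illustration.

```python
import math
import random

def f_cond(x, theta):
    """Conditional density f(x | theta) = exp(-(x - theta)) * I(x > theta)."""
    return math.exp(-(x - theta)) if x > theta else 0.0

def denom_direct(x, xs, h):
    """Integrate f(x | .) against dG_n term by term, using the shifted empirical c.d.f.s."""
    n = len(xs)
    plain = sum(f_cond(x, xj) for xj in xs) / n
    up = sum(f_cond(x, xj - h) for xj in xs) / n      # integral w.r.t. dF_n(theta + h)
    down = sum(f_cond(x, xj + h) for xj in xs) / n    # integral w.r.t. dF_n(theta - h)
    return plain + (up - down) / (2.0 * h)

def A(x, xj, h):
    """A_j(x) as in (3.4)."""
    kernel = (math.exp(-h) * (xj < x + h) - math.exp(h) * (xj < x - h)) / (2.0 * h)
    return math.exp(xj) * ((xj < x) + kernel)

def denom_closed(x, xs, h):
    """Closed form exp(-x) * n^{-1} * sum_j A_j(x)."""
    return math.exp(-x) * sum(A(x, xj, h) for xj in xs) / len(xs)

rng = random.Random(2)
# assumed prior for the simulation: theta ~ Uniform(-1, 1); X = theta + Exp(1)
xs = [rng.uniform(-1.0, 1.0) + rng.expovariate(1.0) for _ in range(200)]
x, h = 0.3, 0.25
print(denom_direct(x, xs, h), denom_closed(x, xs, h))
```

The two printed values should agree up to floating-point rounding, since the closed form is only an algebraic rearrangement of the three sums.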
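Similarly, using the above techniques, if we write the expressions for $\int \theta f(x \mid \theta)\, dF_n(\theta)$, $\int \theta f(x \mid \theta)\, dF_n(\theta + h_n)$ and $\int \theta f(x \mid \theta)\, dF_n(\theta - h_n)$ in terms of $X_j$, we get from (2.3) and (2.4),

$$\int \theta f(x \mid \theta)\, dG_n(\theta) = \int \theta f(x \mid \theta)\, d(F_n(\theta) + f_n(\theta)) = \exp(-x)\, n^{-1} \sum_{j=1}^{n} B_j(x)$$

where

$$B_j(x) = \exp(X_j) \left[ X_j\, I(X_j < x) + (2h_n)^{-1} \left\{ (X_j - h_n) \exp(-h_n)\, I(X_j < x + h_n) - (X_j + h_n) \exp(h_n)\, I(X_j < x - h_n) \right\} \right].$$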
Hence, in our proposed EB estimator $d_n$ of (3.3),

$$Q_n(x) = \frac{\sum_{j=1}^{n} B_j(x)}{\sum_{j=1}^{n} A_j(x)}.$$

4. Consistency and asymptotic optimality of EB estimator
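In this section we show that $d_n$ is not only a consistent estimator of $d_G$ but is also asymptotically optimal. As we have indicated earlier, $F_n(x) \to F(x)$ with probability one uniformly in $x$, and $f_n(x) \to f(x)$ with probability one for almost all $x$. Hence $G_n(x) = F_n(x) + f_n(x)$ converges to $G(x) = F(x) + f(x)$ with probability one for almost all $x$. Further, since for each $x$, $f(x \mid \theta)$ and $\theta f(x \mid \theta)$ are continuous and bounded in $\theta$, by the Helly-Bray theorem, as $n \to \infty$,

$$\int f(x \mid \theta)\, dG_n(\theta) \to \int f(x \mid \theta)\, dG(\theta) = f(x) \quad \text{in prob.},$$

and

$$\int \theta f(x \mid \theta)\, dG_n(\theta) \to \int \theta f(x \mid \theta)\, dG(\theta) \quad \text{in prob.}$$

Hence,

$$Q_n(x) = \frac{\int \theta f(x \mid \theta)\, dG_n(\theta)}{\int f(x \mid \theta)\, dG_n(\theta)} \to \frac{\int \theta f(x \mid \theta)\, dG(\theta)}{\int f(x \mid \theta)\, dG(\theta)} = d_G(x) \tag{4.1}$$

in probability for almost all $x$ where $f(x) > 0$. Now consider $(d_n(X) - d_G(X))^2$. From the definition of $d_n(X)$ it is easy to see that

$$|d_n(X) - d_G(X)| \le |Q_n(X) - d_G(X)| \wedge 2a \quad \text{for almost all } X, \tag{4.2}$$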
since $-a \le d_G(X) \le a$ for almost all $X$. Hence from (4.1) and (4.2), it follows that $|d_n(X) - d_G(X)| \to 0$ in prob. for almost all $X$. Further, since by (4.2), $|d_n(X) - d_G(X)| \le 2a$, and the risks due to $d_n$ and $d_G$ are respectively $R(d_n, G) = E(d_n(X) - \theta)^2$ and $R(G) = E(d_G(X) - \theta)^2$, and since

$$R(d_n, G) - R(G) = E(d_n(X) - d_G(X))^2 \tag{4.3}$$

(e.g., see Singh (1985)), by the Lebesgue dominated convergence theorem, (4.3) converges to zero. We have thus proved the following theorem.

Theorem 2. Let $d_n$ be as defined in (3.3). Then $d_n(x)$ is a consistent (in probability) estimator of the Bayes optimal estimator $d_G(x)$ for almost all $x$ in the set $\{x : f(x) > 0\}$. Further, $d_n$ is asymptotically optimal in the sense that its risk approaches the minimum possible risk $R(G)$ as $n \to \infty$.
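The behaviour described in Theorem 2 can be illustrated with a small simulation. The sketch below computes the truncated estimator of (3.3) as the ratio of the sums of $B_j(x)$ and $A_j(x)$ and compares it with the Bayes estimator for one particular prior. The Uniform$(-a, a)$ prior with $a = 1$, its closed-form posterior mean, the bandwidth $n^{-1/5}$, the function names, and the convention used when the denominator vanishes are all assumptions introduced here for illustration.

```python
import math
import random

def eb_estimate(x, xs, h, a):
    """Truncated EB estimate d_n(x): Q_n(x) = sum_j B_j(x) / sum_j A_j(x), clipped to [-a, a]."""
    sum_a = sum_b = 0.0
    for xj in xs:
        at_x = 1.0 if xj < x else 0.0
        up = math.exp(-h) if xj < x + h else 0.0     # term from dF_n(theta + h)
        down = math.exp(h) if xj < x - h else 0.0    # term from dF_n(theta - h)
        w = math.exp(xj)
        sum_a += w * (at_x + (up - down) / (2.0 * h))
        sum_b += w * (xj * at_x + ((xj - h) * up - (xj + h) * down) / (2.0 * h))
    if sum_a == 0.0:
        return 0.0   # arbitrary convention (not from the paper) when Q_n is undefined
    return max(-a, min(a, sum_b / sum_a))

def bayes_uniform(x, a):
    """Posterior mean d_G(x) for x > -a when G is Uniform(-a, a); valid for this assumed prior only."""
    b = min(x, a)
    num = (b - 1.0) * math.exp(b) + (a + 1.0) * math.exp(-a)   # integral of theta * e^theta over (-a, b)
    den = math.exp(b) - math.exp(-a)                           # integral of e^theta over (-a, b)
    return num / den

rng = random.Random(3)
a, n = 1.0, 20000
# assumed prior for the simulation: theta ~ Uniform(-a, a); X = theta + Exp(1)
xs = [rng.uniform(-a, a) + rng.expovariate(1.0) for _ in range(n)]
h = n ** -0.2   # bandwidth rate chosen for illustration only
for x in (-0.5, 0.0, 0.5):
    print(f"x={x:+.1f}  d_n={eb_estimate(x, xs, h, a):+.3f}  d_G={bayes_uniform(x, a):+.3f}")
```

For moderate $x$, where $f(x)$ is not small, the two columns are expected to be close at this sample size; in the tails the ratio becomes noisy and the truncation to $[-a, a]$ is what keeps the estimate bounded.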
References

Ferguson, T.S. (1967). Mathematical Statistics: A Decision Theoretic Approach. Academic Press, New York.
Fox, R. (1978). Solutions to empirical Bayes squared error loss estimation problems. Ann. Statist. 6, 846-853.
Johns, M.V. and J. Van Ryzin (1971). Convergence rates for empirical Bayes two-action problems, I. Discrete case. Ann. Math. Statist. 42, 1521-1539.
Johns, M.V. and J. Van Ryzin (1972). Convergence rates for empirical Bayes two-action problems, II. Continuous case. Ann. Math. Statist. 43, 934-947.
Lin, P.E. (1975). Rates of convergence in empirical Bayes estimation problems. Continuous case. Ann. Statist. 3, 155-164.
Nadaraya, E.A. (1964). On nonparametric estimates of density function and regression curves. Theory Probab. Appl. 10, 186-190.
Robbins, H. (1955). An empirical Bayes approach to statistics. Proc. Third Berkeley Symp. Math. Statist. Vol. 1, 157-164.
Robbins, H. (1964). The empirical Bayes approach to statistical decision problems. Ann. Math. Statist. 35, 1-20.
Robbins, H. (1963). The empirical Bayes approach to testing statistical hypotheses. Rev. Int. Statist. Inst. 31, 195-208.
Rosenblatt, M. (1956). Remarks on some nonparametric estimators of density function. Ann. Math. Statist. 27, 832-837.
Singh, R.S. (1976). Empirical Bayes estimation with convergence rates in noncontinuous Lebesgue exponential families. Ann. Statist. 4, 431-439.
Singh, R.S. (1979). Empirical Bayes estimation in Lebesgue exponential families with rates near the best possible rate. Ann. Statist. 7, 890-902.
Singh, R.S. (1985). Empirical Bayes estimation in a multiple linear regression model. Ann. Inst. Stat. Math. 37(A), 71-86.
Susarla, V. and T. O'Bryan (1975). An empirical Bayes two action problem with nonidentical components for a translated exponential distribution. Comm. Statist. 4(8), 767-775.