Journal of Statistical Planning and Inference 37 (1993) 159-167
North-Holland

Bayesian robustness for hierarchical ε-contamination models

Elias Moreno
Universidad de Granada, Granada, Spain

Luis Raúl Pericchi
Universidad Simón Bolívar, Caracas, Venezuela

Received 1 April 1991; revised manuscript received 2 October 1992

Abstract: In the robust Bayesian literature, in order to investigate robustness with respect to the functional form of a base prior distribution π₀ (in particular with respect to the shape of the prior tails), the ε-contamination model of prior distributions Γ = {π: π = (1−ε)π₀(θ|λ) + εq, q ∈ Q} has been proposed. Here π₀(θ|λ) is the base elicited prior, λ is a vector of fixed hyperparameters, q is a contamination belonging to some suitable class Q, and ε reflects the amount of error in π₀(θ|λ). For the location-scale family of priors π₀(θ|λ) = τ⁻¹π₀[(θ−μ)/τ], where λ = (μ, τ), it turns out that the posterior inference is highly sensitive to changes of λ. Therefore, instead of considering a single base prior π₀, it is reasonable to replace π₀(θ|λ) by π₀(θ|λ₁) = ∫ π₀(θ|λ₁, λ₂) h(dλ₂), where λ₁ is the 'known' part of λ and λ₂ is the 'uncertain' part, with the uncertainty reflected by allowing h ∈ H, H being a class of probability measures. Classes considered here include those with unimodality and quantile constraints. Ranges of posterior probabilities of any set may be computed. Illustrations related to normal hypothesis testing are given.

AMS 1980 Subject Classification: Primary 62A15; secondary 62F15.

Key words and phrases: ε-contaminated classes of priors; hierarchical model; robust Bayes; scale mixture of normals.
Correspondence to: Prof. E. Moreno, Departamento de Estadística e Investigación Operativa, Universidad de Granada, Campus de Fuente Nueva, 18071 Granada, Spain.

0378-3758/93/$06.00 © 1993 Elsevier Science Publishers B.V. All rights reserved

1. Introduction

A very fruitful and realistic approach to modern statistical inference is Bayesian analysis where the use of partial information is recognized and modelled through classes of priors. This has been coined 'Robust Bayesian analysis with respect to the prior'; see Berger (1984). One way of modelling partial prior information, when
there is enough information to suggest a base prior π₀, is to consider the ε-contamination class

Γ(Q) = {π: π = (1−ε)π₀(θ|λ) + εq, q ∈ Q},      (1.1)

where π₀(θ|λ) is a fixed base elicited prior (typically chosen to be conjugate with respect to the sampling model), λ is a vector of hyperparameters considered fixed, q is a contamination, and ε, 0 < ε < 1, reflects the amount of uncertainty in π₀. In many problems a normal base prior may be adequate, but surely we shall not know the exact values of the mean and variance. It is implicitly taken for granted that the model (1.1) is able to avoid the need for a careful elicitation of the hyperparameters. Having 'robustified' the fixed base prior π₀ through Γ(Q), it is hoped that the analyses are not very sensitive to the chosen value of λ.

In this work, for different contaminating classes Q, we examine the ε-contamination model (1.1) for π₀ belonging to the location-scale family of distributions, that is π₀(θ|λ) = τ⁻¹π₀[(θ−μ)/τ], so that λ = (μ, τ), and check whether the posterior inference is reasonably robust with respect to the hyperparameters. If high sensitivity to λ is detected we should try to model the variation of the hyperparameter λ (or some components of it, typically the scale τ, which is more difficult to assess), in what is to become a robust hierarchical model. The quantity of interest considered here is the posterior probability of one-sided hypotheses. We shall see that inferences in this problem are highly sensitive to τ. Often, partial prior information about τ is available, and it can be modelled by easily elicitable classes of hyperprior distributions, for example the unimodal class and classes having a few quantiles fixed. We give mathematical results that allow the calculation of upper and lower probabilities of arbitrary sets, as the prior ranges over the class

Γ(H, Q) = {π: π = (1−ε) ∫ π₀(θ|λ) h(dλ) + εq, h ∈ H, q ∈ Q},      (1.2)

where H and Q are such classes of distributions. We illustrate the analysis for scale mixtures of normals as the base elicited prior, where the mixing distributions h(τ) have quantile and unimodality constraints.

The special case of (1.2) when ε = 0 is of particular interest. Then π₀(θ) belongs to the very important class of scale mixtures of normal distributions, motivated by de Finetti as outlier-prone distributions; see de Finetti (1961), Andrews and Mallows (1974), and West (1984). The idea behind the general (1.2), with H and Q, is to also allow the possibility of specifying ε (typically small) to represent the probability of a wrong assessment in general; this could be regarded as 'global' uncertainty. The elicitation of the class for the mixing distribution would be more of a reflection of specific knowledge, and could be thought of as 'local' uncertainty.

Section 2 gives the variation of the ranges of the posterior probabilities of H₀ = {θ: θ ≤ θ₀} as the scale parameter of π₀ changes, and gives the results needed to obtain the extrema of the posterior probabilities of an arbitrary set, as the prior ranges over Γ(H, Q). Section 3 illustrates these results and Section 4 gives some concluding remarks.
2. Modelling uncertainty of the hyperparameters in ε-contamination classes
At this stage we highlight the fact that the inference given by the ε-contamination model (1.1) is highly sensitive to the variation of the hyperparameter λ, both for classes Q defined by quantile and by unimodality restrictions. Consider the classes

Γ(Q_c) = {π: π = (1−ε)τ⁻¹π₀[(θ−μ₀)/τ] + εq, τ > 0, q ∈ Q_c},
Γ(Q_u) = {π: π = (1−ε)τ⁻¹π₀[(θ−μ₀)/τ] + εq, τ > 0, q ∈ Q_u},
Γ(Q_uc) = {π: π = (1−ε)τ⁻¹π₀[(θ−μ₀)/τ] + εq, τ > 0, q ∈ Q_uc},

where

Q_c = {q: q has median located at μ₀},
Q_u = {q: q is unimodal with mode at μ₀},
Q_uc = {q: q is unimodal with mode at μ₀ and its median located at μ₀}.

Theorem 1. Let f(x|θ) be the likelihood function, where θ ∈ ℝ. Assume ψ(a) = ∫_{μ₀}^{μ₀+a} f(x|θ) dθ is a bounded function. Then, for any ε > 0,

inf_{π ∈ Γ(Q_uc)} P^π(H₀|x) = 0,    sup_{π ∈ Γ(Q_uc)} P^π(H₀|x) = 1,

and the same applies for Γ(Q_u), since Γ(Q_uc) ⊂ Γ(Q_u). Furthermore, it is enough to require that lim sup_{|θ|→∞} f(x|θ) = 0 for the above results to hold for Γ(Q_c).

Proof. The assertions follow from Corollary 1 in Moreno and Cano (1991), the Fatou-Lebesgue lemma, and the representation of a unimodal density as a mixture of uniform distributions (see, for instance, Berger, 1985, p. 233).
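To make the vacuity in Theorem 1 concrete, the following sketch (my own illustration, not from the paper) evaluates the posterior probability of H₀: θ ≤ 0 under π = (1−ε)N(0, τ²) + εU, where U is a uniform contamination supported just below or just above μ₀ = 0 (so q is unimodal with mode at 0). The sampling model f(x|θ) = N(θ, 1), x = 0.2, ε = 0.2 and the interval length M = 5 are assumptions of the sketch. Letting τ grow drives the base marginal to zero, so the contamination dominates and the posterior probability is pushed toward 1 or 0:

```python
import math

def Phi(z):
    """Standard normal CDF."""
    return 0.5 * (1.0 + math.erf(z / math.sqrt(2.0)))

def phi(z):
    """Standard normal density."""
    return math.exp(-0.5 * z * z) / math.sqrt(2.0 * math.pi)

def post_prob_H0(x, tau, eps, M, below):
    """P(theta <= 0 | x) under pi = (1-eps) N(0, tau^2) + eps U, f(x|theta) = N(theta, 1).

    `below=True` puts the uniform contamination U on (-M, 0) (inside H0),
    `below=False` puts it on (0, M) (outside H0).
    """
    s = math.sqrt(1.0 + tau * tau)
    m_base = phi(x / s) / s                   # base marginal density at x
    p0 = Phi(-x * tau / s)                    # P(H0 | x) under the base prior alone
    if below:
        m_q = (Phi(x + M) - Phi(x)) / M       # contamination marginal, mass inside H0
        num = (1 - eps) * m_base * p0 + eps * m_q
    else:
        m_q = (Phi(x) - Phi(x - M)) / M       # mass entirely outside H0
        num = (1 - eps) * m_base * p0
    den = (1 - eps) * m_base + eps * m_q
    return num / den

for tau in (1.0, 10.0, 100.0, 1000.0):
    print(tau, post_prob_H0(0.2, tau, 0.2, 5.0, True),
          post_prob_H0(0.2, tau, 0.2, 5.0, False))
```

With τ = 1000 the two probabilities are about 0.99 and 0.006, while for moderate τ they stay in the interior; the point is that leaving τ free makes the range of P^π(H₀|x) collapse to essentially [0, 1].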
Table 2.1
Sensitivity with respect to τ

τ      P^{π₀}(H₀|x)   p_c, P_c      p_u, P_u
0.10   0.42           0.36, 0.49    0.33, 0.55
0.95   0.27           0.19, 0.45    0.16, 0.57
2.24   0.27           0.14, 0.58    0.10, 0.70
5.00   0.26           0.09, 0.73    0.06, 0.83
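The precise-posterior column P^{π₀}(H₀|x) in Table 2.1 follows from normal-normal conjugacy and is easy to check directly. A minimal sketch (Python; the setup f(x|θ) = N(θ, 0.1), base prior N(0, τ²) and x = 0.2 is that of Example 1 below):

```python
import math

def Phi(z):
    """Standard normal CDF."""
    return 0.5 * (1.0 + math.erf(z / math.sqrt(2.0)))

def precise_post(x, tau, like_var=0.1):
    """P(theta <= 0 | x) for f(x|theta) = N(theta, like_var) and prior N(0, tau^2)."""
    v = 1.0 / (1.0 / like_var + 1.0 / tau ** 2)   # posterior variance
    m = v * x / like_var                          # posterior mean
    return Phi(-m / math.sqrt(v))

for tau in (0.10, 0.95, 2.24, 5.00):
    print(tau, round(precise_post(0.2, tau), 2))   # 0.42, 0.27, 0.27, 0.26
```

The printed values reproduce the P^{π₀}(H₀|x) column; note how they stabilize once τ is moderately large, in contrast with the contaminated bounds in the remaining columns.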
Remark 1. Note that a priori inf_{π∈Γ(Q_c)} P^π(H₀) = sup_{π∈Γ(Q_c)} P^π(H₀) = 0.5. That is, the prior imprecision of H₀ under Γ(Q_c) is zero. However, the posterior range of H₀ is one for any observation x.
Example 1. We illustrate the above result for the likelihood f(x|θ) = N(θ, 0.1) and a normal base prior π₀ with mean 0 and variance τ², allowing the latter to vary. Consider the one-sided testing problem H₀: θ ≤ 0 versus H₁: θ > 0. Let x = 0.2 be the observed value. In Table 2.1, for various values of the prior standard deviation τ and ε = 0.2, the ranges of the posterior probabilities of H₀ as π ranges over Γ(Q_c) and Γ(Q_u) are shown. Here μ₀ is taken to be 0 in the definition of Q_c and Q_u. The notation in Table 2.1 is

p_c = inf_{π∈Γ(Q_c)} P^π(H₀|x),   P_c = sup_{π∈Γ(Q_c)} P^π(H₀|x),
p_u = inf_{π∈Γ(Q_u)} P^π(H₀|x),   P_u = sup_{π∈Γ(Q_u)} P^π(H₀|x).

From Table 2.1 it can be seen that changes of the prior standard deviation τ affect the range of the posterior probabilities of H₀ considerably. Observe that the variation in τ is of greater importance for the ε-contamination neighborhood than for a precise analysis based on π₀ alone; see the values of P^{π₀}(H₀|x) in Table 2.1. For the latter, as τ grows, the posterior probabilities stabilize.

Consider the class Γ(H_c, Q_c) of prior distributions with some specified quantiles,

H_c = {h: h(B_j) = h_j, j = 1, ..., m},   Q_c = {q: q(C_i) = q_i, i = 1, ..., n},

where {B_j, j = 1, ..., m} and {C_i, i = 1, ..., n} are partitions of the parameter spaces of τ and θ respectively, and the h_j, q_i are numbers such that

Σ_{j=1}^m h_j = 1,  h_j ≥ 0,  j = 1, ..., m;    Σ_{i=1}^n q_i = 1,  q_i ≥ 0,  i = 1, ..., n.
Theorem 2. Let X be a random variable with density f(x|θ), where θ is an unknown parameter. Let Γ(H_c, Q_c) be the class of priors

Γ(H_c, Q_c) = {π: π(θ) = (1−ε) ∫ π₀(θ|τ) h(dτ) + εq(θ), (h, q) ∈ H_c × Q_c},

where 0 ≤ ε < 1. Then for any Borel set A of θ's, the ranges of the posterior probabilities P(A|x) as π ranges over Γ(H_c, Q_c) are given by

sup_{π∈Γ(H_c,Q_c)} P(A|x) = sup_{(τ₁,…,τ_m)∈B₁×⋯×B_m}
  [(1−ε) Σ_{j=1}^m g_A(x|τ_j) h_j + ε Σ_{i∈I} q_i sup_{θ∈C_i} f(x|θ)]
  / [(1−ε) Σ_{j=1}^m g_A(x|τ_j) h_j + ε Σ_{i∈I} q_i sup_{θ∈C_i} f(x|θ)
     + (1−ε) Σ_{j=1}^m g_{A^c}(x|τ_j) h_j + ε Σ_{i∈J} q_i inf_{θ∈C_i} f(x|θ)],

and the infimum is obtained by interchanging sup and inf in this expression. Here A^c denotes the complement of A; I and J are the subsets of indices given by i ∈ I if and only if C_i ⊂ A, and i ∈ J if and only if A ∩ C_i = ∅; g_A(x|τ) = ∫ 1_A(θ) f(x|θ) π₀(dθ|τ); and g(x|τ) = ∫ f(x|θ) π₀(dθ|τ). We have adopted the convention sup_{θ∈∅} f(x|θ) = 0.

Proof. The results are proved using techniques similar to those of Theorems 1 and 2 in Moreno and Cano (1991); the proof is omitted for the sake of brevity.
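In simple cases the extremal formula of Theorem 2 reduces to a low-dimensional optimization that can be done by grid search. The sketch below is my own reading of the theorem, with an illustrative configuration that is not from the paper: f(x|θ) = N(θ, 1), π₀(θ|τ) = N(0, τ²), A = H₀ = (−∞, 0], x = 0.2, ε = 0.2, scale cells B₁ = (0, 1] with h₁ = 0.26 and B₂ = (1, ∞) (truncated at 50 for the grid), and a median-at-zero Q_c with cells C₁ = (−∞, 0], C₂ = (0, ∞):

```python
import math

def Phi(z):
    """Standard normal CDF."""
    return 0.5 * (1.0 + math.erf(z / math.sqrt(2.0)))

def phi(z):
    """Standard normal density."""
    return math.exp(-0.5 * z * z) / math.sqrt(2.0 * math.pi)

def g_all(x, tau):
    """Base marginal m(x|tau) and its A-part g_A(x|tau), A = (-inf, 0]."""
    s = math.sqrt(1.0 + tau * tau)
    m = phi(x / s) / s
    return m, m * Phi(-x * tau / s)

def sup_post(x, eps, h1=0.26, q1=0.5):
    """Grid-search the sup of Theorem 2 for A = (-inf, 0].

    Here I = {1} and J = {2}: sup_{C1} f(x|.) = phi(x) (attained at theta = 0
    when x > 0) and inf_{C2} f(x|.) = 0 (theta -> infinity).
    """
    best = 0.0
    for k1 in range(201):
        t1 = 0.005 + 0.995 * k1 / 200          # tau1 in B1 = (0, 1]
        m1, mA1 = g_all(x, t1)
        for k2 in range(201):
            t2 = 1.0 + 49.0 * k2 / 200         # tau2 in B2, truncated at 50
            m2, mA2 = g_all(x, t2)
            base_A = h1 * mA1 + (1 - h1) * mA2
            base = h1 * m1 + (1 - h1) * m2
            num = (1 - eps) * base_A + eps * q1 * phi(x)
            den = num + (1 - eps) * (base - base_A)   # + eps * (1 - q1) * 0
            best = max(best, num / den)
    return best

print(sup_post(0.2, 0.2))
```

For this configuration the grid gives a sup of about 0.66: fixing even one quantile of τ keeps the upper posterior probability strictly away from 1, in contrast with the vacuous sup = 1 of Theorem 1 when τ is left free.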
Theorem 3. Let X be a random variable with density f(x|θ), where θ is an unknown real parameter. Let Γ(H_uc, Q_uc) be the class of priors

Γ(H_uc, Q_uc) = {π: π = (1−ε) ∫ π₀(θ|τ) h(dτ) + εq(θ), h ∈ H_uc, q ∈ Q_uc},

where 0 ≤ ε < 1,

H_uc = {h(τ): h is unimodal with mode τ₀ and ∫₀^{τ₀} h(dτ) = β (fixed)},
Q_uc = {q(θ): q is unimodal with mode θ₀ and ∫_{−∞}^{θ₀} q(dθ) = α (fixed)}.

Then for any Borel set A, we have

inf_{π∈Γ} P^π(A|x) = inf_{a₁≤0≤a₂, b₁≤0≤b₂}
  {(1−ε)[βH_A(a₁) + (1−β)H_A(a₂)] + ε[αK_A(b₁) + (1−α)K_A(b₂)]}
  / {(1−ε)[βH(a₁) + (1−β)H(a₂)] + ε[αK(b₁) + (1−α)K(b₂)]},      (1.3)

where −τ₀ ≤ a₁ and, writing 1_{(·)} for the indicator function,

H_A(a) = [(1/a) ∫_{τ₀}^{τ₀+a} g_A(x|τ) dτ] 1_{(a≠0)}(a) + g_A(x|τ₀) 1_{(a=0)}(a),
H(a)   = [(1/a) ∫_{τ₀}^{τ₀+a} g(x|τ) dτ] 1_{(a≠0)}(a) + g(x|τ₀) 1_{(a=0)}(a),
K_A(a) = [(1/a) ∫_{θ₀}^{θ₀+a} 1_A(θ) f(x|θ) dθ] 1_{(a≠0)}(a) + f(x|θ₀) 1_A(θ₀) 1_{(a=0)}(a),
K(a)   = [(1/a) ∫_{θ₀}^{θ₀+a} f(x|θ) dθ] 1_{(a≠0)}(a) + f(x|θ₀) 1_{(a=0)}(a).

The sup is obtained by replacing inf with sup in expression (1.3).

Proof. Note that both h and q can be represented as mixtures of uniform distributions, where the mixing distributions have a quantile constraint. Then, following a procedure analogous to that of Theorem 1 in Moreno and Cano (1991), the assertions follow.
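The extremal ratio (1.3) can be evaluated numerically by discretizing the four-dimensional search. The sketch below is my own discretization, not the authors' algorithm, and the configuration is illustrative: f(x|θ) = N(θ, 1), π₀(θ|τ) = N(0, τ²), A = (−∞, 0], x = 0.2, ε = 0.2, mode τ₀ = 1 with β = 0.26, mode θ₀ = 0 with α = 0.5, and assumed truncations a₂ ≤ 9, |b| ≤ 6 for the grid:

```python
import math

def Phi(z):
    """Standard normal CDF."""
    return 0.5 * (1.0 + math.erf(z / math.sqrt(2.0)))

def phi(z):
    """Standard normal density."""
    return math.exp(-0.5 * z * z) / math.sqrt(2.0 * math.pi)

X, EPS, TAU0, BETA, ALPHA = 0.2, 0.2, 1.0, 0.26, 0.5

def g_all(tau):
    """Marginal g(x|tau) and its A-part g_A(x|tau), A = (-inf, 0]."""
    s = math.sqrt(1.0 + tau * tau)
    m = phi(X / s) / s
    return m, m * Phi(-X * tau / s)

def H_pair(a, n=60):
    """(H(a), H_A(a)): averages of g and g_A over tau in (TAU0, TAU0 + a)."""
    if abs(a) < 1e-12:
        return g_all(TAU0)
    tot = tot_A = 0.0
    for k in range(n):                       # midpoint rule
        m, mA = g_all(TAU0 + a * (k + 0.5) / n)
        tot += m
        tot_A += mA
    return tot / n, tot_A / n

def K_pair(b):
    """(K(b), K_A(b)): averages of f(x|.) and its A-part over theta in (0, b)."""
    if abs(b) < 1e-12:
        return phi(X), phi(X)                # theta0 = 0 lies in A
    if b > 0:
        return (Phi(X) - Phi(X - b)) / b, 0.0   # (0, b) misses A
    avg = (Phi(X - b) - Phi(X)) / (-b)          # (b, 0) lies inside A
    return avg, avg

a1s = [-TAU0 * k / 14 for k in range(15)]    # a1 in [-tau0, 0] keeps tau > 0
a2s = [9.0 * k / 14 for k in range(15)]
b1s = [-6.0 * k / 14 for k in range(15)]
b2s = [6.0 * k / 14 for k in range(15)]
Ha1, Ha2 = [H_pair(a) for a in a1s], [H_pair(a) for a in a2s]
Kb1, Kb2 = [K_pair(b) for b in b1s], [K_pair(b) for b in b2s]

inf_p = 1.0
for H1, HA1 in Ha1:
    for H2, HA2 in Ha2:
        base = (1 - EPS) * (BETA * H1 + (1 - BETA) * H2)
        base_A = (1 - EPS) * (BETA * HA1 + (1 - BETA) * HA2)
        for K1, KA1 in Kb1:
            for K2, KA2 in Kb2:
                num = base_A + EPS * (ALPHA * KA1 + (1 - ALPHA) * KA2)
                den = base + EPS * (ALPHA * K1 + (1 - ALPHA) * K2)
                inf_p = min(inf_p, num / den)
print(inf_p)
```

The grid search returns a nontrivial lower bound, well above the vacuous value 0 of Theorem 1; a finer grid and wider truncations would tighten it further.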
3. Illustrations for scale mixtures of normal distributions

Throughout this section we assume that the sampling model is f(x|θ) = N(θ, 1), and the base prior is π₀(θ|0, r) = N(0, r), with r = τ². We present first an illustration for contaminated scale mixtures of a normal (ε = 0.2), and secondly an illustration for uncontaminated scale mixtures of normals (ε = 0).
3.1. Case 0 < ε < 1

We now illustrate the theory of Section 2 for ε-contaminations by calculating the ranges of the posterior probabilities of H₀: θ ≤ 0 for the class Γ(H_c, Q_uc), where H_c is the class with one quantile fixed, H_c = {h(r): ∫₀¹ h(dr) = 0.26}, Q_uc = {q(θ): q is unimodal with mode at 0 and ∫_{−∞}^0 q(dθ) = 0.5}, and ε is taken as 0.2. In Table 3.1 we report p = inf_{π∈Γ(H_c,Q_uc)} P^π(H₀|x) and P = sup_{π∈Γ(H_c,Q_uc)} P^π(H₀|x). Formulae for this case were obtained from Theorems 2 and 3.

Table 3.1
Ranges of posterior probabilities of H₀: θ ≤ 0

x      p, P
0.2    0.36, 0.57
0.5    0.29, 0.57
1.0    0.19, 0.57
2.0    0.05, 0.57
3.0    0.01, 0.56

Recall that p and P would be 0 and 1, respectively, if H_c were the class of all distributions. Table 3.1 shows, however, that if even one quantile of the scale is fixed, then the range of the posterior probabilities is drastically reduced and becomes informative. Note that the upper probability is very insensitive to the observed value of x, but the lower probability is monotonically decreasing as x grows, that is, as H₀ becomes less plausible.
3.2. Case ε = 0

This case is important because the usual way of integrating out hyperparameters via a precise mixing distribution ignores the fact that, in the higher stages of the hierarchy, less is typically known about the hyperparameters; see Pericchi and Nazaret (1988). It might well happen that a small change in the mixing distribution would yield a large change in the posterior probabilities. We shall consider four classes of mixing distributions H:

(i) H⁰, the class of all distributions: H⁰ = {h(r): ∫₀^∞ h(r) dr = 1};
(ii) H¹, a class with a unimodality constraint: H¹ = {h(r): h is unimodal with mode at r₀ = 1};
(iii) H², a class with one quantile fixed: H² = {h(r): ∫₀¹ h(r) dr = 0.26};
(iv) H³, a class with two quantiles fixed: H³ = {h(r): ∫₀¹ h(r) dr = 0.26, ∫₀² h(r) dr = 0.5}.

The extrema of the posterior probabilities P(H₀|x), as the mixing distribution h ranges over the above classes, are obtained from Theorems 2 and 3, taking ε = 0. In Table 3.2, for various observations x, the ranges are denoted by pⁱ = inf_{h∈Hⁱ} P(H₀|x) and Pⁱ = sup_{h∈Hⁱ} P(H₀|x), for i = 0, 1, 2, 3.

Table 3.2
Ranges of the posterior probabilities of H₀: θ ≤ 0

x      p⁰, P⁰        p¹, P¹        p², P²        p³, P³
0.2    0.42, 0.50    0.44, 0.50    0.44, 0.50    0.45, 0.48
0.5    0.31, 0.50    0.34, 0.50    0.36, 0.50    0.37, 0.45
1.0    0.16, 0.50    0.21, 0.50    0.23, 0.50    0.26, 0.40
2.0    0.02, 0.50    0.05, 0.50    0.06, 0.50    0.11, 0.29
3.0    0.00, 0.50    0.01, 0.50    0.01, 0.50    0.04, 0.15

First of all, it is interesting to note that the family of all scale mixtures of a normal distribution (which contains the normal, Student-t and double exponential among many other distributions) does not produce a vacuous answer; see the p⁰, P⁰ column in Table 3.2. Recall, however, that the mean of the base prior is fixed at zero and no contaminations are allowed.

The ranges of the probabilities, assuming either unimodality or one quantile fixed, are roughly the same, and the upper probability remains at 0.5. On the other hand, if two quantiles are assumed fixed, the ranges are substantially reduced and the upper probability is no longer constant at 0.5.

4. Comments and conclusions
From Theorem 1 we conclude that, for the ε-contamination model (0 ≤ ε < 1), we must model our knowledge about the hyperparameters, since otherwise we get a vacuous posterior answer. Assuming even one quantile of the scale fixed results in nontrivial posterior probabilities. On the other hand, when ε = 0, from Table 3.2 we conclude that for one-sided hypothesis testing we increase the infimum of the probabilities by modelling one quantile or unimodality. However, neither modelling with one quantile nor with unimodality is able to reduce the sup of the probabilities of H₀ from 0.5, suggesting that more stringent restrictions on the class of distributions that models r are needed if a more substantial reduction of the range of the probabilities is desired. This reduction is obtained by assuming two quantiles fixed; see the p³, P³ column in Table 3.2.

We conclude by remarking that the class with ε = 0, in the illustrations the class of scale mixtures of normals without contaminations, is of importance in its own right for robust Bayesian statistics. Upon modelling the class of distributions for the nuisance hyperparameter, the number of precise assumptions of the original model has been substantially reduced and, furthermore, considerable variation in the tail behavior of the base prior distribution is allowed. Moreover, the extra assessments required by the ε-contamination model are avoided, while retaining some freedom in the tail behavior.
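The observation about H⁰ can be made concrete in this setting. Since the posterior probability is a ratio of terms linear in h, its extrema over the class of all mixing distributions are attained at point masses δ_r, under which P(θ ≤ 0|x) = Φ(−x√(r/(r+1))); this decreases from 0.5 as r → 0 to Φ(−x) as r → ∞. A sketch (my own derivation, consistent with the p⁰, P⁰ column of Table 3.2):

```python
import math

def Phi(z):
    """Standard normal CDF."""
    return 0.5 * (1.0 + math.erf(z / math.sqrt(2.0)))

def post_prob(x, r):
    """P(theta <= 0 | x) for f(x|theta) = N(theta, 1) and prior N(0, r), r the prior variance."""
    return Phi(-x * math.sqrt(r / (r + 1.0)))

# Extremes over H0 (all mixing distributions) are point masses in r,
# so the posterior range is the range of post_prob over r > 0:
for x in (0.2, 0.5, 1.0, 2.0, 3.0):
    print(x, round(Phi(-x), 2), 0.5)    # lower (r -> infinity), upper (r -> 0)
```

This explains why the upper probability is stuck at 0.5 for H⁰, H¹ and H²: mixing distributions concentrating near r = 0 are not excluded by unimodality or by a single quantile, whereas fixing two quantiles rules them out.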
Acknowledgements

We are very grateful to the Associate Editor and two referees for their careful and constructive comments.
References

Andrews, D.F. and C.L. Mallows (1974). Scale mixtures of normal distributions. J. Royal Statist. Soc. Series B 36, 99-102.
Berger, J.O. (1984). The robust Bayesian viewpoint (with discussion). In: J. Kadane, ed., Robustness of Bayesian Analyses. Elsevier Science Publishers, Amsterdam, 63-124.
Berger, J.O. (1985). Statistical Decision Theory and Bayesian Analysis (second edition). Springer-Verlag, New York.
Berger, J.O. (1990). Robust Bayesian analysis: sensitivity to the prior. J. Statist. Plann. Inference 25, 303-328.
Berger, J.O. and L.M. Berliner (1986). Robust Bayes and empirical Bayes analysis with ε-contaminated priors. Ann. Statist. 14, 461-486.
de Finetti, B. (1961). The Bayesian approach to the rejection of outliers. Proc. 4th Berkeley Symp. on Math. Statist. and Prob. 1, 199-210.
DeRobertis, L. and J. Hartigan (1981). Bayesian inference using intervals of measures. Ann. Statist. 9, 235-244.
Moreno, E. and J.A. Cano (1991). Robust Bayesian analysis with ε-contaminations partially known. J. Royal Statist. Soc. Series B 53, 143-155.
Moreno, E. and A. Gonzalez (1990). Empirical Bayes analysis for ε-contaminated priors with shape and quantile constraints. Brazilian J. Probab. Statist. 4, 177-200.
Moreno, E. and L.R. Pericchi (1991). Robust Bayesian analysis for ε-contaminations with shape and quantile constraints. In: R. Gutierrez et al., eds., Proc. of the Fifth Intern. Symp. on Applied Stochastic Models and Data Analysis. World Scientific, Singapore, 454-470.
Pericchi, L.R. and W.A. Nazaret (1988). On being imprecise at the higher levels of a hierarchical linear model (with discussion). In: J.M. Bernardo et al., eds., Bayesian Statistics 3. Oxford University Press, Oxford, 361-375.
Pericchi, L.R. and P. Walley (1991). Robust Bayesian credible intervals and prior ignorance. Intern. Statist. Review 59, 1-23.
Sivaganesan, S. and J.O. Berger (1989). Ranges of posterior measures for priors with unimodal contaminations. Ann. Statist. 17, 868-889.
West, M. (1984). Outlier models and prior distributions in Bayesian linear regression. J. Royal Statist. Soc. Series B 46, 431-439.