Bayesian robustness for hierarchical ε-contamination models

Elias Moreno
Universidad de Granada, Granada, Spain

Luis Raúl Pericchi
Universidad Simón Bolívar, Caracas, Venezuela

Journal of Statistical Planning and Inference 37 (1993) 159-167. North-Holland.

Received 1 April 1991; revised manuscript received 2 October 1992

Abstract: In the robust Bayesian literature, in order to investigate robustness with respect to the functional form of a base prior distribution π0 (in particular with respect to the shape of the prior tails), the ε-contamination model of prior distributions, Γ = {π: π = (1-ε)π0(θ|λ) + εq, q ∈ Q}, has been proposed. Here π0(θ|λ) is the base elicited prior, λ is a vector of fixed hyperparameters, q is a contamination belonging to some suitable class Q, and ε reflects the amount of error in π0(θ|λ). For the location-scale family of priors π0(θ|λ) = τ^{-1} π0[(θ-μ)/τ], where λ = (μ, τ), it turns out that the posterior inference is highly sensitive to changes of λ. Therefore, instead of considering a single base prior π0, it is reasonable to replace π0(θ|λ) by π0(θ|λ1) = ∫ π0(θ|λ1, λ2) h(dλ2), where λ1 is the 'known' part of λ and λ2 is the 'uncertain' part, with the uncertainty reflected by allowing h ∈ H, H being a class of probability measures. Classes considered here include those with unimodality and quantile constraints. Ranges of posterior probabilities of any set may be computed. Illustrations related to normal hypothesis testing are given.

AMS 1980 Subject Classification: Primary 62A15; secondary 62F15.

Key words and phrases: ε-contaminated classes of priors; hierarchical model; robust Bayes; scale mixture of normals.

Correspondence to: Prof. E. Moreno, Departamento de Estadística e Investigación Operativa, Universidad de Granada, Campus de Fuente Nueva, 18071 Granada, Spain.

1. Introduction

A very fruitful and realistic approach to modern statistical inference is Bayesian analysis where the use of partial information is recognized and modelled through classes of priors. This has been coined 'robust Bayesian analysis with respect to the prior'; see Berger (1984). One way of modelling partial prior information, when there is enough information to suggest a base prior π0, is to consider the ε-contamination class

    Γ(Q) = {π: π = (1-ε) π0(θ|λ) + ε q, q ∈ Q},        (1.1)

where π0(θ|λ) is a fixed base elicited prior (typically chosen to be conjugate with respect to the sampling model), λ is a vector of hyperparameters considered fixed, q is a contamination, and ε is such that 0 < ε < 1 and reflects the amount of probable error in π0(θ|λ). In practice, however, the hyperparameters are seldom known exactly.
For example, a normal base prior may be adequate, but surely we shall not know the exact values of the mean and variance. It is implicitly taken for granted that the model (1.1) is able to avoid the need for a careful elicitation of the hyperparameters: having 'robustified' the fixed base prior π0 through Γ(Q), it is hoped that the analyses are not very sensitive to the chosen value of λ.

In this work, for different contaminating classes Q, we examine the ε-contamination model (1.1) for π0 belonging to the location-scale family of distributions, that is, π0(θ|λ) = τ^{-1} π0[(θ-μ)/τ], so that λ = (μ, τ), and check whether the posterior inference is reasonably robust with respect to the hyperparameters. If high sensitivity to λ is detected, we should try to model the variation of the hyperparameter λ (or some components of it, typically the scale τ, which is more difficult to assess), in what is to become a robust hierarchical model. The quantity of interest considered here is the posterior probability of one-sided hypotheses. We shall see that inferences in this problem are highly sensitive to τ. Often, partial prior information about τ is available, and it can be modelled by easily elicitable classes of hyperprior distributions, for example, the unimodal class and classes having a few quantiles fixed. We give mathematical results that allow the calculation of upper and lower probabilities of arbitrary sets as the prior ranges over the class

    Γ(H, Q) = {π: π(θ) = (1-ε) ∫ π0(θ|λ) h(dλ) + ε q(θ), h ∈ H, q ∈ Q},        (1.2)

where H and Q are suitable classes of distributions. We illustrate the analysis for scale mixtures of normals as the base elicited prior, where the mixing distributions h(r) have quantile and unimodality constraints.

The special case of (1.2) when ε = 0 is of particular interest. Then the resulting prior belongs to the very important class of scale mixtures of normal distributions, motivated by de Finetti as outlier-prone distributions; see de Finetti (1961), Andrews and Mallows (1974), and West (1984). The idea behind the general (1.2), with H and Q, is to also allow the possibility of specifying ε (typically small) to represent the probability of a wrong assessment in general; this could be regarded as 'global' uncertainty. The elicitation of the class for the mixing distribution would be more a reflection of specific knowledge, and could be thought of as 'local' uncertainty.

Section 2 gives the variation of the ranges of the posterior probabilities of H0 = {θ: θ ≤ 0} as the scale parameter of π0 changes, and gives the results needed to obtain the extrema of the posterior probabilities of an arbitrary set as the prior ranges over Γ(H, Q). Section 3 illustrates these results and Section 4 gives some concluding remarks.

2. Modelling uncertainty about the hyperparameters in ε-contamination classes

At this stage we highlight the fact that the inference given by the ε-contamination model (1.1) is highly sensitive to the variation of the hyperparameters λ, both for classes Q defined by quantile restrictions and for classes defined by unimodality restrictions. Consider the classes

    Γ(Q_c)  = {π: π = (1-ε) τ^{-1} π0[(θ-μ0)/τ] + ε q, τ > 0, q ∈ Q_c},
    Γ(Q_u)  = {π: π = (1-ε) τ^{-1} π0[(θ-μ0)/τ] + ε q, τ > 0, q ∈ Q_u},
    Γ(Q_uc) = {π: π = (1-ε) τ^{-1} π0[(θ-μ0)/τ] + ε q, τ > 0, q ∈ Q_uc},

where

    Q_c = {q: q has median located at μ0},
    Q_u = {q: q is unimodal with mode at μ0},

and Q_uc = {q: q is unimodal with mode at μ0 and its median is located at μ0}.

Theorem 1. Let f(x|θ) be the likelihood function, where θ ∈ ℝ. Assume that ψ(a) = ∫_{μ0}^{μ0+a} f(x|θ) dθ is a bounded function. Then, for any ε > 0,

    inf_{π∈Γ(Q_uc)} P^π(H0 | x) = 0,        sup_{π∈Γ(Q_uc)} P^π(H0 | x) = 1,

and the same applies for Γ(Q_u), since Γ(Q_uc) ⊂ Γ(Q_u). Furthermore, it is enough to require that lim sup_{|θ|→∞} f(x|θ) = 0 for the above results to hold for Γ(Q_c).

Proof. The assertions follow from Corollary 1 in Moreno and Cano (1991), the Fatou-Lebesgue lemma, and the representation of a unimodal density as a mixture of uniform distributions (see, for instance, Berger, 1985, p. 233).
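The mechanism behind Theorem 1 can be seen numerically: spreading the base prior (τ → ∞) kills the marginal weight of the base component, and within Q_uc half of the contamination can be spread out while the other half concentrates at the mode μ0. The sketch below does this for the setup of Example 1 below (likelihood N(θ, 0.1), μ0 = 0, ε = 0.2, x = 0.2); the particular sequences of contaminations are my own illustrative choices, not constructions taken from the proof.

```python
# Sketch: pushing P(H0 | x) toward 0 and toward 1 within Gamma(Q_uc),
# illustrating Theorem 1.  Assumed setup (as in Example 1): f(x|theta)=N(theta, 0.1),
# base prior N(0, tau^2), mu0 = 0, eps = 0.2, x = 0.2, H0 = {theta <= 0}.
import numpy as np
from scipy.stats import norm

x, sig, eps = 0.2, np.sqrt(0.1), 0.2

def favg(a, b):
    """Average of the likelihood f(x|theta) over theta in [a, b]."""
    return (norm.cdf((x - a) / sig) - norm.cdf((x - b) / sig)) / (b - a)

def post_prob(tau, left, right):
    """P(H0|x) for pi = (1-eps) N(0,tau^2) + eps q, with q = 0.5 U[left,0] + 0.5 U[0,right]."""
    m0 = norm.pdf(x, 0.0, np.sqrt(tau**2 + sig**2))              # marginal of the base part
    p0 = norm.cdf(-x * tau / (sig * np.sqrt(tau**2 + sig**2)))   # P(H0|x) under the base alone
    num = (1 - eps) * m0 * p0 + eps * 0.5 * favg(left, 0.0)
    den = (1 - eps) * m0 + eps * 0.5 * (favg(left, 0.0) + favg(0.0, right))
    return num / den

for k in range(1, 6):
    tau, M, d = 10.0**k, 10.0**k, 10.0**(-k)
    # contamination mass piling up just above 0 (and base prior flattening) drives P to 0 ...
    low = post_prob(tau, left=-M, right=d)
    # ... while mass piling up just below 0 drives P to 1
    high = post_prob(tau, left=-d, right=M)
    print(f"tau=M=1e{k}: toward 0 -> {low:.4f}   toward 1 -> {high:.4f}")
```

Both limits respect the unimodality and median constraints of Q_uc, which is why the unconstrained range is vacuous.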


Table 2.1. Sensitivity with respect to τ

    τ       P^π0(H0|x)    P_c     P̄_c     P_u     P̄_u
    0.10    0.42          0.36    0.49    0.33    0.55
    0.95    0.27          0.19    0.45    0.16    0.57
    2.24    0.27          0.14    0.58    0.10    0.70
    5.00    0.26          0.09    0.73    0.06    0.83

Remark 1. Note that a priori inf_{π∈Γ(Q_c)} P^π(H0) = sup_{π∈Γ(Q_c)} P^π(H0) = 0.5. That is, the prior imprecision of H0 under Γ(Q_c) is zero. However, the posterior range of H0 is one for any observation x.

Example 1. We illustrate the above result for the likelihood f(x|θ) = N(θ, 0.1) and a normal base prior π0 with mean 0 and variance τ², allowing the latter to vary. Consider the one-sided testing problem H0: θ ≤ 0 versus H1: θ > 0, and let x = 0.2 be the observed value. Table 2.1 shows, for various values of the prior standard deviation τ and ε = 0.2, the ranges of the posterior probabilities of H0 as π ranges over Γ(Q_c) and over Γ(Q_u). Here μ0 is taken to be 0 in the definition of Q_c and Q_u. The notation in Table 2.1 is

    P_c = inf_{π∈Γ(Q_c)} P^π(H0 | x),    P̄_c = sup_{π∈Γ(Q_c)} P^π(H0 | x),
    P_u = inf_{π∈Γ(Q_u)} P^π(H0 | x),    P̄_u = sup_{π∈Γ(Q_u)} P^π(H0 | x).

From Table 2.1 it can be seen that changes of the prior standard deviation τ affect the range of the posterior probabilities of H0 considerably. Observe that the variation in τ is of greater importance for the ε-contamination neighborhood than for a precise analysis based on π0 alone; see the values of P^π0(H0|x) in Table 2.1. For the latter, as τ grows, the posterior probabilities stabilize.
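The precise-analysis column of Table 2.1 follows from the conjugate normal-normal update, and the Γ(Q_c) bounds can be approached by two-point contaminations with median at 0 (half the mass where the likelihood is largest inside H0 or H1, half pushed off to infinity). The sketch below recomputes these quantities under that reading; the two-point extremal form is my own reading rather than a formula quoted from the paper, so it reproduces the P^π0 column exactly but only lands close to the P_c, P̄_c entries.

```python
# Sketch: the precise-analysis column of Table 2.1 and approximate Gamma(Q_c)
# bounds.  Assumed setup as in Example 1: f(x|theta)=N(theta, 0.1), base prior
# N(0, tau^2), x = 0.2, eps = 0.2, H0 = {theta <= 0}.  The two-point extremal
# contaminations are an assumed reading, not the paper's formulas.
import numpy as np
from scipy.stats import norm

x, sig, eps = 0.2, np.sqrt(0.1), 0.2

for tau in (0.10, 0.95, 2.24, 5.00):
    m0 = norm.pdf(x, 0.0, np.sqrt(tau**2 + sig**2))              # marginal under pi0
    p0 = norm.cdf(-x * tau / (sig * np.sqrt(tau**2 + sig**2)))   # P(H0|x) under pi0 alone
    f0 = norm.pdf(x, 0.0, sig)   # largest likelihood value on H0 (attained at theta = 0)
    fx = norm.pdf(x, x, sig)     # largest likelihood value on H1 (attained at theta = x)
    # median-at-0 contamination: mass 0.5 at the favourable point, 0.5 sent to infinity
    upper = ((1 - eps) * m0 * p0 + eps * 0.5 * f0) / ((1 - eps) * m0 + eps * 0.5 * f0)
    lower = ((1 - eps) * m0 * p0) / ((1 - eps) * m0 + eps * 0.5 * fx)
    print(f"tau={tau:4.2f}  P_pi0={p0:.2f}  lower~{lower:.2f}  upper~{upper:.2f}")
```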

Consider the class Γ(H_c, Q_c) of prior distributions with some specified quantiles,

    H_c = {h: h(B_j) = h_j, j = 1, …, m},        Q_c = {q: q(C_i) = q_i, i = 1, …, n},

where {B_j, j = 1, …, m} and {C_i, i = 1, …, n} are partitions of the parameter spaces of r and θ, respectively, and h_j, q_i are numbers such that

    Σ_{j=1}^{m} h_j = 1,  h_j ≥ 0,  j = 1, …, m;        Σ_{i=1}^{n} q_i = 1,  q_i ≥ 0,  i = 1, …, n.

Theorem 2. Let X be a random variable with density f(x|θ), where θ is an unknown parameter. Let Γ(H_c, Q_c) be the class of priors

    Γ(H_c, Q_c) = {π: π(θ) = (1-ε) ∫ π0(θ|r) h(dr) + ε q(θ), (h, q) ∈ (H_c, Q_c)},

where 0 ≤ ε < 1. Then, for any Borel set A of θ's, the ranges of the posterior probabilities P(A|x), as π ranges over Γ(H_c, Q_c), are given by

(i)  inf_{π∈Γ(H_c,Q_c)} P(A|x)
        = inf_{(r_1,…,r_m)∈B_1×⋯×B_m} [ (1-ε) Σ_{j=1}^m g_A(x|r_j) h_j + ε Σ_{i∈I} q_i inf_{θ∈C_i} f(x|θ) ]
          / [ (1-ε) Σ_{j=1}^m g(x|r_j) h_j + ε Σ_{i∈I} q_i inf_{θ∈C_i} f(x|θ) + ε Σ_{i=1}^n q_i sup_{θ∈A^c∩C_i} f(x|θ) ],

(ii) sup_{π∈Γ(H_c,Q_c)} P(A|x)
        = sup_{(r_1,…,r_m)∈B_1×⋯×B_m} [ (1-ε) Σ_{j=1}^m g_A(x|r_j) h_j + ε Σ_{i=1}^n q_i sup_{θ∈A∩C_i} f(x|θ) ]
          / [ (1-ε) Σ_{j=1}^m g(x|r_j) h_j + ε Σ_{i=1}^n q_i sup_{θ∈A∩C_i} f(x|θ) + ε Σ_{i∈J} q_i inf_{θ∈C_i} f(x|θ) ],

where A^c denotes the complement of A, I and J are the subsets of indices given by i ∈ I if and only if C_i ⊂ A and i ∈ J if and only if A ∩ C_i = ∅, g_A(x|r) = ∫ 1_A(θ) f(x|θ) π0(dθ|r), and g(x|r) = ∫ f(x|θ) π0(dθ|r). We have adopted the convention sup_{θ∈∅} f(x|θ) = 0.

Proof. The results are proved using techniques similar to those of Theorems 1 and 2 in Moreno and Cano (1991); the details are omitted for the sake of brevity.
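As a concrete instance of the kind of reduction described by Theorem 2, take (as an assumption for illustration) a N(θ, 1) sampling model, a normal base prior with mean 0 and precision r, the partition B_1 = (0, 1], B_2 = (1, ∞) with h_1 = 0.26, h_2 = 0.74 for the scale, and C_1 = (-∞, 0], C_2 = (0, ∞) with q_1 = q_2 = 0.5 for the contamination. For A = H0 = (-∞, 0] and x > 0 the contamination terms have closed forms (the supremum of the likelihood over C_1 is f(x|0), over C_2 it is f(x|x), and the relevant infima are 0), so only the two scale points remain to be searched. This is a sketch of the computation, not code from the paper.

```python
# Sketch: sup/inf of P(H0|x) over Gamma(H_c, Q_c) via the Theorem-2-style
# reduction.  Assumed setup: f(x|theta)=N(theta,1), pi0(.|r)=N(0, 1/r) with
# r the precision, B1=(0,1], B2=(1,inf) with masses 0.26/0.74, C1=(-inf,0],
# C2=(0,inf) with masses 0.5/0.5, eps=0.2, A = H0 = C1, x > 0.
import numpy as np
from scipy.stats import norm

eps, h = 0.2, np.array([0.26, 0.74])

def g(x, r):       # marginal density of x given a scale point r
    return norm.pdf(x, 0.0, np.sqrt(1.0 + 1.0 / r))

def gA(x, r):      # same, restricted to H0 = {theta <= 0}
    return g(x, r) * norm.cdf(-x / np.sqrt(1.0 + r))

def theorem2_range(x, n_grid=400):
    r1 = np.logspace(-4, 0, n_grid)        # grid over B1 = (0, 1]
    r2 = np.logspace(0.001, 4, n_grid)     # grid over B2 = (1, inf), truncated
    R1, R2 = np.meshgrid(r1, r2)
    base_num = h[0] * gA(x, R1) + h[1] * gA(x, R2)
    base_den = h[0] * g(x, R1) + h[1] * g(x, R2)
    s1 = norm.pdf(x, 0.0, 1.0)     # sup of f(x|theta) over C1 (at theta = 0)
    s2 = norm.pdf(x, x, 1.0)       # sup of f(x|theta) over C2 (at theta = x)
    sup = np.max(((1 - eps) * base_num + eps * 0.5 * s1) /
                 ((1 - eps) * base_den + eps * 0.5 * s1))
    inf = np.min(((1 - eps) * base_num) /
                 ((1 - eps) * base_den + eps * 0.5 * s2))
    return inf, sup

for x in (0.2, 0.5, 1.0, 2.0, 3.0):
    lo, hi = theorem2_range(x)
    print(f"x={x:3.1f}:  range of P(H0|x) over Gamma(H_c,Q_c) ~ [{lo:.2f}, {hi:.2f}]")
```

The point of the reduction is that the infinite-dimensional optimization over (h, q) collapses to a finite search over one representative point per partition cell.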

Theorem 3. Let X be a random variable with density f(x|θ), where θ is an unknown real parameter. Let Γ(H_uc, Q_uc) be the class of priors

    Γ(H_uc, Q_uc) = {π: π(θ) = (1-ε) ∫ π0(θ|r) h(dr) + ε q(θ), h ∈ H_uc, q ∈ Q_uc},

where 0 ≤ ε < 1,

    H_uc = {h(r): h is unimodal with mode at r_0 and ∫_0^{r_0} h(dr) = β (fixed)},
    Q_uc = {q(θ): q is unimodal with mode at θ_0 and ∫_{-∞}^{θ_0} q(dθ) = α (fixed)}.

Then for any Borel set A we have

    inf_{π∈Γ(H_uc,Q_uc)} P^π(A|x)
        = inf [ (1-ε){β H_A(a_1) + (1-β) H_A(a_2)} + ε{α K_A(b_1) + (1-α) K_A(b_2)} ]
          / [ (1-ε){β H(a_1) + (1-β) H(a_2)} + ε{α K(b_1) + (1-α) K(b_2)} ],        (1.3)

the infimum on the right being taken over -r_0 ≤ a_1 ≤ 0 ≤ a_2 and b_1 ≤ 0 ≤ b_2, where

    H_A(a) = [ a^{-1} ∫_{r_0}^{r_0+a} g_A(x|r) dr ] 1_{(a≠0)}(a) + g_A(x|r_0) 1_{(a=0)}(a),
    H(a)   = [ a^{-1} ∫_{r_0}^{r_0+a} g(x|r) dr ] 1_{(a≠0)}(a) + g(x|r_0) 1_{(a=0)}(a),
    K_A(a) = [ a^{-1} ∫_{θ_0}^{θ_0+a} 1_A(θ) f(x|θ) dθ ] 1_{(a≠0)}(a) + 1_A(θ_0) f(x|θ_0) 1_{(a=0)}(a),
    K(a)   = [ a^{-1} ∫_{θ_0}^{θ_0+a} f(x|θ) dθ ] 1_{(a≠0)}(a) + f(x|θ_0) 1_{(a=0)}(a).

The sup is obtained by replacing inf with sup in expression (1.3).

Proof. Note that both h and q can be represented as mixtures of uniform distributions, where the mixing distributions have a quantile constraint. Then, following a procedure analogous to that of Theorem 1 in Moreno and Cano (1991), the assertions follow.

3. Illustrations for scale mixtures of normal distributions

Throughout this section we assume that the sampling model is f(x|θ) = N(θ, 1) and that the base prior π0(θ | 0, r) is normal with mean 0 and precision r = τ^{-2}. We present first an illustration for contaminated scale mixtures of a normal (ε = 0.2), and secondly an illustration for uncontaminated scale mixtures of normals (ε = 0).

3.1. Case 0 < ε < 1

We now illustrate the theory of Section 2 for ε-contaminations by calculating the ranges of posterior probabilities of H0: θ ≤ 0 for the class Γ(H_c, Q_uc), where H_c is the class with one quantile fixed, H_c = {h(r): ∫_0^1 h(dr) = 0.26}, Q_uc = {q(θ): q is unimodal with mode at 0 and ∫_{-∞}^0 q(dθ) = 0.5}, and ε is taken as 0.2. In Table 3.1 we report P = inf_{π∈Γ(H_c,Q_uc)} P^π(H0 | x) and P̄ = sup_{π∈Γ(H_c,Q_uc)} P^π(H0 | x). Formulae for this case were obtained from Theorems 2 and 3.

Table 3.1. Ranges of posterior probabilities of H0: θ ≤ 0

    x      P       P̄
    0.2    0.36    0.57
    0.5    0.29    0.57
    1.0    0.19    0.57
    2.0    0.05    0.57
    3.0    0.01    0.56

Recall that P and P̄ were 0 and 1, respectively, if H_c were the class of all distributions. Table 3.1 shows, however, that if even one quantile of the scale is fixed, then the range of the posterior probabilities is drastically reduced and becomes informative. Note that the upper probability is very insensitive to the observed value of x, but the lower probability is monotonically decreasing as x grows, that is, as H0 becomes less plausible.
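The Table 3.1 bounds can be checked numerically. Under the reductions of Theorems 2 and 3, extremal mixing distributions here can be taken as two-point, with mass 0.26 at a single r_1 ∈ (0, 1] and 0.74 at a single r_2 > 1, and extremal contaminations of the form 0.5·U(-b_1, 0) + 0.5·U(0, b_2); since the two contamination terms enter the posterior probability monotonically, b_1 and b_2 can be pushed to their limiting values separately. The sketch below is my own implementation of that search under the Section 3 setup, not the authors' code, and it should land close to (not necessarily exactly on) the printed entries.

```python
# Sketch: ranges of P(H0|x) over Gamma(H_c, Q_uc) of Section 3.1.
# Assumed setup: f(x|theta)=N(theta,1), pi0(.|r)=N(0, 1/r) with r the precision,
# H_c: mass 0.26 on r in (0,1], Q_uc: unimodal with mode and median at 0, eps=0.2.
# Extremal mixing distributions are two-point in r; extremal contaminations are
# 0.5 U(-b1,0) + 0.5 U(0,b2), whose terms can be optimized separately.
import numpy as np
from scipy.stats import norm

eps = 0.2

def g(x, r):
    return norm.pdf(x, 0.0, np.sqrt(1.0 + 1.0 / r))       # marginal given r

def gA(x, r):
    return g(x, r) * norm.cdf(-x / np.sqrt(1.0 + r))      # ... restricted to H0

def avg_right(x, b):
    return (norm.cdf(x) - norm.cdf(x - b)) / b             # mean of f(x|.) over (0, b)

def table31_range(x, n_grid=400):
    r1 = np.logspace(-4, 0, n_grid)                        # (0, 1]
    r2 = np.logspace(0.001, 4, n_grid)                     # (1, inf), truncated
    R1, R2 = np.meshgrid(r1, r2)
    bn = 0.26 * gA(x, R1) + 0.74 * gA(x, R2)               # base-part numerator
    bd = 0.26 * g(x, R1) + 0.74 * g(x, R2)                 # base-part denominator
    # sup: left half of q collapses to 0 (adds f(x|0) to both terms), right half spreads out
    f0 = norm.pdf(x, 0.0, 1.0)
    sup = np.max(((1 - eps) * bn + eps * 0.5 * f0) /
                 ((1 - eps) * bd + eps * 0.5 * f0))
    # inf: left half spreads out (adds ~0), right half sits where its average likelihood is largest
    b = np.linspace(1e-3, 20.0, 4000)
    t_right = np.max(avg_right(x, b))
    inf = np.min(((1 - eps) * bn) /
                 ((1 - eps) * bd + eps * 0.5 * t_right))
    return inf, sup

for x in (0.2, 0.5, 1.0, 2.0, 3.0):
    lo, hi = table31_range(x)
    print(f"x={x:3.1f}:  [{lo:.2f}, {hi:.2f}]")   # compare with Table 3.1
```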

3.2. Case ε = 0

This case is important because the usual way of integrating out hyperparameters via a precise mixing distribution ignores the fact that, in the higher stages of the hierarchy, less is typically known about the hyperparameters (Pericchi and Nazaret, 1988). It might well happen that a small change in the mixing distribution would yield a large change in the posterior probabilities. We shall consider four classes of mixing distributions H:

(i) H⁰, the class of all distributions: H⁰ = {h(r): ∫_0^∞ h(dr) = 1};
(ii) H¹, a class with a unimodality constraint: H¹ = {h(r): h is unimodal with mode at r_0 = 1};
(iii) H², a class with one quantile fixed: H² = {h(r): ∫_0^1 h(dr) = 0.26};
(iv) H³, a class with two quantiles fixed: H³ = {h(r): ∫_0^1 h(dr) = 0.26, ∫_0^{r*} h(dr) = 0.5}, for a fixed r* > 1.

The extrema of the posterior probabilities P(H0 | x), as the mixing distribution h ranges over the above classes, are obtained from Theorems 2 and 3, taking ε = 0. In Table 3.2, for various observations x, the ranges are denoted by Pⁱ = inf_{h∈Hⁱ} P(H0 | x) and P̄ⁱ = sup_{h∈Hⁱ} P(H0 | x), for i = 0, 1, 2, 3.

Table 3.2. Ranges of the posterior probabilities of H0: θ ≤ 0

    x      P⁰      P̄⁰      P¹      P̄¹      P²      P̄²      P³      P̄³
    0.2    0.42    0.50    0.44    0.50    0.44    0.50    0.45    0.48
    0.5    0.31    0.50    0.34    0.50    0.36    0.50    0.37    0.45
    1.0    0.16    0.50    0.21    0.50    0.23    0.50    0.26    0.40
    2.0    0.02    0.50    0.05    0.50    0.06    0.50    0.11    0.29
    3.0    0.00    0.50    0.01    0.50    0.01    0.50    0.04    0.15

First of all, it is interesting to note that the family of all scale mixtures of a normal distribution (which contains the normal, Student t and double exponential among many other distributions) does not produce a vacuous answer; see the second and third columns of Table 3.2. Recall, however, that the mean of the base prior is fixed at zero and no contaminations are allowed. The ranges of the probabilities, assuming either unimodality or one quantile fixed, are roughly the same, and the upper probability remains at 0.5. On the other hand, if two quantiles are assumed fixed, the ranges are substantially reduced and the upper probability is no longer constant at 0.5.

4. Comments and conclusions

From Theorem 1 we conclude that, for the ε-contamination model (0 < ε < 1), we must model our knowledge about the hyperparameters, since otherwise we get a vacuous posterior answer. Assuming even one quantile of the scale fixed results in non-trivial posterior probabilities. On the other hand, when ε = 0, from Table 3.2 we conclude that, for one-sided hypothesis testing, we increase the infimum of the probabilities by modelling one quantile or unimodality. However, neither modelling with one quantile nor with unimodality is able to reduce the sup of the probabilities of H0 from 0.5, suggesting that more stringent restrictions on the class of distributions that models τ are needed if a more substantial reduction of the range of the probabilities is desired. This reduction is obtained by assuming two quantiles fixed; see the last two columns of Table 3.2.

We conclude by remarking that the class with ε = 0, in the illustrations the class of scale mixtures of normals without contaminations, is of importance in its own right for robust Bayesian statistics. Upon modelling the class of distributions for the nuisance hyperparameter, the number of precise assumptions of the original model has been substantially reduced and, furthermore, considerable variation in the tail behavior of the base prior distribution is allowed. Moreover, the extra assessments required by the ε-contamination model are avoided, while some freedom in the tail behavior is retained.

Acknowledgements

We are very grateful to the Associate Editor and two referees for their careful and constructive comments.


References

Andrews, D.F. and C.L. Mallows (1974). Scale mixtures of normal distributions. J. Roy. Statist. Soc. Ser. B 36, 99-102.
Berger, J.O. (1984). The robust Bayesian viewpoint (with discussion). In: J. Kadane, ed., Robustness of Bayesian Analyses. Elsevier Science Publishers, Amsterdam, 63-124.
Berger, J.O. (1985). Statistical Decision Theory and Bayesian Analysis (2nd edition). Springer-Verlag, New York.
Berger, J.O. (1990). Robust Bayesian analysis: sensitivity to the prior. J. Statist. Plann. Inference 25, 303-328.
Berger, J.O. and L.M. Berliner (1986). Robust Bayes and empirical Bayes analysis with ε-contaminated priors. Ann. Statist. 14, 461-486.
de Finetti, B. (1961). The Bayesian approach to the rejection of outliers. Proc. Fourth Berkeley Symp. on Math. Statist. and Prob. 1, 199-210.
DeRobertis, L. and J.A. Hartigan (1981). Bayesian inference using intervals of measures. Ann. Statist. 9, 235-244.
Moreno, E. and J.A. Cano (1991). Robust Bayesian analysis with ε-contaminations partially known. J. Roy. Statist. Soc. Ser. B 53, 143-155.
Moreno, E. and A. Gonzalez (1990). Empirical Bayes analysis for ε-contaminated priors with shape and quantile constraints. Brazilian J. Probab. Statist. 4, 177-200.
Moreno, E. and L.R. Pericchi (1991). Robust Bayesian analysis for ε-contaminations with shape and quantile constraints. In: R. Gutierrez et al., eds., Proc. Fifth Internat. Symp. on Applied Stochastic Models and Data Analysis. World Scientific, Singapore, 454-470.
Pericchi, L.R. and W.A. Nazaret (1988). On being imprecise at the higher levels of a hierarchical linear model (with discussion). In: J.M. Bernardo et al., eds., Bayesian Statistics 3. Oxford University Press, Oxford, 361-375.
Pericchi, L.R. and P. Walley (1991). Robust Bayesian credible intervals and prior ignorance. Internat. Statist. Rev. 59, 1-23.
Sivaganesan, S. and J.O. Berger (1989). Ranges of posterior measures for priors with unimodal contaminations. Ann. Statist. 17, 868-889.
West, M. (1984). Outlier models and prior distributions in Bayesian linear regression. J. Roy. Statist. Soc. Ser. B 46, 431-439.