Probability matching property of adjusted likelihoods


ARTICLE IN PRESS

Statistics & Probability Letters 76 (2006) 838–842
www.elsevier.com/locate/stapro

In Hong Chang^a, Rahul Mukerjee^b

^a Department of Computer Science and Statistics, Chosun University, Gwangju 501-759, South Korea
^b Indian Institute of Management, Joka, Diamond Harbour Road, Kolkata 700 104, India

Received 22 May 2005
Available online 14 November 2005

Abstract

For models characterized by a scalar parameter, it is well known that Jeffreys' prior ensures approximate frequentist validity of posterior quantiles. We examine how far this result remains robust in the presence of nuisance parameters, when the interest parameter $\theta_1$ is scalar, a prior on $\theta_1$ alone is considered, and the analysis is based on an adjusted version of the profile likelihood, rather than the true likelihood. This provides justification, from a Bayesian viewpoint, for some popular adjustments in terms of their ability to neutralize unknown nuisance parameters. The dual problem of identifying adjustments that make a given prior on $\theta_1$ probability matching in the above sense is also addressed.
© 2005 Elsevier B.V. All rights reserved.

MSC: primary 62F10

Keywords: Factorization; Jeffreys' prior; Parametric orthogonality; Posterior quantile

1. Introduction

Probability matching priors, which ensure approximate frequentist validity of posterior credible sets, have received much attention in recent years; see Datta and Mukerjee (2004) and the references therein. An early and fundamental result in this area, due to Welch and Peers (1963), concerns the frequentist validity of posterior quantiles, under Jeffreys' prior, for models characterized by a scalar parameter. Under wide generality, such validity is known to hold with margin of error $o(n^{-1/2})$, where $n$ is the sample size. We aim at examining the extent to which this result remains robust in the presence of nuisance parameters, when the interest parameter is scalar and the analysis is based on an adjusted version of the profile likelihood, rather than the true likelihood. Specifically, we are interested in adjusted likelihoods that were proposed by Cox and Reid (1987), McCullagh and Tibshirani (1990) and Barndorff-Nielsen (1994) and that have been widely studied in the literature. One of the objectives underlying these adjustments is to neutralize the presence of unknown nuisance parameters. Indeed, in frequentist inference, it has been demonstrated that they attain this objective; see, e.g., Mukerjee (1992). We investigate the corresponding issue in the context of probability matching priors.

Corresponding author. E-mail address: [email protected] (R. Mukerjee). doi:10.1016/j.spl.2005.10.015


Suppose an adjusted profile likelihood for the interest parameter $\theta_1$ is treated as the true likelihood and a prior on $\theta_1$ alone is considered. Then, in analogy with the true likelihood-based analysis in the scalar parameter case, does one get a matching prior for the posterior quantiles of $\theta_1$, with margin of error $o(n^{-1/2})$? If so, then is it the same as the Jeffreys' prior that would have arisen had the nuisance parameters been known? These questions are addressed here and, satisfyingly, the answers turn out to be positive for many, though not all, models. Thus one gets additional justification for the aforesaid adjusted likelihoods from a Bayesian perspective, with regard to the neutralization of unknown nuisance parameters.

The matching condition in our setup is worked out in the next section via appropriate expansions for the posterior density and quantiles. To save space, we highlight only those aspects of the derivation which differ from that based on the true likelihood. In Section 3, we present the results for the above-mentioned adjustments and contrast them with those for the unadjusted profile likelihood. To streamline the presentation, we work with a general class of adjusted likelihoods. As a by-product, for a given model and a given prior, it is also possible to identify the adjustments that make the prior probability matching in the sense considered here. This is indicated in Section 4.

2. Matching condition

Let $\{X_i\}$, $i \ge 1$, be a sequence of independent and identically distributed, possibly vector-valued, random variables with common density $f(x; \theta)$, where the parameter vector $\theta = (\theta_1, \ldots, \theta_p)'$ belongs to the $p$-dimensional Euclidean space or some open subset thereof, and $\theta_1$ is the scalar parameter of interest. We work essentially under the assumptions of Johnson (1970) and also need the Edgeworth assumptions of Bickel and Ghosh (1990, p. 1078).
All formal expansions for the posterior, as considered here, are over a set $S$ with $P_\theta$-probability $1 + o(n^{-1/2})$, uniformly on compact $\theta$-sets. The set $S$ may be defined along the lines of Bickel and Ghosh (1990). Let $\hat\theta = (\hat\theta_1, \ldots, \hat\theta_p)'$ be the maximum likelihood estimator (MLE) of $\theta$ based on $X = (X_1, \ldots, X_n)'$, and let $l(\theta) = n^{-1}\sum_{i=1}^n \log f(X_i; \theta)$. With $D_j = \partial/\partial\theta_j$, for $1 \le j, s, u \le p$, let
\[
l_j(\theta) = D_j l(\theta), \quad l_{js}(\theta) = D_j D_s l(\theta), \quad l_{jsu}(\theta) = D_j D_s D_u l(\theta),
\]
\[
c_{js} = -l_{js}(\hat\theta), \quad a_{jsu} = l_{jsu}(\hat\theta), \quad L_{jsu} = E_\theta\{D_j D_s D_u \log f(X_1; \theta)\}.
\]
The matrix $C = (c_{js})$ is positive definite over $S$; let $C^{-1} = (c^{js})$. The per observation (expected) Fisher information matrix at $\theta$ is denoted by $I = (I_{js})$, and let $I^{-1} = (I^{js})$. Let $(\hat\theta_2(\theta_1), \ldots, \hat\theta_p(\theta_1))'$ be the MLE of $(\theta_2, \ldots, \theta_p)'$, based on $X$, given $\theta_1$. Define $q(\theta_1) = (q_1(\theta_1), \ldots, q_p(\theta_1))'$, where $q_1(\theta_1) = \theta_1$ and $q_j(\theta_1) = \hat\theta_j(\theta_1)$, $2 \le j \le p$. For $m = 1, 2$ and $1 \le j \le p$, let $q_j^{(m)}(\theta_1) = D_1^m q_j(\theta_1)$ and $\hat q_j^{(m)} = q_j^{(m)}(\hat\theta_1)$. The profile (log-)likelihood for $\theta_1$ is given by $n\tilde l(\theta_1)$, where $\tilde l(\theta_1) = l(q(\theta_1))$. Let $\tilde l^{(m)}(\theta_1) = D_1^m \tilde l(\theta_1)$. The expressions for $\tilde l^{(m)}(\hat\theta_1)$ $(m = 1, 2, 3)$, as obtained in (2.6) below, will be needed in the sequel. Throughout, unless otherwise specified, we follow the summation convention with implicit sums ranging over $1, \ldots, p$. By the definition of $q(\theta_1)$,

\[
l_j(q(\theta_1)) = 0, \quad 2 \le j \le p. \tag{2.1}
\]
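As a quick numerical aside (ours, not part of the paper), constraint (2.1) is easy to check for the normal model $N(\theta_2, \theta_1^2)$ with $\theta_1$ the standard deviation: for any fixed $\theta_1$, the constrained MLE of the mean is the sample mean, so the score in the nuisance direction vanishes along $q(\theta_1)$.

```python
import numpy as np

rng = np.random.default_rng(1)
x = rng.normal(2.0, 1.5, size=200)
xbar = x.mean()

def l(sigma, mu):
    # average log-likelihood of N(mu, sigma^2); additive constants dropped
    return -np.log(sigma) - np.mean((x - mu) ** 2) / (2 * sigma ** 2)

# q(theta1) = (theta1, mu-hat(theta1)) with mu-hat(theta1) = xbar for every theta1
h = 1e-6
derivs = []
for sigma in (0.8, 1.5, 3.0):
    # l_2(q(theta1)) by a central difference in the nuisance direction
    l2 = (l(sigma, xbar + h) - l(sigma, xbar - h)) / (2 * h)
    derivs.append(l2)
print(derivs)  # each entry is ~0, as (2.1) requires
```

The same check goes through for any model, with the inner maximization done numerically.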

Differentiating (2.1) twice with respect to $\theta_1$ and then substituting $\hat\theta_1$ for $\theta_1$,
\[
c_{js}\hat q_s^{(1)} = 0; \qquad a_{jsu}\hat q_s^{(1)}\hat q_u^{(1)} - c_{js}\hat q_s^{(2)} = 0; \quad 2 \le j \le p. \tag{2.2}
\]
Let $k^{wj} = c^{wj} - (c^{w1}c^{j1}/c^{11})$, $1 \le w, j \le p$. Then $k^{w1} = 0$ for each $w$, and
\[
k^{wj}c_{js} = \delta_{ws} - (c^{w1}/c^{11})\delta_{1s}, \tag{2.3}
\]
where $\delta_{ws}$ and $\delta_{1s}$ are Kronecker deltas. Hence by the first equation in (2.2),
\[
0 = k^{wj}c_{js}\hat q_s^{(1)} = \hat q_w^{(1)} - (c^{w1}/c^{11})\hat q_1^{(1)}.
\]


Since $\hat q_1^{(1)} = 1$, this yields
\[
\hat q_w^{(1)} = c^{w1}/c^{11}, \quad 1 \le w \le p. \tag{2.4}
\]
Similarly, from the second equation in (2.2) one obtains
\[
\hat q_w^{(2)} = k^{wj}a_{jsu}\hat q_s^{(1)}\hat q_u^{(1)}, \quad 1 \le w \le p. \tag{2.5}
\]
Since $\tilde l(\theta_1) = l(q(\theta_1))$, by (2.1) and the fact that $q_1^{(1)}(\theta_1) = 1$,
\[
\tilde l^{(1)}(\theta_1) = l_1(q(\theta_1)),
\]
\[
\tilde l^{(2)}(\theta_1) = l_{1j}(q(\theta_1))q_j^{(1)}(\theta_1),
\]
\[
\tilde l^{(3)}(\theta_1) = l_{1js}(q(\theta_1))q_j^{(1)}(\theta_1)q_s^{(1)}(\theta_1) + l_{1j}(q(\theta_1))q_j^{(2)}(\theta_1).
\]
Hence using (2.3)–(2.5), after some simplification,
\[
\tilde l^{(1)}(\hat\theta_1) = 0; \qquad \tilde l^{(2)}(\hat\theta_1) = -(c^{11})^{-1}; \qquad \tilde l^{(3)}(\hat\theta_1) = a_{jsu}c^{j1}c^{s1}c^{u1}(c^{11})^{-3}. \tag{2.6}
\]
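The identities in (2.6) lend themselves to a direct numerical check. The sketch below (ours, for illustration) again uses the normal model $N(\theta_2, \theta_1^2)$: it builds $C$ from a finite-difference Hessian of $l$ at the MLE and compares finite-difference derivatives of the profile log-likelihood with $0$ and $-(c^{11})^{-1}$.

```python
import numpy as np

rng = np.random.default_rng(0)
x = rng.normal(2.0, 1.5, size=500)
xbar = x.mean()
m2 = np.mean((x - xbar) ** 2)

def l(sigma, mu):
    # average log-likelihood of N(mu, sigma^2); additive constants dropped
    return -np.log(sigma) - np.mean((x - mu) ** 2) / (2 * sigma ** 2)

sigma_hat, mu_hat = np.sqrt(m2), xbar      # MLE of (theta1, theta2)

h = 1e-4
def hessian(f, a, b):
    # 2x2 Hessian by central differences
    faa = (f(a + h, b) - 2 * f(a, b) + f(a - h, b)) / h ** 2
    fbb = (f(a, b + h) - 2 * f(a, b) + f(a, b - h)) / h ** 2
    fab = (f(a + h, b + h) - f(a + h, b - h)
           - f(a - h, b + h) + f(a - h, b - h)) / (4 * h ** 2)
    return np.array([[faa, fab], [fab, fbb]])

C = -hessian(l, sigma_hat, mu_hat)         # c_{js} = -l_{js}(theta-hat)
c11_sup = np.linalg.inv(C)[0, 0]           # c^{11}

def l_tilde(sigma):                        # profile log-likelihood: mu-hat(sigma) = xbar
    return l(sigma, xbar)

lt1 = (l_tilde(sigma_hat + h) - l_tilde(sigma_hat - h)) / (2 * h)
lt2 = (l_tilde(sigma_hat + h) - 2 * l_tilde(sigma_hat)
       + l_tilde(sigma_hat - h)) / h ** 2
print(lt1)                 # ~0, first identity in (2.6)
print(lt2, -1 / c11_sup)   # the two agree, second identity in (2.6)
```

Nothing here is specific to the normal model; only the closed-form inner maximization $\hat\theta_2(\theta_1) = \bar x$ is.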

~ 1 Þ þ Bðy1 Þ, where the adjustment We now consider an adjusted profile (log-)likelihood of the form nlðy Bðy1 Þ½ Bðy1 ; X Þ is a smooth function of y1, and B(y1) and its derivatives with respect to y1 are at most of order Op(1). Let pðy1 Þ be a prior density (on y1) which is positive and twice continuously differentiable. If one treats the adjusted likelihood for y1 as the true likelihood, then the posterior density of y1, under pðy1 Þ, is given by ~ 1 Þ þ Bðy1 Þg. p ðy1 jX Þ / pðy1 Þ expfnlðy

(2.7)

It is assumed that the above is proper for sufficiently large $n$. Let $y = (n/c^{11})^{1/2}(\theta_1 - \hat\theta_1)$. One can employ (2.6) to expand (2.7) about $\hat\theta_1$ and hence find an expansion for the posterior density of $y$ with margin of error $o(n^{-1/2})$. Since $y$ is a posterior standardized version of $\theta_1$, this yields an expression for the $(1-\alpha)$th posterior quantile of $\theta_1$ as
\[
\theta_1^{(1-\alpha)}(\pi; X) = \hat\theta_1 + (c^{11}/n)^{1/2}[z + n^{-1/2}\{A_1 + A_3(z^2 + 2)\}], \tag{2.8}
\]
such that the posterior coverage of the interval $(-\infty, \theta_1^{(1-\alpha)}(\pi; X)]$, with reference to (2.7), is $1 - \alpha + o(n^{-1/2})$. Here $z$ is the $(1-\alpha)$th quantile of a standard normal variate,
\[
A_1 = (c^{11})^{1/2}[B^{(1)}(\hat\theta_1) + \{\pi^{(1)}(\hat\theta_1)/\pi(\hat\theta_1)\}], \qquad A_3 = \tfrac{1}{6}a_{jsu}c^{j1}c^{s1}c^{u1}(c^{11})^{-3/2}, \tag{2.9}
\]
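To see the expansion (2.8) at work, the sketch below (our illustration, not from the paper) takes $B \equiv 0$ and a flat prior on $\theta_1 = \sigma$ in the normal model, so that $A_1 = 0$; direct calculus gives $c^{11} = m_2/2$ and $\tilde l^{(3)}(\hat\theta_1) = 10\,m_2^{-3/2}$, with $m_2$ the average squared deviation, and by (2.6) $A_3 = \tfrac{1}{6}\tilde l^{(3)}(\hat\theta_1)(c^{11})^{3/2}$. The expansion is then compared with a numerically "exact" posterior quantile.

```python
import numpy as np

rng = np.random.default_rng(7)
n = 400
x = rng.normal(2.0, 1.5, size=n)
m2 = np.var(x)                       # n^{-1} * sum (x_i - xbar)^2
s_hat = np.sqrt(m2)                  # MLE of theta1 = sigma

def l_tilde(s):                      # profile average log-likelihood for sigma
    return -np.log(s) - m2 / (2 * s ** 2)

# Closed forms for this model (hand calculation, see lead-in):
c11_sup = m2 / 2                     # c^{11}
A1 = 0.0                             # flat prior on theta1 and B = 0
A3 = (10 * m2 ** -1.5) * c11_sup ** 1.5 / 6

z = 1.6448536269514722               # 95th percentile of N(0, 1)
sd = np.sqrt(c11_sup / n)
q_first = s_hat + sd * z                                         # normal approximation
q_exp = s_hat + sd * (z + n ** -0.5 * (A1 + A3 * (z ** 2 + 2)))  # expansion (2.8)

# "Exact" 95% posterior quantile by numerically normalizing (2.7) with B = 0
grid = np.linspace(s_hat - 10 * sd, s_hat + 10 * sd, 40001)
w = np.exp(n * (l_tilde(grid) - l_tilde(s_hat)))
cdf = np.cumsum(w)
cdf /= cdf[-1]
q_exact = np.interp(0.95, cdf, grid)

print(abs(q_exact - q_exp) < abs(q_exact - q_first))  # True: (2.8) is the better approximation
```

The $n^{-1/2}$ correction picks up the right skewness of the posterior of $\sigma$, which the plain normal approximation misses.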

and $B^{(1)}(\theta_1) = D_1 B(\theta_1)$, $\pi^{(1)}(\theta_1) = D_1\pi(\theta_1)$. While (2.8) formally resembles the corresponding expression based on the true likelihood, there are differences. For instance, the term $B^{(1)}(\hat\theta_1)$ in $A_1$ does not arise there; moreover, the prior now involves $\theta_1$ alone. We employ a shrinkage argument, popular in Bayesian asymptotics (Datta and Mukerjee, 2004), to find the frequentist probability $P_\theta\{\theta_1 \le \theta_1^{(1-\alpha)}(\pi; X)\}$. This calls for a true likelihood-based analysis using an auxiliary prior, on the entire parameter vector $\theta$, satisfying the conditions of Bickel and Ghosh (1990). By (2.8) and (2.9), after some algebra, this approach yields
\[
P_\theta\{\theta_1 \le \theta_1^{(1-\alpha)}(\pi; X)\} = 1 - \alpha + n^{-1/2}\phi(z)\Delta(\pi; \theta) + o(n^{-1/2}), \tag{2.10}
\]

where
\[
\Delta(\pi; \theta) = \xi(\theta) - \tfrac{1}{2}(I^{11})^{-1/2}L_{jsu}s^{js}I^{u1} + (I^{11})^{1/2}\{\pi^{(1)}(\theta_1)/\pi(\theta_1)\} + D_j\{(I^{11})^{-1/2}I^{j1}\}, \tag{2.11}
\]
with $s^{js} = I^{js} - (I^{j1}I^{s1}/I^{11})$ and $\xi(\theta)$ free from $n$, satisfying
\[
E_\theta\{(c^{11})^{1/2}B^{(1)}(\hat\theta_1)\} = \xi(\theta) + o(1). \tag{2.12}
\]
The frequentist probability in (2.10) equals $1 - \alpha + o(n^{-1/2})$ for every $\alpha$ and $\theta$ if and only if
\[
\Delta(\pi; \theta) \equiv 0. \tag{2.13}
\]

Since the interest parameter $\theta_1$ is scalar, without loss of generality, we can always assume parametric orthogonality (Cox and Reid, 1987), i.e., $I_{1j} = I_{j1} = 0$ identically in $\theta$ $(2 \le j \le p)$. Then, by (2.11), the matching


condition (2.13) becomes
\[
\xi(\theta) - \tfrac{1}{2}I_{11}^{-1/2}\sum_{j=2}^p\sum_{s=2}^p I^{js}L_{js1} + \{\pi(\theta_1)\}^{-1}D_1\{I_{11}^{-1/2}\pi(\theta_1)\} = 0. \tag{2.14}
\]

3. Adjusted versus unadjusted likelihoods

Hereafter, we work under the framework of orthogonal parametrization. Following Mukerjee and Reid (1999) and Datta and DiCiccio (2001), for the adjusted likelihoods due to Cox and Reid (1987) and McCullagh and Tibshirani (1990),
\[
\xi(\theta) = \tfrac{1}{2}I_{11}^{-1/2}\sum_{j=2}^p\sum_{s=2}^p I^{js}L_{js1}.
\]
Some algebra, involving the use of Bartlett conditions as applicable under parametric orthogonality, shows that the same holds also for the two adjustments proposed by Barndorff-Nielsen (1994, Section 3). Hence, for all these adjustments, (2.14) reduces to $D_1\{I_{11}^{-1/2}\pi(\theta_1)\} = 0$, with solution $\pi(\theta_1) = g(\theta_2, \ldots, \theta_p)I_{11}^{1/2}$, provided there exists a smooth function $g(\theta_2, \ldots, \theta_p)\,(>0)$ such that $g(\theta_2, \ldots, \theta_p)I_{11}^{1/2}$ is free from $\theta_2, \ldots, \theta_p$. This happens if and only if $I_{11}$ admits a factorization of the form
\[
I_{11} = \lambda(\theta_1)r(\theta_2, \ldots, \theta_p), \tag{3.1}
\]
in which case one can take $g(\theta_2, \ldots, \theta_p) \propto \{r(\theta_2, \ldots, \theta_p)\}^{-1/2}$, and $\pi(\theta_1) \propto \{\lambda(\theta_1)\}^{1/2}$ turns out to be the unique solution of (2.14). If $\theta_2, \ldots, \theta_p$ were known, then this would be the Jeffreys' prior for $\theta_1$ under (3.1). Thus, even with unknown $\theta_2, \ldots, \theta_p$, the above-mentioned adjustments to the profile likelihood yield a matching prior that would have arisen had these nuisance parameters been known, as long as the model satisfies (3.1). Satisfyingly, factorization (3.1) holds for a wide variety of models, including the location-scale model (with either the location or the scale parameter being of interest). For instance, it holds for all the models discussed in the examples in Section 2.6 of Datta and Mukerjee (2004) except Example 2.6.2, which concerns the product of two independent normal means.

For the unadjusted profile likelihood, both $B(\theta_1)$ and $\xi(\theta)$ are identically equal to zero. Hence (2.14) becomes
\[
\{\pi(\theta_1)\}^{-1}D_1\{I_{11}^{-1/2}\pi(\theta_1)\} = \tfrac{1}{2}I_{11}^{-1/2}\sum_{j=2}^p\sum_{s=2}^p I^{js}L_{js1}. \tag{3.2}
\]
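As an illustrative symbolic check of (3.2) (ours, not in the paper), take the normal model with standard deviation $\theta_1$ and mean $\theta_2$. Both sides of (3.2) then reduce to the same constant, which forces $\pi^{(1)} = 0$, i.e., a constant prior.

```python
import sympy as sp
from sympy.stats import Normal, E

t1, t2 = sp.symbols('theta1 theta2', positive=True)
x = sp.symbols('x', real=True)
X = Normal('X', t2, t1)                  # N(theta2, theta1^2)
logf = -sp.log(t1) - sp.log(2 * sp.pi) / 2 - (x - t2) ** 2 / (2 * t1 ** 2)

# Per-observation information entries and L_{221}, by symbolic expectation
I11 = sp.simplify(-E(sp.diff(logf, t1, 2).subs(x, X)))      # 2/theta1^2
I22 = sp.simplify(-E(sp.diff(logf, t2, 2).subs(x, X)))      # 1/theta1^2
L221 = sp.simplify(E(sp.diff(logf, t2, 2, t1).subs(x, X)))  # 2/theta1^3

# Right-hand side of (3.2): (1/2) I_{11}^{-1/2} I^{22} L_{221}
rhs = sp.simplify(sp.Rational(1, 2) / sp.sqrt(I11) * (1 / I22) * L221)
# Left-hand side splits as D_1{I_{11}^{-1/2}} + I_{11}^{-1/2} pi'/pi; its prior-free piece:
lead = sp.simplify(sp.diff(1 / sp.sqrt(I11), t1))
print(rhs, lead)   # both 1/sqrt(2), so (3.2) forces pi' = 0: pi is constant
```

This reproduces the conclusion stated below for the normal model, in contrast with the Jeffreys prior $\pi(\theta_1) \propto 1/\theta_1$ that arises when $\theta_2$ is known.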

Even for models satisfying (3.1), the prior $\pi(\theta_1) \propto \{\lambda(\theta_1)\}^{1/2}$ often fails to satisfy (3.2). For example, consider the univariate normal model with standard deviation $\theta_1$ and mean $\theta_2$. Then the unique solution to (3.2) is $\pi(\theta_1) = \text{constant}$, which is different from the Jeffreys' prior that would have arisen had $\theta_2$ been known. Thus, even under (3.1), the unadjusted profile likelihood often fails to neutralize the presence of unknown nuisance parameters in the way the aforesaid adjustments do.

4. Matching adjustments for a given prior

For a given model (i.e., given the information matrix and the $L_{jsu}$) and a given prior $\pi(\theta_1)$, condition (2.14) also yields a method for finding adjusted likelihoods under which the prior becomes probability matching in the sense considered here. For this purpose, one needs to identify adjustments such that the corresponding $\xi(\theta)$ satisfies (2.14). To illustrate this, we consider adjusted likelihoods with $B(\theta_1)$ of the form $B(\theta_1) = H(q(\theta_1))$, where $H(\theta)$ is a smooth function of $\theta$. Let $H_j(\theta) = D_j H(\theta)$, $1 \le j \le p$. Then $B^{(1)}(\hat\theta_1) = H_j(\hat\theta)\hat q_j^{(1)}$. Therefore, by (2.4) and (2.12), invoking parametric orthogonality, $\xi(\theta) = I_{11}^{-1/2}H_1(\theta)$, and (2.14) holds if and only if
\[
H(\theta) = Q(\theta) - \log\pi(\theta_1) + \tfrac{1}{2}\log I_{11}, \tag{4.1}
\]
where $Q(\theta)$ is any smooth function of $\theta$ satisfying $D_1 Q(\theta) = \tfrac{1}{2}\sum_{j=2}^p\sum_{s=2}^p I^{js}L_{js1}$.


Example 1. For the univariate normal model with standard deviation $\theta_1$ and mean $\theta_2$, we have $I_{11} = 2\theta_1^{-2}$, $I_{22} = \theta_1^{-2}$, $I_{12} = 0$, $L_{221} = 2\theta_1^{-3}$, and, upon simplification, (4.1) yields $H(\theta) = h(\theta_2) - \log\pi(\theta_1)$, where $h(\theta_2)$ is any smooth function of $\theta_2$.
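The simplification in Example 1 can be replayed symbolically. In this sketch (ours, for illustration) the stated values of $I_{11}$, $I_{22}$ and $L_{221}$ are taken as inputs, and $H(\theta)$ from (4.1) is seen to depend on $\theta_1$ only through $-\log\pi(\theta_1)$.

```python
import sympy as sp

t1, t2 = sp.symbols('theta1 theta2', positive=True)
prior = sp.Function('prior')(t1)     # stands for the prior pi(theta1)

I11 = 2 / t1 ** 2                    # values stated in Example 1
I22 = 1 / t1 ** 2
L221 = 2 / t1 ** 3

# D_1 Q = (1/2) I^{22} L_{221} (only the j = s = 2 term when p = 2)
dQ = sp.Rational(1, 2) * (1 / I22) * L221
Q = sp.integrate(dQ, t1)             # log(theta1), up to a function of theta2
H = Q - sp.log(prior) + sp.Rational(1, 2) * sp.log(I11)   # eq. (4.1)
print(sp.expand_log(H))  # log(2)/2 - log(prior(theta1)): the log(theta1) terms cancel
```

Absorbing the constant into $h(\theta_2)$ recovers $H(\theta) = h(\theta_2) - \log\pi(\theta_1)$ as stated.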

Example 2. Consider the inverse Gaussian model
\[
f(x; \theta) = \left(\frac{\theta_2}{2\pi x^3}\right)^{1/2}\exp\{-\tfrac{1}{2}\theta_2(x - \theta_1)^2/(\theta_1^2 x)\}, \quad x > 0,
\]
where $\theta_1, \theta_2 > 0$, and the $\pi$ in $f(x; \theta)$ is the usual transcendental number (not to be confused with a prior). Then $I_{11} = \theta_1^{-3}\theta_2$, $I_{12} = 0$, $L_{221} = 0$, and (4.1) yields $H(\theta) = h(\theta_2) - \log\pi(\theta_1) - \tfrac{3}{2}\log\theta_1$, where $h(\theta_2)$ is any smooth function of $\theta_2$.

Acknowledgements

This work was supported by research funds, starting 2004, from Chosun University.

References

Barndorff-Nielsen, O.E., 1994. Adjusted versions of profile likelihood and directed likelihood, and extended likelihood. J. Roy. Statist. Soc. B 56, 125–140.
Bickel, P.J., Ghosh, J.K., 1990. A decomposition for the likelihood ratio statistic and the Bartlett correction—a Bayesian argument. Ann. Statist. 18, 1070–1090.
Cox, D.R., Reid, N., 1987. Parameter orthogonality and approximate conditional inference (with discussion). J. Roy. Statist. Soc. B 49, 1–39.
Datta, G.S., DiCiccio, T.J., 2001. On expected volumes of multidimensional confidence sets associated with the usual and adjusted likelihoods. J. Roy. Statist. Soc. B 63, 691–703.
Datta, G.S., Mukerjee, R., 2004. Probability Matching Priors: Higher Order Asymptotics. Springer, Berlin.
Johnson, R.A., 1970. Asymptotic expansions associated with posterior distributions. Ann. Math. Statist. 41, 851–864.
McCullagh, P., Tibshirani, R., 1990. A simple method for the adjustment of profile likelihoods. J. Roy. Statist. Soc. B 52, 325–344.
Mukerjee, R., 1992. Conditional likelihood and power: higher order asymptotics. Proc. Roy. Soc. London A 438, 433–446.
Mukerjee, R., Reid, N., 1999. On confidence intervals associated with the usual and adjusted likelihoods. J. Roy. Statist. Soc. B 61, 945–953.
Welch, B.L., Peers, H.W., 1963. On formulae for confidence points based on integrals of weighted likelihoods. J. Roy. Statist. Soc. B 25, 318–329.