Efficient likelihood ratio tests under P-ancillarity and P-sufficiency


Statistics & Probability Letters 29 (1996) 213-221 (Elsevier)

Yiliang Zhu a,*, Nancy Reid b

a Department of Epidemiology and Biostatistics, College of Public Health, University of South Florida, Tampa, FL 33612-3805, USA
b Department of Statistics, University of Toronto, Toronto, Canada M5S 1A1

Received January 1993; revised August 1995

Abstract

A local power bound is obtained for asymptotic level-$\alpha$ tests in the presence of infinitely many nuisance parameters. Likelihood ratio tests, conditioning on P-ancillary statistics or based on marginal P-sufficient statistics, attain this power bound, and are thus locally most powerful similar. The efficiency of the marginal likelihood ratio tests based on conditional P-sufficient statistics is also discussed.

Keywords: Ancillarity; Sufficiency; Nuisance parameters; Asymptotic efficiency; Local asymptotic normality; Likelihood ratio tests

1. Introduction

Consider testing a composite null hypothesis against local alternatives. Michel (1979) showed that in the natural exponential family with a scalar parameter of interest, score tests based on conditional score statistics about the parameter of interest maximize local power, and are thus efficient among level-$\alpha$ tests. Under similar conditions, but allowing for possibly dependent sample observations, Basawa (1981) demonstrated that the conditional likelihood ratio statistics, conditioning on the ancillary statistics proposed by Godambe (1976, Theorem 3.1), are efficient under the same efficiency criterion. Extending the work of Godambe (1976, 1984), Liang (1983) and Bhapkar (1989, 1991) on information, ancillarity and sufficiency in the presence of nuisance parameters, Zhu and Reid (1994) recently discussed the concepts of P-information, P-ancillarity and P-sufficiency in the sense of local parametrization. In this paper we show that likelihood ratio tests, conditioning on P-ancillary statistics or based on marginal P-sufficient statistics, are locally most powerful similar among level-$\alpha$ tests; and so are those based on conditional P-sufficient statistics when the dimension of the nuisance parameters is fixed. These results are not restricted to exponential families of distributions, and are valid for a vector parameter of interest with infinitely many nuisance parameters. A rigorous discussion of regularity conditions is not the focus of this paper and is sometimes omitted; references are given only as far as necessary.

* Corresponding author.
0167-7152/96/$12.00 © 1996 Elsevier Science B.V. All rights reserved. SSDI 0167-7152(95)00175-1


In Section 2, P-information, P-ancillarity and P-sufficiency are introduced. In Section 3, a local power bound for level-$\alpha$ tests, an extension of those given by Chibisov (1973) and Michel (1979), is obtained for the case of infinitely many nuisance parameters. In Sections 4 and 5, the efficiency of conditional likelihood ratio tests under P-ancillarity and marginal likelihood ratio tests under marginal or conditional P-sufficiency is discussed. Some general discussion is given in Section 6.

2. P-information, ancillarity and sufficiency

Let $y_{(n)} = (y_1, \ldots, y_n)$ be a sample of observations with joint pdf $f(y_{(n)}; \psi, \lambda)$, where the $p$-vector parameter of interest $\psi \in \Psi$ and the $q$-vector nuisance parameter $\lambda \in \Lambda$; both $\Psi$ and $\Lambda$ are assumed to be open subsets of Euclidean space. For simplicity, it is also assumed that the parameter space $\Psi \times \Lambda$ is a Cartesian product, although only a local product space is required to ensure that $\psi$ and $\lambda$ are locally variation independent. The dimension $q$ is allowed to increase with sample size. The Greek letters $\beta, \gamma, \ldots$ are used to index terms associated with the nuisance parameters. We now introduce the concepts of P-information, P-ancillarity and P-sufficiency discussed by Zhu and Reid (1994). These concepts are defined pointwise with respect to the parameters of interest, in contrast with the global reparameterization associated with the concepts proposed previously in the literature by Andersen (1970), Godambe (1976, 1984), Liang (1983) and Bhapkar (1989, 1991).

2.1. P-information

Let $U_\psi = \{U_i;\ i = 1, 2, \ldots, p\}$ and $U_\lambda = \{U_\beta;\ \beta = 1, 2, \ldots, q\}$ be the vector score functions of $\psi$ and $\lambda$, respectively. Denote by $T_\lambda$ the linear space (tangent space) spanned by the coordinate vectors $\{U_\beta\}$, and by $P_\lambda$ the projection operator onto the tangent space $T_\lambda$. The effective scores (Begun et al., 1983) of $\psi$ are given by the residuals

$$U_{\psi|\lambda} = U_\psi - P_\lambda U_\psi = (I - P_\lambda) U_\psi$$

after projecting $U_\psi$ onto $T_\lambda$, where $I$ denotes the identity projector.

Definition 1. The P-information about $\psi$ in the presence of $\lambda$ is given by

$$I(\psi|\lambda) = \langle U_{\psi|\lambda}, U_{\psi|\lambda} \rangle = I_{\psi\psi} - I_{\psi\lambda^*} [I_{\lambda^*\lambda^*}]^{-1} I_{\lambda^*\psi},$$

where $\langle x, y \rangle = \mathrm{Cov}(x, y)$, $I_{\psi\lambda^*} = \langle U_\psi, U_{\lambda^*} \rangle$, and $U_{\lambda^*}$ is any sub-vector of $U_\lambda$ whose coordinates form a basis of $T_\lambda$.

The P-information $I(\psi|\lambda)$ is connected to local parameter orthogonalization, and is, moreover, invariant with respect to the nuisance parameters (Zhu and Reid, 1994). Fisher information in the presence of nuisance parameters was also discussed by Liang (1983), Godambe (1984) and Bhapkar (1989, 1991). When the likelihood factorization

$$f(y_{(n)}; \psi, \lambda) \propto f_1(t_n; \psi, \lambda)\, f_2(y_{(n)} | t_n; \psi) \tag{1}$$

or

$$f(y_{(n)}; \psi, \lambda) \propto f_1(t_n; \psi)\, f_2(y_{(n)} | t_n; \psi, \lambda) \tag{2}$$

exists, where $t_n$ is a statistic of $y_{(n)}$, we have

$$I(\psi|\lambda) = I_1(\psi|\lambda) + I_2(\psi|\lambda), \tag{3}$$


with $I_1$ and $I_2$ being the P-information matrices associated with $f_1$ and $f_2$, respectively. In general, however, an inequality ($\geq$) holds under any product likelihood (Zhu and Reid, 1994). Since statistical inference about the parameters of interest $\psi$ may be based on $f_2(y_{(n)} | t_n; \psi)$ or $f_1(t_n; \psi)$ depending on whether (1) or (2) is available, this information equality facilitates measuring information loss due to using only a sub-model of the full likelihood. Intuitively, the concepts of ancillarity and sufficiency may be extended on the basis of no pointwise P-information loss.

Definition 2. (a) Under factorization (1), $t_n$ is P-ancillary for $\psi$ if the P-information matrix

$$I_1(\psi|\lambda) = I(\psi|\lambda) - I_2(\psi|\lambda) = 0$$

at given $\psi$ for every $\lambda$. (b) Under factorization (2), $t_n$ is marginal P-sufficient for $\psi$ if the P-information matrix

$$I_2(\psi|\lambda) = I(\psi|\lambda) - I_1(\psi|\lambda) = 0$$

at given $\psi$ for every $\lambda$.

P-ancillarity and P-sufficiency imply that there is essentially no information loss when using only the conditional likelihood $f_2(y_{(n)} | t_n; \psi)$ or the marginal likelihood $f_1(t_n; \psi)$ for inference about $\psi$. The following is a conditional version of P-sufficiency.

Definition 3. Under factorization (2), $t_n$ is conditional P-sufficient for $\psi$ if the conditional P-information matrix

$$I_2^{(t_n)}(\psi|\lambda) = \langle U_{2\psi|\lambda}^{(t_n)}, U_{2\psi|\lambda}^{(t_n)} \rangle = 0$$

at given $\psi$ for every $\lambda$. Here, the inner product, the conditional projector $P_{2\lambda}^{(t_n)}$ and the conditional effective score $U_{2\psi|\lambda}^{(t_n)} = (I - P_{2\lambda}^{(t_n)}) U_{2\psi}^{(t_n)}$ are all based on the conditional distribution $f_2(y_{(n)} | t_n; \psi, \lambda)$.

If there exists a local smooth reparameterization $(\psi, \tilde\lambda)$ of $(\psi, \lambda)$ for given $\psi$ such that the corresponding sub-model in (1) or (2) depends on $(\psi, \tilde\lambda)$ through $\tilde\lambda$ only, it can then be shown that the sub-model contains no P-information about $\psi$ and may thus be ignored. If such a reparametrization is global with respect to both $\psi$ and $\lambda$, P-ancillarity and marginal P-sufficiency given in Definition 2 reduce to Godambe's ancillarity (Godambe, 1976, Theorem 3.1) and Bhapkar's (1991) sufficiency, respectively. Notice that for the conditional P-sufficiency in Definition 3, $\tilde\lambda$ generally depends on $t_n$. It can be shown that marginal P-information loss $I_2(\psi|\lambda) = 0$ implies conditional P-information loss $I_2^{(t_n)}(\psi|\lambda) = 0$ almost everywhere in $t_n$. Consequently, marginal P-sufficiency implies conditional P-sufficiency. In Sections 4 and 5, we discuss efficient conditional and marginal likelihood ratio tests based on these concepts.
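As a numerical illustration of Definition 1, the following Python sketch evaluates $I(\psi|\lambda) = I_{\psi\psi} - I_{\psi\lambda} I_{\lambda\lambda}^{-1} I_{\lambda\psi}$ from the blocks of a Fisher information matrix. The function names `solve` and `p_information` are hypothetical and not taken from the paper. The nuisance block is assumed to be nonsingular, that is, the nuisance scores supplied are assumed to form a basis of the tangent space. The information entries in the usage lines were worked out here for a single gamma pair of Example 2 (scales $\psi\lambda$ and $\lambda$, shape $a$) and are not quoted from the source.

```python
from typing import List

Matrix = List[List[float]]


def solve(a: Matrix, b: Matrix) -> Matrix:
    """Solve a @ x = b by Gauss-Jordan elimination with partial pivoting."""
    n = len(a)
    # Augmented matrix [a | b], copied so the inputs are left untouched.
    aug = [list(a[i]) + list(b[i]) for i in range(n)]
    for col in range(n):
        pivot = max(range(col, n), key=lambda r: abs(aug[r][col]))
        if abs(aug[pivot][col]) < 1e-12:
            raise ValueError("nuisance information block is singular")
        aug[col], aug[pivot] = aug[pivot], aug[col]
        scale = aug[col][col]
        aug[col] = [v / scale for v in aug[col]]
        for row in range(n):
            if row != col:
                factor = aug[row][col]
                aug[row] = [v - factor * w for v, w in zip(aug[row], aug[col])]
    return [row[n:] for row in aug]


def p_information(i_pp: Matrix, i_pl: Matrix, i_ll: Matrix) -> Matrix:
    """Return I_pp - I_pl I_ll^{-1} I_lp (a Schur complement).

    i_pp is p x p, i_pl is p x q and i_ll is q x q (assumed nonsingular).
    """
    p, q = len(i_pp), len(i_ll)
    i_lp = [[i_pl[i][b] for i in range(p)] for b in range(q)]  # transpose
    x = solve(i_ll, i_lp)  # q x p matrix I_ll^{-1} I_lp
    return [
        [i_pp[i][j] - sum(i_pl[i][b] * x[b][j] for b in range(q))
         for j in range(p)]
        for i in range(p)
    ]


# Usage: one gamma pair, with assumed illustrative values a, psi, lam.
a, psi, lam = 2.0, 1.5, 0.7
info = p_information([[a / psi**2]], [[a / (psi * lam)]], [[2 * a / lam**2]])
print(info[0][0], a / (2 * psi**2))
```

Under these assumptions the two printed numbers agree up to rounding, matching the per-pair value $a/(2\psi^2)$ that underlies $r_0^2$ in Example 2.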

3. A local power bound

For simplicity, we assume hereafter that the observations $y_{(n)} = (y_1, \ldots, y_n)$ are independently distributed, each with pdf $f_\beta(y_\beta; \psi, \lambda^\beta)$, $\beta = 1, \ldots, n$. To obtain the local power bound for level-$\alpha$ tests, we first present the local asymptotic normality (LAN), generally known as Le Cam's lemma, for likelihood ratios.

3.1. Local asymptotic normality

For $(\psi_0, \lambda^\beta)$, $\beta = 1, \ldots, n$, fixed, consider two contiguous sequences of parameter values

$$H_{0n}: \psi = \psi_0, \quad \lambda_n^\beta \in \Lambda \tag{4}$$

and

$$H_n: \psi = \psi_n, \quad \lambda^\beta \in \Lambda \tag{5}$$

in that $\lim_{n \to \infty} n^\delta (\lambda_n^\beta - \lambda^\beta) = h^\beta$ and $\lim_{n \to \infty} n^\delta (\psi_n - \psi_0) = h$. Here, $h$ is a fixed constant vector and $\{h^\beta\}$ are uniformly bounded under, for example, the usual $L_2$ norm. The value $\delta > 0$ satisfies the condition $\sum_\beta I_\beta(\psi|\lambda^\beta) = O(n^{2\delta})$ (typically $\delta = 1/2$), where $I_\beta(\psi|\lambda^\beta)$ is the P-information associated with $f_\beta$. $H_{0n}$ and $H_n$ being contiguous essentially implies the contiguity condition for the two sequences of distributions of $Y_{(n)}$ under $\{H_{0n}\}$ and $\{H_n\}$, respectively. As stated in the following lemma, the LAN holds under both $\{H_{0n}\}$ and $\{H_n\}$ for the log likelihood ratio

$$L_n = \log \prod_{\beta=1}^{n} \left[ f_\beta(Y_\beta; \psi_n, \lambda^\beta) / f_\beta(Y_\beta; \psi_0, \lambda_n^\beta) \right] \tag{6}$$

as $n \to \infty$. This result is essentially implied by Propositions 3.1.1 and 4.2.1-4.2.3 of Le Cam and Yang (1990). An alternative proof was given by Zhu (1992).

Lemma. Assuming mild regularity conditions, we have

$$L_n \to N(-r^2/2, r^2) \quad (n \to \infty) \quad \text{under } H_{0n} \tag{7}$$

and

$$L_n \to N(r^2/2, r^2) \quad (n \to \infty) \quad \text{under } H_n, \tag{8}$$

respectively, where

$$r^2 = \lim_{n \to \infty} n^{-2\delta} \sum_{\beta=1}^{n} (h^T, -(h^\beta)^T)\, I_\beta(\psi_0, \lambda^\beta)\, (h^T, -(h^\beta)^T)^T$$

and $I_\beta(\psi, \lambda^\beta)$ is the Fisher information matrix based on $f_\beta(y_\beta; \psi, \lambda^\beta)$.

3.2. Level-$\alpha$ tests

The asymptotic efficiency of a test may be evaluated by considering the composite null hypothesis

$$H_0: \psi = \psi_0, \quad \lambda^\beta \in \Lambda \tag{9}$$

and a sequence of contiguous alternatives

$$H_n: \psi_n = \psi_0 + n^{-\delta} h_n, \quad \lambda^\beta \in \Lambda, \tag{10}$$

$\beta = 1, \ldots, n$, where the vectors $h_n \to h$ as $n \to \infty$, representing a directional departure from $\psi_0$. A sequence of tests $\{\zeta_n\}$ is said to be of level $\alpha$ with respect to the nuisance parameters $\{\lambda^\beta\}$ for testing $H_0$ if it satisfies the condition

$$\lim_{n \to \infty} \sup_{H_{0n}} E\{\zeta_n; H_{0n}\} \le \alpha, \quad 0 < \alpha < 1,$$

with $H_{0n}$ given in (4) (cf. Chibisov, 1973; Michel, 1979). Here, $E\{\cdot\,; H_{0n}\}$ denotes expectation taken under $H_{0n}$. Let $Q$ be the class of all level-$\alpha$ tests. A sequence of tests $\{\zeta_n^*\} \in Q$ is locally most powerful, or efficient, if for any other $\{\zeta_n\} \in Q$,

$$\lim_{n \to \infty} \sup_{H_n} E\{\zeta_n; H_n\} \le \lim_{n \to \infty} E\{\zeta_n^*; H_n\} \tag{11}$$

holds for any $h$ and $\{\lambda^\beta\}$.


3.3. A power bound

The following theorem gives a bound for the local power of level-$\alpha$ tests. This power bound includes as special cases a similar result based on an i.i.d. sample from the exponential family (Michel, 1979) and a result for a class of test statistics that are sums of independent terms (Chibisov, 1973). In Sections 4 and 5 we show that this power bound is attained by certain likelihood ratio tests.

Theorem. For testing null hypothesis (9) with alternatives (10), the asymptotic local power of any level-$\alpha$ test satisfies

$$\lim_{n \to \infty} \sup_{H_n} E\{\zeta_n; H_n\} \le 1 - \Phi(z_\alpha - r_0), \tag{12}$$

where $r_0^2 = h^T I_0(\psi_0|\lambda) h$ and

$$I_0(\psi_0|\lambda) = \lim_{n \to \infty} n^{-2\delta} \sum_{\beta=1}^{n} I_\beta(\psi_0|\lambda^\beta),$$

and $\Phi$ is the cumulative distribution function of the $N(0,1)$ distribution with $\Phi(z_\alpha) = 1 - \alpha$. When the power bound is attained, $r_0$ is also the coefficient of Pitman's efficacy if the distribution of the statistics under $H_0$ is asymptotically normal.

Proof. Consider the sequence of functions $\varphi_n = I[L_n \ge -r^2/2 + z_\alpha r]$, where $L_n$ is the log likelihood ratio given in (6) and $r$ is given in the Lemma. ($\{\varphi_n\}$ are not necessarily test functions because of possible dependence on $\lambda^\beta$.) From Eqs. (7) and (8), we have

$$\lim_{n \to \infty} E\{\varphi_n; H_{0n}\} = 1 - \Phi(z_\alpha) = \alpha$$

and

$$\lim_{n \to \infty} E\{\varphi_n; H_n\} = 1 - \Phi(z_\alpha - r),$$

respectively. For any sequence of tests $\{\zeta_n\} \in Q$ and arbitrarily small $\varepsilon > 0$, there exists sufficiently large $n$ such that, by definition,

$$E\{\zeta_n; H_{0n}\} < E\{\varphi_n; H_{0n}\} + \varepsilon. \tag{13}$$

Define the following subsets of the sample space: $A_n^- = \{y_{(n)}: \varphi_n = 0, \zeta_n = 1\}$, $A_n^+ = \{y_{(n)}: \varphi_n = 1, \zeta_n = 0\}$, and $A_n = \{y_{(n)}: \varphi_n = 1, \zeta_n = 1\}$. Since $\zeta_n$ and $\varphi_n$ coincide on $A_n$, it follows from (13) that

$$E\{\zeta_n I[A_n^-]; H_{0n}\} < E\{\varphi_n I[A_n^+]; H_{0n}\} + \varepsilon. \tag{14}$$


Letting $k = \exp(-r^2/2 + z_\alpha r)$, we note that $A_n^- \subset \{y_{(n)}: \exp(L_n) \le k\}$ and $A_n^+ \subset \{y_{(n)}: \exp(L_n) \ge k\}$. Multiplying both sides of (14) by $k$ gives

$$E\{\zeta_n I[A_n^-]; H_n\} \le E\{k \zeta_n I[A_n^-]; H_{0n}\} < E\{k \varphi_n I[A_n^+]; H_{0n}\} + k\varepsilon \le E\{\varphi_n I[A_n^+]; H_n\} + k\varepsilon.$$

In deriving the preceding inequality, we have used the equality

$$E\{k \zeta_n I[A_n^-]; H_{0n}\} = E\{k \zeta_n I[A_n^-] \exp(-L_n); H_n\}.$$

Adding $E\{\zeta_n I[A_n]; H_n\} = E\{\varphi_n I[A_n]; H_n\}$ to both sides of this inequality yields $E\{\zeta_n; H_n\} < E\{\varphi_n; H_n\} + k\varepsilon$. It follows by letting $n \to \infty$ and $\varepsilon \to 0$ that

$$\lim_{n \to \infty} \sup_{H_n} E\{\zeta_n; H_n\} \le 1 - \Phi(z_\alpha - r). \tag{15}$$

Since this inequality is valid for any $\{h^\beta\}$, including the least favorable direction $\{h^\beta\} = \{I_{\beta\lambda\lambda}^{-1} I_{\beta\lambda\psi} h\}$ (Chibisov, 1973), we can replace $r$ with its minimum value $r_0$, with $r_0^2 = h^T I_0(\psi_0|\lambda) h$. The last assertion follows from the definition of Pitman's efficacy under certain regularity conditions (Serfling, 1980, Section 10.2). Details may be found in Zhu (1992). □
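The bound in (12) is straightforward to evaluate numerically. The sketch below is a minimal Python illustration using only the standard library. The function name `local_power_bound` and the default $\alpha = 0.05$ are choices made here for illustration and are not part of the paper; the caller is assumed to supply $r_0 = (h^T I_0(\psi_0|\lambda) h)^{1/2}$.

```python
from statistics import NormalDist


def local_power_bound(r0: float, alpha: float = 0.05) -> float:
    """Return 1 - Phi(z_alpha - r0), where Phi(z_alpha) = 1 - alpha."""
    if not 0.0 < alpha < 1.0:
        raise ValueError("alpha must lie strictly between 0 and 1")
    std_normal = NormalDist()  # N(0, 1)
    z_alpha = std_normal.inv_cdf(1.0 - alpha)
    return 1.0 - std_normal.cdf(z_alpha - r0)


# Usage: the bound equals alpha at r0 = 0 and increases with r0.
for r0 in (0.0, 1.0, 2.0, 3.0):
    print(r0, local_power_bound(r0))
```

Because the bound depends on the alternative only through $r_0$, comparing two tests with limiting powers of the form $1 - \Phi(z_\alpha - r)$ reduces to comparing their values of $r$.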

4. Conditional likelihood ratio tests

In the natural exponential family with $\psi$ being scalar and $\lambda^\beta$ identical, the conditional score tests (Michel, 1979) and the conditional likelihood ratio tests (Basawa, 1981) under Godambe's ancillarity (Godambe, 1976, Theorem 3.1) are locally most powerful among level-$\alpha$ tests. For general families of distributions with vector parameter $\psi$ and infinitely many nuisance parameters $\{\lambda^\beta\}$, we show here that the conditional likelihood ratio tests, conditioning on P-ancillary statistics for $\psi$, are asymptotically unbiased and locally most powerful similar in the class $Q$. For simplicity, we assume that the pdfs $f_\beta$ are from the same family, but indexed by different parameter values $\lambda^\beta$. This implies that $t_{(n)} = \{t_\beta = t(y_\beta);\ \beta = 1, \ldots, n\}$, i.e. each $t_\beta$ is P-ancillary for $\psi$ in the presence of $\lambda^\beta$. It may be directly verified that the LAN holds for the conditional likelihood ratio

$$L_{2n} = \log \prod_{\beta=1}^{n} \left[ f_2(y_\beta | t_\beta; \psi_n) / f_2(y_\beta | t_\beta; \psi_0) \right]$$

by noting that the LAN holds for both the full likelihood ratio $L_n$ and the marginal likelihood ratio $L_{1n}$ of $t_{(n)}$ under factorization (1). Thus, Eqs. (7) and (8) hold with $r$ replaced by $r_2$ and

$$r_2^2 = \lim_{n \to \infty} n^{-2\delta} \sum_{\beta=1}^{n} h^T I_{2\psi\psi}(\psi_0, \lambda^\beta) h.$$

Because the $t_\beta$'s are P-ancillary for $\psi$, we have $I(\psi_0|\lambda^\beta) = I_{2\psi\psi}(\psi_0, \lambda^\beta)$, and consequently $r_2^2 = r_0^2$. Let $\{C_n(t_{(n)})\}$ be any sequence of constants, free of $\lambda^\beta$ but possibly involving $t_{(n)}$, such that

$$\lim_{n \to \infty} C_n(t_{(n)}) = -r_2^2/2 + z_\alpha r_2.$$

For example, substituting $I_{2\psi\psi}(\psi_0; t_\beta)$ for $I_{2\psi\psi}(\psi_0, \lambda^\beta)$ in $r_2$ yields such a sequence of $C_n(t_{(n)})$, since the law of large numbers ensures convergence to $r_2$ in probability. The LAN for $L_{2n}$ under $H_{0n}$ implies that the conditional likelihood ratio test

$$\zeta_n = I\{L_{2n} \ge C_n(t_{(n)})\} \tag{16}$$


for testing (9) against (10) satisfies the condition

$$\lim_{n \to \infty} E\{\zeta_n | t_{(n)}; H_{0n}\} = \alpha;$$

hence marginally,

$$\lim_{n \to \infty} E\{\zeta_n; H_{0n}\} = \alpha.$$

That is, $\{\zeta_n\}$ belong to $Q$. The LAN under $H_n$ further implies that the local power of $\{\zeta_n\}$ attains the power bound as

$$\lim_{n \to \infty} E\{\zeta_n; H_n\} = 1 - \Phi(z_\alpha - r_2).$$

Therefore, the likelihood ratio test is asymptotically unbiased and locally most powerful similar.

Example 1. $2 \times 2$ tables. Suppose we have $n$ independent $2 \times 2$ tables, each generated by two independent binomial variables $X_{1\beta} \sim B(n_{1\beta}, p_{1\beta})$ and $X_{2\beta} \sim B(n_{2\beta}, p_{2\beta})$, $\beta = 1, \ldots, n$. The log odds ratio $\psi = \log\{p_{1\beta}(1 - p_{2\beta}) / [p_{2\beta}(1 - p_{1\beta})]\}$, common to all $n$ tables, is the parameter of interest, and the log odds $\{\lambda^\beta\} = \{\log[p_{2\beta}/(1 - p_{2\beta})]\}$ are the nuisance parameters. Since the marginal totals $\{X_{\cdot\beta}\} = \{X_{1\beta} + X_{2\beta}\}$ are P-ancillary for $\psi$ only at $\psi = 0$ ($p_{1\beta} = p_{2\beta}$) (Zhu and Reid, 1994), the likelihood ratio tests conditioning on the marginal totals $\{X_{\cdot\beta}\}$ are locally most powerful similar among level-$\alpha$ tests for $\psi = 0$.
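To make Example 1 concrete, the sketch below computes the conditional log likelihood ratio $L_{2n}$ for a collection of $2 \times 2$ tables. It relies on the standard fact that, given the marginal total $X_{\cdot\beta} = m$, $X_{1\beta}$ follows a noncentral hypergeometric distribution with probabilities proportional to $\binom{n_{1\beta}}{x}\binom{n_{2\beta}}{m - x} e^{\psi x}$. The function names and the tuple layout `(x1, n1, x2, n2)` are assumptions made here for illustration; the critical value $C_n(t_{(n)})$ of (16) is not computed.

```python
import math
from typing import Iterable, Tuple


def log_norm_const(n1: int, n2: int, m: int, psi: float) -> float:
    """Log of sum_u C(n1,u) C(n2,m-u) exp(psi*u) over the admissible u."""
    lo, hi = max(0, m - n2), min(n1, m)
    terms = [
        math.log(math.comb(n1, u)) + math.log(math.comb(n2, m - u)) + psi * u
        for u in range(lo, hi + 1)
    ]
    top = max(terms)  # log-sum-exp for numerical stability
    return top + math.log(sum(math.exp(t - top) for t in terms))


def conditional_log_lr(
    tables: Iterable[Tuple[int, int, int, int]],
    psi_n: float,
    psi_0: float = 0.0,
) -> float:
    """Conditional log likelihood ratio of psi_n against psi_0.

    Each table is (x1, n1, x2, n2); conditioning is on m = x1 + x2.
    """
    total = 0.0
    for x1, n1, x2, n2 in tables:
        m = x1 + x2
        total += (psi_n - psi_0) * x1
        total -= log_norm_const(n1, n2, m, psi_n)
        total += log_norm_const(n1, n2, m, psi_0)
    return total


# Usage with made-up counts (illustrative only, not data from the paper).
tables = [(7, 10, 3, 10), (5, 8, 4, 9), (6, 12, 2, 11)]
print(conditional_log_lr(tables, psi_n=0.5))
```

The nuisance log odds $\lambda^\beta$ never enter the computation, which is the practical point of conditioning on the marginal totals.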

5. Marginal likelihood ratio tests

Similar to Section 4, we assume for simplicity that the P-sufficient statistics are of the form $t_{(n)} = \{t_\beta = t(y_\beta);\ \beta = 1, \ldots, n\}$. That is, each $t_\beta$ is P-sufficient (marginal or conditional) for $\psi$ in the presence of $\lambda^\beta$. It follows that the LAN holds for the (log) marginal likelihood ratio $L_{1n} = \log \prod_{\beta=1}^{n} [f_1(t_\beta; \psi_n) / f_1(t_\beta; \psi_0)]$, with $r$ being replaced by $r_1$, which is given by

$$r_1^2 = \lim_{n \to \infty} n^{-2\delta} \sum_{\beta=1}^{n} h^T I_{1\psi\psi}(\psi_0) h.$$

For testing (9) with local alternatives (10), the marginal likelihood ratio tests

$$\zeta_n = I[L_{1n} \ge C_n] \tag{17}$$

belong to $Q$ as $\{C_n\}$ are chosen to satisfy the condition $\lim_{n \to \infty} E\{\zeta_n; H_{0n}\} = \alpha$. Since the LAN under $H_{0n}$ implies

$$\lim_{n \to \infty} C_n = -r_1^2/2 + z_\alpha r_1,$$

the local power is given by $\lim_{n \to \infty} E\{\zeta_n; H_n\} = 1 - \Phi(z_\alpha - r_1)$. The tests $\{\zeta_n\}$ are thus asymptotically unbiased and similar.


When $t_{(n)}$ are marginally P-sufficient for $\psi$, the marginal P-information loss is given by $I_2(\psi|\lambda^\beta) = 0$, leading to the fact that $I(\psi|\lambda^\beta) = I_{1\psi\psi}$ and $r_1 = r_0$. Therefore, $\{\zeta_n\}$ in (17) are also locally most powerful among level-$\alpha$ tests. Note that when $\{t_\beta\}$ are S-sufficient statistics (i.e. $f_2(y_{(n)} | t_{(n)}; \lambda)$ is free of $\psi$), the marginal likelihood ratio test is actually uniformly most powerful against alternatives $H_n: \psi = \psi_n$ (Fraser, 1956). On the other hand, when $t_{(n)}$ are conditional P-sufficient for $\psi$, the marginal likelihood ratio tests (17) would attain the local power bound only if there are finitely many nuisance parameters. In this case, the information loss is typically of the order

$$I_2(\psi|\lambda) = o[I(\psi|\lambda)],$$

hence $r_1 = r_0$. If there are infinitely many nuisance parameters, we have

$$I_2(\psi|\lambda) = O[I(\psi|\lambda)].$$

It follows that $r_1 < r_0$ and $\{\zeta_n\}$ do not attain the power bound (Zhu and Reid, 1994).

Example 2. Gamma mean ratio of $n$ pairs. Let $\{(X_\beta, Y_\beta),\ \beta = 1, \ldots, n\}$ be $n$ independent pairs of observations from the Gamma$(a, \psi\lambda^\beta)$ and Gamma$(a, \lambda^\beta)$ distributions, respectively, where $a$ is the given shape parameter, and $\psi\lambda^\beta$ and $\lambda^\beta$ are the scale parameters of the two distributions, respectively. Note that the parameter of interest $\psi$ measures the mean ratio. The joint likelihood of $(x_\beta, y_\beta)$ can be factored into

$$f(x_\beta, y_\beta; \psi, \lambda^\beta) \propto f_1(t_\beta; \psi)\, f_2(y_\beta | t_\beta; \tilde\lambda^\beta),$$

with $t_\beta = x_\beta / y_\beta$. Since $f_2$ is the pdf of a Gamma distribution conditioning on $t_\beta$ with the shape parameter $2a$ and the scale parameter $\tilde\lambda^\beta = \lambda^\beta / (1 + t_\beta/\psi)$, $\{t_\beta\}$ are conditional P-sufficient for $\psi$. Moreover, $\{t_\beta\}$ are i.i.d., with $1/(1 + T_\beta/\psi)$ following a beta distribution. Because the LAN holds for

$$L_{1n} = na \log \frac{\psi_0}{\psi_n} + 2a \sum_{\beta=1}^{n} \log \frac{1 + t_\beta/\psi_0}{1 + t_\beta/\psi_n},$$

the resulting marginal likelihood ratio test has local power $1 - \Phi(z_\alpha - r_1)$, where $r_1^2 = h^2 \{a^2 / [(2a + 1)\psi^2]\}$, less than $r_0^2 = h^2 [a/(2\psi^2)]$. Thus the local power is below the power bound.

Example 3. Mean ratio of two samples. If $\lambda^\beta = \lambda$ ($\beta = 1, \ldots, n$) in Example 2 above, the ratio of the sample means, $t = \bar{x}/\bar{y}$, is conditional P-sufficient for $\psi$. The marginal likelihood ratio test based on $t$ for (4) is locally most powerful similar among level-$\alpha$ tests, as $r_1^2 = h^2 [a/(2\psi^2)] = r_0^2$.
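The quantities in Example 2 can be checked numerically. The Python sketch below evaluates the marginal log likelihood ratio $L_{1n}$ from the ratios $t_\beta$ and the two efficacies $r_1^2$ and $r_0^2$, assuming $\delta = 1/2$ and a scalar direction $h$. The function names and the sample values are hypothetical and chosen here only for illustration.

```python
import math
from typing import Sequence, Tuple


def marginal_log_lr(t: Sequence[float], a: float,
                    psi_0: float, psi_n: float) -> float:
    """L_1n = n a log(psi_0/psi_n)
    + 2a * sum_beta log[(1 + t_beta/psi_0) / (1 + t_beta/psi_n)]."""
    n = len(t)
    tail = sum(math.log((1.0 + tb / psi_0) / (1.0 + tb / psi_n)) for tb in t)
    return n * a * math.log(psi_0 / psi_n) + 2.0 * a * tail


def efficacies(a: float, psi: float, h: float) -> Tuple[float, float]:
    """Return (r_1^2, r_0^2) as given in Example 2."""
    r1_sq = h * h * a * a / ((2.0 * a + 1.0) * psi * psi)
    r0_sq = h * h * a / (2.0 * psi * psi)
    return r1_sq, r0_sq


# Usage with made-up ratios t_beta = x_beta / y_beta (not data from the paper).
ratios = [0.8, 1.9, 1.2, 2.5, 0.6]
print(marginal_log_lr(ratios, a=2.0, psi_0=1.0, psi_n=1.3))
r1_sq, r0_sq = efficacies(a=2.0, psi=1.0, h=1.0)
print(r1_sq, r0_sq, r1_sq / r0_sq)
```

Under these formulas the ratio $r_1^2 / r_0^2$ equals $2a/(2a + 1)$, so the efficiency loss of the marginal test shrinks as the shape parameter $a$ grows but does not vanish for any fixed $a$.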

6. Discussion

In order to apply the LAN to log likelihood ratios and to establish the local power bound, we have assumed in this paper that $y_{(n)} = \{y_\beta;\ \beta = 1, \ldots, n\}$ are independently distributed. Since it is the contiguity conditions of the likelihood functions that are essential in LAN, the independence assumption may be relaxed so long as the contiguity conditions are satisfied. See Le Cam and Yang (1990, Ch. 4) for a more general discussion. P-ancillary or P-sufficient statistics of the form $t_{(n)} = \{t_1, \ldots, t_n\}$ are common, but may not be essential for contiguity conditions to hold. Therefore, the results presented in this paper may be applicable in more general settings.


References

Andersen, E.B. (1970), Asymptotic properties of conditional maximum likelihood estimators, J. Roy. Statist. Soc. B 32, 283-301.
Basawa, I.V. (1981), Efficient conditional tests for mixture experiments with applications to the birth and branching processes, Biometrika 68, 153-164.
Begun, J.M., W.J. Hall, W.-M. Huang and J.A. Wellner (1983), Information and asymptotic efficiency in parametric-nonparametric models, Ann. Statist. 11, 432-452.
Bhapkar, V.P. (1989), Conditioning on ancillary statistics and loss of information in the presence of nuisance parameters, J. Statist. Plann. Inference 21, 139-160.
Bhapkar, V.P. (1991), Loss of information in the presence of nuisance parameters and partial sufficiency, J. Statist. Plann. Inference 28, 185-203.
Chibisov, D.M. (1973), Asymptotic expansions for Neyman's C($\alpha$) tests, in: G. Maruyama and Yu. V. Prokhorov, eds., 2nd Japan-USSR Symp. on Probab. Theory, Lecture Notes in Math. Vol. 330 (Springer, Berlin) pp. 16-45.
Fraser, D.A.S. (1956), Sufficient statistics with nuisance parameters, Ann. Math. Statist. 27, 828-842.
Godambe, V.P. (1976), Conditional likelihood and unconditional optimum estimating equations, Biometrika 63, 277-284.
Godambe, V.P. (1984), On ancillarity and Fisher information in the presence of a nuisance parameter, Biometrika 71, 626-629.
Le Cam, L. and G. Yang (1990), Asymptotics in Statistics: Basic Concepts (Springer, New York).
Liang, K.-Y. (1983), On information and ancillarity in the presence of a nuisance parameter, Biometrika 70, 607-612.
Michel, R. (1979), On the asymptotic efficiency of conditional tests for exponential families, Ann. Statist. 7, 1256-1263.
Serfling, R.J. (1980), Approximation Theorems of Mathematical Statistics (Wiley, New York).
Zhu, Y.-L. (1992), Generalized information measures and asymptotic efficiency, Ph.D. Thesis, Dept. of Statistics, Univ. of Toronto.
Zhu, Y.-L. and N. Reid (1994), Information, ancillarity and sufficiency in the presence of nuisance parameters, Canad. J. Statist. 22, 111-123.