ELSEVIER
Journal of Statistical Planning and Inference 52 (1996) 109-129

Edgeworth expansions for the conditional distributions in logistic regression models

Fanhui Kong^a,*, Bruce Levin^b

^a Department of Mathematics and Statistics, University of Maryland Baltimore County, 5401 Wilkens Avenue, Baltimore, MD 21228, USA
^b Division of Biostatistics, Columbia University School of Public Health, 600 W 168 St., New York, NY 10032, USA

Received 11 April 1994; revised 1 May 1995
Abstract

Edgeworth expansions are derived for the conditional distributions of sufficient statistics, as well as for conditional maximum likelihood estimators of log odds ratios, in logistic regression models, assuming that the risk factors are not almost equally distanced. Expansions are given in several special cases. Similar results are obtained for models with polytomous outcomes.

Keywords: Edgeworth expansion; Characteristic function; Logistic regression; Odds ratio; Conditional distribution
1. Introduction
Assume $Y$ is a discrete random variable with outcomes $\{0, 1, \ldots, t\}$, for $t \ge 1$, which follows a polytomous multiple logistic regression model
$$\log\frac{\Pr(Y=j\mid X,W)}{\Pr(Y=0\mid X,W)} = \alpha_j(W) + \beta_j' X, \qquad j = 1, \ldots, t, \eqno(1.1)$$
where $W$ is a matching or stratification variable, $X$ is a $v$-dimensional antecedent risk factor or "exposure" variable, $\alpha_j$ is an unspecified nuisance function of $W$, and $\beta_j = (\beta_{1j}, \ldots, \beta_{vj})'$ is a $v$-dimensional structural log odds ratio parameter of interest. This model is a commonly assumed disease incidence model for the likelihood analysis of stratified prospective or retrospective comparative study designs.

* Corresponding author.

0378-3758/96/$15.00 © 1996 Elsevier Science B.V. All rights reserved
SSDI 0378-3758(95)00106-9

The coefficient $\beta_{ij}$ has a simple interpretation as the increase in the log odds of disease state $[Y=j]$ versus $[Y=0]$ per unit increase in $X_i$. Model (1.1) is well suited to the analysis of data collected either prospectively or retrospectively. In retrospective studies the outcome frequencies are fixed by design and the $X$ vectors are random, while in prospective studies the reverse is true. In either case (1.1) induces a conditional distribution for the sufficient statistics given fixed outcome frequencies and subsets of exposure vectors in each stratum. It is shown by Breslow and Power (1978), Farewell (1979) and Prentice and Pyke (1979) that the conditional likelihood functions are identical for prospective and retrospective sampling models, assuming only unbiased sampling. The conditional likelihood function is also formally identical to the partial likelihood for survival data given a proportional hazards model for cause of death $[Y=j]$ versus survival $[Y=0]$ (Prentice and Breslow, 1978). The value of $\beta$ that maximizes the conditional likelihood is called the conditional maximum likelihood estimator (c.m.l.e.), denoted by $\hat\beta_c$. Algorithms for exact computation of the c.m.l.e. of $\beta$ have been discussed in special cases of (1.1) by Gail et al. (1981), Howard (1972), Thomas (1971, 1975), Liang (1983), Krailo and Pike (1984), and Smith et al. (1981). Levin (1987) specifies the recursive algorithm of Howard (1972) for the general case of (1.1) and provides an APL implementation of it. Farewell (1979) compares the efficiency of conditional and unconditional maximum likelihood estimators and concludes that for $\beta = 0$ the two estimators are asymptotically equivalent, and that for $\beta \ne 0$ and some special cases of $X$ the efficiency of the c.m.l.e. is quite high. A double saddle-point approximation to the conditional likelihood function of $\beta$ given a sufficient statistic for $\alpha$ is given by Levin (1990).

Two asymptotic cases are of special interest: the classical large sample case, where the sample size per stratum is large and the number of strata is fixed; and the large sparse case, where the sample size per stratum is bounded while the number of strata grows large. In this paper we focus primarily on the classical asymptotic framework for a single stratum. Similar expansions can be extended to the large sparse case, or to intermediate designs with several strata of moderate sizes.

Consider model (1.1) for a fixed stratum $W = w$, and write $\alpha = \alpha(w) = (\alpha_1(w), \ldots, \alpha_t(w))'$. Suppose there are $n = n(w)$ observations $(X_i, Y_i)$, $i = 1, \ldots, n$. Write $Z_i = (Z_{i1}, \ldots, Z_{it})'$ for $i = 1, \ldots, n$, with $Z_{ij} = I[Y_i = j]$, $j = 1, \ldots, t$. Then the sufficient statistics for $\alpha$ and $\beta$ are $S_n = \sum_i Z_i$ and $T_n = (\sum_i X_i' Z_{i1}, \ldots, \sum_i X_i' Z_{it})'$. Let
$$U_n = \frac{1}{\sqrt n}\,\{T_n - E(T_n \mid S_n)\}. \eqno(1.2)$$
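To fix ideas, for a single stratum with a dichotomous outcome the conditional moments entering $U_n$ can be computed exactly for small $n$ by enumerating all $\binom{n}{s}$ outcome configurations: given $S_n = s$, the conditional weights reduce to $e^{\beta T_n}$ because $\alpha$ cancels. A minimal sketch (the data $x$, the parameter $\beta$, and $s$ are hypothetical illustrative values, not from the paper):

```python
from itertools import combinations
from math import exp, sqrt

def cond_mean_T(x, beta, s):
    """E(T_n | S_n = s) in the binary logistic model: average of
    T_n(A) = sum_{i in A} x_i over all size-s subsets A, weighted by
    exp(beta * T_n(A)); the nuisance parameter alpha cancels."""
    num = den = 0.0
    for A in combinations(range(len(x)), s):
        t_val = sum(x[i] for i in A)
        w = exp(beta * t_val)
        num += t_val * w
        den += w
    return num / den

x = [0.5, 1.2, -0.3, 2.0, 0.9, 1.5]   # hypothetical exposures
beta, s = 0.4, 3                      # hypothetical parameter and S_n value
t_obs = x[1] + x[3] + x[4]            # T_n for one observed configuration
u_n = (t_obs - cond_mean_T(x, beta, s)) / sqrt(len(x))
```

With $\beta = 0$ the conditional mean reduces to $s$ times the average exposure, which gives a quick sanity check on the enumeration.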
In this paper we derive the Edgeworth expansion for the conditional distribution of $U_n$ given $S_n$, as well as the expansion for the c.m.l.e. of the coefficients $\beta$ given $S_n$. The expansions derived here are conditional on the fixed values of the $X_i$'s. First, let us introduce the following assumptions.
Assumption 1. There exist constants $M_1, M_2$, with $0 < M_1 < M_2 < \infty$, such that $M_1 \le |X_{ij}| \le M_2$ for all $i = 1, \ldots, n$ and $j = 1, \ldots, v$.
For a positive number $d$, we define the $v$-dimensional rectangle $B_d = \{\xi : |\xi_i| \le d \text{ for } i = 1, \ldots, v\}$. Furthermore, for $0 < d_1 \le d_2 < \infty$, we define $B_{d_2} - B_{d_1}$ as the region between the two rectangles $B_{d_1}$ and $B_{d_2}$.

Assumption 2. For $\xi \in R^v$, define $\xi' X_i = \sum_{j=1}^v X_{ij}\xi_j$, $i = 1, \ldots, n$. There are a constant $C'$, $0 < C' < 1$, and positive numbers $C$ and $\varepsilon$, such that
$$\#\{X_i : |\xi' X_i + \varphi - 2k\pi| > C', \; k = 0, \pm 1, \pm 2, \ldots\} \ge C n^{\varepsilon} \eqno(1.3)$$
for almost all $\xi$ in $B_{\sqrt n} - B_{1/M_2}$ and arbitrary number $\varphi$.

Assumption 3. There exist two constants $M_3, M_4$, uniform in $n$, such that $0 < M_3 \le S_n/n \le M_4 < 1$.

Remark 1. Assumption 2 was first introduced by Albers et al. (1976), and used by Bickel and van Zwet (1978) and Robinson (1978). It requires that $X_1, \ldots, X_n$ not cluster around too few points. For one-dimensional $X_i$, designs in which the $X_i$'s cluster at two values are excluded, because for any constant $C'$ there will be at least an interval of $\xi$ such that the number of $X_i$'s in (1.3) is zero, since all $\xi' X_i$'s fall within a $C'$-neighborhood of $2\pi k$ and $2\pi k'$.

Remark 2. Consider more general situations. For the one-dimensional case, Assumption 2 basically excludes the situation where most (at least $n - Cn^{\varepsilon}$) of the $X_i$'s cluster closely around a subset of a series of equally spaced numbers, e.g. the integers, since in that case one can find an interval of $\xi$ and a $\varphi$ such that most of the $\xi X_i + \varphi$ lie in neighborhoods of the points $2k\pi$, $k = 0, \pm1, \pm2, \ldots$. Examples of fixed designs that satisfy this assumption are: (1) $X_i = i^a$ $(i = 1, 2, \ldots, n)$, where $a$ is not a nonnegative integer; (2) $X_i = \ln i$ $(i = 1, 2, \ldots, n)$.

Remark 3. In the case that $\{X_i,\; i = 1, 2, \ldots, n\}$ are random draws from a population $X$, we have the following conclusion: if there does not exist a group of equally distanced numbers such that $X$ is distributed on a subset of these numbers with probability one, then as $n \to \infty$ the chance that Assumption 2 is satisfied converges to one. In fact, by our assumption, for any series of equally distanced numbers and some small positive number $C'$, there is a positive number $\delta$ such that the probability that $X$ falls outside the $C'$-neighborhoods of these numbers is $\delta$.
For $n$ observations from this population, the number of $X_i$'s falling outside these $C'$-neighborhoods, denoted by $K$, has a binomial distribution $B(n, \delta)$. One has
$$\Pr\{K \le Cn^{\varepsilon}\} = \Pr\left\{\frac{K - n\delta}{\{n\delta(1-\delta)\}^{1/2}} \le \frac{Cn^{\varepsilon} - n\delta}{\{n\delta(1-\delta)\}^{1/2}}\right\}.$$
By the central limit theorem, this probability goes to zero if $\varepsilon < 1/2$. That means Assumption 2 is satisfied with probability approaching 1.

Remark 4. For the two-dimensional case, if there does not exist a class of parallel straight lines with the same distance between them such that most (at least $n - Cn^{\varepsilon}$) of the $X_i$'s ($X_i = (X_{i1}, X_{i2})'$) are almost located on these lines, then Assumption 2 is satisfied. In fact, for points nearly on such equally spaced parallel lines there exists a common vector $\xi = (\xi_1, \xi_2)'$ such that the $\xi' X_i$'s are almost equally distanced. So Assumption 2 is easy to verify. For random multidimensional vector designs similar results are valid.

For simplicity we state here the results for dichotomous outcomes ($t = 1$); the results for polytomous outcomes will be given in Section 3. For $t = 1$, we identify $\alpha = \alpha(w) = \alpha_1(w)$ and $\beta = \beta_1 = (\beta_{11}, \ldots, \beta_{v1})'$. Given $s$ as the observed value of $S_n$, let $\hat\alpha = \hat\alpha(\beta)$ be the number satisfying
$$\sum_{i=1}^{n} \frac{e^{\hat\alpha + \beta' x_i}}{1 + e^{\hat\alpha + \beta' x_i}} = s. \eqno(1.4)$$
For fixed values of the $x_i$, $\beta$ and $0 < s < n$, such an $\hat\alpha$ exists and is unique. We note that this is the saddle point of the profile likelihood function used in the saddle-point approximation for conditional likelihood analysis; see Levin (1990). We standardize $T_n$ by introducing $T_n^*$ as
$$T_n^* = \frac{1}{\sqrt n}\Big\{T_n - \sum_i X_i \hat p_i\Big\}, \eqno(1.5)$$
where $\hat p_i$ is defined as in (1.4), that is, $\hat p_i = e^{\hat\alpha+\beta' x_i}/(1 + e^{\hat\alpha+\beta' x_i})$. We prove that the Edgeworth expansion is valid for $T_n^*$. Since there is a linear relationship between $T_n^*$ and $U_n$, the Edgeworth expansion for $U_n$ is also valid, and the coefficients in the expansion are derived by using the cumulants of $U_n$.

Theorem 1.1. For $T_n^*$ defined in (1.5), let $F_{ns}(x)$ be the conditional distribution function of $T_n^*$ given $S_n = s$. Then the $r$th order Edgeworth expansion for $F_{ns}(x)$ is valid for $r \ge 3$ under Assumptions 1-3. Similarly, for $U_n$ defined in (1.2), the $r$th order Edgeworth expansion for the conditional distribution of $U_n$ is valid for $r \ge 3$ under the same assumptions. As a direct consequence, the expansion for the distribution of the c.m.l.e. of $\beta$ is also valid.
2. Dichotomous outcomes

In this section we prove Theorem 1.1 in the simplest case, where $Y$ is a binary dependent variable ($t = 1$), $X$ is a one-dimensional exposure variable ($v = 1$) and $W$ denotes a fixed stratum $W = w$. In this case, (1.1) becomes the simple binary logistic regression model
$$\log\frac{\Pr(Y=1\mid X)}{\Pr(Y=0\mid X)} = \alpha + \beta X, \eqno(2.1)$$
where $\alpha = \alpha(w)$. For observations $(X_i, Y_i)$, $i = 1, \ldots, n$, the conditional likelihood function given $S_n = s$ is then
$$L_c(\beta \mid S_n = s) = \frac{e^{\beta T_n}}{\sum^* e^{\beta T_n}}, \eqno(2.2)$$
where $\sum^*$ denotes summation over all $Y_i$'s satisfying $\sum_i Y_i = s$, and $T_n$ is the conditional sufficient statistic for $\beta$ given $S_n = s$. The conditional characteristic function of $T_n$ is therefore
$$\phi_n(\xi \mid s, \beta) = E(e^{i\xi T_n} \mid S_n = s) = \frac{\sum^* e^{i\xi(X_{i_1} + \cdots + X_{i_s})}\, p_{i_1} \cdots p_{i_s}\, q_{i_{s+1}} \cdots q_{i_n}}{\sum^* p_{i_1} \cdots p_{i_s}\, q_{i_{s+1}} \cdots q_{i_n}}, \eqno(2.3)$$
where
$$p_i = \frac{e^{\alpha+\beta X_i}}{1 + e^{\alpha+\beta X_i}}, \qquad q_i = \frac{1}{1 + e^{\alpha+\beta X_i}},$$
$\{i_1, \ldots, i_s\}$ is a subset of $\{1, \ldots, n\}$, $\{i_{s+1}, \ldots, i_n\}$ is its complement, and the summation $\sum^*$ is over all subsets $\{i_1, \ldots, i_s\}$ with $1 \le i_1 < \cdots < i_s \le n$ and fixed $s$. Using an expression of Erdős and Rényi (1959), one has
$$\phi_n(\xi \mid s, \beta) = \frac{1}{2\pi B_{ns}(p)} \int_{-\pi}^{\pi} \prod_{k=1}^{n} \{q_k + p_k e^{i(\xi X_k + \theta)}\}\, e^{-i\theta s}\, d\theta, \eqno(2.4)$$
where $B_{ns}(p) = \sum^* p_{i_1}\cdots p_{i_s} q_{i_{s+1}}\cdots q_{i_n} = \frac{1}{2\pi}\int_{-\pi}^{\pi}\prod_{k=1}^n\{q_k + p_k e^{i\theta}\}\,e^{-i\theta s}\,d\theta$. Since $\phi_n(\xi\mid s,\beta)$ does not depend on $\alpha$, we choose $\alpha$ satisfying (1.4) and denote by $f_n^*(\xi\mid s,\beta)$ the conditional characteristic function of $T_n^*$ in (1.5); we have
$$f_n^*(\xi\mid s,\beta) = \exp\Big\{-\frac{i\xi}{\sqrt n}\sum_k X_k\hat p_k\Big\}\,\phi_n\Big(\frac{\xi}{\sqrt n}\,\Big|\,s,\beta\Big) = \frac{1}{2\pi\sqrt n\,B_{ns}(\hat p)}\int_{-\sqrt n\pi}^{\sqrt n\pi}\exp\Big\{-\frac{i\xi}{\sqrt n}\sum_k X_k\hat p_k\Big\}\prod_{k=1}^n\hat P_k(\xi_{nk},\varphi)\,e^{-i\varphi s/\sqrt n}\,d\varphi, \eqno(2.5)$$
where $\varphi = \sqrt n\,\theta \in [-\sqrt n\pi, \sqrt n\pi]$, $\xi_{nk} = (\xi X_k + \varphi)/\sqrt n$, and $P_k(\xi_{nk},\varphi) = q_k + p_k e^{i\xi_{nk}}$; $\hat p_k$ is the value of $p_k$ at $\alpha = \hat\alpha$, and similarly for $\hat q_k$.
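Numerically, $B_{ns}(p)$ is the coefficient of $z^s$ in $\prod_k(q_k + p_k z)$ (the Poisson-binomial probability of $S_n = s$), which is exactly what the contour integral extracts; it can be computed stably by polynomial convolution rather than by summing over subsets. A sketch checking the two computations against each other (the $p$ values are hypothetical):

```python
from itertools import combinations
from math import prod

def bns_poly(p, s):
    """Coefficient of z^s in prod_k (q_k + p_k z), by convolution."""
    coef = [1.0]
    for pk in p:
        qk = 1.0 - pk
        new = [0.0] * (len(coef) + 1)
        for j, c in enumerate(coef):
            new[j] += qk * c       # term without the k-th success
            new[j + 1] += pk * c   # term with the k-th success
        coef = new
    return coef[s]

def bns_enum(p, s):
    """Direct sum over subsets {i_1 < ... < i_s}, as in (2.3)."""
    n = len(p)
    total = 0.0
    for A in combinations(range(n), s):
        total += prod(p[i] if i in A else 1.0 - p[i] for i in range(n))
    return total

p = [0.2, 0.55, 0.4, 0.7, 0.35]    # hypothetical success probabilities
```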
The expansion of $f_n^*(\xi\mid s,\beta)$ is an important part of deriving the expansion for the conditional density of $T_n^*$. The main steps are similar to Robinson (1978); a detailed proof is given as follows. First we write
$$\prod_{k=1}^n\hat P_k(\xi_{nk},\varphi) = \exp\Big\{i\sum_{k=1}^n\hat p_k\xi_{nk} - \frac12\sum_{k=1}^n\hat p_k\hat q_k\xi_{nk}^2 + \sum_{j=1}^{r-2}\frac{1}{(j+2)!}\sum_{k=1}^n\gamma_{j+2,k}(i\xi_{nk})^{j+2} + \sum_{k=1}^nR_{r+1,k}(i\xi_{nk})\Big\}, \eqno(2.6)$$
where $\gamma_{jk}$ is the $j$th cumulant of $Y_k$ (a Bernoulli variable with success probability $\hat p_k$), for $k = 1,\ldots,n$, and $R_{r+1,k}(i\xi_{nk})$ is the $(r+1)$th derivative of $\log(\hat q_k + \hat p_k e^t)$ at $t = \lambda_k i\xi_{nk}$, divided by $(r+1)!$, times $(i\xi_{nk})^{r+1}$, for some $0 < \lambda_k < 1$ depending on $\xi_{nk}$. Put
$$V(u) = \sum_{j=1}^{r-2}\frac{1}{(j+2)!}\sum_{k=1}^n\gamma_{j+2,k}(i\xi_{nk})^{j+2}\,u^j + \sum_{k=1}^nR_{r+1,k}(i\xi_{nk})\,u^{r-1}, \eqno(2.7)$$
for $|u| \le 1$, and define $P_{nj}(\xi,\beta)$ by
$$e^{V(u)} = 1 + \sum_{j=1}^{r-2}P_{nj}(\xi,\beta)\,u^j + R(u). \eqno(2.8)$$
So the expansion in (2.6) becomes
$$\exp\Big\{i\sum_k\hat p_k\xi_{nk} - \frac12\sum_k\hat p_k\hat q_k\xi_{nk}^2\Big\}\Big\{1 + \sum_{j=1}^{r-2}P_{nj}(\xi,\beta) + R(1)\Big\}.$$
Let us derive an upper bound for $R(1)$. Since the complex function $l_k(z) = \log\{1 + p_k(e^z - 1)\}$ is analytic on the complex disk $D(0, 2r_0)$ with center at the origin and radius $2r_0$, $|l_k(z)|$ has an upper bound $C$ on the disk. Moreover, $C$ and $r_0$ can be chosen uniformly for $\beta$ in a neighborhood of the true value $\beta_0$ and all $X_1,\ldots,X_n$ satisfying Assumption 1. Since $\gamma_{jk}$ is the $j$th derivative of $l_k$ at 0, Cauchy's theorem gives
$$|\gamma_{jk}| \le \frac{Cj!}{(2r_0)^j},$$
uniformly for $k = 1,\ldots,n$. Considering the coefficients of the series in (2.7), we have
$$\Big|\sum_{k=1}^n\gamma_{jk}(i\xi_{nk})^j\Big| \le \frac{Cj!}{(2r_0)^j}\,\max_k|\xi_{nk}|^{j-2}\sum_{k=1}^n\xi_{nk}^2 \le \frac{Cj!}{(2r_0)^j}\Big(\frac{|\xi|M_2+|\varphi|}{\sqrt n}\Big)^{j-2}\,\frac1n\sum_{k=1}^n(\xi X_k+\varphi)^2.$$
So choose $0 < C_1 < 2r_0/3$ such that, for $|\xi| < C_1\sqrt n/M_2$ and $|\varphi| < 2C_1\sqrt n$,
$$|V(1)| \le \sum_{j=1}^{r-2}\frac{1}{(j+2)!}\Big|\sum_{k=1}^n\gamma_{j+2,k}(i\xi_{nk})^{j+2}\Big| + \Big|\sum_{k=1}^nR_{r+1,k}(i\xi_{nk})\Big| \le \frac{C}{(2r_0)^2}\sum_{j=1}^{\infty}\Big(\frac{|\xi|M_2+|\varphi|}{2r_0\sqrt n}\Big)^j\sum_{k=1}^n\xi_{nk}^2 \le \frac{C}{(2r_0)^2}\,\frac{3C_1/2r_0}{1-3C_1/2r_0}\,\sum_{k=1}^n\xi_{nk}^2,$$
since $|\xi|M_2+|\varphi| < 3C_1\sqrt n$.
When $C_1$ is small enough, $\frac{C}{(2r_0)^2}\,\frac{3C_1/2r_0}{1-3C_1/2r_0}$ can be made arbitrarily small, say less than $\delta/2 > 0$. On the other hand, for $|z_0| < r_0$ the analytic function $l_k(z)$ is bounded by $C$ on the disk $D(z_0, r_0)$ with radius $r_0$ and center $z_0$; therefore, by the same argument as before,
$$|l_k^{(r+1)}(z_0)| \le \frac{C(r+1)!}{r_0^{r+1}},$$
so $|R_{r+1,k}(i\xi_{nk})| = |l_k^{(r+1)}(i\lambda_k\xi_{nk})/(r+1)!\,(i\xi_{nk})^{r+1}| \le (C/r_0^{r+1})\,|\xi_{nk}|^{r+1}$. For small enough $\xi_{nk}$ and $i \ge 1$, we see from (2.7) that
$$V^{(i)}(1) = O\Big(\sum_{k=1}^n\gamma_{i+2,k}(i\xi_{nk})^{i+2}\Big).$$
So there exist positive constants $D_1, D_2$ such that
$$|V^{(i)}(1)| \le D_1\sum_{k=1}^n|\gamma_{i+2,k}|\,|\xi_{nk}|^{i+2} + D_2\,n\Big(\frac{|\xi|M_2+|\varphi|}{r_0\sqrt n}\Big)^{i+2}$$
for $i = 1,\ldots,r-1$. For $V(u)$ and $R(u)$ defined in (2.8), we have
$$|R(1)| \le \Big|\frac{d^{\,r-1}e^{V(u)}}{du^{\,r-1}}\Big| \quad\text{for some } |u| \le 1,$$
and taking $r-1$ derivatives one has
$$|R(1)| \le \Big|\sum a_{j_1\cdots j_v}\,V^{(j_1)}(u)\cdots V^{(j_v)}(u)\Big|\,|e^{V(u)}|,$$
where the summation is over all choices of positive integers $j_1,\ldots,j_v$ with $1 \le v \le r-1$ and $j_1+\cdots+j_v = r-1$, and the $a_{j_1\cdots j_v}$ are quantities depending on
$r$ only. Now we are able to give a bound for $R(1)$. Note that, by the way we derived the bounds on $|V^{(i)}(1)|$, we have
$$|R(1)| \le \sum|a_{j_1\cdots j_v}|\,|V^{(j_1)}(1)|\cdots|V^{(j_v)}(1)|\,e^{|V(1)|} \le n^{-(r-1)/2}\,P_1(|\xi|M_2+|\varphi|)\,\exp\Big\{\delta\sum_{k=1}^n\xi_{nk}^2\Big\}, \eqno(2.9)$$
where $P_1(t)$ is a polynomial in $t$ and $\delta$ is an arbitrarily small positive number. Putting $\alpha = \hat\alpha$, one derives an intermediate approximation for $f_n^*(\xi\mid s,\beta)$, accurate up to a term of order $n^{-(r-1)/2}$:
$$\frac{1}{2\pi\sqrt n\,B_{ns}(\hat p)}\int_{-\infty}^{\infty}\exp\Big\{-\frac12\sum_{k=1}^n\hat p_k\hat q_k\xi_{nk}^2\Big\}\Big\{1 + \sum_{j=1}^{r-2}\hat P_{nj}(\xi,\beta)\Big\}\,d\varphi, \eqno(2.10)$$
where $\hat P_{nj}(\xi,\beta)$ is the value of $P_{nj}(\xi,\beta)$ at $\alpha = \hat\alpha$. The change of the limits of integration leads only to an error of smaller order than $n^{-(r-1)/2}$.

In order to derive the Edgeworth expansion, put $\hat\Sigma(\beta) = \frac1n\sum_{k=1}^n\hat p_k\hat q_k(X_k-\bar x)^2$, where $\bar x = \sum_k\hat p_k\hat q_kX_k/\sum_k\hat p_k\hat q_k$ is the weighted mean of the $X$'s. $\hat\Sigma(\beta)$ is an approximation to the conditional variance of $T_n^*$, and equals $1/n$ times the variance of the profile score statistic in an unconditional logistic model; see Levin (1990). Let $g_{0,\hat\Sigma(\beta)}(\xi) = \exp\{-\frac12\hat\Sigma(\beta)\xi^2\}$ be the characteristic function of the normal distribution with mean 0 and variance $\hat\Sigma(\beta)$, and define
$$\frac{1}{(2\pi n)^{1/2}}\int_{-\infty}^{\infty}\exp\Big\{-\frac12\sum_{k=1}^n\hat p_k\hat q_k\xi_{nk}^2\Big\}\Big\{1+\sum_{j=1}^{r-2}\hat P_{nj}(\xi,\beta)\Big\}\,d\varphi = \Big[\frac{1}{\sum_k\hat p_k\hat q_k}\Big]^{1/2}g_{0,\hat\Sigma(\beta)}(\xi)\Big\{1+\sum_{j=1}^{r-2}Q^*_{nj}(\xi,\beta)\Big\}, \eqno(2.11)$$
where $Q^*_{nj}(\xi,\beta)$ is a polynomial in $\xi$. For a small positive number $C_1$, the difference between (2.11) and the integral of the same integrand over $(-2C_1\sqrt n, 2C_1\sqrt n)$ is $O(e^{-dn}) = o(n^{-(r-1)/2})$ for some $d > 0$. In fact, according to (2.6)-(2.9), one has
$$\Big|\frac{1}{(2\pi n)^{1/2}}\int_{-2C_1\sqrt n}^{2C_1\sqrt n}\exp\Big\{-\frac{i\xi}{\sqrt n}\sum_iX_i\hat p_i - \frac{i\varphi s}{\sqrt n}\Big\}\prod_{k=1}^n\hat P_k(\xi_{nk},\varphi)\,d\varphi - \Big[\frac{1}{\sum_k\hat p_k\hat q_k}\Big]^{1/2}g_{0,\hat\Sigma(\beta)}(\xi)\Big\{1+\sum_{j=1}^{r-2}\hat Q^*_{nj}(\xi,\beta)\Big\}\Big| \le n^{-(r-1)/2}\,P_2(|\xi|)\,\exp\{-\tfrac12D(x,\hat p)\xi^2\},$$
where $D(x,\hat p)$ is a positive function of $x$ and $\hat p$, and $P_2(t)$ is a polynomial in $t$. Next consider $\varphi$ and $\xi$ in the region $2C_1\sqrt n < |\varphi| \le \sqrt n\pi$, $|\xi| < C_1\sqrt n/M_2$. There $|\xi_{nk}| \ge C_1$ (modulo $2\pi$), so
$$|\hat q_k + \hat p_ke^{i\xi_{nk}}|^2 = 1 - 2\hat p_k\hat q_k(1-\cos\xi_{nk}) \le \exp\Big\{-\frac23\,\hat p_k\hat q_kC_1^2\Big\}$$
and
$$\prod_{k=1}^n|\hat P_k(\xi_{nk},\varphi)| \le \exp\Big\{-\frac13\sum_{k=1}^n\hat p_k\hat q_kC_1^2\Big\}.$$
Putting all these together, we have that, for $|\xi| < C_1\sqrt n/M_2$,
$$\Big|\frac{1}{(2\pi n)^{1/2}}\int_{-\sqrt n\pi}^{\sqrt n\pi}\exp\Big\{-\frac{i\xi}{\sqrt n}\sum_iX_i\hat p_i - \frac{i\varphi s}{\sqrt n}\Big\}\prod_{k=1}^n\hat P_k(\xi_{nk},\varphi)\,d\varphi - \Big[\frac{1}{\sum_k\hat p_k\hat q_k}\Big]^{1/2}g_{0,\hat\Sigma(\beta)}(\xi)\Big\{1+\sum_{j=1}^{r-2}\hat Q^*_{nj}(\xi,\beta)\Big\}\Big| \le n^{-(r-1)/2}\,P_3(|\xi|)\,\exp\{-\tfrac12D(x,\hat p)\xi^2\},$$
where $P_3(t)$ is a polynomial in $t$. In particular, when $\xi = 0$,
$$B_{ns}(\hat p) = \frac{1}{2\pi\sqrt n}\int_{-\sqrt n\pi}^{\sqrt n\pi}\exp\Big\{-\frac{i\varphi s}{\sqrt n}\Big\}\prod_{k=1}^n\{\hat q_k + \hat p_ke^{i\varphi/\sqrt n}\}\,d\varphi = \Big[\frac{1}{2\pi\sum_k\hat p_k\hat q_k}\Big]^{1/2}\Big\{1+\sum_{j=1}^{r-2}\hat Q^*_{nj}(0,\beta)\Big\} + O(n^{-(r-1)/2}).$$
Therefore, for $|\xi| < C_1\sqrt n/M_2$, we arrive at the following expansion for $f_n^*(\xi\mid s,\beta)$:
$$f_n^*(\xi\mid s,\beta) = g_{0,\hat\Sigma(\beta)}(\xi)\Big\{1+\sum_{j=1}^{r-2}\hat Q^*_{nj}(\xi,\beta)\Big\}\Big\{1+\sum_{j=1}^{r-2}\hat Q^*_{nj}(0,\beta)\Big\}^{-1} + O(n^{-(r-1)/2}) = g^*(\xi\mid s,\beta) + O(n^{-(r-1)/2}), \qquad |\xi| < C_1\sqrt n/M_2. \eqno(2.12)$$
By the smoothing lemma in Feller (1971), one has
$$\sup_x|F_{ns}(x)-G_{ns}(x)| \le \frac1\pi\int_{-T}^{T}\Big|\frac{f^*(\xi\mid s,\beta)-g^*(\xi\mid s,\beta)}{\xi}\Big|\,d\xi + \frac{24m}{\pi T}, \eqno(2.13)$$
where $T = \epsilon n^{(r-2)/2}$ for some $\epsilon > 0$, $G_{ns}(x)$ is the signed measure with characteristic function $g^*(\xi\mid s,\beta)$, $m = \sup_x|G_{ns}'(x)|$, and $F_{ns}(x)$ is the distribution function of $T_n^*$. The integral in (2.13) can be separated into
$$\int_{|\xi|\le1} + \int_{1\le|\xi|\le C_1\sqrt n} + \int_{C_1\sqrt n\le|\xi|\le T}. \eqno(2.14)$$
As $f^*(\xi\mid s,\beta)$ and $g^*(\xi\mid s,\beta)$ are analytic and their difference on $(-1,1)$ is $O(n^{-(r-1)/2})$, the arguments in Bhattacharya and Rao (1986) for the one-dimensional case can be used here to show that the first integral is $o(n^{-(r-2)/2})$. For the second region,
$$\int_{1\le|\xi|\le C_1\sqrt n}\Big|\frac{f^*(\xi\mid s,\beta)-g^*(\xi\mid s,\beta)}{\xi}\Big|\,d\xi \le \int_{1\le|\xi|\le C_1\sqrt n}|f^*(\xi\mid s,\beta)-g^*(\xi\mid s,\beta)|\,d\xi,$$
which is $o(n^{-(r-2)/2})$ according to (2.12). For those $X_k$ satisfying (1.3),
$$1 - \cos(\xi X_k + \varphi) \ge 1 - \cos C' \ge \frac12C'^2 - \frac{1}{24}C'^4 > \frac13C'^2,$$
so
$$1 - 2\hat p_k\hat q_k(1-\cos(\xi X_k+\varphi)) \le 1 - \frac23\,\hat p_k\hat q_kC'^2 \le e^{-c}$$
for some $c > 0$, uniformly in $k$ (under Assumptions 1 and 3 the $\hat p_k$ are bounded away from 0 and 1); therefore, according to Assumptions 2 and 3,
$$\prod_{k=1}^n|\hat P_k(\xi X_k+\varphi)|^2 = \prod_{k=1}^n\{1 - 2\hat p_k\hat q_k(1-\cos(\xi X_k+\varphi))\} \le \exp\{-Cn^{\varepsilon}\}$$
for some $C > 0$.
We see that
$$\int_{C_1\sqrt n\le|\xi|\le T}\Big|\frac{f^*(\xi\mid s,\beta)}{\xi}\Big|\,d\xi \le \frac{1}{2\pi B_{ns}(\hat p)}\int_{C_1\sqrt n\le|\xi|\le T}\int_{-\sqrt n\pi}^{\sqrt n\pi}\prod_{k=1}^n|\hat P_k(\xi X_k+\varphi)|\,d\varphi\,d\xi \le \frac{n^{(r-1)/2}}{B_{ns}(\hat p)}\exp\Big\{-\frac12Cn^{\varepsilon}\Big\} = o(n^{-(r-2)/2}), \eqno(2.15)$$
and, for some $C_1 > 0$,
$$\int_{|\xi|\ge C_1\sqrt n}|g^*(\xi\mid s,\beta)|\,d\xi = o(n^{-(r-2)/2})$$
by definition. Thus
$$\sup_x|F_{ns}(x) - G_{ns}(x)| = O(n^{-(r-2)/2}).$$
This concludes the proof of the first part of Theorem 1.1. Since $T_n^*$ and $U_n$ are linearly related, by Bhattacharya and Ghosh (1978) and Skovgaard (1981a, b) the Edgeworth expansions for $U_n$ and the c.m.l.e. are valid and can be constructed using their own cumulants.

We next give the explicit expression for the expansion of the conditional density of $T_n^*$ given $S_n$ corresponding to $g^*(\xi\mid s,\beta)$, for order $r = 4$. First we have
$$\hat\gamma_{3k} = \hat p_k\hat q_k(1 - 2\hat p_k) \qquad\text{and}\qquad \hat\gamma_{4k} = \hat p_k\hat q_k(1 - 6\hat p_k\hat q_k).$$
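These cumulant formulas are easy to verify numerically: the $j$th cumulant of a Bernoulli($p$) variable is the $j$th derivative at 0 of $K(t) = \log(1 - p + pe^t)$, which a central finite difference approximates well. A small sketch (the value $p = 0.3$ is an arbitrary test point):

```python
from math import exp, log

def bernoulli_cumulant(p, j, h=1e-2):
    """j-th cumulant of Bernoulli(p): j-th derivative at 0 of the
    cumulant generating function K(t) = log(1 - p + p*exp(t)),
    approximated by a central finite difference."""
    K = lambda t: log(1.0 - p + p * exp(t))
    if j == 3:
        return (K(2*h) - 2*K(h) + 2*K(-h) - K(-2*h)) / (2 * h**3)
    if j == 4:
        return (K(2*h) - 4*K(h) + 6*K(0.0) - 4*K(-h) + K(-2*h)) / h**4
    raise ValueError("only j = 3, 4 implemented")

p = 0.3
g3 = bernoulli_cumulant(p, 3)      # compare with p*q*(1 - 2*p)
g4 = bernoulli_cumulant(p, 4)      # compare with p*q*(1 - 6*p*q)
```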
From (2.8) we see
$$P_{n1}(\xi,\beta) = \frac{1}{3!}\sum_{k=1}^n\hat\gamma_{3k}(i\xi_{nk})^3$$
and
$$P_{n2}(\xi,\beta) = \frac{1}{4!}\sum_{k=1}^n\hat\gamma_{4k}(i\xi_{nk})^4 + \frac12\Big\{\frac{1}{3!}\sum_{k=1}^n\hat\gamma_{3k}(i\xi_{nk})^3\Big\}^2.$$
Define $\bar x$ and $\hat\Sigma(\beta)$ as above (2.11); then (2.11) becomes
$$\frac{1}{(2\pi n)^{1/2}}\int_{-\infty}^{\infty}\exp\Big\{-\frac12\sum_{k=1}^n\hat p_k\hat q_k\xi_{nk}^2\Big\}\{1 + P_{n1}(\xi,\beta) + P_{n2}(\xi,\beta)\}\,d\varphi = \Big[\frac{1}{\sum_k\hat p_k\hat q_k}\Big]^{1/2}g_{0,\hat\Sigma(\beta)}(\xi)\{1 + Q^*_{n1}(\xi,\beta) + Q^*_{n2}(\xi,\beta)\},$$
with
$$Q^*_{n1}(\xi,\beta) = -\,\frac{i\xi\sum_{k=1}^n\hat\gamma_{3k}(X_k-\bar x)}{2\sqrt n\,\sum_{i=1}^n\hat p_i\hat q_i} + \frac{(i\xi)^3\sum_{k=1}^n\hat\gamma_{3k}(X_k-\bar x)^3}{3!\,n^{3/2}}$$
and
$$Q^*_{n2}(\xi,\beta) = \frac{\sum_{k=1}^n\hat\gamma_{4k}}{8\{\sum_i\hat p_i\hat q_i\}^2} - \frac{5\{\sum_{k=1}^n\hat\gamma_{3k}\}^2}{24\{\sum_i\hat p_i\hat q_i\}^3} - \frac{(i\xi)^2\sum_{k=1}^n\hat\gamma_{4k}(X_k-\bar x)^2}{4n\sum_i\hat p_i\hat q_i} + \frac{3(i\xi)^2\{\sum_{k=1}^n\hat\gamma_{3k}(X_k-\bar x)\}^2}{8n(\sum_i\hat p_i\hat q_i)^2} + \frac{(i\xi)^2\sum_{k=1}^n\hat\gamma_{3k}\sum_{k=1}^n\hat\gamma_{3k}(X_k-\bar x)^2}{4n(\sum_i\hat p_i\hat q_i)^2}$$
$$\qquad + \frac{(i\xi)^4\sum_{k=1}^n\hat\gamma_{4k}(X_k-\bar x)^4}{4!\,n^2} - \frac{(i\xi)^4\{\sum_{k=1}^n\hat\gamma_{3k}(X_k-\bar x)^2\}^2}{8n^2\sum_i\hat p_i\hat q_i} - \frac{(i\xi)^4\sum_{k=1}^n\hat\gamma_{3k}(X_k-\bar x)\sum_{k=1}^n\hat\gamma_{3k}(X_k-\bar x)^3}{12n^2\sum_i\hat p_i\hat q_i} + \frac{(i\xi)^6\{\sum_{k=1}^n\hat\gamma_{3k}(X_k-\bar x)^3\}^2}{72n^3}.$$
Given $S_n$, the approximate characteristic function of the conditional distribution of $T_n^*$ is
$$g^*(\xi\mid s,\beta) = g_{0,\hat\Sigma(\beta)}(\xi)\,\{1 + Q_{n1}(\xi,\beta) + Q_{n2}(\xi,\beta)\},$$
with
$$1 + Q_{n1}(\xi,\beta) + Q_{n2}(\xi,\beta) = \{1+Q^*_{n1}(\xi,\beta)+Q^*_{n2}(\xi,\beta)\}\{1+Q^*_{n1}(0,\beta)+Q^*_{n2}(0,\beta)\}^{-1} + o(n^{-1}).$$
By substituting $Q^*_{n1}(\xi,\beta)$ and $Q^*_{n2}(\xi,\beta)$ we may write $Q_{n1}(\xi,\beta)$ and $Q_{n2}(\xi,\beta)$ explicitly, although the expressions are complicated. The inverse Fourier transform of $g^*(\xi\mid s,\beta)$, which is the approximate conditional density of $T_n^*$, is then, neglecting terms of order $o(n^{-1})$,
$$\phi_{0,\hat\Sigma(\beta)}(x) + \Big[\frac{\sum_k\hat\gamma_{3k}(X_k-\bar x)}{2\sqrt n\,\sum_i\hat p_i\hat q_i}\,D - \frac{\sum_k\hat\gamma_{3k}(X_k-\bar x)^3}{3!\,n^{3/2}}\,D^3\Big]\phi_{0,\hat\Sigma(\beta)}(x)$$
$$\quad + \Big[\Big(\frac{3\{\sum_k\hat\gamma_{3k}(X_k-\bar x)\}^2}{8n(\sum_i\hat p_i\hat q_i)^2} + \frac{\sum_k\hat\gamma_{3k}\sum_k\hat\gamma_{3k}(X_k-\bar x)^2}{4n(\sum_i\hat p_i\hat q_i)^2} - \frac{\sum_k\hat\gamma_{4k}(X_k-\bar x)^2}{4n\sum_i\hat p_i\hat q_i}\Big)D^2$$
$$\qquad + \Big(\frac{\sum_k\hat\gamma_{4k}(X_k-\bar x)^4}{4!\,n^2} - \frac{\{\sum_k\hat\gamma_{3k}(X_k-\bar x)^2\}^2}{8n^2\sum_i\hat p_i\hat q_i} - \frac{\sum_k\hat\gamma_{3k}(X_k-\bar x)\sum_k\hat\gamma_{3k}(X_k-\bar x)^3}{12n^2\sum_i\hat p_i\hat q_i}\Big)D^4 + \frac{\{\sum_k\hat\gamma_{3k}(X_k-\bar x)^3\}^2}{72n^3}\,D^6\Big]\phi_{0,\hat\Sigma(\beta)}(x), \eqno(2.16)$$
where $D^j$ denotes the $j$th derivative with respect to $x$. From this it is easy to derive an approximation formula for the tail probability of $T_n^*$.

For practical use, choose $r = 3$; then only the third-cumulant corrections are retained. Define $\sigma^2 = \hat\Sigma(\beta)$,
$$a_n = \frac{\sum_{k=1}^n\hat\gamma_{3k}(X_k-\bar x)}{\sum_{i=1}^n\hat p_i\hat q_i}$$
and
$$b_n = \frac{1}{n}\sum_{k=1}^n\hat\gamma_{3k}(X_k-\bar x)^3;$$
then the standardized statistic $\{T_n^* + a_n/2\sqrt n\}/\sigma$ has the approximate characteristic function
$$e^{-\xi^2/2}\Big\{1 + \frac{(i\xi)^3b_n}{3!\,\sigma^3\sqrt n}\Big\},$$
and the Edgeworth expansion of its density function is
$$\frac{1}{(2\pi)^{1/2}}\,e^{-x^2/2}\Big\{1 + \frac{b_n}{3!\,\sigma^3\sqrt n}\,(x^3 - 3x)\Big\},$$
with tail probability
$$1 - \Phi(x) + \frac{b_n}{3!\,\sigma^3\sqrt n}\,(x^2 - 1)\phi(x).$$
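As an illustration of the tail formula, the following sketch builds the one-term skewness-corrected tail approximation from the exact conditional cumulants of $T_n$ (obtained by enumeration, as in the numerical comparisons of Kong, 1992) rather than from the approximate ones; the data are hypothetical:

```python
from itertools import combinations
from math import exp, sqrt, pi, erf

def cond_dist(x, beta, s):
    """Exact conditional pmf of T_n given S_n = s: mass proportional
    to exp(beta * T_n(A)) over size-s subsets A (alpha cancels)."""
    pts = {}
    for A in combinations(range(len(x)), s):
        t = round(sum(x[i] for i in A), 12)
        pts[t] = pts.get(t, 0.0) + exp(beta * t)
    z = sum(pts.values())
    return {t: w / z for t, w in pts.items()}

def edgeworth_tail(x, beta, s, z):
    """One-term skewness-corrected tail Pr{(T_n - mu)/sd >= z},
    built from the exact conditional cumulants of T_n."""
    d = cond_dist(x, beta, s)
    mu = sum(t * p for t, p in d.items())
    var = sum((t - mu) ** 2 * p for t, p in d.items())
    k3 = sum((t - mu) ** 3 * p for t, p in d.items())
    sd = sqrt(var)
    phi = exp(-z * z / 2) / sqrt(2 * pi)
    Phi = 0.5 * (1 + erf(z / sqrt(2)))
    return (1 - Phi) + k3 / (6 * sd ** 3) * (z * z - 1) * phi
```

Note that at $z = 1$ the $(z^2 - 1)$ correction vanishes, so the corrected and normal tails agree there; the correction matters in the tails, where the skewness term shifts mass.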
See Kong (1992) for examples of the performance of this formula compared with the first-order normal approximation in various situations; there the exact cumulants of $T_n$ are used instead of the approximate ones. We note that $T_n^* + a_n/2\sqrt n$ is also the double saddle-point approximation to the normalized conditional score function $\partial\log L_c(\beta\mid S_n = s)/\partial\beta$; see Levin (1990) and Levin and Kong (1990).

We now give an Edgeworth expansion for the c.m.l.e. of $\beta$. The conditional log likelihood function is
$$l_c(\beta) = \log L_c(\beta\mid S_n = s) = \beta T_n - \log\sum\nolimits^*e^{\beta T_n},$$
where $\sum^*$ denotes summation over all $z_i$'s such that $\sum_i z_i = s$. Define
$$Z = \frac{1}{\sqrt n}\,\frac{\partial l_c}{\partial\beta} = \frac{1}{\sqrt n}\Big\{T_n - \frac{\sum^*T_ne^{\beta T_n}}{\sum^*e^{\beta T_n}}\Big\}$$
and $\Delta = \sqrt n(\hat\beta_c - \beta)$. McCullagh (1987) gives a relationship between $Z$ and $\Delta$: with
$$K_2 = \frac1n\,\frac{\partial^2l_c}{\partial\beta^2}\qquad\text{and}\qquad K_3 = \frac1n\,\frac{\partial^3l_c}{\partial\beta^3},$$
the relationship is
$$\Delta = -\frac{1}{K_2}\,Z - \frac{K_3}{2\sqrt n\,K_2^3}\,Z^2 + o_p(n^{-1/2}).$$
So the cumulants of $\Delta$ are
$$\mu = E(\Delta) = -\frac{1}{2\sqrt n}\,\frac{K_3}{K_2^2} + O(n^{-1}), \qquad \tau^2 = \operatorname{Var}(\Delta) = -\frac{1}{K_2} + O(n^{-1}),$$
and the third cumulant of $\Delta$ is
$$\lambda_3 = -\frac{5}{2\sqrt n}\,\frac{K_3}{K_2^3} + O(n^{-1}).$$
The Edgeworth approximation for the density of $d = (\Delta-\mu)/\tau$ is
$$p_2(x) = \phi(x)\Big\{1 + \frac{\lambda_3}{3!\,\tau^3}\,(x^3 - 3x)\Big\} + O(n^{-1}),$$
and the Edgeworth approximation of the tail probability of $d$ is
$$1 - \Phi(x) + \frac{\lambda_3}{3!\,\tau^3}\,(x^2 - 1)\phi(x) + O(n^{-1}).$$
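The c.m.l.e. itself is easy to compute for small $n$ by Newton's method on the exact conditional log likelihood, since the conditional score and observed information are the mean and variance of $T_n$ under exponential tilting. A sketch (the data, $s$, and the observed $T_n$ value are hypothetical):

```python
from itertools import combinations
from math import exp

def cmle(x, s, t_obs, beta0=0.0, iters=50):
    """Conditional m.l.e. of beta by Newton's method on
    l_c(beta) = beta*t_obs - log sum_A exp(beta*T(A)):
    the score is t_obs - E_beta(T_n | S_n = s) and the observed
    information is Var_beta(T_n | S_n = s)."""
    sums = [sum(x[i] for i in A) for A in combinations(range(len(x)), s)]
    beta = beta0
    for _ in range(iters):
        w = [exp(beta * t) for t in sums]
        z = sum(w)
        m = sum(t * wi for t, wi in zip(sums, w)) / z
        v = sum((t - m) ** 2 * wi for t, wi in zip(sums, w)) / z
        step = (t_obs - m) / v
        beta += step
        if abs(step) < 1e-12:
            break
    return beta

x = [0.5, 1.2, -0.3, 2.0, 0.9, 1.5]   # hypothetical exposures
b_hat = cmle(x, 3, t_obs=3.0)         # t_obs must lie strictly inside the range of T_n
```

The iteration requires $t_{\rm obs}$ to lie strictly between the smallest and largest attainable values of $T_n$, otherwise the c.m.l.e. does not exist (it diverges).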
For numerical calculations comparing the Edgeworth approximations with the normal approximations in various cases, see Kong (1992).

More generally, $X$ is a $v$-dimensional exposure vector in the binary logistic regression model and $\beta$ is a $v$-dimensional parameter. For $Z_i$ and $S_n$ as defined earlier, the conditional likelihood function becomes
$$L_c(\beta\mid S_n = s) = \Pr(Y\mid X, W, S_n) = \frac{e^{\beta'T_n}}{\sum^*e^{\beta'T_n}},$$
where $T_n = (T_{n1},\ldots,T_{nv})' = \sum_iX_iZ_i = (\sum_iX_{i1}Z_i,\ldots,\sum_iX_{iv}Z_i)'$ and $\sum^*$ is the summation over all $Z_i$'s such that $\sum_iZ_i = s$. Without affecting the conditional distribution of $T_n$, $\alpha$ can be chosen so that
$$\sum_{i=1}^n\frac{e^{\hat\alpha+\beta'X_i}}{1+e^{\hat\alpha+\beta'X_i}} = s \eqno(2.17)$$
for given $S_n = s$. The characteristic function of the standardized statistic $T_n^* = \frac{1}{\sqrt n}\{T_n - \sum_iX_i\hat p_i\}$ is then
$$f_n^*(\xi\mid s,\beta) = \exp\Big\{-\frac{i}{\sqrt n}\,\xi'\sum_iX_i\hat p_i\Big\}\,E[e^{i\xi'T_n/\sqrt n}\mid S_n = s] = \frac{1}{2\pi\sqrt n\,B_{ns}(\hat p)}\int_{-\sqrt n\pi}^{\sqrt n\pi}\exp\Big\{-\frac{i}{\sqrt n}\,\xi'\sum_iX_i\hat p_i - \frac{i\varphi s}{\sqrt n}\Big\}\prod_{k=1}^n\hat P_k(\xi_{nk},\varphi)\,d\varphi, \eqno(2.18)$$
where $P_k(\xi_{nk},\varphi) = q_k + p_ke^{i\xi_{nk}}$ and $\xi_{nk} = (\xi'X_k + \varphi)/\sqrt n$; $\xi_{nk}$ can also be expressed as a sum of coordinatewise contributions involving the $\xi_jX_{kj}$. Then, by the same arguments as before, for $\xi/\sqrt n$ in a neighborhood of the origin, we have
$$f_n^*(\xi\mid s,\beta) = g_{0,\hat\Sigma(\beta)}(\xi)\Big\{1+\sum_{j=1}^{r-2}\hat Q^*_{nj}(\xi,\beta)\Big\}\Big\{1+\sum_{j=1}^{r-2}\hat Q^*_{nj}(0,\beta)\Big\}^{-1} + O(n^{-(r-1)/2}), \eqno(2.19)$$
where
$$g_{0,\hat\Sigma(\beta)}(\xi) = \exp\Big\{-\frac12\,\xi'\hat\Sigma(\beta)\xi\Big\}, \qquad \hat\Sigma(\beta) = \frac1n\sum_k\hat p_k\hat q_k\,(X_k-\bar x)(X_k-\bar x)',$$
with $\bar x = \sum_k\hat p_k\hat q_kX_k/\sum_k\hat p_k\hat q_k$, and $\hat Q^*_{nl}(\xi,\beta)$ is a polynomial in $\xi$ such that $g_{0,\hat\Sigma(\beta)}(\xi)\hat Q^*_{nl}(\xi,\beta)$ is the integral of $\exp\{-\frac12\sum_k\hat p_k\hat q_k\xi_{nk}^2\}\hat P_{nl}(\xi,\beta)$ taken with respect to $\varphi$. A formula similar to (2.16) can be obtained by calculating $\hat Q^*_{nl}(\xi,\beta)$ explicitly.
3. Polytomous outcomes

In general, $Y$ may follow the polytomous multiple logistic regression model (1.1), in which different values of $Y$ represent different states of a disease. For simplicity we shall restrict the $\beta_j$ to be one-dimensional ($v = 1$); the general case may be handled in a manner similar to the case $v = 1$. Consider a generic stratum at some fixed level of $W$, with $n = S_0 + \cdots + S_t$ statistically independent observations, comprising $S_j \ge 0$ outcomes of type $Y = j$. Given exposures $X_1,\ldots,X_n$, let $(X^{(n)}, S_n)$ denote the stratum and $\beta = (\beta_1,\ldots,\beta_t)'$, where
$$X^{(n)} = \{X_1,\ldots,X_n\}, \qquad S_n = (S_1,\ldots,S_t)',$$
omitting $S_0 = n - (S_1+\cdots+S_t)$ from the vector notation. The observed correspondence between outcome state and exposure forms an ordered partition of $X^{(n)}$, denoted $\pi^0 = (D_0^0,\ldots,D_t^0)$, where $D_j^0 = \{X_i : Y_i = j\}$ for $j = 0,1,\ldots,t$. The observed partition $\pi^0$ is one of the $n!/s_0!\,s_1!\cdots s_t!$ possible ordered partitions of $X^{(n)}$ consistent with $S_n$; i.e., $\pi^0$ is one of the $\pi$'s such that $\pi = (D_0,\ldots,D_t)$, with $X^{(n)} = D_0\cup\cdots\cup D_t$, $D_i\cap D_j = \emptyset$ for $i\ne j$, and $|D_j| = S_j$. For any such $\pi$ and $1\le j\le t$, let
$$T_j(\pi) = \sum_{X_i\in D_j}X_i, \qquad T_n = T_n(\pi) = (T_1(\pi),\ldots,T_t(\pi))'. \eqno(3.1)$$
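For small $n$ the ordered partitions consistent with $S_n$, and the corresponding values of the sufficient statistic $T_n(\pi)$, can be enumerated directly; the count is $n!/(s_0!\,s_1!\cdots s_t!)$ as stated above. A sketch with $t = 2$ and hypothetical exposures:

```python
from itertools import combinations
from math import factorial

def ordered_partitions(idx, sizes):
    """All ordered partitions (D_0, ..., D_t) of idx with |D_j| = sizes[j]."""
    if len(sizes) == 1:
        yield (tuple(idx),)
        return
    for D0 in combinations(idx, sizes[0]):
        rest = [i for i in idx if i not in D0]
        for tail in ordered_partitions(rest, sizes[1:]):
            yield (D0,) + tail

x = [0.5, 1.2, -0.3, 2.0]          # hypothetical exposures
sizes = (2, 1, 1)                  # (s_0, s_1, s_2): outcome frequencies
parts = list(ordered_partitions(list(range(len(x))), sizes))
expected = factorial(len(x)) // (factorial(sizes[0]) * factorial(sizes[1]) * factorial(sizes[2]))
# sufficient statistic T_n(pi) = (T_1(pi), ..., T_t(pi)) for each partition
T = [tuple(sum(x[i] for i in Dj) for Dj in p[1:]) for p in parts]
```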
At the same time, we represent the outcome $Y$ by a $t$-dimensional vector $Z = (Z_1,\ldots,Z_t)'$ with $Z_1 = I[Y=1],\ldots,Z_t = I[Y=t]$. Then there is an equivalence between $\pi$ and $Z^{(n)} = (Z_1,\ldots,Z_n)'$. Note $S_n = \sum_1^nZ_i = (\sum_1^nZ_{i1},\ldots,\sum_1^nZ_{it})'$ and $T_n = T_n(\pi) = (\sum_1^nX_iZ_{i1},\ldots,\sum_1^nX_iZ_{it})'$, and the conditional probability of $\pi^0$ given $X^{(n)}$, $S_n$ and $W$ is
$$\Pr(\pi^0\mid X^{(n)}, S_n, W) = \frac{\exp\{\beta'T_n(\pi^0)\}}{\sum_\pi\exp\{\beta'T_n(\pi)\}}, \eqno(3.2)$$
where the summation $\sum_\pi$ is over all ordered partitions $\pi$ of $X^{(n)}$ consistent with $S_n$. Since $T_n$ is the sufficient statistic for the conditional likelihood, the expansion of the conditional distribution of $T_n$ given $S_n$ will be necessary for us to derive the expansion
for the conditional distributions of other important statistics, such as the score statistic, the log likelihood ratio statistic, or the c.m.l.e. of $\beta$. As in Section 2, for the observed value $s$ of $S_n$, define $\hat\alpha = (\hat\alpha_1,\ldots,\hat\alpha_t)$ as the unique solution of the equations
$$\sum_{k=1}^n\hat p_{kj} = \sum_{k=1}^n\frac{e^{\hat\alpha_j+\beta_jX_k}}{1+\sum_{l=1}^te^{\hat\alpha_l+\beta_lX_k}} = s_j, \qquad j = 1,\ldots,t.$$
We standardize $T_n$ by
$$T_n^* = \frac{1}{\sqrt n}\Big\{T_n - \sum_kX_k\hat p_k\Big\}, \eqno(3.3)$$
where $\hat p_k = (\hat p_{k1},\ldots,\hat p_{kt})'$, and we denote the conditional distribution function of $T_n^*$ by $F_{ns}(x)$. As a generalization of Theorem 1.1, we have the following result.

Theorem 3.1. In model (1.1) with $v = 1$, define $F_{ns}(x)$ as the conditional distribution function of $T_n^*$ in (3.3). Then, under Assumptions 1-3, the Edgeworth expansion of $F_{ns}(x)$ is valid. Also, as a direct conclusion, the Edgeworth expansion for the distribution of the c.m.l.e. of $\beta$ is valid.

Although the proof is similar to that for the binary response case, the notation is more complicated and the verification of the details more tedious. We define the notation first and state some intermediate steps as a lemma, and then give a brief proof of the theorem. For the conditional distribution of $T_n$ given $S_n$ defined in (3.2), the characteristic function is
$$\phi_n(\xi\mid s,\beta) = E(e^{i\xi'T_n}\mid S_n = s) = E\Big[\exp\Big\{i\xi_1\sum_kX_kZ_{k1}+\cdots+i\xi_t\sum_kX_kZ_{kt}\Big\}\,\Big|\,\sum_kZ_{k1}=s_1,\ldots,\sum_kZ_{kt}=s_t\Big] = \frac{\sum_\pi e^{i\xi'T_n(\pi)}\prod_{X_k\in D_0}p_{k0}\cdots\prod_{X_k\in D_t}p_{kt}}{\sum_\pi\prod_{X_k\in D_0}p_{k0}\cdots\prod_{X_k\in D_t}p_{kt}}, \eqno(3.4)$$
where $\sum_\pi$ is the same as in (3.2). Just as in Section 2, we have the expression
$$\phi_n(\xi\mid s,\beta) = \frac{1}{(2\pi)^tB_{ns}(p)}\int_{-\pi}^{\pi}\!\!\cdots\!\int_{-\pi}^{\pi}\prod_{k=1}^n\{p_{k0}+p_{k1}e^{i(\xi_1X_k+\theta_1)}+\cdots+p_{kt}e^{i(\xi_tX_k+\theta_t)}\}\,e^{-i\theta_1s_1-\cdots-i\theta_ts_t}\,d\theta_1\cdots d\theta_t,$$
with
$$B_{ns}(p) = \frac{1}{(2\pi)^t}\int_{-\pi}^{\pi}\!\!\cdots\!\int_{-\pi}^{\pi}\prod_{k=1}^n\{p_{k0}+p_{k1}e^{i\theta_1}+\cdots+p_{kt}e^{i\theta_t}\}\,e^{-i\theta_1s_1-\cdots-i\theta_ts_t}\,d\theta_1\cdots d\theta_t. \eqno(3.5)$$
Under the transformation $\theta = \varphi/\sqrt n$, (3.5) becomes
$$\phi_n\Big(\frac{\xi}{\sqrt n}\,\Big|\,s,\beta\Big) = \frac{1}{(2\sqrt n\,\pi)^tB_{ns}(p)}\int_{-\sqrt n\pi}^{\sqrt n\pi}\!\!\cdots\!\int_{-\sqrt n\pi}^{\sqrt n\pi}\prod_{k=1}^n\{p_{k0}+p_{k1}e^{i\xi^{(1)}_{nk}}+\cdots+p_{kt}e^{i\xi^{(t)}_{nk}}\}\,e^{-i(\varphi_1s_1+\cdots+\varphi_ts_t)/\sqrt n}\,d\varphi_1\cdots d\varphi_t,$$
with $\xi^{(i)}_{nk} = (\xi_iX_k+\varphi_i)/\sqrt n$. Put $\xi_{nk} = (\xi^{(1)}_{nk},\ldots,\xi^{(t)}_{nk})'$; then
$$\prod_{k=1}^n\{p_{k0}+p_{k1}e^{i\xi^{(1)}_{nk}}+\cdots+p_{kt}e^{i\xi^{(t)}_{nk}}\} = \exp\Big\{\sum_{k=1}^nl_k(\xi_{nk},\beta)\Big\},$$
where $l_k(z,\beta) = \log(p_{k0}+p_{k1}e^{iz_1}+\cdots+p_{kt}e^{iz_t})$ for a $t$-dimensional vector $z$. The function $l_k(\xi_{nk},\beta)$ can be expanded in a Taylor series,
$$l_k(\xi_{nk},\beta) = ip_k'\xi_{nk} - \frac12\,\xi_{nk}'\Sigma_{nk}\xi_{nk} + \sum_{j=1}^{r-2}\frac{1}{(j+2)!}\,\gamma^{(j+2)}_{nk}(i\xi_{nk})^{(j+2)} + R_{r+1,k}(i\xi_{nk}). \eqno(3.6)$$
In (3.6), $\Sigma_{nk}$ is a $t\times t$ symmetric matrix with elements $\Sigma_{nk}(i,i) = p_{ki}(1-p_{ki})$ and $\Sigma_{nk}(i,j) = -p_{ki}p_{kj}$ for $i\ne j$, $i,j = 1,\ldots,t$, $k = 1,\ldots,n$; $R_{r+1,k}(i\xi_{nk})$ is the remainder of the Taylor expansion of $l_k(\xi_{nk},\beta)$; and $\gamma^{(j)}_{nk}(z)^{(j)}$ denotes the multilinear form
$$\gamma^{(j)}_{nk}(z)^{(j)} = j!\sum_{(j)}\frac{\partial^jl_k(0,\beta)}{\partial z_1^{j_1}\cdots\partial z_t^{j_t}}\,\frac{z_1^{j_1}}{j_1!}\cdots\frac{z_t^{j_t}}{j_t!},$$
with $\sum_{(j)}$ denoting summation over integers $j_1,\ldots,j_t\ge0$ such that $\sum_kj_k = j$. Similarly to (2.7), denote
$$V(u) = \sum_{j=1}^{r-2}\frac{1}{(j+2)!}\sum_{k=1}^n\gamma^{(j+2)}_{nk}(i\xi_{nk})^{(j+2)}\,u^j + \sum_{k=1}^nR_{r+1,k}(i\xi_{nk})\,u^{r-1}, \eqno(3.7)$$
and for $|u|\le1$ define $P_{nj}$ and $R(u)$ as follows:
$$e^{V(u)} = 1 + \sum_{j=1}^{r-2}P_{nj}(\xi,\beta)u^j + R(u).$$
Lemma 3.1. For $P_{nj}$ and $R(u)$ defined as above, we have
$$|R(1)| \le n^{-(r-1)/2}\,P_2(|\xi_1|M_2+|\varphi_1|,\ldots,|\xi_t|M_2+|\varphi_t|)\,\exp\Big\{\frac{\delta}{n}\sum_{k=1}^n\big((\xi_1X_k+\varphi_1)^2+\cdots+(\xi_tX_k+\varphi_t)^2\big)\Big\},$$
where $P_2(u_1,\ldots,u_t)$ is a polynomial in $u_1,\ldots,u_t$ and $\delta$ is an arbitrarily small positive number.
Proof. First we obtain an upper bound for $R_{r+1,k}$. Express $z = \rho z_0$ for any $z$ with $\rho = \|z\| > 0$. Define
$$\chi_{nk}(\rho,\beta) = \sum_{j=1}^{r-2}\frac{1}{(j+2)!}\,\gamma^{(j+2)}_{nk}(z)^{(j+2)} + R_{r+1,k}(z), \eqno(3.8)$$
and let $\chi_{nk,r}(\rho,\beta) = \chi_{nk}(\rho,\beta) - R_{r+1,k}(z)$ be the summation term in (3.8). According to Durbin (1980), we have
$$\chi_{nk}(\rho,\beta) = \chi_{nk,r}(\rho,\beta) + \frac{1}{r!}\int_0^\rho(\rho-u)^r\,\{\chi^{(r+1)}_{nk}(u,\beta) - \chi^{(r+1)}_{nk}(0,\beta)\}\,du, \eqno(3.9)$$
where
$$\chi^{(r)}_{nk}(\rho,\beta) = \frac{d^rl_k(z,\beta)}{d\rho^r} = r!\sum_{(r)}\frac{\partial^rl_k(z,\beta)}{\partial z_1^{r_1}\cdots\partial z_t^{r_t}}\,\frac{z_{01}^{r_1}}{r_1!}\cdots\frac{z_{0t}^{r_t}}{r_t!}$$
for $\rho > 0$, with $(z_{01},\ldots,z_{0t})' = z_0$. Since $X_1,\ldots,X_n$ are bounded, by the definition of $l_k(z,\beta)$ we can choose a small positive number $d$, uniformly for $z_0$ and for $\beta$ in a neighborhood of the true $\beta_0$, such that $l_k(z,\beta)$ is analytic in the region $\|z\|\le3d$. In this region, for $\rho < 3d$,
$$|\chi^{(r+1)}_{nk}(u,\beta) - \chi^{(r+1)}_{nk}(0,\beta)| \le (r+1)!\,\epsilon.$$
Since $\|z_0\| = 1$, we have $|z_{01}^{r_1}\cdots z_{0t}^{r_t}|\le1$, and
$$|R_{r+1,k}(z)| = |\chi_{nk}(\rho,\beta) - \chi_{nk,r}(\rho,\beta)| \le (r+1)\epsilon\int_0^\rho(\rho-u)^r\,du = \epsilon\rho^{r+1}.$$
Since the $|X_k|$ have upper bound $M_2$, we have
$$\Big|\sum_{k=1}^nR_{r+1,k}(i\xi_{nk})\Big| \le \epsilon\,n^{-(r-1)/2}\,\big(\max_j(|\xi_j|M_2+|\varphi_j|)\big)^{r+1}.$$
Again an application of Cauchy's theorem in the multivariate case, and a procedure similar to the binary case, lead to the conclusion. □

Proof of Theorem 3.1. For any observed value $s = (s_1,\ldots,s_t)$ of $S_n$ and the unique $\hat\alpha = (\hat\alpha_1,\ldots,\hat\alpha_t)$, we derive the Edgeworth expansion for $T_n^*$. First, $T_n^*$ has the characteristic function
$$f_n^*(\xi\mid s,\beta) = \exp\Big\{-\frac{i}{\sqrt n}\,\xi'\sum_iX_i\hat p_i\Big\}\,\phi_n\Big(\frac{\xi}{\sqrt n}\,\Big|\,s,\beta\Big). \eqno(3.10)$$
By Lemma 3.1, we have
$$\Big|\exp\Big\{-\frac{i}{\sqrt n}\,\xi'\sum_iX_i\hat p_i - \frac{i\varphi's}{\sqrt n}\Big\}\prod_{k=1}^n\{\hat p_{k0}+\hat p_{k1}e^{i\xi^{(1)}_{nk}}+\cdots+\hat p_{kt}e^{i\xi^{(t)}_{nk}}\} - \exp\Big\{-\frac12\sum_{k=1}^n\xi_{nk}'\hat\Sigma_{nk}\xi_{nk}\Big\}\Big\{1+\sum_{j=1}^{r-2}\hat P_{nj}(\xi,\beta)\Big\}\Big|$$
$$\qquad \le n^{-(r-1)/2}\,P_2(|\xi_1|M_2+|\varphi_1|,\ldots,|\xi_t|M_2+|\varphi_t|)\,\exp\Big\{-\frac12\sum_k\xi_{nk}'\hat\Sigma_{nk}\xi_{nk} + \delta\sum_k\xi_{nk}'\xi_{nk}\Big\}.$$
One can choose $\delta$ small enough that the right-hand side becomes $O(n^{-(r-1)/2})$. Define
\[
\hat A_n=\frac1n\begin{pmatrix}
\sum_k\hat p_{k1}(1-\hat p_{k1}) & \cdots & -\sum_k\hat p_{k1}\hat p_{kt}\\
\vdots & \ddots & \vdots\\
-\sum_k\hat p_{k1}\hat p_{kt} & \cdots & \sum_k\hat p_{kt}(1-\hat p_{kt})
\end{pmatrix},
\]
\[
\hat B_n=\frac1n\begin{pmatrix}
\sum_k X_k'\hat p_{k1}(1-\hat p_{k1}) & \cdots & -\sum_k X_k'\hat p_{k1}\hat p_{kt}\\
\vdots & \ddots & \vdots\\
-\sum_k X_k'\hat p_{k1}\hat p_{kt} & \cdots & \sum_k X_k'\hat p_{kt}(1-\hat p_{kt})
\end{pmatrix},
\]
and
\[
\hat\Delta_n=\begin{pmatrix}\hat A_n & \hat B_n\\ \hat B_n' & \hat D_n\end{pmatrix};
\]
then there are polynomials $\hat Q_j^*(\xi,\beta)$, $j=1,\ldots,r-2$, such that
\[
\frac1{(2\pi n)^{t/2}}\int_{-\infty}^{\infty}\cdots\int_{-\infty}^{\infty}\exp\Bigl\{-\frac12\sum_k\zeta_{nk}'\Sigma_{nk}\zeta_{nk}\Bigr\}\Bigl\{1+\sum_{j=1}^{r-2}P_{nj}(\xi,\beta)\Bigr\}\,d\varphi_1\cdots d\varphi_t
\]
\[
=|\hat A_n|^{-1/2}\exp\Bigl\{-\frac12\,\xi'\bigl(\hat D_n-\hat B_n'\hat A_n^{-1}\hat B_n\bigr)\xi\Bigr\}\Bigl\{1+\sum_{j=1}^{r-2}\hat Q_j^*(\xi,\beta)\Bigr\}.\tag{3.11}
\]
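The passage from the left-hand side of (3.11) to the right is a Gaussian integral over the nuisance variables $\varphi$: completing the square gives $\int\exp\{-\tfrac12(\xi'D\xi+2\varphi'B\xi+\varphi'A\varphi)\}\,d\varphi=(2\pi)^{t/2}|A|^{-1/2}\exp\{-\tfrac12\,\xi'(D-B'A^{-1}B)\xi\}$, which is where the Schur complement $\hat D_n-\hat B_n'\hat A_n^{-1}\hat B_n$ arises. A scalar ($t=1$) numerical sketch, with arbitrary illustrative values of $A$, $B$, $D$, $\xi$:

```python
import math

A, B, D, x = 2.0, 0.6, 1.5, 0.8     # arbitrary values with A > 0

def integrand(p):
    # exp{-1/2 (D x^2 + 2 B p x + A p^2)} as a function of the nuisance variable p
    return math.exp(-0.5 * (D * x * x + 2.0 * B * p * x + A * p * p))

# midpoint rule over a range wide enough that the Gaussian tails are negligible
lo, hi, n = -40.0, 40.0, 400_000
h = (hi - lo) / n
numeric = sum(integrand(lo + (i + 0.5) * h) for i in range(n)) * h

# completing the square in p gives the Schur-complement form
closed = math.sqrt(2.0 * math.pi / A) * math.exp(-0.5 * (D - B * B / A) * x * x)
assert abs(numeric - closed) < 1e-5 * closed
```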
The difference between this integral and the integral of the same integrand over $R^t-B_{C_1\sqrt n}$ is $o(n^{-(r-1)/2})$. To avoid complexity, we will not give explicit expressions for $\hat Q_j^*(\xi,\beta)$. Suppose that on the region $|\xi_i|<C_1\sqrt n/M_2$, $|\varphi_i|<\sqrt n\,\pi$, $i=1,\ldots,t$, there is at least one $i'$ such that $|\varphi_{i'}|>2C_1\sqrt n$; then for this $i'$, $C_1<|\zeta_{nk}^{(i')}|<2\pi-C_1$, $k=1,\ldots,n$, and
\[
\Bigl|\prod_{k=1}^n\bigl\{\hat p_{k0}+\hat p_{k1}e^{i\zeta_{nk1}}+\cdots+\hat p_{kt}e^{i\zeta_{nkt}}\bigr\}\Bigr|\le\prod_{k=1}^n\bigl[1-2\hat p_{k0}\hat p_{ki'}\bigl(1-\cos\zeta_{nk}^{(i')}\bigr)\bigr]^{1/2}\le\exp\Bigl\{-\sum_k\hat p_{k0}\hat p_{ki'}C_0^2\Bigr\},
\]
for some $C_0>0$; the product therefore goes to zero faster than any power of $1/n$. As in Section 2,
\[
\Bigl|\Bigl(\frac{\sqrt n}{2\pi}\Bigr)^{t}\int\cdots\int\exp\Bigl\{-\frac{i}{\sqrt n}\bigl(\xi'\sum_i X_i\hat p_i+\varphi's\bigr)\Bigr\}\prod_{k=1}^n\bigl\{\hat p_{k0}+\hat p_{k1}e^{i\zeta_{nk1}}+\cdots+\hat p_{kt}e^{i\zeta_{nkt}}\bigr\}\,d\varphi_1\cdots d\varphi_t
\]
\[
-|\hat A_n|^{-1/2}\exp\Bigl\{-\frac12\,\xi'\bar B_n\xi\Bigr\}\Bigl\{1+\sum_{j=1}^{r-2}\hat Q_j^*(\xi,\beta)\Bigr\}\Bigr|=O(n^{-(r-1)/2}),
\]
for $\bar B_n=\hat D_n-\hat B_n'\hat A_n^{-1}\hat B_n$. Therefore, on $|\xi_i|<C_1\sqrt n/M_2$, $i=1,\ldots,t$, the characteristic function of the conditional distribution of $T_n^*$ given $S_n=s$ can be expanded as
\[
f_n^*(\xi\,|\,s,\beta)=g_{0,\bar B_n}(\xi)\,\Bigl\{1+\sum_{j=1}^{r-2}\hat Q_j^*(\xi,\beta)\Bigr\}\Bigl/\Bigl\{1+\sum_{j=1}^{r-2}\hat Q_j^*(0,\beta)\Bigr\}+O(n^{-(r-1)/2})
=\gamma_n(\xi\,|\,s,\beta)+O(n^{-(r-1)/2}),\tag{3.12}
\]
where $g_{0,\bar B_n}(\xi)$ is the normal density with mean zero and variance $\bar B_n$. Similar to (2.15), by Assumption 2 we have
f " ' f B _ B ~ ` / ~ [f~*(~ls'~)ld~
1
~< (2~)t/2B~(P) ~.
f f 01
I f' "
~/,/o-8~
[1 - 2/~ko ~/~ki(1 - cos(~ixki + (pi))] 1/2 d(p
1'
: o(n-(r- 1)/2),
(3.13)
and so is the integral of $\gamma_n(\xi\,|\,s,\beta)$ over the same region. Using the multivariate smoothing theorem in Bhattacharya and Rao (1986), we can prove that the Edgeworth expansion for the conditional distribution $F_n(x\,|\,s,\beta)$ is valid. □
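The product bound used twice in this proof rests on the identity $|\sum_j p_je^{i\varphi_j}|^2=1-2\sum_{j<k}p_jp_k(1-\cos(\varphi_j-\varphi_k))$ for probabilities summing to one (with $\varphi_0=0$); dropping all but one nonnegative term gives factors of the form $[1-2\hat p_{k0}\hat p_{ki'}(1-\cos\varphi_{i'})]^{1/2}$. A quick numerical check with arbitrary randomly generated probabilities (illustration only):

```python
import cmath
import itertools
import math
import random

random.seed(0)
t = 3  # number of non-baseline outcome categories (illustrative)
for _ in range(1000):
    w = [random.random() + 1e-9 for _ in range(t + 1)]
    total = sum(w)
    p = [x / total for x in w]                       # cell probabilities, sum to 1
    phi = [0.0] + [random.uniform(-3.0, 3.0) for _ in range(t)]  # phi_0 = 0
    m2 = abs(sum(pj * cmath.exp(1j * f) for pj, f in zip(p, phi))) ** 2
    # exact identity: |sum_j p_j e^{i phi_j}|^2 = 1 - 2 sum_{j<k} p_j p_k (1 - cos(phi_j - phi_k))
    ident = 1.0 - 2.0 * sum(
        p[j] * p[k] * (1.0 - math.cos(phi[j] - phi[k]))
        for j, k in itertools.combinations(range(t + 1), 2)
    )
    assert abs(m2 - ident) < 1e-12
    # keeping only the (0, i) term yields the single-index bound used in the proof
    for i in range(1, t + 1):
        assert m2 <= 1.0 - 2.0 * p[0] * p[i] * (1.0 - math.cos(phi[i])) + 1e-12
```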
Acknowledgements

The authors wish to thank the referee for many helpful suggestions.
References

Albers, W., P.J. Bickel and W.R. van Zwet (1976). Asymptotic expansions for the power of distribution free tests in the one sample problem. Ann. Statist. 4, 108-156.
Babu, G.J. and K. Singh (1989). On Edgeworth expansions in the mixture cases. Ann. Statist. 17, 443-447.
Bai, Z.D. and C.R. Rao (1991). Edgeworth expansion of a function of sample means. Ann. Statist. 19, 1295-1315.
Bhattacharya, R.N. and J.K. Ghosh (1978). On the validity of the formal Edgeworth expansion. Ann. Statist. 6, 434-451.
Bhattacharya, R.N. and R.R. Rao (1986). Normal Approximation and Asymptotic Expansions (revised). Wiley, New York.
Bickel, P.J. and W.R. van Zwet (1978). Asymptotic expansions for the power of distribution free tests in the two sample problem. Ann. Statist. 6, 937-1004.
Breslow, N.E. and W. Power (1978). Are there two logistic regressions for retrospective studies? Biometrics 34, 100-105.
Chambers, J.M. (1967). On methods of asymptotic approximation for multivariate distributions. Biometrika 54, 367-383.
Davison, A.C. (1988). Approximate conditional inference in generalized linear models. J. Roy. Statist. Soc. B 50, 445-461.
Does, R.J.M.M. (1983). An Edgeworth expansion for simple linear rank statistics under the null-hypothesis. Ann. Statist. 11, 607-624.
Durbin, J. (1980). Approximations for densities of sufficient estimators. Biometrika 67, 311-333.
Erdős, P. and A. Rényi (1959). On the central limit theorem for samples from a finite population. Publ. Math. Inst. Hungar. Acad. Sci. 4, 49-61.
Farewell, V.T. (1979). Some results on the estimation of logistic models based on retrospective data. Biometrika 66, 27-32.
Feller, W. (1971). An Introduction to Probability Theory and its Applications, Vol. 2. Wiley, New York.
Gail, M.H., J.H. Lubin and L.V. Rubinstein (1981). Likelihood calculations for matched case-control studies and survival studies with tied death times. Biometrika 68, 703-707.
Hipp, C. (1984). Asymptotic expansions for conditional distributions: the lattice case. Probab. Math. Statist. 4, 207-219.
Howard, S. (1972). Discussion of the paper by D.R. Cox. J. Roy. Statist. Soc. B 34, 187-220.
Kong, F. (1992). Edgeworth expansions in generalized linear models and logistic regression models. Ph.D. Thesis, Division of Biostatistics, Columbia University, New York.
Krailo, M.D. and M.C. Pike (1984). Conditional multivariate logistic analysis of stratified case-control studies. Appl. Statist. 33, 95-103.
Levin, B. (1987). Conditional likelihood analysis in stratum-matched retrospective studies with polytomous disease states. Comm. Statist. B 16, 699-718.
Levin, B. (1990). The saddlepoint correction in conditional logistic likelihood analysis. Biometrika 77, 275-285.
Levin, B. and F. Kong (1990). Bartlett's bias correction to the profile likelihood function is a saddlepoint correction. Biometrika 77, 219-221.
Liang, K.Y. (1983). On information and ancillarity in the presence of a nuisance parameter. Biometrika 70, 607-612.
McCullagh, P. (1987). Tensor Methods in Statistics. Chapman & Hall, London.
Prentice, R.L. and N.E. Breslow (1978). Retrospective studies and failure time models. Biometrika 65, 153-158.
Prentice, R.L. and R. Pyke (1979). Logistic disease incidence models and case-control studies. Biometrika 66, 403-411.
Robinson, J. (1978). An asymptotic expansion for samples from a finite population. Ann. Statist. 6, 1005-1011.
Skovgaard, I.M. (1981a). Transformation of an Edgeworth expansion by a sequence of smooth functions. Scand. J. Statist. 8, 207-217.
Skovgaard, I.M. (1981b). Edgeworth expansions of the distributions of maximum likelihood estimators in the general (non-i.i.d.) case. Scand. J. Statist. 8, 227-236.
Smith, P.G., M.C. Pike, A.P. Hill, N.E. Breslow and N.E. Day (1981). Algorithm AS 162. Multivariate conditional logistic analysis of stratum-matched case-control studies. Appl. Statist. 30, 190-197.
Thomas, D.G. (1971). Exact confidence limits for the odds ratio in 2 × 2 tables. Appl. Statist. 20, 105-110.
Thomas, D.G. (1975). Exact and asymptotic methods for the combination of 2 × 2 tables. Comput. Biomed. Res. 8, 423-446.
Wallace, D.L. (1958). Asymptotic approximations to distributions. Ann. Math. Statist. 29, 635-654.
van Zwet, W.R. (1982). On the Edgeworth expansion for the simple linear rank statistics. Coll. Math. Soc. J. Bolyai 32, 889-909; B.V. Gnedenko, M.L. Puri and I. Vincze, eds. North-Holland, Amsterdam.