Journal of Statistical Planning and Inference 67 (1998) 227-245
Orthogonal projections and the geometry of estimating functions

Schultz Chan, Malay Ghosh*

Department of Statistics, University of Florida, P.O. Box 118545, 103 Griffin-Floyd Hall, Gainesville, FL 32611-8545, USA
Received 13 December 1996; received in revised form 16 June 1997
Abstract

In this paper, a notion of generalized inner product spaces is introduced to study optimal estimating functions. The basic technique involves an idea of orthogonal projection first introduced by Small and McLeish (1988, 1989, 1991, 1992, 1994). A characterization of orthogonal projections in generalized inner product spaces is given. It is shown that the orthogonal projection of the score function into a linear subspace of estimating functions is optimal in that subspace, and a characterization of optimal estimating functions is given. As special cases of the main results of this paper, we derive the results of Godambe (1985) on the foundation of estimation in stochastic processes, the result of Godambe and Thompson (1989) on the extension of quasi-likelihood, and the generalized estimating equations for multivariate data due to Liang and Zeger (1986). Also, we have derived optimal estimating functions in the Bayesian framework. © 1998 Elsevier Science B.V. All rights reserved.
1. Introduction

The theory of estimating functions has made rapid strides during the past decade. One of the major advantages of this theory is that it can be applied in a variety of contexts: parametric, semiparametric or nonparametric. As pointed out by Godambe and Kale (1991), one of the highlights of this theory is that it unifies the two most important methods of estimation, namely maximum likelihood and least squares, combining many of their strengths and avoiding many of their shortcomings. For a likelihood fully specified except for some unknown parameter, Godambe (1960), in his pioneering article, introduced a criterion under which the score function turns out to be the optimal estimating function. Related ideas are also found in Durbin (1960) and Kale (1962). Later, Godambe (1985), and Godambe and Thompson (1989), found optimal estimating functions within certain classes of semiparametric models

* Corresponding author. Research partially supported by NSF Grant Number SBR-9423996.
which did not require specification of the functional form of the likelihood, and relied instead on a quasi-likelihood approach. Optimal estimating functions for vector-valued parameters of interest were found by Chandrasekar and Kale (1984). Bhapkar (1972) introduced the important concept of information contained in estimating functions.

It is our understanding that the underlying thread in all these results is a geometric phenomenon which views the optimal estimating functions as orthogonal projections of score functions into certain linear subspaces. This idea was brought out first in a series of articles by Small and McLeish (1988, 1989, 1991), and is also available in Small and McLeish (1994, Section 2.4). Chang and Hsiung (1991), and Murphy and Li (1995), used the projection idea for some specific examples, though somewhat implicitly. Recently, Waterman and Lindsay (1996) have used projected scores to approximate conditional scores. One of the objectives of this article is to make this geometry very explicit, so that it becomes accessible to a wider readership. A second, and possibly more important, goal is to collect, extend and unify a number of important results in the estimating functions literature under the single banner of orthogonal projection, which may pave the way for future research in this general area.

The outline of the remaining sections is as follows. In Section 2, we sketch very briefly the mathematical prerequisites for the results of the subsequent sections. In particular, we define generalized inner product spaces, and show the existence of orthogonal projections of elements in these spaces into appropriate linear subspaces. A characterization theorem for orthogonal projections, and a related information inequality, are provided which are used repeatedly in the subsequent sections. Section 3 begins with the application of the results of Section 2 to generalize some results of Godambe (1985), and Godambe and Thompson (1989), to multiparameter situations. Also, in this section, optimal generalized estimating equations (GEEs) are derived for multivariate data. As special cases, the optimality of the GEEs given in Liang and Zeger (1986), Zhao and Prentice (1991), and Liang et al. (1992) is established. We repeat that the common thread in the derivation of all the optimal estimating functions is the idea of orthogonal projection developed in Section 2. The notion of orthogonal projections can also be adapted within the Bayesian paradigm. Section 4 uses this idea to derive optimal Bayes estimating functions. The results of Ferreira (1982) and Ghosh (1990) are obtained as special cases. Amari and Kumon (1988) also explored very effectively the geometry of estimating functions. Their main objective was to characterize estimators that arise as solutions of estimating equations and remain consistent in the presence of infinitely many nuisance parameters, thereby avoiding the Neyman-Scott paradox.
2. Orthogonal projection

As mentioned in the introduction, much of what is presented in this section is implicit in Ch. 2 of Small and McLeish (1994). It seems worthwhile to present these results
explicitly for the sake of completeness, with the hope that they will serve as a potential source of future reference. For brevity, most of the proofs are omitted. For details, one is referred to Chan (1996).

Let $\mathcal{X}$ be the sample space, and $\Theta$ the parameter space. Consider the class of all functions $L = \{h : \mathcal{X}\times\Theta \to \mathbb{R}^k\}$ such that every element of $E_\theta[h(X,\theta)h(X,\theta)']$ is finite. For any two elements $h_1, h_2 \in L$, define $h_1+h_2$ by $(h_1+h_2)(x,\theta) = h_1(x,\theta)+h_2(x,\theta)$, and for any $k\times k$ matrix $N$, define $Nh$ by $(Nh)(x,\theta) = Nh(x,\theta)$. Further, we define the generalized inner product $\langle h_1,h_2\rangle_\theta$ as
\[
\langle h_1,h_2\rangle_\theta = E_\theta[h_1(X,\theta)\,h_2(X,\theta)'], \tag{1}
\]
where $E_\theta$ denotes expectation conditional on $\theta$. This family of generalized inner products satisfies the following four properties:
(1) for all $h_1,h_2\in L$, $\langle h_1,h_2\rangle_\theta = \langle h_2,h_1\rangle_\theta'$;
(2) for any $k\times k$ matrix $M$, and $h_1,h_2\in L$, $\langle Mh_1,h_2\rangle_\theta = M\langle h_1,h_2\rangle_\theta$;
(3) for all $h_1,h_2,h_3\in L$, $\langle h_1,h_2+h_3\rangle_\theta = \langle h_1,h_2\rangle_\theta + \langle h_1,h_3\rangle_\theta$;
(4) for all $h\in L$, $\langle h,h\rangle_\theta$ is non-negative definite (n.n.d.), and $\langle h,h\rangle_\theta = 0$ implies that $P_\theta(h(X,\theta)=0)=1$.
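As a numerical aside (an illustration we add here, not part of the original development), the following sketch approximates the generalized inner product (1) by Monte Carlo in an assumed $N(\theta,1)$ model and checks the symmetry property (1) of the list above; the function names and the model are illustrative.

```python
import numpy as np

rng = np.random.default_rng(0)

def gen_inner(h1, h2, theta, n_draws=200_000):
    """Monte Carlo estimate of <h1, h2>_theta = E_theta[h1(X) h2(X)'] (k x k)."""
    x = rng.normal(loc=theta, scale=1.0, size=n_draws)  # X ~ N(theta, 1)
    a, b = h1(x, theta), h2(x, theta)                   # each of shape (k, n_draws)
    return a @ b.T / n_draws

# Two conditionally unbiased estimating functions for the N(theta, 1) model (k = 1):
h1 = lambda x, t: np.vstack([x - t])
h2 = lambda x, t: np.vstack([(x - t) ** 3])

theta = 1.5
print(gen_inner(h1, h2, theta))     # ~ E[Z^4] = 3
print(gen_inner(h2, h1, theta).T)   # property (1): the transpose of <h2,h1>_theta agrees
```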
Two functions $h_1,h_2\in L$ are said to be orthogonal if $\langle h_1,h_2\rangle_\theta = 0$ for all $\theta$. Also, two subspaces $L_1,L_2$ of $L$ are said to be orthogonal if every element of $L_1$ is orthogonal to every element of $L_2$. We now begin with the definition of an orthogonal projection.

Definition 1. Let $L$ be the generalized inner product space with generalized inner products $\langle\cdot,\cdot\rangle_\theta$, and suppose $L_0$ is a linear subspace of $L$. Let $s\in L$. An element $g_0\in L_0$ is called the orthogonal projection of $s$ into $L_0$ if
\[
\langle s-g_0,\, s-g_0\rangle_\theta = \min_{g\in L_0}\, \langle s-g,\, s-g\rangle_\theta \tag{2}
\]
for all $\theta$, where the minimum is taken with respect to the usual ordering of matrices. More specifically, for two square matrices $A$ and $B$ of the same order, we say that $A\geq B$ if $A-B$ is n.n.d.

The following theorem characterizes the orthogonal projection in generalized inner product spaces. For its proof, we refer to Chan (1996).

Theorem 1. Let $L$ be the generalized inner product space with generalized inner products $\langle\cdot,\cdot\rangle_\theta$, and let $L_0$ be a linear subspace of $L$. Let $s\in L$; then $g_0\in L_0$ is the orthogonal projection of $s$ into $L_0$ if and only if
\[
\langle s-g_0,\, g\rangle_\theta = 0 \tag{3}
\]
for all $\theta$ and $g\in L_0$, i.e. $s-g_0$ is orthogonal to every element of $L_0$. Furthermore, if the orthogonal projection exists, then it is unique.
Next we observe that for any finite-dimensional subspace of a generalized inner product space, the orthogonal projection always exists. In order to show this, the famous Gram-Schmidt orthogonalization procedure is used in generalized inner product spaces. We need another definition.

Definition 2. Let $(L,\langle\cdot,\cdot\rangle_\theta)$ be the generalized inner product space. A set of functions $\{h_i\}_{i=1}^{n}$ is said to be linearly independent if, for any set of $k\times k$ matrices $\{A_j\}$, defining
\[
e_1 = h_1, \qquad e_i = h_i - \sum_{j=1}^{i-1} A_j h_j, \quad i=2,\ldots,n,
\]
the matrices $\{\langle e_i,e_i\rangle_\theta : i\in\{1,\ldots,n\},\ \theta\in\Theta\}$ are all invertible.
The following is the Gram-Schmidt orthogonalization procedure in generalized inner product spaces.

Proposition 1. If $\{h_i\}_{i=1}^{n}$ is linearly independent, define
\[
e_1 = h_1, \qquad e_k = h_k - \sum_{i=1}^{k-1} \langle h_k, e_i\rangle_\theta\, \langle e_i, e_i\rangle_\theta^{-1}\, e_i, \quad k\in\{2,\ldots,n\}.
\]
Then $\{e_i\}_{i=1}^{n}$ are orthogonal.

The above result is used to prove the existence of the orthogonal projection of every element of a generalized inner product space into a finite-dimensional subspace; a numerical sketch of the orthogonalization step is given below.
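The following sketch (our illustration; representing each $h_i$ by its values on a common Monte Carlo sample, so that $\langle f,g\rangle_\theta \approx fg'/n$, is an assumption of the sketch) reproduces the recursion of Proposition 1 with matrix-valued coefficients, and recovers Hermite-type orthogonal functions under the standard normal.

```python
import numpy as np

def gram_schmidt(hs):
    """hs: list of (k, n) sample arrays; returns an orthogonal list under <f,g> = f g'/n."""
    n = hs[0].shape[1]
    inner = lambda f, g: f @ g.T / n
    es = []
    for h in hs:
        e = h.copy()
        for ei in es:
            # e_k = h_k - sum_i <h_k, e_i> <e_i, e_i>^{-1} e_i  (matrix coefficients)
            e -= inner(h, ei) @ np.linalg.inv(inner(ei, ei)) @ ei
        es.append(e)
    return es

rng = np.random.default_rng(1)
x = rng.normal(size=100_000)
# three zero-mean functions under N(0,1), linearly independent:
hs = [np.vstack([x]), np.vstack([x**2 - 1]), np.vstack([x**3])]
es = gram_schmidt(hs)                 # es[2] is close to the Hermite function x^3 - 3x
print(es[0] @ es[2].T / x.size)       # ~ 0: e_1 and e_3 are orthogonal
```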
Theorem 2. Let $(L,\langle\cdot,\cdot\rangle_\theta)$ be the generalized inner product space, and let $L_0$ be a finite-dimensional subspace of $L$ with a linearly independent basis. Then for any $s\in L$, the orthogonal projection of $s$ into $L_0$ always exists.

Proof. From Proposition 1, without loss of generality, we can assume that $\{h_1,\ldots,h_m\}$ is an orthogonal basis for $L_0$. Let
\[
A_i = \langle s,h_i\rangle_\theta\, \langle h_i,h_i\rangle_\theta^{-1}, \qquad i\in\{1,\ldots,m\}.
\]
We claim that the orthogonal projection of $s$ into $L_0$ is
\[
g_0 = \sum_{i=1}^{m} A_i h_i.
\]
To see this, for any $h = \sum_{j=1}^{m} B_j h_j \in L_0$,
\[
\langle s-g_0, h\rangle_\theta = \langle s,h\rangle_\theta - \langle g_0,h\rangle_\theta
= \sum_{j=1}^{m} \langle s,h_j\rangle_\theta B_j' - \sum_{j=1}^{m} A_j \langle h_j,h_j\rangle_\theta B_j'
= \sum_{j=1}^{m} \bigl[\langle s,h_j\rangle_\theta - A_j\langle h_j,h_j\rangle_\theta\bigr] B_j' = 0.
\]
Now apply Theorem 1. $\square$

Theorem 2 will be used repeatedly in the subsequent sections for the derivation of optimal estimating functions; the following sketch illustrates the construction.
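The construction in the proof is easy to check numerically. In the sketch below (our illustration, assuming an $N(\theta,1)$ model), the score $s(x)=x-\theta$ is projected onto the span of two orthogonal basis functions via $A_i = \langle s,h_i\rangle_\theta \langle h_i,h_i\rangle_\theta^{-1}$, and the characterization (3) of Theorem 1 is verified up to Monte Carlo error.

```python
import numpy as np

rng = np.random.default_rng(2)
theta = 0.7
x = rng.normal(loc=theta, scale=1.0, size=500_000)
n = x.size
inner = lambda f, g: f @ g.T / n          # sample version of <f, g>_theta

s  = np.vstack([x - theta])               # score of N(theta, 1)
h1 = np.vstack([np.sign(x - theta)])      # E_theta[h1] = 0
h2 = np.vstack([(x - theta)**2 - 1.0])    # E_theta[h2] = 0, orthogonal to h1 by symmetry
g0 = sum(inner(s, h) @ np.linalg.inv(inner(h, h)) @ h for h in (h1, h2))

print(inner(s - g0, h1), inner(s - g0, h2))   # both ~ 0, as Theorem 1 requires
```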
Next we state an abstract information inequality, which is fundamental to our later study. The motivation for the following definition will be clear from the subsequent section, where we define the information associated with an estimating function.

Definition 3. Let $(L,\langle\cdot,\cdot\rangle_\theta)$ be the generalized inner product space, and let $s\in L$ be a fixed element. For any $g\in L$, the information of $g$ with respect to $s$ is defined as
\[
I_g(\theta) = \langle g,s\rangle_\theta'\, \langle g,g\rangle_\theta^{-}\, \langle g,s\rangle_\theta, \tag{4}
\]
where $^{-}$ denotes a generalized inverse.

We shall also need the following theorem later.

Theorem 3. Let $(L,\langle\cdot,\cdot\rangle_\theta)$ be the generalized inner product space, and let $L_0$ be a linear subspace of $L$. For any $s\in L$, suppose $g^*$ is the orthogonal projection of $s$ into $L_0$. Then $I_{g^*}(\theta) - I_g(\theta)$ is n.n.d., for all $g\in L_0$ and $\theta$.

We refer to Chan (1996) for its proof. For the one-parameter case, an alternative proof is due to Small and McLeish (1994, p. 82) via the Riesz representation theorem (cf. Halmos, 1951, pp. 31-32). The following result will be used to establish the essential uniqueness of optimal estimating functions.

Theorem 4. With the same notation as above, if $g^*, g\in L_0$, $g^*$ is the orthogonal projection of $s$ into $L_0$, and $\langle g^*,g^*\rangle_\theta$, $\langle g,g\rangle_\theta$ are invertible, then $I_{g^*}(\theta) = I_g(\theta)$ if and only if there exists an invertible matrix $N(\theta)$ such that
\[
g^*(X,\theta) = N(\theta)\, g(X,\theta),
\]
with probability one with respect to $P_\theta$.
Once again, we refer to Chan (1996) for its proof. As an easy consequence of Theorems 2-4, we have the following corollary.

Corollary. Let $(L,\langle\cdot,\cdot\rangle_\theta)$ be the generalized inner product space, and let $L_0$ be a finite-dimensional subspace of $L$ with a linearly independent basis. For all $s\in L$ and $g\in L_0$, let
\[
I_g(\theta) = \langle g,s\rangle_\theta'\, \langle g,g\rangle_\theta^{-}\, \langle g,s\rangle_\theta.
\]
Then there exists $g^*\in L_0$ such that
\[
I_{g^*}(\theta) - I_g(\theta) \tag{5}
\]
is n.n.d., for all $g\in L_0$ and $\theta$. Furthermore, if $\langle g^*,g^*\rangle_\theta$ and $\langle g,g\rangle_\theta$ are invertible, then $I_{g^*}(\theta) = I_g(\theta)$ if and only if there exists an invertible matrix $N(\theta)$ such that $g^*(X,\theta) = N(\theta)g(X,\theta)$, with probability one with respect to $P_\theta$.
3. Optimal estimating functions: A geometric approach

In this section, we will apply the results obtained in the previous section to the theory of estimating functions. We begin with the definition of unbiased estimating functions. A function
\[
g : \mathcal{X}\times\Theta \to \mathbb{R}^k
\]
is an unbiased estimating function if
\[
E[g(X,\theta)\,|\,\theta] = 0, \qquad \forall\theta\in\Theta. \tag{6}
\]
An unbiased estimating function $g$ is called regular if the following conditions hold:
(i) $g_{ij}(\theta) \equiv E[\partial g_i/\partial\theta_j\,|\,\theta]$, $1\leq i,j\leq k$, exists;
(ii) $E[g(X,\theta)\,g(X,\theta)'\,|\,\theta]$ is positive definite.

Let $L$ denote the space of all regular unbiased estimating functions. This family of generalized inner products will be used throughout this section without specific reference to it. Also, we shall denote by $s$ the score function of a parametric family of distributions. We assume that the score vector is also regular in the sense described in (i) and (ii). Recall the definition of the generalized inner product as given in (1) of the previous section.

Definition 4. Let $(L,\langle\cdot,\cdot\rangle_\theta)$ be the family of generalized inner product spaces, and let $L_0$ be a subspace of $L$. For any $g\in L_0$, the information function of $g$ is defined as
follows:
\[
I_g(\theta) = E\Bigl[\frac{\partial g}{\partial\theta'}\,\Big|\,\theta\Bigr]'\, \langle g,g\rangle_\theta^{-1}\, E\Bigl[\frac{\partial g}{\partial\theta'}\,\Big|\,\theta\Bigr], \tag{7}
\]
where it is assumed that $\langle g,g\rangle_\theta$ is invertible. An element $g^*\in L_0$ is said to be an optimal estimating function in $L_0$ if
\[
I_{g^*}(\theta) - I_g(\theta)
\]
is n.n.d., for all $g\in L_0$ and $\theta\in\Theta$. Next we prove a key result which shows that (7) is indeed equivalent to (4) of the previous section. In the rest of this section, unless otherwise stated, we shall assume the following regularity condition for unbiased estimating functions:
($\ast$) for any $g\in L$,
\[
E\Bigl[\frac{\partial g}{\partial\theta'}\,\Big|\,\theta\Bigr] = -E[g\,s'\,|\,\theta]. \tag{8}
\]
Lemma 1. Under the regularity condition ($\ast$), for any $g\in L$, the information matrix of $g$ can be written as
\[
I_g(\theta) = \langle g,s\rangle_\theta'\, \langle g,g\rangle_\theta^{-1}\, \langle g,s\rangle_\theta,
\]
where $s$ is the score function.

Proof. The result follows easily since, for any $g\in L$, (8) gives $E[\partial g/\partial\theta'\,|\,\theta] = -\langle g,s\rangle_\theta$. $\square$

Theorem 5. Let $L_0$ be a subspace of $L$, and assume that the orthogonal projection $g^*$ of $s$ into $L_0$ exists. Then
\[
I_{g^*}(\theta) \geq I_g(\theta), \qquad \forall\theta\in\Theta,\ g\in L_0, \tag{9}
\]
i.e. $g^*\in L_0$ is an optimal estimating function in $L_0$. The optimal element in $L_0$ is unique in the following sense: if $g\in L_0$, then $I_g(\theta) = I_{g^*}(\theta)$, $\forall\theta\in\Theta$, if and only if there exists an invertible matrix-valued function $M : \Theta\to M_{k\times k}$ such that, for any $\theta\in\Theta$,
\[
g^*(X,\theta) = M(\theta)\, g(X,\theta), \tag{10}
\]
with probability 1 with respect to $P_\theta$.

Proof. The first part follows easily from Lemma 1 and Theorem 3. The second part follows from Theorem 4. $\square$

Note that if $L_0$ is a finite-dimensional subspace of $L$, then from Theorem 2 an orthogonal projection $g^*$ of $s$ $(\in L)$ into $L_0$ always exists, so that the conclusions given in (9) and (10) always hold. Also, in this case, Proposition 1 and Theorem 2 show how to construct optimal estimating functions. A quick numerical check of Lemma 1 is given below.
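For instance (an assumed Poisson($\theta$) model, added here for illustration), with $g = X-\theta$ both (7) and the abstract form (4) reduce to the Fisher information $1/\theta$:

```python
import numpy as np

rng = np.random.default_rng(3)
theta = 2.0
x = rng.poisson(lam=theta, size=1_000_000).astype(float)

s = x / theta - 1.0            # Poisson score function
g = x - theta                  # unbiased estimating function, dg/dtheta = -1
gs = np.mean(g * s)            # <g, s>_theta, ~ 1 = -E[dg/dtheta | theta]
gg = np.mean(g * g)            # <g, g>_theta, ~ theta

print(gs**2 / gg)              # abstract information (4), ~ 1/theta
print(1.0**2 / gg)             # definition (7) with E[dg/dtheta] = -1, also ~ 1/theta
```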
In the remainder of this section, we shall see several applications of Theorem 5 for deriving optimal estimating functions in different contexts. We begin by generalizing a result of Godambe (1985) to the case where the parameter space is multidimensional. We also bring out more explicitly the characterization of optimal estimating functions, in a more general framework than what is given in Theorem 1 of Godambe (1985).

Let $\{X_1,X_2,\ldots,X_n\}$ be a discrete stochastic process, and let $\Theta\subset\mathbb{R}^k$ be an open set. Let $h_i$ be an $\mathbb{R}^k$-valued function of $X_1,\ldots,X_i$ and $\theta$, such that
\[
E_{i-1}[h_i(X_1,\ldots,X_i;\theta)\,|\,\theta] = 0 \qquad (i=1,\ldots,n;\ \theta\in\Theta), \tag{11}
\]
where $E_{i-1}$ denotes the conditional expectation given the first $i-1$ variables, namely $X_1,\ldots,X_{i-1}$. Let
\[
L_0 = \Bigl\{\, \sum_{i=1}^{n} A_{i-1} h_i \,\Bigr\},
\]
where $A_{i-1}$ is an $M_{k\times k}$-valued function of $X_1,\ldots,X_{i-1}$ and $\theta$, for all $i\in\{1,\ldots,n\}$. A result about optimal estimation in stochastic processes is the following theorem, which generalizes the result of Godambe (1985).

Theorem 6. With the same notation as above, suppose each $h_i$ satisfies the regularity condition ($\ast$). Let
\[
A_{i-1}^* = -\Bigl(E_{i-1}\Bigl[\frac{\partial h_i}{\partial\theta'}\,\Big|\,\theta\Bigr]\Bigr)' \bigl(E_{i-1}[h_i h_i'\,|\,\theta]\bigr)^{-1}, \qquad \forall i\in\{1,2,\ldots,n\},
\]
and
\[
g^* = \sum_{i=1}^{n} A_{i-1}^* h_i.
\]
Then the following conclusions hold:
(a) $g^*$ is the orthogonal projection of $s$ into $L_0$.
(b) $g^*$ is an optimal estimating function in $L_0$, i.e.,
\[
I_g(\theta) \leq I_{g^*}(\theta), \qquad \text{for all } g\in L_0 \text{ and } \theta\in\Theta.
\]
(c) If $g\in L_0$ and $E[g g'\,|\,\theta]$ is invertible, then $I_g(\theta) = I_{g^*}(\theta)$, $\forall\theta\in\Theta$, if and only if there exists an invertible matrix function $N : \Theta\to M_{k\times k}$ such that, for any $\theta\in\Theta$,
\[
g^*(X_1,\ldots,X_n;\theta) = N(\theta)\, g(X_1,\ldots,X_n;\theta),
\]
with probability 1 with respect to $P_\theta$.
Proof. (a) For any $g = \sum_{i=1}^{n} A_{i-1} h_i \in L_0$ and $\theta\in\Theta$,
\[
\langle s-g^*, g\rangle_\theta = \langle s,g\rangle_\theta - \langle g^*,g\rangle_\theta
= \sum_{i=1}^{n} E[s\, h_i' A_{i-1}'\,|\,\theta] - \sum_{i=1}^{n}\sum_{j=1}^{n} E[A_{i-1}^*\, h_i h_j'\, A_{j-1}'\,|\,\theta]. \tag{12}
\]
But for $i<j$,
\[
E[A_{i-1}^* h_i h_j' A_{j-1}'\,|\,\theta] = E\{E_{j-1}[A_{i-1}^* h_i h_j' A_{j-1}'\,|\,\theta]\,|\,\theta\}
= E\{A_{i-1}^* h_i\, E_{j-1}[h_j'\,|\,\theta]\, A_{j-1}'\,|\,\theta\} = 0.
\]
Similarly, for $i>j$, $E[A_{i-1}^* h_i h_j' A_{j-1}'\,|\,\theta] = 0$. Thus, from Eq. (12), we get
\[
\langle s-g^*, g\rangle_\theta = \sum_{i=1}^{n} E\{E_{i-1}[s\,h_i'\,|\,\theta]\, A_{i-1}'\,|\,\theta\}
- \sum_{i=1}^{n} E\{A_{i-1}^*\, E_{i-1}[h_i h_i'\,|\,\theta]\, A_{i-1}'\,|\,\theta\} = 0,
\]
since, by the conditional version of the regularity condition ($\ast$),
\[
E_{i-1}[s\,h_i'\,|\,\theta] = -\Bigl(E_{i-1}\Bigl[\frac{\partial h_i}{\partial\theta'}\,\Big|\,\theta\Bigr]\Bigr)' = A_{i-1}^*\, E_{i-1}[h_i h_i'\,|\,\theta].
\]
Hence $g^*$ is the orthogonal projection of $s$ into $L_0$.

Parts (b) and (c) of the theorem follow easily from part (a) and Theorem 5. $\square$
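To make Theorem 6 concrete, consider (as an added illustration, not an example from the paper) a Gaussian AR(1) model $X_i = \theta X_{i-1} + \varepsilon_i$ with unit innovation variance. Then $h_i = X_i - \theta X_{i-1}$ is a martingale difference, $E_{i-1}[\partial h_i/\partial\theta\,|\,\theta] = -X_{i-1}$ and $E_{i-1}[h_i^2\,|\,\theta] = 1$, so $A_{i-1}^* = X_{i-1}$ and the root of $g^*$ is the conditional least-squares estimator:

```python
import numpy as np

rng = np.random.default_rng(4)
theta_true, n = 0.6, 5_000
x = np.zeros(n)
for i in range(1, n):
    x[i] = theta_true * x[i - 1] + rng.normal()   # AR(1) with sigma^2 = 1

# g*(theta) = sum_i X_{i-1} (X_i - theta X_{i-1}); its root is explicit:
theta_hat = np.sum(x[1:] * x[:-1]) / np.sum(x[:-1] ** 2)
print(theta_hat)   # close to theta_true
```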
A second application of Theorem 5 gives a geometric formulation of a result of Godambe and Thompson (1989), who proved the existence of optimal estimating functions using mutually orthogonal estimating functions. What we show is that the optimal estimating function of Godambe and Thompson is indeed the orthogonal projection of the score function into an appropriate linear subspace. To this end, let $\mathcal{X}$ denote the sample space, let $\theta = (\theta_1,\ldots,\theta_m)$ be a vector of parameters, and let $h_j$, $j=1,\ldots,k$, be real functions on $\mathcal{X}\times\Theta$ such that
\[
E[h_j(X,\theta)\,|\,\theta,\mathcal{A}_j] = 0, \qquad \forall\theta\in\Theta,\ j=1,\ldots,k,
\]
where $\mathcal{A}_j$ is a specified partition of $\mathcal{X}$, $j=1,\ldots,k$. We will denote
\[
E[\cdot\,|\,\theta,\mathcal{A}_j] = E_{(j)}[\cdot\,|\,\theta].
\]
Consider the class of estimating functions
\[
L_0 = \{\,g : g = (g_1,\ldots,g_m)\,\},
\]
where
\[
g_r = \sum_{j=1}^{k} q_{jr} h_j, \qquad r=1,\ldots,m,
\]
with each $q_{jr} : \mathcal{X}\times\Theta\to\mathbb{R}$ measurable with respect to the partition $\mathcal{A}_j$, for $j=1,\ldots,k$ and $r=1,\ldots,m$. Let
\[
q_{jr}^* = -\frac{E_{(j)}[\partial h_j/\partial\theta_r\,|\,\theta]}{E_{(j)}[h_j^2\,|\,\theta]} \tag{13}
\]
for all $j=1,\ldots,k$, $r=1,\ldots,m$, and
\[
g_r^* = \sum_{j=1}^{k} q_{jr}^* h_j, \qquad r=1,\ldots,m.
\]
The estimating functions $h_j$, $j=1,\ldots,k$, are said to be mutually orthogonal if
\[
E_{(j)}[q_{jr} h_j\, q_{j'r'} h_{j'}\,|\,\theta] = 0, \qquad \forall j\neq j',\ r,r'=1,\ldots,m. \tag{14}
\]
Theorem 7. With the same notation as above, if $\{h_j\}_{j=1}^{k}$ are mutually orthogonal, then the following hold:
(a) $g^*$ is the orthogonal projection of the score function $s$ into $L_0$.
(b) $g^*$ is an optimal estimating function in $L_0$.
(c) If $g\in L_0$ and $E[g g'\,|\,\theta]$ is invertible, then $I_g(\theta) = I_{g^*}(\theta)$, $\forall\theta\in\Theta$, if and only if there exists an invertible matrix function $M : \Theta\to M_{m\times m}$ such that, for any $\theta\in\Theta$,
\[
g^*(X;\theta) = M(\theta)\, g(X;\theta),
\]
with probability 1 with respect to $P_\theta$.

Proof. (a) We only need to show that, $\forall r\in\{1,\ldots,m\}$,
\[
\langle s-g^*, g_r\rangle_\theta = 0, \qquad \forall\theta\in\Theta,
\]
i.e., that $\langle g^*, g_r\rangle_\theta = \langle s, g_r\rangle_\theta$, $\forall\theta\in\Theta$. For each component $r'\in\{1,\ldots,m\}$, the mutual orthogonality condition (14) kills the cross terms $j\neq j'$, so that
\[
E[g_{r'}^*\, g_r\,|\,\theta] = \sum_{j=1}^{k} E[q_{jr'}^* h_j\, q_{jr} h_j\,|\,\theta]
= \sum_{j=1}^{k} E\{q_{jr}\, q_{jr'}^*\, E_{(j)}[h_j^2\,|\,\theta]\,|\,\theta\}
= -\sum_{j=1}^{k} E\Bigl\{q_{jr}\, E_{(j)}\Bigl[\frac{\partial h_j}{\partial\theta_{r'}}\,\Big|\,\theta\Bigr]\,\Big|\,\theta\Bigr\}.
\]
Also,
\[
E[s_{r'}\, g_r\,|\,\theta] = \sum_{j=1}^{k} E[q_{jr}\, s_{r'} h_j\,|\,\theta]
= \sum_{j=1}^{k} E\{q_{jr}\, E_{(j)}[s_{r'} h_j\,|\,\theta]\,|\,\theta\}
= -\sum_{j=1}^{k} E\Bigl\{q_{jr}\, E_{(j)}\Bigl[\frac{\partial h_j}{\partial\theta_{r'}}\,\Big|\,\theta\Bigr]\,\Big|\,\theta\Bigr\},
\]
where the last equality is the conditional version of the regularity condition ($\ast$), obtained by differentiating $E_{(j)}[h_j\,|\,\theta]=0$ with respect to $\theta_{r'}$.
Thus, $g^*$ is the orthogonal projection of the score function into $L_0$. Once again, (b) and (c) follow from part (a) and Theorem 5. This completes the proof. $\square$

Note that part (b) of Theorem 7 is due to Godambe and Thompson (1989), while the other two parts are new. We repeat that this theorem provides a geometric formulation of optimal estimating functions in a finite-dimensional subspace of estimating functions. Also, through this approach, the characterization of optimal estimating functions is very easy to establish; a small worked instance of the resulting quasi-score is sketched below.
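As an added illustration of the quasi-score produced by Theorem 7, suppose (our assumption for this sketch) that only the first two conditional moments are specified: $E[y_j\,|\,x_j] = \exp(\theta x_j)$ and $\mathrm{Var}(y_j\,|\,x_j) = \exp(\theta x_j)$, with $h_j = y_j - \exp(\theta x_j)$ and $\mathcal{A}_j$ generated by $x_j$. Then $q_j^* = x_j$ up to the sign convention of (13), and the optimal estimating equation is $\sum_j x_j(y_j - e^{\theta x_j}) = 0$:

```python
import numpy as np
from scipy.optimize import brentq

rng = np.random.default_rng(5)
theta_true, n = 0.8, 2_000
x = rng.uniform(0.0, 1.0, size=n)
y = rng.poisson(lam=np.exp(theta_true * x)).astype(float)

def g_star(theta):
    # h_j = y_j - exp(theta x_j), and |q_j*| = |dE[y_j|x_j]/dtheta| / Var(y_j|x_j) = x_j,
    # so the quasi-score is sum_j x_j (y_j - exp(theta x_j)).
    return np.sum(x * (y - np.exp(theta * x)))

print(brentq(g_star, -10.0, 10.0))   # root of g*, close to theta_true
```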
Finally, we apply the above result to obtain optimal generalized estimating equations for multivariate data. Let $\mathcal{X}_i$ denote the sample space for the $i$th subject, let $\Theta\subset\mathbb{R}^d$ be a subset with nonempty interior, and let
\[
u_i : \mathcal{X}_i\times\Theta \to \mathbb{R}^{n_i}, \qquad i=1,\ldots,k,
\]
where $u_i$ is a vector of dimension $n_i$, such that $E[u_i(X_i,\theta)\,|\,\theta] = 0$, $\forall\theta\in\Theta$. Suppose that, conditional on $\theta$, $\{u_i(X_i,\theta)\}_{i=1}^{k}$ are independent. Consider the space of functions
\[
L_0 = \Bigl\{\, \sum_{i=1}^{k} W_i(\theta)\, u_i(X_i,\theta) \,\Bigr\},
\]
where $W_i(\theta)$ is a $d\times n_i$ matrix, $i=1,\ldots,k$. Let
\[
W_i^*(\theta) = -\Bigl(E\Bigl[\frac{\partial u_i}{\partial\theta'}\,\Big|\,\theta\Bigr]\Bigr)' [\mathrm{Var}(u_i\,|\,\theta)]^{-1}, \qquad i=1,\ldots,k,
\]
and
\[
g^* = \sum_{i=1}^{k} W_i^*(\theta)\, u_i(X_i,\theta).
\]
Then we have the following result.
Theorem 8. With the same notation as above:
(a) $g^*$ is the orthogonal projection of the score function into $L_0$.
(b) $g^*$ is an optimal estimating function in $L_0$.
(c) If $g\in L_0$ and $E[g g'\,|\,\theta]$ is invertible, then $I_g(\theta) = I_{g^*}(\theta)$, $\forall\theta\in\Theta$, if and only if there exists an invertible matrix function $N : \Theta\to M_{d\times d}$ such that, for any $\theta\in\Theta$, $g^*(X;\theta) = N(\theta)\,g(X;\theta)$, with probability 1 with respect to $P_\theta$.
Proof. (a) We only need to show that, for all $g = \sum_{i=1}^{k} W_i(\theta)\,u_i(X_i,\theta) \in L_0$,
\[
\langle s, g\rangle_\theta = \langle g^*, g\rangle_\theta.
\]
But, by the conditional independence of the $u_i$,
\[
\langle g^*, g\rangle_\theta = \sum_{i=1}^{k}\sum_{i'=1}^{k} W_i^*\, \langle u_i(X_i,\theta),\, u_{i'}(X_{i'},\theta)\rangle_\theta\, W_{i'}'
= \sum_{i=1}^{k} W_i^*\, \mathrm{Var}(u_i\,|\,\theta)\, W_i'
= -\sum_{i=1}^{k} \Bigl(E\Bigl[\frac{\partial u_i}{\partial\theta'}\,\Big|\,\theta\Bigr]\Bigr)' W_i'.
\]
Also, by the regularity condition ($\ast$),
\[
\langle s, g\rangle_\theta = \sum_{i=1}^{k} E[s\, u_i'\,|\,\theta]\, W_i'
= -\sum_{i=1}^{k} \Bigl(E\Bigl[\frac{\partial u_i}{\partial\theta'}\,\Big|\,\theta\Bigr]\Bigr)' W_i'.
\]
Thus, $g^*$ is the orthogonal projection of the score function into $L_0$. Parts (b) and (c) follow from part (a) and Theorem 5. $\square$

Note that, by choosing appropriate functions $u_i$, we can very easily obtain the generalized estimating equations introduced by Liang and Zeger (1986); a sketch is given below. Of course, with this particular formulation, GEEs fail to include 'working covariances'. For further information about generalized estimating equations, we refer to Liang et al. (1992).
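For example (an added sketch with simulated data; the linear mean and the independence working covariance $V_i = I$ are assumptions of the sketch, and $V_i = I$ is not the optimal choice when within-subject correlation is present), taking $u_i = y_i - X_i\beta$ and $W_i = X_i'$ gives the GEE $\sum_i X_i'(y_i - X_i\beta) = 0$ of Liang and Zeger (1986), whose root here has a closed form:

```python
import numpy as np

rng = np.random.default_rng(6)
k, p, n_i = 200, 3, 4                      # k subjects, p parameters, n_i obs each
beta_true = np.array([1.0, -0.5, 2.0])
X = rng.normal(size=(k, n_i, p))
# exchangeable within-subject correlation, unknown to the analyst:
err = rng.normal(size=(k, n_i)) + 0.7 * rng.normal(size=(k, 1))
y = np.einsum('sij,j->si', X, beta_true) + err

# Solve sum_i X_i' (y_i - X_i beta) = 0 (linear, so one linear solve suffices):
A = np.einsum('sij,sik->jk', X, X)
b = np.einsum('sij,si->j', X, y)
print(np.linalg.solve(A, b))               # consistent for beta_true
```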
4. Optimal Bayesian estimating functions

In this section, we study the geometry of estimating functions within a Bayesian framework. There are two basic approaches here. One formulation is based on the joint distribution of the data and the prior, as introduced by Ferreira (1982). The second
formulation, due to Ghosh (1990), is based on the posterior density. We shall study both, and see how the notion of orthogonal projection can be brought within the Bayesian formulation as well.

We begin with Ferreira's (1982) formulation. Let $\mathcal{X}$ be the sample space, let $\Theta\subset\mathbb{R}^k$ be an open set, let $p(x\,|\,\theta)$ be the conditional density of $X$ given $\theta$, and let $\pi(\theta)$ be a prior density. Let $g : \mathcal{X}\times\Theta\to\mathbb{R}^k$ be a function such that
(1) $\partial g/\partial\theta$ exists, $\forall\theta\in\Theta$;
(2) $E[g(X,\theta)\,g(X,\theta)']$ is invertible, where $E$ denotes expectation over the joint distribution of $X$ and $\theta$.
Let $L$ denote the set of all functions $g : \mathcal{X}\times\Theta\to\mathbb{R}^k$ which satisfy (1) and (2) above. The generalized inner product on $L$ is defined by
\[
\langle f, g\rangle = E[f(X,\theta)\, g(X,\theta)'], \qquad \forall f,g\in L. \tag{15}
\]
This definition of the generalized inner product is based on the product moment calculated over the joint distribution of $X$ and $\theta$, as opposed to the one in the previous section, where it is conditional on $\theta$. It is straightforward to verify that (15) is a generalized inner product on $L$, inasmuch as it satisfies the four properties mentioned in Section 2. The following calculation serves as a key connection between Ferreira's formulation of optimal Bayesian estimating functions and our geometric formulation; it also provides geometric insight into the result of Ferreira. Throughout this section, we shall always assume that $p(X\,|\,\theta)$ and $\pi(\theta)$ are differentiable with respect to $\theta$.

Lemma 2. Let $\pi(\theta\,|\,X)$ be the posterior density, let
\[
s_j = \frac{\partial \log \pi(\theta\,|\,X)}{\partial\theta_j}, \qquad \forall j\in\{1,\ldots,k\},
\]
and let $g_i : \mathcal{X}\times\Theta\to\mathbb{R}$ be a function. Then
\[
E[g_i s_j] = -E\Bigl[\frac{\partial g_i}{\partial\theta_j}\Bigr] + E\Bigl\{\frac{\partial E[g_i\,|\,\theta]}{\partial\theta_j} + E[g_i\,|\,\theta]\,\frac{\partial\log\pi(\theta)}{\partial\theta_j}\Bigr\}. \tag{16}
\]
Proof. Since $s_j = \partial\log p(X\,|\,\theta)/\partial\theta_j + \partial\log\pi(\theta)/\partial\theta_j$,
\[
E[g_i s_j] = E\Bigl\{E\Bigl[g_i\Bigl(\frac{\partial\log p(X\,|\,\theta)}{\partial\theta_j} + \frac{\partial\log\pi(\theta)}{\partial\theta_j}\Bigr)\,\Big|\,\theta\Bigr]\Bigr\}
= E\Bigl\{E\Bigl[g_i\,\frac{\partial\log p(X\,|\,\theta)}{\partial\theta_j}\,\Big|\,\theta\Bigr]\Bigr\} + E\Bigl\{E[g_i\,|\,\theta]\,\frac{\partial\log\pi(\theta)}{\partial\theta_j}\Bigr\}.
\]
Differentiating $E[g_i\,|\,\theta] = \int g_i\, p(x\,|\,\theta)\,dx$ under the integral sign gives
\[
E\Bigl[g_i\,\frac{\partial\log p(X\,|\,\theta)}{\partial\theta_j}\,\Big|\,\theta\Bigr] = \frac{\partial E[g_i\,|\,\theta]}{\partial\theta_j} - E\Bigl[\frac{\partial g_i}{\partial\theta_j}\,\Big|\,\theta\Bigr],
\]
and (16) follows. This completes the proof. $\square$
Note that if $E[g_i\,|\,\theta] = 0$, then
\[
E[g_i s_j] = -E\Bigl[\frac{\partial g_i}{\partial\theta_j}\Bigr]; \tag{17}
\]
also, if $g_i$ is only a function of $\theta$, then the first two terms of (16) cancel, so that
\[
E[g_i s_j] = E\Bigl[g_i\,\frac{\partial\log\pi(\theta)}{\partial\theta_j}\Bigr]. \tag{18}
\]
Suppose now
\[
B_{ij}(g) = E\Bigl\{\frac{\partial E[g_i\,|\,\theta]}{\partial\theta_j} + E[g_i\,|\,\theta]\,\frac{\partial\log\pi(\theta)}{\partial\theta_j}\Bigr\} \tag{19}
\]
for $i=1,\ldots,k$, $j=1,\ldots,k$. Then, writing $g=(g_1,\ldots,g_k)'$, $s=(s_1,\ldots,s_k)'$ and $B(g) = ((B_{ij}(g)))$, Lemma 2 gives
\[
\langle g, s\rangle = -\Bigl(E\Bigl[\frac{\partial g}{\partial\theta'}\Bigr] - B(g)\Bigr).
\]
If $E[g\,|\,\theta] = 0$, from (17),
\[
\langle g, s\rangle = -E\Bigl[\frac{\partial g}{\partial\theta'}\Bigr]; \tag{20}
\]
also, if $g$ is only a function of $\theta$, then
\[
\langle g, s\rangle = E\Bigl[g\,\Bigl(\frac{\partial\log\pi(\theta)}{\partial\theta}\Bigr)'\Bigr]. \tag{21}
\]
A quick Monte Carlo check of (17) in a conjugate normal model is sketched below.
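The check (an added illustration under an assumed normal-normal model): for $g = x-\theta$ we have $E[g\,|\,\theta]=0$, so $E[gs]$ over the joint distribution should equal $-E[\partial g/\partial\theta] = 1$.

```python
import numpy as np

rng = np.random.default_rng(8)
sigma2, mu0, tau2, N = 1.0, 0.0, 4.0, 1_000_000
theta = rng.normal(mu0, np.sqrt(tau2), size=N)    # draws from the prior
x = rng.normal(theta, np.sqrt(sigma2))            # draws from p(x | theta)

# s = d log(p(x|theta) pi(theta)) / d theta, which equals the posterior score:
s = (x - theta) / sigma2 - (theta - mu0) / tau2
g = x - theta                                     # E[g | theta] = 0, dg/dtheta = -1
print(np.mean(g * s))                             # ~ 1 = -E[dg/dtheta], as in (17)
```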
Now, by combining Theorem 3 and the above lemma, we have the following result, which generalizes the main result of Ferreira (1982) to the multidimensional case.

Theorem 9. For $g\in L$, let
\[
M_g = E[g(X,\theta)\, g(X,\theta)']. \tag{22}
\]
Then
\[
M_s \geq \Bigl(E\Bigl[\frac{\partial g}{\partial\theta'}\Bigr] - B(g)\Bigr)'\, M_g^{-1}\, \Bigl(E\Bigl[\frac{\partial g}{\partial\theta'}\Bigr] - B(g)\Bigr)
\]
for all $g\in L$.

Proof. From the previous lemma,
\[
\langle g, s\rangle = -\Bigl(E\Bigl[\frac{\partial g}{\partial\theta'}\Bigr] - B(g)\Bigr).
\]
Also, $M_s = \langle s,s\rangle'\langle s,s\rangle^{-1}\langle s,s\rangle$. Thus, the result follows easily from Theorem 3, since $s$ is trivially its own orthogonal projection into $L$. $\square$
Note that if $k=1$, the above theorem reduces to the result proved by Ferreira (1982). For any $g\in L$, let
\[
I_g = \Bigl(E\Bigl[\frac{\partial g}{\partial\theta'}\Bigr] - B(g)\Bigr)'\, M_g^{-1}\, \Bigl(E\Bigl[\frac{\partial g}{\partial\theta'}\Bigr] - B(g)\Bigr).
\]
In the definition of $I_g$, $E[\partial g/\partial\theta'] - B(g)$ is a measure of the sensitivity of $g$, and $M_g$ is a measure of the variability of $g$. Thus, in analogy with the frequentist case, the following definition seems appropriate for an optimal estimating function in the Bayesian framework.

Definition 5. If $L_0$ is a subspace of $L$ and $g^*\in L_0$, then $g^*$ is called an optimal Bayesian estimating function in $L_0$ if, for any $g\in L_0$, $I_g \leq I_{g^*}$.

Next we prove an optimality result about Bayesian estimating functions in this formulation.

Theorem 10. With the same notation as above, let the generalized inner product on $L$ be defined by (15), and let $L_0$ be a subspace of $L$. If $g^*$ is the orthogonal projection of $s$ into $L_0$, then:
(1) $g^*$ is an optimal Bayesian estimating function in $L_0$;
(2) the optimal Bayesian estimating function in $L_0$ is unique in the following sense: for any $g\in L_0$, $I_g = I_{g^*}$ if and only if there exists an invertible $k\times k$ matrix $M$ such that $g^* = Mg$.

Proof. (1) From Theorem 3, $I_{g^*}\geq I_g$ for all $g\in L_0$. But from Lemma 2, $I_g = \langle g,s\rangle'\langle g,g\rangle^{-1}\langle g,s\rangle$ for any $g\in L_0$. Thus, the result follows easily. (2) follows easily from Theorem 4. $\square$

Next we apply Theorem 10 to a case where $L_0$ is a finite-dimensional subspace of $L$ with a linearly independent basis. Let $\{u_i(X_i,\theta)\}_{i=1}^{K}$ be a family of $n_i\times 1$ vectors of parametric functions, and let $v(\theta)$ be an $m\times 1$ vector, such that
(1) for fixed $\theta\in\Theta$, $u_i(\cdot,\theta) : \mathcal{X}\to\mathbb{R}^{n_i}$ is measurable;
(2) $v : \Theta\to\mathbb{R}^m$ is measurable;
(3) $E[u_i\,|\,\theta] = 0$ and $E[v] = 0$;
(4) conditional on $\theta$, $\{u_i(X_i,\theta)\}_{i=1}^{K}$ are independent.
Consider the space $L_0$ of estimating functions of the form
\[
h = \sum_{i=1}^{K} W_i(\theta)\, u_i(X_i,\theta) + Q\, v(\theta),
\]
where, for any $\theta\in\Theta$, $W_i(\theta)$ is a $k\times n_i$ matrix for all $i\in\{1,\ldots,K\}$, and $Q$ is a constant $k\times m$ matrix.
Theorem 11. With the same notation as above, let
\[
W_i^*(\theta) = -\Bigl(E\Bigl[\frac{\partial u_i}{\partial\theta'}\,\Big|\,\theta\Bigr]\Bigr)' (\mathrm{Var}(u_i\,|\,\theta))^{-1},
\qquad
Q^* = -\Bigl(E\Bigl[\frac{\partial v}{\partial\theta'}\Bigr]\Bigr)' (E[v(\theta)\,v(\theta)'])^{-1},
\]
and
\[
g^* = \sum_{i=1}^{K} W_i^*(\theta)\, u_i(X_i,\theta) + Q^*\, v(\theta).
\]
Then:
(a) $g^*$ is the orthogonal projection of $s$ into $L_0$;
(b) $g^*$ is an optimal Bayesian estimating function in $L_0$;
(c) the optimal Bayesian estimating function in $L_0$ is unique in the following sense: if $g\in L_0$, then $I_g = I_{g^*}$ if and only if there exists an invertible matrix $M$ such that
\[
g^*(X_1,\ldots,X_K;\theta) = M\, g(X_1,\ldots,X_K;\theta),
\]
with probability 1 with respect to the joint distribution of the $X$'s and $\theta$.

Proof. (a) For any $g = \sum_{i=1}^{K} W_i(\theta)\,u_i(X_i,\theta) + Q\,v(\theta) \in L_0$, we must show that
\[
\langle s-g^*, g\rangle = \langle s,g\rangle - \langle g^*,g\rangle = 0.
\]
But, using the conditional version of (20) together with (21),
\[
\langle s, g\rangle = \sum_{i=1}^{K} E\{E[s\,u_i(X_i,\theta)'\,|\,\theta]\, W_i(\theta)'\} + E[s\,v(\theta)']\, Q'
= -\sum_{i=1}^{K} E\Bigl\{\Bigl(E\Bigl[\frac{\partial u_i}{\partial\theta'}\,\Big|\,\theta\Bigr]\Bigr)' W_i(\theta)'\Bigr\} - \Bigl(E\Bigl[\frac{\partial v}{\partial\theta'}\Bigr]\Bigr)' Q'.
\]
Also, since $E[u_i\,|\,\theta]=0$, the $u_i$ are conditionally independent, and $v$ depends on $\theta$ alone, the functions $u_1,\ldots,u_K,v$ are mutually orthogonal under (15), so that
\[
\langle g^*, g\rangle = \sum_{i=1}^{K} E\{W_i^*(\theta)\, E[u_i(X_i,\theta)\,u_i(X_i,\theta)'\,|\,\theta]\, W_i(\theta)'\} + Q^*\, E[v(\theta)\,v(\theta)']\, Q'
= -\sum_{i=1}^{K} E\Bigl\{\Bigl(E\Bigl[\frac{\partial u_i}{\partial\theta'}\,\Big|\,\theta\Bigr]\Bigr)' W_i(\theta)'\Bigr\} - \Bigl(E\Bigl[\frac{\partial v}{\partial\theta'}\Bigr]\Bigr)' Q'.
\]
Thus, by Theorem 1, $g^*$ is the orthogonal projection of $s$ into $L_0$.
Parts (b) and (c) follow from (a) and Theorem 10. $\square$
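A standard conjugate example (ours, added for illustration) makes Theorem 11 concrete: with $u_i = X_i - \theta$, $\mathrm{Var}(u_i\,|\,\theta) = \sigma^2$, and $v(\theta) = \theta - \mu_0$ under a $N(\mu_0,\tau^2)$ prior, the theorem gives $W_i^* = 1/\sigma^2$ and $Q^* = -1/\tau^2$, and the root of $g^*$ is the posterior mean.

```python
import numpy as np

rng = np.random.default_rng(7)
sigma2, mu0, tau2, n = 1.0, 0.0, 4.0, 25
theta = rng.normal(mu0, np.sqrt(tau2))            # one draw from the prior
x = rng.normal(theta, np.sqrt(sigma2), size=n)    # data given theta

# g*(t) = sum_i (x_i - t)/sigma2 - (t - mu0)/tau2; its root is explicit:
t_hat = (x.sum() / sigma2 + mu0 / tau2) / (n / sigma2 + 1.0 / tau2)
print(t_hat)                                      # the posterior mean E[theta | x]
```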
Next we turn to the formulation of Bayesian estimating functions introduced by Ghosh (1990). In this formulation, the parameter space is assumed to have the form $\Theta = (a_1,b_1)\times\cdots\times(a_k,b_k)$. We start with a result which is very similar to Lemma 2.

Lemma 3. Let $\pi(\theta\,|\,X)$ be the posterior density, let $s_j = \partial\log\pi(\theta\,|\,X)/\partial\theta_j$, $j=1,\ldots,k$, and let $g_i : \mathcal{X}\times\Theta\to\mathbb{R}$ be a function satisfying suitable regularity conditions. Then
\[
E[g_i s_j\,|\,X] = B_j(g_i) - E\Bigl[\frac{\partial g_i}{\partial\theta_j}\,\Big|\,X\Bigr], \tag{23}
\]
where
\[
B_j(g_i) = \lim_{\theta_j\to b_j} g_i(X,\theta)\,\pi(\theta\,|\,X) - \lim_{\theta_j\to a_j} g_i(X,\theta)\,\pi(\theta\,|\,X).
\]

Proof. Note that
\[
E[g_i s_j\,|\,X] = \int g_i\, \frac{\partial\pi(\theta\,|\,X)}{\partial\theta_j}\, d\theta = B_j(g_i) - E\Bigl[\frac{\partial g_i}{\partial\theta_j}\,\Big|\,X\Bigr],
\]
the last step following from integration by parts. $\square$
Next, the definition of posterior estimating functions is introduced. A function $g : \Theta\times\mathcal{X}\to\mathbb{R}^k$ is called a posterior unbiased estimating function (PUEF) if
\[
E[g(\theta,X)\,|\,X] = 0, \tag{24}
\]
\[
B_j(g_i) = 0, \qquad \forall x\in\mathcal{X},\ i,j\in\{1,\ldots,k\}. \tag{25}
\]
Actually, all we require is that
\[
E[g_i s_j\,|\,X] = -E\Bigl[\frac{\partial g_i}{\partial\theta_j}\,\Big|\,X\Bigr], \qquad \forall x\in\mathcal{X},\ i,j\in\{1,\ldots,k\}.
\]
Let $L$ be the space consisting of all functions $g : \Theta\times\mathcal{X}\to\mathbb{R}^k$ which are PUEFs and for which $E[g g'\,|\,X]$ is invertible. A family of generalized inner products on $L$ is defined as follows: for any $f,g\in L$ and $x\in\mathcal{X}$,
\[
\langle f, g\rangle_x = E[f(\theta,X)\,g(\theta,X)'\,|\,X=x]. \tag{26}
\]
Thus, in this formulation, the expectation is calculated conditional on $X=x$. If the score function $s\in L$, then from Lemma 3,
\[
\langle g, s\rangle_x = -E\Bigl[\frac{\partial g}{\partial\theta'}\,\Big|\,X=x\Bigr].
\]
Next, for every $g\in L$ and $x\in\mathcal{X}$, define
\[
I_g(x) = \langle g,s\rangle_x'\, \langle g,g\rangle_x^{-1}\, \langle g,s\rangle_x.
\]
Let $L_0$ be a subspace of $L$; $g^*\in L_0$ is said to be an optimal element of $L_0$ if
\[
I_{g^*}(x) \geq I_g(x)
\]
for all $g\in L_0$ and $x\in\mathcal{X}$. The following result now follows very easily.

Theorem 12. With the same notation as above, suppose that the orthogonal projection $g^*$ of $s$ into $L_0$ exists with respect to the generalized inner products $\langle\cdot,\cdot\rangle_x$. Then $g^*$ is optimal in $L_0$, i.e., for all $g\in L_0$ we have that
\[
I_{g^*}(x) \geq I_g(x), \qquad \forall x\in\mathcal{X}.
\]
Furthermore, the optimal element of $L_0$ is unique in the following sense: if $g\in L_0$, then $I_g(x) = I_{g^*}(x)$, $\forall x\in\mathcal{X}$, if and only if there exists an invertible matrix-valued function $M : \mathcal{X}\to M_{k\times k}$ such that
\[
g(\theta;x) = M(x)\, g^*(\theta;x).
\]

Proof. The first part of the theorem is a consequence of Theorem 3, and the second part is a consequence of Theorem 4. $\square$

Note that if $s\in L_0$, then $s$ is an optimal estimating function. As a corollary of Theorem 12, we have the following generalization, to a multi-dimensional parameter space, of a result due to Godambe (1994) about optimal estimating functions.
Corollary. If $g^*\in L_0$ is the orthogonal projection of $s$ into $L_0$, then
(a) $I_g(x) \leq I_{g^*}(x)$, for all $g\in L_0$ and $x\in\mathcal{X}$.

Note that it is easy to see that, if the parameter space is one-dimensional, then (a) is equivalent to
\[
\mathrm{corr}^2\{g^*, s\,|\,x\} \geq \mathrm{corr}^2\{g, s\,|\,x\}, \qquad \text{for all } g\in L_0 \text{ and } x\in\mathcal{X}.
\]
This is the result proved by Godambe (1994); a Monte Carlo illustration is given below.
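The illustration (ours, assuming a one-dimensional normal posterior for a fixed $x$): the posterior score $s$ attains the maximal squared posterior correlation with itself, while any other posterior-unbiased $g$ falls short.

```python
import numpy as np

rng = np.random.default_rng(9)
mu_x, v_x = 1.0, 0.5                      # posterior N(mu_x, v_x) for the given x
theta = rng.normal(mu_x, np.sqrt(v_x), size=1_000_000)

s = -(theta - mu_x) / v_x                 # posterior score
g = (theta - mu_x) ** 3                   # posterior-unbiased: odd central moment is 0
corr = lambda a, b: np.mean(a * b) / np.sqrt(np.mean(a * a) * np.mean(b * b))
print(corr(g, s) ** 2, corr(s, s) ** 2)   # ~ 0.6 < 1, consistent with the corollary
```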
Acknowledgements

We thank two referees for their very careful reading of the manuscript.
References

Amari, S., Kumon, M., 1988. Estimation in the presence of many nuisance parameters - geometry of estimating functions. Ann. Statist. 16, 1044-1068.
Bhapkar, V.P., 1972. On a measure of efficiency of an estimating equation. Sankhya Ser. A 34, 467-472.
Chan, S., 1996. Convexity and the geometry of estimating functions. Ph.D. Dissertation, Department of Statistics, University of Florida, unpublished.
Chandrasekar, B., Kale, B.K., 1984. Unbiased statistical estimation functions in the presence of nuisance parameters. J. Statist. Plann. Inference 9, 45-54.
Chang, I.S., Hsiung, C.A., 1991. An E-ancillarity projection property of Cox's partial score function. Ann. Statist. 19, 1651-1660.
Durbin, J., 1960. Estimation of parameters in time-series regression models. J. Roy. Statist. Soc. Ser. B 22, 139-153.
Ferreira, P.E., 1982. Estimating equations in the presence of prior knowledge. Biometrika 69, 667-669.
Ghosh, M., 1990. On a Bayesian analog of the theory of estimating functions. D.G. Khatri Memorial Volume, Gujarat Statist. Rev., 47-52.
Godambe, V.P., 1960. An optimum property of regular maximum likelihood estimation. Ann. Math. Statist. 31, 1208-1212.
Godambe, V.P., 1985. The foundations of finite sample estimation in stochastic processes. Biometrika 72, 419-428.
Godambe, V.P., 1994. Linear Bayes and optimal estimation. Preprint.
Godambe, V.P., Kale, B.K., 1991. Estimating functions: an overview. In: Godambe, V.P. (Ed.), Estimating Functions. Oxford University Press, Oxford, pp. 3-20.
Godambe, V.P., Thompson, M.E., 1989. An extension of quasi-likelihood estimation (with discussion). J. Statist. Plann. Inference 22, 137-172.
Halmos, P.R., 1951. Introduction to Hilbert Space and the Theory of Spectral Multiplicity. Chelsea, New York.
Kale, B.K., 1962. An extension of the Cramer-Rao inequality for statistical estimation functions. Skand. Aktuarietidskr. 45, 60-89.
Liang, K.Y., Zeger, S.L., 1986. Longitudinal data analysis using generalized linear models. Biometrika 73, 13-22.
Liang, K.Y., Zeger, S.L., Qaqish, B., 1992. Multivariate regression analysis for categorical data (with discussion). J. Roy. Statist. Soc. Ser. B 54, 3-40.
McLeish, D.L., 1992. A projected likelihood function for semiparametric models. Biometrika 79, 93-102.
Murphy, S., Li, B., 1995. Projected partial likelihood and its application to longitudinal data. Biometrika 82, 399-406.
Small, C., McLeish, D.L., 1988. Generalization of ancillarity, completeness and sufficiency in an inference function space. Ann. Statist. 16, 534-551.
Small, C., McLeish, D.L., 1989. Projection as a method for increasing sensitivity and eliminating nuisance parameters. Biometrika 76, 693-703.
Small, C., McLeish, D.L., 1991. Geometrical aspects of efficiency criteria for spaces of estimating functions. In: Godambe, V.P. (Ed.), Estimating Functions. Oxford University Press, Oxford, pp. 267-276.
Small, C., McLeish, D.L., 1994. Hilbert Space Methods in Probability and Statistical Inference. Wiley, New York.
Waterman, R.P., Lindsay, B.G., 1996. The accuracy of projected score methods in approximating conditional scores. Biometrika 83, 1-13.
Zhao, L.P., Prentice, R.L., 1991. Use of a quadratic exponential model to generate estimating equations for means, variances and covariances. In: Godambe, V.P. (Ed.), Estimating Functions. Oxford University Press, Oxford, pp. 103-117.