JOURNAL OF MULTIVARIATE ANALYSIS 19, 183-188 (1986)
A Remark on Semiparametric Models

J. PFANZAGL

Mathematisches Institut der Universität zu Köln, West Germany

Communicated by P. R. Krishnaiah
Let $\kappa$ be a functional defined on a family of probability measures $\mathfrak{P}$, containing a subfamily $\mathfrak{P}_0$. The paper presents a condition involving the gradient of the functional under which, for probability measures in $\mathfrak{P}_0$, "adaptation" is possible, i.e., under which the asymptotic variance bound for estimators of $\kappa(P)$ under the assumption $P \in \mathfrak{P}_0$ is the same as under the assumption $P \in \mathfrak{P}$. This condition applies in particular to semiparametric models. © 1986 Academic Press, Inc.
In an interesting paper, Begun et al. [1] used the term "semiparametric" (or parametric-nonparametric) to denote models which depend on an unknown function, and a Euclidean parameter which is to be estimated. A typical example: The model consists of probability measures with Lebesgue density $x \to g(x - \theta)$, where $g$ is symmetric about zero but unknown otherwise, and $\theta \in \mathbb{R}$ is to be estimated.

The purpose of this note is modest: To suggest that a particular theory for semiparametric models is not necessary. The "functional" approach, outlined in [6], is general enough to cover semiparametric models as a special case.

We start with the following more general problem: Let $\mathfrak{P}$ be a differentiable family of mutually absolutely continuous probability measures on some measurable space $(X, \mathscr{A})$, and $\kappa: \mathfrak{P} \to \mathbb{R}$ a differentiable functional. Assume that we know the canonical gradient of $\kappa$, restricted to a certain subfamily $\mathfrak{P}_0 \subset \mathfrak{P}$. How can we recover, for probability measures in $\mathfrak{P}_0$, the canonical gradient of $\kappa$ in $\mathfrak{P}$? Since the canonical gradient determines the asymptotic variance bound, the motivation for asking this question is obvious: we wish to know the impact on the asymptotic variance bound if we relax our prior assumption on the family from $\mathfrak{P}_0$ to $\mathfrak{P}$, in a situation where the true probability belongs, in fact, to $\mathfrak{P}_0$.

Received April 4, 1984; revised August 20, 1984.
AMS 1980 subject classifications: 62G05, 62G20.
Key words and phrases: tangent space, differentiable functional, minimum distance estimator, adaptation.

0047-259X/86 $3.00
Copyright © 1986 by Academic Press, Inc. All rights of reproduction in any form reserved.
Let $T(P, \mathfrak{P}_0)$ denote the tangent space of $\mathfrak{P}_0$ at $P$. Recall that the tangent space is defined as the set of all functions $g \in L_2(P)$, $P(g) = 0$, occurring in representations

$$p_t/p = 1 + tg + t r_t$$

of the densities $p_t$ of p-measures $P_t \in \mathfrak{P}_0$, defined for $t \to 0$. The remainder term is assumed to converge to zero in an appropriate sense, and $T(P, \mathfrak{P}_0)$ is assumed to be a linear, closed subspace of $L_2(P)$. Let $\kappa^*(\cdot, P) \in T(P, \mathfrak{P}_0)$ denote the canonical gradient of $\kappa$ at $P$ in $\mathfrak{P}_0$, defined by

$$\kappa(P_t) = \kappa(P) + t P(\kappa^*(\cdot, P)\,g) + o(t).$$

To obtain for fixed $P_0 \in \mathfrak{P}_0$ the canonical gradient of $\kappa$ at $P_0$ in $\mathfrak{P}$, say $\kappa'(\cdot, P_0)$, we introduce the "level set"

$$Q_0 := \{Q \in \mathfrak{P} : \kappa(Q) = \kappa(P_0)\}. \tag{1}$$
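The representation $p_t/p = 1 + tg + t r_t$ can be made concrete on a simple parametric path. The following numerical sketch is my own illustration, not from the paper: for the Gaussian location path $P_t = N(t, 1)$ through $P = N(0, 1)$, the tangent is the score $g(x) = x$, and the remainder $r_t = ((p_t/p - 1)/t) - g$ shrinks as $t \to 0$.

```python
import numpy as np

# Densities of P = N(0, 1) and of the path P_t = N(t, 1) through P.
p = lambda x: np.exp(-x**2 / 2) / np.sqrt(2 * np.pi)
p_t = lambda x, t: np.exp(-(x - t)**2 / 2) / np.sqrt(2 * np.pi)

# Tangent of the path at t = 0: the score g(x) = x, which satisfies P(g) = 0.
g = lambda x: x

x = np.linspace(-3, 3, 13)
for t in [0.1, 0.01, 0.001]:
    # remainder r_t in the representation p_t/p = 1 + t*g + t*r_t
    r_t = (p_t(x, t) / p(x) - 1) / t - g(x)
    print(t, float(np.max(np.abs(r_t))))
```

On this path $p_t/p = \exp(tx - t^2/2)$, so the remainder is of order $t$ uniformly on compacts, matching the "appropriate sense" of convergence required of $r_t$.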
By definition of $Q_0$, the gradient $\kappa'(\cdot, P_0)$ is orthogonal to $T(P_0, Q_0)$, the tangent space of $Q_0$ at $P_0$, and $T(P_0, \mathfrak{P})$ is spanned by $T(P_0, Q_0)$ and $\kappa'(\cdot, P_0)$ (see [6, p. 116, Example 8.1.2]). Throughout the following we assume that $\kappa \,|\, \mathfrak{P}_0$ is injective and that $\kappa^*(\cdot, P_0) \not\equiv 0$. Under these assumptions, $\kappa^*(\cdot, P_0) \notin T(P_0, Q_0)$. Let $\kappa^+(\cdot, P_0)$ denote the projection of $\kappa^*(\cdot, P_0)$ into $T(P_0, Q_0)$. Then $\kappa^*(\cdot, P_0) - \kappa^+(\cdot, P_0)$ is orthogonal to $T(P_0, Q_0)$. Since $T(P_0, \mathfrak{P}) = T(P_0, Q_0) \oplus [\kappa'(\cdot, P_0)]$ (with $\oplus$ denoting the orthogonal sum), this implies

$$\kappa'(\cdot, P_0) = c_0\bigl(\kappa^*(\cdot, P_0) - \kappa^+(\cdot, P_0)\bigr). \tag{2}$$
Moreover, $\kappa'(\cdot, P_0)$ is a gradient of $\kappa$ at $P_0$ in $\mathfrak{P}$, hence also a gradient of $\kappa$ at $P_0$ in $\mathfrak{P}_0$. This implies (see [6, p. 72, Proposition 4.3.2]) that $\kappa'(\cdot, P_0) - \kappa^*(\cdot, P_0)$ is orthogonal to $\kappa^*(\cdot, P_0)$, i.e.,

$$P_0\bigl((\kappa'(\cdot, P_0) - \kappa^*(\cdot, P_0))\,\kappa^*(\cdot, P_0)\bigr) = 0. \tag{3}$$

Relations (2) and (3), together with

$$P_0\bigl(\kappa'(\cdot, P_0)\,\kappa^+(\cdot, P_0)\bigr) = 0, \tag{4}$$

imply that

$$P_0\bigl(\kappa'(\cdot, P_0)^2\bigr) = c_0\,P_0\bigl(\kappa^*(\cdot, P_0)^2\bigr), \tag{5}$$

with

$$c_0 := \Bigl(1 - P_0\bigl(\kappa^+(\cdot, P_0)^2\bigr)\big/P_0\bigl(\kappa^*(\cdot, P_0)^2\bigr)\Bigr)^{-1}. \tag{6}$$

According to Pfanzagl [6, p. 155, Theorem 9.2.2], $P_0(\kappa'(\cdot, P_0)^2)$ and $P_0(\kappa^*(\cdot, P_0)^2)$ are bounds for the asymptotic variance of estimator-sequences which are asymptotically median unbiased for $\mathfrak{P}$, respectively $\mathfrak{P}_0$. According to an appropriate version of the convolution theorem (see [6, p. 158, Theorem 9.3.1]) these bounds hold without the assumption of asymptotic median unbiasedness for estimator-sequences which are regular in the sense of converging locally uniformly to some limiting distribution.

The relaxation of the model from $\mathfrak{P}_0$ to $\mathfrak{P}$ results in an increase of the asymptotic variance bound by the factor $c_0 \ge 1$ and is, therefore, determined by $\kappa^+(\cdot, P_0)$, the projection of $\kappa^*(\cdot, P_0)$ into $T(P_0, Q_0)$. We have no increase in the asymptotic variance bound iff $c_0 = 1$, i.e., iff $\kappa^*(\cdot, P_0)$ is orthogonal to $T(P_0, Q_0)$. Pfanzagl [6, p. 168] gives an apparently different (necessary) condition for adaptiveness: that the canonical gradient of $\kappa$ in $\mathfrak{P}$, i.e., $\kappa'(\cdot, P_0)$, belongs to $T(P_0, \mathfrak{P}_0)$. It follows from (2) that these two conditions are equivalent.

Note that the ratio of the asymptotic variance bounds $P_0(\kappa'(\cdot, P_0)^2)$ and $P_0(\kappa^*(\cdot, P_0)^2)$ depends only on the level sets of $\kappa$ (i.e., on the sets on which $\kappa$ is constant), and not on the values which $\kappa$ attains. It is, therefore, the same for any increasing function of $\kappa$.

Professor LeCam suggested that similar geometric arguments can be applied to the limit experiment of an LAN-family. This opens a way for proving such results for more general situations than i.i.d., to which the approach via tangent spaces is limited.

As a particular application we consider now the case of a semiparametric model. Let $\mathfrak{P} = \{P_{\theta,\tau} : \theta \in \Theta, \tau \in T\}$, where $\Theta \subset \mathbb{R}$ and where $T$ is a general parameter space (e.g., the class of all Lebesgue densities which are symmetric about $0$). If $\theta$ is identifiable, we may define a functional $\kappa$ on $\mathfrak{P}$ by

$$\kappa(P_{\theta,\tau}) := \theta. \tag{7}$$
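The geometry behind relations (2) through (6) is plain Hilbert-space projection, so it can be checked numerically once $L_2(P_0)$ is replaced by a finite-dimensional stand-in. The following sketch is my own illustration (arbitrary random vectors, not from the paper): a subspace `T0` stands in for $T(P_0, Q_0)$, a vector `kappa_star` for $\kappa^*(\cdot, P_0)$; the projection $\kappa^+$ is computed by least squares, $c_0$ is formed as in (6), $\kappa'$ as in (2), and relations (3) to (5) are verified.

```python
import numpy as np

rng = np.random.default_rng(0)
n, m = 50, 3                          # dim of the stand-in for L2(P0); dim of T(P0, Q0)

T0 = rng.standard_normal((n, m))      # columns span the tangent space T(P0, Q0)
kappa_star = rng.standard_normal(n)   # stands in for kappa*(., P0)

inner = lambda f, g: f @ g / n        # empirical inner product, stands in for P0(f g)

# kappa_plus: orthogonal projection of kappa_star onto span(T0), via least squares
coef, *_ = np.linalg.lstsq(T0, kappa_star, rcond=None)
kappa_plus = T0 @ coef

# (6): c0, and (2): kappa_prime = c0 * (kappa_star - kappa_plus)
c0 = 1 / (1 - inner(kappa_plus, kappa_plus) / inner(kappa_star, kappa_star))
kappa_prime = c0 * (kappa_star - kappa_plus)

print(inner(kappa_prime - kappa_star, kappa_star))                           # (3): ~ 0
print(inner(kappa_prime, kappa_plus))                                        # (4): ~ 0
print(inner(kappa_prime, kappa_prime) - c0 * inner(kappa_star, kappa_star))  # (5): ~ 0
```

Since $\|\kappa^+\| \le \|\kappa^*\|$, the factor $c_0$ produced this way is automatically $\ge 1$, in line with the increase of the variance bound noted above.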
Let $x \to p(x, \theta, \tau)$ denote a density of $P_{\theta,\tau}$ (with respect to some dominating measure $\mu$). Given a fixed $(\theta_0, \tau_0) \in \Theta \times T$, let

$$\mathfrak{P}_0 := \{P_{\theta,\tau_0} : \theta \in \Theta\}. \tag{8}$$

If $\ell(x, \theta, \tau) := (\partial/\partial\theta) \log p(x, \theta, \tau)$ exists, we have under Lipschitz conditions on the 2nd derivatives (see [6, p. 83, Proposition 5.3.1])

$$\kappa^*(\cdot, P_{\theta_0,\tau_0}) = \ell(\cdot, \theta_0, \tau_0)\big/P_{\theta_0,\tau_0}\bigl(\ell(\cdot, \theta_0, \tau_0)^2\bigr). \tag{9}$$

By definition (7) of $\kappa$, the level set introduced in (1) becomes

$$Q_0 := \{P_{\theta_0,\tau} : \tau \in T\}.$$
From the general results indicated above, the asymptotic variance bound for asymptotically median unbiased or regular estimator-sequences of $\kappa$ on $\mathfrak{P}$ can be obtained from the projection of $\ell(\cdot, \theta_0, \tau_0)$ into $T(P_{\theta_0,\tau_0}, Q_0)$ according to (5) and (6). This agrees with the assertion of Begun et al. that $I_*^{-1}$ is the asymptotic variance bound for "regular" estimator-sequences, because (see [1, (3.4)]) in our notations

$$I_*^{-1} = 1\big/P_{\theta_0,\tau_0}\bigl((\ell(\cdot, \theta_0, \tau_0) - \ell^+(\cdot, \theta_0, \tau_0))^2\bigr) = c_0\,P_{\theta_0,\tau_0}\bigl(\kappa^*(\cdot, P_{\theta_0,\tau_0})^2\bigr)$$

(where $\ell^+$ is the projection of $\ell$ into $T(P_{\theta_0,\tau_0}, Q_0)$). In particular: Adaptation is impossible unless $\ell(\cdot, \theta_0, \tau_0)$ is orthogonal to the tangent space of $\{P_{\theta_0,\tau} : \tau \in T\}$, for every $(\theta_0, \tau_0) \in \Theta \times T$.

It will not escape the reader's attention that the concept of a tangent space used by Begun et al. is based on Hellinger differentiability, whereas the results of Pfanzagl [6] are based on the concept of "weak differentiability" ([6, Sect. 1.1]). Technically speaking, the different differentiability concepts result from different conditions on the remainder term $r_t$ in the representation $p_t/p = 1 + tg + t r_t$. As shown by LeCam [5] these seemingly different technical conditions are, in fact, equivalent, so that Hellinger differentiability and weak differentiability are the same.

For the purpose of illustration let us consider the case $T \subset \mathbb{R}^k$. For this case, it is natural to write $(\theta_0, \theta_1, \ldots, \theta_k)$ instead of $(\theta, \tau)$, and we write $\theta$ to denote a fixed element. For $i = 0, 1, \ldots, k$ let
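For the symmetric location example mentioned in the introduction, this orthogonality can be checked directly: at $\theta_0 = 0$ the score $\ell(x) = -g'(x)/g(x)$ is antisymmetric, while tangents of the nuisance family (perturbations of the symmetric shape $g$) are symmetric about $0$, so all inner products vanish. The following sketch is my own, with a logistic $g$ and one arbitrarily chosen symmetric tangent $h$:

```python
import numpy as np

# Logistic density g (symmetric about 0); the theta-score of x -> g(x - theta)
# at theta0 = 0 is l(x) = -g'(x)/g(x) = tanh(x/2), an antisymmetric function.
g = lambda x: np.exp(-x) / (1 + np.exp(-x))**2
score = lambda x: np.tanh(x / 2)

# A nuisance tangent: any symmetric h with P(h) = 0. Here h(x) = x^2 - pi^2/3,
# since pi^2/3 is the variance of the standard logistic distribution.
h = lambda x: x**2 - np.pi**2 / 3

# Inner products in L2(P), by a crude Riemann sum over a wide grid.
x = np.linspace(-30, 30, 200001)
dx = x[1] - x[0]
P = lambda f: float(np.sum(f(x) * g(x)) * dx)

print(P(lambda t: np.ones_like(t)))  # ~ 1 (sanity check: g integrates to 1)
print(P(h))                          # ~ 0 (h is a valid tangent)
print(P(lambda t: score(t) * h(t)))  # ~ 0 (score orthogonal to nuisance tangent)
```

The same cancellation (antisymmetric times symmetric) works for every symmetric mean-zero $h$, which is why adaptation is possible in this model.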
$$\ell^{(i)}(x, \theta) := \frac{\partial}{\partial\theta_i} \log p(x, \theta_0, \theta_1, \ldots, \theta_k), \qquad L_{i,j}(\theta) := P_\theta\bigl(\ell^{(i)}(\cdot, \theta)\,\ell^{(j)}(\cdot, \theta)\bigr).$$

Moreover, let $\Lambda$ denote the inverse of $L = (L_{i,j})_{i,j=0,1,\ldots,k}$ and $\Lambda^+$ the inverse of $(L_{i,j})_{i,j=1,\ldots,k}$.

For $\kappa(P_{(\theta_0,\theta_1,\ldots,\theta_k)}) = \theta_0$ we have (see (9))

$$\kappa^*(\cdot, \theta) = L_{0,0}(\theta)^{-1}\,\ell^{(0)}(\cdot, \theta).$$

(See also the classical paper of Stein [7, Sect. 2].) Since

$$Q_0 = \bigl\{P_{(\theta_0,\eta_1,\ldots,\eta_k)} : (\eta_1, \ldots, \eta_k) \in T\bigr\},$$

$T(P_\theta, Q_0)$ is the linear space spanned by $\ell^{(1)}(\cdot, \theta), \ldots, \ell^{(k)}(\cdot, \theta)$. Hence

$$\kappa^+(\cdot, \theta) = L_{0,0}(\theta)^{-1} \sum_{i=1}^{k} \sum_{j=1}^{k} L_{0,i}(\theta)\,\Lambda^+_{i,j}(\theta)\,\ell^{(j)}(\cdot, \theta),$$
and therefore

$$P_\theta\bigl(\kappa^+(\cdot, \theta)^2\bigr) = L_{0,0}(\theta)^{-2} \sum_{i=1}^{k} \sum_{j=1}^{k} L_{0,i}(\theta)\,\Lambda^+_{i,j}(\theta)\,L_{0,j}(\theta).$$

Since $\Lambda^+_{i,j} = \Lambda_{i,j} - \Lambda_{0,0}^{-1}\Lambda_{0,i}\Lambda_{0,j}$, we obtain

$$P_\theta\bigl(\kappa^+(\cdot, \theta)^2\bigr) = L_{0,0}(\theta)^{-1} - L_{0,0}(\theta)^{-2}\,\Lambda_{0,0}(\theta)^{-1}.$$
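In this finite-dimensional case the whole computation is matrix algebra, so it can be checked on any positive definite information matrix. The sketch below is my own illustration (a random matrix, numpy assumed): it verifies the block-inverse identity for $\Lambda^+$ and the resulting value $c_0 = L_{0,0}(\theta)\,\Lambda_{0,0}(\theta)$ stated next.

```python
import numpy as np

rng = np.random.default_rng(1)
k = 4
A = rng.standard_normal((k + 1, k + 1))
L = A @ A.T + (k + 1) * np.eye(k + 1)   # positive definite information matrix, indices 0..k

Lam = np.linalg.inv(L)                  # Lambda = L^{-1}
Lam_plus = np.linalg.inv(L[1:, 1:])     # Lambda^+ = inverse of (L_ij), i, j = 1..k

# Block-inverse identity: Lambda^+_ij = Lambda_ij - Lambda_00^{-1} Lambda_0i Lambda_0j
ident = Lam[1:, 1:] - np.outer(Lam[0, 1:], Lam[0, 1:]) / Lam[0, 0]
print(float(np.max(np.abs(Lam_plus - ident))))   # ~ 0

# P(kappa*^2) = 1/L_00 and P(kappa+^2) = L_00^{-2} sum_ij L_0i Lambda^+_ij L_0j
v_star = 1 / L[0, 0]
v_plus = L[0, 1:] @ Lam_plus @ L[0, 1:] / L[0, 0]**2
c0 = 1 / (1 - v_plus / v_star)                   # definition (6)
print(float(c0 - L[0, 0] * Lam[0, 0]))           # ~ 0: c0 = L_00 * Lambda_00
print(float(c0 * v_star - Lam[0, 0]))            # ~ 0: the bound equals Lambda_00
```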
Together with $P_\theta(\kappa^*(\cdot, \theta)^2) = L_{0,0}(\theta)^{-1}$ this implies (see (6)) $c_0 = L_{0,0}(\theta)\,\Lambda_{0,0}(\theta)$, so that (see (5)) we obtain the well-known asymptotic variance bound $P_\theta(\kappa'(\cdot, \theta)^2) = \Lambda_{0,0}(\theta)$.

As another application we consider an example not related to semiparametric families. Let $\mathfrak{P}_0$ be a given family, and $\kappa: \mathfrak{P}_0 \to \mathbb{R}$ an injective functional. We extend $\kappa$ to a larger family $\mathfrak{P} \supset \mathfrak{P}_0$ by the following "minimum distance" procedure. Let $d$ denote a distance on $\mathfrak{P} \times \mathfrak{P}_0$. For arbitrary $P \in \mathfrak{P}$ we determine $P_0 \in \mathfrak{P}_0$ such that $d(P, P_0) = d(P, \mathfrak{P}_0)$. If such a $P_0$ exists uniquely for every $P \in \mathfrak{P}$, we may define $\kappa$ for $P \in \mathfrak{P}$ by $\kappa(P) := \kappa(P_0)$. Given $P_0 \in \mathfrak{P}_0$, we have (see (1))

$$Q_0 = \{P \in \mathfrak{P} : d(P, P_0) = d(P, \mathfrak{P}_0)\}.$$

According to the results obtained above, $\kappa^*(\cdot, P_0) \perp T(P_0, Q_0)$ is necessary and sufficient for the asymptotic variance bound (for asymptotically median unbiased or "regular" estimator-sequences) to remain unchanged for measures in $\mathfrak{P}_0$ if the basic family is enlarged to $\mathfrak{P}$. This will be the case if we choose for $d$ the Hellinger distance or any other asymptotically equivalent distance (see [6, Sect. 7.3]), but not, in general, for other distances (such as the Kolmogorov distance or a Cramér-von Mises distance). Finding estimators for $\kappa$ on $\mathfrak{P}$ which are asymptotically efficient for measures in $\mathfrak{P}_0$ may be difficult, of course. For examples of this type see [2, 4], where $\mathfrak{P}_0$ is a parametric family and $\kappa \,|\, \mathfrak{P}_0$ the parameter, and Beran [3], where $\mathfrak{P}_0$ is the family of all symmetric distributions, and $\kappa \,|\, \mathfrak{P}_0$ the center of symmetry.
ACKNOWLEDGMENTS

The author wishes to thank Dr. W. Wefelmeyer for valuable suggestions, and Professor LeCam for a hint about possible generalizations.
REFERENCES
[1] BEGUN, J. M., HALL, W. J., HUANG, W.-M., AND WELLNER, J. A. (1983). Information and asymptotic efficiency in parametric-nonparametric models. Ann. Statist. 11 432-452.
[2] BERAN, R. (1977). Minimum Hellinger distance estimates for parametric models. Ann. Statist. 5 445-463.
[3] BERAN, R. (1978). An efficient and robust adaptive estimator of location. Ann. Statist. 6 292-313.
[4] BERAN, R. (1981). Efficient robust estimates in parametric models. Z. Wahrsch. Verw. Gebiete 55 91-108.
[5] LECAM, L. (1983). Differentiability, tangent spaces and Gaussian auras. Unpublished manuscript.
[6] PFANZAGL, J. (with the assistance of W. Wefelmeyer) (1982). Contributions to a General Asymptotic Statistical Theory. Lecture Notes in Statistics, Vol. 13. Springer-Verlag, New York.
[7] STEIN, C. (1956). Efficient nonparametric testing and estimation. Proc. Third Berkeley Symp. Math. Statist. Probab. 1 187-195.