Journal of Statistical Planning and Inference 27 (1991) 187-201
North-Holland
A two-stage spline smoothing method for partially linear models

Hung Chen*
Department of Applied Mathematics and Statistics, State University of New York-Stony Brook, Stony Brook, NY 11794, U.S.A.

Jyh-Jen Horng Shiau**
Institute of Statistics, National Tsing Hua University, Hsinchu, Taiwan 30043, R.O.C.

Received 19 January 1989; revised manuscript received 7 August 1990
Recommended by W.J. Studden

Abstract: Rice (1986) showed that the partial spline estimate of the parametric component in a semiparametric regression model is generally biased, and that it is necessary to undersmooth the nonparametric component to force the bias to be negligible with respect to the standard error. We propose a new two-stage spline smoothing method for estimating the parametric and nonparametric components in a semiparametric regression model. By appropriately choosing rates for the smoothing parameters, we show that the parametric component can be estimated at the parametric rate without undersmoothing the nonparametric component. We also show that the same result holds for the partial regression estimate proposed independently by Denby (1986) and Speckman (1988). Asymptotic normality results are also shown for both estimates. Furthermore, we associate these estimates with Wellner's (1986) efficient scores methods.

AMS Subject Classification: Primary 62G05; secondary 62G99.

Key words and phrases: Partial splines; semiparametric regression; additive regression; smoothing splines; rate of convergence; efficient scores; parametric rates.

1. Introduction and summary

In this paper, we consider the following semiparametric regression model

  y_in = x_in^T β + g(t_in) + e_in,   i = 1, ..., n,   (1)
where both x_in = (x_i1n, ..., x_idn)^T (a d-vector) and t_in (a real number) are known, β = (β_1, ..., β_d)^T is a vector of unknown regression coefficients, g is a smooth function to be estimated, and the {e_in} are independent noise terms with mean zero and variance σ².

* This work was sponsored by the National Science Foundation under Grant No. DMS-8901556.
** Current address: AT&T Bell Laboratories, Engineering Research Center, P.O. Box 900, Princeton, NJ 08540, U.S.A.
0378-3758/91/$03.50 © 1991 Elsevier Science Publishers B.V. (North-Holland)
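To fix ideas, data from model (1) with d = 1 and Rice's design x_in = h(t_in) + z_in can be simulated as follows; a minimal sketch in which the particular h, g, coefficient, and noise levels are illustrative choices, not taken from the paper.

```python
import numpy as np

rng = np.random.default_rng(0)
n = 200
t = (2 * np.arange(1, n + 1) - 1) / (2 * n)   # regular design points on [0, 1]
beta = np.array([1.5])                        # true parametric coefficient (illustrative)

g = np.sin(2 * np.pi * t)                     # smooth nonparametric component (illustrative)
h = np.cos(np.pi * t)                         # smooth part of the regressor
z = rng.normal(scale=0.5, size=n)             # 'white noise' part of the regressor
X = (h + z)[:, None]                          # n x d design matrix, d = 1

e = rng.normal(scale=0.3, size=n)             # independent errors, mean 0, variance sigma^2
y = X @ beta + g + e                          # observations from model (1)
```

Because h is smooth in t, the regressor and the nonparametric component are dependent, which is exactly the situation where the partial spline estimate runs into trouble.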
There have been several approaches to estimating β and g from noisy data {y_in}. One approach is the so-called partial spline estimation method proposed by Engle et al. (1986), Wahba (1984, 1986) and Shiau et al. (1986), among others. The partial spline estimate is the minimizer of the following variational problem:

  min_{β ∈ R^d, g ∈ W_2^m}  (1/n) Σ_{i=1}^n (y_in − x_in^T β − g(t_in))² + λ J(g),   (2)

where W_2^m is the Sobolev space {f : f has m − 1 absolutely continuous derivatives and f^(m) ∈ L_2[0,1]} and J(g) = ∫_0^1 (g^(m)(t))² dt is a penalty functional measuring the smoothness of g. The smoothing parameter λ controls the tradeoff between fidelity to the data and roughness of the solution. Let X = (x_ijn) be the n × d design matrix for the parametric part of (1), y = (y_1n, ..., y_nn)^T, and g = (g(t_1n), ..., g(t_nn))^T. It can be shown easily (e.g., Shiau and Wahba (1988), Speckman (1988)) that the partial spline estimates for β and g obtained from (2) are

  β̂ = (X^T(I − S_λ)X)^{-1} X^T(I − S_λ)y  and  ĝ = S_λ(y − Xβ̂),   (3)
where S_λ is the smoother matrix for ordinary spline smoothing in (2) with β = 0.

There have been several studies on the asymptotic behavior of β̂. Heckman (1986) considered the case where x_in is 'white noise' and showed that √n(β̂ − β) is asymptotically normal under mild assumptions. However, Rice (1986) considered a simple model where the x_in and t_in are not independent. For d = 1, he let x_in = h(t_in) + z_in, where h is a smooth function and z_in behaves like white noise. He found that the parametric rate of convergence O(n^{-1/2}) for β̂ can be achieved in general only at the expense of undersmoothing the nonparametric component g. Thus the use of the generalized cross-validation (GCV) method proposed by Craven and Wahba (1979) for choosing λ is questionable in this case. Similar results have been obtained in the case when x_in is deterministic, say, x_in = h(t_in). Shiau and Wahba (1988) (SW) studied convergence rates for the mean square errors of the partial spline estimate in (3) as well as the partial regression estimates (called Denby/Speckman-type estimates in SW, since they were proposed independently by Denby (1986) and Speckman (1988)), defined as follows:

  β̂_1 = (X^T(I − S_λ)²X)^{-1} X^T(I − S_λ)²y  and  ĝ_1 = S_λ(y − Xβ̂_1).   (4)

For d = 1, they reported parametric convergence for the variance but nonparametric convergence for the bias for both β̂ and β̂_1. However, β̂_1 has a faster convergence rate than that of β̂ in some cases. They also pointed out that data-based estimates of λ that are optimal for predictive mean square error in the function, such as GCV, may not be optimal for the mean square error of β̂. Eubank and Whitney (1989) also reported similar results as in SW.
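Given any symmetric n × n smoother matrix S_λ, the closed forms (3) and (4) are each one line of linear algebra. A minimal sketch: the Whittaker-type smoother (I + λK)^{-1} with a second-difference penalty K, and the global-mean smoother in the toy check, are only convenient stand-ins for the smoothing spline smoother.

```python
import numpy as np

def whittaker_smoother(n, lam):
    """Stand-in spline-type smoother: S = (I + lam*K)^{-1}, K a 2nd-difference penalty."""
    D = np.diff(np.eye(n), 2, axis=0)
    return np.linalg.inv(np.eye(n) + lam * D.T @ D)

def partial_spline(X, y, S):
    """Partial spline estimates (3)."""
    M = np.eye(len(y)) - S                      # I - S_lambda
    beta = np.linalg.solve(X.T @ M @ X, X.T @ M @ y)
    return beta, S @ (y - X @ beta)

def partial_regression(X, y, S):
    """Denby/Speckman partial regression estimates (4); uses (I-S)^T(I-S) = (I-S)^2 for symmetric S."""
    M = np.eye(len(y)) - S
    beta = np.linalg.solve(X.T @ M @ M @ X, X.T @ M @ M @ y)
    return beta, S @ (y - X @ beta)

# Tiny check on an exactly linear signal: both recover beta = 3 and a constant g = 1.
n = 40
S = np.full((n, n), 1.0 / n)                    # symmetric global-mean smoother (illustrative)
X = (np.arange(n, dtype=float) / n)[:, None]
y = 3.0 * X[:, 0] + 1.0
b3, g3 = partial_spline(X, y, S)
b4, g4 = partial_regression(X, y, S)
```

For the global-mean smoother the two estimators coincide because I − S is idempotent; for a genuine spline smoother they differ, which is the point of the comparison in SW.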
We propose a new two-stage spline smoothing method motivated by the following ideas. First, we note that the partial spline method does not penalize the roughness of the smooth term in the parametric component, h(t). In the case of x_in = h(t_in) + z_in, it is natural to consider including h(t) in the penalty term. Second, we notice that the independence assumption of x_in to t gives Heckman (1986) the asymptotic normality result, while Rice (1986) reported the negative result described above. This gives us the idea of extracting z_in from x_in such that z_in is more or less orthogonal, and hence independent, to t. We remark that Eubank and Speckman (1989) also suggested the same idea of orthogonalization, motivated by the partial regression method. Thus we propose the following two-stage spline smoothing procedure.

Procedure.
Stage 1. For j = 1, ..., d, smooth the j-th column vector x_j of X with respect to t to obtain the residual vector r_j, and form the block matrix R = [r_1, ..., r_d].
Stage 2. Apply partial spline smoothing with X replaced by R to obtain the estimate as in (3).
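Stages 1 and 2 translate directly into code. A sketch assuming symmetric smoother matrices (the mean smoother in the toy check stands in for the S_{λ_j} and S_λ; the returned ĝ here is simply the smoothed residual at the original X, the leading term of the paper's final nonparametric estimate):

```python
import numpy as np

def two_stage(X, y, S_lam, S_cols):
    """Two-stage spline smoothing (Procedure above).

    Stage 1: residualize each column of X against t:  r_j = (I - S_{lambda_j}) x_j.
    Stage 2: partial spline smoothing (3) with X replaced by R = [r_1, ..., r_d].
    Smoother matrices are assumed symmetric, as for smoothing splines.
    """
    n, d = X.shape
    I = np.eye(n)
    R = np.column_stack([(I - S_cols[j]) @ X[:, j] for j in range(d)])  # Stage 1
    M = I - S_lam
    beta0 = np.linalg.solve(R.T @ M @ R, R.T @ M @ y)                  # Stage 2
    g0 = S_lam @ (y - X @ beta0)            # smooth the residuals at the original X
    return beta0, g0

# Exactly linear toy signal: recovers beta = 2 and g = constant 1.
n = 30
Sm = np.full((n, n), 1.0 / n)               # symmetric mean smoother (illustrative)
x = np.arange(n, dtype=float) / n
X = x[:, None]
y = 2.0 * x + 1.0
beta0, g0 = two_stage(X, y, Sm, [Sm])
```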
In this paper, we study the case where the smoother is the smoothing spline smoother. Note that in Stage 2 we are basically penalizing the roughness of the whole function, rather than just the function g as in partial spline smoothing. If we use the same smoother S_λ0 for each regressor, we obtain

  β̂_0 = (X^T(I − S_λ0)(I − S_λ)(I − S_λ0)X)^{-1} X^T(I − S_λ0)(I − S_λ)y,
  ĝ_0 = S_λ(y − Xβ̂_0) − (I − S_λ)S_λ0 Xβ̂_0.   (5)

However, the appropriateness of using the same smoothing parameter is doubtful, since the regressors may have different smoothness, not to mention that they may have quite different scales and variations among themselves. It is more natural to use a different smoothing parameter for each regressor. In this case our new estimate can be written as

  β̂_0 = (X̃^T(Ĩ − S̃)^T(I − S_λ)(Ĩ − S̃)X̃)^{-1} X̃^T(Ĩ − S̃)^T(I − S_λ)y,
  ĝ_0 = S_λ(y − Xβ̂_0) − (I − S_λ)S̃X̃β̂_0,   (6)

where X̃, Ĩ, and S̃ are block matrices to be defined in Section 2. We show that, under regularity conditions, the negative result reported in Rice (1986) disappears for these new estimates of β and g. More specifically, by choosing appropriate smoothing parameters, the convergence rate of β̂_0 reaches the parametric rate O(n^{-1/2}) while ĝ_0 keeps the same optimal convergence rate as that of the ordinary smoothing splines.

This estimate has another interpretation. Wellner (1986) gave two general schemes using scores for constructing asymptotically efficient estimates for semiparametric models. It can be shown that Wellner's first method leads to an estimate of the form (6). This gives the estimate (6) a flavor of efficiency. However, we must warn the reader that Wellner's approaches do not guarantee efficiency for every semiparametric model, and it is not our goal to discuss efficiency in this paper. It is interesting to note that if the second method in Wellner (1986) is used, we obtain
estimates for β and g identical to (3), the partial spline estimate. Also, a variation of Wellner's second approach leads to the partial regression estimate. It will be shown in Section 3.2 that β̂_1 in (4) achieves the parametric rate without undersmoothing the nonparametric component. Speckman (1988) gave this result in the context of kernel smoothing.

Buja, Hastie, and Tibshirani (1989) proposed another general scheme to construct estimates for additive models. Model (1) can be treated as a special case of additive models. In this approach, first the system of normal equations in the population version (in terms of random variables and conditional expectations) is derived and then converted into the data version (in terms of the realizations of the random variables and the smoother). The smoother in the sample version can be quite general, e.g., a smoothing spline smoother, a kernel smoother, a running mean smoother, etc. A backfitting algorithm is then used to solve the normal equations. This leads to the estimate in (3), the partial spline estimate, if the smoother S is the smoothing spline smoother. For the estimate (3), it is shown in Rice (1986) and in Theorem 1 of Speckman (1988) that it is not possible to estimate β at the parametric rate n^{-1/2} while the average mean square error of ĝ achieves the optimal rate of convergence, when we use either the smoothing spline smoother or the kernel smoother. On the other hand, Chen (1988) using the estimate (3), Speckman (1988) using the estimate (4), and this paper using the new estimate (5) or (6) show that, by choosing appropriate smoothing parameter(s), β can be estimated at the parametric rate n^{-1/2} while the average mean square error of ĝ achieves the optimal rate of convergence. However, we note that the regression spline smoother used in Chen (1988) is a projection, so that his estimate (3) is in fact a special case of (4) or (5). Since many general estimation schemes have been proposed recently, with the smoother up to the user's free choice, we think it is important to warn users that different smoothers can behave quite differently even within the same scheme.

The choice of the smoothing parameters is well known to be crucial to the solution. For the semiparametric model (1), one thought is to estimate both β and g well by a two-stage method: say, first undersmooth g to get a good estimate of β, and then smooth the data with the estimated parametric component taken off to get the 'right' smoothness for g.
Two essential questions arise: (1) How much undersmoothing is appropriate? (2) How can the undersmoothing be done automatically (i.e., data-driven)? The two-stage smoothing method we propose in this paper does not require undersmoothing. Furthermore, our results hold for a wide range of rates for the smoothing parameters. We conjecture that our results hold for the λ's chosen by the generalized cross-validation method proposed by Craven and Wahba (1979). Research on this aspect is underway.

The remainder of this paper is organized as follows. In Section 2, we introduce notation. In Section 3, under regularity conditions, we derive the convergence rate results and show that β can be estimated, via β̂_0 and β̂_1, at the parametric rate O(n^{-1/2}) without undersmoothing the nonparametric component. We also derive the asymptotic normality of β̂_0 and β̂_1.
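The GCV criterion of Craven and Wahba (1979), referred to several times above, selects λ by minimizing n^{-1}‖(I − S_λ)y‖² / (n^{-1} tr(I − S_λ))². A minimal grid-search sketch; the Whittaker-type smoother family below is an illustrative stand-in for the smoothing spline family, not the construction analyzed in this paper.

```python
import numpy as np

def gcv_score(y, S):
    """Craven-Wahba GCV score for the linear smoother y -> S y."""
    n = len(y)
    resid = y - S @ y
    return (resid @ resid / n) / (np.trace(np.eye(n) - S) / n) ** 2

def gcv_pick(y, lambdas, smoother):
    """Return the lambda in `lambdas` minimizing the GCV score."""
    return min(lambdas, key=lambda lam: gcv_score(y, smoother(lam)))

# Illustrative smoother family S_lambda = (I + lam*K)^{-1}, K a 2nd-difference penalty.
n = 100
t = np.linspace(0, 1, n)
D = np.diff(np.eye(n), 2, axis=0)
K = D.T @ D
smoother = lambda lam: np.linalg.inv(np.eye(n) + lam * K)

rng = np.random.default_rng(1)
y = np.sin(2 * np.pi * t) + rng.normal(scale=0.2, size=n)
lam_hat = gcv_pick(y, [10.0 ** p for p in range(-4, 5)], smoother)
```

GCV targets the predictive mean square error of the fitted function; as noted above, a λ chosen this way need not be good for estimating β in the partial spline scheme, which is the motivation for the two-stage method.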
2. Notation

Let Ĩ = [I, ..., I] be the n × nd matrix composed by putting d n × n identity matrices I in a row. Similarly, let S̃(λ_1, ..., λ_d) = [S_λ1, ..., S_λd], where S_λj is an n × n smoother matrix. Finally, let X̃ be the nd × d block diagonal matrix

  X̃ = diag(x_1, ..., x_d),   (7)

where x_j = (x_1jn, ..., x_njn)^T is the j-th column vector of X and each off-diagonal block is the n-vector 0_{n×1} = (0, ..., 0)^T. Throughout the rest of this paper, we will denote S̃(λ_1, ..., λ_d) by S̃ when there is no confusion. Simple algebra shows that X = ĨX̃, that the (i, j) element of X̃^T(Ĩ − S̃)^T(I − S_λ)(Ĩ − S̃)X̃ is x_i^T(I − S_λi)^T(I − S_λ)(I − S_λj)x_j, and that the i-th element of X̃^T(Ĩ − S̃)^T(I − S_λ)y is x_i^T(I − S_λi)^T(I − S_λ)y.

3. Convergence rates
Asymptotic analysis parallel to that in Rice (1986) is used to study the behavior of our new estimate (6) and the partial regression estimate (4). It is well known that the solution to (2) when β = 0 lies in the space of natural splines of order m on [0,1]. According to Demmler and Reinsch (1975), a basis for this space is {Φ_kn(t)}_{1≤k≤n} with the biorthogonality property

  n^{-1} Σ_{i=1}^n Φ_kn(t_in) Φ_ln(t_in) = δ_kl.

Here {λ_kn} is a nondecreasing sequence of nonnegative numbers, and the eigenvalues of S_λ are (1 + λλ_kn)^{-1}. Set

  r_kjn = n^{-1/2} Σ_{i=1}^n z_ijn Φ_kn(t_in)  and  h_kjn = n^{-1/2} Σ_{i=1}^n h_j(t_in) Φ_kn(t_in).

Note that x_ijn = h_j(t_in) + z_ijn. For j = 1, ..., d, we assume:

A1. …
A2. n^{-1} Σ_{i=1}^n z_in z_in^T → Σ as n → ∞; Σ is positive definite.
A3. sup_{1≤k≤n} |r_kjn| = O(log n).
A4. The points t_in are regular in the sense that (2i − 1)/2n = ∫_0^{t_in} p(t) dt for some continuous density function p(t) on [0,1].
A5. m ≥ 2.

Set h_j = (h_j(t_1n), ..., h_j(t_nn))^T. Let B²_gλ = n^{-1} g^T(I − S_λ)²g, B²_jλj = n^{-1} h_j^T(I − S_λj)²h_j, and B²_jλ = n^{-1} h_j^T(I − S_λ)²h_j. Note that B²_gλ is the averaged squared bias of the ordinary smoothing spline estimate of g when β = 0. A similar interpretation applies to B²_jλj and B²_jλ. The following lemma is due to Speckman (1981).

Lemma 1. Under A4 and A5,
(a) λ_kn = ck^{2m}(1 + o(1)), where c is a constant depending on p(·) and o(1) denotes a term tending to zero as n → ∞, uniformly for k_0 ≤ k ≤ k_n, for any sequence k_0 → ∞ and k_n = o(n^{2/(2m+1)});
(b) B²_gλ = O(λ) if g ∈ W_2^m.
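Lemma 1(a) and the eigenvalue form (1 + λλ_kn)^{-1} can be checked numerically on a discrete analogue. The sketch below uses a Whittaker-type smoother S_λ = (I + λK)^{-1}, with K a second-difference roughness penalty, as a stand-in for the smoothing spline smoother (an assumption made for illustration, not the construction used in the paper):

```python
import numpy as np

n, lam = 60, 0.1
D = np.diff(np.eye(n), 2, axis=0)             # second-difference operator, (n-2) x n
K = D.T @ D                                   # discrete roughness penalty (m = 2 analogue)
S = np.linalg.inv(np.eye(n) + lam * K)        # Whittaker smoother S_lambda

rho = np.sort(np.linalg.eigvalsh(K))          # nondecreasing penalty eigenvalues (lambda_kn analogue)
eigS = np.sort(np.linalg.eigvalsh(S))[::-1]   # eigenvalues of S_lambda, largest first
# Spectra match the Demmler-Reinsch form: eigS[k] = 1 / (1 + lam * rho[k]).
```

For this second-difference penalty the low-frequency eigenvalues grow like k⁴, mirroring the k^{2m} growth of Lemma 1(a) with m = 2.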
Thus Lemma 1(b) also implies that B²_jλj = O(λ_j) and B²_jλ = O(λ) if h_j ∈ W_2^m. Throughout the rest of this paper, we will write a(n) ≍ b(n) when a(n)/b(n) is bounded away from zero and infinity.

Lemma 2. Assume that A4 and A5 hold and λ ≍ n^{-δ} for 0 < δ < 1. Then

  Σ_k (1 + λ_kn λ)^{-1} = O(λ^{-1/2m}) + O(n^{-1/2}λ^{-1}) = O(n^{1/2−ξ})

for some positive constant ξ depending on δ only. Also, Σ_k (1 + λ_kn λ)^{-2} = O(λ^{-1/2m}).

Proof. The second equality is given in Speckman (1981). To show the first equality, we mimic the argument used in Speckman (1981) to split the range of summation into [1, λ^{-1/2m}], [λ^{-1/2m}, n^{3/4m}], and [n^{3/4m}, n]. The sum over the first range is bounded by λ^{-1/2m}. Over the second range, by Lemma 1(a), we can approximate λ_kn by ck^{2m} and approximate the sum by an integral, which gives λ^{-1/2m} as a bound. Since λ_kn is nondecreasing, each summand over the third range is bounded by the first term, so the third sum is bounded by O(n · n^{-3/2}λ^{-1}) = O(n^{-1/2}λ^{-1}). This completes the proof. □

To study the asymptotic behavior of the two estimates (4) and (6), we first obtain convergence rates for some terms which will be used later in the proofs of Theorems 1 through 6. The proof of Lemma 3 is given in the Appendix.

Lemma 3. Assume that A1-A5 hold, λ ≍ n^{-δ} and λ_i ≍ n^{-δ_i} for 0 < δ_i, δ < 1, and that g, h_i ∈ W_2^m for i = 1, ..., d. Then for 1 ≤ i, j ≤ d,
(a) n^{-1} x_i^T(I − S_λi)(I − S_λ)(I − S_λj)x_j = σ_ij + o(1),
(b) n^{-1} x_i^T(I − S_λi)(I − S_λ)²(I − S_λj)x_j = σ_ij + o(1),
(c) n^{-1} x_i^T(I − S_λi)(I − S_λ)g = o(n^{-1/2}) + O((λ_iλ)^{1/2}),
(d) n^{-1} x_i^T(I − S_λi)(I − S_λ)S_λj x_j = o(n^{-1/2}) + O((λ_iλ)^{1/2}),
(e) x_i^T S_λi(I − S_λ)²S_λj x_j = O(nλ) + O((λ_iλ_j)^{-1/4m} log² n),
(f) tr S_λ² = O(λ^{-1/2m}),
(g) x_i^T[S_λ + (I − S_λ)S_λi]^T[S_λ + (I − S_λ)S_λj]x_j = O(n),
(h) n^{-1} x_i^T(I − S_λ)²x_j = σ_ij + o(1),
(i) n^{-1} x_i^T S_λ² x_j = O(1).

We are now ready to show the main results.

3.1. Two-stage smoothing spline estimate

For the parametric component β̂_0, we have
Theorem 1. Suppose that the same assumptions as in Lemma 3 hold. Then
(a) E(β̂_0) = β + o(n^{-1/2}) + O((max_j λ_jλ)^{1/2}),
(b) n Var(β̂_0) → σ²Σ^{-1} as n → ∞.

Proof. We show (b) first. By (6), we have

  n Var(β̂_0) = σ² A_1^{-1} A_2 A_1^{-T},

where

  A_1 = n^{-1} X̃^T(Ĩ − S̃)^T(I − S_λ)(Ĩ − S̃)X̃  and  A_2 = n^{-1} X̃^T(Ĩ − S̃)^T(I − S_λ)²(Ĩ − S̃)X̃.

Note that both A_1 and A_2 converge to Σ as n → ∞ by Lemma 3(a) and 3(b). This proves (b). To show (a), observe that

  E(β̂_0) − β = A_1^{-1}[n^{-1} X̃^T(Ĩ − S̃)^T(I − S_λ)S̃X̃β] + A_1^{-1}[n^{-1} X̃^T(Ĩ − S̃)^T(I − S_λ)g].

Thus (a) is obtained by Lemma 3(c) and 3(d). □
Next, we study the asymptotic behavior of ĝ_0. Define the average squared bias of ĝ_0 at the data points, B²(λ, λ̃) = n^{-1} Σ_{i=1}^n (E ĝ_0(t_in) − g(t_in))², and the average variance of ĝ_0, V(λ, λ̃) = n^{-1} Σ_{i=1}^n Var(ĝ_0(t_in)), where λ̃ = (λ_1, ..., λ_d). Convergence rates of B²(λ, λ̃) and V(λ, λ̃) are given in the following theorem.

Theorem 2. Suppose the same assumptions as in Lemma 3 hold. Then
(a) B²(λ, λ̃) = O(λ) + O(n^{-1}(max_i λ_i)^{-1/2m} log² n) + O((max_i λ_i)λ),
(b) V(λ, λ̃) = O(n^{-1}λ^{-1/2m}).

Proof. Recall that ĝ_0 = S_λy − [S_λĨ + (I − S_λ)S̃]X̃β̂_0. We have

  E ĝ_0 − g = −(I − S_λ)g − (I − S_λ)S̃X̃β − [S_λĨ + (I − S_λ)S̃]X̃(E β̂_0 − β).

Then

  (nB²(λ, λ̃))^{1/2} = ‖E ĝ_0 − g‖ ≤ ‖(I − S_λ)g‖ + ‖(I − S_λ)S̃X̃β‖ + ‖[S_λĨ + (I − S_λ)S̃]X̃(E β̂_0 − β)‖.   (8)

Note that ‖(I − S_λ)g‖² = g^T(I − S_λ)²g = nB²_gλ = O(nλ) by Lemma 1(b), and ‖(I − S_λ)S̃X̃β‖² = O(nλ) + O((max_i λ_i)^{-1/2m} log² n) by Lemma 3(e). The square of the third term of (8) equals

  [E(β̂_0) − β]^T X̃^T[S_λĨ + (I − S_λ)S̃]^T[S_λĨ + (I − S_λ)S̃]X̃[E(β̂_0) − β] = [o(n^{-1}) + O(max_i λ_iλ)] O(n) = o(1) + O(n max_i λ_iλ)

by Theorem 1(a) and Lemma 3(g). Putting these three terms together, we have nB²(λ, λ̃) = O(nλ) + O((max_i λ_i)^{-1/2m} log² n) + O(n max_i λ_iλ); hence (a) holds. To show (b), we note that

  nV(λ, λ̃) = E(‖ĝ_0 − E ĝ_0‖²) = E(‖S_λe − [S_λĨ + (I − S_λ)S̃]X̃(β̂_0 − E β̂_0)‖²)
    ≤ 2E(‖S_λe‖² + ‖[S_λĨ + (I − S_λ)S̃]X̃(β̂_0 − E β̂_0)‖²)
    = 2σ² tr S_λ² + 2 tr(X̃^T[S_λĨ + (I − S_λ)S̃]^T[S_λĨ + (I − S_λ)S̃]X̃ Var(β̂_0)),

where e = (e_1n, ..., e_nn)^T. By Lemma 3(f), 3(g) and Theorem 1(b), we have V(λ, λ̃) = O(n^{-1}λ^{-1/2m}). □

Define the average mean square error of ĝ_0(t) as

  AMSE(λ, λ̃) = n^{-1} Σ_{i=1}^n MSE(ĝ_0(t_in)) = B²(λ, λ̃) + V(λ, λ̃).

Let λ*, λ̃* be the 'optimal' λ, λ̃ which 'minimize' AMSE(λ, λ̃) in the sense of convergence rates. In fact, we equate the convergence rates in B²(λ, λ̃) and in V(λ, λ̃) to get:

Corollary 1. λ_j* = O(n^{-2m/(2m+1)}(log n)^{4m}) for j = 1, ..., d, λ* = O(n^{-2m/(2m+1)}), and AMSE(λ*, λ̃*) = O(n^{-2m/(2m+1)}), where λ̃* = (λ_1*, ..., λ_d*).

Remark 1. By Theorem 1(a), we know that E β̂_0 = β + o(n^{-1/2}) if max_i λ_iλ = O(n^{-1}). Suppose λ can be chosen so that it achieves the optimal rate O(n^{-2m/(2m+1)}). Then we only need max_i λ_i = O(n^{-a_0}) with 1 > a_0 > 1/(2m+1). However, by Theorem 2, for AMSE(λ, λ̃) to achieve the optimal rate, we need max_i λ_i = O(n^{-2m/(2m+1)}(log n)^{4m}), which goes to zero faster than O(n^{-2m/(2m+1)+ε}) for any positive constant ε. There exists ε such that 2m/(2m+1) − ε > 1/(2m+1), say, ε = 1/(2m+1). Thus we have shown that, by choosing appropriate rates for λ and the λ_j, we can estimate β at the parametric rate n^{-1/2} while the average mean square error of ĝ_0 achieves the optimal rate of convergence at the 'about right' degree of smoothness.

By applying the Markov inequality to n^{-1} Σ_{i=1}^n (ĝ_0(t_in) − g(t_in))², Theorem 2 immediately gives us
Theorem 3. Under the same conditions as in Lemma 3,

  n^{-1} Σ_{i=1}^n (ĝ_0(t_in) − g(t_in))² = O_p(λ + n^{-1}λ^{-1/2m} + n^{-1}(max_i λ_i)^{-1/2m}(log n)²).
When λ = λ* and λ_j = λ_j*, n^{-1} Σ_{i=1}^n (ĝ_0(t_in) − g(t_in))² = O_p(n^{-2m/(2m+1)}).

Next, we will show that √n(β̂_0 − β) converges in distribution to N(0, σ²Σ^{-1}) under the following additional conditions.

A6. z_in is a random vector with mean zero, covariance matrix Σ, and finite absolute third moment.
A7. The {e_in} are independent random variables with uniformly bounded absolute third moments, and e_in is independent of z_in.

Theorem 4. Under all the conditions of Lemma 3 and A6 and A7, √n(β̂_0 − β) converges in distribution to N(0, σ²Σ^{-1}) if max_i λ_iλ = o(n^{-1}).
Proof. Since β̂_0 − β = A_1^{-1}[n^{-1} X̃^T(Ĩ − S̃)^T(I − S_λ)(S̃X̃β + g + e)], by Lemma 3(c) and 3(d) it remains to show that n^{-1/2} X̃^T(Ĩ − S̃)^T(I − S_λ)e converges in distribution to N(0, σ²Σ). Define H̃, Z̃ to be the nd × d matrices of the form (7) with x_j replaced by h_j and z_j, respectively. Write

  X̃^T(Ĩ − S̃)^T(I − S_λ)e = H̃^T(Ĩ − S̃)^T(I − S_λ)e + Z̃^T[(Ĩ − S̃)^T(I − S_λ) − Ĩ^T]e + Z̃^Te.

Since E h_j^T(I − S_λj)(I − S_λ)e = 0 and

  Var(n^{-1/2} h_j^T(I − S_λj)(I − S_λ)e) = n^{-1}σ² h_j^T(I − S_λj)(I − S_λ)²(I − S_λj)h_j ≤ σ² B²_jλj = o(1),

we have

  n^{-1/2} H̃^T(Ĩ − S̃)^T(I − S_λ)e = o_p(1).   (9)

Let v be any unit d-vector and a = Z̃v. Then v^T Z̃^Te = Σ_{i=1}^n a_i e_in, where a_i is the i-th element of a. By A6, the |a_i| are i.i.d. random variables with finite third moments. Observe that

  n^{-3/2} Σ_{i=1}^n E|a_i e_in|³ ≤ n^{-3/2} Σ_{i=1}^n E|a_i|³ · sup_i E|e_in|³ = o(1),

which means v^T Z̃^Te satisfies a Lindeberg condition of order 3. By A6 and the law of large numbers, n^{-1} Z̃^T Z̃ → Σ. Then by Theorem 9.1 of Chow and Teicher (1978),

  n^{-1/2} Z̃^Te → N(0, σ²Σ) in distribution.   (10)

By (9) and (10), Theorem 4 holds if we can show that, for all i,

  n^{-1/2} z_i^T[(I − S_λi)(I − S_λ) − I]e → 0 in probability.   (11)

Observe that E z_i^T[(I − S_λi)(I − S_λ) − I]e = 0 and

  Var(z_i^T[(I − S_λi)(I − S_λ) − I]e)
    = E Var{z_i^T[(I − S_λi)(I − S_λ) − I]e | z_i} + Var(E{z_i^T[(I − S_λi)(I − S_λ) − I]e | z_i})
    = σ² E(z_i^T[(I − S_λi)(I − S_λ) − I]²z_i) = σ² σ_ii tr[(I − S_λi)(I − S_λ) − I]²
    ≤ M tr(S_λ² + S_λi²) = O(λ^{-1/2m} + λ_i^{-1/2m}) = o(n),

where M is a constant. Again by the Markov inequality, (11) holds. □
Remark 2. Based on Theorem 4, the asymptotic variance of β̂_0 is n^{-1}σ²Σ^{-1}. When e_in ~ N(0, σ²), it can be shown that σ²Σ^{-1} is the smallest possible achievable asymptotic variance among those estimators of β which utilize only the information g ∈ W_2^m. See Remark 1 in Chen (1988) for further detail.

3.2. Partial regression estimate

In this section, we study the asymptotic behavior of the partial regression estimate defined in (4) and summarize it in the following theorems.

Theorem 5. Suppose the same assumptions as in Lemma 3 hold. Then

  n^{-1} Σ_{i=1}^n (ĝ_1(t_in) − g(t_in))² = O_p(n^{-2m/(2m+1)})

when λ ≍ n^{-2m/(2m+1)}.
Proof. Define B²(λ) = n^{-1} Σ_{i=1}^n (E ĝ_1(t_in) − g(t_in))² and V(λ) = n^{-1} Σ_{i=1}^n Var(ĝ_1(t_in)). Observe that

  ĝ_1 = S_λ[I − X(X^T(I − S_λ)²X)^{-1}X^T(I − S_λ)²](g + e).

We have

  E ĝ_1 − g = −(I − S_λ)g − S_λX(n^{-1}X^T(I − S_λ)²X)^{-1} n^{-1}X^T(I − S_λ)²g

and

  nV(λ) = σ²[tr S_λ² − 2 tr(X^T(I − S_λ)²X)^{-1}X^T(I − S_λ)²S_λ²X
    + tr(X^T(I − S_λ)²X)^{-1}X^TS_λ²X(X^T(I − S_λ)²X)^{-1}X^T(I − S_λ)⁴X].

Note that ‖(I − S_λ)g‖² = O(nλ) by Lemma 1(b), and

  ‖S_λX(n^{-1}X^T(I − S_λ)²X)^{-1}n^{-1}X^T(I − S_λ)²g‖²
    = n[n^{-1}X^T(I − S_λ)²g]^T(n^{-1}X^T(I − S_λ)²X)^{-1}(n^{-1}X^TS_λ²X)(n^{-1}X^T(I − S_λ)²X)^{-1}[n^{-1}X^T(I − S_λ)²g].

Then, by the proof of Lemma 3(c), |n^{-1}X^T(I − S_λ)²g| is bounded above by

  O(log n)(O(n^{-1}λ^{1−1/2m}) + O(n^{-3/2}))^{1/2} + O(λ).

Hence B²(λ) = O(λ) by the above arguments and Lemma 3(h) and (i). By Lemma 3(f), (h) and (i), we have nV(λ) = O(λ^{-1/2m}). Hence,

  n^{-1} Σ_{i=1}^n E(ĝ_1(t_in) − g(t_in))² = B²(λ) + V(λ) = O(λ + n^{-1}λ^{-1/2m}).

By applying the Markov inequality to n^{-1} Σ_{i=1}^n (ĝ_1(t_in) − g(t_in))², Theorem 5 follows easily. □
Theorem 6. Under all the conditions of Lemma 3 and A6 and A7, √n(β̂_1 − β) converges in distribution to N(0, σ²Σ^{-1}) if λ² = o(n^{-1}).

Proof. Observe that

  β̂_1 − β = (X^T(I − S_λ)²X)^{-1}X^T(I − S_λ)²g + (X^T(I − S_λ)²X)^{-1}X^T(I − S_λ)²e.

Write

  n^{-1/2}X^T(I − S_λ)²e = n^{-1/2}H^T(I − S_λ)²e + n^{-1/2}Z^T[(I − S_λ)² − I]e + n^{-1/2}Z^Te.

By (9), (10), and (11), n^{-1/2}X^T(I − S_λ)²e converges to N(0, σ²Σ) in distribution. By Lemma 3(c),

  n^{-1/2}X^T(I − S_λ)²g = o_p(1) + O_p(n^{1/2}λ).

Combining the above results and Lemma 3(h), Theorem 6 holds. □
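The contrast established in Sections 3.1 and 3.2 can be eyeballed in a small simulation. The sketch below (all design choices illustrative: Whittaker-type stand-in smoothers, a particular h, g, β, and σ; none taken from the paper) computes the partial spline estimate (3) and the Stage 1/Stage 2 estimate on Rice-type data:

```python
import numpy as np

def whittaker(n, lam):
    """Stand-in spline-type smoother (I + lam*K)^{-1}, K a 2nd-difference penalty."""
    D = np.diff(np.eye(n), 2, axis=0)
    return np.linalg.inv(np.eye(n) + lam * D.T @ D)

rng = np.random.default_rng(2)
n = 150
t = (2 * np.arange(1, n + 1) - 1) / (2 * n)
x = np.cos(np.pi * t) + rng.normal(scale=0.5, size=n)   # x = h(t) + z
y = 1.5 * x + np.sin(2 * np.pi * t) + rng.normal(scale=0.3, size=n)

S = whittaker(n, 1.0)                    # smoother S_lambda for the partial spline step
S0 = whittaker(n, 1.0)                   # Stage-1 smoother for the regressor
M = np.eye(n) - S

X = x[:, None]
beta_ps = np.linalg.solve(X.T @ M @ X, X.T @ M @ y)     # partial spline estimate (3)

r = (np.eye(n) - S0) @ x                                # Stage 1 residual
R = r[:, None]
beta_ts = np.linalg.solve(R.T @ M @ R, R.T @ M @ y)     # two-stage estimate
```

Repeating this over many replications and a grid of λ's lets one compare the bias of the two estimates: Rice's point is that the former needs undersmoothing to be nearly unbiased, while the results above say the latter does not.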
Appendix. Proof of Lemma 3
We first show (a). Note that

  n^{-1} x_i^T(I − S_λi)(I − S_λ)(I − S_λj)x_j
    = n^{-1} Σ_k (h_kin + r_kin)(h_kjn + r_kjn)(λ_knλ_i/(1 + λ_knλ_i))(λ_knλ/(1 + λ_knλ))(λ_knλ_j/(1 + λ_knλ_j)).

Recall that B²_jλj = n^{-1} Σ_k h²_kjn (λ_knλ_j/(1 + λ_knλ_j))² = O(λ_j) for h_j ∈ W_2^m. By λ_knλ/(1 + λ_knλ) ≤ 1, we have

  n^{-1} Σ_k |h_kin h_kjn| (λ_knλ_i/(1 + λ_knλ_i))(λ_knλ/(1 + λ_knλ))(λ_knλ_j/(1 + λ_knλ_j)) ≤ B_iλi B_jλj = O((λ_iλ_j)^{1/2}) = o(1).

Next, observe that

  1 − (λ_knλ_i/(1 + λ_knλ_i))(λ_knλ/(1 + λ_knλ))(λ_knλ_j/(1 + λ_knλ_j))
    = 1/(1 + λ_knλ_i) + (λ_knλ_i/(1 + λ_knλ_i)) · 1/(1 + λ_knλ)
      + (λ_knλ_i/(1 + λ_knλ_i))(λ_knλ/(1 + λ_knλ)) · 1/(1 + λ_knλ_j).   (A.1)

By A3 and Lemma 2,

  Σ_k |r_kin r_kjn| (1 + λ_knλ)^{-1} = O((log n)²) O(n^{1/2−ξ}) = o(n),   (A.2)

and (A.2) holds with λ replaced by λ_i or λ_j. Thus the sum of |r_kin r_kjn| weighted by the right-hand side of (A.1) is o(n). Then by A2, we have

  n^{-1} Σ_k r_kin r_kjn (λ_knλ_i/(1 + λ_knλ_i))(λ_knλ/(1 + λ_knλ))(λ_knλ_j/(1 + λ_knλ_j)) = σ_ij + o(1).   (A.3)

By the Cauchy-Schwarz inequality, the cross product term satisfies

  n^{-1} Σ_k |r_kin h_kjn| (λ_knλ_i/(1 + λ_knλ_i))(λ_knλ/(1 + λ_knλ))(λ_knλ_j/(1 + λ_knλ_j)) = (O(λ_j)σ_ii)^{1/2} = o(1).

Putting the pieces together, we have shown (a). Note that the cross product terms never dominate the rate, because of the Cauchy-Schwarz inequality; therefore there is no need to obtain convergence rates for them in the subsequent equalities (b)-(i). Using the same argument, the proof of (b) is straightforward.

Set c_kn = n^{-1/2} Σ_{i=1}^n g(t_in)Φ_kn(t_in). To show (c), we note that

  n^{-1} x_i^T(I − S_λi)(I − S_λ)g = n^{-1} Σ_k (h_kin + r_kin) c_kn (λ_knλ_i/(1 + λ_knλ_i))(λ_knλ/(1 + λ_knλ)).

Observe that

  n^{-1} |Σ_k r_kin c_kn (λ_knλ/(1 + λ_knλ))| ≤ n^{-1} (sup_k |r_kin|)(Σ_k c²_kn λ_kn)^{1/2}(Σ_k λ²λ_kn/(1 + λ_knλ)²)^{1/2}.   (A.4)

Note that n^{-1} Σ_k c²_kn λ_kn < ∞ for g ∈ W_2^m, and that λ_knλ(1 + λ_knλ)^{-2} ≤ 1/(1 + λ_knλ). Then by Lemma 2 and A3, the right-hand side of (A.4) is bounded above by O(log n)(O(n^{-1}λ^{1−1/2m}) + O(n^{-3/2}))^{1/2} = o(n^{-1/2}). Also, by the Cauchy-Schwarz inequality, we have

  n^{-1} Σ_k |h_kin c_kn| (λ_knλ_i/(1 + λ_knλ_i))(λ_knλ/(1 + λ_knλ)) ≤ B_iλi (n^{-1} Σ_k c²_kn (λ_knλ/(1 + λ_knλ))²)^{1/2},

which is of the order O((λ_iλ)^{1/2}) by Lemma 1(b). Thus (c) is proved.

Next, we show (d). Note that

  n^{-1} x_i^T(I − S_λi)(I − S_λ)S_λj x_j
    = n^{-1} Σ_k (h_kin + r_kin)(h_kjn + r_kjn)(λ_knλ_i/(1 + λ_knλ_i))(λ_knλ/(1 + λ_knλ))(1 + λ_knλ_j)^{-1}.

By A3, Lemma 2, and m ≥ 2, it follows that

  n^{-1} Σ_k |r_kin r_kjn| (λ_knλ/(1 + λ_knλ))(1 + λ_knλ_j)^{-1} = O(n^{-1}(log n)²(λ_j^{-1/2m} + n^{-1/2}λ_j^{-1})) = o(n^{-1/2}).

Recall that

  n^{-1} Σ_k |h_kin h_kjn| (λ_knλ_i/(1 + λ_knλ_i))(λ_knλ/(1 + λ_knλ)) ≤ B_iλi B_jλ = O((λ_iλ)^{1/2}).

Putting these terms together, we have shown (d).

We show (e) by observing the following inequalities:

  Σ_k |r_kin r_kjn| (1 + λ_knλ_i)^{-1}(1 + λ_knλ_j)^{-1}
    ≤ O(log² n)(Σ_k (1 + λ_knλ_i)^{-2})^{1/2}(Σ_k (1 + λ_knλ_j)^{-2})^{1/2} = O((λ_iλ_j)^{-1/4m} log² n)

and

  Σ_k |h_kin h_kjn| (λ_knλ/(1 + λ_knλ))² ≤ (Σ_k h²_kin (λ_knλ/(1 + λ_knλ))²)^{1/2}(Σ_k h²_kjn (λ_knλ/(1 + λ_knλ))²)^{1/2} = n B_iλ B_jλ = O(nλ).

Thus x_i^T S_λi(I − S_λ)²S_λj x_j = O(nλ) + O((λ_iλ_j)^{-1/4m} log² n).

(f) is a direct consequence of Lemma 2, since tr S_λ² = Σ_k (1 + λ_knλ)^{-2}.

To show (g), we first show that (i) holds. This follows by noting

  Σ_k |r_kin r_kjn| (1 + λ_knλ)^{-2} = O(λ^{-1/2m} log² n)

and

  Σ_k |h_kin h_kjn| (1 + λ_knλ)^{-2} ≤ Σ_k |h_kin h_kjn| ≤ (Σ_k h²_kin)^{1/2}(Σ_k h²_kjn)^{1/2} = O(n).

The last equality holds since h_i(t) and h_j(t) are continuous functions on a compact domain and hence bounded. Hence n^{-1} x_i^T S_λ² x_j = O(1). Next, we have

  x_i^T S_λi(I − S_λ)S_λ x_j = O(λ^{-1/2m} log² n + n^{-1/2}λ^{-1} log² n + nλ^{1/2}) = o(n),

since the r·r part is O(log² n)(O(λ^{-1/2m}) + O(n^{-1/2}λ^{-1})) by A3 and Lemma 2, and the h·h part is O(nλ^{1/2}) by the Cauchy-Schwarz inequality. Combining these results and (e), we have (g).

To show (h), observe that

  n^{-1} x_i^T(I − S_λ)²x_j = n^{-1} Σ_k (h_kin + r_kin)(h_kjn + r_kjn)(λ_knλ/(1 + λ_knλ))².

We have

  n^{-1} Σ_k |h_kin h_kjn| (λ_knλ/(1 + λ_knλ))² ≤ B_iλ B_jλ = O(λ) = o(1).

By (A.2), n^{-1} Σ_k r_kin r_kjn (λ_knλ/(1 + λ_knλ))² = n^{-1} Σ_k r_kin r_kjn + o(1). Hence (h) holds by A2. □
Acknowledgements

We thank the referee for comments which helped improve the presentation of the paper.
References

Buja, A., T. Hastie and R. Tibshirani (1989). Linear smoothers and additive models. Ann. Statist. 17, 453-555.
Chen, H. (1988). Convergence rates for parametric components in a partly linear model. Ann. Statist. 16, 136-146.
Chow, Y.S. and H. Teicher (1978). Probability Theory. Springer-Verlag, New York.
Craven, P. and G. Wahba (1979). Smoothing noisy data with spline functions. Numer. Math. 31, 377-403.
Demmler, A. and C. Reinsch (1975). Oscillation matrices with spline smoothing. Numer. Math. 24, 375-382.
Denby, L. (1986). Smooth regression function. Statistical Research Report #26, AT&T Bell Laboratories, Princeton, NJ.
Engle, R.F., C.W. Granger, J. Rice and A. Weiss (1986). Semiparametric estimates of the relation between weather and electricity sales. J. Amer. Statist. Assoc. 81, 310-320.
Eubank, R.L. and P. Speckman (1989). Discussion of "Linear smoothers and additive models", by A. Buja, T. Hastie and R. Tibshirani. Ann. Statist. 17, 525-529.
Eubank, R.L. and P. Whitney (1989). Convergence rates for estimation in certain partially linear models. J. Statist. Plann. Inference 23, 33-43.
Heckman, N.E. (1986). Spline smoothing in partly linear models. J. Roy. Statist. Soc. Ser. B 48, 244-248.
Rice, J. (1986). Convergence rates for partially splined models. Statist. Probab. Lett. 4, 203-208.
Shiau, J. (1985). Smoothing spline estimation of functions with discontinuities. Tech. Rep. #768, Dept. of Statistics, Univ. of Wisconsin-Madison.
Shiau, J., G. Wahba and D.R. Johnson (1986). Partial spline models for the inclusion of tropopause and frontal boundary information in otherwise smooth two and three dimensional objective analysis. J. Atmos. Ocean. Technol. 3, 713-725.
Shiau, J. and G. Wahba (1988). Rates of convergence of some estimators for a semiparametric model. Comm. Statist. 17 (4), 1117-1133.
Speckman, P. (1981). The asymptotic integrated mean square error for smoothing noisy data by splines. Manuscript.
Speckman, P. (1988). Kernel smoothing in partial linear models. J. Roy. Statist. Soc. Ser. B 50, 413-436.
Wahba, G. (1984). Partial spline models for the semiparametric estimation of functions of several variables. In: Statistical Analysis of Time Series, 312-329. Institute of Statistical Mathematics, Tokyo.
Wahba, G. (1986). Partial and interaction splines for the semiparametric estimation of functions of several variables. In: T.J. Boardman, Ed., Computer Science and Statistics: Proceedings of the 18th Symposium on the Interface. American Statistical Association, Washington, DC, 75-80.
Wellner, J. (1986). Semiparametric models: progress and problems. In: R.D. Gill and M.N. Voors, Eds., ISI Centenary Session. Center for Mathematics and Computer Science, Amsterdam.