
Nonparametric estimation of bivariate additive models

Young Kyung Lee
Kangwon National University, Republic of Korea
E-mail address: [email protected]

Article history: Received 14 November 2016; Accepted 28 November 2016.

Abstract. In this paper we discuss the estimation of a bivariate additive model in which the multivariate regression function is expressed as a sum of unknown univariate and bivariate component functions. We discuss the identifiability of the component functions and show that each component function of the model can be estimated at the optimal rate of bivariate kernel smoothing.

© 2016 The Korean Statistical Society. Published by Elsevier B.V. All rights reserved.

AMS 2000 subject classifications: primary 62G07; secondary 62G20.
Keywords: Additive models; Smooth backfitting; Kernel smoothing

1. Introduction

Additive modeling is known to be an effective way of avoiding the curse of dimensionality in nonparametric multivariate regression. It arises in the estimation of various nonparametric regression models, such as those with time series errors or with repeated measurements. Nonparametric regression modeling with individual effects for panel data also reduces to an additive model. We refer to Mammen, Shienle, and Park (2014) for an illustration of its wide applicability in nonparametric estimation. Two earlier methods of estimating additive models are the ordinary backfitting (Opsomer & Ruppert, 1997) and the marginal integration technique (Linton & Nielsen, 1995). The ordinary backfitting method requires strong conditions on the joint density of the covariates for the convergence of the corresponding backfitting algorithm, and thus for the existence of the estimators, while marginal integration is not completely free from the curse of dimensionality; see Lee (2004), for example. The most successful technique for fitting additive models is the smooth backfitting proposed and studied by Mammen, Linton, and Nielsen (1999). It was shown that the method yields an estimator that has the optimal univariate rate regardless of the dimension of the covariate. The method and the theory of smooth backfitting have been extended to more complicated models; see Lee, Mammen, and Park (2010, 2012) and Yu, Park, and Mammen (2008), among others.

In this paper, we extend the smooth backfitting technique and its theory to bivariate additive models. Specifically, we study the estimation of the model

E(Y \mid \mathbf{X}) = \sum_{j=1}^{d} f_j(X_j) + \mathop{\sum\sum}_{1 \le j < j' \le d} f_{jj'}(X_j, X_{j'}),   (1.1)
where Y is a response variable and X is a d-variate covariate. There have been several studies of the above model. For example, Stone (1994) and Stone, Hansen, Kooperberg, and Truong (1997) established some results for methods based on polynomial splines and their tensor products, and Huang (1998) extended them to the sieve estimation technique.


However, there has been no attempt to develop a smooth backfitting technique and its theory for fitting the model (1.1). We show that the smooth backfitting estimators of the components f_j and f_{jj'} have the optimal bivariate rate O_p(n^{-1/3}). It is worthwhile to mention here that one cannot estimate the univariate components f_j at a better rate than the bivariate n^{-1/3}. This is because one can estimate each component function at best at the same rate as one can estimate the whole regression function f(x) = \sum_{j=1}^{d} f_j(x_j) + \sum\sum_{j<j'} f_{jj'}(x_j, x_{j'}).

2. Identifiability of the component functions

The component functions in the model (1.1) are not identifiable without further constraints. With weight functions w_k on [0,1] as introduced in the condition (A) below, we put the constraints

\int_0^1 f_k(u) w_k(u)\,du = 0, \qquad 1 \le k \le d,

\int_0^1 f_{kk'}(u, v) w_k(u)\,du \equiv 0 \quad \text{for all } v, \qquad 1 \le k < k' \le d,   (2.1)

\int_0^1 f_{kk'}(u, v) w_{k'}(v)\,dv \equiv 0 \quad \text{for all } u, \qquad 1 \le k < k' \le d.

With these constraints we may express the model (1.1) as follows, introducing a constant f_0:

E(Y \mid \mathbf{X}) = f_0 + \sum_{k=1}^{d} f_k(X_k) + \mathop{\sum\sum}_{1 \le k < k' \le d} f_{kk'}(X_k, X_{k'}).   (2.2)
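For instance, when d = 2 and the weights are uniform, w_1 = w_2 \equiv 1, the constrained model reads

E(Y \mid \mathbf{X}) = f_0 + f_1(X_1) + f_2(X_2) + f_{12}(X_1, X_2),

with \int_0^1 f_1(u)\,du = \int_0^1 f_2(u)\,du = 0, \int_0^1 f_{12}(u, v)\,du = 0 for all v, and \int_0^1 f_{12}(u, v)\,dv = 0 for all u; this special case is spelled out only to illustrate how the constraints pin down the decomposition.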

Below in Theorem 2.1 we demonstrate that all the component functions f_k and f_{kk'} are identifiable under the constraints (2.1). For this we assume:

(A) There exist constants 0 < c < C < \infty and nonnegative weight functions w_k, 1 \le k \le d, with \int_0^1 w_k(x_k)\,dx_k = 1, such that

c \cdot p(x) \le \prod_{j=1}^{d} w_j(x_j) \le C \cdot p(x) \quad \text{for all } x \in [0,1]^d,

where p denotes the joint density function of X. The condition (A) is satisfied if p is bounded away from zero and infinity on [0,1]^d, which is typically assumed in nonparametric regression problems. The following theorem tells not only that the f_k and f_{kk'} are identifiable but also that one can estimate each component function with the same accuracy as one estimates the whole regression function.

Theorem 2.1. Suppose that g_0 \in \mathbb{R} is a constant and that the tuple of functions (g_k, g_{kk'} : 1 \le k \le d, 1 \le k < k' \le d) satisfies the constraints (2.1). Then, under the condition (A) it holds that

c \cdot C^{-1} \cdot \Big[ g_0^2 + \sum_{k=1}^{d} E g_k(X_k)^2 + \mathop{\sum\sum}_{1 \le k < k' \le d} E g_{kk'}(X_k, X_{k'})^2 \Big]
\le E\Big[ g_0 + \sum_{k=1}^{d} g_k(X_k) + \mathop{\sum\sum}_{1 \le k < k' \le d} g_{kk'}(X_k, X_{k'}) \Big]^2
\le c^{-1} \cdot C \cdot \Big[ g_0^2 + \sum_{k=1}^{d} E g_k(X_k)^2 + \mathop{\sum\sum}_{1 \le k < k' \le d} E g_{kk'}(X_k, X_{k'})^2 \Big].   (2.3)
Proof. We prove the first inequality; the proof of the second is similar. We write w(x) = \prod_{j=1}^{d} w_j(x_j). For x \in [0,1]^d, let x_{-j} and x_{-jj'} denote (x_1, \ldots, x_{j-1}, x_{j+1}, \ldots, x_d) and (x_1, \ldots, x_{j-1}, x_{j+1}, \ldots, x_{j'-1}, x_{j'+1}, \ldots, x_d), respectively. Then \int w(x)\,dx_{-j} = w_j(x_j) and \int w(x)\,dx_{-jj'} = w_j(x_j) w_{j'}(x_{j'}). It also holds that \int p(x)\,dx_{-j} = p_j(x_j) and \int p(x)\,dx_{-jj'} = p_{jj'}(x_j, x_{j'}),


where p_j and p_{jj'} denote the density functions of X_j and (X_j, X_{j'}), respectively. We get

E\Big[ g_0 + \sum_{k=1}^{d} g_k(X_k) + \mathop{\sum\sum}_{1 \le k < k' \le d} g_{kk'}(X_k, X_{k'}) \Big]^2
\ge C^{-1} \int \Big[ g_0 + \sum_{k=1}^{d} g_k(x_k) + \mathop{\sum\sum}_{1 \le k < k' \le d} g_{kk'}(x_k, x_{k'}) \Big]^2 \prod_{j=1}^{d} w_j(x_j)\,dx_j
= C^{-1} \Big[ g_0^2 + \sum_{k=1}^{d} \int g_k(x_k)^2 w(x)\,dx + \mathop{\sum\sum}_{1 \le k < k' \le d} \int g_{kk'}(x_k, x_{k'})^2 w(x)\,dx \Big]
\ge c \cdot C^{-1} \Big[ g_0^2 + \sum_{k=1}^{d} \int g_k(x_k)^2 p(x)\,dx + \mathop{\sum\sum}_{1 \le k < k' \le d} \int g_{kk'}(x_k, x_{k'})^2 p(x)\,dx \Big]
= c \cdot C^{-1} \cdot \Big[ g_0^2 + \sum_{k=1}^{d} E g_k(X_k)^2 + \mathop{\sum\sum}_{1 \le k < k' \le d} E g_{kk'}(X_k, X_{k'})^2 \Big].   (2.4)

The two inequalities in (2.4) follow from the condition (A), and the first equality follows from the constraints (2.1). This proves the first inequality of the theorem. □

Theorem 2.1 has another important implication, which serves as a basic building block for the theory of our proposed method of estimating the component functions. Define

H_j(w) = \{ g \in L_2(w) : g(x) = g(x_j) \}, \qquad H_{jk}(w) = \{ g \in L_2(w) : g(x) = g(x_j, x_k) \}.

Likewise, define H_j(p) and H_{jk}(p) with the joint density p of X, as subspaces of L_2(p). Also, define \tilde H_j(w) as the subspace of those g \in H_j(w) such that \int g(x_j) w_j(x_j)\,dx_j = 0, and \tilde H_{jk}(w) as the subspace of those g \in H_{jk}(w) such that

\int g(x_j, x_k) w_j(x_j)\,dx_j \equiv 0 \equiv \int g(x_j, x_k) w_k(x_k)\,dx_k.

Clearly, H_j \subset H_{jk}. As the spaces of the underlying regression functions, define

H(w) = H_{12}(w) + \cdots + H_{d-1,d}(w), \qquad H(p) = H_{12}(p) + \cdots + H_{d-1,d}(p).

Under the condition (A), H(w) = H(p), H_j(w) = H_j(p) and H_{jk}(w) = H_{jk}(p). It also follows that, writing H_j = H_j(w), H_{jk} = H_{jk}(w), \tilde H_j = \tilde H_j(w) and \tilde H_{jk} = \tilde H_{jk}(w),

H_j = \tilde H_j \oplus \mathbb{R}, \qquad H_{jk} = \tilde H_{jk} \oplus \tilde H_j \oplus \tilde H_k \oplus \mathbb{R}.

Thus, we obtain

H(p) = H(w) = \mathbb{R} \oplus \tilde H_1(w) \oplus \cdots \oplus \tilde H_d(w) \oplus \tilde H_{12}(w) \oplus \cdots \oplus \tilde H_{d-1,d}(w).   (2.5)

This and Theorem 2.1 imply that, for any f \in H(p), we can find f_0 \in \mathbb{R}, f_k \in \tilde H_k(w) for 1 \le k \le d and f_{ll'} \in \tilde H_{ll'}(w) for 1 \le l < l' \le d such that f = f_0 + \sum_{k=1}^{d} f_k + \sum\sum_{1 \le k < k' \le d} f_{kk'}, and that, for such component functions, the norm equivalence (2.3) holds with (f_0, f_k, f_{kk'}) in place of (g_0, g_k, g_{kk'}).   (2.6)

This property of H(p), together with (2.6), implies the following corollary of Theorem 2.1.

Corollary 2.1. Under the condition (A), the space H(p) is closed.

Proof. The conclusion of the corollary is an immediate consequence of the application of Part A of Proposition A.4.2 of Bickel, Klaassen, Ritov, and Wellner (1993). □

3. Estimation of the model

We discuss the estimation of the model (2.2). Our method is an extension of the smooth backfitting technique (Mammen et al., 1999), the latter having been developed for fitting additive models without the bivariate functions f_{kk'}. Let K_h(\cdot, \cdot) denote a kernel function with bandwidth h. We assign the weight K_h(x, u) to an observation u for the local smoothing around x. We use the so-called 'normalized kernel' such that \int_0^1 K_h(x, u)\,dx = 1 for all u \in [0,1]. This is important for the smooth backfitting estimator to have a projection interpretation, which is essential for developing its theory. For a baseline kernel function K supported on a compact set, say [-1, 1], a normalized kernel can be constructed as

K_h(x, u) = \frac{K_h(x - u)}{\int_0^1 K_h(z - u)\,dz} \cdot I_{[0,1]^2}(x, u).
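For illustration, the normalized kernel can be computed numerically as in the following Python sketch, which assumes an Epanechnikov baseline kernel and a trapezoidal-rule approximation of the denominator; the function names are illustrative only.

import numpy as np

def epanechnikov(t):
    # Baseline kernel K supported on [-1, 1].
    return 0.75 * np.maximum(1.0 - t ** 2, 0.0)

def normalized_kernel(x, u, h, K=epanechnikov, n_grid=201):
    # Kh(x, u) = Kh(x - u) / \int_0^1 Kh(z - u) dz, restricted to [0, 1]^2,
    # so that \int_0^1 Kh(x, u) dx = 1 for every u in [0, 1].
    x = np.asarray(x, dtype=float)
    u = np.asarray(u, dtype=float)
    z = np.linspace(0.0, 1.0, n_grid)
    denom = np.trapz(K((z[:, None] - np.ravel(u)[None, :]) / h) / h, z, axis=0)
    denom = denom.reshape(np.shape(u))
    num = K((x - u) / h) / h
    inside = ((x >= 0.0) & (x <= 1.0) & (u >= 0.0) & (u <= 1.0)).astype(float)
    return num / denom * inside

# The normalization can be checked numerically, e.g. for a point near the boundary:
xs = np.linspace(0.0, 1.0, 401)
print(np.trapz(normalized_kernel(xs, 0.05, h=0.1), xs))  # approximately 1.0

The division by the partial kernel mass near the boundary is what restores the unit integral over x, which is exactly the property used in the projection interpretation below.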


Here and below, we write K_h(u) = K(u/h)/h. Suppose that we observe a random sample \{(\mathbf{X}^i, Y^i) : 1 \le i \le n\} from the model (2.2). We minimize

n^{-1} \sum_{i=1}^{n} \int_{[0,1]^d} \Big[ Y^i - g_0 - \sum_{k=1}^{d} g_k(x_k) - \mathop{\sum\sum}_{1 \le k < k' \le d} g_{kk'}(x_k, x_{k'}) \Big]^2 \prod_{j=1}^{d} K_h(x_j, X_j^i)\,dx   (3.1)

over g \in H(w). Since H(w) is closed, as demonstrated by Corollary 2.1, the minimizer exists, and for the minimizer denoted by \hat f its components \hat f_0 \in \mathbb{R}, \hat f_k \in \tilde H_k and \hat f_{kk'} \in \tilde H_{kk'} such that \hat f = \hat f_0 + \sum_{k=1}^{d} \hat f_k + \sum\sum_{1 \le k < k' \le d} \hat f_{kk'} are uniquely determined. The minimizer satisfies

0 = n^{-1} \sum_{i=1}^{n} \int_{[0,1]^d} \Big[ Y^i - \hat f_0 - \sum_{k=1}^{d} \hat f_k(x_k) - \mathop{\sum\sum}_{1 \le k < k' \le d} \hat f_{kk'}(x_k, x_{k'}) \Big] K_h(x, \mathbf{X}^i) \cdot \delta(x)\,dx

for all \delta \in H(w), where K_h(x, \mathbf{X}^i) = \prod_{j=1}^{d} K_h(x_j, X_j^i). Specializing \delta to a constant and to a function in each of H_j and H_{jj'}, we get

\hat f_0 = n^{-1} \sum_{i=1}^{n} \int_{[0,1]^d} \Big[ Y^i - \sum_{k=1}^{d} \hat f_k(x_k) - \mathop{\sum\sum}_{1 \le k < k' \le d} \hat f_{kk'}(x_k, x_{k'}) \Big] K_h(x, \mathbf{X}^i)\,dx,   (3.2)

0 = n^{-1} \sum_{i=1}^{n} \int_{[0,1]^{d-1}} \Big[ Y^i - \hat f_0 - \sum_{k=1}^{d} \hat f_k(x_k) - \mathop{\sum\sum}_{1 \le k < k' \le d} \hat f_{kk'}(x_k, x_{k'}) \Big] K_h(x, \mathbf{X}^i)\,dx_{-j},
0 = n^{-1} \sum_{i=1}^{n} \int_{[0,1]^{d-2}} \Big[ Y^i - \hat f_0 - \sum_{k=1}^{d} \hat f_k(x_k) - \mathop{\sum\sum}_{1 \le k < k' \le d} \hat f_{kk'}(x_k, x_{k'}) \Big] K_h(x, \mathbf{X}^i)\,dx_{-jj'},   (3.3)

for 1 \le j \le d and 1 \le j < j' \le d. Let \hat p_j, \hat p_{jk}, \hat p_{jkl} and \hat p_{jkll'} denote the kernel estimators of the densities p_j, p_{jk}, p_{jkl} and p_{jkll'} of X_j, (X_j, X_k), (X_j, X_k, X_l) and (X_j, X_k, X_l, X_{l'}), respectively, based on the normalized kernel K_h(\cdot, \cdot). Also, let \tilde m_j and \tilde m_{jj'} be the kernel estimators of the marginal regression functions E(Y \mid X_j = \cdot) and E(Y \mid X_j = \cdot, X_{j'} = \cdot), respectively. For example,

\tilde m_{jj'}(x_j, x_{j'}) = \hat p_{jj'}(x_j, x_{j'})^{-1} n^{-1} \sum_{i=1}^{n} K_h(x_j, X_j^i) K_h(x_{j'}, X_{j'}^i)\, Y^i.

Then, by the normalization property that \int_0^1 K_h(x, u)\,dx = 1 for all u \in [0,1], the first equation in (3.3) is equivalent to

\hat f_j(x_j) = \tilde m_j(x_j) - \hat f_0 - \sum_{k \ne j} \int_0^1 \hat f_k(x_k) \frac{\hat p_{jk}(x_j, x_k)}{\hat p_j(x_j)}\,dx_k - \sum_{k=j+1}^{d} \int_0^1 \hat f_{jk}(x_j, x_k) \frac{\hat p_{jk}(x_j, x_k)}{\hat p_j(x_j)}\,dx_k
\qquad - \sum_{k=1}^{j-1} \int_0^1 \hat f_{kj}(x_k, x_j) \frac{\hat p_{jk}(x_j, x_k)}{\hat p_j(x_j)}\,dx_k - \mathop{\sum\sum}_{k<k';\, k,k' \ne j} \int_0^1\!\!\int_0^1 \hat f_{kk'}(x_k, x_{k'}) \frac{\hat p_{jkk'}(x_j, x_k, x_{k'})}{\hat p_j(x_j)}\,dx_k\,dx_{k'}, \qquad 1 \le j \le d.   (3.4)

To write (3.4) more concisely, let \hat\pi_j g for g \in H be defined by

(\hat\pi_j g)(x_j) = \int_{[0,1]^{d-1}} g(x) \cdot \frac{\hat p(x)}{\hat p_j(x_j)}\,dx_{-j},

where \hat p(x) = n^{-1} \sum_{i=1}^{n} K_h(x, \mathbf{X}^i) is the full-dimensional kernel density estimator of the joint density p of X, and \hat\pi_{jj'} by

(\hat\pi_{jj'} g)(x_j, x_{j'}) = \int_{[0,1]^{d-2}} g(x) \cdot \frac{\hat p(x)}{\hat p_{jj'}(x_j, x_{j'})}\,dx_{-jj'}.

They are the projection operators that map H onto its subspaces H_j(\hat p) and H_{jj'}(\hat p), respectively. Interpreting \hat f_k as \hat f_k(x) = \hat f_k(x_k) and \hat f_{kk'} as \hat f_{kk'}(x) = \hat f_{kk'}(x_k, x_{k'}), we can write (3.4) as

\hat f_j = \tilde m_j - \hat f_0 - \hat\pi_j\Big( \sum_{k \ne j} \hat f_k + \sum_{k=j+1}^{d} \hat f_{jk} + \sum_{k=1}^{j-1} \hat f_{kj} + \mathop{\sum\sum}_{k<k';\, k,k' \ne j} \hat f_{kk'} \Big), \qquad 1 \le j \le d.   (3.5)
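For instance, when d = 2 there is a single bivariate component and (3.5) reduces to the pair of equations

\hat f_1 = \tilde m_1 - \hat f_0 - \hat\pi_1( \hat f_2 + \hat f_{12} ), \qquad \hat f_2 = \tilde m_2 - \hat f_0 - \hat\pi_2( \hat f_1 + \hat f_{12} ),

written here only as an illustration of how the projections enter the updating equations.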

We note that, although \hat p in the definition of \hat\pi_j is the full-dimensional kernel density estimator, the system of equations (3.5) actually involves only up to three-dimensional density estimators. We can also write the second equation in (3.3) as

\hat f_{jj'} = \tilde m_{jj'} - \hat f_0 - \hat f_j - \hat f_{j'} - \hat\pi_{jj'}\Big( \sum_{k \ne j, j'} \hat f_k + \sum_{k=j+1,\, k \ne j'}^{d} \hat f_{jk} + \sum_{k=1}^{j-1} \hat f_{kj} + \sum_{k=1,\, k \ne j}^{j'-1} \hat f_{kj'} + \sum_{k=j'+1}^{d} \hat f_{j'k} + \mathop{\sum\sum}_{k<k';\, k,k' \notin \{j,j'\}} \hat f_{kk'} \Big), \qquad 1 \le j < j' \le d.   (3.6)

Here and below, \sum\sum_{k<k';\, k,k' \notin \{j,j'\}} denotes the summation over all pairs 1 \le k < k' \le d with k, k' \notin \{j, j'\}.
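To fix ideas, the following Python sketch carries out the updating cycle (3.5)-(3.6) on a grid for the simplest case d = 2, with uniform weights w_1 = w_2 \equiv 1, the trapezoidal rule in place of exact integrals, a fixed number of sweeps instead of a convergence check, and \hat f_0 taken as the sample mean of Y; it reuses the normalized_kernel helper sketched in Section 3, and all names are illustrative.

import numpy as np

def smooth_backfit_2d(X, Y, h, n_grid=41, n_sweeps=50):
    # Smooth backfitting for E(Y|X) = f0 + f1(X1) + f2(X2) + f12(X1, X2), d = 2.
    X = np.asarray(X, dtype=float)
    Y = np.asarray(Y, dtype=float)
    n = len(Y)
    x = np.linspace(0.0, 1.0, n_grid)
    # kernel weights K_h(x_g, X_j^i) on the grid, for j = 1, 2
    K1 = np.array([normalized_kernel(x, X[i, 0], h) for i in range(n)]).T  # (n_grid, n)
    K2 = np.array([normalized_kernel(x, X[i, 1], h) for i in range(n)]).T
    p12 = (K1[:, None, :] * K2[None, :, :]).mean(axis=2)           # \hat p_{12}
    p1 = np.trapz(p12, x, axis=1)                                   # \hat p_1
    p2 = np.trapz(p12, x, axis=0)                                   # \hat p_2
    m1 = (K1 @ Y) / n / p1                                          # \tilde m_1
    m2 = (K2 @ Y) / n / p2                                          # \tilde m_2
    m12 = (K1[:, None, :] * K2[None, :, :] * Y).mean(axis=2) / p12  # \tilde m_{12}
    f0 = Y.mean()
    f1 = np.zeros(n_grid)
    f2 = np.zeros(n_grid)
    f12 = np.zeros((n_grid, n_grid))
    for _ in range(n_sweeps):
        # (3.5) for j = 1: subtract the projection of the remaining components
        f1 = m1 - f0 - np.trapz((f2[None, :] + f12) * p12, x, axis=1) / p1
        c1 = np.trapz(f1, x); f1 -= c1; f0 += c1                    # centering constraint (2.1)
        # (3.5) for j = 2
        f2 = m2 - f0 - np.trapz((f1[:, None] + f12) * p12, x, axis=0) / p2
        c2 = np.trapz(f2, x); f2 -= c2; f0 += c2
        # (3.6): with d = 2 there is no further component to project out
        f12 = m12 - f0 - f1[:, None] - f2[None, :]
        r = np.trapz(f12, x, axis=1)                                # integral over x2
        c = np.trapz(f12, x, axis=0)                                # integral over x1
        s = np.trapz(r, x)
        f12 = f12 - r[:, None] - c[None, :] + s                     # bivariate centering (2.1)
        f1 += r - s; f2 += c - s; f0 += s                           # keep the fitted sum unchanged
    return x, f0, f1, f2, f12

With d = 2 the updates require only one- and two-dimensional density estimates, in line with the remark above that (3.5) involves at most three-dimensional densities.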

4. Theoretical properties

In this section we prove that the system of equations (3.5) and (3.6) has a unique solution (\hat f_j, \hat f_{kk'} : 1 \le j \le d, 1 \le k < k' \le d) with probability tending to one. We also derive the rates of convergence for each component function estimator. The rates depend on the magnitude of the bandwidth. We assume h \asymp n^{-1/6}, which is known to be optimal for the estimation of two-dimensional functions. We note that the accuracy of estimating the regression function f = \sum_{j=1}^{d} f_j + \sum\sum_{1 \le k < k' \le d} f_{kk'} is governed by its bivariate components, for which this order of bandwidth is the natural choice.

4.1. Existence and uniqueness of the solution

For notational convenience we re-enumerate the Hilbert subspaces \tilde H_{jk} as follows, using single subscripts:

\tilde H_{12} \to \tilde H_{d+1}, \quad \ldots, \quad \tilde H_{d-1,d} \to \tilde H_{d(d+1)/2}.

Below we write d^* = d(d+1)/2 for simplicity. Let \pi_j and \pi_{jk} denote the projection operators that are defined in the same way as \hat\pi_j and \hat\pi_{jk} with \hat p, \hat p_j and \hat p_{jk} replaced by the true densities p, p_j and p_{jk}, respectively. We re-enumerate \hat\pi_{jk}, \pi_{jk}, \tilde m_{jk} and \hat f_{jk} in the same way as \tilde H_{jk}. Define

\hat T = (I - \hat\pi_{d^*})(I - \hat\pi_{d^*-1}) \cdots (I - \hat\pi_2)(I - \hat\pi_1)

and likewise define T with \hat\pi_j replaced by \pi_j. Also, define

\tilde m_\oplus = \tilde m_{d^*} + (I - \hat\pi_{d^*})\tilde m_{d^*-1} + (I - \hat\pi_{d^*})(I - \hat\pi_{d^*-1})\tilde m_{d^*-2} + \cdots + (I - \hat\pi_{d^*}) \cdots (I - \hat\pi_2)\tilde m_1.

Then, it holds that

\hat f = \hat T \hat f + \tilde m_\oplus.   (4.1)

Let \theta_j denote the angle between the two subspaces \tilde H_j(p) and S_{j+1}(p) = \tilde H_{j+1}(p) + \cdots + \tilde H_{d^*}(p). At this point, we note that \tilde H_j(p) differs from \tilde H_j(w) only in the inner product it is endowed with. The subspaces \tilde H_j(p) for different j may not be orthogonal to each other, while the \tilde H_j(w) are. This means that \sin(\theta_j) may not equal one. However, \sin(\theta_j) > 0 for all 1 \le j \le d^*-1 according to Part D of Proposition A.4.2 of Bickel et al. (1993) and Corollary 2.1. Thus, from Smith, Solomon, and Wagner (1977) it follows that

\|T\| \equiv \sup\{ \|Tg\|_p : g \in H(p) \text{ with } \|g\|_p = 1 \} \le 1 - \prod_{j=1}^{d^*-1} \sin^2(\theta_j) < 1.   (4.2)

Now, from the standard theory of kernel smoothing, the kernel estimators of the joint densities of up to four covariates converge to the corresponding true densities uniformly on [2h, 1-2h]^k for 1 \le k \le 4, and those of the joint densities of up to two covariates are bounded away from zero with probability tending to one. This implies that \|\hat\pi_j - \pi_j\|_p \to 0

in probability for each 1 \le j \le d^*, so that \|\hat T - T\| \to 0 in probability. From (4.1) and (4.2) we obtain the following theorem, which proves the existence and the uniqueness of the solution of the smooth backfitting equations represented by (3.5) and (3.6).

Theorem 4.1. Assume that the joint density p is bounded away from zero and infinity on [0,1]^d and that the (joint) densities p_i, p_{ij}, p_{ijk} and p_{ijkl} for 1 \le i < j < k < l \le d are continuous on [0,1]^k for 1 \le k \le 4, respectively. Assume also that the baseline kernel function K is bounded, has compact support [-1,1], is symmetric about zero and Lipschitz continuous, and that h \to 0 and nh^4/\log n \to \infty as n \to \infty. Then, with probability tending to one, the solution of Eq. (4.1) exists and is uniquely given by \hat f = \sum_{l=0}^{\infty} \hat T^l \tilde m_\oplus.

4.2. Rates of convergence of the component estimators

In this section, we derive the rates at which the estimators converge to their corresponding true functions. For this we first find a stochastic expansion of the estimator \hat f of the sum function f, and then those of the individual component function estimators using the constraints (2.1). For a stochastic expansion of \hat f, it is convenient to express the model (2.2) as

E(Y \mid \mathbf{X}) = \mathop{\sum\sum}_{1 \le k < k' \le d} f^*_{kk'}(X_k, X_{k'}).

The corresponding backfitting equation for this representation is

\hat f^*_{jj'} = \tilde m_{jj'} - \hat\pi_{jj'}\Big( \mathop{\sum\sum}_{(k,k') \ne (j,j')} \hat f^*_{kk'} \Big).   (4.3)

Here and below, \sum\sum_{(k,k') \ne (j,j')} denotes the summation over all indices 1 \le k < k' \le d such that (k,k') as an ordered pair does not equal (j,j'). Thus,

\mathop{\sum\sum}_{(k,k') \ne (j,j')} \hat f^*_{kk'} = \sum_{k'=j+1,\, k' \ne j'}^{d} \hat f^*_{jk'} + \sum_{k'=j'+1}^{d} \hat f^*_{j'k'} + \sum_{k=1}^{j-1} \hat f^*_{kj} + \sum_{k=1,\, k \ne j}^{j'-1} \hat f^*_{kj'} + \mathop{\sum\sum}_{k<k';\, k,k' \notin \{j,j'\}} \hat f^*_{kk'}.

Define

\mu_l(u) = h^{-l} \int_0^1 (v - u)^l K_h(u, v)\,dv, \qquad l \ge 0.

We note that \mu_1(u) = 0 if u \in [2h, 1-2h] and that \sup_{u \in [0,1]} |\mu_l(u)| = O(1) for all l \ge 0. Let

b^L_{jj'}(x_j, x_{j'}) = h \cdot \frac{\mu_1(x_j)}{\mu_0(x_j)} \cdot \frac{\partial}{\partial x_j} f^*_{jj'}(x_j, x_{j'}), \qquad b^R_{jj'}(x_j, x_{j'}) = h \cdot \frac{\mu_1(x_{j'})}{\mu_0(x_{j'})} \cdot \frac{\partial}{\partial x_{j'}} f^*_{jj'}(x_j, x_{j'}).

Also, let \varepsilon^i = Y^i - E(Y^i \mid \mathbf{X}^i) and define

\tilde f^A_{jj'}(x_j, x_{j'}) = \hat p_{jj'}(x_j, x_{j'})^{-1} n^{-1} \sum_{i=1}^{n} K_h(x_j, X_j^i) K_h(x_{j'}, X_{j'}^i)\, \varepsilon^i.

Theorem 4.2. Assume the conditions of Theorem 4.1. Assume also that p_i, p_{ij}, p_{ijk} and p_{ijkl} are (partially) continuously differentiable, that the f^*_{jk} are twice partially continuously differentiable and that E|Y|^\alpha < \infty for some \alpha > 5/2. If h \asymp n^{-1/6}, then it holds that

\mathop{\sum\sum}_{1 \le j < j' \le d} \big( \hat f^*_{jj'} - f^*_{jj'} - \tilde f^A_{jj'} - b^L_{jj'} - b^R_{jj'} \big) = O_p(n^{-1/3})

uniformly for x \in [0,1]^d.

We discuss the implications of the above theorem. Let I_0 = [2h, 1-2h] denote the interior region of [0,1]. The theorem tells us that \hat f = \sum\sum_{1 \le j < j' \le d} \hat f^*_{jj'} satisfies

\sup_{x \in I_0^d} |\hat f(x) - f(x)| = O_p(n^{-1/3}\sqrt{\log n}), \qquad \sup_{x \in [0,1]^d} |\hat f(x) - f(x)| = O_p(n^{-1/6}).   (4.4)

This follows from the facts that \mu_1(x_j) = 0 for x_j \in I_0 and that \|\tilde f^A_{jj'}\|_\infty = O_p(n^{-1/3}\sqrt{\log n}), where \|\cdot\|_\infty denotes the sup-norm over the unit interval [0,1].
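To see the role of \mu_1 concretely, the following short numerical check (a Python sketch reusing the normalized_kernel helper above; the bandwidth and grid sizes are illustrative) evaluates \mu_0 and \mu_1 and confirms that \mu_1 vanishes on the interior region [2h, 1-2h] while remaining of order one near the boundary.

import numpy as np

h = 0.1
u_grid = np.linspace(0.0, 1.0, 101)
v_grid = np.linspace(0.0, 1.0, 801)

def mu(l, u):
    # mu_l(u) = h^{-l} \int_0^1 (v - u)^l K_h(u, v) dv, by the trapezoidal rule
    vals = (v_grid - u) ** l * normalized_kernel(u, v_grid, h)
    return np.trapz(vals, v_grid) / h ** l

print(mu(0, 0.5))                        # mu_0 = 1 at an interior point
mu1 = np.array([mu(1, u) for u in u_grid])
interior = (u_grid >= 2 * h) & (u_grid <= 1 - 2 * h)
print(np.max(np.abs(mu1[interior])))     # essentially zero: no first-order boundary term
print(np.max(np.abs(mu1[~interior])))    # of order one near the boundary

This is the numerical counterpart of the statement that the boundary-bias functions b^L_{jj'} and b^R_{jj'} vanish on the interior region.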

For the individual component estimators \hat f_{jj'}, we observe that we can write

\hat f_{jj'}(x_j, x_{j'}) - f_{jj'}(x_j, x_{j'}) = \tilde f^A_{jj'}(x_j, x_{j'}) + b^L_{jj'}(x_j, x_{j'}) + b^R_{jj'}(x_j, x_{j'}) + \delta_j(x_j) + \delta_{j'}(x_{j'}) + c_{jj'} + O_p(n^{-1/3})   (4.5)

for some (random) constant c_{jj'} and (random) univariate functions \delta_j and \delta_{j'}. Integrating both sides of (4.5) after multiplication by the weights w_j(x_j) and w_{j'}(x_{j'}), we get

c_{jj'} = -\int_0^1 \delta_j(x_j) w_j(x_j)\,dx_j - \int_0^1 \delta_{j'}(x_{j'}) w_{j'}(x_{j'})\,dx_{j'} + O_p(n^{-1/3})

because of the constraints (2.1). Here, we have used the fact that

\int_0^1 b^L_{jj'}(x_j, x_{j'}) w_j(x_j)\,dx_j = O(n^{-1/3}), \qquad \int_0^1 b^R_{jj'}(x_j, x_{j'}) w_{j'}(x_{j'})\,dx_{j'} = O(n^{-1/3}),
\int_0^1 \tilde f^A_{jj'}(x_j, x_{j'}) w_j(x_j)\,dx_j = O_p\big(n^{-5/12}\sqrt{\log n}\big), \qquad \int_0^1 \tilde f^A_{jj'}(x_j, x_{j'}) w_{j'}(x_{j'})\,dx_{j'} = O_p\big(n^{-5/12}\sqrt{\log n}\big),   (4.6)

uniformly for x_j, x_{j'} \in [0,1]. Thus, it holds that

\hat f_{jj'}(x_j, x_{j'}) - f_{jj'}(x_j, x_{j'}) = \tilde f^A_{jj'}(x_j, x_{j'}) + b^L_{jj'}(x_j, x_{j'}) + b^R_{jj'}(x_j, x_{j'}) + \tilde\delta_j(x_j) + \tilde\delta_{j'}(x_{j'}) + O_p(n^{-1/3})   (4.7)

uniformly for x_j, x_{j'} \in [0,1] for some (random) univariate functions \tilde\delta_j and \tilde\delta_{j'} such that

\int_0^1 \tilde\delta_j(x_j) w_j(x_j)\,dx_j = \int_0^1 \tilde\delta_{j'}(x_{j'}) w_{j'}(x_{j'})\,dx_{j'} = 0.   (4.8)

Integrating both sides of (4.7) again with respect to x_j after multiplication by the weight w_j(x_j), we get from (4.6) and (4.8) that \sup_{x_{j'} \in [0,1]} |\tilde\delta_{j'}(x_{j'})| = O_p(n^{-1/3}). Similarly, we obtain \sup_{x_j \in [0,1]} |\tilde\delta_j(x_j)| = O_p(n^{-1/3}). This proves

\hat f_{jj'}(x_j, x_{j'}) - f_{jj'}(x_j, x_{j'}) = \tilde f^A_{jj'}(x_j, x_{j'}) + b^L_{jj'}(x_j, x_{j'}) + b^R_{jj'}(x_j, x_{j'}) + O_p(n^{-1/3})

uniformly for x_j, x_{j'} \in [0,1].

The stochastic expansions of the univariate component estimators \hat f_j are obtained by similar arguments. We get from (4.4) that

\hat f_j(x_j) - f_j(x_j) = \kappa_j(x_j) + c_j + O_p(n^{-1/3})   (4.9)

uniformly for x_j \in [0,1] for some (random) constant c_j and (random) univariate function \kappa_j such that

\sup_{x_j \in I_0} |\kappa_j(x_j)| = O_p(n^{-1/3}\sqrt{\log n}), \qquad \sup_{x_j \in [0,1]} |\kappa_j(x_j)| = O_p(n^{-1/6}).

Integrating both sides of (4.9) after multiplication by the weight w_j(x_j) gives

c_j = -\int_0^1 \kappa_j(x_j) w_j(x_j)\,dx_j + O_p(n^{-1/3}).

Thus, we obtain \hat f_j(x_j) - f_j(x_j) = \tilde\kappa_j(x_j) + O_p(n^{-1/3}) uniformly for x_j \in [0,1] for some (random) univariate function \tilde\kappa_j such that

\sup_{x_j \in I_0} |\tilde\kappa_j(x_j)| = O_p(n^{-1/3}\sqrt{\log n}), \qquad \sup_{x_j \in [0,1]} |\tilde\kappa_j(x_j)| = O_p(n^{-1/6}).

Corollary 4.1. Under the conditions of Theorem 4.2, it holds that

\hat f_{jj'}(x_j, x_{j'}) - f_{jj'}(x_j, x_{j'}) = \tilde f^A_{jj'}(x_j, x_{j'}) + b^L_{jj'}(x_j, x_{j'}) + b^R_{jj'}(x_j, x_{j'}) + O_p(n^{-1/3})

uniformly for x_j, x_{j'} \in [0,1]. Furthermore,

\sup_{x_j \in I_0} |\hat f_j(x_j) - f_j(x_j)| = O_p(n^{-1/3}\sqrt{\log n}), \qquad \sup_{x_j \in [0,1]} |\hat f_j(x_j) - f_j(x_j)| = O_p(n^{-1/6}).

A direct implication of the above corollary is as follows. Since \sup_{x_j, x_{j'} \in [0,1]} |\tilde f^A_{jj'}(x_j, x_{j'})| = O_p(n^{-1/3}\sqrt{\log n}), and b^L_{jj'}(x_j, x_{j'}) = 0 and b^R_{jj'}(x_j, x_{j'}) = 0 for (x_j, x_{j'}) \in I_0 \times [0,1] and for (x_j, x_{j'}) \in [0,1] \times I_0, respectively, the corollary implies that

\sup_{x_j, x_{j'} \in I_0} |\hat f_{jj'}(x_j, x_{j'}) - f_{jj'}(x_j, x_{j'})| = O_p(n^{-1/3}\sqrt{\log n}), \qquad \sup_{x_j, x_{j'} \in [0,1]} |\hat f_{jj'}(x_j, x_{j'}) - f_{jj'}(x_j, x_{j'})| = O_p(n^{-1/6}).

5. Proof of Theorem 4.2

Define

\tilde f^B_{jj'}(x_j, x_{j'}) = \hat p_{jj'}(x_j, x_{j'})^{-1} n^{-1} \sum_{i=1}^{n} K_h(x_j, X_j^i) K_h(x_{j'}, X_{j'}^i) \big( f^*_{jj'}(X_j^i, X_{j'}^i) - f^*_{jj'}(x_j, x_{j'}) \big),

\tilde f^C_{k|jj'}(x_j, x_{j'}) = \hat p_{jj'}(x_j, x_{j'})^{-1} n^{-1} \sum_{i=1}^{n} K_h(x_j, X_j^i) K_h(x_{j'}, X_{j'}^i) \times \int_0^1 K_h(v, X_k^i) \big( f^*_{jk}(X_j^i, X_k^i) - f^*_{jk}(x_j, v) \big)\,dv,

\tilde f^D_{k|jj'}(x_j, x_{j'}) = \hat p_{jj'}(x_j, x_{j'})^{-1} n^{-1} \sum_{i=1}^{n} K_h(x_j, X_j^i) K_h(x_{j'}, X_{j'}^i) \times \int_0^1 K_h(v, X_k^i) \big( f^*_{kj}(X_k^i, X_j^i) - f^*_{kj}(v, x_j) \big)\,dv,

\tilde f^E_{kk'|jj'}(x_j, x_{j'}) = \hat p_{jj'}(x_j, x_{j'})^{-1} n^{-1} \sum_{i=1}^{n} K_h(x_j, X_j^i) K_h(x_{j'}, X_{j'}^i) \times \int_0^1\!\!\int_0^1 K_h(v, X_k^i) K_h(w, X_{k'}^i) \big( f^*_{kk'}(X_k^i, X_{k'}^i) - f^*_{kk'}(v, w) \big)\,dv\,dw.

Then, from (4.3) it holds that

\hat f^*_{jj'} = f^*_{jj'} + \tilde f^A_{jj'} + \tilde f^B_{jj'} + \sum_{k'=j+1,\, k' \ne j'}^{d} \tilde f^C_{k'|jj'} + \sum_{k'=j'+1}^{d} \tilde f^C_{k'|j'j} + \sum_{k=1}^{j-1} \tilde f^D_{k|jj'} + \sum_{k=1,\, k \ne j}^{j'-1} \tilde f^D_{k|j'j} + \mathop{\sum\sum}_{k<k';\, k,k' \notin \{j,j'\}} \tilde f^E_{kk'|jj'}
\qquad + \hat\pi_{jj'}\Big( \mathop{\sum\sum}_{(k,k') \ne (j,j')} f^*_{kk'} \Big) - \hat\pi_{jj'}\Big( \mathop{\sum\sum}_{(k,k') \ne (j,j')} \hat f^*_{kk'} \Big).   (5.1)

We claim that, for each given pair (j, j') such that 1 \le j < j' \le d,

\|\hat\pi_{jj'}(\tilde f^A_{kk'})\|_\infty = O_p\big(n^{-5/12}\sqrt{\log n}\big), \qquad (k, k') \ne (j, j'),   (5.2)
\|\tilde f^B_{jj'} - b^L_{jj'} - b^R_{jj'}\|_\infty = O_p(n^{-1/3}),   (5.3)
\|\tilde f^C_{k|jj'} - \hat\pi_{jj'}(b^L_{jk} + b^R_{jk})\|_\infty = O_p(n^{-1/3}), \qquad j+1 \le k \le d,\ k \ne j',   (5.4)
\|\tilde f^C_{k|j'j} - \hat\pi_{jj'}(b^L_{j'k} + b^R_{j'k})\|_\infty = O_p(n^{-1/3}), \qquad j'+1 \le k \le d,   (5.5)
\|\tilde f^D_{k|jj'} - \hat\pi_{jj'}(b^L_{kj} + b^R_{kj})\|_\infty = O_p(n^{-1/3}), \qquad 1 \le k \le j-1,   (5.6)
\|\tilde f^D_{k|j'j} - \hat\pi_{jj'}(b^L_{kj'} + b^R_{kj'})\|_\infty = O_p(n^{-1/3}), \qquad 1 \le k \le j'-1,\ k \ne j,   (5.7)
\|\tilde f^E_{kk'|jj'}\|_\infty = O_p(n^{-1/3}), \qquad k, k' \notin \{j, j'\}.   (5.8)

Eq. (5.1) and the approximations (5.2)-(5.8) give

\hat f^*_{jj'} - f^*_{jj'} - \tilde f^A_{jj'} - b^L_{jj'} - b^R_{jj'} = \delta_{jj'} - \hat\pi_{jj'}\Big( \mathop{\sum\sum}_{(k,k') \ne (j,j')} \big( \hat f^*_{kk'} - f^*_{kk'} - \tilde f^A_{kk'} - b^L_{kk'} - b^R_{kk'} \big) \Big),

where the \delta_{jj'} are stochastic bivariate functions such that \|\delta_{jj'}\|_\infty = O_p(n^{-1/3}). Let \hat\Delta_+ = \sum\sum_{1 \le j < j' \le d} (\hat f^*_{jj'} - f^*_{jj'} - \tilde f^A_{jj'} - b^L_{jj'} - b^R_{jj'}). Then, we can write
\hat\Delta_+ = \hat T \hat\Delta_+ + \tilde\delta_\oplus,

where \tilde\delta_\oplus(x) = \sum\sum_{1 \le j < j' \le d} \tilde\delta'_{jj'}(x_j, x_{j'}) for some stochastic bivariate functions \tilde\delta'_{jj'} with \|\tilde\delta'_{jj'}\|_\infty = O_p(n^{-1/3}). Considering a sup-norm analogue of the contraction property (4.2) for \hat T, we obtain \|\hat\Delta_+\|_\infty = O_p(n^{-1/3}), which gives the expansion in the statement of Theorem 4.2. It remains to prove the claims (5.2)-(5.8).

To prove (5.2), we first consider the case where k = j and k' \ne j'. Note that

\hat\pi_{jj'}(\tilde f^A_{jk'})(x_j, x_{j'}) = \hat p_{jj'}(x_j, x_{j'})^{-1} n^{-1} \sum_{i=1}^{n} \varepsilon^i \cdot K_h(x_j, X_j^i) \times \int_0^1 K_h(x_{k'}, X_{k'}^i) \cdot \frac{\hat p_{jj'k'}(x_j, x_{j'}, x_{k'})}{\hat p_{jk'}(x_j, x_{k'})}\,dx_{k'}.   (5.9)

By the standard theory of kernel smoothing, it holds that

\|\hat\pi_{jj'}(\tilde f^A_{jk'})\|_\infty = O_p\big( n^{-1/2}\sqrt{\log n} \cdot \kappa_n \big),

where

\kappa_n^2 = \sup_{x_j \in [0,1]} n^{-1} \sum_{i=1}^{n} K_h(x_j, X_j^i)^2 \Big[ \int_0^1 K_h(x_{k'}, X_{k'}^i) \cdot \frac{\hat p_{jj'k'}(x_j, x_{j'}, x_{k'})}{\hat p_{jk'}(x_j, x_{k'})}\,dx_{k'} \Big]^2 = O_p(h^{-1}).

This proves (5.2) for the case where k = j but k' \ne j'. The other cases with (k, k') \ne (j, j') may be proved similarly. This completes the proof of (5.2).

To prove (5.3), we note that there exists a constant C > 0 such that

\Big| f^*_{jj'}(u, v) - f^*_{jj'}(x_j, x_{j'}) - (u - x_j)\frac{\partial}{\partial x_j} f^*_{jj'}(x_j, x_{j'}) - (v - x_{j'})\frac{\partial}{\partial x_{j'}} f^*_{jj'}(x_j, x_{j'}) \Big| \le C h^2   (5.10)

for all (u, v) \in [x_j - h, x_j + h] \times [x_{j'} - h, x_{j'} + h]. This implies

\tilde f^B_{jj'}(x_j, x_{j'}) = \hat p_{jj'}(x_j, x_{j'})^{-1} \int_0^1\!\!\int_0^1 K_h(x_j, u) K_h(x_{j'}, v) \big( f^*_{jj'}(u, v) - f^*_{jj'}(x_j, x_{j'}) \big) p_{jj'}(u, v)\,du\,dv + O_p\big( n^{-5/12}\sqrt{\log n} \big)
= \frac{p_{jj'}(x_j, x_{j'})}{\hat p_{jj'}(x_j, x_{j'})} \Big[ h \cdot \mu_1(x_j)\mu_0(x_{j'}) \frac{\partial}{\partial x_j} f^*_{jj'}(x_j, x_{j'}) + h \cdot \mu_0(x_j)\mu_1(x_{j'}) \frac{\partial}{\partial x_{j'}} f^*_{jj'}(x_j, x_{j'}) \Big] + O_p(n^{-1/3})

uniformly for x_j, x_{j'} \in [0,1]. Since \hat p_{jj'}(x_j, x_{j'}) = p_{jj'}(x_j, x_{j'}) \cdot \mu_0(x_j)\mu_0(x_{j'}) + O_p(n^{-1/6}) uniformly for x_j, x_{j'} \in [0,1], we get

\tilde f^B_{jj'}(x_j, x_{j'}) = b^L_{jj'}(x_j, x_{j'}) + b^R_{jj'}(x_j, x_{j'}) + O_p(n^{-1/3})

uniformly for x_j, x_{j'} \in [0,1]. This completes the proof of (5.3).

Now, we prove (5.4). We decompose \tilde f^C_{k|jj'} into \tilde f^{C,1}_{k|jj'} + \tilde f^{C,2}_{k|jj'}, where

\tilde f^{C,1}_{k|jj'}(x_j, x_{j'}) = \hat p_{jj'}(x_j, x_{j'})^{-1} n^{-1} \sum_{i=1}^{n} K_h(x_j, X_j^i) K_h(x_{j'}, X_{j'}^i) \times \int_0^1 K_h(v, X_k^i) \big( f^*_{jk}(X_j^i, X_k^i) - f^*_{jk}(x_j, X_k^i) \big)\,dv,

\tilde f^{C,2}_{k|jj'}(x_j, x_{j'}) = \hat p_{jj'}(x_j, x_{j'})^{-1} n^{-1} \sum_{i=1}^{n} K_h(x_j, X_j^i) K_h(x_{j'}, X_{j'}^i) \times \int_0^1 K_h(v, X_k^i) \big( f^*_{jk}(x_j, X_k^i) - f^*_{jk}(x_j, v) \big)\,dv.

For the approximation of \tilde f^{C,1}_{k|jj'}, we observe that

E\Big[ K_h(x_j, X_j^i)\,(X_j^i - x_j) \int_0^1 K_h(v, X_k^i) \frac{\partial}{\partial x_j} f^*_{jk}(x_j, X_k^i)\,dv \,\Big|\, X_{j'}^i = z \Big] = \int_0^1 b^L_{jk}(x_j, v)\,\mu_0(x_j)\mu_0(v)\,\frac{p_{jj'k}(x_j, z, v)}{p_{j'}(z)}\,dv + O_p(n^{-1/3})

uniformly for x_j, z \in [0,1]. This gives

\tilde f^{C,1}_{k|jj'}(x_j, x_{j'}) = \hat p_{jj'}(x_j, x_{j'})^{-1} n^{-1} \sum_{i=1}^{n} K_h(x_{j'}, X_{j'}^i) \int_0^1 b^L_{jk}(x_j, v)\,\mu_0(x_j)\mu_0(v)\,\frac{p_{jj'k}(x_j, X_{j'}^i, v)}{p_{j'}(X_{j'}^i)}\,dv + O_p(n^{-1/3})
= \hat p_{jj'}(x_j, x_{j'})^{-1} n^{-1} \sum_{i=1}^{n} K_h(x_{j'}, X_{j'}^i) \int_0^1 b^L_{jk}(x_j, v) \times E\big( K_h(x_j, X_j^i) K_h(v, X_k^i) \,\big|\, X_{j'}^i \big)\,dv + O_p(n^{-1/3})   (5.11)

uniformly for x_j, x_{j'} \in [0,1]. The second equation of (5.11) follows from the fact that

E\big[ K_h(x_j, X_j^i) K_h(v, X_k^i) \,\big|\, X_{j'}^i \big] = \mu_0(x_j)\mu_0(v)\,\frac{p_{jj'k}(x_j, X_{j'}^i, v)}{p_{j'}(X_{j'}^i)} + O_p(n^{-1/6})

uniformly for x_j, v \in [0,1] and for 1 \le i \le n. On the other hand, we also find that

\hat p_{jj'}(x_j, x_{j'})^{-1} \int_0^1 b^L_{jk}(x_j, v)\,\hat p_{jj'k}(x_j, x_{j'}, v)\,dv
= \hat p_{jj'}(x_j, x_{j'})^{-1} n^{-1} \sum_{i=1}^{n} K_h(x_{j'}, X_{j'}^i) \int_0^1 b^L_{jk}(x_j, v) K_h(x_j, X_j^i) K_h(v, X_k^i)\,dv
= \hat p_{jj'}(x_j, x_{j'})^{-1} n^{-1} \sum_{i=1}^{n} K_h(x_{j'}, X_{j'}^i) \int_0^1 b^L_{jk}(x_j, v)\,E\big( K_h(x_j, X_j^i) K_h(v, X_k^i) \,\big|\, X_{j'}^i \big)\,dv + O_p\big( n^{-5/12}\sqrt{\log n} \big)   (5.12)

uniformly for x_j, x_{j'} \in [0,1]. The approximations (5.11) and (5.12) imply

\tilde f^{C,1}_{k|jj'}(x_j, x_{j'}) = \hat\pi_{jj'}\big( b^L_{jk} \big)(x_j, x_{j'}) + O_p(n^{-1/3})

uniformly for x_j, x_{j'} \in [0,1]. Similarly, we get

\tilde f^{C,2}_{k|jj'}(x_j, x_{j'}) = \hat\pi_{jj'}\big( b^R_{jk} \big)(x_j, x_{j'}) + O_p(n^{-1/3})
uniformly for x_j, x_{j'} \in [0,1], which concludes the proof of (5.4). This completes the proof of Theorem 4.2.

Acknowledgments

This work was supported by the 2016 Research Grant from Kangwon National University (No. 520150418) and also by the National Research Foundation of Korea (NRF) grant funded by the Korea government MSIP (NRF-2015R1A2A2A01005039).

References

Bickel, P. J., Klaassen, C. A. J., Ritov, Y., & Wellner, J. A. (1993). Efficient and adaptive estimation for semiparametric models. Johns Hopkins University Press.
Huang, J. (1998). Projection estimation in multiple regression with application to functional ANOVA models. The Annals of Statistics, 26, 242-272.
Lee, Y. K. (2004). On marginal integration method in nonparametric regression. Journal of the Korean Statistical Society, 33, 435-448.
Lee, Y. K., Mammen, E., & Park, B. U. (2010). Backfitting and smooth backfitting for additive quantile models. The Annals of Statistics, 38, 2857-2883.
Lee, Y. K., Mammen, E., & Park, B. U. (2012). Flexible generalized varying coefficient regression models. The Annals of Statistics, 40, 1906-1933.
Linton, O., & Nielsen, J. P. (1995). A kernel method of estimating structured nonparametric regression based on marginal integration. Biometrika, 82, 93-100.
Mammen, E., Linton, O., & Nielsen, J. P. (1999). The existence and asymptotic properties of a backfitting projection algorithm under weak conditions. The Annals of Statistics, 27, 1443-1490.
Mammen, E., Shienle, M., & Park, B. U. (2014). Additive models: Extensions and related models. In Oxford handbook of applied nonparametric and semiparametric econometrics. Oxford University Press.
Opsomer, J.-D., & Ruppert, D. (1997). Fitting a bivariate additive model by local polynomial regression. The Annals of Statistics, 25, 186-211.
Smith, K. T., Solomon, D. C., & Wagner, S. L. (1977). Practical and mathematical aspects of the problem of reconstructing objects from radiographs. Bulletin of the American Mathematical Society, 83, 1227-1270.
Stone, C. J. (1994). The use of polynomial splines and their tensor products in multivariate function estimation (with discussion). The Annals of Statistics, 22, 118-171.
Stone, C. J., Hansen, M., Kooperberg, C., & Truong, Y. (1997). Polynomial splines and their tensor products in extended linear modeling (with discussion). The Annals of Statistics, 25, 1371-1470.
Yu, K., Park, B. U., & Mammen, E. (2008). Smooth backfitting in generalized additive models. The Annals of Statistics, 36, 228-260.
