Estimation and inference for additive partially nonlinear models


Xiaoshuang Zhou (a,*), Peixin Zhao (b), Zehui Liu (c)

(a) College of Mathematical Sciences, Dezhou University, Dezhou, Shandong 253023, China
(b) College of Mathematics and Statistics, Chongqing Technology and Business University, Chongqing 400067, China
(c) School of Finance and Statistics, East China Normal University, Shanghai 200241, China

* Corresponding author. E-mail address: [email protected] (X. Zhou).

Article history: Received 19 March 2015; Accepted 18 February 2016.
AMS 2000 subject classifications: primary 62G05; secondary 62H10.
Keywords: Additive partially nonlinear model; Profile nonlinear least-squares estimation; Empirical likelihood; Confidence region.

Abstract: In this paper, we extend the additive partially linear model to the additive partially nonlinear model, in which the linear part of the additive partially linear model is replaced by a nonlinear function of the covariates. A profile nonlinear least squares estimation procedure is proposed for the parameter vector in the nonlinear function and for the nonparametric functions of the additive partially nonlinear model, and the asymptotic properties of the resulting estimators are established. Furthermore, we apply the empirical likelihood method to the additive partially nonlinear model. An empirical likelihood ratio for the parameter vector and a residual adjusted empirical likelihood ratio for the nonparametric functions are proposed. The Wilks phenomenon is proved, and confidence regions for the parameter vector and the nonparametric functions are constructed. Simulations are conducted to assess the performance of the proposed estimating procedures; the results demonstrate that both procedures perform well in finite samples. Comparing the results from the empirical likelihood method with those from the profile nonlinear least squares method, the empirical likelihood method performs better in terms of coverage probabilities and average widths of confidence bands. © 2016 Published by Elsevier B.V. on behalf of The Korean Statistical Society.

1. Introduction

In the last three decades, semiparametric models have received much attention because they combine the explanatory power of parametric models with the flexibility of nonparametric models. One of the most important semiparametric models is the additive partially linear model, which takes the form

$$Y = X^{T}\beta + \sum_{d=1}^{D} f_d(Z_d) + \varepsilon, \qquad (1.1)$$


where Y is the response variable, X ∈ R^p and Z = (Z_1, Z_2, ..., Z_D)^T ∈ R^D are covariates, β = (β_1, ..., β_p)^T is a p-dimensional vector of unknown parameters, f_1(·), ..., f_D(·) are unknown smooth functions, and ε is the random error with conditional mean zero given X and Z. The model assumes that the relationship between the response variable Y and the covariate X is linear. In many practical situations, however, a linear relationship is not flexible enough to capture the underlying dependence of Y on X. To address this problem, in this paper we extend the additive partially linear model to the so-called additive partially nonlinear model, which can be described as

$$Y = g(X, \beta) + \sum_{d=1}^{D} f_d(Z_d) + \varepsilon, \qquad (1.2)$$

where g(·, ·) is a pre-specified nonlinear function. Note that X and β in model (1.2) do not necessarily have the same dimension. The model retains the flexibility of the additive model and the interpretability of the nonlinear regression model.

Model (1.2) includes many important statistical models as special cases. For instance, it reduces to the additive model when g(·, ·) = 0. A variety of algorithms, such as the backfitting algorithm (Hastie & Tibshirani, 1990; Opsomer & Ruppert, 1999) and the marginal integration method (Linton & Nielsen, 1995), have been proposed for estimating the nonparametric functions in the additive model, and Opsomer and Ruppert (1997) studied the bivariate additive model by local polynomial regression. Moreover, if g(X, β) = X^T β, the additive partially nonlinear model becomes the additive partially linear model (1.1), which has been extensively studied in the literature. Besides the backfitting and marginal integration methods mentioned above, Li (2000) proposed a series estimation method for the parameter vector β in model (1.1). Liang, Thurston, and Ruppert (2008) proposed a correction-for-attenuation estimator for β in model (1.1) when X is measured with error. Guo, Tang, and Tian (2013) and Liu, Wang, and Liang (2011) studied variable selection for model (1.1). Furthermore, based on the empirical likelihood method, Liang, Su, and Thurston (2009), Wang, Chen, and Lin (2010) and Wei, Luo, and Wu (2012) considered the construction of confidence regions for β in model (1.1) when the covariate X is measured with additive errors. Zhao and Xue (2013) constructed confidence intervals for the nonparametric components when the linear covariate X is measured with and without errors. Zhou, Zhao, and Lin (2014) applied the empirical likelihood method to the longitudinal additive partially linear errors-in-variables model.

A special case of model (1.2) with D = 1, the partially nonlinear model, has gained much attention in recent years. Li and Nie (2007) proposed an estimation procedure for β through a nonlinear mixed-effects approach. Furthermore, Li and Nie (2008) developed two estimation procedures via profile nonlinear least squares and a linear approximation approach. Meanwhile, Huang and Chen (2008) considered the spline profile least squares estimator of β when the nonparametric function is approximated by some graduating functions. Xiao, Tian, and Li (2014) applied the empirical likelihood method to inference for the parameter vector and the nonparametric function. When there are no nonparametric functions f_1(·), ..., f_D(·), model (1.2) reduces to the well-known nonlinear regression model; for estimation and inference in that model we refer to Bates and Watts (1988) and the references therein.

In this paper, we are concerned with estimation and inference for the additive partially nonlinear model (1.2). Firstly, motivated by Li and Nie (2008), we propose a profile nonlinear least squares estimation procedure for the parameter vector β and the nonparametric functions in model (1.2), and establish the asymptotic properties of the resulting estimators. Specifically, we first rewrite model (1.2) as an additive model by treating β as known.
By means of the backfitting algorithm, we obtain pseudo-backfitting estimators of the nonparametric functions; the nonlinear least squares approach is then used to estimate the parameter vector β after replacing the nonparametric functions in model (1.2) with their pseudo-backfitting estimators. Secondly, we apply the empirical likelihood method to construct confidence regions for the parameter vector β and the nonparametric functions. It is well known that the empirical likelihood method introduced by Owen (1988, 1990) enjoys many nice features in the construction of confidence regions: it does not impose prior constraints on the shape of the region, it does not require the construction of a pivotal quantity, and it does not involve a plug-in estimator for the asymptotic covariance matrix. For these reasons, the empirical likelihood method has found applications in many kinds of models, such as linear models (Owen, 1991; Wang & Rao, 2002), partially linear models (Li & Xue, 2008; Shi & Lau, 2000; Xue & Xue, 2011), varying-coefficient partially linear models (Huang & Zhang, 2009; Wang, Li, & Lin, 2011; Wei & Mei, 2012; Zhao & Xue, 2009), and additive partially linear models (Wang et al., 2010; Wei et al., 2012; Zhou et al., 2014). Finally, we conduct some simulations to assess the performance of the proposed nonlinear least squares estimation procedure and the empirical likelihood method.

The rest of this paper is organized as follows. In Section 2, we introduce the profile nonlinear least squares procedure for the parameter vector β and the nonparametric functions, and study the relevant asymptotic properties. In Section 3, we apply the empirical likelihood method to the parameter vector β and the nonparametric functions: the empirical log-likelihood ratio for β is defined, its asymptotic distribution is derived, and the corresponding confidence region for β is constructed; furthermore, a residual adjusted empirical log-likelihood ratio for the nonparametric functions is developed and the corresponding confidence regions are constructed. Section 4 provides examples based on simulated data, in which the proposed nonlinear least squares estimation approach and the empirical likelihood method are compared in terms of coverage accuracy and the sizes of the resulting confidence regions. Concluding comments are given in Section 5, and the proofs of the main results are presented in the Appendix.


2. Profile nonlinear least squares inferences

2.1. Profile nonlinear least squares estimators

For notational simplicity, we only consider the case of D = 2 in model (1.2). Similar to the arguments in Liang et al. (2008), the procedures and the relevant asymptotic results can easily be extended to the case of D > 2. To ensure identifiability of the nonparametric functions, we assume that E(f_1(Z_1)) = E(f_2(Z_2)) = 0. Suppose the observed data {Y_i, X_i, Z_{i1}, Z_{i2}, i = 1, ..., n} are generated from the model

$$Y_i = g(X_i, \beta) + f_1(Z_{i1}) + f_2(Z_{i2}) + \varepsilon_i, \qquad i = 1, \ldots, n, \qquad (2.1)$$

where ε_1, ..., ε_n are independent and identically distributed random errors with zero mean and finite variance σ². If the true parameter vector β were known in advance, model (2.1) could be rewritten as

$$Y_i - g(X_i, \beta) = f_1(Z_{i1}) + f_2(Z_{i2}) + \varepsilon_i, \qquad i = 1, \ldots, n. \qquad (2.2)$$

Denote

$$Y = (Y_1, \ldots, Y_n)^T, \quad \varepsilon = (\varepsilon_1, \ldots, \varepsilon_n)^T, \quad g(X, \beta) = (g(X_1, \beta), \ldots, g(X_n, \beta))^T,$$
$$Z_1 = (Z_{11}, \ldots, Z_{n1})^T, \quad Z_2 = (Z_{12}, \ldots, Z_{n2})^T, \quad F_1 = (f_1(Z_{11}), \ldots, f_1(Z_{n1}))^T, \quad F_2 = (f_2(Z_{12}), \ldots, f_2(Z_{n2}))^T.$$

In matrix notation, (2.2) can be written as

$$Y - g(X, \beta) = F_1 + F_2 + \varepsilon. \qquad (2.3)$$

Similar to the discussion in Liang et al. (2008), let S_{1,z_1} and S_{2,z_2} denote the equivalent kernels for the local linear regression at z_1 and z_2, respectively. Then

$$S_{1,z_1} = e_1^T[(Z_1)^T\Omega_1 Z_1]^{-1}(Z_1)^T\Omega_1, \qquad S_{2,z_2} = e_1^T[(Z_2)^T\Omega_2 Z_2]^{-1}(Z_2)^T\Omega_2,$$

where e_1^T = (1, 0),

$$\Omega_1 = \mathrm{diag}(K_{h_1}(Z_{11} - z_1), \ldots, K_{h_1}(Z_{n1} - z_1)), \qquad \Omega_2 = \mathrm{diag}(K_{h_2}(Z_{12} - z_2), \ldots, K_{h_2}(Z_{n2} - z_2)),$$

K_{h_i}(·) = K(·/h_i)/h_i, i = 1, 2, K(·) is a kernel function, and h_1 and h_2 are bandwidths. Moreover, with a slight abuse of notation,

$$Z_1 = \begin{pmatrix} 1 & Z_{11} - z_1 \\ 1 & Z_{21} - z_1 \\ \vdots & \vdots \\ 1 & Z_{n1} - z_1 \end{pmatrix}, \qquad Z_2 = \begin{pmatrix} 1 & Z_{12} - z_2 \\ 1 & Z_{22} - z_2 \\ \vdots & \vdots \\ 1 & Z_{n2} - z_2 \end{pmatrix}$$

are the n × 2 local design matrices. Let S_1 and S_2 denote the smoother matrices whose rows are the equivalent kernels at the observations Z_1 and Z_2, respectively, namely

$$S_1 = \begin{pmatrix} S_{1,Z_{11}} \\ \vdots \\ S_{1,Z_{n1}} \end{pmatrix}, \qquad S_2 = \begin{pmatrix} S_{2,Z_{12}} \\ \vdots \\ S_{2,Z_{n2}} \end{pmatrix}.$$

Using the backfitting algorithm, we obtain the pseudo-backfitting estimators of F_1 and F_2, denoted by F̂_1(β) = (f̂_1(Z_{11}, β), ..., f̂_1(Z_{n1}, β))^T and F̂_2(β) = (f̂_2(Z_{12}, β), ..., f̂_2(Z_{n2}, β))^T, which can be expressed as

$$\hat{F}_1(\beta) = \{I - (I - S_1^c S_2^c)^{-1}(I - S_1^c)\}(Y - g(X, \beta)),$$
$$\hat{F}_2(\beta) = \{I - (I - S_2^c S_1^c)^{-1}(I - S_2^c)\}(Y - g(X, \beta)), \qquad (2.4)$$

where S_i^c = (I - 11^T/n)S_i is the centered smoothing matrix corresponding to S_i, i = 1, 2, and 1 denotes the n × 1 vector with all entries 1. Motivated by the idea of Li and Nie (2008), minimizing the profile nonlinear least squares function

$$Q(\beta) = \sum_{i=1}^{n} \{Y_i - \hat{f}_1(Z_{i1}, \beta) - \hat{f}_2(Z_{i2}, \beta) - g(X_i, \beta)\}^2 \qquad (2.5)$$


with respect to β yields the profile nonlinear least squares estimator β̂ of β, that is,

$$\hat{\beta} = \arg\min_{\beta} Q(\beta). \qquad (2.6)$$

Substituting β̂ into F̂_1(β) and F̂_2(β), respectively, we get the estimators F̂_1(β̂) and F̂_2(β̂), that is,

$$\hat{F}_1(\hat{\beta}) = \{I - (I - S_1^c S_2^c)^{-1}(I - S_1^c)\}(Y - g(X, \hat{\beta})),$$
$$\hat{F}_2(\hat{\beta}) = \{I - (I - S_2^c S_1^c)^{-1}(I - S_2^c)\}(Y - g(X, \hat{\beta})). \qquad (2.7)$$

As a consequence, the estimator of the residual vector ε = (ε_1, ..., ε_n)^T can be represented as

$$\hat{\varepsilon} = (\hat{\varepsilon}_1, \ldots, \hat{\varepsilon}_n)^T = Y - \hat{F}_1(\hat{\beta}) - \hat{F}_2(\hat{\beta}) - g(X, \hat{\beta}). \qquad (2.8)$$
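To make the procedure in (2.4)–(2.6) concrete, the following is a minimal numerical sketch, not the authors' implementation: it builds the local linear smoother matrices, centers them, forms the pseudo-backfitting operator of (2.4), and minimizes the profile criterion (2.5). The function names, the Epanechnikov kernel default, and the use of scipy.optimize.minimize with Nelder–Mead are our own choices.

```python
import numpy as np
from scipy.optimize import minimize

def epanechnikov(u):
    """Epanechnikov kernel, the choice also used in Section 4."""
    return 0.75 * np.maximum(1.0 - u**2, 0.0)

def local_linear_smoother(z, h, kernel=epanechnikov):
    """n x n matrix whose i-th row is the equivalent kernel S_{d,z_i}
    of local linear regression evaluated at the observation z_i."""
    n = len(z)
    S = np.zeros((n, n))
    for i in range(n):
        d = z - z[i]                           # Z_{jd} - z_i
        w = kernel(d / h) / h                  # K_h(Z_{jd} - z_i)
        Z = np.column_stack([np.ones(n), d])   # local design matrix
        M = Z.T @ (w[:, None] * Z)             # (Z_d)^T Omega_d Z_d
        S[i] = np.linalg.solve(M, Z.T * w)[0]  # e_1^T M^{-1} (Z_d)^T Omega_d
    return S

def profile_nls(y, x, z1, z2, g, h1, h2, beta0):
    """Profile nonlinear least squares estimator (2.6) for model (2.1), D = 2."""
    n = len(y)
    center = np.eye(n) - np.ones((n, n)) / n
    S1c = center @ local_linear_smoother(z1, h1)   # centered smoother S_1^c
    S2c = center @ local_linear_smoother(z2, h2)
    # Backfitting operators from (2.4): F1_hat + F2_hat = W (Y - g(X, beta))
    W1 = np.eye(n) - np.linalg.solve(np.eye(n) - S1c @ S2c, np.eye(n) - S1c)
    W2 = np.eye(n) - np.linalg.solve(np.eye(n) - S2c @ S1c, np.eye(n) - S2c)
    I_minus_W = np.eye(n) - (W1 + W2)

    def Q(beta):                                   # profile criterion (2.5)
        e = I_minus_W @ (y - g(x, beta))
        return e @ e

    return minimize(Q, beta0, method="Nelder-Mead").x
```

Here `g(x, beta)` is the user-supplied nonlinear function acting row-wise on the covariate matrix, e.g. `lambda x, b: np.exp(x @ b)` for the model used in Section 4.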

2.2. Asymptotic properties of the profile nonlinear least squares estimators

In this section, we establish the asymptotic behavior of β̂. The following assumptions are required to derive the main conclusions.

Assumption C1. The density functions of Z_1 and Z_2 are bounded away from zero and have bounded continuous second partial derivatives.

Assumption C2. Σ := Cov(ğ'(X_i, β)) is a positive-definite matrix, where ğ'(X_i, β) = g'(X_i, β) − E(g'(X_i, β)|Z_{i1}) − E(g'(X_i, β)|Z_{i2}) and g'(X_i, β) = ∂g(X_i, β)/∂β.

Assumption C3. sup_{z_1} E(‖g'(X_i, β)‖⁴ | Z_{i1} = z_1) < ∞, sup_{z_2} E(‖g'(X_i, β)‖⁴ | Z_{i2} = z_2) < ∞, E‖ğ'(X_i, β)‖⁴ < ∞, Eg'(X_i, β) = 0, E(ε_i | X_i, Z_{i1}, Z_{i2}) = 0 and E(ε_i⁴ | X_i, Z_{i1}, Z_{i2}) < ∞.

Assumption C4. The bandwidths h_1 and h_2 satisfy h_j → 0, nh_j³ → ∞ and nh_j⁸ → 0 for j = 1, 2.

Assumption C5. The kernel function K(·) is a symmetric density function with compact support and satisfies ∫K(s)ds = 1, ∫sK(s)ds = 0 and ∫s²K(s)ds < ∞.

Under Assumptions C1–C5, the following theorem states the asymptotic normality of β̂.

Theorem 2.1. Suppose that Assumptions C1–C5 hold. Then

$$\sqrt{n}(\hat{\beta} - \beta) \xrightarrow{D} N(0, \sigma^2\Sigma^{-1}),$$

where "→_D" denotes convergence in distribution.

Remark 2.1. From Theorem 2.1, the asymptotic covariance matrix of β̂ can be estimated consistently by combining the plug-in method with the sample moment method, that is, by

$$\frac{1}{n}\hat{\sigma}^2\hat{\Sigma}^{-1}, \qquad (2.9)$$

where Σ̂ = n^{-1}Σ_{i=1}^n ğ'(X_i, β̂)(ğ'(X_i, β̂))^T and σ̂² = n^{-1}Σ_{i=1}^n {Y_i − f̂_1(Z_{i1}, β̂) − f̂_2(Z_{i2}, β̂) − g(X_i, β̂)}². Then the confidence region for β can be constructed. More precisely, for any 0 < α < 1, let c_α be such that P(χ_p² ≤ c_α) = 1 − α; we can construct the 1 − α normal approximation confidence region for β as

$$\Im_{nor}(\alpha) = \{\beta \in \mathbb{R}^p : n\hat{\sigma}^{-2}(\hat{\beta} - \beta)^T\hat{\Sigma}(\hat{\beta} - \beta) \le c_\alpha\}. \qquad (2.10)$$
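As an illustration of Remark 2.1, here is a small sketch of the Wald-type membership check behind (2.10). The inputs `g_breve` (an n × p matrix of plug-in centered gradients ğ'(X_i, β̂)) and `sigma2_hat` are assumed to have been computed as described above; the helper name is hypothetical.

```python
import numpy as np
from scipy.stats import chi2

def in_na_region(beta, beta_hat, g_breve, sigma2_hat, alpha=0.05):
    """True if `beta` lies in the normal-approximation region (2.10)."""
    n, p = g_breve.shape
    Sigma_hat = g_breve.T @ g_breve / n         # Sigma_hat from Remark 2.1
    d = beta_hat - beta
    stat = n * d @ Sigma_hat @ d / sigma2_hat   # Wald statistic
    return stat <= chi2.ppf(1 - alpha, df=p)    # c_alpha cutoff
```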


3. Empirical likelihood procedures

3.1. Empirical likelihood for the parameter vector

In this section, we apply the empirical likelihood method to inference on the parameter vector β. Firstly, substituting (2.4) into (2.3), we have

$$(I - S)Y = (I - S)g(X, \beta) + \varepsilon, \qquad (3.1)$$

where S = {I − (I − S_1^c S_2^c)^{-1}(I − S_1^c)} + {I − (I − S_2^c S_1^c)^{-1}(I − S_2^c)}. Denote

$$\tilde{Y}_i = Y_i - \sum_{k=1}^{n} S_{ik}Y_k, \qquad \tilde{g}(X_i, \beta) = g(X_i, \beta) - \sum_{k=1}^{n} S_{ik}g(X_k, \beta), \qquad \tilde{g}'(X_i, \beta) = g'(X_i, \beta) - \sum_{k=1}^{n} S_{ik}g'(X_k, \beta),$$

where S_{ik} is the (i, k)th component of the matrix S. Then the auxiliary random vector for β can be introduced as

$$\xi_i(\beta) = \tilde{g}'(X_i, \beta)[\tilde{Y}_i - \tilde{g}(X_i, \beta)]. \qquad (3.2)$$

Therefore, the empirical likelihood ratio function for β can be defined by

$$R(\beta) = \max\left\{\prod_{i=1}^{n}(np_i) \,\Big|\, p_i \ge 0, \ \sum_{i=1}^{n} p_i = 1, \ \sum_{i=1}^{n} p_i\xi_i(\beta) = 0\right\}. \qquad (3.3)$$

Provided that 0 is inside the convex hull of the points (ξ_1(β), ..., ξ_n(β)), a unique value of R(β) exists. By the Lagrange multiplier method and some calculations, we obtain

$$\log R(\beta) = -\sum_{i=1}^{n}\log(1 + \lambda^T\xi_i(\beta)), \qquad (3.4)$$

where λ is determined by

$$\frac{1}{n}\sum_{i=1}^{n}\frac{\xi_i(\beta)}{1 + \lambda^T\xi_i(\beta)} = 0. \qquad (3.5)$$
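Computationally, (3.4)–(3.5) amount to a small convex problem in λ. The sketch below is our own, not from the paper: it solves (3.5) by damped Newton iterations and returns −2 log R(β); `xi` is the n × p matrix whose rows are the ξ_i(β) of (3.2).

```python
import numpy as np

def neg2_log_el(xi, n_iter=100, tol=1e-10):
    """-2 log R(beta) from (3.4); lambda solves (3.5) by Newton's method.
    xi: n x p matrix whose i-th row is xi_i(beta). A bare-bones solver."""
    n, p = xi.shape
    lam = np.zeros(p)
    for _ in range(n_iter):
        denom = 1.0 + xi @ lam
        if np.any(denom <= 1e-8):      # keep all 1 + lambda' xi_i positive
            lam *= 0.5
            continue
        scaled = xi / denom[:, None]
        phi = scaled.mean(axis=0)      # left-hand side of (3.5)
        if np.linalg.norm(phi) < tol:
            break
        J = scaled.T @ scaled / n      # = -(d phi / d lambda)
        lam = lam + np.linalg.solve(J, phi)
    return 2.0 * np.sum(np.log1p(xi @ lam))
```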

The following theorem shows that −2 log R(β) is asymptotically χ²-distributed.

Theorem 3.1. Suppose that Assumptions C1–C5 hold. Then

$$-2\log R(\beta) \xrightarrow{D} \chi_p^2,$$

where χ_p² denotes the chi-square distribution with p degrees of freedom.

Therefore, we can construct a confidence region for β from Theorem 3.1: for any 0 < α < 1,

$$\Im_{el}(\alpha) = \{\beta \in \mathbb{R}^p : -2\log R(\beta) \le c_\alpha\} \qquad (3.6)$$

constitutes a confidence region for β with asymptotic coverage probability 1 − α.

Remark 3.1. Compared with the confidence region based on (2.10), the confidence region based on (3.6) is not predetermined to be symmetric, so it can better reflect the true shape of the underlying distribution, and there is no need to estimate the asymptotic covariance matrix of β̂.

3.2. Empirical likelihood for the nonparametric component

In this section, we consider how to apply the empirical likelihood method to the nonparametric function f_1(z_1). For f_2(z_2), the methods and results are similar and are thus omitted. Note that F̂_1 and F̂_2 estimate f_1(·) and f_2(·) only at the observations Z_1 and Z_2, respectively. Then, as in Liang et al. (2008) and Zhou, Jiang, and Qian (2011), two-stage backfitting estimators of f_1(z_1) and f_2(z_2) can be defined as

$$\hat{f}_1(z_1) = (1, 0)[(Z_1)^T\Omega_1 Z_1]^{-1}(Z_1)^T\Omega_1(I - S_1^cS_2^c)^{-1}(I - S_2^c)(Y - g(X, \hat{\beta})),$$
$$\hat{f}_2(z_2) = (1, 0)[(Z_2)^T\Omega_2 Z_2]^{-1}(Z_2)^T\Omega_2(I - S_2^cS_1^c)^{-1}(I - S_1^c)(Y - g(X, \hat{\beta})). \qquad (3.7)$$
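A sketch of evaluating the two-stage backfitting estimator f̂_1(z_1) of (3.7) on a grid follows; `resid`, `S1c` and `S2c` are assumed to come from the profile fit sketched in Section 2, and the function name is our own.

```python
import numpy as np

def f1_hat_curve(z_grid, z1, h1, S1c, S2c, resid,
                 kernel=lambda u: 0.75 * np.maximum(1.0 - u**2, 0.0)):
    """Two-stage backfitting estimate of f1 on a grid, following (3.7).
    resid = Y - g(X, beta_hat); S1c, S2c are the centered smoother matrices."""
    n = len(z1)
    # (I - S1c S2c)^{-1} (I - S2c) (Y - g(X, beta_hat))
    partial = np.linalg.solve(np.eye(n) - S1c @ S2c, (np.eye(n) - S2c) @ resid)
    out = np.empty(len(z_grid))
    for j, z in enumerate(z_grid):
        d = z1 - z
        w = kernel(d / h1) / h1
        Z = np.column_stack([np.ones(n), d])
        M = Z.T @ (w[:, None] * Z)                           # (Z_1)^T Omega_1 Z_1
        out[j] = np.linalg.solve(M, Z.T @ (w * partial))[0]  # (1, 0) component
    return out
```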


To motivate the construction of the empirical likelihood ratio function for f_1(z_1), we first employ the constraint E{Y_i − g(X_i, β) − f_1(Z_{i1}) − f_2(Z_{i2}) | Z_{i1} = z_1}p_1(z_1) = 0, where p_1(z_1) is the density function of Z_1. Then the auxiliary random vector for f_1(z_1) can be defined as

$$\eta_i(f_1(z_1)) = [Y_i - g(X_i, \beta) - f_2(Z_{i2}) - f_1(z_1)]K_{h_1}(Z_{i1} - z_1). \qquad (3.8)$$

Note that η_i(f_1(z_1)), i = 1, ..., n, are independent; by the law of total expectation, Eη_i(f_1(z_1)) = 0, so an empirical likelihood ratio function of f_1(z_1) can be defined. However, η_i(f_1(z_1)) cannot be used directly for statistical inference on f_1(z_1) because it involves the unknown β and f_2(·). A natural way to deal with this problem is to replace β and f_2(·) with their estimators β̂ and f̂_2(·), respectively. Thus, we obtain the estimated auxiliary random vector

$$\tilde{\eta}_i(f_1(z_1)) = [Y_i - g(X_i, \hat{\beta}) - \hat{f}_2(Z_{i2}) - f_1(z_1)]K_{h_1}(Z_{i1} - z_1), \qquad (3.9)$$

and the corresponding estimated empirical log-likelihood ratio function can be expressed as

$$\tilde{l}(f_1(z_1)) = \max\left\{\sum_{i=1}^{n}\log(np_i) \,\Big|\, p_i \ge 0, \ \sum_{i=1}^{n} p_i = 1, \ \sum_{i=1}^{n} p_i\tilde{\eta}_i(f_1(z_1)) = 0\right\}. \qquad (3.10)$$

Because η̃_i(f_1(z_1)) contains the nonparametric estimator f̂_2(·), arguments similar to those of Xue and Zhu (2007) show that l̃(f_1(z_1)) is not asymptotically standard chi-squared unless we choose a relatively fast decay rate for the bandwidth h_1 (i.e., undersmoothing). For the Wilks phenomenon to hold, we propose the residual-adjusted auxiliary random vector

$$\hat{\eta}_i(f_1(z_1)) = [Y_i - g(X_i, \hat{\beta}) - \hat{f}_2(Z_{i2}) - f_1(z_1) - (\hat{f}_1(Z_{i1}) - \hat{f}_1(z_1))]K_{h_1}(Z_{i1} - z_1). \qquad (3.11)$$

Therefore, the residual-adjusted empirical log-likelihood ratio function for f_1(z_1) can be defined as

$$l(f_1(z_1)) = \max\left\{\sum_{i=1}^{n}\log(np_i) \,\Big|\, p_i \ge 0, \ \sum_{i=1}^{n} p_i = 1, \ \sum_{i=1}^{n} p_i\hat{\eta}_i(f_1(z_1)) = 0\right\}. \qquad (3.12)$$

By the Lagrange multiplier method, l(f_1(z_1)) can be represented as

$$l(f_1(z_1)) = -\sum_{i=1}^{n}\log(1 + \gamma^T\hat{\eta}_i(f_1(z_1))), \qquad (3.13)$$

where γ is determined by

$$\frac{1}{n}\sum_{i=1}^{n}\frac{\hat{\eta}_i(f_1(z_1))}{1 + \gamma^T\hat{\eta}_i(f_1(z_1))} = 0. \qquad (3.14)$$
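Since η̂_i(f_1(z_1)) is scalar, the multiplier equation (3.14) is one-dimensional and monotone in γ, so it can be solved by bisection. A minimal sketch, assuming the caller has already formed the residual-adjusted terms of (3.11) and that they take both signs:

```python
import numpy as np

def neg2_log_el_f1(eta):
    """-2 l(f1(z1)) from (3.13), where eta[i] is the residual-adjusted term
    eta_hat_i(f1(z1)) of (3.11). Requires eta to take both signs, so that a
    feasible gamma solving (3.14) exists."""
    lo = -1.0 / eta.max() + 1e-8   # feasible range keeps 1 + gamma*eta_i > 0
    hi = -1.0 / eta.min() - 1e-8
    for _ in range(200):           # bisection; phi is decreasing in gamma
        mid = 0.5 * (lo + hi)
        phi = np.mean(eta / (1.0 + mid * eta))  # left side of (3.14)
        if phi > 0:
            lo = mid
        else:
            hi = mid
    gamma = 0.5 * (lo + hi)
    return 2.0 * np.sum(np.log1p(gamma * eta))
```

The point-wise interval ℑ(α) below is then, in practice, the set of trial values f_1(z_1) whose returned statistic does not exceed d_α.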

Remark 3.2. Although the local linear empirical likelihood proposed by Chen and Qin (2000) can improve the accuracy of the empirical likelihood method, the residual adjustment of the local constant empirical likelihood in (3.11) is simpler, and the estimation accuracy does not decrease. The simulation results in Section 4 indicate that our method is workable.

The following theorem shows that −2l(f_1(z_1)) is asymptotically χ²-distributed.

Theorem 3.2. Suppose that Assumptions C1–C5 hold and that σ²(z_1) = E(ε² | Z_1 = z_1) is a continuous function of z_1. Then

$$-2l(f_1(z_1)) \xrightarrow{D} \chi_1^2. \qquad (3.15)$$

As a consequence of Theorem 3.2, a point-wise confidence interval for f_1(z_1) can be constructed. More precisely, for any 0 < α < 1, let d_α be such that P(χ_1² ≤ d_α) = 1 − α. Then

$$\Im(\alpha) = \{f_1(z_1) : -2l(f_1(z_1)) \le d_\alpha\}$$

constitutes a point-wise confidence interval for f_1(z_1) with asymptotic coverage probability 1 − α.


Fig. 1. The 95% confidence regions for the parametric components based on the EL method (dotted curves) and the NA method (dashed curves).

4. Simulation studies

In this section, we assess the performance of the proposed empirical likelihood method by some simulation experiments. In the following simulations, the kernel function is taken as K(u) = 0.75(1 − u²)_+, and the bandwidths are chosen by the cross-validation criterion. We simulate data from the model

$$Y = g(X_1, X_2; \beta_1, \beta_2) + f_1(Z_1) + f_2(Z_2) + \varepsilon, \qquad (4.1)$$

where f_1(Z_1) = sin(2πZ_1), f_2(Z_2) = 10(Z_2 − 0.5)², and g(X_1, X_2; β_1, β_2) = exp{X_1β_1 + X_2β_2} with β_1 = 1 and β_2 = 1. The covariates are drawn as Z_1 ∼ U(0, 1), Z_2 ∼ U(0, 1), X_1 ∼ U(−1, 1) and X_2 ∼ U(−1, 1), and Y is generated according to model (4.1) with error term ε ∼ N(0, 0.5). The sample size n is taken as 150 and 300; for each case, we draw 1000 random samples from the above model.

Firstly, to evaluate the performance of the confidence regions for the parametric component β = (β_1, β_2)^T, two methods are compared: the empirical likelihood (EL) method based on Theorem 3.1 and the normal approximation (NA) method based on Theorem 2.1. The averages of the 95% confidence regions, based on the 1000 simulation runs, are presented in Fig. 1. The basic idea of the empirical likelihood confidence region plotting is very similar to Owen (1990). More specifically, we first choose a 50 × 50 uniform grid of points in the region [0.8, 1.2] × [0.8, 1.2]. Secondly, for each simulation, we calculate the empirical log-likelihood ratio at each grid point. Then, based on the 1000 simulation runs, we obtain 1000 empirical log-likelihood ratio values at each grid point. Lastly, we average the 1000 values at each grid point and plot the c_{0.95} contour line of these averages, where c_{0.95} satisfies P(χ_2² ≤ c_{0.95}) = 0.95. This contour line is the average of the 95% confidence regions over the 1000 simulation runs; more details on this plotting method can be found in Owen (1990).

Furthermore, the corresponding coverage probabilities of the resulting confidence regions are also computed. The simulation results show that the coverage probabilities based on the EL method and the NA method are 0.936 and 0.932, respectively, when n = 150; for n = 300, they are 0.947 and 0.945, respectively. From Fig. 1, it is clear that the EL method gives smaller confidence regions than the NA method does, while the coverage probabilities of the two methods differ only slightly. This implies that the estimation of the asymptotic covariance used in the NA method affects the accuracy of the NA-based confidence regions, especially when the sample size is small (n = 150). In all, the results in Fig. 1 imply that the EL method performs better than the NA method for inference on the parametric components.

Next, we evaluate the performance of the EL-based point-wise confidence intervals for the nonparametric components f_1(z_1) and f_2(z_2). The averages of the 95% point-wise confidence intervals and the corresponding coverage probabilities are computed based on 1000 simulation runs; the results for f_1(z_1) and f_2(z_2) are shown in Figs. 2 and 3, respectively. From Figs. 2 and 3, we can see that the lengths of the resulting point-wise confidence intervals become shorter as n increases, and the corresponding coverage probabilities become closer to the confidence level 0.95. These results imply that the proposed EL method for the nonparametric components is workable.
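The simulation design is straightforward to reproduce. Below is a sketch of the data-generating step only, reading ε ∼ N(0, 0.5) as variance 0.5; the bandwidth selection and the EL computations can reuse the earlier sketches, and the function name is our own.

```python
import numpy as np

def simulate(n, beta=(1.0, 1.0), sigma2=0.5, rng=None):
    """Draw one sample of size n from model (4.1)."""
    rng = np.random.default_rng(rng)
    z1, z2 = rng.uniform(0, 1, n), rng.uniform(0, 1, n)
    x1, x2 = rng.uniform(-1, 1, n), rng.uniform(-1, 1, n)
    f1, f2 = np.sin(2 * np.pi * z1), 10 * (z2 - 0.5) ** 2
    g = np.exp(x1 * beta[0] + x2 * beta[1])
    y = g + f1 + f2 + rng.normal(0.0, np.sqrt(sigma2), n)
    return y, np.column_stack([x1, x2]), z1, z2
```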

Fig. 2. The 95% point-wise confidence intervals and the corresponding coverage probabilities for the nonparametric component f1(z1) based on the EL method with n = 150 (dashed curves) and n = 300 (dotted curves), where the solid curve is the real curve.

Fig. 3. The 95% point-wise confidence intervals and the corresponding coverage probabilities for the nonparametric component f2(z2) based on the EL method with n = 150 (dashed curves) and n = 300 (dotted curves), where the solid curve is the real curve.

5. Conclusion

In this paper, we have extended the additive partially linear model to the additive partially nonlinear model, which is in fact a combination of the additive model and the nonlinear regression model. Such a model can describe a more complex relationship between the response variable and the covariates, and is an essential generalization of the additive partially linear model. We have developed a profile nonlinear least squares estimation procedure for the parameter vector and the nonparametric functions of the additive partially nonlinear model and established the asymptotic properties of the resulting estimators. Furthermore, we applied the empirical likelihood method to the additive partially nonlinear model. An empirical log-likelihood ratio for the parameter vector in the nonlinear function has been proposed and the nonparametric version of the Wilks theorem has been obtained. Meanwhile, a residual adjusted empirical likelihood ratio for the nonparametric functions has been proposed, and the corresponding Wilks phenomenon has been proved. Confidence regions for the parameter vector and the nonparametric components have been constructed. Some simulations have been conducted to assess the performance of the proposed procedures; the results demonstrate that both procedures perform well in finite samples. By comparing the results from the empirical likelihood method with those from the profile nonlinear least squares method, we conclude that the empirical likelihood method performs better in terms of the resulting confidence regions.

Acknowledgments

The research was supported by the Shandong Provincial Natural Science Foundation of China (Grant Nos. ZR2014AL005 and ZR2012AL05) and the National Natural Science Foundation of China (Grant Nos. 11426057, 11301309 and 11301569). The authors gratefully acknowledge the constructive comments and suggestions from the editor, the associate editor and the two reviewers, which led to a significant improvement of the paper.

Appendix. Proofs

Before embarking on the proofs of the main conclusions, we first prove some lemmas. For convenience and simplicity, denote

$$v_{dn} = \left\{(nh_d)^{-1}\log(h_d^{-1})\right\}^{1/2}, \quad c_{dn} = v_{dn} + h_d^2, \quad d = 1, 2, \quad c_n = c_{1n} + c_{2n}, \quad \tilde{\varepsilon} = (I - S)\varepsilon, \quad \mu_k = \int s^k K(s)\,ds, \ k = 0, 1, 2.$$

Lemma A.1. Let (X_1, Y_1), ..., (X_n, Y_n) be i.i.d. random vectors, where the Y_i are scalar random variables. Further assume that E|Y_i|^s < ∞ and sup_x ∫|y|^s p(x, y)dy < ∞, where p(x, y) denotes the joint density of (X, Y). Let K(·) be a bounded positive function with bounded support, satisfying a Lipschitz condition. Given that n^{2δ−1}h → ∞ for some δ < 1 − s^{-1}, then

$$\sup_x\left|\frac{1}{n}\sum_{i=1}^{n}\{K_h(X_i - x)Y_i - E[K_h(X_i - x)Y_i]\}\right| = O_p\left(\left\{\frac{\log(1/h)}{nh}\right\}^{1/2}\right).$$


This follows immediately from the result obtained by Mack and Silverman (1982).

Lemma A.2. Let D_1, ..., D_n be independent random variables. If sup_i E|D_i|^s is bounded for some s > 1, then max_{1≤i≤n}|D_i| = o(n^{1/s}) a.s.

The proof of Lemma A.2 can be found in Shi and Lau (2000).

Lemma A.3. Suppose that Assumptions C1–C5 hold. Then the following asymptotic approximations hold uniformly over all the elements of the matrices:

$$S_d^c = S_d - 11^T/n + o(11^T/n) \quad a.s., \qquad d = 1, 2,$$

and

$$(I - S_1^cS_2^c)^{-1} = I + O(11^T/n) \quad a.s.;$$

the second equation likewise holds for (I − S_2^cS_1^c)^{-1}.

This lemma combines Lemmas 3.1 and 3.2 of Opsomer and Ruppert (1997); see the proofs in that reference.

Lemma A.4. Suppose that Assumptions C1–C5 hold. Then, as n → ∞,

$$\frac{1}{n}g'(X, \beta)(I - S)^T(I - S)g'(X, \beta)^T \xrightarrow{P} \Sigma. \qquad (A.1)$$

Proof. It is easy to obtain

$$(Z_d)^T\Omega_d Z_d = \begin{pmatrix} \sum_{i=1}^{n} K_{h_d}(Z_{id} - z_d) & \sum_{i=1}^{n}(Z_{id} - z_d)K_{h_d}(Z_{id} - z_d) \\ \sum_{i=1}^{n}(Z_{id} - z_d)K_{h_d}(Z_{id} - z_d) & \sum_{i=1}^{n}(Z_{id} - z_d)^2K_{h_d}(Z_{id} - z_d) \end{pmatrix}, \qquad d = 1, 2.$$

Let p_d(·) be the density of Z_d, d = 1, 2. Each element of the above matrix has the form of a kernel regression, so by Lemma A.1 we have

$$\sum_{i=1}^{n} K_{h_d}(Z_{id} - z_d) = n\{E[K_{h_d}(Z_{id} - z_d)] + O_p(v_{dn})\} = n\left\{\int K_{h_d}(u - z_d)p_d(u)\,du + O_p(v_{dn})\right\}$$
$$= n\left\{\int K(s)p_d(z_d + sh_d)\,ds + O_p(v_{dn})\right\} = n[p_d(z_d) + O(h_d^2) + O_p(v_{dn})] = np_d(z_d)(1 + O_p(c_{dn})),$$

$$\sum_{i=1}^{n}(Z_{id} - z_d)K_{h_d}(Z_{id} - z_d) = n\left\{\int(u - z_d)K_{h_d}(u - z_d)p_d(u)\,du + O_p(v_{dn})\right\} = n\left\{\int sh_dK(s)p_d(z_d + sh_d)\,ds + O_p(v_{dn})\right\}$$
$$= n\left\{\int sh_dK(s)[p_d(z_d) + sh_dp_d'(z_d) + O(h_d^2)]\,ds + O_p(v_{dn})\right\} = n\left\{h_d\int sK(s)\,ds \cdot p_d(z_d) + O_p(c_{dn})\right\} = nO_p(c_{dn}),$$

since ∫sK(s)ds = 0. By the same argument, we can get

$$\sum_{i=1}^{n}(Z_{id} - z_d)^2K_{h_d}(Z_{id} - z_d) = np_d(z_d)\mu_2(1 + O_p(c_{dn})), \qquad d = 1, 2.$$

These arguments imply

$$(Z_d)^T\Omega_d Z_d = np_d(z_d) \otimes \mathrm{diag}(1, \mu_2)\left(1 + O_p(c_{dn})\right),$$


where ⊗ denotes the Kronecker product. Similarly, it is easy to obtain

$$(Z_d)^T\Omega_d g'(X, \beta)^T = \begin{pmatrix} \sum_{i=1}^{n} g'(X_i, \beta)^TK_{h_d}(Z_{id} - z_d) \\ \sum_{i=1}^{n} g'(X_i, \beta)^T(Z_{id} - z_d)K_{h_d}(Z_{id} - z_d) \end{pmatrix},$$

with

$$\sum_{i=1}^{n} g'(X_i, \beta)^TK_{h_d}(Z_{id} - z_d) = n\left\{E[g'(X, \beta)^TK_{h_d}(Z_d - z_d)] + O_p(v_{dn})\right\} = n\left\{E(g'(X, \beta)^T|Z_d)E[K_{h_d}(Z_d - z_d)] + O_p(v_{dn})\right\}$$
$$= nE(g'(X, \beta)^T|Z_d)[p_d(z_d) + O(h_d^2) + O_p(v_{dn})] = np_d(z_d)E(g'(X, \beta)^T|Z_d)\{1 + O_p(c_{dn})\}.$$

Using the same argument, the second entry is of smaller order, and thus we have

$$(Z_d)^T\Omega_d g'(X, \beta)^T = np_d(z_d)E(g'(X, \beta)^T|Z_d) \otimes (1, 0)^T\left(1 + O_p(c_{dn})\right).$$

So

$$S_d\,g'(X, \beta)^T = E(g'(X, \beta)^T|Z_d)\{1 + O_p(c_{dn})\}, \qquad d = 1, 2.$$

By Lemma A.3 and direct calculations, we have

$$\frac{1}{n}g'(X, \beta)(I - S)^T(I - S)g'(X, \beta)^T = E\{g'(X, \beta)^T - E(g'(X, \beta)^T|Z_1) - E(g'(X, \beta)^T|Z_2)\}^{\otimes 2}\{1 + O_p(c_n)\},$$

which proves (A.1). □

Lemma A.5. Suppose that Assumptions C1–C5 hold. Then

$$\frac{1}{\sqrt{n}}\sum_{i=1}^{n}\tilde{g}'(X_i, \beta)\tilde{\varepsilon}_i = \frac{1}{\sqrt{n}}\sum_{i=1}^{n}\breve{g}'(X_i, \beta)\varepsilon_i + o_p(1), \qquad (A.2)$$

$$\frac{1}{\sqrt{n}}\sum_{i=1}^{n}\tilde{g}'(X_i, \beta)\left[f_1(Z_{i1}) + f_2(Z_{i2}) - \sum_{j=1}^{n} S_{ij}\left(f_1(Z_{j1}) + f_2(Z_{j2})\right)\right] = o_p(1). \qquad (A.3)$$

Proof. With the same arguments as in Lemma A.4, we can obtain (A.2) easily. We now outline the proof of (A.3). Similar to the proof of Theorem 4.1 in Opsomer and Ruppert (1997), we can get

$$f_1(Z_{i1}) + f_2(Z_{i2}) - \sum_{j=1}^{n} S_{ij}[f_1(Z_{j1}) + f_2(Z_{j2})] = C_1h_1^2f_1''(z_1) + C_2h_2^2f_2''(z_2) + o_p(h_1^2 + h_2^2),$$

where C_1 and C_2 are constants. Combining this with

$$(I - S)g'(X_i, \beta) = g'(X_i, \beta) - E(g'(X_i, \beta)|Z_{i1}) - E(g'(X_i, \beta)|Z_{i2}) + O_p(c_n),$$

(A.3) is proved. □

Lemma A.6. Suppose that Assumptions C1–C5 hold. Then

$$\frac{1}{\sqrt{n}}\sum_{i=1}^{n}\xi_i(\beta) \xrightarrow{D} N(0, \Lambda), \qquad (A.4)$$

$$\frac{1}{n}\sum_{i=1}^{n}\xi_i(\beta)\xi_i(\beta)^T - \Lambda = o_p(1), \qquad (A.5)$$

$$\max_{1\le i\le n}\|\xi_i(\beta)\| = o_p(n^{1/2}), \qquad (A.6)$$

$$\frac{1}{n}\sum_{i=1}^{n}\|\xi_i(\beta)\|^3 = o_p(n^{1/2}), \qquad (A.7)$$

where Λ = E{ε[g'(X, β) − E(g'(X, β)|Z_1) − E(g'(X, β)|Z_2)]}^{⊗2} = σ²Σ.


Proof. By Lemma A.4, we have

$$\frac{1}{\sqrt{n}}\sum_{i=1}^{n}\xi_i(\beta) = \frac{1}{\sqrt{n}}\sum_{i=1}^{n}\tilde{g}'(X_i, \beta)[\tilde{Y}_i - \tilde{g}(X_i, \beta)] = \frac{1}{\sqrt{n}}g'(X, \beta)^T(I - S)^T(I - S)(Y - g(X, \beta)) = \sqrt{n}\,[\xi_n + O_p(c_n^2)],$$

where ξ_n = (1/n)Σ_{i=1}^n ε_i[g'(X_i, β) − E(g'(X_i, β)|Z_{i1}) − E(g'(X_i, β)|Z_{i2})]. It is easy to obtain Eξ_n = 0 and Cov(ξ_n) = n^{-1}E{(g'(X, β) − E(g'(X, β)|Z_1) − E(g'(X, β)|Z_2))ε}^{⊗2}. Thus (1/√n)Σ_{i=1}^n ξ_i(β) →_D N(0, Λ), and (A.4) is proved. The proof of (A.5) is straightforward and is omitted here. Next we prove (A.6). Obviously,

$$\max_{1\le i\le n}\|\xi_i(\beta)\| \le \max_{1\le i\le n}\|\tilde{g}'(X_i, \beta)\| \cdot \max_{1\le i\le n}|\tilde{Y}_i - \tilde{g}(X_i, \beta)|,$$

and by Assumption C3 (finite fourth moments) together with Lemma A.2 applied with s = 4,

$$\max_{1\le i\le n}\|\tilde{g}'(X_i, \beta)\| = o(n^{1/4}), \qquad \max_{1\le i\le n}|\tilde{Y}_i - \tilde{g}(X_i, \beta)| = \max_{1\le i\le n}|\varepsilon_i| = o(n^{1/4}).$$

Thus (A.6) is proved. By (A.5) and (A.6), we can easily obtain (A.7). □

Proof of Theorem 2.1. We first prove the consistency of β̂. It suffices to prove that, for any sufficiently small a,

$$\lim_{n\to\infty} P\left[\sup_{\|t\|=a} Q(\beta + t) > Q(\beta)\right] = 1. \qquad (A.8)$$

By means of the Taylor expansion, we have

$$Q(\beta + t) - Q(\beta) = Q'(\beta)^Tt + \frac{1}{2}t^TQ''(\beta^*)t,$$

where β* lies between β and β + t, Q'(β) is the first-order derivative of Q(β) at β, and Q''(β*) is the second-order derivative of Q(β) at β*. We can show that

$$n^{-1}Q'(\beta) = -2n^{-1}g'(X, \beta)(I - S)^T(I - S)[Y - g(X, \beta)] = -2\xi_n + O_p(c_n^2). \qquad (A.9)$$

Thus Q'(β)^Tt = n[−2ξ_n + O_p(c_n²)]^Tt = [−2√n(√nξ_n) + O_p(nc_n²)]^Tt is of order

$$O_p(\sqrt{n}\|t\| + nc_n^2\|t\|) = O_p(\sqrt{n}(1 + \sqrt{n}c_n^2)\|t\|) = O_p(\sqrt{n}\|t\|),$$

since √n c_n² = o(1) under Assumption C4. Note that

$$t^TQ''(\beta)t = 2t^Tg'(X, \beta)^T(I - S)^T(I - S)g'(X, \beta)t - 2G(X, \beta)^T(I - S)^T(I - S)(\varepsilon + F_1 + F_2),$$

where G(X, β) = (G(X_1, β), G(X_2, β), ..., G(X_n, β))^T with G(X_i, β) = t^Tg''(X_i, β)t, i = 1, 2, ..., n. Hence, by Lemma A.4, it can be shown that

$$t^TQ''(\beta)t = 2n\{t^T\Sigma t + O_p(\|t\|^3)\}. \qquad (A.10)$$

Since Σ is positive definite, t^TQ''(β*)t dominates Q'(β)^Tt for sufficiently large n and sufficiently small a. Hence (A.8) holds.

Next we prove the asymptotic normality of β̂. It follows from the Taylor expansion that

$$0 = Q'(\hat{\beta}) = Q'(\beta) + Q''(\beta^*)^T(\hat{\beta} - \beta), \qquad (A.11)$$

where β* lies between β̂ and β. Using (A.10), under Assumptions C1–C5 we have

$$\frac{1}{2n}Q''(\beta^*) = \Sigma\{1 + o_p(1)\}.$$

It follows from (A.9) that

$$\sqrt{n}\,\Sigma\{1 + o_p(1)\}(\hat{\beta} - \beta) = \sqrt{n}\,\xi_n + O_p(\sqrt{n}c_n^2) = \sqrt{n}\,\xi_n + o_p(1),$$

since √n c_n² = o(1) under Assumption C4, and

$$\sqrt{n}\,\xi_n = \frac{1}{\sqrt{n}}\sum_{i=1}^{n}\{g'(X_i, \beta) - E(g'(X_i, \beta)|Z_{i1}) - E(g'(X_i, \beta)|Z_{i2})\}\varepsilon_i.$$

By the Slutsky theorem and the central limit theorem, we have

$$\sqrt{n}(\hat{\beta} - \beta) \xrightarrow{D} N(0, \sigma^2\Sigma^{-1}). \qquad \square$$

Proof of Theorem 3.1. We assume that (1/n)Σ_{i=1}^n ξ_i(β)ξ_i^T(β) is positive definite and that 0 belongs to the convex hull of {ξ_i(β), i = 1, ..., n}. Let

$$\phi(\lambda) = \frac{1}{n}\sum_{i=1}^{n}\frac{\xi_i(\beta)}{1 + \lambda^T\xi_i(\beta)} = 0. \qquad (A.12)$$

By Lemma A.6 and using the same arguments as in the proof of (2.14) in Owen (1990), we can prove ‖λ‖ = O_P(n^{-1/2}). By the identity (A.12), we obtain

$$0 = \phi(\lambda) = \frac{1}{n}\sum_{i=1}^{n}\xi_i(\beta)\left[1 - \lambda^T\xi_i(\beta) + \frac{(\lambda^T\xi_i(\beta))^2}{1 + \lambda^T\xi_i(\beta)}\right] = \bar{\xi} - V\lambda + \frac{1}{n}\sum_{i=1}^{n}\frac{\xi_i(\beta)(\lambda^T\xi_i(\beta))^2}{1 + \lambda^T\xi_i(\beta)},$$

where ξ̄ = (1/n)Σ_{i=1}^n ξ_i(β) and V = (1/n)Σ_{i=1}^n ξ_i(β)ξ_i(β)^T. From (A.7), we have

$$\frac{1}{n}\sum_{i=1}^{n}\|\xi_i(\beta)\|^3\|\lambda\|^2|1 + \lambda^T\xi_i(\beta)|^{-1} = o_p(n^{1/2})O_p(n^{-1})O_p(1) = o_p(n^{-1/2}).$$

Therefore λ = V^{-1}ξ̄ + o_p(n^{-1/2}). Applying the Taylor expansion formula, we get

$$-2\log R(\beta) = 2n\lambda^T\bar{\xi} - n\lambda^TV\lambda + 2\sum_{i=1}^{n}\tau_i = n\bar{\xi}^TV^{-1}\bar{\xi} + 2\sum_{i=1}^{n}\tau_i + o_p(1).$$

From (A.4) and (A.5), as n → ∞, we have nξ̄^TV^{-1}ξ̄ →_D χ_p² and

$$\left|2\sum_{i=1}^{n}\tau_i\right| \le 2C\|\lambda\|^3\sum_{i=1}^{n}\|\xi_i(\beta)\|^3 = O_p(n^{-3/2})o_p(n^{3/2}) = o_p(1).$$

Therefore −2 log R(β) →_D χ_p². This completes the proof. □

Lemma A.7. Suppose that Assumptions C1–C5 hold and that σ²(z_1) := E(ε²|Z_1 = z_1) is a continuous function of z_1. Then, as n → ∞,

$$\frac{1}{\sqrt{nh_1}}\sum_{i=1}^{n}\hat{\eta}_i(f_1(z_1)) \xrightarrow{D} N(0, B),$$

where B = σ²(z_1)p_1(z_1)ν_0 and ν_0 = ∫K²(s)ds.

Proof. Using some simple calculations, we can obtain

$$Y - g(X, \hat{\beta}) - \hat{F}_2 = (I - S_2^cS_1^c)^{-1}(I - S_2^c)(Y - g(X, \hat{\beta})) = (I - S_2^cS_1^c)^{-1}(I - S_2^c)[Y - g(X, \beta) + g(X, \beta) - g(X, \hat{\beta})]$$
$$= (I - S_2^cS_1^c)^{-1}(I - S_2^c)[g(X, \beta) - g(X, \hat{\beta}) + F_1 + F_2 + \varepsilon] := T_1 + T_2 + T_3,$$

where

$$T_1 = (I - S_2^cS_1^c)^{-1}(I - S_2^c)[g(X, \beta) - g(X, \hat{\beta})], \quad T_2 = (I - S_2^cS_1^c)^{-1}(I - S_2^c)(F_1 + F_2), \quad T_3 = (I - S_2^cS_1^c)^{-1}(I - S_2^c)\varepsilon.$$

Note that

$$(I - S_2^cS_1^c)^{-1} = I + o_p\left(\frac{1}{n}\right), \quad (I - S_2^c)(F_1 + F_2) = F_1 + O_p(h_1^2 + h_2^2), \quad (I - S_2^c)g'(X, \beta) = (g'(X, \beta) - E(g'(X, \beta)|Z_2))(I + o_p(1)).$$

We have

$$Y - g(X, \hat{\beta}) - \hat{F}_2 = \varepsilon + [g'(X, \beta) - g'(X, \hat{\beta}) - E(g'(X, \beta) - g'(X, \hat{\beta})|Z_2)][I + o_p(1)] + F_1 + O_p(h_1^2 + h_2^2).$$

Therefore

$$\frac{1}{\sqrt{nh_1}}\sum_{i=1}^{n}\hat{\eta}_i(f_1(z_1)) = \frac{1}{\sqrt{nh_1}}\sum_{i=1}^{n}\varepsilon_iK_{h_1}(Z_{i1} - z_1)$$
$$\quad + \frac{1}{\sqrt{nh_1}}\sum_{i=1}^{n}[g'(X_i, \beta) - g'(X_i, \hat{\beta}) - E(g'(X_i, \beta) - g'(X_i, \hat{\beta})|Z_{i2})]K_{h_1}(Z_{i1} - z_1)$$
$$\quad + \frac{1}{\sqrt{nh_1}}\sum_{i=1}^{n}[(f_1(Z_{i1}) - f_1(z_1)) - (\hat{f}_1(Z_{i1}) - \hat{f}_1(z_1))]K_{h_1}(Z_{i1} - z_1) + o_p(1)$$
$$:= D_1 + D_2 + D_3 + o_p(1).$$

Applying the Taylor expansion to g'(X_i, β) and g'(X_i, β̂) at β, respectively, we have

$$g'(X_i, \beta) - g'(X_i, \hat{\beta}) - E(g'(X_i, \beta) - g'(X_i, \hat{\beta})|Z_{i2}) = \{g''(X_i, \beta) - E[g''(X_i, \beta)|Z_{i2}]\}(\beta - \hat{\beta}) + o_p(\beta - \hat{\beta}).$$

It is easy to obtain that ED_1 = 0 and Cov(D_1) = B, and D_1 satisfies the Lindeberg condition; by means of the central limit theorem, the conclusion D_1 →_D N(0, B) can be obtained. Next we prove D_i = o_p(1), i = 2, 3. It is easy to obtain

$$\frac{1}{\sqrt{nh_1}}\sum_{i=1}^{n}[g''(X_i, \beta) - E(g''(X_i, \beta)|Z_{i2})]K_{h_1}(Z_{i1} - z_1) = O_p(1).$$

Since √n(β − β̂) = O_p(1), we have

$$\frac{\sqrt{h_1}}{\sqrt{nh_1}}\sum_{i=1}^{n}[g''(X_i, \beta) - E(g''(X_i, \beta)|Z_{i2})]K_{h_1}(Z_{i1} - z_1)\,\sqrt{n}(\beta - \hat{\beta}) = O_p(\sqrt{h_1}),$$

and hence D_2 = o_p(1).

Applying the Taylor expansion to f_1(Z_{i1}) − f_1(z_1) and f̂_1(Z_{i1}) − f̂_1(z_1) at z_1, respectively, we have

$$(f_1(Z_{i1}) - f_1(z_1)) - (\hat{f}_1(Z_{i1}) - \hat{f}_1(z_1)) = (f_1'(Z_{i1}) - f_1'(z_1))(Z_{i1} - z_1) + o_p(Z_{i1} - z_1).$$

Then

$$D_3 = \frac{1}{\sqrt{nh_1}}\sum_{i=1}^{n}(f_1'(Z_{i1}) - f_1'(z_1))(Z_{i1} - z_1)K_{h_1}(Z_{i1} - z_1) + \frac{1}{\sqrt{nh_1}}\sum_{i=1}^{n}K_{h_1}(Z_{i1} - z_1)o_p(Z_{i1} - z_1),$$

and it is easy to obtain that

$$\frac{1}{\sqrt{nh_1}}\sum_{i=1}^{n}(Z_{i1} - z_1)K_{h_1}(Z_{i1} - z_1) = O_p(1).$$

Similar to Theorem 3.1 in Zhou et al. (2011), we obtain f̂_1'(z_1) − f_1'(z_1) = o_p(1), and thus D_3 = o_p(1). This completes the proof of Lemma A.7. □

Lemma A.8. Suppose that Assumptions C1–C5 hold and that σ²(z_1) := E(ε²|Z_1 = z_1) is a continuous function of z_1. Then, as n → ∞,

$$\frac{1}{nh_1}\sum_{i=1}^{n}\hat{\eta}_i(f_1(z_1))\hat{\eta}_i(f_1(z_1))^T \xrightarrow{P} B, \qquad \gamma = O_p\left((nh_1)^{-\frac{1}{2}}\right).$$

The proof of Lemma A.8 can be found in Lemmas A.4 and A.5 of Zhao and Xue (2013) and is omitted here.

Proof of Theorem 3.2. Some simple calculations yield

$$-2l(f_1(z_1)) = \left(\frac{1}{\sqrt{nh_1}}\sum_{i=1}^{n}\hat{\eta}_i(f_1(z_1))\right)^T\left(\frac{1}{nh_1}\sum_{i=1}^{n}\hat{\eta}_i(f_1(z_1))\hat{\eta}_i(f_1(z_1))^T\right)^{-1}\left(\frac{1}{\sqrt{nh_1}}\sum_{i=1}^{n}\hat{\eta}_i(f_1(z_1))\right) + o_p(1).$$

Thus −2l(f_1(z_1)) →_D χ_1² can be obtained. This completes the proof of Theorem 3.2. □

References

Bates, D. M., & Watts, D. G. (1988). Nonlinear regression: iterative estimation and linear approximations. John Wiley & Sons, Inc.
Chen, S. X., & Qin, Y. S. (2000). Empirical likelihood confidence intervals for local linear smoothers. Biometrika, 87(4), 946–953.
Guo, J., Tang, M., Tian, M., et al. (2013). Variable selection in high-dimensional partially linear additive models for composite quantile regression. Computational Statistics & Data Analysis, 65, 56–67.
Hastie, T. J., & Tibshirani, R. J. (1990). Generalized additive models. CRC Press.
Huang, T. M., & Chen, H. (2008). Estimating the parametric component of nonlinear partial spline model. Journal of Multivariate Analysis, 99(8), 1665–1680.
Huang, Z., & Zhang, R. (2009). Empirical likelihood for nonparametric parts in semiparametric varying-coefficient partially linear models. Statistics & Probability Letters, 79(16), 1798–1808.
Li, Q. (2000). Efficient estimation of additive partially linear models. International Economic Review, 41(4), 1073–1092.
Li, R., & Nie, L. (2007). A new estimation procedure for a partially nonlinear model via a mixed effects approach. Canadian Journal of Statistics, 35(3), 399–411.
Li, R., & Nie, L. (2008). Efficient statistical inference procedures for partially nonlinear models and their applications. Biometrics, 64(3), 904–911.
Li, G., & Xue, L. (2008). Empirical likelihood confidence region for the parameter in a partially linear errors-in-variables model. Communications in Statistics—Theory and Methods, 37(10), 1552–1564.
Liang, H., Su, H., Thurston, S. W., et al. (2009). Empirical likelihood based inference for additive partial linear measurement error models. Statistics and Its Interface, 2, 83–90.
Liang, H., Thurston, S. W., Ruppert, D., et al. (2008). Additive partial linear models with measurement errors. Biometrika, 95(3), 667–678.
Linton, O., & Nielsen, J. P. (1995). A kernel method of estimating structured nonparametric regression based on marginal integration. Biometrika, 93–100.
Liu, X., Wang, L., & Liang, H. (2011). Estimation and variable selection for semiparametric additive partial linear models. Statistica Sinica, 21(3), 1225.
Mack, Y. P., & Silverman, B. W. (1982). Weak and strong uniform consistency of kernel regression estimates. Zeitschrift für Wahrscheinlichkeitstheorie und Verwandte Gebiete, 61, 405–415.
Opsomer, J. D., & Ruppert, D. (1997). Fitting a bivariate additive model by local polynomial regression. The Annals of Statistics, 186–211.
Opsomer, J. D., & Ruppert, D. (1999). A root-n consistent backfitting estimator for semiparametric additive modeling. Journal of Computational and Graphical Statistics, 8(4), 715–732.
Owen, A. (1988). Empirical likelihood ratio confidence intervals for a single functional. Biometrika, 75(2), 237–249.
Owen, A. (1990). Empirical likelihood ratio confidence regions. The Annals of Statistics, 90–120.
Owen, A. (1991). Empirical likelihood for linear models. The Annals of Statistics, 1725–1747.
Shi, J., & Lau, T. S. (2000). Empirical likelihood for partially linear models. Journal of Multivariate Analysis, 72(1), 132–148.
Wang, X., Chen, F., & Lin, L. (2010). Empirical likelihood inference for the parameter in additive partially linear EV models. Communications in Statistics—Theory and Methods, 39(19), 3513–3524.
Wang, X., Li, G., & Lin, L. (2011). Empirical likelihood inference for semi-parametric varying-coefficient partially linear EV models. Metrika, 73(2), 171–185.
Wang, Q., & Rao, J. N. K. (2002). Empirical likelihood-based inference under imputation for missing response data. The Annals of Statistics, 30(3), 896–924.
Wei, C., Luo, Y., & Wu, X. (2012). Empirical likelihood for partially linear additive errors-in-variables models. Statistical Papers, 53(2), 485–496.
Wei, C., & Mei, C. (2012). Empirical likelihood for partially linear varying-coefficient models with missing response variables and error-prone covariates. Journal of the Korean Statistical Society, 41(1), 97–103.
Xiao, Y., Tian, Z., & Li, F. (2014). Empirical likelihood-based inference for parameter and nonparametric function in partially nonlinear models. Journal of the Korean Statistical Society, 43(3), 367–379.
Xue, L., & Xue, D. (2011). Empirical likelihood for semiparametric regression model with missing response data. Journal of Multivariate Analysis, 102(4), 723–740.
Xue, L., & Zhu, L. (2007). Empirical likelihood semiparametric regression analysis for longitudinal data. Biometrika, 94(4), 921–937.
Zhao, P., & Xue, L. (2009). Empirical likelihood inferences for semiparametric varying-coefficient partially linear errors-in-variables models with longitudinal data. Journal of Nonparametric Statistics, 21(7), 907–923.
Zhao, P., & Xue, L. (2013). Empirical likelihood for nonparametric components in additive partially linear models. Communications in Statistics—Simulation and Computation, 42(9), 1935–1947.
Zhou, Z., Jiang, R., & Qian, W. (2011). Variable selection for additive partially linear models with measurement error. Metrika, 74(2), 185–202.
Zhou, X., Zhao, P., & Lin, L. (2014). Empirical likelihood for parameters in an additive partially linear errors-in-variables model with longitudinal data. Journal of the Korean Statistical Society, 43(1), 91–103.