Empirical likelihood-based inference in varying-coefficient single-index models


Journal of the Korean Statistical Society 40 (2011) 205–215


Zhensheng Huang
School of Mathematics, Hefei University of Technology, Hefei, Anhui, 230009, China

Article history: Received 9 April 2010; Accepted 27 September 2010; Available online 3 November 2010

AMS 2000 subject classifications: primary 62G08; secondary 62G20

Keywords: Confidence interval; Chi-squared distribution; Empirical likelihood; Least-squares method; Varying-coefficient single-index model

Abstract

This article deals with statistical inference on the varying-coefficient parts of varying-coefficient single-index models (VCSIM). To improve the accuracy of normal approximation-based confidence regions/intervals, an estimated empirical likelihood-based statistic is proposed. The resulting statistic is shown to be asymptotically chi-squared distributed. The construction of empirical likelihood-based confidence regions for the varying-coefficient components in the VCSIM is considered. In addition, pointwise confidence intervals for a single function of the varying-coefficient parts are constructed. A simulation study is carried out to compare our results with several existing methods.

© 2010 The Korean Statistical Society. Published by Elsevier B.V. All rights reserved.

1. Introduction

In multivariate regression analysis, to avoid the so-called "curse of dimensionality", various nonparametric and semiparametric models have been proposed. Examples include varying-coefficient models (Hastie & Tibshirani, 1993; Wu, Chiang, & Hoover, 1998), additive modeling (Hastie & Tibshirani, 1990), adaptive varying-coefficient partially linear regression models (Huang & Zhang, 2009), generalized partially linear single-index models (Carroll, Fan, Gijbels, & Wand, 1997), and so on. To capture the underlying relationship between the response variable and the covariates more accurately, Wong, Ip, and Zhang (2008) proposed the following varying-coefficient single-index model (VCSIM)

Y = g(β^T X) + θ^T(U) Z + ε,   (1)

where Y ∈ R is a response variable, U ∈ R, X = (X_1, . . . , X_q)^T ∈ R^q and Z = (Z_1, . . . , Z_p)^T ∈ R^p are covariates; g(·) is an unknown univariate measurable function, the unknown parameter β lies in R^q with ‖β‖ = 1 for model identifiability, θ(·) = (θ_1(·), . . . , θ_p(·))^T is a vector of unknown functions, and ε is a random error with E(ε|X, U, Z) = 0 and var(ε|X, U, Z) = σ²(U). The VCSIM is flexible enough to cover many important models as special cases. For example, if g(·) = 0, the VCSIM reduces to the standard varying-coefficient model; see Hastie and Tibshirani (1993) and Wu et al. (1998). When θ(·) is a constant parameter vector, (1) reduces to the partially linear single-index model; if, furthermore, that constant parameter vector is zero, (1) reduces to the pure single-index model. Relevant literature can be found in Carroll et al. (1997), Härdle, Hall, and Ichimura (1993), Yu and Ruppert (2002) and Zhu and Xue (2006). When g(x) = x, (1) becomes the varying-coefficient partially linear model; see Ahmad, Leelahanon, and Li (2005). In addition, the VCSIM includes the partially linear model when β = 1 and θ(·) is a constant parameter vector.

E-mail addresses: [email protected], [email protected].
doi:10.1016/j.jkss.2010.09.005


Based on the local linear method, the average method and the backfitting technique, Wong et al. (2008) studied the estimation of the unknown parameters and the unknown functions in the VCSIM, and derived their asymptotic distributions. Using these estimates, Huang and Zhang (2010) extended the generalized likelihood ratio (GLR) tests to the VCSIM, and showed that the newly proposed GLR tests follow the chi-squared distribution asymptotically, with scale constant and degrees of freedom independent of the nuisance parameters, a result known as the Wilks phenomenon. In this paper, we are mainly concerned with the construction of a confidence region for the varying coefficient θ(u). A natural method is to use the asymptotic normal distribution of θ̂(u) defined by Wong et al. (2008) to construct confidence regions/intervals for θ(u). However, the finite-sample performance of the resulting confidence regions/intervals may not be appealing because of the complicated variance structure, which must be estimated with estimates plugged in for several parameters. Taking this issue into account, we recommend using the empirical likelihood method to construct the confidence regions/intervals for θ(u). The empirical likelihood was introduced by Owen (1988) and its general properties were studied in Owen (1990). This method has many advantages over competitors such as the normal approximation-based method and the bootstrap method: firstly, the empirical likelihood does not involve a plug-in estimate for the limiting variance; secondly, the empirical likelihood-based confidence region does not impose prior constraints on the region shape, and the region is range preserving and transformation respecting (see Hall & La Scala, 1990); thirdly, as DiCiccio, Hall, and Romano (1991) proved, the empirical likelihood is Bartlett correctable and thus has an advantage over the bootstrap method.
Owing to these nice properties, applications of the empirical likelihood in parametric and nonparametric models have received a great amount of attention; see, for example, Chen and Hall (1993) and Zhu and Xue (2006); Owen (2001) provides a fairly comprehensive reference. More recently, Xue and Zhu (2007) considered empirical likelihood-based inference for the varying-coefficient model with longitudinal data. However, the tools used in the above literature cannot be applied directly to the VCSIM because of the complexity of the semiparametric model (1). In this article, our main purpose is to apply the empirical likelihood method to the VCSIM. Our proposed procedure generalizes the empirical likelihood procedure to a combination of the single-index model and the varying-coefficient model. This generalization is by no means straightforward, and we encountered several challenges. Firstly, to construct an estimated empirical log-likelihood, we have to provide estimates of the unknown function g(·) and the unknown parameter β; however, the estimators proposed by Wong et al. (2008) cannot be used to construct the empirical likelihood ratio function, since they depend on each other in a very complicated way. Secondly, the choice of the pilot estimator of β in Wong et al. (2008) is difficult, because different choices may result in radically different estimators, and the pilot estimator is often assumed to be root-n consistent although it is not given in a constructive way. In view of these issues, a stepwise estimation procedure for the VCSIM is proposed to obtain estimators of g(·) and β. Based on this, an estimated empirical log-likelihood ratio function is defined, and the confidence region for θ(u) can then be constructed. To demonstrate the performance of the proposed method, we conducted a comparison with the normal approximation-based method and the bootstrap method.
Compared with the existing methods, the confidence regions/intervals based on the proposed empirical likelihood method perform fairly well. In addition, the construction of simultaneous and bootstrap confidence bands for a single function of θ(u) is also considered.

The rest of this paper is organized as follows. In Section 2 the estimation procedure for the VCSIM is provided, an estimated empirical log-likelihood ratio function is defined and its asymptotic standard χ²-distribution is derived. Furthermore, the confidence regions/intervals for θ(u)/θ_r(u), r = 1, . . . , p, are constructed, and the simultaneous and bootstrap confidence bands for θ_r(u), r = 1, . . . , p, are also constructed. Section 3 provides examples based on simulated data, in which the proposed empirical likelihood method is compared with the existing methods in terms of coverage accuracy and the areas/widths of the confidence regions/intervals. The proofs of the main results are collected in the Appendix.

2. Methodology and results

In this section we present an estimated empirical likelihood ratio function for θ(u) in the VCSIM. Since the VCSIM contains the unknown nonparametric component g(·) and the unknown parameter β, we must first provide their estimators in order to apply the empirical likelihood method; we then define an estimated empirical likelihood ratio function and use it to construct the confidence regions/intervals for θ(u)/θ_r(u), r = 1, . . . , p. When a single component of the vector function θ(u) is of particular interest, we also consider the construction of confidence intervals for that component based on the NA method and the bootstrap method.

2.1. Estimation procedure for the VCSIM

Suppose that {(Y_i, X_i, U_i, Z_i)}_{i=1}^n is an i.i.d. sample from the VCSIM, i.e.

Y_i = g(β^T X_i) + Z_i^T θ(U_i) + ε_i,  i = 1, . . . , n,   (2)

where {ε_i}_{i=1}^n are i.i.d. random errors with E(ε_i | X_i, U_i, Z_i) = 0, X_i = (X_{i1}, . . . , X_{iq})^T, and Z_i = (Z_{i1}, . . . , Z_{ip})^T. Then we approximate θ_j(·) locally by a linear function θ_j(v) ≈ θ_j(u) + θ_j′(u)(v − u) ≡ a_j + b_j(v − u), j = 1, . . . , p, for v in a neighborhood of u. Throughout this article, we write K_{h_l}(·) = K(·/h_l) h_l^{−1}, where K(·) is a kernel function, L(·, ·) is a kernel


function on R² and h_l > 0 are bandwidths, l = 1, 2, 3. Following the ideas of Liang and Wang (2005) and Wang and Xue (2011), the estimation procedure for the VCSIM is as follows.

Step 1. Obtain the estimates of θ_j(·), j = 1, . . . , p. By model (2), it follows that E(Y_i | β^T X_i, U_i) = g(β^T X_i) + E^T(Z_i | β^T X_i, U_i) θ(U_i). This, along with (11), yields

Y_i − ψ_1(β^T X_i, U_i) = [Z_i − ψ_2(β^T X_i, U_i)]^T θ(U_i) + ε_i,   (3)

where ψ_1(t, u) = E(Y | β^T X = t, U = u) and ψ_2(t, u) = E(Z | β^T X = t, U = u). Then the local linear estimators of θ_j(·), j = 1, . . . , p, are defined as θ̈_j(u) = â_j and θ̈_j′(u) = b̂_j, where {(â_j, b̂_j)}_{j=1}^p minimize the weighted sum of squares

∑_{i=1}^n { Ỹ_i − ∑_{j=1}^p [a_j + b_j(U_i − u)] Z̃_{ij} }² K_{h_1}(U_i − u),   (4)

where Ỹ_i = Y_i − ψ̂_1(τ̂_1^T X_i, U_i; τ̂_1), Z̃_{ij} = Z_{ij} − ψ̂_{2j}(τ̂_2^T X_i, U_i; τ̂_2), and ψ̂_k(t, u; τ̂_k), k = 1, 2, are the Nadaraya–Watson kernel estimators of ψ_k(t, u), k = 1, 2, respectively, i.e.

ψ̂_1(t, u; τ̂_1) = ∑_{i=1}^n Y_i H_{ni}(t, u; τ̂_1),  ψ̂_2(t, u; τ̂_2) = ∑_{i=1}^n Z_i H_{ni}(t, u; τ̂_2),

where

H_{ni}(t, u; τ̂_k) = L( (τ̂_k^T X_i − t)/√h_2, (U_i − u)/√h_2 ) / ∑_{i=1}^n L( (τ̂_k^T X_i − t)/√h_2, (U_i − u)/√h_2 ),  k = 1, 2,

and τ̂_1 and τ̂_2 are two sliced inverse regression estimates of β corresponding to ψ_1(β^T X, U) and ψ_2(β^T X, U), respectively (see Remark 1). From (4), a simple calculation gives

( θ̈(u)^T, h_1 θ̈′(u)^T )^T = Φ_n^{−1}(u) η_n(u).   (5)

Furthermore,

θ̈_j(u) = e_{j,2p}^T Φ_n^{−1}(u) η_n(u),  j = 1, . . . , p,   (6)

where e_{j,2p} denotes the unit vector of length 2p with 1 at position j, Φ_n(u) = (Φ_{n,01}(u), Φ_{n,12}(u)), Φ_{n,01}(u) = (Φ_{n,0}^T(u), Φ_{n,1}^T(u))^T, Φ_{n,12}(u) = (Φ_{n,1}^T(u), Φ_{n,2}^T(u))^T, Φ_{n,j}(u) = (1/n) ∑_{i=1}^n Z̃_i Z̃_i^T ((U_i − u)/h_1)^j K_{h_1}(U_i − u), j = 0, 1, 2; η_n(u) = (η_{n,0}^T(u), η_{n,1}^T(u))^T, η_{n,j}(u) = (1/n) ∑_{i=1}^n Z̃_i ((U_i − u)/h_1)^j K_{h_1}(U_i − u) Ỹ_i, j = 0, 1, and Z̃_i = (Z̃_{i1}, . . . , Z̃_{ip})^T.

Remark 1. As to the two sliced inverse regression estimates τ̂_1 and τ̂_2 of β corresponding to ψ_1(β^T X, U) and ψ_2(β^T X, U), note that ψ_1(β^T X, U) = ψ_1(α_1^T X̆, α_2^T X̆) and ψ_2(β^T X, U) = ψ_2(α_1^T X̆, α_2^T X̆), where X̆ = (X^T, U)^T, α_1 = (β^T, 0)^T, α_2 = (0^T, 1)^T, and 0 denotes a zero vector; hence β can be estimated by using sliced inverse regression techniques (see Duan & Li, 1991).

Step 2. Obtain the estimates of g(·) and g′(·). Assuming that β is known and using the same method as in Step 1, we define the estimate of g(·) as ğ(t) = ă_0, where ă_0 is the first component of the minimizer of the weighted sum of squares

∑_{i=1}^n [ Y_i − Z_i^T θ̈(U_i) − a_0 − b_0(β^T X_i − t) ]² K_{h_3}(β^T X_i − t)   (7)

with respect to a_0 and b_0. A simple calculation then yields

ğ(t; β) = ∑_{i=1}^n W_{ni}(t; β) (Y_i − Z_i^T θ̈(U_i)),   (8)

where

W_{ni}(t; β) = V_{ni}(t; β) / ∑_{j=1}^n V_{nj}(t; β),

V_{ni}(t; β) = K_{h_3}(β^T X_i − t) [ S_{n,2}(t; β) − (β^T X_i − t) S_{n,1}(t; β) ],

and

S_{n,k}(t; β) = (1/n) ∑_{i=1}^n (β^T X_i − t)^k K_{h_3}(β^T X_i − t),  k = 0, 1, 2.
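Concretely, the minimization (4) and its closed form (5) amount to an ordinary kernel-weighted least-squares problem. The following is a minimal sketch under stated assumptions: the residualized data Ỹ_i and Z̃_{ij} are taken as already computed NumPy arrays, and the function name `local_linear_theta` and the default Epanechnikov kernel are illustrative choices of ours, not from the paper.

```python
import numpy as np

def local_linear_theta(u, U, Z, Y, h1,
                       kernel=lambda t: 0.75 * np.maximum(1.0 - t**2, 0.0)):
    """Local linear estimate of theta(u) and theta'(u), a sketch of (4)-(5).

    U : (n,) smoothing variable; Z : (n, p) residualized covariates (Z-tilde);
    Y : (n,) residualized responses (Y-tilde); h1 : bandwidth.
    """
    n, p = Z.shape
    w = kernel((U - u) / h1) / h1                 # K_{h1}(U_i - u)
    # design matrix for (a_1..a_p, b_1..b_p): columns Z_ij and Z_ij * (U_i - u)
    D = np.hstack([Z, Z * (U - u)[:, None]])
    WD = D * w[:, None]
    coef = np.linalg.solve(D.T @ WD, WD.T @ Y)    # weighted normal equations
    return coef[:p], coef[p:]                     # (theta-ddot(u), theta-ddot'(u))
```

At each grid point u this returns θ̈(u) and θ̈′(u) from one linear solve; (6) simply reads off the jth entry of the solution.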


Step 3. Obtain the estimate β̂ of β by minimizing the sum of squared errors

∑_{i=1}^n [ Y_i − Z_i^T θ̈(U_i) − ğ(β^T X_i; β) ]².   (9)

Step 4. Obtain the final estimate ĝ(·) of g(·). With β̂ obtained from Step 3, we define the estimate of g(t) as

ĝ(t) = ğ(t; β̂).   (10)
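To make Steps 2–4 concrete: for a fixed candidate β, the smoother (8) is a weighted average of the partial residuals Y_i − Z_i^T θ̈(U_i). A minimal sketch under stated assumptions follows — `g_breve` is our own name, `theta_fn` stands in for the Step 1 estimate θ̈(·), and the Epanechnikov default kernel is an illustrative choice.

```python
import numpy as np

def g_breve(t, beta, X, U, Z, theta_fn, Y, h3,
            kernel=lambda s: 0.75 * np.maximum(1.0 - s**2, 0.0)):
    """Local linear profile smoother (8): estimate of g(t) for a fixed index beta."""
    r = Y - np.einsum('ij,ij->i', Z, theta_fn(U))   # partial residuals Y_i - Z_i^T theta(U_i)
    s = X @ beta - t                                # beta^T X_i - t
    k = kernel(s / h3) / h3                         # K_{h3}(beta^T X_i - t)
    S1, S2 = np.mean(s * k), np.mean(s**2 * k)      # S_{n,1} and S_{n,2} in (8)
    V = k * (S2 - s * S1)                           # V_ni(t; beta)
    return V @ r / V.sum()                          # sum_i W_ni(t; beta) * r_i
```

Step 3 then amounts to minimizing ∑_i [Y_i − Z_i^T θ̈(U_i) − g_breve(β^T X_i, β, …)]² over ‖β‖ = 1, e.g. with any generic optimizer followed by renormalization of β.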

Remark 2. In Steps 2 and 3, the final estimator β̂ of β can be obtained by the following algorithm: (a) fit a partially linear single-index model (see Carroll et al., 1997) to obtain an initial estimate β̂_1, and set β̂ = β̂_1/‖β̂_1‖; (b) based on (a), compute ğ(β^T X_i; β̂) and minimize ∑_{i=1}^n [Y_i − Z_i^T θ̈(U_i) − ğ(β^T X_i; β̂)]² with respect to β; (c) repeat steps (a) and (b) until convergence. This gives the final estimator β̂ of β, and then the final estimator ĝ(t) in Step 4.

2.2. Empirical likelihood for varying-coefficient parts

In this section, based on the estimation procedure in Section 2.1, we define an empirical log-likelihood ratio function by the empirical likelihood principle. Suppose that the recorded data {Y_i, U_i, X_i, Z_i}_{i=1}^n are generated by model (1), that is,

Y_i = Z_i^T θ(U_i) + g(β^T X_i) + ε_i,  i = 1, . . . , n,   (11)

where ε_1, . . . , ε_n are independent and identically distributed random errors with E(ε_i | U_i, X_i, Z_i) = 0 and Var(ε_i | U_i, X_i, Z_i) = σ²(U_i), i = 1, . . . , n, X_i = (X_{i1}, . . . , X_{iq})^T ∈ R^q and Z_i = (Z_{i1}, . . . , Z_{ip})^T ∈ R^p. To define the empirical likelihood estimator of θ(u), assuming that g(·) and β are known, we employ the constraint E{[Y − Z^T θ(U) − g(β^T X)] Z | U = u} f(u) = 0, where f(u) is the density function of U, which has compact support S(f). With this, an auxiliary random vector is defined as

ξ_i(θ(u)) = [Y_i − Z_i^T θ(u) − g(β^T X_i)] Z_i K_h(U_i − u),   (12)

where K_h(·) = K(·/h). Note that {ξ_i(θ(u))}_{i=1}^n are independent and that E[ξ_i(θ(u))] = 0 if θ(u) is the true parameter. When E[ξ_i(θ(u))] = 0, we can construct the estimating equation ∑_{i=1}^n ξ_i(θ(u)) = 0; if g(·) and β were known, the solution of this equation would be exactly the least-squares estimator of θ(u). Therefore, following Owen (1991), we can define the profile empirical log-likelihood ratio function

L_n(θ(u)) = −2 max{ ∑_{i=1}^n log(n p_i) : p_i ⩾ 0, ∑_{i=1}^n p_i = 1, ∑_{i=1}^n p_i ξ_i(θ(u)) = 0 }.   (13)

If θ(u) is the true parameter, then L_n(θ(u)) can be shown to be asymptotically standard χ²-distributed with p degrees of freedom, denoted by χ²_p. However, L_n(θ(u)) cannot be used directly to make inference on θ(u), because it contains the unknown function g(·) and the unknown parameter β. A natural way of solving this problem is to replace g(·) and β in L_n(θ(u)) by the estimators ĝ(·) and β̂ defined in (10) and (9) of Section 2.1, respectively; see Remark 2. Let ξ̂_i(θ(u)) be the estimator of ξ_i(θ(u)) with g(·) and β replaced by ĝ(·) and β̂, respectively, for i = 1, . . . , n. Then an estimated empirical log-likelihood is defined as

L_n(θ(u)) = −2 max{ ∑_{i=1}^n log(n p_i) : p_i ⩾ 0, ∑_{i=1}^n p_i = 1, ∑_{i=1}^n p_i ξ̂_i(θ(u)) = 0 }.   (14)

By the Lagrange multiplier method, L_n(θ(u)) can be represented as

L_n(θ(u)) = 2 ∑_{i=1}^n log{1 + λ^T ξ̂_i(θ(u))},   (15)

where λ is determined by

(1/n) ∑_{i=1}^n ξ̂_i(θ(u)) / [1 + λ^T ξ̂_i(θ(u))] = 0.   (16)
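Computationally, (15)–(16) form Owen's standard dual problem and can be solved by a damped Newton iteration on λ. A minimal sketch, assuming the auxiliary vectors ξ̂_i(θ(u)) have already been stacked into an n × p array (the function name `el_log_ratio` is ours):

```python
import numpy as np

def el_log_ratio(xi, n_iter=50, tol=1e-10):
    """Estimated empirical log-likelihood ratio (15); lambda solves (16).

    xi : (n, p) array whose ith row is xi_hat_i(theta(u)).
    """
    n, p = xi.shape
    lam = np.zeros(p)
    for _ in range(n_iter):
        d = 1.0 + xi @ lam                        # weights 1 + lambda^T xi_i
        grad = (xi / d[:, None]).sum(axis=0)      # n times the left-hand side of (16)
        if np.max(np.abs(grad)) < tol:
            break
        hess = -(xi / d[:, None] ** 2).T @ xi     # Jacobian of grad with respect to lambda
        step = np.linalg.solve(hess, -grad)
        t = 1.0                                   # damp the step so all weights stay positive
        while np.any(1.0 + xi @ (lam + t * step) <= 1e-8):
            t *= 0.5
        lam = lam + t * step
    return 2.0 * np.sum(np.log1p(xi @ lam))
```

If the zero vector lies inside the convex hull of the ξ̂_i, the iteration converges to the unique root of (16); the returned value is then compared with a χ²_p quantile.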

We will show in the next section that, if θ(u) is the true parameter, L_n(θ(u)) is asymptotically χ²-distributed. The assumptions required to derive this result, although somewhat lengthy, are actually quite mild and easily satisfied.


2.3. Asymptotic results

It is known that Owen's empirical log-likelihood ratio statistic has a chi-squared limiting distribution, analogous to the well-known Wilks theorem for parametric settings. Hence we can expect that L_n(θ(u)) will also be asymptotically chi-squared distributed. To derive a theory for L_n(θ(u)), the following assumptions are required.

Assumption 1. The density r(·) of β^T X is bounded away from zero and satisfies a Lipschitz condition of order 1 on T = {t = β^T x : x ∈ A}, where A is a compact support of X.

Assumption 2. g(t) has two bounded and continuous derivatives on T. The density function f(u) of U has support [0, 1] and is continuous at u_0, where u_0 is an interior point of [0, 1]. For j = 1, . . . , p, θ_j(·) has a continuous derivative of order 2.

Assumption 3. The joint density function f(t, u) of (β^T X, U) is bounded away from zero on T × [0, 1]. The functions f(t, u), ψ_1(t, u) and ψ_{2j}(t, u) have bounded partial derivatives up to order four almost surely, where ψ_{2j}(t, u) is the jth component of ψ_2(t, u), j = 1, . . . , p.

Assumption 4. The matrix Γ(u_0) = (γ_{l,k}(u_0)) is a positive definite matrix of order p × p, where γ_{l,k}(u_0) = E{[Z_{il} − µ_l(u_0)][Z_{ik} − µ_k(u_0)] | U = u_0} and µ_l(u_0) = E{Z_{il} | U = u_0}.

Assumption 5. For some s ⩾ 2 and j = 1, . . . , p, E(|Z_{1j}|^{2s} | U = u), E(|ε|^{2s} | U = u), E(|Z_{1j}|^s | β^T X = t, U = u), E(|ε|^s | β^T X = t, U = u, Z = z) and E(|ε|^s | X = x) are bounded.

Assumption 6. The bandwidth h_1 satisfies n^{2ς−1} h_1 → ∞ for some ς < 2 − s^{−1}, where s is as in Assumption 5. Moreover, nh_2/(h_1 log²(n))² → ∞, nh_1 h_2 → ∞, and h_2^4/h_1 → 0.

Assumption 7. nh_3 → ∞, nh_3^8 → 0, nh_3^3 → ∞, nh_3^{12} → 0, and nh_v²/(log(n))² → ∞, v = 1, 2.

Assumption 8. L(·, ·) is of bounded variation and is a right-continuous kernel function of order four.

Assumption 9. The kernel K(u) is a bounded and symmetric probability density function with a bounded derivative, satisfying ∫_{−∞}^{∞} u² K(u) du ≠ 0 and ∫_{−∞}^{∞} |u|^i K(u) du < ∞, i = 1, 2, . . . .

Theorem 1. Assume that Assumptions 1–9 hold. If θ(u_0) is the true value of the parameter, then we have

L_n(θ(u_0)) →d χ²_p,   (17)

where →d stands for convergence in distribution, and χ²_p is a chi-squared distribution with p degrees of freedom.

As a consequence of Theorem 1, confidence regions for θ(u_0) can be constructed from (17). More precisely, for any 0 ⩽ φ < 1, let c_φ be such that P(χ²_p ⩽ c_φ) = 1 − φ. Then C_φ(θ̃(u_0)) = {θ̃(u_0) : L_n(θ̃(u_0)) ⩽ c_φ} constitutes a confidence region for θ(u_0) with asymptotically correct coverage probability 1 − φ.

2.4. Partial profile empirical likelihood ratio

In order to construct a pointwise confidence interval for a component of θ(u), we employ the partial profile empirical likelihood ratio to define the estimated empirical log-likelihood ratio function for θ_r(u) as

L_{n,r}(θ_r(u)) = 2 ∑_{i=1}^n log{ 1 + λ_r ξ̂_{i,r}(θ_r(u)) },  r = 1, . . . , p,

where

ξ̂_{i,r}(θ_r(u)) = e_r^T W^{−1}(u) ξ̂_i(θ̂_1(u), . . . , θ̂_{r−1}(u), θ_r(u), θ̂_{r+1}(u), . . . , θ̂_p(u)),   (18)

and e_r denotes the unit vector of length p with 1 at position r, r = 1, . . . , p. Here θ̂_r(u) = e_r^T θ̂(u) is the estimator of the rth component θ_r(u), and θ̂(·) is the minimizer of L_n(θ(u)); that is, assuming that W(u) is invertible,

θ̂(u) = W^{−1}(u) (nh)^{−1} ∑_{i=1}^n Z_i [Y_i − ĝ(β̂^T X_i)] K_h(U_i − u) + o_p((nh)^{−1/2}),   (19)

where W(u) = (nh)^{−1} ∑_{i=1}^n Z_i Z_i^T K_h(U_i − u). Xue and Zhu (2007) pointed out that θ̂(u) is equivalent to the least-squares estimator; by using Lemmas 3 and 4 in the Appendix and the same method as in Theorem 1 of Wu et al. (1998), we can obtain the same result as that given in Section 2.2 of Xue and Zhu (2007), so we omit it here.

Theorem 2. Under the assumptions of Theorem 1, we have

L_{n,r}(θ_r(u_0)) →d χ²_1.   (20)


Fig. 1. 95% confidence bands for (a) θ_1(u) and (b) θ_2(u) based on the EEL (dashed curves), the NA (dotted curves) and the BS (dot-dashed curves); the solid curves are the mean estimated curves, n = 200.

Table 1
Estimated coverage probabilities of 95% confidence intervals.

n     Method   θ_1(·)   θ_2(·)
100   EEL      0.917    0.915
      NA       0.903    0.904
      BS       0.901    0.903
200   EEL      0.935    0.942
      NA       0.920    0.926
      BS       0.922    0.918

Applying Theorem 2, the approximate 1 − φ confidence interval for θ_r(u_0) is defined as C_{φ,r}(θ̃_r(u_0)) = {θ̃_r(u_0) : L_{n,r}(θ̃_r(u_0)) ⩽ c_φ}, with c_φ now chosen so that P(χ²_1 ⩽ c_φ) = 1 − φ.

3. Numerical results

In this section, we report on a simulation study. For comparison, we considered three approaches to constructing a confidence region/interval: the estimated empirical likelihood (EEL) suggested in Section 2.2, the normal approximation (NA) (see Wu et al., 1998) and the bootstrap (BS) method (see Xue & Zhu, 2007). To demonstrate the performance of the EEL, we compare it with the other two methods in terms of coverage accuracy and average interval width. We consider a general 'sine bump' model

Y = g(β^T X) + θ_1(U) Z_1 + θ_2(U) Z_2 + ε,   (21)

where β = (1, 1, 1)^T/√3, U is a uniform(0, 1) variable, X is trivariate with independent uniform(0, 1) components, and the covariates Z_1, Z_2 are jointly normally distributed with means 0 and variances 1; the correlation coefficient between Z_1 and Z_2 is 2/3, and ε is a normal variable with mean zero and variance 0.2². X, U, (Z_1, Z_2) and ε are simulated independently. In model (21), g(t) = sin(π(t − a)/(b − a)), θ_1(u) = cos(πu/2) and θ_2(u) = 1 − (u − 2)²/2, where a = √3/2 − 1.645/√12 and b = √3/2 + 1.645/√12. This model was used by Carroll et al. (1997) and Zhu and Xue (2006) when θ(·) is a constant. In this example, we took sample sizes n = 100, 200. The confidence intervals and their coverage probabilities, with nominal level 1 − φ = 0.95, were computed from 1000 runs. For the smoother, we used the second-order kernel K(u) = (15/16)(1 − u²)² I(|u| ⩽ 1) and the product kernel L(t, u) = K_0(t) K_0(u) through all smoothing steps, where K_0(t) = (3/8)(3 − 5t²) is a kernel of order four. To implement the least-squares estimators of the unknown parameter with undersmoothing, we used the bandwidths h_{1,LOSCV} × n^{−2/15}, h_{2,LOSCV} × n^{−2/15} and h_{3,LOSCV} × n^{−2/15} in the simulation experiments (see Carroll et al., 1997), where h_{j,LOSCV}, j = 1, 2, 3, can be obtained by the "leave-one-subject-out" cross-validation method (see Fan & Gijbels, 1996). For the choice of the bandwidth h, we again employed the "leave-one-subject-out" cross-validation procedure used in Section 5 of Xue and Zhu (2007); the details are omitted here. Sliced inverse regression (SIR), used to obtain the initial estimator of β, generally yields good results with 10 elements per slice. Here we present the performance of the three methods: the EEL, the NA and the BS. Approximate 95% confidence intervals for θ_r(u), r = 1, 2, were constructed in this example. The simulation results are reported in Fig. 1,


which shows that the EEL gives narrower confidence intervals than the other two methods, and the pointwise intervals based on the EEL yield higher coverage probabilities than the other methods; see Table 1. Table 1 shows that the EEL consistently achieves higher coverage probabilities, and that the coverage probabilities of the EEL increase as n increases. Our limited simulation study suggests that the EEL outperforms the NA and the BS.

Acknowledgements

The author thanks the Editor and the two referees for their constructive comments, which led to an improvement in the quality of this paper.

Appendix

In order to prove the main results, we first introduce several lemmas that will be used in the proofs of Theorems 1 and 2. Let

ρ_n = (log n/(nh_1))^{1/2} + (log n/(nh_2))^{1/2} + h_1² + h_2²,  µ_j = ∫ u^j K(u) du, j = 0, 1, 2, 3.

Throughout this section, we use c > 0 generically to represent any constant that may take a different value at each appearance. Let V(u_0) = σ²(u_0) f(u_0) ∫ K²(u) du Γ(u_0), where σ²(·) is defined in Section 1 and Γ(u_0) is defined in Assumption 4.

Lemma 1. Let (X_1, Y_1), . . . , (X_n, Y_n) be i.i.d. random vectors, where the Y_i's are scalar random variables. Further assume that E|y|^s < ∞ and sup_x ∫ |y|^s f(x, y) dy < ∞, where f denotes the joint density of (X, Y). Let K be a bounded positive function with a bounded support, satisfying a Lipschitz condition. Given that n^{2ς−1} h → ∞ for some ς < 1 − s^{−1}, then

sup_{x∈D} | (1/n) ∑_{i=1}^n [K_h(X_i − x) Y_i − E{K_h(X_i − x) Y_i}] | = O_p({log(1/h)/(nh)}^{1/2}),   (A.1)

where D is some closed set. Lemma 1 follows immediately from the result obtained by Mack and Silverman (1982).

Lemma 2. Suppose that Assumptions 1–9 hold. Then, for any integer r ⩾ 2, we have, uniformly over 1 ⩽ i ⩽ n,

E|θ̈_j(U_i) − θ_j(U_i)|^r = O(ρ_n^r).   (A.2)

Proof. By nearly the same method as that for Theorem 3.3 of You, Zhou, and Chen (2006), we can complete the proof of this lemma; we omit the details. □

Lemma 3. Under the assumptions of Lemmas 1 and 2, for any integer r ⩾ 2, we have, uniformly over 1 ⩽ i ⩽ n,

E|ĝ(β^T X_i; β) − g(β^T X_i)|^r = O(h_3^{2r} + n^{−r/2} h_3^{1−r} + ρ_n^{2r} h_3^{1−r}).

Proof. By Lemma 2 and the methods used in the proof of Lemma 3 in Xue and Zhu (2007), one can complete the proof; we omit it. □

Lemma 4. Under Assumptions 1–9, we have β̂ − β = O_p(n^{−1/2}).

Proof. By nearly the same method as that used in the proof of Theorem 2.2 in Liang and Wang (2005), one can complete the proof of Lemma 4; we omit the details. □

Lemma 5. Under the assumptions of Theorem 1, if θ(u_0) is the true value of the parameter, we have max_{1⩽i⩽n} ‖ξ̂_i(θ(u_0))‖ = o_p((nh)^{1/2}).

Proof. By the definition of ξ̂_i(θ(u_0)) in Section 2.2, we have ξ̂_i(θ(u_0)) = (ξ̂_{i,1}(θ(u_0)), . . . , ξ̂_{i,p}(θ(u_0)))^T and




ξ̂_i(θ(u_0)) = [Y_i − Z_i^T θ(u_0) − ĝ(β̂^T X_i)] Z_i K_h(U_i − u_0)
  = [ε_i + Z_i^T(θ(U_i) − θ(u_0)) + (g(β^T X_i) − ĝ(β̂^T X_i))] Z_i K_h(U_i − u_0)
  = ε_i Z_i K_h(U_i − u_0) + Z_i Z_i^T (θ(U_i) − θ(u_0)) K_h(U_i − u_0) + (g(β^T X_i) − ĝ(β̂^T X_i)) Z_i K_h(U_i − u_0).

Therefore, we only consider ξ̂_{i,s}(θ(u_0)), 1 ⩽ s ⩽ p, the sth component of ξ̂_i(θ(u_0)); the other components can be treated similarly. Note that

max_{1⩽i⩽n} |ξ̂_{i,s}(θ(u_0))| ⩽ c max_{1⩽i⩽n} |ε_i Z_{i,s} K_h(U_i − u_0)| + c max_{1⩽i⩽n} |Z_{i,s} Z_i^T (θ(U_i) − θ(u_0)) K_h(U_i − u_0)| + c max_{1⩽i⩽n} |(g(β^T X_i) − ĝ(β̂^T X_i)) Z_{i,s} K_h(U_i − u_0)| ≡ A_1 + A_2 + A_3,

where Z_{i,s} denotes the sth component of Z_i. To prove Lemma 5, we need only prove that A_l = o_p((nh)^{1/2}), l = 1, 2, 3. Here we only prove A_3 = o_p((nh)^{1/2}); A_1 and A_2 can be dealt with similarly. Note that

A_3 = c max_{1⩽i⩽n} |(g(β^T X_i) − ĝ(β̂^T X_i)) Z_{i,s} K_h(U_i − u_0)|
  ⩽ c max_{1⩽i⩽n} |(ĝ(β^T X_i) − g(β^T X_i)) Z_{i,s} K_h(U_i − u_0)| + c max_{1⩽i⩽n} |(ĝ(β̂^T X_i) − ĝ(β^T X_i)) Z_{i,s} K_h(U_i − u_0)| ≡ A_{31} + A_{32}.   (A.3)

We first show that A_{31} = o_p((nh)^{1/2}). By the Hölder inequality, the Markov inequality and Lemma 3, we have, for any δ > 0,

P(A_{31} > (nh)^{1/2} δ) ⩽ ∑_{i=1}^n P(|(ĝ(β^T X_i) − g(β^T X_i)) Z_{i,s} K_h(U_i − u_0)| > (nh)^{1/2} δ)
  ⩽ (1/((nh)δ²)) ∑_{i=1}^n E|(ĝ(β^T X_i) − g(β^T X_i)) Z_{i,s} K_h(U_i − u_0)|²
  ⩽ (1/((nh)δ²)) ∑_{i=1}^n E^{1/2}|ĝ(β^T X_i) − g(β^T X_i)|^4 E^{1/2}|Z_{i,s} K_h(U_i − u_0)|^4
  ⩽ c (n h_3^{−1}) [h_3^8 + n^{−2} h_3^{−3} + ρ_n^7 (ρ_n h_3^{−3})^{1/2}],   (A.4)

which, together with the assumptions of Lemma 5, shows that the bound in (A.4) converges to 0. Hence

A_{31} = o_p((nh)^{1/2}).   (A.5)

Next we prove that A_{32} = o_p((nh)^{1/2}). An elementary calculation yields

ĝ(β̂^T X_i) − ĝ(β^T X_i) = g′(β^T X_i)(β̂ − β)^T X_i + (ĝ′(β^T X_i) − g′(β^T X_i))(β̂ − β)^T X_i + o_p(n^{−1/2}).   (A.6)

By (A.6) we have

A_{32} ⩽ c max_{1⩽i⩽n} |g′(β^T X_i)(β̂ − β)^T X_i Z_{i,s} K_h(U_i − u_0)| + c max_{1⩽i⩽n} |(ĝ′(β^T X_i) − g′(β^T X_i))(β̂ − β)^T X_i Z_{i,s} K_h(U_i − u_0)| + c max_{1⩽i⩽n} |o_p(n^{−1/2}) Z_{i,s} K_h(U_i − u_0)| ≡ A_{321} + A_{322} + A_{323}.   (A.7)

By Lemma 4 and arguments similar to those used for A_{31}, we can obtain A_{32k} = o_p((nh)^{1/2}) for each k = 1, 2, 3. This implies that

A_{32} = o_p((nh)^{1/2}).   (A.8)


This, together with (A.5), proves that

A_3 = o_p((nh)^{1/2}).   (A.9)

The proof of Lemma 5 is completed. □

Lemma 6. Under the assumptions of Theorem 1, if θ(u_0) is the true value of the parameter, we have

(nh)^{−1/2} ∑_{i=1}^n ξ̂_i(θ(u_0)) →d N(0, V(u_0)).

Proof. By the definition of ξ̂_i(θ(u_0)) in Section 2.2, it is easy to show that

(nh)^{−1/2} ∑_{i=1}^n ξ̂_i(θ(u_0)) = (nh)^{−1/2} ∑_{i=1}^n {Z_i − E(Z_i | U_i = u_0)} ε_i K_h(U_i − u_0) + ∑_{l=1}^3 B_l,   (A.10)

where

B_1 = (nh)^{−1/2} ∑_{i=1}^n ε_i E(Z_i | U_i = u_0) K_h(U_i − u_0),
B_2 = (nh)^{−1/2} ∑_{i=1}^n Z_i Z_i^T (θ(U_i) − θ(u_0)) K_h(U_i − u_0),
B_3 = (nh)^{−1/2} ∑_{i=1}^n (g(β^T X_i) − ĝ(β̂^T X_i)) Z_i K_h(U_i − u_0).

In the following, we show that

Λ ≡ (nh)^{−1/2} ∑_{i=1}^n {Z_i − E(Z_i | U_i = u_0)} ε_i K_h(U_i − u_0) →d N(0, V(u_0)).   (A.11)

By directly computing the mean and variance of Λ, we obtain E(Λ) = o(1) and cov(Λ) = V(u_0) + o(1). By the central limit theorem, the proof of (A.11) is completed. By arguments similar to those used for A_3 in the proof of Lemma 5, we can show that B_l →p 0, l = 1, 2, 3. This, together with (A.10) and (A.11), proves Lemma 6. □

Lemma 7. Under the assumptions of Theorem 1, if θ(u_0) is the true value of the parameter, we have

(nh)^{−1} ∑_{i=1}^n ξ̂_i(θ(u_0)) ξ̂_i^T(θ(u_0)) →p V(u_0).

Proof. Let

Λ_i = {Z_i − E(Z_i | U_i = u_0)} K_h(U_i − u_0),
R_{ni} = ε_i E(Z_i | U_i = u_0) K_h(U_i − u_0) + Z_i Z_i^T (θ(U_i) − θ(u_0)) K_h(U_i − u_0) + (g(β^T X_i) − ĝ(β̂^T X_i)) Z_i K_h(U_i − u_0).

By the definition of ξ̂_i(θ(u_0)) in Section 2.2, it is easy to see that

(nh)^{−1} ∑_{i=1}^n ξ̂_i(θ(u_0)) ξ̂_i^T(θ(u_0)) = (nh)^{−1} ∑_{i=1}^n {ε_i Λ_i + R_{ni}} {ε_i Λ_i + R_{ni}}^T
  = (nh)^{−1} ∑_{i=1}^n ε_i² Λ_i Λ_i^T + (nh)^{−1} ∑_{i=1}^n R_{ni} R_{ni}^T + (nh)^{−1} ∑_{i=1}^n ε_i Λ_i R_{ni}^T + (nh)^{−1} ∑_{i=1}^n ε_i R_{ni} Λ_i^T
  ≡ D_1 + D_2 + D_3 + D_4.   (A.12)


By the proof of (A.12) in Wu et al. (1998), we can get

$$D_1=\frac{1}{nh}\sum_{i=1}^{n}\varepsilon_i^2\Lambda_i\Lambda_i^T\xrightarrow{p}V(u_0).\tag{A.13}$$

Next we prove that

$$D_2=\frac{1}{nh}\sum_{i=1}^{n}R_{ni}R_{ni}^T\xrightarrow{p}0.\tag{A.14}$$

By the Cauchy–Schwarz inequality, we have

$$|D_{2,sl}|\leqslant\left(\frac{1}{nh}\sum_{i=1}^{n}R_{ni,s}^2\right)^{1/2}\left(\frac{1}{nh}\sum_{i=1}^{n}R_{ni,l}^2\right)^{1/2},\tag{A.15}$$

where $D_{2,sl}$ denotes the $(s,l)$ element of $D_2$, and $R_{ni,l}$ denotes the $l$th component of $R_{ni}$. By Lemmas 5 and 6, we can obtain $\frac{1}{nh}\sum_{i=1}^{n}R_{ni,s}^2\xrightarrow{p}0$, which, combined with (A.15), leads to (A.14). Similarly, it can be shown that $D_3\xrightarrow{p}0$ and $D_4\xrightarrow{p}0$. This, together with (A.12)–(A.15), yields Lemma 7. $\square$
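The convergence in Lemma 7 can be illustrated numerically. The sketch below is not from the paper: it simulates a toy varying-coefficient model with a known coefficient function and known $E(Z_i\mid U_i)$, drops the single-index estimation error (the $B_3$-type term, i.e. $g$ is treated as known), and takes the kernel weight as $K((U_i-u_0)/h)$ so that the $(nh)^{-1}$ normalization of Lemma 7 has a finite limit. All data-generating choices are illustrative.

```python
import numpy as np

rng = np.random.default_rng(0)
n, h, u0 = 20000, 0.2, 0.5

# Toy data (illustrative choices, not from the paper):
U = rng.uniform(0.0, 1.0, n)
Z = np.column_stack([np.ones(n), rng.normal(size=n)])   # Z_i in R^2
eps = rng.normal(scale=0.5, size=n)                     # sigma^2 = 0.25

def theta(u):                       # illustrative varying coefficient theta(u)
    return np.column_stack([np.sin(u), u])

def K(t):                           # standard normal kernel, nu0 = int K^2 ~ 0.282
    return np.exp(-0.5 * t * t) / np.sqrt(2.0 * np.pi)

# Residual form of xi_i implied by the decomposition in (A.12),
# with the g-estimation error term omitted:
resid = eps + np.einsum("ij,ij->i", theta(U) - theta(np.full(n, u0)), Z)
xi = Z * (resid * K((U - u0) / h))[:, None]

M = xi.T @ xi / (n * h)             # (1/(nh)) sum_i xi_i xi_i^T, cf. Lemma 7
print(M)                            # symmetric, positive definite
```

Under these toy choices the limit works out to roughly $\sigma^2\nu_0 f_U(u_0)E(Z_iZ_i^T\mid U_i=u_0)\approx 0.07\,I_2$, and the printed matrix should match this up to smoothing bias and Monte Carlo error.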



Lemma 8. Under the assumptions of Theorem 1, if $\theta(u_0)$ is the true value of the parameter, we have $\lambda=O_p((nh)^{-1/2})$, where $\lambda$ is defined in (16).

Proof. By Lemma 6, it can be shown that $\frac{1}{nh}\sum_{i=1}^{n}\hat\xi_i(\theta(u_0))=O_p((nh)^{-1/2})$, which, combined with Lemma 7, leads to Lemma 8 by the same arguments as those used in the proof of expression (2.14) in Owen (1990). $\square$
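The multiplier $\lambda$ in (16) has no closed form; in practice it is found by the usual inner Newton iteration for empirical likelihood (Owen, 1990). The sketch below is a generic illustration of that computation, not code from the paper: given auxiliary vectors $\hat\xi_i$ (here just simulated placeholders in the rows of `xi`), it solves the stationarity condition $\frac{1}{n}\sum_i \hat\xi_i/(1+\lambda^T\hat\xi_i)=0$ that underlies (A.17), keeping every weight $1+\lambda^T\hat\xi_i$ positive by step halving.

```python
import numpy as np

def el_lambda(xi, tol=1e-10, max_iter=50):
    """Solve (1/n) sum_i xi_i/(1 + lam' xi_i) = 0 for lam by damped Newton."""
    n, p = xi.shape
    lam = np.zeros(p)
    for _ in range(max_iter):
        denom = 1.0 + xi @ lam                         # 1 + lam' xi_i
        g = (xi / denom[:, None]).mean(axis=0)         # stationarity residual
        if np.linalg.norm(g) < tol:
            break
        # Minus the Jacobian of g: (1/n) sum_i xi_i xi_i' / (1 + lam' xi_i)^2
        J = (xi[:, :, None] * xi[:, None, :] / (denom ** 2)[:, None, None]).mean(axis=0)
        step = np.linalg.solve(J, g)
        t = 1.0                                        # halve until weights stay positive
        while np.any(1.0 + xi @ (lam + t * step) <= 1e-8):
            t /= 2.0
        lam = lam + t * step
    return lam

rng = np.random.default_rng(0)
xi = rng.normal(size=(200, 2)) + 0.05                  # mean slightly off zero
lam = el_lambda(xi)
resid = (xi / (1.0 + xi @ lam)[:, None]).mean(axis=0)
print(np.linalg.norm(resid))                           # ~0 at convergence
```

For small $\lambda$ the first Newton step from zero is exactly the leading term $(\sum_i\hat\xi_i\hat\xi_i^T)^{-1}\sum_i\hat\xi_i$ that appears in (A.18).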

With these preparations, we are in a position to prove Theorems 1 and 2.

Proof of Theorem 1. Applying a Taylor series expansion to Eq. (15) and invoking Lemmas 5, 7 and 8, we obtain

$$L_n(\theta(u_0))=2\sum_{i=1}^{n}\left\{\lambda^T\hat\xi_i(\theta(u_0))-\frac{1}{2}\bigl[\lambda^T\hat\xi_i(\theta(u_0))\bigr]^2\right\}+o_p(1).\tag{A.16}$$

By Eq. (16), it follows that

$$0=\frac{1}{n}\sum_{i=1}^{n}\frac{\hat\xi_i(\theta(u_0))}{1+\lambda^T\hat\xi_i(\theta(u_0))}
=\frac{1}{n}\sum_{i=1}^{n}\hat\xi_i(\theta(u_0))-\frac{1}{n}\sum_{i=1}^{n}\hat\xi_i(\theta(u_0))\hat\xi_i^T(\theta(u_0))\lambda
+\frac{1}{n}\sum_{i=1}^{n}\frac{\hat\xi_i(\theta(u_0))\bigl[\lambda^T\hat\xi_i(\theta(u_0))\bigr]^2}{1+\lambda^T\hat\xi_i(\theta(u_0))},\tag{A.17}$$

which, combined with Lemmas 5, 7 and 8, leads to

$$\lambda=\left(\sum_{i=1}^{n}\hat\xi_i(\theta(u_0))\hat\xi_i^T(\theta(u_0))\right)^{-1}\sum_{i=1}^{n}\hat\xi_i(\theta(u_0))+o_p((nh)^{-1/2}),\tag{A.18}$$

and

$$\sum_{i=1}^{n}\bigl[\lambda^T\hat\xi_i(\theta(u_0))\bigr]^2=\sum_{i=1}^{n}\lambda^T\hat\xi_i(\theta(u_0))+o_p(1).\tag{A.19}$$

This, together with (A.16), proves that

$$L_n(\theta(u_0))=\left(\frac{1}{\sqrt{nh}}\sum_{i=1}^{n}\hat\xi_i(\theta(u_0))\right)^T\left(\frac{1}{nh}\sum_{i=1}^{n}\hat\xi_i(\theta(u_0))\hat\xi_i^T(\theta(u_0))\right)^{-1}\left(\frac{1}{\sqrt{nh}}\sum_{i=1}^{n}\hat\xi_i(\theta(u_0))\right)+o_p(1).$$

This, together with Lemma 7, yields

$$L_n(\theta(u_0))=\left(\frac{1}{\sqrt{nh}}\sum_{i=1}^{n}\hat\xi_i(\theta(u_0))\right)^T V^{-1}(u_0)\left(\frac{1}{\sqrt{nh}}\sum_{i=1}^{n}\hat\xi_i(\theta(u_0))\right)+o_p(1).\tag{A.20}$$


Applying Lemma 6, we obtain

$$V^{-1/2}(u_0)\frac{1}{\sqrt{nh}}\sum_{i=1}^{n}\hat\xi_i(\theta(u_0))\xrightarrow{d}N(0,I_p),\tag{A.21}$$

where $I_p$ is the $p\times p$ identity matrix. Expressions (A.20) and (A.21) yield Theorem 1. $\square$
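Theorem 1 is what licenses the chi-squared-calibrated confidence regions $\{\theta(u_0):L_n(\theta(u_0))\leqslant\chi^2_{p,1-\alpha}\}$. The following self-contained Monte Carlo check is illustrative only: it uses the simplest empirical likelihood, for a scalar mean with $p=1$ (Owen, 1988), rather than the paper's local statistic. The $-2\log$ EL ratio is computed by bisection on the scalar multiplier equation, and its empirical 95% coverage is compared with the nominal level.

```python
import numpy as np

CHI2_1_95 = 3.841   # 0.95 quantile of the chi-squared distribution, 1 df

def el_stat_mean_zero(x, iters=100):
    """-2 log empirical likelihood ratio for H0: E[x] = 0.
    The multiplier solves sum_i x_i/(1 + lam*x_i) = 0; the left side is
    strictly decreasing on (-1/max(x), -1/min(x)), so bisection applies."""
    lo = -1.0 / x.max() + 1e-10
    hi = -1.0 / x.min() - 1e-10
    for _ in range(iters):
        lam = 0.5 * (lo + hi)
        if np.sum(x / (1.0 + lam * x)) > 0:
            lo = lam
        else:
            hi = lam
    return 2.0 * np.sum(np.log1p(lam * x))

rng = np.random.default_rng(1)
reps, cover = 400, 0
for _ in range(reps):
    x = rng.normal(size=100)        # true mean is 0
    cover += el_stat_mean_zero(x) <= CHI2_1_95
print(cover / reps)                 # close to (slightly below) the nominal 0.95
```

The same inversion recipe gives pointwise intervals: evaluate the statistic over a grid of candidate parameter values and keep those falling below the $\chi^2_p$ quantile.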

Proof of Theorem 2. By arguments similar to those used in the proof of Theorem 1, we can show Theorem 2; the details are omitted. $\square$

References

Ahmad, I., Leelahanon, S., & Li, Q. (2005). Efficient estimation of a semiparametric partially linear varying coefficient model. The Annals of Statistics, 33, 258–283.
Carroll, R. J., Fan, J. Q., Gijbels, I., & Wand, M. P. (1997). Generalized partially linear single-index models. Journal of the American Statistical Association, 92, 477–489.
Chen, S. X., & Hall, P. (1993). Smoothed empirical likelihood confidence intervals for quantiles. The Annals of Statistics, 21, 1166–1181.
DiCiccio, T. J., Hall, P., & Romano, J. P. (1991). Bartlett adjustment for empirical likelihood. The Annals of Statistics, 19, 1053–1061.
Duan, N., & Li, K. C. (1991). Slicing regression: a link-free regression method. The Annals of Statistics, 19, 505–530.
Fan, J., & Gijbels, I. (1996). Local polynomial modeling and its applications. London: Chapman and Hall.
Hall, P., & La Scala, B. (1990). Methodology and algorithms of empirical likelihood. International Statistical Review, 58, 109–127.
Härdle, W., Hall, P., & Ichimura, H. (1993). Optimal smoothing in single-index models. The Annals of Statistics, 21, 157–178.
Hastie, T. J., & Tibshirani, R. (1990). Generalized additive models. New York: Chapman & Hall.
Hastie, T. J., & Tibshirani, R. (1993). Varying-coefficient models. Journal of the Royal Statistical Society, Series B, 55, 757–796.
Huang, Z. S., & Zhang, R. Q. (2009). Efficient estimation of adaptive varying-coefficient partially linear regression model. Statistics & Probability Letters, 79, 943–952.
Huang, Z. S., & Zhang, R. Q. (2010). Tests for varying-coefficient parts on varying-coefficient single-index model. Journal of the Korean Mathematical Society, 47, 385–407.
Liang, H., & Wang, N. (2005). Partially linear single-index measurement error models. Statistica Sinica, 15, 99–116.
Mack, Y. P., & Silverman, B. W. (1982). Weak and strong uniform consistency of kernel regression estimates. Zeitschrift für Wahrscheinlichkeitstheorie und verwandte Gebiete, 61, 405–415.
Owen, A. B. (1988). Empirical likelihood ratio confidence intervals for a single functional. Biometrika, 75, 237–249.
Owen, A. B. (1990). Empirical likelihood ratio confidence regions. The Annals of Statistics, 18, 90–120.
Owen, A. B. (1991). Empirical likelihood for linear models. The Annals of Statistics, 19, 1725–1747.
Owen, A. B. (2001). Empirical likelihood. Boca Raton, FL: Chapman & Hall/CRC.
Wang, Q. H., & Xue, L. G. (2011). Statistical estimation in partially-varying-coefficient single-index models. Journal of Multivariate Analysis, 102, 1–19.
Wong, H., Ip, W., & Zhang, R. Q. (2008). Varying-coefficient single-index models. Computational Statistics and Data Analysis, 52, 1458–1476.
Wu, C. O., Chiang, C. T., & Hoover, D. R. (1998). Asymptotic confidence regions for kernel smoothing of a varying-coefficient model with longitudinal data. Journal of the American Statistical Association, 93, 1388–1402.
Xue, L. G., & Zhu, L. X. (2007). Empirical likelihood for a varying coefficient model with longitudinal data. Journal of the American Statistical Association, 102, 642–654.
You, J., Zhou, Y., & Chen, G. (2006). Corrected local polynomial estimation in varying-coefficient models with measurement errors. The Canadian Journal of Statistics, 34, 391–410.
Yu, Y., & Ruppert, D. (2002). Penalized spline estimation for partially linear single-index models. Journal of the American Statistical Association, 97, 1042–1054.
Zhu, L. X., & Xue, L. G. (2006). Empirical likelihood confidence regions in a partially linear single-index model. Journal of the Royal Statistical Society, Series B, 68, 549–570.