Weighted composite quantile regression for single-index models

Journal of Multivariate Analysis 148 (2016) 34–48
Rong Jiang a,∗, Wei-Min Qian b, Zhan-Gong Zhou c

a Department of Mathematics, College of Sciences, Donghua University, Shanghai 201620, China
b Department of Mathematics, Tongji University, Shanghai 200092, China
c Nanhu College, Jiaxing University, Jiaxing 314001, China

Article history: Received 5 February 2015; available online 2 March 2016.

AMS subject classifications: primary 60G08; secondary 62G20.

Keywords: Single-index model; Weighted composite quantile regression; Adaptive LASSO

Abstract

In this paper we propose a weighted composite quantile regression (WCQR) estimation method for single-index models. For the parametric part, the WCQR is augmented with a data-driven weighting scheme. With the error distribution unspecified, the proposed estimators inherit robustness from quantile regression and achieve nearly the same efficiency as the semiparametric maximum likelihood estimator for a variety of error distributions, including the normal, Student's t and Cauchy distributions. Furthermore, based on the proposed WCQR, we use the adaptive LASSO to study variable selection for the parametric part of single-index models. For the nonparametric part, the WCQR combines the equally weighted estimators with possibly different weights; because of the use of weights, the estimation bias is eliminated asymptotically. Comparisons of asymptotic relative efficiency, both theoretical and numerical, show that the WCQR estimation outperforms the CQR estimation and several other estimation methods. Under regularity conditions, the asymptotic properties of the proposed estimators are established. Simulation studies and two real data applications illustrate the finite sample performance of the proposed methods.

1. Introduction

Single-index models provide an efficient way of coping with high-dimensional nonparametric estimation problems and avoid the "curse of dimensionality" by assuming that the response is related to only a single linear combination of the covariates. Much effort has therefore been devoted to studying their estimation and other relevant inference problems. Härdle and Stoker [6] proposed the average derivative method (ADE). Ichimura [7] studied the properties of a semiparametric least-squares estimator in a general single-index model. Yu and Ruppert [34] considered a penalized spline estimation procedure, while Xia and Härdle [32] applied the minimum average variance estimation method, originally introduced by Xia et al. [33] for dimension reduction. Wu et al. [31] studied single-index quantile regression. Feng et al. [5] proposed the rank-based outer product of gradients estimator for the parametric part of the single-index model. Liu et al. [20] applied the local linear model regression estimator to the single-index model. In this paper, we consider the following heteroscedastic single-index model:

Y = g_0(X^⊤γ_0) + σ(X^⊤γ_0)ε,    (1.1)


where Y is the univariate response and X is a vector of p-dimensional covariates. The functions g_0(·) and σ(·) are unspecified, nonparametric smoothing functions. γ_0 is the unknown single-index coefficient vector and, for the sake of identifiability (see [19]), we assume that ‖γ_0‖ = 1 and that the first component of γ_0 is positive, where ‖·‖ denotes the Euclidean norm. The error term ε is assumed to be independent of X with E(ε) = 0 and Var(ε) = 1. Model (1.1) was also studied by Zhu and Zhu [36] and Zhu et al. [35].

The composite quantile regression (CQR) was first proposed by Zou and Yuan [39] for estimating the regression coefficients in the classical linear regression model. Zou and Yuan [39] showed that the relative efficiency of the CQR estimator compared with the least squares estimator is greater than 70% regardless of the error distribution. Based on CQR, Kai et al. [16] proposed local polynomial CQR estimators for the nonparametric regression function and its derivative, and showed that the local CQR method can significantly improve the estimation efficiency of the local least squares estimator for commonly used non-normal error distributions. Kai et al. [17] studied semiparametric CQR estimates for the semiparametric varying-coefficient partially linear model. Jiang et al. [10] and Tang et al. [28] considered the CQR method for randomly censored data. Jiang et al. [15] extended the CQR method to the single-index model. Jiang et al. [14] proposed a computationally efficient two-step composite quantile regression for single-index models. Jiang et al. [13] considered CQR estimates for single-index models with heteroscedasticity and general error distributions. Other references on the CQR method include Jiang et al. [9], Jiang et al. [11], Ning and Tang [21] and Jiang et al. [12]. However, the CQR method sums different quantile regressions with equal weights. Intuitively, equal weights are not optimal in general, and hence we propose a weighted CQR (WCQR) estimation method for single-index models. Comparisons of asymptotic relative efficiency, both theoretical and numerical, show that the WCQR estimation outperforms the CQR estimation and several other estimation methods.

The paper is organized as follows. In Section 2, we introduce the weighted composite quantile procedure for the parametric part of model (1.1). In Section 3, a variable selection method is developed. We propose the weighted local composite quantile procedure for the nonparametric part in Section 4. Simulation examples and applications to two real data sets are given in Section 5 to illustrate the proposed procedures. Final remarks are given in Section 6. All conditions and technical proofs are deferred to the Appendix.

2. Weighted composite quantile regression for γ_0

Let {X_i, Y_i}_{i=1}^n be an independent and identically distributed (i.i.d.) sample from (X, Y). For X_i^⊤γ "close to" u, the τth conditional quantile at X_i^⊤γ can be approximated linearly by

g(X_i^⊤γ) ≈ g(u) + g'(u)(X_i^⊤γ − u) = a + b(X_i^⊤γ − u),

where a := g(u) and b := g'(u). Let ρ_{τ_k}(r) = τ_k r − r I(r < 0), k = 1, ..., q, be q check loss functions with 0 < τ_1 < ··· < τ_q < 1. Typically, we use the equally spaced quantiles τ_k = k/(q + 1) for k = 1, ..., q. Let K(·) be the kernel weight function and h the bandwidth. To motivate the weighted composite quantile regression, let us briefly review the quantile regression (QR) and composite quantile regression (CQR) methods for estimating γ_0 in the single-index model. Following Wu et al. [31], the τth QR estimate of γ_0 can be obtained as follows:

Step 1.0 (Initialization step): Obtain an initial γ̂^(0) from the average derivative estimate (ADE) of Chaudhuri et al. [1]. Standardize the initial estimate such that ‖γ̂^QR‖ = 1 and γ̂_1^QR > 0.
Step 1.1: Given γ̂^QR, obtain {â_j, b̂_j}_{j=1}^n by solving the series of problems

min_{(a_j, b_j)} Σ_{i=1}^n ρ_τ{Y_i − a_j − b_j(X_i − X_j)^⊤γ̂^QR} ω_{ij},

where ω_{ij} = K{(X_i^⊤γ̂^QR − X_j^⊤γ̂^QR)/h} / Σ_{l=1}^n K{(X_l^⊤γ̂^QR − X_j^⊤γ̂^QR)/h}, with the bandwidth h chosen optimally.
Step 1.2: Given {â_j, b̂_j}_{j=1}^n, obtain γ̂^QR by solving

min_γ Σ_{j=1}^n Σ_{i=1}^n ρ_τ{Y_i − â_j − b̂_j(X_i − X_j)^⊤γ} ω_{ij},

with ω_{ij} evaluated at γ̂^QR and h from Step 1.1.
Step 1.3: Repeat Steps 1.1 and 1.2 until convergence.

Combining information across several quantiles, Jiang et al. [13] proposed the CQR method for estimating γ_0 as follows:

Step 2.0 (Initialization step): Obtain an initial γ̂^(0) from the minimum average variance estimation (MAVE) in Xia and Härdle [32]. Standardize the initial estimate such that ‖γ̂^CQR‖ = 1 and γ̂_1^CQR > 0.


Step 2.1: Given γ̂^CQR, obtain {â_{1j}, ..., â_{qj}, b̂_j}_{j=1}^n by solving the series of problems

min_{(a_{1j},...,a_{qj},b_j)} Σ_{k=1}^q Σ_{i=1}^n ρ_{τ_k}{Y_i − a_{kj} − b_j(X_i − X_j)^⊤γ̂^CQR} ω_{ij},

with the bandwidth h chosen optimally.
Step 2.2: Given {â_{1j}, ..., â_{qj}, b̂_j}_{j=1}^n, obtain γ̂^CQR by solving

min_γ Σ_{j=1}^n Σ_{k=1}^q Σ_{i=1}^n ρ_{τ_k}{Y_i − â_{kj} − b̂_j(X_i − X_j)^⊤γ} ω_{ij},

with ω_{ij} evaluated at γ̂^CQR and h from Step 2.1.
Step 2.3: Repeat Steps 2.1 and 2.2 until convergence.

Note that the CQR method uses the same weight for the different quantile regression models. Intuitively, it is more effective to use different weights, which leads to the following procedure:

Step 3.0 (Initialization step): Obtain an initial γ̂^(0) from MAVE in Xia and Härdle [32]. Standardize the initial estimate such that ‖γ̂^WCQR‖ = 1 and γ̂_1^WCQR > 0.
Step 3.1: Given γ̂^WCQR, obtain {â_{1j}, ..., â_{qj}, b̂_j}_{j=1}^n by solving the series of problems

min_{(a_{1j},...,a_{qj},b_j)} Σ_{k=1}^q Σ_{i=1}^n v_k ρ_{τ_k}{Y_i − a_{kj} − b_j(X_i − X_j)^⊤γ̂^WCQR} ω_{ij},

with the bandwidth h chosen optimally, where v = (v_1, ..., v_q)^⊤ is a vector of weights such that ‖v‖ = 1.
Step 3.2: Given {â_{1j}, ..., â_{qj}, b̂_j}_{j=1}^n, obtain γ̂^WCQR by solving

min_γ Σ_{k=1}^q Σ_{i=1}^n Σ_{j=1}^n v_k ρ_{τ_k}{Y_i − â_{kj} − b̂_j(X_i − X_j)^⊤γ} ω_{ij},

with ω_{ij} evaluated at γ̂^WCQR and h from Step 3.1.
Step 3.3: Repeat Steps 3.1 and 3.2 until convergence.

Remark 1. In the above algorithm, the bandwidth h can be obtained by the "leave-one-subject-out" cross-validation proposed by Rice and Silverman [23].

2.1. Asymptotic properties

We establish the asymptotic normality of the WCQR estimator. Let F(·) and f(·) be the cumulative distribution function and density function of the model error, respectively. Denote by f_U(·) the marginal density function of U = X^⊤γ_0. We choose the kernel K(·) as a symmetric density function and write μ_j = ∫ u^j K(u) du, s_j = ∫ u^j K²(u) du, and

R_1(v) = {Σ_{k=1}^q v_k f(c_k)}^{-2} Σ_{k=1}^q Σ_{k'=1}^q v_k v_{k'} τ_{kk'},

where τ_{kk'} = τ_k ∧ τ_{k'} − τ_k τ_{k'} and c_k = F^{-1}(τ_k).

Theorem 1. Suppose that conditions (C1)–(C5) given in the Appendix hold. If n → ∞, h → 0 and nh → ∞, then

√n (γ̂^WCQR − γ_0) →_L N(0, S^{-1} R_1(v)),

where S = E[σ^{-2}(X^⊤γ_0) g_0'(X^⊤γ_0)² {X − E(X|X^⊤γ_0)}{X − E(X|X^⊤γ_0)}^⊤].

2.2. Choice of weights

From Theorem 1, we find that the asymptotic variance of γ̂^WCQR depends on v only through R_1(v). Thus, the optimal choice of weights, maximizing the efficiency of the estimator γ̂^WCQR, is ṽ = arg min_v R_1(v), which yields

ṽ = (f^⊤ Ω^{-2} f)^{-1/2} Ω^{-1} f,    (2.1)

where f = (f(c_1), ..., f(c_q))^⊤ and Ω is the q × q matrix with (k, k') element Ω_{kk'} = τ_{kk'}. With these optimal weights, R_1(ṽ) = (f^⊤ Ω^{-1} f)^{-1}.

Remark 2. The optimal weight components can be very different, and some of them may even be negative; a similar result can be seen in Jiang et al. [8].
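A minimal sketch of computing ṽ and R_1(ṽ) from (2.1), assuming numpy/scipy and, for illustration only, a standard normal error density:

```python
import numpy as np
from scipy.stats import norm

q = 9
taus = np.arange(1, q + 1) / (q + 1)        # tau_k = k / (q + 1)
c = norm.ppf(taus)                          # c_k = F^{-1}(tau_k)
f = norm.pdf(c)                             # f(c_k)

# Omega_{kk'} = min(tau_k, tau_k') - tau_k * tau_k'
Omega = np.minimum.outer(taus, taus) - np.outer(taus, taus)

Omega_inv_f = np.linalg.solve(Omega, f)
v_tilde = Omega_inv_f / np.linalg.norm(Omega_inv_f)  # = (f'Omega^{-2}f)^{-1/2} Omega^{-1}f
R1_min = 1.0 / (f @ Omega_inv_f)                     # R1(v~) = (f'Omega^{-1}f)^{-1}
```

As Remark 2 notes, some components of v_tilde computed in this way can indeed be negative.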


2.3. Estimation of the optimal weights

As can be seen from (2.1), the optimal weight vector ṽ is rather complicated and involves the error density through c_k = F^{-1}(τ_k) and f(c_k), k = 1, ..., q. In practice, the error density f(·) is generally unknown. Following Jiang et al. [13], we propose the following estimation procedure.

Step 4.1: Construct initial estimators γ̂ and ĝ(·) of γ_0 and g_0(·) by the MAVE method.
Step 4.2: Estimate σ²(·) by local linear least squares (see [25]):

(σ̂²(u), â_1) = arg min_{(a_0, a_1)} Σ_{i=1}^n {Y_i − ĝ(X_i^⊤γ̂) − a_0 − a_1(X_i^⊤γ̂ − u)}² K{(X_i^⊤γ̂ − u)/h_1}.

Step 4.3: Compute ε̂_i = {Y_i − ĝ(X_i^⊤γ̂)}/{σ̂²(X_i^⊤γ̂)}^{1/2} and then estimate f(·) by the kernel density estimator

f̂(·) = (1/(n h_2)) Σ_{i=1}^n K{(ε̂_i − ·)/h_2}.

Step 4.4: Estimate F^{-1}(τ) by the sample τ-quantile F̂^{-1}(τ) of {ε̂_i, i = 1, ..., n}, and estimate f(F^{-1}(τ)) by f̂(F̂^{-1}(τ)).
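A minimal sketch of Steps 4.3–4.4, assuming numpy and that eps_hat holds the standardized residuals from Steps 4.1–4.2; the bandwidth h_2 below anticipates the rule of thumb given in the next paragraph:

```python
import numpy as np

def gauss_kernel(u):
    return np.exp(-0.5 * u ** 2) / np.sqrt(2.0 * np.pi)

def estimate_c_and_f(eps_hat, taus):
    # c_hat_k = sample tau_k-quantile of the residuals (Step 4.4)
    n = len(eps_hat)
    c_hat = np.quantile(eps_hat, taus)
    # rule-of-thumb bandwidth h2 = 0.9 * min{std, IQR/1.34} * n^{-1/5}
    iqr = np.subtract(*np.percentile(eps_hat, [75, 25]))
    h2 = 0.9 * min(np.std(eps_hat), iqr / 1.34) * n ** (-0.2)
    # kernel density estimate f_hat(c) = (n h2)^{-1} sum_i K((eps_i - c)/h2)
    f_hat = gauss_kernel((eps_hat[:, None] - c_hat[None, :]) / h2).mean(axis=0) / h2
    return c_hat, f_hat
```

Plugging c_hat and f_hat into the formulas of Section 2.2 gives the estimated optimal weights.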

In Steps 4.1 and 4.2 we use the same bandwidth h_1, selected by the method in Xia et al. [33]. Furthermore, we choose the bandwidth h_2 = 0.9 × min{std(ε̂_1, ..., ε̂_n), IQR(ε̂_1, ..., ε̂_n)/1.34} × n^{-1/5}, where std and IQR denote the sample standard deviation and sample interquartile range, respectively (see [26]).

2.4. Asymptotic relative efficiency

In this section, we first investigate the asymptotic relative efficiency (ARE) of the WCQR relative to the CQR proposed by Jiang et al. [15]. The asymptotic variance of the CQR estimator is {Σ_{k=1}^q f(c_k)}^{-2} Σ_{k=1}^q Σ_{k'=1}^q τ_{kk'} S^{-1}. Therefore, the ARE of the WCQR with respect to the CQR is

ARE_{γ_0}(WCQR, CQR) = {Σ_{k=1}^q f(c_k)}^{-2} Σ_{k=1}^q Σ_{k'=1}^q τ_{kk'} × (f^⊤ Ω^{-1} f).

It is easy to see that ARE_{γ_0}(WCQR, CQR) ≥ 1 (by the Cauchy–Schwarz inequality), which implies that the WCQR is more efficient than the CQR method.

Second, when σ(X^⊤γ_0) ≡ σ, the asymptotic variance of the minimum average variance estimation (MAVE) method is S^{-1}. Therefore, the ARE of the WCQR with respect to the MAVE is ARE_{γ_0}(WCQR, MAVE) = f^⊤ Ω^{-1} f, and it is easy to see that

ARE_{γ_0}(CQR, LS) = {Σ_{k=1}^q Σ_{k'=1}^q τ_{kk'}}^{-1} {Σ_{k=1}^q f(c_k)}².

Note that ARE_{γ_0}(CQR, LS) is the same as the result obtained by Zou and Yuan [39]. Therefore, the relative efficiency of the CQR compared with the MAVE is greater than 70% regardless of the error distribution. Thus, ARE_{γ_0}(WCQR, MAVE) = f^⊤ Ω^{-1} f ≥ 70%.

Third, we study the ARE of the WCQR relative to the semiparametric maximum likelihood (SML) estimator (see [2]). It is easy to show that the asymptotic variance of the SML estimator is S^{-1} I_f^{-1} under σ(X^⊤γ_0) ≡ σ, where I_f = ∫ {f'(t)}²/f(t) dt is the Fisher information, and hence ARE_{γ_0}(WCQR, SML) = I_f^{-1} (f^⊤ Ω^{-1} f). The following theorem demonstrates that the WCQR method is nearly as efficient as the SML method.
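A minimal sketch of these efficiency computations for standard normal errors (numpy/scipy assumed); up to rounding, the output should reproduce the N(0,1) entries of Table 1 for ARE(WCQR, MAVE) and ARE(WCQR, CQR):

```python
import numpy as np
from scipy.stats import norm

for q in (5, 9, 19, 29, 99, 999):
    taus = np.arange(1, q + 1) / (q + 1)
    c = norm.ppf(taus)
    f = norm.pdf(c)
    Omega = np.minimum.outer(taus, taus) - np.outer(taus, taus)
    fOf = f @ np.linalg.solve(Omega, f)            # f' Omega^{-1} f
    are_mave = fOf                                 # ARE(WCQR, MAVE)
    are_cqr = Omega.sum() / f.sum() ** 2 * fOf     # ARE(WCQR, CQR)
    print(q, round(are_mave, 4), round(are_cqr, 4))
```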


Table 1
The relative efficiency of the estimators of γ_0.

ε       q    ARE(WCQR,MAVE)  ARE(WCQR,LAD)  ARE(WCQR,CQR)  ARE(WCQR,SML)  ARE(WCQR,ROPG)  ARE(WCQR,LMR)
N(0,1)  5    0.9194          1.4442         1.0144         0.9194         0.9628          0.9194
        9    0.9590          1.5065         1.0253         0.9590         1.0043          0.9590
        19   0.9833          1.5446         1.0357         0.9833         1.0297          0.9833
        29   0.9900          1.5552         1.0396         0.9900         1.0368          0.9900
        99   0.9977          1.5673         1.0451         0.9977         1.0449          0.9977
        999  0.9998          1.5706         1.0470         0.9998         1.0471          0.9998
t(3)    5    1.9303          1.1907         1.0191         0.9647         1.0160          0.9831
        9    1.9631          1.2109         1.0327         0.9811         1.0333          0.9998
        19   1.9848          1.2243         1.0441         0.9919         1.0447          1.0109
        29   1.9912          1.2282         1.0477         0.9951         1.0480          1.0141
        99   1.9984          1.2327         1.0519         0.9987         1.0518          1.0178
        999  2.0000          1.2336         1.0527         0.9995         1.0527          1.0186
χ²(6)   5    1.4666          2.0102         1.2744         0.4889         1.1585          0.9126
        9    1.7136          2.3487         1.4135         0.5712         1.3536          1.0663
        19   1.9833          2.7184         1.5900         0.6611         1.5666          1.2341
        29   2.1132          2.8964         1.6825         0.7044         1.6692          1.3149
        99   2.4078          3.3001         1.9046         0.8026         1.9019          1.4982
        999  2.7255          3.7356         2.1535         0.9085         2.1528          1.6959
C(0,1)  5    NaN             1.1250         1.4583         0.9119         1.4998          0.9686
        9    NaN             1.1936         1.5756         0.9675         1.5913          1.0277
        19   NaN             1.2235         1.6274         0.9918         1.6313          1.0535
        29   NaN             1.2292         1.6371         0.9963         1.6387          1.0583
        99   NaN             1.2332         1.6442         0.9997         1.6442          1.0619
        999  NaN             1.2337         1.6449         1.0000         1.6447          1.0622

Theorem 2. Suppose the derivative f'(·) of f(·) is uniformly continuous. Then, as q → ∞, we have

lim_{q→∞} ARE_{γ_0}(WCQR, SML) = 1.

Fourth, we study the ARE of the WCQR relative to the rank-based outer product of gradients (ROPG) estimator (see [5]). The asymptotic variance of the ROPG estimator is S^{-1} {12 (∫ f²(t) dt)²}^{-1} under σ(X^⊤γ_0) ≡ σ. Then we have

ARE_{γ_0}(WCQR, ROPG) = (f^⊤ Ω^{-1} f) / {12 (∫ f²(t) dt)²}.

Fifth, we consider the ARE of the WCQR relative to the local linear model regression (LMR) estimator (see [20]). The asymptotic variance of the LMR estimator is E{φ_h''(ε)}^{-2} E{φ_h'(ε)²} S^{-1} under σ(X^⊤γ_0) ≡ σ, where φ_h(t) = φ(t/h)/h and φ(·) is the density function of a standard normal distribution. Then we obtain

ARE_{γ_0}(WCQR, LMR) = E{φ_h''(ε)}^{-2} E{φ_h'(ε)²} (f^⊤ Ω^{-1} f).

For each q, the AREs of the WCQR estimator with respect to these estimators can be calculated. To appreciate how much efficiency is gained in practice, we investigate the performance of the six estimators. Table 1 shows that the WCQR estimator is nearly as efficient as the SML estimator for various error distributions, and that it is more efficient than the other five estimators when q is large enough, except that it is less efficient than the MAVE and LMR methods under the normal distribution.

3. Variable selection

The adaptive LASSO (see [37]) can be viewed as a generalization of the LASSO penalty. Basically, the idea is to penalize the coefficients of different covariates at different levels by using adaptive weights. The adaptive-LASSO penalized weighted composite quantile regression (PWCQR) estimator for model (1.1), denoted by γ̂^PWCQR, is the minimizer of

Σ_{k=1}^q Σ_{i=1}^n Σ_{j=1}^n v_k ρ_{τ_k}{Y_i − a_{kj} − b_j(X_i − X_j)^⊤γ} ω_{ij} + λ Σ_{j=1}^p |γ_j| / |γ̂_j^WCQR|².    (3.1)

We propose to estimate γ in (3.1) with the iterative procedure described below.

Step 5.0 (Initialization step): Set γ̂^PWCQR(0) = γ̂^WCQR.
Step 5.1: Given γ̂^PWCQR, obtain {â_{1j}, ..., â_{qj}, b̂_j}_{j=1}^n by solving the series of problems

min_{(a_{1j},...,a_{qj},b_j)} Σ_{k=1}^q Σ_{i=1}^n v_k ρ_{τ_k}{Y_i − a_{kj} − b_j(X_i − X_j)^⊤γ̂^PWCQR} ω_{ij},

with the bandwidth h chosen optimally.


Step 5.2: Given {â_{1j}, ..., â_{qj}, b̂_j}_{j=1}^n, obtain γ̂^PWCQR by solving

min_γ Σ_{k=1}^q Σ_{i=1}^n Σ_{j=1}^n v_k ρ_{τ_k}{Y_i − â_{kj} − b̂_j(X_i − X_j)^⊤γ} ω_{ij} + λ Σ_{j=1}^p |γ_j| / |γ̂_j^WCQR|²,

with ω_{ij} evaluated at γ̂^PWCQR and h from Step 5.1.
Step 5.3: Repeat Steps 5.1 and 5.2 until convergence.

In what follows, we show that the adaptive-LASSO PWCQR estimator enjoys the oracle properties.
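A minimal sketch of the penalized objective in Step 5.2, reusing wcqr_objective from the sketch in Section 2; gamma_wcqr is the unpenalized WCQR pilot estimate and lam the tuning parameter (illustrative names, not the authors' code):

```python
import numpy as np

def pwcqr_objective(gamma, lam, gamma_wcqr, a_hat, b_hat, X, Y, v, taus, omega):
    # weighted composite check loss plus the adaptive-LASSO penalty
    # lam * sum_j |gamma_j| / |gamma_wcqr_j|^2 of (3.1)
    penalty = lam * np.sum(np.abs(gamma) / np.abs(gamma_wcqr) ** 2)
    return wcqr_objective(gamma, a_hat, b_hat, X, Y, v, taus, omega) + penalty
```

In practice the minimization is iterated as in Steps 5.1–5.2, and lam is selected by the BIC criterion of Remark 3 below.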



Theorem 3. Under model (1.1) suppose that conditions (C1)–(C5) given in the Appendix hold. If λ/ n → 0, λ → ∞ and n → ∞, then 1. Consistency in selection : Pr({j : γˆj

PWCQR

2. Asymptotic normality :



n(γˆΛ

PWCQR

̸= 0} = Λ) → 1. L

− γΛ ) − → N (0, {S−1 R1 (v)}ΛΛ )

where Λ = {j : γ_j ≠ 0}. Thus, comparing with Theorem 1, we obtain v_opt = ṽ, and the estimate of v_opt can be obtained by the method in Section 2.3.

Remark 3. To tune the parameter λ, many selection criteria, such as cross-validation (CV), generalized cross-validation (GCV), BIC and AIC, can be used. Wang et al. [29] pointed out that the GCV approach tends to produce overfitted models even as the sample size goes to infinity. For this reason, Wang et al. [30] developed a BIC-type selection criterion, which motivates us to consider the following BIC criterion:

BIC(λ) = ln[Σ_{i=1}^n ρ_τ{Y_i − ĝ(X_i^⊤γ̂^PWCQR)}] + df_λ ln(n)/n,

where df_λ is the number of nonzero coefficients in γ̂^PWCQR, a simple estimate of the degrees of freedom (see [38]). We then select λ̂ = arg min_λ BIC(λ).

4. Weighted local composite quantile regression for g_0(·)

After obtaining the WCQR estimate γ̂^WCQR of γ_0 in model (1.1), we can estimate g_0(·). We first recall the local composite quantile regression proposed by Jiang et al. [15]. For any interior point u of the support of X^⊤γ̂^WCQR, the final estimate of g(·) is ĝ(u) = Σ_{k=1}^q â_k / q, where

(â_1, ..., â_q, b̂) = arg min_{(a_1,...,a_q,b)} Σ_{k=1}^q Σ_{i=1}^n ρ_{τ_k}{Y_i − a_k − b(X_i^⊤γ̂^WCQR − u)} K{(X_i^⊤γ̂^WCQR − u)/h}.    (4.1)
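A minimal sketch of the local fit (4.1) at a single point u (numpy/scipy assumed; U holds the fitted indices X γ̂^WCQR and gauss_kernel is as in the Section 2.3 sketch):

```python
import numpy as np
from scipy.optimize import minimize

def lcqr_at(u, U, Y, taus, h):
    # minimize sum_k sum_i rho_{tau_k}{Y_i - a_k - b (U_i - u)} K((U_i - u)/h)
    Kw = gauss_kernel((U - u) / h)
    q = len(taus)

    def obj(theta):                                 # theta = (a_1, ..., a_q, b)
        a, b = theta[:q], theta[q]
        r = Y[None, :] - a[:, None] - b * (U - u)[None, :]
        return np.sum(r * (taus[:, None] - (r < 0)) * Kw[None, :])

    theta0 = np.append(np.full(q, np.median(Y)), 0.0)
    return minimize(obj, theta0, method="Nelder-Mead").x[:q]   # a_hat_1..a_hat_q

# Equal-weight CQR estimate of g at u:
# g_hat_u = lcqr_at(u, U, Y, taus, h).mean()
```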

By the same method introduced in Section 2, we consider the following weighted local composite quantile regression loss:

Σ_{k=1}^q Σ_{i=1}^n w_k ρ_{τ_k}{Y_i − a_k − b(X_i^⊤γ̂^WCQR − u)} K{(X_i^⊤γ̂^WCQR − u)/h},    (4.2)

where w = (w_1, ..., w_q)^⊤ is a vector of weights such that ‖w‖ = 1. By minimizing this objective function, we can obtain a composite estimator of the nonparametric regression function g(u). However, Theorem 4 demonstrates that this approach does not work: the estimator of {a_k}_{k=1}^q obtained by minimizing the weighted loss (4.2) has the same asymptotic distribution as that obtained from (4.1).

Theorem 4. Minimizing the weighted local composite quantile regression loss (4.2) with any specified weights yields the same asymptotic normal distribution as the unweighted estimator from (4.1).

Thus, we turn to the method proposed by Sun et al. [27]: combining the initial estimators {â_k}_{k=1}^q from (4.1) with possibly different weights w. Then, for any interior point u of the support of X^⊤γ̂^WCQR, we define the weighted local composite quantile regression (WLCQR) estimator of g(u) as

ĝ(u) = Σ_{k=1}^q w_k â_k,


where the weight vector w = (w_1, ..., w_q)^⊤ satisfies the conditions

Σ_{k=1}^q w_k = 1,   Σ_{k=1}^q w_k c_k = 0.    (4.3)

We state the asymptotic normality of ĝ(u) in the following theorem.

Theorem 5. Suppose that conditions (C1)–(C5) given in the Appendix hold, and that the weight vector w satisfies condition (4.3). If n → ∞, h → 0 and nh → ∞, then for an interior point u,

√(nh) {ĝ(u) − g_0(u) − (1/2) g_0''(u) μ_2 h²} →_L N(0, s_0 σ²(u) R_2(w) / f_U(u)),

where R_2(w) = Σ_{k=1}^q Σ_{k'=1}^q w_k w_{k'} τ_{kk'} / {f(c_k) f(c_{k'})}.

It is easy to see that the bias of ĝ(u) is free of the choice of the weight vector w; only the variance term depends on w. The optimal weights therefore correspond to the minimum asymptotic variance of ĝ(u). Thus w̃ = arg min_w R_2(w) subject to (4.3), which yields

w̃ = {(r^⊤A^{-1}r) A^{-1}1 − (1^⊤A^{-1}r) A^{-1}r} / {(r^⊤A^{-1}r)(1^⊤A^{-1}1) − (1^⊤A^{-1}r)²},

where r is the q-dimensional column vector with kth element c_k, 1 is the q-dimensional column vector with all elements 1, and A is the q × q matrix with (k, k') element τ_{kk'}/{f(c_k)f(c_{k'})}, k, k' = 1, ..., q. The estimates of c_k and f(c_k) can be obtained by the method in Section 2.3. Furthermore, the asymptotic normality of the optimal estimator, denoted ĝ*(u), is given by

√(nh) {ĝ*(u) − g_0(u) − (1/2) g_0''(u) μ_2 h²} →_L N(0, s_0 σ²(u) R_3(w̃) / f_U(u)),

where R_3(w̃) = Σ_{k=1}^q Σ_{k'=1}^q w̃_k w̃_{k'} τ_{kk'} / {f(c_k) f(c_{k'})}.
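A minimal sketch of the optimal combination weights w̃ and the resulting WLCQR estimate, with c and f estimated as in Section 2.3 (numpy assumed):

```python
import numpy as np

def wlcqr_weights(taus, c, f):
    # A_{kk'} = tau_{kk'} / {f(c_k) f(c_k')},  r has k-th element c_k
    Omega = np.minimum.outer(taus, taus) - np.outer(taus, taus)
    A = Omega / np.outer(f, f)
    one = np.ones_like(c)
    A1, Ar = np.linalg.solve(A, one), np.linalg.solve(A, c)
    rAr, oAo, oAr = c @ Ar, one @ A1, one @ Ar
    w = (rAr * A1 - oAr * Ar) / (rAr * oAo - oAr ** 2)
    return w                  # satisfies sum_k w_k = 1 and sum_k w_k c_k = 0

# g_star_u = wlcqr_weights(taus, c_hat, f_hat) @ a_hat_u   # a_hat_u from (4.1)
```

The variance functional R_3(w̃) = w̃^⊤ A w̃ computed from the same quantities also feeds the bandwidth rule of Section 4.1 below.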

4.1. Bandwidth selection

Bandwidth selection is always crucial in local smoothing, as it governs the curvature of the fitted function. Theoretically, when the sample size is large, the optimal bandwidth can be derived by minimizing the asymptotic mean squared error of ĝ*(u),

MSE{ĝ*(u)} = {(1/2) g_0''(u) μ_2}² h⁴ + (1/(nh)) {s_0 σ²(u)/f_U(u)} R_3(w̃) + o_p(h⁴ + 1/(nh)).

Minimizing the MSE, the optimal variable bandwidth is

h^opt(u) = [s_0 σ²(u) R_3(w̃) / (f_U(u){g_0''(u) μ_2}²)]^{1/5} n^{-1/5}.

According to Fan and Gijbels [4], the optimal bandwidth for the local linear least squares estimator can be expressed as

h_LS^opt(u) = [s_0 σ²(u) / (f_U(u){g_0''(u) μ_2}²)]^{1/5} n^{-1/5}.

Thus, we have

h^opt(u) = R_3(w̃)^{1/5} h_LS^opt(u).

Since there are many existing algorithms for h_LS^opt (see [4]) and R_3(w̃) can be estimated by the methods in Section 2.3, h^opt is readily available. In practice, h_LS^opt is selected by a plug-in bandwidth selector (see [24]).


Table 2
The relative efficiency of the estimators of g.

ε       q    ARE(WLCQR,MAVE)  ARE(WLCQR,LAD)  ARE(WLCQR,CQR)  ARE(WLCQR,LMR)
N(0,1)  5    0.9350           1.3418          1.0011          0.9350
        9    0.9671           1.3879          1.0012          0.9671
        19   0.9866           1.4159          1.0009          0.9866
        29   0.9920           1.4237          1.0007          0.9920
        99   0.9982           1.4325          1.0002          0.9982
        999  0.9999           1.4350          1.0000          0.9999
t(3)    5    1.6924           1.1499          1.0599          0.9865
        9    1.7154           1.1655          1.1255          0.9999
        19   1.7305           1.1758          1.2203          1.0087
        29   1.7349           1.1788          1.2738          1.0113
        99   1.7400           1.1822          1.4120          1.0142
        999  1.7411           1.1830          1.5830          1.0148
C(0,1)  5    NaN              1.0988          1.9188          0.9748
        9    NaN              1.1521          2.9016          1.0221
        19   NaN              1.1752          5.0073          1.0426
        29   NaN              1.1795          6.8888          1.0464
        99   NaN              1.1826          17.8770         1.0492
        999  NaN              1.1830          112.3093        1.0495

4.2. Asymptotic relative efficiency

In this section, we first investigate the asymptotic relative efficiency (ARE) of the WLCQR relative to the mean regression by MAVE in Xia and Härdle [32] and to the CQR method proposed in Jiang et al. [15]. Moreover, the asymptotic variance of the SML estimator (see [2]) is the same as that of the MAVE estimator. By a deduction similar to that of Sun et al. [27], the asymptotic efficiency of the WLCQR estimator of g_0(·) relative to the MAVE and CQR estimators, in the case of symmetric errors, is

ARE_{g_0}(WLCQR, MAVE) = R_3(w̃)^{-4/5},
ARE_{g_0}(WLCQR, CQR) = {R_3(w̃)/R_4}^{-4/5},

where R_4 = Σ_{k=1}^q Σ_{k'=1}^q [τ_{kk'}/{f(c_k)f(c_{k'})}]/q². It has been shown by Sun et al. [27] that when the error distribution is symmetric,

lim inf_{q→∞} ARE_{g_0}(WLCQR, MAVE) ≥ 1,   lim inf_{q→∞} ARE_{g_0}(WLCQR, CQR) ≥ 1.

Second, we consider the ARE of the WLCQR relative to the local linear model regression (LMR) estimator (see [20]):

ARE_{g_0}(WLCQR, LMR) = [R_3(w̃) / (E{φ_h''(ε)}^{-2} E{φ_h'(ε)²})]^{-4/5}.

For each q, the AREs of the WLCQR estimator with respect to the above estimators can be calculated. To appreciate how much efficiency is gained in practice, we investigate the performance of these estimators. Table 2 reports AREs for various error distributions; it shows that the WLCQR estimator is more efficient than the other four estimators when q is large enough, except that it is less efficient than the MAVE and LMR methods under the normal distribution.

5. Numerical studies

In this section, we first use Monte Carlo simulation studies to assess the finite sample performance of the proposed procedures and then demonstrate the application of the proposed methods with two real data analyses. From Tables 1 and 2, we can see that q = 9 is a good choice for WCQR and WLCQR; therefore, we only consider q = 9 for the WCQR and WLCQR methods in this section. Furthermore, we include five competitors in our comparison: (1) the minimum average variance estimation (MAVE) method (see [32]); (2) the quantile regression with τ = 0.5 (QR0.5) (see [31]); (3) the composite quantile regression with q = 9 (CQR9) (see [15]); (4) the rank-based outer product of gradients (ROPG) estimator (see [5]); and (5) the local linear model regression (LMR) estimator (see [20]). The programs are written in Matlab and are available upon request from the authors.


Table 3
The mean of MSE and RMSE for the homoscedastic model.

            N(0,1)             t(3)               C(0,1)
         MSE      RMSE      MSE      RMSE      MSE      RMSE
WCQR9    0.0261   –         0.0273   –         0.0096   –
MAVE     0.0260   0.9845    0.0325   1.8099    0.0275   5.2292
QR0.5    0.0277   1.1888    0.0301   1.6757    0.0208   3.2223
CQR9     0.0321   1.4745    0.0369   2.1446    0.0098   1.0178
ROPG     0.0556   2.6994    0.0480   2.7304    0.0455   9.8176
LMR      0.0330   1.2575    0.1472   9.6802    0.0515   12.8434

Table 4
The mean of ASE and RASE for the homoscedastic model.

            N(0,1)             t(3)               C(0,1)
         ASE      RASE      ASE      RASE      ASE      RASE
WCQR9    0.0013   –         0.0011   –         0.0002   –
MAVE     0.0011   0.9758    0.0017   1.4167    0.0014   2.5017
QR0.5    0.0016   1.5993    0.0012   1.0422    0.0007   1.8866
CQR9     0.0018   1.8035    0.0024   2.1351    0.0006   1.7994
LMR      0.0194   9.9600    0.0161   10.7657   0.0029   4.9411

5.1. Example for a homoscedastic model

We conduct a small simulation study with n = 100, where the data are generated from the following "sine-bump" model:

Y = sin{π(X^⊤γ_0 − A)/(B − A)} + 0.1ε,

where X is uniformly distributed on [0, 1]³. In our simulation, we consider three error distributions for ε: the standard normal distribution (N(0,1)), a t distribution with 3 degrees of freedom (t(3)) and the standard Cauchy distribution (C(0,1)), and γ_0 = (1, 1, 1)^⊤/√3. A = √3/2 − 1.645/√12 and B = √3/2 + 1.645/√12 are taken to ensure that the design is relatively thick in the tails. All simulations are run with 500 replicates. Table 3 reports the mean squared error (MSE) of the estimate γ̂ to assess the accuracy of the estimation methods,

MSE = {(γ̂ − γ_0)^⊤(γ̂ − γ_0)}^{1/2}.
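A minimal sketch of this data-generating mechanism (numpy assumed):

```python
import numpy as np

rng = np.random.default_rng(0)
n, gamma0 = 100, np.ones(3) / np.sqrt(3)
A = np.sqrt(3) / 2 - 1.645 / np.sqrt(12)
B = np.sqrt(3) / 2 + 1.645 / np.sqrt(12)

X = rng.uniform(size=(n, 3))          # X ~ U[0, 1]^3
eps = rng.standard_normal(n)          # or rng.standard_t(3, n), rng.standard_cauchy(n)
Y = np.sin(np.pi * (X @ gamma0 - A) / (B - A)) + 0.1 * eps
```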

Moreover, we compare the different estimators γ̂ with γ̂^WCQR via the ratio of MSEs (RMSE), defined by

RMSE(γ̂) = MSE(γ̂)/MSE(γ̂^WCQR).

The performance of ĝ(·) is assessed by the average squared error (ASE),

ASE = (1/n_grid) Σ_{i=1}^{n_grid} {ĝ(u_i) − g(u_i)}²,

where the u_i, i = 1, ..., n_grid, are grid points on the support of X^⊤γ_0; here n_grid = 100 is used. Moreover, we compare the different estimators ĝ of g_0(·) with ĝ^WLCQR via the ratio of average squared errors (RASE), defined by

RASE(ĝ) = ASE(ĝ)/ASE(ĝ^WLCQR).

The results are presented in Table 4. From Tables 3 and 4, we can see that when the error is normal, the MAVE is the most efficient; for the other error distributions, WCQR9 is consistently superior to the other five methods.

5.2. Example for a heteroscedastic model

It is necessary to investigate the effect of heteroscedastic errors. Consider the following model:

Y = 5 sin(2X^⊤γ_0) + exp{−16(X^⊤γ_0)²} + 0.1[2 + cos{2π(X^⊤γ_0)}]ε,

where the index parameter is γ_0 = (2, −2, 1)^⊤/3, and the covariate vector X is generated as multivariate normal with mean zero and covariance matrix Var(X) = (σ_{ij})_{p×p} with σ_{ij} = 0.5^{|i−j|}. The other settings are the same as those in Example 5.1. The simulation results are summarized in Tables 5 and 6, from which we can see that WCQR9 performs better than the other four estimation methods under all three error distributions.
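A minimal sketch of this heteroscedastic design (numpy assumed):

```python
import numpy as np

rng = np.random.default_rng(0)
n, p = 100, 3
gamma0 = np.array([2.0, -2.0, 1.0]) / 3
Sigma = 0.5 ** np.abs(np.subtract.outer(np.arange(p), np.arange(p)))
X = rng.multivariate_normal(np.zeros(p), Sigma, size=n)
u = X @ gamma0
eps = rng.standard_normal(n)          # N(0,1); t(3) and Cauchy as in Example 5.1
Y = 5 * np.sin(2 * u) + np.exp(-16 * u ** 2) + 0.1 * (2 + np.cos(2 * np.pi * u)) * eps
```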


Table 5
The mean of MSE and RMSE for the heteroscedastic model.

            N(0,1)             t(3)               C(0,1)
         MSE      RMSE      MSE      RMSE      MSE      RMSE
WCQR9    0.0051   –         0.0064   –         0.0037   –
MAVE     0.0153   5.1047    0.0163   9.0210    0.0135   6.2025
QR0.5    0.0114   3.6751    0.0114   6.5681    0.0077   3.1886
CQR9     0.0052   1.0236    0.0079   1.0348    0.0045   1.3204
ROPG     0.0095   3.4703    0.0093   4.0239    0.0075   3.1531

Table 6
The mean of ASE and RASE for the heteroscedastic model.

            N(0,1)             t(3)               C(0,1)
         ASE      RASE      ASE      RASE      ASE      RASE
WCQR9    0.0146   –         0.0244   –         0.0130   –
MAVE     0.0578   4.0096    0.0641   4.6468    0.0473   4.4997
QR0.5    0.0220   2.6566    0.0279   1.3540    0.0278   1.3849
CQR9     0.0153   1.5725    0.0251   1.1348    0.0240   1.1949

Table 7
The simulation results for variable selection.

ε       RMRME   C       IC      U-fit   C-fit   O-fit
N(0,1)  0.7804  2.7900  0.5100  0.0760  0.8380  0.0860
t(3)    0.8885  2.7700  0.6300  0.0600  0.8740  0.0660
C(0,1)  0.8218  2.8200  0.6000  0.0620  0.8800  0.0580

5.3. Example for variable selection

In this section, we consider variable selection for the following model:

Y = exp(X^⊤γ_0) + 0.2ε,

where γ_0 = (2, −2, 1, 0, 0, 0, 0, 0)^⊤/3 and the covariate vector X = (X_1, ..., X_8)^⊤ is generated from the uniform distribution on [−1, 1] with independent components. The other settings are the same as those in Example 5.1. To assess performance, we consider the ratio of median relative model errors, RMRME = MRME_PWCQR/MRME_WCQR, where the model error of an estimator γ̂ is defined as E(X^⊤γ̂ − X^⊤γ_0)². In addition, the proportions of models under-fitted (U-fit), correctly fitted (C-fit) and over-fitted (O-fit) are reported in Table 7, in which the column labeled "C" shows the average number of nonzero coefficients correctly estimated to be nonzero, while the column labeled "IC" presents the average number of zero coefficients incorrectly estimated to be nonzero. From Table 7, we can see that the variable selection method performs well in all cases.

5.4. Real data example: Boston housing data

As an illustration, we now apply the proposed WCQR9 methodology to the Boston housing data. The data contain 506 observations on 14 variables; the dependent variable of interest is medv. Thirteen other statistical measurements on the 506 census tracts in suburban Boston from the 1970 census are also included. The data can be found in the StatLib library maintained at Carnegie Mellon University. Many regression studies have used this data set and found potential relationships between medv and RM, TAX, PTRATIO and LSTAT (see [31]). In this study, we focus on the following four covariates:

RM: average number of rooms per dwelling;
TAX: full-value property tax (in dollars) per $10,000;
PTRATIO: pupil–teacher ratio by town;
LSTAT: lower status of the population (percent).

We follow previous studies and take logarithmic transformations of TAX and LSTAT. The dependent variable is centered around zero. The following single-index model is used to fit the data:

medv = g{γ_1 RM + γ_2 log(TAX) + γ_3 PTRATIO + γ_4 log(LSTAT)} + ε.

The estimated coefficients for this model are summarized in Table 8. Moreover, we use the bootstrap method (see [3]) to compute the standard deviations (STD) of the estimated coefficients; the results are also given in Table 8. The WCQR9 estimates of the index coefficients are (0.5240, −0.5604, −0.0757, −0.6369). It is worth noticing that PTRATIO has the smallest effect on house prices among the four covariates and LSTAT is the most important covariate.

Table 8
Single-index coefficient estimates for Example 5.4.

Method   γ̂1 (STD)          γ̂2 (STD)           γ̂3 (STD)           γ̂4 (STD)
WCQR9    0.5240 (0.0207)   −0.5604 (0.0214)   −0.0757 (0.0040)   −0.6369 (0.0153)
MAVE     0.2284 (0.0616)   −0.2072 (0.0564)   −0.0630 (0.0054)   −0.9492 (0.0309)
QR0.5    0.3845 (0.0151)   −0.3870 (0.0556)   −0.0762 (0.0032)   −0.8346 (0.0211)
CQR9     0.5346 (0.0743)   −0.5608 (0.0618)   −0.0783 (0.0711)   −0.6273 (0.0571)

Table 9
Single-index coefficient estimates and MSE for Example 5.5.

Method   γ̂1 (STD)          γ̂2 (STD)           γ̂3 (STD)           MSE
WCQR9    0.6078 (0.0141)   −0.7941 (0.0103)   −0.0029 (0.0077)   8.9427
MAVE     0.5679 (0.0177)   −0.8211 (0.0128)   −0.0574 (0.0148)   9.6068
QR0.5    0.6117 (0.0142)   −0.7911 (0.0105)   −0.0020 (0.0084)   9.0220
CQR9     0.6007 (0.0170)   −0.7994 (0.0153)   −0.0024 (0.0183)   8.9755

Fig. 1. Fitted index function for Boston housing data.

After obtaining estimates of γ_1–γ_4, we then estimate the relationship between medv and γ_1 RM + γ_2 log(TAX) + γ_3 PTRATIO + γ_4 log(LSTAT). Fig. 1 shows the estimated g(·) along with the data; one can see that the WCQR9 fit follows the data quite closely.

5.5. Real data example: walking behavior data

We also illustrate the methodology with an application to a walking behavior data set. The data set used here consists of the number of weekly walking trips and three individual attribute factors, based on a travel survey of 1737 individuals from 21 neighborhoods in Shanghai between February 2012 and April 2012. The three individual attribute factors are age (X_1), education (X_2) and income (X_3). Our main interest is the relationship between the individual attribute variables and the number of weekly walking trips (Y), modeled by

Y = g(X^⊤γ_0) + ε,

where X = (X_1, X_2, X_3)^⊤. The mean squared error of the fit, MSE = Σ_{i=1}^n (Y_i − Ŷ_i)²/n, where Ŷ_i is the fitted value of Y_i and n is the sample size, is used to assess the relative success of the estimation methods: a small MSE indicates a good fit, a large MSE an unsatisfactory one. The estimates, STDs and MSEs are given in Table 9. The results show that the WCQR estimate performs well, and a comparison of the coefficients shows that education (X_2) is the most important individual attribute variable. Moreover, from Fig. 2, we can see that the higher the education, the less walking.


Fig. 2. Link function for walking behavior data.

maximum likelihood estimator for a variety of error distributions including the Normal, Student’s t, Cauchy distributions, etc. Furthermore, based on the proposed WCQR, we use the adaptive-LASSO to study variable selection for parametric part in the single-index models. By comparing asymptotic relative efficiency theoretically and numerically, WCQR estimation all outperforms the CQR estimation and some normal estimate methods, such as the minimum average variance estimation method, quantile regression estimation method, the rank-based outer product of gradients estimation method and the local linear model regression estimation method. Acknowledgments The authors would like to thank Dr. Yong Chen for sharing the walking behavior survey data and thank the Editor and Referees for their helpful suggestions that improved the paper. Appendix To establish the asymptotic properties of the proposed estimators, the following technical conditions are imposed. C1. The kernel K (·) is a symmetric density function with finite support. C2. The density function of X⊤ γ is positive and uniformly continuous for γ in a neighborhood of γ0 . Further the density of X⊤ γ0 is continuous and bounded away from 0 and ∞ on its support. C3. The function g0 (·) has a continuous and bounded second derivative. C4. Assume that the model error ε has a positive density f (·). C5. The conditional variance σ (·) is positive and continuous. Remark 4. Conditions C1–C4 are standard conditions, which are commonly used in single-index regression model, see Wu et al. [31]. Condition C5 is also assumed in Sun et al. [27]. Proof of Theorem 1. Write γˆ ∗ = q  n  n 



n(γˆ WCQR − γ0 ) and given (ˆa1j , . . . , aˆ qj , bˆ j ), note that γˆ ∗ minimizes the following

  vk ρτk Yi − aˆ kj − bˆ j (Xi − Xj )⊤ γ ωij .

j=1 k=1 i=1

Then, γˆ ∗ is also the minimizer of Ln (γ ∗ ) =

q  n  n  j=1 k=1 i=1

    √  ⊤ ∗ ⊤ ˆ ˜ ˜ n − ρ σ ( X γ )[ε − c ] − r ωij , vk ρτk σ (X⊤ γ )[ε − c ] − r − b X γ / 0 i k i , k , j 0 i k i , k , j j τ i i ij k


where r̃_{i,k,j} = â_{kj} − g_0(X_i^⊤γ_0) − c_k σ(X_i^⊤γ_0) + b̂_j X_{ij}^⊤γ_0 with X_{ij} = X_i − X_j. By applying the identity of Knight [18], namely ρ_τ(u − v) − ρ_τ(u) = −v{τ − I(u < 0)} + ∫_0^v {I(u ≤ s) − I(u ≤ 0)} ds, we can rewrite L_n(γ*) as

L_n(γ*) = (1/√n) Σ_{j=1}^n Σ_{k=1}^q Σ_{i=1}^n v_k ω_{ij} b̂_j X_{ij}^⊤ {I(ε_i ≤ c_k) − τ_k} γ*
 + Σ_{j=1}^n Σ_{k=1}^q Σ_{i=1}^n v_k ω_{ij} ∫_{r̃_{i,k,j}/σ(X_i^⊤γ_0)}^{r̃_{i,k,j}/σ(X_i^⊤γ_0) + b̂_j X_{ij}^⊤γ*/√n} [I{ε_i ≤ c_k + z/σ(X_i^⊤γ_0)} − I(ε_i ≤ c_k)] dz
 = L_{1n}(γ*) + L_{2n}(γ*),

where L_{1n}(γ*) and L_{2n}(γ*) denote the first and second terms, respectively. Firstly, we consider the conditional expectation of L_{2n}(γ*):

E{L_{2n}(γ*)|X^⊤γ_0} = Σ_{j=1}^n Σ_{k=1}^q Σ_{i=1}^n v_k ω_{ij} ∫_{r̃_{i,k,j}/σ(X_i^⊤γ_0)}^{r̃_{i,k,j}/σ(X_i^⊤γ_0) + b̂_j X_{ij}^⊤γ*/√n} {z/σ(X_i^⊤γ_0)} f(c_k) dz
 = (1/2)(γ*)^⊤ [(1/n) Σ_{j=1}^n Σ_{k=1}^q Σ_{i=1}^n v_k ω_{ij} {f(c_k)/σ²(X_i^⊤γ_0)} b̂_j² X_{ij} X_{ij}^⊤] γ*
 + (1/√n) Σ_{j=1}^n Σ_{k=1}^q Σ_{i=1}^n v_k ω_{ij} {f(c_k)/σ(X_i^⊤γ_0)} {r̃_{i,k,j}/σ(X_i^⊤γ_0)} b̂_j X_{ij}^⊤ γ* + o_p(1)
 = L_{2n1}(γ*) + L_{2n2}(γ*) + o_p(1).

In the following, we consider L_{2n1}(γ*) and L_{2n2}(γ*). First,

L_{2n1}(γ*) = (1/2)(γ*)^⊤ {Σ_{k=1}^q v_k f(c_k)} S γ* + o_p(1).

Next,

L_{2n2}(γ*) = −(1/√n) Σ_{j=1}^n Σ_{k=1}^q Σ_{i=1}^n v_k η_{i,k} b̂_j {E(X_j|X^⊤γ_0) − X_j}^⊤ ω_{ij} γ* + o_p(1),

where η_{i,k} = I(ε_i ≤ c_k) − τ_k. It is easy to obtain that L_{2n}(γ*) − E{L_{2n}(γ*)|X^⊤γ_0} = o_p(1); thus

L_{2n}(γ*) = (1/2)(γ*)^⊤ {Σ_{k=1}^q v_k f(c_k)} S γ* − (1/√n) Σ_{j=1}^n Σ_{k=1}^q Σ_{i=1}^n v_k η_{i,k} b̂_j {E(X_j|X^⊤γ_0) − X_j}^⊤ ω_{ij} γ* + o_p(1).

Then,

L_n(γ*) = (1/√n) Σ_{j=1}^n Σ_{k=1}^q Σ_{i=1}^n v_k η_{i,k} b̂_j {X_i − E(X_j|X^⊤γ_0)}^⊤ ω_{ij} γ* + (1/2)(γ*)^⊤ {Σ_{k=1}^q v_k f(c_k)} S γ* + o_p(1)
 = W_n^⊤ γ* + (1/2)(γ*)^⊤ {Σ_{k=1}^q v_k f(c_k)} S γ* + o_p(1),

where W_n = (1/√n) Σ_{j=1}^n Σ_{k=1}^q Σ_{i=1}^n v_k η_{i,k} b̂_j {X_i − E(X_j|X^⊤γ_0)} ω_{ij}. It follows from the convexity lemma (see [22]) that the quadratic approximation to L_n(γ*) holds uniformly for γ* in any compact set Θ. Thus, it follows that

γ̂* = −{Σ_{k=1}^q v_k f(c_k)}^{-1} S^{-1} W_n + o_p(1).


By the Cramér–Wold theorem, the central limit theorem applies to W_n, and Var(W_n) → Σ_{k=1}^q Σ_{k'=1}^q v_k v_{k'} τ_{kk'} S. This completes the proof.

Qn∗ (γ ∗ ) =



n(γˆ PWCQR − γ0 ). Then γ¯ ∗ is the

    √  ⊤ ˆ ⊤ ∗ ωij v˜ k ρτk σ (X⊤ i γ0 )(εi − ck ) − r˜i,k,j − bj Xij γ / n − ρτk σ (Xi γ0 )(εi ck ) − r˜i,k,j

j=1 k=1 i=1

+

p 



j=1

λn

n|γˆj



WCQR 2

|

 γ∗ 

 



j n γ0j + √  − |γ0j | . n

Following the arguments in the proof of Theorem 1, Qn (γ ) = Wn γ + ∗





1 2

(γ )

∗ ⊤

 q 

 vk f (ck ) Sγ ∗ +

k=1

p  j =1



λn

n|γˆj



WCQR 2

|

 

 γ∗ 



j n γ0j + √  − |γ0j | n

+ op (1).

Similar to the derivation in the proof of Theorem 4.1 in Zou [37], the third term above can be expressed as



λn

n|γˆj

    0,  γj∗  P → 0, n γ0j + √  − |γ0j | − ∞, n



WCQR 2

|

if γ0j ̸= 0, if γ0j = 0 and γj∗ = 0, if γ0j = 0 and γj∗ ̸= 0.

∗ ∗ Let us write γn∗ = (γ1n , γ2n ) where γin∗ contains the first K element of γn∗ . Using the same arguments in Knight [18], we have Pr

∗ ∗ − → 0 and the asymptotic results for γ¯1n can be proven. Similar as the discussion in the proof of Theorem 4.1 in Zou [37], γ2n

the model selection consistency can be derived.

Proof of Theorem 4. The proof is similar to that of Proposition 2.1 in Sun et al. [27].

Proof of Theorem 5. The proof is similar to that of Theorem 1 in Jiang et al. [13].

References

[1] F. Chaudhuri, K. Doksum, A. Samarov, On average derivative quantile regression, Ann. Statist. 25 (1997) 715–744.
[2] M. Delecroix, M. Hristache, V. Patilea, On semiparametric M-estimation in single-index regression, J. Statist. Plann. Inference 136 (2006) 730–769.
[3] B. Efron, Bootstrap methods: another look at the jackknife, Ann. Statist. 7 (1979) 1–26.
[4] J. Fan, I. Gijbels, Local Polynomial Modelling and Its Applications, Chapman and Hall, London, 1996.
[5] L. Feng, C.L. Zou, Z.J. Wang, Rank-based inference for the single-index model, Statist. Probab. Lett. 82 (2012) 535–541.
[6] W. Härdle, T. Stoker, Investigating smooth multiple regression by the method of average derivatives, J. Amer. Statist. Assoc. 84 (1989) 986–995.
[7] H. Ichimura, Semiparametric least squares (SLS) and weighted SLS estimation of single-index models, J. Econometrics 58 (1993) 71–120.
[8] X.J. Jiang, J.C. Jiang, X.Y. Song, Oracle model selection for nonlinear models based on weighted composite quantile regression, Statist. Sinica 22 (2012) 1479–1506.
[9] R. Jiang, W.M. Qian, J.R. Li, Testing in linear composite quantile regression models, Comput. Statist. 29 (2014) 1381–1402.
[10] R. Jiang, W.M. Qian, Z.G. Zhou, Variable selection and coefficient estimation via composite quantile regression with randomly censored data, Statist. Probab. Lett. 2 (2012) 308–317.
[11] R. Jiang, W.M. Qian, Z.G. Zhou, Test for single-index composite quantile regression, Hacet. J. Math. Stat. 43 (2014) 861–871.
[12] R. Jiang, W.M. Qian, Z.G. Zhou, Composite quantile regression for linear errors-in-variables models, Hacet. J. Math. Stat. 44 (2015) 707–713.
[13] R. Jiang, W.M. Qian, Z.G. Zhou, Single-index composite quantile regression with heteroscedasticity and general error distributions, Statist. Papers 57 (2016) 185–203.
[14] R. Jiang, Z.G. Zhou, W.M. Qian, Y. Chen, Two step composite quantile regression for single-index models, Comput. Statist. Data Anal. 64 (2013) 180–191.
[15] R. Jiang, Z.G. Zhou, W.M. Qian, W.Q. Shao, Single-index composite quantile regression, J. Korean Statist. Soc. 3 (2012) 323–332.
[16] B. Kai, R. Li, H. Zou, Local composite quantile regression smoothing: an efficient and safe alternative to local polynomial regression, J. R. Stat. Soc. Ser. B 72 (2010) 49–69.
[17] B. Kai, R. Li, H. Zou, New efficient estimation and variable selection methods for semiparametric varying-coefficient partially linear models, Ann. Statist. 39 (2011) 305–332.
[18] K. Knight, Limiting distributions for L1 regression estimators under general conditions, Ann. Statist. 26 (1998) 755–770.
[19] W. Lin, K.B. Kulasekera, Identifiability of single-index models and additive-index models, Biometrika 94 (2007) 496–501.
[20] J. Liu, R. Zhang, W. Zhao, Y. Lv, A robust and efficient estimation method for single index models, J. Multivariate Anal. 122 (2013) 226–238.
[21] Z. Ning, L. Tang, Estimation and test procedures for composite quantile regression with covariates missing at random, Statist. Probab. Lett. 95 (2014) 15–25.
[22] D. Pollard, Asymptotics for least absolute deviation regression estimators, Econometric Theory 7 (1991) 186–199.
[23] J.A. Rice, B.W. Silverman, Estimating the mean and covariance structure nonparametrically when the data are curves, J. R. Stat. Soc. Ser. B 53 (1991) 233–243.
[24] D. Ruppert, S.J. Sheather, M.P. Wand, An effective bandwidth selector for local least squares regression, J. Amer. Statist. Assoc. 90 (1995) 1257–1270.
[25] D. Ruppert, M.P. Wand, U. Holst, O. Hossjer, Local polynomial variance-function estimation, Technometrics 39 (1997) 262–273.
[26] B.W. Silverman, Density Estimation, Chapman and Hall, London, 1986.
[27] J. Sun, Y. Gai, L. Lin, Weighted local linear composite quantile estimation for the case of general error distributions, J. Statist. Plann. Inference 143 (2013) 1049–1063.
Silverman, Density Estimation, Chapman and Hall, London, 1986. J. Sun, Y. Gai, L. Lin, Weighted local linear composite quantile estimation for the case of general error distributions, J. Statist. Plann. Inference 143 (2013) 1049–1063.


[28] L. Tang, Z. Zhou, C. Wu, Weighted composite quantile estimation and variable selection method for censored regression model, Statist. Probab. Lett. 3 (2012) 653–663.
[29] H. Wang, G. Li, G. Jiang, Robust regression shrinkage and consistent variable selection via the LAD-LASSO, J. Bus. Econom. Statist. 25 (2007) 347–355.
[30] H. Wang, R. Li, C.L. Tsai, Tuning parameter selectors for the smoothly clipped absolute deviation method, Biometrika 94 (2007) 553–568.
[31] T.Z. Wu, K. Yu, Y. Yu, Single-index quantile regression, J. Multivariate Anal. 101 (2010) 1607–1621.
[32] Y. Xia, W. Härdle, Semi-parametric estimation of partially linear single-index models, J. Multivariate Anal. 97 (2006) 1162–1184.
[33] Y. Xia, H. Tong, W.K. Li, L. Zhu, An adaptive estimation of dimension reduction space, J. R. Stat. Soc. Ser. B 64 (2002) 363–410.
[34] Y. Yu, D. Ruppert, Penalized spline estimation for partially linear single-index models, J. Amer. Statist. Assoc. 97 (2002) 1042–1054.
[35] L. Zhu, M. Huang, R. Li, Semiparametric quantile regression with high-dimensional covariates, Statist. Sinica 22 (2012) 1379–1401.
[36] L.P. Zhu, L.X. Zhu, Nonconcave penalized inverse regression in single-index models with high dimensional predictors, J. Multivariate Anal. 100 (2009) 862–875.
[37] H. Zou, The adaptive LASSO and its oracle properties, J. Amer. Statist. Assoc. 101 (2006) 1418–1429.
[38] H. Zou, T. Hastie, R. Tibshirani, On the degrees of freedom of the Lasso, Ann. Statist. 35 (2007) 2173–2192.
[39] H. Zou, M. Yuan, Composite quantile regression and the oracle model selection theory, Ann. Statist. 36 (2008) 1108–1126.