Robust adaptive model selection and estimation for partial linear varying coefficient models in rank regression


Xiaofei Sun (a), Kangning Wang (a,b,*), Lu Lin (b)

(a) School of Statistics, Shandong Technology and Business University, Yantai, China
(b) Institute of Financial Studies, Shandong University, Jinan, China

Article history: Received 7 May 2017; Accepted 7 September 2017; Available online xxxx

AMS 2000 subject classifications: 62G05; 62E20; 62J02

Keywords: Partial linear varying coefficient models; Rank regression; Selection consistency; Oracle property; Robustness and efficiency

Abstract

Partial linear varying coefficient models are often used in real data analysis because they strike a good balance between flexibility and parsimony. In this paper, we propose a robust adaptive model selection method based on rank regression, which performs coefficient estimation and three types of selection simultaneously: varying-effect selection, constant-effect selection and relevant-variable selection. The new method inherits the robustness and efficiency advantages of the rank regression approach. Furthermore, consistency of the three types of selection and the oracle property of the resulting estimators are established. Simulation studies also confirm our method.

© 2017 The Korean Statistical Society. Published by Elsevier B.V. All rights reserved.

1. Introduction

Partial linear varying coefficient models (PLVCM) (Ahmad, Leelahanon, & Li, 2005; Fan & Huang, 2005; Kai, Li, & Zou, 2011) are often considered for their good balance between flexibility and parsimony. There is a large body of literature on estimation and variable selection for PLVCM. For estimation, we refer to Ahmad et al. (2005), Fan and Huang (2005), Kai et al. (2011), Wang, Zhu, and Zhou (2009), Zhang, Zhao, and Liu (2013) and Zhou and Liang (2009). For variable selection, examples include, but are not limited to, Li and Liang (2008), Wang, Li, and Huang (2008), Wang and Lin (2016), Wang and Xia (2009), Zhang et al. (2013), Zhao and Xue (2009) and Zhao, Zhang, Liu, and Lv (2014). The key assumption in the aforementioned methods is that the subset of variables having a constant or varying effect on the response is known in advance; in other words, the true model structure is determined. This assumption underlies the construction of the estimators and the investigation of their theoretical properties in the existing methods. In applications, however, it is unreasonable to determine artificially which subset of variables has a constant or varying effect on the response. To solve this problem, Hu and Xia (2012), Leng (2009), Noh and Keilegom (2012) and Xia, Zhang, and Tong (2004) proposed methods to identify the partial linear structure. Furthermore, Tang, Wang, Zhu, and Song (2012) proposed a unified method, which can select the relevant variables and the partial linear structure simultaneously. However, these methods are mainly built upon mean regression or likelihood-based approaches, which can be adversely influenced by outliers or heavy-tailed distributions. Although Tang et al. (2012) also gave a quantile regression method,

* Corresponding author at: School of Statistics, Shandong Technology and Business University, Yantai, China. E-mail address: [email protected] (K. Wang).
https://doi.org/10.1016/j.jkss.2017.09.003


which is robust, it has limitations in terms of efficiency. Furthermore, the method in Tang et al. (2012) requires an iterative two-step procedure, which is inconvenient in applications. Hence, it is highly desirable to develop an efficient and robust adaptive method that can conduct model identification and estimation simultaneously in one step. Recently, Wang, Kai, and Li (2009) proposed a novel procedure for the varying coefficient model based on rank regression and demonstrated that the new method is highly efficient across a wide class of error distributions and possesses comparable efficiency in the worst-case scenario compared with mean regression. Similar conclusions on rank regression have been further confirmed in Feng, Zou, Wang, Wei, and Chen (2015), Leng (2010), Sun and Lin (2014), and the references therein. Therefore, motivated by the above discussion, we propose a robust adaptive model selection procedure in the rank regression setting, which performs coefficient estimation and three types of selection simultaneously: varying-effect selection, constant-effect selection and relevant-variable selection. Specifically, we first embed the PLVCM into a varying coefficient model and use the spline method to approximate the unknown functions. Then a two-fold SCAD (Fan & Li, 2001) penalty is employed to discriminate the nonzero components, as well as the linear components from the nonlinear ones, by penalizing both the coefficient functions and their first derivatives. The new adaptive selection procedure inherits the robustness and efficiency advantages of the rank regression approach. Furthermore, consistency of the three types of selection and the oracle property of the resulting estimators are established. Although Feng et al. (2015) also proposed a penalized rank regression procedure, their method only selects the relevant variables, which is quite different from ours.

The rest of this paper is organized as follows. In Section 2, we introduce the new method and investigate its theoretical properties and related implementation issues. Numerical studies are reported in Section 3. All technical proofs are provided in the Appendix.

2. Robust adaptive model selection in rank regression

2.1. Two-fold penalization rank regression

Suppose that the observed full data set is
\[ D_n = \{ D_i = (Y_i, X_i, U_i),\ i = 1, \ldots, n \}, \qquad (2.1) \]

where Yi is the response of the ith observation, X_i = (X_i^(1), . . . , X_i^(p))^T ∈ R^p is the covariate vector, and the index variable Ui ∈ [0, 1] is assumed without loss of generality. Then the PLVCM with the underlying true partial linear structure and irrelevant variables has the following form:
\[ Y_i = \sum_{k\in\mathcal{A}_V} X_i^{(k)}\alpha_k(U_i) + \sum_{k\in\mathcal{A}_C} X_i^{(k)}\beta_k + \sum_{k\in\mathcal{A}_Z} X_i^{(k)}\cdot 0(U_i) + \epsilon_i, \qquad (2.2) \]

where the unknown sets AV, AC and AZ are the index sets for varying effects, nonzero constant effects and zero effects, respectively; they are mutually exclusive and satisfy AV ∪ AC ∪ AZ = {1, . . . , p}. Thus, given the data set Dn, our main aim is to identify the index sets AV, AC, AZ, and to estimate the nonzero coefficients αk(u), k ∈ AV, and βk, k ∈ AC, efficiently and robustly. As the partial linear structure is unknown in advance, the PLVCM (2.2) can be embedded into the following varying coefficient model:
\[ Y_i = X_i^{(1)}\alpha_1(U_i) + \cdots + X_i^{(p)}\alpha_p(U_i) + \epsilon_i. \qquad (2.3) \]

Thus, if αk(u) ≡ 0 for u ∈ [0, 1], then X^(k) is an irrelevant variable; otherwise, if the derivative αk′(u) ≡ 0 for u ∈ [0, 1], then X^(k) has a constant effect; otherwise, αk(u) is a varying function. Therefore, the problem becomes that of determining which αk(·)'s are zero functions and which αk′(·)'s are zero functions. We can use polynomial splines to approximate the αk(·)'s. Let 0 = τ0 < τ1 < · · · < τKn < τKn+1 = 1 be a partition of [0, 1] into Kn + 1 subintervals Inj = [τj, τj+1), j = 0, . . . , Kn − 1, and InKn = [τKn, τKn+1], where Kn = n^ϑ with 0 < ϑ < 0.5 is a positive integer such that max_{1≤j≤Kn+1} |τj − τj−1| = O(n^{−ϑ}). Let Fn be the space of polynomial splines of degree D ≥ 1 consisting of functions f satisfying: (i) the restriction of f to Inj is a polynomial of degree D for 0 ≤ j ≤ Kn; (ii) f is (D − 1)-times continuously differentiable on [0, 1] (Schumaker, 1981). There exist B-spline basis functions B(·) = (B_{1,D}(·), . . . , B_{dn,D}(·))^T for Fn, where dn = Kn + D + 1 (Schumaker, 1981). Then αk(u) can be approximated as
\[ \alpha_k(u) \approx \sum_{j=1}^{d_n} B_{j,D}(u)\,\gamma_{k,j} = B(u)^T\gamma_k, \qquad (2.4) \]
where γk = (γ_{k,1}, . . . , γ_{k,dn})^T.
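For concreteness, the following short Python sketch (our own illustration, not code from the paper) builds the B-spline basis in (2.4) and the stacked design rows Π_i that are used later in the loss (2.6); the function names make_basis and build_design, the equally spaced interior knots and the defaults Kn = 5, D = 3 are our assumptions.

```python
# Minimal sketch of the spline approximation (2.4): alpha_k(u) ~ B(u)^T gamma_k, and of the
# row vector Pi_i = (X_i^(1) B(U_i)^T, ..., X_i^(p) B(U_i)^T) entering the rank loss (2.6).
import numpy as np
from scipy.interpolate import BSpline

def make_basis(u, Kn=5, D=3):
    """Evaluate the d_n = Kn + D + 1 B-spline basis functions of degree D at the points u."""
    interior = np.linspace(0, 1, Kn + 2)[1:-1]                # Kn equally spaced interior knots
    knots = np.r_[np.zeros(D + 1), interior, np.ones(D + 1)]  # clamped (extended) knot vector
    dn = Kn + D + 1
    B = np.empty((len(u), dn))
    for j in range(dn):                                       # evaluate one basis element at a time
        coef = np.zeros(dn); coef[j] = 1.0
        B[:, j] = BSpline(knots, coef, D)(u)
    return B

def build_design(X, U, Kn=5, D=3):
    """Row i is Pi_i^T: each covariate X_i^(k) multiplies its own block of basis functions."""
    B = make_basis(U, Kn, D)                                  # n x d_n
    n, p = X.shape
    return np.hstack([X[:, [k]] * B for k in range(p)])       # n x (p * d_n)

# toy usage: n = 200 observations, p = 3 covariates
rng = np.random.default_rng(0)
U = rng.uniform(size=200); X = rng.normal(size=(200, 3))
Pi = build_design(X, U)
print(Pi.shape)                                               # (200, 27) with Kn = 5, D = 3
```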


Recall that we are interested in finding the zero components and the linear components of model (2.3). Based on the spline approximation, the former can be done empirically by shrinking ∥B(u)^T γk∥ to zero. Furthermore, by de Boor (2001, Ch. X), the derivative αk′(u) can be approximated by
\[ \alpha_k'(u) \approx D\sum_{j=1}^{d_n-1}\frac{B_{j+1,D-1}(u)}{\tilde\tau_{j+D+1}-\tilde\tau_{j+1}}\,(\gamma_{k,j+1}-\gamma_{k,j}) = D(u)^T\delta_k = D(u)^T A\gamma_k, \qquad (2.5) \]
where δk = (γ_{k,2} − γ_{k,1}, . . . , γ_{k,dn} − γ_{k,dn−1})^T,
\[ D(u) = \Big( D\,\frac{B_{2,D-1}(u)}{\tilde\tau_{D+2}-\tilde\tau_{2}},\ \ldots,\ D\,\frac{B_{d_n,D-1}(u)}{\tilde\tau_{d_n+D}-\tilde\tau_{d_n}} \Big)^T, \]
(B_{2,D−1}(u), . . . , B_{dn,D−1}(u)) are the B-spline basis functions of degree D − 1, τ̃1 = · · · = τ̃_{D+1} = 0, τ̃_{D+2} = τ1, . . . , τ̃_{D+Kn+1} = τKn, τ̃_{D+Kn+2} = · · · = τ̃_{Kn+2(D+1)} = 1 are the extended knots, and
\[ A = \begin{bmatrix} -1 & 1 & & & \\ & -1 & 1 & & \\ & & \ddots & \ddots & \\ & & & -1 & 1 \end{bmatrix} \]
is a (dn − 1) × dn constant matrix. Thus, finding the linear parts can be achieved by shrinking ∥D(u)^T δk∥ to zero.
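As a quick sanity check (ours, not part of the paper), the next snippet verifies the two facts used above: a coefficient block with all entries equal gives δk = Aγk = 0, and, by the partition-of-unity property of B-splines, B(u)^T γk is then a constant function, so penalizing ∥δk∥ shrinks a coefficient function towards a constant effect. It reuses the make_basis sketch given earlier.

```python
# Check: equal spline coefficients  =>  delta_k = A gamma_k = 0  and  B(u)^T gamma_k constant.
import numpy as np

dn = 9                                              # number of basis functions d_n
A = np.zeros((dn - 1, dn))
A[np.arange(dn - 1), np.arange(dn - 1)] = -1.0      # first-difference matrix A of (2.5)
A[np.arange(dn - 1), np.arange(1, dn)] = 1.0

gamma_const = np.full(dn, 2.0)                      # all coefficients equal to 2
print(A @ gamma_const)                              # -> zero vector, i.e. delta_k = 0

B = make_basis(np.linspace(0, 1, 5), Kn=dn - 4, D=3)  # reuse the earlier sketch (Kn = dn - D - 1)
print(B @ gamma_const)                              # -> 2.0 at every u (basis sums to 1)
```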

Therefore, we propose the following two-fold penalization rank regression loss function:
\[ L(\gamma) = \frac{1}{n}\sum_{i<j}\big|e_i(\gamma)-e_j(\gamma)\big| + n\sum_{k=1}^{p} p_{\lambda_1}\!\big(\|\gamma_k\|_R\big) + n\sum_{k=1}^{p} p_{\lambda_2}\!\big(\|\delta_k\|_D\big), \qquad (2.6) \]
where ei(γ) = Yi − Πi^T γ with γ = (γ1^T, . . . , γp^T)^T and Πi^T = (X_i^(1) B(Ui)^T, . . . , X_i^(p) B(Ui)^T), ∥γk∥R² = γk^T R γk with R = ∫_[0,1] B(u)B(u)^T du, ∥δk∥D² = δk^T D δk with D = ∫_[0,1] D(u)D(u)^T du, and pλ(·) is the SCAD penalty function (Fan & Li, 2001) with tuning parameter λ, whose derivative is defined as

\[ p_\lambda'(|t|) = \lambda\Big\{ I(|t|\le\lambda) + \frac{(a\lambda-|t|)_+}{(a-1)\lambda}\, I(|t|>\lambda) \Big\} \quad \text{for some } a > 2, \qquad (2.7) \]

where (t)_+ stands for the positive part of t and I(·) is the indicator function. Fan and Li (2001) showed that the choice a = 3.7 performs well in a variety of situations; hence, we use their suggestion throughout this paper.

Remark 1. The first penalty term of L(γ) aims to select the nonzero coefficients, and the second penalty term of L(γ) is used to detect whether the nonzero coefficients are constants or varying functions. Furthermore, by inheriting the advantage of the rank regression approach, our adaptive estimation and selection procedure is robust and efficient. Note that, when X_i^(1) ≡ 1, α1(·) in model (2.3) can be regarded as the marginal component, i.e., α0(·), which is included in Tang et al. (2012). If we do not want to penalize the marginal component, we can delete the penalty terms pλ1(∥γ1∥R) and pλ2(∥δ1∥D) in the loss function (2.6), which means that no penalty is imposed on the marginal component and it can be estimated directly.

Let γ̂ = argmin_γ L(γ); then the estimated αk(u) is given by α̂k(u) = B(u)^T γ̂k. By the properties of B-splines (de Boor, 1978), for each j = 1, . . . , dn, B_{j,D}(u) ≥ 0 and Σ_{j=1}^{dn} B_{j,D}(u) = 1; thus, if ∥γ̂k∥1 ≠ 0 and ∥δ̂k∥1 = 0, then αk(u) will be estimated as a nonzero constant. Furthermore, the estimated varying-effect, nonzero-constant-effect and zero-effect index sets are, respectively,

Â_V = {k : ∥γ̂k∥1 ≠ 0 and ∥δ̂k∥1 ≠ 0, k = 1, . . . , p},
Â_C = {k : ∥γ̂k∥1 ≠ 0 and ∥δ̂k∥1 = 0, k = 1, . . . , p},
Â_Z = {k : ∥γ̂k∥1 = 0, k = 1, . . . , p}.
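The classification rule above is simple to apply in code. The following sketch (ours, not from the paper) takes a fitted coefficient vector split into p blocks of length dn and assigns each block to Â_V, Â_C or Â_Z from the ℓ1 norms of γ̂k and δ̂k = Aγ̂k; the threshold tol is our assumption (Remark 2 below uses 10^{-4} in a similar role).

```python
# Classify each coefficient block as varying, constant, or zero from gamma_hat and A gamma_hat.
import numpy as np

def classify(gamma_hat, A, dn, tol=1e-4):
    A_V, A_C, A_Z = [], [], []
    p = len(gamma_hat) // dn
    for k in range(p):
        g = gamma_hat[k * dn:(k + 1) * dn]          # block gamma_hat_k
        d = A @ g                                   # delta_hat_k
        if np.sum(np.abs(g)) < tol:
            A_Z.append(k + 1)                       # zero effect
        elif np.sum(np.abs(d)) < tol:
            A_C.append(k + 1)                       # nonzero constant effect
        else:
            A_V.append(k + 1)                       # varying effect
    return A_V, A_C, A_Z
```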
2.2. Algorithm implementation and tuning parameter selection

It is worth noting that the commonly used gradient-based optimization techniques are not feasible here for solving (2.6), owing to its irregularity at the origin. Following Sievers and Abebe (2004), we approximate
\[ \frac{1}{n}\sum_{i<j}\big|e_i(\gamma)-e_j(\gamma)\big| \;\approx\; \sum_{i=1}^{n} w_i(\gamma)\,\big(e_i(\gamma)-\zeta(\gamma)\big)^2, \]


where ζ(γ) is the median of {e1(γ), . . . , en(γ)} and
\[ w_i(\gamma) = \begin{cases} \dfrac{R(e_i(\gamma))/(n+1) - 1/2}{e_i(\gamma)-\zeta(\gamma)}, & \text{for } e_i(\gamma)\neq \zeta(\gamma),\\[6pt] 0, & \text{otherwise}, \end{cases} \]

with R(e_i(γ)) being the rank of e_i(γ) among {e1(γ), . . . , en(γ)}. On the other hand, following Fan and Li (2001), we apply the local quadratic approximation (LQA) to the last two penalty terms. Specifically, given an initial value γ^(0) with ∥γk^(0)∥R > 0, k = 1, . . . , p, pλ1(∥γk∥R) can be approximated by the quadratic form
\[ p_{\lambda_1}(\|\gamma_k\|_R) \approx p_{\lambda_1}(\|\gamma_k^{(0)}\|_R) + \frac{1}{2}\,\frac{p_{\lambda_1}'(\|\gamma_k^{(0)}\|_R)}{\|\gamma_k^{(0)}\|_R}\,\big(\|\gamma_k\|_R^2 - \|\gamma_k^{(0)}\|_R^2\big). \qquad (2.8) \]
Similarly, pλ2(∥δk∥D) can also be approximated as
\[ p_{\lambda_2}(\|\delta_k\|_D) \approx p_{\lambda_2}(\|\delta_k^{(0)}\|_D) + \frac{1}{2}\,\frac{p_{\lambda_2}'(\|\delta_k^{(0)}\|_D)}{\|\delta_k^{(0)}\|_D}\,\big(\|\delta_k\|_D^2 - \|\delta_k^{(0)}\|_D^2\big) = p_{\lambda_2}(\|\delta_k^{(0)}\|_D) + \frac{1}{2}\,\frac{p_{\lambda_2}'(\|\delta_k^{(0)}\|_D)}{\|\delta_k^{(0)}\|_D}\,\big(\gamma_k^T A^T D A\,\gamma_k - \|\delta_k^{(0)}\|_D^2\big). \qquad (2.9) \]

Let
\[ \Sigma_1(\gamma^{(0)}) = \mathrm{diag}\Big\{ \frac{p_{\lambda_1}'(\|\gamma_1^{(0)}\|_R)}{\|\gamma_1^{(0)}\|_R}\,R,\ \ldots,\ \frac{p_{\lambda_1}'(\|\gamma_p^{(0)}\|_R)}{\|\gamma_p^{(0)}\|_R}\,R \Big\} \]
and
\[ \Sigma_2(\gamma^{(0)}) = \mathrm{diag}\Big\{ \frac{p_{\lambda_2}'(\|\delta_1^{(0)}\|_D)}{\|\delta_1^{(0)}\|_D}\,A^T D A,\ \ldots,\ \frac{p_{\lambda_2}'(\|\delta_p^{(0)}\|_D)}{\|\delta_p^{(0)}\|_D}\,A^T D A \Big\}. \]
Then, except for a constant term, the penalized loss function (2.6) becomes
\[ \frac{1}{n}\big(S^{(0)} - \Pi\gamma\big)^T W(\gamma^{(0)})\,\big(S^{(0)} - \Pi\gamma\big) + \frac{n}{2}\,\gamma^T \Sigma_1(\gamma^{(0)})\,\gamma + \frac{n}{2}\,\gamma^T \Sigma_2(\gamma^{(0)})\,\gamma, \qquad (2.10) \]

where S^(0) = (Y1 − ζ(γ^(0)), . . . , Yn − ζ(γ^(0)))^T, Π = (Π1, . . . , Πn)^T and W(γ^(0)) is the diagonal weight matrix with entries wi(γ^(0)). This is a quadratic form with a minimizer satisfying
\[ \Big\{ \Pi^T W(\gamma^{(0)})\,\Pi + \frac{n}{2}\,\Sigma_1(\gamma^{(0)}) + \frac{n}{2}\,\Sigma_2(\gamma^{(0)}) \Big\}\,\gamma = \Pi^T W(\gamma^{(0)})\,S^{(0)}. \qquad (2.11) \]
Therefore, the computational algorithm can be implemented as follows:
Step 1. Initialize γ = γ^(0).
Step 2. Given γ^(t), update γ to γ^(t+1) by solving (2.11), where the γ^(0)'s in W(γ^(0)), Σ1(γ^(0)), Σ2(γ^(0)) and S^(0) are all set to γ^(t).
Step 3. Iterate Step 2 until convergence of γ is achieved.
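To make the iteration concrete, here is a schematic Python implementation of one pass of Step 2 (our own sketch under the stated assumptions, not the authors' code): rank-based weights in the spirit of Sievers and Abebe (2004), local quadratic approximation of the two SCAD penalties, and the linear system (2.11). The matrices Pi, R_mat and ADA (standing for A^T D A) are assumed to have been built elsewhere, for instance with the earlier build_design sketch.

```python
import numpy as np
from scipy.stats import rankdata
from scipy.linalg import block_diag

def scad_deriv(t, lam, a=3.7):
    """Derivative of the SCAD penalty as in (2.7)."""
    t = np.abs(t)
    return lam * ((t <= lam) + np.maximum(a * lam - t, 0) / ((a - 1) * lam) * (t > lam))

def rank_weights(e):
    """Weights [R(e_i)/(n+1) - 1/2] / (e_i - median), set to 0 at the median."""
    n = len(e)
    zeta = np.median(e)
    den = e - zeta
    w = np.zeros(n)
    ok = np.abs(den) > 1e-10
    w[ok] = (rankdata(e)[ok] / (n + 1) - 0.5) / den[ok]
    return w, zeta

def lqa_penalty(gamma, lam, M, dn):
    """Block-diagonal LQA matrix: p'_lam(||g_k||_M)/||g_k||_M * M for each block g_k."""
    blocks = []
    for k in range(len(gamma) // dn):
        g = gamma[k * dn:(k + 1) * dn]
        nrm = np.sqrt(g @ M @ g) + 1e-8
        blocks.append(float(scad_deriv(nrm, lam)) / nrm * M)
    return block_diag(*blocks)

def update(gamma, Y, Pi, R_mat, ADA, lam1, lam2, dn):
    """One pass of Step 2: weights from current residuals, then solve (2.11)."""
    n = len(Y)
    e = Y - Pi @ gamma
    w, zeta = rank_weights(e)
    S = Y - zeta
    Sig1 = lqa_penalty(gamma, lam1, R_mat, dn)       # penalizes ||gamma_k||_R
    Sig2 = lqa_penalty(gamma, lam2, ADA, dn)         # penalizes ||delta_k||_D via A^T D A
    lhs = Pi.T @ (w[:, None] * Pi) + 0.5 * n * (Sig1 + Sig2)   # cf. (2.11)
    rhs = Pi.T @ (w * S)
    return np.linalg.solve(lhs, rhs)
```

In practice the update would be called repeatedly until the change in γ falls below a small tolerance, mirroring Step 3.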

Remark 2. Similar to Fan and Li (2001), in our simulation studies, if ∥γk^(t)∥2 < 10^{−4} or ∥δk^(t)∥2 < 10^{−4}, we set it to zero. Furthermore, we need an initial estimator γ^(0), and we find that the final estimator might depend on this initial choice; thus, in order to achieve robustness, the initial estimator γ^(0) is chosen as the unpenalized rank regression estimator, which is also robust, i.e., γ^(0) = argmin_γ (1/n) Σ_{i<j} |ei(γ) − ej(γ)|. In addition, we need to choose the tuning parameters Kn, λ1 and λ2 appropriately. Here we fix the spline order to be 4, which means that cubic splines are used in all our numerical implementations. We then use 5-fold cross-validation (CV) to select Kn as well as (λ1, λ2) simultaneously. To be more specific, we randomly divide the data Dn into five roughly equal parts, denoted {Di, i ∈ S(j)} for j = 1, 2, 3, 4, 5, where S(j) is the set of subject indices corresponding to the jth part. For each j, we treat {Di, i ∈ S(j)} as the validation data set and the remaining four parts as the training data set. For any candidate (Kn, λ1, λ2) and each S(j), we first use the training data to estimate {αk(·)}_{k=1}^p by (2.6) and compute the corresponding predictions Ŷi = X_i^(1) α̂1(Ui) + · · · + X_i^(p) α̂p(Ui), i ∈ S(j). Then the cross-validation error corresponding to a fixed (Kn, λ1, λ2) is defined as
\[ \mathrm{CV}_5(K_n,\lambda_1,\lambda_2) = \sum_{j=1}^{5}\sum_{i\in S(j)}\Big\{\frac{R(\hat e_i)}{n+1}-\frac{1}{2}\Big\}\,\hat e_i, \qquad (2.12) \]


where ê_i = Yi − Ŷi and R(ê_i) represents the rank of ê_i among {ê_i : i ∈ S(j)}. Finally, the optimal (Kn, λ1, λ2) is selected by minimizing the cross-validation error CV_5(Kn, λ1, λ2). In our simulation studies, we search Kn in {2, 3, 4, 5, 6} and λ1, λ2 over an equally spaced grid on (0, 1.5); the simulation results show that this choice works well.

2.3. Asymptotic properties

Without loss of generality, for model (2.3), we assume that αk(·) is truly nonparametric for k = 1, . . . , v with true value α0,k(·), a nonzero constant for k = v + 1, . . . , v + c, with the true slope parameters of the parametric components denoted by β0^c = (β0,v+1, . . . , β0,v+c)^T, and zero for k = v + c + 1, . . . , p. Thus the vectors X^v = (X^(1), . . . , X^(v))^T and X^c = (X^(v+1), . . . , X^(v+c))^T correspond to the varying and nonzero constant effects, and the true varying-effect, nonzero-constant-effect and zero-effect index sets are AV = {1, . . . , v}, AC = {v + 1, . . . , v + c} and AZ = {v + c + 1, . . . , p}. To establish the asymptotic properties, we first introduce two definitions and present some conditions.

Definition 1. Define H_r as the collection of all functions on [0, 1] whose mth order derivative satisfies the Hölder condition of order ν with r ≡ m + ν. That is, for each h ∈ H_r, there exists a constant c ∈ (0, ∞) such that |h^(m)(s) − h^(m)(t)| ≤ c|s − t|^ν for any 0 ≤ s, t ≤ 1.

Definition 2. The function H(X^v, U) is said to belong to the varying coefficient class of functions G if (i) H(X^v, U) = Σ_{k=1}^{v} X^(k) hk(U); (ii) Σ_{k=1}^{v} E[X^(k) hk(U)]² < ∞, where hk(·) ∈ H_r.

For any random variable Ω with E(Ω²) < ∞, let E_G(Ω) denote the projection of Ω onto G in the sense that
\[ E\big[(\Omega - E_G(\Omega))(\Omega - E_G(\Omega))\big] = \inf_{H(\cdot,\cdot)\in G} E\big[(\Omega - H(X^v, U))(\Omega - H(X^v, U))\big]; \]
if Ω is a random vector, E_G(Ω) is defined by componentwise projection.

C1. α0,k(·) ∈ H_r for k ∈ AV and some r > 1/2.
C2. The density function of U, fU(·), is continuous and bounded away from zero and infinity on the support [0, 1].
C3. The covariate vector X has a continuous density, and the marginal densities of X^(k), k = 1, . . . , p, are bounded away from zero and infinity uniformly.
C4. The matrix Σ = E[(X^c − E_G(X^c))(X^c − E_G(X^c))^T] is positive definite.
C5. The error ϵ has a positive density function h(x) satisfying ∫ [h′(x)]²/h(x) dx < ∞, which means that ϵ has finite Fisher information.

Conditions C1–C4 are commonly adopted in the semiparametric regression setting, for example in Li and Liang (2008), Tang et al. (2012) and Zhao and Xue (2009). Condition C1 states the smoothness condition on the coefficient functions, which determines the best rate at which the coefficient functions can be estimated. Condition C2 is needed for establishing the consistency and asymptotic normality of the resulting estimators. Condition C3 is a standard condition on the design matrix. Condition C4 is used to prove the asymptotic normality. Condition C5 is a regularity condition on the random errors, the same as those used in work on rank regression such as Wang, Zhu et al. (2009) and Sun and Lin (2014).

Theorem 1 (Selection Consistency). Under the regularity conditions C1–C5, if λ_n^max → 0, n^{r/(2r+1)} λ_n^min → ∞ and Kn = O(n^{1/(2r+1)}) as n → ∞, where λ_n^max = max{λ1, λ2} and λ_n^min = min{λ1, λ2}, then
\[ \lim_{n\to\infty}\Pr\big(\hat{\mathcal A}_V = \mathcal A_V,\ \hat{\mathcal A}_C = \mathcal A_C\ \text{and}\ \hat{\mathcal A}_Z = \mathcal A_Z\big) = 1. \]

Theorem 1 shows that the proposed method is selection consistent, meaning that it can correctly determine which covariates have a constant or varying effect and select the relevant variables with probability tending to 1.

Theorem 2 (Optimal Convergence Rate). Under the same conditions as Theorem 1,
\[ \big\|\hat\alpha_k(u) - \alpha_{0,k}(u)\big\|^2 = O_p\big(n^{-2r/(2r+1)}\big), \quad k \in \mathcal A_V. \]
Thus, by Stone (1982), the estimators of the varying coefficient components achieve the optimal convergence rate as if the true model structure sets AV, AC and AZ were already known.

Theorem 3 (Asymptotic Normality). Under the same conditions as Theorem 1,
\[ \sqrt{n}\,\big(\hat\beta^c - \beta_0^c\big) \to_d N\Big(0,\ \frac{1}{12\tau^2}\,\Sigma^{-1}\Big), \]
where β̂^c is the estimator of β0^c and τ = ∫ h(x)² dx.


By Theorems 2 and 3, the estimated varying coefficients and nonzero constant coefficients all enjoy the oracle property: the varying coefficient estimates possess the optimal convergence rate, and the constant coefficient estimates have the same asymptotic distribution as their counterparts obtained under the true model. Let σ² = Var(ϵ). Based on the asymptotic variance of the least squares estimators given in Tang et al. (2012) and Zhao and Xue (2009), we have the following corollary about the asymptotic relative efficiency (ARE).

Corollary 4. The ARE of the new rank regression based estimator β̂^c relative to the least squares based estimator β̂^c_LS obtained under the true model is
\[ \mathrm{ARE}(\hat\beta^c, \hat\beta^c_{LS}) = \frac{\mathrm{Var}(\hat\beta^c_{LS})}{\mathrm{Var}(\hat\beta^c)} = 12\sigma^2\tau^2. \]
This ARE has a lower bound of 0.864 for estimating the parametric component, which is attained at the random error density h(x) = (3/(20√5))(5 − x²) I(|x| ≤ √5).
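As a numerical illustration (ours, not from the paper), the ARE can be evaluated directly by computing τ = ∫ h(x)² dx for a given error density; the snippet below reproduces the familiar values quoted in the next paragraph, about 0.955 for the standard normal error and about 1.9 for the t distribution with three degrees of freedom.

```python
# Numerical check of ARE = 12 * sigma^2 * tau^2 with tau = integral of h(x)^2 dx.
import numpy as np
from scipy import stats
from scipy.integrate import quad

def are(dist):
    tau = quad(lambda x: dist.pdf(x) ** 2, -np.inf, np.inf)[0]
    sigma2 = dist.var()
    return 12 * sigma2 * tau ** 2

print(round(are(stats.norm()), 3))   # ~0.955 = 3/pi for N(0,1) errors
print(round(are(stats.t(3)), 3))     # ~1.9 for t(3) errors
```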

Note that the ARE obtained above is the same as that of the signed-rank Wilcoxon test with respect to the t-test. It is well known in the rank analysis literature that the ARE is as high as 0.955 for the normal error distribution and can be significantly higher than 1 for many heavier-tailed distributions. For instance, this quantity is 1.5 for the double exponential distribution and 1.9 for the t distribution with three degrees of freedom. Furthermore, let us consider the following mixture of normals:
\[ \epsilon_i \sim (1-\rho)\,N(0,1) + \rho\,N(0,\upsilon(\rho)), \quad i = 1,\ldots,n, \]
where 0 < ρ < 1 and υ(ρ) is bounded on [0, 1]. The following corollary indicates that the ARE can be made arbitrarily large by slightly perturbing the normal distribution, while keeping the error variance bounded.

Corollary 5. In the above normal mixture case,
\[ 12\sigma^2\tau^2 = \frac{3}{\pi}\left( (1-\rho)^2 + \frac{2\sqrt{2}\,\rho(1-\rho)}{\sqrt{1+\upsilon(\rho)}} + \frac{\rho^2}{\sqrt{\upsilon(\rho)}} \right)^{2} \big(\rho\,\upsilon(\rho) - \rho + 1\big). \]

If υ(ρ) = o(ρ⁴) as ρ → 0, then 12σ²τ² → ∞ as ρ → 0.

3. Simulation studies

Experiment 1. In this experiment, we demonstrate the finite sample performance of our new method (denoted by NEW) and compare it with the iterative two-step method of Tang et al. (2012), which uses a first step to identify the varying and constant effects (including the zero effects) and a second step to separate the zero effects from the nonzero constant effects, based on mean regression (denoted by MR) and on quantile regression at quantile level τ (denoted by QRτ). 500 replicates are generated from
\[ Y_i = \alpha_0(U_i) + \sum_{k=1}^{15} X_i^{(k)}\alpha_k(U_i) + \epsilon_i, \quad i = 1,\ldots,500, \qquad (3.1) \]
where Ui ∼ U[0, 1] and X_i^(k), k = 1, . . . , 15, are independent N(0, 1) variables. Three cases of error distributions are considered. Case 1: ϵi ∼ N(0, 1). Case 2: ϵi ∼ t(3). Case 3: Tukey contaminated normal ϵi ∼ T(0.10, 5) (with cumulative distribution function 0.9Φ(x) + 0.1Φ(x/5), where Φ(x) is the distribution function of a standard normal distribution). Set α0(u) = 20 sin(0.5πu), α1(u) = 3 cos(π(6u − 5)/3), α2(u) = 0.5(2 − 3u)³, α3(u) ≡ −1.5, α4(u) ≡ 2 and α5(u) ≡ · · · ≡ α15(u) ≡ 0.
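The data-generating process just described is straightforward to reproduce; the following sketch (ours, for illustration only) simulates model (3.1) under the three error cases.

```python
# Simulate Experiment 1, model (3.1), with errors from Case 1 (normal), 2 (t3) or 3 (Tukey).
import numpy as np

def alpha(u):
    """True coefficient functions alpha_0,...,alpha_15 evaluated at u."""
    a = np.zeros((16, len(u)))
    a[0] = 20 * np.sin(0.5 * np.pi * u)
    a[1] = 3 * np.cos(np.pi * (6 * u - 5) / 3)
    a[2] = 0.5 * (2 - 3 * u) ** 3
    a[3] = -1.5                                    # constant effects
    a[4] = 2.0
    return a                                       # alpha_5,...,alpha_15 stay identically zero

def simulate(n=500, case=1, rng=None):
    rng = rng or np.random.default_rng()
    U = rng.uniform(size=n)
    X = rng.normal(size=(n, 15))
    a = alpha(U)
    if case == 1:
        eps = rng.normal(size=n)                   # N(0, 1)
    elif case == 2:
        eps = rng.standard_t(3, size=n)            # t(3)
    else:                                          # Tukey contaminated normal T(0.10, 5)
        scale = np.where(rng.uniform(size=n) < 0.10, 5.0, 1.0)
        eps = rng.normal(size=n) * scale
    Y = a[0] + np.sum(X * a[1:].T, axis=1) + eps
    return Y, X, U

Y, X, U = simulate(case=2)
```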

To assess the selection accuracy, as in Tang et al. (2012), we report the percentage of correctly selecting the true model (CS), the average number of effects (excluding the intercept) that are selected as varying (A.v), and the average number of redundant variables that are incorrectly selected (A.r). To summarize the estimation error, we report the mean squared error (MSE) of αk, conditional on the models where αk is selected as constant, k = 3, 4, and the integrated mean squared error (IMSE) of α̂k(u), k = 0, 1, 2, defined as
\[ \mathrm{IMSE}\{\hat\alpha_k(u)\} = \frac{1}{500}\sum_{i=1}^{500}\frac{1}{100}\sum_{j=1}^{100}\{\hat\alpha_{k,i}(u_j) - \alpha_k(u_j)\}^2, \]
where {u_j}_{j=1}^{100} is a grid equally spaced on [0.02, 0.98] and α̂_{k,i}(·), i = 1, . . . , 500, are the estimates of αk(·) in the ith replicate. The values in parentheses are the standard errors. We also report the results obtained under the true model, referred to as Oracle. Several observations can be made from Tables 1–3. First, our new method works well in every case: with high probability it can correctly separate the varying and constant effects and select the true relevant variables, and it can estimate the


Table 1
Simulation results for Case 1 in Experiment 1. CS: the percentage of correctly selecting the true model; A.v: the average number of effects (excluding the intercept) that are selected as varying; A.r: the average number of redundant variables that are incorrectly selected; MSE: the mean squared error; IMSE: the integrated mean squared error.

                       NEW           Oracle.NEW    MR            Oracle.MR     QR0.5         Oracle.QR0.5
CS                     90.20         100.00        86.00         100.00        85.20         100.00
A.v                    2.06          2.00          2.09          2.00          2.09          2.00
A.r                    0.13          0.00          0.14          0.00          0.15          0.00
IMSE×10^2  α0(t)       4.30(0.12)    4.25(0.13)    4.38(0.10)    4.29(0.11)    4.47(0.20)    4.33(0.17)
           α1(t)       3.41(0.17)    3.44(0.15)    3.52(0.14)    3.48(0.12)    3.90(0.12)    3.79(0.13)
           α2(t)       5.42(0.09)    5.21(0.12)    5.33(0.11)    5.26(0.10)    5.56(0.13)    5.40(0.11)
MSE×10^3   α3          3.62(0.13)    3.56(0.12)    3.75(0.14)    3.70(0.13)    3.81(0.11)    3.72(0.12)
           α4          3.51(0.12)    3.44(0.09)    3.61(0.11)    3.54(0.10)    3.65(0.12)    3.63(0.11)
Time                   7.08(1.87)    –             21.93(1.68)   –             13.21(3.23)   –

Table 2
Simulation results for Case 2 in Experiment 1. CS: the percentage of correctly selecting the true model; A.v: the average number of effects (excluding the intercept) that are selected as varying; A.r: the average number of redundant variables that are incorrectly selected; MSE: the mean squared error; IMSE: the integrated mean squared error.

                       NEW           Oracle.NEW    MR            Oracle.MR     QR0.5         Oracle.QR0.5
CS                     87.80         100.00        60.40         100.00        78.00         100.00
A.v                    2.09          2.00          2.46          2.00          2.20          2.00
A.r                    0.15          0.00          0.69          0.00          0.31          0.00
IMSE×10^2  α0(t)       4.61(0.19)    4.52(0.15)    6.43(0.25)    6.27(0.23)    5.76(0.21)    5.68(0.18)
           α1(t)       3.75(0.21)    3.66(0.17)    4.86(0.28)    4.77(0.30)    4.40(0.14)    4.31(0.15)
           α2(t)       5.41(0.14)    5.39(0.12)    7.67(0.26)    7.58(0.27)    6.29(0.17)    6.20(0.13)
MSE×10^3   α3          3.87(0.15)    3.76(0.20)    8.72(0.26)    8.67(0.24)    5.71(0.21)    5.62(0.20)
           α4          3.91(0.18)    3.82(0.15)    6.16(0.23)    6.03(0.25)    5.13(0.15)    5.07(0.11)
Time                   8.20(1.27)    –             23.02(2.11)   –             14.13(2.08)   –

Table 3
Simulation results for Case 3 in Experiment 1. CS: the percentage of correctly selecting the true model; A.v: the average number of effects (excluding the intercept) that are selected as varying; A.r: the average number of redundant variables that are incorrectly selected; MSE: the mean squared error; IMSE: the integrated mean squared error.

                       NEW           Oracle.NEW    MR            Oracle.MR     QR0.5         Oracle.QR0.5
CS                     85.80         100.00        54.80         100.00        75.20         100.00
A.v                    2.10          2.00          2.57          2.00          2.31          2.00
A.r                    0.17          0.00          0.86          0.00          0.39          0.00
IMSE×10^2  α0(t)       4.83(0.21)    4.79(0.20)    6.50(0.29)    6.43(0.27)    6.10(0.23)    6.01(0.20)
           α1(t)       4.02(0.12)    3.88(0.17)    5.81(0.33)    5.73(0.31)    4.94(0.20)    4.81(0.19)
           α2(t)       5.51(0.15)    5.43(0.18)    8.14(0.30)    8.03(0.33)    7.03(0.18)    6.90(0.15)
MSE×10^3   α3          3.99(0.20)    3.89(0.16)    9.11(0.30)    9.03(0.32)    6.11(0.20)    5.95(0.19)
           α4          4.11(0.22)    4.05(0.20)    9.97(0.30)    9.84(0.28)    5.51(0.17)    5.43(0.15)
Time                   8.21(1.44)    –             21.33(2.35)   –             15.06(1.78)   –

nonzero varying and constant coefficients accurately. Second, the new method is more efficient than the quantile regression based method; in particular, its estimation error is always smaller than that of QR0.5. Third, the new method performs comparably with the mean regression based method when the error distribution is normal, but it is more efficient and robust than the mean regression method when the errors follow heavy-tailed distributions, with smaller selection error and estimation error. Furthermore, it is of interest to compare the computational cost of our method with that of the iterative two-step method. Tables 1–3 summarize the average computing time (in seconds) with standard errors in parentheses. For a fairer comparison and an assessment of how complex each method really is, the computing time listed in Tables 1–3 includes the time for choosing all the involved tuning parameters. It can be seen that our method is faster than the iterative two-step method.


Table 4
Frequencies that αk(u), k = 0, 1, 2, . . . , 6, are selected to be varying in 100 replicates.

Error         α0(u)   α1(u)   α2(u)   α3(u)   α4(u)   α5(u)   α6(u)
N(0, 1)       100     100     100     100     4       3       5
t(3)          100     100     100     100     2       5       4
T(0.10, 5)    100     100     100     100     6       2       7

Experiment 2. This experiment will investigate the performance of our method when all coefficient functions are varying. We generate 100 replicates from the following model,
\[ Y_i = \alpha_0(U_i) + \sum_{k=1}^{6} X_i^{(k)}\alpha_k(U_i) + \epsilon_i, \quad i = 1,\ldots,500, \qquad (3.2) \]

where α0(u) = sin(0.5πu), α1(u) = 3 cos(π(6u − 5)/3), α2(u) = 0.5(2 − 3u)³, α3(u) = 6 − 6u, α4(u) = · · · = α6(u) = 0. In this experiment, the predictors X_i^(1), . . . , X_i^(6) and the index variable Ui are correlated with each other: the index variable is still simulated from U[0, 1]; X_i^(1) is sampled uniformly from [3Ui, 2 + 3Ui]; X_i^(2), conditioning on X_i^(1), is Gaussian with mean 0 and variance (1 + X_i^(1))/(2 + X_i^(1)); the other predictors are generated from a multivariate normal distribution with zero mean and covariance structure cov(X_i^(j1), X_i^(j2)) = 4X_i^(2) exp(−|j1 − j2|) for j1, j2 = 3, 4, 5, 6. In this experiment, the intercept term α0(u) is also penalized. Table 4 reports the simulation results for the N(0, 1), t(3) and T(0.10, 5) errors, respectively, which indicate that our method also works well in this case.

Acknowledgments

The second author's research was supported by NNSF projects (11171188, 11231005 and 71673171) of China and NSF project (ZR2017BA002) of Shandong Province of China.

Appendix

In this section, let C denote a positive constant not depending on n, which may take different values at each appearance. By Corollary 6.21 of Schumaker (1981), there exists a vector γ0 = (γ_{0,1}^T, . . . , γ_{0,p}^T)^T such that ∥B(u)^T γ_{0,k} − α0,k(u)∥_∞ = O(Kn^{−r}).



1/2

Proof of Theorem 2. Denote by δn = θn + λ1 + λ2 with θn = dn /n, we first prove that ∥γˆ − γ 0 ∥ = Op (dn )δn . Let 1/2 γ = γ 0 + dn δn V , where V is a pdn dimensional vector. Note that, if L(γ ) is convex with respect to γ , it is sufficient to show, for any given ξ > 0, there exists a large C such that

(

Pr

sup

∥V ∥=C

) L(γ ) > L(γ 0 ) ≥ 1 − ξ .

(A.1)

By virtue of the identity |x − y| − |x| = −ysgn(x) + 2(y − x){I(0 < x < y) − I(y < x < 0)}, and the definition of L(γ ), it follows that

L(γ ) − L(γ 0 ) 1 ∑{

=

n

p

∑{ } } |Yij − ΠTij γ| − |Yij − ΠTij γ 0 | + n pλ1 (∥γ k ∥R ) − pλ1 (∥γ 0,k ∥R )

i
k=1 p

+n

∑{

pλ2 (∥δk ∥D ) − pλ2 (∥δ0,k ∥D )

}

k=1

=−

1∑ n

ΠTij (

γ − γ 0 )sgn(Yij − ΠTij γ 0 ) +

i
2∑ T (Πij γ − Yij )× n i
I(0 < Yij − ΠTij γ 0 < ΠTij (γ − γ 0 )) − I(ΠTij (γ − γ 0 ) < Yij − ΠTij γ 0 < 0)

{

+n

}

p ∑ {

}

pλ1 (∥γ k ∥R ) − pλ1 (∥γ 0,k ∥R ) + n

k=1

:= A1 + A2 + A3 + A4 .

p ∑ {

}

pλ2 (∥δk ∥D ) − pλ2 (∥δ0,k ∥D )

k=1

(A.2)

Please cite this article in press as: Sun, X., et al., Robust adaptive model selection and estimation for partial linear varying coefficient models in rank regression. Journal of the Korean Statistical Society (2017), https://doi.org/10.1016/j.jkss.2017.09.003.

X. Sun et al. / Journal of the Korean Statistical Society (

)



9 1/2

r

→ ∞, we have that A1 = Op (δn θn−1 dn ∥V ∥). Moreover, From Lemma in Feng et al. (2015) and the assumption n 2r +1 λmin n taking the similar arguments as in the proof of Lemma 1 in Feng et al. (2015), we can obtain that A2 = τ (γ − γ 0 )T ΠT Π(γ − γ 0 )(1 + op (1)). Applying Lemma A.3 of Huang et al. (2004) to A2 yields A2 = Op (nδn2 ∥V ∥2 ). Obviously, by choosing a sufficiently large C, A2 dominates A1 with probability tending to 1. On the other hand, based √ properties of B-spline and ∑p on the well-known the inequality pλ (|x|) − pλ (|y|) ≤ λ|x − y|, we have that A3 ≤ nC λ1 k=1 ∥γ k − γ 0,k ∥/ dn = Op (nδn2 ∥V ∥). Thus A3 is dominated by A2 if a sufficiently large C is chosen. Similarly, it is easy to verify that A4 is also dominated by A2 . Recall that 1/2 A2 > 0, so we have (A.1) holds, which means ∥γˆ − γ 0 ∥ = Op (dn )δn . Finally, we will show that the convergence rate can be further improved to ∥γˆ − γ 0 ∥ = Op (θn ). In fact, as the model is fixed as n → ∞, we can find a constant C > 0, such that ∥γ 0,k ∥2R = γ T0,k R γ 0,k > C for k ≤ v + c and ∥δ0,k ∥2D = δT0,k Dδ0,k{ > C for k ≤ v . As ∥γˆ − γ} 0 ∥2 = Op (dn δn2 ) = op (dn ) from ( above result and λk =) op (1), k = 1, 2, we have that Pr pλ1 (∥γˆ k ∥R ) = pλ1 (∥γ 0,k ∥R ) → 1, k ≤ v + c and Pr pλ2 (∥δˆ k ∥D ) = pλ2 (∥δ0,k ∥D ) → 1, k ≤ v . These facts indicate that

( Pr n

p ∑ {

)

pλ1 (∥γˆ k ∥R ) − pλ1 (∥γ 0,k ∥R ) > 0

→ 1,

}

k=1

( Pr n

p { ∑

) } ˆ pλ2 (∥δk ∥D ) − pλ2 (∥δ0,k ∥D ) > 0 → 1.

k=1

Removing the regularizing terms A3 and A4 in (A.2), the rate can be improved to ∥γˆ − γ 0 ∥ = Op (θn ) by the same reasoning as above. That is ∥γˆ − γ 0 ∥2 = Op (Kn /n).

  αˆ k (u) − α0k (u)2 =

∫ [0,1]

[αˆ k (u) − α0,k (u)]2 du



[B(u)⊤ γˆ k − B(u)⊤ γ 0,k + Rk (u)]2 du [0,1] ∫ ∫ ≤2 [B(u)⊤ γˆ k − B(u)⊤ γ 0,k ]2 du + 2 [Rk (u)]2 du [0,1] [0,1] ∫ = 2(γˆ k − γ 0,k )⊤ R(γˆ k − γ 0,k ) + 2 [Rk (u)]2 du,

=

(A.3)

[0,1]

where Rk (u) = B(u)T γ 0,k − α0,k (u), furthermore, note that ∥R ∥2 = O(1), we can verify that

(

−2r

(γˆ k − γ 0k )⊤ R(γˆ k − γ 0k ) = Op n 2r +1

)

∫ and [0,1]

( −2r ) [Rk (u)]2 du = Op n 2r +1 .

(A.4)

Thus the proof is completed. Lemma 1. Under the same conditions of Theorem 1. Then, with probability tending to 1, (i) αˆ k (·) is a nonzero constant for k = v + 1, . . . , v + c, (ii) αˆ k (·) ≡ 0 for k = v + c + 1, . . . , p. Proof of Lemma 1. We put our main attention on proving part (i) as an illustration and part (ii) can be similarly proved with its detailed proof omitted. Suppose that B(u)T γˆ k does not represent a nonzero constant for k = v + 1, . . . , v + c. Define γ¯ to be the same as γˆ except that γˆ k is replaced by its projection onto the subspace {γ k : B(u)T γ k stands for a nonzero constant}. Therefore, we have that

L(γˆ ) − L(γ¯ ) = [L(γˆ ) − L(γ 0 )] − [L(γ 0 ) − L(γ¯ )] } 1 ∑{ } 1 ∑{ ˆ − |Yij − ΠTij γ 0 | − ¯ − |Yij − ΠTij γ 0 | |Yij − ΠTij γ| |Yij − ΠTij γ| =

0≥

n

+n

n

i
pλ1 (∥γˆ k ∥R ) − pλ1 (∥γ¯ k ∥R ) + n

}

i
p { ∑

pλ2 (∥δˆ k ∥D ) − pλ2 (∥δ¯ k ∥D )

}

k=1

ˆ γ 0 ) − D2 (γ, ¯ γ 0 ) + D3 (γ, ˆ γ¯ ) + D4 (γ, ˆ γ¯ ). = D1 (γ,

(A.5)

Note that, by the same arguments to the derivation of (A.2), it is not difficult to verify that

ˆ γ 0 ) = τ (γˆ − γ 0 )T ΠT Π(γˆ − γ 0 )(1 + op (1)) + θn−1 (γˆ − γ 0 )T Op (1)1dn , D1 (γ, Please cite this article in press as: Sun, X., et al., Robust adaptive model selection and estimation for partial linear varying coefficient models in rank regression. Journal of the Korean Statistical Society (2017), https://doi.org/10.1016/j.jkss.2017.09.003.

10

X. Sun et al. / Journal of the Korean Statistical Society (

)



where 1dn is dn -dimension vector of ones, and

¯ γ 0 ) = τ (γ¯ − γ 0 )T ΠT Π(γ¯ − γ 0 )(1 + op (1)) + θn−1 (γ¯ − γ 0 )T Op (1)1dn . D2 (γ, Therefore, we can show that

ˆ γ 0 ) − D2 (γ, ¯ γ 0) D1 (γ, = τ (γˆ − γ¯ + γ¯ − γ 0 )T ΠT Π(γˆ − γ¯ + γ¯ − γ 0 ) + (γ¯ − γ 0 )T ΠT Π(γ¯ − γ 0 )(1 + op (1)) + θn−1 (γˆ − γ¯ )T Op (1)1dn = τ (γˆ − γ¯ )T ΠT Π(γˆ − γ¯ ) + 2τ (γ¯ − γ 0 )T ΠT Π(γˆ − γ¯ ) + θn−1 (γˆ − γ¯ )T Op (1)1dn ≥ 2τ (γ¯ − γ 0 )T ΠT Π(γˆ − γ¯ ) + θn−1 (γˆ − γ¯ )T · Op (1) · 1dn := N1 + N2 .

(A.6)

√ According to Lemma 3, Lemma A.3 of Huang et al. (2004) and the result ∥γ¯ − γ 0 ∥ = Op (Kn / n) from Theorem 1, it follows that

( ∥N1 ∥ ≤ Op

n Kn

( ) ) p ∑ √ ˆ ¯ ∥γ¯ − γ 0 ∥ · ∥γˆ − γ∥ = Op nKn ∥δk ∥D , k=1

) p ∑ √ ) ( −1 ¯ Op (1)1dn = Op nKn ∥δˆ k ∥D . ∥N2 ∥ ≤ Op θn ∥γˆ − γ∥ (

k=1

These facts lead to

( √

ˆ γ 0 ) − D2 (γ, ¯ γ 0 ) ≥ −Op D1 (γ,

nKn

p ∑

) ∥δˆ k ∥D .

(A.7)

k=1

On the other hand, we have that Pr(pλ1 (∥γˆ k ∥R ) = pλ1 (∥γ¯ k ∥R )) → 1 and Pr(∥δ¯ k ∥D = 0) → 1. Substituting these results into (A.5) yields

{ ˆ γ 0 ) − D2 (γ, ¯ γ 0) + n Pr D1 (γ,

p ∑

} ˆ pλ2 (∥δk ∥D ) ≤ 0 → 1.

(A.8)

k=1 r

In addition, based on the result of Theorem 2 and the condition n 2r +1 λmin → ∞, it is easy to verify that n



∥δˆ k ∥D = ˆ ˆ ∥δk ∥D ) → 1 by )the definition of SCAD penalty function. As a Op ( Kn /n) = op (λ2 ). Hence, we have Pr(pλ2 (∥δk ∥D ) = λ2( ∑p ∑p ∥δˆ k ∥D . Thus, we have consequence, if ∥δˆ k ∥D > 0, we have n pλ (∥δˆ k ∥D ) = Op nλ2 √

k=1

2

k=1

p

ˆ γ 0 ) − D2 (γ, ¯ γ 0) + n D1 (γ,



pλ2 (∥δˆ k ∥D ) > 0,

k=1

which is contradictory to (A.8). Then we complete the proof.

ˆ R > aλ1 for k = 1, . . . , v , Proof of Theorem 3. According to the proof of Lemma 6 in Feng et al. (2015), we know that ∥γ∥ ˆ D > aλ2 for k = v + 1, . . . , v + c. Thus by Lemma 1, we only need to consider a correctly specified partially linear and ∥δ∥ varying coefficient model without regularization terms. Specifically, the corresponding objective function is Φ (γ v , β c ) =

1∑ n

|Yij − Πvij T γ v − X cij T βc |,

i
(1) (v ) v ˆ ˜i = where Πij = Πi − Πj with Πi = (Xi B(u)T , . . . , Xi B(u)T )T , X cij = X ci − X cj . Let (γˆ , β ) = argmin Φ (γ v , βc ), ∆

∑v

(k) k=1 Xi ( 0,k (Ui )

α

c

− αˆ k (Ui )), δn = n−1/2 and βc∗ = δn−1 (βc − βc0 ). Then, with probability that tends to one, βˆ ∗ must be the

minimizer of the following function

Φ∗ (βc∗ ) =

1∑ n

˜ i ) − (ϵj + ∆ ˜ j ) − δn X cij T βc∗ |. |(ϵi + ∆

i
Denote by S(β∗ ) the gradient function of Φ∗ (βc∗ ), that is c

S(βc∗ ) =

{ } ∂ Φ∗ (βc∗ ) δn ∑ ˜ i ) − (ϵj + ∆ ˜ j ) − δn X cij T βc∗ X cij . =− sgn (ϵi + ∆ c ∂β∗ n i̸ =j

Please cite this article in press as: Sun, X., et al., Robust adaptive model selection and estimation for partial linear varying coefficient models in rank regression. Journal of the Korean Statistical Society (2017), https://doi.org/10.1016/j.jkss.2017.09.003.

X. Sun et al. / Journal of the Korean Statistical Society (

)



11

Then, we can show that S(βc∗ ) − S(0) = −

δn ∑

+

n

{

T

}

˜ i ) − (ϵj + ∆ ˜ j ) − δn X cij βc∗ X cij sgn (ϵi + ∆

i̸ =j

δn ∑ n

˜ i ) − (ϵj + ∆ ˜ j ) X cij . sgn (ϵi + ∆

{

}

(A.9)

i̸ =j

˜ i = Op (Kn−r ) = op (1) as n → ∞. Hence, following Taking into consideration the results obtained in Theorem 2, we have that ∆ the similar proof of Lemma 2 in Feng et al. (2015), we have that S(βc∗ ) − S(0) = 2τ δn2 Σβc∗ ,

(A.10) cT

cT

where Σ is defined in assumption (C4). Furthermore let Bn (β∗ ) = τ δn2 β∗ Σβ∗ + β∗ S(0) + Φ∗ (0) and its minimizer be ˜ c . Then it is not difficult to verify that β˜ c = −(2τ δ 2 Σ)−1 S(0). Based on Eq. (A.10) and a similar argument of denoted by β ∗ ∗ n Lemma 2 in Yang et al. (2017), it follows that c

c

c c βˆ ∗ = β˜ ∗ + op (1) = −(2τ δn2 Σ)−1 S(0) + op (1).

(A.11)

In addition, by the assumption that ϵi is the random error independent of X i , combined with some calculations, we have

δn−2 S(0)→d N(0, E [(2H(ϵ ) − 1)2 ]Σ),

(A.12)

where H(·) stands for the cumulative distribution function of ϵ . Furthermore, it can be shown that E [(2H(ϵ ) − 1)2 ] =



(2H(ϵ ) − 1)2 h(ϵ )dϵ



4H(ϵ )2 h(ϵ )dϵ − 4

=



4H(ϵ )2 dH(ϵ ) − 4

= =

1 3





4H(ϵ )h(ϵ )dϵ +



h(ϵ )dϵ

4H(ϵ )dH(ϵ ) + 1

.

(A.13)

Therefore, substituting (A.12) and (A.13) into (A.11), we complete the proof.

Proof of Theorem 1. The result in Theorem 1 can be obtained directly by combining Lemma 1, Theorem 2 and Theorem 3. We omit the details.

Proof of Corollary 4. Based on the asymptotic results of Theorem 3 and the least squares B-spline estimate given in Theorem 1 of Tang et al. (2012), we immediately obtain ARE(β̂^c, β̂^c_LS) = 12σ²τ². In addition, a result of Hodges and Lehmann (1956) indicates that the ARE has a lower bound of 0.864, with this lower bound attained at the density h(x) = (3/(20√5))(5 − x²)I(|x| ≤ √5). This completes the proof.

Proof of Corollary 5. The proofs are based on a direct calculation and applications of Corollary 4; we omit the details.

References

Ahmad, I., Leelahanon, S., & Li, Q. (2005). Efficient estimation of a semiparametric partially linear varying coefficient model. The Annals of Statistics, 39, 305–332.
Fan, J., & Huang, T. (2005). Profile likelihood inferences on semiparametric varying coefficient partially linear models. Bernoulli, 11, 1031–1057.
Fan, J., & Li, R. (2001). Variable selection via nonconcave penalized likelihood and its oracle properties. Journal of the American Statistical Association, 96, 1348–1360.
Feng, L., Zou, C., Wang, Z., Wei, X., & Chen, B. (2015). Robust spline-based variable selection in varying coefficient model. Metrika, 78, 85–118.
Hodges, J., & Lehmann, E. (1956). The efficiency of some nonparametric competitors of the t-test. The Annals of Mathematical Statistics, 27, 324–335.
Huang, J., Wu, C., & Zhou, L. (2004). Polynomial spline estimation and inference for varying coefficient models with longitudinal data. Statistica Sinica, 14, 763–788.
Hu, T., & Xia, Y. (2012). Adaptive semi-varying coefficient model selection. Statistica Sinica, 22, 575–599.
Kai, B., Li, R., & Zou, H. (2011). New efficient estimation and variable selection method for semiparametric varying-coefficient partially linear models. The Annals of Statistics, 39, 305–332.
Leng, C. (2009). A simple approach for varying-coefficient model selection. Journal of Statistical Planning and Inference, 139, 2138–2146.
Leng, C. (2010). Variable selection and coefficient estimation via regularization rank estimation. Statistica Sinica, 20, 167–181.
Li, R., & Liang, H. (2008). Variable selection in semiparametric regression model. The Annals of Statistics, 36, 261–286.
Noh, H., & Keilegom, V. (2012). Efficient model selection in semivarying coefficient models. Electronic Journal of Statistics, 6, 2519–2534.
Schumaker, L. (1981). Spline functions: Basic theory. New York: Wiley.

Sievers, G., & Abebe, A. (2004). Rank estimation of regression coefficients using iterated reweighted least squares. Journal of Statistical Computation and Simulation, 74, 821–831.
Stone, C. J. (1982). Optimal global rates of convergence for nonparametric regression. The Annals of Statistics, 10, 1040–1053.
Sun, J., & Lin, L. (2014). Local rank estimation and related test for varying-coefficient partially linear models. Journal of Nonparametric Statistics, 26, 187–206.
Tang, Y., Wang, H., Zhu, Z., & Song, X. (2012). A unified variable selection approach for varying coefficient models. Statistica Sinica, 22, 601–628.
Wang, L., Kai, B., & Li, R. (2009). Local rank inference for varying coefficient models. Journal of the American Statistical Association, 104, 1631–1645.
Wang, L., Li, H., & Huang, J. Z. (2008). Variable selection in nonparametric varying coefficient models for analysis of repeated measurements. Journal of the American Statistical Association, 103, 1556–1569.
Wang, K., & Lin, L. (2016). Robust structure identification and variable selection in partial linear varying coefficient models. Journal of Statistical Planning and Inference, 174, 153–168.
Wang, H., & Xia, Y. (2009). Shrinkage estimation of the varying coefficient model. Journal of the American Statistical Association, 104, 747–757.
Wang, H., Zhu, Z., & Zhou, J. (2009). Quantile regression in partially linear varying coefficient models. The Annals of Statistics, 37, 3841–3866.
Xia, Y., Zhang, W., & Tong, H. (2004). Efficient estimation for semivarying-coefficient models. Biometrika, 91, 661–681.
Yang, J., Yang, H., & Lu, F. (2017). Rank-based shrinkage estimation for identification in semiparametric additive models. Statistical Papers. http://dx.doi.org/10.1007/s00362-017-0874-z.
Zhang, R., Zhao, W., & Liu, J. (2013). Robust estimation and variable selection for semiparametric partially linear varying coefficient model based on modal regression. Journal of Nonparametric Statistics, 25, 523–544.
Zhao, P., & Xue, L. (2009). Variable selection in semiparametric regression analysis for longitudinal data. Annals of the Institute of Statistical Mathematics, 64, 213–231.
Zhao, W., Zhang, R., Liu, J., & Lv, Y. (2014). Robust and efficient variable selection for semiparametric partially linear varying coefficient model based on modal regression. Annals of the Institute of Statistical Mathematics, 66, 165–191.
Zhou, Y., & Liang, H. (2009). Statistical inference for semiparametric varying-coefficient partially linear models with error-prone linear covariates. The Annals of Statistics, 37, 427–458.
