Journal of the Korean Statistical Society
journal homepage: www.elsevier.com/locate/jkss
Robust estimation and variable selection in censored partially linear additive models

Huilan Liu a,∗, Hu Yang b, Xiaochao Xia c

a College of Mathematics and Statistics, Guizhou University, Guiyang, Guizhou Province, 550025, PR China
b College of Mathematics and Statistics, Chongqing University, Chongqing, 401331, PR China
c College of Science, Huazhong Agricultural University, Wuhan, Hubei Province, 430070, PR China
Article history: Received 12 May 2015; Accepted 19 July 2016.

Abstract

In this paper, we consider a new estimation method for censored partially linear additive models in which the nonparametric components are approximated by polynomial splines. To identify the significant variables in the linear part, a regularization procedure based on the adaptive lasso is proposed, which performs estimation and variable selection simultaneously. Under some regularity conditions, the asymptotic normality and oracle property of the parametric components are established, and the convergence rates of the nonparametric components are obtained. Simulation studies and a real data analysis are presented to illustrate the behavior of the proposed estimators. © 2016 The Korean Statistical Society. Published by Elsevier B.V. All rights reserved.

AMS 2000 subject classifications: primary 62N01; secondary 62N02.
Keywords: Censored data; Composite quantile regression; Partially linear additive model; Spline approximation; Variable selection
1. Introduction

Censored data are frequently encountered in many scientific fields, such as economics and biomedical science. A variety of semiparametric survival models have been considered for censored data analysis. Among them, the proportional hazards model proposed by Cox (1972) is perhaps the most popular and has been widely studied in the literature. However, the proportional hazards assumption may not hold in some practical applications. An attractive alternative is the accelerated failure time (AFT) model (Cox & Oakes, 1984; Kalbfleisch & Prentice, 1980), which directly formulates the relationship between the logarithm of the failure time and the covariates X by a linear model,

$\log(T) = X^T\beta + \epsilon,$    (1)

where T is the failure time, β is a p-vector of regression coefficients, X is a p-vector of covariates and ϵ is the error. The AFT model is useful, but the assumption that each covariate has a linear effect on the log survival time is not appropriate in some situations. The partially linear additive model (PLAM) generalizes the linear regression model by retaining additive nonparametric components, which makes the model more flexible. In this work, we consider the partially linear additive AFT model for right censored data, which can be written as

$\log(T_i) = X_i^T\beta^* + \sum_{j=1}^{J} g_j^*(Z_{ij}) + \epsilon_i, \quad i = 1, 2, \ldots, n,$    (2)

∗ Corresponding author at: College of Mathematics and Statistics, Guizhou University, Guiyang, Guizhou Province, 550025, PR China. E-mail address: [email protected] (H. Liu).
http://dx.doi.org/10.1016/j.jkss.2016.07.002
1226-3192/© 2016 The Korean Statistical Society. Published by Elsevier B.V. All rights reserved.
where Ti is the failure time, Xi is a p-vector of covariates, β* = (β1*, …, βp*)ᵀ is the corresponding p-vector of regression coefficients, Zi = (Zi1, …, ZiJ)ᵀ ∈ Rᴶ with Zij the jth component of Zi, the gj*'s are unknown nonparametric functions of Zij, and the ϵi's are i.i.d. random errors independent of the (Xi, Zi)'s. To ensure identifiability of the nonparametric functions, we assume that Egj*(Zj) = 0 for j = 1, …, J. Due to right censoring, we only observe (Yi, Xi, Zi, δi) with Yi = min(Ti, Ci) and δi = I(Ti ≤ Ci), where Ci is the censoring variable and I(A) is the indicator function of the set A.

For the uncensored PLAM, there have been many developments in the literature. For instance, Opsomer and Ruppert (1999) considered the asymptotic properties of kernel-based backfitting estimators. Liang, Thurston, Ruppert, Apanasovich, and Hauser (2008) studied the PLAM when the linear covariate is measured with error. Liu, Wang, and Liang (2011) developed a variable selection procedure based on least squares regression and spline approximation. Guo, Tang, Tian, and Zhu (2013) considered an estimation and variable selection procedure based on composite quantile regression and spline approximation. In this paper, we focus on estimation for censored composite quantile partially linear additive regression and on variable selection via the adaptive lasso penalty, which identifies the significant covariates in the linear part.

There is a large literature on penalization-based variable selection methods for linear parametric models, such as the least absolute shrinkage and selection operator (Lasso) (Tibshirani, 1996), the smoothly clipped absolute deviation (SCAD) (Fan & Li, 2001) and the adaptive Lasso (Zou, 2006). Many studies extended penalization-based variable selection to semiparametric models. For example, Xie and Huang (2009) considered SCAD-penalized regression in high-dimensional partially linear models. Lian (2012), Wang, Li, and Huang (2008), and Wang and Xia (2009) studied varying-coefficient models, and Liu et al. (2011) considered variable selection for the linear part of partially linear additive models. Ma and Du (2012) considered variable selection in the partially linear regression model with a diverging dimension for right censored data.

Quantile regression is an effective approach for censored data analysis, because quantile regression models (Koenker & Bassett, 1978) are more robust than mean regression models, especially when outliers are present in the data. For example, Ying, Jung, and Wei (1995) discussed the censored median regression model in the case where the censoring variable is independent of the covariates. Yang (1999) studied censored median regression based on weighted empirical survival and hazard functions. Bang and Tsiatis (2002) and Zhou (2006) proposed inverse-censoring-probability weighted least absolute deviation (LAD) methods for randomly censored models. Shows, Lu, and Zhang (2010) proposed the inverse-censoring-probability weighted LAD loss with the adaptive Lasso penalty (Zou, 2006). Although quantile regression can improve efficiency relative to least squares regression, quantile regression at a single level may have an arbitrarily small relative efficiency, which is unsatisfactory in some situations. To avoid this, Zou and Yuan (2008) proposed the composite quantile regression (CQR) oracle procedure to select the significant variables. Considerable research effort has been devoted to penalized CQR estimation. Kai, Li, and Zou (2011) considered the CQR method and variable selection in semiparametric varying-coefficient partially linear models. Jiang, Qian, and Zhou (2012) and Tang, Zhou, and Wu (2012) provided weighted CQR estimation and variable selection for the randomly censored linear model. Guo et al. (2013) applied the penalized CQR technique to high-dimensional partially linear additive models. Yang and Liu (2014) considered penalized weighted CQR estimation with missing covariates.
The remainder of this article is organized as follows. In Section 2, we introduce the proposed estimation and variable selection procedure and report the main theoretical results. Simulation studies and a real data analysis are presented in Section 3. Section 4 concludes. All proofs of the theorems are given in the Appendix.

2. Methodology and main results

2.1. Spline approximation and estimation

Suppose that each Zij takes values in a compact support [a, b], where a and b are finite numbers. Let a = ξ0 < ξ1 < · · · < ξM < ξM+1 = b be a partition of [a, b] into M + 1 subintervals IMt = [ξt, ξt+1), t = 0, …, M − 1, and IMM = [ξM, ξM+1), where M ≡ Mn = nᵛ with 0 < v < 0.5 is a positive integer such that hn := max_{1≤m≤M+1} |ξm − ξm−1| = O(n⁻ᵛ) (see Huang, Horowitz, & Wei, 2010). Let Sn be the space of polynomial splines of degree r ≥ 1 with simple knots at the points ξ1, …, ξM. This space consists of all functions s satisfying: (i) the restriction of s to any interval IMt (0 ≤ t ≤ M) is a polynomial of degree r; (ii) for r ≥ 2 and 0 ≤ r′ ≤ r − 2, s is r′ times continuously differentiable on [a, b]. According to Schumaker (1981), there exists a normalized B-spline basis {Bw, 1 ≤ w ≤ mn} for Sn, where mn = M + r is the dimension of Sn. Thus, for any function gnj ∈ Sn, we have

$g_{nj}(t) = \sum_{w=1}^{m_n} \alpha_{wj} B_w(t).$    (3)

Under reasonable smoothness assumptions, any smooth function can be well approximated by functions in Sn. Thus, the problem of estimating the gj*'s in model (2) reduces to that of estimating the αwj's.
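To make the spline approximation in (3) concrete, the following self-contained sketch evaluates a B-spline basis via the Cox–de Boor recursion and forms centered basis functions in the spirit of the ψ's used below; the knot locations, the degree, and all function names are illustrative choices, not part of the paper.

```python
import numpy as np

def bspline_basis(x, knots, degree):
    """Evaluate all B-spline basis functions of the given degree at the
    points x via the Cox-de Boor recursion.  `knots` is a non-decreasing
    knot vector with the boundary knots repeated degree + 1 times."""
    t = np.asarray(knots, dtype=float)
    x = np.atleast_1d(np.asarray(x, dtype=float))
    # degree-0 pieces: indicators of [t_i, t_{i+1})
    B = np.array([(t[i] <= x) & (x < t[i + 1]) for i in range(len(t) - 1)],
                 dtype=float)
    # include the right endpoint in the last non-degenerate interval
    B[len(t) - degree - 2, x == t[-1]] = 1.0
    for d in range(1, degree + 1):
        B_new = np.zeros((len(t) - d - 1, len(x)))
        for i in range(len(t) - d - 1):
            left = (x - t[i]) / (t[i + d] - t[i]) * B[i] if t[i + d] > t[i] else 0.0
            right = ((t[i + d + 1] - x) / (t[i + d + 1] - t[i + 1]) * B[i + 1]
                     if t[i + d + 1] > t[i + 1] else 0.0)
            B_new[i] = left + right
        B = B_new
    return B.T  # shape (len(x), number of basis functions)

# cubic (degree 3) basis on [0, 1] with three interior knots
interior = [0.25, 0.5, 0.75]
knots = np.r_[[0.0] * 4, interior, [1.0] * 4]
z = np.linspace(0.0, 1.0, 101)
design = bspline_basis(z, knots, degree=3)   # 101 x 7 design matrix
psi = design - design.mean(axis=0)           # centered basis, cf. the psi_jw below
```

A quick sanity check is that the basis functions form a partition of unity, i.e. each row of `design` sums to one, and that each centered column of `psi` has mean zero.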
When the data are not censored, following Guo et al. (2013), one can estimate β* and gj*, j = 1, …, J, by minimizing the following B-spline approximation CQR (BSA-CQR) loss function

$\sum_{k=1}^{q}\sum_{i=1}^{n}\rho_{\tau_k}\Bigg(\log(T_i) - c_{\tau_k} - X_i^T\beta - \sum_{j=1}^{J}\sum_{w=1}^{m_n}\alpha_{jw}B_w(Z_{ij})\Bigg),$    (4)

subject to the constraints

$\sum_{i=1}^{n}\sum_{w=1}^{m_n}\alpha_{jw}B_w(Z_{ij}) = 0, \qquad j = 1, \ldots, J,$    (5)
where ρτk(x) = τk x⁺ + (1 − τk)x⁻ is the quantile loss function, with x⁺ and x⁻ denoting the positive and negative parts of x, τk = k/(q + 1), where q is the number of quantiles, and cτk is the τk-th quantile of ϵ, k = 1, 2, …, q. These centering constraints ensure unique identification of the gj*'s, owing to the restriction Egj*(Zj) = 0 for j = 1, …, J. Therefore, we can convert (4) and (5) to an unconstrained optimization problem by centering the basis functions as follows. Let $\bar B_{jw} = n^{-1}\sum_{i=1}^{n} B_w(Z_{ij})$ and ψjw(x) = Bw(x) − B̄jw, for j = 1, …, J and w = 1, …, mn. Denote ϕij = (ψj1(Zij), …, ψjmₙ(Zij))ᵀ and ψi = (ϕi1ᵀ, …, ϕiJᵀ)ᵀ, where ψi is the ith sample of ψ. Denote αnj = (αj1, …, αjmₙ)ᵀ, j = 1, …, J, and αn = (αn1ᵀ, …, αnJᵀ)ᵀ. Then (4) and (5) can be rewritten as

$\sum_{k=1}^{q}\sum_{i=1}^{n}\rho_{\tau_k}\big(\log(T_i) - c_{\tau_k} - X_i^T\beta - \psi_i^T\alpha_n\big).$    (6)
Under random censoring, let G(·) be the survival function of the censoring time Ci. We can obtain estimators of β* and gj*, j = 1, …, J, by minimizing the weighted loss function

$\sum_{k=1}^{q}\sum_{i=1}^{n}\frac{\delta_i}{G(Y_i)}\rho_{\tau_k}\big(\log Y_i - c_{\tau_k} - X_i^T\beta - \psi_i^T\alpha_n\big).$

However, the survival function G(·) of Ci is unknown and needs to be estimated. Let Ĝ(·) be the Kaplan–Meier estimator of the survival function of Ci based on {(Yi, δi), i = 1, …, n}. We propose the following B-spline approximation weighted CQR (BSA-WCQR) estimation approach for the censored PLAM, which minimizes the loss function

$\sum_{k=1}^{q}\sum_{i=1}^{n}\frac{\delta_i}{\hat G(Y_i)}\rho_{\tau_k}\big(\log Y_i - c_{\tau_k} - X_i^T\beta - \psi_i^T\alpha_n\big).$    (7)

By minimizing (7), we obtain the BSA-WCQR estimators c̃τ1, …, c̃τq, β̃ and α̃n, and estimators of the gj*'s,

$\tilde g_j(z) = \sum_{w=1}^{m_n}\tilde\alpha_{jw}\psi_{jw}(z), \qquad j = 1, \ldots, J.$
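The inverse-censoring-probability weights δi/Ĝ(Yi) only require a Kaplan–Meier fit to the censoring distribution, with the roles of failures and censorings swapped. A minimal numpy sketch, where the function name and the tie handling are illustrative assumptions:

```python
import numpy as np

def km_censoring_survival(y, delta):
    """Kaplan-Meier estimator G-hat of the censoring survival function,
    obtained by treating the censored observations (delta == 0) as events.
    Returns a right-continuous step function t -> G-hat(t)."""
    y = np.asarray(y, dtype=float)
    cens = 1 - np.asarray(delta)           # censoring indicator
    order = np.argsort(y)
    y_s, c_s = y[order], cens[order]
    n = len(y_s)
    times, surv, s = [], [], 1.0
    i = 0
    while i < n:
        j, d = i, 0
        while j < n and y_s[j] == y_s[i]:  # pool ties at the same time point
            d += c_s[j]
            j += 1
        if d > 0:
            s *= 1.0 - d / (n - i)         # n - i subjects still at risk
            times.append(y_s[i])
            surv.append(s)
        i = j

    def G(t):
        t = np.atleast_1d(np.asarray(t, dtype=float))
        out = np.ones_like(t)
        for tk, sk in zip(times, surv):
            out[t >= tk] = sk
        return out

    return G

# toy data: delta = 1 means the failure time was observed
y = np.array([1.0, 2.0, 3.0, 4.0])
delta = np.array([1, 0, 1, 1])
G = km_censoring_survival(y, delta)
w = delta / G(y)    # IPCW weights delta_i / G-hat(Y_i)
```

On this toy data only the observation at time 2 is censored, so Ĝ drops from 1 to 2/3 at t = 2 and the observed failures after that point receive weight 1/(2/3) = 1.5.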
In Theorem 2.1, we establish the asymptotic normality of β̃ and the convergence rate of the estimated nonparametric functions.

2.2. Variable selection

To identify the significant variables in the linear part, we propose an adaptive lasso penalized BSA-WCQR (BSA-AWCQR) method for the censored PLAM. The corresponding BSA-AWCQR loss function is

$\sum_{k=1}^{q}\sum_{i=1}^{n}\frac{\delta_i}{\hat G(Y_i)}\rho_{\tau_k}\big(\log Y_i - c_{\tau_k} - X_i^T\beta - \psi_i^T\alpha_n\big) + \lambda_n\sum_{j=1}^{p}\frac{|\beta_j|}{|\tilde\beta_j|^2},$    (8)

where βj is the jth coordinate of β, β̃ is the unpenalized estimator of β* and λn is a tuning parameter that controls the model sparseness and may be chosen by a data-driven method. By minimizing (8), we obtain the BSA-AWCQR estimators ĉτ1, …, ĉτq, β̂ and α̂n, and the corresponding estimators of the gj*'s,

$\hat g_j(z) = \sum_{w=1}^{m_n}\hat\alpha_{jw}\psi_{jw}(z), \qquad j = 1, \ldots, J.$
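Because ρτ is piecewise linear, objectives of the form (7) and (8) can be minimized by linear programming, as detailed in the computational algorithm of Section 2.3. A minimal sketch for a single quantile level using scipy.optimize.linprog, where the simulated data and the function name are hypothetical and the full method would additionally stack the q quantile levels, the spline columns and the adaptive-lasso penalty columns:

```python
import numpy as np
from scipy.optimize import linprog

def weighted_qr(X, y, w, tau):
    """Weighted quantile regression at level tau as a linear program:
    minimize sum_i w_i * rho_tau(y_i - x_i' beta), with X containing an
    intercept column.  Decision variables are [beta+, beta-, u+, u-] >= 0,
    with the residual split as u+ - u- = y - X(beta+ - beta-)."""
    n, p = X.shape
    c = np.concatenate([np.zeros(2 * p), tau * w, (1.0 - tau) * w])
    A_eq = np.hstack([X, -X, np.eye(n), -np.eye(n)])
    res = linprog(c, A_eq=A_eq, b_eq=y, bounds=(0, None), method="highs")
    z = res.x
    return z[:p] - z[p:2 * p]

rng = np.random.default_rng(0)
n = 200
X = np.column_stack([np.ones(n), rng.normal(size=n)])
y = 1.0 + 2.0 * X[:, 1] + rng.normal(size=n)
w = np.ones(n)   # uncensored case: every weight delta_i / G-hat(Y_i) equals 1
beta_hat = weighted_qr(X, y, w, tau=0.5)
```

With unit weights and τ = 0.5 this is ordinary median regression, so `beta_hat` should be close to the true coefficients (1, 2); the IPCW weights of the censored case simply scale the rows of the objective.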
In Theorem 2.2, we establish the oracle property of the BSA-AWCQR estimator β̂ and the convergence rate of the corresponding nonparametric functions.
2.3. Computational algorithm

In this subsection, we describe the computational procedure for the proposed methods. Introducing slack variables β⁺ = (β1⁺, …, βp⁺)ᵀ, β⁻ = (β1⁻, …, βp⁻)ᵀ, αn⁺ = (α11⁺, …, α1mₙ⁺, …, αJ1⁺, …, αJmₙ⁺)ᵀ, αn⁻ = (α11⁻, …, α1mₙ⁻, …, αJ1⁻, …, αJmₙ⁻)ᵀ, (cτk⁺, cτk⁻) and (ξik⁺, ξik⁻), i = 1, 2, …, n, k = 1, …, q, the minimization of (7) can be written as the following linear programming problem:

$\min_{\beta^+,\beta^-,\alpha_n^+,\alpha_n^-,c_{\tau_k}^+,c_{\tau_k}^-,\xi_{ik}^+,\xi_{ik}^-}\ \sum_{k=1}^{q}\sum_{i=1}^{n}\frac{\delta_i}{\hat G(Y_i)}\big(\tau_k\xi_{ik}^+ + (1-\tau_k)\xi_{ik}^-\big)$

subject to

$\xi_{ik}^+ \ge 0,\ \xi_{ik}^- \ge 0,\ \beta^+ \ge 0,\ \beta^- \ge 0,\ \alpha_n^+ \ge 0,\ \alpha_n^- \ge 0,\ c_{\tau_k}^+ \ge 0,\ c_{\tau_k}^- \ge 0,$
$\xi_{ik}^+ - \xi_{ik}^- = \log Y_i - (c_{\tau_k}^+ - c_{\tau_k}^-) - X_i^T(\beta^+ - \beta^-) - \psi_i^T(\alpha_n^+ - \alpha_n^-),\quad i = 1, \ldots, n;\ k = 1, \ldots, q,$

where z⁺ = zI(z > 0) and z⁻ = −zI(z < 0). Given λn and β̃, the minimization of (8) can be written as

$\min_{\beta^+,\beta^-,\alpha_n^+,\alpha_n^-,c_{\tau_k}^+,c_{\tau_k}^-,\xi_{ik}^+,\xi_{ik}^-}\ \sum_{k=1}^{q}\sum_{i=1}^{n}\frac{\delta_i}{\hat G(Y_i)}\big(\tau_k\xi_{ik}^+ + (1-\tau_k)\xi_{ik}^-\big) + \lambda_n\sum_{j=1}^{p}\frac{1}{|\tilde\beta_j|^2}\big(\beta_j^+ + \beta_j^-\big)$

subject to the same constraints, with i = 1, …, n; k = 1, …, q; j = 1, …, p. In this article, we use the lpSolve package in R to solve these linear programming problems.

2.4. Parameter tuning

The proposed estimators involve the penalization parameter λn, which needs to be selected by minimizing some model-selection criterion, such as the generalized cross validation selector (Fan & Li, 2001) or the BIC selector (Lee & Noh, 2014; Wang, Li, & Tsai, 2007). In this work, we use a BIC-type criterion to select λn, that is,
$\mathrm{BIC}(\lambda_n) = \log\Bigg(\sum_{k=1}^{q}\sum_{i=1}^{n}\frac{\delta_i}{\hat G(Y_i)}\rho_{\tau_k}\big(\log(Y_i) - X_i^T\hat\beta_{\lambda_n} - \psi_i^T\hat\alpha_{n\lambda_n} - \hat c_{k\lambda_n}\big)\Bigg) + DF_{\lambda_n}\frac{\log n}{n},$    (9)
where β̂λn and α̂nλn are the estimators obtained by minimizing (8) with the given tuning parameter λn, and DFλn is the number of nonzero elements of β̂λn. We select the optimal tuning parameter by minimizing BIC(λn).

2.5. Theoretical properties of our estimators

Throughout this paper, let ∥·∥ be the Euclidean norm of a vector. For any matrix V, denote its L2 norm by $\|V\|_2 = \sup_{\|u\|\neq 0}\|Vu\|/\|u\|$. Define $\|g\|_2 = \big[\int_a^b g^2(z)f_Z(z)\,dz\big]^{1/2}$ for any measurable function g on [a, b], where fZ(z) is the density function of the covariate Z. We define a filtration z(u) as the collection of σ-algebras generated by {I(Ci ≤ t), t ≤ u; I(Ti ≤ y), Xi, Zi, τk ∈ (τ1, …, τq), 0 ≤ y ≤ τ, i = 1, …, n}, where τ = inf{t : H(t) = 1} and H(t) = P(T ≤ t). The following conditions are needed for the theoretical derivations. Let γ be a nonnegative integer such that 0 ≤ γ ≤ r − 1 with r > 2, and let H be the collection of functions f on [a, b] whose γ-th order derivative satisfies a Lipschitz condition of order v ∈ (0, 1], that is, for ρ = γ + v > 0.5, |f⁽ᵞ⁾(s) − f⁽ᵞ⁾(t)| ≤ c0|s − t|ᵛ, s, t ∈ [a, b], where c0 is a positive finite constant.

Condition (1) gj* ∈ H and Egj*(Zj) = 0, j = 1, …, J.
Condition (2) The covariate Zj has a density fZj(·), and there exist constants c1 and c2 such that 0 < c1 ≤ fZj(z) ≤ c2 < ∞ on [a, b] for every j = 1, …, J.
Condition (3) Ci is independent of (Xi, Zi) and of Ti.
Condition (4) P(t ≤ T ≤ C) ≥ ς0 for any t ∈ [0, τ], where ς0 is a positive constant.
Condition (5) ϵ1, ϵ2, …, ϵn have a common continuous and differentiable probability density function f(·) satisfying 0 < f(c_{k0}) ≤ B0 for k = 1, 2, …, q, where B0 is a positive constant.
Condition (6) The eigenvalues of $E\begin{pmatrix} C & cX^T \\ Xc^T & cXX^T \end{pmatrix}$ are bounded and away from zero, and X and Z are not linearly dependent, where C and c are defined in Lemma 4 in the Appendix.

These assumptions are standard in the literature. Conditions (1) and (2) are commonly assumed, as in Huang et al. (2010) and Guo et al. (2013). Conditions (3)–(5) are regular in the context of survival analysis and similar to those in Shows et al. (2010) and Tang et al. (2012). Condition (6) guarantees that the eigenvalues of D, defined in Lemma 4 in the Appendix, are bounded and away from zero. Lemma 2, stated in the Appendix, implies that the eigenvalues of Eψψᵀ are bounded and away from zero. Thus, the eigenvalues of D are expected to be bounded and away from zero provided the eigenvalues of $E\begin{pmatrix} C & cX^T \\ Xc^T & cXX^T \end{pmatrix}$ are so and Z and X are not linearly dependent (see Hu & Lian, 2013; Lian, Li, & Tang, 2014). This can in turn be guaranteed by assumptions on the density of (Y, X, Z), similar to assumptions (B5) and (B6) of Huang (1999). In this article, to simplify the proofs of the theorems, we assume that condition (6) holds. The following result establishes the asymptotic property of the minimizer of (7); its proof is given in the Appendix.

Theorem 2.1. Under conditions (1)–(6), we have
(a) $\sqrt{n}(\tilde\beta - \beta^*) \to_d N\big(0,\ D_1^{-1}\Sigma(D_1^{-1})^T\big)$,
(b) $\big\|\sum_{j=1}^{J}(\tilde g_{nj} - g_j^*)\big\|_2^2 = O_p(m_n/n) + O_p(m_n^{-2\rho})$,

where D1 and Σ are defined in the Appendix. If mn = O(n^{1/(2ρ+1)}), part (b) of Theorem 2.1 can be rewritten as

$\Big\|\sum_{j=1}^{J}(\tilde g_{nj} - g_j^*)\Big\|_2^2 = O_p\big(n^{-2\rho/(2\rho+1)}\big).$
Part (a) of Theorem 2.1 shows that the asymptotic behavior of the estimator of the linear part does not depend on the nonparametric part. Part (b) gives the convergence rate of the nonparametric part, which is the optimal rate in nonparametric regression based on B-splines (Stone, 1982). The following theorem demonstrates the oracle property of the BSA-AWCQR estimator.

Theorem 2.2. Under conditions (1)–(6), if λn → ∞ and λn/√n → 0 as n → ∞, then β̂ satisfies:

(a) Consistency: P({j : β̂j ≠ 0} = Λ) → 1;
(b) Asymptotic normality: $\sqrt{n}(\hat\beta_\Lambda - \beta_\Lambda^*) \to_d N\big(0,\ [D_1^{-1}\Sigma(D_1^{-1})^T]_{\Lambda\Lambda}\big)$,

where Λ = {j : βj* ≠ 0}. The properties of ĝj given in Theorem 2.1 still hold, that is,

$\Big\|\sum_{j=1}^{J}(\hat g_{nj} - g_j^*)\Big\|_2^2 = O_p(m_n/n) + O_p(m_n^{-2\rho}) = O_p\big(n^{-2\rho/(2\rho+1)}\big)$

when mn = O(n^{1/(2ρ+1)}). By Theorem 2.2, the zero coefficients in the parametric component of the BSA-AWCQR estimator are exactly zero with probability tending to one as n grows. The estimator of the nonzero coefficients enjoys the same asymptotic normality as in Theorem 2.1, and the nonparametric part again achieves the optimal convergence rate.

3. Numerical studies

3.1. Simulations

In this section, we conduct simulation studies to illustrate the performance of the proposed methods. We generated data from the model

$\log(T_i) = \sum_{l=1}^{p} X_{il}\beta_l^* + \sum_{j=1}^{J} g_j^*(Z_{ij}) + \epsilon_i, \quad i = 1, 2, \ldots, n,$
where n is the sample size, p is the dimension of the linear part and J = 4. The coefficient vector of the linear part is β* = (β1*, β2*, …, βp*)ᵀ. We take g1*(z) = 5z, g2*(z) = 3(2z − 1)², g3*(z) = 4 sin(2πz)/(2 − sin(2πz)) and g4*(z) = 6(0.1 sin(2πz) + 0.2 cos(2πz) + 0.3 sin²(2πz) + 0.4 cos³(2πz) + 0.5 sin³(2πz)). The covariate vector X = (X1, X2, …, Xp)ᵀ is standard multivariate normal with correlation ϱ^{|i−j|} between Xi and Xj, where ϱ = 0.5. The Zj's are independent and uniformly distributed on [0, 1] for j = 1, …, 4. We consider three error distributions: N(0, 1), t(3) and 0.9N(0, 1) + 0.1N(0, 5²). The censoring variable Ci is generated from a mixture of a point mass at infinity and a uniform
distribution on (0, 5). By adjusting the mixing percentage, we can achieve a censoring rate of approximately 10%–15% or 30%–35%. We use cubic B-splines with six evenly distributed interior knots for all nonparametric components, similar to Guo et al. (2013) and Huang et al. (2010).

Table 1. Estimation results for ϵ ∼ N(0, 1): Bias and SD of the estimates of β1, β2 and β3, and IABIAS and MISE of the nonparametric estimates, for BSA-WCQR, BSA-WQR0.25, BSA-WQR0.5, BSA-WQR0.75 and BSA-WLS, at censoring rates 10%–15% and 30%–35% with n = 200 and n = 300.

Example 1 (Estimation of censored PLAMs). In this example, we compare the performance of the proposed BSA-WCQR estimator (q = 5) for censored PLAMs with the weighted quantile regression estimators and the weighted least squares estimator based on B-spline approximation (BSA-WQR0.25, BSA-WQR0.5, BSA-WQR0.75 and BSA-WLS). We set β* = (3, 1.5, 2)ᵀ and conduct 500 simulations for two sample sizes, n = 200 and n = 300. For β̃, we record the empirical bias (Bias) and the standard deviation (SD). For the nonparametric part, we use the averaged integrated absolute bias (IABIAS) and the mean integrated square error (MISE) to assess performance. IABIAS and MISE are defined as follows.
$\mathrm{IABIAS}(\tilde g) = \frac{1}{500}\sum_{a=1}^{500}\frac{1}{n_{\mathrm{grid}}\times J}\sum_{i=1}^{n_{\mathrm{grid}}}\sum_{j=1}^{J}\big|\tilde g_j^{(a)}(z_i) - g_j^{*}(z_i)\big|$

and

$\mathrm{MISE}(\tilde g) = \frac{1}{500}\sum_{a=1}^{500}\frac{1}{n_{\mathrm{grid}}\times J}\sum_{i=1}^{n_{\mathrm{grid}}}\sum_{j=1}^{J}\big[\tilde g_j^{(a)}(z_i) - g_j^{*}(z_i)\big]^2,$
where {zi : i = 1, …, ngrid} is a set of grid points placed uniformly on [0, 1] with ngrid = 200. The simulation results are presented in Tables 1–3. From these tables, we see that the parameter estimates are approximately unbiased and that the BSA-WCQR estimator outperforms the BSA-WQR estimators in all settings. When the model error is normal, the BSA-WCQR and BSA-WLS estimators perform very similarly, while BSA-WCQR is significantly better than BSA-WLS when the model error is not normal. As the censoring percentage increases, the Bias, SD, IABIAS and MISE of all estimators increase. As expected, increasing the sample size improves the performance of all estimators.

Example 2 (Variable selection in censored PLAMs). In this example, we compare the variable selection results of the proposed BSA-AWCQR procedure with the penalized BSA-WQR procedures (BSA-AWQR0.25, BSA-AWQR0.5, BSA-AWQR0.75). We set β* = (3, 1.5, 0, 0, 2, 0, 0, 0)ᵀ, so the linear part has 3 nonzero coefficients and 5 zero coefficients. The nonvanishing components are generated as in Example 1. For comparison, we use C, IC, CF and GMSE to summarize the variable selection results in the linear part, where C is the average number of zero coefficients correctly estimated to be zero, IC is the average number of nonzero coefficients incorrectly estimated to be zero, CF is the proportion of models identified correctly, and GMSE is the generalized mean squared error, defined as GMSE = (β̂ − β*)ᵀ E(XXᵀ)(β̂ − β*). We also use IABIAS and MISE to assess the nonparametric part. For each setting, we conduct 100 simulations and report the results in Table 4.
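The data-generating design above can be scripted directly. A sketch under the Example 2 settings, where the mixing probability `cens_mass` and the placement of the Uniform(0, 5) censoring variable on the log-time scale are assumptions used here only to illustrate how the censoring rate is tuned:

```python
import numpy as np

def generate_data(n, rng, p=8, rho=0.5, cens_mass=0.8):
    """Simulate one data set from the Example 2 design: sparse linear part
    beta* = (3, 1.5, 0, 0, 2, 0, 0, 0) and four additive components.
    cens_mass is the (assumed) probability of no censoring (C_i = +inf)."""
    beta = np.zeros(p)
    beta[[0, 1, 4]] = [3.0, 1.5, 2.0]
    # AR(1)-type covariance: corr(X_i, X_j) = rho ** |i - j|
    cov = rho ** np.abs(np.subtract.outer(np.arange(p), np.arange(p)))
    X = rng.multivariate_normal(np.zeros(p), cov, size=n)
    Z = rng.uniform(size=(n, 4))
    s, c = np.sin(2 * np.pi * Z[:, 3]), np.cos(2 * np.pi * Z[:, 3])
    g = (5 * Z[:, 0]
         + 3 * (2 * Z[:, 1] - 1) ** 2
         + 4 * np.sin(2 * np.pi * Z[:, 2]) / (2 - np.sin(2 * np.pi * Z[:, 2]))
         + 6 * (0.1 * s + 0.2 * c + 0.3 * s ** 2 + 0.4 * c ** 3 + 0.5 * s ** 3))
    log_t = X @ beta + g + rng.normal(size=n)
    # censoring: point mass at infinity mixed with Uniform(0, 5),
    # taken here on the log-time scale (an assumption of this sketch)
    log_c = np.where(rng.uniform(size=n) < cens_mass,
                     np.inf, rng.uniform(0.0, 5.0, size=n))
    y = np.minimum(log_t, log_c)
    delta = (log_t <= log_c).astype(int)
    return y, delta, X, Z

rng = np.random.default_rng(2016)
y, delta, X, Z = generate_data(200, rng)
```

Raising or lowering `cens_mass` moves the realized censoring fraction `1 - delta.mean()` toward the 10%–15% or 30%–35% targets of the study.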
Table 2. Estimation results for ϵ ∼ t(3): Bias and SD of the estimates of β1, β2 and β3, and IABIAS and MISE of the nonparametric estimates, for BSA-WCQR, BSA-WQR0.25, BSA-WQR0.5, BSA-WQR0.75 and BSA-WLS, at censoring rates 10%–15% and 30%–35% with n = 200 and n = 300.
Table 3. Estimation results for ϵ ∼ 0.9N(0, 1) + 0.1N(0, 5²): Bias and SD of the estimates of β1, β2 and β3, and IABIAS and MISE of the nonparametric estimates, for BSA-WCQR, BSA-WQR0.25, BSA-WQR0.5, BSA-WQR0.75 and BSA-WLS, at censoring rates 10%–15% and 30%–35% with n = 200 and n = 300.
From the results reported in Table 4, we see that the BSA-AWCQR and BSA-AWQR0.5 procedures outperform the other variable selection procedures in terms of CF regardless of the error distribution, censoring rate and sample size, and that the GMSE values of the BSA-AWCQR estimator are smaller than those of BSA-AWQR0.5. Moreover, the IABIAS and MISE of the BSA-AWCQR estimator are smaller than those of the BSA-AWQR estimators. As the censoring percentage increases, the GMSE, IABIAS and MISE of all procedures increase and the CF decreases. As the sample size increases, the GMSE, IABIAS and MISE of all procedures decrease and the CF increases.

3.2. Real data analysis

In this section, we use a real dataset to illustrate the proposed procedures. The dataset is the PBC data, collected at the Mayo Clinic between 1974 and 1984 in a study of primary biliary cirrhosis of the liver; it can be found in Therneau and Grambsch (2000). A total of 424 patients agreed to participate in the randomized trial, of whom 312 have largely complete data. We focus on the 276 observations without missing values. Among these 276 patients, there are 111
Table 4. Variable selection results in Example 2 (MN denotes the mixture normal 0.9N(0, 1) + 0.1N(0, 5²)).

n    Cr        Dist.    Method          GMSE    C     IC    CF    IABIAS  MISE
200  10%–15%   N(0,1)   BSA-AWCQR       0.0330  4.91  0.00  0.92  0.2678  0.1211
                        BSA-AWQR0.25    0.0497  4.80  0.00  0.85  0.2991  0.1513
                        BSA-AWQR0.5     0.0469  4.88  0.00  0.89  0.2991  0.1508
                        BSA-AWQR0.75    0.0406  4.65  0.00  0.73  0.2999  0.1515
               t(3)     BSA-AWCQR       0.0517  4.91  0.00  0.95  0.3189  0.1763
                        BSA-AWQR0.25    0.0718  4.92  0.00  0.92  0.3475  0.2098
                        BSA-AWQR0.5     0.0642  4.94  0.00  0.94  0.3444  0.2059
                        BSA-AWQR0.75    0.0601  4.79  0.00  0.83  0.3460  0.2088
               MN       BSA-AWCQR       0.0506  4.98  0.00  0.98  0.2987  0.1526
                        BSA-AWQR0.25    0.0758  4.88  0.00  0.89  0.3299  0.1852
                        BSA-AWQR0.5     0.0628  4.97  0.00  0.98  0.3295  0.1856
                        BSA-AWQR0.75    0.0590  4.78  0.00  0.82  0.3277  0.1838
     30%–35%   N(0,1)   BSA-AWCQR       0.0941  4.25  0.00  0.61  0.3524  0.2117
                        BSA-AWQR0.25    0.1058  4.30  0.00  0.53  0.3965  0.2754
                        BSA-AWQR0.5     0.1153  4.37  0.00  0.55  0.3954  0.2757
                        BSA-AWQR0.75    0.0940  4.09  0.00  0.51  0.3949  0.2734
               t(3)     BSA-AWCQR       0.1748  4.31  0.00  0.61  0.4482  0.3631
                        BSA-AWQR0.25    0.1818  4.30  0.00  0.56  0.4868  0.4330
                        BSA-AWQR0.5     0.1943  4.46  0.00  0.65  0.4873  0.4393
                        BSA-AWQR0.75    0.1654  4.11  0.00  0.48  0.4908  0.4435
               MN       BSA-AWCQR       0.1678  4.50  0.00  0.74  0.4346  0.3423
                        BSA-AWQR0.25    0.1868  4.32  0.00  0.59  0.4743  0.4092
                        BSA-AWQR0.5     0.1727  4.65  0.00  0.74  0.4737  0.4112
                        BSA-AWQR0.75    0.1481  4.35  0.00  0.58  0.4748  0.4118
300  10%–15%   N(0,1)   BSA-AWCQR       0.0168  4.96  0.00  0.96  0.2224  0.0849
                        BSA-AWQR0.25    0.0248  4.93  0.00  0.95  0.2477  0.1045
                        BSA-AWQR0.5     0.0241  4.97  0.00  0.97  0.2485  0.1051
                        BSA-AWQR0.75    0.0219  4.84  0.00  0.88  0.2495  0.1057
               t(3)     BSA-AWCQR       0.0244  4.98  0.00  0.98  0.2575  0.1115
                        BSA-AWQR0.25    0.0337  4.89  0.00  0.90  0.2767  0.1289
                        BSA-AWQR0.5     0.0293  4.98  0.00  0.98  0.2750  0.1280
                        BSA-AWQR0.75    0.0282  4.80  0.00  0.84  0.2754  0.1281
               MN       BSA-AWCQR       0.0249  4.98  0.00  0.98  0.2440  0.1018
                        BSA-AWQR0.25    0.0318  4.94  0.00  0.94  0.2654  0.1204
                        BSA-AWQR0.5     0.0281  4.98  0.00  0.98  0.2651  0.1202
                        BSA-AWQR0.75    0.0269  4.84  0.00  0.85  0.2650  0.1198
     30%–35%   N(0,1)   BSA-AWCQR       0.0461  4.71  0.00  0.82  0.2833  0.1372
                        BSA-AWQR0.25    0.0570  4.63  0.00  0.72  0.3193  0.1738
                        BSA-AWQR0.5     0.0641  4.68  0.00  0.74  0.3195  0.1743
                        BSA-AWQR0.75    0.0492  4.30  0.00  0.56  0.3186  0.1748
               t(3)     BSA-AWCQR       0.0928  4.58  0.00  0.75  0.3411  0.2096
                        BSA-AWQR0.25    0.1064  4.55  0.00  0.68  0.3795  0.2597
                        BSA-AWQR0.5     0.1134  4.65  0.00  0.74  0.3790  0.2595
                        BSA-AWQR0.75    0.0894  4.39  0.00  0.63  0.3787  0.2573
               MN       BSA-AWCQR       0.0826  4.68  0.00  0.76  0.3311  0.2298
                        BSA-AWQR0.25    0.0929  4.65  0.00  0.75  0.3690  0.2984
                        BSA-AWQR0.5     0.0950  4.75  0.00  0.78  0.3705  0.3254
                        BSA-AWQR0.75    0.0802  4.55  0.00  0.68  0.3695  0.3069
deaths, corresponding to a censoring rate of about 60%. We study the dependence of the survival time on 17 covariates: 10 continuous variables (age (in years), alb (albumin in g/dl), alk (alkaline phosphatase in U/liter), bil (serum bilirubin in mg/dl), chol (serum cholesterol in mg/dl), cop (urine copper in ug/day), plat (platelets per cubic ml/1000), prot (prothrombin time in seconds), sgot (liver enzyme in U/ml), trig (triglycerides in mg/dl)) and 7 categorical variables (asc (presence of ascites: 0 = no, 1 = yes), ede (0 = no edema; 0.5 = untreated or successfully treated; 1 = unsuccessfully treated edema), hep (0 = absence, 1 = presence of hepatomegaly), sex (0 = male, 1 = female), spid (0 = absence, 1 = presence of spiders), stage (histological stage of disease, graded 1, 2, 3 or 4), trt (1 = control, 2 = treatment)). The dataset has been analyzed using the linear AFT model (Shows et al., 2010) and the partially linear proportional hazards model (Hu & Lian, 2013). Before applying our model, all the predictors were linearly transformed to [0, 1]. In an initial analysis, we found that the covariates "chol" and "plat" have nonlinear relationships with the logarithm of the failure time. Thus, we use these two covariates as the nonlinear components in our model. For the nonparametric approximation, we use cubic B-splines with two interior knots. BIC is employed to select the penalty parameter. The estimation and variable selection results are given in Tables 5 and 6, and the estimated curves of the nonparametric components are shown in Fig. 1. From the results, we see that the BSA-AWCQR and BSA-AWQR0.5 procedures select the same seven variables (age, asc, bil, alb, cop, alk, prot), that the covariates asc, bil, alb, cop and alk are selected by all procedures, and that the fitted nonparametric functions are not linear.
Table 5
Coefficient estimates of the parametric components for the PBC data.

            BSA-WCQR   BSA-WQR0.25   BSA-WQR0.5   BSA-WQR0.75
intercept    6.927      7.071         7.176        7.571
trt         −0.114     −0.334        −0.059       −0.029
age         −0.878     −0.003        −1.115       −1.128
sex          0.073      0.103         0.164        0.031
asc         −0.562     −0.302        −0.622       −0.736
hep         −0.101     −0.226        −0.095       −0.209
spid        −0.159     −0.446        −0.222       −0.115
oed         −0.375     −1.165        −0.324       −0.131
bil         −1.102     −0.927        −0.997       −1.429
alb          1.542      1.121         1.325        0.568
cop         −1.108     −1.078        −0.804       −1.077
alk          0.844      1.176         0.522        0.117
sgot         0.054      0.399         0.522       −0.299
trig         0.620     −0.076         0.566        0.223
prot         1.126      0.087         1.271        2.381
stage        0.019     −0.181         2.193       −0.140
Table 6
Variable selection results for the PBC data.

            BSA-AWCQR   BSA-AWQR0.25   BSA-AWQR0.5   BSA-AWQR0.75
intercept    6.728       6.485          6.936         7.484
trt          0.000      −0.119          0.000         0.000
age         −0.313       0.000         −0.716        −1.011
sex          0.000       0.000          0.000         0.000
asc         −0.470      −0.590         −0.674        −0.904
hep          0.000       0.000          0.000         0.000
spid         0.000       0.042          0.000         0.000
oed          0.000      −0.061          0.000         0.000
bil         −1.139      −1.085         −1.418        −1.337
alb          1.727       2.021          1.612         0.812
cop         −1.213      −1.145         −0.874        −1.037
alk          0.863       1.050          0.733         0.736
sgot         0.000       0.000          0.000         0.000
trig         0.000       0.000          0.000         0.444
prot         0.382       0.000          0.983         1.503
stage        0.000       0.000          0.000         0.000
4. Discussion

In this article, we consider a robust estimation and variable selection procedure in censored PLAMs. We approximate the nonparametric components by polynomial splines. The asymptotic properties of the proposed estimators and the oracle property of the variable selection procedure are established. Simulation studies show that our proposed methods work well under different error distributions and censoring rates. We also apply our method to analyze the PBC data.

Acknowledgments

This work is supported by the National Natural Science Foundation of China (Grant Nos. 11171361 and 11471058) and the Ph.D. Programs Foundation of the Ministry of Education of China (Grant No. 20110191110033). Xia's work is partially supported by the Fundamental Research Funds for the Central Universities (Grant No. 2662016QD006).

Appendix

Denote the centered version of $S_n$ by
$$ S_{nj} = \Big\{ g_{nj} : g_{nj}(z) = \sum_{w=1}^{m_n} \alpha_{jw}\,\psi_{jw}(z),\ (\alpha_{j1}, \ldots, \alpha_{jm_n}) \in \mathbb{R}^{m_n} \Big\}, \qquad 1 \le j \le J. $$
Lemma 1. Suppose that $g \in \mathcal{H}$ and $Eg(Z_j) = 0$. Then under conditions (1) and (2), there exists a $g_n^* \in S_{nj}$ satisfying
$$ \|g_n^* - g\|_2 = O_p\big(m_n^{-\rho} + m_n^{1/2} n^{-1/2}\big). $$
Fig. 1. Plots of the estimated nonparametric parts by BSA-AWCQR. The left panel is the estimated curve of the covariate chol. The right panel is the estimated curve of the covariate plat.
In particular, if we choose $m_n = O(n^{1/(2\rho+1)})$, then
$$ \|g_n^* - g\|_2 = O_p\big(n^{-\rho/(2\rho+1)}\big). $$
See Lemma 1 of Huang et al. (2010).

Lemma 2. Let $D_Z = n^{-1}\sum_{i=1}^{n} \psi_i \psi_i^T$, and let $\lambda_{\min}(D_Z)$ and $\lambda_{\max}(D_Z)$ be the smallest and largest eigenvalues of $D_Z$. Let $m_n = O(n^{v})$, where $0 < v < 0.5$, and $h \equiv h_n \asymp m_n^{-1}$. Then under conditions (1) and (2), with probability converging to one,
$$ c_3 h_n \le \lambda_{\min}(D_Z) \le \lambda_{\max}(D_Z) \le c_4 h_n, $$
where $c_3$ and $c_4$ are two positive constants. See Lemma 1 of Wei (2012).

Lemma 3 (Quadratic Approximation Lemma). Suppose $A_n(s)$ is convex and can be represented as $\frac{1}{2} s^T V s + U_n^T s + C_n + r_n(s)$, where $V$ is symmetric and positive definite, $U_n$ is stochastically bounded, $C_n$ is arbitrary, and $r_n(s)$ goes to zero in probability for each $s$. Then $\alpha_n$, the minimizer of $A_n(s)$, is only $o_p(1)$ away from $\beta_n = -V^{-1} U_n$, the minimizer of $\frac{1}{2} s^T V s + U_n^T s + C_n$. If also $U_n \to_d U$, then $\alpha_n \to_d -V^{-1} U$. The proof of Lemma 3 is available from Hjort and Pollard (1993).

Lemma 4. Under the conditions of Theorem 2.1, minimizing expression (7) is equivalent to minimizing
$$ L_n(\theta) = (W_{1n}^* + W_{2n}^*)^T \theta + \tfrac{1}{2}\,\theta^T D \theta + r_n(\theta). $$
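The quadratic approximation lemma can be illustrated numerically. In the following sketch, V, U_n, C_n and the remainder r_n are made-up choices (not from the paper): the minimizer of the convex process A_n approaches the minimizer −V⁻¹U_n of the quadratic part as the remainder vanishes.

```python
import numpy as np
from scipy.optimize import minimize

# Illustrative check of the quadratic approximation lemma; V, Un, Cn and rn
# are arbitrary choices satisfying the lemma's hypotheses.
V = np.array([[2.0, 0.3],
              [0.3, 1.0]])                    # symmetric, positive definite
Un = np.array([0.7, -1.2])                    # plays the stochastically bounded term
Cn = 1.7                                      # arbitrary constant
rn = lambda s, n: np.sum(s ** 2) / n          # remainder, -> 0 for each fixed s

def An(s, n):
    # Convex process 0.5 s'Vs + Un's + Cn + rn(s)
    return 0.5 * s @ V @ s + Un @ s + Cn + rn(s, n)

target = -np.linalg.solve(V, Un)              # -V^{-1} Un
gaps = [np.linalg.norm(minimize(An, np.zeros(2), args=(n,)).x - target)
        for n in (10, 10_000)]
print(gaps)                                   # the gap shrinks as rn vanishes
```

With the quadratic remainder used here the effect is exact: the minimizer of A_n is −(V + 2I/n)⁻¹U_n, which converges to −V⁻¹U_n.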
Proof. Let $c_{\tau_k}^*$, $\alpha_n^*$, $\beta^*$ be the actual but unknown values of $c_{\tau_k}$, $\alpha_n$, $\beta$, respectively; let $g_n^*(Z_i) = \sum_{j=1}^{J} g_{nj}^*(Z_{ij}) = \psi_i^T \alpha_n^*$, and let $R_{n,i} = \sum_{j=1}^{J} \big(g_j(Z_{ij}) - g_{nj}^*(Z_{ij})\big)$ be the approximation error of the nonparametric part. Write $\eta_{i,k} = I(\epsilon_i \le c_{\tau_k}^*) - \tau_k$ and $\eta_{i,k}^* = I(\epsilon_i \le c_{\tau_k}^* - R_{n,i}) - \tau_k$. Let $\tilde u_k = \sqrt{n}(\tilde c_{\tau_k} - c_{\tau_k}^*)$, $\tilde\mu = \sqrt{n}(\tilde\beta - \beta^*)$, $\tilde\nu = \sqrt{n}(\tilde\alpha_n - \alpha_n^*)$, $\tilde\theta = (\tilde u_1, \tilde u_2, \ldots, \tilde u_q; \tilde\mu^T; \tilde\nu^T)^T$, and $X_{i,k}^* = (e_k^T, X_i^T, \psi_i^T)^T$, where $e_k$ is a $q$-vector with 1 in the $k$th position and 0 elsewhere, and $\Delta_{i,k} = \frac{1}{\sqrt{n}} X_{i,k}^{*T} \theta = \frac{1}{\sqrt{n}} (u_k + X_i^T \mu + \psi_i^T \nu)$. Then $\tilde\theta$ is the minimizer of the following criterion:
$$ L_n(\theta) = \sum_{k=1}^{q} \sum_{i=1}^{n} \frac{\delta_i}{\hat G(Y_i)} \Big[\rho_{\tau_k}\big((\epsilon_i + R_{n,i} - c_{\tau_k}^*) - \Delta_{i,k}\big) - \rho_{\tau_k}\big(\epsilon_i + R_{n,i} - c_{\tau_k}^*\big)\Big] $$
$$
\begin{aligned}
L_n(\theta) &= \sum_{k=1}^{q} \sum_{i=1}^{n} \frac{\delta_i}{G(Y_i)} \Big[\rho_{\tau_k}\big((\epsilon_i + R_{n,i} - c_{\tau_k}^*) - \Delta_{i,k}\big) - \rho_{\tau_k}\big(\epsilon_i + R_{n,i} - c_{\tau_k}^*\big)\Big] \\
&\quad + \sum_{k=1}^{q} \sum_{i=1}^{n} \Big(\frac{\delta_i}{\hat G(Y_i)} - \frac{\delta_i}{G(Y_i)}\Big) \Big[\rho_{\tau_k}\big((\epsilon_i + R_{n,i} - c_{\tau_k}^*) - \Delta_{i,k}\big) - \rho_{\tau_k}\big(\epsilon_i + R_{n,i} - c_{\tau_k}^*\big)\Big] \\
&= I_1 + I_2, \qquad\qquad (A.1)
\end{aligned}
$$
where $I_1$ is the first summation and $I_2$ is the second. By applying the identity of Knight (1998), that is,
$$ \rho_\tau(x - y) - \rho_\tau(x) = y\big[I(x < 0) - \tau\big] + \int_0^y \big[I(x \le t) - I(x \le 0)\big]\,dt, $$
we have
$$ I_1 = \sum_{k=1}^{q} \sum_{i=1}^{n} \frac{\delta_i}{G(Y_i)}\, \Delta_{i,k}\, \eta_{i,k}^* + \sum_{k=1}^{q} \sum_{i=1}^{n} \frac{\delta_i}{G(Y_i)} \int_0^{\Delta_{i,k}} \big[I(\epsilon_i + R_{n,i} - c_{\tau_k}^* \le t) - I(\epsilon_i + R_{n,i} - c_{\tau_k}^* \le 0)\big]\,dt = W_{1n}^{*T} \theta + \sum_{k=1}^{q} B_{n,k}(\theta), $$
where
$$ W_{1n}^* = \frac{1}{\sqrt{n}} \sum_{k=1}^{q} \sum_{i=1}^{n} \frac{\delta_i}{G(Y_i)}\, \eta_{i,k}^*\, X_{i,k}^*, \qquad B_{n,k}(\theta) = \sum_{i=1}^{n} \frac{\delta_i}{G(Y_i)} \int_0^{\Delta_{i,k}} \big[I(\epsilon_i + R_{n,i} - c_{\tau_k}^* \le t) - I(\epsilon_i + R_{n,i} - c_{\tau_k}^* \le 0)\big]\,dt. $$
It is easy to check that $W_{1n}^*$ has a bounded second moment, which implies that $W_{1n}^*$ is stochastically bounded. Let the matrix
$$ D = E \begin{pmatrix} C & c X^T & 0_{q \times Jm_n} \\ X c^T & \bar c\, X X^T & \bar c\, X \psi^T \\ 0_{Jm_n \times q} & \bar c\, \psi X^T & \bar c\, \psi \psi^T \end{pmatrix}, $$
where $C$ is a $q \times q$ diagonal matrix with $C_{k,k} = f(c_{\tau_k}^*)$, $c = C\mathbf{1} = (f(c_{\tau_1}^*), \ldots, f(c_{\tau_q}^*))^T$, $\bar c = \mathbf{1}^T C \mathbf{1} = \sum_{k=1}^{q} f(c_{\tau_k}^*)$, and $\mathbf{1}$ is a $q$-dimensional column vector with all components equal to 1. The expectation of $\sum_{k=1}^{q} B_{n,k}(\theta)$ can be calculated as
$$
\begin{aligned}
\sum_{k=1}^{q} E\big(B_{n,k}(\theta)\big) &= \sum_{k=1}^{q} n E\bigg\{ \frac{\delta_i}{G(Y_i)} \int_0^{\Delta_{i,k}} \big[I(\epsilon_i + R_{n,i} - c_{\tau_k}^* \le t) - I(\epsilon_i + R_{n,i} - c_{\tau_k}^* \le 0)\big]\,dt \bigg\} \\
&= \sum_{k=1}^{q} n E \int_0^{\Delta_{i,k}} \big[F(t + c_{\tau_k}^* - R_{n,i}) - F(c_{\tau_k}^* - R_{n,i})\big]\,dt \\
&= \sum_{k=1}^{q} n E \int_0^{\Delta_{i,k}} t f(c_{\tau_k}^*)\,dt\, \big(1 + O(R_{n,i})\big) \\
&= \frac{1}{2}\, \theta^T E\bigg[ \sum_{k=1}^{q} f(c_{\tau_k}^*)\, X_{i,k}^* X_{i,k}^{*T} \big(1 + O(R_{n,i})\big) \bigg] \theta \\
&= \frac{1}{2}\, \theta^T D \theta + o_p(1),
\end{aligned}
$$
where the last equality stems from $E\psi = 0$ and $\theta^T E\big[\sum_{k=1}^{q} f(c_{\tau_k}^*) X_{i,k}^* X_{i,k}^{*T} O(R_{n,i})\big]\theta \le O_p(m_n^{-\rho}) O(m_n) = o_p(1)$. Similar to the proof of Theorem 2.1 of Tang et al. (2012), we can get $\mathrm{Var}(B_{n,k}(\theta)) \to 0$. Therefore,
$$ I_1 = W_{1n}^{*T} \theta + \tfrac{1}{2}\, \theta^T D \theta + o_p(1). \qquad\qquad (A.2) $$
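The Knight (1998) identity invoked in the expansion of I_1 can be spot-checked numerically. The following illustrative sketch (not from the paper) verifies the identity on random (x, y, tau), approximating the signed integral by a fine Riemann sum:

```python
import numpy as np

# Numeric check of Knight's identity:
# rho_tau(x - y) - rho_tau(x) = y[I(x < 0) - tau] + int_0^y [I(x <= t) - I(x <= 0)] dt
def rho(u, tau):
    return u * (tau - (u < 0))   # check loss of quantile regression

rng = np.random.default_rng(1)
gaps = []
for _ in range(50):
    tau = rng.uniform(0.05, 0.95)
    x, y = rng.normal(size=2)
    lhs = rho(x - y, tau) - rho(x, tau)
    # Riemann approximation of the signed integral from 0 to y
    t = np.linspace(0.0, y, 20001)
    integral = np.mean((x <= t).astype(float) - float(x <= 0)) * y
    rhs = y * (float(x < 0) - tau) + integral
    gaps.append(abs(lhs - rhs))
print(max(gaps))   # close to zero, up to the grid error of the integral
```

The identity is exactly what turns the difference of check losses into a linear term plus the integral term that yields the quadratic part of (A.2).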
For $I_2$ in (A.1), using the Taylor expansion as in Shows et al. (2010), we have
$$ \sqrt{n}\Big(\frac{1}{\hat G(Y_i)} - \frac{1}{G(Y_i)}\Big) = -\frac{\sqrt{n}\,\big[\hat G(Y_i) - G(Y_i)\big]}{G^2(Y_i)} + o_p(1) = \frac{1}{G(Y_i)}\, \frac{1}{\sqrt{n}} \sum_{j=1}^{n} \int_0^\tau \frac{I(Y_i \ge s)}{y(s)}\, dM_j^C(s) + o_p(1), $$
where
$$ y(s) = \lim_{n\to\infty} \frac{1}{n} \sum_{i=1}^{n} I(Y_i \ge s), \qquad M_i^C(t) = (1 - \delta_i)\, I(Y_i \le t) - \int_0^t I(Y_i \ge s)\, d\Lambda_C(s), $$
and $\Lambda_C(s)$ is the cumulative hazard function of the censoring time $C$. This leads to
$$
\begin{aligned}
I_2 &= \frac{1}{n} \sum_{k=1}^{q} \sum_{i=1}^{n} \frac{\delta_i}{G(Y_i)} \Big[\rho_{\tau_k}\big((\epsilon_i + R_{n,i} - c_{\tau_k}^*) - \Delta_{i,k}\big) - \rho_{\tau_k}\big(\epsilon_i + R_{n,i} - c_{\tau_k}^*\big)\Big] \sum_{j=1}^{n} \int_0^\tau \frac{I(Y_i \ge s)}{y(s)}\, dM_j^C(s) + o_p(1) \\
&= \frac{1}{n\sqrt{n}} \sum_{k=1}^{q} \sum_{i=1}^{n} X_{i,k}^{*T} \theta\, \frac{\delta_i\, \eta_{i,k}}{G(Y_i)} \sum_{j=1}^{n} \int_0^\tau \frac{I(Y_i \ge s)}{y(s)}\, dM_j^C(s) + o_p(1) \\
&= W_{2n}^T \theta + o_p(1), \qquad\qquad (A.3)
\end{aligned}
$$
where
$$ W_{2n} = \frac{1}{n\sqrt{n}} \sum_{k=1}^{q} \sum_{i=1}^{n} \sum_{j=1}^{n} \int_0^\tau \frac{\delta_i\, \eta_{i,k}\, I(Y_i \ge s)}{G(Y_i)\, y(s)}\, dM_j^C(s)\, X_{i,k}^*. $$
Let
$$ B_n(s) = E\bigg[\sum_{k=1}^{q} \eta_{i,k}^*\, I(Y_i \ge s)\, X_{i,k}^*\bigg], \qquad W_{2n}^* = \frac{1}{\sqrt{n}} \sum_{j=1}^{n} \int_0^\tau \frac{B_n(s)}{y(s)}\, dM_j^C(s). $$
We have
$$ \frac{1}{n} \sum_{i=1}^{n} \sum_{k=1}^{q} \frac{\delta_i\, \eta_{i,k}\, I(Y_i \ge s)}{G(Y_i)}\, X_{i,k}^* = E\bigg[\frac{1}{n} \sum_{i=1}^{n} \sum_{k=1}^{q} \frac{\delta_i\, \eta_{i,k}\, I(Y_i \ge s)}{G(Y_i)}\, X_{i,k}^*\bigg] + o_p(1) = E\bigg[\frac{1}{n} \sum_{i=1}^{n} \sum_{k=1}^{q} \frac{\delta_i\, \eta_{i,k}^*\, I(Y_i \ge s)}{G(Y_i)}\, X_{i,k}^*\bigg] + o_p(1) = B_n(s) + o_p(1). \qquad (A.4) $$
Combining (A.3) and (A.4), we have
$$ I_2 = W_{2n}^{*T} \theta + o_p(1). \qquad\qquad (A.5) $$
Hence, the desired result follows.
Lemma 5. Denote by $\tilde\theta = (\tilde u_1, \tilde u_2, \ldots, \tilde u_q; \tilde\mu^T; \tilde\nu^T)^T$ the minimizer of (7). Under the conditions of Theorem 2.1, we have
$$ \tilde\theta = -D^{-1}\big[W_{1n} + E(W_{1n}^*) + W_{2n}^*\big] + o_p(1). $$

Proof. Let $W_{1n} = (\omega_{11}, \ldots, \omega_{1q}, \omega_2, \omega_3)^T$ and $W_{1n}^* = (\omega_{11}^*, \ldots, \omega_{1q}^*, \omega_2^*, \omega_3^*)^T$, with
$$ \omega_{1k} = \frac{1}{\sqrt{n}} \sum_{i=1}^{n} \frac{\delta_i}{G(Y_i)}\, \eta_{i,k}, \qquad \omega_{1k}^* = \frac{1}{\sqrt{n}} \sum_{i=1}^{n} \frac{\delta_i}{G(Y_i)}\, \eta_{i,k}^*, \qquad k = 1, 2, \ldots, q, $$
$$ \omega_2 = \frac{1}{\sqrt{n}} \sum_{k=1}^{q} \sum_{i=1}^{n} \frac{\delta_i}{G(Y_i)}\, \eta_{i,k}\, X_i^T, \qquad \omega_2^* = \frac{1}{\sqrt{n}} \sum_{k=1}^{q} \sum_{i=1}^{n} \frac{\delta_i}{G(Y_i)}\, \eta_{i,k}^*\, X_i^T, $$
$$ \omega_3 = \frac{1}{\sqrt{n}} \sum_{k=1}^{q} \sum_{i=1}^{n} \frac{\delta_i}{G(Y_i)}\, \eta_{i,k}\, \psi_i^T, \qquad \omega_3^* = \frac{1}{\sqrt{n}} \sum_{k=1}^{q} \sum_{i=1}^{n} \frac{\delta_i}{G(Y_i)}\, \eta_{i,k}^*\, \psi_i^T. $$
Notice that
$$
\begin{aligned}
\mathrm{Var}(\omega_{1k} - \omega_{1k}^*) &= \frac{1}{n} \sum_{i=1}^{n} \mathrm{Var}\Big(\frac{\delta_i\, (\eta_{i,k} - \eta_{i,k}^*)}{G(Y_i)}\Big) \le \frac{1}{n} \sum_{i=1}^{n} \frac{1}{\vartheta^2}\, E\big(|\eta_{i,k} - \eta_{i,k}^*|\big) \\
&= \frac{1}{n} \sum_{i=1}^{n} \frac{1}{\vartheta^2}\, \big(F(R_{ni} + c_{\tau_k}^*) - F(c_{\tau_k}^*)\big) = o_p(1). \qquad\qquad (A.6)
\end{aligned}
$$
Similar to (A.6), we can get
$$ \mathrm{Var}(\omega_2 - \omega_2^*) = o_p(1) \qquad\qquad (A.7) $$
and
$$ \mathrm{Var}(\omega_3 - \omega_3^*) = o_p(1). \qquad\qquad (A.8) $$
Therefore, we have $\mathrm{Var}(W_{1n} - W_{1n}^*) = o_p(1)$, which, together with $E(W_{1n}) = 0$, implies
$$ W_{1n}^* - W_{1n} - E(W_{1n}^* - W_{1n}) = W_{1n}^* - W_{1n} - E(W_{1n}^*) = o_p(1). $$
Then,
$$ W_{1n}^* + W_{2n}^* = W_{1n} + E(W_{1n}^*) + W_{2n}^* + o_p(1). \qquad\qquad (A.9) $$
Following Lemmas 3 and 4, we can obtain
$$ \tilde\theta = -D^{-1}\big(W_{1n}^* + W_{2n}^*\big) + o_p(1). \qquad\qquad (A.10) $$
Combining (A.9) and (A.10) leads to
$$ \tilde\theta = -D^{-1}\big[W_{1n} + E(W_{1n}^*) + W_{2n}^*\big] + o_p(1). \qquad\qquad (A.11) $$
This completes the proof of Lemma 5.
Proof of Theorem 2.1. Denote $\Sigma_X = \mathrm{Cov}(X)$ and $\Sigma_\psi = \mathrm{Cov}(\psi) = E\psi\psi^T$. Similar to Guo et al. (2013), we can get
$$ D^{-1} = \begin{pmatrix} C^{-1} & -\bar c^{-1}\, l\, E X^T \Sigma_X^{-1} & 0_{q \times Jm_n} \\ -\bar c^{-1}\, \Sigma_X^{-1} E X\, l^T & \bar c^{-1}\, \Sigma_X^{-1} & 0_{p \times Jm_n} \\ 0_{Jm_n \times q} & 0_{Jm_n \times p} & \bar c^{-1}\, \Sigma_\psi^{-1} \end{pmatrix}. $$
Letting
$$ D_1^{-1} = \big(-\bar c^{-1}\, \Sigma_X^{-1} E X\, l^T,\ \bar c^{-1}\, \Sigma_X^{-1}\big), $$
$$ W_{3n}^* = \frac{1}{\sqrt{n}} \sum_{k=1}^{q} \sum_{i=1}^{n} \frac{\delta_i}{G(Y_i)}\, \eta_{i,k}^*\, X_{i,k}, \qquad W_{4n}^* = \frac{1}{\sqrt{n}} \sum_{i=1}^{n} \int_0^\tau \frac{B_1(s)}{y(s)}\, dM_i^C(s), \qquad W_{3n} = \frac{1}{\sqrt{n}} \sum_{k=1}^{q} \sum_{i=1}^{n} \frac{\delta_i}{G(Y_i)}\, \eta_{i,k}\, X_{i,k}, $$
where $X_{i,k} = (e_k^T, X_i^T)^T$ and $B_1(s) = E\big[\sum_{k=1}^{q} \eta_{i,k}^*\, I(T_i \ge s)\, X_{i,k}\big]$, we have
$$ E[W_{3n}^*] = O(\sqrt{n}) \sum_{k=1}^{q} f(c_{\tau_k}^*)\, E(R_{ni} X_{i,k}) + O(m_n^{-2}) = O(\sqrt{n}) \sum_{k=1}^{q} f(c_{\tau_k}^*)\, E(R_{ni})\, E(X_{i,k}) + O(m_n^{-2}) = o(1). $$
Then from (A.11), we can obtain
$$ \sqrt{n}(\tilde\beta - \beta^*) = -D_1^{-1}\big[W_{3n}^* + W_{4n}^*\big] + o_p(1). $$
Using the property (Robins & Rotnitzky, 1992)
$$ \frac{\delta_i}{G(Y_i)} = 1 - \int_0^\tau \frac{dM_i^C(s)}{G(s)}, $$
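The inverse-censoring weighting δ_i/G(Y_i) that this property rests on can be illustrated by simulation. In the following sketch (the exponential distributions for T and C are assumptions for illustration, not the paper's setup), E[δ_i | T_i] = G(T_i), so the weighted failure indicator has mean one:

```python
import numpy as np

# Illustrative sketch of inverse-probability-of-censoring weighting:
# delta_i / G(Y_i) has expectation one when G is the censoring survival function.
rng = np.random.default_rng(2)
n = 200_000
T = rng.exponential(1.0, size=n)        # failure times (assumed distribution)
C = rng.exponential(2.0, size=n)        # independent censoring times (assumed)
Y = np.minimum(T, C)                    # observed follow-up time
delta = (T <= C).astype(float)          # failure indicator

G = lambda y: np.exp(-y / 2.0)          # true censoring survival P(C > y)
weights = delta / G(Y)                  # on the failure event, G(Y) = G(T)
print(round(weights.mean(), 3))         # close to 1
```

In practice G is unknown and is replaced by the Kaplan–Meier estimator Ĝ of the censoring distribution, which is exactly the source of the I_2 correction term handled above.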
$W_{3n}^* + W_{4n}^*$ can be rewritten as
$$ W_{3n}^* + W_{4n}^* = \frac{1}{\sqrt{n}} \sum_{i=1}^{n} \sum_{k=1}^{q} \eta_{i,k}^*\, X_{i,k} + \frac{1}{\sqrt{n}} \sum_{i=1}^{n} \int_0^\tau \frac{B_1(s)/S(s) - \sum_{k=1}^{q} \eta_{i,k}^*\, X_{i,k}}{G(s)}\, dM_i^C(s), \qquad (A.12) $$
where $S(s) = P(T \ge s)$. Note that $M_i^C(u) = (1 - \delta_i)\, I(Y_i \le u) - \int_0^u I(Y_i \ge s)\, d\Lambda_C(s)$ is a martingale with respect to the filtration $z(u)$, and
$$ \frac{B_1(s)/S(s) - \sum_{k=1}^{q} \eta_{i,k}^*\, X_{i,k}}{G(s)} $$
is $z(u)$-predictable and bounded on $[0, \tau]$. Following Fleming and Harrington (1991),
$$ \frac{1}{\sqrt{n}} \sum_{i=1}^{n} \int_0^u \frac{B_1(s)/S(s) - \sum_{k=1}^{q} \eta_{i,k}^*\, X_{i,k}}{G(s)}\, dM_i^C(s) $$
is a martingale over $[0, \tau]$, so we have
$$ E\bigg[\frac{1}{\sqrt{n}} \sum_{i=1}^{n} \int_0^u \frac{B_1(s)/S(s) - \sum_{k=1}^{q} \eta_{i,k}^*\, X_{i,k}}{G(s)}\, dM_i^C(s)\bigg] = E\bigg\{E\bigg[\frac{1}{\sqrt{n}} \sum_{i=1}^{n} \int_0^u \frac{B_1(s)/S(s) - \sum_{k=1}^{q} \eta_{i,k}^*\, X_{i,k}}{G(s)}\, dM_i^C(s)\,\Big|\, z(0)\bigg]\bigg\} = 0, $$
and
$$ \mathrm{Cov}\bigg[\frac{1}{\sqrt{n}} \sum_{i=1}^{n} \int_0^u \frac{B_1(s)/S(s) - \sum_{k=1}^{q} \eta_{i,k}^*\, X_{i,k}}{G(s)}\, dM_i^C(s)\bigg] = E \int_0^u \frac{\big(B_1(s)/S(s) - \sum_{k=1}^{q} \eta_{i,k}^*\, X_{i,k}\big)^{\otimes 2}}{G^2(s)}\, I(Y_i \ge s)\, d\Lambda_C(s) $$
for $u \in [0, \tau]$. Moreover, since $\frac{1}{\sqrt{n}} \sum_{i=1}^{n} \sum_{k=1}^{q} \eta_{i,k}^*\, X_{i,k}$ is $z(0)$-measurable, the two terms on the right-hand side of (A.12) are uncorrelated. Then, we have
$$ E\big[W_{3n}^* + W_{4n}^*\big] = 0 \qquad \text{and} \qquad \mathrm{Cov}\big[W_{3n}^* + W_{4n}^*\big] = \Sigma, $$
where
$$ \Sigma = \Sigma_1 + \Sigma_2 = E\bigg[\bigg(\sum_{k=1}^{q} \eta_{i,k}^*\, X_{i,k}\bigg)^{\otimes 2}\bigg] + E\bigg[\int_0^\tau \frac{\big(B_1(s)/S(s) - \sum_{k=1}^{q} \eta_{i,k}^*\, X_{i,k}\big)^{\otimes 2}}{G^2(s)}\, I(Y_i \ge s)\, d\Lambda_C(s)\bigg]. $$
Thus, $W_{3n}^* + W_{4n}^*$ is the average of independent random vectors with mean 0 and finite covariance, and we have
$$ W_{3n}^* + W_{4n}^* \to_d N(0, \Sigma). $$
Then by Slutsky's theorem, we obtain
$$ \sqrt{n}(\tilde\beta - \beta^*) \to_d N\big(0,\ D_1^{-1}\, \Sigma\, (D_1^{-1})^T\big). $$
From (A.11), we can also get
$$ \tilde\alpha_n - \alpha_n^* = -\frac{\bar c^{-1} \Sigma_\psi^{-1}}{n} \bigg[\sum_{i=1}^{n} \sum_{k=1}^{q} \frac{\delta_i}{G(Y_i)}\, \eta_{i,k}^*\, \psi_i - E\bigg(\sum_{i=1}^{n} \sum_{k=1}^{q} \frac{\delta_i}{G(Y_i)}\, \eta_{i,k}^*\, \psi_i\bigg)\bigg] - \frac{\bar c^{-1} \Sigma_\psi^{-1}}{n} \sum_{i=1}^{n} \int_0^\tau \frac{E\big[\sum_{k=1}^{q} \eta_{i,k}^*\, I(Y_i \ge s)\, \psi_i\big]}{y(s)}\, dM_i^C(s) = II_1 + II_2 + II_3. $$
Note that
$$ \bigg\|\frac{1}{\sqrt{n}} \sum_{i=1}^{n} \sum_{k=1}^{q} \frac{\delta_i}{G(Y_i)}\, \eta_{i,k}^*\, \psi_i\bigg\|_2 = O_p(1), $$
so we have
$$ \|II_1\|_2 \le \bar c^{-1}\, \|\Sigma_\psi^{-1}\|_2\, \frac{1}{\sqrt{n}}\, \bigg\|\frac{1}{\sqrt{n}} \sum_{i=1}^{n} \sum_{k=1}^{q} \frac{\delta_i}{G(Y_i)}\, \eta_{i,k}^*\, \psi_i\bigg\|_2 = O_p(m_n)\, O_p\Big(\frac{1}{\sqrt{n}}\Big)\, O_p(1) = O_p\Big(\frac{m_n}{\sqrt{n}}\Big). \qquad (A.13) $$
Similar to (A.13), we can prove
$$ \|II_3\|_2 = O_p\Big(\frac{m_n}{\sqrt{n}}\Big). \qquad\qquad (A.14) $$
Recall the definition of $II_2$. Since
$$ \big\|(R_{n1}, \ldots, R_{nn})(\psi_1^T, \ldots, \psi_n^T)^T\big\|_2^2 = (R_{n1}, \ldots, R_{nn}) \bigg(\sum_{i=1}^{n} \psi_i \psi_i^T\bigg) (R_{n1}, \ldots, R_{nn})^T \le \lambda_{\max}\bigg(\sum_{i=1}^{n} \psi_i \psi_i^T\bigg)\, O_p(n m_n^{-2\rho}) = O_p(n m_n^{-1})\, O_p(n m_n^{-2\rho}) = O_p(n^2 m_n^{-2\rho - 1}), $$
we have
$$ E\|II_2\|_2 \le \bar c^{-1}\, \|\Sigma_\psi^{-1}\|_2\, \bigg\|\frac{1}{n}\, E\bigg[\sum_{i=1}^{n} \sum_{k=1}^{q} \frac{\delta_i}{G(Y_i)}\, \eta_{i,k}^*\, \psi_i\bigg]\bigg\|_2 = \frac{\bar c^{-1}}{n}\, O(m_n)\, \bigg\|E\bigg[\sum_{i=1}^{n} \sum_{k=1}^{q} f(c_{\tau_k}^*)\, R_{ni}\, \psi_i\bigg]\bigg\|_2 = \frac{\bar c^{-1}}{n}\, O(m_n)\, \big\|E\,(R_{n1}, \ldots, R_{nn})(\psi_1^T, \ldots, \psi_n^T)^T\big\|_2 = \frac{\bar c^{-1}}{n}\, O(m_n)\, O\big(n m_n^{-(2\rho + 1)/2}\big) = O\big(m_n^{(-2\rho + 1)/2}\big). \qquad (A.15) $$
Combining (A.13)–(A.15), we have
$$ \|\tilde\alpha_n - \alpha_n^*\| = O_p\Big(\frac{m_n}{\sqrt{n}} + m_n^{(-2\rho + 1)/2}\Big). $$
Then,
$$ \|\tilde g_n - g_n^*\|_2^2 = E\big[\psi^T(\tilde\alpha_n - \alpha_n^*)\big]^2 = E\big[(\tilde\alpha_n - \alpha_n^*)^T \psi \psi^T (\tilde\alpha_n - \alpha_n^*)\big] \le \lambda_{\max}(E\psi\psi^T)\, \|\tilde\alpha_n - \alpha_n^*\|^2 = O_p\Big(\frac{m_n}{n} + m_n^{-2\rho}\Big). $$
Denote $\sum_{j=1}^{J} g_j$ by $g^*$. According to Lemma 1, we have $\|g_n^* - g^*\|_2 = O_p(m_n^{-\rho} + m_n^{1/2} n^{-1/2})$. Therefore,
$$ \|\tilde g_n - g^*\|_2 \le \|\tilde g_n - g_n^*\|_2 + \|g_n^* - g^*\|_2 = O_p\big(m_n^{-\rho} + m_n^{1/2} n^{-1/2}\big). $$
This completes the proof of Theorem 2.1.
Proof of Theorem 2.2. Since the proof is similar to the proof of Theorem 2 of Guo et al. (2013), we omit it here.
References

Bang, H., & Tsiatis, A. (2002). Median regression with censored cost data. Biometrics, 58, 643–649.
Cox, D. R. (1972). Regression models and life-tables. Journal of the Royal Statistical Society. Series B, 34, 187–220.
Cox, D. R., & Oakes, D. (1984). Analysis of survival data. London: Chapman and Hall.
Fan, J., & Li, R. (2001). Variable selection via nonconcave penalized likelihood and its oracle properties. Journal of the American Statistical Association, 96, 1348–1360.
Fleming, T. R., & Harrington, D. (1991). Counting processes and survival analysis. New York: Wiley.
Guo, J., Tang, M., Tian, M., & Zhu, K. (2013). Variable selection in high-dimensional partially linear additive models for composite quantile regression. Computational Statistics and Data Analysis, 65, 56–67.
Hjort, N. L., & Pollard, D. (1993). Asymptotics for minimisers of convex processes. Preprint.
Hu, Y., & Lian, H. (2013). Variable selection in a partially linear proportional hazards model with a diverging dimensionality. Statistics & Probability Letters, 83, 61–69.
Huang, J. (1999). Efficient estimation of the partly linear additive Cox model. The Annals of Statistics, 27, 1536–1563.
Huang, J., Horowitz, J. L., & Wei, R. (2010). Variable selection in nonparametric additive models. The Annals of Statistics, 38, 2282–2313.
Jiang, R., Qian, W., & Zhou, Z. (2012). Variable selection and coefficient estimation via composite quantile regression with randomly censored data. Statistics & Probability Letters, 82, 308–317.
Kai, B., Li, R., & Zou, H. (2011). New efficient estimation and variable selection methods for semiparametric varying-coefficient partially linear models. The Annals of Statistics, 39, 305–332.
Kalbfleisch, J., & Prentice, R. (1980). The statistical analysis of failure time data. Hoboken, NJ: Wiley.
Knight, K. (1998). Limiting distributions for L1 regression estimators under general conditions. The Annals of Statistics, 26, 755–770.
Koenker, R., & Bassett, G. (1978). Regression quantiles. Econometrica, 46, 33–50.
Lee, E., & Noh, H. (2014). Model selection via Bayesian information criterion for quantile regression models. Journal of the American Statistical Association, 109, 216–229.
Lian, H. (2012). Variable selection for high-dimensional generalized varying-coefficient models. Statistica Sinica, 22, 1563–1588.
Lian, H., Li, J. B., & Tang, X. Y. (2014). SCAD-penalized regression in additive partially linear proportional hazards models with an ultra-high-dimensional linear part. Journal of Multivariate Analysis, 125, 50–64.
Liang, H., Thurston, S., Ruppert, D., Apanasovich, T., & Hauser, R. (2008). Additive partial linear models with measurement errors. Biometrika, 95, 667–678.
Liu, X., Wang, L., & Liang, H. (2011). Estimation and variable selection for semiparametric additive partial linear models. Statistica Sinica, 21, 1225–1248.
Ma, S., & Du, P. (2012). Variable selection in partly linear regression model with diverging dimensions for right censored data. Statistica Sinica, 22, 1003–1020.
Opsomer, J. D., & Ruppert, D. (1999). A root-n consistent backfitting estimator for semiparametric additive modeling. Journal of Computational and Graphical Statistics, 8, 715–732.
Robins, J., & Rotnitzky, A. (1992). Recovery of information and adjustment for dependent censoring using surrogate markers. In N. Jewell, K. Dietz, & V. Farewell (Eds.), AIDS epidemiology - methodological issues (pp. 297–331). Boston: Birkhäuser.
Schumaker, L. (1981). Spline functions: Basic theory. New York: Wiley.
Shows, J., Lu, W., & Zhang, H. (2010). Sparse estimation and inference for censored median regression. Journal of Statistical Planning and Inference, 140, 1903–1917.
Stone, C. J. (1982). Optimal global rates of convergence for nonparametric regression. The Annals of Statistics, 10, 1040–1053.
Tang, L., Zhou, Z., & Wu, C. (2012). Weighted composite quantile estimation and variable selection method for censored regression model. Statistics & Probability Letters, 82, 653–663.
Therneau, T., & Grambsch, P. (2000). Modeling survival data: extending the Cox model. New York: Springer.
Tibshirani, R. (1996). Regression shrinkage and selection via the lasso. Journal of the Royal Statistical Society. Series B, 58, 267–288.
Wang, L. F., Li, H. Z., & Huang, J. H. Z. (2008). Variable selection in nonparametric varying-coefficient models for analysis of repeated measurements. Journal of the American Statistical Association, 103, 1556–1569.
Wang, H., Li, R., & Tsai, C. (2007). Tuning parameter selectors for the smoothly clipped absolute deviation method. Biometrika, 94, 553–568.
Wang, H. S., & Xia, Y. C. (2009). Shrinkage estimation of the varying coefficient model. Journal of the American Statistical Association, 104, 747–757.
Wei, F. (2012). Group selection in high-dimensional partially linear additive models. Brazilian Journal of Probability and Statistics, 26, 219–243.
Xie, H. L., & Huang, J. (2009). SCAD-penalized regression in high-dimensional partially linear models. The Annals of Statistics, 37, 673–696.
Yang, S. (1999). Censored median regression using weighted empirical survival and hazard functions. Journal of the American Statistical Association, 94, 137–145.
Yang, H., & Liu, H. L. (2014). Penalized weighted composite quantile estimators with missing covariates. Statistical Papers, http://dx.doi.org/10.1007/s00362-014-0642-2.
Ying, Z., Jung, S., & Wei, L. (1995). Survival analysis with median regression models. Journal of the American Statistical Association, 90, 178–184.
Zhou, L. (2006). A simple censored median regression estimator. Statistica Sinica, 16, 1043–1058.
Zou, H. (2006). The adaptive lasso and its oracle properties. Journal of the American Statistical Association, 101, 1418–1429.
Zou, H., & Yuan, M. (2008). Composite quantile regression and the oracle model selection theory. The Annals of Statistics, 36, 1108–1126.