Computational Statistics and Data Analysis 144 (2020) 106862
Contents lists available at ScienceDirect
Computational Statistics and Data Analysis journal homepage: www.elsevier.com/locate/csda
Conditional maximum likelihood estimation for semiparametric transformation models with doubly truncated data ∗
Pao-sheng Shen , Huichen Hsu Department of Statistics, Tunghai University, Taiwan, ROC
article
info
Article history: Received 9 May 2019 Received in revised form 29 September 2019 Accepted 2 October 2019 Available online 9 October 2019 Keywords: Double truncation Semiparametric transformation model Conditional independence Conditional maximum likelihood
a b s t r a c t Doubly truncated data arise when a failure time T is observed only if it falls within a subject-specific, possibly random, interval [U , V ], where U and V are referred to as left- and right-truncation times, respectively. In this article, we consider the problem of fitting semiparametric transformation regression models to doubly truncated data. Most of the existing approaches in literature, which adjust for double truncation in regression models, require independence between failure times and truncation times, which may not hold in practice. To relax the independence assumption to conditional independence given covariates, we consider a conditional likelihood approach and develop the conditional maximum likelihood estimators (cMLE) for the regression parameters and cumulative hazard function of models. Based on score equations for the regression parameter and the infinite-dimensional function, we propose an iterative algorithm for obtaining the cMLE. The cMLE is shown to be consistent and asymptotically normal. Simulation studies indicate that the cMLE performs well and outperforms the existing estimators when an independence assumption holds. Applications to an AIDS dataset is given to illustrate the proposed method. © 2019 Elsevier B.V. All rights reserved.
1. Introduction Regression parameter estimation in survival analysis is important for studying factors that affect disease progression. An incident cohort is identified by randomly sampling subjects with initial events (time origin 0) and the sampled subjects are then followed until occurrence of the failure event, loss to follow-up or end-of-study. The incident cohort design is inefficient for natural history studies since it takes a long follow-up time to obtain sufficient failure data. In contrast, it is considerably easier to collect data only from those individuals that fail within a calendar time interval. However, this sampling scheme introduces both left truncation and right truncation, the so-called double truncation. Specifically, suppose that individuals can experience two subsequent events, the first event (such as birth or HIV infection) and a second event (such as onset of chronic diseases or development of AIDS), which define their individual lifetimes, and we are interested in the lifetime distribution. Let C1 and C2 denote the calendar times of the occurrence of the first and second event, respectively. Under double truncation, individuals are selected into the sample if and only if their second events occur within the calendar time interval [τ1 , τ2 ], i.e., τ1 ≤ C2 ≤ τ2 . Let T = C2 − C1 denote the lifetime of interest. Let U = τ1 − C1 and V = τ2 − C1 = U + d0 , where d0 = τ2 − τ1 denote the left and right truncation times, respectively. Let Z ∗ Corresponding author. E-mail address:
[email protected] (P.-S. Shen). https://doi.org/10.1016/j.csda.2019.106862 0167-9473/© 2019 Elsevier B.V. All rights reserved.
2
P.-S. Shen and H. Hsu / Computational Statistics and Data Analysis 144 (2020) 106862
Fig. 1. Schematic depiction of doubly truncated data.
be a p-dimensional covariate vector which may affect the survival distribution of T . Under double truncation, (T , U , V , Z ) is observed if and only if U ≤ T ≤ V , i.e., T is doubly truncated by U and V . Fig. 1 highlights all the different times for doubly truncated data described above. Doubly truncated data play an important role in the statistical analysis of astronomical observations (Lynden-Bell, 1971; Efron and Petrosian, 1999) and in survival analysis. Most of the literature deals with the estimation of the distribution or survival function instead of regression. Efron and Petrosian (1999) introduced the nonparametric maximum likelihood estimator (NPMLE) for the survival distribution function under double truncation. Shen (2010a) investigated the asymptotic properties of the NPMLE and introduced a nonparametric estimator of the truncation distribution function. Moreira and de Uña-Álvarez (2010a) demonstrated that the simple bootstrap method can be used to approximate the sampling distribution of the NPMLE. Moreira et al. (2010) provided the R package DTDA for analyzing truncated data. When the joint distribution of (U , V ) belongs to a parametric family of distribution functions, Moreira and de Uña-Álvarez (2010b) and Shen (2010b) introduced a semiparametric maximum likelihood estimator for the survival distribution function. Emura et al. (2015) established a closed-form variance estimation of the NPMLE. Under the special exponential family (SEF), Hu and Emura (2015) proposed the randomized Newton–Raphson algorithms to obtain the MLE. Emura et al. (2017) showed that the classical asymptotic theory for the i.i.d. data is not adequate for studying the MLE under double-truncation and established the consistency and asymptotic normality of the MLE under a reasonably simple set of regularity conditions. Recently, a textbook by Dörre and Emura (2019) introduces readers to statistical methodologies used to analyze doubly truncated data. The book provides likelihood-based methods, Bayesian methods, non-parametric methods, and linear regression methods. Compared with the studies without covariates, research is fewer on the analysis of doubly truncated data when covariates are present. When covariates are discrete, Shen (2013) proposed a method for regression analysis of interval censored and doubly truncated data using linear transformation models. Moreira et al. (2016) proposed local constant and local linear kernel-type estimators for nonparametric regression with a doubly truncated response. Shen (2016) proposed estimators based on estimating equations (EE). However, the EE estimator suffers from large estimation variance. Frank and Dörre (2017) investigated non-parametric estimation for a linear regression model under double-truncation. For doubly truncated data, based on the density function of observed lifetimes and the random sample size, Dörre (2017) derived a likelihood model that enables simultaneous estimation of the lifetime distribution and the parameters governing the birth process. The proportional hazard (PH) model (Cox, 1972) has often been used in the analysis of survival time and related data. Under the PH model, Rennert and Xie (2018a) adjusted for double truncation based on a weighted estimating equation, which requires a consistent estimator of W (t) = P(U ≤ t ≤ V ). Mandel et al. (2018) presented a different weighted method to fit the Cox regression model. Their method also requires a consistent estimator of W (t). Shen and Liu (2019a) proposed a two-step approach for obtaining the pseudo maximum likelihood estimators (PMLE) of regression parameters for the Cox model with doubly truncated data. Shen and Liu (2019b) extended the pseudo likelihood approach to semiparametric transformation models. Frank et al. (2018) proposed an estimator of the time-dependent regression coefficients in an additive hazard model with doubly truncated data. All the above methods require the assumption that (U , V ) is independent of (T , Z ). Specifically, the approaches of Rennert and Xie (2018a) and Mandel et al. (2018) rely on consistency of the estimator of W (t), which requires the independence assumption. Similarly, the pseudo likelihood approaches of Shen and Liu (2019a), Shen and Liu (2019b) and Frank et al. (2018) require estimating the un-truncation probability P(U ≤ T ≤ V ), which also requires the independence assumption. In practice, the assumption of independence can be violated. For example, for the age at onset of the Alzheimer’s diseases, factors such as depression and stress are associated with delayed study entry. Since these factors are correlated with survival, the left truncation time U and survival time T are not independent. In literature, there exist many statistical methods available for analyzing survival data with dependent left-truncation (e.g. Chaieb et al., 2006; Emura and Konno, 2012a,b; Emura and Murotani, 2015; Emura and Wang, 2016; Shen, 2017; Chiou et al., 2018a,b) and dependent right-truncation (Emura and Wang, 2012). Recently, Rennert and Xie (2018b) proposed a conditional approach for the PH model with doubly truncated data to relax the independence assumption to an assumption of conditional independence,
P.-S. Shen and H. Hsu / Computational Statistics and Data Analysis 144 (2020) 106862
3
i.e., (U , V ) and T are independent given Z . They proposed an expectation–maximization (EM) algorithm to obtain the conditional maximum likelihood estimators (cMLE) and established their asymptotic properties. Their simulation study demonstrates that the cMLE performs well. In this article, we consider fitting semiparametric transformation regression models to doubly truncated data. In Section 2, we consider conditional maximum likelihood estimation for the regression parameters and cumulative hazard function of models. Based on score equations for the regression parameter and an infinite-dimensional function, we propose an iterative algorithm for obtaining the cMLE. The cMLE is shown to be consistent and asymptotically normal. In Section 3, simulation studies are conducted to investigate the performance of the cMLE. In Section 4, the proposed method is illustrated using an AIDS dataset. 2. The proposed estimator 2.1. The cMLE Assume that given Z , (U , V ) is independent of T . Let F (t |Z ) = P(T ≤ t |Z ) denote the cumulative distribution function of T given Z . Let K (u, v|Z ) = P(U ≤ u, V ≤ v|Z ) denote the joint distribution function of U and V given Z . Suppose that the support of T does not depend on Z . Let aF and bF denote the left and right endpoint of the support of T , and similarly, define (aU , bU ) as the left and right endpoint of the support of U, and (aV , bV ) as the left and right endpoint of the support of V . Throughout this article, for identifiability of F (t |Z ) (Woodroofe, 1985), we assume that aU ≤ aF ≤ aV and bU ≤ bF ≤ bV . In practice, proportional hazards assumption may be violated. Semiparametric transformation models have been proposed as a broad class of regression models, which include PH model and the proportional odds (PO) model (Bennett, 1983) as special cases. In literature, semiparametric transformation models have been investigated extensively for right-censored/left-truncated data (Cheng et al., 1995; Chen et al., 2002; Zeng and Lin, 2006, 2010; Chen and Shen, 2018). Given the covariate Z , semiparametric transformation models specify the corresponding cumulative hazard function of T as
{ } Λ(t |Z ) = G R(t) exp(β T Z ) ,
(2.1)
where β is a p × 1 vector of unknown regression parameters and R(t) as an arbitrary baseline cumulative hazard function, and G is a prespecified transformation function that is continuously differentiable and strictly increasing with G(0) = 0 and G(∞) = ∞. When G(t) = t and G(t) = log(1 + t), (2.1) corresponds to the PH and PO models, respectively. Let S(t |Z ) = P(T > t |Z ). Under model (2.1), the un-truncated probability p(Z ) = P(U ≤ T ≤ V |Z ), i.e., the probability an individual is included in the sample given Z , can be written as ∞
∫
∞
∫
[S(u|Z ) − S(v|Z )]dK (u, v|Z ) ∫0 ∞ ∫0 ∞ ( { } { }) −G R(u) exp(β T Z ) −G R(v ) exp(β T Z ) e −e k(u, v|Z )dudv, =
p(Z ) =
0
0
where k(u, v|Z ) is the joint probability density function of U and V given Z . Let (Ti , Ui , Vi , Zi ) (i = 1, . . . , n) denote the observed vector, where Ti , Ui , Vi and Zi are the corresponding variables of T , U, V and Z , respectively. Similar to the full nonparametric likelihood function (Shen, 2010a), given Zi ’s, the likelihood function L of (F , K ) is given by L(F , K ) =
n ∏ dF (Ti |Zi )dK (Ui , Vi |Zi )
p(Zi )
i=1 n
=
dF (Ti |Zi )
∏ i=1
S(Ui |Zi ) − S(Vi − |Zi )
×
n ∏ dK (Ui , Vi |Zi )[S(Ui |Zi ) − S(Vi − |Zi )]
p(Zi )
i=1
.
Note that since the full likelihood function is based on the observed data, the denominator in L(F , K ) is the un-truncation probability given Zi . Let g(x) = dG(x)/dx. Under model (2.1), it follows that the likelihood is proportional to L(β, R) = Lc (β, R) × Lm (β, R, K ), where Lc (β, R) =
[ ] T T T n ∏ dR(Ti )eβ Zi g {R(Ti )eβ Zi } exp{−G{R(Ti )eβ Zi }} i=1
exp{−G{R(Ui )eβ
TZ i
}} − exp{−G{R(Vi )eβ T Zi }}
and Lm (β, R, K ) =
n ∏ i=1
(
{ } −G R(Ui ) exp(β T Zi )
e
∫ bV ∫ bU ( aV
aU
e
{
{ } −G R(Vi ) exp(β T Zi ) )
dK (Ui , Vi |Zi )
−e }
−G R(u) exp(β T Zi )
{
−e
}
−G R(v ) exp(β T Zi )
k(u, v|Zi )dudv
)
.
4
P.-S. Shen and H. Hsu / Computational Statistics and Data Analysis 144 (2020) 106862
Note that Lc and Lm represent the conditional density for Ti given Ui and Vi , and the marginal density for Ui and Vi , respectively. In this article, we consider the cMLE by maximizing the conditional likelihood function Lc (β, R). For nonparametric maximum likelihood estimation, we redefine R(·) to be a step function with jumps only at failure times Ti ’s. The parameter of interest is (β, R(·)). Let βˆ and Rˆ denote the cMLE of β and R, respectively. The following proposition justifies the use of the cMLE when Zi is discrete. Proposition 1. When the covariate Zi is discrete, maximizing L is equivalent to maximizing Lc . Proof. The proof is deferred to Appendix A. When Zi is continuous, we have Lm (β, R) =
dK˜ (Ui ,Vi |Zi ) , i=1 ∫ b2 ∫ b1 ˜ a a dK (u,v|Zi )
∏n
2
dK˜ (u, v|Zi ) = [exp{−G{R(u)eβ
TZ
i
where
1
}} − exp{−G{R(v )eβ
TZ
i
}}]dK (u, v|Zi ).
Since the argument for the discrete case does not hold for the continuous case, the cMLE may not be the unconditional MLE. The full likelihood is proportional to
[ ]( β T Zi ) β T Zi T T n ∏ dR(Ti )eβ Zi g {R(Ti )eβ Zi } e{−G{R(Ui )e }} − e{−G{R(Vi )e }} . LF (β, R, K ) = ∫ bV ∫ bU ( βT Z ) βT Z e−G{R(u)e i } − e−G{R(v )e i } dK (u, v|Zi ) i=1 aV aU The full likelihood approach requires the maximization of LF (β, R, K ), which involves the distribution function of K (u, v|Zi ). One may consider a two-step procedure for obtaining the unconditional MLE. The first step is to obtain a consistent estimator of K (u, v|Zi ), say Kˆ (u, v|Zi ), and then maximize the pseudo-likelihood LF (β, R, Kˆ ). Further research is required in this issue. 2.2. Computational algorithms for the cMLE Based on the conditional likelihood function Lc (β, R), we derive the score functions for all parameters from ln (β, R) = log Lc (β, R). The score function of β is Sβ (β, R) =
n ∑
{ Zi
−g {R(Ti )eβ
TZ
i
}eβ
TZ
i
R(Ti ) +
( g˙ {R(T )eβ T Zi } i
g {R(Ti )e
β T Zi
eβ
TZ i
} ]−1 T T + exp{−G{R(Ui )eβ Zi }} − exp{−G{R(Vi )eβ Zi }} [ T T T × exp{−G{R(Ui )eβ Zi }}g {R(Ui )eβ Zi }eβ Zi R(Ui ) ]} β T Zi β T Zi β T Zi − exp{−G{R(Vi )e }}g {R(Vi )e }e R(Vi ) i=1
)
R(Ti ) + 1
[
˙ = dg(x)/dx. Let T(1) < · · · < T(n) denote the order statistics of Ti ’s. Let dR(T(k) ) be the jump size of R(·) at T(k) . where g(x) Then the score function of dR(T(k) ), k = 1, . . . , d, is Sk (β, R) =
n ( ∑
−g {R(Ti )eβ
TZ
i
}eβ
TZ i
I[Ti ≥T(k) ]
i=1
+
{ g˙ {R(T )eβ T Zi } i
g {R(Ti )eβ
TZ
i
}
eβ
TZ i
I[Ti ≥T(k) ] +
I[Ti =T(k) ] } dR(T(k) )
]−1 T + exp{−G{R(Ui )e i }} − exp{−G{R(Vi )eβ Zi }} [ T T T × exp{−G{R(Ui )eβ Zi }}g {R(Ui )eβ Zi }eβ Zi I[Ui ≥T(k) ] ]) T T T − exp{−G{R(Vi )eβ Zi }}g {R(Vi )eβ Zi }eβ Zi I[Vi ≥T(k) ] . [
Let R(t) = cMLE.
∑
T(k) ≤t
βT Z
dR(T(k) ). Based on these score functions, we propose the following iterative algorithm to obtain the
Step 1: Choose an initial value (βˆ (1) , dRˆ (1) (T(1) ), . . . , dRˆ (1) (T(n) )).
P.-S. Shen and H. Hsu / Computational Statistics and Data Analysis 144 (2020) 106862
5
Step 2: After obtaining (βˆ (j) , Rˆ (j) (t)), the updated estimates at the (j + 1)th iteration can be determined in the following substeps: Step 2.1: Update βˆ (j+1) by finding the root of β of Sβ (β, Rˆ (j) (t)) = 0. Step 2.2: Based on βˆ (j+1) , update dRˆ (j) (t) to dRˆ (j+1) (t) by using the following formula arranged from Sk (β, R) with β = βˆ (j+1) on the right-hand side of the following equation:
∑n dRˆ
(j+1)
(T(k) ) = ∑ n
i=1 I[Ti =T(k) ]
i=1
(
β T Zi
e
β T Zi
g {R(Ti )e
}I[Ti ≥T(k) ] − Bi − Ai I[Ti ≥T(k) ]
),
where g˙ {R(Ti )eβ
Ai =
g {R(Ti
TZ
i
}
T )eβ Zi }
and
[
Bi = exp{−G{R(Ui )eβ
[ ×
}} − exp{−G{R(Vi )eβ
TZ i
exp{−G{R(Ui )eβ
− exp{−G{R(Vi )eβ
TZ i
TZ i
}}g {R(Ui )eβ
}}g {R(Vi )eβ
TZ
TZ i
i
TZ i
]−1 }}
}I[Ui ≥T(k) ] ]
}I[Vi ≥T(k) ]
.
(j)
Define dR˜ k := (dRˆ (j+1) (T(1) ), . . . , dRˆ (j+1) (T(k−1) ), dRˆ (j) (T(k) ), . . . , dRˆ (j) (T(n) )). Here each dRˆ (j+1) (T(k) ) on the left-hand side (j) is updated by plugging dR˜ k into R(·) on the right-hand side of the above equation. Step 3: The convergence criterion is that the difference of the estimators from two consecutive iterations is smaller than a small value, such as ∥βˆ (j+1) − βˆ (j) ∥ < 10−4 . If the convergence requirement is met, then the procedure stops; otherwise, return to Step 2.
ˆ ˆ ·) is a step function with ˆ R(t)) Let (β, = (βˆ (j+1) , T(k) ≤t dRˆ (j+1) (T(k) )) denote the converged estimators. The estimator R( jumps only at failure times Ti ’s. In practice, we may use the pseudo-MLEs from the PH model (Shen and Liu, 2019a) as the initial values. Note that under the PH model, the calculation of the pseudo-MLE for β does not involve the estimation of R(t). When left truncation is present, the conditional likelihood approach does not require estimation of the distribution of left-truncation times. For left-truncated and right-censored data, Chen and Shen (2018) derived the cMLE based on conditional likelihood function and showed that the cMLE is consistent and asymptotically normal. Similarly, for doubly truncated data, we consider the cMLE to avoid difficulties caused by estimation of distributions of left- and right-truncation times. Next, we investigate the asymptotic properties of the cMLE for doubly truncated data. Let Xi = (Ti , Ui , Vi , Zi ) denote the ith observation. The conditional likelihood function Lc (β, R) for the observations Xi (i = 1, . . . , n) can be written as follows:
∑
Lc (β, R) =
) n ( ∏ ∏[ ]dN (t) {R(t)} i Φ (Xi , β, R), i=1
t ≤τ
where Ni (t) = I[Ti ≤t ] and {R(t)} denotes the jump size of the monotone function R(t),
Φ (Xi , β, R) =
(∏
β T Zi
[e
g(e
β T Zi
dNi (t)
R(t))]
t ≤τ
)
{ ∫τ
exp −
J (u)dΛ(u aF i
{ ∫τ
1 − exp −
aF
} |Zi )
Yi (u)dΛ(u|Zi )
},
Ji (t) = I[Ui ≤t ≤Ti ] , Yi (t) = I[Ui ≤t ≤Vi ] and τ < bF . In practice, τ is chosen as T(n) . We maximize the logarithm of Lc (β, R) ln (β, R) = log Lc (β, R) =
n ∫ ∑ i=1
τ
log({R(t)})dNi (t) + log Φ (Xi , β, R).
aF
The following two theorems establishes asymptotic properties of the cMLE.
ˆ ·) − R0 (·)| → 0 a.s. Theorem 1. Under conditions (C1)–(C5) in the Appendix, |βˆ − β0 | + supt ∈[a,τ ] |R(
(2.2)
6
P.-S. Shen and H. Hsu / Computational Statistics and Data Analysis 144 (2020) 106862
Table 1 Simulation results for βˆ 1 , βˆ 2 and Sˆn (tp |1, 1) under the PO model.
βˆ 1
βˆ 2
(lT , rT )
n
bias
std
estd
covp
bias
std
estd
covp
(0.13, 0.37) (0.13, 0.37) (0.13, 0.37)
100 200 400
−0.125 −0.080 −0.063
0 .454 0 .316 0 .169
0 .513 0 .340 0 .183
0 .974 0 .967 0 .962
−0.210 −0.171 −0.078
0 .267 0 .196 0 .120
0 .315 0 .215 0 .126
0 .972 0 .964 0 .957
(0.13, 0.20) (0.13, 0.20) (0.13, 0.20)
100 200 400
−0.129 −0.056 −0.034
0 .521 0 .305 0 .177
0 .484 0 .296 0 .187
0 .932 0 .941 0 .956
−0.201 −0.141 −0.076
0 .285 0 .207 0 .103
0 .287 0 .211 0 .108
0 .937 0 .939 0 .942
(0.13, 0.08) (0.13, 0.08) (0.15, 0.08)
100 200 400
−0.062 −0.055 −0.042
0 .418 0 .253 0 .197
0 .430 0 .278 0 .224
0 .968 0 .965 0 .956
−0.132 −0.109 −0.067
0 .223 0 .147 0 .109
0 .246 0 .153 0 .117
0 .967 0 .960 0 .956
(0.35, 0.23) (0.35, 0.23) (0.35, 0.23)
100 200 400
−0.089 −0.076 −0.065
0 .494 0 .289 0 .195
0 .510 0 .297 0 .206
0 .965 0 .962 0 .958
−0.221 −0.105 −0.067
0 .305 0 .181 0 .107
0 .323 0 .193 0 .115
0 .968 0 .961 0 .956
Sˆn (t0.2 |1, 1)
Sˆn (t0.5 |1, 1)
Sˆn (t0.8 |1, 1)
(lT , rT )
n
bias
std
rmse
bias
std
rmse
bias
std
rmse
(0.13, 0.37) (0.13, 0.37) (0.13, 0.37)
100 200 400
−0.026 −0.023 −0.019
0.086 0.070 0.048
0.090 0.075 0.053
−0.071 −0.056 −0.037
0.141 0.112 0.072
0.158 0.125 0.081
−0.059 −0.045 −0.026
0.118 0.088 0.049
0.132 0.098 0.055
(0.13, 0.20) (0.13, 0.20) (0.13, 0.20)
100 200 400
−0.029 −0.022 −0.019
0.092 0.069 0.052
0.096 0.072 0.055
−0.059 −0.045 −0.032
0.141 0.108 0.080
0.153 0.117 0.086
−0.053 −0.032 −0.024
0.119 0.081 0.064
0.130 0.087 0.068
(0.13, 0.08) (0.13, 0.08) (0.13, 0.08)
100 200 400
−0.023 −0.019 −0.018
0.084 0.074 0.050
0.087 0.076 0.053
−0.056 −0.039 −0.036
0.138 0.114 0.079
0.149 0.120 0.087
−0.051 −0.034 −0.028
0.117 0.081 0.061
0.128 0.088 0.067
(0.35, 0.23) (0.35, 0.23) (0.35, 0.23)
100 200 400
−0.029 −0.023 −0.018
0.102 0.063 0.041
0.106 0.067 0.045
−0.060 −0.039 −0.027
0.158 0.100 0.065
0.169 0.117 0.070
−0.052 −0.027 −0.016
0.137 0.070 0.043
0.147 0.075 0.046
lT and rT are the proportions of left- and right-truncation, respectively.
Proof. The proof is deferred to Appendix B.
ˆ ·) − R0 (·)} converges weakly to a zero-mean Gaussian Theorem 2. Under conditions (C1)–(C7) in the Appendix, n1/2 {βˆ − β0 , R( process in the metric space l∞ (Rp × D), where D = {w (t) : ∥w (t)∥V [a,τ ] ≤ 1} ∥w (t)∥V [0,τ ] is the total variation of w (·) in [a, τ ]. Proof. The proof is deferred to Appendix C. Remark 1. For doubly truncated data, the NPMLE of survival function has convergence rate n1/2 under some conditions ˆ (Shen, 2010a). Theorem 2 implies that the cMLE R(t) has the same convergence rate n1/2 as the NPMLE and βˆ . This implies that the functional parameter {R(Tk )} (k = 1, . . . , n) can be estimated at the same rate as the Euclidean parameter β . Hence, we may treat ln (β, R) as a parametric log-likelihood with β and the jump sizes of R at the observed failure times as the parameters. However, due to the complexity of the second derivatives of ln (β, R), it is difficult to obtain a consistent estimator of the asymptotic covariance matrix for these parameters. Thus, in the simulation study, we consider the jackknife method for constructing interval estimators for β . The jackknife technique is well described in Mosteller and Tukey (1977) and has been shown to be widely useful for obtaining robust confidence intervals for randomly censored data (Gaver and Miller, 1983). In particular, Stute (1996) showed that the jackknife variance estimate of a Kaplan–Meier integral consistently estimates the limit variance. For left-truncated data, Shen (2010c) showed that the jackknife variance estimate of the product limit estimator consistently estimates the limit variance. Simulation results in Section 3 demonstrate that jackknife variance estimators for βˆ n perform well in certain cases when sample size is 400. 3. Simulation studies In Section 3.1, we investigate the performance of the cMLE when the distributions of truncation variables depend on covariates. The simulation results are reported in Tables 1 through 3. In Section 3.2, we compare the cMLE with the EM estimator of Rennert and Xie (2018b) under the PH model. The simulation results are reported in Table 4. In Section 3.3, we investigate the performance of the cMLE when the distributions of truncation variables do not depend on covariates. For purpose of comparison, we also consider the existing two methods, the pseudo-MLE of Shen and Liu (2019a) and the inverse probability weighting (IPW) approach of Mandel et al. (2018). Simulation results are reported in Table 5.
P.-S. Shen and H. Hsu / Computational Statistics and Data Analysis 144 (2020) 106862 Table 2 Simulation results for βˆ 1 , βˆ 2 and Sˆn (tp |1, 1) under G(u) =
7
log(1+0.5u) . 0.5
βˆ 1
βˆ 2
(lT , rT )
n
bias
std
estd
covp
bias
std
estd
covp
(0.14, 0.35) (0.14, 0.35) (0.14, 0.35)
100 200 400
−0.127 −0.053 −0.041
0.327 0.233 0.161
0.398 0.260 0.172
0.980 0.967 0.962
−0.161 −0.092 −0.062
0.234 0.135 0.095
0.283 0.157 0.108
0.978 0.971 0.964
(0.14, 0.20) (0.14, 0.20) (0.14, 0.20)
100 200 400
−0.104 −0.073 −0.032
0.402 0.248 0.156
0.395 0.252 0.163
0.932 0.946 0.952
−0.105 −0.058 −0.027
0.241 0.130 0.113
0.259 0.141 0.121
0.968 0.957 0.954
(0.14, 0.06) (0.14, 0.06) (0.14, 0.06)
100 200 400
−0.104 −0.061 −0.056
0.335 0.198 0.149
0.359 0.214 0.153
0.968 0.965 0.956
−0.107 −0.074 −0.031
0.220 0.127 0.068
0.244 0.134 0.073
0.967 0.963 0.960
(0.36, 0.21) (0.36, 0.21) (0.36, 0.21)
100 200 400
−0.087 −0.059 −0.044
0.371 0.234 0.145
0.392 0.257 0.154
0.968 0.966 0.957
−0.146 −0.057 −0.035
0.288 0.156 0.089
0.312 0.166 0.094
0.967 0.962 0.957
Sˆn (t0.2 |1, 1)
Sˆn (t0.5 |1, 1)
Sˆn (t0.8 |1, 1)
(lT , rT )
n
bias
std
rmse
bias
std
rmse
bias
std
rmse
(0.14, 0.35) (0.14, 0.35) (0.14, 0.35)
100 200 400
−0.029 −0.028 −0.012
0.112 0.088 0.073
0.116 0.092 0.074
−0.068 −0.056 −0.023
0.169 0.121 0.092
0.182 0.133 0.095
−0.055 −0.037 −0.016
0.114 0.085 0.050
0.127 0.093 0.052
(0.14, 0.22) (0.14, 0.22) (0.14, 0.22)
100 200 400
−0.017 −0.016 −0.009
0.121 0.083 0.062
0.122 0.085 0.062
−0.051 −0.035 −0.026
0.159 0.108 0.084
0.167 0.114 0.088
−0.043 −0.020 −0.011
0.117 0.065 0.048
0.125 0.068 0.049
(0.14, 0.06) (0.14, 0.06) (0.14, 0.06)
100 200 400
−0.023 −0.021 −0.013
0.103 0.080 0.061
0.105 0.083 0.065
−0.050 −0.040 −0.031
0.151 0.117 0.084
0.159 0.124 0.094
−0.031 −0.026 −0.021
0.104 0.077 0.055
0.108 0.081 0.059
(0.36, 0.21) (0.36, 0.21) (0.36, 0.21)
100 200 400
−0.032 −0.014 −0.012
0.114 0.075 0.061
0.118 0.076 0.062
−0.058 −0.019 −0.011
0.162 0.104 0.061
0.172 0.106 0.062
−0.042 −0.012 −0.006
0.104 0.069 0.039
0.112 0.070 0.039
lT and rT are the proportions of left- and right-truncation, respectively.
3.1. The cMLE under the dependent case In this section, we conduct simulation studies to evaluate the performance of the cMLE. We consider the class of logarithm transformations, G(u) = log(1 + ρ u)/ρ , where ρ ≥ 0 with ρ = 1 (the PO model), ρ = 0.5 and ρ = 0 (the PH model). We consider the regression part as β1 Z1 + β2 Z2 , where covariate Z1 follows a Bernoulli distribution with probability 0.5 and Z2 is an ordinal variable with P(Z2 = i) = 0.25 for i = 1, 2, 3, 4. The parameters β and R(t) are chosen as β = (β1 = −2, β2 = −3)T and R(t) = et − 1. The U’s are i.i.d. exponentially distributed with distribution function depending on Z1 KU (t |Z1 ) = 1 − e−θ (Z1 )t with θ (Z1 ) = 0.25 if Z1 = 0 and with θ (Z1 ) = 0.5 if Z1 = 1 such that the proportions of left-truncation P(T < U) (denoted by lT ) is around 0.14. The variable V is generated from V = U + d0 , where the values of d0 are chosen as 7.5, 10 and 12.5 such that the proportions of right-truncation P(T > V ) (denoted by rT ) are around 0.35, 0.20 and 0.05, respectively. We also consider the scenario with θ (Z1 ) = 0.1 if Z1 = 0 and with θ (Z1 ) = 0.2 if Z1 = 1 and d0 = 7.5 such that lT and rT are around 0.35 and 0.20, respectively. We keep the sample if U ≤ T ≤ V and ˆ are obtained based on the algorithm proposed in regenerate a sample if T < U or T > V . The cMLE βˆ = (βˆ 1 , βˆ 2 )T and R(t) Section 2. The convergence criterion is set as ∥βˆ (j+1) − βˆ (j) ∥ < 10−4 . The sample sizes are chosen as 100, 200 and 400. In the simulation, 1000 replications are used. Empirically, we find that convergence is typically not slow for all the sample sizes considered. The proposed algorithm requires average of about 30–50 iterations to converge. For the estimation of β , Tables 1 to 3 show the mean biases (bias) over all simulation runs, empirical standard deviations (std), the estimated standard error estimates (estd) using the jackknife method and the coverage probabilities of 95% confidence intervals ( covp). For the estimation of S(tp |Z1 , Z2 ), we consider the estimation of S(tp |1, 1) = e−Λ(tp |1,1) , where the values of tp are chosen as such that S(tp |1, 1) = p, with p = 0.2, 0.5, 0.8. Tables 1 through 3 show the mean biases, empirical standard ˆ p |1, 1) = exp{−G{R(t ˆ p ) exp(βˆ 1 + βˆ 2 )}}. Fig. 2 shows the plot of deviations (std) and root mean squared error (rmse) for S(t the cMLE Sˆn (tp |1, 1) against S(tp |1, 1) for n = 400, d0 = 10 and p = 0.05(0.05)0.95 under the PO and PH models. Based on the results of Tables 1 through 3, we have the following conclusions: (i) For the estimation of β , given proportion of left truncation lT , the standard deviations of βˆ i (i = 1, 2) tend to increase as the proportion of right truncation rT increases. When n = 100 and right truncation is moderate (i.e., rT = 0.37, 0.35) the biases of βˆ i can be large, which are reduced by increasing the sample size. When n = 100, the jackknife variance estimators tend to overestimate the true variances of βˆ i and hence provide conservative confidence intervals for β . The performance of the jackknfie method improves as the sample sizes increase. When n = 400, the jackknife variance estimators are close
8
P.-S. Shen and H. Hsu / Computational Statistics and Data Analysis 144 (2020) 106862
Table 3 Simulation results for βˆ 1 , βˆ 2 and Sˆn (tp |1, 1) under the PH model.
βˆ 1
βˆ 2
(lT , rT )
n
bias
std
estd
covp
bias
std
estd
covp
(0.15, 0.35) (0.15, 0.35) (0.15, 0.35)
100 200 400
−0.069 −0.017 −0.009
0.253 0.166 0.116
0.303 0.184 0.121
0.975 0.968 0.955
−0.084 −0.044 −0.012
0.216 0.132 0.076
0.251 0.154 0.083
0.974 0.962 0.955
(0.15, 0.18) (0.15, 0.18) (0.15, 0.18)
100 200 400
−0.068 −0.017 −0.011
0.282 0.175 0.096
0.307 0.181 0.107
0.964 0.961 0.957
−0.069 −0.033 −0.008
0.233 0.115 0.062
0.257 0.120 0.068
0.968 0.941 0.947
(0.15, 0.05) (0.15, 0.05) (0.15, 0.05)
100 200 400
−0.015 −0.014 −0.006
0.274 0.164 0.110
0.293 0.172 0.112
0.966 0.952 0.954
−0.019 −0.016 −0.013
0.230 0.134 0.046
0.259 0.143 0.052
0.967 0.960 0.954
(0.37, 0.20) (0.37, 0.20) (0.37, 0.20)
100 200 400
−0.088 −0.047 −0.026
0.279 0.166 0.087
0.295 0.174 0.101
0.967 0.957 0.953
−0.104 −0.040 −0.013
0.236 0.136 0.073
0.258 0.145 0.078
0.966 0.958 0.954
Sˆn (t0.2 |1, 1)
Sˆn (t0.5 |1, 1)
Sˆn (t0.8 |1, 1)
(lT , rT )
n
bias
std
rmse
bias
std
rmse
bias
std
rmse
(0.15, 0.35) (0.15, 0.35) (0.15, 0.35)
100 200 400
−0.020 −0.016 −0.012
0.109 0.081 0.061
0.111 0.083 0.062
−0.048 −0.035 −0.026
0.130 0.125 0.094
0.139 0.129 0.098
−0.031 −0.025 −0.019
0.100 0.071 0.046
0.105 0.075 0.050
(0.15, 0.18) (0.15, 0.18) (0.15, 0.18)
100 200 400
−0.026 −0.023 −0.003
0.113 0.093 0.073
0.116 0.096 0.073
−0.057 −0.041 −0.013
0.151 0.121 0.076
0.161 0.128 0.077
−0.023 −0.021 −0.008
0.101 0.068 0.040
0.104 0.071 0.041
(0.15, 0.05) (0.15, 0.05) (0.15, 0.05)
100 200 400
−0.025 −0.021 −0.014
0.121 0.093 0.075
0.124 0.095 0.076
−0.054 −0.045 −0.018
0.160 0.112 0.084
0.169 0.121 0.088
−0.040 −0.023 −0.011
0.109 0.067 0.046
0.174 0.071 0.047
(0.37, 0.20) (0.37, 0.20) (0.37, 0.20)
100 200 400
−0.036 −0.016 −0.006
0.114 0.077 0.056
0.120 0.079 0.056
−0.058 −0.020 −0.005
0.153 0.101 0.059
0.164 0.121 0.059
−0.030 −0.015
0.115 0.062 0.039
0.119 0.064 0.039
0.003
lT and rT are the proportions of left- and right-truncation, respectively.
Fig. 2. The plot of the cMLE Sˆn (tp |1, 1) against the true value S(tp |1, 1) based on n = 400, d0 = 10 and p = 0.05(0.05)0.95.
to the empirical variances for most cases considered, and hence the empirical coverages based on the jackknfie method are close to nominal levels. (ii) For the estimation of S(tp |1, 1), the estimator Sˆn (tp |1, 1) has the largest bias when p = 0.5. This phenomenon is caused by the presence of right truncation and also occurs for the NPMLE of S(t) when there is no covariate. These biases and standard deviations of Sˆn (tp |1, 1) decrease as the sample size increases.
P.-S. Shen and H. Hsu / Computational Statistics and Data Analysis 144 (2020) 106862
9
Table 4 Comparison of the cMLE and EM estimator of Rennert and Xie. Estimation of β1
cMLE
βU
ρ
(lT , rT )
n
−1 −1
−0.35 −0.35
(0.25, 0.25) (0.25, 0.25)
100 250
0 0
0 0
(0.25, 0.25) (0.25, 0.25)
100 250
1 1
0.35 0.35
(0.25, 0.25) (0.25, 0.25)
100 250
EM
bias
std
rmse
bias
std
rmse
0.011 0.003
0.132 0.076
0.132 0.076
0.01 −0.01
0.13 0.08
0.13 0.08
−0.013
0.124 0.075
0.125 0.075
−0.01 −0.01
0.12 0.08
0.12 0.08
0.142 0.083
0.144 0.084
0.03 0.00
0.16 0.09
0.16 0.09
0.007 0.021
−0.015
Estimation of β2
cMLE
EM
βU
ρ
(lT , rT )
n
bias
std
rmse
bias
std
rmse
−1 −1
−0.35 −0.35
(0.25, 0.25) (0.25, 0.25)
100 250
0.013 −0.005
0.137 0.080
0.140 0.080
−0.00 −0.02
0.14 0.08
0.14 0.08
0 0
0 0
(0.25, 0.25) (0.25, 0.25)
100 250
−0.013
0.135 0.084
0.135 0.084
0.00
−0.01
0.13 0.08
0.13 0.08
1 1
0,35 0.35
(0.25, 0.25) (0.25, 0.25)
100 250
0.135 0.083
0.136 0.083
−0.01
0.14 0.08
0.14 0.08
0.003 0.013
−0.011
0.01
lT and rT are the proportions of left- and right-truncation, respectively. Table 5 Comparison of three methods under the PH model for independent case. Estimation of β1
Pseudo-MLE
IPW approach
cMLE
(lT , rT )
n
bias
std
rmse
bias
std
rmse
bias
std
rmse
(0.19, 0.38) (0.19, 0.38) (0.19, 0.38)
100 200 400
0.135 0.118 0.053
0.352 0.247 0.131
0.377 0.274 0.141
−0.130 −0.071 −0.024
0.319 0.205 0.144
0.344 0.217 0.146
−0.070 −0.036 −0.025
0.321 0.209 0.145
0.329 0.212 0.147
(0.19, 0.21) (0.19, 0.21) (0.19, 0.21)
100 200 400
0.096 0.042 0.072
0.316 0.184 0.137
0.330 0.189 0.155
−0.124 −0.069 −0.033
0.299 0.197 0.140
0.324 0.209 0.144
−0.094 −0.036 −0.033
0.281 0.172 0.112
0.296 0.176 0.117
(0.19, 0.05) (0.19, 0.05) (0.19, 0.05)
100 200 400
0.050 0.023 0.015
0.285 0.145 0.132
0.289 0.147 0.133
−0.136 −0.069 −0.031
0.291 0.185 0.135
0.321 0.197 0.139
−0.061 −0.047 −0.027
0.249 0.151 0.088
0.256 0.158 0.092
Estimation of β2
Pseudo-MLE
IPW approach
cMLE
(lT , rT )
n
bias
std
rmse
bias
std
rmse
bias
std
rmse
(0.19, 0.38) (0.19, 0.38) (0.19, 0.38)
100 200 400
0.238 0.134 0.075
0.329 0.252 0.103
0.406 0.285 0.127
−0.204 −0.102 −0.050
0.324 0.206 0.162
0.383 0.230 0.169
−0.135 −0.055 −0.035
0.273 0.165 0.116
0.304 0.174 0.121
(0.19, 0.21) (0.19, 0.21) (0.19, 0.21)
100 200 400
0.086 0.063 0.085
0.317 0.178 0.143
0.328 0.189 0.166
−0.188 −0.088 −0.045
0.309 0.193 0.135
0.362 0.212 0.142
−0.118 −0.049 −0.020
0.240 0.117 0.071
0.267 0.127 0.074
(0.19, 0.05) (0.19, 0.05) (0.19, 0.05)
100 200 400
0.052 0.023 0.010
0.235 0.127 0.150
0.241 0.129 0.150
−0.181 −0.097 −0.049
0.292 0.189 0.144
0.344 0.212 0.152
−0.075 −0.032 −0.011
0.189 0.122 0.064
0.203 0.126 0.065
lT and rT are the proportions of left- and right-truncation, respectively.
We also conduct simulations using the three settings in Tables 1 through 3 except that Z2 is a normal random variable with mean equal to 2.5 and variance equal 1. The simulation results are similar to Tables 1 to 3 and are reported in the Web Appendix. Furthermore, we conduct simulation study under the scenario where U’s are i.i.d. exponentially distributed with distribution function KU (t |Z1 , Z2 ) = 1 − e−θ (Z1 ,Z2 )t , where Z1 follows a Bernoulli distribution with probability 0.5, Z2 is a normal random variable with mean equal 2.5 and variance equal to 1, θ (Z1 , Z2 ) = 0.5 if Z1 = 0 and Z2 > 2.5, θ (Z1 , Z2 ) = 0.375 if Z1 = 1 and Z2 > 2.5, θ (Z1 , Z2 ) = 0.25 if Z1 = 0 and Z2 < 2.5 θ (Z1 , Z2 ) = 0.125 if Z1 = 1 and Z2 < 2.5. The simulation results are reported in Tables 7 to 9 of the Web Appendix. 3.2. Comparison with EM estimator of Rennert and Xie (2018b) Under the PH model, we compare the proposed cMLE with the EM estimator proposed by Rennert and Xie (2018b). They proposed an EM algorithm to relax the independence assumption in the PH model under left, right, or double truncation, to an assumption of conditional independence. We use the same setup as Rennert and Xie (2018b). The survival time T is generated from the PH model with hazard function r0 (t)exp(β1 Z1 + β2 Z2 ), where Z1 and Z2 are from two independent
10
P.-S. Shen and H. Hsu / Computational Statistics and Data Analysis 144 (2020) 106862
uniform Unif[0, 5] distributions, r0 (t) is the hazard function of a Weibull distribution with scale parameter θf = 0.001 and shape parameter δf = 5, and the true value of (β1 , β2 ) is equal to (1, 1). Both U and V are generated from the PH model with hazard functions r0 (t) exp(βU Z1 + W2 ) and r0 (t) exp(Z1 + Y2 ), respectively, where W2 and Y2 are from two independent uniform Unif[0, 5] distributions. The values of βU is chosen as −1, 0, and 1 to induce a negative correlation (ρ = −0.35), independence (ρ = 0) and a positive correlation (ρ = 0.35) between T and U, respectively. To adjust the proportion of truncation, the left- and right-truncation times are multiplied by constants cl and cr , respectively. The values of cl and cr are chosen such that proportions of left- and right-truncation are both equal to 0.25. Table 4 shows the bias, empirical standard deviation (std) and root mean squared error (rmse) of the cMLE and the EM estimator proposed by Rennert and Xie (2018b) (denoted by βˆ R ). The results for EM estimator are obtained from Table 1 of Rennert and Xie (2018b) (displayed with two decimal places). Table 4 indicates that the performance of the cMLE is very close to the EM estimator of Rennert and Xie (2018a). Both estimators perform well with small biases in the three correlated cases. 3.3. Independent case For purpose of comparison, we also consider the case when (U , Z ) is independent of T . Under the PH model, we compare the proposed cMLE with the pseudo-MLE of Shen and Liu (2019a) and the IPW estimator of Mandel et al. (2018). The setup is the same as the dependent case except that the U’s are i.i.d. exponentially distributed with mean equal to 4. The V is generated from V = U + d0 , with d0 = 6, 9, 12. Under the PO model, we compare the proposed cMLE with the pseudo-MLE of Shen and Liu (2019b). The setup of β , R(t), and distributions of Z1 , U and V is the same as the PH model. The Z2 follows a normal distribution with mean equal to 3. Table 5 shows the bias, empirical standard deviation (std) and root mean squared error (rmse) of the estimators based on three approaches. Table 5 indicates that the cMLE outperforms the other two estimators. In particular, for the estimation of β2 , the rmse of the cMLE can be much smaller than that of the IPW estimator proposed by Mandel et al. (2018). 4. Analysis of AIDS data To illustrate the proposed cMLE, we analyze the CDC AIDS blood transfusion data from Table 1 of Kalbfleisch and Lawless (1989) consisting of 295 cases diagnosed prior to July 1, 1986 (τ2 ). The data originally included 494 cases reported to the CDC prior to January 1, 1987, and diagnosed prior to July 1, 1986. The dataset is subject to right truncation since the cases either diagnosed or reported after τ2 were not included to avoid inconsistent data and bias resulting from reporting delay. Moreover, cases having AIDS prior to January 1, 1982 (τ1 ) were not included since HIV was unknown prior to 1982. Of the 494 cases, 295 had consistent data, and the infection could be attributed to a single transfusion or short series of transfusions. Our analyses are restricted to this subset except for one observation with incubation time equal to zero. Thus, the total truncated sample size used is equal to 294. Let C1 and C2 denote the calendar time (in years) of the first event (HIV infection) and second event (development of AIDS), respectively. Let T = 12(C2 − C1 ) (in months) be the incubation time from HIV infection to AIDS. Let U = 12(τ1 − C1 ) and V = U + d0 = 12(τ2 − C1 ) (in months) denote left- and right-truncation times, where d0 = 12(τ2 − τ1 ) = 54. Thus, the incubation time T from the CDC AIDS blood transfusion data is subject to double truncation since T is observable only when τ1 ≤ C2 ≤ τ2 (i.e., U ≤ T ≤ V ). We are interested in studying the relationship between AIDS incubation time and age Z1 at infection. Medley et al. (1987, 1988) considered full parametric modeling of the transfusion data. In their study, the data were grouped into three age classes based on immunological development: ages 0–4 years (sample size = 34), ages 5–59 years (sample size = 120) and ages greater than or equal to 60 years (sample size = 140). Their study demonstrated that there exists a difference in incubation time between 0–4 age group and > 4 age group. Following their grouping method, we try to fit data using the class of logarithm transformation models G(u) = log(1 + ρ u)/ρ with two covariates, but the cMLE fails to converge on the choices of ρ = 0(0.25)5. Lack of convergence can be caused by poor model fit. Hence, we consider fitting the data using models with one covariate defined as Z1 = 1 for individuals with ages 0–4 years, Z1 = 0, for individuals with ages 5–59 years and Z1 = −1 for individuals with ages greater than or equal to 60 years. We consider the logarithm transformations with ρ = 0 and 1 (i.e., the PH and PO models). The estimated coefficients βˆ 1 for ρ = 0 and ρ = 1 are 2.55 and 3.19, respectively. The jackknife estimates of standard error are 0.48 and 0.92 with corresponding p-value less than 0.001 for both models, implying that incubation period tends to decrease as age decreases. Figs. 3 and 4 depict the NPMLE of survival functions and the cMLEs under the PH and the PO model for the three age groups. Fig. 3 indicates that the estimated survival function for age 0–4 years is below that for both age 5–59 and ≥ 60 years. Based on Figs. 3 and 4, we can see that neither model fits the data well although the estimation reaches convergence. Remark 2. In practice, the assumption of independence between (T , Z ) and (U , V ) needs to be tested, given the high sensitivity of existing methods to this assumption as shown in the simulation study of Rennert and Xie (2018b). Furthermore, as pointed out by Rennert and Xie (2018b)), a quick application of a Kendall’s conditional tau test (Martin and Betensky, 2005) revealed that the independence assumption is violated in the CDC AIDS blood transfusion data. Hence, the conditional likelihood approach used here can be a better choice if the time interval (U) between the date of HIV infection and study start date and incubation time T is independent given covariate age Z1 .
P.-S. Shen and H. Hsu / Computational Statistics and Data Analysis 144 (2020) 106862
11
Fig. 3. The plot of the NPMLE for the AIDS data.
Fig. 4. The plot of the cMLE for the AIDS data.
5. Discussion Based on the conditional likelihood function, we have developed the cMLE for the regression parameters and cumulative hazard function of semiparametric transformation models with doubly truncated data. One major advantage of using the conditional approach is that we do not need to estimate the distribution function of truncation times. When covariates are discrete we have shown that the conditional likelihood for T given U and V provides full information and the cMLE for regression parameter is asymptotically efficient. Simulation results indicate that the cMLE performs well for finite sample and outperforms the existing estimators when independence assumption holds. In practice, under the semiparametric transformation model, we need to specify the transformation function G(t). Misspecification of G(t) can lead to erroneous inference. If we limit the link function at the class of logarithm transformation, G(u) = log(1 + ρ u)/ρ, ρ ≥ 0 with discrete covariates, we may select the best fit of ρ using the Kolmogorov–Smirnov type ˆ |zi ) = exp[−G{R(t) ˆ exp(βˆ T Z )}] is statistics KS(zi ) = supt |SˆP (t |zi ) − Sˆn (t |zi )|, where SˆP (x|zi ) is the NPMLE of S(t |zi ) and S(t the estimator based∑ on the cMLE. In this case, we obtain the optimal value ρ that minimizes the weighted sum of the KS(zi ), i.e., KS(z) = i wi KS(zi ), where 0 ≤ wi ≤ 1 are some selected weights. When covariates are continuous, we may select optimum ρ based on conditional likelihood function. Further research is required in this issue. Acknowledgments We thank the associate editor and two reviewers for their constructive and helpful comments, which helped us to significantly improve the manuscript. Funded by ‘‘Ministry of Science and Technology’’ (Taiwan) grant IDs: MOST 108-2118-M-029-001. Appendix A. Proof of Proposition 1 Suppose that Zi takes the possible values Wj (l = 1, . . . , J), the marginal likelihood Lm (β, R) can be written as Lm (β, R) =
nl J ∏ ∏ l=1 i=1
(
β T Wl }}
e{−G{R(Uli )e
∫ b2 ∫ b1 ( a2
a1
β T Wl }
− e−G{R(Vli )e
βT W e−G{R(u)e l }
−
)
dK (Uli , Vli |Wl )
βT W e−G{R(v )e l }
dK (u, v|Wl )
)
,
where (Ul1 , Vl1 ), (Ul2 , Vl2 ), . . . , (Ul,nl , Vl,nl ) are nl discrete data of (Ui , Vi )’s from the subgroup l. For l = 1, . . . , J, let dK˜ (u, v|Wl ) = exp{−G{R(u)eβ
(
TW
l
}} − exp{−G{R(v )eβ
TW
l
) }} dK (u, v|Wl ).
12
P.-S. Shen and H. Hsu / Computational Statistics and Data Analysis 144 (2020) 106862
Then Lm (β, R) =
nl J ∏ ∏ l=1 i=1
dK˜ (Uli , Vli |Wl ) . ∫ bV ∫ bU dK˜ (u, v|Wl ) aV
aU
Thus, when Z is discrete the marginal likelihood Lm (β, R) can be treated as a product-multinomial likelihood (Wang, 1987, 1989), i.e. any K˜ (·, ·|Wl ) that satisfies dK˜ (Uli , Vli |Wl ) 1 = for i = 1, . . . , nl , ∫ bV ∫ bU nl dK˜ (u, v|Wl ) aV
(A.1)
aU
ˆ let maximizes Lm . Given estimators βˆ and R, 1 dKˆ n (u, v|Wl ) = n− l
nl ∑
ˆ li exp{−G{R(U
i=1
I[Uli =u,Vli =v] ˆβ T Wl ˆ )e }} − exp{−G{R(V
li )e
βˆ T Wl }}
and dK˜ n (u, v|Wl ) = exp{−G{R(u)eβ
(
TW
l
}} − exp{−G{R(v )eβ
TW
l
) }} dKˆ n (u, v|Wl ).
dK˜ (U ,V |Wl )
1 −2 ˆ is maximized. Note that the ˆ R) = n− = 1/nl . Hence, (A.1) is satisfied, i.e., Lm (β, l /nl ∏L ˆ Hence, it suffices to maximize maximum value of the marginal likelihood remains a constant l=1 (1/nl )nl for any βˆ and R.
It follows that ∫ bV ∫nbU li li ˜ aV
aU dKn (u,v|Wl )
Lc . The proof is complete. Appendix B. Proof of Theorem 1 Similar to the proof of Zeng and Lin (2010) for right-censored data, we require the following conditions: (C1) The true value β0 lies in the interior of a compact set C in Rp and the true function R0 (·) is strictly increasing and continuously differentiable in [a, τ ]. (C2) With probability one, P(infs∈[a,t ] I[Ti ≥s] ≥ 1|Zi ) > δ0 > 0 for all t ∈ [a, τ ]. (C3) There exists a constant c1 > 0 and a random variable r1 (Xi ) > 0 such that E [log r1 (Xi )] < ∞ and for any β ∈ C and any R,
Φ (Xi , β, R) ≤ r1 (Xi )
∏{
t
∫
dR(s)
1+
}−c1
dR(t)
1+ a
a
t ≤τ
τ
∫
}−dNi (t) {
almost surely. In addition, for any constant c2 , inf{Φ (Xi , β, R) : ∥R∥V [a,τ ] ≤ c2 , β ∈ C } > r2 (Xi ) > 0, where ∥w∥V [a,τ ] is the total variation of w (·) in [a, τ ] and r2 (Xi ), which may depend on c2 , is a finite random variable with E [|log r2 (Xi )|] < ∞. Notice that (C1) and (C2) are standard assumptions in the analysis of censored/truncated data. In (C2), δ0 is independent ˙ β denote of t such that the Ti are uniformly bounded away from a. Moreover, we require certain smoothness of Φ . Let Φ ˙ (h) denote the derivative of Φ along the path R + ϵ h, where h belongs to the the derivative of Φ with respect to β and Φ set of functions in which R + ϵ h is increasing ∫ with bounded total variation. (C4) Let Lr (P) denote the norm ∥f ∥P ,r = ( |f |r dP)1/r . For any (β1 , β2 ) ∈ C and (R1 , R2 ), (h1 , h2 ) with uniformly bounded total variation, there exists a random variable M(Xi ) ∈ L4 (P) and a stochastic process ζi (t , Xi ) ∈ L6 (P) such that
˙ β (Xi , β1 , R1 ) − Φ ˙ β (Xi , β2 , R2 )| |Φ (Xi , β1 , R1 ) − Φ (Xi , β2 , R2 )| + |Φ ⏐ ⏐ ⏐ Φ˙ (Xi , β1 , R1 )(h1 ) Φ˙ (Xi , β2 , R2 )(h2 ) ⏐ ⏐ ˙ (Xi , β1 , R1 )(h1 ) − Φ ˙ (Xi , β2 , R2 )(h2 )| + ⏐ + |Φ − ⏐ Φ (Xi , β1 , R1 ) Φ (Xi , β2 , R2 ) ⏐ [ ] ∫ τ ∫ τ ≤ M(Xi ) |β1 − β2 | + |R1 (t) − R2 (t)|dζi (t , Xi ) + |h1 (t) − h2 (t)|dζi (t , Xi ) . a
a
In addition, ζi (t , Xi ) is non-decreasing and E [M(Xi )ζi (t , Xi )] is left-continuous with uniformly bounded left- and rightderivatives for any t ∈ [aF , τ ]. (C5) If
∏ [
dR∗ (t)
aF ≤t ≤τ
]dNi (t)
Φ (Xi , β ∗ , R∗ ) =
∏ [
]dNi (t)
dR0 (t)
Φ (Xi , β0 , R0 )
aF ≤t ≤τ
almost surely, then β ∗ = β0 and R∗ (t) = R0 (t) for t ∈ [a, τ ].
P.-S. Shen and H. Hsu / Computational Statistics and Data Analysis 144 (2020) 106862
13
Note that (C5) is an identifiability condition. Let A = {R0 : 1/M ≤ R0 (a) ≤ R0 (τ ) ≤ M } for some M > 0. Thus, (C5) implies β and R are identifiable in C × A. (C6) Let BV [aF , τ ] denote the space of functions with bounded total variation in [aF , τ ]. There exists a function η(t , β0 , R0 ) ∈ BV [aF , τ ] and a matrix Mβ0 ,R0 such that
⏐ [ ⏐ ] ∫ τ ⏐ Φ˙ β (Xi , β, R) Φ˙ β (Xi , β0 , R0 ) ⏐ ⏐E η(t , β0 , R0 )d(R(t) − R0 (t))⏐⏐ ⏐ Φ (Xi , β, R) − Φ (Xi , β0 , R0 ) − Mβ0 ,R0 (β − β0 ) − aF = o(|β − β0 | + ∥R − R0 ∥V [a,τ ] ). In addition,
⏐ ⏐
sup ⏐⏐[η(s, β, R) − η(s, β0 , R0 )] − η1 (s, β0 , R0 )(β − β0 ) −
s∈[aF ,τ ]
τ
∫
⏐ ⏐ η2 (s, t , β0 , R0 )d(R − R0 )(t)⏐⏐
aF
= o(|β − β0 | + ∥R − R0 ∥V [aF ,τ ] ), where η1 (s, β, R) is a p-dimensional bounded function and η2 (s, t , β, R) is a bounded bivariate function. Furthermore, there exists a constant c3 such that |η2 (s, t1 β0 , R0 ) − η2 (s, t2 , β0 , R0 )| ≤ c3|t1 − t2 | for any s, t1 , t2 ∈ [a, τ ]. (C7) If τ
∫
w(t)Yi (t)dNi (t) +
˙ β (Xi , β0 , R0 )T v + Φ ˙ (Xi , β0 , R0 )( Φ
∫τ
w (t)dR0 (t))
a
Φ (Xi , β0 , R0 )
a
=0
for some vector v ∈ Rp and w ∈ BV [a, τ ], then v = 0 and w = 0. Proof of Theorem 1. By (C3), the conditional likelihood function is bounded by n ∏
Φ (Xi , β, R)
[∏
Yi (s)dR(s)
}−1
]dNi (t)
{
∫
τ
Yi (t)dR(t)
1+
}−c1
.
aF
aF
t ≤τ
i=1
t
∫
{
dR(t)Yi (t) 1 +
ˆ τ ) < ∞. By differentiating ln (β, R) with Furthermore, (C2) implies that the maximum of (2.2) can be attained only for R( ˆ respect to {R(Ti )}, it follows that R satisfies ˆ =− R(t)
n ∫ t{ n ˆ [T ≥s] ) }−1 ˆ R)(I ∑ ∑ Φ˙ (Xj , β, j i=1
aF
ˆ ˆ R) Φ (Xj , β,
j=1
Yi (s)dNi (s).
ˆ To prove the boundedness of R(t), we construct another step function R˜ with jumps only at the Ti and R˜ satisfies ˜ =− R(t)
n ∫ t{ n ∑ Φ˙ (Xj , β0 , R0 )(I[Tj ≥s] ) }−1 ∑ i=1
aF
Φ (Xj , β0 , R0 )
j=1
dNi (s).
Under condition (C4), by Lemma 1 of Zeng and Lin (2010) and the Glivenko–Cantelli Theorem, R˜ converges uniformly ˆ and ˆ R) to R0 in [aF , τ ]. Using the arguments of Step 2 of Zeng and Lin (2010, page 881), the difference between ln (β, ˜ is negative eventually if R( ˆ τc ) diverges, inducing a contradiction. Hence, lim supn R( ˆ τ ) < ∞ almost surely. Since ln (β0 , R) Rˆ is bounded and monotone, by Helly’s Theorem we can always choose a further subsequence such that Rˆ converges point-wise to some monotone function R∗ . Without loss of generality, assume that βˆ converges to β∗ . Note that
ˆ = R(t)
t
∫ 0
|n−1 |n
∑n
j=1
∑n −1
˙ (Xj , β0 , R0 )(I[Tj ≥s] )/Φ (Xj , β0 , R0 )| Φ
j=1
ˆ | ˆ R) ˙ (Xj , βˆ n , Rˆ n )(I[Tj ≥s] )/Φ (Xj , β, Φ
˜ . dR(s)
(B.1)
Since Rˆ converges to R∗ and is bounded and {M(Xi )ζi (t , Xi ) : t ∈ [aF , τ ]} is a P-Glivenko–Cantelli class, by condition (C4), n n ⏐ ˙ ˆ [T ≥s] ) ˆ R)(I ˙ (Xj , β0 , R0 )(I[Tj ≥s] ) ⏐⏐ ∑ ∑ ⏐ Φ (Xj , β, Φ j −1 ⏐ ⏐ n −n ⏐ ⏐ ˆ ˆ R) Φ (Xj , β0 , R0 ) Φ (Xj , β, j=1 j=1 { } ∫ τ n ∑ −1 ˆ ˆ ≤n M(Xi ) |β − β∗ | + |R(t) − R∗ (t)|dζi (t , Xi ) → 0,
−1
j=1
aF
By Lemma 1 of Zeng and Lin (2010) and the Glivenko–Cantelli Theorem, the numerator and denominator in the integrand of (B.1) converge uniformly to deterministic functions. By condition (C5) and arguments of Step 3 of Zeng and Lin (2010), β∗ = β0 and R∗ = R0 . The proof is complete.
14
P.-S. Shen and H. Hsu / Computational Statistics and Data Analysis 144 (2020) 106862
Appendix C. Proof of Theorem 2 The proof follows the arguments of van der Vaart (1998, page 419–424). Define V = {v ∈ Rp , |v| ≤ 1} and H = {h(t) : ∥h(t)∥V [aF ,τ ] ≤ 1}. Let l(θ, R) be the log-likelihood function from a single observation. Let ˙lβ (β, R) be the derivative of l(β, R) with respect to β and ˙l(β, R)[h] be the path-wise derivative along the path R + ϵ h. Let Pn be the ˆ ˆ empirical measure ∫ based on i.i.d. observations, and P be its expectation. The likelihood equation for (β, R) along the path ˆ and h ∈ BV [aF , τ ] is given by (βˆ + ϵv, Rˆ + ϵ hdR)
{
0 = Pn v T ˙lβ (β, R) + ˙l(β, R)
[∫
]} .
hdR
Thus,
]} [∫ } {∫ ˙ (Xi , β, R) vT Φ ˙ 0 = Pn hdR . h(t)Yi (t)dNi (t) + Φ (Xi , β, R) + Pn Φ (Xi , β, R) ∫ Since (β0 , R0 ) maximizes P {l(β, R)}, 0 = P {v T ˙lβ (β, R)} and 0 = P {˙l[ hdR]}. These equations, combined with the likelihood ˆ yield ˆ R), equations for (β, ]} { [∫ ˆ + ˙l(β, ˆ ˆ R) ˆ R) hdRˆ n1/2 (Pn − P ) v T ˙lβ (β, {
} ˆ ˆ R) ˙ (Xi , β, ˙ (Xi , β0 , R0 ) vT Φ vT Φ − ˆ ˆ R) Φ (Xi , β0 , R0 ) Φ (Xi , β, ∫ ∫ {˙ } ˆ ˆ ˆ ˙ (Xi , β0 , R0 )[ hdR0 ] Φ (Xi , β, R)[ hdR] Φ − n1/2 P − . ˆ ˆ R) Φ (Xi , β0 , R0 ) Φ (Xi , β,
= −n1/2 P
{
ˆ belongs ˆ R) Define Ω0 = {(β, R) : |β − β0 | + ∥R − R0 ∥V [0,τ ] < δ0 }, where δ0 is a small positive constant. When n is large, (β, to Ω0 with probability one. By Lemma 1 of Zeng and Lin (2010) and the Donsker Theorem, op (1) + n
1/2
{
(Pn − P ) v ˙lβ (β0 , R0 ) + ˙l(β0 , R0 ) T
]}
[∫ hdR0
ˆ ˆ R) ˙ (Xi , β, ˙ (Xi , β0 , R0 ) vT Φ vT Φ − ˆ ˆ Φ (Xi , β0 , R0 ) Φ (Xi , β, R) ∫ ∫ {˙ } ˆ [ hdRˆ ] Φ ˆ R) ˙ (Xi , β0 , R0 )[ hdR0 ] Φ (Xi , β, 1/2 −n P , − ˆ ˆ R) Φ (Xi , β0 , R0 ) Φ (Xi , β,
= −n1/2 P
{
}
(C.1)
where op (1) represents some random element converging in probability to zero in l∞ (V × H). Under (C6), it follows that the right-hand side of (C.1) can be expressed as
{ −n1/2 B1 [v, h]T (βˆ − β0 ) +
∫
B2 [v, h]d(Rˆ − R0 )
}
( ) +o n1/2 |βˆ − β0 | + n1/2 ∥Rˆ − R0 ∥V [aF ,τ ] , where (B1 , B2 ) are linear operators in Rp × BV [aF , τ ] and ∫ τ T B1 [v, h] = v Mβ0 ,R0 + h(t)η1 (t , β0 , R0 )dR0 aF
and B2 [v, h] = v T Mβ0 ,R0 + h(t)η1 (t , β0 , R0 ) +
∫
τ
h(t)η2 (s, t , β0 , R0 )h(s)dR0 (s).
aF
˙ (Xi , β0 , R0 )[h]/Φ (Xi , β0 , R0 )] Under (C6), P [Φ
=
∫τ aF
η1 (s, β0 , R0 )dR0 (s). The choice of h(s)
=
I[s≥t ] yields
˙ (Xi , β0 , R0 )[I[·≥t ] ]]/Φ (Xi , β0 , R0 ) = η1 (t , β0 , R0 ). Since the score function along the path R0 + ϵ I[·≥t ] , with the other P [Φ parameters fixed at their true values, has zero expectation, η1 (t , β0 , R0 ) < 0. By Lemma 3 of Zeng and Lin (2010), it follows that (B1 , B2 ) is a Fredholm operator, and the invertibility of (B1 , B2 ) is equivalent to the operator being one-to-one. Thus, ˆ ·) − R0 (·)} converges weakly to a zero-mean Gaussian process in l∞ (Rp × D). The proof is complete. n1/2 {βˆ − β0 , R( Appendix D. Supplementary data Supplementary material related to this article can be found online at https://doi.org/10.1016/j.csda.2019.106862.
P.-S. Shen and H. Hsu / Computational Statistics and Data Analysis 144 (2020) 106862
15
References Bennett, S., 1983. Analysis of survival data by the proportional odds model. Stat. Med. 2, 273–277. Chaieb, L.L., Rivest, L.-P., Abdous, B., 2006. Estimating survival under a dependent truncation. Biometrika 93, 655–669. Chen, K., Jin, Z., Ying, Z., 2002. Semiparametric analysis of transformation models with censored data. Biometrika 89, 659–668. Chen, C.M., Shen, P.S., 2018. Conditional maximum likelihood estimation in semiparametric transformation model with LTRC data. Lifetime Data Anal. 24, 250–272. Cheng, S.C., Wei, L.J., Ying, Z., 1995. Analysis of transformation models with censored data. Biometrika 82 (4), 835–845. Chiou, S.H., Austin, M.D., Qian, J., Betensky, R.A., 2018a. Transformation model estimation of survival under dependent truncation and independent censoring. Stat. Methods Med. Res. http://dx.doi.org/10.1177/0962280218817573. Chiou, S.H., Qian, J., Mormino, E., Betensky, R.A., 2018b. Permutation tests for general dependent truncation. Comput. Statist. Data Anal. 128, 308–324. Cox, D., 1972. Regression models and life tables (with discussion). J. Roy. Statist. Soc. B 34, 187–220. Dörre, A., 2017. Bayesian estimation of a lifetime distribution under double truncation caused by time-restricted data collection. Statist. Papers http://dx.doi.org/10.1007/s00362-017-0968-7. Dörre, A., Emura, T., 2019. Analysis of Doubly Truncated Data an Introduction. Springer Nature Singapore Pte Ltd. Efron, B., Petrosian, V., 1999. Nonparametric methods for doubly truncated data. J. Amer. Statist. Assoc. 94, 824–834. Emura, T., Hu, Y.-H., Konno, Y., 2017. Asymptotic inference for maximum likelihood estimators under the special exponential family with double-truncation. Statist. Papers 58, 877–909. Emura, T., Konno, Y., 2012a. A goodness-of-fit tests for parametric models based on dependently truncated data. Comput. Statist. Data Anal. 56, 2237–2250. Emura, T., Konno, Y., 2012b. Multivariate normal distribution approaches for dependently truncated data. Statical Pap. 53, 133–149. Emura, T., Konno, Y., Michimae, H., 2015. Statistical inference based on the nonparametric maximum likelihood estimator under double-truncation. Lifetime Data Anal. 21, 397–418. Emura, T., Murotani, K., 2015. An algorithm for estimating survival under a copula-based dependent truncation model. Test 24, 734–751. Emura, T., Wang, W., 2012. Nonparametric maximum likelihood estimation for dependent truncation data based on copulas. J. Multivariate Anal. 110, 171–188. Emura, T., Wang, W., 2016. Semiparametric inference for an accelerated failure time model with dependent truncation. Ann. Inst. Statist. Math. 68, 1073–1094. Frank, G., Chae, M., Kim, Y., 2018. Additive time-dependent hazard model with doubly truncated data. J. Korean Stat. Soc. http://dx.doi.org/10.1016/ j.jkss.2018.10.005. Gaver, P., Miller, G., 1983. Jackknifing the kaplan–meier survival estimator for censored data: simulation results and asymptotic analysis. Comm. Statist. Theory Methods 12, 1701–1718. Hu, Y.-H., Emura, T., 2015. Maximum likelihood estimation for a special exponential family under random double-truncation. Comput. Stat. 30, 1199–1229. Kalbfleisch, J.D., Lawless, J.F., 1989. Inference based on retrospective ascertainment: An analysis of the data on transfusion-related aids. J. Amer. Statist. Assoc. 84, 360–372. Lynden-Bell, D., 1971. A method of allowing for known observational selection in small samples applied to 3CR quasars. Mon. Not. R. Astron. Soc. 155, 95–118. Mandel, M., de Uña-Álvarez, J., Simon, D.K., Betensky, R.A., 2018. Inverse probability weighted Cox regression for doubly truncated data. Biometrics 74, 481–487. Martin, E.C., Betensky, R., 2005. Testing quasi-independence of failure and truncation times via conditional Kendall’s tau. J. Amer. Statist. Assoc. 100, 484–494. Medley, G.F., Anderson, R.M., Cox, D.R., Billard, L., 1987. Incubation period of AIDS in patients infected via blood transfusion. Nature 328, 719–721. Medley, G.F., Billard, L., Cox, D.R., Anderson, R.A., 1988. The distribution of the incubation period for the acquired immunodeficiency syndrome (AIDS). Proc. R. Soc. Lond. [Biol.] 233, 367–377. Moreira, C., de Uña-Álvarez, J., 2010a. Bootstrapping the NPMLE for doubly truncated data. J. Nonparametr. Stat. 22 (5), 567–583. Moreira, C., de Uña-Álvarez, J., 2010b. A semiparametric estimator of survival for doubly truncated data. Stat. Med. 29 (30), 3147–3159. Moreira, C., de Uña-Álvarez, J., Meira-Machado, L., 2016. Nonparametric regression with doubly truncated data. Comput. Statist. Data Anal. 93, 294–307. Moreira, C., de Uña-Álvarez, J., Rosa M. Crujeiras, R.M., 2010. DTDA: An R package to analyze randomly truncated data. J. Stat. Softw. 37 (7), 1–20. Mosteller, F., Tukey, J.W., 1977. Data Analysis and Regression. Addison-Wesley, Reading, MA. Rennert, L., Xie, S.X., 2018a. Cox regression model with doubly truncated data. Biometrics 74, 725–733. Rennert, L., Xie, S.X., 2018b. Cox regression model under dependent truncation. arXiv.org > stat > arXiv:1803.09830, submitted for publication. Shen, P.-S., 2010a. Nonparametric analysis of doubly truncated data. Ann. Inst. Statist. Math. 62 (5), 835–853. Shen, P.-S., 2010b. Semiparametric analysis of doubly truncated data. Commun. Stat. - Theory Methods 39, 3178–3190. Shen, P.-S., 2010c. Jackknife methods for left-truncated data. J. Statist. Plann. Inference 140, 3468–3475. Shen, P.S., 2013. Regression analysis of interval censored and doubly truncated data with linear transformation models. Comput. Stat. 28, 581–596. Shen, P.S., 2016. Analysis of transformation models with doubly truncated data. Stat. Methodol. 30, 15–30. Shen, P.S., 2017. Semiparametric analysis of transformation models with dependently left-truncated and right-censored data. Comm. Statist. Simulation Comput. 46, 2474–2487. Shen, P.S., Liu, Y., 2019a. Pseudo maximum likelihood estimation for the Cox model with doubly truncated data. Statist. Papers 60, 1207–1224. Shen, P.S., Liu, Y., 2019b. Pseudo MLE for semiparametric transformation model with doubly truncated data. J. Korean Stat. Soc. 48, 384–395. Stute, W., 1996. The jackknife estimate of variance of a Kaplan–Meier integral. Ann. Statist. 24, 2679–2704. van der Vaart, A.W., 1998. Asymptotic Statistics. Cambridge University Press, Cambridge. Wang, M.-C., 1987. Product-limit estimates: a generalized maximum likelihood study. Comm. Statist. Theory Methods 6, 3117–3132. Wang, M.-C., 1989. A semiparametric model for randomly truncated data. J. Amer. Statist. Assoc. 84, 742–748. Woodroofe, M., 1985. Estimating a distribution function with truncated data. Ann. Statist. 13, 163–177. Zeng, D., Lin, D.Y., 2006. Efficient estimation of semiparametric transformation models for counting processes. Biometrika 93, 627–640. Zeng, D., Lin, D.Y., 2010. A general asymptotic theory for maximum likelihood estimation in semiparametric regression models with censored data. Statist. Sinica 20, 871–910.