Case-cohort analysis with semiparametric transformation models


Journal of Statistical Planning and Inference 139 (2009) 3706 -- 3717


Yi-Hau Chen (a,∗), David M. Zucker (b)

(a) Institute of Statistical Science, Academia Sinica, Taiwan, ROC
(b) Department of Statistics, Hebrew University, Israel

ARTICLE INFO

Article history: Received 23 January 2008; Received in revised form 21 April 2009; Accepted 24 April 2009; Available online 6 May 2009.

Keywords: Linear transformation model; Survival analysis; Weighted estimating equation

ABSTRACT

Semiparametric transformation models provide flexible regression models for survival analysis, including the Cox proportional hazards and the proportional odds models as special cases. We consider the application of semiparametric transformation models in case-cohort studies, where the covariate data are observed only on cases and on a subcohort randomly sampled from the full cohort. We first propose an approximate profile likelihood approach with full-cohort data, which amounts to the pseudo-partial likelihood approach of Zucker [2005. A pseudo-partial likelihood method for semiparametric survival regression with covariate errors. J. Amer. Statist. Assoc. 100, 1264–1277]. Simulation results show that our proposal is almost as efficient as the nonparametric maximum likelihood estimator. We then extend this approach to the case-cohort design, applying the Horvitz–Thompson weighting method to the estimating equations from the approximate profile likelihood. Two levels of weights can be utilized to achieve unbiasedness and to gain efficiency. The resulting estimator has a closed-form asymptotic covariance matrix, and is found in simulations to be substantially more efficient than the estimator based on martingale estimating equations. The extension to left-truncated data is also discussed. We illustrate the proposed method on data from a cardiovascular risk factor study conducted in Taiwan. © 2009 Elsevier B.V. All rights reserved.

1. Introduction

A cohort study follows a study cohort over a time period to collect data on the `failure time' (i.e. the time to the occurrence of a disease event or death), as well as data on relevant covariates, including potential risk factors and confounders. The effects of the risk factors on the failure time can then be assessed by fitting a failure time regression model to the data obtained. As a cost-effective alternative, Prentice (1986) proposed the case-cohort design, where covariate data are collected only on a `case-cohort' sample, which consists of subjects in a subcohort randomly sampled from the full cohort, together with the `cases', i.e. subjects who experience the event of interest (disease or death). Under the case-cohort design, failure time regression methods can still be applied to assess the covariate effects on the failure time, but some modifications are required to accommodate the incompletely observed covariate data. Semiparametric transformation models, which include the well-known Cox (1972) proportional hazards model and the proportional odds model (Pettitt, 1982; Bennett, 1983) as special cases, provide flexible regression models for the analysis of failure time data and have recently been gaining increasing attention (Cheng et al., 1995; Fine et al., 1998; Chen et al., 2002; Zeng and Lin, 2006). To fix notation, let T be the failure time and Z the vector of covariates. The semiparametric transformation models assume that a transformed version of T is linearly associated with Z:

H(T) = −β′Z + ε.   (1)

∗ Corresponding author. E-mail addresses: [email protected] (Y.-H. Chen), [email protected] (D.M. Zucker).
doi:10.1016/j.jspi.2009.04.023. © 2009 Elsevier B.V. All rights reserved.


Here H is an unspecified monotonic function, ε is a residual error term that is independent of Z and follows a known parametric distribution, and β is the vector of regression parameters, which is of primary interest. The special cases of the Cox model and the proportional odds model correspond to taking ε to follow the extreme-value distribution and the standard logistic distribution, respectively. Several inference procedures have been proposed for the semiparametric transformation model. Cheng et al. (1995) and Fine et al. (1998) proposed rank-based estimating equations. This method involves estimation of the survival function of the censoring variable, and hence requires the assumption that the censoring is independent of the covariates. The extension of this method to the case-cohort design was provided by Kong et al. (2004). To allow for covariate-dependent censoring, Chen et al. (2002) proposed a set of martingale estimating equations, which relaxes the assumption of independence between the censoring variable and the covariates, and yields the partial likelihood estimator in the special case of the Cox model. Lu and Tsiatis (2006) recently extended this approach to the case-cohort design. Zeng and Lin (2006) developed nonparametric maximum likelihood estimation (NPMLE) for semiparametric transformation models. The NPMLE approach yields an estimator of β that achieves the semiparametric efficiency bound, but it involves high-dimensional maximization, which raises computational concerns. Zeng and Lin indicate that the approach is computationally manageable in moderate samples, but for larger samples the computational burden may be serious. The NPMLE was applied to case-cohort data from genetic association studies by Zeng et al. (2006). Such an application is most feasible when the sample size is moderate and the number of covariates observed only on the `case-cohort' sample is small.
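As a concrete illustration of model (1), data can be simulated directly once H and the error distribution are fixed. The sketch below (not from the paper; the transformation H(t) = 2 log t, i.e. R(t) = t², is an assumed example) draws from the Cox and proportional-odds special cases by choosing the error law for ε.

```python
import numpy as np

rng = np.random.default_rng(0)

def simulate_transformation_model(n, beta, H_inv, error="logistic"):
    """Draw (T, Z) from the linear transformation model H(T) = -beta'Z + eps.

    error="extreme" takes eps as the log of a unit exponential variate
    (extreme-value error, giving the Cox model); error="logistic" takes a
    standard logistic eps (giving the proportional odds model).
    """
    beta = np.asarray(beta, dtype=float)
    Z = rng.normal(size=(n, beta.size))
    if error == "extreme":
        eps = np.log(rng.exponential(size=n))
    else:
        eps = rng.logistic(size=n)
    return H_inv(-Z @ beta + eps), Z

# Assumed example transformation: H(t) = 2*log(t), so H^{-1}(x) = exp(x/2).
T, Z = simulate_transformation_model(500, [1.0, -1.0], lambda x: np.exp(x / 2.0))
```

Because H is monotone, the draw T = H⁻¹(−β′Z + ε) always yields a positive failure time under this choice of H.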
To apply the semiparametric transformation model in case-cohort studies, we first consider an approximate profile likelihood approach with full-cohort data, which is equivalent to the `pseudo-partial likelihood' approach proposed by Zucker (2005). The resulting estimating equation for β has a form very similar to the well-known partial score equation for the Cox model, and hence is very easy to implement and to extend to various study designs. Through simulations we find that the approximate profile likelihood approach can be markedly more efficient than the martingale estimating equation approach of Chen et al. (2002), and is almost fully efficient relative to the NPMLE. We then extend this proposal to the case-cohort design, applying the Horvitz and Thompson (1952) weighting approach to the estimating equations from the approximate profile likelihood. Following Kulich and Lin (2004), our approach allows two levels of weights, one for the unbiasedness of the estimating equations and the other for the enhancement of estimation efficiency. Simulation studies show that the proposed estimator for case-cohort data can achieve a major efficiency gain over the estimator of Lu and Tsiatis (2006). The proposed estimators for both cohort and case-cohort studies have closed-form covariance estimators that are easy to calculate. Another advantage of the proposed estimators is that they can be easily modified to accommodate left-truncated data. We illustrate these applications on data from the CVDFACTS study, a cardiovascular risk factor study conducted in Taiwan.

2. Approximate profile likelihood and pseudo-partial likelihood

Assume that the failure time T follows model (1), where H is a strictly increasing function satisfying H(0) = −∞, and ε follows a distribution with known hazard and cumulative hazard functions λ∗(·) and Λ∗(·), respectively. Assume that T and the censoring time C are conditionally independent given Z.
Denote by X the observed follow-up time min(T, C) and by δ the failure indicator I(T ≤ C). We further use the usual counting process notation: Y(t) = I(X ≥ t) and N(t) = I(X ≤ t, δ = 1). Suppose that we have a cohort of N independent and identically distributed individuals, yielding the data (X_i, δ_i, Z_i), i = 1, . . . , N. According to (1), the ith subject has the cumulative hazard function Λ_i(t) = G{R(t) exp(β′Z_i)} and the hazard function λ_i(t) = Ṙ(t) g{R(t) exp(β′Z_i)} exp(β′Z_i), where G(s) = Λ∗(log s), R(t) = exp{H(t)}, and g(s) = Ġ(s), with the superscript dot denoting derivative. The loglikelihood for (β, R), neglecting the noninformative part, is given by

l(β, R) = Σ_{i=1}^N [ −G{R(X_i) e^{β′Z_i}} + δ_i { log Ṙ(X_i) + log g{R(X_i) e^{β′Z_i}} + β′Z_i } ]

        = Σ_{i=1}^N [ −∫_0^{X_i} g{R(t) e^{β′Z_i}} Ṙ(t) dt + δ_i { log Ṙ(X_i) + ∫_0^{X_i} [ ġ{R(t) e^{β′Z_i}} / g{R(t) e^{β′Z_i}} ] e^{β′Z_i} Ṙ(t) dt + β′Z_i } ].   (2)

It follows from Zeng et al. (2005, Sec. 3) that the unrestricted maximum likelihood estimator for (β, R) does not exist. Accordingly, we restrict R to belong to the collection of all nondecreasing, right-continuous step functions on [0, ∞) with R(0) = 0 and with jumps only at the observed failure times 0 < t_1 < · · · < t_K < ∞. Replacing Ṙ(t) dt in (2) with the jump size ΔR(t), (2) is rewritten as

l(β, {ΔR}) = Σ_{i=1}^N Σ_{j=1}^K [ −Y_i(t_j) g{R(t_j−) e^{β′Z_i}} e^{β′Z_i} ΔR(t_j) + δ_i Y_i(t_j) { ġ{R(t_j−) e^{β′Z_i}} / g{R(t_j−) e^{β′Z_i}} } e^{β′Z_i} ΔR(t_j) ] + Σ_{i=1}^N δ_i [ Σ_{j=1}^K I(X_i = t_j) log ΔR(t_j) + β′Z_i ],   (3)

where ΔR(t_j) = R(t_j) − R(t_j−) denotes the jump size of R at t_j, j = 1, . . . , K.


To find the profile loglikelihood of β, we need to find the estimator of R that maximizes l(β, {ΔR}) with β fixed, an approach that has been adopted by Tsodikov (2003) and Zeng and Lin (2007). Setting the derivative of l(β, {ΔR}) with respect to ΔR(t_j) equal to zero yields

0 = Σ_{i=1}^N [ −Y_i(t_j) g{R(t_j−) e^{β′Z_i}} e^{β′Z_i} + dN_i(t_j)/ΔR(t_j) + δ_i Y_i(t_j) { ġ{R(t_j−) e^{β′Z_i}} / g{R(t_j−) e^{β′Z_i}} } e^{β′Z_i} ].   (4)

Owing to the last term of (4), finding the maximum likelihood estimator for ΔR(t_j) is generally difficult. Note that this term disappears when g(t) is constant (i.e. in the case of the Cox model). If we ignore this term, an estimator for {ΔR} can be readily obtained from the simplified equation

ΔR̂(X_i; β) = δ_i / Σ_{j=1}^N Y_j(X_i) g{R̂(X_i−; β) e^{β′Z_j}} e^{β′Z_j},   i = 1, . . . , N,   (5)

where δ_i can be replaced by the total number of failures at X_i when there are tied failure times. The formula (5) defines a Breslow-type estimator for R(t), and is equivalent to Chen et al. (2002, Eqn. 5). For fixed β, Eq. (5) leads to a simple forward recursion for the estimator R̂(t_k; β) of R(t_k), k = 2, . . . , K:

R̂(t_k; β) = R̂(t_{k−1}; β) + ΔR̂(t_k; β),

with R̂(t_1; β) obtained from the equation

Σ_{i=1}^N Y_i(t_1) G{R̂(t_1; β) e^{β′Z_i}} = d_1,   (6)

where d_1 is the total number of failures at the first failure time t_1. Substituting R̂ and ΔR̂ back into (3), replacing R̂(X_i−; β) with R̂(X_i; β) (these two quantities are asymptotically equivalent), and ignoring additive constants, we obtain the approximate profile loglikelihood of β:

l_p(β) = Σ_{i=1}^N δ_i log [ g{R̂(X_i; β) e^{β′Z_i}} e^{β′Z_i} / Σ_{j=1}^N Y_j(X_i) g{R̂(X_i; β) e^{β′Z_j}} e^{β′Z_j} ].   (7)
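For intuition, the forward recursion behind (5) and (6) is straightforward to code. The sketch below is a simplified Python version that, as a shortcut, initializes the first jump from (5) with R̂(t_1−) = 0 instead of solving (6) exactly; with g ≡ 1 it reduces to the classical Breslow estimator under the Cox model.

```python
import numpy as np

def breslow_type_R(X, delta, Z, beta, g):
    """Forward recursion for the jumps of R-hat(t; beta) at the ordered
    failure times; g is the derivative of G (constant g gives the Cox model).
    Returns a list of (failure time, R-hat at that time) pairs."""
    X, delta = np.asarray(X, float), np.asarray(delta, int)
    eta = np.exp(np.asarray(Z) @ np.asarray(beta))   # exp(beta'Z_j)
    R_prev, path = 0.0, []
    for t in np.unique(X[delta == 1]):
        at_risk = X >= t                              # Y_j(t)
        d = int(np.sum((X == t) & (delta == 1)))      # number of failures at t
        denom = np.sum(g(R_prev * eta[at_risk]) * eta[at_risk])
        R_prev += d / denom                           # jump from (5)
        path.append((t, R_prev))
    return path

# Cox special case: g identically 1 recovers the Breslow estimator.
path = breslow_type_R([1., 2., 3., 4.], [1, 1, 0, 1],
                      [[0.], [0.], [0.], [0.]], [0.0],
                      g=lambda s: np.ones_like(s))
```

In the toy call above, with β = 0 the jumps are simply (number of failures)/(number at risk), i.e. 1/4, 1/3, and 1 at the three failure times.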

Our proposed estimator β̂ for β is given by the maximizer of the approximate profile loglikelihood (7), i.e. the solution of

Q(β) ≡ ∂l_p/∂β = Σ_{i=1}^N ∫_0^∞ [ ψ_i(t; β) − S^(1)(t; β)/S^(0)(t; β) ] dN_i(t) = 0,   (8)

where

ψ_i(t; β) = Z_i + [ ġ{R̂(t; β) e^{β′Z_i}} / g{R̂(t; β) e^{β′Z_i}} ] { R̂(t; β) Z_i + ∂R̂(t; β)/∂β } e^{β′Z_i}

and

S^(p)(t; β) = Σ_{j=1}^N Y_j(t) g{R̂(t; β) e^{β′Z_j}} e^{β′Z_j} ψ_j(t; β)^{⊗p},   p = 0, 1;

henceforth a^{⊗0} = 1, a^{⊗1} = a and a^{⊗2} = aa′ for a vector a. Note that by (5) we have

∂R̂(X_i; β)/∂β − ∂R̂(X_i−; β)/∂β = −δ_i S^(1)(X_i; β)/{S^(0)(X_i; β)}^2,

so that we can compute ∂R̂/∂β through a simple forward recursion, in the same way that we compute R̂. It is interesting to note that the pseudo-partial likelihood proposed by Zucker (2005) has the same form as (7). Here (7) is justified by an approximate profile likelihood argument, which makes explicit how (7) deviates from the full likelihood.

Remark 1. The estimating equation (8) resembles the well-known partial score equation for the Cox model. This resemblance permits easy extension to various study designs, such as the case-cohort study that we focus on later, using established techniques for the corresponding extensions of the partial likelihood.

Remark 2. In the case of the Cox model, the partial likelihood estimator, the Chen et al. (2002) estimator from the martingale estimating equation, and the estimator β̂ from (7) are equivalent and are semiparametric efficient. In general transformation


models, neither β̂ nor the Chen et al. estimator is semiparametric efficient. Recently, Zeng and Lin (2006) implemented a nonparametric maximum likelihood estimation method by maximizing the likelihood (3) directly using the interior-reflective Newton method. Compared with the NPMLE, the approximate profile likelihood drops the last term of (4), which conveys little information on β when g(t) does not vary much with t. Our simulation results show that the estimator β̂ is more efficient than the Chen et al. estimator, and is almost as efficient as Zeng and Lin's NPMLE. Note that the covariance estimate for the NPMLE involves inversion of a very large matrix when the number of cases is large, while the proposed estimator has a closed-form asymptotic covariance matrix whose estimate is easy to obtain.

Remark 3. The resemblance between the estimating equation (8) and that for the Cox model also makes our proposal very easy to adapt to left-truncated data. Let L_i be the left-truncation time of subject i, so that subject i is included in the study only when T_i ≥ L_i; hence we have L_i ≤ T_i ≤ C_i for subject i in the study. Following the case of the Cox model, the modification is accomplished by the standard adjustment to the risk set, so that the risk set indicator is now defined as Y_j(X_i) = I(L_j ≤ X_i ≤ X_j) (Tsai et al., 1987; Lai and Ying, 1991).

3. Large sample theory

We summarize in this section the large sample theory for the approximate profile likelihood estimator β̂. Similar results have been obtained and proved in Zucker (2005), assuming g(0) > 0. As mentioned in Zucker (2005), technical problems arise when g(0) = 0. We believe, however, that the following results continue to hold in this case, possibly with slight modifications. Define the martingale process

M(t) = N(t) − ∫_0^t Y(s) dG{R_0(s) exp(β_0′Z)},

where (β_0, R_0) are the true values of (β, R). Also define

s^(p)(t; β) = E[ Y(t) g{R_0(t) e^{β_0′Z}} e^{β_0′Z} ψ(t; β)^{⊗p} ],   p = 0, 1, 2,

v^(p)(t; β) = E[ Y(t) ġ{R_0(t) e^{β_0′Z}} e^{2β_0′Z} ψ(t; β)^{⊗p} ],   p = 0, 1,

V^(p)(t; β) = Σ_{i=1}^N Y_i(t) ġ{R̂(t; β) e^{β′Z_i}} e^{2β′Z_i} ψ_i(t; β)^{⊗p},   p = 0, 1,

e(t; β) = exp[ ∫_0^t {v^(0)(u; β)/s^(0)(u; β)} dR_0(u) ],

where dR_0(t) = Ṙ_0(t) dt. Write ψ(t) = ψ(t; β_0), s^(p)(t) = s^(p)(t; β_0), v^(p)(t) = v^(p)(t; β_0), and e(t) = e(t; β_0). Let τ denote the maximum follow-up time: τ = inf{t : pr(X > t) = 0}.

Theorem 1. Under the assumptions listed in the Appendix, the proposed estimating function Q(β) has the following asymptotic representation as N → ∞:

N^{−1/2} Q(β_0) = N^{−1/2} Σ_{i=1}^N ∫_0^τ [ ψ_i(t) − s^(1)(t)/s^(0)(t) − q(t) ] dM_i(t) + o_p(1).   (9)

Here q(t) = q(t; β_0), with

q(t; β) = { e(t; β)/s^(0)(t; β) } ∫_t^τ { c(u; β)/e(u; β) } dR_0(u)

and

c(t; β) = v^(1)(t; β) − { s^(1)(t; β)/s^(0)(t; β) } v^(0)(t; β).

It can be verified that, for large N, −1 times the derivative of N^{−1} Q(β) tends to

A = ∫_0^τ [ s^(2)(t) − {s^(1)(t)}^{⊗2}/s^(0)(t) ] dR_0(t)   (10)

in probability uniformly in a suitable neighborhood of β_0. The following theorem presents the consistency and the asymptotic distribution of β̂.


Theorem 2. Under the assumptions listed in the Appendix, the solution β̂ to Q(β) = 0 is strongly consistent, and N^{1/2}(β̂ − β_0) is asymptotically normal with mean 0 and covariance matrix A^{−1} + A^{−1} B A^{−1}, where

B = ∫_0^τ q(u)^{⊗2} s^(0)(u) dR_0(u).   (11)

The asymptotic covariance can be consistently estimated by Â^{−1} + Â^{−1} B̂ Â^{−1}, where Â = Â(β̂) and B̂ = B̂(β̂), respectively, with

Â(β) = N^{−1} Σ_{i=1}^N ∫_0^τ [ S^(2)(t; β)/S^(0)(t; β) − {S^(1)(t; β)/S^(0)(t; β)}^{⊗2} ] dN_i(t),   (12)

B̂(β) = N^{−1} Σ_{i=1}^N ∫_0^τ q̂(t; β)^{⊗2} dN_i(t).   (13)

Here

q̂(t; β) = { E(t; β)/S^(0)(t; β) } ∫_t^τ { C(u; β)/E(u; β) } dR̂(u; β),

with

E(t; β) = exp[ ∫_0^t {V^(0)(u; β)/S^(0)(u; β)} dR̂(u; β) ],   C(t; β) = V^(1)(t; β) − { S^(1)(t; β)/S^(0)(t; β) } V^(0)(t; β).

Remark 4. The asymptotic covariance matrix (11) consists of two terms. The first term is just the inverse of the expected negative Hessian matrix and has a form similar to that for the Cox model. The second term results from the estimation of R_0(t) using (5). We found in numerical experiments that the second term is usually small, so that the first term tends to dominate the asymptotic covariance.

4. Estimation in the case-cohort study

Here we consider the case-cohort study, where, as a cost-effective strategy, complete covariate information is ascertained only on a `case-cohort' sample consisting of all the cases as well as a subcohort randomly chosen from the full cohort. To simplify the presentation, we focus on the classical case-cohort design of Prentice (1986), in which the subcohort is a simple random sample from the full cohort. With slight modifications the methodology can also be applied to the stratified case-cohort design of Borgan et al. (2000), where the sampling of the subcohort can be stratified by covariate information available at the start of the study. Under the classical case-cohort design, we observe (X, δ) for all subjects in the full cohort, but we observe Z only for cases and subjects in the subcohort. Note that data from such a study form a special type of missing covariate data, where the missing data mechanism is determined by the study design and hence is fully known. Let n be the size of the subcohort, π the probability of a subject being sampled into the subcohort, and ξ_i, i = 1, . . . , N, the subcohort indicator, which equals 1 if subject i is sampled into the subcohort and 0 otherwise. Given the failure status δ_i, the probability of subject i being selected into the case-cohort sample is δ_i + (1 − δ_i)π, i.e. the selection probability is 1 for a case and π for a control (censored subject).
The estimating equations (5), (6) and (8) for estimating (β, R) with full-cohort data can be easily extended to the case-cohort design using the Horvitz–Thompson method, which weights each selected subject by the inverse of the subject's selection probability. In this way, an (asymptotically) unbiased estimator is obtained. Thus, for subject i in the full cohort we work with the weight w_i = δ_i + (1 − δ_i)ξ_i/π. In view of the result in Robins et al. (1994), we can replace π in the definition of w_i with its empirical estimate, so as to exploit the information available in the full-cohort data, and thereby gain potential efficiency in estimating β. For case-cohort analysis with the Cox model, Kulich and Lin (2004) suggested estimating the sampling probability with a weighted estimator to achieve further efficiency. Accordingly, here we consider the weighted estimator for π given by

π̂(t) = Σ_{i=1}^N ξ_i(1 − δ_i)m_i(t) / Σ_{i=1}^N (1 − δ_i)m_i(t),

where the weight m_i(t) is possibly time-dependent and satisfies assumption (A8) in the Appendix. We thus propose a Horvitz–Thompson weighted version of the estimating equations (5), (6) and (8) for estimating (β, R) with case-cohort data, where the weight for subject i in the full cohort is given by

w_i(t) = δ_i + (1 − δ_i)ξ_i/π̂(t),   i = 1, . . . , N.
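The weight construction just described can be sketched in a few lines of Python; for simplicity the second-level weight m_i is taken time-fixed here (one of the two common choices discussed in Remark 5 below).

```python
import numpy as np

def ht_weights(delta, xi, m):
    """Horvitz-Thompson weights w_i = delta_i + (1 - delta_i) * xi_i / pi-hat,
    with the sampling probability pi estimated from the controls using
    second-level weights m_i (time-fixed here for simplicity)."""
    delta, xi, m = (np.asarray(a, float) for a in (delta, xi, m))
    ctrl = 1.0 - delta                                  # control indicator
    pi_hat = np.sum(xi * ctrl * m) / np.sum(ctrl * m)   # weighted pi estimate
    return delta + ctrl * xi / pi_hat, pi_hat

# Cases always get weight 1; sampled controls are up-weighted by 1/pi-hat;
# unsampled controls get weight 0 (their covariates are unobserved).
w, pi_hat = ht_weights(delta=[1, 0, 0, 0], xi=[0, 1, 0, 1], m=[1, 1, 1, 1])
```

In the toy call, two of the three controls are in the subcohort, so π̂ = 2/3 and the sampled controls each receive weight 1.5.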


Write

S_w^(0)(t; r, β) = Σ_{j=1}^N w_j(t) Y_j(t) g{r e^{β′Z_j}} e^{β′Z_j},

and define, for i = 1, . . . , N,

R̂_w(X_i; β) = R̂_w(X_i−; β) + ΔR̂_w(X_i; β),

with

ΔR̂_w(X_i; β) = δ_i / S_w^(0)(X_i; R̂_w(X_i−; β), β),   (14)

and with R̂_w(t_1; β) obtained by

Σ_{i=1}^N w_i(t_1) Y_i(t_1) G{R̂_w(t_1; β) e^{β′Z_i}} = d_1.

Write further

ψ_wi(t; β) = Z_i + [ ġ{R̂_w(t; β) e^{β′Z_i}} / g{R̂_w(t; β) e^{β′Z_i}} ] { R̂_w(t; β) Z_i + ∂R̂_w(t; β)/∂β } e^{β′Z_i},

S_w^(p)(t; β) = Σ_{j=1}^N w_j(t) Y_j(t) g{R̂_w(t; β) e^{β′Z_j}} e^{β′Z_j} ψ_wj(t; β)^{⊗p},   p = 0, 1,

Q_w(β) = Σ_{i=1}^N ∫_0^∞ [ ψ_wi(t; β) − S_w^(1)(t; β)/S_w^(0)(t; β) ] dN_i(t).   (15)

The proposed estimator for (β, R) under the case-cohort design is then given by the solution β̂_w to Q_w(β) = 0. Note that the estimator of Lu and Tsiatis (2006) is equivalent to the estimator obtained by setting ψ_wi(t; β) = Z_i in (15), which completely ignores the information in g{R̂_w(t; β) exp(β′Z)}. In the special case of the Cox model, our proposal is equivalent to the doubly weighted estimator of Kulich and Lin (2004).

Remark 5. For the Cox model, various versions of the weight m_i(t) for estimating the sampling probability π, including both time-constant and time-dependent weights, have been suggested by Chen and Lo (1999), Borgan et al. (2000), Chen (2001), and Kulich and Lin (2004); see Kulich and Lin (2004) for a comprehensive study. Kulich and Lin (2004) termed the resulting class of estimators of β the `doubly weighted estimators' because they involve two levels of weights. The first-level weight is w_i(t), the inverse of the estimated selection probability, which is introduced for the unbiasedness of the estimating equation for β. The second-level weight is m_i(t), which is introduced in estimating the sampling probability to achieve higher efficiency. Two common choices for m_i(t) are m_i(t) = 1 and m_i(t) = Y_i(t). For the Cox model, these two weights yield Borgan et al.'s estimator II with time-fixed and time-dependent weights, respectively.

Remark 6. As discussed in Remark 3, the estimating equation (15) can be easily modified to accommodate a left-truncation time L_i by redefining the risk set indicator as Y_j(X_i) = I(L_j ≤ X_i ≤ X_j), i = 1, . . . , N.

To derive the asymptotic properties of β̂_w, we develop the following asymptotic expansion for the weighted estimating function Q_w(β), which reduces to Theorem 1 of Kulich and Lin (2004) in the special case of the Cox model.

Theorem 3. Assume that r(t) ≡ E{(1 − δ_i)m_i(t)} is bounded away from 0 for all t ∈ [0, τ]. Then

N^{−1/2} Q_w(β_0) = N^{−1/2} Q(β_0) + N^{−1/2} Σ_{i=1}^N (1 − δ_i)(1 − ξ_i/π) ∫_0^τ { U_i(t) − r(t)^{−1} m_i(t) u(t) } dR_0(t) + o_p(1),   (16)

where

U_i(t) = [ ψ_i(t) − s^(1)(t)/s^(0)(t) − q(t) ] Y_i(t) g{R_0(t) e^{β_0′Z_i}} e^{β_0′Z_i}

and u(t) = E{(1 − δ_i)U_i(t)}.


The proof is given in the Appendix. From (16) we immediately have the following large sample result for β̂_w, whose proof is sketched in the Appendix.

Theorem 4. Under the assumptions given in the Appendix, β̂_w is strongly consistent and N^{1/2}(β̂_w − β_0) is asymptotically normal with mean 0 and covariance matrix

A^{−1} + A^{−1}(B + Σ)A^{−1},   (17)

with

Σ = {(1 − π)/π} E[ { (1 − δ_i) ∫_0^τ ( U_i(t) − r(t)^{−1} m_i(t) u(t) ) dR_0(t) }^{⊗2} ].

The asymptotic covariance can be consistently estimated by Â_w^{−1} + Â_w^{−1}(B̂_w + Σ̂)Â_w^{−1}. Here Â_w and B̂_w (and q̂_w(t; β̂_w)) are defined as Â and B̂ (and q̂(t; β̂)) in Section 3 with S^(p)(t; β) replaced by S_w^(p)(t; β), p = 0, 1, 2, and V^(p)(t; β) replaced by

V_w^(p)(t; β) = Σ_{i=1}^N w_i(t) Y_i(t) ġ{R̂_w(t; β) e^{β′Z_i}} e^{2β′Z_i} ψ_wi(t; β)^{⊗p},   p = 0, 1.

The matrix Σ̂ is obtained as

Σ̂ = {(1 − π̂)/π̂} N^{−1} Σ_{i=1}^N (ξ_i/π̂)(1 − δ_i) [ ∫_0^τ { Û_i(t) − r̂(t)^{−1} m_i(t) û(t) } dR̂_w(t; β̂_w) ]^{⊗2},

where π̂ = n/N,

Û_i(t) = [ ψ_wi(t; β̂_w) − S_w^(1)(t; β̂_w)/S_w^(0)(t; β̂_w) − q̂_w(t; β̂_w) ] Y_i(t) g{R̂_w(t; β̂_w) e^{β̂_w′Z_i}} e^{β̂_w′Z_i},

û(t) = N^{−1} Σ_{i=1}^N (ξ_i/π̂)(1 − δ_i) Û_i(t),

and r̂(t) = N^{−1} Σ_{i=1}^N (1 − δ_i) m_i(t).

Remark 7. Comparing the asymptotic covariance matrix (17) for β̂_w with that for β̂ in (11), we see that the excess variability due to subcohort sampling is accounted for by A^{−1}ΣA^{−1}.

Remark 8. Our proposal is simple to implement even when the number of covariates observed only on the `case-cohort' sample is large, while the NPMLE proposed by Zeng et al. (2006) would involve high-dimensional integration/summation in this case.

5. Simulation studies

The first simulation study was conducted to examine the performance of our proposal with full-cohort data, and to compare it with the performances of the martingale estimating equation estimator (Chen et al., 2002) and the NPMLE (Zeng and Lin, 2006). The failure time was generated with R_0(t) = t^2 and λ∗(t) = exp(t)/{1 + r exp(t)}, r = 1 or 2, which are the generalized odds-rate models of Dabrowska and Doksum (1988). The choice r = 1 corresponds to the proportional odds model. Two covariates Z_1 and Z_2 were generated, respectively, from the standard normal distribution truncated at ±2 and the Bernoulli distribution with success probability 0.5. The censoring variable was exponentially distributed with hazard rate λ_0 exp(γZ_2), where γ = log 2 and λ_0 was chosen to yield the desired censoring proportion (10% or 30%). The sample size N was 100. The number of simulation replications was 1000. Relative efficiency was calculated as the ratio of the empirical variance of the Chen et al. or the NPMLE estimator to that of the proposed estimator (RE1: proposed vs. Chen et al.; RE2: proposed vs. NPMLE). Table 1 shows the results. We see that β̂ and its variance estimator are essentially unbiased, and the empirical coverage probability is quite close to the nominal value of 95%. The proposed estimator is more efficient than the Chen et al. (2002) estimator from the martingale estimating equation, with the efficiency gain being more prominent when r = 2 and the censoring proportion is smaller (10%).
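Under this design, T has survival function S(t | Z) = {1 + r t^2 exp(β′Z)}^{−1/r}, so failure times can be drawn in closed form by inverting the CDF. The sketch below regenerates one replicate of the design; the baseline censoring rate lam0 is an assumed input, since the paper tunes it numerically to hit the target censoring proportion.

```python
import numpy as np

rng = np.random.default_rng(1)

def truncated_normal(n, bound=2.0):
    """Standard normal truncated at +/- bound, by rejection sampling."""
    out = np.empty(0)
    while out.size < n:
        draw = rng.normal(size=2 * n)
        out = np.concatenate([out, draw[np.abs(draw) <= bound]])
    return out[:n]

def simulate_first_design(N, beta, r, lam0, gamma=np.log(2.0)):
    """One replicate of the first simulation design: R0(t) = t^2 and error
    hazard exp(t)/{1 + r exp(t)}, so S(t|Z) = {1 + r t^2 e^{beta'Z}}^{-1/r}
    and T is obtained by inverse-CDF sampling. lam0 is the (assumed)
    baseline rate of the covariate-dependent exponential censoring."""
    Z1, Z2 = truncated_normal(N), rng.binomial(1, 0.5, size=N)
    eta = np.exp(beta[0] * Z1 + beta[1] * Z2)
    U = rng.uniform(size=N)
    T = np.sqrt((U ** (-r) - 1.0) / (r * eta))          # invert S(t|Z) = U
    C = rng.exponential(scale=1.0 / (lam0 * np.exp(gamma * Z2)), size=N)
    X, delta = np.minimum(T, C), (T <= C).astype(int)
    return X, delta, np.column_stack([Z1, Z2])

X, delta, Z = simulate_first_design(N=100, beta=[1.0, -1.0], r=1, lam0=0.05)
```

The inverse-CDF step follows from setting {1 + r T^2 e^{β′Z}}^{−1/r} = U for U uniform on (0, 1) and solving for T.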
It is interesting to note that the proposed estimator is almost as efficient as the fully efficient NPMLE, even in the case of r = 2, where g(t) varies substantially with t. The second simulation study was conducted to examine the performance of our proposal with case-cohort data, and to compare it with the performance of the estimator of Lu and Tsiatis (2006) based on the weighted martingale estimating equation. The size of the full cohort is N = 1000, and for each subject in the full cohort the probability of being sampled into the subcohort


Table 1
Results for the first simulation study: the complete cohort data.

             β_1 = 1, β_2 = −1                      β_1 = 0.5, β_2 = −0.5
             β̂_1              β̂_2                  β̂_1              β̂_2

r = 2
Mean     1.051 / 1.052   −1.014 / −1.017    0.521 / 0.529   −0.494 / −0.498
SE       0.276 / 0.291    0.473 / 0.491     0.263 / 0.277    0.469 / 0.491
ŜE       0.265 / 0.275    0.478 / 0.493     0.252 / 0.259    0.472 / 0.486
CP       93.6 / 94.1      96.1 / 95.6       93.1 / 93.9      95.6 / 94.9
RE1      1.66 / 1.45      1.59 / 1.48       1.71 / 1.48      1.58 / 1.45
RE2      0.98 / 0.99      0.98 / 0.99       0.97 / 0.99      0.97 / 0.99

r = 1
Mean     1.025 / 1.028   −1.017 / −1.019    0.509 / 0.512   −0.503 / −0.502
SE       0.220 / 0.239    0.375 / 0.403     0.204 / 0.219    0.369 / 0.397
ŜE       0.212 / 0.226    0.375 / 0.397     0.197 / 0.209    0.367 / 0.389
CP       94.9 / 93.9      95.7 / 95.3       94.5 / 94.5      96.2 / 95.0
RE1      1.26 / 1.20      1.23 / 1.20       1.28 / 1.21      1.23 / 1.20
RE2      0.99 / 1.00      0.99 / 1.00       0.99 / 1.00      0.99 / 1.00

In each cell, the left entry corresponds to a censoring proportion of 0.1 and the right entry to 0.3. The hazard function for ε is exp(t)/{1 + r exp(t)} with r = 1 or 2. ŜE: mean of the estimated standard errors; CP: empirical coverage probability of the 95% confidence interval; RE1: relative efficiency of the proposed estimator vs. the Chen et al. (2002) estimator; RE2: relative efficiency of the proposed estimator vs. the NPMLE (Zeng and Lin, 2006).

Table 2
Results for the second simulation study: the case-cohort design.

             β_1 = 1, β_2 = −1                      β_1 = 0.5, β_2 = −0.5
             β̂_w1             β̂_w2                 β̂_w1             β̂_w2

r = 2
Mean     1.037 / 1.029   −1.023 / −1.037    0.513 / 0.512   −0.501 / −0.519
SE       0.265 / 0.209    0.503 / 0.405     0.237 / 0.184    0.458 / 0.375
ŜE       0.264 / 0.214    0.465 / 0.390     0.226 / 0.185    0.428 / 0.358
CP       95.4 / 95.5      92.9 / 95.0       93.5 / 96.2      93.8 / 94.1
RE       1.69 / 1.53      1.36 / 1.22       1.53 / 1.38      1.31 / 1.19

r = 1
Mean     1.040 / 1.027   −1.008 / −1.027    0.516 / 0.513   −0.491 / −0.514
SE       0.250 / 0.195    0.491 / 0.390     0.227 / 0.174    0.450 / 0.363
ŜE       0.234 / 0.194    0.446 / 0.375     0.209 / 0.172    0.413 / 0.345
CP       92.9 / 95.1      93.0 / 94.5       92.0 / 95.6      92.8 / 93.6
RE       1.34 / 1.28      1.19 / 1.11       1.29 / 1.23      1.16 / 1.11

In each cell, the left entry corresponds to a sampling probability of 1/9 and the right entry to 2/9. The hazard function for ε is exp(t)/{1 + r exp(t)} with r = 1 or 2. ŜE: mean of the estimated standard errors; CP: empirical coverage probability of the 95% confidence interval; RE: relative efficiency of the proposed estimator relative to the estimator of Lu and Tsiatis (2006).

is π = 1/9 or 2/9. All the variables were generated in the same way as in the first simulation study, except that λ_0 was chosen to yield a censoring proportion of 90%. The number of simulation replications was 1000. Table 2 shows the results, where the proposed estimator was implemented with the weight used for estimating π chosen to be m_i(t) = Y_i(t). Results for the alternative estimator with m_i(t) = 1 were very similar and hence are omitted. We see that the finite sample behavior of the proposed estimator β̂_w is in accord with the theoretical expectation from Theorem 4. Moreover, our estimator β̂_w achieves higher efficiency than the estimator of Lu and Tsiatis (2006) based on the weighted martingale estimating equation. The efficiency advantage is particularly marked when the sampling probability is small (π = 1/9).
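The repeated case-cohort subsampling used in these simulations amounts to drawing a simple random subcohort and retaining covariates only for subcohort members and cases. A minimal sketch (function name is illustrative, not from the paper):

```python
import numpy as np

rng = np.random.default_rng(2)

def draw_case_cohort(delta, n_sub):
    """Classical case-cohort sample: a simple random subcohort of size n_sub
    plus all cases. Returns the subcohort indicators xi and a boolean mask
    marking the subjects whose covariates are observed."""
    delta = np.asarray(delta, int)
    xi = np.zeros(delta.size, dtype=int)
    xi[rng.choice(delta.size, size=n_sub, replace=False)] = 1
    observed = (xi == 1) | (delta == 1)   # covariates seen for subcohort + cases
    return xi, observed

# A cohort with 50 cases followed by 950 censored subjects; subcohort of 300.
xi, observed = draw_case_cohort(delta=np.repeat([1, 0], [50, 950]), n_sub=300)
```

Every case is always observed, so the case-cohort sample size is at most n_sub plus the number of cases.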

6. Real data example

The Cardiovascular Disease Risk Factor Two-Township Study (CVDFACTS) is a community-based follow-up study focusing on cardiovascular diseases and their risk factors in Taiwan. The study was carried out in the Chu-Dong and Pu-Tzu townships from 1991 to 2001. See Chen et al. (2006) for detailed information on this study. We apply the methods proposed in Sections 2 and 4 to the cohort from Chu-Dong township. We focus on the incidence of hypertension and its risk factors, including gender (0 = female, 1 = male), education level (0 = low, 1 = high), regular smoking (0 = no, 1 = yes), regular alcohol consumption (0 = no, 1 = yes), and regular physical activity (0 = no, 1 = yes). After excluding incomplete records, the Chu-Dong cohort in CVDFACTS consists of 2197 subjects, with 422 subjects diagnosed as hypertension cases during the study period. We use age as the time axis and the age of participant i at study entry as the left-truncation time L_i, which ranges from 18.1 to 85.6 years. We work with the generalized odds-rate models with λ∗(t) = exp(t)/{1 + r exp(t)}, r = 0, 1, 2, for relating the above risk factors to the age of onset of hypertension.


Table 3
Generalized odds-rate regression for the CVDFACTS data (standard error in parentheses).

                         r = 0               r = 1               r = 2
Male                      0.1049 (0.1109)     0.4011 (0.2099)     0.6643 (0.2951)
High education level     −0.3283 (0.1148)    −0.7343 (0.2194)    −1.0551 (0.3458)
Smoking                   0.0523 (0.1848)    −0.0602 (0.3463)    −0.2025 (0.4907)
Alcohol consumption      −0.3172 (0.2993)    −0.6816 (0.4838)    −1.0289 (0.6512)
Physical activity         0.0966 (0.1021)     0.4000 (0.1838)     0.6727 (0.2609)

Table 4. Case-cohort analysis for the CVDFACTS data.

                              r = 1                           r = 2
                       Mean      v̂ar      var′         Mean      v̂ar      var′
Male                   0.4319    0.1047    0.1129       0.7064    0.1768    0.1860
High education level  −0.7495    0.1125    0.1228      −1.0775    0.2676    0.2713
Smoking               −0.0180    0.3454    0.3433      −0.1717    0.6057    0.6071
Alcohol consumption   −0.6500    0.6513    0.6042      −1.0209    1.0917    1.0027
Physical activity      0.4225    0.0813    0.0865       0.7092    0.1402    0.1555

v̂ar: mean of the variance estimates over the 500 subcohorts; var′: the sample variance of the parameter estimates among the 500 subcohorts plus the estimated variance of the full-cohort parameter estimate.
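The repeated subsampling behind Table 4, together with the first-level Horvitz–Thompson weights w_i = \Delta_i + (1-\Delta_i)\xi_i/\hat\alpha used with the time-constant choice m_i(t) = 1, can be sketched in Python. This is an illustrative toy, not the original analysis code: the cohort sizes match the study, but the case indices are placeholders.

```python
import random

def case_cohort_sample(n_cohort, case_ids, subcohort_size, rng):
    """One case-cohort sample: a simple random subcohort plus all cases."""
    subcohort = set(rng.sample(range(n_cohort), subcohort_size))
    return subcohort, subcohort | set(case_ids)

def ht_weights(delta, xi, m):
    """First-level weights w_i = Delta_i + (1 - Delta_i) * xi_i / alpha_hat,
    where alpha_hat is the m-weighted fraction of non-cases falling in the
    subcohort (m_i = 1 gives the plain sampling-fraction estimate)."""
    num = sum(x * (1 - d) * mm for d, x, mm in zip(delta, xi, m))
    den = sum((1 - d) * mm for d, mm in zip(delta, m))
    alpha_hat = num / den
    return [d + (1 - d) * x / alpha_hat for d, x in zip(delta, xi)]

rng = random.Random(0)
n, n_cases, n_sub = 2197, 422, 300        # sizes from the CVDFACTS example
sub, sample = case_cohort_sample(n, range(n_cases), n_sub, rng)
assert n_cases <= len(sample) <= n_cases + n_sub

delta = [1 if i < n_cases else 0 for i in range(n)]   # placeholder case ids
xi = [1 if i in sub else 0 for i in range(n)]
w = ht_weights(delta, xi, [1.0] * n)                  # m_i(t) = 1

assert all(w[i] == 1 for i in range(n_cases))         # cases keep weight 1
# Horvitz-Thompson property: weighted non-case total = true non-case total.
assert abs(sum(w[n_cases:]) - (n - n_cases)) < 1e-6
```

The last assertion is the unbiasedness that the first-level weights are designed to deliver: reweighted subcohort non-cases stand in exactly for all non-cases.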

Recall that r = 0 corresponds to the Cox model and r = 1 to the proportional odds model. Since the full-cohort data are available, we first fit the generalized odds-rate models to the full-cohort data using the method described in Section 2. Table 3 presents the regression parameter estimates and their standard errors. Low education level is a significant risk factor for hypertension (male gender and physical activity are significant in the models with r = 1 and 2 but not in the Cox model), while, after accounting for the other factors, smoking and drinking have no significant effects on the incidence of hypertension.

We also apply the method developed in Section 4 to 500 case-cohort subsamples from the Chu-Dong cohort in CVDFACTS, obtained by repeated case-cohort sampling of subcohorts of size n = 300. The sampling probability was estimated using the weight m_i(t) = 1. Table 4 shows the results for the generalized odds-rate models with r = 1, 2. The mean of the parameter estimates over the subcohort samples is close to the corresponding full-cohort estimate shown in Table 3. Moreover, the mean of the variance estimates (v̂ar) over the subcohort samples is quite close to the quantity var′, defined as the sample variance of the parameter estimates over the 500 case-cohort samples plus the estimated variance of the full-cohort parameter estimate. Since var′ represents an estimate of the true variance of the parameter estimate based on case-cohort data, the closeness of v̂ar to var′ indicates that the proposed variance estimator is adequate in practical case-cohort studies.

7. Final remarks

This work considers the application of semiparametric transformation models in case-cohort studies.
The proposed approach achieves high efficiency and is easy to implement: it avoids the huge matrix inversions and high-dimensional integration/summation required by the NPMLE method when the number of cases and the dimension of the covariates are large. This makes our proposal well suited to model exploration in real applications. It also accommodates left-truncated data very easily.

Our proposal for case-cohort analysis is based on the Horvitz–Thompson weighting method, with two levels of weights allowed. The first-level weight produces an unbiased estimating equation for \beta, and the second-level weight, possibly time-dependent, serves to improve efficiency. For the Cox model, Kulich and Lin (2004) developed strategies for finding optimal second-level weights to enhance efficiency. How to find optimal second-level weights for general semiparametric transformation models is a potential topic for future work.

Acknowledgments

The authors would like to thank Drs. Chen-Hsin Chen and Wen-Harn Pan for kindly providing the CVDFACTS data and for helpful comments.

Appendix

Assumptions: Here we present the assumptions required for the large-sample theorems. We begin with some points of notation. A superscript dot denotes derivative. For a matrix A = (a_{ij}), we write \|A\| = (\sum_{i,j} a_{ij}^2)^{1/2}. Finally, we denote by \tau the maximum follow-up time: \tau = \inf\{t : \mathrm{pr}(X > t) = 0\}.


The assumptions are then as follows:

(A1) The observations are independently and identically distributed, and the covariates Z are bounded.
(A2) \lambda^*(\cdot) is strictly positive with \lim_{s \to -\infty} \lambda^*(s) = 0, and is twice continuously differentiable.
(A3) \dot R_0(\cdot) is continuous and strictly positive.
(A4) \beta_0 lies in the interior of a compact parameter set B.
(A5) \mathrm{pr}\{Y(\tau) = 1\} > 0.
(A6) The matrix A (defined in Eq. (10)) is positive definite, and the matrix \hat A(\beta) defined in Eq. (12) satisfies E\{\sup_{\beta \in B} \|\hat A(\beta)\|\} < \infty.
(A7) \int_0^t \{s^{(0)}(u)\}^{-r} \, dR_0(u) < \infty for all t \in [0, \tau] and r = 1, 2, where s^{(0)}(u) is defined as in Section 3.
(A8) The weight m_i(t) is a random process that (i) is independent of \xi_i, (ii) satisfies \mathrm{var}\{\int_0^\tau |dm_i(t)|\} < \infty, and (iii) has the property that r(t) = E\{(1 - \Delta_i) m_i(t)\} is bounded away from 0 for all t \in [0, \tau].

The proofs of Theorems 1 and 2 follow those in Zucker (2005, Appendix) and hence are omitted. In the following we focus on the large-sample theory for the case-cohort design (Theorems 3 and 4). We note here again that technical problems arise in the case g(0) = 0 (cf. Zucker, 2005, Sec. 2.4), but we believe that the results for g(0) > 0 will continue to hold in this case. We first develop asymptotic representations for \hat R(t; \beta_0) and \hat R_w(t; \beta_0).

Representation of \hat R(t; \beta_0) - R_0(t): Write \hat R(t) = \hat R(t; \beta_0) and d\hat R(t) = d\hat R(t; \beta_0). Recall from (5) that

d\hat R(t) = \frac{\sum_{i=1}^N dN_i(t)}{\sum_{i=1}^N Y_i(t)\, g\{\hat R(t-) e^{\beta_0 Z_i}\} e^{\beta_0 Z_i}} \quad \text{for } t > 0.

In view of assumption (A7), Taylor expansion yields

d\hat R(t) - dR_0(t) \doteq \frac{\sum_{i=1}^N dM_i(t)}{\sum_{i=1}^N Y_i(t)\, g\{R_0(t) e^{\beta_0 Z_i}\} e^{\beta_0 Z_i}} - \frac{\sum_{i=1}^N Y_i(t)\, \dot g\{R_0(t) e^{\beta_0 Z_i}\} e^{2\beta_0 Z_i}}{\sum_{i=1}^N Y_i(t)\, g\{R_0(t) e^{\beta_0 Z_i}\} e^{\beta_0 Z_i}} \, dR_0(t) \{\hat R(t) - R_0(t)\}.   (18)

By assumptions (A1), (A2), and (A4), we have, uniformly in t \in [0, \tau],

\frac{1}{N} \sum_{i=1}^N Y_i(t)\, g\{R_0(t) e^{\beta_0 Z_i}\} e^{\beta_0 Z_i} \to s^{(0)}(t), \qquad \frac{1}{N} \sum_{i=1}^N Y_i(t)\, \dot g\{R_0(t) e^{\beta_0 Z_i}\} e^{2\beta_0 Z_i} \to v^{(0)}(t).   (19)

Asymptotically, Eq. (18) forms a first-order linear differential equation for \hat R(t) - R_0(t). In view of (19), as N tends to infinity, the solution to this equation can be represented in the form

\hat R(t) - R_0(t) = \frac{1}{N e(t)} \sum_{i=1}^N \int_0^t \frac{e(u)}{s^{(0)}(u)} \, dM_i(u) + o_p(N^{-1/2}),   (20)

where

e(t) = \exp\left\{ \int_0^t \frac{v^{(0)}(u)}{s^{(0)}(u)} \, dR_0(u) \right\}.
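The passage from (18) to (20) is the standard integrating-factor solution of a first-order linear differential equation. As a sanity check of that step, one can compare the closed-form representation against direct numerical integration. The coefficient functions below are arbitrary smooth stand-ins chosen for illustration, not the quantities defined in the paper:

```python
import math

# Hypothetical smooth stand-ins: a() plays the role of the coefficient
# (v^(0)/s^(0)) dR_0/dt, f() the role of the driving term.
a = lambda t: 1.0 + 0.5 * t
f = lambda t: math.cos(t)

def closed_form(t, n=200000):
    """x(t) = e(t)^{-1} * int_0^t e(u) f(u) du with
    e(t) = exp(int_0^t a(u) du): the integrating-factor solution,
    evaluated by the midpoint rule."""
    h = t / n
    log_e = 0.0       # running integral of a, i.e. log e(u)
    integral = 0.0    # running integral of e(u) f(u)
    for k in range(n):
        u = (k + 0.5) * h
        integral += math.exp(log_e + a(u) * h / 2) * f(u) * h
        log_e += a(u) * h
    return math.exp(-log_e) * integral

def euler(t, n=200000):
    """Direct Euler integration of x'(t) = f(t) - a(t) x(t), x(0) = 0."""
    h = t / n
    x = 0.0
    for k in range(n):
        u = k * h
        x += h * (f(u) - a(u) * x)
    return x

# The two routes agree, confirming the integrating-factor step.
assert abs(closed_form(2.0) - euler(2.0)) < 1e-3
```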

Note that the representation in (20) is equivalent to that given in Chen et al. (2002, Appendix).

Representation of \hat R_w(t; \beta_0) - R_0(t): Recall that w_i(t) = \Delta_i + (1 - \Delta_i)\xi_i / \hat\alpha(t), where \hat\alpha(t) = \sum_i \xi_i (1 - \Delta_i) m_i(t) / \sum_i (1 - \Delta_i) m_i(t). Write \hat R_w(t) = \hat R_w(t; \beta_0) and d\hat R_w(t) = d\hat R_w(t; \beta_0). Let G_i(t) = g\{R_0(t) \exp(\beta_0 Z_i)\} \exp(\beta_0 Z_i) and \hat G_{wi}(t) = g\{\hat R_w(t) \exp(\beta_0 Z_i)\} \exp(\beta_0 Z_i). By definition,

d\hat R_w(t) = \frac{\sum_{i=1}^N dN_i(t)}{\sum_{i=1}^N w_i(t) Y_i(t) \hat G_{wi}(t-)} = \frac{\sum_{i=1}^N dN_i(t)}{\sum_{i=1}^N Y_i(t) G_i(t)} + \left[ \frac{\sum_{i=1}^N dN_i(t)}{\sum_{i=1}^N w_i(t) Y_i(t) \hat G_{wi}(t)} - \frac{\sum_{i=1}^N dN_i(t)}{\sum_{i=1}^N Y_i(t) G_i(t)} \right] + O_p(N^{-1}).

The bracketed term above can be asymptotically expanded as

\frac{\sum_{i=1}^N dN_i(t)}{\sum_{i=1}^N Y_i(t) G_i(t)} \times \left[ - \frac{\sum_{i=1}^N w_i(t) Y_i(t) \{\hat G_{wi}(t) - G_i(t)\}}{\sum_{i=1}^N w_i(t) Y_i(t) \hat G_{wi}(t)} - \frac{\sum_{i=1}^N (1 - \Delta_i)(1 - \alpha^{-1}\xi_i) Y_i(t) G_i(t)}{\sum_{i=1}^N w_i(t) Y_i(t) \hat G_{wi}(t)} - \frac{\sum_{i=1}^N (1 - \Delta_i)\xi_i Y_i(t) G_i(t) \{\hat\alpha^{-1}(t) - \alpha^{-1}\}}{\sum_{i=1}^N w_i(t) Y_i(t) \hat G_{wi}(t)} \right].

Applying the functional delta method, we have

N^{1/2} \{\hat\alpha^{-1}(t) - \alpha^{-1}\} = \{\alpha r(t)\}^{-1} N^{-1/2} \sum_{i=1}^N (1 - \Delta_i)(1 - \alpha^{-1}\xi_i) m_i(t) + o_p(1)   (21)

uniformly over t \in [0, \tau] (Kulich and Lin, 2004). Also, uniformly in t,

\hat G_{wi}(t) - G_i(t) \doteq \dot G_i(t) e^{\beta_0 Z_i} \{\hat R_w(t) - R_0(t)\}


and

N^{-1} \sum_{i=1}^N w_i(t) Y_i(t) \hat G_{wi}(t) \to s^{(0)}(t), \qquad N^{-1} \sum_{i=1}^N w_i(t) Y_i(t) \dot G_i(t) e^{\beta_0 Z_i} \to v^{(0)}(t),

N^{-1} \sum_{i=1}^N \xi_i (1 - \Delta_i) Y_i(t) G_i(t) \to \alpha E\{(1 - \Delta_i) Y_i(t) G_i(t)\}.

Thus, in view of assumption (A7), we have

d\hat R_w(t) - dR_0(t) \doteq \frac{1}{N s^{(0)}(t)} \sum_{i=1}^N \{dM_i(t) + K_i(t)\, dR_0(t)\} - \frac{v^{(0)}(t)}{s^{(0)}(t)} \{\hat R_w(t) - R_0(t)\}\, dR_0(t),   (22)

where

K_i(t) = (1 - \Delta_i)(1 - \alpha^{-1}\xi_i) [Y_i(t) G_i(t) - r(t)^{-1} m_i(t) E\{(1 - \Delta_i) Y_i(t) G_i(t)\}].   (23)

Asymptotically, (22) forms a first-order linear differential equation for \hat R_w(t) - R_0(t), which yields the solution

\hat R_w(t) - R_0(t) = \frac{1}{N e(t)} \sum_{i=1}^N \int_0^t \frac{e(u)}{s^{(0)}(u)} \{dM_i(u) + K_i(u)\, dR_0(u)\} + o_p(N^{-1/2}).

Proof of Theorem 3. Let \hat R(t) = \hat R(t; \beta_0), d\hat R(t) = d\hat R(t; \beta_0), \hat R_w(t) = \hat R_w(t; \beta_0), and d\hat R_w(t) = d\hat R_w(t; \beta_0). Also, let G_i(t) = g\{R_0(t) e^{\beta_0 Z_i}\} e^{\beta_0 Z_i}, \hat G_i(t) = g\{\hat R(t) e^{\beta_0 Z_i}\} e^{\beta_0 Z_i}, and \hat G_{wi}(t) = g\{\hat R_w(t) e^{\beta_0 Z_i}\} e^{\beta_0 Z_i}. Then

N^{-1/2} Q_w(\beta_0) = N^{-1/2} \sum_{i=1}^N \int_0^\tau \psi_i(t) \{dN_i(t) - Y_i(t) \hat G_i(t)\, d\hat R(t)\} + N^{-1/2} \sum_{i=1}^N \int_0^\tau \psi_i(t) Y_i(t) \{\hat G_i(t)\, d\hat R(t) - w_i(t) \hat G_{wi}(t)\, d\hat R_w(t)\} + o_p(1).

The first term above is equal to N^{-1/2} Q(\beta_0), whose asymptotic expansion has been given in Theorem 1. The second term can be expanded as

N^{-1/2} \sum_{i=1}^N \int_0^\infty \psi_i(t) Y_i(t) G_i(t) \{d\hat R(t) - w_i(t)\, d\hat R_w(t)\} + N^{-1/2} \sum_{i=1}^N \int_0^\infty \psi_i(t) Y_i(t) \{\hat G_i(t) - G_i(t)\}\, d\hat R(t) - N^{-1/2} \sum_{i=1}^N \int_0^\infty \psi_i(t) Y_i(t) w_i(t) \{\hat G_{wi}(t) - G_i(t)\}\, d\hat R_w(t).   (24)

Rewrite the first term in (24) as

N^{-1/2} \sum_{i=1}^N \int_0^\infty \psi_i(t) Y_i(t) G_i(t) \{d\hat R(t) - d\hat R_w(t)\} + N^{-1/2} \sum_{i=1}^N \int_0^\infty \psi_i(t) Y_i(t) G_i(t) (1 - \Delta_i)(1 - \alpha^{-1}\xi_i)\, d\hat R_w(t) - N^{-1/2} \sum_{i=1}^N \int_0^\infty \psi_i(t) Y_i(t) G_i(t) (1 - \Delta_i)\xi_i \{\hat\alpha^{-1}(t) - \alpha^{-1}\}\, d\hat R_w(t).   (25)

Applying the representations for d\hat R_w(t) and d\hat R(t) given in (22) and (18), and the uniform convergence of N^{-1} \sum_{i=1}^N \psi_i(t) Y_i(t) G_i(t) and \hat R_w(t), the first term in (25) can be expressed as

\int_0^\tau s^{(1)}(t) \frac{v^{(0)}(t)}{s^{(0)}(t)} N^{1/2} \{\hat R_w(t) - \hat R(t)\}\, dR_0(t) - N^{-1/2} \sum_{i=1}^N \int_0^\tau \frac{s^{(1)}(t)}{s^{(0)}(t)} K_i(t)\, dR_0(t) + o_p(1),   (26)

with K_i(t) given in (23). The second term in (25) is asymptotically equivalent to

N^{-1/2} \sum_{i=1}^N \int_0^\infty \psi_i(t) Y_i(t) G_i(t) (1 - \Delta_i)(1 - \alpha^{-1}\xi_i)\, dR_0(t).   (27)


By the asymptotic expansion for N^{1/2} \{\hat\alpha(t)^{-1} - \alpha^{-1}\} given in (21) and the uniform convergence

N^{-1} \sum_{i=1}^N \xi_i (1 - \Delta_i) \psi_i(t) Y_i(t) G_i(t) \to \alpha E\{(1 - \Delta_i) \psi_i(t) Y_i(t) G_i(t)\},

the third term in (25) is asymptotically equivalent to

-N^{-1/2} \sum_{i=1}^N \int_0^\tau (1 - \Delta_i)(1 - \alpha^{-1}\xi_i) r(t)^{-1} m_i(t) E\{(1 - \Delta_i) \psi_i(t) Y_i(t) G_i(t)\}\, dR_0(t).   (28)



0

Further, by Taylor expansion the last two terms in (24) can be written as    ∞  ∞ ˆ − Rˆ w (t)} dR0 (t) + v(1) (t)N1/2 {R(t) v(1) (t)(1 − i ) 1 − i N1/2 {Rˆ w (t) − R(t)}dR0 (t) + op (1), 0

0



(29)

In view of the asymptotic representations for \hat R(t; \beta_0) and \hat R_w(t; \beta_0) given above, the second term in (29) is also o_p(1). Finally, we have

N^{1/2} \{\hat R(t) - \hat R_w(t)\} = -N^{-1/2} \sum_{i=1}^N \int_0^t \frac{e(u)}{e(t) s^{(0)}(u)} K_i(u)\, dR_0(u) + o_p(1).   (30)

Combining (26)–(30), we obtain the claimed asymptotic expansion for N^{-1/2} Q_w(\beta_0). □

Proof of Theorem 4. Given the strong consistency of the full-cohort estimator \hat\beta (Theorem 2), the strong consistency of \hat\beta_w can be proved in a way similar to Kulich and Lin (2004). Let F_i be the sigma-field generated by \{X_i, \Delta_i, Z_i\}, and

\Omega_i = (1 - \Delta_i) \int_0^\tau \{U_i(t) - r(t)^{-1} m_i(t) u(t)\}\, dR_0(t).

It is easy to see that E(1 - \alpha^{-1}\xi_i \mid F_i) = 0, and hence E\{\Omega_i(1 - \alpha^{-1}\xi_i)\} = E\{\Omega_i E(1 - \alpha^{-1}\xi_i \mid F_i)\} = 0 and \mathrm{var}\{\Omega_i(1 - \alpha^{-1}\xi_i)\} = E\{\Omega_i^{\otimes 2} \mathrm{var}(1 - \alpha^{-1}\xi_i \mid F_i)\} = E(\Omega_i^{\otimes 2})(1 - \alpha)/\alpha = \Sigma. Also, conditional on F_i, \{\Omega_i(1 - \alpha^{-1}\xi_i), i = 1, \ldots, N\} and Q(\beta_0) are uncorrelated, and hence they are uncorrelated unconditionally. Therefore, N^{-1/2} Q_w(\beta_0) is asymptotically normal with mean 0 and covariance matrix A + B + \Sigma. Further, -N^{-1} \partial Q_w(\beta)/\partial\beta = -N^{-1} \partial Q(\beta)/\partial\beta, and the latter can be shown to converge to the matrix A uniformly in a neighborhood of \beta_0. By a standard Taylor expansion, we then have that N^{1/2}(\hat\beta_w - \beta_0) is asymptotically normal with mean 0 and covariance matrix A^{-1}(A + B + \Sigma)A^{-1} = A^{-1} + A^{-1}(B + \Sigma)A^{-1}. □

References

Bennett, S., 1983. Analysis of survival data by the proportional odds model. Statist. Med. 2, 273–277.
Borgan, Ø., Langholz, B., Samuelsen, S.O., Goldstein, L., Pogoda, J., 2000. Exposure stratified case-cohort designs. Lifetime Data Anal. 6, 39–58.
Chen, H.-J., Bai, C.-H., Yeh, W.-T., Chiu, H.-C., Pan, W.-H., 2006. Influence of metabolic syndrome and general obesity on the risk of ischemic stroke. Stroke 37, 1060–1064.
Chen, K., 2001. Generalized case–control sampling. J. Roy. Statist. Soc. B 63, 791–809.
Chen, K., Jin, Z., Ying, Z., 2002. Semiparametric analysis of transformation models with censored data. Biometrika 89, 659–668.
Chen, K., Lo, S.-H., 1999. Case-cohort and case–control analysis with Cox's model. Biometrika 86, 755–764.
Cheng, S.C., Wei, L.J., Ying, Z., 1995. Analysis of transformation models with censored data. Biometrika 82, 835–845.
Cox, D.R., 1972. Regression models and life-tables (with discussion). J. Roy. Statist. Soc. B 34, 187–220.
Dabrowska, D.M., Doksum, K.A., 1988. Estimation and testing in the two-sample generalized odds-rate model. J. Amer. Statist. Assoc. 83, 744–749.
Fine, J.P., Ying, Z., Wei, L.J., 1998. On the linear transformation model with censored data. Biometrika 85, 980–986.
Horvitz, D.G., Thompson, D.J., 1952. A generalization of sampling without replacement from a finite universe. J. Amer. Statist. Assoc. 47, 663–685.
Kong, L., Cai, J., Sen, P.K., 2004. Weighted estimating equations for semiparametric transformation models with censored data from a case-cohort design. Biometrika 91, 305–319.
Kulich, M., Lin, D.Y., 2004. Improving the efficiency of relative-risk estimation in case-cohort studies. J. Amer. Statist. Assoc. 99, 832–844.
Lai, T.L., Ying, Z., 1991. Rank regression methods for left-truncated and right-censored data. Ann. Statist. 19, 531–556.
Lu, W., Tsiatis, A.A., 2006. Semiparametric transformation models for the case-cohort study. Biometrika 93, 207–214.
Pettitt, A.N., 1982. Inference for the linear model using a likelihood based on ranks. J. Roy. Statist. Soc. B 44, 234–243.
Prentice, R.L., 1986. A case-cohort design for epidemiologic cohort studies and disease prevention trials. Biometrika 73, 1–11.
Robins, J.M., Rotnitzky, A., Zhao, L.P., 1994. Estimation of regression coefficients when some regressors are not always observed. J. Amer. Statist. Assoc. 89, 846–866.
Tsai, W.Y., Jewell, N.P., Wang, M.C., 1987. A note on the product-limit estimator under right censoring and left truncation. Biometrika 74, 883–886.
Tsodikov, A., 2003. Semiparametric models: a generalized self-consistency approach. J. Roy. Statist. Soc. B 65, 759–774.
Zeng, D., Lin, D.Y., 2006. Efficient estimation of semiparametric transformation models for counting processes. Biometrika 93, 627–640.
Zeng, D., Lin, D.Y., 2007. Maximum likelihood estimation in semiparametric regression models with censored data (with discussion). J. Roy. Statist. Soc. B 69, 507–564.
Zeng, D., Lin, D.Y., Avery, C.L., North, K.E., Bray, M.S., 2006. Efficient semiparametric estimation of haplotype-disease associations in case-cohort and nested case–control studies. Biostatistics 7, 486–502.
Zeng, D., Lin, D.Y., Yin, G., 2005. Maximum likelihood estimation for the proportional odds model with random effects. J. Amer. Statist. Assoc. 100, 470–483.
Zucker, D.M., 2005. A pseudo-partial likelihood method for semiparametric survival regression with covariate errors. J. Amer. Statist. Assoc. 100, 1264–1277.