Joint analysis of current count and current status data

Joint analysis of current count and current status data

Journal of Multivariate Analysis 143 (2016) 153–164 Contents lists available at ScienceDirect Journal of Multivariate Analysis journal homepage: www...

486KB Sizes 3 Downloads 146 Views

Journal of Multivariate Analysis 143 (2016) 153–164

Contents lists available at ScienceDirect

Journal of Multivariate Analysis journal homepage: www.elsevier.com/locate/jmva

Joint analysis of current count and current status data Chi-Chung Wen a , Yi-Hau Chen b,∗ a

Department of Mathematics, Tamkang University, Taiwan

b

Institute of Statistical Science, Academia Sinica, Taiwan

article

info

Article history: Received 24 September 2014 Available online 11 September 2015 AMS subject classifications: 62N01 62N02 62P10 Keywords: Correlated data Frailty model Panel count Self-consistency Semiparametric maximum likelihood

abstract We consider joint analysis of event times to a recurrent and a non-recurrent event, with the event time data subject to type I interval censoring. The motivation arises from a survey study, which collected current count data for time to occurrences of fracture (recurrent event), and current status (i.e., type I interval censored) data for time to osteoporosis (nonrecurrent event). The aim of the study is to examine risk factors for, and levels of association between, the recurrent and non-recurrent events. We propose a joint analysis of current count and current status data based on a joint modeling for recurrent and non-recurrent events. In the proposed framework, a non-homogeneous Poisson process is assumed for the recurrent event, a proportional hazards model is assumed for failure time of the non-recurrent event, and the two event time processes share a common gamma frailty. A semiparametric maximum likelihood estimator, together with a stable computation algorithm, is developed for the joint model. The parametric (covariate effects and frailty) and nonparametric (baseline mean and cumulative hazard functions) components of the estimator are consistent at rates of square root and cubic root of the sample size, respectively. The asymptotic normality for the parametric component of the estimator is established. The application to the survey data mentioned above shows that, female is a common strong risk factor for both fracture and osteoporosis, and times to fracture and osteoporosis are highly associated. © 2015 Elsevier Inc. All rights reserved.

1. Introduction Current status data, also called type I interval censored data, commonly arise in survey studies. In such data, a survey (examination) time and an indicator of whether an event of interest has ‘‘failed’’ by the survey time are available for each subject. Inferences on the underlying failure time distribution with observed current status data have been widely investigated, which can in fact be considered under the more general context of interval-censored event time analysis, where there may be multiple examination times in each subject [7,3,12]. When the event of interest is recurrent by nature, such as repeated infections or injuries, but is only examined at an examination, the only information for each subject’s event times includes the examination time and the number of events up to the examination time. We will call this type of data the ‘‘current count’’ data, which is a special case of the well known ‘‘panel count’’ data (e.g., [13,18,1,19,14]), where multiple examination times and the numbers of events up to the examination times are available for each subject. This work focuses on joint analysis of current count and current status data, which has not been formally addressed in literature, though plenty of studies have been done for individual analysis of the two types of interval-censored data. Our



Correspondence to: Institute of Statistical Science, Academia Sinica, Taipei 11529, Taiwan. E-mail address: [email protected] (Y.-H. Chen).

http://dx.doi.org/10.1016/j.jmva.2015.08.015 0047-259X/© 2015 Elsevier Inc. All rights reserved.

154

C.-C. Wen, Y.-H. Chen / Journal of Multivariate Analysis 143 (2016) 153–164

motivation arises from the 2005 fracture-osteoporosis survey study in Taiwan, where one of the study aims is to understand the respective risk factors for fracture and osteoporosis, and the degrees of association between the underlying processes for the occurrences of fracture and osteoporosis. Note that fracture is recurrent while osteoporosis is non-recurrent by nature, and current count data were observed for the former while current status data were observed for the latter in this survey study, since there was only one examination time for each subject. Our proposal to joint analysis of current count and current status data is a likelihood-based approach based on joint frailty models. Specifically, we assume the two underlying event time processes, one for times to the recurrent event and the other for failure time of the non-recurrent event, share a common gamma frailty whereby the association between the two processes is induced. Given the frailty, the recurrent event process is assumed to be a nonhomogeneous Poisson process, while the failure of the non-recurring event process is assumed to follow a proportional hazards model, and the two processes are independent. The modeling framework is semiparametric, in that the frailty distribution and the covariate effects in the two event time processes are parametrically modeled, while the baseline mean and cumulative hazard functions are fully unspecified (see Section 2). To make inferences on the proposed models, in Section 3 we propose semiparametric maximum likelihood estimation, where the two nonparametric functions (baseline mean and cumulative hazard) are treated as non-decreasing step functions with jumps only at the examination times. The resulting semiparametric maximum likelihood estimator (SPMLE) for the model parameter is shown to be consistent. The convergence rates are shown to be cubic root and square root of the sample size, respectively, for the nonparametric (the baseline mean and cumulative functions) and parametric components (regression and frailty parameters) of the parameter. We derive the asymptotic normality for the parametric component of the proposed SPMLE, and propose an estimator for the asymptotic variance–covariance matrix of the SPMLE using numerical second differentiation of the log-likelihood function. In Section 4, we develop a hybrid algorithm for computation of the SPMLE, which consists of a self-consistency algorithm for solving nonparametric baseline mean and cumulative hazard functions, and a conventional optimization algorithm for solving the regression and the gamma frailty parameters. Simulation studies present in Section 5 reveal that the proposed SPMLE, together with its asymptotic theory, performs well in scenarios with moderate sample sizes. We apply the proposed method to joint analysis of the current count data for fracture and the current status data for osteoporosis observed in the 2005 fracture-osteoporosis survey study in Taiwan. The analysis provides results regarding risk factors for, as well as the levels of association between, the occurrences of fracture and osteoporosis (see Section 6). 2. Model description Let N(t ) be the number of occurrences of a recurrent event over (0, t ], and T the failure time of another related, nonrecurrent event for one subject. Denote by C the examination time, and Z1 and Z2 the vectors of covariates that may affect N(t ) and T respectively. The covariates Z1 and Z2 are allowed to have elements in common. When current count data for N(t ) and current status data for T are observed, the observation for each subject then consists of Y = (C , N , ∆, Z1 , Z2 ), where N ≡ N(C ) indicates the number of occurrences of the recurrent event up to the examination time C , and ∆ ≡ I (T ≤ C ) indicates whether the failure time T precedes C or not. To assess the effect of Z1 on N(t ) and the effect of Z2 on T , while accounting for association between two counting processes N1 (t ) and N2 (t ), with N1 (t ) = N(t ) for a recurrent event and N2 (t ) = I [T ≤ t ] for a non-recurrent event induced by the event time T , a shared and unobserved frailty η is introduced. Suppose η is a gamma random variable with mean 1 and variance γ > 0. We assume that, conditional on (η, Z1 , Z2 ), N and T are independent. The recurrent process N is a nonhomogeneous Poisson process with the mean function, which is also the cumulative incidence function, given by E (N(t )|η, Z1 , Z2 ) = η exp(β1′ Z1 )Λ1 (t ).

(1)

The cumulative hazard function of the failure time T is given by

Λ(t |η, Z1 , Z2 ) = η exp(β2′ Z2 )Λ2 (t ).

(2)

In this joint modeling, β1 and β2 denote the vectors of unknown regression parameters, Λ1 the increasing baseline mean function, and Λ2 the baseline cumulative hazard function. In the model, large γ represents strong association between the recurrent process and the failure time. It can be shown that the marginal mean function of N(t ) is E (N(t )|Z1 ) = ′ exp(β1′ Z1 )Λ1 (t ), and the marginal survival function of T is S (t |Z2 ) = (1 + γ eβ2 Z2 Λ2 (t ))−1/γ . Assume C and (η, N(t ), T ) are independent conditioned on (Z1 , Z2 ) and the conditional distribution of C given (Z1 , Z2 ) does not depend on the parameters of interest. Then the likelihood for a single observation Y = (C , N , ∆, Z1 , Z2 ) is



L(θ , Λ1 , Λ2 )(Y ) ∝ Eη [ηeβ1 Z1 Λ1 (C )]N exp(−ηeβ1 Z1 Λ1 (C ))[1 − exp(−ηeβ2 Z2 Λ2 (C ))]∆ [exp(−ηeβ2 Z2 Λ2 (C ))]1−∆

=











Γ (N + γ −1 ) ′ ′ −1 [γ eβ1 Z1 Λ1 (C )]N [(1 + γ eβ1 Z1 Λ1 (C ))−N −γ Γ (γ −1 )

− (1 + γ eβ1 Z1 Λ1 (C ) + γ eβ2 Z2 Λ2 (C ))−N −γ ]∆ [(1 + γ eβ1 Z1 Λ1 (C ) + γ eβ2 Z2 Λ2 (C ))−N −γ ]1−∆ , ′ ′ where θ = (β1 , β2 , γ )′ and Eη is the expectation with respect to η. ′



−1





−1

C.-C. Wen, Y.-H. Chen / Journal of Multivariate Analysis 143 (2016) 153–164

155

3. Parameter estimation In this section we discuss semiparametric maximum likelihood estimation for the joint model (1) and (2) with current count and current status data. Let Yi = (Ci , Ni , ∆i , Z1i , Z2i ), i = 1, . . . , n, be n i.i.d. copies of observed variable Y . The likelihood function of (θ , Λ1 , Λ2 ) based on {Yi |i = 1, . . . , n} is Ln (θ , Λ1 , Λ2 ) =

n 

L(θ , Λ1 , Λ2 )(Yi ).

(3)

i =1

We now establish the existence of the semiparametric maximum likelihood estimator (SPMLE) that maximizes this 1θ and Λ 2θ in the space of right-continuous likelihood. For every fixed θ , we first show that there exist random elements Λ and non-decreasing functions that maximize Ln . Without loss of generality, assume that C1 ≤ · · · ≤ Cn . By the fact that

|Ln (θ , Λ1 , Λ2 )| ≤

n 

Eη {[ηeβ1 Z1i Λ1 (Ci )]Ni exp[−ηeβ1 Z1i Λ1 (Ci )]}, ′



i =1

1θ (Cn ) = ∞, so Λ 1θ (Cn ) exists. On the other hand, since the observations with ∆1 = 0 or ∆n = 1 the likelihood will be 0 if Λ contribute nothing to estimation of Λ2 , we set Λ2 (C1 ) ≡ 0 when ∆1 = 0, and Λ2 (Cn ) = Λ2 (Cn−1 ) when ∆n = 1. From this and the fact shown above that the existence of Λ1θ does not depend on ∆i ’s, we may assume that ∆1 = 1 and ∆n = 0 in 2θ is infinity, so Λ 2θ exists. Finally, (3) to establish the existence of Λ2θ . Under this assumption, the likelihood will be 0 if Λ  1θ , Λ 2θ ) is continuous on a compact parameter set of θ , its maximizer, say  1 = Λ 1θ and because θ → Ln (θ , Λ θ , exists. Let Λ 2 = Λ 2θ , then ( 1 , Λ 2 ) maximizes Ln (θ , Λ1 , Λ2 ). Λ θ, Λ Since the likelihood function Ln depends on (Λ1 , Λ2 ) only through their values at the examination times, it is easy to see that the SPMLE of Λj (j = 1, 2) is unique within the class of all right-continuous non-decreasing step functions with possible jumps only at examination times {C1 , . . . , Cn }. Hence we need only restrict the search of SPMLE of Λ1 and Λ2 within this class of functions. With conditions (C1)–(C3) in the Appendix, our theorems establish asymptotic properties for the proposed SPMLE  1 , Λ 2 ) of ζ = (θ , Λ1 , Λ2 ). Denote ζ0 = (θ0 , Λ10 , Λ20 ) the true parameter. ζ = ( θ, Λ P

P

j (t ) → Λj0 (t ) for Theorem 1 (Consistency and Rate of Convergence). The SPMLE  ζ is consistent; that is,  θ → θ0 and each Λ every t in the study period (τ1 , τ2 ), j = 1, 2. The rate of convergence of the SPMLE is of order only n−1/3 under the metric d∗ defined as 1 − Λ10 ∥2 + ∥Λ 2 − Λ20 ∥2 }1/2 , d∗ ( ζ , ζ0 ) ≡ {| θ − θ0 |2 + ∥Λ namely d∗ ( ζ , ζ0 ) = Op (n−1/3 ).



d

Theorem 2 (Asymptotic Normality). The SPMLE  θ is asymptotically normal; that is, n( θ − θ0 ) −→ N (0, I0−1 ). The asymptotic −1 variance I0 is the inverse of the efficient Fisher information matrix I0 , whose existence is examined in Appendix A.3. The proofs of Theorems 1 and 2 are given in the Appendix. The efficient Fisher information is the variance–covariance matrix of the efficient score of θ [15, pp. 368–369], which is derived explicitly in the Appendix. In theory, the variance estimation of  θ can be obtained by inverting the observed information matrix. However, the information matrix has a complicated form, rendering this direct approach to variance estimation difficult to implement. An alternative approach is to numerically approximate the observed information matrix by the numerical second differentiation of the log-likelihood function:

   Iij ≡ −n−1 ρn−2 log L˜ n ( θ + ρn ei + ρn ej ) − log L˜ n ( θ + ρn ei ) − log L˜ n ( θ + ρn ej ) + log L˜ n ( θ) where L˜ n (θ ) = supΛ1 ,Λ2 Ln (θ , Λ1 , Λ2 ), ei is a unit vector with the ith element equal to 1 and all the other elements 0, and ρn is a tuning constant of order n−1/2 . This approximation method was also used by Wen and Chen [20], among others.

1 , Λ 2 ) is Op (n−1/3 ), it is in fact dominated by that for the SPMLE Remark 1. Although the overall convergence rate for ( θ, Λ 1 , Λ 2 ). The convergence rate for the parametric component  of the nonparametric baseline functions (Λ θ of the SPMLE, however, achieves the usual parametric rate Op (n−1/2 ). 4. Computation algorithm In this section, we will first state the main idea underlying our computation method for the SPMLE, and then describe in detail the algorithm. Let c1,1 < · · · < c1,n1 denote the values of examination times {Ci : i = 1, . . . , n} excluding the subjects with the smallest observation time for which N = 0, and c2,1 < · · · < c2,n2 the distinct ordered values of {Ci : i = 1, . . . , n}, excluding

156

C.-C. Wen, Y.-H. Chen / Journal of Multivariate Analysis 143 (2016) 153–164

the subjects with the smallest examination time but with ∆ = 0 and the subjects with the largest examination time but ˆ 2 [4]. As with ∆ = 1. The exclusion of such data points avoids zero or infinity value for the estimated baseline function Λ discussed in Section 3, to obtain the SPMLE, we can simply consider Λj (j = 1, 2) a right-continuous non-decreasing step  function with possible jumps only at time points {cj,1 , . . . , cj,nj }. In this case, Λj can be represented by Λj (t ) = k:cj,k ≤t vj,k , where vj = (vj,1 , . . . , vj,nj )′ is a vector of nonnegative parameters with vj,k representing the jump size of Λj at cj,k . Then, in terms of the parameters (θ , v1 , v2 ) the logarithm of the likelihood Ln can be written as

ℓ(θ , v1 , v2 ) =

n 

log A0,i (A1,i − A2,i )∆i (A2,i )1−∆i (θ , v1 , v2 ),





i=1

where

Ni

 A0,i (θ , v1 , v2 ) = Γ (Ni + γ −1 ) γ eβ1 Z1i ′



v 1 ,k 



Γ (γ −1 ),

k:c1,k ≤Ci

−Ni −γ −1

 A1,i (θ , v1 , v2 ) = 1 + γ e

β′ Z

1 1i



v 1 ,k 

,

l:c1,k ≤Ci

−Ni −γ −1

 A2,i (θ , v1 , v2 ) = 1 + γ eβ1 Z1i ′



v1,k + γ eβ2 Z2i ′

k:c1,k ≤Ci



.

v2,k 

k:c2,k ≤Ci

For 1 ≤ k ≤ nj , j = 1, 2, the partial derivative of ℓ with respect to vj,k takes the form

∂ℓ (θ , v1 , v2 ) = aj,k (θ , v1 , v2 ) − bj,k (θ , v1 , v2 ), ∂vj,k where aj,k (θ , v1 , v2 ) and bj,k (θ , v1 , v2 ) are positive functions given by a1,k (θ , v1 , v2 ) =

Ni

 i:Ci ≥c1k

,

v1,k

 k:c1,k ≤Ci

q

 b1,k (θ , v1 , v2 ) =



(1 + Ni γ )e

β1 Z1,i

∆i

i:Ci ≥c1,k

A1,i − A2,i q

 a2,k (θ , v1 , v2 ) =



( 1 + N i γ )e

i:Ci ≥c2,k

b2,k (θ , v1 , v2 ) =



β2 Z2,i

∆i

q

A1i,i − A2i,i

A2i,i A1,i − A2,i

 + (1 −

q −1 ∆i A2i,i

)

(θ , v1 , v2 ),

 (θ , v1 , v2 ),

(1 + Ni γ )eβ2 Z2,i (1 − ∆i )A2i,i (θ , v1 , v2 ), q −1

i:Ci ≥c2,k

with qi = (1 + Ni γ + γ )/(1 + Ni γ ). A necessary condition for ( θ , v1 , v2 ) to be the maximizer is (∂/∂vj,k )ℓ( θ , v1 , v2 ) =   aj,k (θ, v1 , v2 ) − bj,k (θ , v1 , v2 ) = 0, which leads to the ‘‘self-consistency equations’’:

 vj,k =  v j ,k

aj,k ( θ , v1 , v2 ) + M0 bj,k ( θ , v1 , v2 ) + M0

,

k = 1, . . . , nj , j = 1, 2,

(4)

where M0 ≥ 0 is a chosen constant whose rationale will be given later. From the definitions of A1,i and A2,i given above, we can see that A1,i > A2,i > 0 and hence, by (4),  vj,k ’s are ensured to be positive once their initial values are positive. Let D = (D1,1 , . . . , D1,n1 , D2,1 , . . . , D2,n2 )′ , where Dj,k ≡ Dj,k (θ , v1 , v2 ) = vj,k

aj,k (θ , v1 , v2 ) + M0 bj,k (θ , v1 , v2 ) + M0

,

k = 1, . . . , nj , j = 1, 2.

Then, we have ( v1′ , v2′ )′ = D( θ , v1 , v2 ), i.e., ( v1′ , v2′ )′ is a fixed point of D( θ , ·, ·), which motivates the following computational approach. Since the parameter γ is restricted to be positive, for computation stability we use the reparametrization γ ∗ = log γ . With a slight abuse of notation, we still denote by θ the parameter set (β1′ , β2′ , γ ∗ )′ . Combining the self-consistency equations in (4) with some conventional numerical optimization method (e.g., the Newton–Raphson and the Nelder–Mead

1n (t ), Λ 2n (t )): methods), we propose the following procedure to iteratively compute the SPMLE ( θn , Λ

C.-C. Wen, Y.-H. Chen / Journal of Multivariate Analysis 143 (2016) 153–164

157

(1)′ (1)′ 1 2 (k) (k), 1

Step 1. Choose an initial value (θ (1) , v ′

, v ) ∈ Rd × (0, ∞)n1 × (0, ∞)n2 . (k) Step 2. Update each current estimate (θ , v , v2 ) for k ≥ 1: (k) (k) (k) (k+1) Step 2.1. Update θ to θ by maximizing ℓ(θ , v1 , v2 ) over θ , using some conventional numerical optimization algorithm such as the Newton–Raphson and the Nelder–Mead methods. (k) (k) (k+1) Step 2.2. Update (v1 , v2 ) to (v1 , v2(k+1) ) by

(v1(k+1) , v2(k+1) ) = D(θ (k+1) , v1(k) , v2(k) ). (k+1) Step 3. If the updated estimate (θ (k+1) , v1 , v2(k+1) ) is close to (θ (k) , v1(k) , v2(k) ), then stop the procedure and let the SPMLE   1n (t ), Λ 2n (t )) = (θ (k) , ( θn , Λ v (k) , v (k) ); otherwise, return to Step 2. (1) (2) l:c ≤t 1,l l:c ≤t 2,l ′



l

l

In the procedure, the basis of Step 2.2 is the self-consistency Eq. (4). We now describe the rationale of the self-consistency equations. From the facts that both aj,k and bj,k are positive functions, and ∂ℓ/∂vj,k = aj,k − bj,k = 0, ∂ 2 ℓ/(∂vj,k )2 = ∂ aj,k /∂vj,k − ∂ bj,k /∂vj,k < 0 at ( θ , v1 , v2 ), we can choose a constant M0 ≥ 0 large enough such that

       ∂ D j ,k   v1 , v2 ) = 1 +  vj,k  ∂v (θ , j,k 

∂ aj,k ( θ , v1 , v2 ) ∂vj,k



∂ bj,k ( θ , v1 , v2 )  ∂vj,k

bj,k ( θ , v1 , v2 ) + M0

  ∈ (0, 1)  

(5)

for all k = 1, . . . , nj , j = 1, 2. By the mean value theorem and the continuity of ∂ Dj,k /∂vj,k , we know from (5) that there exists 0 < b0 < 1 such that |Dj,k (θ , u1 , u2 ) − Dj,k (θ , v1 , v2 )| ≤ b0 |uj,k − vj,k |, for k = 1, . . . , nj , j = 1, 2, and (θ, u1 , u2 ), (θ , v1 , v2 ) near ( θ , v1 , v2 ). This means that the self-consistency algorithm is locally contractive and will converge by the contraction principle (see, for example, [11, p. 220]). Although a sufficiently large M0 may theoretically be needed to satisfy the condition (5) for local convergence, a large M0 may adversely slow down the convergence. To solve the dilemma, we may start with M0 = 0 for convenience. If the algorithm has shown a trend towards convergence during early iterations, then we fix M0 = 0 throughout the algorithm; otherwise, we increase M0 to a larger value in later iterations to ensure convergence. In all of our simulations and data analysis, we found that the convenience choice of M0 = 0 leads to convergent results, and using a variety of values of M0 leads to the same convergent solution. Therefore, the specification of M0 is usually not a sensitive issue in computation. The self-consistency equations may have multiple solutions as the information loss due to censoring and missing data (if any) becomes heavier (see [17], for an example of this issue with doubly- and interval-censored data). However, our simulation studies have shown that the proposed algorithm is not sensitive to initial values for θ and (v1 , v2 ); various different choices of the initial values usually lead to the same solution. Also, multiple initial values can be used to check whether the algorithm is trapped in local maxima or not. 5. Simulation studies In this section, we assess the numerical performances of the proposed SPMLE and examine the adequacy of the normal approximation. All the computation is done on an ordinary PC with MATLAB. We conducted 400 replications in each simulation scenario. In the first simulation study, the recurrent process N was simulated from a Poisson process with mean function (1), and the failure time T for a related non-recurrent event was simulated from model (2) for each subject. Here, Λ1 (t ) = 0.8t 0.8 , Λ2 (t ) = 0.8t 1.2 , and η followed a gamma distribution with mean 1 and variance γ . The examination time C was generated from Uniform(0, 1), and the covariate Zj , j = 1, 2, were independent Bernoulli with success probability of 0.5. The number of subjects n = 200 or 400. The true values of β1 and β2 , β10 and β20 , are 0.5 or 1, and the true value of γ , γ0 , is 0.4 or 1.2. In Table 1, ‘‘Bias’’ means the average of  θ − θ0 over simulations; ‘‘SD’’ the simulation standard deviation of the estimates; ‘‘ASE’’ the average of the standard error estimates over simulations; and ‘‘CP’’ the coverage probability of the 95% confidence intervals obtained by the normal approximation. It is seen from Table 1 that, the proposed SPMLE is essentially unbiased, and the proposed standard error estimates are quite close to the simulation standard deviations. For some set-ups, the bias for  β2 is larger in the n = 400 setting than in the n = 200 setting. This may be due to the larger estimation uncertainty for β2 . In fact, after accounting for the estimation uncertainty, all the estimates for regression parameters (β1 , β2 ) can be deemed as unbiased, since all the biases are much smaller than one standard error, and the relative biases (absolute bias relative to the parameter value) are less than 12%. Also, the coverage probabilities of the 95% confidence intervals match the nominal value well, implying that the normal approximation theory works well for the SPMLE in the considered settings with moderate sample sizes. The adequacy of the normal approximation theory can be further verified from the almost linear Q–Q plots in Fig. 1, where we depict the standardized SPMLE (n I )1/2 ( θ − θ0 ) versus the standard normal variate, based on the simulation scenario with β10 = β20 = 1, γ0 = 1.2, and n = 400. Recall that our proposal is based on the gamma frailty assumption. In the second simulation study, we examined how sensitive our method is to this assumption. We considered the same joint model specifications described above, with the exception that the frailty followed a mixture log-normal distribution: 0.5LN (−1.8, 0.34) + 0.5LN (0.41, 0.36),

158

C.-C. Wen, Y.-H. Chen / Journal of Multivariate Analysis 143 (2016) 153–164

Table 1 Simulation results in the scenario where the true frailty distribution is Gamma(γ0−1 , γ0−1 ). ASE: the average of standard error estimates; CP: the coverage probability of the 95% confidence interval.

γ0

(β10 , β20 )

Parameter

n = 200 Bias

0.4 . .

(0.5, 0.5)

. . .

(0.5, 1.0)

. . .

(1.0, 0.5)

. . .

(1.0, 1.0)

1.2 . .

(0.5, 0.5)

. . .

(0.5, 1.0)

. . .

(1.0, 0.5)

. . .

(1.0, 1.0)

β1 β2 γ β1 β2 γ β1 β2 γ β1 β2 γ β1 β2 γ β1 β2 γ β1 β2 γ β1 β2 γ

n = 400 SD

ASE

CP

Bias

SD

ASE

CP

0.010 0.056 0.059

0.214 0.313 0.178

0.220 0.299 0.190

96.0 94.3 91.3

−0.001 −0.043

0.160 0.212 0.119

0.153 0.206 0.130

94.8 94.8 93.0

0.009 0.079 −0.054

0.213 0.315 0.177

0.220 0.303 0.188

96.0 93.8 91.8

0.002 0.015 −0.040

0.159 0.217 0.117

0.154 0.206 0.128

95.3 92.5 94.3

0.012 0.043 −0.063

0.195 0.306 0.147

0.206 0.297 0.159

96.3 94.5 91.8

0.007 0.024 −0.039

0.148 0.215 0.100

0.144 0.206 0.109

94.5 94.0 94.3

0.011 0.074 −0.059

0.196 0.308 0.147

0.207 0.298 0.157

96.0 94.0 92.0

0.005 0.047 −0.034

0.148 0.222 0.103

0.144 0.205 0.108

94.5 91.3 94.3

−0.010 −0.096

0.026

0.243 0.347 0.304

0.255 0.345 0.326

97.0 95.5 92.3

0.003 0.009 −0.055

0.170 0.230 0.214

0.177 0.240 0.227

95.8 97.3 95.0

0.026 0.017 −0.097

0.246 0.357 0.299

0.255 0.348 0.320

96.5 95.3 92.8

0.003 0.029 −0.047

0.170 0.237 0.214

0.177 0.242 0.224

95.5 96.3 94.3

0.020 0.002 −0.096

0.230 0.335 0.268

0.243 0.341 0.287

96.5 95.3 92.0

0.010 0.026 −0.053

0.176 0.222 0.199

0.168 0.237 0.200

95.0 95.8 92.3

0.024 0.014 −0.096

0.228 0.339 0.256

0.243 0.343 0.282

97.3 95.0 92.3

0.010 0.032 −0.048

0.174 0.226 0.195

0.168 0.239 0.198

94.3 95.5 93.5

0.003

Fig. 1. Q–Q plots of standardized estimates versus the standard normal distribution under the simulation scenario with β10 = β20 = 1, γ0 = 1.2, and n = 400.

C.-C. Wen, Y.-H. Chen / Journal of Multivariate Analysis 143 (2016) 153–164

159

Table 2 Simulation results in the scenario where the frailty distribution is misspecified. ASE: the average of standard error estimates; CP: the coverage probability of the 95% confidence interval.

(β10 , β20 ) (0.5, 0.5) (0.5, 1.0) (1.0, 0.5) (1.0, 1.0)

Parameter

β1 β2 β1 β2 β1 β2 β1 β2

n = 200

n = 400

Bias

SD

ASE

CP

Bias

SD

ASE

CP

0.025 0.033

0.260 0.395

0.264 0.363

97.3 93.3

0.012 0.014

0.180 0.238

0.183 0.248

94.8 95.3

0.023 0.060

0.263 0.395

0.264 0.369

95.8 94.3

0.014 0.017

0.179 0.233

0.183 0.251

95.8 97.0

0.018 0.003

0.231 0.386

0.252 0.358

96.8 93.0

0.016 0.013

0.170 0.231

0.174 0.245

96.5 97.5

0.020 0.027

0.231 0.383

0.253 0.362

96.8 93.0

0.020 0.017

0.171 0.231

0.175 0.246

96.3 97.0

Table 3 Simulation results in the scenario where the model for recurrent event is misspecified. Given frailty and covariate, the recurrent event process N (t ) is generated by Binomial model with various trial sizes and fixed mean µ = η exp(β1 Z1 )Λ1 (t ). Model Bin(20, µ/20)

Bin(30, µ/30)

Bin(50, µ/50)

Parameter

β1 β2 γ β1 β2 γ β1 β2 γ

n = 200

n = 400

Bias

SD

ASE

CP

Bias

SD

ASE

CP

0.010 0.053 −0.148

0.225 0.352 0.255

0.239 0.343 0.273

95.5 96.0 90.0

0.009 0.023 −0.091

0.156 0.247 0.182

0.167 0.237 0.194

96.5 94.0 90.5

0.004 0.052 −0.141

0.227 0.336 0.264

0.239 0.343 0.274

96.0 96.5 91.0

0.004 0.029 −0.067

0.163 0.242 0.190

0.167 0.237 0.196

96.0 94.8 92.5

0.008 0.044 −0.129

0.251 0.352 0.257

0.241 0.345 0.277

95.0 95.8 89.3

0.012 0.027 −0.069

0.165 0.249 0.186

0.168 0.238 0.196

96.5 93.0 92.3

where LN (µ, σ 2 ) denotes the log-normal distribution with location parameter µ and scale parameter σ . The mean and variance of this mixture distribution is 1 and 1.36, respectively. Table 2 presents a summary of the regression parameter estimates obtained by the proposed method. These results reveal that the relative biases of the regression parameters estimates are within a negligible range (<7%), and the coverage probabilities of the confidence intervals are still close to the desired level. This suggests that the proposed method seems quite robust to the misspecification of the frailty distribution. Further, we performed simulations to examine robustness of the SPMLE to the Poisson assumption on recurrent event process. The data are generated in the same way as those for Table 1 with (β10 , β20 , γ0 ) = (1, 1, 1.2), except that the recurrent event process N (t ), conditional on (η, Z1 , Z2 ), is now generated by Binomial model with various trial sizes but fixed mean µ = η exp(β1 Z1 )Λ1 (t ) as in (1). The results are shown in Table 3. As can been seen from that table, in this setting where the true recurrent event process deviates moderately from the Poisson process, the biases of the estimates for the regression parameters (β1 , β2 ) are still reasonable, in that all the biases are much smaller than one standard error, and the relative biases are less than 6%. 6. Analysis of fracture-osteoporosis survey data The 2005 fracture-osteoporosis survey study recruited 1336 male and 1361 female residents older than 65 in Taiwan. The mean ages at survey were 74.01 and 74.08 for male and female study subjects, respectively. Histories for health outcomes, including fracture and osteoporosis, were collected for the study subjects. The mean numbers of occurrences of fracture by the survey time were 0.118 for male and 0.305 for female study subjects. On the other hand, 214 men and 477 women in the study had experienced osteoporosis by the survey time. Based on the current count data for incidences of fracture, and the current status data for age-at-onset of osteoporosis observed for each subject in this survey, we will investigate risk factors respectively for fracture and osteoporosis, and assess the strength of association between the occurrences of the two disorders, applying the joint analysis approach proposed in Section 2. In both the mean model (1) for the numbers of fracture having occurred, and the cumulative hazard model (2) for ageat-onset of osteoporosis, the covariates Z1 and Z2 are the same and include the indicator of female, coffee intake, tea intake, and alcohol intake. The covariates of coffee, tea, and alcohol intake are coded as integer values from 0 to 4 for zero and increasingly higher intake. The computation procedure introduced in Section 4 is employed for obtaining the SPMLE of the model parameter. The standard error (SE) estimate is obtained by the numerical differentiation approach, and the p-value is obtained by the asymptotic normality investigated in Section 3. The analysis results are demonstrated in Table 4 and Fig. 2. To examine how sensitive the standard error estimates to the choice of the tuning parameter, Table 4 reports standard error estimates for different values of k in ρn = kn−1/2 . It reveals that the variance estimates are less affected by the selected tuning parameters. From Table 4, we see that female is the common strong risk factor for the incidence of fracture and osteoporosis, while tea intake is a significant risk factor for fracture but

160

C.-C. Wen, Y.-H. Chen / Journal of Multivariate Analysis 143 (2016) 153–164

Table 4 Analysis results of fracture-osteoporosis data with ρn = kn−1/2 . k

Est. SE SE SE SE

0.5 1 2 5

β1 (mean numbers of fracture)

β2 (hazard of osteoporosis)

γ

Female

Coffee

Tea

Alcohol

Female

Coffee

Tea

Alcohol

0.995 0.113 0.113 0.113 0.115

−0.055

0.146 0.030 0.030 0.030 0.030

−0.010

1.043 0.104 0.104 0.105 0.106

0.082 0.050 0.050 0.049 0.048

−0.043

0.041 0.038 0.038 0.038 0.036

0.057 0.056 0.055 0.053

0.044 0.043 0.042 0.040

0.030 0.030 0.030 0.029

1.258 0.122 0.123 0.125 0.131

Fig. 2. Mean function of counts of fractures and osteoporosis-free probability for male and female, with the other covariates (coffee, tea, and alcohol intake) fixed at the sample means.

not for osteoporosis. The effects of coffee and alcohol intake on the occurrences of fracture are not statistically significant, neither are their effects on the onset of osteoporosis. The variance γ of the gamma frailty is estimated to be a relatively large value of 1.26, implying that the event time processes governing fracture and osteoporosis are highly associated. Fig. 2 shows the estimated gender-specific marginal (average over frailty) mean numbers of fracture and osteoporosis-free probability, both of which are expressed as functions of age, with the other covariates (coffee, tea, and alcohol intake) fixed at their sample means. 7. Concluding remarks There are rich studies on current status data, which is a special case (Type I) of interval censored data, for non-recurrent events [12]. There also exist many studies on panel count data, which can be viewed as interval-censored recurrent event data. However, rare attempt has been made to analysis of the two types of data simultaneously. In this work, we provide an approach to joint modeling and analysis of current count data, a special case of panel count data, and current status data. Our proposal allows for joint regression analysis of times to recurrent and non-recurrent events, and assessment of the association between the two event time processes in terms of frailty, based on type I interval censored data for the two types of events. A semiparametric maximum likelihood estimation procedure is developed under the joint modeling framework proposed. Joint analysis of two event time processes has been a recent focus in event time analysis. Huang and Wolfe [6] considered joint analysis of the failure and censoring times via a frailty model. Huang and Wang [5], Liu et al. [8], and Rondeau et al. [10] considered joint analysis for a recurring and a terminating event to address the issue of dependent censoring of the recurring

C.-C. Wen, Y.-H. Chen / Journal of Multivariate Analysis 143 (2016) 153–164

161

event by the terminating event. In these studies, data on time to either the recurrent or the non-recurrent event are only subject to right censorship. The novel joint analysis proposed in this work extends the scope of joint analysis to the setting where both the recurrent and non-recurrent event time processes are subject to type I interval censorship. The methodology developed in this work paves ways for the following future research. First, it would be of interest to extend the current method to joint analysis of general panel count data on a recurrent event, and general interval censored data on a non-recurrent event. Second, the proposed analysis is based on the assumption of the Poisson process for the recurrent event, and the assumption of the proportional hazards model for the non-recurrent event. Analysis under a more general modeling framework deserves further investigation. Acknowledgments We are grateful to the Associate Editor and referees for their every helpful suggestions, and to supports from Ministry of Science and Technology of Republic of China (MOST 103-2668-M-032-001-MY3 and NSC 101-2119-M-001-002-MY3). Appendix We use the notation Pn , P0 , and P for the expectations taken under the empirical distribution, the true underlying distribution, and a given model, respectively. For simplicity, the assumptions and the proofs for theories are presented under a simpler setting where the distribution of C is independent of (Z1 , Z2 ), although the proposed method can allow the dependent case. Assume Zj is dj -dimensional. The parameter space of θ (= (β1′ , β2′ , γ )′ ) is a compact subset Θ of Rd1 × Rd2 × (0, ∞), and the parameter space of Λ1 and Λ2 is a set L of all right-continuous non-decreasing functions that are uniformly bounded on the study period [0, τ ]. Let ζ = (θ , Λ1 , Λ2 ) and m(ζ ) = log L(ζ ). The asymptotic theories are based on the following assumptions. Note that the constraints on the parameter space and hence the condition (C3) are not really essential. They are imposed just for convenience in the proofs. (C1) The distribution of Zj , j = 1, 2, is not concentrated on any proper subspace of Rdj and has a bounded support. (C2) The examination time C has a continuous density with support [τ1 , τ2 ] where 0 < τ1 < τ2 < τ . (C3) The true parameter is ζ0 = (θ0 , Λ10 , Λ20 ) where θ0 is an interior point of its parameter space, and Λj0 is continuously differentiable and satisfies M −1 < Λj0 (τ1 ) < Λj0 (τ2 ) < M for j = 1, 2. Our proofs of the theorems are generally based on the techniques developed in Huang [4], van der Vaart [15], and Ma [9]. Specifically, after showing the identifiability of the model parameters, we apply the Wald’s theorem [15, p. 48] to prove the consistency of  ζ . Also, applying the skills established in van der Vaart and Wellner [16] and van der Vaart [15], we obtain the convergence rate of  ζ . To derive the asymptotic normality of  θ , we first obtain the efficient score for θ using techniques similar to Huang [4], and show the invertibility of the efficient Fisher information, i.e., the invertibility of the variance–covariance matrix of the efficient score [15, pp. 368–369]. Then, following the well-known empirical process theory of Donsker classes, we can readily establish the asymptotic normality of  θ. A.1. Identifiability Below we establish the identifiability of the parameters; that is, if m(ζ ) = m(ζ0 ) with probability 1, then θ = θ0 and Λj = Λj0 on [τ1 , τ2 ], j = 1, 2. Considering ∆ = 0 and N = 0 in m(ζ ) = m(ζ0 ), we have

(1 + γ eβ1 Z1 Λ1 (C ) + γ eβ2 Z2 Λ2 (C ))−γ

−1

−1

= (1 + γ0 eβ10 Z1 Λ10 (C ) + γ0 eβ20 Z2 Λ20 (C ))−γ0 .

From this display and (C1), it suffices to show that γ = γ0 and Λ1 = Λ10 on [τ1 , τ2 ] to establish the identifiability. We now prove γ = γ0 and Λ1 = Λ10 . Combining the considerations of (∆, Z1 ) = (0, 0) and (1, 0) in m(ζ ) = m(ζ0 ), we have

Γ (N + γ −1 ) N −1 γ Λ1 (C )N (1 + γ Λ1 (C ))−N −γ log − 1 Γ (γ )





 = log

Γ (N + γ0−1 ) Γ (γ0−1 )

γ

N 0 Λ10

For the both sides of (6), first take the derivatives with respect to C to obtain



dΛ1 dΛ10

 (C )

N

Λ1 (C )



1 + γN 1 + γ Λ1 ( C )



 =

N

Λ10 (C )



1 + γ0 N 1 + γ0 Λ10 (C )



,

and then take the conditional expectations, conditionally on C and Z1 = 0, to get



    Λ10 (C ) 1 + γ Λ10 (C ) Λ10 (C ) 1 + γ0 Λ10 (C ) (C ) − = − = 0. dΛ10 Λ1 (C ) 1 + γ Λ1 ( C ) Λ10 (C ) 1 + γ0 Λ10 (C ) dΛ1

This gives Λ1 = Λ10 and hence γ = γ0 by considering N = 0 and Λ1 = Λ10 in (6).

(C ) (1 + γ0 Λ10 (C )) N

−N −γ0−1

 . (6)

162

C.-C. Wen, Y.-H. Chen / Journal of Multivariate Analysis 143 (2016) 153–164

A.2. Consistency and rate of convergence Consistency. We apply Wald’s theorem [15, p. 48]. The parameter set for θ is compact and the parameter set for Λj is compact relative to the weak topology. Define w(ζ ) = log{[L(ζ ) + L(ζ0 )]/[2L(ζ0 )]}, then w(ζ ) is uniformly bounded. Also, by (C2), ζ → w(ζ )(Y) is continuous at ζ , relative to the product of the Euclidean and two weak topologies, for almost every Y = (C , N , ∆, Z1 , Z2 ). Because  ζ is the MLE, Pn w( ζ ) ≥ Pn {(1/2) log(L( ζ )/L(ζ0 )) + (1/2) log 1} ≥ 0 = Pn w(ζ0 ). On the other hand, by the concavity of g (u) ≡ log((u + 1)/2) and Jensen’s inequality, we have P0 w(ζ ) = P0 g (L(ζ )/L(ζ0 )) ≤ g (P0 [L(ζ )/L(ζ0 )]) = 0, and the equality holds only if θ = θ0 and Λj = Λj0 on (τ1 , τ2 ) from the identifiability of the P

parameters. Therefore, it follows directly from Wald’s theorem that  ζ → ζ0 under the above defined product topology, that P

j converges weakly to Λj0 in probability for j = 1, 2. Since each Λj0 is continuous on (τ1 , τ2 ), we further is,  θ → θ0 and Λ P

j (t ) → Λj0 (t ) for every τ1 < t < τ2 and j = 1, 2 by Theorem 4.3.1 of Chung [2]. have Λ Rate of convergence. Now we shall prove 1 − Λ10 ∥2 + ∥Λ 2 − Λ20 ∥2 }1/2 = OP (n−1/3 ), d∗ ( ζ , ζ0 ) ≡ {| θ − θ0 |2 + ∥Λ by applying Theorem 3.2.5 of van der Vaart and Wellner [16]. Here | · | is the Euclidean norm, ∥Λ∥2 = Λ2 (u)fC (u)du, and fC denotes the density of C . We introduce some definitions from van der Vaart [15]. Given two functions l and u, the bracket [l, u] is the set of all functions f with l ≤ f ≤ u. An ε -bracket in L2 (P ) = {f : Pf 2 < ∞} is a bracket [l, u] with P (u − l)2 < ε 2 . For a subclass C of L2 (P ), the bracketing number N[ ] (ε, C , L2 (P )) is the minimum number of ε -bracket needed to cover C . With the consistency and (C3), we restrict Λ1 and Λ2 to L0 = {Λj ∈ L|M −1 ≤ Λj (τ1 ) ≤ Λj (τ2 ) ≤ M }. Let Ψ = {m(ζ )|ζ ∈ Θ × L0 × L0 }. It is easy to see that each element in Ψ is uniformly bounded and satisfies P (m(ζ ) − m(ζ0 ))2 ≼ d∗ (ζ , ζ0 )2 , where ≼ means smaller δ than, up to a constant. Lemma A.1 gives the bracketing integral J (δ, Ψ , L2 (P )), defined as 0 (1 + log N[ ] (ε, Ψ , L2 (P )))−1/2 dε , is O(δ 1/2 ). Consequently, Lemma 19.36 of van der Vaart [15] gives



P∗

  √ δ 1/2 | n(Pn − P0 )(m(ζ ) − m(ζ0 ))| ≼ δ 1/2 1 + 2 √ , δ n d∗ (ζ ,ζ0 )<δ sup

where P ∗ is the outer expectation. By the inequality of Kullback–Leibler divergence [15, p. 62], E (m(ζ )(Y )|C , Z1 , Z2 ) is maximized at (θ0 , Λ10 (C ), Λ20 (C )). So, its first derivative is equal to zero there. Since (C , Z1 , Z2 ) has a bounded support, the parameter spaces are compact, and Λ1 and Λ2 are uniformly bounded from 0 and ∞, a Taylors’s expansion gives P0 (m(ζ ) − m(ζ0 )) ≼ −d∗ (ζ , ζ0 )2 . According to Theorem 3.2.5 of van der Vaart and Wellner [16], we complete the proof. Lemma A.1. log N[ ] (ε, Ψ , L2 (P )) = O(1/ε), where N[ ] is the bracket number. Uj

Lj

Proof. First consider the functions in Ψ for a fixed θ . Given the ε -brackets Λj ≤ Λj ≤ Λj , j = 1, 2, it is ready to get a

bracket (mL , mU ) for m(ζ ) where

 L

m ≡ log

Γ (N + γ −1 ) −1 U L U [γ eβ1 Z1 Λ11 (C )]N [(1 + γ eβ1 Z1 Λ1 1 (C ))−N −γ − (1 + γ eβ1 Z1 Λ1 1 (C ) Γ (γ −1 ) 

+ γ eβ2 Z2 Λ22 (C ))−N −γ ]∆ [(1 + γ eβ1 Z1 Λ1 1 (C ) + γ eβ2 Z2 Λ2 2 (C ))−N −γ ]1−∆ , L

 U

m

≡ log

−1

U

U

−1

Γ ( N + γ −1 ) −1 U L L [γ eβ1 Z1 Λ1 1 (C )]N [(1 + γ eβ1 Z1 Λ11 (C ))−N −γ − (1 + γ eβ1 Z1 Λ11 (C ) Γ (γ −1 ) 

+ γ eβ2 Z2 Λ2 2 (C ))−N −γ ]∆ [(1 + γ eβ1 Z1 Λ11 (C ) + γ eβ2 Z2 Λ22 (C ))−N −γ ]1−∆ . U

−1

L

L

Uj

−1

Lj

2 By the mean value theorem, we have |mL − mU |2 ≼ j=1 {(Λj − Λj ) (C )}. Thus, brackets for Λj of ∥ · ∥-size ε can translate into brackets for m(ζ ) of L2 (P )-size proportional to ε . By Example 19.11 of van der Vaart [15], we can cover the set of all Λj by exp(C /ε) brackets of size ε for some constant C . Next we allow θ to vary freely as well. Because θ is finite-dimensional and (∂/∂θ )m(ζ )(Y) is uniformly bounded in (ζ , Y), this increases the entropy only slightly. This completes the proof.

2

A.3. Efficient Fisher information Here we derive the efficient score for θ and establish the invertibility of efficient Fisher information, i.e., the invertibility of the variance–covariance matrix of the efficient score [15, pp. 368–369].

C.-C. Wen, Y.-H. Chen / Journal of Multivariate Analysis 143 (2016) 153–164

163

Efficient score. Let

 q  q A − A12 + (1 − ∆)Aq2−1 , − (1 + N γ )eβ1 Z1 ∆ 1 Λ1 (C ) A1 − A2   q A 2 ˜ 2 (ζ ) = (1 + N γ )eβ2 Z2 ∆ m − (1 − ∆)Aq2−1 , A1 − A2 ˜ 1 (ζ ) = m



N

where A1 = (1 + γ eβ1 Z1 Λ1 (C ))−N −γ , A2 = (1 + γ eβ1 Z1 Λ1 (C ) + γ eβ2 Z2 Λ2 (C ))−N −γ The scores for β1 , β2 and γ are given by −1

−1

, and q = (1 + N γ + γ )/(1 + N γ ).

˜ 2 (ζ ), mβ2 (ζ ) = Z2 m  −1 −1 ˙ ˙ ˙ 1 − A˙ 2 ˙2 Γ (γ ) Γ ( N + γ ) A A mγ (ζ ) = γ −2 − + N γ + ∆ + ( 1 − ∆ ) , Γ (γ −1 ) Γ ( N + γ −1 ) A1 − A2 A2    β Z ˙ 1 = A1 log(1 + γ eβ1 Z1 Λ1 (C )) − (1 + N γ ) γ e 1β 1ZΛ1 (C ) , and A˙ 2 = A2 log(1 + γ eβ1 Z1 Λ1 (C ) where Γ˙ (t ) = (d/dt )Γ (t ), A 1+γ e 1 1 Λ1 (C )  γ eβ1 Z1 Λ1 (C )+γ eβ2 Z2 Λ2 (C ) . The scores for Λ1 and Λ2 have the forms + γ eβ2 Z2 Λ2 (C )) − (1 + N γ ) 1+γ β Z β Z e 1 1 Λ (C )+γ e 2 2 Λ (C ) ˜ 1 (ζ ), mβ1 (ζ ) = Z1 m 

1

˜ 1 (ζ ), m1 (ζ )[h1 ] = h1 (C )m

2

˜ 2 (ζ ), m2 (ζ )[h2 ] = h2 (C )m

respectively, where h1 and h2 are functions in L2 ([τ1 , τ2 ]). Let m0 (ζ ) = (mβ1 (ζ )′ , mβ2 (ζ )′ , mγ (ζ ))′ . The efficient score for

θ is defined as

m∗ (ζ0 ) = m0 (ζ0 ) − m1 (ζ0 )[h∗1 ] − m2 (ζ0 )[h∗2 ], where h∗1 and h∗2 are d-dimensional (d = d1 + d2 + 1) vector functions satisfying P [(m0 (ζ0 ) − m1 (ζ0 )[h∗1 ] − m2 (ζ0 )[h∗2 ])(m1 (ζ0 )[h1 ] + m2 (ζ0 )[h2 ])] = 0,

(7)

for any h1 and h2 in L2 ([τ1 , τ2 ]). Throughout it should be understood that all operators involving h1 or h2 are applied componentwise to them. Consider h2 = 0 and h1 = 0 in (7), respectively, to get ∗



P (m0 (ζ0 )m1 (ζ0 )[h1 ]) = P ((m1 (ζ0 )[h∗1 ] + m2 (ζ0 )[h∗2 ])m1 (ζ0 )[h1 ]), P (m0 (ζ0 )m2 (ζ0 )[h2 ]) = P ((m1 (ζ0 )[h∗1 ] + m2 (ζ0 )[h∗2 ])m2 (ζ0 )[h2 ]), which implies that

˜ 1 (ζ0 )|C ) = h∗1 (C )E (m ˜ 1 (ζ0 )2 |C ) + h∗2 (C )E (m ˜ 1 (ζ0 )m ˜ 2 (ζ0 )|C ), E (m0 (ζ0 )m ˜ 2 (ζ0 )|C ) = h∗1 (C )E (m ˜ 1 (ζ0 )m ˜ 2 (ζ0 )|C ) + h∗2 (C )E (m ˜ 2 (ζ0 )2 |C ). E (m0 (ζ0 )m ˜ 1 ̸= m ˜ 2 ) > 0 shows the existence of (h∗1 , h∗2 ) with This together with the fact that P (m h∗1 (C ) = h∗2 (C ) =

˜ 1 (ζ0 )|C )E (m ˜ 2 (ζ0 )2 |C ) − E (m0 (ζ0 )m ˜ 2 (ζ0 )|C )E (m ˜ 1 (ζ0 )m ˜ 2 (ζ0 )|C ) E (m0 (ζ0 )m ˜ 1 (ζ0 )2 |C )E (m ˜ 2 (ζ0 )2 |C ) − E (m ˜ 1 (ζ0 )m ˜ 2 (ζ0 )|C )2 E (m ˜ 2 (ζ0 )|C )E (m ˜ 1 (ζ0 )2 |C ) − E (m0 (ζ0 )m ˜ 1 (ζ0 )|C )E (m ˜ 1 (ζ0 )m ˜ 2 (ζ0 )|C ) E (m0 (ζ0 )m ˜ 1 (ζ0 )2 |C )E (m ˜ 2 (ζ0 )2 |C ) − E (m ˜ 1 (ζ0 )m ˜ 2 (ζ0 )|C )2 E (m

, .

Invertibility of efficient Fisher information. We now prove that the efficient Fisher information I0 , defined as P0 (m∗ (ζ0 )m∗ (ζ0 )′ ), is positive definite. Let v ∈ Rd . Since v ′ I0 v = P0 (v ′ m∗ (ζ0 ))2 ≥ 0, it suffices to show that v ′ I0 v = 0 implies v = 0. Suppose v ′ I0 v = 0, then v ′ m∗ (ζ0 ) = 0 a.s. Adding up A2 (v ′ m∗ (ζ0 ))|(∆,Z1 ,Z2 )=(0,0,0) = 0 and (A1 − A2 )(v ′ m∗ (ζ0 ))|(∆,Z1 ,Z2 )=(1,0,0) = 0, we have



(1 + N γ0 )γ0 Λ10 (C ) v3 γ 0 − + N γ0 + log(1 + γ0 Λ10 (C )) − −1 −1 1 + γ0 Λ10 (C ) Γ (γ0 ) Γ (N + γ0 )   N 1 + N γ0 − v ′ h∗1 (C ) − = 0. Λ10 (C ) 1 + γ0 Λ10 (C ) −2

Γ˙ (γ0−1 )

Γ˙ (N + γ0−1 )



Write v = (v1′ , v2′ , v3 )′ and h∗1 = (h∗11 ′ , h∗12 ′ , h∗13 )′ , corresponding to (β1′ , β2′ , γ )′ . By the heterogeneity of N one concludes that v3 = 0 and v1′ h∗11 (C ) + v2′ h∗12 (C ) = 0. Hence v1 = v2 = 0 by the heterogeneity of C .

164

C.-C. Wen, Y.-H. Chen / Journal of Multivariate Analysis 143 (2016) 153–164

A.4. Asymptotic normality Define

  ∂  ∂  m (θ , Λ , Λ ), m (ζ )[ h ] = m0 (θ , Λ1 , Λ2ε ), 0 1ε 2 02 2 ∂ε ε=0 ∂ε ε=0   ∂  ∂  ˜ ˜ m11 (ζ )[h˜ 1 , h1 ] = m (θ , Λ , Λ )[ h ], m (ζ )[ h , h ] = m1 (θ , Λ1 , Λ2ε )[h˜ 1 ], 1 1ε 2 1 12 1 2 ∂ε ε=0 ∂ε ε=0   ∂  ∂  ˜ ˜ m21 (ζ )[h˜ 2 , h1 ] = m (θ , Λ , Λ )[ h ], m (ζ )[ h , h ] = m2 (θ , Λ1 , Λ2ε )[h˜ 2 ], 2 1ε 2 2 22 2 2 ∂ε ε=0 ∂ε ε=0 m01 (ζ )[h1 ] =

where (∂/∂ε)|ε=0 Λjε = hj , j = 1, 2. We first verify



1 , Λ 2 ) = oP (1). nP0 m∗ (θ0 , Λ

(8)

Apply Taylor expansion of m (θ0 , Λ1 , Λ2 )(Y) at (Λ10 (C ), Λ20 (C )), we can obtain ∗

P0 m∗ (θ0 , Λ1 , Λ2 ) = P0 m∗ (ζ0 ) + P0 m01 (ζ0 )[Λ1 − Λ10 ] + m02 (ζ0 )[Λ2 − Λ20 ] − m11 (ζ0 )[h∗1 , Λ1 − Λ10 ]



− m12 (ζ0 )[h∗1 , Λ2 − Λ20 ] − m21 (ζ0 )[h∗2 , Λ1 − Λ10 ] − m22 (ζ0 )[h∗2 , Λ2 − Λ20 ]   2  2 ∥Λj − Λj0 ∥ . + Op



j =1

Using the facts that P0 m (ζ0 ) = 0, P (m0 (ζ )mj (ζ )[hj ]) = −P (m0j (ζ )[hj ]) for j = 1, 2, P (mi (ζ )[h˜ i ]mj (ζ )[hj ]) = −P (mij (ζ ) ∗

j , we have P0 m∗ (θ0 , Λ 1 , Λ 2 ) = OP (n−2/3 ), which implies (8). [h˜ i , hj ]) for i, j = 1, 2, (7), and the rate of convergence of Λ

It is known from Example 19.11 of van der Vaart [15] that the class of uniformly bounded functions of bounded variations is a Donsker class. Applying Theorem 2.10.6 of van der Vaart and Wellner [16], it can be verified that {m∗ (ζ )|ζ ∈ Θ ×L0 ×L0 } is a uniformly bounded √ Donsker class; the proof of which is technical and hence omitted here. Combining this with the ζ )− m∗ (ζ0 )) = oP (1). Adding (8) and using the fact that P0 m∗ (ζ0 ) = Pn m∗ ( ζ ) = 0, consistency of  ζ leads to n(Pn − P0 )(m∗ ( we obtain

√ √ 1 , Λ 2 )) = nPn m∗ (ζ0 ) + oP (1). ζ ) − m∗ (θ0 , Λ − nP0 (m∗ ( By the mean value theorem, there exists θ˜ lying between  θ and θ0 such that   √ √ ∂ ∗ 1 , Λ 2 ) ( − nP0 m (θ˜ , Λ θ − θ0 ) = nPn m∗ (ζ0 ) + oP (1). ∂θ

By the consistency of  ζ and the fact that P0 [− ∂θ∂ m∗ (ζ0 )] = P0 [m∗ (ζ0 )m∗ (ζ0 )′ ](= I0 ), we have

√ d n( θ − θ0 ) = I0−1 nPn m∗ (ζ0 ) + oP (1) → N (0, I0−1 ).



References [1] [2] [3] [4] [5] [6] [7] [8] [9] [10] [11] [12] [13] [14] [15] [16] [17] [18] [19] [20]

S.C. Cheng, L.J. Wei, Inferences for a semiparametric model with panel data, Biometrika 87 (2000) 89–97. K.L. Chung, A Course in Probability Theory, Academic Press, New York, 1974. P. Groeneboom, J.A. Wellner, Nonparametric Maximum Likelihood Estimators for Interval Censoring and Deconvolution, Birkhatiser, Boston, 1992. J. Huang, Efficient estimation for the Cox model with interval censoring, Ann. Statist. 24 (1996) 540–568. C.-Y. Huang, M.-C. Wang, Joint modeling and estimation for recurrent event process and failure time data, J. Amer. Statist. Assoc. 99 (2004) 1153–1165. X. Huang, R.A. Wolfe, A frailty model for informative censoring, Biometrics 58 (2002) 510–520. K. Keiding, Age-specific incidence and prevalence:a statistical perspective, J. Roy. Statist. Soc. Ser. A 154 (1991) 371–412. L. Liu, R.A. Wolfe, X. Huang, Shared frailty models for recurrent events and a termina event, Biometrics 60 (2004) 747–756. S. Ma, Mixed case interval censored data with a cured subgroup, Statist. Sinica 17 (2010) 1165–1181. V. Rondeau, S. Mathoulin-Pelissier, H. Jacqmin-Gadda, V. Brouste, P. Soubeyran, Joint frailty models for recurring events and death using maximum penalized likelihood estimation: application on cancer events, Biostatistics 8 (2007) 708–721. W. Rudin, Functional Analysis, McGraw-Hill, New York, 1973. J. Sun, The Statistical Analysis of Interval-censored Failure Time Data, Springer-Verlag, New York, 2006. J. Sun, J.D. Kalbfleisch, Estimation o fthe mean function of point processes based on panel count data, Statist. Sinica 5 (1995) 279–290. J. Sun, X. Zhao, The Statistical Analysis of Panel Count Data, Springer, New York, 2013. A.W. van der Vaart, Asymptotic Statistics, Cambridge Univ. Press, New York, 1998. A.W. van der Vaart, J.A. Wellner, Weak Convergence and Empirical Processes, Springer-Verlag, New York, 1996. J.A. Wellner, Y. Zhan, A hybrid algorithm for computation of the semiparametric maximum likelihood estimator from censored data, J. Amer. Statist. Assoc. 92 (1997) 945–959. J.A. Wellner, Y. Zhang, Two estimators of the mean of a counting process with panel count data, Ann. Statist. 28 (2000) 779–814. J.A. Wellner, Y. Zhang, Two likelihood-based semiparametric estimation methods for panel count data with covariates, Ann. Statist. 35 (2007) 2106–2142. C.C. Wen, Y.H. Chen, Assessing age-at-onset risk factors with incomplete covariate current status data under proportional odds models, Stat. Med. 32 (2013) 2001–2012.