Computational Statistics and Data Analysis xx (xxxx) xxx–xxx
Contents lists available at ScienceDirect
Computational Statistics and Data Analysis journal homepage: www.elsevier.com/locate/csda
Competing risk model with bivariate random effects for clustered survival data Xin Lai a,b , Kelvin K.W. Yau c,∗ , Liu Liu d a
School of Management, Xi’an Jiaotong University, China
b
JC School of Public Health & Primary Care and Shenzhen Research Institute, Chinese University of Hong Kong, China
c
Department of Management Sciences, City University of Hong Kong, Hong Kong
d
School of Mathematics and V.C. & V.R. Key Lab, Sichuan Normal University, China
article
info
Article history: Received 19 November 2015 Received in revised form 4 January 2017 Accepted 8 March 2017 Available online xxxx Keywords: Bivariate random effects Competing risk Cause-specific hazard model GLMM
abstract Competing risks are often observed in clinical trial studies. As exemplified in two data sets, the bone marrow transplantation study for leukaemia patients and the primary biliary cirrhosis study, patients could experience two competing events which may be correlated due to shared unobservable factors within the same cluster. With the presence of random hospital/cluster effects, a cause-specific hazard model with bivariate random effects is proposed to analyse clustered survival data with two competing events. This model extends earlier work by allowing random effects in two hazard function parts to follow a bivariate normal distribution, which gives a generalized model with a correlation parameter governing the relationship between two events due to the hospital/cluster effects. By adopting the GLMM formulation, random effects are incorporated in the model via the linear predictor terms. Estimation of parameters is achieved via an iterative algorithm. A simulation study is conducted to assess the performance of the estimators, under the proposed numerical estimation scheme. Application to the two sets of data illustrates the usefulness of the proposed model. © 2017 Elsevier B.V. All rights reserved.
1. Introduction Competing risks arise when individuals are exposed to a number of failure causes and the occurrence of one type of event removes other type of events from being observed; see for example in Kalbfleisch and Prentice (2002) and Pintilie (2006). Over the past decades, two classes of competing risk models have been intensively studied through Cox’s proportional hazards models; one is to model the cumulative incidence function which focuses on the cumulative probability of the occurrence of a given event (Fine and Gray, 1999; Kim, 2007; Ha et al., 2016), and the other is referred to as the cause-specific hazard models which analyse the effects of covariates on a particular cause of failure (Prentice et al., 1978; Andersen and Borgan, 1985; Keiding et al., 2001; Lee et al., 2014). In this paper, our interest is the modelling of cause-specific hazard for multi-centre competing risk data. Under multicentre setting, individuals within a centre are often affected by some unobservable characteristics of the centre. Thus, the observed outcomes within the same centre may be correlated due to such a shared characteristic across individuals (Lai and Yau, 2010). In many previous studies, the dependence structure among outcomes was accommodated by incorporating
∗
Correspondence to: Department of Management Sciences, City University of Hong Kong, Tat Chee Avenue, Kowloon, Hong Kong. Fax: +852 3442 0189. E-mail address:
[email protected] (K.K.W. Yau).
http://dx.doi.org/10.1016/j.csda.2017.03.011 0167-9473/© 2017 Elsevier B.V. All rights reserved.
1
2 3 4 5 6 7 8 9 10 11 12
2
X. Lai et al. / Computational Statistics and Data Analysis xx (xxxx) xxx–xxx
21
the random effects into the underlying statistical model; see Stiratelli et al. (1984) and Zeger et al. (1988). Lee et al. (2014) introduced the random effects into the cause-specific hazard model to describe the within-centre correlation, in which the proportional hazards assumption was used. From the perspective of censoring, the occurrence of competing risk could be taken as one of the censoring mechanisms (Li et al., 2008). Huang and Wolfe (2002) proposed an informative censoring frailty model to study the multi-cluster competing risks data, where the dependence between two competing events and the correlation within cluster were simultaneously modelled by normally distributed random effects. However, in these studies, the random effects between competing events are assumed either independent (Lee et al., 2014) or perfect linearly correlated (Huang and Wolfe, 2002). In practice, the random effects between competing events may share some commonality because they originate from the same cluster. This motivates our study of a competing risk model with more general correlation in random effects between competing events. In a standard competing risk model with two causes of failure, our work of this paper introduces bivariate random effects into the cause-specific hazard model, where random effects between two competing events are assumed to follow a bivariate normal distribution. As special cases, it reduces to the settings in Lee et al. (2014) and Huang and Wolfe (2002) when the correlation parameter between two competing events is fixed as zero and ±1 respectively. Through such extension, we can obtain an estimation of the correlation coefficient, which represents the association between two competing risks due to the cluster effect. In Section 2, the cause-specific competing risk model with bivariate random effects is introduced. The estimation procedure for regression and variance component parameters is outlined in Section 3. In Section 4, a simulation study is conducted to evaluate the performance of the competing risk model with bivariate random effects. In Section 5, the bone marrow transplantation and the primary biliary cirrhosis (PBC) data sets, which motivate our study, are analysed to illustrate the applicability of proposed model. Further discussions are presented in Section 6.
22
2. Cause-specific hazard model with bivariate random effects
1 2 3 4 5 6 7 8 9 10 11 12 13 14 15 16 17 18 19 20
23 24 25 26
27
28 29 30 31 32 33 34 35 36 37 38 39 40
41 42 43 44
Suppose that the observed data with censoring are collected from M hospitals (or clusters). In each hospital, we assume that there are K distinct causes of event. Let Tij∗ denote the underlying time to the first event for patient j in hospital i and let εij ∈ (1, . . . , K ) be the corresponding cause of event. For the usual right censored data, we observe Tij = min(Tij∗ , Cij ), where Cij is the censoring time. Given covariate Xij , the cause-specific hazard function for cause k at time t is defined as Pr(t ≤ Tij∗ < t + ∆t , εij = k|Tij∗ ≥ t , Xij ) . (1) λk t ; Xij = lim ∆t →0 ∆t We further define δij = I (Tij∗ ≤ Cij )εij with I (·) denote the indicator function, thus δij ∈ (0, 1, . . . , K ). When the causes of M event are known for all patients, the observed data consist of (Tij , δij , Xij , ), j = 1, . . . , ni , i = 1, . . . , M and i=1 ni = N. For simplicity, this paper considers two causes of events (k = 1 or 2). In the standard cause-specific hazard analysis (Prentice et al., 1978), the hazard function for one cause (e.g. cause 1) is assumed to be independent of another cause (e.g. cause 2). In other words, the events from cause 2 are treated as independently censored data when analysing the effect of risk factors on the hazard function for cause 1, and vice versa. However, the events in competing risk scenario are very often correlated. For example, the graft-versus-host disease (GVHD) was found to be associated with relapse or treatment-related mortality in bone marrow transplant study (Lee et al., 2014). Furthermore, the dependence structure within the same cluster may also induce the correlation between two competing events (Katsahian et al., 2006). In this study, the random effects are employed to describe the correlation of hazard functions between two competing events. Specifically, we follow the Generalized Linear Mixed Model (GLMM) method (McGilchrist, 1994) to introduce random effects into both cause-specific proportional hazards functions (Cox, 1972)
λ1 t ; Xij = λ01 (t ) exp(x′ij β1 + Ui ) λ2 t ; Xij = λ02 (t ) exp(x′ij β2 + Vi )
where xij is the vector covariate, βk , k = 1 or 2, is the corresponding factor effect on hazard function for failure event k, and Ui and Vi , i = 1, . . . , M are random effects of hospital i on event 1 and event 2 respectively which are assumed to follow the bivariate normal distribution N (0, Σ ) with
45
46 47 48 49 50 51
(2)
θ1 Σ= ρ θ1 θ2
ρ θ1 θ2 θ2
.
The positive correlation (ρ > 0) implies that the patient with high hazard of experiencing one type of event is likely to have high chance in developing another type of failure event; while the negative correlation suggests that the increase in the hazard of one event may lower the risk of another event. When ρ = −1 or +1, the cause-specific competing risk model degenerates to the model in Huang and Wolfe (2002); when ρ = 0, it degenerates to the cause-specific model (Prentice et al., 1978) with independent random effects (Lee et al., 2014). Therefore, the proposed model can flexibly handle the correlation ranging from −1 to +1.
X. Lai et al. / Computational Statistics and Data Analysis xx (xxxx) xxx–xxx
3
Let u = (U1 , . . . , UM )′ , v = (V1 , . . . , VM )′ , and a = (u′ , v ′ )′ . Vector a follows normal distribution with mean zero and covariance matrix
Ω=
θ1 IM ρ θ1 θ2 IM
ρ θ1 θ2 IM
.
θ2 IM
3
2 {λ0k (t ) exp ηkij }I (δij =k) exp{−Λ0k (t ) exp ηkij }
(3)
t
where Λ0k (t ) = 0 λ0k (s)ds denotes the cumulative baseline hazard function for cause k. We arrange the N failure/censoring times in increasing order and denote the q(k) distinct event times for cause k by t(1(k)) < t(2(k)) < · · · < t(q(k)) . Following the marginal likelihood argument in Kalbfleisch and Prentice (1973) and Kuk and Chen (1992), the log-likelihood with random effects being conditionally fixed, l1 , can be rewritten exp ηi(k)
log
k=1 i=1
s∈R(t(i(k)) )
(4)
exp ηs
where ηi(k) is the linear predictor corresponding to t(i(k)) and R(t ) = {j : Tj ≥ t } is the risk set at t −. The baseline hazard functions can be cancelled out in the conditional partial likelihood function l1 , which simplifies the estimation procedure. 3. Estimation procedure
X′ βˆ 1 β10 βˆ β20 0 2 = + I− 1 Z ′ uˆ u0 0 v0 vˆ
X
0 I = ′ Z 0
0 0 0 1 X ′ ∂ l1 /∂η1 ∂ l /∂η − 0 1 2 θ1 θ2 1 − ρ 2 u0 θ2 − v0 ρ θ1 θ2 Z′ v0 θ2 − u0 ρ θ1 θ2
0 X ′ −∂ 2 l1 /∂η1 ∂η1′ 0 −∂ 2 l1 /∂η2 ∂η1′ Z′
0 1 0 + θ1 θ2 1 − ρ 2 0 0
(5)
0 0 0 0 0 0 0 0
−∂ 2 l1 /∂η1 ∂η2′ −∂ 2 l1 /∂η2 ∂η2′
X 0
0 0
0 0
0
θ2 IM −ρ θ1 θ2 IM
0
0 X
Z 0 0 0
0 Z
I
=
T11 T21 T31
11
12
13 14
16 17 18 19 20 21
22
23 24
25
. −ρ θ1 θ2 IM θ1 IM
26
Furthermore, following the derivations in McGilchrist (1993), simplified version of the above derivatives ∂ l1 /∂η1 , ∂ l1 /∂η2 , ∂ 2 l1 /∂η1 ∂η1′ and ∂ 2 l1 /∂η2 ∂η2′ can be obtained; details can be found in Appendix A. (ii) I−1 is partitioned conformably to β1 |β2 | a with a = (u′ , v ′ )′
9 10
−1
8
where β10 , β20 , u0 , v0 are initial values of β1 , β2 , u, v , respectively, and are replaced by their updated estimates in each iteration, for given values of θ1 , θ2 and ρ . I is the negative second derivative of l with respect to β1 , β2 , u and v . In specific, ′
7
15
To estimate the effects of risk factors on each cause-specific hazard function (βk ), the conditionally fixed random cluster effects (u, v) and the variance component parameters (θ1 , θ2 , ρ), the estimation procedure can be performed under GLMM framework (McGilchrist, 1994; McGilchrist and Yau, 1995; Yau and Kuk, 2002; Yu and Yau, 2012), which involves an iterative numerical procedure: (i) For given values of θ1 , θ2 and ρ , the estimates of β1 , β2 , u and v can be obtained by maximizing the log-likelihood l = l1 + l2 in (3) and (4), which is performed via Newton–Raphson iteration
5
6
1 l2 = − (M log 2π |Ω | + a′ Ω −1 a), 2
q(k) 2
4
i,j
k=1
l1 =
2
Define linear predictors η1ij = x′ij β1 + Ui and η2ij = x′ij β2 + Vi . Then the BLUP type log-likelihood can be written as the sum of following two terms l1 = log
1
T12 T22 T32
T13 T23 T33
27 28 29
(6)
30
4
1 2 3
X. Lai et al. / Computational Statistics and Data Analysis xx (xxxx) xxx–xxx
where the corresponding block diagonal matrices T11 and T22 provide the asymptotic variance for βˆ 1 and βˆ 2 respectively. Then following the results in Lai and Yau (2008) for the GLMM method, given aˆ = (ˆu′ , vˆ ′ )′ and the block diagonal matrix T33 , the REML estimates of θ1 ,θ2 and ρ can be obtained by solving the following estimating equations:
4
tr Ω −1
∂Ω ∂ Ω −1 + tr (T33 + aˆ aˆ ′ ) =0 ∂θ1 ∂θ1
(7)
5
tr Ω −1
∂Ω ∂ Ω −1 + tr (T33 + aˆ aˆ ′ ) =0 ∂θ2 ∂θ2
(8)
6
tr Ω −1
∂Ω ∂ Ω −1 + tr (T33 + aˆ aˆ ′ ) = 0. ∂ρ ∂ρ
(9)
7 8 9
10 11 12 13 14 15
Further simplification of REML estimators for the variance component parameters and their asymptotic standard errors can be found in Appendix B. (iii) Repeat steps (i) and (ii) until convergence criteria are fulfilled. Remark 1. If there are k > 2 competing events, the multivariate normal distribution with dimension k can be used to specify the correlation among those competing events. Above procedure could be adapted to estimate the parameters in the model with multivariate normal random effects. Specifically, the algorithm for regression parameters is the same as that in bivariate random effect model. The REML estimates of the variance component parameters could be obtained by equations analogous to (7)–(9). The analytic solutions similar to (B4–B6) may not be obtainable and the Newton–Raphson algorithm can be employed to obtain the estimates.
17
Remark 2. The estimation procedure is essentially a likelihood-based method. The fixed effects (βs ) and random effects (uv) are estimated by maximizing the BLUP type log-likelihood function l = l1 + l2 in which random effects are conditionally
18
fixed. It was shown that difference between βˆ from l and βˆ ml from marginal likelihood lm = log
16
19 20 21 22 23 24 25
is Op (N −1 ) and thus βˆ is a consistent estimator of β with βˆ − β = Op N − 2
1
27
4. Simulation
30 31 32 33 34 35
37 38 39 40 41 42 43 44 45 46 47 48
1
1 ′
(Lee and Nelder, 1996). For frailty model, the
(Murphy and van der Vaart, 2000).
A simulation study is conducted to assess the performance of the cause-specific competing risk model with bivariate random effects. Data are simulated under a multi-centre clinical trial setting where the random effects on two competing hazard functions for each patient are correlated. In the simulation, we assume 20 clinics and 10 patients in each clinic. Five patients are randomly assigned to treatment group (xij = 1) and the other five patients are in the control group (xij = 0), where i = 1, . . . , 20; j = 1, . . . , 10. For each patient in clinic i, we generate the random effects Ui and Vi from a bivariate normal distribution with zero mean, where θ1 = θ2 = 1.0, with correlation ρ . In this simulation, we consider two competing events, i.e. k = 1 or 2. For patients who experienced failure event k, the failure time is generated from the probability density function λ0k (t ) exp ηkij S0k (t )exp ηkij , where the baseline hazard λ0k (t ) is assumed to follow the exponential distribution with t
36
M
l1 (2π )− 2 |Ω |− 2 e− 2 a Ω a da
1996). Carrying the asymptotic property of profile likelihood estimator, the derived estimator θˆ1 , θˆ2 , ρˆ from lP is consistent estimator with Op N − 2
29
ˆ (Ha et al., 2001). As consistency of βˆ also holds and the upper left-hand corner of I−1 is the consistent estimator of var(β) pointed out by Lee and Nelder (1996), variance components (θ , θ , ρ) can be estimated by using the profile log-likelihood 1 2 function lP = l + 21 log 2π I−1 |β=β, ˆ a=ˆa . It can be shown that the first-order derivative of lP with respect to θ1 , θ2 , ρ gives the estimating equations (7)–(9) (Ha et al., 2001). Notice that lP is the first-order Laplace approximation of marginal ˆ a = aˆ and the difference between l + 1 log 2π I−1 and lm is Op (N −1 ) (Lee and Nelder, likelihood lm evaluated at β = β, 2
26
28
1
mean equal to 100 for event 1 and mean equal to 500 for event 2, and S0k (t ) = e− 0 λ0k (s)ds is the baseline survival function. Furthermore, we set β1 = −1.4, β2 = 1.2 with censoring time C generated from the uniform distribution on (50, 500). The correlation ρ is chosen to be −1, −0.8, −0.6, −0.4, −0.2, 0, 0.2, 0.4, 0.6, 0.8 and 1, in order to evaluate the performance of the estimators with different correlation levels. With a specified ρ , 1000 replications are simulated in each setting. Results of the simulation are presented in Table 1. We report the average bias, SE and SD, where SE and SD are defined as the average of the standard error of the estimates and the standard error of the estimates over 1000 simulations respectively. From this table, it is observed that the estimates of regression parameters in two hazard function parts perform reasonably well under different bivariate correlation settings, considering that the estimation is obtained based on the semi-parametric method. For the variance component parameter estimates, the bias in estimation of θ1 , θ2 and ρ is also relatively small. In general, results confirm the applicability of the proposed cause-specific hazard model with correlated random effects. Comparing SE and SD, except for ρ = 1 and −1, the accuracy of the estimated standard errors are generally acceptable, though slightly underestimated in some cases. At the two boundaries of correlation (ρ = 1 and −1), overestimation of the standard error is observed. In particular, the sample standard errors SD based on 1000 correlation estimates are very small,
X. Lai et al. / Computational Statistics and Data Analysis xx (xxxx) xxx–xxx
5
Table 1 Bias and standard error of estimators for cause-specific hazard model with bivariate random effects. Parameter
True value
Average Bias
SE
SD
−1.4
0.003 0.045 0.049 0.045 0.057
0.257 0.312 0.418 0.438 0.117
0.255 0.325 0.466 0.489 0.047
0.001 0.011 0.001 −0.002 0.001
0.255 0.301 0.402 0.423 0.168
0.248 0.313 0.405 0.463 0.142
0.015 0.017 0.016 −0.020 0.013
0.254 0.298 0.408 0.418 0.216
0.265 0.316 0.443 0.425 0.213
0.011 0.028 −0.009 −0.012 0.034
0.252 0.293 0.401 0.422 0.254
0.249 0.308 0.419 0.448 0.254
−0.033 0.024 −0.010 −0.010 0.042
0.254 0.285 0.403 0.423 0.277
0.254 0.308 0.473 0.432 0.312
0.008 −0.005 −0.027 −0.039 0.041
0.247 0.282 0.397 0.415 0.291
0.252 0.273 0.424 0.462 0.329
0.011 0.010 −0.044 −0.071 0.035
0.247 0.279 0.392 0.406 0.290
0.251 0.285 0.415 0.427 0.321
0.018 0.009 −0.062 −0.066 0.058
0.246 0.273 0.387 0.407 0.274
0.240 0.287 0.417 0.411 0.301
−0.000 −0.011 −0.066 −0.079 0.036
0.244 0.268 0.387 0.401 0.241
0.249 0.271 0.416 0.433 0.269
−0.004 0.029 −0.015 −0.004 0.058
0.240 0.269 0.402 0.427 0.194
0.233 0.272 0.437 0.484 0.205
Simulation 1
β1 β2 θ1 θ2 ρ
1.2 1.0 1.0 1.0
Simulation 2
β1 β2 θ1 θ2 ρ
−1.4 1.2 1.0 1.0 0.8
Simulation 3
β1 β2 θ1 θ2 ρ
−1.4 1.2 1.0 1.0 0.6
Simulation 4
β1 β2 θ1 θ2 ρ
−1.4 1.2 1.0 1.0 0.4
Simulation 5
β1 β2 θ1 θ2 ρ
−1.4 1.2 1.0 1.0 0.2
Simulation 6
β1 β2 θ1 θ2 ρ
−1.4 1.2 1.0 1.0 0.0
Simulation 7
β1 β2 θ1 θ2 ρ
−1.4 1.2 1.0 1.0 −0.2
Simulation 8
β1 β2 θ1 θ2 ρ
−1.4 1.2 1.0 1.0 −0.4
Simulation 9
β1 β2 θ1 θ2 ρ
−1.4 1.2 1.0 1.0 −0.6
Simulation 10
β1 β2 θ1 θ2 ρ
−1.4 1.2 1.0 1.0 −0.8
(continued on next page)
6
X. Lai et al. / Computational Statistics and Data Analysis xx (xxxx) xxx–xxx
Table 1 (continued) Parameter
True value
Average Bias
SE
SD
−1.4
−0.014 −0.001
0.242 0.259 0.416 0.430 0.138
0.248 0.264 0.463 0.485 0.077
Simulation 11
β1 β2 θ1 θ2 ρ
1.2 1.0 1.0 −1.0
0.025 0.023 0.078
4
which may be due to the limited one-way fluctuation on correlation estimates at boundaries (i.e., the correlation estimates can only be larger than −1 or smaller than 1) but the estimation method for standard error still assume two-way variation. Therefore, the bootstrap estimates for the standard error of correlation parameter could be used when ρˆ is close to the boundaries.
5
5. Numerical examples
6
5.1. Bone marrow transplantation for leukaemia patients
1 2 3
42
For leukaemia patients, bone marrow transplantation is one of the most commonly used treatments. A multi-centre study of leukaemia patients receiving allogeneic bone marrow transplantation was conducted at four hospitals in Australia and the United States, from March 1, 1984 to June 30, 1989. The recovery following transplantation is quite a complex process since the graft-versus-host disease (GVHD) may develop during this period (Klein and Moeschberger, 1997). Of 137 patients with acute myelocytic leukaemia (AML) or acute lymphoblastic leukaemia (ALL) enrolled in the study, at one of the four hospitals, 82 patients died or relapsed. Hence death/relapse becomes the competing cause of GVHD (Zhou et al., 2012). In this study, several potential risk factors were recorded, such as AML low-risk, AML high-risk, recipient and donor sex, recipient and donor cytomegalovirus immune status (CMV), recipient and donor age, waiting time from diagnosis to transplantation, and, for AML patients, their French–American–British (FAB) classification based on standard morphological criteria. In addition, the methotrexate (MTX), a commonly used prophylaxis of GVHD, was given to a part of patients, in order to alleviate the risk of GVHD. Details of this study can be found in Copelan et al. (1991). The survival data within the same hospital may be correlated due to unobservable shared commonality among patients, thus the cluster effect in the competing risk analysis cannot be ignored. We apply the cause-specific hazard model with bivariate random effects to this Bone Marrow Transplantation data to investigate the effects of risk factors on the hazard of two competing events: Death/relapse and GVHD. Covariates AML lowrisk, AML high-risk, donor’s age, donor’s sex, donor’s CMV status, FAB and MTX are considered in both hazard functions of death/relapse and GVHD. As a comparison, we also fit the standard cause-specific hazard model (Prentice et al., 1978) to the bone marrow transplantation data, which is equivalent to the standard Cox regression models for death/relapse and GVHD respectively. Results from Table 2.1 showed that patients with low risk AML had significantly lower risk of death/relapse than other patients in both bivariate random effects model (p-value = 0.033) and standard cause-specific hazard model (p-value = 0.011), in which the hazard ratios were 0.372 (exp{−0.988}) and 0.314 (exp{−1.157}) respectively. The MTX was found to have a significant side effect in raising the risk of death/relapse in standard cause-specific hazard (p-value = 0.024), and patients receiving MTX were expected to experience a hazard of more than two times (exp{0.745}) as those who did not receive the prophylaxis of GVHD. However, such effect became insignificant when bivariate random effects were introduced into the model (p-value = 0.240), suggesting that the dependence between competing events could be an important confounder in analysing the hazard of death/relapse. For the hazard of GVHD, low risk AML was identified as significant risk factor with hazard ratio equal to 0.474 (exp{−0.747}) in proposed model and 0.461 (exp{−0.775}) in standard model. In addition, patients with older or female donors showed higher risk of developing GVHD in both models, which are consistent with the empirical findings in Kollman et al. (2001). The estimates of variance components suggested that the hospital had a moderate effect on the hazard of death/relapse (θˆ1 = 0.444) and had almost no effect on the hazard of GVHD (θˆ2 = 0.010). The high correlation was observed (ρˆ = 0.933) and the standard error estimation from bootstrap method suggested that the correlation between two sets of random effects is reasonable in modelling the dependence of competing events. The positive correlation coefficient implying that patients who have high risks of death/relapse in the same hospital will also face high risks of developing GVHD.
43
5.2. Multi-centre primary biliary cirrhosis data
7 8 9 10 11 12 13 14 15 16 17 18 19 20 21 22 23 24 25 26 27 28 29 30 31 32 33 34 35 36 37 38 39 40 41
44 45 46
Primary biliary cirrhosis (PBC) is a chronic cholestatic liver disease with a probable autoimmune aetiology. One purpose of medical treatment in PBC is to slow progression of the disease to cirrhosis with its attendant complications thereby improving patient survival or at least prolonging the time to liver transplantation (Prince and Jones, 2000). A multi-centre
X. Lai et al. / Computational Statistics and Data Analysis xx (xxxx) xxx–xxx
7
Table 2.1 Parameter estimates (standard error) for the bone marrow transplantation study data. Parameter
Death/Relapse
GVHD
* ** a
Low risk AML High risk AML Donor’s age Donor’s sex Donor’s CMV FAB MTX
Proposed model
Standard model
Estimate
(Standard Error)
Estimate
(Standard Error)
−0.988
(0.464)* (0.463) (0.017) (0.339) (0.315) (0.374) (0.756)
−1.157
(0.453)* (0.453) (0.017) (0.338) (0.319) (0.361) (0.332)*
0.076 −0.001 0.118 −0.209 0.436 0.888
Low risk AML High risk AML Donor’s age Donor’s sex Donor’s CMV FAB MTX
0.028 −0.578 −0.025 0.057 −0.279
(0.312)* (0.359) (0.014)* (0.248)* (0.247) (0.307) (0.345)
θ1 θ2 ρ
0.444 0.031 0.933
(0.572) (0.122) (0.506)a
−0.747 −0.540
0.002 −0.005 0.151 −0.213 0.509 0.745
−0.775 −0.547 0.027 −0.582 −0.036 0.066 −0.334
(0.310)* (0.358) (0.014)** (0.248)* (0.248) (0.305) (0.292)
p-value < 0.05. p-value < 0.1. Standard error obtained from 1000 bootstrap samples.
randomized clinical trial was conducted in six European hospitals with follow up for six years which aimed to investigate the effect of Cyclosporin A (CyA) on patient survival (Lombard et al., 1993). Between 1 Jan. 1983 and 1 Jan. 1987, three hundred forty-nine patients with PBC were randomized to receive CyA (176 patients) or placebo (173 patients). A number of clinical, biochemical and histological variables, including serum bilirubin, serum albumin, sex, age were recorded for each patient. At the end of study, 61 patients died, another 29 received liver transplanted and 4 patients were lost to follow-up before study end. The purpose of the trial was to study the effect of treatment on the survival time which was measured in days. However, during the course of the trial an increased use of liver transplantation for patients with this disease made the investigators redefine the primary endpoint as death or liver transplantation (Andersen and Skovgaard, 2010). Therefore, the competing risk model becomes a more reasonable model to analyse the treatment effect on the two outcomes of the competing events. Furthermore, when taking hospitals as random clusters, it is natural to include random effects into the cause-specific hazard model. Similar to the settings described in Section 5.1, we fit the two survival models to the PBC data to investigate the treatment effect on the hazard of two competing events: Death and Liver Transplant. Because the liver function, which was indirectly measured via biochemical markers such as bilirubin (BILI), albumin (ALB), alkaline phosphatase (ALKPH) and aspartate transaminase (AST), could influence the effect of treatment and was observed to have imbalanced distribution between treatment group (TRT = 1) and placebo group (TRT = 0). The analytic results in Andersen and Skovgaard (2010) showed that ALB and BILI also had significant effect on patient’s survival. Following the previous setting (Andersen and Skovgaard, 2010), we put TRT, ALB, the logarithm of BILI (LOG-BILI) and the confounder age in both cause-specific hazard models. After removing the 6 missing data in ALB, the sample size in the analysis is 343. In particular, 255 patients did not experience any of the two competing events, i.e., 74.3% are censored; 60 patients died (17.5%) and 28 patients received liver transplant (8.2%). From the results in Table 2.2, it was observed that the treatment Cyclosporin A can significantly reduce the hazard of death by 36.6% (1 − exp{−0.456}) when hazard of liver transplant was assumed to be independent of death (standard cause-specific hazard model). After introducing the correlation between these two competing events into the cause-specific hazard model, the hazard reduction of treatment was adjusted to 35.9% (1 − exp{−0.445}) with marginal significance (p-value = 0.097). For the event of liver transplant, patients received Cyclosporin A had significantly lower hazard (p-value = 0.079) than those received placebo and the hazard ratio between these two treatment groups was 0.482 (exp{−0.730}) when correlation between competing events was considered. But in the standard cause-specific hazard model, the treatment Cyclosporin A did not show such significant effect. The estimates of variance components suggested that the hospital had almost no effect on the hazard of death (θˆ1 = 0.010) and slight effect on the hazard of liver transplant (θˆ2 = 0.180). The positive correlation (ρˆ = 0.808) in bivariate random hospital effects indicated that the patient who has high hazard of liver transplant will correspondingly have high hazard of death. For the hazard of death, both bivariate random effects model and standard model provided similar results of effect size estimation, which may be due to the small cluster effects, implying that the proposed model can provide generally consistent results when the introduced clustered effects are practically small.
1 2 3 4 5 6 7 8 9 10 11 12 13 14 15 16 17 18 19 20 21 22 23 24 25 26 27 28 29 30 31 32 33 34 35
8
X. Lai et al. / Computational Statistics and Data Analysis xx (xxxx) xxx–xxx
Table 2.2 Parameter estimates (standard error) for the multi-centre primary biliary cirrhosis study data. Parameter
Death
Transplant
* ** a
1
Proposed model
Standard model
Estimate
(Standard Error)
Estimate
(Standard Error)
TRT AGE ALB log (BILI)
−0.445
(0.269)** (0.016)* (0.029)* (0.137)*
−0.456
(0.268)* (0.016)* (0.029)* (0.134)*
TRT AGE ALB log (BILI)
−0.730 −0.048 −0.086
−0.656 −0.048 −0.096
1.301
(0.416)** (0.022)* (0.039)* (0.231)*
θ1 θ2 ρ
0.010 0.180 0.808
(0.078) (0.324) (0.415)a
0.084 −0.069 1.028
0.082 −0.071 1.012
1.204
(0.410) (0.022)* (0.039)* (0.211)*
p-value < 0.05. p-value < 0.1. Standard error obtained from 1000 bootstrap samples.
6. Discussion
36
In this study, we have introduced the bivariate random effects into the cause-specific hazard model for accommodating the correlation between two competing events as well as the dependence among individuals within the same cluster. The proposed cause-specific hazard model is particularly useful since (i) competing risks are commonly present in clinical trials and medical studies, (ii) these studies are usually conducted in multi-centre scheme which naturally induce the random cluster effects and (iii) the induced random effects between competing events are likely to be correlated. Comparing with the fixed correlation assumption model as in Huang and Wolfe (2002), the proposed model is more flexible in the sense that the correlation between competing events could vary from −1 to 1 and can be estimated by using the GLMM method (McGilchrist, 1993, 1994; McGilchrist and Yau, 1995). The simulation study in Section 4 shows that the average biases in regression parameters and variance components parameters are generally small and the estimated standard errors are close to the standard error of the estimates over simulation, indicating that the estimation procedure can provide accurate estimates for effect size of risk factors and the estimates for corresponding standard errors. From the simulation results, a notable overestimation for the standard error of the estimate of correlation coefficient is observed when true correlation is equal to 1 or −1, which may be due to the fact that the practical correlation estimates at boundaries can only vary in one direction but the estimation method for standard error still assume two-way variation. Therefore, other methods such as bootstrap should be used to obtain an accurate estimate of standard error for correlation coefficient, especially when there is a perfect linear correlation between the random effects in two competing events. Application to two datasets in Section 5 demonstrates the applicability of the proposed model. The introduction of the extra correlation parameter between competing events offers further flexibility in conducting data analysis for causespecific model with correlated random effects. In particular, the two applications reveal positive correlation between the two sets of random effects in the proposed model. The application results for data showed that covariate effects could be interpreted differently when correlation between two sets of random cluster effects is non-negligible. In the bone marrow transplant study, MTX was found having significant effect on increasing hazard of death/relapse when standard causespecific hazard model was adopted, but such effect became insignificant when correlation between two competing events was introduced. In the primary biliary cirrhosis study, the treatment effect was not significant in the standard model, but such treatment effect turned to be marginally significant on reducing the hazard of transplant when correlated hospital random effects were introduced to accommodate the dependence between competing events. When the cluster effects are small, both bivariate random effects model and standard cause-specific hazard model give similar results. In case the cluster effects are not ignorable, the proposed model and estimation procedure offer further flexibility to handle cluster random effects in the model. When there are more than two competing events, the correlation between those competing events could be specified by random effects with multivariate normal distribution. The estimation procedure in Section 3 could be adapted to estimate the parameters in the competing risk model with general multivariate normal random effects. The simplified analytic solutions similar to (B4–B6) may not be obtainable and the Newton–Raphson algorithm can be applied to obtain the estimates, but it will inevitably increase computational burden when bootstrap method is used to find the standard error of correlation estimates.
37
Acknowledgements
2 3 4 5 6 7 8 9 10 11 12 13 14 15 16 17 18 19 20 21 22 23 24 25 26 27 28 29 30 31 32 33 34 35
38 39 40
The authors would like to thank the Associate Editor and referees for very helpful comments. This research is supported in part by the National Natural Science Foundation of China (Grant # 81201817) and the Research Grants Council of Hong Kong (Grant # T32-102/14N).
X. Lai et al. / Computational Statistics and Data Analysis xx (xxxx) xxx–xxx
9
Appendix A. Supplementary data Supplementary material related to this article can be found online at http://dx.doi.org/10.1016/j.csda.2017.03.011. References Andersen, P.K., Borgan, Ø, 1985. Counting process models for life history data: A review (C/R: P141-158). Scand. J. Stat. 12, 97–140. Andersen, P.K., Skovgaard, L.T., 2010. Regression with Linear Predictors. Springer, New York, USA. Copelan, E.A., Biggs, J.C., Thompson, J.M., et al., 1991. Treatment for acute myelocytic leukaemia with allogeneic bone marrow transplantation following preparation with Bu/Cy2. Blood 78, 838–843. Cox, D.R., 1972. Regression models and life tables (with discussion). J. R. Stat. Soc. Ser. B 34, 187–220. Fine, J.P., Gray, R.J., 1999. A proportional hazards model for the subdistribution of a competing risk. J. Amer. Statist. Assoc. 94, 496–509. Ha, I.D., Christian, N.J., Jeong, J.H., Park, J., Lee, Y., 2016. Analysis of clustered competing risks data using subdistribution hazards models with multivariate frailties. Stat. Methods Med. Res. 25, 2488–2505. Ha, I.D., Lee, Y., Song, J.K., 2001. Hierarchical likelihood approach for frailty models. Biometrika 88, 233–243. Huang, X., Wolfe, R.A., 2002. A frailty model for informative censoring. Biometrics 58, 510–520. Kalbfleisch, J.D., Prentice, R.L., 1973. Marginal likelihoods based on Cox’s regression and life model. Biometrika 60, 267–278. Kalbfleisch, J.D., Prentice, R.L., 2002. The Statistical Analysis of Failure Data, second ed. Wiley, New York, USA. Katsahian, S., Resche-Rigon, M., Chevret, S., Porcher, R., 2006. Analysing multicenter competing risks data with a mixed proportional hazards model for the subdistribution. Stat. Med. 25, 4267–4278. Keiding, N., Klein, J.P., Horowitz, M.M., 2001. Multi-state models and outcome prediction in bone marrow transplantation. Stat. Med. 20, 1871–1885. Kim, H.T., 2007. Cumulative incidence in competing risks data and competing risks regression analysis. Clin. Cancer Res. 13, 559–565. Klein, J.P., Moeschberger, M.L., 1997. Survival Analysis: Techniques for Censored and Truncated Data. Springer, New York, USA. Kollman, C., Howe, C.W., Anasetti, C., et al., 2001. Donor characteristics as risk factors in recipients after transplantation of bone marrow from unrelated donors: the effect of donor age. Blood 98, 2043–2051. Kuk, A.Y.C., Chen, C.H., 1992. A mixture model combining logistic regression with proportional hazards regression. Biometrika 79, 531–541. Lai, X., Yau, K.K.W., 2008. Long-term survivor model with bivariate random effects: Application to bone marrow transplant and carcinoma study data. Stat. Med. 27, 5692–5708. Lai, X., Yau, K.K.W., 2010. Extending the long-term survivor mixture model with random effects for clustered survival data. Comput. Statist. Data Anal. 54, 2103–2112. Lee, M., Ha, I.D., Lee, Y., 2014. Frailty modelling for clustered competing risks data with missing cause of failure. Stat. Methods Med. Res. http://dx.doi.org/ 10.1177/0962280214545639, OnlineFirst. Lee, Y., Nelder, J.A., 1996. Hierarchical generalized linear models (with discussion). J. R. Stat. Soc. Ser. B 58, 619–678. Li, Y., Tian, L., Wei, L.J., 2008. Estimating subject-specific dependent competing risk profile with censored event time observations. Biometrics 64, 1–18. Lombard, M., Portmann, B., Neuberger, J., et al., 1993. Cyclosporin a treatment in primary biliary cirrhosis: results of a long-term placebo controlled trial. Gastroenterology 104, 519–526. McGilchrist, C.A., 1993. REML estimation for survival models with frailty. Biometrics 49, 221–225. McGilchrist, C.A., 1994. Estimation in generalized mixed models. J. R. Stat. Soc. Ser. B 56, 61–69. McGilchrist, C.A., Yau, K.K.W., 1995. The derivation of BLUP, ML, REML estimation methods for generalized linear mixed models. Comm. Statist. Theory Methods 24, 2963–2980. Murphy, S.A., van der Vaart, A.W., 2000. On profile likelihood. J. Amer. Statist. Assoc. 95, 449–465. Pintilie, M., 2006. Competing Risks: A Practical Perspective. Wiley, Chishester, UK. Prentice, R., Kalbfleisch, J.D., Peterson, A.V., et al., 1978. The analysis of failure times in the presence of competing risks. Biometrics 34, 541–554. Prince, M.I., Jones, D.E.J., 2000. Primary biliary cirrhosis: new perspectives in diagnosis and treatment. Postgrad. Med. J. 76, 199–206. Stiratelli, R., Laird, N., Ware, J., 1984. Random effects models for serial observations with binary responses. Biometrics 40, 961–971. Zeger, S.L., Liang, K.Y., Albert, P.S., 1988. Models for longitudinal data: a generalised estimating equation approach. Biometrics 44, 1049–1060. Zhou, B., Fine, J., Latouche, A., Labopin, M., 2012. Competing risks regression for clustered data. Biostatistics 13, 371–383. Yau, K.K.W., Kuk, A.Y.C., 2002. Robust estimation in generalised linear mixed models. J. R. Stat. Soc. Ser. B 64, 101–117. Yu, D., Yau, K.K.W., 2012. Conditional Akaike information criterion for generalized linear mixed models. Comput. Statist. Data Anal. 56, 629–644.
1
2
3
4 5 6 7 8 9 10 11 12 13 14 15 16 17 18 19 20 21 22 23 24 25 26 27 28 29 30 31 32 33 34 35 36 37 38