A hybrid approach to predicting events in clinical trials with time-to-event outcomes

A hybrid approach to predicting events in clinical trials with time-to-event outcomes

Contemporary Clinical Trials 32 (2011) 755–759 Contents lists available at ScienceDirect Contemporary Clinical Trials j o u r n a l h o m e p a g e ...

535KB Sizes 0 Downloads 11 Views

Contemporary Clinical Trials 32 (2011) 755–759

Contents lists available at ScienceDirect

Contemporary Clinical Trials j o u r n a l h o m e p a g e : w w w. e l s ev i e r. c o m / l o c a t e / c o n c l i n t r i a l

A hybrid approach to predicting events in clinical trials with time-to-event outcomes Liang Fang, Zheng Su ⁎ Genentech Inc., South San Francisco, CA 94080, USA

a r t i c l e

i n f o

Article history: Received 22 March 2011 Received in revised form 16 May 2011 Accepted 25 May 2011 Available online 30 May 2011 Keywords: Clinical trial Event prediction Parametric Non-parametric Change point

a b s t r a c t In many clinical trials with time-to-event outcomes there are interim analyses planned at prespecified event counts. It is of great value to predict when the pre-specified event milestones can be reached based on the available data as the timeline for a study is essential to the study sponsors and data monitoring committees for logistic planning purposes. Both parametric and non-parametric approaches exist in the literature for estimating the underlining survival function, based on which the predictions of future event times can be determined. The parametric approaches assume that the survival function is smooth, which is often not the case as the survival function usually has one or multiple change points and the hazard functions can differ significantly before and after a change point. The existing non-parametric method bases predictions on the Kaplan–Meier survival curve appended with a parametric tail to the largest observation, and all of the available data is used to estimate the parametric tail. This approach also requires a smooth survival function in order to achieve an accurate estimate of the tail distribution. In this article, we propose a hybrid parametric, non-parametric approach to predicting events in clinical trials with time-to-event outcomes. The change points in the survival function are first detected, and the survival function before the last change point is estimated non-parametrically and the tail distribution beyond the last change point is estimated parametrically. Numerical results show that the proposed approach provides accurate predictions for future event times and outperforms the existing approaches. © 2011 Elsevier Inc. All rights reserved.

1. Background In many clinical trials with time-to-event outcomes there are interim analyses planned at pre-specified event counts. It is of great value to predict when the pre-specified event milestones can be reached based on available data as the timeline for a study is essential to the study sponsors and data monitoring committees involved for logistic planning purposes. Both parametric and non-parametric approaches have been proposed in the literature to estimate the underlining survival function for the patient population under study, which is essential to the predictions of arrival times of future events. The parametric approaches are based on either the exponential model for the survival function as proposed by ⁎ Corresponding author. Tel.: + 1 650 467 1190; fax: + 1 650 742 8619. E-mail address: [email protected] (Z. Su). 1551-7144/$ – see front matter © 2011 Elsevier Inc. All rights reserved. doi:10.1016/j.cct.2011.05.013

Bagiella and Heitjan [1] or the Weibull distribution as in Ying and Heitjan [2]. The parametric approaches assume that the underlining survival function is smooth and can be accurately estimated by a parametric function with 1 or 2 parameters, which are often not the case as the survival functions usually have one or multiple changes points and the hazard functions can differ significantly before and after a change point. For example, for hospitalized patients with a major cardiovascular adverse event (MACE) the risk of having another MACE is of the greatest during the weeks post the first event while the patient is still in the hospital. The risk will be lower between the release from the hospital and about one year post the event, and it will be the lowest beyond one year of the first event (see e.g. Ref. [3]). As a result, the survival function for patients who just had a MACE can have multiple change points. Another example is given in Goodman et al. [4] where they identified two change points in

756

L. Fang, Z. Su / Contemporary Clinical Trials 32 (2011) 755–759

the survival function for prostate cancer mortality for men diagnosed between 1973 and 2002. Ying et al.[5] proposed a non-parametric method that bases predictions on the Kaplan–Meier [6] survival curve. This non-parametric approach does not assume a smooth survival function but the stepwise estimate for the tail part of the survival function has relatively larger variances and is thus unreliable. In addition, the Kaplan–Meier estimate does not extend beyond the largest observation available. Ying et al. considered appending a parametric tail to the largest observation that declines to zero and they proposed to use all of the available data to estimate the parametric tail. This approach does not address the issue that the tail part of the Kaplan–Meier estimate is unreliable. More importantly, as for the parametric approaches the estimation of the parametric tail is based on all of the available data and as a result will be inaccurate when the survival function is not smooth. The rest of the paper is structured as follows. In Section 2, we propose a hybrid parametric, non-parametric two-step approach to predicting events in clinical trials with time-toevent outcomes. In the first step, the change point algorithm proposed by Goodman et al.[4] for approximating the survival function with piecewise exponential distributions will be applied to the available data to identify the change points in the underlying survival function. In the second step, the survival function before the final change point will be estimated by the Kaplan–Meier estimate, and the tail of the survival function beyond the final change point will be estimated parametrically and extrapolated via the exponential model. We also generalized the change point detection approach of Goodman et al.[4] to piecewise Weibull distributions, which has the exponential distribution as a special case. This hybrid approach takes advantage of the fact that the non-parametric Kaplan–Meier estimate provides a relatively more accurate estimate for the initial part of the survival function. More importantly, the parametric estimate for the tail distribution is obtained with only the data that falls in the final piece of the survival function. When no change points are detected the proposed algorithm essentially reduces to either the parametric or non-parametric approaches available in the literature and thus the proposed approach can be viewed as a generalization of the existing approaches. Numerical results based on simulated clinical trials are given in Section 3, which show that the proposed approach can accurately predict when the pre-specified event counts can be reached and is thus useful in practice. Some discussions and concluding remarks are given in Section 4. 2. Methods

another m(b M) events can be solved from the equation below: M

∑ Pðan event between ti and ti + T j no event up to time ti Þ

i=1

M

= ∑

i=1

ð1Þ

As a result, in cases where some of the survival functions S (ti) are relatively small the estimate for T may have a large bias if the tail of the survival function S(t) cannot be accurately estimated. Similarly, if a trial is still open to enrollment and up to K additional patients may be enrolled in the study with expected arrival times a1 ≤ … ≤ aK, the expected time T can be solved from the equation below: M



i=1

K ðT Þ Sðti Þ−Sðti + T Þ + ∑ f1−SðT−ai Þg = m: Sðti Þ i=1

ð2Þ

where K(T) denotes the number of ai's that are no larger than T. Goodman et al.[4] proposed an approach to detecting multiple change points in survival functions and to approximating a survival function with piecewise exponential distributions. Let X1, …, Xn denote independent and identically distributed survival times, C1, …, Cn be the censoring times which are assumed to be independent of the survival times. Only the pairs (Ti, δi), i = 1, …, n are observed, where Ti = min(Xi, Ci) and δi = I{Ti ≤ Ci}. They proposed to use the following change point model to estimate the survival function S(t): 8 α1 > > < α2 λðt Þ = > > : ⋮ αk + 1

0≤t≤τ1 τ1 ≤t≤τ2 t N τk ;

where λ(t) is the hazard function, 0 = τ0 b τ1 b … b τk+1 = ∞ are the change points, k is the number of change points in the model that will be determined in a data-driven way, and αj is the value of the hazard function between the time points τj − 1 and τj. Goodman et al. proposed to obtain the estimates for the parameters via maximizing the following log profile likelihood function:  i k+1 h     log L α1 ; …; αk + 1 ; τ1 ; …; τk = ∑ X τj −X τj−1 logαj j=1   n k+1 − ∑ ∑ αj Ti ∧τj −Ti ∧τj−1 ; i=1

An accurate estimate for the tail of the survival function is essential to the prediction for the arrival of future events. For example, suppose that at a certain point of a clinical trial a total of N patients have been enrolled and enrollment is closed for the study. Suppose that n events have been observed at this point and there are still M patients in the study who have been followed for the time ti, i = 1, …, M and have not experienced an event yet. Given the survival function S(t) the expected time T that it takes to observe

Sðti Þ−Sðti + T Þ = m: Sðti Þ

j=1

where X(t) is the number of events observed up to time t. They showed that the maximum likelihood estimates τˆ j 's for τj's can be obtained via maximizing     0 1    o X τj −X τj−1   A; ∑ X τj −X τj−1 log @ j=1 ∑ni= 1 Ti ∧τj −τj−1 I Ti N τj−1

k+1 n

L. Fang, Z. Su / Contemporary Clinical Trials 32 (2011) 755–759

ˆ i 's for αi's are given as follows: and the estimates α     X τˆ j −X τˆ j−1 ˆj =   : α ∑ni= 1 Ti ∧ τˆ j − τˆ j−1 I Ti N τˆ j−1 ˆ j is simply the number of events observed Intuitively, α between τˆ j−1 and τˆ j divided by the total length of follow-up from all the patients between τˆ j−1 and τˆ j . Goodman et al. proposed a sequential testing procedure to determine the number of change points in the survival function. They start by testing the null hypothesis that there are no change points versus the alternative hypothesis that there is one change point. More specifically, for H0 : α1 = α2 to use a Wald type statistic versus H12: α1 ≠ α2 they proposed  ˆ 1 −α ˆ 2 = Var α ˆ 1 −α ˆ 2 which asymptotically follows a chiα square distribution with one degree of freedom as Yao [7] showed that the estimates of αj and τj are asymptotically independent. If the null hypothesis is rejected they will next test the null hypothesis that there is one change point versus the alternative hypothesis that there are two change points in the model. The procedure will continue in this fashion until a null hypothesis cannot be rejected. They proposed to use the alpha spending approach proposed by Lan and DeMets [8] to control overfitting. The advantage of fitting a piecewise parametric model instead of a single parametric model for the survival function is that a piecewise model can model the tail distribution well as it uses only the data that falls in the final piece of the survival function (i.e. only (Ti, δi) with Ti N τk) to estimate the tail distribution. A single parametric model fits the initial part of the survival function well but its prediction of the tail is usually inaccurate when the survival function is not smooth. Both parametric and non-parametric estimations of the survival functions have advantages and disadvantages. In particular, the Kaplan–Meier non-parametric estimate provides accurate fitting for the initial part of the survival function whereas a parametric estimate can flexibly model the tail of the distribution which also provides an extrapolation beyond the largest follow-up time observed. In order to borrow their strengths and avoid their weaknesses we propose to estimate the underlining survival function via the following hybrid parametric, non-parametric two-step approach: 1. Use a piecewise exponential model to detect the change points in the survival function based on the available data. 2. If at least one change point is detected, estimate the survival function in the following way: ( Sˆ ðt Þ =

Sˆ ðt ÞKM 0≤t≤τ   ˆ ðt−τÞ t N τ; Sˆ ðτÞKM exp − α

ð3Þ

where Sˆ KM denotes the Kaplan–Meier estimate of the survival function S(t), τ denotes the last change point ˆ is the corresponding hazard estimated for detected, and α the survival function beyond the final change point τ. If there are no change points detected, either the singlepiece exponential fit or the Kaplan–Meier curve appended with an exponential tail fitted using all the data can be used to estimate the survival function.

757

With the Sˆ ðt Þ obtained via this two-step approach the expected time T that is needed to achieve the desired number of events can be solved from Eqs. (1) and (2) when enrollment is closed and open, respectively. A piecewise Weibull model, which has the exponential model as a special case, can be used to estimate the underlying survival function in Step 1 of the 2-step approach. More specifically, we can generalize the piecewise exponential approach by Goodman et al. [4] to the following Weibull piecewise non-constant hazard model:

λðt Þ =

8 > > > < > > > :

β −1

α1 β1 ðα1 t Þ 1 β −1 α2 β2 ðα2 t Þ 2

0≤t≤τ1 τ1 ≤t≤τ2

⋮  αk + 1 βk + 1 αk + 1 t βk + 1 −1

t N τk ;

where αi is the scale parameter and βi is the shape parameter in the 2-parameter Weibull distribution between change points τi − 1 and τi. The log profile likelihood function can be written as k+1 h

   i   ∑ X τj −X τj−1 log αj βj

j=1

    ∑ βj −1 δi log αj Ti ∧τj −Ti ∧τj−1

k + 1

n

+ ∑

i=1

j=1

 β j ∑ αj Ti ∧τj −Ti ∧τj−1 :

n

k+1

i=1

j=1

−∑

The maximum likelihood estimates for the parameters in the Weibull model do not have closed-form solutions and will need to be calculated via some numerical optimization methods. Once the estimates are obtained the same sequential testing approach as in Goodman et al. can be applied to determine the number of change points in the model. 3. Results To investigate the performance of the proposed method, we conducted two simulation studies with 500 simulated clinical trials with time-to-event outcomes per study. In the first simulation study, each simulated clinical trial consisted of 300 patients with a 6-month enrollment period and a 12month total study duration. For each patient, the enrollment date was randomly generated from an uniform distribution, U (0, 6), assuming that the enrollment date is uniformly distributed during the 6-month enrollment period. The time-to-event variable, survival time, was randomly generated from a two-piece exponential distribution with hazard rates of 0.2 for t ≤ 4 months and 0.13 for t N 4 months. Thus, the event date of each patient was the enrollment date plus the survival time. If the event date occurred after the final study date, Month 12, the survival time was censored at Month 12 and the actual time on study was 12 minus enrollment date. As a result, 76% of the patients were expected to have an event by the end of the study. We performed simulation studies for various scenarios with multiple change points and present the results for the scenario with a single change point in this section as it is sufficient to illustrate the fundamental difference between the proposed approach and the existing ones.

758

L. Fang, Z. Su / Contemporary Clinical Trials 32 (2011) 755–759

To create an interim dataset, we cut the data at Month 7. If the event date occurred after Month 7, the survival time was censored at Month 7 and the actual time to event was 7 minus the enrollment date. As a result, 51% of the patients were expected to have an event by the time of the interim analysis. The objective of this simulation study was to predict the number of events at Month 12 using the interim data accumulated up to Month 7. We used 4 methods described in Section 2 to predict the number of events at the end of the study. In the first two methods, a Weibull model and a Kaplan–Meier curve were fitted respectively to the interim data to estimate the survival function titS(t). An exponential tail was appended to the end of the Kaplan–Meier curve to extend the estimated survival function beyond the largest observation at the interim analysis. In the third method, we estimated S(t) with a twopiece exponential model as in Goodman et al. [4]. In the fourth method, we obtained the change point estimate as described in Section 2 and then estimated S(t) as in Eq. (3). Once S(t) was estimated, the number of events at Month 12 was calculated as described in Eq. (1). The estimates of the change point and the hazard rates in the two-piece exponential model were generally unbiased (Table 1). The proposed method and the piecewise exponential method of Goodman et al. [4] performed very similarly and outperformed the other two methods in the accuracy of the predictions (Table 2). More specifically, both the proposed method and the piecewise exponential method had little bias in the prediction of the event numbers at the end of the study. Both the Weibull method and the Kaplan– Meier method tended to over predict the number of events. Considering that both of these two methods under estimated the survival function after Month 7 because of their incapability to detect the change point in the survival function, the biases in the predicted number of events were expected. In order to have a visual impression of these estimation methods the survival functions estimated from the interim data for one simulated clinical trial and the true survival function are shown in Fig. 1. In this simulated clinical trial, the tail distribution obtained from both the Weibull model and the exponential model underestimated the survival function after Month 7. We also included the survival curve as estimated from the exponential model to show that, in this simulated clinical trial, 1) there was minimal improvement in the estimation of the survival function by using a 2-parameter Weibull model over a single parameter exponential model; and 2) the interim Kaplan–Meier estimate appended with an exponential tail at the largest observation still underestimated the survival function. The proposed method and the method of Goodman et al. [4], on the other hand, provided

Table 1 Parameter estimates of the piecewise exponential model in the first simulation study.

True value Median (Q1, Q3) (5th, 95th)

τ

λ1

λ2

4.0 4.0 (3.5, 4.6) (1.0, 6.8)

0.20 0.20 (0.19, 0.21) (0.16, 0.23)

0.13 0.13 (0.11, 0.14) (0.08, 0.19)

Table 2 Percent bias (as measured by 100 × (predicted # of events − observed # of events) / observed # of events) of the event prediction in the first simulation study.

Median (Q1, Q3) (5th, 95th)

Weibull

Kaplan–Meier

Goodman et al.

Hybrid

6.2 (0.9, 5.1) (0.1, 12.8)

1.9 (0.9, 5.1) (− 4.8, 8.9)

−0.3 (− 1.5, 0.8) (− 5.9, 2.9)

−0.2 (− 1.9, 1.1) (− 6.6, 3.0)

accurate estimates of the tail distribution and overall provided estimates of the survival function that were closer to the true survival function. A major advantage of the hybrid method is that a large part of the survival function is estimated with the Kaplan– Meier method that requires no assumption of the underlying data distribution. This feature makes the proposed method more appealing compared to fully parametric methods, especially when the assumption of the data distribution is violated. We conducted a second simulation study to further evaluate the robustness of the proposed method when the underlying survival function follows a piecewise Weibull distribution. In this study, 1000 patients were enrolled in a uniform fashion within 12 months. The time-to-event variable T was randomly generated from a two-piece Weibull distribution with scale and shape parameters of 0.15 and 1.5, respectively, for the first piece (T ≤ 4 months), and 0.02 and 1, respectively, for the second piece (T N 4 months). Similar to the first simulation study, all patients without an event before the study cutoff date (Month 12) were censored at Month 12. The proposed method along with the other three methods was used to predict the number of events at Month 12 based on the interim data at Month 7. The results (Table 3) show that the proposed method was still unbiased, while the piecewise exponential method of Goodman et al. [4] had a large downward bias. This bias was largely due to the poor model fitting of the exponential model to the actual data for T ≤ 4. Thus, when the underlying distribution of the data deviated from the piecewise exponential distribution, the proposed method is still robust since a large part of the survival function is estimated with the non-parametric Kaplan–Meier method. However, fully parametric methods, such as the piecewise exponential method of Goodman et al. [4], performed poorly when the distribution assumption on the underlying data was violated. Again, both the Weibull method and the Kaplan–Meier method tended to over predict the number of events because of their incapability to detect the change point in the survival function. 4. Discussions and conclusions Clinical trials with time-to-event outcomes usually include one or more planned interim analyses, and it is common to schedule analyses at the occurrence of specified number of events which primarily determine the power of the study. It is important to both the study sponsors and the data monitoring committees involved in the study to have a good understanding of the event milestones because interim analyses can impose considerable logistical burdens which requires a significant amount of resources. In this article, a

100

100

L. Fang, Z. Su / Contemporary Clinical Trials 32 (2011) 755–759

60 20

40

Percent Survival (%)

80 60 40 0

0

20

Percent Survival (%)

interim KM with exponential tail hybrid true survival

80

interim KM with exponential tail exponential Weibull piecewise exponential true survival

759

0

2

4

6

8

10

12

0

2

Months

4

6

8

10

12

Months

Fig. 1. Estimated survival curves based on the interim data versus the true survival curve for a simulated trial.

Table 3 Percent bias (as measured by 100 × (predicted # of events − observed # of events) / observed # of events) of the event prediction in the second simulation study. Weibull

estimate for the tail distribution. This is a limitation for any approach though as an accurate prediction of future events is simply unobtainable without a good understanding of the tail of the underlying survival function.

Kaplan–Meier Goodman et al. Hybrid

Median 15.8 7.1 (Q1, Q3) (13.5, 17.5) (4.9, 9.2) (5th, 95th) (11.1, 19.8) (1.7, 11.2)

−3.8 (− 6.1, −1.5) (− 11.0, 3.6)

0.1 (− 2.2, 2.6) (− 5.9, 5.8)

hybrid parametric, non-parametric approach to predicting events in clinical trials with time-to-event outcomes is proposed, which hybridizes the advantages of the existing parametric and non-parametric approaches in the literature. The proposed approach is not limited to smooth survival functions and is particularly useful when the survival function is piecewise smooth with change points. Simulation results show that the proposed approach provides accurate predictions of future event milestones and outperforms the existing approaches when the underlying survival function is not smooth. One limitation with the proposed approach is that it requires some data beyond the final change point of the survival function to fit a parametric tail to the survival function. When an interim analysis is performed too early and the final change point in the survival function has not surfaced in the interim estimate of the survival function the proposed approach will not be able to have an accurate

Acknowledgments The authors thank the anonymous reviewers for their insightful comments and suggestions, which have led to an improved paper. References [1] Bagiella E, Heitjan DF. Predicting analysis times in randomized clinical trials. Stat Med 2001;20:2055–63. [2] Ying GS, Heitjan DF. Weibull prediction of event times in clinical trials. Pharmaceut Statist 2008;7:107–20. [3] Cannon CP, Braunwald E, McCabe CH, Rader DJ, Rouleau JL, Belder R, et al. Intensive versus moderate lipid lowering with statins after acute coronary syndromes. N Engl J Med 2004;350:1495–504. [4] Goodman MS, Li Y, Tiwari RC. Detecting multiple change points in piecewise constant hazard functions. J Appl Stat 2011, in press. [5] Ying GS, Heitjan DF, Chen TT. Nonparametric prediction of event times in randomized clinical trials. Clin Trials 2004;1:352–61. [6] Kaplan EL, Meier P. Nonparametric estimation from incomplete observations. J Amer Statist Assn 1958;53:457–81. [7] Yao YC. Maximum likelihood estimation in hazard rate models with a change-point. Commun Stat-Theor M 1986;15:2455–66. [8] Lan KKG, DeMets DL. Discrete sequential boundaries for clinical trials. Biometrika 1983;70:659–63.