Impact of informative censoring on the treatment effect estimate of disability worsening in multiple sclerosis clinical trials


Multiple Sclerosis and Related Disorders 39 (2020) 101865


Multiple Sclerosis and Related Disorders journal homepage: www.elsevier.com/locate/msard

Original article


Katherine Riester a,⁎, Ludwig Kappos b, Krzysztof Selmaj c, Stacy Lindborg a, Ilya Lipkovich d,1, Jacob Elkins a

a Biogen, 225 Binney St., Cambridge, MA 02142, USA
b Neurologic Clinic and Policlinic, Departments of Medicine, Clinical Research, and Biomedicine and Biomedical Engineering, University Hospital, Basel, Switzerland
c Neurology Center Lodz, Lodz, Poland
d IQVIA, Durham, NC, USA

ARTICLE INFO

Keywords: Multiple sclerosis; Informative censoring; Confirmed disability worsening; Missing data; Clinical trials; Simulation study

ABSTRACT

Objective: To examine the impact of missing data when evaluating the confirmed disability worsening (CDW) endpoint in multiple sclerosis clinical trials and explore analytical methods for handling censored participants (those with missing confirmation data). Methods: CDW risk factors were assessed among participants with an initial disability worsening (≥ 1.0-point increase in Expanded Disability Status Scale [EDSS] score from a baseline score of ≥ 1.0; ≥ 1.5-point increase from a baseline of 0) using data from the DECIDE trial of daclizumab beta. A post-hoc simulation study was performed to evaluate three strategies for imputing confirmation status in censored participants: assume all were confirmed; assume none were confirmed (standard analytical approach); or use an observed rate multiple imputation (ORMI) approach based on treatment group and similar participant risk factors. Simulation study results were used to evaluate pre-specified analyses in DECIDE. Results: In DECIDE, larger change from baseline to initial disability worsening in EDSS score (p = 0.0003), higher baseline EDSS score (p = 0.0013), age (p = 0.004), and preceding relapse (p < 0.0001) were associated with 12-week CDW. In the simulation study, relative to the full dataset (no missing data), the strategy of assuming no censored participants were confirmed underestimated the treatment effect, and the strategy of assuming all censored participants were confirmed overestimated the treatment effect (hazard ratio 0.749 and 0.713 vs 0.733). ORMI correctly estimated treatment effect and increased study power by ~5–10% compared with the standard analytical approach. Conclusion: The ORMI approach based on CDW risk factors minimizes bias and is expected to provide the most accurate treatment effect estimate for the CDW endpoint.

Abbreviations: CDW, confirmed disability worsening; CI, confidence interval; DAC BETA, daclizumab beta; EDSS, Expanded Disability Status Scale; HR, hazard ratio; IFN, interferon; MS, multiple sclerosis; OR, odds ratio; ORMI, observed rate multiple imputation; PFS, progression-free survival; RRMS, relapsing-remitting multiple sclerosis
⁎ Corresponding author. E-mail address: [email protected] (K. Riester).
1 Employee of IQVIA, Durham, NC, USA during the design and conduct of this study; current employee of Eli Lilly and Company, Indianapolis, IN, USA.
https://doi.org/10.1016/j.msard.2019.101865
Received 22 July 2019; Received in revised form 11 October 2019; Accepted 22 November 2019
2211-0348/ © 2019 Published by Elsevier B.V.

1. Introduction

Confirmed disability worsening (CDW) is a key endpoint for clinical trials of participants with relapsing-remitting multiple sclerosis (RRMS) (Gold et al., 2013; Kappos et al., 2015; Lavery et al., 2014; Polman et al., 2006). Analysis of 12-week CDW requires that an initial disability worsening be confirmed after 12 weeks (or 24 weeks for 24-week CDW) (Lavery et al., 2014). Although most participants return to the clinic for their confirmation assessment, some withdraw prior to the confirmation visit or develop disability worsening at study end. Traditionally in the analysis of CDW these participants are censored when their confirmation data are incomplete (Fox et al., 2012; Gold et al., 2012; Kappos et al., 2010; Polman et al., 2006).

Treatment effects on CDW are often evaluated using time-to-event and survival analyses. One underlying assumption of these analytical methods is that of non-informative censoring (reason for censoring is independent of the event): participants who withdraw prior to confirmation have, in principle, the same CDW risk as similar participants who remain in the study (Bland and Altman, 1998; Prinja et al., 2010). Such an assumption may be defensible when censoring is unrelated to outcome (e.g., moving to a new city); however, this assumption is highly suspect if censoring is related to outcome (e.g., disability worsening) and rates are imbalanced between treatment groups (Bland and Altman, 1998; Campigotto and Weller, 2014; Dumville et al., 2006; Shih, 2002). Informative censoring may introduce bias into the efficacy analysis and potentially mask true treatment differences (Bell et al., 2014; Campigotto and Weller, 2014). Understanding the impact of missing confirmation data is critical for properly evaluating the CDW endpoint.

Our aim was to examine the relationship between initial disability worsening and CDW, and assess the impact of missing data when evaluating disease-modifying therapies in participants with RRMS. We studied these relationships post-hoc in the context of a large, randomized clinical trial and used the results to perform simulation studies (in which the disability effect is known a priori by design) to better quantify the impact of informative censoring on CDW rate, study power, and type I error.

Fig. 1. Determination of CDW in clinical study participants with relapsing-remitting multiple sclerosis. Baseline EDSS score was assessed at study entry and at regular intervals according to the study protocol. Initial disability worsening was defined as a ≥ 1.0-point increase in EDSS score from a baseline score of ≥ 1.0 or a ≥ 1.5-point increase from a baseline score of 0. Study participants with an initial worsening of disability returned to the clinic 12 or 24 weeks later for a confirmation assessment. Disability worsening was confirmed if a participant's EDSS score remained elevated at the confirmation visit and at all intermittent visits. CDW, confirmed disability worsening; EDSS, Expanded Disability Status Scale.
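The worsening and confirmation rules in the caption above can be written as a short sketch. The function names are illustrative rather than taken from the study, the rule that confirmation cannot occur within 29 days of relapse onset is deliberately left out, and a baseline score strictly between 0 and 1.0 is treated as an error because the stated definition does not cover it.

```python
from typing import Sequence


def is_initial_worsening(baseline_edss: float, current_edss: float) -> bool:
    """Apply the EDSS thresholds from the stated definition of initial worsening."""
    increase = current_edss - baseline_edss
    if baseline_edss == 0:
        return increase >= 1.5
    if baseline_edss >= 1.0:
        return increase >= 1.0
    # Assumption: baselines between 0 and 1.0 are outside the stated definition.
    raise ValueError("baseline EDSS must be 0 or >= 1.0")


def is_confirmed(baseline_edss: float, followup_scores: Sequence[float]) -> bool:
    """True if EDSS stays at or above the worsening threshold at every
    intervening visit and at the confirmation visit (all passed in order)."""
    if not followup_scores:
        return False  # no confirmation data: status is missing, not confirmed
    return all(is_initial_worsening(baseline_edss, s) for s in followup_scores)


if __name__ == "__main__":
    print(is_initial_worsening(2.0, 3.0))   # 1.0-point rise from baseline 2.0
    print(is_confirmed(2.0, [3.0, 2.5]))    # score fell back before confirmation
```

An empty list of follow-up scores returns `False` here only for convenience; in the analyses described in this article, that case is exactly the missing-confirmation situation that the imputation strategies address.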

2. Materials and methods

2.1. Participants and study design

The randomized, double-blind, active-controlled, Phase III DECIDE study (NCT01064401) compared daclizumab beta (DAC BETA) 150 mg subcutaneous every 4 weeks with intramuscular interferon (IFN) beta-1a 30 μg once weekly in participants with RRMS who continued treatment for up to 3 years or until the last enrolled participant completed 2 years (Kappos et al., 2015). Participant eligibility, randomization, and assessments have been previously described (Kappos et al., 2015). The DECIDE study was approved by central and local ethics committees, and informed consent was obtained from all participants, as previously described (Kappos et al., 2015). Marketing authorization for DAC BETA was voluntarily withdrawn by Biogen on March 2, 2018 (Biogen, 2018).

2.2. Analysis methods

Initial disability worsening was defined as a ≥ 1.0-point increase in Expanded Disability Status Scale (EDSS) score from a baseline score of ≥ 1.0 or a ≥ 1.5-point increase from a baseline score of 0. Participants with initial disability worsening had a confirmation visit 12 or 24 weeks later; those who maintained EDSS worsening over that period were considered to have CDW (Fig. 1). All intermittent EDSS assessments (if performed) had to similarly show sustained EDSS worsening, and confirmation could not occur within 29 days of relapse onset. For participants who enrolled in the extension study of DECIDE (EXTEND, NCT01797965), the 12-week assessment in EXTEND could be used to confirm progression that began in DECIDE.

Time to initial disability worsening was analyzed using a Cox proportional hazards model adjusted for baseline EDSS (continuous variable), prior IFN beta-1a use, and baseline age (< 35 vs ≥ 35 years). The proportion of participants with 12- and 24-week CDW was estimated by the Kaplan-Meier method. The association between key covariates (initial disability worsening in the past 6 months, baseline EDSS, and treatment group) and treatment discontinuation/study withdrawal was also evaluated using a Cox proportional hazards model. Disability worsening was included in the model as a time-varying covariate.

Among participants with initial disability worsening, the association between key demographic covariates (time since diagnosis, time since disease onset, age), key disease characteristics (number of relapses in the past year, number of relapses in the past 3 years, baseline EDSS, change in EDSS from baseline to disability worsening, relapse within 29 days prior to initial disability worsening, baseline T2 lesion volume, baseline gadolinium-enhancing lesion count, prior IFN beta use), and the risk of CDW was evaluated. Candidate predictors were included in a multivariate logistic model and a backward stepwise selection procedure was used to remove variables that were not associated with the risk of CDW at the 5% significance level (Wald test). Treatment group was included in all models regardless of statistical significance.

2.3. Simulation study methods

Rates of 12- and 24-week CDW in DECIDE participants were determined according to pre-specified analyses (Kappos et al., 2015). To evaluate the impact of missing data on the DECIDE results, a post-hoc simulation study was performed to compare the common analytical approach of ignoring initial disability worsening in censored participants with an approach that accounted for the increased risk of confirmation in participants with initial disability worsening. EDSS data for 1000 participants per treatment arm were simulated at nine visits in 5000 simulated datasets. The median HR and the median proportion of participants who progressed in each group were summarized across the 5000 datasets. Initial parameters of the simulation study, including distribution of baseline EDSS score and change in EDSS score from a visit with or without worsening to next assessment, are included in Supplemental Methods and Supplemental Figures 2–4.

To generate censored data, we applied a missing-at-random dropout process in which the probability of dropping out (and being censored) at the next visit depended on whether the participant had initial disability worsening (Little and Rubin, 2002). Three dropout scenarios were evaluated, in which the probability of being censored after experiencing initial disability worsening was varied: low (Scenario A; odds of dropping out were 1.5 times higher for participants who had an initial disability worsening versus those who did not); medium (Scenario B; 3.9 times higher); and high (Scenario C; 7.4 times higher; Supplemental Methods). If a participant had disability worsening at their last available EDSS assessment but no confirmation data were available, the presence or absence of CDW was imputed using three strategies (Supplemental Fig. 1):


1. None confirmed: all participants with a missing confirmation assessment were censored at the time of initial disability worsening and assumed not to have CDW.
2. All confirmed: all participants with a missing confirmation assessment were censored at the time of initial disability worsening and assumed to have CDW.
3. ORMI approach: participants with a missing confirmation assessment were assumed to have a confirmation rate similar to that observed in the trial, based on treatment group and participants with similar risk factors for confirmation (baseline age, relapse in the past 29 days prior to disability worsening, baseline EDSS, change in EDSS from baseline to time of disability worsening).
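A minimal sketch of how the three strategies differ when filling in a missing confirmation status is given below. The function, the strategy labels, and the per-participant probabilities are hypothetical; in the ORMI approach those probabilities would come from a model of observed confirmation rates by treatment group and risk factors, which is not reproduced here.

```python
import random
from typing import List, Optional


def impute_confirmation(
    observed: List[Optional[bool]],
    strategy: str,
    confirm_prob: Optional[List[float]] = None,
    rng: Optional[random.Random] = None,
) -> List[bool]:
    """Fill in missing confirmation status (None) under one strategy.

    observed[i] is True/False when the confirmation visit took place and
    None when it is missing. confirm_prob[i] is an assumed probability of
    confirmation for participant i, used only by the ORMI-style strategy.
    """
    rng = rng or random.Random(0)
    completed = []
    for i, status in enumerate(observed):
        if status is not None:
            completed.append(status)          # observed data are never changed
        elif strategy == "none_confirmed":
            completed.append(False)           # single imputation: not confirmed
        elif strategy == "all_confirmed":
            completed.append(True)            # single imputation: confirmed
        elif strategy == "ormi":
            if confirm_prob is None:
                raise ValueError("ORMI-style imputation needs confirm_prob")
            # Bernoulli draw, so repeated calls give different completed datasets
            completed.append(rng.random() < confirm_prob[i])
        else:
            raise ValueError(f"unknown strategy: {strategy}")
    return completed


if __name__ == "__main__":
    observed = [True, False, None, None]
    probs = [0.0, 0.0, 0.4, 0.2]  # hypothetical values for illustration only
    for seed in range(3):         # three imputed datasets
        print(impute_confirmation(observed, "ormi", probs, random.Random(seed)))
```

The first two strategies always return the same completed dataset, whereas the third returns a different one for each random draw, which is what makes it a multiple-imputation approach.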

Data from the simulation study were then analyzed separately using the three imputation strategies listed above with a Cox proportional hazards model adjusted for baseline EDSS. In the simulation study, ties were handled using the Efron method. For the ORMI approach, missing confirmation status was imputed using a multiple-imputation method whereby missing data values are imputed using a set of possible values whose distribution appropriately represents the uncertainty about the missing data (instead of imputing a single value into each missing data point, as is necessarily the case for the first two approaches) (Rubin, 1987). Additional details are in Supplemental Methods.

Simulated data allow the benefits of knowing the “truth” (the known treatment effect is part of the simulation, along with other critical parameter assumptions) and understanding how the various imputation methods impact potential decision errors across a set of simulated trials. Simulated trials are critical for helping us understand the properties (operating characteristics such as type I error and power) and effects of our analysis methods, and choosing a methodology that minimizes the risk of either under- or over-estimating the treatment effect.

3. Results

A total of 1841 participants were randomized in DECIDE (DAC BETA group, n = 919; IFN beta-1a group, n = 922) (Kappos et al., 2015). Baseline characteristics are reported in Supplemental Results. In total, 43% (394) of participants in the IFN beta-1a group and 33% (307) of participants in the DAC BETA group experienced initial disability worsening during DECIDE (hazard ratio [HR; DAC BETA vs IFN beta-1a] 0.73, 95% confidence interval [CI] 0.63–0.84, p < 0.0001; Table 1).

Of the total 701 participants who had initial disability worsening, similar proportions of participants in the IFN beta-1a and DAC BETA groups (36% vs 39%, respectively) had confirmed disability worsening at 12 weeks (Table 1). However, participants in the IFN beta-1a group were more likely to miss their confirmation visit than those in the DAC BETA group (11% vs 8%, respectively). For the 24-week CDW endpoint, 25% and 26% of participants with initial disability worsening in the IFN beta-1a and DAC BETA groups, respectively, were confirmed at follow-up, while 17% and 13%, respectively, had missing confirmation assessments. The most common reasons for study discontinuation and missing confirmation assessments were lack of efficacy (34% and 27% for the 12- and 24-week CDW endpoints, respectively) or withdrawal of consent (24% and 19% for the 12- and 24-week endpoints, respectively; Table 2). Other reasons included adverse events, lost to follow-up, personal/logistical reasons, and/or pregnancy.

We next explored which risk factors were positively or negatively associated with disability confirmation. Logistic regression using backward elimination with a rejection criterion of p ≥ 0.05 was used to identify variables associated with the 12-week CDW endpoint. Among participants with initial disability worsening, the odds of having 12-week CDW were greater in participants who had larger changes in EDSS score at the time of initial disability worsening (odds ratio [OR] 1.65, 95% CI 1.25–2.17, p = 0.0003), older age (OR 1.03, 95% CI 1.01–1.05, p = 0.004), and higher baseline EDSS scores (OR 1.27, 95% CI 1.10–1.46, p = 0.0013); odds of 12-week CDW were lower in participants with a preceding relapse (OR 0.26, 95% CI 0.18–0.37, p < 0.0001; Table 3). Treatment (DAC BETA vs IFN beta-1a) was not associated with an increased risk of 12-week CDW (OR 0.88, 95% CI 0.62–1.25); treatment reduced the risk of developing initial disability worsening (Table 1) but did not change the likelihood that initial disability worsening would be confirmed. Similar trends were noted for the 24-week CDW endpoint.

The association between key covariates and treatment discontinuation/study withdrawal was also evaluated. Initial disability worsening within the past 6 months was associated with increased risk of treatment discontinuation (HR 2.87, 95% CI 2.38–3.45, p < 0.0001) and study withdrawal (HR 2.88, 95% CI 2.34–3.54, p < 0.0001; Table 4). Higher baseline EDSS score was also a risk factor (treatment discontinuation, HR 1.11, 95% CI 1.03–1.18, p = 0.0034; and study withdrawal, HR 1.17, 95% CI 1.08–1.26, p < 0.0001).

3.1. Simulation study

The simulation study compared the more commonly used analytical

Table 1
Initial disability worsening in DECIDE participants: percentage with relapse, confirmed worsening at 12 and 24 weeks, and missing assessments.

| DECIDE participants, n (%) | IFN beta-1a (n = 922) | DAC BETA (n = 919) | HR (95% CI) a | p a |
|---|---|---|---|---|
| With initial disability worsening b,c | 394 (43) | 307 (33) | 0.73 (0.63–0.84) | < 0.0001 |
| With initial disability worsening with relapse c,d | 292 (32) | 192 (21) | 0.61 (0.51–0.73) | < 0.0001 |
| With initial disability worsening without relapse c,d | 237 (26) | 209 (23) | 0.86 (0.71–1.03) | 0.0998 |
| With confirmed worsening and missing confirmation assessment | | | | |
| Confirmed at 12 weeks e | 140/394 (36) | 121/307 (39) | | |
| Missing confirmatory assessment at 12 weeks e | 43/394 (11) | 24/307 (8) | | |
| Confirmed at 24 weeks e | 99/394 (25) | 80/307 (26) | | |
| Missing confirmatory assessment at 24 weeks e | 67/394 (17) | 41/307 (13) | | |

CI, confidence interval; DAC BETA, daclizumab beta; EDSS, Expanded Disability Status Scale; HR, hazard ratio; IFN, interferon.
a Based on a Cox proportional hazards model adjusted for baseline EDSS values as a continuous variable, history of prior IFN beta use, and baseline age (≤ 35 vs > 35 years).
b Initial disability worsening was defined as a ≥ 1.0-point increase on the EDSS from a baseline score of ≥ 1.0, or a ≥ 1.5-point increase from a baseline score of 0.
c Percentage calculated based on the total number of participants by treatment group (IFN beta-1a group, n = 922; DAC BETA group, n = 919).
d n values for initial disability worsening with and without relapse may exceed the total number of participants with initial disability worsening because each participant may experience more than one increase in EDSS.
e Percentage calculated based on the number of participants with initial disability worsening by treatment group (IFN beta-1a group, n = 394; DAC BETA group, n = 307).


Table 2
Reasons for missing confirmation assessments of participants in DECIDE.

| Reason, n (%) | 12-week CDW: IFN beta-1a (n = 43) | 12-week CDW: DAC BETA (n = 24) | 12-week CDW: Total (n = 67) | 24-week CDW: IFN beta-1a (n = 67) | 24-week CDW: DAC BETA (n = 41) | 24-week CDW: Total (n = 108) |
|---|---|---|---|---|---|---|
| Alternative multiple sclerosis medication | 5 (12) | 3 (13) | 8 (12) | 6 (9) | 8 (20) | 14 (13) |
| Assessment could not be used a | 5 (12) | 2 (8) | 7 (10) | 15 (22) | 7 (17) | 22 (20) |
| Did not complete DECIDE | | | | | | |
| Lack of efficacy | 13 (30) | 10 (42) | 23 (34) | 19 (28) | 10 (24) | 29 (27) |
| Adverse event | 5 (12) | 1 (4) | 6 (9) | 7 (10) | 3 (7) | 10 (9) |
| Lost to follow-up | 1 (2) | 1 (4) | 2 (3) | 1 (1) | 1 (2) | 2 (2) |
| Consent withdrawn | 11 (26) | 5 (21) | 16 (24) | 13 (19) | 8 (20) | 21 (19) |
| Pregnancy | 1 (2) | 1 (4) | 2 (3) | 1 (1) | 1 (2) | 2 (2) |
| Personal/logistical | 2 (5) | 1 (4) | 3 (4) | 2 (3) | 1 (2) | 3 (3) |
| Completed DECIDE but did not enroll in extension study (ClinicalTrials.gov, 2017) | 0 | 0 | 0 | 3 (4) | 2 (5) | 5 (5) |

CDW, confirmed disability worsening; DAC BETA, daclizumab beta; IFN, interferon.
a Confirmatory assessment could not be used for confirmation owing to ongoing relapse, assessment was < 74 days after initial assessment (for 12-week CDW) or < 148 days after initial assessment (for 24-week CDW), or there was Expanded Disability Status Scale improvement after switching from IFN beta-1a to DAC BETA in the extension study (ClinicalTrials.gov, 2017).

Table 3
Factors associated with 12- and 24-week CDW in participants in DECIDE. a

| Covariate | Odds ratio (95% CI) b | p b |
|---|---|---|
| 12-week CDW | | |
| Relapse in past 29 days | 0.26 (0.18–0.37) | < 0.0001 |
| Change in EDSS score from baseline to initial disability worsening c | 1.65 (1.25–2.17) | 0.0003 |
| Age at baseline d | 1.03 (1.01–1.05) | 0.0040 |
| Higher baseline EDSS score c | 1.27 (1.10–1.46) | 0.0013 |
| Treatment (DAC BETA vs IFN beta-1a) | 0.88 (0.62–1.25) | 0.4727 |
| 24-week CDW | | |
| Relapse in past 29 days | 0.28 (0.19–0.43) | < 0.0001 |
| Change in EDSS score from baseline to initial disability worsening c | 1.81 (1.33–2.45) | 0.0002 |
| Age at baseline d | 1.03 (1.01–1.06) | 0.0048 |
| Higher baseline EDSS score c | 1.43 (1.22–1.68) | < 0.0001 |
| Treatment (DAC BETA vs IFN beta-1a) | 0.76 (0.52–1.12) | 0.1654 |

CDW, confirmed disability worsening; CI, confidence interval; DAC BETA, daclizumab beta; EDSS, Expanded Disability Status Scale; IFN, interferon.
a Among participants who experienced ≥ 1 initial disability worsening during the study period. Participants who had an initial disability worsening at their last EDSS assessment and no confirmatory assessment were excluded. If a participant had multiple instances of initial disability worsening, then the first confirmed (if the participant had a confirmed initial disability worsening) or the last (if the participant did not have any instances of initial worsening confirmed) initial disability worsening was evaluated (note: if the last non-confirmed initial disability worsening was part of a series of consecutive instances of initial worsening, then the first assessment of the series was used).
b Odds ratios and p values were estimated from a multivariate logistic model using all of the listed covariates and the treatment group (DAC BETA vs IFN beta-1a).
c Odds ratios correspond to a 1.0-point increase in EDSS score.
d Odds ratios correspond to a 1-year increase in age.

Table 4
Factors associated with treatment discontinuation and study withdrawal of participants in DECIDE. a

| Covariate | Hazard ratio (95% CI) | p |
|---|---|---|
| Treatment discontinuation | | |
| Initial disability worsening in past 6 months | 2.87 (2.38–3.45) | < 0.0001 |
| Higher baseline EDSS score | 1.11 (1.03–1.18) | 0.0034 |
| Treatment group (DAC BETA vs IFN beta-1a) | 0.97 (0.82–1.14) | 0.6965 |
| Study withdrawal | | |
| Initial disability worsening in past 6 months | 2.88 (2.34–3.54) | < 0.0001 |
| Higher baseline EDSS score | 1.17 (1.08–1.26) | < 0.0001 |
| Treatment group (DAC BETA vs IFN beta-1a) | 0.86 (0.71–1.04) | 0.1241 |

CI, confidence interval; DAC BETA, daclizumab beta; EDSS, Expanded Disability Status Scale; IFN, interferon.
a Based on an evaluation of 1841 participants in the DECIDE trial.
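The odds ratios in Table 3 are expressed per unit of the covariate (per 1.0 EDSS point, per year of age). As a rough illustration of how such a per-unit estimate scales under the log-linear form of a logistic model, the sketch below raises the odds ratio to the number of units. The helper name and the 10-year example are illustrative choices, not an analysis from the study.

```python
def scale_odds_ratio(or_per_unit: float, units: float) -> float:
    """Odds ratio for a change of `units`, assuming the log-odds are linear
    in the covariate (the standard logistic-regression assumption)."""
    if or_per_unit <= 0:
        raise ValueError("an odds ratio must be positive")
    return or_per_unit ** units


if __name__ == "__main__":
    # Age: OR 1.03 per year (Table 3, 12-week CDW), scaled to a 10-year difference
    print(round(scale_odds_ratio(1.03, 10), 2))
    # Change in EDSS: OR 1.65 per 1.0 point, scaled to a 2.0-point change
    print(round(scale_odds_ratio(1.65, 2.0), 2))
```

Confidence limits can be scaled the same way, although extrapolating far beyond the observed covariate range is not supported by the reported model.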

imputation approach (i.e., assume all censored participants were not confirmed) with the ORMI approach (impute data based on CDW risk factors) with respect to study power, recovery of treatment effect, and impact on type I error. Using the full simulated dataset (i.e., prior to applying the dropout scenarios), the median CDW rate in participants was 17.8% in the control group versus 13.4% in the active group, representing a 27% reduction in the risk of progression (HR 0.733; Table 5). The estimated CDW rates by treatment group in the simulation study mimicked the observed 12-week CDW rates in the DECIDE trial (Kappos et al., 2015). Since these estimates were obtained from the dataset prior to applying the dropout scenarios, they represent the HR and CDW rate in the presence of no missing data.

Three dropout scenarios were applied to evaluate the effect of various dropout rates on the overall rate of CDW in 5000 simulation datasets. The number of participants with missing confirmation data was higher in the control group compared with the active group across all scenarios. The three dropout scenarios were analyzed with respect to the three imputation approaches for handling missing data.

Results from the simulation study indicated that when participants with a missing confirmation assessment were censored and assumed to be not confirmed, the CDW rate decreased for both treatment groups and the HR increased as compared with the full dataset. This held true for all three dropout scenarios (Table 5). For example, in Scenario C, the median CDW rate across the 5000 simulated datasets was 13.1% in the control group versus 10.0% in the active group, with a HR of 0.749; this was compared with 17.8% versus 13.4% (HR 0.733) using the full dataset. As expected, bias (i.e., the difference against the full data set) increased as the amount of missing data increased, with Scenario C (high dropout rate) showing a greater difference from the full dataset than Scenarios A or B. Furthermore, applying the standard imputation method (i.e., all cases not confirmed) also affected study power, which was estimated to decrease by 10–20% depending on the amount of missing data (Table 6).
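The comparison against the full dataset can be made concrete with a small calculation on the median hazard ratios reported in Table 5. The sketch below only restates those published medians; the dictionary layout and function names are illustrative.

```python
# Median HRs from Table 5 (full dataset and Scenarios A-C).
FULL_DATASET_HR = 0.733
MEDIAN_HR = {
    "A": {"none_confirmed": 0.740, "all_confirmed": 0.722, "ormi": 0.729},
    "B": {"none_confirmed": 0.746, "all_confirmed": 0.718, "ormi": 0.727},
    "C": {"none_confirmed": 0.749, "all_confirmed": 0.713, "ormi": 0.730},
}


def risk_reduction(hr: float) -> float:
    """Relative reduction in hazard implied by a hazard ratio."""
    return 1.0 - hr


def hr_bias(hr: float, reference: float = FULL_DATASET_HR) -> float:
    """Difference from the full-dataset HR; positive values mean the
    treatment effect looks smaller than it is, negative values larger."""
    return hr - reference


if __name__ == "__main__":
    print(f"full dataset: {risk_reduction(FULL_DATASET_HR):.1%} risk reduction")
    for scenario, by_strategy in MEDIAN_HR.items():
        for strategy, hr in by_strategy.items():
            print(f"Scenario {scenario}, {strategy}: bias {hr_bias(hr):+.3f}")
```

Read this way, the none-confirmed strategy drifts upward and the all-confirmed strategy drifts downward as dropout increases, while the ORMI medians stay within 0.006 of the full-dataset value.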


Table 5
Simulation scenarios.

| Scenario | Assumption for handling censored participants | % confirmed, control | % confirmed, active | HR |
|---|---|---|---|---|
| Full dataset b | – | 17.8 | 13.4 | 0.733 |
| Scenario A c | None are confirmed | 15.4 | 11.6 | 0.740 |
| Scenario A c | All are confirmed | 21.1 | 15.7 | 0.722 |
| Scenario A c | ORMI a | 17.6 | 13.2 | 0.729 |
| Scenario B d | None are confirmed | 14.1 | 10.7 | 0.746 |
| Scenario B d | All are confirmed | 23.1 | 17.1 | 0.718 |
| Scenario B d | ORMI a | 17.6 | 13.1 | 0.727 |
| Scenario C e | None are confirmed | 13.1 | 10.0 | 0.749 |
| Scenario C e | All are confirmed | 24.7 | 18.3 | 0.713 |
| Scenario C e | ORMI a | 17.6 | 13.2 | 0.730 |

EDSS, Expanded Disability Status Scale; HR, hazard ratio; ORMI, observed rate multiple imputation.
a Imputation was based on participant-specific probability of confirmation estimated using a logistic model within each treatment group adjusted for baseline EDSS score (as a continuous variable) and change in EDSS score from baseline to the time of initial disability worsening. If a participant had multiple instances of initial disability worsening, the first confirmed (if the participant had a confirmed progression) or the last (if no instances of initial disability worsening were confirmed) record was used in the analysis. HRs were estimated from a Cox proportional hazards model adjusting for baseline EDSS. Table presents median percentage confirmed and median HR across 5000 simulations.
b Full dataset estimations represent the HR and confirmation rate in the presence of no missing data, prior to applying dropout scenarios.
c In Scenario A, an average of 13.6% participants in the control group and 12.6% in the active group with initial disability worsening missed their confirmation assessment.
d In Scenario B, an average of 21.5% participants in the control group and 19.9% in the active group with initial disability worsening missed their confirmation assessment.
e In Scenario C, an average of 27.8% participants in the control group and 25.9% in the active group with initial disability worsening missed their confirmation assessment.
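Multiple imputation produces several completed datasets per trial, each yielding its own log hazard ratio and variance, and these need to be combined into one estimate. The text here does not spell out the pooling step, so the sketch below assumes the standard combining rules from Rubin (1987); the input numbers are hypothetical.

```python
import math
from statistics import mean, variance
from typing import List, Tuple


def pool_rubin(estimates: List[float], variances: List[float]) -> Tuple[float, float]:
    """Combine per-imputation estimates (e.g. log HRs) and their variances.

    Returns (pooled estimate, total variance), where
    total = within-imputation variance + (1 + 1/m) * between-imputation variance.
    """
    m = len(estimates)
    if m < 2 or m != len(variances):
        raise ValueError("need at least two imputations with matching variances")
    pooled = mean(estimates)
    within = mean(variances)
    between = variance(estimates)  # sample variance, denominator m - 1
    total = within + (1.0 + 1.0 / m) * between
    return pooled, total


if __name__ == "__main__":
    # Hypothetical log HRs and variances from five imputed datasets
    log_hrs = [-0.31, -0.33, -0.30, -0.32, -0.29]
    variances_ = [0.012, 0.013, 0.012, 0.012, 0.013]
    pooled, total = pool_rubin(log_hrs, variances_)
    print(f"pooled HR {math.exp(pooled):.3f}, SE of log HR {math.sqrt(total):.3f}")
```

The between-imputation term is what carries the uncertainty about the missing confirmation status into the final standard error, which single imputation (none confirmed or all confirmed) cannot do.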

As expected, when all cases of missing confirmation data were considered confirmed, the estimated CDW rate increased in both groups (24.7% vs 18.3%, control vs active, respectively, Scenario C) compared with the full dataset and the treatment effect was exaggerated by 3% (HR 0.713). When using the ORMI approach, the known treatment effect was correctly estimated; the CDW rate (17.6% vs 13.2%, control vs active, respectively) and HR in Scenarios A (0.729), B (0.727), and C (0.730) were similar to that observed in the full dataset (0.733). In addition, study power was increased by ~5–10% using this approach compared with the standard method (Table 6).

Type I error rates, defined as the proportion of simulation results in which p < 0.05 and HR < 1.0 when the true HR assumed in simulations was set to 1.0, were also assessed for all three scenarios. The type I error rate was maintained across all scenarios and imputation methods (Table 6).

If informative censoring exists and the event rate in the control group is reduced, the power of the study may be dramatically impacted. As an example, in a trial with 1000 participants per treatment group, a HR of 0.75, and 2 years of follow-up, power is estimated to be ~77%. The HR of 0.75 corresponds to an event rate of 17.9% (179 events) in the control group and 13.5% (135 events) in the active group. If the event rate in the active group remains at 13.5%, but the rate in the control group decreases by only four events because of censoring, the study power decreases by almost 10%. This loss in power is even more dramatic when the number of events in the control group decreases by 13 events, which causes study power to drop to 48% (Fig. 2).

4. Discussion

Analysis of the CDW endpoint is typically performed under the assumption of non-informative censoring. This assumption lacks face validity when participants are censored after experiencing initial disability worsening, as these participants may be dropping out for reasons related to their disability. Using parameters from a recent multiple sclerosis (MS) clinical trial whose results have already been presented (Kappos et al., 2015), we performed trial simulations demonstrating

Table 6 Study power and type I error simulations. Scenario

Full dataset A B C

No. missing confirmation

Full dataset (benchmark)

Control

Active

Study power

Type I error, %

0 56 89 115

0 40 64 83

77.3

2.5

Assumptions for handling censored participants None are confirmed All are confirmed Study power Type I error, % Study power Type I error, %

ORMIa Study power

Type I error, %

69.5 62.4 58.6

73.8 70.7 65.5

2.5 2.7 2.3

2.3 2.3 2.3

87.2 91.1 93.2

2.5 2.4 2.4

Study power was estimated as the proportion of simulated datasets in which the HR was < 1.0 and two-sided p value < 0.05. Type I error rate was estimated as the proportion of simulated datasets in which the HR was < 1.0 and two-sided p value < 0.05 under the assumption of a HR of 1.0.
EDSS, Expanded Disability Status Scale; HR, hazard ratio; ORMI, observed rate multiple imputation.
a Probability of confirmation was estimated using a logistic model within each treatment group adjusted for baseline EDSS score (as a continuous variable) and change in EDSS score from baseline to the time of initial disability worsening. If a participant had multiple instances of initial disability worsening, the first confirmed (if the participant had a confirmed progression) or the last (if no instances of initial disability worsening were confirmed) record was used in the analysis.
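The footnote's definition of power and type I error is the same counting rule applied to two different sets of simulations. A minimal sketch, with a hypothetical function name and made-up results, could look like this:

```python
from typing import Iterable, Tuple


def rejection_rate(results: Iterable[Tuple[float, float]], alpha: float = 0.05) -> float:
    """Proportion of simulated datasets with HR < 1.0 and two-sided p < alpha.

    Each result is (estimated HR, two-sided p value). Applied to simulations
    with a true treatment effect this estimates power; applied to simulations
    generated with a true HR of 1.0 it estimates the type I error rate.
    """
    results = list(results)
    if not results:
        raise ValueError("no simulation results supplied")
    hits = sum(1 for hr, p in results if hr < 1.0 and p < alpha)
    return hits / len(results)


if __name__ == "__main__":
    # Made-up (HR, p) pairs standing in for four simulated trials
    simulated = [(0.70, 0.010), (0.80, 0.200), (1.10, 0.010), (0.90, 0.040)]
    print(rejection_rate(simulated))
```

Because only results favouring the active arm (HR < 1.0) are counted, a nominal two-sided 5% test corresponds to a rate of about 2.5% under the null, which is the level of the type I error values reported in Table 6.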


Fig. 2. Effects of a declining event rate in the control group on study power. Estimated study power was calculated based on the number of events in the control group for an example study with 1000 participants per group (control vs active), a hazard ratio of 0.75, 2 years of follow-up, and a 13.5% event rate in the active group. If the event rate in the control group decreases while the rate in the active group remains unchanged, study power is greatly reduced.
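To see why a handful of lost control-group events matters, a rough power approximation can be sketched. This is not the calculation behind Fig. 2: it assumes Schoenfeld's approximation for a two-sided log-rank test with 1:1 allocation, and it derives the hazard ratio from the cumulative event proportions under proportional hazards. Its output is only indicative and need not match the figure exactly.

```python
import math
from statistics import NormalDist


def approx_power(control_events: int, active_events: int,
                 n_per_arm: int = 1000, alpha: float = 0.05) -> float:
    """Approximate power of a two-sided log-rank test (Schoenfeld, 1:1 arms).

    Assumption: the HR is taken as log(1 - p_active) / log(1 - p_control),
    which holds when hazards are proportional over a common follow-up.
    """
    p_control = control_events / n_per_arm
    p_active = active_events / n_per_arm
    hr = math.log(1.0 - p_active) / math.log(1.0 - p_control)
    total_events = control_events + active_events
    z_alpha = NormalDist().inv_cdf(1.0 - alpha / 2.0)
    z = math.sqrt(total_events) / 2.0 * abs(math.log(hr)) - z_alpha
    return NormalDist().cdf(z)


if __name__ == "__main__":
    # 135 active-arm events throughout; control-arm events lost to censoring
    for lost in (0, 4, 13):
        power = approx_power(179 - lost, 135)
        print(f"{lost:2d} fewer control events: power ~{power:.0%}")
```

The loss comes from two directions at once: fewer total events reduce the information in the trial, and a lower observed control event rate moves the estimated hazard ratio towards 1.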

that commonly used survival analysis techniques that assume non-informative censoring can produce biased treatment estimates and may result in a 10–15% loss of study power for the key CDW endpoint. The assumption of non-informative censoring may be appropriate when censoring occurs for reasons unrelated to risk factors for disability progression. However, when a participant has a documented initial decline in physical function, as demonstrated by an increase in EDSS score, and then misses the confirmation assessment 12 or 24 weeks later, this assumption may not be appropriate. In these cases, censoring is potentially not independent of the event and may be related to progression or worsening of disease. Standard survival analyses may either underestimate or overestimate the true survival rate, resulting in a biased estimate of treatment effect in the presence of informative censoring, regardless of whether the censoring is balanced or unbalanced across treatment arms (Bell et al., 2014; Campigotto and Weller, 2014). Evolving worsening of disease symptoms cannot be included as a time-dependent covariate in a standard survival analysis model, as worsening is a mediator of the treatment effect and on the causal pathway of treatment. However, time-varying covariates can be included in the intermediate imputation model without invalidating the estimation of the treatment effect in the analysis model. Using an imputation model enriched by evolving symptoms makes subsequent survival analysis models more appropriate, as it is now more likely that the non-informative censoring assumption underlying the Cox analysis model (when applied to the imputed data) will be met. Using imputations informed by time-varying covariates in combination with Cox modeling therefore allows us to fulfill the non-informative censoring assumption by utilizing available data.

The simulation study demonstrated that bias can be introduced when using conventional analytical imputation techniques (imputing participant status as “not confirmed”) and that bias increased as the amount of missing data increased. Across the simulation examples, fewer participants in both groups were estimated to have CDW (compared with the full dataset) when using the conventional analytical imputation approach, but the relative reduction in CDW in the control group was higher than in the active group (i.e., the CDW rate in the control group dropped from 17.8% [full dataset] to 13.1% [Scenario C], a 27% reduction, whereas in the active group it dropped from 13.4% [full dataset] to 10% [Scenario C], a 25% reduction). In contrast, bias was reduced when using the ORMI method: estimations of the progression rate and HR were almost identical to those in the full dataset. Additionally, study power increased and the type I error rate was maintained.

Results of this simulation study support the analysis methods and conclusions of the DECIDE trial. As previously reported, assumptions made about patients with a missing confirmatory assessment in DECIDE influenced the study results (Kappos et al., 2015). When patients with a missing confirmatory assessment were assumed not to have CDW, the risk of 12-week CDW was reduced by 16% in the DAC BETA group compared with the IFN beta-1a group (p = 0.16) (Kappos et al., 2015). Trends were similar when confirmation status was imputed from risk factors for confirmation using the ORMI method, although in this analysis the p-value reached statistical significance (p = 0.047) (Kappos et al., 2015). For the 24-week endpoint, a similar pattern was observed: the percent reductions using the standard approach (21%) and the ORMI method (27%) were similar, but the p-value reached statistical significance (p = 0.033) only with the ORMI method (Kappos et al., 2015). The findings herein can also be applied to other MS clinical trials and to other disease states in which these types of methodological issues with missing data arise (Altman, 2009; Bell et al., 2013; European Medicines Agency, 2010; Kappos et al., 2015; Molenberghs et al., 2004). For example, the problem of missing data also exists for the analysis of progression-free survival (PFS) in clinical trials of cancer therapies. If a participant discontinues treatment because of toxicity, worsening of disease, or other factors that may be predictive of PFS, informative censoring may lead to extreme bias in the subsequent analysis of treatment effect, particularly if dropout rates are imbalanced among treatment groups (Campigotto and Weller, 2014; Carroll, 2007). Different recommendations have been made in cancer trials about how to handle informative censoring when calculating PFS. Recommendations include adopting an intention-to-treat approach in which participants are followed after switching treatments to document evidence of progression (Carroll, 2007), and using time-to-treatment failure as an alternative endpoint, in which inadequate response and initial signs of clinical progression are considered treatment failure (Campigotto and Weller, 2014). These groups, along with regulatory agencies, recommend that several sensitivity analyses be included to assess the impact of informative censoring (European Medicines Agency, 2013; US Food and Drug Administration, 2007; Little et al., 2012).

When assessing disease progression in participants with MS, the presence of missing data can lead to bias and impact the interpretation of study results. Standard analysis methods employ an imputation approach that assumes no disability progression has occurred in participants who discontinue the study after an initial worsening and prior to completing the confirmatory assessment. Based on the results of our simulation study, we propose using an ORMI approach for handling participants with missing confirmation data. This straightforward analytical approach, which can be easily implemented using available commercial software, enabled us to impute the confirmatory status of participants based on CDW risk factors. Given the evidence for introducing bias when using the current methodological standard, we recommend that future MS clinical trials incorporate estimations of CDW risk that are consistent with the observed trial data.

the online version, at doi:10.1016/j.msard.2019.101865.
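The idea behind imputing confirmation status from observed rates can be sketched in a few lines. This is a simplified illustration and not the authors' SAS implementation: it assumes each participant with an initial worsening belongs to a treatment group and a single risk stratum, draws each missing confirmation status from a Bernoulli distribution with the confirmation rate observed in that group and stratum, and averages the confirmed fraction over imputations. All names and counts are hypothetical, and a full analysis would fit the survival model to each imputed dataset and pool the estimates (Rubin, 1987).

```python
import random
from collections import defaultdict


def make_participants(group, stratum, confirmed, not_confirmed, missing):
    """Build toy records; status is 1 (confirmed), 0 (not confirmed), or None (missing)."""
    statuses = [1] * confirmed + [0] * not_confirmed + [None] * missing
    return [{"group": group, "stratum": stratum, "status": s} for s in statuses]


def observed_rates(participants):
    """Confirmation rate among participants with known status, per (group, stratum)."""
    counts = defaultdict(lambda: [0, 0])  # key -> [confirmed, known]
    for p in participants:
        if p["status"] is not None:
            key = (p["group"], p["stratum"])
            counts[key][0] += p["status"]
            counts[key][1] += 1
    return {key: c / n for key, (c, n) in counts.items()}


def fixed_fraction(participants, fill):
    """Confirmed fraction per group when every missing status is set to `fill`."""
    totals, sizes = defaultdict(int), defaultdict(int)
    for p in participants:
        totals[p["group"]] += fill if p["status"] is None else p["status"]
        sizes[p["group"]] += 1
    return {g: totals[g] / sizes[g] for g in sizes}


def ormi_confirmed_fraction(participants, n_imputations=200, seed=0):
    """Confirmed fraction per group, averaged over imputed datasets.

    Assumption: every (group, stratum) with missing data also has at least
    one participant with known status, so an observed rate exists.
    """
    rng = random.Random(seed)
    rates = observed_rates(participants)
    totals, sizes = defaultdict(float), defaultdict(int)
    for p in participants:
        sizes[p["group"]] += 1
    for _ in range(n_imputations):
        for p in participants:
            status = p["status"]
            if status is None:
                # Bernoulli draw with the observed rate for similar participants
                status = int(rng.random() < rates[(p["group"], p["stratum"])])
            totals[p["group"]] += status
    return {g: totals[g] / (n_imputations * sizes[g]) for g in sizes}


# Made-up counts for illustration only; not DECIDE or simulation data.
toy = (
    make_participants("control", "high", 30, 20, 10)
    + make_participants("control", "low", 10, 40, 10)
    + make_participants("active", "high", 20, 20, 8)
    + make_participants("active", "low", 8, 40, 8)
)

if __name__ == "__main__":
    print("none confirmed:", fixed_fraction(toy, 0))
    print("all confirmed: ", fixed_fraction(toy, 1))
    print("observed-rate: ", ormi_confirmed_fraction(toy))
```

By construction, the observed-rate estimate for each group falls between the two fixed strategies (none confirmed and all confirmed), which mirrors the ordering of the three imputation strategies compared in the simulation study.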

References
Altman, D.G., 2009. Missing outcomes in randomized trials: addressing the dilemma. Open Med. 3, e51–e53.
Bell, M.L., Fiero, M., Horton, N.J., et al., 2014. Handling missing data in RCTs; a review of the top medical journals. BMC Med. Res. Methodol. 14, 118. https://doi.org/10.1186/1471-2288-14-118.
Bell, M.L., Kenward, M.G., Fairclough, D.L., et al., 2013. Differential dropout and bias in randomised controlled trials: when it matters and when it may not. BMJ 346, e8668. https://doi.org/10.1136/bmj.e8668.
Biogen, 2018. Biogen and AbbVie announce the voluntary worldwide withdrawal of marketing authorizations for ZINBRYTA® (daclizumab) for relapsing multiple sclerosis. https://news.abbvie.com/news/press-releases/biogen-and-abbvieannounce-voluntary-worldwide-withdrawal-marketing-authorizations-for-zinbrytadaclizumab-for-relapsing-multiple-sclerosis.htm (accessed 16 April 2018).
Bland, J.M., Altman, D.G., 1998. Survival probabilities (the Kaplan–Meier method). BMJ 317, 1572. https://doi.org/10.1136/bmj.317.7172.1572.
Campigotto, F., Weller, E., 2014. Impact of informative censoring on the Kaplan–Meier estimate of progression-free survival in phase II clinical trials. J. Clin. Oncol. 32, 3068–3074. https://doi.org/10.1200/jco.2014.55.6340.
Carroll, K.J., 2007. Analysis of progression-free survival in oncology trials: some common statistical issues. Pharm. Stat. 6, 99–113. https://doi.org/10.1002/pst.251.
ClinicalTrials.gov, 2017. Long-term extension study in participants with multiple sclerosis who have completed study 205MS301 (NCT01064401) to evaluate the safety and efficacy of BIIB019 (EXTEND). https://clinicaltrials.gov/ct2/show/NCT01797965?term=NCT01797965&rank=1 (accessed 26 October 2017).
Dumville, J.C., Torgerson, D.J., Hewitt, C.E., 2006. Reporting attrition in randomised controlled trials. BMJ 332, 969–971. https://doi.org/10.1136/bmj.332.7547.969.
European Medicines Agency, 2010. Guideline On Missing Data in Confirmatory Clinical Trials. London.
European Medicines Agency, 2013. Appendix 1 to the Guideline on the Evaluation of Anticancer Medicinal Products in Man. London.
Fox, R.J., Miller, D.H., Phillips, J.T., et al., CONFIRM Study Investigators, 2012. Placebo-controlled phase 3 study of oral BG-12 or glatiramer in multiple sclerosis. N. Engl. J. Med. 367, 1087–1097. https://doi.org/10.1056/NEJMoa1206328.
Gold, R., Giovannoni, G., Selmaj, K., et al., SELECT study investigators, 2013. Daclizumab high-yield process in relapsing-remitting multiple sclerosis (SELECT): a randomised, double-blind, placebo-controlled trial. Lancet 381, 2167–2175. https://doi.org/10.1016/s0140-6736(12)62190-4.
Gold, R., Kappos, L., Arnold, D.L., et al., DEFINE Study Investigators, 2012. Placebo-controlled phase 3 study of oral BG-12 for relapsing multiple sclerosis. N. Engl. J. Med. 367, 1098–1107. https://doi.org/10.1056/NEJMoa1114287.
Kappos, L., Radue, E.W., O’Connor, P., et al., FREEDOMS Study Group, 2010. A placebo-controlled trial of oral fingolimod in relapsing multiple sclerosis. N. Engl. J. Med. 362, 387–401. https://doi.org/10.1056/NEJMoa0909494.
Kappos, L., Wiendl, H., Selmaj, K., et al., 2015. Daclizumab HYP versus interferon beta-1a in relapsing multiple sclerosis. N. Engl. J. Med. 373, 1418–1428. https://doi.org/10.1056/NEJMoa1501481.
Lavery, A.M., Verhey, L.H., Waldman, A.T., 2014. Outcome measures in relapsing-remitting multiple sclerosis: capturing disability and disease progression in clinical trials. Mult. Scler. Int. 2014, 262350. https://doi.org/10.1155/2014/262350.
Little, R.J., D'Agostino, R., Cohen, M.L., et al., 2012. The prevention and treatment of missing data in clinical trials. N. Engl. J. Med. 367, 1355–1360. https://doi.org/10.1056/NEJMsr1203730.
Little, R.J.A., Rubin, D.B., 2002. Statistical Analysis with Missing Data, 2nd ed. John Wiley & Sons, Inc., Hoboken, New Jersey.
Molenberghs, G., Thijs, H., Jansen, I., et al., 2004. Analyzing incomplete longitudinal clinical trial data. Biostatistics 5, 445–464. https://doi.org/10.1093/biostatistics/5.3.445.
Polman, C.H., O’Connor, P.W., Havrdova, E., et al., AFFIRM Investigators, 2006. A randomized, placebo-controlled trial of natalizumab for relapsing multiple sclerosis. N. Engl. J. Med. 354, 899–910. https://doi.org/10.1056/NEJMoa044397.
Prinja, S., Gupta, N., Verma, R., 2010. Censoring in clinical trials: review of survival analysis techniques. Indian J. Community Med. 35, 217–221. https://doi.org/10.4103/0970-0218.66859.
Rubin, D.B., 1987. Multiple Imputation for Nonresponse in Surveys. John Wiley & Sons, New York.
Shih, W., 2002. Problems in dealing with missing data and informative censoring in clinical trials. Curr. Control Trials Cardiovasc. Med. 3, 4. https://doi.org/10.1186/1468-6708-3-4.
US Food and Drug Administration, 2007. Guidance For Industry: Clinical Trial Endpoints For the Approval of Cancer Drugs and Biologics. Maryland.

Data availability
The DECIDE study protocol and de-identified individual participant data collected during the DECIDE trial that support this publication will be made available by request (www.biogenclinicaldatarequest.com).

Funding
This study was funded by Biogen (Cambridge, MA, USA) and AbbVie (Redwood City, CA, USA). The sponsors had a role in the study design; collection, analysis, and interpretation of data; writing of the report; and the decision to submit the article for publication.

Declaration of Competing Interest
Katherine Riester is an employee of and holds stock/stock options in Biogen. Ludwig Kappos’ institution has received in the last 3 years, and used exclusively for research support: steering committee/consulting fees from Actelion, Addex, Bayer HealthCare, Biogen, Biotica, Genzyme, Lilly, Merck, Mitsubishi, Novartis, Ono, Pfizer, Receptos, Sanofi-Aventis, Santhera, Siemens, Teva, UCB, and XenoPort; speaker fees from Bayer HealthCare, Biogen, Merck, Novartis, Sanofi-Aventis, and Teva; support of educational activities from Bayer HealthCare, Biogen, CSL Behring, Genzyme, Merck, Novartis, Sanofi-Aventis, and Teva; license fees for Neurostatus products; and grants from Bayer HealthCare, Biogen, the European Union, Merck, Novartis, Roche, Roche Research Foundations, the Swiss Multiple Sclerosis Society, and the Swiss National Research Foundation. Krzysztof Selmaj has received consulting fees from Genzyme, Novartis, Ono, Roche, Synthon, and Teva, and speaker fees from Biogen. Stacy Lindborg is an employee of and holds stock/stock options in Biogen. Ilya Lipkovich is a former employee of IQVIA (formerly Quintiles); he performed statistical analysis as part of the contract between Biogen and IQVIA. Jacob Elkins is an employee of and holds stock/stock options in Biogen.

Acknowledgments
The authors thank Gouchen Song, Ph.D., for his validation of the SAS simulation code. Biogen provided funding for medical writing support in the development of this manuscript. Susan Chow from Excel Scientific Solutions provided medical writing assistance, and Miranda Dixon from Excel Scientific Solutions copyedited and styled the manuscript per journal requirements. Biogen reviewed and provided feedback on the paper to the authors. The authors had full editorial control of the paper, and provided their final approval of all content.

Supplementary materials
Supplementary material associated with this article can be found, in
