Repeated-Measures Analysis of the National Institute of Neurological Disorders and Stroke rt-PA Stroke Trial


Wuwei Feng, MD, MS,* Gabriela Vasquez, PhD,† M. Fareed K. Suri, MD,† Kamakshi Lakshminarayan, MD, PhD,† and Adnan I. Qureshi, MD†

Previous analyses, including those of the National Institute of Neurological Disorders and Stroke (NINDS) rt-PA Stroke Trial, have assessed clinical treatment efficacy at only a single study point; they did not use outcomes collected at multiple time points or incorporate within-patient correlation. Data from the NINDS rt-PA Stroke Trial were analyzed using repeated-measures analysis with the generalized estimating equations (GEE) approach and dichotomized outcomes (modified Rankin Scale [mRS], National Institutes of Health Stroke Scale [NIHSS], Barthel Index [BI], and Glasgow Outcome Scale [GOS]). The results were compared with those from previous analyses. All of the outcome variables at different time points were significantly correlated. rt-PA was superior to placebo overall and at each specific time point. The overall odds of having minimal or no disability (mRS score 0 or 1) were higher for patients treated with rt-PA than for those treated with placebo (odds ratio [OR], 2.1; 95% confidence interval [CI], 1.5-3.0). The ORs were 2.3 (95% CI, 1.5-3.4) at 3 months, 1.9 (95% CI, 1.3-2.8) at 6 months, and 2.0 (95% CI, 1.3-2.9) at 12 months. A similar treatment effect was observed with the NIHSS, BI, and GOS. Compared with previous analyses, an augmented treatment effect with larger ORs and smaller P values was observed. Repeated-measures analysis provides an alternative method for assessing treatment effect, as demonstrated in the analysis of data from the NINDS rt-PA Stroke Trial. This method could be used in future stroke trials in which outcomes of interest are collected at multiple time points.

Key Words: Stroke; tissue plasminogen activator; repeated measures analysis; generalized estimating equation.

© 2011 by National Stroke Association

From the *Department of Neurosciences, Medical University of South Carolina, Charleston, South Carolina; and †Zeenat Qureshi Stroke Research Center, Department of Neurology, University of Minnesota, Minneapolis, Minnesota. Received December 14, 2009; accepted January 6, 2010. Supported by the Zeenat Qureshi Stroke Research Center, University of Minnesota. The authors reported no financial disclosures related to this study. Address correspondence to Wuwei Feng, MD, MS, Department of Neuroscience, Medical University of South Carolina, 96 Jonathan Lucas Street, Clinical Science Building, Room 307, Charleston, SC 29425. E-mail: [email protected]. 1052-3057/$ - see front matter © 2011 by National Stroke Association doi:10.1016/j.jstrokecerebrovasdis.2010.01.003

Journal of Stroke and Cerebrovascular Diseases, Vol. 20, No. 3 (May-June), 2011: pp 241-246

Over the past decade, multiple randomized controlled clinical trials have been conducted to determine the efficacy of neuroprotective or thrombolytic agents and endovascular interventions for acute ischemic stroke. Because a large number of these trials failed to demonstrate efficacy, efforts have been directed toward identifying more potent agents and new statistical approaches that avoid underestimating the treatment benefit. In this study, data from the National Institute of Neurological Disorders and Stroke (NINDS) rt-PA Stroke Trial were used to study various statistical approaches to comparing outcomes between recombinant tissue plasminogen activator (rt-PA)-treated and placebo-treated patients. The trial data were initially analyzed using 4 outcome variables (modified Rankin Scale [mRS], National Institutes of Health Stroke Scale [NIHSS], Barthel Index [BI], and Glasgow Outcome Scale [GOS]) assessed at 90 days using a global test.1-3 Subsequently, another analysis compared outcomes at 6 months and 12 months.4 Other post hoc analyses, including shift analysis,5,6 subgroup analysis,7,8 and defining clinical response with different dichotomization strategies,9 have been used to identify more sensitive approaches to detecting treatment effect, with implications for future statistical designs in stroke clinical trials.

All of the foregoing analyses compared outcomes at only a single study point, such as 90 days, 6 months, or 12 months. Clinical trials investigating new drugs usually collect data at baseline and at multiple follow-up visits, and these measures are correlated. An important consideration is how to account for the correlated measures in the data analysis; for example, within-patient correlation across multiple time points should be taken into account to avoid incorrect estimates of standard errors and treatment effects. Another common phenomenon in clinical trials is missing data at various time points due to loss to follow-up, protocol violations, discontinuation due to side effects, and other factors. Previous analyses have handled missing data with the last observation carried forward (LOCF) method, meaning that the nonmissing data from the last available follow-up visit are used as a substitute for the last endpoint.

The generalized estimating equations (GEE) approach, developed by Liang and Zeger as a method of handling longitudinal data with binary outcome variables,10-12 is also useful in many situations involving missing data, although it makes the strong assumption that data are missing completely at random (MCAR). We used the GEE approach to compare outcomes in patients treated with rt-PA and those treated with placebo using data collected in the NINDS rt-PA Stroke Trial, and we compared our results with those from previous analyses.
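The LOCF rule described above can be illustrated with a minimal Python sketch. The function name `locf_endpoint` and the example scores are hypothetical and are not taken from the trial data; the sketch only shows how a single carried-forward value replaces the missing final endpoint.

```python
from typing import Optional, Sequence


def locf_endpoint(visits: Sequence[Optional[int]]) -> Optional[int]:
    """Return the value that LOCF would analyze at the final visit.

    `visits` holds one score per scheduled visit in time order,
    with None marking a missed visit.
    """
    # Walk backward from the last visit to the most recent observed score.
    for value in reversed(visits):
        if value is not None:
            return value
    return None  # no follow-up data at all


# Hypothetical mRS scores at 7 days, 90 days, 6 months, and 12 months.
patient = [3, 2, None, None]
print(locf_endpoint(patient))  # the 90-day score stands in for 12 months
```

Only the carried-forward value enters an LOCF analysis; the earlier visits are otherwise unused, which is the limitation that repeated-measures analysis is meant to address.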

Materials and Methods

The NINDS rt-PA Stroke Trial comprised 2 separate parts with nearly identical study methodology but different goals.1 The methodology has been described in detail previously.1,4 The focus of part I was to identify early activity of rt-PA. The prespecified endpoint of interest was an improvement of ≥4 points on the NIHSS score at 24 hours compared with the baseline NIHSS score, or complete resolution of deficits (NIHSS score of 0) at 24 hours. The goal of part II was to determine whether rt-PA treatment was associated with significantly better 3-month functional and neurologic outcomes. The prespecified endpoint for part II was a favorable outcome, as measured on 4 neurologic and functional scales dichotomized to identify patients with minimal or no neurologic or functional deficit: mRS score of 0 or 1, NIHSS score of 0 or 1, GOS score of 1, and BI score of 95 or 100. The NIHSS score was collected at 24 hours, 7 days, and 90 days. The BI score was collected at 90 days, 6 months, and 12 months. The mRS score was collected at 7 days, 90 days, 6 months, and 12 months. The GOS score was collected at 90 days, 6 months, and 12 months.
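The dichotomization rules stated above can be written down compactly. The following Python sketch is illustrative only; the function name `is_favorable` is an assumption, while the cutoffs are those given in the text.

```python
def is_favorable(scale: str, score: int) -> bool:
    """Apply the part II cutoffs for a favorable outcome on each scale."""
    if scale in ("mRS", "NIHSS"):
        return score in (0, 1)      # score of 0 or 1
    if scale == "GOS":
        return score == 1           # score of 1
    if scale == "BI":
        return score in (95, 100)   # score of 95 or 100
    raise ValueError(f"unknown scale: {scale}")


# Hypothetical scores for one patient at a single visit.
scores = {"mRS": 1, "NIHSS": 3, "GOS": 1, "BI": 90}
print({scale: is_favorable(scale, value) for scale, value in scores.items()})
```

Each scale yields one binary outcome per visit, which is the form of outcome used in the repeated-measures models.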

For the purpose of this analysis, the 2 study parts were combined, as in previous studies.4,8,13 Means for each outcome variable at all data points were plotted by treatment group, and the response rate at each time point was calculated. Repeated-measures analysis with the GEE approach was used to compare the outcome measures collected in the study. The model included the following variables: age (as a continuous variable), sex (male or female), randomized study center, treatment group (rt-PA or placebo), race/ethnicity (African-American, Caucasian, or other), time from stroke onset to treatment (in minutes, as a continuous variable), NIHSS score at baseline (as a continuous variable), time (baseline, 24 hours, 7 days, 90 days, 6 months, or 12 months, depending on the outcome variable, as explained earlier), and an interaction term for treatment group and time. A combination of graphic tools for examining the pattern of correlation and information criteria from modeling was used to select an appropriate covariance structure to account for within-patient correlation. A CONTRAST statement in SAS PROC GENMOD14 was written to test for treatment effect at specific visits: 24 hours, 7 days, 90 days, 6 months, and 12 months. Both P values and 95% confidence intervals (CIs) for odds ratios (ORs) are reported. The Pearson correlation coefficient was used to assess the correlation of the 4 outcome scales at different time points. All statistical analyses were performed using SAS version 9.1 (SAS Institute, Cary, NC).14
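To make the role of within-patient correlation concrete, the following is a minimal Python sketch of a GEE fit for a binary outcome. It is not the authors' SAS PROC GENMOD code. It assumes an independence working correlation and a model with only an intercept and a treatment indicator (no covariates, time effect, or interaction), and it uses hypothetical data. The function name `fit_gee_independence` is likewise an assumption. The point of the sketch is the cluster-robust (sandwich) standard error, which treats each patient, rather than each visit, as the independent unit.

```python
import math


def _expit(x):
    return 1.0 / (1.0 + math.exp(-x))


def _information(clusters, b0, b1):
    """Model-based information matrix (symmetric 2x2) at (b0, b1)."""
    a00 = a01 = a11 = 0.0
    for treated, outcomes in clusters:
        mu = _expit(b0 + b1 * treated)
        w = mu * (1.0 - mu) * len(outcomes)
        a00 += w
        a01 += w * treated
        a11 += w * treated * treated
    return a00, a01, a11


def fit_gee_independence(clusters, tol=1e-10, max_iter=50):
    """Fit logit P(y=1) = b0 + b1*treated with an independence working
    correlation. `clusters` is a list of (treated, outcomes) pairs, one
    per patient, where outcomes is a list of 0/1 values across visits.
    Returns (b1, naive_se, robust_se) for the treatment log odds ratio.
    """
    b0 = b1 = 0.0
    for _ in range(max_iter):
        u0 = u1 = 0.0  # estimating-equation (score) vector
        for treated, outcomes in clusters:
            mu = _expit(b0 + b1 * treated)
            s = sum(y - mu for y in outcomes)
            u0 += s
            u1 += s * treated
        a00, a01, a11 = _information(clusters, b0, b1)
        det = a00 * a11 - a01 * a01
        d0 = (a11 * u0 - a01 * u1) / det  # Newton step = A^-1 U
        d1 = (a00 * u1 - a01 * u0) / det
        b0, b1 = b0 + d0, b1 + d1
        if abs(d0) + abs(d1) < tol:
            break
    a00, a01, a11 = _information(clusters, b0, b1)
    det = a00 * a11 - a01 * a01
    naive_var = a00 / det             # [A^-1] entry for b1
    c0, c1 = -a01 / det, a00 / det    # row of A^-1 for b1
    robust_var = 0.0                  # sandwich: A^-1 B A^-1, B summed by patient
    for treated, outcomes in clusters:
        mu = _expit(b0 + b1 * treated)
        s = sum(y - mu for y in outcomes)
        g = (c0 + c1 * treated) * s
        robust_var += g * g
    return b1, math.sqrt(naive_var), math.sqrt(robust_var)


# Hypothetical data: (treated, favorable outcome at 3 visits) per patient.
clusters = [
    (1, [1, 1, 1]), (1, [1, 1, 0]), (1, [0, 0, 0]), (1, [1, 1, 1]),
    (0, [0, 0, 0]), (0, [1, 0, 0]), (0, [0, 0, 0]), (0, [1, 1, 1]),
]
b1, naive_se, robust_se = fit_gee_independence(clusters)
print(f"OR = {math.exp(b1):.2f}, naive SE = {naive_se:.3f}, robust SE = {robust_se:.3f}")
```

With strongly correlated visits, the naive standard error treats every visit as a new patient and is too small; the robust standard error corrects for this. A full analysis would add the covariates listed above and a working correlation structure chosen as the authors describe.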

Results

The outcome variables were strongly correlated across time points (Table 1). The Pearson correlation coefficients between NIHSS score at baseline and at 24 hours, 7 days, and 90 days were 0.65, 0.59, and 0.50, respectively, with P values <.0001 for all correlation coefficients. The correlation coefficients between mRS score at 7 days and that at 90 days, 6 months, and 12 months were 0.82, 0.79, and 0.76, respectively. The correlation coefficients between 90 days and 6 months and 12 months were 0.96 and 0.90 for BI and 0.94 and 0.88 for GOS. Figure 1 illustrates the time trend in treatment effect, showing that the treatment effect is persistent and robust across all 4 outcome variables. One interesting finding is an imbalance in baseline NIHSS score between the 2 treatment groups, even though the study was a double-blinded, placebo-controlled, randomized study.

Table 2 illustrates the superior and robust benefit of rt-PA over placebo as measured by the 4 different outcome variables. The benefit of rt-PA can be seen as early as 24 hours after treatment, with 17.4% of the rt-PA-treated patients having an NIHSS score of 0 or 1, versus only 4.8% of the placebo-treated patients. The odds of having minimal or no disability (mRS score 0 or 1) were higher for the rt-PA-treated patients than for the placebo-treated patients using the GEE approach inclusive of all time points (OR, 2.1; 95% CI, 1.5-3.0). The ORs were 2.3 at 3 months, 1.9 at 6 months, and 2.0 at 12 months. Similar results were obtained for comparisons using outcomes measured by the BI and GOS at 90 days, 6 months, and 12 months (Table 2). The overall odds of having a favorable outcome were 2.0 times greater with BI and 1.8 times greater with GOS for the rt-PA-treated group. The benefit was persistent at 90 days (2.0 with BI and 2.0 with GOS), 6 months (2.0 with BI and 1.7 with GOS), and 12 months (1.9 with BI and 1.7 with GOS). There were small variations in the magnitude of association across the 4 outcome measures, but the significance and direction of association were consistent.

Table 1. Pearson correlation coefficients for outcome variables at different time points

NIHSS        Baseline     24 hours     7 days       90 days
Baseline     Reference    0.65         0.59         0.50
24 hours     0.65         Reference    0.84         0.70
7 days       0.59         0.84         Reference    0.83
90 days      0.50         0.70         0.83         Reference

mRS          7 days       90 days      6 months     12 months
7 days       Reference    0.82         0.79         0.76
90 days      0.82         Reference    0.94         0.89
6 months     0.79         0.94         Reference    0.93
12 months    0.76         0.89         0.93         Reference

BI           90 days      6 months     12 months
90 days      Reference    0.96         0.90
6 months     0.96         Reference    0.94
12 months    0.90         0.94         Reference

GOS          90 days      6 months     12 months
90 days      Reference    0.94         0.88
6 months     0.94         Reference    0.92
12 months    0.88         0.92         Reference

Figure 1. Means of outcome variables at different time points by treatment group.
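As a rough arithmetic check on the 24-hour NIHSS result, a crude (unadjusted) odds ratio can be computed directly from the two reported proportions, 17.4% and 4.8%. The short Python sketch below does this; the function names are illustrative. Because it uses rounded percentages and no covariate adjustment, its result is not expected to match the adjusted GEE estimate exactly.

```python
def odds(p: float) -> float:
    """Convert a proportion to odds."""
    return p / (1.0 - p)


def crude_odds_ratio(p_treated: float, p_control: float) -> float:
    """Unadjusted odds ratio from two response proportions."""
    return odds(p_treated) / odds(p_control)


# Proportions with NIHSS score of 0 or 1 at 24 hours, as reported in the text.
print(round(crude_odds_ratio(0.174, 0.048), 2))  # about 4.18
```

The crude value of about 4.2 is close to the adjusted OR of 4.1 (95% CI, 2.2-7.7) reported for the 24-hour NIHSS comparison in Table 2.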

Table 2. Comparison of outcomes in the rt-PA and placebo groups at different time points

               Proportion of patients
               with favorable outcome*
Time point     rt-PA      Placebo     P value at study point; OR (95% CI)    P value overall; OR (95% CI)
NIHSS                                                                         <0.001; 2.8 (1.9-4.2)
  24 hours     17.4%      4.8%        <0.001; 4.1 (2.2-7.7)
  7 days       27.6%      12.5%       <0.001; 2.7 (1.7-4.3)
  90 days      34.0%      20.5%       <0.001; 2.0 (1.3-3.1)
mRS                                                                           <0.001; 2.1 (1.5-3.0)
  7 days       32.6%      17.5%       <0.001; 2.5 (1.6-3.8)
  90 days      42.6%      26.6%       <0.001; 2.3 (1.5-3.4)
  6 months     42.0%      29.3%       0.002; 1.9 (1.3-2.8)
  12 months    42.6%      28.9%       <0.001; 2.0 (1.3-2.9)
BI                                                                            <0.001; 2.0 (1.4-2.8)
  90 days      51.9%      38.1%       <0.001; 2.1 (1.4-3.0)
  6 months     51.2%      37.8%       <0.001; 2.0 (1.4-3.0)
  12 months    52.0%      40.0%       0.002; 1.9 (1.3-2.7)
GOS                                                                           <0.002; 1.8 (1.3-2.6)
  90 days      45.2%      31.1%       <0.001; 2.0 (1.4-3.0)
  6 months     43.6%      32.2%       0.001; 1.7 (1.2-2.5)
  12 months    45.0%      33.6%       0.007; 1.7 (1.2-2.5)

The model includes age, sex, treatment group, race, time from stroke onset to treatment, NIHSS score at baseline, time, and an interaction term for treatment group and time.
*Scores of 95 or 100 on the BI, 0 or 1 on the mRS, 1 on the GOS, and 0 or 1 on the NIHSS were considered to indicate a favorable outcome.

Table 3. Comparison with previous analyses

               Previous analyses              Present analysis
               OR (95% CI)      P value       OR (95% CI)      P value
3 months*
  BI           1.7 (1.3-2.4)    <.001         2.0 (1.4-3.0)    <.001
  mRS          2.0 (1.4-2.8)    <.001         2.3 (1.5-3.4)    <.001
  GOS          1.8 (1.3-2.5)    <.001         2.0 (1.4-3.0)    <.001
6 months†
  BI           1.7 (1.2-2.4)    .001          2.0 (1.4-3.0)    <.001
  mRS          1.8 (1.3-2.5)    .001          1.9 (1.3-2.8)    .002
  GOS          1.6 (1.2-2.3)    .004          1.7 (1.2-2.5)    .006
12 months†
  BI           1.6 (1.1-2.1)    .005          1.9 (1.3-2.7)    .002
  mRS          1.8 (1.3-2.5)    .001          2.0 (1.3-2.9)    <.001
  GOS          1.6 (1.1-2.2)    .006          1.7 (1.1-2.5)    .008

*Data at 3 months are based on the method specified by Kwiatkowski et al.4 The Mantel–Haenszel test was used for univariate analyses, with groups stratified according to clinical centers and time to treatment (0-90 minutes and 91-180 minutes).
†Data at 6 months and 12 months are from Kwiatkowski et al.4

Discussion

Repeated-measures analysis has several advantages. First, we observed an augmented treatment effect with larger ORs and smaller P values, with an almost 2-fold benefit in all outcome measures (Table 3). Compared with a previous analysis,4 for mRS the ORs were 2.3 (P < .001) versus 2.0 (P < .001) at 3 months, 1.9 (P < .001) versus 1.8 (P = .0001) at 6 months, and 2.0 (P < .001) versus 1.8 (P < .001) at 12 months. For BI, the ORs were 2.0 (P < .001) versus 1.7 (P < .001) at 3 months, 2.0 (P < .001) versus 1.7 (P = .001) at 6 months, and 1.9 (P = .002) versus 1.6 (P = .005) at 12 months. Similar differences were noted for GOS. Second, our analysis allows for adjusting for bias simply by including the relevant variables in the model. For example, there was an imbalance in NIHSS score at baseline between the 2 groups even though it was a randomized trial; we were able to control for this potential bias by including baseline NIHSS score in the model, whereas the previous analysis did not adjust for it. Finally, and most importantly, repeated-measures analysis uses all data points collected in the trial to assess efficacy not only over the treatment course, but also at each individual time point. In addition, within-patient correlation is incorporated in the modeling to better estimate standard errors and treatment effects. Theoretically, repeated-measures analysis allows for a more comprehensive understanding of the clinical benefit of a study intervention at any defined study point or throughout the entire study period.

The NINDS rt-PA Stroke Trial was not initially designed for repeated-measures analysis. Not all outcome variables were collected at the same visits; for example, NIHSS was collected at baseline, 24 hours, 7 days, and 90 days; mRS was collected at 7 days, 90 days, 6 months, and 12 months; and BI and GOS were collected at 90 days, 6 months, and 12 months. The differing assessment time points among the various outcome variables add complexity to the interpretation of this post hoc analysis, but give us an opportunity to demonstrate the advantages of repeated-measures analysis. The analysis demonstrates that rt-PA is superior to placebo overall and at all of the study endpoints after treatment. The benefit can be seen as early as 24 hours after treatment, as assessed by NIHSS, and extends to 12 months, as assessed by mRS, BI, and GOS.

The Food and Drug Administration (FDA) has been accepting and approving new drugs based on trial designs that allow for repeated-measures analysis.15,16 In LOCF analysis, the previous standard approach, nonmissing data from the last available follow-up visit are used as a substitute for the last endpoint, and the analysis is based only on the last endpoint, not on the entire data set.
Recently, LOCF has been challenged by comparisons with repeated-measures analysis and multiple-imputation analysis,16-18 both of which handle missing data with different methods and make maximal use of the data points collected in the clinical trial.

Stroke is a devastating neurologic disease for which a new drug or intervention breakthrough is critical. No new molecule has been approved by the FDA for stroke therapy since rt-PA was approved in 1996. Pharmaceutical companies spend between $100 million and $800 million on developing a new drug,19 and the average cost incurred per patient recruited in phase III clinical trials is about $26,000.20 In addition, patient recruitment for many clinical trials has been poor. To facilitate conducting clinical trials in a more timely and cost-effective fashion, new study designs that require smaller sample sizes to detect the desired treatment effect, or that are more sensitive in detecting anticipated treatment effects, are being explored.

Repeated-measures analysis with a GEE approach provides an alternative approach to analyzing binary outcome variables in clinical trials, such as the NINDS rt-PA Stroke Trial, and demonstrates a robust result supporting the clinical efficacy of rt-PA therapy in patients with acute ischemic stroke. Designing a clinical trial to allow for repeated-measures analysis may improve the sensitivity of detecting the clinical benefits of new therapeutic interventions.

Acknowledgment: We would like to thank Dr. Ilya Lipkovich for his valuable statistical input.

References

1. The National Institute of Neurological Disorders and Stroke rt-PA Stroke Study Group. Tissue plasminogen activator for acute ischemic stroke. N Engl J Med 1995;333:1581-1587.
2. Lefkopoulou M, Ryan L. Global tests for multiple binary outcomes. Biometrics 1993;49:975-988.
3. Tilley BC, Marler J, Geller NL, et al. Use of a global test for multiple outcomes in stroke trials with application to the National Institute of Neurological Disorders and Stroke t-PA Stroke Trial. Stroke 1996;27:2136-2142.
4. Kwiatkowski TG, Libman RB, Frankel M, et al. Effects of tissue plasminogen activator for acute ischemic stroke at one year. National Institute of Neurological Disorders and Stroke Recombinant Tissue Plasminogen Activator Stroke Study Group. N Engl J Med 1999;340:1781-1787.
5. Savitz SI, Lew R, Bluhmki E, et al. Shift analysis versus dichotomization of the modified Rankin Scale outcome scores in the NINDS and ECASS-II trials. Stroke 2007;38:3205-3212.
6. Saver JL, Gornbein J. Treatment effects for which shift or binary analyses are advantageous in acute stroke trials. Neurology 2009;72:1310-1315.
7. Hertzberg V, Ingall T, O'Fallon W, et al. Methods and processes for the reanalysis of the NINDS Tissue Plasminogen Activator for Acute Ischemic Stroke Treatment Trial. Clin Trials 2008;5:308-315.
8. Generalized efficacy of t-PA for acute stroke. Subgroup analysis of the NINDS t-PA Stroke Trial. Stroke 1997;28:2119-2125.
9. Broderick JP, Lu M, Kothari R, et al. Finding the most powerful measures of the effectiveness of tissue plasminogen activator in the NINDS tPA Stroke Trial. Stroke 2000;31:2335-2341.
10. Liang KY, Zeger SL. Longitudinal data analysis using generalized linear models. Biometrika 1986;73:13-22.
11. Zeger SL, Liang KY. Longitudinal data analysis for discrete and continuous outcomes. Biometrics 1986;42:121-130.
12. Zeger SL, Liang KY. An overview of methods for the analysis of longitudinal data. Stat Med 1992;11:1825-1839.
13. Marler JR, Tilley BC, Lu M, et al. Early stroke treatment associated with better outcome: The NINDS rt-PA Stroke Study. Neurology 2000;55:1649-1655.
14. Stokes ME, Davis CS, Koch GG. Categorical Data Analysis Using the SAS System. 2nd ed. Cary, NC: SAS Institute, 2000. 469-548.
15. Mallinckrodt CH, Clark SW, Carroll RJ, et al. Assessing response profiles from incomplete longitudinal clinical trial data under regulatory considerations. J Biopharm Stat 2003;13:179-190.
16. Siddiqui O, Hung HM, O'Neill R. MMRM vs LOCF: A comprehensive comparison based on simulation study and 25 NDA datasets. J Biopharm Stat 2009;19:227-246.
17. Barnes SA, Mallinckrodt CH, Lindborg SR, et al. The impact of missing data and how it is handled on the rate of false-positive results in drug development. Pharm Stat 2008;7:215-225.
18. Lipkovich I, Duan Y, Ahmed S. Multiple imputation compared with restricted pseudo-likelihood and generalized estimating equations for analysis of binary repeated measures in clinical studies. Pharm Stat 2005;4:267-285.
19. Clinical operations: Accelerating trials, allocating resources and measuring performance. Cutting Edge Information 2006.
20. Fee R. The cost of clinical trials. Drug Discov Devel 2007;10:32.