Randomizing Responders
Alfred P. Hallstrom, PhD, Joel Verter, PhD, Lawrence Friedman, MD, for the Cardiac Arrhythmia Suppression Trial (CAST) Investigators
University of Washington, Department of Biostatistics, Seattle, Washington (A.P.H.); National Heart, Lung and Blood Institute, Bethesda, Maryland (J.V.; L.F.).
ABSTRACT: Three clinical trial designs for use in testing the effect of long-term drug therapy on an outcome are considered: empiric (randomization to one of several specified and fixed therapies); randomization followed by dose-adjustment of the drug; and dose adjustment followed by randomization of responders. It is shown that the latter, though some information may be lost and bias may be introduced, can be more efficient because of the reduction of noise. These results are illustrated using information gathered by the Cardiac Arrhythmia Pilot Study (CAPS) [1,2] and Cardiac Arrhythmia Suppression Trial (CAST) [3], the former being a pilot study using the second design, and the latter a large clinical trial adopting the third design. For CAST, the efficiency (represented by ratio of sample sizes) is 0.78 relative to the empiric design, and 0.6 relative to the design incorporating randomization followed by dose adjustment. KEY WORDS: Clinical trial design, empiric drug trial, dose-adjustment, run-in, efficiency, responders
INTRODUCTION
A randomized clinical trial involving long-term administration of a therapy can suffer a major loss in power if any of a number of design assumptions are not met. Among these would be a higher than projected percentage of patients not in compliance with the protocol. This could be the result of patients either ceasing to take their prescribed medication or taking less than the dosage called for in the protocol. Additionally, patients in any of the randomized groups who begin to use the intervention of another group (crossovers) would reduce study power. Using the widely accepted intention-to-treat method of analysis [4], the main study result reports all outcome events as occurring in their originally assigned randomization group. Thus patients randomized to an active arm of the trial who either stop taking their assigned treatment or begin using the therapy of another arm, and who then have an outcome, are nonetheless counted in the active group. Reducing the rate of these problems is a major
Address for reprint requests to: Alfred P. Hallstrom, PhD, University of Washington, CAST Coordinating Center, 1107 NE 45, Room 505, Seattle, WA 98105. Received June 1, 1990; revised February 28, 1991.
Controlled Clinical Trials 12:486-503 (1991). 0197-2456/91/$3.50. © Elsevier Science Publishing Co., Inc. 1991, 655 Avenue of the Americas, New York, New York 10010
issue in both the design and conduct of most long-term randomized clinical trials. One partial solution is the use of carefully developed exclusion criteria that reduce the likelihood of randomizing individuals who are unlikely to comply with the study protocol. For example, studies often exclude patients who, because of alcohol abuse, evidence of severe psychological dysfunction, or lack of a permanent residence, are at higher risk of not continuing a long-term protocol. It may also be possible to reduce the crossover rate by attempting to obtain truly informed consent and insisting on a double-blind design, when feasible. Compliance can be enhanced by various monitoring techniques such as periodic measurement of adherence, continual reminders of the importance of following the protocol, and positive reinforcement by the clinic staff at all follow-up visits. It may also be enhanced by having a "run-in" period using placebo prior to qualifying for randomization. During this short period prospective participants are requested to follow the study protocol and are given an appropriate amount of the placebo. Individuals are then randomized only if they meet preset minimum compliance criteria. By employing a second "run-in" period using the active drug, the study can further decrease the potential compliance problem. First, any lack of tolerance for the therapy is likely to be detected, and such patients quickly excluded, prior to randomization. Second, in trials where an interim response outcome can be measured, the initial response to therapy (e.g., a reduction in blood pressure or premature ventricular beats) can be assessed. This will restrict randomization to individuals who are, at least initially, both responsive to therapy and willing to comply with the protocol. The U.S. Physicians Health Study [5], for example, used the results from a self-report form during a prerandomization run-in phase to exclude individuals estimated to be poor compliers.
In this article on the utility of daily aspirin use and its relationship to the reduction in the risk of fatal and nonfatal myocardial infarction, the investigators noted excellent compliance, based on self-reports of capsule counts. The British Physicians Aspirin Study [6], on the other hand, did not use a run-in, perhaps one of the reasons for much poorer compliance. The Studies of Left Ventricular Dysfunction [7] used two run-in periods to assess both adherence (using a placebo) and short-term tolerability (using the active drug). Assessing both tolerability and compliance requires a few assumptions. First, we expect that those who adhere initially will continue to be good compliers and that those eliminated because of lack of initial compliance would continue to comply poorly if randomized. The potential impact of the compliance assumptions has been addressed in a recent issue of this journal [8]. Second, we must assume that the long-term response to therapy of the main outcome can be properly judged by the short-term response to the interim outcome. If these assumptions are valid, then the use of certain designs as described below should be beneficial in improving study efficiency. Three potential designs are considered. In the first, clinical pharmacologic research or small pilot studies are conducted to determine the optimum drug-dose combination for various patient groups, as measured by response to a surrogate variable. A second approach would allow each patient to proceed through a drug-dosing schedule (using active or placebo) after randomization,
until the appropriate combinations were found. This has been done in several blood pressure studies [9,10] and in at least one antiarrhythmic trial, the Cardiac Arrhythmia Pilot Study (CAPS), although maintaining the blind may be difficult [11]. A third design, if properly conducted, should dramatically reduce the number of nonresponders. In this design all otherwise eligible patients would be required to proceed through a drug-dosing protocol, employing an interim outcome, prior to being considered eligible for randomization. This would allow the identification of a drug and dose that would, at least initially, elicit the appropriate response to the interim outcome. In this article we refer to trials in which patients are randomized to a specific intervention protocol as empiric, to trials in which the drug and/or dose may be adjusted after randomization as postdosed, and to trials in which at least an initial response to an appropriate drug and dose is required prior to randomization as predosed (Fig. 1). Postdosed trials should have one advantage over empiric trials, namely the smaller incidence of nonresponders, and one disadvantage, namely that some of the outcome events may occur in patients while they are still on a suboptimal dose. This will be a disadvantage only in the unlikely circumstance that all the empiric trial patients are started on their optimal doses. The predosed design has both the advantage of a run-in to test compliance and the potential to reduce (within the measurement limitations imposed by variability and measurement error) the number of initial nonresponders. There are a few potential drawbacks to the predosed design. The patient or physician may refuse participation once a drug-dose combination is found

[Figure 1: schematic with three panels, "Empiric" design, "Post-dosed" design, and "Pre-dosed" design. Each panel runs from consenting cases through randomization (active vs. placebo) to compliant patients and withdrawals, with analysis by intention to treat; the pre-dosed panel begins with unblinded dose adjustment that separates responders from nonresponders, adverse effects, and withdrawals before randomization to successful therapy vs. matching placebo.]
Figure 1 Empiric: A fixed dose from among one of possibly several active therapies or matching placebos. Postdosed: Dose adjustment could involve several doses and several drugs including placebo dose adjustment corresponding to each active sequence. Predosed: Dose adjustment could involve several doses and several drugs. The comparison is between all those randomized at R to successful therapy and matching placebo.
that has an impact on the interim outcome. The actual workload at the clinics may increase even though the total sample size needed may decrease. It may prove to be more difficult to maintain the blind since study personnel know that anyone randomized was a responder. Consequently care must be taken not to allow local clinic evaluation of the interim outcome after randomization. Any need to measure the interim response after randomization will probably require additional administrative complexities (e.g., central laboratories, long-term storage of samples). In addition it may be more likely that some patients randomized to placebo recognize that they are not on active therapy and request the active drug. Finally, what may appear at first to be a disadvantage is the loss of those events that occur during the dosing period. These events would normally not be available for analysis with the results of the randomized trial and would appear to be a loss of potentially valuable information relative to the empiric and postdosed designs. Some of this information may be salvaged by randomizing half of the patients to either a placebo dosing or to a delay in the initiation of dosing. This would allow a direct randomized comparison during the dosing period, providing an estimate of the efficacy of early intervention as well as an estimate of any loss of information. The potential disadvantage is the additional workload for the clinic staff and the possibility that some patients and their physicians may not accept the additional delay.

METHODS
Formulae for the potential savings (loss) of the predosed design over the empiric and postdosed designs are derived from simple assumptions about outcome and response rates. The relative efficiencies of the three designs are then assessed for a specific situation using data obtained during the dose-titration period for the predosed-design CAST and data from the postdosed-design CAPS. CAST is an ongoing trial of antiarrhythmic therapy in patients who demonstrate at least six ventricular premature depolarizations per hour on a 24-hour ambulatory electrocardiograph recording between 6 days and 2 years after a myocardial infarction. It is designed to test the hypothesis that suppression of ventricular premature depolarizations will reduce the risk of arrhythmic death or cardiac arrest. The design of CAST was altered because of interim findings [3], and the earlier period is now thought of as CAST-I while the ongoing trial is referred to as CAST-II. During the dose-titration period of CAST-I, patients were randomized to one of four drug-dose sequences. Patients could be given up to three drugs (encainide, flecainide, moricizine) and up to two doses of each drug. Patients who respond by demonstrating at least an 80% reduction in their ventricular premature depolarization rate and a 90% reduction in runs of ectopic beats compared with baseline are randomized to the therapy to which they responded or its matching placebo. The first 800 scheduled patients randomized to the dose-titration phase in CAST-I were included in this analysis. Patients who requested that their CAST drug be stopped after randomization to blinded therapy and were then placed on an individualized therapy (either identical to their successful CAST therapy or to some other antiarrhythmic agent) were considered to have stopped
following the protocol for perceived differences between the open-label drug and the blinded therapy. Follow-up interviews were conducted to determine the real reason for withdrawal. CAPS randomized 502 patients with equal probability to placebo or one of four active antiarrhythmic strategies. The optimal active drug and dose for each patient was subsequently determined by drug-dose adjustment involving up to two drugs at three doses each. In order to preserve the blind, patients on placebo also had their doses adjusted. Patients were followed for 1 year after randomization to assess suppression and side effects. The drugs used were encainide, flecainide, moricizine, and imipramine. Except for left ventricular ejection fraction limits and time since the last myocardial infarction, inclusion and exclusion criteria for CAPS and CAST were similar. Patients in CAPS had a lower ejection fraction limit of 0.20 and no upper limit. CAST has no lower limit and an upper limit of 0.40 or 0.55 depending on the time from myocardial infarction to entry. Comparisons between the two trials are, therefore, based on CAST patients with an ejection fraction of greater than 0.20 and CAPS patients with an ejection fraction of less than 0.50. Withdrawal rates are estimated by the Kaplan-Meier procedure [12].

RESULTS
To derive formulae for the potential savings (loss) of the predosed design, assume the following conditions:
1. For all designs we define eligibility at a common time, $t_0$ (e.g., randomization to blinded therapy for the empiric and postdosed designs, randomization to titration for the predosed design).
2. From $t_0$ until the end of follow-up the rate for the primary outcome under the null hypothesis is $p$.
3. The treatment is hypothesized to reduce the rate to $(1 - r)p$, $0 < r < 1$, in the group that responds (e.g., in CAST those who have their arrhythmias suppressed).
4. For the empiric design a proportion $k$ ($0 < k < 1$) of the patients are responsive with respect to the interim outcome. (Note: this $k$ will probably be less than the percent responding during the predose or postdose titration phase unless each patient is given his or her optimum dose.)
5. For the predosed and postdosed designs a proportion $sk$, $s > 1$ but $sk \le 1$, of the patients are responsive with respect to the interim outcome.
6. During the predosed titration phase, a proportion $j$ ($0 < j < 1$) of the primary outcome events occur. Thus, under the assumption that outcome events are independent of responsiveness (probably not always true), for this design the hypothesized rate during the main randomized trial is $(1 - j)p/(1 - jp)$.
7. During the titration phase of a postdosed trial, the same proportion, $j$, of the primary events occur prior to achieving response with respect to the surrogate.
8. The number of patients per group needed to conduct a study for fixed type I and II error rates is

$$N = \frac{2(Z_{1-\alpha} + Z_{1-\beta})^2\,\bar{P}\bar{Q}}{(P_1 - P_2)^2}$$
where $P_1$ ($P_2$) is the hypothesized rate in the placebo (treatment) group, $\bar{P} = (P_1 + P_2)/2$, and $\bar{Q} = 1 - \bar{P}$. Writing $d = P_1 - P_2$ and $V = \bar{P}\bar{Q}$, the ratio of the sample sizes for any two designs A and B is

$$\frac{N_A}{N_B} = \frac{V_A}{V_B}\left(\frac{d_B}{d_A}\right)^2.$$

For the empiric (E) design, assuming that in those for whom the treatment is effective (with respect to the surrogate) its impact is seen immediately, the expected rates in the placebo and treatment groups can be estimated as

$$P_1 = p, \qquad P_2 = k(1 - r)p + (1 - k)p = p(1 - kr),$$

$$d_E = kpr, \qquad V_E = \bar{P}(1 - \bar{P}).$$

For the predosed (DR) design, these rates are

$$P_1 = \frac{(1 - j)p}{1 - jp}, \qquad P_2 = \frac{(1 - j)(1 - r)p}{1 - jp},$$

$$d_{DR} = \frac{(1 - j)pr}{1 - jp}, \qquad \bar{P} = \frac{(1 - j)p\,(1 - r/2)}{1 - jp}, \qquad V_{DR} = \bar{P}(1 - \bar{P}).$$

Similarly, for the postdosed (RD) design, these rates are

$$P_1 = p, \qquad P_2 = jp + (1 - sk)(1 - j)p + (1 - r)sk(1 - j)p = p[1 - rsk(1 - j)],$$

$$d_{RD} = prsk(1 - j), \qquad \bar{P} = p\left(1 - (1 - j)\frac{rsk}{2}\right), \qquad V_{RD} = \bar{P}(1 - \bar{P}).$$
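The rates above can be evaluated numerically. The following is a minimal sketch, not the authors' code: the function names, the default error rates (`alpha`, `power`), and the illustrative parameter values are assumptions introduced here. It computes the per-group randomized sample size from the formula in condition 8 for each design.

```python
from statistics import NormalDist


def per_group_n(p1, p2, alpha=0.05, power=0.90):
    """N = 2 (Z_{1-alpha} + Z_{1-beta})^2 * Pbar * Qbar / (P1 - P2)^2.

    alpha and power are illustrative defaults (assumptions, not from the text).
    """
    z = NormalDist().inv_cdf
    pbar = (p1 + p2) / 2
    return 2 * (z(1 - alpha) + z(power)) ** 2 * pbar * (1 - pbar) / (p1 - p2) ** 2


def empiric_rates(p, r, k):
    # P1 = p, P2 = p(1 - kr)
    return p, p * (1 - k * r)


def predosed_rates(p, r, j):
    # P1 = (1 - j)p / (1 - jp), P2 = (1 - r) * P1
    base = (1 - j) * p / (1 - j * p)
    return base, (1 - r) * base


def postdosed_rates(p, r, j, sk):
    # P1 = p, P2 = p[1 - r*sk*(1 - j)]
    return p, p * (1 - r * sk * (1 - j))


# Illustrative values only (assumed for this sketch).
p, r, j, k, s = 0.20, 0.30, 0.15, 0.75, 1.1
sk = s * k

n_e = per_group_n(*empiric_rates(p, r, k))
n_dr = per_group_n(*predosed_rates(p, r, j))
n_rd = per_group_n(*postdosed_rates(p, r, j, sk))

print(f"empiric:   {n_e:8.0f} per group")
print(f"predosed:  {n_dr:8.0f} per group (randomized responders only)")
print(f"postdosed: {n_rd:8.0f} per group")
```

Because the same normal-quantile factor multiplies every design, ratios such as `n_e / n_dr` do not depend on the assumed `alpha` and `power`.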
Thus, the ratio of the randomized sample sizes required for the empiric and predosed designs is

$$\frac{N(\text{empiric})}{N(\text{predosed})} = \frac{V_E}{V_{DR}}\left(\frac{1 - j}{(1 - jp)k}\right)^2,$$

and the ratio of the randomized sample sizes required for the postdosed and predosed designs is

$$\frac{N(\text{postdosed})}{N(\text{predosed})} = \frac{V_{RD}}{V_{DR}}\left(\frac{1}{(1 - jp)sk}\right)^2.$$

However, the total number of patients that are recruited in the predosed design is larger than the randomized sample size by the factor $1/sk$. Therefore the ratios of the total number of patients needed to compare the various designs are

$$\frac{N(\text{empiric})}{N(\text{predosed})} = sk\,\frac{V_E}{V_{DR}}\left(\frac{d_{DR}}{d_E}\right)^2 = \frac{V_E}{V_{DR}}\,\frac{s}{k}\left(\frac{1 - j}{1 - jp}\right)^2,$$

so

$$\frac{N_E}{N_{DR}} = \frac{s(1 - j)(2 - kr)[2(1 - p) + prk]}{k(2 - r)[2(1 - p) + pr(1 - j)]},$$

and

$$\frac{N(\text{postdosed})}{N(\text{predosed})} = sk\,\frac{V_{RD}}{V_{DR}}\left(\frac{d_{DR}}{d_{RD}}\right)^2 = \frac{V_{RD}}{V_{DR}}\,\frac{1}{sk(1 - jp)^2},$$

so

$$\frac{N_{RD}}{N_{DR}} = \frac{[2 - rsk(1 - j)][2(1 - p) + pr(1 - j)sk]}{sk(1 - j)(2 - r)[2(1 - p) + pr(1 - j)]}.$$
Thus $N_E/N_{DR}$ increases linearly with the improved response obtained by titration ($s$), varies inversely with the empiric response rate ($k$), and decreases in a damped manner as the proportion of events occurring during titration ($j$) increases. Over reasonable ranges of the remaining parameters the sample size ratio is almost independent of the underlying event rate or assumed treatment effect. $N_{RD}/N_{DR}$ varies inversely with the response rate, increases with the titration event rate, and also is essentially independent of the event rate and treatment effect. Table 1 presents the relative efficiencies of the empiric versus predosed designs for selected values of $p$, $j$, $s$, and $k$. Thus, for a trial in which the event rate is $p = 0.20$, treatment is assumed to reduce the rate by 30% (i.e., $1 - r = 0.7$), 15% of the events occur during the titration phase (i.e., $j = 0.15$), 75% of patients would respond in an empiric design (i.e., $k = 0.75$), and 82.5% respond during dosing (i.e., $s = 1.1$), the empiric design would require recruitment of 30% more patients than the predosed design.
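The closed-form total-recruitment ratios can be checked directly. The sketch below uses hypothetical function names chosen for this illustration; it evaluates both ratios and regenerates one row of Table 1 (p = 0.2, k = 0.75, s = 1.1, 1 - r = 0.7) across the tabulated values of j.

```python
def ratio_empiric_to_predosed(p, r, j, k, s):
    """Total patients, empiric vs. predosed: N_E / N_DR."""
    num = s * (1 - j) * (2 - k * r) * (2 * (1 - p) + p * r * k)
    den = k * (2 - r) * (2 * (1 - p) + p * r * (1 - j))
    return num / den


def ratio_postdosed_to_predosed(p, r, j, sk):
    """Total patients, postdosed vs. predosed: N_RD / N_DR."""
    num = (2 - r * sk * (1 - j)) * (2 * (1 - p) + p * r * (1 - j) * sk)
    den = sk * (1 - j) * (2 - r) * (2 * (1 - p) + p * r * (1 - j))
    return num / den


# Worked example from the text: p = 0.20, 1 - r = 0.7, j = 0.15, k = 0.75, s = 1.1.
example = ratio_empiric_to_predosed(p=0.20, r=0.30, j=0.15, k=0.75, s=1.1)
print(f"N_E / N_DR = {example:.2f}")

# One row of Table 1, rounded to one decimal as in the table.
js = (0.05, 0.10, 0.15, 0.20, 0.25)
row = [round(ratio_empiric_to_predosed(0.20, 0.30, j, 0.75, 1.1), 1) for j in js]
print("Table 1 row (1 - r = 0.7):", row)

# Matching postdosed comparison for sk = 0.825.
rd_row = [round(ratio_postdosed_to_predosed(0.20, 0.30, j, 0.825), 1) for j in js]
print("Table 2 row (1 - r = 0.7):", rd_row)
```

Sweeping `j` this way is a quick means of seeing how each ratio responds to the share of events lost during titration while the other parameters are held fixed.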
Table 1. Relative Efficiencies of Empiric and Predosed Designs: N(empiric)/N(predosed)
Rows: treatment effect 1 - r. Columns: proportion of events occurring during dosing, j.

Untreated event rate p = 0.1

Empiric response rate k = 0.5; improved response with dosing s = 1.1
1 - r   j=0.05  j=0.10  j=0.15  j=0.20  j=0.25
0.6     2.3     2.2     2.1     2.0     1.8
0.7     2.3     2.1     2.0     1.9     1.8
0.8     2.2     2.1     2.0     1.9     1.7

Empiric response rate k = 0.5; improved response with dosing s = 1.3
1 - r   j=0.05  j=0.10  j=0.15  j=0.20  j=0.25
0.6     2.8     2.6     2.5     2.3     2.2
0.7     2.7     2.5     2.4     2.3     2.1
0.8     2.6     2.5     2.3     2.2     2.1

Empiric response rate k = 0.75; improved response with dosing s = 1.1
1 - r   j=0.05  j=0.10  j=0.15  j=0.20  j=0.25
0.6     1.5     1.4     1.3     1.2     1.2
0.7     1.5     1.4     1.3     1.2     1.1
0.8     1.4     1.4     1.3     1.2     1.1

Empiric response rate k = 0.75; improved response with dosing s = 1.2
1 - r   j=0.05  j=0.10  j=0.15  j=0.20  j=0.25
0.6     1.6     1.5     1.4     1.4     1.3
0.7     1.6     1.5     1.4     1.3     1.3
0.8     1.6     1.5     1.4     1.3     1.2

Empiric response rate k = 0.9; improved response with dosing s = 1.0
1 - r   j=0.05  j=0.10  j=0.15  j=0.20  j=0.25
0.6     1.1     1.0     0.97    0.91    0.86
0.7     1.1     1.0     0.96    0.91    0.85
0.8     1.1     1.0     0.96    0.90    0.84

Empiric response rate k = 0.9; improved response with dosing s = 1.1
1 - r   j=0.05  j=0.10  j=0.15  j=0.20  j=0.25
0.6     1.2     1.1     1.1     1.0     0.94
0.7     1.2     1.1     1.1     1.0     0.94
0.8     1.2     1.1     1.1     0.99    0.93
Table 1 (continued). N(empiric)/N(predosed)
Rows: treatment effect 1 - r. Columns: proportion of events occurring during dosing, j.

Untreated event rate p = 0.2

Empiric response rate k = 0.5; improved response with dosing s = 1.1
1 - r   j=0.05  j=0.10  j=0.15  j=0.20  j=0.25
0.6     2.3     2.2     2.1     2.0     1.8
0.7     2.2     2.1     2.0     1.9     1.8
0.8     2.2     2.1     2.0     1.8     1.7

Empiric response rate k = 0.5; improved response with dosing s = 1.3
1 - r   j=0.05  j=0.10  j=0.15  j=0.20  j=0.25
0.6     2.7     2.6     2.4     2.3     2.2
0.7     2.6     2.5     2.4     2.2     2.1
0.8     2.6     2.4     2.3     2.2     2.0

Empiric response rate k = 0.75; improved response with dosing s = 1.1
1 - r   j=0.05  j=0.10  j=0.15  j=0.20  j=0.25
0.6     1.5     1.4     1.3     1.2     1.2
0.7     1.4     1.4     1.3     1.2     1.1
0.8     1.4     1.4     1.3     1.2     1.1

Empiric response rate k = 0.75; improved response with dosing s = 1.2
1 - r   j=0.05  j=0.10  j=0.15  j=0.20  j=0.25
0.6     1.6     1.5     1.4     1.4     1.3
0.7     1.6     1.5     1.4     1.3     1.3
0.8     1.6     1.5     1.4     1.3     1.2

Empiric response rate k = 0.9; improved response with dosing s = 1.0
1 - r   j=0.05  j=0.10  j=0.15  j=0.20  j=0.25
0.6     1.1     1.0     0.97    0.92    0.86
0.7     1.1     1.0     0.96    0.91    0.85
0.8     1.1     1.0     0.96    0.90    0.85

Empiric response rate k = 0.9; improved response with dosing s = 1.1
1 - r   j=0.05  j=0.10  j=0.15  j=0.20  j=0.25
0.6     1.2     1.1     1.1     1.0     0.95
0.7     1.2     1.1     1.1     1.0     0.94
0.8     1.2     1.1     1.1     0.99    0.93
Table 1 (continued). N(empiric)/N(predosed)
Rows: treatment effect 1 - r. Columns: proportion of events occurring during dosing, j.

Untreated event rate p = 0.3

Empiric response rate k = 0.5; improved response with dosing s = 1.1
1 - r   j=0.05  j=0.10  j=0.15  j=0.20  j=0.25
0.6     2.3     2.2     2.0     1.9     1.8
0.7     2.2     2.1     2.0     1.9     1.8
0.8     2.2     2.1     1.9     1.8     1.7

Empiric response rate k = 0.5; improved response with dosing s = 1.3
1 - r   j=0.05  j=0.10  j=0.15  j=0.20  j=0.25
0.6     2.7     2.5     2.4     2.3     2.2
0.7     2.6     2.5     2.4     2.2     2.1
0.8     2.6     2.4     2.3     2.2     2.0

Empiric response rate k = 0.75; improved response with dosing s = 1.1
1 - r   j=0.05  j=0.10  j=0.15  j=0.20  j=0.25
0.6     1.5     1.4     1.3     1.2     1.2
0.7     1.4     1.4     1.3     1.2     1.1
0.8     1.4     1.3     1.3     1.2     1.1

Empiric response rate k = 0.75; improved response with dosing s = 1.2
1 - r   j=0.05  j=0.10  j=0.15  j=0.20  j=0.25
0.6     1.6     1.5     1.4     1.4     1.3
0.7     1.6     1.5     1.4     1.3     1.3
0.8     1.5     1.5     1.4     1.3     1.2

Empiric response rate k = 0.9; improved response with dosing s = 1.0
1 - r   j=0.05  j=0.10  j=0.15  j=0.20  j=0.25
0.6     1.1     1.0     0.97    0.92    0.86
0.7     1.1     1.0     0.96    0.91    0.86
0.8     1.1     1.0     0.96    0.90    0.85

Empiric response rate k = 0.9; improved response with dosing s = 1.1
1 - r   j=0.05  j=0.10  j=0.15  j=0.20  j=0.25
0.6     1.2     1.1     1.1     1.0     0.95
0.7     1.2     1.1     1.1     1.0     0.94
0.8     1.2     1.1     1.1     0.99    0.93
We note that except for the case k ≥ 0.90 (e.g., 90% of patients respond in an empiric design) the predosed design is always more efficient than the empiric design. When comparing the predosed and postdosed designs, the predosed is more efficient, even in this instance (Table 2). As a practical example we consider the relative efficiencies of the three designs for a trial to test the CAST hypothesis. Estimates of the above parameters are obtained primarily from the experience in CAPS and CAST.
Table 2. Relative Efficiencies of Postdosed and Predosed Designs: N(postdosed)/N(predosed)
Rows: treatment effect 1 - r. Columns: proportion of events occurring during dosing, j.

Untreated event rate p = 0.1

Response rate ks = 0.55
1 - r   j=0.05  j=0.10  j=0.15  j=0.20  j=0.25
0.6     2.1     2.3     2.4     2.6     2.8
0.7     2.1     2.2     2.3     2.5     2.7
0.8     2.0     2.1     2.3     2.4     2.6

Response rate ks = 0.65
1 - r   j=0.05  j=0.10  j=0.15  j=0.20  j=0.25
0.6     1.8     1.9     2.0     2.1     2.3
0.7     1.7     1.8     1.9     2.1     2.2
0.8     1.7     1.8     1.9     2.0     2.2

Response rate ks = 0.825
1 - r   j=0.05  j=0.10  j=0.15  j=0.20  j=0.25
0.6     1.3     1.4     1.5     1.6     1.8
0.7     1.3     1.4     1.5     1.6     1.7
0.8     1.3     1.4     1.5     1.6     1.7

Response rate ks = 0.9
1 - r   j=0.05  j=0.10  j=0.15  j=0.20  j=0.25
0.6     1.2     1.3     1.4     1.5     1.6
0.7     1.2     1.3     1.4     1.5     1.6
0.8     1.2     1.3     1.3     1.4     1.5

Response rate ks = 0.99
1 - r   j=0.05  j=0.10  j=0.15  j=0.20  j=0.25
0.6     1.1     1.2     1.2     1.3     1.4
0.7     1.1     1.1     1.2     1.3     1.4
0.8     1.1     1.1     1.2     1.3     1.4
Table 2 (continued). N(postdosed)/N(predosed)
Rows: treatment effect 1 - r. Columns: proportion of events occurring during dosing, j.

Untreated event rate p = 0.2

Response rate ks = 0.55
1 - r   j=0.05  j=0.10  j=0.15  j=0.20  j=0.25
0.6     2.1     2.2     2.4     2.5     2.7
0.7     2.0     2.2     2.3     2.5     2.6
0.8     2.0     2.1     2.2     2.4     2.6

Response rate ks = 0.65
1 - r   j=0.05  j=0.10  j=0.15  j=0.20  j=0.25
0.6     1.7     1.9     2.0     2.1     2.3
0.7     1.7     1.8     1.9     2.1     2.2
0.8     1.7     1.8     1.9     2.0     2.2

Response rate ks = 0.825
1 - r   j=0.05  j=0.10  j=0.15  j=0.20  j=0.25
0.6     1.3     1.4     1.5     1.6     1.8
0.7     1.3     1.4     1.5     1.6     1.7
0.8     1.3     1.4     1.5     1.6     1.7

Response rate ks = 0.9
1 - r   j=0.05  j=0.10  j=0.15  j=0.20  j=0.25
0.6     1.2     1.3     1.4     1.5     1.6
0.7     1.2     1.3     1.4     1.5     1.6
0.8     1.2     1.3     1.3     1.4     1.5

Response rate ks = 0.99
1 - r   j=0.05  j=0.10  j=0.15  j=0.20  j=0.25
0.6     1.1     1.2     1.2     1.3     1.4
0.7     1.1     1.1     1.2     1.3     1.4
0.8     1.1     1.1     1.2     1.3     1.4

Untreated event rate p = 0.3

Response rate ks = 0.55
1 - r   j=0.05  j=0.10  j=0.15  j=0.20  j=0.25
0.6     2.1     2.2     2.4     2.5     2.7
0.7     2.0     2.1     2.3     2.4     2.6
0.8     2.0     2.1     2.2     2.4     2.5

Response rate ks = 0.65
1 - r   j=0.05  j=0.10  j=0.15  j=0.20  j=0.25
0.6     1.7     1.8     2.0     2.1     2.3
0.7     1.7     1.8     1.9     2.1     2.2
0.8     1.7     1.8     1.9     2.0     2.1
Table 2 (continued). N(postdosed)/N(predosed), untreated event rate p = 0.3
Rows: treatment effect 1 - r. Columns: proportion of events occurring during dosing, j.

Response rate ks = 0.825
1 - r   j=0.05  j=0.10  j=0.15  j=0.20  j=0.25
0.6     1.3     1.4     1.5     1.6     1.8
0.7     1.3     1.4     1.5     1.6     1.7
0.8     1.3     1.4     1.5     1.6     1.7

Response rate ks = 0.9
1 - r   j=0.05  j=0.10  j=0.15  j=0.20  j=0.25
0.6     1.2     1.3     1.4     1.5     1.6
0.7     1.2     1.3     1.4     1.5     1.6
0.8     1.2     1.3     1.3     1.4     1.5

Response rate ks = 0.99
1 - r   j=0.05  j=0.10  j=0.15  j=0.20  j=0.25
0.6     1.1     1.2     1.2     1.3     1.4
0.7     1.1     1.1     1.2     1.3     1.4
0.8     1.1     1.1     1.2     1.3     1.4
From the data obtained in CAPS and other studies, the expected event rate was estimated at p = 0.11 and r was presumed to be 0.3, i.e., treatment was expected to reduce the event rate by 30% in the responders. From the early (first 800 patients enrolled) CAST data, 77.6% ultimately responded to therapy, so sk = 0.776 (under the null assumption that the interim outcome and mortality outcome are independent), and 2% had a sudden death or cardiac arrest during dose adjustment, so the proportion of events occurring during titration, j, was 0.182 (0.02/0.11). It remains to estimate k, the proportion of responders in an empiric design. In the empiric design, patients would be randomized into a single predetermined dose of medication and all events would be counted. However, some of these would contribute noise because they would occur in patients whose arrhythmias were not suppressed. We assume use of the middle dose of CAPS; by that dose 75.8% had responded to encainide, 83.5% had responded to flecainide, and 66.3% had responded to moricizine. The CAST design resulted in 51%, 37%, and 13% being first assigned to encainide, flecainide, and moricizine, respectively. However, 8% of patients who tolerated but did not respond to dose one did not tolerate the middle dose. Assuming a similar percent of patients who responded to dose one would not tolerate the middle dose, k = 0.719. Thus, s = 1.08. The relative sample sizes for the three designs are then
$$\frac{N_E}{N_{DR}} = 1.29 \quad \text{and} \quad \frac{N_{RD}}{N_{DR}} = 1.67.$$
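The arithmetic behind these CAST parameter estimates can be retraced as follows. This is a sketch under an assumption introduced here: k is taken as the assignment-weighted response rate at the middle dose, reduced by the 8% who would not tolerate that dose. That reading reproduces the reported k = 0.719, but the text does not spell out the weighting, and the variable names are hypothetical.

```python
# Shares first assigned to each drug in CAST and proportion responding by the
# CAPS middle dose, as quoted in the text (shares sum to 1.01 due to rounding).
first_assigned = {"encainide": 0.51, "flecainide": 0.37, "moricizine": 0.13}
responded_by_middle_dose = {"encainide": 0.758, "flecainide": 0.835, "moricizine": 0.663}
not_tolerating_middle_dose = 0.08

# ASSUMPTION: weight each drug's response rate by its assignment share,
# then remove the fraction that would not tolerate the middle dose.
weighted_response = sum(
    first_assigned[drug] * responded_by_middle_dose[drug] for drug in first_assigned
)
k = weighted_response * (1 - not_tolerating_middle_dose)

sk = 0.776            # proportion ultimately responding during CAST titration
s = sk / k            # improvement in response obtained by titration
j = 0.02 / 0.11       # share of expected events occurring during titration

# Reported total-sample-size ratios and the implied share of patients saved.
ratio_empiric, ratio_postdosed = 1.29, 1.67
saving_vs_empiric = 1 - 1 / ratio_empiric
saving_vs_postdosed = 1 - 1 / ratio_postdosed

print(f"k = {k:.3f}, s = {s:.2f}, j = {j:.3f}")
print(f"fewer patients than empiric:   {saving_vs_empiric:.1%}")
print(f"fewer patients than postdosed: {saving_vs_postdosed:.1%}")
```

The reciprocals 1/1.29 and 1/1.67 are the efficiencies of about 0.78 and 0.6 quoted in the abstract.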
Thus using the CAST predosed design required 22.5% fewer patients than the empiric design and 40.0% fewer than a postdosed design. A potential concern raised by the predosed design is whether patients will agree to randomization once an "effective" drug for the interim response is identified. Of the first 800 patients enrolled to open titration in CAST, 621 were eligible for randomization to blind therapy. Only 3 of these 621 (0.5%) patients were not randomized, and in only one case was the reason attributable to patient or physician refusal. The withdrawal rate from blinded therapy due to patient or physician preference and for which the individual therapy chosen after withdrawal was the successful study therapy was zero in the group randomized to active therapy and 1% in the group randomized to placebo.

DISCUSSION
The planning, execution, and analysis of a clinical trial bear directly on the inferences that may be drawn and the degree to which these are implemented by the medical community. One aspect of clinical trial design and execution that has received attention is the "intention to treat" philosophy [13,14]. While this dictum has been almost universally accepted, the idea of including individuals in the analysis who are "not responsive" to therapy or who are not "adhering" to the protocol continues to concern some researchers. Although we do not have any appropriate methods of analysis that will, without introducing bias, adjust for this problem, clinical trialists have developed some designs that attempt to solve or at least reduce the magnitude of the problem. Among these are sample size adjustment for nonadherence and crossover, prerandomization run-in, and dose-titration periods designed to identify nonadherers and "responders." Based on the results of previous trials and epidemiologic studies, ventricular premature depolarizations have been shown to be related to sudden death in patients who have had a myocardial infarction.
Thus in CAST we were testing the hypothesis that reducing ventricular premature depolarizations would reduce the risk of sudden death. To increase the sensitivity of the trial, we designed a drug-titration period, prior to randomization to blinded therapy, to identify those patients whose premature beats would be initially suppressed by one of three drugs and to identify the best drug and dose for each patient. The results of this analysis for the first 800 patients enrolled in the predosed-design CAST trial suggest the following conclusions: there is no indication of an unwillingness to be randomized to blinded treatment after a titration phase has identified a therapy that successfully reduces arrhythmia (3/621 = 0.0048); there is the possibility of a small bias in the rate of patients stopping their study drug after therapy is initiated because of perceived differences (0% from patients on active vs. 1% from patients on placebo, p = NS); there is a loss of events (early deaths and/or cardiac arrest) representing less than 17.5% (depending on how rapidly the empiric therapy becomes effective) of the potentially available information. In this regard it is important to differentiate the deaths that occur in patients whose ventricular arrhythmia would have been suppressed from the deaths that occur in patients whose arrhythmia would not have been suppressed. The former is a loss of information; the latter would be noise whose absence would, in fact, represent a gain in efficiency, assuming ventricular premature depolarization suppression is important in reducing mortality. Also, approximately one fifth of the patients who are eligible (except for not being responsive to the interim measure) and who could only represent noise in an empiric or postdosed design are not included in the predosed randomized trial. This is clearly illustrated by Figure 2, which compares withdrawal because of nonresponse, intolerance, or noncompliance in CAPS and CAST. This reduction in noise leads to an increase in efficiency, which can be substantial (Tables 1 and 2). For the CAST hypothesis an empiric design would have required a 29% increase (and a postdosed design a 67% increase) in the number of patients randomized compared to the number entering the titration period in a predosed design. Some might argue that the predosed design is of questionable ethics. One possibility is that short-term use of the drug in a clinic population might be harmful, while long-term use in those who can tolerate the drug might be neutral or even beneficial. This is a legitimate concern that can be addressed by randomizing patients to placebo titration or to delayed titration during the titration phase. Next, consider the situation in which the only purpose of the predosing is to ascertain compliance and tolerability. Then, if the empiric trial is ethical, so is the predosed design since the net result for any specific patient will be the same treatment under either design. Finally, consider the case where there is a measurable interim outcome. In a study designed to test the
Figure 2 Adverse or nonresponders and early withdrawers. These patients would be counted in the "intention-to-treat" analyses in a postdosed design such as CAPS, but only the relatively few who withdraw during blinded therapy in a predosed design such as CAST. [Plot not reproduced. Curves: CAST (EF >= 0.20) during dose-adjustment; CAPS (EF < 0.50); CAST (EF >= 0.20) during blinded therapy. Horizontal axis: Days, 0 to 80.]
hypothesis "Does therapy A (e.g., an antiarrhythmic drug) reduce the risk of having outcome B (e.g., sudden death)?", the presumption would be that the currently available literature is inconclusive; hence the need for the randomized clinical trial. There may, of course, be data suggesting that using therapy A reduces the frequency of an interim outcome, C (e.g., premature ventricular beats), and that C is positively related to B. However, as we have recently demonstrated in CAST, a positive relation does not imply causality. If a patient and physician are in equipoise concerning therapy A and outcome B but believe that the interim outcome C is causally related to outcome B, then the patient should only be randomized to a titration trial whose primary outcome is C. However, if the patient and physician are also in equipoise as to causality between C and B, then the predosed design should be as ethical as an empiric design. These issues are complex and are not the main focus of this report but may deserve debate in another forum. Suffice it to note that this issue was debated during the development of the Cardiac Arrhythmia Suppression Trial and the decision was to proceed. For CAST the ethical argument was also thought to be somewhat specious, since the patients who were randomized, as well as a number who would have been nonresponsive when assessed by the interim outcome (VPB suppression), would have been randomized under the more traditional design.
In proposing the third design, it is paramount to remember that the short-term efficacy evaluated on an interim response variable may or may not accurately reflect the true response to the main study outcome. The purported mechanism of action of the drug, as measured by the interim response, may be incorrect.
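As a side note on the sample-size comparison quoted earlier in this discussion (an empiric design needing 29% more patients, and a postdose design 67% more, than the number entering titration in a predose design), the following minimal sketch converts those percentage increases into sample-size ratios. It is only arithmetic on the quoted figures, not the calculation behind Tables 1 and 2; the function and constant names are hypothetical and chosen for illustration.

```python
# Illustrative arithmetic only: converts the percentage increases quoted in
# the text into sample-size ratios. Names are hypothetical, not from the paper.

EMPIRIC_INCREASE = 0.29   # empiric design: 29% more patients than predose
POSTDOSE_INCREASE = 0.67  # postdose design: 67% more patients than predose


def relative_efficiency(fractional_increase: float) -> float:
    """Ratio of predose sample size to the alternative design's sample size.

    If the alternative needs (1 + x) times as many patients, the ratio of
    sample sizes (predose / alternative) is 1 / (1 + x).
    """
    if fractional_increase <= -1.0:
        raise ValueError("fractional_increase must be greater than -1")
    return 1.0 / (1.0 + fractional_increase)


if __name__ == "__main__":
    for label, increase in (("empiric", EMPIRIC_INCREASE),
                            ("postdose", POSTDOSE_INCREASE)):
        ratio = relative_efficiency(increase)
        print(f"predose vs. {label}: sample-size ratio = {ratio:.2f}")
```

Under this reading, the ratios round to 0.78 and 0.60, which agrees with the efficiencies of 0.78 and 0.6 stated in the abstract.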
It is even conceivable that such a design of initial responders may actually increase the crossover rate from placebo to active therapy, since those randomized to the inert arm may perceive a difference in response. We have reported the results for CAST, which show that this was not an issue for this trial. Other than CAST, we are aware of one other trial [15] that is attempting to use such a design. That trial is using probucol to look at the development of femoral atherosclerosis in hypercholesterolemic patients. To be eligible, each patient must show responses to diet, probucol, and cholestyramine prior to randomization.
The designs discussed in this article are being compared in the evaluation of a specific hypothesis, namely, that changing a measurable interim response (i.e., premature ventricular beats) will reduce the risk of a major outcome (i.e., sudden death). The goal is to choose a design that is efficient with respect to the number of randomized patients. The empiric design randomizes all eligible patients to a single drug and dose, regardless of whether the therapy appropriately reduces each patient's VPBs. The postdosed design improves on the empiric design by allowing some dose adjustment after randomization. This results in some, but not all, patients showing the appropriate response to the interim outcome. Only the predosed design requires identification of initial responders prior to randomization. Thus, of the three, this design allows the most efficient test of the original hypothesis.
Some might argue that we could use the empiric design and still obtain an estimate of the efficacy of reducing the main outcome by reducing the interim outcome. This would be accomplished by some statistical adjustment. Although this might be possible, it will require additional measures and could result in complicated analyses
requiring cautious interpretations. The investigators would have to decide when and perhaps how often to measure the interim outcome, itself a response to the treatment. It is not obvious how to properly adjust a main study outcome for an interim outcome. Unless this is possible and acceptable to the medical readers, it will be difficult to obtain an unbiased estimate of the efficacy of the therapy in those who respond to the interim outcome. Similar arguments apply to the postdosed trial.
In summary, the results observed for the first 800 patients enrolled in CAST provide strong evidence that the predose design can be practical and efficient. Consideration of such a design is strongly recommended whenever an appropriate interim response variable is available.
This work was supported in part by Contract NO1-HC-65042 with the National Heart, Lung, and Blood Institute, U.S. Department of Health and Human Services.
REFERENCES
1. Cardiac Arrhythmia Pilot Study (CAPS) Investigators: Recruitment and baseline description of patients in the cardiac arrhythmia pilot study. Am J Cardiol 61:704-713, 1988
2. Cardiac Arrhythmia Pilot Study (CAPS) Investigators: Effects of encainide, flecainide, imipramine and moricizine on ventricular arrhythmias during the year after acute myocardial infarction: The Cardiac Arrhythmia Pilot Study (CAPS). Am J Cardiol 501-509, 1988
3. Cardiac Arrhythmia Suppression Trial (CAST) Investigators: Preliminary report: Effect of encainide and flecainide on mortality in a randomized trial of arrhythmia suppression after myocardial infarction. N Engl J Med 321:406-412, 1989
4. Friedman LM, Furberg CD, DeMets DL: Fundamentals of clinical trials. Boston: John Wright, PSG, 1981
5. U.S. Physicians' Study; The Steering Committee of the Physicians' Health Study Research Group: Preliminary report: Findings from the aspirin component of the ongoing Physicians' Health Study. N Engl J Med 318:262-264, 1988
6. British Physicians' Study: Peto R, Gray R, Collins R, Wheatley K, Hennekens C, Jamrozik K, Warlow C, Hafner B, Thompson E, Norton S, Gilliland J, Doll R: Randomized trial of prophylactic daily aspirin in British male doctors. Br Med J 296:313-316, 1988
7. Studies of Left Ventricular Dysfunction (SOLVD): Protocol
8. Brittain E, Wittes J: The run-in period in clinical trials: The effect of misclassification on efficiency. Controlled Clin Trials 11:327-338, 1990
9. Hypertension Detection and Follow-Up Program Cooperative Group: The Hypertension Detection and Follow-Up Program. Prevent Med 5:207-215, 1976
10. The Systolic Hypertension in the Elderly Program (SHEP) Cooperative Research Group: Rationale and design of a randomized clinical trial on prevention of stroke in isolated systolic hypertensives. J Clin Epidemiol 41:1197-1208, 1988
11. Gillespie MJ, Akiyama T, Butler L, et al: Successful blinding in cardiac arrhythmia pilot study (CAPS). Controlled Clin Trials 8:288, 1987. Abstract
12. Kaplan EL, Meier P: Nonparametric estimation from incomplete observations. J Am Stat Assoc 53:457-481, 1958
13. Schwartz D, Lellouch J: Explanatory and pragmatic attitudes in therapeutic trials. J Chron Dis 20:637-648, 1967
14. Sackett DL, Gent M: Controversy in counting and attributing events in clinical trials. N Engl J Med 301:1410-1412, 1979
15. Walldius G, Carlson LA, et al: Development of femoral atherosclerosis in hypercholesteremic patients during treatment with cholestyramine and probucol/placebo: Probucol quantitative regression Swedish trial (PQRST): A status report. Am J Cardiol 62:37-43B, 1988