Ethics and Practice


Ethics and Practice: Alternative Designs for Phase III Randomized Clinical Trials Christopher R. Palmer, PhD, and William F. Rosenberger, PhD Centre for Applied Medical Statistics, Department of Community Medicine, Institute of Public Health, University of Cambridge, Cambridge, UK (C.R.P.), Department of Mathematics and Statistics, University of Maryland Baltimore County and Department of Epidemiology and Preventive Medicine, University of Maryland School of Medicine, Baltimore, Maryland (W.F.R.)

ABSTRACT: For decades, biostatisticians have developed and refined the methodology for clinical trials with the intent of giving trial participants a better representation than traditional, equal-allocation, fixed sample-size designs. Despite these methodologic advances and ethical advantages, alternative or data-dependent designs for phase III clinical trials, including sequential designs, Bayesian methods, and adaptive designs, have not been widely adopted in practice. We attempt to characterize situations under which these designs are feasible and desirable from ethical and logistical standpoints. In particular, we describe the role of individual and collective ethics in designing clinical trials and argue that greater attention should be paid to the former. We give examples of those alternative designs that have been used in practice, including discussion of their strengths and shortcomings. We conclude that alternative designs are applicable in limited classes of trials and that investigators should consider them more often when planning clinical trials. Controlled Clin Trials 1999;20:172–186 © Elsevier Science Inc. 1999

KEY WORDS: Individual ethics, data-dependent designs, adaptive designs, Bayesian methods, fully sequential designs, sequential stopping rules, urn models, expert opinion, logistics

INTRODUCTION

The typical clinical trial design for two treatments involves randomizing subjects into two groups that, to maximize power, are intended to be of equal size. During the design phase, one of the statistician’s principal activities is computing a sample size given certain design assumptions, namely, the type I and type II error rates and the supposed clinically meaningful treatment difference. Primarily for ethical reasons, trials are monitored on an interim basis; that is, the statistician performs an interim data analysis and presents the results to a steering committee or an external monitoring board. Presumably,

Address reprint requests to: Dr. C.R. Palmer, Centre for Applied Medical Statistics, University of Cambridge, Department of Community Medicine, Institute of Public Health, Robinson Way, Cambridge CB2 2SR, UK. Received July 10, 1996; accepted September 28, 1998. Controlled Clinical Trials 20:172–186 (1999) © Elsevier Science Inc. 1999, 655 Avenue of the Americas, New York, NY 10010

0197-2456/99/$–see front matter PII S0197-2456(98)00056-7

Ethics, Practice, and Alternative Designs


sufficiently compelling evidence of a treatment difference would lead the committee to terminate patient accrual. For decades, some leading biostatisticians, motivated by ethical considerations, have explored alternatives to the typical design outlined above. Some of these designs have been extensively reviewed in articles from 1974 to 1985 [1–8]. In 1985, the ECMO trial was published [9], with subsequent debate in the statistical literature on the appropriateness of the trial and its analysis [10–16]. Some recent clinical trials have used alternative designs [17–19], whereas others are currently in progress. We believe it is timely to examine existing methodology from ethical and logistical points of view. Here we attempt to address when alternative data-dependent designs are feasible, appropriate, and desirable. We consider alternative designs under three broad categories: Bayesian methods, adaptive designs, and sequential stopping rules. Each of these may overlap with the others, as one may use Bayesian methods with adaptive randomization and have a sequential stopping rule simultaneously in a single trial. By “Bayesian methods,” we refer to any design methodology based on some selected prior distribution and implied posterior distribution conditional on the data. By “adaptive designs,” we refer to methods that incorporate accruing outcome data for updating treatment allocation probabilities to give subjects a better chance of receiving apparently superior treatment. “Sequential stopping rules” constitute the establishment of boundaries, whose crossing by a monitored test statistic leads to recommending termination of trial recruitment. We do not focus attention on phase I and II clinical trials but note that the ethical justification for using alternative designs, when appropriate, in these early phases is even stronger (see Palmer [20]) than for the phase III trials we consider here. All examples we cite have had actual use in clinical trials. 
None of the alternative designs we discuss bypasses randomization, a feature we believe essential for the highest scientific credibility.

ETHICS AND CLINICAL TRIALS

As a discipline, statistics relies on probability for describing eventualities that might or could happen in pursuit of numeric-based truth. Ethics relies on morality and describes what ought or should happen in pursuit of what is right. Matters of ethics and statistics come to the fore in clinical trials. Not surprisingly, perhaps, ethically minded biostatisticians have devoted considerable effort to enhance the design and conduct of trials. Alternative designs are by no means universally applicable or even appropriate, but we believe they are currently underused.

Ethical Principles

Bioethicists have developed ways of analyzing medical problems that involve applying sets of principles to a given situation. Schaffner [21] gives a summary. Typically, four principles are enunciated: autonomy, beneficence, nonmaleficence, and justice [22–25]. Other philosophers (see Gillon [26]) distinguish between systems of ethics that are consequence based (“utilitarian”) versus rights or duty based (“deontologic”). Systematic application of such


C.R. Palmer and W.F. Rosenberger

principles can provide a rational defense for trials, or for randomization or informed consent within them.

Individual and Collective Ethics

As statisticians rather than ethicists, we prefer a simpler approach that reduces moral matters to just two categories: individual and collective ethics (Lellouch and Schwartz [27, 28]). The ethical choice dichotomizes into, respectively, doing what is best for current subjects in the trial versus doing what is best for future patients who stand to benefit from its results. Collective, or research, ethics are concerned with matters of scientific interest. Research physicians have responsibilities both to science and to their own patients. Pocock points out (his italics) that “each clinical trial involves a balance between individual and collective ethics” and that such a balance is never simple but complex [29, 30]. Various codes of ethics regarding human experimentation have developed through international consensus and received periodic revision, notably in the Declaration of Helsinki, which states that “Concern for the interests of the subject must always prevail over the interest of science and society” [31]. In short, collective ethics must not usurp the individual ethics that guard the interests of research subjects.

Alternative designs share an underlying philosophy that puts ethical considerations ahead of purely theoretical ones. For instance, Samuel-Cahn and Wax [32] describe a case study in which clinicians used ethical considerations to select a stopping rule with a boundary having an overall alpha level of 0.30. This indicates that a type II error was considerably more severe than a type I error, a possibility that investigators sometimes overlook in designing trials to meet regulatory standards of evidence (i.e., p < 0.05). Another example from the methodologic literature, in Palmer [33], involves early elimination of unpromising treatments in a multitreatment setting through a decision-theoretic approach.
The criterion used to drop a treatment is driven by expected number of successes rather than by the predetermination of error probabilities. In both examples, the design derives from the ethics rather than from traditionally convenient constraints.

It is not reasonable, however, to go so far in the direction of individual ethics to the complete detriment of collective ethics, for otherwise there would be no scope for advancing scientific knowledge or making medical progress. This itself would sacrifice individual ethics because there would be no firm evidence on which to base clinical decisions. Collective ethics demands randomized trials to gather the strongest cause-and-effect evidence. The alternative designs we discuss therefore involve randomization, and each has developed as a compromise between individual and collective ethics.

ALTERNATIVE DESIGNS IN PRACTICE

We describe a variety of alternative designs, including their applicability, obstacles toward further implementation, and examples. The last two designs, fixed unequal allocation and crossover studies, do not fit neatly into the sequential/adaptive/Bayesian format, but one can view both as attempts to address ethical concerns with traditional trials. The first-mentioned, group sequential


design, has become more standard, whereas the related fully sequential design remains alternative.

Sequential Stopping Rules

An impediment to the use of sequential stopping rules is the need to know the sample size, a priori, for funding purposes. One can compute the expected sample size and its variance under most sequential stopping rules. Necessarily, this yields an expected range for budgetary considerations. One suggestion is for funding bodies to project costs on the basis of a nominal maximum, for example, a simulated 90th-percentile expected sample size, and to institute administrative procedures for conditionally releasing reserved funds, if recruitment needs prolonging. This suggestion would clearly overestimate costs, because only 10% of trials so designed would be expected to need “extra” resources.

Group Sequential Monitoring

As mentioned in the Introduction, a routinely implemented compromise has a requisite sample size fixed in advance, but the trial is monitored regularly for safety and early indications of efficacy. This is the group sequential design [34], meaning that data are monitored after groups of subjects respond rather than after each response, as in fully sequential designs mentioned later. Thus, fixed sample size trials often have an external monitoring committee whose remit is to advise the steering committee to stop recruiting if interim results suggest striking differences between treatments (e.g., see Geller and Pocock [35]). Provided that sequentially computed test statistics form a Brownian motion, one can use various methods (for reviews, see [36, 37]) potentially to reject the null hypothesis before the scheduled end of the study. A boundary for early stopping is established on the basis of the number of interim analyses and the desired alpha level of the study. One can also use conditional power computations to stop a trial early for no treatment effect, as in a recent trial on lupus nephritis [38].
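The dependence of the boundary on the number of interim analyses can be illustrated by simulation. The sketch below is a toy under stated assumptions, not a method taken from the cited references: it treats each group of subjects as contributing an independent standard normal increment under the null hypothesis (the Brownian-motion structure mentioned above), applies the same unadjusted two-sided critical value of 1.96 at every look, and estimates how often at least one look rejects. The function name `familywise_error` and its default arguments are hypothetical.

```python
import math
import random


def familywise_error(n_looks, z_crit=1.96, n_sim=20000, seed=1):
    """Estimate the overall type I error when the same unadjusted
    critical value is applied at each of n_looks equally spaced looks.

    Assumption: under the null hypothesis each group adds an independent
    N(0, 1) increment, so the statistic at look k is S_k / sqrt(k).
    """
    rng = random.Random(seed)
    rejections = 0
    for _ in range(n_sim):
        cumulative = 0.0
        for look in range(1, n_looks + 1):
            cumulative += rng.gauss(0.0, 1.0)
            z = cumulative / math.sqrt(look)
            if abs(z) > z_crit:
                # The trial would stop here and reject the null hypothesis.
                rejections += 1
                break
    return rejections / n_sim


if __name__ == "__main__":
    for looks in (1, 2, 5):
        print(looks, familywise_error(looks))
```

Under these assumptions the estimate for a single look should sit close to the nominal 0.05, whereas five unadjusted looks give a rate in the neighborhood of 0.14. This inflation is the reason group sequential boundaries use a stricter critical value at each look.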
Early stopping can occur for funding considerations, because there is no ethical mandate to withdraw subjects from therapies that are equally efficacious, and it can also facilitate explorations into other potentially more promising therapies.

Fully Sequential Designs

Fully sequential designs generally use equal allocation of treatments to patients without the investigators knowing in advance the total number to be randomized. This happens because the results, as they accumulate, will begin to favor one treatment or the other or else will indicate no meaningful difference between the treatments. The expected number of patients is calculable, after one specifies significance level and power, and hence the relevant stopping boundaries. Because this expected number is below that of a comparable fixed sample-size design, some ethical advantages are that fewer patients are generally needed to reach a conclusion and the process of identifying promising and unpromising treatments is


expedited. Furthermore, fewer trial volunteers are exposed to the poorer treatment than would be the case in a conventional fixed sample-size trial. Each interim analysis that could lead to stopping has an impact on the type I error (“spending of alpha”). This needs to be adjusted at the final analysis to maintain an overall significance level, leading to some loss of power [39]. The problem is magnified when data are monitored continuously, so that is one reason, besides logistical ones, why many investigators prefer group sequential designs over fully sequential designs. Sequential stopping rules are ancillary to the likelihood, and hence likelihood-based inferences are valid [40]. Therefore, one can do regression modeling of covariate effects in a standard manner. Other design-based inference procedures have been developed specifically to account for the stopping rule (see Whitehead [41] for details).

Early Stopping

In practice, a trial will almost inevitably continue at least to its preselected sample size despite the availability of a formal sequential stopping rule. In general, these stopping rules are rightly set up to make the recommendation of early stopping highly unlikely. For once a trial has started, continuation to its conclusion is often entirely appropriate for useful unbiased information to emerge on the new treatment(s). For example, Pocock and Hughes [30] illustrate “how clinical trials that stop early are prone to exaggerate the magnitude of treatment differences.” We agree that any statistical stopping rule should be used only as a guideline [42]. We recognize there are serious consequences to all concerned for terminating a trial inappropriately early, and we share especially the collective ethical concern that a foreshortened trial may not yield genuinely useful information. A counterargument to early stopping is the loss of information regarding secondary outcomes.
A phase III clinical trial, however, should be designed principally to answer whether the therapy is effective and safe. Several authors argue extensively for large simple trials [29, 43, 44], a view with which we concur in the right circumstances, but it is highly inefficient to use clinical trials for doing research into basic science. Failing to stop a trial to answer questions that are not directly related to treatment efficacy and safety would be ethically unacceptable.

Example: Triangular Tests

Triangular tests (and their variants) are examples of sequential designs. Considerably helping the practicality of such designs is the availability of the software package PEST3 (Planning and Evaluation of Sequential Trials) [45]. Whitehead explains the methodology both extensively [41] and succinctly [46]. Some trials have used triangular tests with [47] and without [48] truncation to impose an upper sample-size limit. The latter was particularly well suited for this type of design, being a trial to test the efficacy of corticosteroids in AIDS patients with Pneumocystis carinii pneumonia. The primary response of early deterioration was known after a matter of days, whereas the long-term outcome was, for the patients, a matter of life and death.


Adaptive Designs

Adaptive designs are appropriate for many trials, although for obvious reasons not long-term survival trials where the endpoint cannot be ascertained until perhaps years have elapsed. Simon [4] and Rosenberger and Lachin [15] discuss some logistical and statistical issues in the practical implementation of adaptive designs. Here we update some of their ideas, drawing in part on recent experience with actual adaptive clinical trials.

Selection Bias

Selection bias is not generally considered a problem in the modern era of clinical trials. When possible, clinical trials are double-blinded and the data managed by external data centers. Only independent monitoring boards have access to the data during the course of the trial, and confidentiality is maintained. Investigators can double-blind adaptive designs similarly. There is no reason why the physician or the recruited patients need to know the current allocation probabilities. A subtle potential for “accrual” bias still exists, however [49]. If volunteers and physicians know the rationale behind adaptive designs, they may also understand that those recruited later as opposed to sooner in a trial are more likely to receive the “better” treatment. This knowledge may bias a potential volunteer’s decision as to when to enter the study. In some cases, no such bias is possible, as in emergency medical situations or immediately life-threatening diseases, which offer patients no choices as to when to enter. These circumstances, we shall argue, are precisely the scenarios under which adaptive designs are particularly attractive.

Covariates and Time Trends

Adaptive designs are fully randomized, and hence any covariate imbalance between groups should be proportional to the respective sample sizes.
If covariate imbalances result, as for fully sequential designs, standard likelihood-based modeling techniques should be applicable for adjusting after the trial [50], though asymptotic properties of covariate effects have not been rigorously proved to date. The time-trend issue remains serious for adaptive designs [51]. Most adaptive design theory assumes that response probabilities are homogeneous given treatment assignment. If there is a drift in subject characteristics over time influencing outcomes, then study results will be biased. Investigators should endeavor to ensure rigid eligibility requirements to maintain the composition of studies throughout their duration. This potential problem is less pronounced for small-scale short-term studies. Coad suggests dealing with time trends by stratification [52] or modeling [53].

Information Technology

Multicenter clinical trials now proceed with state-of-the-art information technology. Staff can enter outcomes and covariate information into remote data


collection sites and immediately transmit them to centralized data centers that simultaneously update the randomization sequences and randomize new patients electronically. As discussed in Tamura et al. [18], adaptive trials are becoming more logistically feasible than ever before.

Example: Urn Models

The idea of using an urn model for allocation can be attributed to Zelen’s [54] deterministic play-the-winner rule used, for instance, in a trial of hypertension due to surgical anesthesia [55]. This design was followed by the randomized play-the-winner (RPW) rule [56], described as follows. An urn initially contains balls representing two treatments, say A and B. When a subject is ready for randomization, a ball is drawn and replaced to indicate the treatment to be allocated. If the subsequent response is a success on treatment A or a failure on treatment B, a ball (or balls) representing A is then added to the urn. If the response is a success on treatment B or a failure on treatment A, a ball (or balls) representing B is added. In this way, the urn builds up more balls representing the more promising treatment. The RPW rule was used in the ECMO trial [9], a trial that illustrates well the need for careful choice of the initial urn parameters and also the need for minimum enrollment [57]. The RPW rule can be generalized to continuous outcomes [58]. These rules have been handicapped in that they assume immediate response, so that the urn can be updated before the next subject arrives. It is possible, however, to incorporate delayed responses into the procedure, so that adaptation still takes place but more slowly. A point to consider with regard to delayed response is the relative speed of “good” and “bad” news. For instance, in another recent trial neonates could be identified as HIV-positive more quickly than they could be safely declared negative [59]. This reporting imbalance may bias the allocation scheme.
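The urn mechanics described above can be sketched in a few lines of Python. This is a minimal illustration rather than the procedure used in any of the cited trials: it assumes binary outcomes that are observed immediately, an initial urn with the same number of A and B balls, and a fixed number of balls added per response. The function name `simulate_rpw` and all of its parameters are hypothetical.

```python
import random


def simulate_rpw(p_success, n_subjects, initial_balls=1, balls_added=1, seed=0):
    """Simulate a randomized play-the-winner urn for treatments A and B.

    p_success maps each arm ("A", "B") to an assumed true success probability.
    Returns the final urn composition and the list of (arm, success) pairs.
    """
    rng = random.Random(seed)
    urn = {"A": initial_balls, "B": initial_balls}
    assignments = []
    for _ in range(n_subjects):
        # Draw a ball (with replacement) to choose the treatment.
        total = urn["A"] + urn["B"]
        arm = "A" if rng.random() < urn["A"] / total else "B"
        # Assumption: the response is known before the next subject arrives.
        success = rng.random() < p_success[arm]
        other = "B" if arm == "A" else "A"
        # A success rewards the allocated arm; a failure rewards the other arm.
        rewarded = arm if success else other
        urn[rewarded] += balls_added
        assignments.append((arm, success))
    return urn, assignments


if __name__ == "__main__":
    urn, assignments = simulate_rpw({"A": 0.8, "B": 0.3}, n_subjects=100)
    n_on_a = sum(1 for arm, _ in assignments if arm == "A")
    print("final urn:", urn, "subjects on A:", n_on_a)
```

With the extreme setting `p_success = {"A": 1.0, "B": 0.0}` every response adds an A ball, so the urn moves deterministically toward A. With less extreme probabilities the early draws are noisy, which is one way to see why the initial urn composition and a minimum enrollment matter.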
Finally, we note that there is no reason why a fixed sample size must be used in RPW clinical trials. Rosenberger and Sriram [60] give a stopping rule and confidence interval for sequential estimation.

Bayesian Methods

We have little to add to the excellent summary and discussion by Spiegelhalter et al. [61] of Bayesian methods for clinical trials. In essence, a prior distribution for treatment effect is combined with accruing information into a posterior distribution that is conditional on the data. Kadane [62] expresses concisely the case in favor of such designs. As a matter of practicality, though, the elicitation of priors presents difficulties. Methods using expert opinion [61, 63, 64] may not be applicable when there is limited practical knowledge of the new therapy. Incorporating information from earlier phase III studies is not applicable for investigational drugs, as Ellenberg notes [65]. And one must be aware of publication bias toward positive studies if choosing priors on the basis of the literature [66]. Another potential difficulty arises at the end of a trial. When clinicians are presented with updated priors in the form of posterior distributions, is such information sufficient to allow them actually to change their practice? It is not


clear that clinicians would decide to allocate the indicated treatment to new patients, though this is not a criticism unique to Bayesian designs. Bayesians believe in the likelihood principle, and consequently their analyses ignore the design of clinical trials. Thus, if the trial had a stopping rule or an alternative randomization scheme, the analysis would be the same as with a typical clinical trial design. Frequentists (see e.g., [67]) would take issue with this, as alternative designs, by their very construction, contain much information about the trial’s outcome.

Example: An Expert Opinion Bayesian Method

A recent book discusses the Kadane-Sedransk-Seidenfeld (KSS) method for designing a trial [63]. Investigators specify a primary outcome and a set of covariates supposed a priori to be important in a subject’s prognosis (in addition to treatment). They then solicit opinions of several experts with respect to a linear (normal error) model describing the relationship among these covariates and a subject’s prognosis. Using standard Bayesian techniques, they update these opinions as data accrue. Given a new subject’s covariate structure, they compute each expert’s function. Unless at least one expert finds a treatment to be the best for someone with the subject’s covariate structure, it will not be assigned. Treatments that are considered best by at least one expert could then be randomized. Clearly, this design is only appropriate when there are experts available with sufficient experience with all experimental therapies. Relatively few trials apparently meet this criterion. The KSS methodology is illustrated in a clinical trial of nitroprusside and verapamil infusions for hypertension during cardiac surgery [68]. Methodology should also be developed for modeling more general responses (i.e., non-normal errors).
But this is the only design we have discussed that incorporates a subject’s covariate structure into the allocation decision, which we believe is a significant step in the right direction [69].

Other Designs

Fixed Unequal Allocation

The simplest idea for an alternative design is a fixed randomization scheme with unequal allocation. This is akin to the traditional fixed sample-size design, except that 50:50 allocation is replaced, for example, by 2:1 or 3:1. Sometimes, favoring the experimental therapy is warranted in trials of potentially great public health benefit, such as when testing a new AIDS therapy, where patients may be reluctant to have only a 50% chance of receiving the novel therapy. Such unequal allocation schemes can improve recruitment and partially satisfy the individual ethics criterion. They can also be useful if widespread knowledge about the control therapy already exists and if more understanding is desired about the new therapy (see [70], where 2:1 allocation was used with this justification). One could also conceive of k:1 allocation designs where k (not necessarily integer valued) changes in accordance with the results of the interim monitoring. This idea really leads to the principles behind both adaptive designs and multistage designs (e.g., [71]).
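The sample-size cost of a fixed k:1 allocation follows from a standard variance argument. As a sketch, assume a comparison of two means (or of two proportions under a normal approximation) with equal outcome variance in both arms, so that the variance of the estimated treatment difference is proportional to 1/n1 + 1/n2. For a total of N subjects split k:1 this quantity equals (k + 1)^2/(kN), against 4/N for equal allocation, so the total sample size needed for the same power is inflated by the factor (k + 1)^2/(4k). The helper name `sample_size_inflation` is hypothetical.

```python
from fractions import Fraction


def sample_size_inflation(k):
    """Total sample size under k:1 allocation relative to 1:1 allocation,
    for the same power, assuming equal outcome variance in both arms."""
    k = Fraction(k)
    return (1 + k) ** 2 / (4 * k)


if __name__ == "__main__":
    for ratio in (1, 2, 3, 4):
        factor = sample_size_inflation(ratio)
        print(f"{ratio}:1 allocation -> factor {factor}, increase {factor - 1}")
```

For k = 2 the factor is 9/8 and for k = 3 it is 4/3, that is, increases of 1/8 and 1/3 over equal allocation. The factor grows quickly beyond that, which is one way to read the caution about imbalances exceeding 3:1.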


An imbalance exceeding 3:1 may affect power adversely. For example, 2:1 allocation increases the requisite sample size by 1/8, whereas for 3:1 allocation, the increase is by 1/3, compared with equal allocation [72]. Sposto and Krailo [73] note that in some two-treatment survival trials, reasonable imbalances will result in approximately the same power for the logrank test as equal allocation. This could significantly decrease the number of patients assigned to the inferior treatment. Gore [74] suggested that subjects themselves be allowed some degree of choice in their level of randomization probabilities (e.g., from 70:30 through 50:50 to 30:70). This has the advantages of offering volunteers a little more freedom of choice and preserving randomization. It is not clear, however, that subjects would understand such a scheme well, as most of them would have little knowledge or understanding on which to base their decisions.

Crossover Designs

The ethical dilemma may be minimized in some trials by the use of a crossover design [75, 76], when applicable, because patients will receive, at some point, the better (or best) treatment. Crossover designs are only applicable for chronic stable conditions (e.g., hypercholesterolemia, migraine) and have become the standard for such. They are best used for short-term therapy. Investigators must establish suitable washout periods to minimize any carryover effect, which can also be modeled in the analysis phase. It is best to explore carryover effects in phase II clinical trials to be sure that such an effect will not confound results in phase III [77].

RECOMMENDATIONS

Rather than proposing universally applicable rules to select particular alternative designs, we provide general principles to help those planning trials decide whether to consider seriously an alternative design and, if so, to make some more specific suggestions about their relative merits.
General Guidelines

We believe that if a proposed trial involves a rare disease (or condition), then the weight of individual ethics is greater than if the disease being combatted is common. The utilitarian argument supporting collective ethics is stronger if the numbers of persons likely to benefit from the trial’s results are orders of magnitude higher. A more compelling argument for individual ethics, however, is whenever the disease being studied in the trial (or equally the treatment’s toxicity) is potentially lethal. Table 1 synthesizes these arguments by indicating those circumstances when individual ethics should receive utmost priority in the delicate balance of ethics. The usefulness of the table for researchers, despite unanswered questions it may raise, is as follows: the greater the weight of individual ethics, the stronger the case for considering an alternative design. In showing a trend for the shift between ethics, Table 1 clearly oversimplifies matters. Furthermore, the debate about how the categories are defined and who defines them should involve more than just statisticians [20]. The asterisks


Table 1. Suggested Balance of Individual and Collective Ethics by Disease Severity and Prevalence

                                      Prevalence
Severity            Very Rare    Rare         Common       Very Common
Life threatening    Individual   Individual   Individual   ***
Severe              Individual   Individual   ***          Collective
Moderate            Individual   ***          Collective   Collective
Mild                ***          Collective   Collective   Collective

Asterisks (***) indicate ambivalence, where neither individual nor collective ethics dominates.

indicate ambivalence, whereas the further away from the asterisks the relevant cell is located, the stronger the case is for the particular ethic given.

Specific Considerations

Bayesian methodology is particularly well suited for trials in which clinicians already have some experience with the therapies, gained either before the study or from previous studies to form realistic (and skeptical) prior distributions. Adaptive designs are most logistically feasible in a single-center study when there is a single outcome of interest, little chance of confounding by covariates, an immediate response, and a stable population [15]. The use of surrogate outcomes to adapt on when the primary outcome is not available quickly is possible in some trials, but more research needs to be done on potential bias in the analysis. The incorporation of the individual subject’s covariate structure (for those covariates likely to influence prognosis) into the allocation probabilities, as in the KSS method, is another area of future research. Sequential stopping rules are applicable in much the same settings as adaptive designs. One important question of interest is, for given error probabilities, which fares better in terms of protecting subjects from treatment failures: skewing the allocation probabilities in a fixed sample design or equal allocation in a fully sequential design? Coad and Rosenberger [78] show that combining the RPW and triangular stopping rule procedures actually minimizes the expected number of failures compared with each procedure separately.

DISCUSSION

In many trials, data on the primary outcome are not available until the recruitment phase has already finished, so exploiting somehow the accruing data is not possible. Some trials, though, are short term and can have rapidly ascertainable outcomes. Some involve emergency medicine or a life-threatening illness, in which case individual ethics must dominate all other considerations.
It would be unscientific blindly to apply methodology well suited for long-term, large-scale clinical trials to these other trials without carefully considering potential benefits of alternative methodologies. Science demands thoughtful, appropriate, and ethical designs. Despite ethical advantages for Bayesian, sequential, or adaptive designs for trials, they are the exception, rather than the rule, in practice.


In our experience, one of the main obstacles in implementing any of these alternative designs is the rush to have a protocol approved. Funding institutions are usually interested in starting a trial rapidly rather than planning. Members of a multicenter trial’s steering committee are often pressured to agree to a common protocol very quickly. This is not to say that innovative steering committees do not discuss and debate design issues, but issues like alternative randomization schemes are unlikely to make it to the table, unless already present in the original grant proposal. Yet many biostatisticians would feel reluctant to take a risk (we believe falsely perceived) in the competitive process of grant proposals. All these factors conspire against the adoption of an alternative design. The difficulties in proposing alternative designs in such a brief period of time could be relaxed somewhat in two ways, involving theory and practice, respectively. The first is by having the mathematical machinery in place to implement them. The onus then shifts to the theoretical statistician to develop appropriate inferential and estimation procedures for these designs (see [79] for progress in this direction). Even so, it is difficult to motivate such work in the absence of a real clinical trial. Second, we suggest an expanded role for pilot studies, already accepted as useful precursors to large-scale trials [80]. We note that such preliminary studies can be useful in eliciting priors for use in Bayesian designs and for establishing and assessing the feasibility of an automated randomization procedure, if adaptive randomization can be invoked in the main study. Some might argue that randomization in itself compromises individual ethics. Yet we disagree because the therapies under study are presumably in a state of equipoise at the start of the trial. 
Although 50:50 allocation is therefore justified initially, skewing the allocation probabilities in accordance with the data as they accrue during the course of the study is consistent with the necessary compromise between individual and collective ethics.

Clayton [5] states, “Rather ironically, adaptive designs, although designed specifically with ethical considerations in mind, in a sense heighten the ethical difficulty. Later on in the trial, clinicians may be required to randomise patients in a 9:1 ratio, and if one treatment is so good as to receive 90% of the patients, the ethical problems of withholding it from the other 10% seem more acute.” Royall [14] and Schaffner [21] also use this argument to dismiss adaptive designs. We respond with two comments. First, whether a standard clinical trial design or an adaptive design was used in this instance, we recommend external monitoring, and most likely such a trial would have been terminated under either scenario. Second, 9:1 allocation at that stage is more ethical than 1:1 allocation.

In a personal communication, Altman raised the issue of whether a design can be more or less ethical than another on the grounds that ethical matters are either right or wrong. Our response is that ethical principles (e.g., to do no harm) are black and white but that their application can be gray in practice. To illustrate, a 100-person trial design that harms 30 persons is more ethical than one that harms 50, but is less so than one that harms 20, or even 29. As always, ethical arguments are more forceful when failure means death.

Machin [81] provides two reasons for recruiting patients into a trial as quickly as possible. The first is the understandable desire to answer a particular question
expeditiously. The second concerns possible disturbance of clinical equipoise if substantial evidence in favor of one treatment should emerge in the course of a long recruitment phase. Whereas he uses this reasoning to support interim analyses in traditional trials, we see the same argument as supporting the case for using an alternative design whenever possible.

In conclusion, clinical trials pose ethical problems at their starts, their ends, and at all times in between. Alternative designs use accruing data dynamically as information becomes available. In most trials conducted, valuable information emerges in the early stages that can, and we suggest often should, be incorporated into the later stages of the trial in preference to blinding of the data until one attains some prespecified recruitment target. In realization of the ethical shortcomings of totally ignoring the accruing data, conventional trials have adopted data and safety monitoring boards to interrupt an ongoing trial if they judge it ethically appropriate to do so. We believe that in certain situations, which we have indicated, this ethical policing role should be promoted from a background presence to influence proactively the design and conduct of the trial. That is, there are circumstances when purely statistical considerations (such as p < 0.05) or administrative reasons (e.g., the budget must be for a fixed sample size) should be subordinated for the sake of ethics. This is so even if there is a slight loss of power efficiency, a cost for using a less readily understood design and analysis, or logistical difficulties incurred by the involvement of a number of subjects that is not predetermined. Funding mechanisms should become more flexible to accommodate sequential trials in which the total number of patients randomized is not known in advance.
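As a rough illustration of how accruing data can skew allocation toward the better-performing treatment, the sketch below simulates a randomized play-the-winner urn in the spirit of Wei and Durham [56] and compares its average number of failures with that of fixed 1:1 randomization. It is a minimal sketch under stated assumptions, not a procedure taken from any trial discussed here: responses are assumed binary and known before the next patient is randomized, the success probabilities 0.8 and 0.4 are hypothetical, and the function names (`simulate_rpw`, `simulate_equal`, `mean_failures`) are ours for illustration only.

```python
import random


def simulate_rpw(n_patients, p_success, initial_balls=1, added_balls=1, seed=None):
    """Simulate one trial under a randomized play-the-winner urn.

    p_success maps each treatment label ("A", "B") to an ASSUMED success
    probability (hypothetical values, not estimates from a real trial).
    Responses are assumed to be available before the next assignment.
    Returns (patients assigned per arm, failures per arm).
    """
    rng = random.Random(seed)
    urn = {"A": initial_balls, "B": initial_balls}
    assigned = {"A": 0, "B": 0}
    failures = {"A": 0, "B": 0}
    for _ in range(n_patients):
        # Draw a ball with replacement: P(assign A) = share of A balls.
        total = urn["A"] + urn["B"]
        arm = "A" if rng.random() < urn["A"] / total else "B"
        other = "B" if arm == "A" else "A"
        assigned[arm] += 1
        if rng.random() < p_success[arm]:
            urn[arm] += added_balls    # success: add balls for the same arm
        else:
            failures[arm] += 1
            urn[other] += added_balls  # failure: add balls for the other arm
    return assigned, failures


def simulate_equal(n_patients, p_success, seed=None):
    """Simulate one trial with fixed 1:1 randomization (fair coin per patient)."""
    rng = random.Random(seed)
    assigned = {"A": 0, "B": 0}
    failures = {"A": 0, "B": 0}
    for _ in range(n_patients):
        arm = "A" if rng.random() < 0.5 else "B"
        assigned[arm] += 1
        if rng.random() >= p_success[arm]:
            failures[arm] += 1
    return assigned, failures


def mean_failures(design, n_patients, p_success, n_reps=2000, seed=0):
    """Average total failures per trial over n_reps simulated trials."""
    total = 0
    for rep in range(n_reps):
        _, failures = design(n_patients, p_success, seed=seed + rep)
        total += failures["A"] + failures["B"]
    return total / n_reps


if __name__ == "__main__":
    probs = {"A": 0.8, "B": 0.4}  # hypothetical success probabilities
    print("1:1 allocation, mean failures:", mean_failures(simulate_equal, 100, probs))
    print("RPW allocation, mean failures:", mean_failures(simulate_rpw, 100, probs))
```

Under these assumptions the urn design tends to expose fewer of the 100 patients to failure than equal allocation does, which is the sense in which one design can be called more ethical than another. The sketch says nothing about power, inference, or delayed responses, all of which would need separate treatment in a real protocol.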
Finally, we contend that when circumstances are appropriate, the failure to exploit modern statistical methodology and information technology is indefensible in present-day clinical trials.

Professor Rosenberger’s research was supported by grant R29-DK51017-02 from the National Institute of Diabetes and Digestive and Kidney Diseases. This paper was initiated while he was visiting the University of Cambridge during summer 1996. He thanks the Institute of Public Health for its hospitality. Both authors are grateful to referees for their helpful comments.

REFERENCES

1. Weinstein MC. Allocation of subjects in medical experiments. N Engl J Med 1974;291:1278–1285.
2. Hoel DG, Sobel M, Weiss GH. A survey of adaptive sampling for clinical trials. In: Elashoff R, ed. Perspectives in Biometrics. New York: Academic Press; 1975, pp. 29–61.
3. Bailar JC. Patient assignment algorithms-an overview. In: Proceedings of the 9th International Biometric Conference. Vol. I. Raleigh: Biometric Society; 1976, pp. 189–206.
4. Simon R. Adaptive treatment assignment methods and clinical trials. Biometrics 1977;33:743–749.
5. Clayton DG. Ethically optimised designs. Br J Clin Pharmacol 1982;13:369–480.
6. Iglewicz B. Alternative designs: sequential, multi-stage, decision theory and adaptive designs. In: Buyse ME, Staquet J, Sylvester RJ, eds. Cancer Clinical Trials: Methods and Practice. Oxford: Oxford University Press; 1983, pp. 312–334.
7. Bather JA. On the allocation of treatments in sequential medical trials. Int Stat Rev 1985;53:1–13.
8. Armitage P. The search for optimality in clinical trials. Int Stat Rev 1985;53:15–24.
9. Bartlett RH, Roloff DW, Cornell RG, et al. Extracorporeal circulation in neonatal respiratory failure: a prospective randomized study. Pediatrics 1985;76:479–487.
10. Cornell RG, Landenberger BD, Bartlett RH. Randomized play the winner clinical trials. Commun Stat Theor Meth 1986;15:159–178.
11. Wei LJ. Exact two-sample permutation tests based on the randomized play-the-winner rule. Biometrika 1988;75:603–606.
12. Begg CB. On inferences from Wei’s biased coin design for clinical trials. Biometrika 1990;77:467–484.
13. Wei LJ, Smythe RT, Lin DY, Park TS. Statistical inference with data-dependent treatment allocation rules. J Am Stat Assoc 1990;85:156–162.
14. Royall RM. Ethics and statistics in randomized clinical trials. Stat Sci 1991;6:52–62.
15. Rosenberger WF, Lachin JM. The use of response-adaptive designs in clinical trials. Controlled Clin Trials 1993;14:471–484.
16. Farewell VT, Viveros R, Sprott DA. Statistical consequences of an adaptive treatment allocation in a clinical trial. Can J Stat 1993;21:21–27.
17. Parmar MKB, Spiegelhalter DJ, Freedman LS. The CHART trials: design and monitoring. Stat Med 1994;13:1297–1312.
18. Tamura RN, Faries DE, Andersen JS, Heiligenstein JH. A case study of an adaptive clinical trial in the treatment of out-patients with depressive disorder. J Am Stat Assoc 1994;89:768–776.
19. Whitehead J. Applications of sequential methods to a phase III clinical trial in stroke. Drug Inform J 1993;27:733–740.
20. Palmer CR. Ethics and statistical methodology in clinical trials. J Med Ethics 1993;19:219–222.
21. Schaffner KF. Ethically optimizing clinical trials. In: Kadane JB, ed. Bayesian Methods and Ethics in a Clinical Trial Design. New York: Wiley; 1996, pp. 19–63.
22. Beauchamp T, Childress J. Principles of Biomedical Ethics. New York: Oxford University Press; 1983.
23. Fried C. Medical Experimentation: Personal Integrity and Social Policy. New York: Elsevier; 1974.
24. Lebacqz K. Controlled clinical trials: some ethical issues. Controlled Clin Trials 1980;1:29–36.
25. Levine R. Ethics and Regulation of Clinical Research. New Haven: Yale University Press; 1986.
26. Gillon R. Philosophical Medical Ethics. Chichester: Wiley; 1985.
27. Lellouch J, Schwartz D. L’essai therapeutique: ethique individuelle ou ethique collective. Rev Inst Int Stat 1971;39:127–136.
28. Schwartz D, Flamant R, Lellouch J. Clinical Trials. London: Academic Press; 1980.
29. Pocock SJ. Clinical Trials. Chichester: Wiley; 1983.
30. Pocock SJ, Hughes MD. Practical problems in interim analyses, with particular regard to estimation. Controlled Clin Trials 1989;10:209S–221S.
31. British Medical Association. Medical Ethics Today. London: BMJ Publishing; 1993, pp. 330–333.
32. Samuel-Cahn E, Wax Y. A sequential test for comparing two infection rates in a randomized clinical trial, and incorporation of data accumulated after stopping. Biometrics 1986;42:99–108.
33. Palmer CR. A comparative phase II clinical trials procedure for choosing the best of three treatments. Stat Med 1991;10:1327–1340.
34. Armitage P, McPherson CK, Rowe BC. Repeated significance tests on accumulating data. J R Stat Soc A 1969;132:235–244.
35. Geller NL, Pocock SJ. Interim analyses in randomized clinical trials: ramifications and guidelines for practitioners. Biometrics 1987;43:213–223.
36. Enas GG, Dornseif BE, Sampson CB, et al. Monitoring versus interim analysis of clinical trials: a perspective from the pharmaceutical industry. Controlled Clin Trials 1989;10:57–70.
37. Pocock SJ. Statistical and ethical issues in monitoring clinical trials. Stat Med 1993;12:1459–1469.
38. Lachin JM, Lan SP. Termination of a clinical trial with no treatment group difference: the Lupus Nephritis Collaborative Study. Controlled Clin Trials 1992;13:62–79.
39. Lan KKG, Rosenberger WF, Lachin JM. Use of spending functions for occasional or continuous monitoring of data in clinical trials. Stat Med 1993;12:2219–2232.
40. McCullagh P. Discussion of Professor Bather’s paper. J R Stat Soc B 1981;43:265–292.
41. Whitehead J. The Design and Analysis of Sequential Clinical Trials. Chichester: Ellis Horwood; 1992.
42. Freedman LS, Spiegelhalter DJ. Comparison of Bayesian with group sequential methods for monitoring clinical trials. Controlled Clin Trials 1989;10:357–367.
43. Peto R. Discussion of “On the allocation of treatments in sequential medical trials,” by JA Bather and “The search for optimality in clinical trials” by P Armitage. Int Stat Rev 1985;53:31–34.
44. Ellenberg SS, Foulkes MA. The utility of large, simple trials in the evaluation of AIDS treatment strategies. Stat Med 1994;13:408–415.
45. Brunier H, Whitehead J. PEST3.0 Operating Manual. Reading: University of Reading; 1993.
46. Whitehead J. Sequential designs for pharmaceutical clinical trials. Pharm Med 1992;6:179–191.
47. Storb R, Deeg HJ, Whitehead J, et al. Methotrexate and cyclosporine compared with cyclosporine alone for prophylaxis of acute graft versus host disease after marrow transplantation for leukemia. N Engl J Med 1986;12:729–735.
48. Montaner JSG, Lawson LM, Levitt M, et al. Corticosteroids prevent early deterioration in patients with moderately severe Pneumocystis carinii pneumonia and the acquired immunodeficiency syndrome. Ann Intern Med 1990;113:14–20.
49. Rosenberger WF. New directions in adaptive designs. Stat Sci 1996;11:137–149.
50. Rosenberger WF, Flournoy N, Durham SD. Asymptotic joint normality of maximum likelihood estimators from multiparameter response-driven designs. J Stat Plann Inf 1997;60:69–76.
51. Altman DG, Royston JP. The hidden effect of time. Stat Med 1988;7:629–637.
52. Coad DS. A comparative study of some data-dependent allocation rules for Bernoulli data. J Stat Comput Simul 1992;40:219–231.
53. Coad DS. Sequential tests for an unstable response variable. Biometrika 1991;78:113–121.
54. Zelen M. Play the winner rule and the controlled clinical trial. J Am Stat Assoc 1969;64:131–146.
55. Rout CC, Rocke DA, Levin J, et al. A re-evaluation of the role of crystalloid preload in the prevention of hypotension associated with spinal anesthesia for elective cesarean section. Anesthesiology 1993;79:262–269.
56. Wei LJ, Durham S. The randomized play-the-winner rule in medical trials. J Am Stat Assoc 1978;73:840–843.
57. Cox DR. Discussion of the paper by C. B. Begg. Biometrika 1990;77:483–484.
58. Rosenberger WF. Asymptotic inference with response-adaptive treatment allocation designs. Ann Stat 1993;21:2098–2107.
59. Connor EM, Sperling RS, Gelber R, et al. Reduction of maternal-infant transmission of human immunodeficiency virus type 1 with zidovudine treatment. N Engl J Med 1994;331:1173–1180.
60. Rosenberger WF, Sriram TN. Estimation for an adaptive allocation design. J Stat Plann Inf 1997;59:309–319.
61. Spiegelhalter DJ, Freedman LS, Parmar MKB. Bayesian approaches to randomized trials. J R Stat Soc A 1994;157:357–416.
62. Kadane JB. Prime time for Bayes. Controlled Clin Trials 1995;16:313–318.
63. Kadane JB, ed. Bayesian Methods and Ethics in a Clinical Trial Design. New York: Wiley; 1996.
64. Kadane JB, Wolfson LJ. Experiences in elicitation. J R Stat Soc D 1998;47:3–20.
65. Ellenberg SS. Discussion of the paper by Spiegelhalter, Freedman and Parmar. J R Stat Soc A 1994;157:402.
66. Begg CB, Berlin JA. Publication bias and dissemination of clinical research. J Natl Cancer Inst 1989;81:107–115.
67. Whitehead J. The case for frequentism in clinical trials. Stat Med 1993;12:1405–1414.
68. Kadane JB. Introduction to the verapamil/nitroprusside study. In: Kadane JB, ed. Bayesian Methods and Ethics in a Clinical Trial Design. New York: Wiley; 1996, pp. 129–130.
69. Rosenberger WF, Palmer CR. Book review: Bayesian Methods and Ethics in a Clinical Trial Design. J Am Stat Assoc 1997;92:384–385.
70. Cocconi G, Bella M, Zironi S, et al. Fluorouracil, doxorubicin, and mitomycin combination versus PELF chemotherapy in advanced gastric cancer: a prospective randomized trial of the Italian Oncology Group for Clinical Research. J Clin Oncol 1994;12:2687–2693.
71. Cornfield J, Halperin M, Greenhouse SW. An adaptive procedure for sequential clinical trials. J Am Stat Assoc 1969;64:659–770.
72. Altman DG. Practical Statistics for Medical Research. London: Chapman and Hall; 1991.
73. Sposto M, Krailo MD. Use of unequal allocation in survival trials. Stat Med 1987;6:119–126.
74. Gore SM. The consumer principle of randomisation [letter]. Lancet 1994;343:58.
75. Jones B, Kenward MJ. Design and Analysis of Crossover Trials. London: Chapman and Hall; 1989.
76. Senn S. Crossover Trials in Clinical Research. Chichester: Wiley; 1992.
77. Jones B, Lewis JA. The case for cross-over trials in phase III. Stat Med 1995;14:1025–1038.
78. Coad DS, Rosenberger WF. A comparison of the randomised play-the-winner rule and the triangular test for clinical trials with binary responses. Stat Med (in press).
79. Flournoy N, Rosenberger WF, eds. Adaptive Designs. Hayward CA: Institute of Mathematical Statistics; 1995.
80. Wittes J, Brittain E. The role of internal pilot studies in increasing the efficiency of clinical trials. Stat Med 1990;9:65–72.
81. Machin D. Interim analysis and ethical issues in the conduct of trials. In: Williams CJ, ed. Introducing New Treatments for Cancer: Practical, Ethical and Legal Problems. New York: Wiley; 1992.