Statistical issues and analysis of mortality and morbidity randomized clinical trials in congestive heart failure


Journal of Cardiac Failure Vol. 1 No. 4 1995

Reviews

Statistical Issues and Analysis of Mortality and Morbidity Randomized Clinical Trials in Congestive Heart Failure

Lloyd D. Fisher, PhD

Seattle, Washington

Statistical issues in congestive heart failure (CHF) randomized clinical trials (RCTs) are not unique. In that sense there is no need for a review specific to statistical issues in CHF RCTs. Certain issues, however, are more important in CHF RCTs than in many other RCTs, although most of the issues discussed in this article are germane in the setting of trials in any disease with a high mortality rate (eg, acquired immunodeficiency syndrome, many cancers). These issues will be addressed later. In addition, certain issues in the design, operation, and analysis of RCTs, perhaps less familiar to the clinical community not as intimately involved in the conduct of such trials, will also be reviewed here. Since CHF has such a serious prognosis, a number of survival trials have been performed. In these trials, ongoing monitoring of the data is ethically required; the way such monitoring is performed will be discussed also. A very abbreviated description of a few trials is given first so that they may be referred to. (No attempt was made to be comprehensive or to select the most important or best trials.) The sequential monitoring of trials and the use of a Data and Safety Monitoring Board (DSMB) is also discussed. This is followed by a discussion of the issues in trial design: (1) How long should a trial be? (2) Is it appropriate to use specific causes of death? (3) May power be improved by using composite endpoints?

Materials and Methods

A Short Description of Some of the CHF Trials

Reference will be made to a number of CHF clinical trials. A telegraphic reminder of the trials is given here; the references have the detailed reports of the individual study results.

V-HeFT I [1]: Six hundred forty-two men with CHF, who were taking digoxin and a diuretic, were randomized to one of three arms: (1) placebo, (2) prazosin (20 mg/d), and (3) hydralazine (300 mg/d) and isosorbide dinitrate (160 mg/d). The follow-up period averaged 2.3 years (range, 6 months to 5.7 years).

PROMISE [2]: The PROMISE trial examined milrinone versus placebo for survival in New York Heart Association (NYHA) class III and IV patients. A total of 1,088 patients were randomized to either milrinone (40 mg/d) or placebo. The follow-up period averaged 215 days (range, 1 day to 20 months).

PROFILE [3]: The patients in this study were in NYHA class III or IV with an ejection fraction less than 0.35. A total of 2,304 patients were enrolled and assigned to either double-blind placebo or flosequinan at 75 mg/d or 100 mg/d, depending on the response to prerandomization flosequinan.

CONSENSUS [4]: The CONSENSUS trial studied NYHA class IV patients randomized to enalapril (2.5-40 mg/d) versus placebo. The follow-up period averaged 188 days (range, 1 day to 20 months). One hundred twenty-six patients were randomized to placebo and 127 were given enalapril.

SOLVD [5]: The patients in this study were in NYHA classes II and III, with 1,285 randomly assigned to enalapril (2.5-20 mg/d) and 1,284 randomized to placebo treatment. The average follow-up period was 41 months.

From the Department of Biostatistics, University of Washington, Seattle, Washington. Manuscript received July 15, 1994; revised manuscript received March 13, 1995; accepted May 23, 1995. Reprint requests: Lloyd D. Fisher, PhD, Department of Biostatistics, Box 357232, University of Washington, Seattle, WA 98195.


Journal of Cardiac Failure Vol. 1 No. 4 September 1995

Discussion

Sequential Monitoring of Data and the Data and Safety Monitoring Board

Ethical concerns in both medicine and experimentation in humans go back to the beginning of medicine [6]. In this era of randomized clinical trials, if the clinical endpoint being investigated is quite serious (eg, death, neurologic function), there is an ethical mandate that once an answer is known a clinical trial be stopped for the benefit of both the subjects being studied and future patients. This much is clear. What is not so clear is how to implement this mandate in practice. Among the questions that need to be considered are the following: How much information is needed before an answer is known? Is there any problem in repeatedly looking for the answer as the data accumulate? How are pertinent facts related to a particular study, but that are not part of the study itself, going to be appropriately taken into account? Most clinical trials could serve to benefit selected individuals, companies, or both with scientific prestige, financial gain, or even a feeling of accomplishment. How can one assure that monitoring of such a trial is done in an unbiased and objective fashion? I will now consider these issues and some current solutions.

How Much Information Is Needed Before an Answer Is Known?

In truth, this is a very difficult question; if approached from first principles, an answer commanding any consensus would be very difficult to achieve. In some hard-to-define sense, one might like a level of proof that benefits society overall in the long run, that is, a utility approach to the level of proof. In practice, the answer has been determined through a convention that is somewhat of a historical accident [7].* The underlying conceptual framework used has been hypothesis testing.
To briefly review, suppose there is no true difference between two treatments being compared; by chance, a study of particular patients will show some (small or large) observed difference. It is not desirable to declare a difference a true treatment effect if the data might very well occur by chance even when there is no difference. The historical convention is that to be considered convincing the observed difference must be dramatic enough that only 1 time in 20 would such a difference, or a more extreme difference, occur by chance when in truth there is no difference between treatments. The situation of no difference is called the null hypothesis; the 1-in-20, or .05, level of proof is called the significance level of the study or test. This .05 significance level is the historical convention used today. The significance level used for the experiment is also called the probability of a type I error. (The error being that one declares a treatment difference when in fact there is no true treatment difference.) An alternative approach to scientific inference using Bayesian statistical methods is now being more widely advocated [8] for use in clinical trials. To date, this paradigm has not been widely used, but it may be in the future. Usually, but not always, the practical differences between frequentist trials as now run and Bayesian trials are not great.

*It is rumored that the use of the .05 significance level became customary because Sir R. A. Fisher, who wrote the first statistical textbook (Statistical Methods for Research Workers), used the .05 significance level most of the time. Fisher was engaged in an ongoing feud with Karl Pearson and supposedly could not get permission to use more than selections from the available statistical tables published by Biometrika under Pearson's direction. Thus the use of the .05 significance level got its start. Conceptually, the appropriate amount of proof would depend on the consequences of the different possible errors; in practice, determining this would be extremely difficult and would raise great controversy.

Is There Any Problem in Looking for This Information as the Data Accumulate, Especially if One Has Multiple Opportunities to Examine the Data From a Study and to Declare a Treatment Difference?

In this case more evidence is required at a particular look to declare a difference than would be the case if only one look were to be done [9]. This is done to preserve the .05 significance level. The problem of looking multiple times or performing multiple tests is generically called the multiple comparison problem. In the context of multiple tests of accumulating data, it is also called the sequential testing problem or the repeated testing problem. Intuitively, the idea may be appreciated by considering the flipping of a fair coin (ie, half the time a head occurs and half the time a tail). You are on a diet and trapped with a bag of donuts; you decide to leave it to chance whether you eat a donut. You flip a coin and will not eat the donut if the flip is heads and will eat the donut if the flip is tails.
The flip is made and lands heads. You then decide that one flip is too few; you will make it the best two of three flips, eating the donut if two of the three additional flips are tails. Unfortunately, exactly two of the three flips are heads. You then decide that four of seven is really a better test. Again, only three of the seven new coin tosses are tails. You then go for five of nine tosses, etc. After spending most of the day flipping a coin, it finally results that 25 of 49 flips are tails! You then eat the donut knowing that there was a 50% chance that you would not eat the donut; clearly, you were meant to eat the donut. Were the chances really 50%? Of course not. It can be shown mathematically that eventually you would have more tails than heads. There was a 100% chance you would get to eat the donut. Even if it was specified ahead of time that you could only try 1, 3, 5, 7, ..., 49 flips per trial and then stop if there were more tails than heads on any of the tests, there is only 1 chance in 33,554,432 that you would lose in all 25 of the experiments. In fact, you were very unlucky to need that many coin flips.

A similar problem exists when comparing two therapies: each look at the data is another opportunity to stop and declare a treatment difference. If multiple looks are allowed, this has to be taken into account if we only want 1 chance in 20 of declaring a therapeutic difference when in fact there was none. One way to avoid the problem is to wait until the end of the trial and then perform the only test for a difference; however, that option is not ethically available in survival trials. Appropriate statistical procedures have been studied and developed for sequential monitoring of clinical trials [10-12]. This is referred to in the literature as sequential analysis, sequential monitoring, repeated testing, or a similar term. The details of the sequential analysis of the data are complex and involve trade-offs. If a relatively smaller difference is going to be required early to declare a treatment difference, this increases the size of the difference that will be required later on in the trial to declare a treatment difference. Although there are several methods available to handle the sequentially examined data, there is no one consensus method that decides how much evidence to require at each look to terminate a trial and declare a winner; the only constant factor is that the .05 significance level is usually preserved. Usually early differences must be extremely dramatic to stop a sequentially monitored clinical trial, so that the total sample size is not too large. This is difficult to live with in practice if early apparent treatment differences accumulate. On the other hand, the price of lowering the amount of evidence needed to declare a treatment difference early in the trial is either: (1) to require more evidence later to declare a difference or (2) to have a larger trial so that the chances of finding a difference, when there is a difference of a given amount, are not diminished.
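The donut arithmetic and the cost of repeated looks can be checked numerically. The following is a minimal sketch, not a monitoring procedure: it assumes a simplified model in which each look adds an independent standard normal increment to a running score, and it tests every look at the unadjusted two-sided .05 level (critical value 1.96). The function names, the number of simulated trials, and the seed are illustrative choices, not part of any published method.

```python
import random
from fractions import Fraction


def donut_loss_probability(n_experiments: int = 25) -> Fraction:
    """Probability that none of n independent experiments shows a tails majority.

    Each experiment uses an odd number of fresh flips of a fair coin, so
    "more tails than heads" has probability exactly 1/2 every time.
    """
    return Fraction(1, 2) ** n_experiments


def false_positive_rate(n_looks: int, n_trials: int = 20_000,
                        z_crit: float = 1.96, seed: int = 1) -> float:
    """Monte Carlo type I error when every look is tested at the nominal .05 level.

    Assumed model: under the null hypothesis each look adds an independent
    N(0, 1) increment, so the z statistic after k equally spaced looks is
    score / sqrt(k). A trial counts as a false positive if any look rejects.
    """
    rng = random.Random(seed)
    rejections = 0
    for _ in range(n_trials):
        score = 0.0
        for k in range(1, n_looks + 1):
            score += rng.gauss(0.0, 1.0)
            if abs(score) / k ** 0.5 > z_crit:
                rejections += 1
                break  # the trial stops at the first "significant" look
    return rejections / n_trials


if __name__ == "__main__":
    print("chance of losing all 25 donut experiments:", donut_loss_probability())
    for looks in (1, 2, 5, 10):
        print(f"{looks:2d} look(s): simulated type I error = {false_positive_rate(looks):.3f}")
```

With a single look the simulated error rate stays near the nominal .05; with more unadjusted looks it climbs well above it, which is why sequential procedures demand more evidence at each look.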
How Are Pertinent Facts Related to a Particular Study, but Not Part of the Study Itself, Going to Be Appropriately Taken into Account? Because Most Clinical Trials Serve to Benefit Selected Individuals, Companies, or Both in Terms of Scientific Prestige, Financially, or Even With a Feeling of Accomplishment, How Can One Assure That Monitoring of a Trial Is Done in an Unbiased, Objective Fashion?

The answers to these two questions are related in current practice. To ensure an objective evaluation of study data, the DSMB is appointed. The DSMB is delegated the responsibility to monitor the clinical trial and to decide whether and when a treatment difference has been determined; if sequential analysis is being performed, the ideas given above are typically followed. Often the DSMB is convened before a study starts and has an opportunity to interact on the design and operational details of the study, as well as later assessing the ongoing performance of the study and sequentially reviewing the accumulating data. The DSMB has the responsibility to take into account all known important factors, not just the data of the study at hand, in deciding whether a study can continue, needs to be modified, or even needs to be stopped. An example of external data that can influence a study is the announcement of results by another study using the same drug in a slightly different patient population or even a related drug in a similar population. Another example is a beneficial effect of some other drug in a similar patient population, which might make a placebo no longer ethical. In therapeutic evaluation, not only is the treatment benefit of concern but the risks of a therapeutic strategy must also be examined and the risk/benefit ratio taken into account.

DSMBs have been used for National Institutes of Health studies since at least the 1960s. A committee chaired by Bernard Greenberg studied the issues of sequential monitoring for the (then) National Heart Institute [13]. Since then, DSMBs have been used in most multicenter National Institutes of Health studies as well as in numerous trials performed by the pharmaceutical industry; the ideas and information given here have been discussed previously [14-16]. The role of the DSMB can be every bit as important for safety as for efficacy in evaluating therapeutic interventions. For example, both the PROMISE study [2] of milrinone and the PROFILE study [3] of flosequinan were stopped early by DSMBs because of excess toxicity with the studied regimens. The PROMISE milrinone study was stopped with the relative risk of milrinone compared to placebo at 1.28; in that study, the overall mortality was 30% for milrinone and 24% for the placebo arm. The relative risk of flosequinan compared to placebo was 1.43 in the PROFILE study. The CONSENSUS trial [4], in contrast, was stopped early for a beneficial treatment effect on survival when compared to placebo.
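As a rough check on how such figures relate, the crude mortality percentages quoted for PROMISE above, and for CONSENSUS and SOLVD in the sentences that follow, can be turned into crude relative risks and relative risk reductions. This is a sketch of the arithmetic only; the helper names are illustrative. The crude ratios need not reproduce the published values (1.28 for PROMISE, 16% for SOLVD); the assumption here is that the published figures come from time-to-event analyses rather than from simple proportions.

```python
def relative_risk(treated_rate: float, control_rate: float) -> float:
    """Crude relative risk: mortality on treatment divided by mortality on control."""
    return treated_rate / control_rate


def relative_risk_reduction(treated_rate: float, control_rate: float) -> float:
    """Crude relative risk reduction: 1 minus the relative risk."""
    return 1.0 - relative_risk(treated_rate, control_rate)


# (treated mortality, placebo mortality) as quoted in the text.
TRIALS = {
    "PROMISE (milrinone)": (0.30, 0.24),
    "CONSENSUS (enalapril, 6 months)": (0.26, 0.44),
    "SOLVD (enalapril)": (0.352, 0.397),
}

if __name__ == "__main__":
    for name, (treated, control) in TRIALS.items():
        rr = relative_risk(treated, control)
        rrr = relative_risk_reduction(treated, control)
        print(f"{name}: crude RR = {rr:.2f}, crude RRR = {rrr:+.1%}")
```

A negative relative risk reduction, as for PROMISE, indicates excess mortality on the active drug.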
In the CONSENSUS trial, the crude 6-month mortality exhibited a 40% reduction in the death rate, from 44% in the placebo group to 26% in the enalapril arm of the trial. The SOLVD study [5] went to the scheduled full follow-up period before reporting a favorable benefit of enalapril. The larger SOLVD study resulted in a lower mortality in the enalapril group (35.2%) than in the placebo group (39.7%). The reduction in the relative risk was 16%, with a 95% confidence interval from 5 to 26%. These trials illustrate: (1) the strong ethical need for sequential monitoring and (2) the benefit of sequential monitoring in stopping trials early for both efficacy and safety.

The DSMBs always have representation from the medical discipline(s) involved in the trial; they often have a biostatistician(s), a lay representative, and a trained medical ethicist. The members should be free of potential conflict of interest with respect to the operation and outcome of the study so that they can give objective, unbiased advice. The operational definition of "free of potential conflict of interest" has varied but usually includes statements that board members and their immediate family do not own stock in a company that might profit from a particular outcome of the study; often it is required that the individuals not have substantial ongoing consulting arrangements with the company. In addition, DSMB members are often advised on laws relating to insider trading.

In most cases today, the DSMBs use sequential monitoring guidelines to assure adequate control of the significance level; it is generally recognized that these rules are only guidelines and that the DSMB must exercise critical scientific judgment in its deliberations and recommendations about a study. A statistically significant difference per se does not automatically mean that a trial will terminate because: (1) imbalance between treatment groups might offer a plausible alternative explanation, (2) within the study, the results by subgroups, causes of death, and so forth might be so bizarre as to suggest that chance plays a larger role in the interpretation than the probability level alone suggests, and finally, (3) the results of other related studies might make the result less believable, thus effectively increasing the level of proof needed before integrating the results of one study into the context of other known findings.

Because the participating clinicians delegate to the DSMB their ethical mandate to monitor the study, the investigators can usually send requests and suggestions to the DSMB so that they (the investigators) can be assured that concerns are addressed by the DSMB (although the investigators do not know how these concerns are considered). The decisions and deliberations of the DSMB are usually kept confidential until the trial ends (either at the end of the scheduled full follow-up time or with a preplanned early termination due to the strength of evidence in the sequential analysis). The confidentiality avoids information being promulgated that would threaten the completion of the trial.
Furthermore, those delivering the clinical care are freed from having partial findings that make the delivery of the treatment arms problematic. This occurs because, as discussed earlier, there is a required amount of evidence before concluding that a therapeutic difference has been demonstrated. The actual level of the scientific proof is approximately continuous, and as a trial moves close to potential early stoppage for either efficacy or safety, there is an appropriate concern, even anxiety, on the part of the DSMB as it monitors the data. As one who has been involved in such boards, I can attest to the heavy responsibility felt by such boards in these difficult situations and the seriousness with which they discharge their responsibilities.

Multiple authors have presented methods for the statistical detail of the sequential monitoring procedures [17,18]. Of particular importance for trials that end early are the following: the monitoring rules typically have a finite number of times at which the data are examined. Until the last few years, the times were usually calendar times (eg, once or twice per year); however, the statistical power of a trial depends on the number of endpoints when a specific outcome, such as mortality, is used. Since the power of the statistical procedure depends on the number of endpoints observed, the monitoring is now often set up to occur at times where (approximately) certain numbers of endpoints have occurred. Furthermore, as shown by Lan and DeMets [19], when a trial is close to ending, the DSMB may look at it more often without threatening the significance level. It is important to avoid unduly long trials that lead to undue patient exposure to an adverse treatment.
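One way to see how event-driven monitoring and flexible look times fit together is the alpha-spending idea associated with Lan and DeMets. The sketch below uses two commonly cited spending functions, an O'Brien-Fleming-type form and a Pocock-type form, stated here from the general literature rather than taken from this article. It reports only the cumulative significance level spent at each look; converting that into stopping boundaries requires numerical integration that is not shown. The planned total of 400 deaths and the look schedule are hypothetical.

```python
import math
from statistics import NormalDist
from typing import List

_STD_NORMAL = NormalDist()


def obf_type_spending(t: float, alpha: float = 0.05) -> float:
    """O'Brien-Fleming-type spending: cumulative two-sided alpha used by information fraction t."""
    if not 0.0 < t <= 1.0:
        raise ValueError("information fraction must lie in (0, 1]")
    z = _STD_NORMAL.inv_cdf(1.0 - alpha / 2.0)
    return 2.0 * (1.0 - _STD_NORMAL.cdf(z / math.sqrt(t)))


def pocock_type_spending(t: float, alpha: float = 0.05) -> float:
    """Pocock-type spending: alpha is used up more evenly across the looks."""
    if not 0.0 < t <= 1.0:
        raise ValueError("information fraction must lie in (0, 1]")
    return alpha * math.log(1.0 + (math.e - 1.0) * t)


def information_fractions(events_at_looks: List[int], planned_events: int) -> List[float]:
    """Information fraction at each look = deaths observed so far / deaths planned."""
    return [events / planned_events for events in events_at_looks]


if __name__ == "__main__":
    # Hypothetical schedule: looks after 100, 200, 300, and 400 deaths.
    for t in information_fractions([100, 200, 300, 400], planned_events=400):
        print(f"t = {t:.2f}: OBF-type {obf_type_spending(t):.5f}, "
              f"Pocock-type {pocock_type_spending(t):.5f}")
```

The O'Brien-Fleming-type function spends almost nothing early, so only very dramatic early differences stop the trial; the Pocock-type function makes early stopping easier at the price of a stricter final test. Both spend the full .05 by the end.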

Length of Observation in CHF Trials

Many drugs for CHF were evaluated for efficacy only by using exercise stress testing to show an improvement in exercise capacity. The trials were often of 4, 6, 8, or 12 weeks duration; it seemed to be implicitly assumed that any drug that improved cardiac function, as reflected in maximal exercise stress testing, must have an overall benefit to the patient. In the milrinone PROMISE study [2], a drug that was arguably shown to improve exercise test time was shown to decrease length of life. Flosequinan was approved in the United States for its beneficial effect on hemodynamics and exercise; this effect was shown in multiple studies [20-24]. The flosequinan PROFILE study [3] showed an early trend toward improvement for the first 2 months; however, the situation deteriorated over time, with excess mortality in the flosequinan arm of the trial and worsened long-term quality-of-life results. This occurred despite the fact that flosequinan improved exercise performance and was licensed for this purpose by the United States Food and Drug Administration before the PROFILE results were known.

These results make the evaluation of new compounds for the treatment of CHF very expensive and difficult. No longer can one assume that short-term improvement in the quality of life will be sustained over an extended time period; furthermore, the potential for excess mortality remains unless there are definitive comparative clinical trial data. This means that an early benefit, for example in exercise tolerance, would only be an initial step in a clinical development program. Furthermore, long-term observations without control groups are difficult to interpret, given the relatively high mortality and variable results in different populations. In most instances, long-term controlled survival data would seem to be needed.

Mechanism of Death

There may be important biologic information in the cause of death data in CHF trials, but the clinical usefulness is doubtful. In certain situations (eg, bone marrow transplant studies with deaths separated into relapse, graft-versus-host disease, and those due to the conditioning regimen for transplantation), a specific cause of death may be appropriate as an endpoint. That is not true in CHF. Deaths due to power or pump failure and deaths due to rhythm disturbances are so closely related that one could not use one cause as a reasonable clinical endpoint. One might argue that only cardiovascular deaths be used as the endpoint; usually, the large majority of the deaths are cardiovascular. To eliminate other causes of death, one needs to be certain that they could not be related to a drug effect. I favor all-cause mortality as the endpoint in CHF mortality trials.

Composite Endpoints

Composite endpoints have been proposed in the use of thrombolytic therapy by Braunwald and colleagues [25] and by Califf and co-workers [26]. Using a composite endpoint in a clinical trial has appeal for a number of reasons. The primary reason for considering a composite endpoint is to increase the statistical power to detect a treatment difference. As mentioned earlier, the statistical power of a study depends on the number of endpoints; if there are more endpoints, the study is more likely to detect a difference for a given sample size. Alternatively, a smaller study can be used to look for treatment differences. A composite endpoint allows a variety of different factors of importance to the patient to be used to decide on the relative efficacy of different treatments. Nevertheless, there are a number of reasons that composite endpoints are problematic. Three such problems are discussed here: (1) the masking of more serious endpoints if the time to the first endpoint is used, (2) the desirability of weighting different events, and (3) from a Food and Drug Administration regulatory point of view, the necessity for a confirmatory trial may be uncertain.

Suppose that an endpoint were to consist of the first occurrence of any of the following: death, myocardial infarction, hospitalization for CHF, or the need to discontinue a drug, add a drug, or both. One scenario that could arise in an RCT would be to have many more early dose adjustments within one arm, say an active drug arm, but also statistically significantly fewer deaths in that arm. If a time to first event analysis is used, the many early drug adjustments might cause the active drug arm that prolongs life to be judged inferior to a placebo arm. Conversely, a treatment arm could look superior because of many fewer minor events but actually be demonstrably worse with respect to more serious endpoints.
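The masking scenario can be made concrete with a toy data set. Everything in this sketch is hypothetical: the eight patients, the event times, and the helper names are invented purely to illustrate how a time to first event summary and a death count can point in opposite directions. It is not an analysis method.

```python
from dataclasses import dataclass, field
from statistics import mean
from typing import Dict, List, Optional, Tuple

Event = Tuple[float, str]  # (month of occurrence, kind of event)


@dataclass
class Patient:
    arm: str
    events: List[Event] = field(default_factory=list)


def time_to_first_event(patient: Patient) -> Optional[float]:
    """Month of the earliest composite event, or None if the patient had none."""
    return min(month for month, _ in patient.events) if patient.events else None


def summarize(patients: List[Patient], arm: str) -> Dict[str, float]:
    """Composite (first-event) summary and death count for one treatment arm."""
    in_arm = [p for p in patients if p.arm == arm]
    first_times = [t for t in map(time_to_first_event, in_arm) if t is not None]
    deaths = sum(any(kind == "death" for _, kind in p.events) for p in in_arm)
    return {
        "patients_with_event": len(first_times),
        "mean_months_to_first_event": mean(first_times),
        "deaths": deaths,
    }


# Hypothetical cohort: early dose adjustments on the active drug, deaths on placebo.
COHORT = [
    Patient("active", [(1.0, "dose adjustment")]),
    Patient("active", [(2.0, "dose adjustment")]),
    Patient("active", [(1.0, "dose adjustment"), (30.0, "death")]),
    Patient("active"),
    Patient("placebo", [(12.0, "death")]),
    Patient("placebo", [(18.0, "death")]),
    Patient("placebo", [(24.0, "death")]),
    Patient("placebo"),
]

if __name__ == "__main__":
    for arm in ("active", "placebo"):
        print(arm, summarize(COHORT, arm))
```

In this invented cohort both arms have the same number of patients with a composite event, and the active arm reaches its first event far sooner, so a first-event analysis would favor placebo even though the active arm has fewer deaths.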
This suggests that more than one occurrence of an event be used in the therapeutic evaluation and also leads naturally to the concept that different events should be weighted in some manner.

One issue in comparing composite endpoints is the combining of apples and oranges. Should not different events (eg, death and a change of drug) have different weights? Sometimes economic principles are suggested; others think that relative assignments of scores should be made by patients, physicians, and so on. These problems have been the subject of active investigation by health services researchers and others. A recent book, Health Status and Health Policy [27], considers in detail theoretic foundations and practical methods for characterizing health-related quality of life; these methods (in this instance designed to allow rational allocation of health-care funds across different diseases and treatments) combine both length of life and a variety of factors related to quality of life. Quality-of-life measures include both medical events (eg, hospitalization for CHF) and functional and psychologic aspects of the patient's life. A number of questionnaires and interview techniques have been developed to investigate quality of life and health outcome [27]. Perhaps the most relevant methods ask the patients about their preferences for different health states and events. Patrick and Erickson [27] use, as an example, the assignment of values to heart failure patients [28]. Unfortunately, the answers can depend on the method of assessing preferences and the population involved. The quality of life can be terrible with CHF, and there could be a legitimate use for a drug that improved quality of life while shortening survival. A quality-of-life questionnaire has been developed specifically for CHF patients, the Minnesota Living With Heart Failure Questionnaire [29,30]. It is short and targets specifically the effects of CHF.

Conceptually, these types of methods are perhaps the most appealing means of dealing with the multiple aspects of heart failure. One could then integrate the effects of multiple occurrences of specific events (eg, myocardial infarction, hospitalization for acute CHF) and more pervasive effects of CHF over time (eg, lack of exercise tolerance, mental dysfunction, depression) with longevity of life in the various states.
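The idea of integrating time spent in different health states with longevity can be sketched as a weighted sum: months in each state multiplied by a preference weight for that state. The state names, weights, and patient courses below are hypothetical and chosen only to illustrate the bookkeeping; as noted above, real weights depend on how preferences are elicited and from whom.

```python
from typing import Dict, List, Tuple

Course = List[Tuple[str, float]]  # sequence of (health state, months spent in it)

# Hypothetical preference weights on a 0 (death) to 1 (full health) scale.
WEIGHTS: Dict[str, float] = {
    "few symptoms": 0.9,
    "marked symptoms": 0.6,
    "hospitalized": 0.3,
}


def survival_months(course: Course) -> float:
    """Total length of life, ignoring quality."""
    return sum(months for _, months in course)


def quality_adjusted_months(course: Course, weights: Dict[str, float]) -> float:
    """Months in each state weighted by the preference value of that state."""
    return sum(weights[state] * months for state, months in course)


# Hypothetical courses on two drugs.
DRUG_A: Course = [("marked symptoms", 18.0), ("hospitalized", 6.0)]  # longer life
DRUG_B: Course = [("few symptoms", 20.0)]                            # better life

if __name__ == "__main__":
    for name, course in (("drug A", DRUG_A), ("drug B", DRUG_B)):
        print(f"{name}: survival {survival_months(course):.1f} months, "
              f"quality-adjusted {quality_adjusted_months(course, WEIGHTS):.1f} months")
```

Under these invented weights the drug with shorter survival yields more quality-adjusted time, the situation in which a drug that improves quality of life while shortening survival might have a legitimate use. A different set of weights could reverse the ordering.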
At this time, however, the requisite groundwork for developing and validating the measurement process has not been done. The more usual approach to date has been to have a primary endpoint (eg, survival) and then to collect data on other measures of potential interest, without clearly defined a priori methods to determine when the overall result is positive. The potential for more complex evaluations is promising; there are factors making this desirable: the available methodology is improving while at the same time there is an increasing emphasis on evaluating the relative benefit/cost ratio for competing therapies (eg, when decisions are made for large segments of the population through formulary committees). Conversely, there are factors making such trials difficult: more, and more complex, data need to be collected; in an unblinded or imperfectly blinded setting, the potential for bias could be increased.

In choosing a composite endpoint, the different events or states chosen might be restricted to those for which the mechanism and understanding of the drug action could reasonably result in an improvement. Unfortunately, the complexity of the pathogenesis and progression of CHF is such that not all outcomes resulting from a compound are necessarily foreseen. Furthermore, such choices might sometimes be criticized for ignoring a possible downside of a new compound. The specificity of the target endpoints depends on an investigator's understanding of the drug action mechanisms, as well as the body's response over time to its changed environment.

The third concern is not scientific per se but can be important in drug development in the United States. By regulation, the Food and Drug Administration, in evaluating drugs, needs well-controlled clinical trials. The plural, trials, is deliberate. Where possible, two statistically significant trials are needed to show efficacy before a compound is approved. If a trial shows a survival benefit, however, it is considered practically, and to many people ethically, impossible to mount a second confirmatory trial. Thus, in this case one appropriate trial suffices. Suppose, however, that a composite endpoint is used with death and hospitalization. If hospitalization were the only endpoint, two trials might be expected; if death were the endpoint, only one trial might be required. If both are used and the combined endpoint shows a treatment difference but death alone shows only a trend, then should the Food and Drug Administration require a second confirmatory trial? Such issues have not been resolved at this date.

References

1. Cohn JN, Archibald DG, Ziesche S, Franciosa JA, Harston WE, Tristani F, Dunkman B, Jacobs W, Francis G, Flohr K, Boldman S, Cobb F, Shah P, Saunders R, Fletcher R, Loeb H, Hughes V, Baker B: Effect of vasodilator therapy on mortality in chronic congestive heart failure: results of a Veterans Administration Cooperative Study. N Engl J Med 1986;314:1547-52
2. Packer M, Carver JR, Rodeheffer RJ, Ivanhoe RJ, DiBianco R, Zeldis SM, Hendrix GH, Bommer WJ, Elkayam U, Kukin ML, Mallis GI, Sollano JA, Shannon J, Tandon PK, DeMets DL: Effect of oral milrinone on mortality in severe chronic heart failure. N Engl J Med 1991;325:1468-75
3. Packer M, Rouleau J, Swedberg K, Pitt B, Fisher L, Klepper M, the PROFILE Investigators and Coordinators: Effect of flosequinan on survival in chronic heart failure: preliminary results of the PROFILE study. Circulation 1993;88:I301, Abstract 1612
4. The CONSENSUS Trial Study Group: Effects of enalapril on mortality in severe congestive heart failure: results of the Cooperative North Scandinavian Enalapril Study Group (CONSENSUS). N Engl J Med 1987;316:1429-35
5. The SOLVD Investigators: Effect of enalapril on survival in patients with reduced left ventricular ejection fraction and congestive heart failure. N Engl J Med 1991;325:293-302
6. Reiser SL, Dyck AJ, Curran WH, eds: Ethics in medicine: historical perspectives and contemporary concerns. MIT Press, Cambridge, MA, 1977
7. Fisher L, McDonald J: Fixed effects analysis of variance. Academic Press, New York, 1978
8. Spiegelhalter DJ, Freedman LS, Parmar MKB: Bayesian approaches to clinical trials. J Royal Statist Soc Series A 1995;157:357-416
9. Armitage P, McPherson CK, Rowe BC: Repeated significance tests on accumulating data. J R Stat Soc Series A 1969;132:235-44
10. Jennison C, Turnbull BW: Statistical approaches to interim monitoring of medical trials: a review and commentary. Stat Sci 1990;3:299-317
11. Whitehead J: The design and analysis of sequential clinical trials. Ellis Horwood, Chichester, England, 1983
12. Lan KKG, DeMets DL: Changing frequency of interim analysis in sequential monitoring. Biometrics 1989;45:1017-20
13. Greenberg Report: Organization, review, and administration of cooperative studies. Controlled Clin Trials 1988;9:137-48 (Publication of 1967 report)
14. Fleming TR, DeMets DL: Monitoring of clinical trials: issues and recommendations. Controlled Clin Trials 1993;14:183-97
15. Fleming TR: Data monitoring committees and capturing relevant information of high quality. Stat Med 1993;12:565-70
16. DeMets DL, Ellenberg SS, Fleming TR, Childress JF, Mayer KH, Pollard RB, Rahal JJ, Walters L, O'Fallon J, Whitley-Williams P, Strauss S, Sande M, Whitley RJ: The Data and Safety Monitoring Board and acquired immune deficiency syndrome (AIDS) clinical trials. Controlled Clin Trials (in press)
17. O'Brien PC, Fleming TR: A multiple testing procedure for clinical trials. Biometrics 1979;35:549-56
18. Lan KKG, DeMets DL: Discrete sequential boundaries for clinical trials. Biometrika 1983;70:659-63
19. Lan KKG, DeMets DL: Changing frequency of interim analysis in sequential monitoring. Biometrics 1989;45:1017-20
20. Gottlieb SS, Kukin ML, Penn J, Fisher ML, Cines M, Medina N, Yushak M, Taylor M, Packer M: Sustained hemodynamic response to flosequinan in patients with heart failure receiving angiotensin-converting enzyme inhibitors. J Am Coll Cardiol 1993;22:963-7
21. Massie BM, Berk MR, Brozena SC, Elkayam U, Plehn JF, Kukin ML, Packer M, Murphy BE, Neuberg GW, Steingart RM, Levine TB, DeHaan H: Can further benefit be achieved by adding flosequinan to patients with congestive heart failure who remain symptomatic on diuretic, digoxin, and an angiotensin converting enzyme inhibitor: results of the Flosequinan-ACE inhibitor trial (FACET). Circulation 1993;88:492-501
22. Packer M, Narahara KA, Elkayam U, Sullivan JM, Pearle DL, Massie BM, Creager MA: Double-blind, placebo-controlled study of the efficacy of flosequinan in patients with chronic heart failure: principal investigators of the REFLECT study. J Am Coll Cardiol 1993;22:65-72
23. Cowley AJ, McEntegart DJ: Placebo-controlled trial of flosequinan in moderate heart failure: the possible importance of aetiology and method of analysis in the interpretation of the results of heart failure trials. Int J Cardiol 1993;38:167-75
24. Elborn JS, Riley M, Stanford CF, Nicholls DP: The effects of flosequinan on submaximal exercise in patients with chronic cardiac failure. Br J Clin Pharmacol 1990;29:519-24
25. Braunwald E, Cannon CP, McCabe CH: Use of composite endpoints in thrombolysis trials of acute myocardial infarction. Am J Cardiol 1993;72:3G-12G
26. Califf RM, Harrelson-Woodlief L, Topol EJ: Left ventricular ejection fraction may not be useful as an end point of thrombolytic therapy comparative trials. Circulation 1990;82:1847-53
27. Patrick DL, Erickson P: Health status and health policy. Oxford University Press, New York, 1993
28. Hogness J, Van Antwerp M: The artificial heart: its development and use. National Academy Press, Washington, DC, 1991
29. Rector TS, Kubo SH, Cohn JN: Patients' self-assessment of their congestive heart failure. Part 2. Content, reliability and validity of a new measure, The Minnesota Living with Heart Failure Questionnaire. Heart Failure 1987;Oct/Nov:198-209
30. Kubo SH, Rector TS, Strobeck JE, Cohn JN: OPC-8212 in the treatment of congestive heart failure: results of a pilot study. Cardiovasc Drugs Ther 1988;2:653-60