
J Clin Epidemiol Vol. 42, No. 4, pp. 309-316, 1989. 0895-4356/89 $3.00 + 0.00. © 1989 Pergamon Press plc.

INFERENCE AND DECISION AT THE BEDSIDE*

DAVID L. SACKETT

McMaster University Faculty of Health Sciences, and Chief of Medicine, Chedoke-McMaster Hospitals, 1200 Main Street West, Hamilton, Ontario, Canada L8N 3Z5

*Text of an oration given at The University of Western Ontario, London, Canada, in September 1987 on the occasion of the presentation of the J. Allyn Taylor International Prize in Medicine.

No man can receive a greater honour than the respect of his peers, and I was both thrilled and touched to learn that I had been selected to join Peter Armitage and Alvan Feinstein as a recipient of this year's J. Allyn Taylor International Prize in Medicine. I accept it on behalf of the colleagues with whom I've worked in the Department of Clinical Epidemiology and Biostatistics at McMaster over the past 20 years. These brilliant, dedicated, irreverent, fun-loving partners are responsible for my success, and I regard this award as a recognition of our group, not just myself. Finally, the person most responsible for this and any other worthy things I've done since we met 35 years ago, Barbara Bennett Sackett, is here today to share this recognition.

In preparing this essay, I've attempted to identify certain clinical parallels with Peter Armitage's opening essay on Inference and Decision in Clinical Trials. The reasons are both scientific and autobiographic. One of the delights of having spent most of my academic career at McMaster is the freedom this institution gives me and my colleagues, every 5 years or so, to change our careers in radical and non-traditional ways. In my own case, this has most recently taken the form of returning, shortly before my 50th birthday, to the status of a daytime Medical Resident, rotating through the Coronary Care Unit, the ICU, the Infectious Disease Service, the Diabetes Clinic, and so forth, re-treading in acute general internal medicine.


Because we always bring our former experiences and patterns of thinking with us as we take on new areas, I brought along the experiences and patterns I had developed during my previous two decades as a clinical investigator who had devoted most of his time to the design and execution of clinical trials and health care research. As I did so, a set of ideas that I had first developed 20 years earlier became reinforced every time I saw a patient.

In brief, these ideas run as follows: the important tasks we carry out as clinicians require the particularization, to the individual patient, of our prior experiences (as individual clinicians and as a profession) with groups of similar patients. Thus, the rational evaluation of a symptom, sign, or laboratory test result in today's patient demands our critical appraisal of how this clinical finding has behaved previously among groups of patients with the same differential diagnosis. Similarly, the rational selection of a treatment for today's patient requires our appraisal of how similar patients have fared on various treatments in the past.

If I was correct that rational clinical practice requires the projection of diagnostic findings, prognoses, and therapeutic responses from groups of patients to the individual patient, it followed that the strategies and tactics used to study groups of patients (housed in the discipline of epidemiology and the science of statistics) ought to be useful to me as an individual clinician dealing with my individual patient. Moreover, it should be possible for me to take a set of epidemiologic and biostatistical strategies developed to study the "distribution and determinants of disease" in groups and populations, recast them in a clinical perspective, and use them to improve my clinical performance. I therefore set about trying to do so.


As I spent more and more time in attempts to apply, in the frontlines of clinical medicine, the results of the randomized trials I and others had been executing, I identified three issues in inference and decision-making at the bedside. First, how should clinicians decide whether the results of a randomized trial, performed at another time in another town by other clinicians among other patients, apply to their own particular patient, today, in their own town? That is, how ought we consider the generalizability, or external validity, of internally valid randomized trials? Second, faced with an ever-expanding array of validated, efficacious preventive and therapeutic maneuvers, how should clinicians decide which ones deserve the highest priorities, especially when they may not have the time or other resources to pursue them all? That is, can we develop a clinically useful and easily understood yardstick for comparing the payoffs, for our patients, of pursuing and treating different disorders? And, finally, how are we to select the best therapy when there are no randomized trials to guide us? Is there any way to bring the powerful inferential methods of the randomized trial to bear on the assessment of the efficacy of treatment in the individual patient? I will devote the remainder of this essay to the exploration of these three issues of inference and decision-making at the bedside.

Reports of and about randomized control trials are burgeoning. If you call up the National Library of Medicine on your computer and ask it to list the citations indexed under the term "random allocation", you will receive over 8500 references published just since 1986. Moreover, an increasing number of preventive, therapeutic, and rehabilitative regimens have been shown, through the application of this powerful scientific method, to do more good than harm. However, the fact that a treatment produces statistically significant benefits among a group of published patients in another town still leaves the frontline clinician with a major additional inferential task, for this clinician must decide whether the treatment will work in my patient in my town. Rephrased in scientific terms, the clinician must decide whether this internally valid trial is externally valid and generalizable.

Of course, generalizability, in the strictest sense, is confined to patients who, had they been present where and when a trial was underway, would have been in it. Extrapolation of the trial results to other patients may happen to be both correct and useful, but more closely resembles speculation than deduction.

But it is precisely this speculation that the frontline clinician must engage in when deciding whether to apply the results of an internally valid randomized trial to a particular patient, and I want to report some ideas my colleagues and I are developing in an effort to assist this process of inference.

The issue of generalizability has come to increasing notice with the relatively recent practice of including, in publications of the internal results of randomized control trials, statements on the number of otherwise eligible patients who were, for whatever reason, not entered and randomized with the rest. Although such "eligible but not randomized" patients have always existed, and regularly outnumber those randomized, they have attracted attention only recently, especially when the internally valid results of the associated trials have demonstrated the unexpected absence of efficacy. The inferential task facing the frontline clinician is to decide whether the presence of large numbers of "eligible but not randomized" patients has rendered the trial, however internally valid, non-generalizable.

The first element of the task is based on sheer numbers, and inquires whether, if only a small fraction of otherwise eligible patients were entered into the trial, its results can ever be generalized and applied to our patient. However, when the probability of responding to the experimental therapy has no bearing on whether eligible patients are admitted to a randomized trial, the number or proportion of eligible patients admitted has no bearing on the generalizability of the trial results. Although, as we shall see later, the types of eligible patients who receive treatment outside a trial may be crucial, the numbers (or ratios, etc.) of patients who do so are important only to the power of the trial's results, and are irrelevant to the issue of generalizability. Thus, this issue need not impede the frontline clinician's decision about applying a trial's result to the individual patient, especially when the trial demonstrated a beneficial effect of therapy.

Both more distant and more recent medical history bear this out. Tens of thousands of peptic ulcer patients underwent uncontrolled gastric freezing in the 1960s in the belief that this procedure improved the subsequent course of this disease; it required only 160 of them, randomized to real or sham freezing, to demonstrate that the procedure was not efficacious [1]. A similar lesson is provided by internal mammary ligation as a treatment for angina pectoris [2].


A more recent example, this time with a positive result, comes from the randomized trials of coronary artery bypass grafting for angina pectoris. Subgroups of patients defined on clinically sensible grounds (an issue that will be discussed in detail in a moment) routinely were more likely to be "eligible but not randomized" than actually randomized to surgical or medical therapy in the CASS, or Coronary Artery Surgery Study, trial [3]. None the less, the 160 eligible and randomized patients with low ejection fractions at entry were sufficient to demonstrate both clinically and statistically significant reductions in mortality following bypass surgery (and the fact that several patients randomized to, and analyzed in, the medical arm underwent bypass surgery during the trial makes these results all the more impressive). The cornerstone of this guide is the absence of a selective exclusion of patients more or less likely to respond to the experimental therapy. There are methods for both safeguarding against and testing for this attribute, but time does not permit their discussion here.

The second element of the problem focusses on the type, not the number, of "eligible but not randomized" patients, and usually applies to trials that conclude that the experimental therapy is not efficacious and is not, therefore, recommended for use in our patient (the rules apply to any trial, but the "negative" ones have received the most press recently [4] and I'll therefore use them as my example). The appropriate concern about such "negative" trials is whether those patients most likely to benefit from the experimental therapy ever got into it; could they or their clinicians, convinced of the efficacy of the treatment in question, have arranged for them to receive it outside the trial? If so, and as a result of the selective exclusion of this subset of patients who are highly responsive to the experimental treatment, only the less- or non-responsive patients are entered and randomized, producing the trial's negative (albeit internally valid) result.

Once again, however, the frontline clinician can, by applying some simple principles of epidemiology and biostatistics to the clinical situation, reach a clinically useful decision, for if a subset of patients is thought to be highly responsive to the treatment in question, generalizability is a function of how such patients fared inside a trial, not outside it.


In the extreme case, if a randomized trial failed to enter a subgroup of eligible patients who truly did benefit from therapy, and found no benefit among the remaining eligible patients that it did randomize, then its negative result would only be generalizable to the latter, randomized patients, and it would be incorrect to extrapolate its result and conclude that the treatment was worthless among all eligible patients. In the less extreme case, if only a small number of these "responsive" patients were randomized, then their favorable outcomes could be swamped by the overall negative result, and an incorrect conclusion once again could result.

Faced with the possibility of such a situation, the frontline clinician can carry out two steps. The first step is to decide whether our specific patient belongs to the group who displayed a greater than average tendency to be "eligible but not randomized". There is disagreement over what ought to be the next step. Some commentators on internally valid, negative trials suggest that the next logical step is the independent follow-up of these "discrepant" subgroups of "eligible but not randomized" patients outside the trial, using their subsequent clinical course to decide whether their treatment was efficacious [5]. I, as well as others [6], submit that these attempts to conclude efficacy from such uncontrolled, unblinded case-series represent a 40-year retreat to the pre-experimental era. The answer to the question of efficacy doesn't reside in today's "eligible but not randomized" patients any more than it resided in yesterday's case-series of internal mammary ligations; if it did, we wouldn't have needed to develop the randomized trial!

Rather, I'd submit that the frontline clinician should scrutinize the paper for a report on the clinical course of the subset of patients with these same clinical and laboratory characteristics who were randomized and included in the trial. This report, in the form of a subgroup analysis of discrepant but randomized patients, can lead the clinician to one of three conclusions. First, this analysis could reveal that this subgroup does benefit in a clinically and statistically significant fashion from the experimental treatment, an important positive result, generalizable to our individual patient. Second, this subgroup analysis could reveal that a clinically significant benefit from the experimental treatment can be ruled out at an appropriate level of confidence (an important negative result, also generalizable to our patient).


Third, the analysis could reveal that too few such discrepant patients are (or were) randomized to permit any internally or externally valid conclusion about whether they benefit or deteriorate from the experimental treatment, telling the frontline clinician that no firm inference can be made from this report about how our individual patient would fare on this treatment, and that some other method of reaching this decision would have to be employed (I'll return to describe such a method toward the end of this essay).

In concluding this portion of this essay, I want to acknowledge the contributions of Michael Gent, Brian Haynes, and Wayne Taylor to these ideas. Our group and others will continue to address, both with statistical models and concrete examples, these issues of generalizability in the hope that we can generate valid, useful rules for the frontline clinician.

The second topic I want to discuss in considering inference and decision-making at the bedside resides at the interface between prevention, therapeutics, economics, and administration. Simply put, and given the limitations of the clinician's time and energies, where should their preventive and therapeutic efforts go? Faced with a patient and a few spare minutes, should they be devoted to a blood pressure check; a counselling session about dietary fat; an inquiry about possible symptoms of transient cerebral ischemia; or a demonstration of how to use nicotine-bearing chewing gum in order to stop smoking? Given that there's not enough time to treat every patient for every treatable, much less preventable, disorder, which interventions ought to take priority, and how ought these priorities be established?

To be sure, there are plenty of experts, mostly self-proclaimed, who are quick to tell us how we should spend this precious time, and there are now even some sensible task forces and professional review bodies who, using rigorous rules of evidence such as the ones we've been discussing here, often fortified with thoughtful economic analyses, can give us valid advice. To the present, however, this good advice tends to be compartmentalized to individual disorders and to provide yes/no sorts of answers.

The frontline clinician still needs some yardstick by which the benefits of alternative treatments can be measured and compared with each other, so that the ones with the biggest payoffs can receive appropriate priority; once again, the need for inference at the bedside.

What sort of yardstick would be most helpful to the frontline clinician who was trying to make such inferences? When Andreas Laupacis, Robin Roberts, and I considered this issue [7], we thought a useful yardstick would incorporate four different elements of the decision to apply or withhold a given treatment: first, the consequences of doing nothing (that is, the patient's risk of succumbing to his disease if we withheld therapy); second, the extent to which we could reduce this risk by applying the specific therapy; third, the risks of the therapy to the patient, in terms of side-effects and toxicity; and fourth, the generation of a number that would permit us to compare this treatment for this condition with other treatments for other conditions, so that we could decide where to devote our scarce resources. Our work led us to study the interesting, and potentially very useful, properties of the reciprocal of the absolute risk reduction which, as it turns out, is the number of patients with a given disorder one needs to treat in order to prevent one of them from succumbing to the complications of their disease.

This is shown in Table 1, where we have summarized the results of one of the famous U.S. Veterans Administration hypertension trials [8]. In that trial, hypertensive men, some of whom already had target organ damage at entry and others of whom did not, were randomly allocated to receive inert placebos or active antihypertensive drugs and then followed for approximately 3 years for the occurrence of death, stroke, or other major cardiovascular complications of their hypertension. Table 1 reveals that over a fifth of those with prior target organ damage who received inert placebos went on to die or suffer stroke or major cardiovascular damage over the next 3 years, as opposed to less than a tenth of similar men treated with active antihypertensive drugs.

Table 1. Measures of the efficacy of treatment

                               Rate of events over 3 years    Relative risk           Absolute risk       Number needed to
Patient status at entry        Placebo (Rp)     Active (Ra)   reduction: (Rp-Ra)/Rp   reduction: Rp-Ra    treat: 1/(Rp-Ra)
---------------------------------------------------------------------------------------------------------------------------
Target organ damage present    0.222            0.085         62%                     0.137                7
Target organ damage absent     0.098            0.040         59%                     0.058               17
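To make the arithmetic behind Table 1 explicit, here is a minimal sketch in Python (an editorial addition, not part of the original article; the function and variable names are invented) that recomputes all three yardsticks from the two event rates:

```python
# Recomputing the Table 1 yardsticks from the VA trial event rates [8].
# Illustrative sketch only; names are the editor's, not the article's.

def efficacy_measures(rp, ra):
    """rp: 3-year event rate on placebo; ra: event rate on active drug."""
    arr = rp - ra        # absolute risk reduction
    rrr = arr / rp       # relative risk reduction
    nnt = 1.0 / arr      # number needed to treat to prevent one event
    return rrr, arr, nnt

for label, rp, ra in [("target organ damage present", 0.222, 0.085),
                      ("target organ damage absent",  0.098, 0.040)]:
    rrr, arr, nnt = efficacy_measures(rp, ra)
    print(f"{label}: RRR = {rrr:.0%}, ARR = {arr:.3f}, NNT = {nnt:.0f}")

# Output: RRR = 62% and 59%, ARR = 0.137 and 0.058, NNT = 7 and 17,
# matching the table: the RRR is nearly identical in the two groups,
# while the ARR and NNT preserve the difference in baseline risk.
```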


Men who were free of prior target organ damage at the start of the study fared better: roughly one-tenth of such patients, when treated with placebos, had events and, once again, such men fared better on active treatment.

How should we summarize these risks and this benefit of active treatment? One way of doing so is to calculate the proportional, or relative, risk reduction on treatment for these adverse events, and this is shown in the fourth column of Table 1. Although the relative risk reduction nicely captures one element of the benefits of active treatment, it is virtually identical for both groups of hypertensive patients (62% for those with target organ damage at the outset, and 59% for those without), and it has lost the information that those men with target organ damage at entry both started and ended at higher risk of further complications. The arithmetic difference in complication rates on placebo and active treatment, the absolute risk reduction, as shown in the fifth column of the table, does regain the lost information, and shows that the frontline clinician can accomplish a bit more than twice the reduction in further complications and death (0.137 as opposed to 0.058) by paying special attention to hypertensives who already have target organ damage. But these numbers do not roll easily off the tongue or aid communication between clinicians and patients.

Our group, as well as others, has suggested that we take this calculation a bit further, as shown in the sixth column of Table 1. As it happens, the reciprocal of the absolute risk reduction both regains the information lost in the relative risk reduction and possesses a highly useful additional property: it tells us the number of each sort of hypertensive patient we will need to treat in order to prevent one death, stroke, or other major complication of their hypertension. That is, we need to treat just seven hypertensive patients with pre-existing target organ damage for 3 years in order to prevent one of them from going on to die or suffer a stroke or other major complication of their disease, whereas we need to treat 17 hypertensives who are free of prior target organ damage in order to accomplish this same goal. Thus, we can accomplish more by pursuing the vigorous detection and treatment of hypertensives with, than without, prior target organ damage, and it really becomes worthwhile to do our best to get them to stay under our care and take their medicine.


Thus, this "number needed to treat" approach provides both the clinician and the patient with a very useful inference about the efficacy of therapy. Moreover, this inference can be extended in two other, very useful directions.

First, we can add information about the cost of treating this many patients in order to prevent one complication. To be sure, this weighing of benefits and risks can be done in a formal economic or even cost-utility analysis, but such arcane analyses often are beyond both the interest and the understanding of the frontline clinician. Alternatively, we can carry out a more pragmatic analysis, simply taking into account what we already know about the risks of the known side-effects and toxicity of the treatment. For example, a common first-line treatment for hypertension is a diuretic, and we now know that these drugs produce specific, and frequently unacceptable, side-effects at predictable rates [9]. When we treat (for 3 years) 17 hypertensive men with no baseline target organ damage in order to prevent death, stroke, or other major complication in one of them, we can expect one of them to become impotent, a second to develop symptomatic gout, and a third to develop diabetes. Thus, we can now get a handle on both the benefits and risks of such treatment in a form that clinicians and patients can quickly grasp and discuss, creating a therapeutic alliance in which both parties can collaborate in deciding whether to proceed with treatment, and in exploring other treatments that might accomplish the same ends at lower, more acceptable risks.
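A reader who wants to mechanize this pragmatic benefit-harm tally can do so in a few lines. In the sketch below (again an editorial addition), the 3-year side-effect rates are hypothetical placeholders chosen so that the expected counts match the tally in the text; the actual rates come from ref. [9]:

```python
# Tallying expected harms alongside the one prevented event, in the
# spirit of the diuretic example above. The side-effect rates are
# illustrative stand-ins, NOT the figures from ref. [9].

nnt = 17  # men with uncomplicated hypertension treated for 3 years

assumed_rates = {            # hypothetical per-patient risks over 3 years
    "impotence":        0.06,
    "symptomatic gout": 0.06,
    "new diabetes":     0.06,
}

print(f"Treating {nnt} patients for 3 years prevents about one major event, at a cost of:")
for harm, rate in assumed_rates.items():
    print(f"  about {nnt * rate:.0f} case(s) of {harm}")
# At roughly 6% each, that is about one case of each harm per event
# prevented, consistent with the tally quoted in the text.
```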


The second additional use for this yardstick is in making comparisons across different treatments that have been validated in randomized trials for different diseases, providing frontline clinicians with extremely useful information as to where their efforts to prevent or to detect and treat illness are likely to bring the greatest benefits to their patients. Although this process requires some additional statistical assumptions, plus the standardization of the duration of therapy, it can none the less be quite illuminating [7]. For example, the 5-year payoff from detecting and treating symptomless severe hypertension is excellent: treating just three such patients will prevent one death, stroke, or myocardial infarction. Similarly, we can prevent one stroke or death among every six patients with transient ischemic attacks when we treat them with aspirin. On the other hand, we'd have to give cholestyramine plus diet to 89 hypercholesterolemic men for 5 years in order to prevent one of them from dying or suffering a myocardial infarction, and we'd have to treat 141 of the milder hypertensives (at a price, based on the rates cited earlier, of six becoming impotent, seven developing gout, and four becoming diabetic) in order to prevent one death, stroke, or myocardial infarction. I'd suggest that this way of presenting such evidence promotes clinical inferences of the sort that frontline clinicians are trying to draw from the clinical literature every day, and we and our colleagues will continue to pursue these leads. Once again, considerable work remains to be done on both the theory and practice of this approach to developing a clinically useful yardstick of comparative efficacy, but we believe that it holds sufficient promise to warrant further work.

My first two topics have considered problems in inference and decision in the application of the results of completed randomized trials at the bedside. My final topic deals with a still-too-frequent therapeutic dilemma: how can we select the best treatment for the patient when there are no randomized trials to guide us? Such situations abound, and for a number of reasons. First, no guidance can be obtained when no randomized trial has been conducted on the issue; moreover, some conditions are so rare that even multicentre collaborative trials are not feasible. Second, even when a relevant trial has generated a positive result, this result may not be generalizable to our patient. As already discussed, if our patient would not have met the eligibility criteria for the trial, extrapolation may not be appropriate. Third, in positive trials the experimental therapy almost never benefits every patient randomly assigned to it, and our hypertension example reminds us that it may harm some of them. Finally, even when a relevant randomized trial has generated a negative result, at least some of the patients in it will appear to have benefitted from the experimental therapy.

How do we help the individual patient in such circumstances? Can we bring the scientific method to bear on this dilemma? After all, the routine treatment of the individual patient can, in some ways, be likened to an experiment, and Alvan Feinstein has brilliantly described this interaction as one in which the patient arrives in an initial state, accepts a treatment proposed by the clinician, and winds up in a subsequent state [10].

To the extent that this subsequent state is judged by the clinician and patient to be more desirable than the initial state, both conclude that the treatment was efficacious. However, for a number of reasons, any positive conclusions drawn from these "trials of therapy" in individual patients may be wrong. First, the patient's illness may simply have run its course, and the signs and symptoms would have disappeared on their own, with no treatment whatsoever. None the less, any treatment instituted in the interim, as long as it doesn't actually harm the patient, will appear efficacious. Second, the phenomenon of "regression toward the mean" has been observed so many times that we now recognize that extreme symptoms (even angina and transient ischemic attacks), signs (most notoriously, elevated blood pressures), and laboratory test results (of all sorts) will quite regularly, if totally ignored and reassessed sometime later, be found to have returned to, or at least toward, the normal range. Once again, any treatment initiated in the interim, even if totally useless, will appear to have been efficacious. Third, the placebo effect, the incompletely understood response to the act of giving and receiving an apparently inert treatment, has been estimated to be responsible for up to 30% of many treatment effects, and was even proposed as an explanation for why so many angina patients, some of whom were physicians, experienced symptomatic relief following internal mammary artery ligation. Fourth, when both clinician and patient know what is to be expected from the treatment, their unblinded interpretation of subsequent events may be strongly influenced by these prior expectations. Finally, when the patient is grateful for the clinician's time and effort, this gratitude, plus simple good manners, often is reflected in minimizing continuing symptoms or overestimating recovery from them.

Of course, it was the recognition of these threats to the validity of uncontrolled trials of therapy in individual patients that led to the ascendancy of the randomized trial as the gold standard for deciding whether a treatment did more good than harm. The essential elements of this strategy are four. First, the assignment of treatments to patients by random allocation, a system analogous to tossing a coin and giving the patient the experimental treatment if the coin lands heads-up and the control treatment if the coin lands tails-up.


This element creates comparable groups of patients receiving the alternative treatments, with similar prognoses, equal likelihoods of spontaneous recovery, and equal chances of regression toward the mean. The second essential element of the randomized trial is that, whenever possible, neither the patient nor the therapist knows which treatment the patient is receiving; that is, both are "blind" to the treatment being applied. To the extent that this element is achieved, it reduces bias in the expectation, by the patient, and the reporting, by both the patient and the clinician, of the effects of the treatment. The third essential element of the randomized trial has been the development of objective, "hard" outcomes and the precise definition of treatment success and failure, in order both to identify and to communicate the exact effects of treatment. And, finally, the incorporation of formal statistical analysis into the randomized trial has provided the essential ingredient for specifying and avoiding the two incorrect conclusions about the efficacy of the experimental therapy: the false-positive conclusion that it works when, in truth, it doesn't, and the false-negative conclusion that it doesn't work when, in fact, it does. It is the absence of these essential elements of science from the uncontrolled trial of therapy in the individual patient that has condemned this aspect of clinical practice to an "art".
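As a purely illustrative, editorial rendering of that first element, the coin-toss allocation can be written directly; a real trial would prepare such a list centrally and conceal it from the recruiting clinicians:

```python
# A literal implementation of the coin-toss allocation described above:
# each consecutive patient is independently assigned to one of two arms.
# Sketch only; not a production randomization scheme.

import random

rng = random.Random(1989)  # fixed seed so the sequence is reproducible

def allocate(n_patients):
    return ["experimental" if rng.random() < 0.5 else "control"
            for _ in range(n_patients)]

print(allocate(8))  # e.g. ['control', 'experimental', ...]
```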


However, and returning for a moment to my earlier autobiographic comments, in responding to a challenge issued to me as a discussant in a Medical Grand Rounds 3 years ago, it dawned on me that this need not be so. The challenge was to present a rational approach for deciding how to treat a case of chronic active hepatitis of a form in which no randomized trials could provide any guidance; I had to provide a rationale for making a decision about whether a treatment was helpful to an individual patient. In pondering this challenge, it occurred to me that we regularly make the alternative decision, about whether a treatment was harmful to an individual patient, every time we decide if an adverse drug reaction has occurred, and it was in reviewing attempts to bring science into this latter area that some interesting ideas took shape. Although issues in pharmacokinetics and pharmacodynamics are of undoubted importance, the proof of the pudding of an adverse drug reaction is what happens to the patient when we withdraw the offending agent (de-challenge) and, even more decisively if it can be done with acceptable risk, what happens when we re-introduce the putative offending agent (re-challenge).

It was this systematic de-challenge and re-challenge of the individual patient that stimulated my lateral thinking: why not systematically, indeed randomly, de-challenge and re-challenge the individual patient with a drug thought to help, rather than harm, him? The idea grew, was taken up and, as usual, improved upon by my colleagues, especially a brilliant young clinical epidemiologist named Gordon Guyatt, and has become the so-called "N-of-1" trial [11]. It also has benefitted from similar designs introduced earlier by behavioral scientists. In brief, an N-of-1 trial comprises the observation of a patient through pairs of randomly ordered treatment and control periods, while relevant treatment targets are measured.

It can best be explained by example. In our first trial, an asthmatic man, despite appropriate doses of multiple inhaled and oral bronchodilators and steroids (all of which have been well-validated in randomized trials), described symptoms that were making his life miserable: shortness of breath on ordinary daily activities (bending, hurrying, climbing stairs) and nocturnal "spasms" of dyspnea and coughing. In collaboration with his physician, who happened to be Gordon Guyatt, these symptoms became the treatment targets that would be used to gauge the efficacy of alterations in his treatment, and the patient agreed to record standardized measures of their severity in a diary. The drug they agreed to test was his oral theophylline; both patient and clinician thought it was probably helpful, but weren't sure. They also agreed that 10 days was long enough to tell how the patient was responding to treatment, so this became the treatment period. The hospital pharmacy produced identical-appearing capsules containing either the active drug or an inert placebo, permitting the trial to proceed on a double-blind basis, and pairs of treatment periods were set up, with randomization within each pair as to whether the active drug or the placebo would be given first. At the end of each pair, the patient and clinician got together, examined the blinded results, and decided whether to quit (and break the code) or go on to another pair.
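Such a schedule can be sketched in a few lines of Python. This is an editorial illustration of the design as just described, not code from the original service; the 10-day period length comes from the story above, and everything else is invented:

```python
# Laying out an N-of-1 schedule: pairs of treatment periods, with the
# order of active drug vs placebo randomized within each pair.
# Illustrative sketch under the assumptions stated in the lead-in.

import random

PERIOD_DAYS = 10  # the duration the patient and clinician agreed on

def n_of_1_schedule(n_pairs, seed=None):
    """Return (first, second) treatment labels for each pair of periods."""
    rng = random.Random(seed)
    schedule = []
    for _ in range(n_pairs):
        first = rng.choice(["active", "placebo"])  # coin toss within the pair
        second = "placebo" if first == "active" else "active"
        schedule.append((first, second))
    return schedule

# Only the pharmacy would see this listing; patient and clinician review
# blinded diary scores at the end of each pair and decide whether to go on.
for i, (first, second) in enumerate(n_of_1_schedule(n_pairs=2, seed=7)):
    day = i * 2 * PERIOD_DAYS
    print(f"pair {i + 1}: days {day + 1}-{day + PERIOD_DAYS} -> {first}, "
          f"days {day + PERIOD_DAYS + 1}-{day + 2 * PERIOD_DAYS} -> {second}")
```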


After two pairs of treatment periods, the patient had had enough, and demanded to be placed on the drug that he had received during the first period of each pair. Moreover, a statistical analysis indicated that the differences he had recorded in his symptoms were unlikely to be due to chance. His clinician agreed, and when the code was broken, it was revealed that he had done far better on placebo. The theophylline had made his symptoms worse, and he continues to do better off it.

From this small beginning, Dr Guyatt and I have built an N-of-1 clinical service that has now performed over 50 trials, more than two-thirds of which have generated results that convinced their participants to adopt them for their subsequent care. Certain aspects of this approach are especially appealing to patients and clinicians alike. First, of course, it seeks the best treatment for this specific patient, not the average patient. Moreover, by having the patient participate in the design of the trial, the specification of the treatment targets, and so forth, it creates a true therapeutic alliance between clinician and patient. Finally, because every patient receives active treatment during some periods of the trial, it overcomes the reluctance of some patients and clinicians to risk, through random allocation, a trial-long relegation to a placebo group.

A host of exciting methodologic issues remain to be sorted out about the best ways to design and, especially, to analyze such trials. Nonetheless, we believe that they constitute a highly useful and quite intriguing strategy for extending inference and decision to the bedside.
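The article does not say which statistical test was applied to the diary data; as one simple, editorially supplied possibility, a two-sided sign test on within-pair comparisons might look like this (the symptom scores are invented):

```python
# A simple paired analysis of N-of-1 diary scores: a two-sided sign test
# on which treatment was worse within each pair. Scores are made up for
# illustration; the article does not specify the analysis actually used.

from math import comb

# hypothetical mean daily symptom scores per period (higher = worse)
pairs = [(5.1, 3.2), (4.8, 3.0), (5.4, 3.5)]  # (active, placebo) per pair

worse_on_active = sum(a > p for a, p in pairs)
n = len(pairs)

# two-sided sign test: probability, under "no difference", of a split
# at least this extreme in either direction
k = max(worse_on_active, n - worse_on_active)
p = 2 * sum(comb(n, j) for j in range(k, n + 1)) * 0.5 ** n
print(f"{worse_on_active}/{n} pairs worse on active; sign test p = {min(p, 1):.2f}")

# With only three pairs, p cannot fall below 0.25; real N-of-1 analyses
# need more pairs, or finer-grained methods such as a paired t-test.
```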

Indeed, the N-of-1 trial can incorporate the same elements of experimental design that are central to the large-scale randomized trial. The key difference between these two sharply contrasting applications of the same scientific principles lies in the purposes to which they are directed. In the large randomized trial, it is to infer the best therapy for the average eligible patient; in the N-of-1 trial, it is to infer the best treatment for the individual patient. Since this latter purpose is the ultimate objective of inference and decision-making, both in large-scale randomized control trials and in one-on-one frontline clinical medicine, I can think of no better observation on which to conclude this essay, other than to thank once again the sponsors of the Taylor International Prize for the honour they have given me, and to accept it on behalf of my students, teachers, patients, and colleagues, the ones who really earned it!

REFERENCES

1. Miao LL. Gastric freezing: an example of the evaluation of medical therapy by randomized trials. In: Bunker JP, Barnes BA, Mosteller F, eds. The Costs, Risks, and Benefits of Surgery. New York: Oxford University Press; 1977: 198.
2. Barsamian EM. The rise and fall of internal mammary artery ligation in the treatment of angina pectoris and the lessons learned. In: Bunker JP, Barnes BA, Mosteller F, eds. The Costs, Risks, and Benefits of Surgery. New York: Oxford University Press; 1977: 212.
3. Passamani E, Davis KB, Gillespie MJ, Killip T, and the CASS Principal Investigators and Their Associates. A randomized trial of coronary artery bypass surgery: survival of patients with a low ejection fraction. N Engl J Med 1985; 312: 1665-1671.
4. The EC/IC Bypass Study Group. Failure of extracranial-intracranial arterial bypass to reduce the risk of ischemic stroke. N Engl J Med 1985; 313: 1191-1200.
5. Relman AS. The extracranial-intracranial arterial bypass study. What have we learned? (Editorial). N Engl J Med 1987; 316: 809-810.
6. Chalmers TC, Meier P, Plum F. The EC-IC bypass study (Correspondence). N Engl J Med 1987; 317: 1030-1031.
7. Laupacis A, Sackett DL, Roberts RS. An assessment of clinically useful measures of the consequences of treatment. N Engl J Med 1988; 318: 1728-1733.
8. Veterans Administration Cooperative Study Group on Antihypertensive Agents. Effects of treatment on morbidity in hypertension. III. Influence of age, diastolic pressure, and prior cardiovascular disease; further analysis of side effects. Circulation 1972; 45: 991-1004.
9. Report of the Medical Research Council Working Party on Mild to Moderate Hypertension. Adverse reactions to bendrofluazide and propranolol for the treatment of mild hypertension. Lancet 1981; 2: 539-543.
10. Feinstein AR. Clinical Biostatistics. Saint Louis: Mosby; 1977: 17.
11. Guyatt G, Sackett DL, Taylor DW, Chong J, Roberts R, Pugsley S. Determining optimal therapy: randomized trials in individual patients. N Engl J Med 1986; 314: 889-892.