Otolaryngology–Head and Neck Surgery (2007) 137, 697-699
EDITORIAL
Confidence
How confident are you in research reported in this journal? If you, the study authors, or an independent party could reproduce the study, would you get the exact same results? Peer review helps ensure validity but does not define reproducibility of results or what confidence they merit from readers. These concepts are the realm of confidence intervals, which inspired the letter that follows.

Letter to the Editor
As someone whose confidence in research results is directly proportional to my understanding of them, I find my confidence waning with every confidence interval I encounter in your journal. I am confident, however, that the prevalence of confidence intervals is rising under your editorship and, therefore, request that you kindly reverse this trend. Confidence intervals remain as distasteful now as they were 25 years ago in medical school. Yes, as a practicing clinician, I crave “confidence,” but how does publishing a range of results labeled affectionately “95 percent confidence interval” engender more confidence than a single, discrete, understandable estimate? Confidence intervals epitomize Esar’s definition of statistics as “the science of producing unreliable facts from reliable figures.”1

Modern medicine rests on a foundation of efficiency and clarity of communication. I must exude an air of certainty, not confusion, and the fuzzy mumbo jumbo of confidence intervals facilitates only the latter. Patients want to know exactly what to expect from treatment, not some broad range of values within which their destiny is likely to fall. Therefore, I suggest you remove confidence intervals from the articles you publish as a first step toward certainty of thought and unambiguous communication. I have no desire to burden you with this task, so perhaps your publisher can automatically delete them when typesetting articles. If 8 out of 10 patients get better with an intervention, that is 80%, period; why cloud the brain with additional needless qualifications?

Confidently yours,
Conley Confident, MD
Surely the Best Place, USA

Editor’s Response

Of all the requests I make of authors, one of the most frequent is to supply a 95% confidence interval (CI). Many authors, however, share the opinion of Dr Confident that a
single point estimate more clearly and succinctly conveys a numeric result. This point estimate may take the form of a rate, ratio, mean result, mean difference in group outcomes, percent surviving at a time interval, or a simple percentage of successful outcomes in 1 group. Why bother with 3 numbers, a point estimate plus upper and lower bounds of a CI, when a single number might suffice? Because the 95% CI offers a range of values considered plausible for the population.2 Investigators can rarely study every person with a condition, so they limit their efforts to a selected subset called a sample. A point estimate summarizes findings for the sample, but extrapolation to the larger population introduces error and uncertainty, which makes a range of plausible values more appropriate. The more subjects studied, the tighter (narrower) the CI, and the more certain readers can be about population conclusions.

Consider a rickety old bridge spanning a 300-foot chasm. Would you cross the bridge knowing that no one had ever fallen off? Your first thought would likely be, “How many pedestrians actually crossed the bridge?” knowing that a point estimate of 0% fatalities cannot be interpreted without additional information, namely sample size. When would you feel “confident” that it was safe to cross: if 5, 15, 150, or 1,500 pedestrians crossed without plunging to their death? Intuitively, if 5 people crossed successfully, you might remain skeptical of safety. In other words, you know little about the real fatality rate for a population of pedestrians (which may soon include you) based on a point estimate of 0% fatalities with a sample size of 5. In statistical terms, the 95% CI is about 0% to 60%, meaning that no fatalities in a single trial of 5 subjects leaves a fatality rate of up to 60% plausible as the population value.
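Where the “about 0% to 60%” comes from can be sketched in a few lines of Python. This is an illustration, not part of the editorial, and the function names are mine: the exact upper bound solves (1 − p)^n = 0.05 for p, and 3/n is the quick approximation the text discusses next.

```python
# Upper limit of a 95% CI when zero events are observed in n trials.
# Two routes: the "rule of three" shortcut (3/n) and the exact bound
# obtained by solving (1 - p)**n = 0.05 for p.

def rule_of_three(n: int) -> float:
    """Quick approximation of the 95% upper bound for 0 events in n trials."""
    return 3.0 / n

def exact_upper_bound(n: int, confidence: float = 0.95) -> float:
    """Largest rate p whose chance of producing 0 events in n trials
    is still (1 - confidence)."""
    return 1.0 - (1.0 - confidence) ** (1.0 / n)

for n in (5, 15, 150, 1500):
    print(n, rule_of_three(n), round(exact_upper_bound(n), 4))
```

Note that 3/n overstates the bound at very small samples (0.60 versus roughly 0.45 for n = 5), which is why the text hedges with “about”; the two converge as n grows.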
A quick way to estimate the upper limit of a 95% CI when no events occur (eg, a point estimate of 0%) is to calculate 3/n, where n is the sample size.3 For the bridge example, 3/n becomes 3/5, or 60%. Alternatively, consider an enthusiastic young surgeon who insists that you not worry about her surgical prowess because her first 10 cases of stereotactic, transurethral, balloon-assisted parotidectomy were unqualified successes. Unfortunately, the maximum confidence we can place in her abilities is that of being 95% certain that her eventual failure rate will be about 30% (3/10) or lower. Putting intuition aside, the upper limit of the 95% CI for no events in 15, 150, or 1,500 trials (3/n) is 20%, 2%, and 0.2%, respectively. Of course, all the point estimates are 0
Received August 11, 2007; accepted August 13, 2007.
0194-5998/$32.00 © 2007 American Academy of Otolaryngology–Head and Neck Surgery Foundation. All rights reserved. doi:10.1016/j.otohns.2007.08.009
(as is the lower limit of the 95% CI), but knowing the upper limit of the 95% CI facilitates an intelligent, informed course of action. A daredevil might cross the bridge with up to 20% risk (based on 0/15 fatalities), but a mere mortal might want 0.2% risk as the upper limit of the 95% CI (based on 0/1,500), or even lower. Which would you choose?

Despite their utility, 95% CIs are a statistical eyesore for most readers. Darrell Huff, author of How to Lie with Statistics, said it best: “If you can’t prove what you want to prove, demonstrate something else and pretend that they are the same thing. In the daze that follows the collision of statistics with the human mind, hardly anybody will notice the difference.”4 So here’s a crash course on reducing the cerebral impact of CIs:

1. Accept uncertainty. Recognize that all observations based on a limited sample are uncertain and must be viewed as a range of plausible results (95% CI) for the population of interest. Your task as reader is to decide if the level of uncertainty, imposed by sample size and by the inherent variability of the data, is low enough to make point estimates credible and relatively certain.

2. Look for a 95% CI. All main results deemed important by the authors should have a 95% CI in addition to the point estimate. The “confidence” you can place in results that lack a 95% CI is difficult to determine unless you estimate it yourself (eg, the rule of 3/n for zero events) or use a statistical program to calculate one.

3. Interpret a “positive result” with the 95% CI lower limit. When the authors conclude an outcome or group difference is statistically significant or clinically important, scrutinize the lower limit of the 95% CI. If the magnitude of effect at the low end is consistent with a trivial or unimportant outcome, not enough subjects were studied to create credible confidence that the point estimate is meaningful.

4. Interpret a “negative result” with the 95% CI upper limit.
When the authors conclude an outcome or group difference is not significant or important, scrutinize the upper limit of the 95% CI. If the magnitude of effect at the high end is consistent with a nontrivial or important outcome, not enough subjects were studied to ensure that a potentially important effect was not overlooked (low statistical power).

Let us put theory into practice by considering a new drug for acute sinusitis, Biofilm-Blaster, that boosted antibiotic cure rates by 20% in a randomized trial (P < 0.05). The 95% CI of 2% to 38% should immediately temper our interest because the lower limit of 2% means the investigators have not excluded a potentially trivial effect (even if it is statistically significant). Stated differently, the plausible range of population results (95% CI) includes some that are trivial. We need a larger trial with greater precision to show whether Biofilm-Blaster is worth considering. Conversely, a 95% CI of 16% to 24% would boost confidence in the results, but
we would still need to balance the benefit against potential adverse effects.

Now suppose that as the allergic rhinitis season arrives, you get an order form for Cheap-N-Great antihistamine, touted as comparable to expensive, nonsedating alternatives in randomized trials. The actual difference observed (point estimate) was only 5%, but the 95% CI was −15% to 25%. For this “negative” study, we examine the upper limit of the CI, which is consistent with a 25% poorer result with the new drug. The manufacturer conducted a small trial (unintentionally, or perhaps intentionally) that stacked the deck in favor of missing a potentially important difference. Conversely, a larger study with a 95% CI of −5% to 5% would be a credible statement of no real difference.

When I request that authors submit a revised manuscript and add 95% CIs to main results, I occasionally get told “our sample is too small for confidence intervals or statistical analysis.” Not only is this patently false (ask any statistician), but it is precisely the small-sample study (say, 5, 10, or 20 observations) that benefits most from a 95% CI. The width of a 95% CI is inversely related to sample size, and the broad intervals in small studies can rapidly erode confidence.

One aspect of our journal mission is to publish information that can be used to improve patient care and public health. This is possible only if research articles contain the ingredients readers need to interpret and apply results. CIs are a key ingredient for managing uncertainty because they trounce the artificial precision of a single estimate, instead coaxing readers to focus on the implications—and clinical importance—of a range of plausible results. The words of Oliver Wendell Holmes Jr remain true: “Certainty generally is illusion, and repose is not the destiny of man.”5 CIs highlight uncertainty between samples and populations, but uncertainty is even greater when moving from populations to individual patient care.
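Intervals like those for Biofilm-Blaster and Cheap-N-Great can be reproduced approximately with a standard Wald (normal-approximation) CI for a difference of two proportions. The counts below are hypothetical, since the editorial reports only percentages; treat this as a sketch of the method, not the actual trial data.

```python
import math

def risk_difference_ci(x1: int, n1: int, x2: int, n2: int, z: float = 1.96):
    """Wald 95% CI for the difference of two proportions, p1 - p2."""
    p1, p2 = x1 / n1, x2 / n2
    diff = p1 - p2
    se = math.sqrt(p1 * (1 - p1) / n1 + p2 * (1 - p2) / n2)
    return diff, diff - z * se, diff + z * se

# Hypothetical small trial: 35/50 cured with the drug vs 25/50 with control.
diff, low, high = risk_difference_ci(35, 50, 25, 50)
print(f"{diff:.0%} ({low:.0%} to {high:.0%})")  # prints 20% (1% to 39%)

# The same cure rates in a much larger hypothetical trial narrow the CI.
diff, low, high = risk_difference_ci(1400, 2000, 1000, 2000)
print(f"{diff:.0%} ({low:.0%} to {high:.0%})")  # prints 20% (17% to 23%)
```

The point estimate is identical in both calls; only the sample size changes, which is exactly the inverse relationship between CI width and sample size described above.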
A plausible range of results (95% CI) for a population does not equate to similar results for every patient managed individually by the clinician. A given patient may differ greatly from those in a research study because of disease-related factors, comorbid conditions, or socioeconomic considerations. Osler’s remarks to the New Haven Medical Association in 1903 are essential reading for all clinicians:

“Variability is the law of life, and as no two faces are the same, no two bodies are alike and no two individuals react alike and behave alike under the abnormal conditions which we know as disease. This is the fundamental difficulty in the education of the physician, and one which he may never grasp, or he takes it so tenderly that it hurts instead of boldly accepting the axiom of Bishop Butler, more true of medicine than of any other profession: ‘Probability is the guide of life.’ Surrounded by people who demand certainty, and philosopher enough to agree with Locke that ‘Probability supplies the defect of our knowledge and guides us when that fails and is always conversant
about things of which we have no certainty,’ the practitioner too often gets into a habit of mind which resents the thought that opinion, not full knowledge, must be his stay and prop. There is no discredit, though there is at times much discomfort, in this everlasting perhaps with which we have to preface so much connected with the practice of our art. It is, as I said, inherent in the subject.”6

The additional uncertainty in managing individuals does not negate the value of research but makes it a guide to intelligent action rather than a mandate for cookbook medicine. In this regard, the 95% CI may be viewed as a first step in uncertainty management and enlightened research interpretation. We have come a long way in otolaryngology, but insistence by journal editors—and readers—that authors include 95% CIs is an inevitable next step. Equally important is the humble acceptance that uncertainty is inherent in the art of medicine, an art that will flourish as original research in this and other journals facilitates evidence-based care.
AUTHOR INFORMATION

Corresponding author: Richard M. Rosenfeld, MD, MPH, Department of Otolaryngology, Long Island College Hospital, 339 Hicks Street, Brooklyn, NY 11201.
E-mail address: [email protected].
Richard M. Rosenfeld, MD, MPH Editor in Chief Department of Otolaryngology, State University of New York Downstate, and Long Island College Hospital, Brooklyn, NY.
FINANCIAL DISCLOSURE None.
REFERENCES

1. Quotations by author: Evan Esar, American humorist (1899-1995). Esar’s Comic Dictionary; 1943. http://www.quotationspage.com/quotes/Evan_Esar/. Accessed August 10, 2007.
2. Gardner MJ, Altman DG. Confidence intervals rather than P values: estimation rather than hypothesis testing. Br Med J (Clin Res Ed) 1986;292:746-50.
3. van Belle G. Statistical rules of thumb. New York: Wiley-Interscience; 2002:49.
4. Huff D. How to lie with statistics. 41st printing. New York: WW Norton & Company; 1954.
5. Bartlett J, Kaplan J. Bartlett’s familiar quotations. 16th ed. Boston: Little, Brown, and Company; 1992:542.
6. Osler W. On the educational value of the medical society. In: Aequanimitas. Philadelphia: P. Blakiston’s Son & Co; 1904:348.