Editorial

Three Kinds of Lies

John T. Thompson, MD - Baltimore, Maryland

Ophthalmology Volume 121, Number 7, July 2014
© 2014 by the American Academy of Ophthalmology. Published by Elsevier Inc. ISSN 0161-6420/14/$ - see front matter. http://dx.doi.org/10.1016/j.ophtha.2014.04.005

Financial Disclosure(s): Grant support - Genentech (Roche), Regeneron Pharmaceuticals.

Everyone is familiar with the famous quote from Benjamin Disraeli about lies and statistics, but we depend on statistics to evaluate the claims made by most articles published in the ophthalmic literature. The article in this issue by Lisboa et al (see page 1317) found that only 21% of the publications in 3 major ophthalmology journals in 2012 used simple descriptive statistics (percentages, means, medians, histograms, and standard deviations). The authors estimate that understanding 50% of the published articles would require knowledge of 15 statistical methods, and that comprehension of 90% of articles would require familiarity with 29 statistical methods. These 29 methods include some fairly obscure techniques such as multiway tables, sensitivity analysis, and receiver operating characteristics. Their article raises important questions about how the readers of Ophthalmology, and especially the reviewers of these manuscripts, can judge the validity of the claims supposedly proved by statistics.

We should not accept as fact that a new treatment is superior to another based on statistical analysis alone, especially because the statistical methods may be erroneous. I reviewed a manuscript last year for a prospective study that had a relatively small sample size (fewer than 90), a small difference in means between the 2 groups, and moderate standard deviations. I was puzzled when the authors claimed that the differences between the 2 groups were statistically significant. When I checked their calculations with the same software used by the authors, I found no statistically significant differences between the 2 groups. The manuscript was returned to the authors with advice to have a statistician review their results, because they apparently were not using the statistical software appropriately. The availability of relatively inexpensive computer-based statistical programs with a wide variety of statistical modules has allowed some authors to try out a number of statistical methods on the entire data set or subsets and simply to choose the one that gives the most impressive results. This practice has given rise to a new statistical term, p hacking, coined by Uri Simonsohn of the Wharton School at the University of Pennsylvania. Recall that if you are looking at risk factors for a disease and evaluate 20 potential risk factors, one by chance will achieve P = 0.05. This emphasizes why appropriate statistical corrections for multiple comparisons are essential. Another phase 2 study I reviewed this past year defined statistical significance as P < 0.2, and sure enough, the researchers found a statistically significant difference between the 2 groups, even though it did not meet the usual P < 0.05.

The medical literature supports the need for more rigorous evaluation of the statistical methods used in medical studies. A study reported by Holopigian and Bach1 in 2010 is a good primer on common statistical errors found in the ophthalmic literature. It highlights the issue of whether research results should include one or both eyes of a patient. A recent analysis of 206 ophthalmic randomized clinical trials by Lee et al2 found an explicit definition of the study hypothesis in only 27.2% of studies. Noninferiority and equivalence studies were particularly likely to have methodologic errors. The ophthalmic literature has increasing numbers of prospective randomized trials that report noninferiority between 2 treatments. The problems with appropriate design and interpretation of these trials are such that new guidelines for their reporting were published by the CONSORT group.3 The P value often is considered the holy grail of proving a hypothesis, but a recent publication by Nuzzo4 points out that the concept of statistical significance can be very deceptive. In a study with a P value of 0.05, there is at least a 29% chance that the measured effect is false, and even with a highly significant P value of 0.01, the chance of a false conclusion decreases only to 11%. There is even risk in making a distinction between statistically significant and not statistically significant results within a single study. Gelman and Stern5 have demonstrated that the difference between significant and nonsignificant results is often not itself statistically significant. Clinicians and reviewers need additional education about some of the less frequently used statistical methods mentioned in the study by Lisboa et al, at a level they can understand, because many of the primary references defining these statistical tests are written for professional statisticians. Additional review articles concerning the different types of statistical testing reported in the biomedical literature would be very helpful.

Another important distinction is between what is statistically significant and what is clinically relevant. A difference in visual acuity between treatments A and B of 0.04 logarithm of the minimum angle of resolution units (2 letters) certainly may be statistically significant, but clinicians should ask whether the difference is clinically meaningful enough to drive therapeutic decisions in favor of one treatment over another. The physicist Ernest Rutherford observed: “If your experiment needs statistics, you ought to have done a better experiment.” It may seem paradoxical, but the really important advances in ophthalmology do not need statistics. The value of cataract surgery in improving the quality of life of elderly patients and the benefits of intravitreal anti-vascular endothelial growth factor injections in eyes with subfoveal choroidal neovascularization or central retinal vein occlusion are so great that statistics prove only what is obvious to patients and their ophthalmologists.

This raises the question of when statistics are needed in research publications. Statistics are most helpful when the treatment effect is not apparent, such as when multiple effects are being analyzed or when there are confounding factors that must be adjusted for. Those who read a manuscript do not have to understand every aspect of the statistical method, but they should understand enough to decide whether the conclusions of the manuscript are proven by the data. Unfortunately, complex statistical methods sometimes are used to obscure the fact that simple statistical tests did not identify statistical significance. This is why readers must be circumspect when these methods are used, unless they are clearly justified by the type of analysis performed. Authors should not assume that readers of an ophthalmic journal have an extensive knowledge of statistical methods. They should strive to make the statistical methods and the presentation of study results comprehensible, because doing so will increase the readers’ acceptance of the authors’ conclusions. Manuscripts should include a description explaining why the authors used a less common statistical analysis when it would seem that a t test or chi-square test would have been appropriate. It is also helpful if the authors can use more than one statistical
method to evaluate the data in different ways when appropriate, rather than presenting only the one that makes the data look best. If different statistical methods such as t tests, confidence intervals, and nonparametric tests all identify the primary study results as statistically significant, it is more likely that a real difference exists between the 2 groups examined. Authors are encouraged to acknowledge if a statistician was consulted. It would reassure readers, manuscript reviewers, and editors even more if the study were prospectively designed with a predefined statistical analysis developed with the assistance of a statistician. That would be significant.

References

1. Holopigian K, Bach M. A primer on common statistical errors in clinical ophthalmology. Doc Ophthalmol 2010;121:215–22.
2. Lee CF, Cheng AC, Fong DY. Ophthalmic randomized controlled trials reports: the statement of the hypothesis. Am J Ophthalmol 2014;157:254–9.
3. Piaggio G, Elbourne DR, Pocock SJ, et al. Reporting of noninferiority and equivalence randomized trials: extension of the CONSORT 2010 statement. JAMA 2012;308:2594–604.
4. Nuzzo R. Scientific method: statistical errors. Nature 2014;506:150–2.
5. Gelman A, Stern H. The difference between “significant” and “not significant” is not itself statistically significant. The American Statistician 2006;60:328–31.
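The multiple-comparisons arithmetic discussed earlier (evaluate 20 risk factors under 20 true null hypotheses and, on average, one will reach P = 0.05 by chance) can be made concrete in a few lines. This is a minimal illustrative sketch, not part of any cited study; the function name and the Bonferroni example are the author of this sketch's choices.

```python
# Probability that at least one of k independent tests reaches
# P < alpha when every null hypothesis is true (no real effects).
def family_wise_error(alpha: float, k: int) -> float:
    """Chance of one or more false-positive results among k independent tests."""
    return 1.0 - (1.0 - alpha) ** k

alpha, k = 0.05, 20
# Uncorrected: with 20 tests at alpha = 0.05, the chance of at least
# one spurious "significant" finding is about 64%.
print(f"Uncorrected: {family_wise_error(alpha, k):.2f}")
# A Bonferroni correction tests each factor at alpha / k instead,
# holding the family-wise error rate back near 5%.
print(f"Bonferroni:  {family_wise_error(alpha / k, k):.2f}")
```

The same arithmetic underlies why a reviewer should ask how many comparisons were run, not only which one reached significance.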
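The recommendation to evaluate the same comparison with more than one statistical method can likewise be sketched briefly. The data below are simulated for two hypothetical treatment groups (they come from no study), and the sketch deliberately uses only the Python standard library: a large-sample z test on the difference in means alongside a permutation test that makes no distributional assumptions.

```python
import random
from statistics import NormalDist, mean, stdev

rng = random.Random(1)
# Simulated, hypothetical outcome data for two groups of 45 eyes each.
group_a = [rng.gauss(0.50, 0.15) for _ in range(45)]
group_b = [rng.gauss(0.40, 0.15) for _ in range(45)]

# Method 1: two-sample z test on means (normal approximation).
se = (stdev(group_a) ** 2 / len(group_a) + stdev(group_b) ** 2 / len(group_b)) ** 0.5
z = (mean(group_a) - mean(group_b)) / se
p_z = 2 * (1 - NormalDist().cdf(abs(z)))

# Method 2: permutation test on the absolute difference in means.
observed = abs(mean(group_a) - mean(group_b))
pooled = group_a + group_b
hits, trials = 0, 5000
for _ in range(trials):
    rng.shuffle(pooled)
    diff = abs(mean(pooled[:len(group_a)]) - mean(pooled[len(group_a):]))
    if diff >= observed:
        hits += 1
p_perm = (hits + 1) / (trials + 1)

print(f"z test P = {p_z:.4f}, permutation P = {p_perm:.4f}")
```

If both P values fall on the same side of the significance threshold, the finding is less likely to be an artifact of one method's assumptions, which is the editorial's point about convergent results from parametric and nonparametric tests.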