CARDIOLOGY/EDITORIAL
Misleading Negative Chest Radiographs: Should We ADHERE to the Conclusions?
Richelle J. Cooper, MD, MSHS
From the UCLA Emergency Medicine Center, Los Angeles, CA.
0196-0644/$-see front matter Copyright © 2006 by the American College of Emergency Physicians. doi:10.1016/j.annemergmed.2005.11.030
SEE RELATED ARTICLE, P. 13.

[Ann Emerg Med. 2006;47:19-21.]

In this issue, Collins et al present data from the Acute Decompensated Heart Failure Registry (ADHERE) and suggest that the initial emergency department (ED) chest radiograph may be insensitive to predict a hospital discharge diagnosis of acutely decompensated heart failure.1 The assertion that the chest radiograph is not that sensitive (for many disorders) is not new; however, the idea that 1 in 5 congestive heart failure patients has a false-negative chest radiograph in the ED seems inconsistent with clinical practice. If the chest radiograph does not show signs of heart failure, are we really missing clinically important cases? The face validity of these conclusions seems incongruent with our practice. Either the results are correct and we have been wrong in a number of cases for many years, or the data and assumptions used by Collins et al to infer these conclusions are invalid. I will make an argument for why the latter is likely true.

Limitations of Registry Databases and ADHERE

The number of patient registry databases and literature based on predominantly convenience sample case series (such as this one) has grown in recent years. Proponents of registries argue that we can obtain more information about actual practice and patient outcomes (effectiveness) as opposed to the efficacy reported from highly structured randomized controlled trials. However, effectiveness can only be measured if there is no selection bias and if the care delivered at the centers that participate in these registries represents typical practice. It is not evident that either of these is true.2,3 In fact, there are many features inherent in registries, including ADHERE, either by design or convenience, that threaten the internal validity of analyses.
Medical record review, a common method of data collection in registries, as well as other research, is best when performed with standardized methods.4,5 The ADHERE design includes good medical record review methods, but the retrospectively collected data are still limited to the quality and accuracy of information recorded in the medical chart.6 A registry’s potential benefits often relate to its large, multicenter patient recruitment, but the large sample also leads to a greater potential for misinterpretation of significance. Just as a study that is underpowered may fail to recognize a true difference, when a sample is large, the chance that a difference, whether meaningful or not, will be found during analyses is much greater. With more than 100,000 hospital encounters in ADHERE,7 it is easy to find statistically significant differences or to report estimates that appear to be very precise, with narrow confidence intervals. However, any statistical test or confidence interval calculation assumes perfect data, without any bias in the data collection or analysis.8,9 Because this is not true in many registries, the potential to misinterpret the veracity and even the importance of a difference noted is much greater. Research based on data from trial registries should be interpreted with caution and should not be confused with prospective, observational trials. Registry database studies are usually best as descriptive reports that help provide insights into developing a hypothesis for prospective research.

A registry is only representative of the patients enrolled. Although that statement seems obvious, simple, and even intuitive, the consequences of bias in the selection of patients for the registry are frequently not considered in analyses or conclusions. Research costs time and money, and it may therefore be reasonable or even necessary to enroll a convenience sample. Selection bias associated with convenience sampling can be minimized if every potential case has an equal probability of being selected or consecutive cases are enrolled. That is not always the case, and in more than 1 registry (including ADHERE, in which participating centers only need to submit a monthly quota of selected cases), there is no assurance of consecutive or random sampling. Selection bias makes it impossible to ensure accurate population estimates.
In addition, ADHERE allows repeated enrollment of the same patient (without any means to account for this in analysis), which further threatens the accuracy of the population estimates of patient demographics and outcomes.7

Limitations to the Analysis of Test Characteristics of the ED Chest Radiograph

If we ignore the general registry problems and biases and pretend ADHERE data were perfect, we must accept 2 more key assumptions to believe the results of the Collins et al analysis of ADHERE: that the criterion standard is correct and that there is no selection bias with regard to ED cases. The validity of these assumptions is dubious.

The diagnosis of decompensated heart failure is the criterion standard in the Collins et al analysis. What does it mean that the patient has decompensated heart failure, and what is the best way to extrapolate this to a criterion standard for the purposes of diagnostic research? Because ADHERE data are based on medical record review, we do not know how the clinicians arrived at the patients’ diagnoses, whether there was a standard evaluation of all patients, or even how the definition may have varied at each participating center. This lack of a criterion standard is a fundamental issue that produces bias in diagnostic research that is hard to adjust for.10,11 In practice, the diagnosis of heart failure is usually based on clinical criteria. Echocardiography, radionuclide imaging, and other ancillary laboratory and radiographic tests may be used to assess cardiac function in patients who present with symptoms of heart failure, but how they were used in ADHERE patients is unknown. Presumably, some diagnoses were made based on clinical criteria, and in other cases ancillary tests suggesting cardiac dysfunction influenced the final discharge diagnosis. If the criterion standard (diagnosis of heart failure at hospital discharge) was based on the results of ancillary and functional tests, then only those patients for whom this evaluation was performed would be identified (verification bias), and the relevance to ED practice is uncertain.10 If the diagnosis of congestive heart failure is in part determined by the chest radiograph (incorporation bias), then it is not valid to evaluate the sensitivity of the imaging study for the diagnosis. This type of circular reasoning, in which the chest radiograph is used in practice to establish the discharge diagnosis of congestive heart failure and then later analyzed for its accuracy to detect the same criterion standard (discharge diagnosis), results in overly optimistic estimates of sensitivity. Ultimately, the accuracy of the discharge diagnosis in ADHERE cannot be confirmed.
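The circularity of incorporation bias can be made concrete with a little arithmetic. The following is a minimal sketch in Python, not an analysis of ADHERE data: the true sensitivity of 0.70 and the two labeling probabilities are hypothetical values chosen only for illustration, and `apparent_sensitivity` is an illustrative name. It considers only patients who truly have heart failure and asks what sensitivity would be measured if clinicians were more likely to record a discharge diagnosis of heart failure when the radiograph was positive.

```python
def apparent_sensitivity(true_sens, p_label_if_pos, p_label_if_neg):
    """Sensitivity measured against a discharge diagnosis that depends on the radiograph.

    true_sens       -- hypothetical true sensitivity of the radiograph
    p_label_if_pos  -- assumed chance a true case is labeled heart failure
                       at discharge when the radiograph was positive
    p_label_if_neg  -- the same chance when the radiograph was negative
    """
    # Expected fraction of true cases that are labeled AND radiograph-positive
    labeled_and_pos = true_sens * p_label_if_pos
    # Expected fraction of true cases that are labeled AND radiograph-negative
    labeled_and_neg = (1.0 - true_sens) * p_label_if_neg
    # Only labeled patients enter the analysis, so they form the denominator
    return labeled_and_pos / (labeled_and_pos + labeled_and_neg)


TRUE_SENS = 0.70  # hypothetical value, for illustration only

# Labeling influenced by the radiograph (incorporation bias present)
inflated = apparent_sensitivity(TRUE_SENS, p_label_if_pos=0.95, p_label_if_neg=0.60)

# Labeling independent of the radiograph (no incorporation bias)
unbiased = apparent_sensitivity(TRUE_SENS, p_label_if_pos=0.90, p_label_if_neg=0.90)

print(f"true sensitivity:            {TRUE_SENS:.3f}")
print(f"apparent, labeling biased:   {inflated:.3f}")
print(f"apparent, labeling unbiased: {unbiased:.3f}")
```

Under these assumed numbers the measured sensitivity rises from 0.70 to about 0.79, and it exceeds the true value whenever a positive radiograph makes the discharge label more likely than a negative one does. When labeling does not depend on the radiograph, the measured and true values agree.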
The evaluation of false-negative chest radiographs not only assumes the discharge diagnosis of congestive heart failure (the criterion standard) is correct but also that the ED diagnosis is incorrect, an assumption not supported by any data. The authors’ analysis fails to consider the possibility that a change in the patient’s disease occurred during the hospitalization. The ED and the hospital discharge diagnosis may be different, and both may be correct. The patients who were “missed” in the ED were often diagnosed with a condition that may develop into heart failure as a complication of the underlying disease process (eg, arrhythmias or myocardial ischemia). It is likely that in some cases in which the ED diagnosis does not match the discharge diagnosis, the heart failure was not present during the ED evaluation, resulting in the misclassification of ED radiograph interpretations as false-negative.

In ADHERE, the hospital discharge diagnosis of congestive heart failure is not just the criterion standard but is simultaneously the means to identify the convenience sample of enrolled cases. This methodology creates an additional selection bias pertinent to the Collins et al research questions because ADHERE does not include all ED patients with congestive heart failure. ED patients discharged home after ED treatment or those identified with heart failure in the ED but without verification of their disease or change in diagnosis by the admitting physician are not included.
Thus, not only does ADHERE not capture a representative sample of the hospitals’ discharged heart failure patients but also the registry does not capture the EDs’ population of heart failure patients. If the study involves a selected sample, do the results really apply to our patients and our practice? The test is not assessed in the patients (all ED patients with heart failure) for whom we will use the results.12-14 Failure to include all ED patients with an ED diagnosis of congestive heart failure makes the accuracy of the results sought in the Collins et al research question (the sensitivity and false-negative rate of the ED chest radiograph to detect the hospital discharge diagnosis of congestive heart failure) and the premise of their article (the prevalence of negative chest radiographs in ED patients with congestive heart failure) unknowable. To accurately answer these questions, we need an unbiased sample of all ED patients across the spectrum of the disease.

It is not clear what information we will be able to learn from the ADHERE database. It can provide some information about the care and outcomes of the selected patients identified at hospital discharge with decompensated heart failure. The prevalence of negative ED chest radiographs cannot be accurately determined with this database. Collins et al1 dutifully record many of their study’s limitations but then offer a conclusion that is predicated on the assumption that none of the limitations matter.15 If we made different assumptions based on the spectrum of patients treated in the ED and different assumptions about the accuracy of the diagnosis based on differential evaluation, the true proportion of patients “missed” based on their initial ED radiograph likely would not be 1 in 5 but would be far lower.

So what does the emergency physician do with this information?
The initial ED chest radiograph might be insensitive for the discharge diagnosis in a convenience sample of patients admitted to the hospital. However, the radiograph is a simple test that provides useful clinical information. It not only helps in defining the disease in patients with clinical signs of heart failure but also can reveal complicating features and detect other etiologies associated with a vague or unclear presentation. Although it is important to consider the diagnosis of heart failure even if there are no classic radiographic signs, the alternative diagnoses that produce chest discomfort and dyspnea or hypoxia are equally important. As with any test, one needs to consider whether the result will change patient outcomes before ordering it. Until there is better research to suggest differently, I will continue to order a simple chest radiograph and not feel angst that I am missing clinically important cases of heart failure because the radiograph is “negative.”
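The misclassification argument made earlier, that some patients labeled as false-negatives may not have had heart failure during the ED evaluation, can also be put in numbers. The sketch below, in Python, is purely hypothetical: the cohort of 1,000 admitted patients and the count of 100 patients whose heart failure developed only after the ED visit are invented for illustration, and only the 1-in-5 figure reflects the Collins et al report as described in this editorial. It further assumes that every patient without heart failure at the time of the ED visit had a negative radiograph that was correctly read.

```python
def corrected_false_negative_rate(n_discharge_hf, n_negative_radiographs,
                                  n_hf_absent_in_ed):
    """False-negative rate after excluding patients whose heart failure
    developed only after the ED visit.

    Assumes (hypothetically) that all such patients had a negative ED
    radiograph, so each one is removed from both the negative-radiograph
    count and the denominator.
    """
    if n_hf_absent_in_ed > n_negative_radiographs:
        raise ValueError("excluded patients cannot exceed negative radiographs")
    # Radiographs that were negative although heart failure was present in the ED
    true_false_negatives = n_negative_radiographs - n_hf_absent_in_ed
    # Patients who actually had heart failure at the time of the ED radiograph
    hf_present_in_ed = n_discharge_hf - n_hf_absent_in_ed
    return true_false_negatives / hf_present_in_ed


# Hypothetical cohort: 1,000 patients discharged with heart failure,
# 200 of whom (1 in 5) had a negative ED radiograph.
apparent_rate = corrected_false_negative_rate(1000, 200, 0)

# Suppose 100 of those 200 did not yet have heart failure in the ED.
corrected_rate = corrected_false_negative_rate(1000, 200, 100)

print(f"apparent false-negative rate:  {apparent_rate:.3f}")
print(f"corrected false-negative rate: {corrected_rate:.3f}")
```

With these invented counts the false-negative rate falls from 0.20 to about 0.11. The direction of the change, not the particular value, is the point: any patients misclassified in this way push the reported rate above the true one.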
Supervising editor: Michael L. Callaham, MD

Funding and support: The author reports this study did not receive any outside funding or support.

Reprints not available from the author.
Address for correspondence: Richelle J. Cooper, MD, MSHS, UCLA Emergency Medicine Center, 924 Westwood Blvd, Suite 300, Los Angeles, CA 90024; 310-794-0583, fax 310-794-0599; E-mail [email protected].
REFERENCES
1. Collins S, Lindsell CJ, Storrow AB, et al. Prevalence of negative chest radiography in the emergency department patient with decompensated heart failure. Ann Emerg Med. 2006;47:13-18.
2. Armstrong D, Kline-Rogers E, Jani SM, et al. Potential impact of the HIPAA privacy rule on data collection in a registry of patients with acute coronary syndrome. Arch Intern Med. 2005;165:1125-1129.
3. Tu JV, Willison DJ, Silver FL, et al, for the Investigators in the Registry of the Canadian Stroke Network. Impracticability of informed consent in the Registry of the Canadian Stroke Network. N Engl J Med. 2004;350:1414-1421.
4. Gilbert EH, Lowenstein SR, Koziol-McLain J, et al. Chart reviews in emergency medicine research: where are the methods? Ann Emerg Med. 1996;27:305-308.
5. Schwartz RJ, Panacek EA. Basics of research, part 7: archival data research. Air Med J. 1996;15:119-124.
6. Lowenstein SR. Medical record reviews in emergency medicine: the blessing and the curse. Ann Emerg Med. 2005;45:452-455.
7. Adams KF, Fonarow GC, Emerman CL, et al. Characteristics and outcomes of patients hospitalized for heart failure in the United States: rationale, design, and preliminary observations from the first 100,000 cases in the Acute Decompensated Heart Failure Registry (ADHERE). Am Heart J. 2005;149:209-216.
8. Maclure M, Schneeweiss S. Causation of bias: the episcope. Epidemiology. 2001;12:114-122.
9. Schriger D. Problems with current methods of data analysis and reporting, and suggestions for moving beyond incorrect ritual. Eur J Emerg Med. 2002;9:203-207.
10. Mower WR. Evaluating bias and variability in diagnostic test results. Ann Emerg Med. 1999;33:85-91.
11. Knottnerus JA, van Weel C, Muris JWM. Evidence base of clinical diagnosis: evaluation of diagnostic procedures. BMJ. 2002;324:477-480.
12. Jaeschke R, Guyatt G, Sackett DL. Users’ guides to the medical literature, III: how to use an article about a diagnostic test, A: are the results of the study valid? Evidence-Based Medicine Working Group. JAMA. 1994;271:389-391.
13. Jaeschke R, Guyatt GH, Sackett DL. Users’ guides to the medical literature, III: how to use an article about a diagnostic test, B: what are the results and will they help me in caring for my patients? Evidence-Based Medicine Working Group. JAMA. 1994;271:703-707.
14. [No authors listed]. How to read clinical journals, II: to learn about a diagnostic test. Can Med Assoc J. 1981;124:703-720.
15. Schriger DL. Suggestions for improving the reporting of clinical research: the role of narrative. Ann Emerg Med. 2005;45:437-443.
IMAGES IN EMERGENCY MEDICINE (continued from p. 11)
DIAGNOSIS: Fournier’s gangrene.

First described in the literature in 1883, necrotizing fasciitis of the perineum (Fournier’s gangrene) is an infection of the deep subcutaneous tissue, resulting in extensive damage to fat and fascia while oftentimes sparing the skin.1,2 There can be a rapid progression from a seemingly minor infection to extensive tissue destruction with associated systemic symptoms, including sepsis and death. Rapidly increasing severe pain is often the first symptom but may be absent in diabetics as a result of neuropathy. In men, it most commonly affects the scrotum. It is usually a result of perineal trauma or diabetes, with the most common organisms being mixed aerobic and anaerobic bacteria.3,4 Definitive treatment is surgical debridement in conjunction with parenteral antibiotics and supportive care. Mortality ranges from 11% to 65%.2-4

REFERENCES
1. Laor E, Palmer LS, Tolia BM, et al. Outcome prediction in patients with Fournier’s gangrene. J Urol. 1995;154:89-92.
2. Atakan IH, Kaplan M, Kaya E, et al. A life-threatening infection: Fournier’s gangrene. Int Urol Nephrol. 2002;34:387-392.
3. Norton K, Johnson L, et al. Management of Fournier’s gangrene: an eleven year retrospective analysis of early recognition, diagnosis, and treatment. Am Surg. 2002;68:709-714.
4. Kilic A, Aksoy Y, Kilic L. Fournier’s gangrene: etiology, treatment, and complications. Ann Plast Surg. 2001;47:523-527.
Volume 47, no. 1 : January 2006
Annals of Emergency Medicine 21