ANNALS OF EMERGENCY MEDICINE JOURNAL CLUB

Clinical Prediction Rules: Answers to the November 2009 Journal Club

Tyler W. Barrett, MD; David L. Schriger, MD, MPH

From the Vanderbilt University Medical Center, Nashville, TN (Barrett); and the University of California, Los Angeles, CA (Schriger).

0196-0644/$-see front matter Copyright © 2009 by the American College of Emergency Physicians. doi:10.1016/j.annemergmed.2009.11.021

Editor’s Note: You are reading the twelfth installment of Annals of Emergency Medicine Journal Club. The questions and the article they are about (Vaillancourt et al. Ann Emerg Med. 2009;54:663-671) were published in the November 2009 issue. Information about journal club can be found at http://www.annemergmed.com/content/journalclub. Readers should recognize that these are suggested answers. We hope they are accurate; we know that they are not comprehensive. There are many other points that could be made about these questions or about the article in general. Questions are rated “novice,” “intermediate,” and “advanced” so that individuals planning a journal club can assign the right question to the right student. The “novice” rating does not imply that a novice should be able to spontaneously answer the question. “Novice” means we expect that someone with little background should be able to do a bit of reading, formulate an answer, and teach the material to others. Intermediate and advanced questions also will likely require some reading and research, and that reading will be sufficiently difficult that some background in clinical epidemiology will be helpful in understanding the reading and concepts. We are interested in receiving feedback about this feature. Please e-mail [email protected] with your comments.

DISCUSSION POINTS

1. Clinical prediction models (also referred to as decision rules, prediction rules, or prognostic models) have become an important component of emergency medicine practice. The publication of such models has steadily increased during the past 15 years.

A. Define the term “clinical prediction rule.”

B. Clinical prediction rules have been developed for many emergency department (ED) complaints (eg, syncope, cervical spine injury, and ankle pain). Physician scientists, including this Journal Club’s authors, have differing opinions of the contribution of decision rules to physician decisionmaking. Discuss the pros and cons of clinical prediction rules and their effect on patient care.

2. This paper describes a prospective, multicenter, independent validation of the Canadian C-Spine clinical prediction rule in the out-of-hospital setting.

A. Prediction modeling experts have established detailed methodological standards for the development and validation of a clinical prediction rule. Your department’s research director asks you to develop a prediction model for determining which patients with flulike symptoms need a chest radiograph to exclude pneumonia. Describe the recommended procedures for developing a new clinical prediction rule. What are some of the problems with the methodology described in 2 cited articles?

B. Publications describing the development of a new clinical prediction rule often caution readers that the model must be validated before use in clinical practice. Why should clinical prediction rules be validated? Describe common internal and external validation techniques.

C. The original Canadian C-Spine Rule was developed and validated for patients evaluated by physicians in the ED for possible cervical spine injury. This study examines the performance of a slightly modified rule when implemented by a different group of evaluators, specially trained paramedics. Describe what can happen when a prediction rule is enacted by a different group of evaluators than those for whom it was developed. Do you think the authors adequately accounted for this possibility?

D. The criterion standard for the primary outcome, acute cervical spine injury, was defined as “any fracture, dislocation, or ligamentous instability demonstrated by radiographic imaging.” Nearly half (1,126/2,393) of the eligible patients enrolled were not evaluated with cervical spine radiographs. What did the authors use as a surrogate measure for the criterion standard for these patients? The authors present the baseline characteristics for the patients with incomplete and complete outcome assessments. Why did the authors include this information in the appendix?

3. ED patients are often exposed to ionizing radiation from radiographic studies. Several prediction rules have been developed to reduce the number of unnecessary radiographic studies.

A. What is the estimated radiation exposure dose from a 2-view chest radiograph? Compare that dose to a cervical spine radiograph series and to a computed tomography (CT) scan of the cervical spine. What is the estimated lifetime risk of cancer attributable to these imaging studies?

B. Many EDs perform “pan scan” CT imaging studies on trauma patients, regardless of whether the physical examination is concerning for injuries to each of the body regions. Why do many trauma surgeons advocate the use of whole-body CT imaging? What is the estimated radiation exposure and CT-related cancer risk from a single pan scan CT? Include in your answer how patient age and sex affect the overall risk of developing a cancer from ionizing radiation. How might cancer risk from imaging affect your management of trauma patients?

4. You have recently been promoted to the local emergency medicine services (EMS) medical director. The city council has charged your department to improve customer satisfaction scores and reduce your expenses by 10%.

A. The rotating emergency medicine EMS resident recommends that the spine immobilization protocol be changed to include out-of-hospital application of the Canadian C-Spine Rule. The resident cites the Vaillancourt et al article and suggests that reducing unnecessary cervical spine immobilizations would result in decreased expenses and improved patient approval ratings. Would you amend the immobilization protocol according to the data?

B. This study’s incidence of cervical spine injury was lower (0.6%) than in previous reports (2%). This resulted in a large 95% confidence interval for the rule’s sensitivity. How might the wide confidence intervals affect your decision to permit out-of-hospital cervical spine injury screening?

ANSWER 1

Q1. Clinical prediction models (also referred to as decision rules, prediction rules, or prognostic models) have become an important component of emergency medicine practice. The publication of such models has steadily increased during the past 15 years.

Q1.a Define the term “clinical prediction rule.”

A clinical prediction rule is defined as a decisionmaking instrument for clinicians that incorporates 3 or more variables from the patient’s history, physical examination, and simple diagnostic tests.1 Steyerberg2 writes that “clinical prediction [rules] may provide the evidence-based input for shared decisionmaking, by providing estimates of the individual probabilities of risks and benefits.” Toll et al3 write that a prediction rule’s objective is to “estimate the probability that a certain outcome is present (diagnosis) in an individual or will occur (prognosis).”

Q1.b Clinical prediction rules have been developed for many emergency department (ED) complaints (eg, syncope, cervical spine injury, and ankle pain). Physician scientists, including this Journal Club’s authors, have differing opinions of the contribution of decision rules to physician decisionmaking. Discuss the pros and cons of clinical prediction rules and their effect on patient care.

Pros (per Dr. Barrett)

The publication of clinical prediction rules has increased steadily during the past 15 years.2,3 The ED is a fast-paced, complex, high-risk patient care environment in which decision rules are most likely to be useful.4 This unique treatment environment might explain the large number of prediction rules developed for use in the management of common ED presenting complaints.5-11

Prediction rules are designed either to provide probabilities of a specific outcome without recommending a treatment decision (assistive decision rules) or to recommend a certain patient management decision (directive decision rules).4 Assistive rules do not direct the physician to admit a patient with chest pain but rather provide an evidence-based probability of acute coronary syndrome according to that individual patient’s information. The physician must decide whether his or her clinical suspicion is consistent with the rule’s assigned risk for the patient. When they are consistent, the treatment decision is often straightforward. However, when the physician’s judgment and the prediction rule’s probability are discordant, physician gestalt often prevails. Physician clinical judgment is generally more sensitive but less specific than the prediction rule.4

Prediction rules may function as reassurance for physicians not to perform an extensive evaluation on a low-risk patient. For example, recall the last patient with mild paraspinal neck pain after a low-speed motor vehicle crash that you treated in the ED. Your clinical suspicion that the patient had a significant cervical spine injury was low, and documenting that the patient was “NEXUS negative” further supported your decision not to order radiographs. When asked by the patient (or more often a family member) why you did not obtain neck radiographs, you can reference the validated prediction rule research rather than just claim clinical experience. Well-developed and validated prediction rules might assist less-experienced physicians by prompting them to inquire about important risk factors for certain serious diseases (eg, history of cancer during an evaluation for pulmonary embolism).

The ideal prediction rule fulfills the following criteria: contains all the clinically sensible important risk factors, is easy to use in the clinical setting, accurately predicts the outcome of interest, has been validated outside the setting in which it was developed, demonstrates benefit in impact analysis, and provides clear direction to the physician about patient management.2 Few of the developed prediction rules fulfill all of these criteria, and most have not undergone impact analysis. Critics of prediction rules question whether these rules actually alter patient care and improve efficiency.4 This is a relevant criticism because few impact studies have been performed to quantify a prediction rule’s value in daily practice.12 The ED is an excellent venue for such impact studies, especially with potential impending health care reform advocating more standardized patient evaluations.

Cons (per Dr. Schriger)

In 1973, Wennberg and Gittelsohn’s13 pioneering work on small area variations in health care showed that the volume of health care provided was more closely related to the number of providers and their practice style than to the needs of the patients. Their work unequivocally demonstrated that there was tremendous variation in health care delivery and that at least some of this variation was undesirable. The medical community’s initial response was “clinical standards,” a term that was quickly deemed too rigid and was replaced with “clinical guidelines.” For the past 25 years, the debate has focused on how these guidelines should be made (what blend of evidence and opinion) and how rigidly they should be applied. If medicine is a 16-lane superhighway with physicians driving in every lane, is the goal to chop off the 2 most extreme lanes, to restrict travel to the center 8 lanes, or to limit travel to 1 or 2 lanes? Is the goal to suggest which lanes are preferred or to provide the physician with exact details of how to drive?

A lack of agreement about the purpose of clinical guidelines and decision instruments has led to some interesting squabbles that some take seriously but others find comical. I am amused by investigators who puff up their chests and stick their necks out (bad pun intended) as they argue why “my c-spine rule is better than your c-spine rule.” Are they to be taken seriously? Is one rule really better than another, or does their effort to differentiate one rule from another blind them to the essential similarities of the rules? For me, the message is not “if you use this exact recipe you will know how to treat patients” but that, in large numbers of patients in 2 different populations studied by different investigators, there was a general pattern: patients who appeared to be capable of reporting pain if present seldom had a neck fracture in the absence of bony tenderness or neurologic deficit. I find myself using all of the decision rules that way; I would be hard pressed to tell you the specifics of any particular rule, except, perhaps, the Ottawa Ankle Rule, yet I guide my own practice with the general principles implied by each rule’s specifics.

I am a big fan of evidence-informed medicine and am grateful to investigators who study large populations to analyze what characteristics are associated with the presence or absence of a condition. I file this information away so that I can blend it with my gestalt to make sure that my clinical judgment is not taking me too far astray. I never, however, pull out a decision rule and follow it like a recipe. Why? Probably because I don’t believe that that is the best way to help my patients. I have little faith in the stability of the rules and suspect that the general principles of the rules are far more important than their specifics. That is not to say that specific rules should not be developed; they serve the purpose of delineating one style of practice that probably works. Please read the answer to the final part of Question 2a for insight into where some of my skepticism arises.

ANSWER 2

Q2. This paper describes a prospective, multicenter, independent validation of the Canadian C-Spine clinical prediction rule in the out-of-hospital setting.

Q2.a Prediction modeling experts have established detailed methodological standards for the development and validation of a clinical prediction rule. Your department’s research director asks you to develop a prediction model for determining which patients with flulike symptoms need a chest radiograph to exclude pneumonia. Describe the recommended procedures for developing a new clinical prediction rule. What are some of the problems with the methodology described in 2 cited articles?

The first step in designing this study is to define the problem of interest. In this example, the problem is that many individuals with flulike symptoms undergo unnecessary chest radiographs to identify the rare case of flu-associated pneumonia. The hypothesis is that data available within the first hour of ED management can be used to assess a patient’s risk of having pneumonia and the need for a chest radiograph. The study objective is to develop a reliable, discriminating, multivariable model that accurately identifies patients with flulike symptoms who require a chest radiograph to rule out pneumonia.

There are multiple ways to approach this problem, and rules can take several forms. One common format, used by many radiography use guidelines, states that a patient does not need a test as long as certain binary variables are not present; if even one characteristic is present, then the study cannot be omitted (according to the rule). Another rule style assigns points according to the presence or absence of specific predictors (eg, the Centor criteria for exudative pharyngitis, the Wells score for pulmonary embolism), and the cumulative score corresponds to an estimated risk for the disease; a minimal sketch of this format appears below. Nomograms and computer software packages are an alternative model format that might be as clinically sensible and user friendly as the aforementioned rules. Regardless of the final presentation of the model, prediction model researchers should adhere to the following methodology.
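To make the point-score format concrete, here is a minimal sketch in Python. The items, weights, and risk bands are invented for illustration; they are not a validated flu/pneumonia score:

# Hypothetical point-score rule for the flulike-symptoms example above.
# Items, weights, and cut points are invented for illustration; a real
# rule would derive them from a development cohort (see below).
HYPOTHETICAL_ITEMS = {
    "temp_over_39C": 2,
    "room_air_sat_below_94": 3,
    "rales_on_exam": 2,
    "underlying_lung_disease": 1,
    "chronic_steroid_use": 1,
}

def flu_score(findings):
    # Sum the points for each finding that is present.
    return sum(points for item, points in HYPOTHETICAL_ITEMS.items()
               if findings.get(item, False))

def recommendation(score):
    # Invented cut points mapping the cumulative score to an action.
    if score <= 1:
        return "low estimated risk: no radiograph per rule"
    if score <= 4:
        return "intermediate risk: physician judgment"
    return "high estimated risk: obtain chest radiograph"

patient = {"temp_over_39C": True, "rales_on_exam": True}
print(flu_score(patient), recommendation(flu_score(patient)))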

The choice of candidate predictors for the model is paramount. Strong predictors are required for a well-performing prediction rule, and the strength of a predictor is determined by its association with the outcome and the distribution of the predictor within the study population.2 Binary predictors that are strongly correlated with the outcome but rarely occur (eg, decapitated patients are always dead, but few patients are decapitated) are not as useful as those that are present in half the patients.2 The most important predictors must have clear, clinically sensible definitions and have minimal missing values among the participants.1,14 Predictor variables must be readily available to physicians in the routine management of patients with flulike symptoms and enter the model in the same temporal manner that the predictor would be available in the clinical arena.1,2,14-17 Candidate predictor variables should be predetermined according to clinical expertise and an exhaustive review of the related literature.14
They need to be biologically plausible if the prediction rule is to have face validity. They need to be readily available if the rule is to be successfully implemented. Triage vital signs, patient history of underlying lung disease, and chronic steroid use would be relevant candidate predictors. Invasive studies or laboratory results (such as sputum culture) that are not available within 1 hour will result in a rule that cannot be successfully implemented.

The outcome to be predicted also must be clearly defined, clinically relevant, and relatively immune to misclassification.1 The most well-defined outcome is patient death. In this example, the researchers must clearly define the criterion standard for diagnosing pneumonia. Would a portable anteroposterior view be adequate, or would at least posteroanterior and lateral views be required? Diagnostic uncertainty and the potential for misclassifying patients will bias the performance of the model and likely result in poor performance in independent validation studies.

Once the candidate predictor variables and outcome variables have been defined, some process must be used to generate the rule. There is considerable controversy about the best methods for doing this. We begin with a very basic discussion of the concepts using a 2-variable decision rule. Keep in mind that overfitting issues are amplified as the number of predictor variables increases.

Imagine for a moment a clinical situation in which 2 continuous variables are the only ones available to make clinical decisions about a binary outcome. One can visualize this as a scatterplot in which patients (represented by pink and blue dots that signify their outcome) are placed in the appropriate location on the x and y axes. Should all of the pink dots cluster in one region and all the blue dots cluster in another, it would be easy to create a rule that identifies regions of values of variable X and variable Y that are strongly associated with a particular outcome. In other words, investigators are attempting to develop a model that says that if variable X has certain values and variable Y has certain values, then one can be certain that a patient will not have a certain outcome. In a 2-variable system, this requires defining a boundary line (not necessarily a straight line) that segregates the blue dots from the pink ones.

In Figure 1A, we see that the straight line y = –x + 3.6 correctly segregates all but 3 of the 250 pink dots and 3 of the 250 blue dots into the proper region. Said another way, it achieves a sensitivity of 98.8% and a specificity of 98.8%. These test characteristics might be fine or, if this were a c-spine fracture rule, might be wholly inadequate; we would not want to miss 12 of 1,000 fractures. Investigators might be tempted to find a more promising rule.

[Figure 1A. Patient outcome by Variables X and Y: scatterplot of normal and diseased patients with the linear boundary y = –x + 3.6.]

The equation y = sin(5.5x) – x + 3.7 achieves 100% sensitivity and 100% specificity on this data set (Figure 1B). Should we be excited to have found a perfect rule, one that not only achieves perfect separation but also sounds incredibly scientific? I hope your intuition says no.

[Figure 1B. Patient outcome by Variables X and Y: the same scatterplot with the boundary y = sin(5.5x) – x + 3.7.]
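The instability of an overly flexible boundary is easy to demonstrate in code. The following is a minimal sketch, assuming Python with numpy and scikit-learn; the Gaussian clusters and seed are invented and will not reproduce the exact counts in the figures. A 1-nearest-neighbor classifier stands in for the sine-style boundary because it, too, can "memorize" the development sample:

# Sketch of the overfitting argument: a simple linear rule and a
# flexible rule are scored on the development sample and then on a
# fresh sample from the same population. All distributions here are
# illustrative assumptions, not the data behind Figures 1 and 2.
import numpy as np
from sklearn.linear_model import LogisticRegression
from sklearn.neighbors import KNeighborsClassifier

rng = np.random.default_rng(0)

def sample(n_per_group):
    normal = rng.normal([0.5, 0.5], 1.0, size=(n_per_group, 2))
    diseased = rng.normal([3.0, 3.0], 1.0, size=(n_per_group, 2))
    X = np.vstack([normal, diseased])
    y = np.r_[np.zeros(n_per_group), np.ones(n_per_group)]  # 1 = diseased
    return X, y

X_dev, y_dev = sample(250)  # development sample (cf. Figure 1)
X_new, y_new = sample(250)  # fresh sample, same population (cf. Figure 2)

models = {
    "simple rule (like the line)": LogisticRegression().fit(X_dev, y_dev),
    "flexible rule (like the sine)": KNeighborsClassifier(1).fit(X_dev, y_dev),
}
for name, model in models.items():
    for label, X, y in [("development", X_dev, y_dev),
                        ("fresh sample", X_new, y_new)]:
        pred = model.predict(X)
        sens = pred[y == 1].mean()      # proportion of diseased detected
        spec = 1 - pred[y == 0].mean()  # proportion of normal cleared
        print(f"{name}, {label}: sensitivity {sens:.3f}, specificity {spec:.3f}")

The flexible rule scores perfectly on the sample that generated it and slips on the fresh one; the simple rule's performance barely moves.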

-4

-2

0

2 Variable X

Normal

Diseased

4

6

Figure 1B.

Four main issues should temper our excitement: (1) overfitting because of random error, (2) overfitting because of systematic error, (3) lack of biological plausibility, and (4) measurement error.

Overfitting from random error is a fairly easy concept to understand. Imagine that in the same setting that generated the 500 points in Figure 1 we enrolled another 500 patients. The new patients can be thought of as being randomly drawn from the same population as the first 500. Although both sets of 500 patients will reflect the characteristics of the underlying population, we know that they will differ from each other. The degree to which they differ is a function of the size of the sample and the underlying variance in the population, and statistics can help us anticipate how much they will vary. The important point, however, is that if we throw a second sample onto Figure 1, the performance of the rules changes (Figure 2). The linear rule now has a sensitivity of 99.2% (496/500) and a specificity of 99% (495/500), both slightly higher than the original estimates. The sine rule does not fare as well.

[Figure 2. Patient outcome by Variables X and Y: the original scatterplot with a second sample of 500 patients added, showing how both boundary rules perform on new data.]

The sensitivity decreases from 100% to 99.8% (499/500), and the specificity decreases to 99.4% (497/500). Decision rules often concern rare events such as spine fractures, and even when a rule has 100% sensitivity in 1,000 cases, we must remember that, by random variation alone, it is possible that there are one or more fractures in the next 1,000 patients with these characteristics.

Overfitting can also arise from nonrandom, systematic error. Suppose a rule developed in an inner-city hospital is applied to a rural population; there are many reasons to believe that it may perform differently. Just as random variation can lead to a decrease in rule performance in subsequent samples, so too can systematic differences. For this reason, users have to be skeptical of rules developed in single sites or in groups of sites that differ substantially from the site in which the user intends to apply them.

Topologists will tell us that it is always possible to find a surface that segregates 2 sets of points as long as we do not constrain the shape of that surface. As the surface becomes more complex, however, the likelihood of an overfitted, unstable rule increases. Furthermore, such complex surfaces often lack biological plausibility. In our first 500 subjects, our sine wave did better than our line, but do we have any reason to believe that 2 variables in a medical prediction rule should be related to the outcome in this way? Even our line may lack plausibility. Users should be skeptical of models that relate variables in ways that seem disconnected from the clinical situation. Furthermore, they should be skeptical of rules of the form (probability of event) = f(variable A, variable B) when the more likely situation is (probability of event) = f(variable A, variable B, variable A × variable B). The first equation is the equivalent of saying that the effect of A on the outcome is independent of the state of variable B and vice versa. This is a very strong and likely incorrect assumption, given that there is synergism or antagonism in most biological systems. For example, the probability that a patient with rales and fever has pneumonia is likely higher than the sum of the probabilities of pneumonia in a patient with fever alone or rales alone. A rule of the form (probability of pneumonia) = f(fever, rales), although simpler, may not be as useful as one of the form (probability of pneumonia) = f(fever, rales, fever × rales).
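To illustrate, here is a minimal sketch, assuming Python with numpy and scikit-learn; the findings, prevalences, and synergistic risk are invented numbers chosen only to show that an additive logistic model tends to understate the risk when both findings are present:

# Sketch of the interaction point above: with an invented synergistic
# truth, an additive logistic model f(fever, rales) cannot match the
# risk when both findings are present, whereas a model with a
# fever x rales product term can. All numbers are illustrative.
import numpy as np
from sklearn.linear_model import LogisticRegression

rng = np.random.default_rng(1)
n = 4000
fever = rng.integers(0, 2, n)
rales = rng.integers(0, 2, n)

# Assumed true risks: 2% baseline, modest bumps for single findings,
# and a synergistic 40% when fever and rales occur together.
p_true = np.select(
    [(fever == 1) & (rales == 1), fever == 1, rales == 1],
    [0.40, 0.08, 0.10],
    default=0.02,
)
pneumonia = rng.random(n) < p_true

X_add = np.column_stack([fever, rales])
X_int = np.column_stack([fever, rales, fever * rales])
for label, X, both in [("additive", X_add, [[1, 1]]),
                       ("with interaction", X_int, [[1, 1, 1]])]:
    model = LogisticRegression(C=1e6).fit(X, pneumonia)  # ~unpenalized fit
    risk = model.predict_proba(both)[0, 1]
    print(f"{label}: predicted risk with fever AND rales = {risk:.2f}")

On this synthetic data, the additive fit cannot match all 4 observed cell risks at once and understates the combined risk, whereas the model with the product term recovers roughly the assumed 0.40.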

[Figure 3. Interrater reliability for a binary candidate predictor variable. Interrater reliability = 100%, κ = 1.]

Finally, we must consider the effect of measurement error and misclassification. In our example, measurement error, be it random or systematic, could lead to incorrect values of x or y for certain individuals. Developers of prediction rules typically perform interrater reliability studies to determine whether users can reliably determine the values of the independent variables. The July 2009 Journal Club provides a detailed discussion of interrater reliability for readers interested in this topic. Unfortunately, these efforts are insufficient to ensure the reliability of the measure. This is particularly true for rare categorical findings, for which it is not unusual to have a highly imbalanced interrater reliability table (Figure 3). Although this table tells us quite a lot about the reliability of negative ratings (in the 98 instances in which one rater thought the finding was absent, the other concurred), it tells us very little about agreement when the finding is present. The raters agreed 2 out of 2 times, but who knows whether this 2 of 2 would be 50 of 50 or 25 of 50 if more observations were made. The 1-sided 95% confidence interval (CI) for 2 of 2 is 15% to 100%, so “2 of 2” does not provide much certainty about the reliability of the measure, even though the authors can claim that they achieved 100% agreement and a κ of 1. This phenomenon can be seen in Figure 4 of the Vaillancourt et al article because only 2 patients were judged to have trouble rotating their neck. The stability of the rule also depends on proper classification of the outcome measure. If the outcome is inaccurately measured, be it by systematic or nonsystematic error, the rule may not perform as expected.

We hope this simple example illustrates how complex rule development is. With 3 variables, the simplest boundary surface is a plane. If you can imagine what the “surface” looks like when there are 8 independent variables, then you have chosen the wrong career and should leave medicine for astrophysics. Regardless, recognize that the risk of overfitting is high no matter what method of rule development is used.
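The small-denominator intervals quoted above and in the answer to Question 2c (2 of 2 gives 15%, 12 of 12 gives 74%) can be reproduced with the exact binomial lower limit. A minimal sketch in plain Python, assuming the quoted figures use the α = 0.025 lower tail (the lower bound of a two-sided 95% interval), which matches the published numbers:

# When all n observations are "positive", the exact (Clopper-Pearson)
# lower confidence limit solves p**n = alpha, i.e., p = alpha**(1/n).
def lower_limit(n, alpha=0.025):
    # alpha = 0.025 reproduces the 15% and 74% figures in the text.
    return alpha ** (1.0 / n)

for n in (2, 12, 50):
    print(f"{n} of {n} -> lower 95% limit {lower_limit(n):.1%}")
# 2 of 2   -> 15.8%  (the kappa = 1 reliability table in Figure 3)
# 12 of 12 -> 73.5%  (the 12 detected fractures discussed in Q2.c)
# 50 of 50 -> 92.9%  (why more observations would narrow the interval)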

One particularly dangerous method of rule development is multivariable analysis using stepwise selection of predictors according to the univariate statistical significance of the association between each candidate predictor and the outcome. Stepwise methods lead to instability of predictor selection, biased estimates of coefficients, exaggerated P values, and worse predictive quality than using the full model without selection.2,14,18 Alternative methods such as Classification and Regression Trees (CART)19 are not immune to issues of overfitting.2 In creating our rule for predicting the need for chest radiography, we will also have to consider handling missing data, calculating an appropriate sample size, and validating the rule on a different sample.

Q2.b Publications describing the development of a new clinical prediction rule often caution readers that the model must be validated before use in clinical practice. Why should clinical prediction rules be validated? Describe common internal and external validation techniques.

Clinical prediction models must be validated to ensure that the model accurately predicts outcomes in patients who were not part of the development cohort. A valuable prediction model maintains its calibration (agreement between predicted risks and observed risks) and discrimination (the model’s ability to differentiate between patients with and without an outcome) when applied to various groups of patients. Prediction models must be validated before use in clinical practice if we are to be confident of decisions that are based on them.

When developing a new prediction rule, investigators should always measure and report the model’s internal validation. Internal validation determines the reliability of the newly developed prediction model in the development patient cohort. A prediction model that fails to reliably predict outcomes in its own development cohort is likely of little value in other patient groups. There are 2 common strategies for internal validation: split-sample and bootstrap resampling. In the split-sample technique, the entire development cohort is randomly divided into derivation (50% to 67% of the original sample) and validation (33% to 50% of the original) subsets. The model is developed in the derivation subset and tested in the validation subset.2 This strategy often results in overly optimistic assessments of performance because the validation subset is closer to the derivation subset than an external group of subjects would be.20 The split-sample technique is also inefficient and results in a less stable model because the model is created on only half to two thirds of the original cohort, rather than the entire sample. A more efficient alternative is bootstrap resampling, which allows the entire cohort to be used for model development. Bootstrap resampling involves the repeated selection of samples, with replacement, from the entire original cohort. Bootstrap validation provides a more stable, optimism-corrected estimate of model performance;2 a minimal sketch follows below.

Laupacis et al1 opined that “it is essential to prospectively validate the rule in a group of patients different from the group in which it was derived, preferably with different clinicians.” The external validation of prediction rules is essential to evaluate their generalizability to individuals with different baseline characteristics.
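Before turning to external validation, here is the promised sketch of bootstrap optimism correction, assuming Python with numpy and scikit-learn and a synthetic cohort; the 5 placeholder predictors are not the flu/pneumonia variables discussed above:

# Minimal sketch of bootstrap internal validation: refit the model on
# each resampled cohort, measure how much better it looks on its own
# bootstrap sample than on the original cohort, and subtract the mean
# of that "optimism" from the apparent performance.
import numpy as np
from sklearn.linear_model import LogisticRegression
from sklearn.metrics import roc_auc_score

rng = np.random.default_rng(2)
n = 300
X = rng.normal(size=(n, 5))  # 5 placeholder candidate predictors
y = rng.random(n) < 1 / (1 + np.exp(-(X[:, 0] + 0.5 * X[:, 1])))

full = LogisticRegression().fit(X, y)
apparent = roc_auc_score(y, full.predict_proba(X)[:, 1])

optimism = []
for _ in range(200):                   # bootstrap replicates
    idx = rng.integers(0, n, n)        # resample WITH replacement
    boot = LogisticRegression().fit(X[idx], y[idx])
    auc_boot = roc_auc_score(y[idx], boot.predict_proba(X[idx])[:, 1])
    auc_orig = roc_auc_score(y, boot.predict_proba(X)[:, 1])
    optimism.append(auc_boot - auc_orig)

print(f"apparent AUC {apparent:.3f}, "
      f"optimism-corrected AUC {apparent - np.mean(optimism):.3f}")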

External validation involves the measurement of a model’s performance in a new patient population, and there are multiple types. In temporal validation, investigators develop a model on a group of patients (often a retrospective cohort) and then test the model’s performance by applying it to a prospective cohort of patients at the same center. Temporal validation can provide an overly optimistic assessment of rule performance because patients in the validation cohort will likely have more in common with the development cohort than would a cohort of patients in another hospital, state, or nation. Furthermore, because the clinicians who developed the rule are often the ones performing the temporal validation, there may be an artificial reduction in interobserver variation in applying the rule. Clinicians not involved in the rule development might assess the predictor variables differently, resulting in worse prediction rule performance.2

The preferred method is a fully independent validation, as advocated by prognostic research experts.1,2,20 Investigators not involved with the rule’s development test the rule’s performance in new groups of patients. The investigators measure the interrater reliability of each element of the rule and calculate the rule’s calibration and discrimination. Although the advantages of this strategy are indisputable, the push for rigorous validation has had an unintended consequence: the proliferation of new prediction rules. Investigators design an independent validation study but, on finding poor performance, reject the rule and use the data set to develop their own model. This continuous development of new rules neglects information garnered from the previous studies and for some topics has produced an ungainly set of contradictory rules: there are more than 60 models that predict outcome in breast cancer and 25 that predict long-term outcome after head trauma.12 Toll et al3 advocate that investigators opt instead to update the prediction rule when it performs poorly in their patient cohort. The updated prediction rule is developed from the combination of the original development cohort and the validation study patients, thus increasing the total amount of patient information used in the model and likely the updated model’s generalizability.3

Q2.c The original Canadian C-Spine Rule was developed and validated for patients evaluated by physicians in the ED for possible cervical spine injury. This study examines the performance of a slightly modified rule when implemented by a different group of evaluators, specially trained paramedics. Describe what can happen when a prediction rule is enacted by a different group of evaluators than those for whom it was developed. Do you think the authors adequately accounted for this possibility?

When a rule intended for use by practitioners with a certain level of training and experience working in a specific setting is implemented by practitioners with a different level of training and experience in a different setting, there is no guarantee that performance will be maintained. As discussed in the answers to Questions 2a and 2b, a rule’s performance depends in part on the consistent scoring of each rule input.

There is no guarantee that the new personnel (in this case, paramedics) will score the items the same way the physicians did. Although Vaillancourt et al demonstrated excellent interrater reliability among paramedics in a convenience sample of 155 patients, they did not assess paramedic versus physician agreement. Such a comparison would be the most direct way of determining whether the paramedics were implementing the rule in the same manner as the physicians.

Table 3 of the Vaillancourt et al article compares physician and paramedic implementation of the rule. Although both types of providers detected all 12 cases, 12 of 12 has a 1-sided 95% CI of 74% to 100%, meaning that paramedic or physician sensitivity could be as low as 74%. Consequently, the study does not provide indisputable proof that paramedics can achieve adequate sensitivity. The paramedics’ specificity was somewhat lower than the physicians’, but this more conservative approach is not necessarily a bad thing. It would be useful to know, however, whether the paramedics were answering the individual items more conservatively and thereby generating different advice or whether they were generating the same advice but choosing to ignore it. The article does not provide the information required to make this determination.

Finally, the paramedics and physicians made their judgments in different settings and at different times. This design mirrors what practice will be like when the rule is implemented, which is good justification for conducting the study in this manner. Nevertheless, it would have been useful to have the paramedics transport the patients to the ED in spinal precautions and then perform their assessment, followed immediately by the physician’s independent assessment. Overall, however, the greatest problem is the denominator for sensitivity, which is too small to permit an unequivocal statement about safety.

Q2.d The criterion standard for the primary outcome, acute cervical spine injury, was defined as “any fracture, dislocation, or ligamentous instability demonstrated by radiographic imaging.” Nearly half (1,126/2,393) of the eligible patients enrolled were not evaluated with cervical spine radiographs. What did the authors use as a surrogate measure for the criterion standard for these patients? The authors present the baseline characteristics for the patients with incomplete and complete outcome assessments. Why did the authors include this information in the appendix?

The authors explain that for patients who did not receive imaging, a study nurse made contact by telephone or mail “within 14 days and classified them as having no acute cervical spine injury if they met all the following explicit criteria: (1) pain in neck is rated as none or mild, (2) restriction of movement of neck is rated as none or mild, (3) does not require use of a neck collar, and (4) neck injury has not prevented return to usual occupational activities (work, housework, or school).”

They note that this method was shown to have 100% sensitivity for fracture. It would be unethical to subject patients who do not need radiographs to their cost and ionizing radiation, so it is wholly appropriate that the authors used clinical follow-up as a means of assigning outcomes to patients who did not receive imaging. This surrogate outcome seems quite reasonable in that even if a tiny fracture was missed, it was a fracture that presumably had no effect on the patient’s short- or long-term well-being.

Roughly 1 in 5 eligible patients could not be included in the analysis for lack of complete data. When this occurs, there is always concern that those excluded differ from those included in ways that would change the overall results had they been included. By showing us that the 2 groups are similar across a variety of presumably important characteristics (Table E1), the authors try to assure us that the failure to capture data on these patients does not bias the study.

ANSWER 3

Q3. ED patients are often exposed to ionizing radiation from radiographic studies. Several prediction rules have been developed to reduce the number of unnecessary radiographic studies.

Q3.a What is the estimated radiation exposure dose from a 2-view chest radiograph? Compare that dose to a cervical spine radiograph series and to a computed tomography (CT) scan of the cervical spine. What is the estimated lifetime risk of cancer attributable to these imaging studies?

Medical uses of radiation, including medical imaging, now account for the largest source of radiation exposure to US citizens.21 The annual effective absorbed radiation dose from natural background sources is reported to be 3 mSv.21 The adult effective dose for a 2-view chest radiograph (posteroanterior and lateral views) is 0.1 mSv (reported range 0.05 to 0.24 mSv). A standard cervical spine series exposes the patient to 0.2 mSv (range 0.07 to 0.3 mSv), whereas a CT of the cervical spine results in 4.9 mSv (range 3.5 to 16 mSv).22 These effective radiation doses are age and sex averaged; the importance of age and sex to radiation risk is discussed in the answer to Question 3b.

The estimated lifetime risk of cancer attributable to radiation exposure depends on the patient’s age and sex. A 1-year-old exposed to a given radiation dose has 10 to 15 times the lifetime risk of developing a malignancy of a 50-year-old adult exposed to the same dose.23 A 10-mSv CT study potentially causes a lethal malignant transformation in 1 of 2,000 to 3,000 adult studies and at least 1 per 500 pediatric studies.24,25 The National Academies Biologic Effects of Ionizing Radiation (BEIR) VII report estimated that 800 additional cases of cancer will develop per 100,000 patients exposed to a radiation dose of 100 mSv.26
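Scaling the BEIR VII estimate down to single-study doses gives a rough sense of these numbers. A back-of-envelope sketch in plain Python, assuming linear no-threshold scaling of the age- and sex-averaged figures quoted above (note that BEIR VII counts incident cancers, whereas the 1 in 2,000 to 3,000 figure refers to lethal malignancy):

# Linear scaling of the BEIR VII estimate (800 extra cancers per
# 100,000 persons per 100 mSv) down to the per-study doses quoted
# above. Linear no-threshold extrapolation is the key assumption.
RISK_PER_MSV = 800 / 100_000 / 100   # excess cancers per person per mSv

STUDY_DOSES_MSV = {
    "2-view chest radiograph": 0.1,
    "cervical spine series": 0.2,
    "cervical spine CT": 4.9,
}
for study, dose in STUDY_DOSES_MSV.items():
    risk = dose * RISK_PER_MSV
    print(f"{study}: {dose} mSv -> about 1 in {round(1 / risk):,}")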

Q3.b Many EDs perform “pan scan” CT imaging studies on trauma patients, regardless of whether the physical examination is concerning for injuries to each of the body regions. Why do many trauma surgeons advocate the use of whole-body CT imaging? What is the estimated radiation exposure and CT-related cancer risk from a single pan scan CT? Include in your answer how patient age and sex affect the overall risk of developing a cancer from ionizing radiation. How might cancer risk from imaging affect your management of trauma patients?

Pan-CT imaging during early trauma resuscitation was first reported in 1997 by Low et al.28 Whole-body CT is often performed during the evaluation of trauma patients to identify and grade traumatic injuries, detect clinically occult injuries, and expedite disposition of patients with serious mechanisms of injury but without significant injury.29 Failure to diagnose injuries is a leading cause of malpractice claims against trauma surgeons.30 Missed injuries occur more frequently in patients with altered mental status because of intoxication or head injury, those with multiple soft tissue or orthopedic injuries, and those receiving chronic anticoagulation medications.6,31-35 Therefore, the trauma literature has advocated liberal use of CT imaging, reporting that CT results changed treatment in 19% of patients.29 Although the Salim et al29 article has been criticized for poor study methodology, at many trauma centers “pan scan” CT imaging is routine.

A recent large, retrospective, multicenter German study by Huber-Wagner et al27 reported that integration of whole-body CT increased survival in patients with polytrauma. The authors compared actual patient mortality with the mortality predicted by the Trauma and Injury Severity Score and the Revised Injury Severity Classification for patients who had whole-body CT versus those with selective CT imaging.27 The authors contend that “the crucial factor in whole-body CT is not the exposure of radiation but the early reali(z)ation (sic) and implementation of the findings to the critically injured patient.”27 However, they do admit that their results need to be confirmed in a randomized controlled trial. Furthermore, the investigators acknowledge that the results might be confounded by the preferential selection of likely survivors at centers with more experienced clinicians, better equipment, or highly developed protocols for using whole-body CT in patients likely to benefit from the study.27

Unfortunately, pan-CT imaging is not without cost, and the potential benefits of identifying occult injuries must be weighed against the radiation exposure to the patient. The estimated risk of lethal malignant transformation from a single CT has been reported to be 1 in 2,000 to 3,000 adult patients.23-25 According to these estimates, a typical Level I trauma center that treats 2,400 major trauma patients annually and follows a pan-scan CT protocol will directly cause 4 patients to develop lethal cancers (presumably counting each pan scan as roughly 4 component CT studies: 2,400 patients × 4 studies × a risk of about 1 in 2,400 per study ≈ 4). Advocates of pan-scan imaging might argue that the number of lives saved by CT diagnosis of life-threatening injuries exceeds this handful of potential cancer deaths. However, one must remember that radiation exposure is irreversible and cumulative and that younger patients, especially children, are at greatest risk of developing cancer; each future CT scan or radiograph increases that individual’s risk. The Einstein et al24 review of the lifetime attributable risk of cancer from CT coronary angiography found rates as high as 1 in 114 for 20-year-old women because of the radiosensitivity of the underlying breast and lung tissue.

Mower25 advocated that physicians be more judicious in their use of CT imaging. Emergency physicians and trauma surgeons should consider more selective CT approaches, rather than automatically checking the pan-scan order box without considering the patient’s history, physical examination findings, and likelihood of injury in each body region. Consider that most hospitals have an elaborate consent process for blood transfusion, which typically carries a risk of 1 in 250,000 to 1 in 2 million of transmitting hepatitis B or C or HIV to the recipient, yet hospitals almost never require consent for pan-scan CT imaging in stable, alert trauma patients. Although the radiation risk of CT seems like a distant, abstract concept in the setting of acute trauma management in the ED, physicians should be more cognizant of this important risk. Ultrasonography, serial examinations and observation, and selective imaging might be acceptable alternatives to the pan scan.

ANSWER 4

Q4. You have recently been promoted to the local emergency medicine services (EMS) medical director. The city council has charged your department to improve customer satisfaction scores and reduce your expenses by 10%.

Q4.a The rotating emergency medicine EMS resident recommends that the spine immobilization protocol be changed to include out-of-hospital application of the Canadian C-Spine Rule. The resident cites the Vaillancourt et al article and suggests that reducing unnecessary cervical spine immobilizations would result in decreased expenses and improved patient approval ratings. Would you amend the immobilization protocol according to the data?

Before answering this question, we would need to know more about the EMS system. How seasoned are the out-of-hospital personnel? Are they responsive to continuing education? Is there sufficient time and manpower to train them to an appropriate standard for performing the rule? If all of the pieces were in place, we believe that there is no reason that paramedics could not decide which patients may be safely transported out of spinal precautions. This belief is based partly on the Vaillancourt et al article and partly on common sense and experience. The Vaillancourt et al article does not establish the safety of the procedure. We believe that the upside of this strategy is sufficiently large that it is worth a try as long as there is adequate training and monitoring. But that is just our belief. What do you think?

Q4.b This study’s incidence of cervical spine injury was lower (0.6%) than in previous reports (2%). This resulted in a large 95% confidence interval for the rule’s sensitivity. How might the wide confidence intervals affect your decision to permit out-of-hospital cervical spine injury screening?

As discussed above, because the rule’s sensitivity is based on 12 fractures, the 1-sided 95% CI is 74% to 100%. If this were our sole source of information, it would be insufficient to support a trial of this strategy. It is not, however, the sole source of information.

We know that the Canadian C-Spine and National Emergency X-Radiography Utilization Study (NEXUS) rules work for physicians, and we can hypothesize that paramedics should be able to produce near-equivalent results. We also might hypothesize that the kinds of fractures that might be missed are likely stable and that it is highly unlikely that a patient with an occult fracture will come to harm if not immobilized. Multiplying the low likelihood of a miss by the low likelihood of harm even if there is a miss, we get some pretty good odds that this strategy will be safe and successful. In Bayesian terms, the distribution of 12 of 12 is wide but narrows if it is conditioned on previous knowledge that suggests that the risk is low. So, like many things in medicine and life, a single piece of evidence (in this case, the Vaillancourt et al study) is insufficient to provide a conclusion, and our conclusions have more to do with our previous beliefs than with the new piece of evidence.
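That Bayesian narrowing can be made concrete with a conjugate Beta-Binomial update. A minimal sketch, assuming Python with scipy; the informative prior below (roughly 998 of 1,000 fractures detected in earlier physician studies) is an invented stand-in for prior knowledge, not the actual Canadian C-Spine or NEXUS data:

# Posterior for rule sensitivity after observing 12 of 12 fractures
# detected, under a flat prior versus an invented informative prior.
from scipy.stats import beta

PRIORS = {
    "flat Beta(1, 1), 12/12 only": (1, 1),
    "informed Beta(998, 2) + 12/12": (998, 2),  # hypothetical prior evidence
}
for label, (a, b) in PRIORS.items():
    posterior = beta(a + 12, b)  # add 12 successes, 0 failures
    print(f"{label}: lower 2.5% bound on sensitivity "
          f"{posterior.ppf(0.025):.1%}")

With the flat prior, the lower bound is about 75%, essentially the interval quoted above; conditioned on the invented prior evidence, it stays above 99%.

Section editors: Tyler W. Barrett, MD; David L. Schriger, MD, MPH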

REFERENCES

1. Laupacis A, Sekar N, Stiell IG. Clinical prediction rules. A review and suggested modifications of methodological standards. JAMA. 1997;277:488-494.
2. Steyerberg EW. Clinical Prediction Models: A Practical Approach to Development, Validation, and Updating. New York, NY: Springer; 2009.
3. Toll DB, Janssen KJ, Vergouwe Y, et al. Validation, updating and impact of clinical prediction rules: a review. J Clin Epidemiol. 2008;61:1085-1094.
4. Reilly BM, Evans AT. Translating clinical research into clinical practice: impact of using prediction rules to make decisions. Ann Intern Med. 2006;144:201-209.
5. Gibler WB, Runyon JP, Levy RC, et al. A rapid diagnostic and treatment center for patients with chest pain in the emergency department. Ann Emerg Med. 1995;25:1-8.
6. Hoffman JR, Mower WR, Wolfson AB, et al. Validity of a set of clinical criteria to rule out injury to the cervical spine in patients with blunt trauma. National Emergency X-Radiography Utilization Study Group. N Engl J Med. 2000;343:94-99.
7. Kline JA, Courtney DM, Kabrhel C, et al. Prospective multicenter evaluation of the pulmonary embolism rule-out criteria. J Thromb Haemost. 2008;6:772-780.
8. Marrie TJ, Lau CY, Wheeler SL, et al. A controlled trial of a critical pathway for treatment of community-acquired pneumonia. CAPITAL Study Investigators. Community-Acquired Pneumonia Intervention Trial Assessing Levofloxacin. JAMA. 2000;283:749-755.
9. Plint AC, Bulloch B, Osmond MH, et al. Validation of the Ottawa Ankle Rules in children with ankle injuries. Acad Emerg Med. 1999;6:1005-1009.
10. Stiell IG, Greenberg GH, McKnight RD, et al. Decision rules for the use of radiography in acute ankle injuries. Refinement and prospective validation. JAMA. 1993;269:1127-1132.
11. Stiell IG, McKnight RD, Greenberg GH, et al. Implementation of the Ottawa Ankle Rules. JAMA. 1994;271:827-832.
12. Moons KG, Altman DG, Vergouwe Y, et al. Prognosis and prognostic research: application and impact of prognostic models in clinical practice. BMJ. 2009;338:b375.


13. Wennberg J, Gittelsohn A. Small area variations in health care delivery. A population-based health information system can guide planning and regulatory decision-making. Science. 1973;182:1102-1108.
14. Royston P, Moons KG, Altman DG, et al. Prognosis and prognostic research: developing a prognostic model. BMJ. 2009;338:b604.
15. Harrell FE Jr, Margolis PA, Gove S, et al. Development of a clinical prediction model for an ordinal outcome: the World Health Organization Multicentre Study of Clinical Signs and Etiological agents of Pneumonia, Sepsis and Meningitis in Young Infants. WHO/ARI Young Infant Multicentre Study Group. Stat Med. 1998;17:909-944.
16. Harrell FE Jr, Shih YC. Using full probability models to compute probabilities of actual interest to decision makers. Int J Technol Assess Health Care. 2001;17:17-26.
17. Randolph AG, Guyatt GH, Calvin JE, et al. Understanding articles describing clinical prediction tools. Evidence Based Medicine in Critical Care Group. Crit Care Med. 1998;26:1603-1612.
18. Steyerberg EW, Eijkemans MJ, Harrell FE Jr, et al. Prognostic modelling with logistic regression analysis: a comparison of selection and estimation methods in small data sets. Stat Med. 2000;19:1059-1079.
19. Breiman L. Classification and Regression Trees. Belmont, CA: Wadsworth International; 1984.
20. Altman DG, Vergouwe Y, Royston P, et al. Prognosis and prognostic research: validating a prognostic model. BMJ. 2009;338:b605.
21. Mettler FA Jr, Huda W, Yoshizumi TT, et al. Effective doses in radiology and diagnostic nuclear medicine: a catalog. Radiology. 2008;248:254-263.
22. Winslow JE, Hinshaw JW, Hughes MJ, et al. Quantitative assessment of diagnostic radiation doses in adult blunt trauma patients. Ann Emerg Med. 2008;52:93-97.
23. Picano E. Sustainability of medical imaging. BMJ. 2004;328:578-580.
24. Einstein AJ, Henzlova MJ, Rajagopalan S. Estimating risk of cancer associated with radiation exposure from 64-slice computed tomography coronary angiography. JAMA. 2007;298:317-323.
25. Mower WR. Radiation doses among blunt trauma patients: assessing risks and benefits of computed tomographic imaging. Ann Emerg Med. 2008;52:99-100.
26. National Research Council, National Academies. Health risks from exposure to low levels of ionizing radiation: BEIR VII, phase 2. Washington, DC: National Academies Press; 2006. Available at: http://dels.nas.edu/dels/rpt_briefs/beir_vii_final.pdf. Accessed October 28, 2009.
27. Huber-Wagner S, Lefering R, Qvick LM, et al. Effect of whole-body CT during trauma resuscitation on survival: a retrospective, multicentre study. Lancet. 2009;373:1455-1461.
28. Low R, Duber C, Schweden F, et al. [Whole body spiral CT in primary diagnosis of patients with multiple trauma in emergency situations]. Rofo. 1997;166:382-388.
29. Salim A, Sangthong B, Martin M, et al. Whole body imaging in blunt multisystem trauma patients without obvious signs of injury: results of a prospective study. Arch Surg. 2006;141:468-473; discussion 473-475.
30. Morris JA Jr, Carrillo Y, Jenkins JM, et al. Surgical adverse events, risk management, and malpractice outcome: morbidity and mortality review is not enough. Ann Surg. 2003;237:844-851; discussion 851-852.
31. Fabbri A, Marchesini G, Morselli-Labate AM, et al. Blood alcohol concentration and management of road trauma patients in the emergency department. J Trauma. 2001;50:521-528.


32. Ferrera PC, Verdile VP, Bartfield JM, et al. Injuries distracting from intraabdominal injuries after blunt trauma. Am J Emerg Med. 1998;16:145-149.
33. Janjua KJ, Sugrue M, Deane SA. Prospective evaluation of early missed injuries and the role of tertiary trauma survey. J Trauma. 1998;44:1000-1006; discussion 1006-1007.


34. Lavoie A, Ratte S, Clas D, et al. Preinjury warfarin use among elderly patients with closed head injuries in a trauma center. J Trauma. 2004;56:802-807.
35. Reynolds FD, Dietz PA, Higgins D, et al. Time to deterioration of the elderly, anticoagulated, minor head injury patient who presents without evidence of neurologic abnormality. J Trauma. 2003;54:492-496.
