Evidence-Based Examination of Diagnostic Information


2

Julie M. Fritz, PhD, PT, ATC

Objectives

After reading this chapter, the reader will be able to:
1. Describe the process physical therapists (PTs) use to identify the most efficient and effective clinical diagnostic tests.
2. Describe the elements of a “best” clinical diagnostic test.
3. Provide an overview of evidence-based practice and diagnosis, including the rules to apply to judge the existing evidence.

The Guide to Physical Therapist Practice20 identifies five elements of patient/client management that must be integrated by PTs in an attempt to optimize the outcome of care (Table 2-1). The examination is the process of obtaining data from the patient. Evaluation requires the therapist to make judgments on the basis of the data. The examination and evaluation lead to a diagnosis, or classification. Diagnosis therefore has a preeminent role in the patient management process because it represents the end result of the examination and evaluation process and is responsible for guiding the selection of interventions and establishing a prognosis.19,20 Despite its importance, many clinicians are unaware of how to optimize the selection and interpretation of diagnostic tests and integrate this information into patient management decisions.

The Guide describes diagnosis as having two aspects: the process of evaluating data obtained from the examination and the end result of that process. The process of evaluating diagnostic data requires the therapist to select and perform the necessary diagnostic tests for a particular patient and then make the appropriate interpretation of the results. The second step, arriving at the end result, requires an integration of the results of all tests performed into a cluster, or classification, that in turn directs the treatment.
The classification may differ from the medical diagnosis because it is based on impairments and functional limitations assessed during the examination and not on pathologic origins.6,28,52 The need for developing classification systems within the profession of physical therapy has been emphasized to facilitate professional communication, improve the outcomes of care, and increase the power of clinical research.47 Understanding the diagnostic process in physical therapy involves much more than simply memorizing a list of classification labels; it requires the PT to learn how to select the best tests to perform efficiently and effectively and how to integrate the results to arrive at a diagnostic decision.

One of the first steps in examining and interpreting diagnostic tests is to consider why the test is being performed. The tests used by PTs are performed for two basic purposes.8,55 Some are performed to examine the status of an anatomic structure, exclude or include certain anatomical regions for further examination, or detect conditions not appropriate for physical therapy management. These tests are often used as screening procedures and need to demonstrate diagnostic efficacy; they should have a high level of accuracy in distinguishing between individuals with or without the condition of interest. For example, a PT may use the anterior drawer test during the examination of a patient with knee pain in an attempt to assess the status of the anterior cruciate ligament. Another example is asking questions regarding unexplained weight loss or night pain in a patient with a musculoskeletal disorder to determine if the patient’s symptoms may be caused by a previously undiagnosed neoplasm. Tests designed for diagnostic efficacy are used to focus further examination and may be concerned with anatomical considerations instead of selecting specific treatment techniques. The second reason why PTs perform certain diagnostic tests is because the results, singularly or in combination with other findings, are believed to indicate that a particular type of intervention will be most effective for the patient. Tests used in this manner form the foundation of classification systems and should demonstrate outcome efficacy. 
For example, the observation of frontal plane displacement of the shoulders relative to the pelvis (i.e., lumbar lateral shift) in a patient with low back pain (LBP) is frequently cited as an important examination finding.7,30,38,45 Several pathoanatomical hypotheses have been posited in explanation of the phenomenon, including disk herniation,5,38 muscle spasm,17 and segmental instability,7 yet the precise condition resulting in a lateral shift is often unknown.45 Despite this, the observation of a lateral shift is often an important diagnostic finding because it may indicate a specific intervention (e.g., correction of the lateral shift) that will be most useful in reducing pain and disability.7,39

It is possible that one test may have the potential to serve both diagnostic and classification purposes. For example, the neck distraction test is frequently performed during the examination of patients with neck pain. The test is positive when manual distraction of the neck relieves the patient’s symptoms. The distraction test has been described as a test for diagnosing cervical nerve root compression and has been shown to have some validity.59,60 However, the test may also be used to select an intervention. When the distraction test is positive, some therapists may interpret this finding as indicating a need for cervical traction.40 Considering the purpose of a test is important for further consideration of the diagnostic process from an evidence-based perspective because the purpose has significant implications for examining the evidence related to its usefulness in clinical practice.

TABLE 2-1
The Elements of Patient Management

Element        Description
Examination    The process of obtaining a history, performing relevant systems reviews, and selecting and administering specific tests and measures to obtain data.
Evaluation     A dynamic process in which the PT makes clinical judgments on the basis of data gathered during the examination.
Diagnosis      Both the process and the end result of evaluating information obtained from the examination, which the PT then organizes into defined clusters, syndromes, or categories to help determine the most appropriate intervention strategies.
Prognosis      Determination of the level of optimal improvement that might be attained through intervention and the amount of time required to reach that level.
Intervention   Purposeful and skilled interaction of the PT with the patient and, if appropriate, with other individuals involved in the care of the patient using various physical therapy methods and techniques to produce changes in the condition that are consistent with the diagnosis and prognosis.

From the Guide to physical therapist practice, ed 2, Phys Ther 81:9-746, 2001.

Evidence-Based Practice and Diagnosis

Evidence-based practice can be defined as “the conscientious and judicious use of current best evidence in making decisions about the care of individual patients.”48 To practice in an evidence-based manner, the clinician must be able to determine what constitutes the “best” evidence. Developing proficiency at reading and interpreting the evidence in the literature related to diagnostic tests is an important skill for PTs who want to become efficient and skillful at clinical diagnosis. Many PTs will be familiar with some of the principles for determining the best evidence when examining studies comparing different interventions. Most therapists understand that the best evidence in this area comes from randomized clinical trials with relatively long-term and complete follow-up periods.9,21,58 When seeking to determine the best evidence on diagnostic tests, the rules governing the evaluation of studies regarding treatment outcomes are no longer applicable.49 Rules for judging evidence offered by a study of a diagnostic test have been described; however, these rules are not as familiar to most therapists.1,24,34,41 These rules primarily apply to two important aspects of designing or interpreting a study of a diagnostic test: the study design and data analysis.

Judging the Evidence: Study Design

The optimal design for a study examining a diagnostic test is the one that most effectively reduces susceptibility to bias (a deviation of the results from the truth in a consistent direction).14,24 The optimal design for examining a diagnostic test is “a prospective, blind comparison of the test and the reference test in a consecutive series of patients from a relevant clinical population.”34 In other words, a study investigating a diagnostic test should use a prospective design in which all subjects are evaluated by the diagnostic test and a reference standard representing the definitive, or best, criteria for the condition of interest. When performed in this manner, the results of the test and the reference standard can be summarized in a 2 × 2 table (Figure 2-1). Each subject will fit into only one box in this table. The distribution of subjects into these different boxes will then be used to determine the usefulness of the diagnostic test. Other aspects of the study design besides the basic layout are important for determining the strength of evidence offered by the study. These factors include the reference standard, the diagnostic test itself, and the patient population studied.

Reference Standard

When studying a diagnostic test, the test must be compared with a reference standard, or gold standard. The reference standard is the criterion that best defines the condition the test is attempting to detect.25 It is important to recognize, however, that reference standards are not perfect but should offer the best approximation of the condition.50 The selection and application of the reference standard are extremely important considerations in a study of a diagnostic test. If the reference standard cannot be accepted as the best method of determining whether the patient has the condition of interest, the study will not be able to provide meaningful information.26

The reference standard must be consistent with the purpose of the test. If the test is primarily being used for diagnosing pathology in a certain anatomical structure, then a reference standard related to pathoanatomy, such as a magnetic resonance image or radiograph, would be appropriate. If a test is being used to select an intervention, a reference standard related to pathoanatomy would not be appropriate. Because such tests are attempting to predict which patient will respond to a particular intervention, the reference standard needs to be related to the therapeutic outcome of the intervention.

                            Reference standard positive    Reference standard negative
Diagnostic test positive    True-positive results (A)      False-positive results (B)
Diagnostic test negative    False-negative results (C)     True-negative results (D)

FIGURE 2-1 Contingency table created by comparing the results of the diagnostic test and the reference standard.
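
As a minimal sketch of how the contingency table in Figure 2-1 is filled, the following Python function tallies paired results into the four cells. The function name `tally_2x2` and the six subjects are hypothetical and serve only as an illustration; they are not taken from any study cited in this chapter.

```python
from collections import Counter


def tally_2x2(test_results, reference_results):
    """Count cells a, b, c, d of the 2 x 2 table from paired boolean results."""
    if len(test_results) != len(reference_results):
        raise ValueError("each subject needs both a test and a reference result")
    cells = Counter()
    for test_positive, reference_positive in zip(test_results, reference_results):
        if test_positive and reference_positive:
            cells["a"] += 1  # true positive
        elif test_positive and not reference_positive:
            cells["b"] += 1  # false positive
        elif not test_positive and reference_positive:
            cells["c"] += 1  # false negative
        else:
            cells["d"] += 1  # true negative
    return {cell: cells[cell] for cell in "abcd"}


# Hypothetical results for six subjects (True = positive); illustration only.
test = [True, True, False, False, True, False]
reference = [True, False, True, False, True, False]
print(tally_2x2(test, reference))  # {'a': 2, 'b': 1, 'c': 1, 'd': 2}
```

Each subject contributes to exactly one cell, which is why the four counts always sum to the number of subjects.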


A study investigating tests for carpal tunnel syndrome provides a good example of this distinction in reference standards.4 One test that was examined in this study was Phalen’s test. This test is typically used to diagnose compression of the median nerve in the carpal tunnel. The authors of the study, however, also hypothesized that a positive Phalen’s test result may be helpful in determining that a patient may respond to wrist splinting. To assess Phalen’s test for both of these purposes, two different reference standards were needed—one to represent the pathoanatomical purpose of the test and the second to represent its role in selecting an intervention. The authors chose to use a nerve conduction velocity study as a pathoanatomical reference standard and the response of symptoms to 2 weeks of splinting as the intervention reference standard. This second reference standard permitted an examination of the usefulness of Phalen’s test in determining if splinting should be performed, regardless of its ability to diagnose pathology in the median nerve. The results of a study that uses a reference standard reflecting one purpose cannot be generalized to other possible uses of the test. For example, Spurling’s test is typically described as a test for cervical radiculopathy.57 One study that examined the validity of Spurling’s test compared the results against a reference standard of subject-reported neck pain present during the week preceding the examination.53 By using this reference standard, the authors conceptualized the tests as essentially screening procedures designed to distinguish between individuals with or without a recent history of neck pain. This is not the reason why most PTs perform Spurling’s test during an examination. Therapists typically use the test to help determine if a cervical nerve root lesion is present. Therapists may use the results of Spurling’s test to make an intervention decision. 
For example, some therapists may consider a positive Spurling’s test result an indication to perform cervical traction.40 A reference standard of self-reported neck pain makes the results difficult to interpret because it is inconsistent with what Spurling’s test is used for. Examining the reference standard and ensuring its consistency with the purpose of the test are essential for evaluating diagnostic test studies. Although the majority of studies in the literature use pathoanatomical reference standards, physical therapists are often concerned with issues related to classification and outcome efficacy. If the reference standard is inappropriate for the purpose of the test, the study will not provide useful results.

Other factors related to the reference standard are important to consider. The reference standard must be consistently applied to all subjects in the study. For example, one study examined screening examinations performed by nurses using goniometry to detect cerebral palsy in preterm infants.44 Infants with a high suspicion of cerebral palsy were referred to a neurologist whose evaluation served as the reference standard, whereas a less rigorous reference standard consisting of chart reviews was used for the remaining subjects.44 Because it is doubtful that chart reviews diagnose cerebral palsy with the same accuracy as a clinical examination, this study is susceptible to bias, which can lead to an overestimation of the diagnostic value of a test.34,46

In addition, the reference standard should be judged by an individual who is unaware of, or blinded to, the diagnostic test results and the overall clinical presentation of the subject. If blinding is not maintained, judgments of the reference standard may be influenced by expectations based on knowledge of the test results.16 Review bias occurs in situations when either the reference standard or the diagnostic test is judged by an individual with knowledge of the other result.46 For example, in a study by Lauder et al,32 various clinical diagnostic tests were compared against a reference standard of electrodiagnostic testing to determine their utility in diagnosing lumbar radiculopathy. In the study, it is unclear if the individual performing the diagnostic tests was aware of the results of the electrodiagnostic studies. Clearly, if the examiner was aware of the electrodiagnostic test results, this knowledge could have influenced the judgment of the diagnostic test results.

Diagnostic Tests

The diagnostic tests being studied must be described in sufficient detail to allow the reader to understand and replicate the procedures. The actual physical performance of the test also needs to be described because the same test may be performed differently by different examiners. A study’s results can only be generalized to the test as it was performed in the study. For example, Levangie33 examined the diagnostic usefulness of pelvic asymmetry for detecting the presence of LBP among subjects referred to physical therapy. If asked how to test for the presence of pelvic asymmetry, most PTs would probably describe the palpation of certain bony landmarks, with the patient standing or sitting. In this study, however, pelvic asymmetry was determined by using a pelvic inclinometer to assess iliac crest height. It cannot be assumed that determination of pelvic asymmetry with palpation would yield similar results.

Description of the diagnostic test should cover physical performance and the criteria defining positive and negative results. Many tests commonly used by PTs have variable or unclear grading criteria. Determining the presence of centralization in patients with LBP is an example. What constitutes a positive finding of centralization varies. Some use definitions strictly based on movement of symptoms from distal to proximal.13,38 Others have defined centralization to include diminishment of pain during testing.29 Such disagreements point out the need to clarify how positive and negative results are defined within a particular study. Grading the diagnostic test is also susceptible to review bias if the individual judging the test is aware of the results on the reference standard. If this blinding is not maintained, the usefulness of the test is likely to be somewhat overestimated.34

Study Population

The subjects of a diagnostic test study are an important consideration. The subjects should be similar to the patients to whom a therapist would consider applying the test in clinical practice. Some people in the study will end up having the condition of interest, whereas others will not. Furthermore, those who do have the condition should reflect a continuum of severity from mild to severe.26 Those who do not have the condition should have similar symptoms.23 Unfortunately, many studies include some healthy subjects. Any diagnostic test will look more useful than it really is when it is used to distinguish between healthy individuals and those with severe conditions.34 Spectrum bias occurs when study subjects are not representative of the population in whom the test is typically applied in practice.34 Spectrum bias can profoundly affect the results of a study. The best way to avoid spectrum bias is to use a prospective design in which a consecutive group of subjects from a clinical setting is studied.1

Comparing the study by Burke et al4 with another study examining the value of Phalen’s test for diagnosing carpal tunnel syndrome illustrates the concern over spectrum bias. A study by Gellman et al15 also compared Phalen’s test against a reference standard of nerve conduction velocity. The only substantial difference between the two studies was the subjects. All the subjects in the Burke et al study had symptoms consistent with carpal tunnel syndrome.4 The study by Gellman et al15 involved subjects with symptoms consistent with carpal tunnel syndrome but also included a group of 50 hands that were asymptomatic. Inclusion of hands without symptoms creates a spectrum bias. Not surprisingly, the results of this study demonstrated much greater diagnostic accuracy for Phalen’s test than the study relatively free from spectrum bias.
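
The direction of spectrum bias can be illustrated with a small calculation. The sketch below uses entirely hypothetical counts (they are not the data of Burke et al or Gellman et al) and assumes that asymptomatic subjects are negative on the reference standard and almost always negative on the test. Adding such subjects increases the number of true-negative results, so specificity rises even though the test itself has not changed.

```python
def sensitivity_specificity(a, b, c, d):
    """Sensitivity and specificity from 2 x 2 cell counts (a=TP, b=FP, c=FN, d=TN)."""
    return a / (a + c), d / (b + d)


# Hypothetical cohort in which every subject has symptoms.
clinical_only = dict(a=30, b=20, c=10, d=40)

# Same cohort plus 50 hypothetical asymptomatic subjects, all negative on the
# reference standard: 2 test positive (added to b), 48 test negative (added to d).
with_asymptomatic = dict(a=30, b=22, c=10, d=88)

sens_clinical, spec_clinical = sensitivity_specificity(**clinical_only)
sens_mixed, spec_mixed = sensitivity_specificity(**with_asymptomatic)

print(round(sens_clinical, 2), round(spec_clinical, 2))  # 0.75 0.67
print(round(sens_mixed, 2), round(spec_mixed, 2))        # 0.75 0.8
```

In this illustration sensitivity is unchanged because no subjects with the condition were added; only the apparent ability of the test to identify those without the condition is inflated.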

Using the Data: Analysis

The basic layout of the results from a study of a diagnostic test is shown in Figure 2-1. The result for each subject fits into only one of the four categories on the basis of a comparison of the results of the diagnostic test and the reference standard. The defining characteristics of the four categories are:
• True-positive (a): subjects who are positive on both the reference standard and the diagnostic test
• False-positive (b): subjects who are negative on the reference standard but positive on the diagnostic test
• False-negative (c): subjects who are positive on the reference standard but negative on the diagnostic test
• True-negative (d): subjects who are negative on both the reference standard and the diagnostic test

From this layout several statistics can be calculated that are useful for understanding the value of a diagnostic test (Table 2-2).16

Sensitivity, Specificity, and Predictive Values

Sensitivity and specificity values are calculated vertically from the 2 × 2 table and represent the proportion of correct diagnostic test results among individuals with and without the condition. Sensitivity (or true-positive rate) is the proportion of true-positive subjects among all subjects who have the condition of interest. Specificity (or true-negative rate) is the proportion of true-negative subjects among all subjects without the condition.50 Predictive values are calculated horizontally from the table and represent the proportion of positive or negative diagnostic test results that are correct. The positive predictive value is the proportion of true-positive subjects among all subjects with a positive diagnostic test. The negative predictive value is the proportion of true-negative subjects among all subjects with a negative diagnostic test.18

TABLE 2-2
Statistics Commonly Used in Studies of Diagnostic Tests

Statistic                   Formula                          Description
Positive predictive value   a/(a + b)                        Given a positive test result, the probability that the individual has the condition.
Negative predictive value   d/(c + d)                        Given a negative test result, the probability that the individual does not have the condition.
Sensitivity                 a/(a + c)                        Given that the individual has the condition, the probability that the test will be positive.
Specificity                 d/(b + d)                        Given that the individual does not have the condition, the probability that the test will be negative.
Positive LR                 sensitivity/(1 – specificity)    Given a positive test result, the increase in odds favoring the condition.
Negative LR                 (1 – sensitivity)/specificity    Given a negative test result, the decrease in odds favoring the condition.

LR, Likelihood ratio.

The predictive values are generally of less value in interpreting the usefulness of a test because they depend highly on the prevalence of the condition of interest in the study population. Positive predictive values will be lower and negative predictive values higher in study populations with a low prevalence of the condition. If prevalence is high, the trends reverse.23 Sensitivity and specificity values remain fairly consistent across different prevalence levels50 and are preferred over predictive values.

Sensitivity and specificity values provide useful information for interpreting diagnostic tests. For example, a test with high sensitivity has relatively few false-negative results. High test sensitivity therefore attests to the value of a negative test result.51,54 In other words, if a test has high sensitivity, few false-negative results will be found, and therefore the examiner can have some level of trust that the negative result actually represents the absence of the condition. Sackett et al50 have advocated the acronym SnNout (if sensitivity is high, a negative result is useful for ruling out the condition). High sensitivity indicates that a test is useful for excluding, or ruling out, a condition when it is negative but does not address the value of a positive test. A diagnostic test with high specificity has relatively few false-positive results and therefore speaks to the value of a positive test result.51,54 The acronym advocated is SpPin (if specificity is high, a positive result is useful for ruling in the condition).50 Unfortunately few diagnostic tests have both high sensitivity and high specificity.

Knowledge of sensitivity and specificity of a diagnostic test can improve clinical decision-making by helping clinicians weigh the value of both positive and negative results. A study examining history and physical examination findings in predicting rotator cuff tears in older patients provides an illustration.35 Numerous diagnostic tests were compared against a reference standard of shoulder arthrogram.
No test had high levels of both sensitivity and specificity (Table 2-3). A painful arc during passive elevation of the arm was the most sensitive, and limited external rotation passive range of motion to less than 70° was the most specific.35 The high sensitivity (97.5%) of the presence of a painful arc indicates that this finding is useful for ruling out a rotator cuff tear; however, the low specificity (9.9%) indicates that a positive painful arc has little meaning. Few false-negative results are found when testing for a painful arc, and it would be unlikely that the patient actually has a rotator cuff tear if a painful arc is not present. Conversely, limited external rotation was highly specific (83.6%), indicating that a positive test is useful for confirming a rotator cuff tear. The sensitivity of limited external rotation was poor (19%), indicating a lack of value for a negative test result.

TABLE 2-3
Diagnostic Efficacy of Clinical Tests for Detecting Rotator Cuff Tears

Diagnostic Test                                  Sensitivity (%)   Specificity (%)   Positive LR   Negative LR
Presence of night pain                           87.7              19.7              1.09          0.62
Presence of supraspinatus atrophy                55.6              72.9              2.05          0.61
Shoulder elevation PROM <170°                    30.2              78.1              1.38          0.89
Shoulder external rotation PROM <70°             19.0              83.6              1.16          0.97
Neer impingement sign                            97.2              9.0               1.07          0.31
Weakness with external rotation strength test    75.9              57.3              1.78          0.75
Painful arc during elevation PROM                97.5              9.9               1.08          0.25

From Litaker D, Pioro M, El Bilbeisi H, et al: Returning to the bedside: using the history and physical examination to identify rotator cuff tears, JAGS 48:1633-1637, 2000.
LR, Likelihood ratio; PROM, Passive range of motion.
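
The proportions defined in Table 2-2, and the dependence of predictive values on prevalence, can be checked with a short calculation. The following Python sketch is an illustration under stated assumptions: the function names, the cell counts, and the test with sensitivity 0.80 and specificity 0.90 are all hypothetical and are not drawn from the studies cited here.

```python
def diagnostic_statistics(a, b, c, d):
    """Table 2-2 proportions from the four cell counts of the 2 x 2 table."""
    return {
        "sensitivity": a / (a + c),
        "specificity": d / (b + d),
        "positive_predictive_value": a / (a + b),
        "negative_predictive_value": d / (c + d),
    }


def predictive_values(sensitivity, specificity, prevalence):
    """Predictive values implied by a fixed sensitivity and specificity at a given prevalence."""
    true_pos = sensitivity * prevalence
    false_neg = (1 - sensitivity) * prevalence
    false_pos = (1 - specificity) * (1 - prevalence)
    true_neg = specificity * (1 - prevalence)
    ppv = true_pos / (true_pos + false_pos)
    npv = true_neg / (true_neg + false_neg)
    return ppv, npv


# Hypothetical cell counts, chosen only for illustration.
stats = diagnostic_statistics(a=40, b=10, c=10, d=40)

# Hypothetical test (sensitivity 0.80, specificity 0.90) at two prevalence levels.
low_ppv, low_npv = predictive_values(0.80, 0.90, prevalence=0.10)
high_ppv, high_npv = predictive_values(0.80, 0.90, prevalence=0.50)
print(round(low_ppv, 2), round(low_npv, 2))    # 0.47 0.98
print(round(high_ppv, 2), round(high_npv, 2))  # 0.89 0.82
```

With sensitivity and specificity held constant, the positive predictive value is lower and the negative predictive value higher at the lower prevalence, which is the pattern described in the text.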

TABLE 2-4
A Guide to Interpretation of LR Values

Positive LR   Negative LR   Interpretation
>10           <0.10         Generate large and often conclusive shifts in probability
5-10          0.1-0.2       Generate moderate shifts in probability
2-5           0.2-0.5       Generate small, but sometimes important, shifts in probability
1-2           0.5-1         Alter probability to a small, and rarely important, degree

From Jaeschke R, Guyatt GH, Sackett DL: Users’ guides to the medical literature. III. How to use an article about a diagnostic test. B. What are the results and will they help me in caring for my patients? JAMA 271:703-707, 1994.

Likelihood Ratios

Sensitivity and specificity values provide helpful information; however, they do not provide a complete picture. The actual performance of a diagnostic test is related to sensitivity and specificity values and also depends on the pretest probability that the condition is present. Useful tests should produce large shifts in probability once the result of the test is known.10,31,36 Sensitivity and specificity values cannot quantify shifts in the probability given a certain test result. The best statistics for quantifying shifts in probability, based on the results of a diagnostic test, are likelihood ratios.3,27

Likelihood ratios (LRs) combine sensitivity and specificity values into a value that can be used to quantify shifts in probability once the diagnostic test result is known.56 The positive LR is calculated as sensitivity/(1 – specificity) and indicates the increase in odds favoring the condition given a positive test result. The negative LR is calculated as (1 – sensitivity)/specificity and indicates the change in odds favoring the condition given a negative test result.24 An LR value of 1 indicates the test result does nothing to change the odds favoring the condition, whereas an LR value greater than 1 increases the odds of the condition and an LR value of less than 1 diminishes the odds of the condition. Table 2-4 provides a guide for interpreting the strength of an LR.27

A diagnostic test with a large positive LR (e.g., 5.0) indicates that the shift in odds favoring the condition will be relatively large when the diagnostic test is positive. It is therefore desirable for a test to have a large positive LR value. In general, diagnostic tests with high levels of specificity will also have large positive LR values because both attest to the usefulness of the positive test result. The negative LR value indicates the change in odds favoring the condition given a negative diagnostic test result. Because a negative test result is supposed to reduce the odds that a condition is present, it is desirable for a test to have a small (e.g., 0.20) negative LR value. A small negative LR indicates a diagnostic test that is useful for ruling out a condition when the result is negative. Tests with high sensitivity values generally have small negative LR values.

Examining the tests for rotator cuff tears discussed earlier provides an example of the importance of combining sensitivity and specificity values (see Table 2-3). The most sensitive test was a painful arc (97.5%), and this test also had the smallest negative LR. The most specific test was external rotation range of motion (83.6%); however, the positive LR value was greater for the presence of supraspinatus atrophy (2.05 vs. 1.16). This is because the sensitivity value for the finding of supraspinatus atrophy was much better than the sensitivity for external rotation range of motion limitation.
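
The LR formulas can be applied directly to the sensitivity and specificity values reported in Table 2-3. The Python sketch below is offered as an illustration: the function names are hypothetical, and the way the Table 2-4 bands are coded (treating a negative LR by its reciprocal and assigning boundary values to the stronger band below 10) is an assumption made for the example rather than a rule stated in the source.

```python
def likelihood_ratios(sensitivity, specificity):
    """Return (positive LR, negative LR); inputs are proportions strictly between 0 and 1."""
    positive_lr = sensitivity / (1 - specificity)
    negative_lr = (1 - sensitivity) / specificity
    return positive_lr, negative_lr


def shift_in_probability(lr):
    """Label an LR (> 0) using the Table 2-4 bands; boundary handling is an assumption."""
    magnitude = lr if lr >= 1 else 1 / lr  # e.g., 0.2 and 5 are treated as equally strong
    if magnitude > 10:
        return "large"
    if magnitude >= 5:
        return "moderate"
    if magnitude >= 2:
        return "small"
    return "rarely important"


# Sensitivity and specificity from Table 2-3, expressed as proportions.
table_2_3 = {
    "supraspinatus atrophy": (0.556, 0.729),
    "external rotation PROM < 70 degrees": (0.190, 0.836),
    "painful arc": (0.975, 0.099),
}

for finding, (sens, spec) in table_2_3.items():
    pos_lr, neg_lr = likelihood_ratios(sens, spec)
    print(finding, round(pos_lr, 2), round(neg_lr, 2),
          shift_in_probability(pos_lr), shift_in_probability(neg_lr))
```

The rounded values reproduce the LRs listed in Table 2-3 for these three findings (2.05 and 0.61, 1.16 and 0.97, 1.08 and 0.25). Under the assumed banding, none of them exceeds a small shift in probability.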

Using the Data: Interpretation

The diagnostic process requires therapists to think in terms of probability and revision of probabilities. Before performing a diagnostic test, a therapist will have some idea of the likelihood that the patient being evaluated has the condition of interest. Although this probability is rarely articulated or quantified in the therapist’s mind, all clinicians develop at least a sense that certain conditions are more likely, and others less likely, for certain patients. The condition of interest in the therapist’s mind may be related to pathology or pathoanatomy; for instance, is it likely that this patient has a cervical disk lesion that is causing his arm pain? The condition of interest being considered by the therapist may involve treatment decision making; for instance, will this patient’s arm pain be relieved with traction treatments?

The therapist also has some threshold level of certainty, at which point he or she will be “sure enough” and ready to act.36,42 Again, this threshold is typically not quantified, but there is some amount of assurance that any therapist must reach before an action is taken with the patient. The threshold is a function of the costs associated with making an incorrect decision versus the benefits of being correct.2,43 For example, a high threshold of certainty would be required if the question involved ruling out metastatic disease in the lung as a source of arm pain. If a therapist had any lingering doubts about such a diagnosis, it would be incumbent to refer the patient for further diagnostic workup before pursuing physical therapy. On the other hand, if the question concerned the application of a treatment with minimal cost and low potential for risk, such as mechanical cervical traction, the threshold for action would be lower. A therapist will likely be willing to initiate traction treatment if he or she is fairly certain the patient may benefit and if there is not greater certainty that the patient would benefit from an alternative treatment. LRs provide the information needed to select the diagnostic test or tests that will most efficiently move the therapist from the uncertainty associated with the pretest probability to a posttest probability that crosses a threshold for action.
The probability that a patient has a particular condition before performing the diagnostic test can come from sources other than the clinical experience and expertise of the examiner. Other sources of pretest probabilities include epidemiologic data on prevalence rates for certain conditions, the prevalence of the condition in studies examining diagnostic test properties, clinical databases, and information already obtained on the patient from the examination.2 Whatever the source of the pretest probability, LR values quantify the direction and magnitude of change in the pretest probability on the basis of the diagnostic test result.25

To illustrate the process, consider the case of a 37-year-old male patient with a 1-week history of LBP and right buttock pain that does not extend below the knee. The question is one related to treatment decision-making: “Is this patient likely to respond to a manipulation intervention?” What is a reasonable pretest probability that the patient will respond to manipulation? On the basis of a randomized trial demonstrating that many patients with LBP will respond to manipulation22,37 as well as clinical experience, the probability may be fairly high, perhaps 60%. What information should be gathered to alter this probability? To answer this question, consider the results of a recent study that examined the usefulness of various diagnostic tests against a reference standard of success with manipulation (defined as a 50% decrease in self-reported disability occurring over two treatment sessions).12 The results of this study (Table 2-5) show the best test would be asking the patient how long the symptoms have been present (positive LR = 4.4 for a duration of 15 days or less). It is not uncommon that factors from the history prove more useful than those from the physical examination.

If the test is positive (i.e., the duration of symptoms is less than 16 days), what should the new probability of success with manipulation be? Two methods can be used to make this determination. The simpler but somewhat less precise method uses a nomogram (Figure 2-2).11 A straight edge is anchored along the left side at the point representing the pretest probability. The straight edge is then aligned with the appropriate LR value (4.4 in this example), and the line is then extended through the right side of the nomogram.
The point of intersection on the right side indicates the posttest probability.50 In this example, if the duration of the patient’s symptoms was less than 16 days, the posttest probability of success with manipulation appears to be approximately 83%. The posttest probability can be quantified with greater precision by using a calculation process described by Sackett et al50 and outlined in Box 2-1.

TABLE 2-5 Diagnostic Usefulness of Various Signs and Symptoms for Determining if a Patient with Low Back Pain Will Respond to a Manipulation Technique

| Test | Sensitivity | Specificity | Positive LR | Negative LR |
| --- | --- | --- | --- | --- |
| FACTORS FROM THE HISTORY | | | | |
| Duration of symptoms ≤15 days | 0.56 | 0.87 | 4.4 | 0.51 |
| Symptom distribution not distal to the knee | 0.88 | 0.36 | 1.4 | 0.33 |
| Episodes of LBP not becoming more frequent | 0.75 | 0.44 | 1.3 | 0.59 |
| FACTORS FROM THE PHYSICAL EXAMINATION | | | | |
| Hypomobility with prone spring testing in at least one lumbar segment | 0.97 | 0.23 | 1.3 | 0.13 |
| Hip internal rotation PROM greater than 35° in at least one hip | 0.50 | 0.85 | 3.3 | 0.59 |
| No peripheralization during lumbar standing AROM | 0.84 | 0.33 | 1.3 | 0.48 |

Adapted from Flynn T, Fritz J, Whitman J, et al: A clinical prediction rule for classifying patients with low back pain who demonstrate short term improvement with spinal manipulation, Spine 27:2835-2843, 2002. LBP, Low back pain; PROM, passive range of motion; AROM, active range of motion.
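As a check on how the LR columns of Table 2-5 relate to the sensitivity and specificity columns, the standard definitions (positive LR = sensitivity / (1 − specificity); negative LR = (1 − sensitivity) / specificity) can be applied to the tabulated values. This is only a sketch: the function name is illustrative, and because the inputs are rounded to two decimals, the recomputed LRs can differ slightly from the published ones (for example, roughly 4.3 rather than 4.4 for duration of symptoms), which were presumably calculated from unrounded study data.

```python
def likelihood_ratios(sensitivity: float, specificity: float) -> tuple:
    """Return (positive LR, negative LR) from sensitivity and specificity."""
    if not (0.0 < sensitivity < 1.0 and 0.0 < specificity < 1.0):
        raise ValueError("sensitivity and specificity must lie strictly between 0 and 1")
    positive_lr = sensitivity / (1.0 - specificity)
    negative_lr = (1.0 - sensitivity) / specificity
    return positive_lr, negative_lr


# Rounded (sensitivity, specificity) pairs as listed in Table 2-5.
table_2_5 = {
    "Duration of symptoms <= 15 days": (0.56, 0.87),
    "Symptom distribution not distal to the knee": (0.88, 0.36),
    "Episodes of LBP not becoming more frequent": (0.75, 0.44),
    "Hypomobility with prone spring testing": (0.97, 0.23),
    "Hip internal rotation PROM > 35 degrees": (0.50, 0.85),
    "No peripheralization during lumbar standing AROM": (0.84, 0.33),
}

for test, (sens, spec) in table_2_5.items():
    pos_lr, neg_lr = likelihood_ratios(sens, spec)
    print(f"{test}: positive LR {pos_lr:.1f}, negative LR {neg_lr:.2f}")
```

The pattern in the output mirrors the table: a highly specific finding (short symptom duration) yields the largest positive LR, whereas a highly sensitive finding (hypomobility on spring testing) yields the smallest negative LR.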


FIGURE 2-2 Nomogram for estimating posttest probability of a diagnosis. The left scale shows the pretest probability (percentage), the center scale the likelihood ratio, and the right scale the posttest probability (percentage). (From Fagan TJ: Nomogram for Bayes’s theorem, N Engl J Med 293:257, 1975.)

By using these calculations, the pretest probability of 60% would correspond to pretest odds of 1.5:1. Multiplying this by the LR value of 4.4, the posttest odds would be 6.6:1. Converting this back to probability results in a posttest probability of 87%.

Examples such as this highlight the importance of attending to the most important examination findings for clinical decision-making. If the examiner had instead focused on the lack of symptoms distal to the knee as confirming evidence that this patient was likely to benefit from manipulation, the posttest probability would increase only to 68%. Without knowledge of the relative unimportance of this finding, the therapist might overinterpret the finding.

If the pretest probability were lower, the therapist may instead want to seek a finding that would confirm that the patient does not need manipulation but instead would benefit more from another type of intervention.6 For example, suppose the patient told the therapist that for previous episodes of LBP, treatment with spinal manipulation had not been successful. In this case, the therapist would likely believe the pretest probability of success with a manipulation technique to be much lower, perhaps as low as 15%. In this circumstance, the finding that the current duration of symptoms was less than 16 days would only increase the probability of success to 44%. The therapist may be better served by using the test with the smallest negative LR value because if this finding is negative, it is likely that the posttest probability will be small enough to exclude manipulation as a treatment option and move on to other considerations. The test with the smallest negative LR value was prone posterior-to-anterior spring testing over the spinous processes in the lumbar spine (see Table 2-5). If this testing did not reveal any hypomobility, the posttest probability of success with manipulation would be only 2%.

Summary

LRs provide the most powerful tool for quantifying the importance of a particular test within the diagnostic process. Because LR values can be calculated for both positive and negative results, the importance of both positive and negative test results can be examined independently. This is important because few tests provide useful information in both capacities, and understanding the relative strength of evidence provided by a negative or a positive test result helps to refine interpretation of the diagnostic test. Understanding the information contained in statistics, such as sensitivity, specificity, and LRs, can assist therapists in improving their diagnostic and decision-making skills. Developing these skills is paramount for therapists working in primary care settings.

BOX 2-1 Calculation of Posttest Probability

Step 1: Convert the pretest probability to pretest odds:

$$\text{Pretest odds} = \frac{\text{Pretest probability}}{1 - \text{Pretest probability}}$$

Step 2: Multiply the pretest odds by the LR value:

$$\text{Pretest odds} \times \text{LR} = \text{Posttest odds}$$

Step 3: Convert the posttest odds to posttest probability:

$$\text{Posttest probability} = \frac{\text{Posttest odds}}{\text{Posttest odds} + 1}$$
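The three steps of Box 2-1 can be sketched as a short function. The function name and the input checks are illustrative rather than part of the original box; the example inputs are the pretest probabilities and LR values used in the chapter's manipulation case.

```python
def posttest_probability(pretest_prob: float, lr: float) -> float:
    """Apply the three steps of Box 2-1 and return the posttest probability."""
    if not 0.0 <= pretest_prob < 1.0:
        raise ValueError("pretest probability must be at least 0 and less than 1")
    if lr < 0.0:
        raise ValueError("LR must be non-negative")
    pretest_odds = pretest_prob / (1.0 - pretest_prob)   # Step 1
    posttest_odds = pretest_odds * lr                    # Step 2
    return posttest_odds / (posttest_odds + 1.0)         # Step 3


# Scenarios from the manipulation example, with LR values taken from Table 2-5.
examples = [
    ("pretest 60%, symptoms < 16 days (positive LR 4.4)", 0.60, 4.4),
    ("pretest 60%, no symptoms distal to the knee (positive LR 1.4)", 0.60, 1.4),
    ("pretest 15%, symptoms < 16 days (positive LR 4.4)", 0.15, 4.4),
    ("pretest 15%, no hypomobility on spring testing (negative LR 0.13)", 0.15, 0.13),
]

for label, pretest, lr in examples:
    print(f"{label}: {posttest_probability(pretest, lr):.0%}")
```

Run as written, the four scenarios give posttest probabilities of 87%, 68%, 44%, and 2%, matching the values quoted in the text.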

REFERENCES

1. Begg CB: Methodologic standards for diagnostic test assessment studies, J Gen Intern Med 3:518-520, 1988.
2. Bernstein J: Decision analysis, J Bone Joint Surg 79-A:1404-1414, 1997.
3. Boyko EJ: Ruling out or ruling in disease with the most sensitive or specific diagnostic test, Med Decis Making 14:175-179, 1994.
4. Burke DT, Burke MA, Bell R, et al: Subjective swelling: a new sign for carpal tunnel syndrome, Am J Phys Med Rehabil 78:504-508, 1999.
5. Charnley J: Orthopaedic signs in the diagnosis of disc protrusion, Lancet 1:186-192, 1951.
6. Delitto A, Snyder-Mackler L: The diagnostic process: examples in orthopedic physical therapy, Phys Ther 75:203-211, 1994.
7. Delitto A, Erhard RE, Bowling RW: A treatment-based classification approach to low back syndrome: identifying and staging patients for conservative management, Phys Ther 75:470-489, 1995.
8. Deyo RA, Haselkorn J, Hoffman R, et al: Designing studies of diagnostic tests for low back pain or radiculopathy, Spine 19(suppl):2057s-2063s, 1994.
9. Dickersin K, Scherer R, Lefebvre C: Identifying relevant studies for systematic reviews, BMJ 309:1286-1291, 1994.
10. Dujardin B, Van den Ende J, Van Gompel A, et al: Likelihood ratios: a real improvement for clinical decision making? Eur J Epidemiol 10:29-36, 1994.
11. Fagan TJ: Nomogram for Bayes’s theorem, N Engl J Med 293:257, 1975.
12. Flynn T, Fritz J, Whitman J, et al: A clinical prediction rule for classifying patients with low back pain who demonstrate short term improvement with spinal manipulation, Spine 27:2835-2843, 2002.
13. Fritz JM, Delitto A, Vignovic M, et al: Inter-rater reliability of judgments of the centralization phenomenon and status change during movement testing in patients with low back pain, Arch Phys Med Rehabil 81:57-61, 2000.
14. Geddes JR, Harrison PJ: Closing the gap between research and practice, Br J Psychiatry 171:220-225, 1997.
15. Gellman H, Gelberman RH, Tan AM, et al: Carpal tunnel syndrome: an evaluation of the provocative diagnostic tests, J Bone Joint Surg 68-A:735-737, 1986.
16. Greenhalgh T: How to read a paper: papers that report diagnostic or screening tests, BMJ 315:540-543, 1997.
17. Grieve GP: Treating backache: a topical comment, Physiotherapy 69:316, 1983.
18. Griner PF, Mayewski RJ, Mushlin AI, et al: Selection and interpretation of diagnostic tests and procedures. Principles and applications, Ann Intern Med 94:557-592, 1981.
19. Guccione AA: Physical therapy diagnosis and the relationship between impairments and function, Phys Ther 71:499-504, 1991.
20. Guide to Physical Therapist Practice, ed 2, Phys Ther 81:9-746, 2001.
21. Guyatt GH, Sackett DL, Cook DJ: Users’ guide to the medical literature: II. How to use an article about therapy or prevention: A. Are the results of the study valid? JAMA 270:2598-2601, 1993.
22. Hadler NM, Curtis P, Gillings DB, et al: A benefit of spinal manipulation as an adjunctive therapy for acute low-back pain: A stratified controlled study, Spine 12:703-705, 1987.
23. Hagen MD: Test characteristics: how good is that test? Med Decis Making 22:213-233, 1995.
24. Irwig L, Tosteson ANA, Gatsonis C, et al: Guidelines for meta-analyses evaluating diagnostic tests, Ann Intern Med 120:667-676, 1994.
25. Jaeschke RZ, Meade MO, Guyatt GH, et al: How to use diagnostic test articles in the intensive care unit: diagnosing weanability using f/Vt, Crit Care Med 25:1514-1521, 1997.
26. Jaeschke R, Guyatt G, Sackett DL: Users’ guides to the medical literature. III. How to use an article about a diagnostic test. A. Are the results of the study valid? JAMA 271:389-391, 1994.
27. Jaeschke R, Guyatt GH, Sackett DL: Users’ guides to the medical literature. III. How to use an article about a diagnostic test. B. What are the results and will they help me in caring for my patients? JAMA 271:703-707, 1994.
28. Jette AM: Diagnosis and classification by physical therapists: a special communication, Phys Ther 69:967-969, 1989.
29. Karas R, McIntosh G, Hall H, et al: The relationship between nonorganic signs and centralization of symptoms in the prediction of return to work for patients with low back pain, Phys Ther 77:354-360, 1997.
30. Khuffash B, Porter RW: Cross leg pain and trunk list, Spine 14:602-603, 1989.
31. Kortelainen P, Puranen J, Koivisto E, et al: Symptoms and signs of sciatica and their relation to the localization of the lumbar disc herniation, Spine 10:88-92, 1985.
32. Lauder T, Dillingham R, Andary M, et al: Effect of history and exam in predicting electrodiagnostic outcome among patients with suspected lumbosacral radiculopathy, Am J Phys Med Rehabil 79:60-68, 2000.
33. Levangie PK: The association between static pelvic asymmetry and low back pain, Spine 24:1234-1241, 1999.
34. Lijmer JG, Mol BW, Heisterkamp S, et al: Empirical evidence of design-related bias in studies of diagnostic tests, JAMA 282:1061-1066, 1999.
35. Litaker D, Pioro M, El Bilbeisi H, et al: Returning to the bedside: using the history and physical examination to identify rotator cuff tears, JAGS 48:1633-1637, 2000.
36. Lurie JD, Sox HC: Principles of medical decision making: spine update, Spine 24:493-498, 1999.
37. Meade TW, Dyer S, Browne W, et al: Randomised comparison of chiropractic and hospital outpatient management for low back pain: results from extended follow up, BMJ 311:349-351, 1995.
38. McKenzie RA: The lumbar spine: mechanical diagnosis and therapy, Waianae, New Zealand, 1989, Spinal Publications.
39. McKenzie RA: Manual correction of sciatic scoliosis, NZ Med J 76:194-199, 1972.
40. Moeti P, Marchetti G: Clinical outcome from mechanical intermittent cervical traction for the treatment of cervical radiculopathy: a case series, J Orthop Sports Phys Ther 31:207-213, 2001.
41. Mulrow CD, Linn WD, Gaul MK, et al: Assessing quality of diagnostic test evaluation, J Gen Intern Med 4:288-295, 1989.
42. Pauker SG, Kassirer JP: The threshold approach to clinical decision making, N Engl J Med 302:1109-1117, 1980.
43. Pauker SG, Kassirer JP: Therapeutic decision-making: a cost benefit analysis, N Engl J Med 293:229-234, 1975.
44. Pinto-Martin JA, Torre C, Zhao H: Nurse screening of low-birthweight infants for cerebral palsy using goniometry, Nurs Res 46:284-287, 1997.
45. Porter RW, Miller CG: Back pain and trunk list, Spine 11:596-600, 1986.
46. Reid MC, Lachs MS, Feinstein AR: Use of methodological standards in diagnostic test research: getting better but still not good, JAMA 274:645-651, 1995.
47. Rose SJ: Physical therapy diagnosis: role and function, Phys Ther 69:535-537, 1989.
48. Sackett DL, Richardson WS: Evidence based medicine: what it is and what it isn’t, BMJ 312:71-72, 1996.
49. Sackett DL, Wennberg JE: Choosing the best research design for each question. It’s time to stop squabbling over the “best” methods, BMJ 315:1636, 1997.
50. Sackett DL, Haynes RB, Guyatt GH, et al: Clinical epidemiology: a basic science for clinical medicine, ed 2, Boston, 1992, Little, Brown.
51. Sackett DL: A primer on the precision and accuracy of the clinical examination, JAMA 267:2638-2644, 1992.
52. Sahrmann SA: Diagnosis by the physical therapist—a prerequisite for treatment, Phys Ther 68:1703-1706, 1988.
53. Sandmark H, Nisell R: Validity of five common manual neck pain provoking tests, Scand J Rehab Med 27:131-136, 1995.
54. Schulzer M: Diagnostic tests: a statistical review, Muscle Nerve 17:815-819, 1994.
55. Schwartz JS: Evaluating diagnostic tests—what needs to be done? J Gen Intern Med 1:266-276, 1986.
56. Simel DL, Samsa GP, Matchar DB: Likelihood ratios with confidence: sample size estimation for diagnostic test results, J Clin Epidemiol 44:763-770, 1991.
57. Spurling RG, Scoville WB: Lateral rupture of the cervical intervertebral discs: a common cause of shoulder and arm pain, Surg Gynecol Obstet 78:350-358, 1944.
58. van Tulder MW, Assendelft WJ, Koes BW, et al: Method guidelines for systematic reviews in the Cochrane Collaboration back review group for spinal disorders, Spine 22:2323-2330, 1997.
59. Viikari-Juntura E, Porras M, Laasonen EM: Validity of clinical tests in the diagnosis of root compression in cervical disc disease, Spine 14:253-257, 1989.
60. Wainner RM, Fritz JM, Irrgang JJ, et al: Reliability and diagnostic accuracy of the clinical examination and patient self-report measures for cervical radiculopathy, Spine 28:52-62, 2003.