Screening Magnetic Resonance Images Versus Plain Films for Low Back Pain: A Randomized Trial of Effects on Patient Outcomes Jeffrey G. Jarvik, MD, MPH t, Richard A. Deyo, MD, MPH TM, Thomas D. Koepsell, MD, MPH 2'5
Measuring patient outcomes has become an essential aspect of clinical research. For the radiologist, however, measuring patient outcomes is difficult and often seemingly irrelevant, because the "signal" of the patient outcome is frequently drowned out by the "noise" of other factors such as the choice and efficacy of therapies. For this reason, it is important for radiologists to study the intermediate outcomes of a diagnostic technology. These outcomes are the effect of the diagnostic technology on the choice of therapy (therapeutic impact) and its effect on diagnosis (diagnostic impact). Nonetheless, the ultimate evaluation of a new technology depends on the demonstration that use of the technology improves the patient's outcome. The most commonly examined outcome is mortality. However, for many chronic conditions such as low back pain, mortality and cure are not relevant. Instead, measures of health-related quality of life are the important outcomes to characterize.

In this pilot study, we evaluated the impact of a new imaging technology, screening magnetic resonance (MR) imaging of the lumbar spine, on diagnosis, therapy, and outcomes in patients with low back pain.

Current recommendations for the diagnostic evaluation of patients with acute low back pain urge physicians to refrain from imaging tests until conservative therapy has been unsuccessful for more than 4 weeks [1]. At that point, if the patient does not have neurologic symptoms in the lower extremities, anteroposterior and lateral plain radiographs are recommended. The greatest advantage of plain radiographs is that they are relatively inexpensive. Yet this advantage is offset by their generally poor sensitivity and specificity. A screening MR examination of the lumbar spine may be a cost-effective replacement for plain films in this population.
However, the increased amount of information provided by the screening MR examination (some of which will be clinically unimportant) [2] may prompt physicians to embark on further diagnostic testing that may lead to more invasive therapy that is not necessarily beneficial to the patient.

In this article we describe the study design and the instruments being used to evaluate the intermediate outcomes of diagnostic and therapeutic importance, as well as the patient-centered outcome of health-related quality of life. Baseline measurements collected on the 25 patients enrolled in the study to date are provided, and these scores are compared with those from another ongoing prospective cohort study of patients with low back pain.

From the 1Department of Radiology, University of Washington Medical Center, Seattle, WA; 2Department of General Internal Medicine, University of Washington Medical Center, Seattle, WA; 3Clinical Scholars Program, University of Washington Medical Center, Seattle, WA; 4Health Services Research, School of Public Health, University of Washington, Seattle, WA; and 5Department of Epidemiology, School of Public Health, University of Washington, Seattle, WA. This work was supported by the Robert Wood Johnson Clinical Scholars Program, the Radiological Society of North America Research and Education Fund, and the GE-AUR Radiology Research Academic Fellowship. The opinions and conclusions are those of the authors and not necessarily those of the Robert Wood Johnson Foundation. Address reprint requests to J. G. Jarvik, MD, Department of Radiology, Box 357115, University of Washington, Seattle, WA 98195.

Acad Radiol 1996;3:S28-S31
© 1996, Association of University Radiologists
Vol. 3, Suppl. 1, April 1996

MATERIALS AND METHODS

Patient Recruitment and Enrollment
Patients are recruited from the general internal medicine and urgent care clinics at Harborview Medical Center (a large county hospital) and the Seattle Veterans Affairs Medical Center. A research assistant is stationed in the radiology department to identify all requests for plain film radiography of the lumbar spine. For those patients referred from a participating clinic, informed consent is obtained if the following eligibility criteria are met: (1) history of low back pain, (2) no low back surgery within 1 year of enrollment, (3) no history of acute external trauma, (4) no metallic hardware (e.g., Harrington rods) in the lumbar spine, (5) no contraindications for MR imaging, (6) a telephone contact available, and (7) no plans to move within the next 6 months. After enrollment, both patients and referring clinicians are asked to respond to a series of questionnaires. Patients are then randomly assigned to groups that receive either a screening MR imaging examination or plain films of the lumbar spine. Group assignments were made before the start of the study using computer-generated random numbers and a block randomization method to achieve balanced groups. The research assistant was not involved in this process. The assignments were then placed in sealed envelopes that are opened by the research assistant when a subject is ready to be assigned to a group. After imaging studies are complete, the subjects return to their clinician for usual care, with follow-up continuing for 3 months after enrollment.

Diagnostic and Therapeutic Impact (Physician-Centered Intermediate Outcomes)
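As an aside on the assignment procedure described under Patient Recruitment and Enrollment, a permuted-block allocation list of the kind mentioned there might be generated as in the following sketch. The block size of 4, the seed, the arm labels, and the function name are illustrative assumptions; the article does not report the block size or the software actually used.

```python
import random


def block_assignments(n_subjects, block_size=4, arms=("MR", "plain film"), seed=1996):
    """Pre-generate an allocation list using permuted blocks.

    block_size and seed are hypothetical; each complete block contains
    every arm equally often, which keeps the groups balanced.
    """
    if block_size % len(arms) != 0:
        raise ValueError("block_size must be a multiple of the number of arms")
    rng = random.Random(seed)          # seeded so the list can be regenerated
    per_arm = block_size // len(arms)
    schedule = []
    while len(schedule) < n_subjects:
        block = list(arms) * per_arm   # equal number of slots per arm
        rng.shuffle(block)             # random order within the block
        schedule.extend(block)
    return schedule[:n_subjects]


# One entry per sealed envelope, opened in order as subjects are enrolled.
schedule = block_assignments(24)
```

Because the list is produced before enrollment begins, the person opening the envelopes has no influence on the sequence, which is the point of keeping the research assistant out of the generation step.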
Our questionnaire investigates four aspects of decision making. The first area deals with the reason the clinician ordered the diagnostic test. We determine whether the primary interest is in establishing or excluding a diagnosis, what that diagnosis is, and the estimated probability that the patient has that diagnosis. We also ask clinicians to create a list of differential diagnoses with probabilities for each diagnosis. The change in these probabilities from before to after the imaging test is used to determine the change in diagnostic entropy (see below). The second and third aspects of the questionnaire are the diagnostic and therapeutic evaluation. Clinicians are asked before and after imaging what further diagnostic testing they anticipate and what treatment they plan. This enables us to compare how their diagnostic and treatment plans changed as a result of the imaging test. The last area assessed is the amount of reassurance provided to both the patient and physician by the diagnostic test.

Patient-Centered Outcome Measures
Our choice of measures for evaluating health-related quality of life was based on those used for the Maine Lumbar Spine Study [3]. This is a prospective cohort of patients with sciatica who have been followed up for 2 years. All measures shown in Exhibit 1 have recently had their reliability, responsiveness, and validity established in this patient cohort. Patrick et al. [3] created two symptom indexes that measure the frequency and bothersomeness of back pain on a scale of 0-24. The Modified Roland scale is a back pain-specific functional status scale that consists of 23 yes-no items [4]. A single score is derived by simply adding the number of items to which the respondent has answered yes. This was the measure most responsive to clinical changes in the Maine study. The Medical Outcomes Study Short Form 36-item health survey questionnaire (SF-36) is a general health status questionnaire that has eight domains, each scored separately on a 0-100 scale, with 0 being poor health and 100 being excellent health [5]. The most responsive domains were the physical dimensions (physical functioning, physical role limitations, and bodily pain). Finally, we measured three types of disability days due to the patients' back pain: (1) number of times more than half a day of school or work was missed, (2) number of times more than half a day was spent in bed, and (3) number of times activity was decreased for more than half the day.

EXHIBIT 1
Health-Related Quality-of-Life Measures
I. Medical Outcomes Study Short Form-36 (SF-36)
   A. General functional status and well-being questionnaire
   B. Domains
      1. Physical functioning
      2. Physical role limitations
      3. Emotional role limitations
      4. Social functioning
      5. Bodily pain
      6. General mental health
      7. Vitality
      8. General health
   C. Each subscale is scored from 0 (worst) to 100 (best)
II. Modified Roland Scale
   A. Back pain-specific functional status questionnaire
   B. Developed from Sickness Impact Profile (SIP)
   C. Items scored from zero to 23
III. Symptom Scales
   A. Developed by back pain PORT project
   B. Frequency of back or leg pain
   C. Bothersomeness of back or leg pain
   D. Each index scored from zero to 24
IV. Disability Due to Back Pain
   A. Days of school or work missed
   B. Days spending more than half of day in bed
   C. Days activity decreased by more than half

These questionnaires were administered to the subjects before randomization and 3 months after imaging. Additionally, at 1 and 3 months after imaging, patients responded to a modified version of the Deyo/Diehl patient satisfaction questionnaire. This instrument was designed to measure the patients' degree of satisfaction with their primary care physician. We modified the questionnaire by deleting one question about whether the patient thought an imaging study was needed and adding two questions about the reassurance the patients attributed to the imaging study they underwent.
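The Modified Roland scoring rule described above (one point for each of the 23 items answered yes) is simple enough to state as code. The sketch below is only an illustration of that rule; the function name and the boolean representation of answers are assumptions, and the response pattern shown is made up rather than taken from a study subject.

```python
N_ROLAND_ITEMS = 23  # number of yes-no items in the Modified Roland scale


def roland_score(answers):
    """Return the Modified Roland score: the count of 'yes' answers.

    answers: iterable of 23 booleans, True meaning the item was answered yes.
    The result ranges from 0 (no items endorsed) to 23 (all items endorsed).
    """
    answers = list(answers)
    if len(answers) != N_ROLAND_ITEMS:
        raise ValueError(f"expected {N_ROLAND_ITEMS} answers, got {len(answers)}")
    return sum(1 for a in answers if a)


# Hypothetical respondent who endorses 17 of the 23 items.
example_score = roland_score([True] * 17 + [False] * 6)
```

Higher scores correspond to more items endorsed, so a change score is simply the difference between two administrations of the questionnaire.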

Imaging Protocols
We used an imaging protocol for which we had previously demonstrated good reliability and performance compared with a traditional four-sequence study (Robertson W et al., presented at the Radiological Society of North America meeting, November 1993). The decreased imaging time of this examination made it cost competitive with plain films in this setting. The pulse sequence used for sagittal scans was two-dimensional fast spin-echo (2DFSE), repetition time (TR) = 3,000 msec, echo time (TE) = 85 msec, two excitations, 256 x 256 matrix, 24 x 18 cm field of view, 5-mm sections with 0-mm skip, 16 kHz, saturation pulse (SAT) superior-inferior and anterior-posterior, nine slices, 1 min 18 sec imaging time. For axial scans, we used 2DFSE, TR = 4,000 msec, TE = 72 msec, two excitations, 256 x 128 matrix, 16 x 16 cm field of view, 4-mm sections with 1-mm skip, 16 kHz, SAT anterior, 12 slices, 1 min 4 sec imaging time. We did not restrict the number of views obtained for the plain films, although the majority so far have been two views (anteroposterior and lateral).
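To give a sense of scale for the protocol parameters listed above, the following sketch works out the nominal slab coverage and in-plane pixel size implied by them. The helper names are hypothetical, and the pairing of each field-of-view dimension with a matrix dimension is an assumption, since the protocol description does not state which matrix axis corresponds to which field-of-view axis.

```python
def slab_coverage_mm(n_slices, thickness_mm, skip_mm):
    """Total extent covered by a stack of slices, counting the gaps between them."""
    return n_slices * thickness_mm + (n_slices - 1) * skip_mm


def pixel_size_mm(fov_cm, matrix):
    """Nominal in-plane pixel size along one axis (field of view / matrix)."""
    return fov_cm * 10.0 / matrix


# Sagittal: nine 5-mm sections with 0-mm skip.
sagittal_coverage = slab_coverage_mm(9, 5, 0)   # 45 mm
# Axial: twelve 4-mm sections with 1-mm skip.
axial_coverage = slab_coverage_mm(12, 4, 1)     # 59 mm

# Assumed pairing: 24-cm axis with 256 samples (sagittal),
# 16-cm axes with 256 and 128 samples (axial).
sagittal_pixel = pixel_size_mm(24, 256)         # 0.9375 mm
axial_pixel_fine = pixel_size_mm(16, 256)       # 0.625 mm
axial_pixel_coarse = pixel_size_mm(16, 128)     # 1.25 mm
```

These are nominal values derived from the stated parameters only; they say nothing about effective resolution, which also depends on factors not reported here.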

Data Analysis
We first compared baseline characteristics to verify that the random-assignment process produced balanced groups. This comparison focused on those factors that tend to predict outcome in patients with low back pain, such as age, sex, duration of pain, and baseline values of the health-related quality-of-life measures. To compare groups, we used chi-square analysis and Fisher's exact tests for categorical variables (e.g., sex), the t test for continuous variables (e.g., age), and the Mann-Whitney U test for ordinal variables. To quantify the impact of the diagnostic test on the formation of a differential diagnosis, we calculated the diagnostic "entropy" before and after imaging. Entropy is a quantitative measure of diagnostic uncertainty that has been used to assess the impact of information from tests or procedures [6]. When several possible diagnoses are part of the differential, and diagnostic probabilities are spread evenly among them, entropy is high. When diagnostic probabilities concentrate on one diagnosis, with low probabilities for all others, entropy is low. As diagnostic certainty decreases, the spread of diagnostic probabilities increases, as does entropy. Conversely, as diagnostic certainty increases, entropy decreases.

RESULTS
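As an illustration of the entropy measure described under Data Analysis (not a study result), the sketch below computes entropy for a differential diagnosis before and after imaging. It assumes the standard Shannon form, H = -sum(p_i * log2(p_i)), which the article does not spell out; the base-2 logarithm, the function name, and the probabilities are all hypothetical.

```python
import math


def diagnostic_entropy(probabilities):
    """Shannon entropy (in bits) of a list of diagnostic probabilities.

    Assumes the probabilities are non-negative and sum to 1.
    Zero-probability diagnoses contribute nothing to the sum.
    """
    total = sum(probabilities)
    if any(p < 0 for p in probabilities) or not math.isclose(total, 1.0, abs_tol=1e-6):
        raise ValueError("probabilities must be non-negative and sum to 1")
    return -sum(p * math.log2(p) for p in probabilities if p > 0)


# Hypothetical clinician estimates for four candidate diagnoses.
before_imaging = [0.25, 0.25, 0.25, 0.25]   # evenly spread: high uncertainty
after_imaging = [0.85, 0.05, 0.05, 0.05]    # concentrated: low uncertainty

h_before = diagnostic_entropy(before_imaging)   # 2.0 bits for four equal options
h_after = diagnostic_entropy(after_imaging)
entropy_change = h_after - h_before             # negative when certainty increases
```

In this made-up case the probabilities concentrate on one diagnosis after imaging, so entropy falls, which matches the qualitative behavior described in the text.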
In general, the mean scores and ranges that we observed in our sample were similar to the results of the much larger cohort from Maine. Randomization appeared to have worked because the MR imaging and plain film groups were not significantly different at baseline with respect to sex, education, employment status, disability claims, length of current episode, or number of prior episodes of back pain. Also, the health-related quality-of-life measures were not significantly different between groups (Table 1). The MR group had a mean Modified Roland score of 17.5, which is nearly 5 points higher than that of the plain film group, and this difference approaches statistical significance. A difference of 3 points on the Roland scale is thought to be clinically significant [4]. Although this baseline difference between groups may simply reflect random chance, given the small number of patients, if the groups are not balanced by the end of the pilot study, we will need to control for differences in baseline severity when comparing patient outcomes.

Similarly, results for the physician questionnaire demonstrated no statistically significant differences between groups. Physicians tended to rate MR imaging as being more useful than plain film studies (p = .09), to say that MR imaging helped them to avoid further testing more often (p = .06), and to believe that MR imaging was more likely to affect their treatment plan (p = .07).

TABLE 1: Baseline Measurements Compared with Maine Lumbar Spine Study

                                                      Baseline Measurements
Measure                    Maine Lumbar Spine Study   Plain Films (n = 10)   MR Imaging (n = 12)   p (t test)
Modified Roland            15.8 ± 5.4                 12.7 ± 6.3             17.5 ± 6.1            .06
Short Form-36
  Physical                 37.5 ± 27.3                43.0 ± 31.8            38.2 ± 26.2           .71
  Role physical            12.3 ± 26.4                25.0 ± 35.4            22.9 ± 37.6           .90
  Role emotional           52.3 ± 43.3                56.7 ± 47.3            52.8 ± 46.0           .85
  Social                   47.7 ± 31.2                52.5 ± 36.2            47.9 ± 34.1           .76
  Bodily pain              26.2 ± 21.2                32.4 ± 24.3            21.2 ± 18.6           .24
  Mental health            63.6 ± 19.8                45.3 ± 15.5            50.0 ± 16.3           .50
  Vitality                 39.9 ± 22.0                35.4 ± 19.6            35.1 ± 18.3           .97
  General health           75.7 ± 18.7                54.3 ± 26.8            43.0 ± 21.7           .29
Symptom scales
  Sciatic frequency index  15.4 ± 6.3                 10.7 ± 4.4             12.9 ± 4.8            .28
  Bothersome index         15.2 ± [illegible]         11.9 ± 5.2             14.3 ± 4.9            .28
Disability days
  Decreased activity       19.7 ± 11.4                12.2 ± 13.6            15.5 ± 11.7           .56
  Days in bed              8.9 ± 11.1                 6.2 ± 12.6             8.0 ± 8.7             .70
  Days missed work         12.9 ± 12.6                0.5 ± 1.1              1.7 ± 3.3             .26

Data are written as mean ± standard deviation.

CONCLUSIONS
Randomization is a powerful tool against bias that has been underused in clinical radiology research. We have shown that it is feasible to randomize patients to different imaging techniques and to obtain health-related quality-of-life outcome measures in conditions such as back pain, for which mortality and cure are inappropriate. It is also feasible to measure physicians' decision-making parameters at the same time. These results are encouraging for the performance of a full-scale randomized trial.

REFERENCES
1. Agency for Health Care Policy Research. Acute low back problems in adults: assessment and treatment (tech. rep. no. 95-0643). Washington, DC: U.S. Department of Health and Human Services, 1994.
2. Boden S, David D, Dina T, et al. Abnormal magnetic resonance scans of the lumbar spine in asymptomatic subjects: a prospective investigation. J Bone Joint Surg Am 1990;72:403-408.
3. Patrick D, Deyo R, Atlas S, Singer D, Chapin A, Keller R. Assessing health-related quality of life in patients with sciatica. Spine 1995;20:1899-1909.
4. Roland M, Morris R. A study of the natural history of back pain: development of a reliable and sensitive measure of disability in low back pain. Spine 1983;8:141-144.
5. Ware J, Sherbourne C. The MOS 36-item Short-Form Survey (SF-36): 1. Conceptual framework and item selection. Med Care 1992;30:473-483.
6. Pitkeathly D, Evans A, James W. The use of information theory in evaluating the contribution of radiological and laboratory investigations to diagnosis and management. Clin Radiol 1979;30:643-647.