The effect of training and experience on the magnetic resonance imaging interpretation of meniscal tears


Lawrence M. White, M.D., Mark E. Schweitzer, M.D., Diane M. Deely, M.D., and William B. Morrison, M.D.

To evaluate the effects of experience and training on the magnetic resonance imaging (MRI) diagnosis of meniscal tears, 30 consecutive patients (60 menisci) who underwent MRI of the knee with arthroscopic confirmation of meniscal status were studied. MRIs were interpreted by 10 reviewers of varying levels of training and experience, ranging from first-year radiology residents to attending musculoskeletal radiologists. Sensitivity, specificity, and intraobserver variability of MRI interpretation of meniscal tears were calculated for each reviewer and compared with those of readers of the same and of different levels of MRI training and experience. Accuracy (range, 78% to 88%), sensitivity (range, 79% to 88%), and specificity (range, 72% to 94%) were high, and intraobserver agreement was moderate to high (K-value range, 0.49 to 0.77), in the diagnosis of meniscal tears for all reviewers with 4 or more years of radiology residency training and 3 months of formal MRI experience. In contrast, the accuracy (range, 63% to 82%), sensitivity (range, 58% to 79%), and specificity (range, 58% to 72%) of reviewers with less experience and training were lower, with higher intraobserver variability. Our results suggest that experience and training play an important role in the accurate and reliable MRI diagnosis of meniscal tears.

Key Words: Magnetic resonance imaging, Meniscal tears, Experience, Training

Magnetic resonance imaging (MRI) is an important modality in the evaluation of internal derangement of the knee.1 Many observers would agree that evaluation of the menisci is the most difficult part of this examination. Despite this, MRI has been shown to be highly accurate in the diagnosis of meniscal tears.2-5 The difficulty with most studies of meniscal pathology is that they are performed at academic institutions by investigators with recognized expertise. Coincident with this expertise is a high level of experience. It has been shown that experience and training are important factors in the interpretation of many imaging tests and, consequently, in the determination of the accuracy of the test.6-9 Therefore, the purpose of this study was to evaluate the effect of training and experience on the MRI interpretation of meniscal tears.

From the Department of Radiology, Thomas Jefferson University Hospital, Philadelphia, Pennsylvania, U.S.A. Address correspondence and reprint requests to Mark E. Schweitzer, M.D., Department of Radiology, Thomas Jefferson University Hospital, 132 10th St, 1096 Main Bldg, Philadelphia, PA 19107, U.S.A. © 1997 by the Arthroscopy Association of North America. 0749-8063/97/1302-1507$3.00/0

Arthroscopy: The Journal of Arthroscopic and Related Surgery, Vol 13, No 2 (April), 1997: pp 224-228

MATERIALS AND METHODS

Thirty consecutive patients who had MRI examinations of the knee followed by arthroscopic surgery were studied (age range, 16 to 79 years; 16 male, 14 female). All patients underwent imaging on a 1.5 Tesla MRI scanner (Signa, General Electric Medical Systems, Milwaukee, WI) using a quadrature transmit-receive extremity coil (General Electric Medical Systems). Both intermediate-weighted conventional spin echo (SE) and T2-weighted fast spin echo (FSE) images were obtained in each case in the sagittal and coronal planes. The knee was positioned in full extension with 10° to 20° of external rotation. Intermediate-weighted SE images were acquired with a repetition time (TR) of 1,000 to 1,120 milliseconds and an echo time (TE) of 20 milliseconds (4-mm thick slices, 1-mm interslice gap, 16-cm field of view, 256 × 256 acquisition matrix, 1 NEX). Fat suppression was used on the sagittal SE images. FSE imaging was performed with a TR of 3,800 to 4,250 milliseconds, a TE of 100 milliseconds, and an echo train length (ETL) of 8 (4-mm thick slices, 1-mm interslice gap, 12-cm field of view, 256 × 256 acquisition matrix, 2 NEX).10 Meniscal windows were not used in this study because their use has been shown to be unnecessary in the accurate diagnosis of meniscal tears.11

Arthroscopic findings of both the medial and lateral menisci (n = 60) were used as the standard of reference for the presence of a meniscal tear. Fifteen tears of the medial menisci and nine tears of the lateral menisci were found at surgery. Three of the 60 menisci were described at arthroscopy as frayed, without mention of a discrete tear, and were not characterized as torn for the purpose of this study.

Ten reviewers of varying degrees of experience separately and independently evaluated each meniscus for the presence or absence of a meniscal tear. The reviewers ranged in degree of MRI training and experience from first-year radiology resident to attending staff musculoskeletal radiologist. A random selection of 15 of the patients (30 menisci) was re-evaluated independently by each reviewer 2 to 3 weeks after the initial sitting for a determination of intraobserver variability. All reviewers were blinded to the results of arthroscopic surgery. The reviewers were aware that each patient had undergone an arthroscopic procedure but did not know the reason and did not know the ratio of torn to nontorn menisci.
Four of the reviewers were first-year radiology residents with a single month of body MRI training, composed of approximately 80% musculoskeletal case material. Of these four first-year residents, two reviewed the cases immediately at the end of their initial 1-month MRI rotation (Readers 1 and 2). One of these first-year residents (Reader 1) also made a separate preliminary reading of the cases on the first day of the MRI rotation, without formal MRI training. The two other first-year residents (Readers 3 and 4) reviewed the cases several months after their month of MRI training. Therefore, the first-year resident reviewers included two residents reviewing cases immediately after their MRI rotation and two after a time delay. Two additional radiology residents reviewed the cases 6 months after their last period of formal MRI training: a third-year resident with 1 month of formal MRI experience (Reader 5) and a fourth-year resident with 3 months of MRI training (Reader 6). The remaining reviewers consisted of two musculoskeletal radiology fellows with the equivalent of 1 year of MRI experience (Readers 7 and 8) and two attending staff musculoskeletal radiologists with 4 to 5 years of MRI experience (Readers 9 and 10).

Results of the MRI interpretations from each reviewer and surgical data regarding the presence or absence of a meniscal tear were analyzed using a statistical database (Statview 4.01; Abacus Concepts, Berkeley, CA). Sensitivity, specificity, and accuracy values in the detection of lateral and medial meniscal tears were determined for each reviewer (Table 1), as were the number of false-negative and false-positive interpretations (Table 2). Intraobserver variation was calculated for each reviewer, on the basis of the results of the 30 menisci re-evaluated, to allow for an assessment of the effect of training and experience on the reliability of the MRI interpretation of meniscal tears (Table 1). Interobserver variability calculations were performed among the reviewers with 1 month of MRI training, as well as among the readers with a greater degree of MRI training, to evaluate the reproducibility of meniscal interpretations within groups of distinct levels of experience. Interobserver agreement among the multiple readers with limited MRI training, as well as among those with greater degrees of MRI experience, was assessed using K-value methodology as described by Fleiss.12 Intraobserver variability was similarly calculated using K-value analysis.
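The per-reader statistics described above can be sketched in a few lines. This is a minimal illustration, not the authors' Statview analysis: the class and function names are hypothetical, the example counts and readings are invented for demonstration, and the two-reading Cohen form of kappa is an assumed stand-in for the intraobserver case only (it does not reproduce the multi-reader calculation cited from Fleiss).

```python
from dataclasses import dataclass
from typing import Sequence


@dataclass
class Confusion:
    """2x2 counts of MRI readings against the arthroscopic reference."""
    tp: int  # torn at arthroscopy, called torn on MRI
    fn: int  # torn at arthroscopy, called intact on MRI (undercall)
    fp: int  # intact at arthroscopy, called torn on MRI (overcall)
    tn: int  # intact at arthroscopy, called intact on MRI

    def sensitivity(self) -> float:
        return self.tp / (self.tp + self.fn)

    def specificity(self) -> float:
        return self.tn / (self.tn + self.fp)

    def accuracy(self) -> float:
        return (self.tp + self.tn) / (self.tp + self.fn + self.fp + self.tn)


def cohen_kappa(first: Sequence[bool], second: Sequence[bool]) -> float:
    """Chance-corrected agreement between two sets of binary readings."""
    if len(first) != len(second) or not first:
        raise ValueError("readings must be non-empty and of equal length")
    n = len(first)
    observed = sum(a == b for a, b in zip(first, second)) / n
    p1 = sum(first) / n   # proportion called torn at the first sitting
    p2 = sum(second) / n  # proportion called torn at the second sitting
    expected = p1 * p2 + (1 - p1) * (1 - p2)
    if expected == 1.0:
        return 1.0  # both sittings gave the same constant answer
    return (observed - expected) / (1 - expected)


# HYPOTHETICAL reader: 24 torn and 36 intact menisci, 4 undercalls, 6 overcalls.
example = Confusion(tp=20, fn=4, fp=6, tn=30)
print(f"sens={example.sensitivity():.2f} "
      f"spec={example.specificity():.2f} "
      f"acc={example.accuracy():.2f}")

# HYPOTHETICAL re-read of 10 menisci by one reader (True = tear called).
sitting_1 = [True, True, False, False, True, False, True, False, False, True]
sitting_2 = [True, False, False, False, True, False, True, True, False, True]
print(f"kappa={cohen_kappa(sitting_1, sitting_2):.2f}")
```

Kappa is lower than raw percentage agreement because it subtracts the agreement two sittings would reach by chance given how often each one calls a tear.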
K-values were interpreted according to guidelines adapted from Landis and Koch.13 Strength of agreement was given as follows: excellent for K-values 0.81 to 1.00; good for K-values 0.61 to 0.80; moderate for K-values 0.41 to 0.60; fair for K-values 0.21 to 0.40; and poor for K-values <0.20.

RESULTS

The accuracy, sensitivity, and specificity of the first-year radiology resident who reviewed cases before the initial period of MRI training (Reader 1) were each 58% for the diagnosis of meniscal tears. The same reviewer's accuracy immediately following a month-long period of MRI training was 82%, with a sensitivity of 75% and a specificity of 86%. A second first-year resident (Reader 2) who reviewed the cases immediately after the initial month of MRI training had an accuracy of 78%, a sensitivity of 79%, and a specificity of 78% in the diagnosis of meniscal tears. The two other first-year radiology residents (Readers 3 and 4), who reviewed the cases several months after their MRI training, were both 68% accurate, with sensitivities of 75% and 50% and specificities of 64% and 81%, respectively, in diagnosing meniscal tears. A single third-year resident (Reader 5) with 1 month of MRI experience and training was 63% accurate, with a sensitivity of 67% and a specificity of 61%, in diagnosing meniscal tears.

In contrast, the accuracy results of all reviewers with 3 or more years of radiology training and 3 months of intensive MRI experience were on average higher and more consistent with one another, ranging from 78% to 88%. The single fourth-year resident assessed who had 3 months of MRI training (Reader 6) was 83% accurate, with a sensitivity of 83% and a specificity of 83%, in the diagnosis of meniscal tears. Two musculoskeletal radiology fellows (Readers 7 and 8) reviewed the cases with accuracy results of 87% and 78%, sensitivities of 83% and 88%, and specificities of 89% and 72%, respectively, and two experienced attending staff musculoskeletal radiologists (Readers 9 and 10) were 88% and 82% accurate, 79% and 83% sensitive, and 94% and 81% specific, respectively, in diagnosing meniscal tears.

TABLE 1. Sensitivity, Specificity, Accuracy, and Intraobserver Variability

                  Lateral (%)          Medial (%)           Combined (%)         Intraobserver
Reader            Sens   Spec   Acc    Sens   Spec   Acc    Sens   Spec   Acc    K-Value
Reader 1 Pre       33     43     40     73     80     77     58     58     58    0.1246
Reader 1 Post      56     76     70     87    100     93     75     86     82    0.5556
Reader 2           67     76     73     87     80     83     79     78     78    0.7129
Reader 3           78     57     63     73     73     73     75     64     68    0.4643
Reader 4           22     86     67     67     73     70     50     81     68    0.0476
Reader 5           33     71     60     87     47     67     67     61     63    0.3993
Reader 6           67     81     77     93     87     90     83     83     83    0.4886
Reader 7           56     95     83    100     80     90     83     89     87    0.6825
Reader 8           67     86     80    100     53     77     88     72     78    0.5833
Reader 9           56    100     87     93     87     90     79     94     88    0.7538
Reader 10          56     95     83    100     60     80     83     81     82    0.7689

NOTE. Sensitivity, specificity, and intraobserver variability of all observers in the diagnosis of meniscal tears, listed in order of increasing levels of training/experience. Reader 1-Pre, first-year resident before any MRI training. Reader 1-Post and Reader 2, first-year residents immediately after 1 month of body MRI experience. Readers 3, 4, and 5, two first-year residents and a third-year resident, respectively, with 1 month of previous MRI training finished several months before testing. Reader 6, a fourth-year resident with 3 months of MRI experience and training. Readers 7 and 8, musculoskeletal radiology fellows with 1 year of MRI experience. Readers 9 and 10, attending musculoskeletal radiologists with 4 to 5 years of MRI experience.

TABLE 2. False-Negative Readings (Undercalls) and False-Positive Readings (Overcalls) of Each Reader

Reader        False-Negative    False-Positive
Reader 1             6                 5
Reader 2             5                 8
Reader 3            12                 7
Reader 4            12                 7
Reader 5             8                14
Reader 6             4                 6
Reader 7             4                 4
Reader 8             3                10
Reader 9             5                 2
Reader 10            4                 7

Intraobserver agreement was moderate to high (range, 0.49 to 0.77) for all reviewers in our study with 4 or more years of radiology residency training and 3 months of formal MRI experience. The intraobserver reliability of the interpretations of readers with lesser degrees of experience and training was in the poor to moderate range (0.04 to 0.46). The exception was the two first-year residents just finishing their month of MRI training, who both had good intraobserver agreement (0.56 and 0.71, respectively). Interobserver agreement among reviewers with 3 or more years of radiology training and 3 months of MRI experience was good (K-value = 0.64). In contrast, the interobserver agreement among less experienced reviewers was poor (K-value = 0.32).

DISCUSSION

The interpretive accuracy of many radiological tests has been shown to depend to a certain degree on experience and training.6-9 Improved diagnostic accuracy coincident with greater levels of radiological training and experience has been shown among emergency room attending staff physicians, junior radiology residents, and a consulting staff radiologist in the interpretation of posttrauma cervical spine radiographs.6 Similar examples of increased diagnostic accuracy concomitant with greater experience and training have been described between junior and senior radiology residents in the diagnosis of primary musculoskeletal tumors,7 as well as among junior radiology residents, senior radiology residents, and staff radiologists in the interpretation of emergency room radiographs.8 One study has similarly cited improved accuracy in the MRI diagnosis of cirrhosis with specialized training.9

It is our belief that the practical evaluation of the menisci for the presence or absence of a tear is one of the most challenging diagnostic decisions in musculoskeletal MRI. By evaluating the results of reviewers of varying levels of training and experience, we hoped to assess the effects of such experience on the accuracy and reliability of the MRI interpretation of meniscal tears.

There is one previous investigation of accuracy and variability (reliability) in the diagnosis of meniscal tears.14 Although interobserver variability was assessed in that study, intraobserver variation was not evaluated. That study introduced the question of the effect of experience on the MRI interpretation of meniscal tears. Its investigators found no significant differences in the diagnostic accuracy or interobserver variability of readings performed by a general radiologist with 1 month of fellowship musculoskeletal MRI experience and those of two subspecialty radiologists with multiple years of musculoskeletal MRI experience. We used that study as a basis for a more detailed investigation into the effects of both training and experience on the interpretation of meniscal tears.
Our results show that readers with longer periods of training (3 or more months of formal MRI training and 4 or more years of radiology residency training) were more accurate and consistent in the diagnostic evaluation of the presence or absence of meniscal tears than those with lesser degrees of training and experience. The absolute interpretive accuracy results of this group of reviewers (78% to 88%), which likely reflect the day-to-day accuracy of MRI at our institution in the diagnosis of meniscal tears, are lower than those cited elsewhere in the literature.2-5 Interestingly, little difference was observed in accuracy or intraobserver variability between the reviewer with 4 years of radiology training including 3 months of MRI experience (Reader 6) and other reviewers with even more experience.
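As a worked check of the 78% to 88% accuracy range quoted for the more experienced readers, the rates can be recomputed from the false-negative and false-positive counts in Table 2. The sketch below assumes the reference standard described in the Methods (15 medial plus 9 lateral tears among 60 menisci, with the three frayed menisci counted as not torn); the variable and function names are illustrative only.

```python
from typing import Tuple

# Reference standard from the Methods: 15 medial + 9 lateral tears in 60 menisci.
TORN = 15 + 9
INTACT = 60 - TORN

# (false negatives, false positives) for Readers 6 to 10, copied from Table 2.
TABLE_2 = {6: (4, 6), 7: (4, 4), 8: (3, 10), 9: (5, 2), 10: (4, 7)}


def rates(fn: int, fp: int) -> Tuple[int, int, int]:
    """Return (sensitivity, specificity, accuracy) as rounded percentages."""
    sens = 100 * (TORN - fn) / TORN
    spec = 100 * (INTACT - fp) / INTACT
    acc = 100 * (TORN + INTACT - fn - fp) / (TORN + INTACT)
    return round(sens), round(spec), round(acc)


results = {reader: rates(fn, fp) for reader, (fn, fp) in TABLE_2.items()}
for reader, (sens, spec, acc) in results.items():
    print(f"Reader {reader}: sens {sens}%, spec {spec}%, acc {acc}%")

accuracies = [acc for _, _, acc in results.values()]
print(f"accuracy range: {min(accuracies)}% to {max(accuracies)}%")
```

With 24 torn and 36 intact menisci, each additional undercall costs about 4 percentage points of sensitivity and each additional overcall about 3 points of specificity, which is why a handful of readings separates the readers in Table 1.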

The two first-year residents who interpreted the cases immediately after an initial period of MRI training (Readers 1 and 2) were an exception to the generally poorer results of less experienced reviewers. The results of these readers were more accurate and more reliable than those of other readers with similar amounts of MRI experience. A possible explanation lies in the fact that both of these reviewers were completing a month of intensive body MRI experience and teaching, having evaluated an average of 10 knee MRI examinations daily during this time. In contrast, the other junior residents with lower accuracy results and higher intraobserver variability (Readers 3, 4, and 5) had finished their initial period of MRI training earlier during their first year of residency training. The best explanation for these results is the difficulty of retaining recently learned material.

K-value analysis of interobserver variability among observers with less than 3 months of MRI experience and among reviewers with greater amounts of MRI experience was chosen as the most suitable statistical method to measure agreement, corrected for chance, within these two groupings of multiple readers.12 K-value analysis was used similarly in evaluating intraobserver variability. As a result of practical considerations, only a subset of menisci was re-read (n = 30) for the determination of intraobserver variability. The strength of this analysis might have been greater with re-evaluation of all of the menisci. Similarly, because of practical considerations, re-readings for determining intraobserver variation were performed after a 2- to 3-week delay. Although this interval is relatively short, we felt that it was sufficient for readers to forget their prior observations.

Another limitation of our study was the use of arthroscopy as the standard of reference for the determination of meniscal pathology. Although arthroscopy is the best standard available at this time, it is not a perfect standard for the presence or absence of a meniscal tear.15 A population selection bias was introduced into our study because only selected patients underwent MRI and only a fraction of these went on to arthroscopic surgery. It is unclear how such a bias could ethically be avoided. Imaging bias was also created because the decision to perform arthroscopy was based to a significant extent on the prospective readings of the MRI examination. One more limitation of this investigation was the limited number of observers evaluated at each level of MRI experience and training, which required grouping of trainees for adequate statistical analysis. Lastly, all the readers in the study were from one institution and were trained by one group of individuals. While this fact is advantageous in that all readers had similar training experiences, it may for the same reason have influenced the overall absolute interpretive accuracy of all readers in the diagnosis of meniscal tears.

Our results suggest that a learning curve exists in the interpretation of MRIs of meniscal tears. As with other radiological diagnostic skills, there appears to be an increase in the interpretive accuracy of MRI in the diagnosis of meniscal tears coincident with MRI experience and training.
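The Discussion notes that intraobserver agreement was estimated from only 30 re-read menisci. As a rough illustration of why the number of re-readings matters, the sketch below applies a simple large-sample approximation for the standard error of Cohen's kappa and maps the ends of the resulting interval onto the agreement categories quoted in the Methods. The chance-agreement value is an assumption (the paper reports only K-values), the handling of values falling exactly on a category boundary is a choice made here, and the function names are hypothetical.

```python
import math


def agreement_band(kappa: float) -> str:
    """Strength-of-agreement label using the cut-offs quoted in the Methods."""
    if kappa > 0.80:
        return "excellent"
    if kappa > 0.60:
        return "good"
    if kappa > 0.40:
        return "moderate"
    if kappa > 0.20:
        return "fair"
    return "poor"


def kappa_interval(kappa: float, chance: float, n: int, z: float = 1.96):
    """Approximate interval for kappa estimated from n paired readings.

    Uses the simple large-sample standard error
    sqrt(po * (1 - po) / (n * (1 - pe) ** 2)), where po is the observed
    agreement implied by kappa and pe is the chance agreement.
    """
    observed = chance + kappa * (1 - chance)
    se = math.sqrt(observed * (1 - observed) / (n * (1 - chance) ** 2))
    return kappa - z * se, kappa + z * se


KAPPA = 0.49   # lowest K-value reported for the more experienced readers
CHANCE = 0.50  # ASSUMPTION: chance agreement is not reported in the paper

for n in (30, 60):
    low, high = kappa_interval(KAPPA, CHANCE, n)
    print(f"n={n}: {low:.2f} to {high:.2f} "
          f"({agreement_band(low)} to {agreement_band(high)})")
```

Under this approximation the interval width scales with the inverse square root of the number of re-read menisci, so re-reading all 60 rather than 30 would narrow it by a factor of about 1.4.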

Acknowledgments: The authors thank Susan K. DeWyngaert, M.D., Patrick T. Lui, M.D., Daniel M. Radack, M.D., Kristin M. Gerndt, M.D., Steven G. Moss, M.D., Carin F. Gonsalves, M.D., and Janio Szklaruk, M.D., for their valuable help in reviewing each of the MRI cases and Karen M. Russell, M.Sc., for assistance in the statistical analysis of our data.

REFERENCES

1. Mink JH, Reicher MA, Crues JV III, Deutsch AL. MRI of the knee. Ed. 2. New York: Raven, 1993.
2. Crues JV III, Mink J, Levy TL, Lotysch M, Stoller DW. Meniscal tears of the knee: Accuracy of MR imaging. Radiology 1987;164:445-448.
3. De Smet AA, Norris MA, Yandow DR, Quintana FA, Graf BK, Keene JS. MR diagnosis of meniscal tears of the knee: Importance of high signal that extends to the surface. AJR 1993;161:101-107.
4. Mesgarzadeh M, Moyer R, Leder DS, et al. MR imaging of the knee: Expanded classification and pitfalls to interpretation of meniscal tears. Radiographics 1993;13:489-500.
5. Reicher MA, Hartzman S, Duckwiler GR, Bassett LW, Anderson LJ, Gold RH. Meniscal injuries: Detection using MR imaging. Radiology 1986;159:753-757.
6. Annis JAD, Finlay DBL, Allen MJ, Barnes MR. A review of cervical-spine radiographs in casualty patients. Br J Radiol 1987;60:1059-1061.
7. Piraino DW, Amartur SC, Richmond BJ, et al. Application of an artificial neural network in radiographic diagnosis. J Digital Imaging 1991;4:226-232.
8. Rhea JT, Potsaid MS, DeLuca SA. Errors of interpretation as elicited by a quality audit of an emergency radiology facility. Radiology 1979;132:277-280.
9. Mitchell DG, Lovett KE, Hann HWL, Ehrlich S, Palazzo J, Rubin R. Cirrhosis: Multiobserver analysis of hepatic MR imaging findings in a heterogeneous population. J Magn Reson Imaging 1993;3:313-321.
10. Schweitzer ME. Clinical MR desktop data: Knee. Protocols for suspected internal derangements. J Magn Reson Imaging 1993;3(S):14.
11. Buckwalter KA, Braunstein EM, Janizek DB, Vahey TN. MR imaging of meniscal tears: Narrow versus conventional window width photography. Radiology 1993;187:827-830.
12. Fleiss JL. Statistical methods for rates and proportions. Ed. 2. New York: John Wiley, 1981;212-234.
13. Seigel DG, Podgor MJ, Remaley NA. Acceptable values of kappa for comparison of two groups. Am J Epidemiol 1992;135:571-578.
14. De Smet AA, Norris MA, Yandow DR, Graf BK, Keene JS. Diagnosis of meniscal tears of the knee with MR imaging: Effect of observer variation and sample size on sensitivity and specificity. AJR 1993;160:555-559.
15. Quinn SF, Brown TF. Meniscal tears diagnosed with MR imaging versus arthroscopy: How reliable a standard is arthroscopy? Radiology 1991;181:843-847.