Assessment of intraaxial and extraaxial brain lesions with digitized computed tomographic images versus film: ROC analysis

Suzie M. El-Saden, MD, George J. Hademenos, PhD, Wei Zhu, PhD, James W. Sayre, PhD, Brad Glenn, MD, Jim Steidler, MD, Lakshmi Kode, MD, Brian King, MD, Diana Quinones, MD, Daniel J. Valentino, PhD, John R. Bentson, MD

Rationale and Objectives. The authors evaluated the diagnostic accuracy of viewing computed tomographic (CT) scans as film versus soft-copy images at a workstation.

Methods. Receiver operating characteristic analysis of the interpretation of 202 CT scans (103 were normal, 99 were abnormal) by five neuroradiologists was performed. Abnormal images contained high- or low-attenuation intraaxial lesions or extraaxial fluid (subdural, subarachnoid, or epidural hemorrhage). Hard copies were read on a standard light box, and digital images were examined at a 1,024 × 1,250 workstation. Lesion location and type and confidence ratings were recorded on a worksheet.

Results. There were no statistically significant differences in diagnostic accuracy between the two display modes. Reader performance was slightly better with the workstation in the assessment of low-attenuation lesions.

Conclusion. Diagnostic accuracy is similar for CT scans displayed at a workstation and those displayed as hard copy in the assessment of subtle intra- and extraaxial brain lesions.

Key Words. Brain, CT; images, display; images, interpretation; picture archiving and communication system (PACS).

From the Department of Radiological Sciences, University of California at Los Angeles (UCLA) Center for the Health Sciences, Los Angeles, CA. This research was supported in part by program project grant 1P01CA51198 from the National Cancer Institute. Address reprint requests to S. M. El-Saden, MD, Division of Neuroradiology, Department of Radiological Sciences, Rm BL-133, UCLA Center for the Health Sciences, 10833 Le Conte Ave, Los Angeles, CA 90024-1721. Received June 26, 1995, and accepted for publication after revision October 24, 1996. Acad Radiol 1997;4:90-95 ©1997, Association of University Radiologists

Vol. 4, No. 2, February 1997

The field of neuroradiology has undergone major transitions during the past decade. Imaging in neuroradiology has evolved from limited plain radiographic examinations to high-volume digital data sets. These data sets consist of cross-sectional images obtained by using volume-acquisition or cine computed tomography (CT), magnetic resonance (MR) imaging, MR angiography, MR spectroscopy, digital subtraction angiography, and other methods [1,2]. The multisequence, multisection examinations generated often must be compared with large volumes of historic imaging data.

Because of these technologic developments, needs for image analysis and information management have emerged that exceed the capabilities of current film-based systems. Given the trend toward digitization, the neuroradiologist will ultimately be required to use a digital-display workstation for viewing CT and MR cross-sectional images. Most of these digital image-management systems have a display workstation with a spatial resolution of either 1,024 × 1,024 or 2,048 × 2,048 [3]. The digital system allows the user to interact with the image data by manipulating factors such as window and level, magnification, and cine mode and to retrieve earlier studies for immediate comparison. This system is therefore expected to provide the radiologist with many advantages compared with the film-based system.

Research assessing the diagnostic accuracy and efficiency of film versus workstation display has been limited, however, and the studies that have been performed have focused primarily on MR imaging [4]. We therefore performed a receiver operating characteristic (ROC) study to assess the diagnostic performance of five neuroradiologists in the detection of subtle brain lesions on hard- and soft-copy CT scans.

MATERIALS AND METHODS

Image Selection

All CT scans obtained between July 1994 and December 1994 at our institution were reviewed, and 202 scans obtained at brain window settings were selected. The scans were obtained with either a GE 9800 unit (GE Medical Systems, Milwaukee, WI) or a Siemens Somatom S unit (Siemens Medical Systems, Iselin, NJ) and were archived into the departmental picture archiving and communication system within 24 hours of acquisition. We presume that there was no objective difference in the quality of images obtained with the two CT scanners, because rigorous quality-control procedures were performed regularly on both units.

One hundred three of the selected images, chosen randomly from all levels of the brain, were normal and served as controls. Approximately one-third of these normal images were of the posterior fossa, and the remaining images were of the supratentorial compartment. The other 99 images selected for this study were abnormal and showed three categories of lesions: intraaxial high-attenuation lesions, intraaxial low-attenuation lesions, and extraaxial fluid collections. The latter category was further subdivided into subdural hemorrhage, epidural hemorrhage, and subarachnoid hemorrhage. Of the 99 scans that depicted lesions, approximately 33 scans (33%) showed intraaxial low-attenuation lesions, 33 scans (33%) showed intraaxial high-attenuation lesions, and 33 scans (33%) showed extraaxial blood or fluid collections. Examples of each of the lesion categories are provided in Figure 1.

Subarachnoid hemorrhage always showed high attenuation, but subdural and epidural hemorrhage showed increased attenuation, isoattenuation, or decreased attenuation when compared with the brain. High-attenuation intraaxial lesions were most often due to acute hemorrhage within the brain, but the range of high-attenuation intraaxial lesions also included dense primary brain tumors or metastases and occasionally a small calcification of the brain parenchyma. Although all of these lesions vary in their attenuation, all appear hyperattenuating when compared with surrounding brain parenchyma and were evaluated no differently by the observers. Associated low-attenuation edema, which is commonly found surrounding a high-attenuation intraaxial lesion, was not counted as a separate lesion. Low-attenuation intraaxial lesions detected included small-vessel ischemic infarcts and low-attenuation tumors.

All images selected had been obtained without contrast material. Lesions of the dura were excluded, as were lesions of the sella turcica and skull base. Occasionally, an image showed several different types of abnormalities. The test images selected by two neuroradiologists (S.M.E., J.R.B.) showed abnormalities that were subtle [5] but not obscure. A total of approximately 800 CT scans were reviewed to obtain the final 202 images used in the study. Reference forms were generated from these selected images and were used to grade the worksheets filled out by the readers.

The same 202 images were then retrieved from the picture archiving and communication system and displayed in soft-copy format on a 1,024 workstation (Sienet DVC; Siemens Medical Systems, Erlangen, Germany) with an actual matrix size of 1,024 × 1,250. The 202 test images were then randomly assigned by using a random number generator; approximately 50 images were assigned numbers in the range of 100-199, 50 were assigned numbers of 200-299, 50 were assigned numbers of 300-399, and 50 were assigned numbers of 400-499. Each series of images was further randomly assigned when viewed in the hard-copy format or the soft-copy format so that the order of appearance was different for each type of display. Before the readers began the study, all images were reviewed at the workstation by one person (S.M.E.) to verify that they corresponded to the hard-copy images. Factors such as the amount of time required for image interpretation with each type of display were not studied.
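The numbering scheme described above can be sketched in a few lines of Python. This is only an illustration under stated assumptions: the function names `assign_image_numbers` and `reading_order`, the seed, and the near-equal split of the 202 images across the four ranges are hypothetical. The text states only that approximately 50 images were assigned to each range and that the order of appearance differed between display formats.

```python
import random

def assign_image_numbers(n_images=202, seed=0):
    """Randomly give each image a unique number in one of four ranges.

    The ranges 100-199 ... 400-499 come from the text; the block sizes
    (51, 51, 50, 50) and the seed are assumptions made for this sketch.
    """
    rng = random.Random(seed)
    images = list(range(n_images))
    rng.shuffle(images)                        # random allocation to the four series
    bases = [100, 200, 300, 400]
    size, extra = divmod(n_images, len(bases))
    assignment, start = {}, 0
    for i, base in enumerate(bases):
        count = size + (1 if i < extra else 0)
        block = images[start:start + count]
        # Draw distinct numbers inside this series' range.
        numbers = rng.sample(range(base, base + 100), count)
        assignment.update(zip(block, numbers))
        start += count
    return assignment

def reading_order(assignment, display, seed=0):
    """Return the image numbers in a random order specific to one display format."""
    rng = random.Random(f"{seed}-{display}")   # a different stream per format
    order = sorted(assignment.values())
    rng.shuffle(order)
    return order

numbers = assign_image_numbers()
film_order = reading_order(numbers, "film")
workstation_order = reading_order(numbers, "workstation")
```

Both orders contain the same image numbers; only the sequence differs, which is the property the study design relies on.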

ROC Analysis

Five neuroradiologists (B.G., J.S., L.K., B.K., D.Q.) reviewed and evaluated all images in both formats. Two of the readers were fellows nearing the end of a 2-year fellowship, and three were fellows nearing the end of the 1st year of their 2-year fellowship. For every image, observers completed a structured ROC form on which their degree of confidence in diagnosing the lesion was specified and the location of the lesions was given (Fig 2). Specifically, they were told to categorize the intraaxial lesions according to their location and attenuation and to subcategorize the extraaxial lesions according to whether they were subarachnoid, subdural, or epidural. No time limit was imposed for viewing scans in either display mode, and the amounts of time required for interpretation were not recorded. The order of the images was randomly assigned for each observer and each modality. The readers were blinded to patient names, examination date, and hospital identification numbers. Each observer reviewed four sets of images during two separate sessions. The reading sessions were separated by a period of 4 weeks, on average, to reduce the probability of case recognition.

FIGURE 1. Axial CT scans depict the three categories of lesions shown to the observers. A, Scan through the posterior fossa shows a high-attenuation intraaxial lesion (arrow). B, Scan through the brain at the level of the lateral ventricles shows a low-attenuation intraaxial lesion (arrow). C, Scan through the posterior fossa shows extraaxial subdural hematoma along the left leaf of the tentorium (arrows). D, Scan through the cerebrum shows extraaxial epidural hematoma over the right frontal convexity (arrows). An associated fracture is present on images obtained with bone windows (not shown). E, Scan through the cerebrum shows subarachnoid hemorrhage in a right frontal sulcus (arrow).

Observers underwent training to help them achieve maximum performance in both the film-reading and the workstation sessions. During training, observers were given an exact description of the categories of lesions and sample images (not included in the final study) that were representative of each category of lesion. Training was also provided to familiarize the readers with the various methods for manipulating images at the workstation (eg, window and level). The ROC confidence rating scale of 0-4 and the relative meaning of each score were explained.

[Figure 2: NEURO CT ROC WORKSHEET. Header fields: Reader, Date, Image Number, and display mode (Film or 1K). Rating scale: 0 to 4, anchored at 0% sure, 50% sure, and 100% sure. Lesion categories, each with its own Rating field: Low Density, High Density, and Extra-axial Abnormality (SDH = subdural hemorrhage, SAH = subarachnoid hemorrhage, EDH = epidural hemorrhage).]

FIGURE 2. ROC form used by the observers for each lesion. Note that there are three categories of lesions and that the third category is further subdivided. Under each lesion category is an oval figure; the reader was asked to make an X on this oval at the location of the lesion observed. The reader was instructed to mark the confidence rating only when the image was normal.

TABLE 1: Random Assignment of Reading Order

Reader   First Session       Second Session
1        A₁, B₁, C₂, D₂      A₂, B₂, C₁, D₁
2        A₂, B₂, C₁, D₁      A₁, B₁, C₂, D₂
3        C₁, D₁, A₂, B₂      C₂, D₂, A₁, B₁
4        C₂, D₂, A₁, B₁      C₁, D₁, A₂, B₂
5        C₁, D₁, A₂, B₂      C₂, D₂, A₁, B₁

Note.--A, B, C, and D refer to the four individual sessions (about 50 images). Subscript 1 refers to images evaluated on film (hard copy). Subscript 2 refers to images evaluated at the workstation (soft copy).

Readers rated their confidence (0 = least confident, 4 = most confident) for their impression of whether the image was normal or abnormal, and they specified the type of abnormality present (ie, intra- or extraaxial, low or high attenuation) [6]. Reader order was randomly determined according to the scheme presented in Table 1. While evaluating images on the workstation, observers were allowed to manipulate the window and level settings. A trained assistant was always present to assist in operating the workstation. Only one image was viewed at a time, and readers were not allowed to review prior images in either format. When hard copies were being reviewed, only one view box was illuminated at a time. Each sheet of film (14 × 17 inches) that contained a test image was covered by a mask (14 × 17 inches) with a cut-out area that allowed only the test image to be viewed. The cutout area was covered by a 3 × 5-inch index card at all times except during readout.
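To make the link between the 0-4 confidence ratings and an ROC curve concrete, the following sketch thresholds the ratings at each level to obtain (FPF, TPF) operating points and then takes the trapezoidal area under them. It rests on assumptions: the ratings are made up for illustration, the function names are hypothetical, and the trapezoidal (nonparametric) area is a stand-in for the fitted Az that the study obtained with CORROC II.

```python
def roc_points(ratings_abnormal, ratings_normal, levels=(0, 1, 2, 3, 4)):
    """Return (FPF, TPF) pairs obtained by thresholding the confidence rating."""
    points = [(0.0, 0.0)]
    for threshold in sorted(levels, reverse=True):
        # An image is called "abnormal" when its rating reaches the threshold.
        tpf = sum(r >= threshold for r in ratings_abnormal) / len(ratings_abnormal)
        fpf = sum(r >= threshold for r in ratings_normal) / len(ratings_normal)
        points.append((fpf, tpf))
    return points  # the last point is (1.0, 1.0) because every rating is >= 0

def trapezoidal_area(points):
    """Area under the piecewise-linear curve through the operating points."""
    area = 0.0
    for (x0, y0), (x1, y1) in zip(points, points[1:]):
        area += (x1 - x0) * (y0 + y1) / 2
    return area

# Made-up ratings for illustration only; these are not data from the study.
abnormal = [4, 4, 3, 3, 2, 4, 1, 3]
normal = [0, 1, 0, 2, 0, 1, 0, 3]
points = roc_points(abnormal, normal)
area = trapezoidal_area(points)
```

For these made-up ratings the trapezoidal area is 0.90625. A fitted binormal Az would generally differ somewhat from this nonparametric value.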

Before the study began, the light box used for reading hard copies was serviced to provide maximum luminance (approximately 400 foot-lambert). Luminance of the workstation was approximately 100 foot-lambert.
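For readers more used to SI units, the two luminance figures can be converted with the standard factor of about 3.426 cd/m² per foot-lambert. The helper name below is hypothetical, and the converted values are only as precise as the approximate figures quoted above.

```python
# 1 foot-lambert = (1/pi) candela per square foot, which is about 3.4262591 cd/m^2.
CD_PER_M2_PER_FOOT_LAMBERT = 3.4262591

def foot_lambert_to_cd_per_m2(foot_lamberts):
    """Convert a luminance in foot-lamberts to candelas per square metre."""
    return foot_lamberts * CD_PER_M2_PER_FOOT_LAMBERT

light_box = foot_lambert_to_cd_per_m2(400)     # roughly 1,370 cd/m^2
workstation = foot_lambert_to_cd_per_m2(100)   # roughly 343 cd/m^2
ratio = light_box / workstation                # the light box was about 4 times brighter
```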

Statistical Analysis

The completed worksheets were initially reviewed by one of the original staff members who selected the images (S.M.E.). The worksheet was graded as incorrect if the lesion was missed or miscategorized or if its location was mismarked. False-positive and false-negative findings were recorded. If an image contained more than one lesion and all lesions were not identified, the worksheet was graded as incorrect. Also, if the correct category was marked on the worksheet but the lesion location was not drawn on the figure (Fig 2), this result was considered incorrect, as one could not be sure of where the reader saw the lesion. For the category of extraaxial lesions, the exact compartment in which the lesion occurred (subdural or epidural space) had to be correct.

[Figure 3: three ROC plots, one per panel: A (High Density), B (Low Density), and C (Extra-axial). Each panel plots TPF against FPF on axes running from 0 to 1, with one curve for HC and one for 1K.]

FIGURE 3. Comparison of average Az values of film and digital images for intraaxial high-attenuation lesions (A), intraaxial low-attenuation lesions (B), and extraaxial lesions (C). FPF = false-positive fraction, HC = hard-copy images, TPF = true-positive fraction, and 1K = workstation.

Evaluations of the 202 selected images were analyzed statistically. Detection accuracy was indicated by the area under the ROC curve (Az). Data for each observer were analyzed independently by using CORROC II, a computer program developed by Metz et al [7]. We jackknifed cases for each radiologist separately to create a pseudovalue vector for each observer. With the jackknife method, for each area associated with a given reader, the area is estimated n times, each time deleting one of the n cases in the study. These areas are then used to construct n pseudovalues. We then treated the pseudovalues as data for the purpose of statistical analysis [8]. An average ROC curve is shown in Figure 3. ROC analysis for the five observers for each display mode (film vs digital display) was performed by using the CORROC II program.
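The jackknife step described above can be sketched as follows. For an area estimate computed from all n cases and the estimate obtained with case i deleted, the i-th pseudovalue is n times the full estimate minus (n - 1) times the leave-one-out estimate. This sketch makes several assumptions that should be read as such: the ratings are made up, the function names are hypothetical, the area is the nonparametric (Mann-Whitney) estimate rather than the Az fitted by CORROC II, and the paired mean and standard error at the end are a simplified summary, not the analysis of variance used in the method of reference 8.

```python
from math import sqrt
from statistics import mean, stdev

def auc(cases):
    """Nonparametric ROC area; cases is a list of (rating, is_abnormal) pairs."""
    pos = [r for r, abnormal in cases if abnormal]
    neg = [r for r, abnormal in cases if not abnormal]
    # Count abnormal-normal pairs ranked correctly; ties count one half.
    wins = sum((p > q) + 0.5 * (p == q) for p in pos for q in neg)
    return wins / (len(pos) * len(neg))

def pseudovalues(cases):
    """Jackknife pseudovalues: n * full - (n - 1) * leave-one-out estimate."""
    n = len(cases)
    full = auc(cases)
    return [n * full - (n - 1) * auc(cases[:i] + cases[i + 1:]) for i in range(n)]

# Made-up ratings for one reader on the same 10 cases in both display modes.
truth = [True] * 5 + [False] * 5
film = [4, 3, 2, 4, 1, 0, 1, 2, 0, 1]
soft = [4, 4, 3, 4, 2, 0, 1, 1, 0, 2]

pv_film = pseudovalues(list(zip(film, truth)))
pv_soft = pseudovalues(list(zip(soft, truth)))

# Treat the case-wise pseudovalue differences as data (simplified summary).
diffs = [a - b for a, b in zip(pv_film, pv_soft)]
mean_diff = mean(diffs)
std_err = stdev(diffs) / sqrt(len(diffs))
```

Because both display modes are jackknifed over the same cases, the pseudovalues are paired case by case, which is what allows the difference between modes to be tested with ordinary statistical tools.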

RESULTS

The Az values for each type of image are shown in Table 2, along with the standard deviation for each value and confidence limits for the difference in values. For low-attenuation lesions, Az values ranged from 0.87 to 0.98 for those seen on hard-copy images and from 0.94 to 0.98 for those seen on soft-copy images. For high-attenuation lesions, Az values ranged from 0.96 to 0.99 and from 0.93 to 0.99, respectively. For extraaxial lesions, Az values ranged from 0.63 to 0.98 and from 0.83 to 0.99, respectively. These large Az values occurred even after every attempt was made to obtain scans with lesions that were sufficiently subtle. The confidence limits for the individual readers are also shown in Table 2. For extraaxial and low-attenuation lesions, reader 1 (2nd-year fellow) and reader 4 (1st-year fellow) performed slightly better using the workstation. Overall, use of the jackknife method for aggregation resulted in no statistically significant difference in ROC findings between soft- and hard-copy viewing. Figure 3 shows comparisons of average ROC curves for both types of display and each type of lesion. Note that for low-attenuation lesions, there was slightly better observer performance with the soft-copy display. One reader closely mirrored truth for two of the sessions, which yielded degenerate data sets. This reader's values were omitted from Table 2A and 2C.
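The significance marking in Table 2 follows from a simple rule: a difference is significant at the 95% level when its 95% confidence interval excludes zero. The sketch below applies that rule to the low-attenuation values as printed in Table 2B; the variable and function names are hypothetical.

```python
# (reader, difference, lower limit, upper limit) as printed in Table 2B.
table_2b = [
    ("1", 0.03, -0.056, 0.068),
    ("2", -0.04, -0.118, 0.038),
    ("3", 0.00, -0.055, 0.069),
    ("4", -0.07, -0.325, -0.020),
    ("5", 0.00, -0.052, 0.068),
    ("All", -0.04, -0.082, 0.001),
]

def excludes_zero(lower, upper):
    """True when the interval lies entirely above or entirely below zero."""
    return lower > 0 or upper < 0

significant = [reader for reader, _, lower, upper in table_2b
               if excludes_zero(lower, upper)]
```

Only reader 4 meets the criterion here. The pooled interval for all readers (-.082 to .001) just includes zero, which is consistent with the overall finding of no statistically significant difference.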


DISCUSSION

The results of this study indicate that there is no statistically significant difference in detection of lesions selected for this study with hard- versus soft-copy display. Although readers 1 and 4 performed better at the workstation, the finding was not consistent among all readers. Readers consistently preferred the workstation over the hard copies; however, this was just a subjective measurement. Readers commented that manipulation of the soft-copy image was helpful in determining the presence of a lesion, especially when the lesion was extraaxial, because manipulation of the window and level often made the lesion more apparent. In this way, questionable findings could be further delineated with the soft-copy format. Another advantage of the workstation was the reader's ability to alter the window and level to visualize bone detail without having to look at another set of images, those obtained with the "bone window," as one must do when looking at hard copies. The presence of a fracture may have helped in determining whether extraaxial blood was present. Bone-window images were not included in the hard-copy readout sessions. During both the soft-copy and the hard-copy sessions, readers were allowed to observe only one image at a time, a situation that is not typical of a normal CT evaluation of the brain. In lieu of being able to look both above and below the level of a suspected abnormality, the readers were allowed to manipulate the soft-copy images to confirm the presence of abnormalities; this ability to manipulate images was another reported reason for the preference of soft-copy over hard-copy viewing.

TABLE 2: Az Values for Evaluation of Intraaxial High-Attenuation Lesions, Intraaxial Low-Attenuation Lesions, and Extraaxial Lesions

A: High-Attenuation Area

Reader   Hard Copy    Soft Copy    Difference   95% Confidence Limit for Difference
1        .99 (.01)    .99 (.01)     .00         -.014, .025
2        .98 (.10)    .96 (.03)     .02         -.035, .068
3        .99 (.01)    .93 (.07)     .06         -.071, .200
4        .96 (.02)    .95 (.06)     .01         -.113, .135
All      .97 (.01)    .95 (.02)     .02         -.021, .074

B: Low-Attenuation Area

Reader   Hard Copy    Soft Copy    Difference   95% Confidence Limit for Difference
1        .98 (.01)    .95 (.04)     .03         -.056, .068
2        .93 (.04)    .97 (.01)    -.04         -.118, .038
3        .94 (.03)    .94 (.03)     .00         -.055, .069
4        .87 (.11)    .94 (.02)    -.07         -.325, -.020*
5        .98 (.02)    .98 (.03)     .00         -.052, .068
All      .93 (.02)    .97 (.01)    -.04         -.082, .001

C: Extraaxial Area

Reader   Hard Copy    Soft Copy    Difference   95% Confidence Limit for Difference
1        .63 (.11)    .83 (.08)    -.20         -.481, -.102*
2        .93 (.05)    .96 (.02)    -.03         -.144, .077
3        .92 (.05)    .87 (.08)     .05         -.171, .291
4        .98 (.01)    .99 (.01)    -.01         -.037, .008
All      .96 (.03)    .94 (.03)     .02         -.063, .115

Note.--One standard deviation is given in parentheses. *Difference is significant at the 95% confidence level.

The lesions selected for this study were chosen to test the limits of attenuation resolution and edge resolution on the workstation, although these parameters were not measured quantitatively. Every attempt was made to select subtle lesions for inclusion. This task was difficult for the category of high-attenuation intraaxial lesions, because these lesions often are associated with surrounding edema that makes the lesions more conspicuous. The high-attenuation lesions selected ranged from the relatively obvious calcification to the more subtle minimal parenchymal blood. Therefore, it is not surprising that the Az values were so large. There were some cases in which the readers observed only one abnormality on an image that contained more than one finding. This error may have been due more to the reader's "satisfaction of search" than to any true difference between the two display modes.

Although detection of extraaxial fluid was a reasonable test of the workstation, the distinction between epidural and subdural location was difficult when only one image was being viewed, regardless of the display mode that was selected. For this reason, extraaxial lesions were most often missed by the observers. We recognize that detection of subarachnoid hemorrhage, which looks very different from epidural and subdural hemorrhage, is more dependent on contrast resolution than edge delineation, even though this abnormality was categorized as an extraaxial lesion based on anatomic location. The subarachnoid hemorrhages used in the study all showed high attenuation, whereas the subdural hemorrhages and epidural hemorrhages showed high attenuation, isoattenuation, or low attenuation depending on whether they were acute, subacute, or chronic, respectively.

In conclusion, the results of this study confirm that diagnostic interpretation of digitized CT images at a workstation is at least as accurate as interpretation of hard copies. With the hard-copy format, the window and level settings are preset at the time of scanning by the technologist, who may or may not see the lesion. When a finding is questionable, the radiologist must occasionally go to the console and manipulate the image; this function is readily available on most digital-display workstations.

ACKNOWLEDGMENTS

We acknowledge the staff of the UCLA Radiology Picture Archiving and Communication System.

REFERENCES

1. Lou SL, Loloyan M, Weinberg WS, et al. CT and MR imaging acquisition performance in a neuroradiology PACS module: one year clinical experience (abstr). Radiology 1990;177(P):320.
2. Lou SL, Huang HK. Assessment of a neuroradiology picture archiving and communication system in clinical practice. AJR 1992;159:1321-1327.
3. Stuart BK, Aberle DR, Boechat MI, et al. Clinical utilization of grayscale workstations. IEEE Eng Med Biol 1993;12:86-102.
4. Chang PJ, Ziegelbein K. Improved MR image interpretation accuracy and efficiency compared with that of film with use of a multiband cine workstation with temporal synchronization (abstr). Radiology 1994;193(P):225.
5. Rockette HE, King JL, Medina JL, et al. Imaging systems evaluation: effect of subtle cases on design and analysis of receiver operating characteristic studies. AJR 1995;166:679-683.
6. Metz CE. ROC methodology on radiologic imaging. Invest Radiol 1986;21:720-733.
7. Metz CE, Wang PL, Kronman HB. A new approach for testing the significance of differences between ROC curves measured from correlated data. In: Deconink F, ed. Information processing in medical imaging. The Hague, The Netherlands: Nijhoff, 1984;432-435.
8. Dorfman DD, Berbaum KS, Metz CE. Receiver operating characteristic rating analysis: generalization to the population of readers and patients with the jackknife method. Invest Radiol 1992;27:723-731.
