Radiography (2000) 6, 111–116 doi:10.1053/radi.1999.0228, available online at http://www.idealibrary.com
A comparison of expert and novice performance in the detection of simulated pulmonary nodules D. Manning, MSc, PhD, TDCR*, J. Leach, BSc, MSc, PhD† and S. Bunting, BA* *Department of Radiography & Imaging Science, St. Martin’s College, Lancaster, U.K.; †School of Engineering, Computing & Applied Mathematical Sciences, Lancaster University, Lancaster, LA1 3JD, U.K. (Received 8 April 1999; revised 12 October 1999; accepted 9 December 1999)

Key words: radiography reporting; skill mix; education; image perception; ROC.

Sixteen student radiographers in the third year of a degree programme and four experienced radiologists took part in a test of their ability to detect pulmonary nodules in images of a chest phantom. No special training in radiography reporting skills was given to the students, so the tests demonstrated the untutored ability of novice radiographers compared with radiologists. Overall performance of the students in lung nodule detection gave a mean ROC Az value of 0.851 normalized to the radiology reports, and a mean ROC Az = 0.742 (SD = 0.052) against ground truth. By comparison, the radiologists achieved a mean ROC Az = 0.871 (SD = 0.0118) against ground truth. We comment on some aspects of the students’ performance and offer suggestions on training strategies which may help in the acquisition of some of the visual skills required to perform such tasks. We suspect that the visual search requirement of complex images is one factor which may result in poor performance by observers untrained in radiographic reporting, and we recommend further studies in this area by tracking the eye movements of novices and experts during film viewing. © 2000 The College of Radiographers

Introduction Traditionally, radiologists have been responsible for the decision-making process in the interpretation of medical images. However, radiologists’ roles and workloads have expanded, so that other professionals such as radiographers are now beginning to take on some interpretation tasks in medical imaging [1–3]. As a step towards gaining baseline data for these possible developments, we carried out an investigation to determine the reporting ability of student radiographers on an existing degree programme.

Aim The purpose of the work was to determine the untutored ability of radiographers on an undergraduate course to carry out a detection task on a complex image. This provided the opportunity to comment on possible sources of detection error relative to expert reporting performance.

Materials and methods The radiographic reporting task Chest radiography is the most frequently requested imaging examination in medical practice [4] and pulmonary nodule detection is an example of a possible screening task [5]. Fifty images of an anthropomorphic chest phantom were produced; half contained simulated pulmonary nodules superimposed on the lung fields in a variety of locations (as shown in Fig. 1). The 50:50 normal/abnormal ratio was not revealed to the readers, and it provided the ideal ratio for receiver operating characteristic analysis of the results [6]. An unrevealed maximum of five simulated lesions was introduced into the positive images. The lesions were designed on the results of threshold experiments previously reported [7], with nodule diameters ranging from 5 mm to 8 mm. All were visible but were situated at various locations in the lung and mediastinal areas to give a range of conspicuity.

Subjects Sixteen students, 12 females and four males with an age range of 20–35 years, carried out the detection task.

The reference standard The reference standard was an absolute gold standard because ground truth was known for all locations, sizes and the number of inserted nodules in the anthropomorphic phantom for each image. However, to give some meaningful context to the student performance, four experienced radiologists also performed the task of detecting the lung nodules. Their mean receiver operating characteristic (ROC) score was 0.89 with a standard deviation (SD) of 0.013, indicating the degree of inter-observer variation.


Viewing procedure Students were presented with the images in a single session limited to 40 min in order to reduce the effects of fatigue on performance [8]. Each student was tested separately, but viewing conditions were standardized to a darkened room with a single viewing box, and readers were allowed to choose their preferred viewing distance [9]. All students chose to view the images at a distance of between 0.7 and 1.0 m. Information and instructions about the nature of the task were given before the reading session. The subjects were shown an example of the lung lesion and were told how they should indicate their decisions. They were asked to indicate the location of each suspected nodular lesion and to score their level of confidence on a rating scale of the form commonly used in ROC experiments [6]. No feedback was given to the subjects.

Receiver operating characteristic (ROC) methodology [6] A single-figure value for the area under the ROC curve (Az) was calculated for each student by the method described by Hanley and McNeil [10]. The ROC methodology used for the chest data was the free-response technique (FROC), first described by Egan et al. [11] and refined by Chakraborty and Winter [12]. The area under the ROC curve therefore gave a direct measure of student performance in the detection of the lesions.

Results ROC analysis for lung nodule detection All the results are presented in Table 1. The radiologists’ mean ROC (Az) value and its standard deviation are shown at the foot of Table 1. Comparison between the students and the radiologists in terms of their absolute ROC (Az) scores, their sensitivity and their specificity is illustrated in Fig. 2. The results are presented again in Fig. 3, with the student values normalized to the mean radiology scores as a gold standard. Error bars in these charts indicate the standard deviations in each case. A one-tailed t-test was performed on the Az values of the ROC scores for the two groups.
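The Hanley and McNeil [10] area under the ROC curve is equivalent to a Wilcoxon two-sample statistic over the confidence ratings: the probability that a randomly chosen abnormal image receives a higher rating than a randomly chosen normal one, counting ties as one half. A minimal Python sketch of that calculation follows; the ratings are invented for illustration and are not the study’s data.

```python
# Wilcoxon/trapezoidal estimate of the area under the ROC curve (Az):
# Az = P(abnormal image rated higher than normal image), ties count one half.

def az_from_ratings(normal_ratings, abnormal_ratings):
    wins = 0.0
    for a in abnormal_ratings:
        for n in normal_ratings:
            if a > n:
                wins += 1.0
            elif a == n:
                wins += 0.5
    return wins / (len(abnormal_ratings) * len(normal_ratings))

# Hypothetical 1-5 confidence ratings (1 = definitely normal,
# 5 = definitely abnormal) for eight normal and eight abnormal images.
normals = [1, 2, 1, 3, 2, 1, 4, 2]
abnormals = [3, 4, 5, 2, 5, 4, 3, 5]
print(round(az_from_ratings(normals, abnormals), 3))  # 0.883
```

A perfect reader would score 1.0 and a guessing reader about 0.5, so the student mean of 0.742 and the radiologist mean near 0.89 bracket the useful range for this task.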

Discussion The film series for the study was designed as a difficult but realistic problem of radiographic interpretation. The degree of difficulty is indicated by the relatively low sensitivity scores for the radiologists, which suggest that even for experienced readers the nodules were hard to locate. The mean absolute ROC (Az) score for students was 0.742 (SD = 0.052), which was significantly lower (P < 0.05) than the radiologists’ (Az) score of 0.891 (SD = 0.013). The types of error made by the student radiographers in this task were notably different from those reported for novice or trainee readers elsewhere. In particular, the sensitivity was


Table 1. ROC results, sensitivity and specificity for student radiographers and radiologists

Student scores—lung nodules

Student   ROC Az    Sensitivity   Specificity
1         0.6159    0.42          0.89
2         0.6939    0.54          0.75
3         0.7154    0.49          0.8
4         0.7205    0.55          0.71
5         0.7273    0.56          0.66
6         0.7284    0.42          0.89
7         0.7292    0.6           0.59
8         0.7297    0.55          0.73
9         0.7405    0.58          0.41
10        0.7434    0.58          0.62
11        0.7441    0.58          0.4
12        0.7575    0.6           0.52
13        0.7611    0.6           0.55
14        0.7929    0.61          0.48
15        0.8153    0.62          0.61
16        0.8524    0.65          0.63
Mean      0.7417    0.559375      0.64
SD        0.0522    0.0657742     0.15024425

Radiologist scores—lung nodules

Radiologist   ROC Az    Sensitivity   Specificity
1             0.9055    0.81          0.92
2             0.8966    0.79          0.91
3             0.8845    0.77          0.89
4             0.87695   0.72          0.91
Mean          0.89089   0.7725        0.9075
SD            0.01267   0.0386221     0.012583057

Student scores in lung nodule detection, normalized to the mean radiology gold standard

ROC Az   Sensitivity   Specificity
0.8      0.7241        0.705
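The normalized row appears consistent with taking a simple ratio of the student mean to the radiologist mean for sensitivity and specificity. The sketch below checks that reading of Table 1 and recomputes a Welch-style t statistic for the Az difference from the summary values; both the ratio formula and the t-test variant are our assumptions, as the paper does not state them.

```python
import math

# Summary values transcribed from Table 1.
student = {"az": 0.7417, "az_sd": 0.0522, "sens": 0.559375, "spec": 0.64, "n": 16}
radiol = {"az": 0.89089, "az_sd": 0.01267, "sens": 0.7725, "spec": 0.9075, "n": 4}

# Assumed normalization: student mean divided by radiologist mean.
norm_sens = student["sens"] / radiol["sens"]  # ~0.7241, matching the table
norm_spec = student["spec"] / radiol["spec"]  # ~0.705, matching the table

# Welch's t statistic for the difference in mean Az between the groups
# (one plausible reading of the paper's one-tailed t-test).
se = math.sqrt(student["az_sd"] ** 2 / student["n"] + radiol["az_sd"] ** 2 / radiol["n"])
t = (student["az"] - radiol["az"]) / se  # strongly negative: students score lower

print(round(norm_sens, 4), round(norm_spec, 3), round(t, 1))
```

On these summary values the statistic comes out around −10, comfortably beyond any conventional P < 0.05 threshold and in line with the significance reported in the Discussion.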

Figure 1. Part of the left lung is shown with one of the simulated nodules indicated.


unexpectedly low for novice reporters, whereas the tendency in other studies was towards a high false-positive rate, both in simple tasks such as fracture detection [2, 3] and in mammography [13]. Combined with their low specificity, more typical of novice observer performance [2, 3], the result was an overall ROC value showing diagnostic ability lower than that required in clinical practice. Significantly, the performance was lower than the standard expected in the assessment of postgraduate radiographers on formal reporting courses in skeletal radiography; in our experience of such courses there is at least a 90/90 requirement in sensitivity/specificity compared with radiology as the gold standard.

Figure 2. Detection of lung nodules—absolute performance of students and radiologists. Note the low sensitivity score for the students, indicating their poor identification of the lesions.

Figure 3. Detection of lung nodules—student performance normalized to the radiology gold standard. In a real situation the performance of new film readers will always be compared with radiologists, as shown here.

It is important to appreciate the various levels of the visual task involved in this experiment. Pure detection of a target feature is simply the awareness of a local change in light intensity or energy, but in the chest radiograph there are many changes

in optical density created by the ‘structured noise’ of the lung field anatomy. Small changes in the location of a nodule can affect its recognition dramatically [14], suggesting that local anatomical interference plays an important role [15]. Detection of a particular feature may take place, but the decision on its significance as a ‘signal’ depends on higher-order cognitive processes. Recognition is necessary in order to place the signal in a particular broad class of objects, which may then lead to identification, where the observer decides that the object is indeed a specific one [8, 16]. Radiology decisions are based on a perceptual process which involves considerable expert knowledge of whether image appearances are normal, variations on the normal, or pathological.

We can only speculate on the sources of the inaccuracy demonstrated by the students’ poor ROC performance in this study, but we can suggest the following:

(i) Some lesions were missed because, although detectable, their size and contrast put them below the threshold signal-to-noise ratio necessary for higher-order processing. The lesions were designed on the results of threshold experiments [7], but the structured noise of the normal lung anatomy reduced the conspicuity of some nodules. It has been shown [8] that although high-contrast features can be detected without a minimum size threshold, even quite large objects have distinct lower contrast thresholds for their identification and recognition. Although relevant structures may be seen down to 0.7 mm, it is known that for focal lesions, visualization and recognition of their significance
by experienced radiologists requires them to have a diameter of at least 4.0 mm [14, 17]. This indicates a cognitive requirement beyond that of simple detection. The cognitive problem offered to the student radiographers was possibly just too complex for their experience. Training radiographers to carry out such tasks could perhaps benefit from the use of many visual examples staged to demonstrate a gradual increase in difficulty due to detail overlying items of interest. Feedback on the significance of these items would then be an important aspect of cognition, or understanding, related to the visual problem. Computed radiography can provide an opportunity in this area by allowing teaching strategies that use visual prompting or ‘zooming’ of areas of interest.

(ii) Some lesions were missed because of an inadequate search. The task presented to the subjects demanded a substantial visual search. Peripheral vision performance is intimately related to visual search, and observers are known to operate on any of a set of visuo-spatial fields known as ‘visibility lobes’ [18]. These lobes describe the probability of detection of a standardized object at a given angular distance (in radians) from the fovea of the retina. The most appropriate set of visual lobes depends on the level of visual task at which the observer is working for the search in question. For a given size of visual lobe, the probability of detection of a signal in a single glimpse is roughly inversely proportional to the area being searched [8]. The size of the search area in the chest radiographs forced a long inspection time at the viewing distances chosen by the subjects. The students could have used a strategy that divided the image into smaller sectors for search, which might have resulted in a higher detection rate. This could have been done mentally, with the help of a mask and window to section off parts of the image in turn, or even with the magnifying glass made available for the procedure. There may be some advantage in training and instructing students to adopt this type of strategy when searching for specific items. Hughes et al. [5] taught their subjects a systematic method for perusal of chest images. Students were found to improve their sensitivity and specificity after adopting the search technique, although it remains uncertain whether the improvement was partially due to increased experience and familiarity with the task. Further studies on gaze


duration and direction [19–21] might help to clarify this point. Alternatively, groups could be compared for diagnostic performance in controlled trials of search training programmes. The students in our study were probably not familiar enough with the visual problem to devise any such method independently. Notably, all students chose to view the images at a distance of between 0.7 and 1.0 m. The chosen distance no doubt corresponded to their rest-state accommodation, which is an important consideration for maintaining performance in a prolonged inspection task such as this. However, it determined a larger search area than would have been necessary at greater viewing distances. The trade-offs between the retinal angle subtended by the signal, image blur and the size of the search area should be appreciated in schemes of instruction for this type of visual task.

(iii) Normal anatomy was wrongly identified as a pulmonary nodule. The source of error in this case was failure of recognition rather than failure of detection, which further underlines the cognitive nature of the task. Visual ‘clutter’, the structured noise of anatomical details in a complex image, has a masking effect on diagnostic signs [22]. An important component of expert radiological skill is familiarity with the appearance of normal anatomy and its variations. This skill is acquired by supervised exposure to a wide range of these variants during a training period, and it is both maintained and refined by continued experience. For radiographers to achieve a similar standard of detection from complex images, even in limited areas of reporting, their education will need to find ways of accelerating this process to fit their shorter training time.
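The search argument in point (ii) can be made concrete with a toy model: if one glimpse covers a ‘visibility lobe’ of area A inside a search field of area S, a random glimpse finds the target with probability roughly A/S, whereas a methodical, sector-by-sector scan covers fresh ground with each glimpse. All areas and glimpse counts below are illustrative assumptions, not measured values.

```python
def random_search(lobe_area, search_area, glimpses):
    # Independent random glimpses: miss probability (1 - A/S) per glimpse.
    p = min(1.0, lobe_area / search_area)
    return 1.0 - (1.0 - p) ** glimpses

def systematic_search(lobe_area, search_area, glimpses):
    # Non-overlapping glimpses: detection probability equals the covered fraction.
    return min(1.0, glimpses * lobe_area / search_area)

S, A, n = 400.0, 4.0, 60  # arbitrary units: field area, lobe area, glimpse budget
print(round(random_search(A, S, n), 2))      # ~0.45
print(round(systematic_search(A, S, n), 2))  # 0.6
```

With the same glimpse budget the systematic scan wins, which is consistent with the gains Hughes et al. [5] report for a taught perusal method.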

Conclusions The ROC performance of the students in the detection of pulmonary nodules in chest radiographs was predictably poorer than that of the radiologists. The students were given no special coaching in these tasks and qualitatively the outcome was expected; but the insights gained into the types of error, and the general uniformity of the students’ performance, may be instructive in planning course developments in radiography education. We plan to follow this study with an investigation that tracks the eye movements of novice and expert


readers to give a clearer and more quantitative understanding of some of the sources of error on which we speculate in this paper. It is clear that care must be taken with role changes in medical imaging, so that reader education is based on an understanding of the mechanisms involved in visual decision making. These educational developments would be assisted by further studies of the kind described here, investigating how expert readers exercise their skills and how novices acquire them.

Acknowledgement We wish to thank the student radiographers of King’s College London, Dr P. Gishen of King’s College Hospital, London, and Dr W. Wall of the Royal Lancaster Infirmary, Lancaster, for their help with this project.

References
1. Anderson J. Red dots enhance the job. Radiography Today 1991; 57: 654.
2. Renwick IGH, Butt WP, Steele B. How well can radiographers triage X-ray films in accident and emergency departments? Br Med J 1991; 302: 568.
3. Loughran CF. Reporting fracture radiographs by radiographers: the impact of a training programme. Br J Radiol 1994; 67: 945–50.
4. Marshall NW, Faulkner K, Busch HP, Marsh DM, Pfenning H. An investigation into the radiation dose associated with different imaging systems for chest radiography. Br J Radiol 1994; 67: 353–9.
5. Hughes H, Hughes K, Hamill R. A study to evaluate the introduction of a pattern recognition technique for chest radiography by radiographers. Radiography 1996; 2: 263–88.
6. Chesters S. Human visual perception and ROC methodology in medical imaging. Phys Med Biol 1992; 37: 1433–76.
7. Manning DJ, Lewis C. Simulation of pulmonary lesions for signal detection tasks. Proc Roentgen Centenary Conference 1995: 124. British Institute of Radiology.
8. Overington I. Physiological conditions for the effective interpretation of radiographic images. BIR Report 1989; 18: 129–35. British Institute of Radiology.
9. Owens DA. A comparison of accommodation responsiveness and contrast sensitivity for sinusoidal gratings. Vision Research 1980; 20: 159–67.
10. Hanley JA, McNeil BJ. The meaning and use of the area under the ROC curve. Radiology 1982; 143: 29–36.
11. Egan JP. Signal Detection Theory and ROC Analysis. New York: Academic Press, 1975: 74–5.
12. Chakraborty DP, Winter LHL. Free-response methodology: alternate analysis and a new observer-performance experiment. Radiology 1990; 174: 873–81.
13. Pauli R, Hammond S, Cooke J, Ansell J. Radiographers as film readers in screening mammography: an assessment of competence under test and screening conditions. Br J Radiol 1995; 69: 10–14.
14. Brogden BG, Kelsey CA, Moseley RD. Factors affecting the perception of pulmonary lesions. Radiol Clin N Am 1984; 21: 633–54.
15. Samei E, Flynn MJ, Eyler WR, Peterson E. The effect of local background anatomical patterns on the detection of subtle lung nodules in chest radiographs. Proc SPIE Medical Imaging 1998; 3340: 44–54.
16. Ratches JA. Static performance model for thermal imaging systems. Optical Engineering 1976; 15: 525–30.
17. Stender HS, Oestman JW, Freyschmidt J. Perception of image details and their diagnostic relevance. BIR Report 1989; 20: 14–19. British Institute of Radiology.
18. Inditsky B, Bodman HW, Fleck HJ. Visual performance: contrast metric: visibility lobes—eye movements. Lighting Research and Technology 1982; 14: 218–31.
19. Kundel HL, Nodine CF, Toto LC. Eye position study of the effects of verbal prompt and pictorial backgrounds on the search for lung nodules in the chest radiograph. Proc SPIE Medical Imaging 1999; 3663: 122–8.
20. Wooding DS, Roberts GM, Phillips-Hughes J. Development of eye movement response in the trainee radiologist. Proc SPIE Medical Imaging 1999; 3663: 136–45.
21. Cowley HC, Gale AG. Breast cancer screening: comparison of radiologists’ performance in a self-assessment scheme and in actual breast screening. Proc SPIE Medical Imaging 1999; 3663: 157–68.
22. Eckstein MP, Abbey CK, Whiting JS. Human versus model observer performance in anatomical backgrounds. Proc SPIE Medical Imaging 1998; 3340: 16–26.