Clinical Radiology (2005) 60, 623–626
REVIEW
Is radiologists’ volume of mammography reading related to accuracy? A critical review of the literature
S.M. Moss*, R.G. Blanks, R.L. Bennett
Cancer Screening Evaluation Unit, Institute of Cancer Research, Sutton, UK
Received 15 November 2004; received in revised form 12 January 2005; accepted 20 January 2005
KEYWORDS Breast radiography; Cancer screening; Radiology reporting
The current UK quality assurance guidelines for radiologists in the NHS breast screening programme require those reporting screening mammograms to read a minimum of 5000 cases per year. We aimed to review the evidence for this and to assess whether there was justification for lowering the required level. A literature search was conducted to identify relevant studies where accuracy of reporting mammograms was related to reading volume. Three of the five studies reviewed suggested a positive association between reading volume and sensitivity, but there were few data on volumes above 5000 cases per year. The available evidence did not provide any basis for reducing the threshold volume. Further work is needed, in a UK or European setting, to study the relationship between reading volume and accuracy at higher volume levels and also the separate effects of reading volume and reading experience. © 2005 The Royal College of Radiologists. Published by Elsevier Ltd. All rights reserved.
Introduction Accuracy of reading of mammograms affects both cancer detection rates and false-positive rates. Current recommendations in England are that radiologists who report screening mammograms should read at least 5000 cases per year,1 and current European guidelines recommend a similar figure.2 Elsewhere, radiologists may read far fewer than this; for example, in the USA the Mammography Quality Standards Act of 1992 stipulates a minimum reading volume of 480 per annum. On average, radiologists in the UK read 5 to 7 times more cases than those in the USA.3 If the current UK minimum reading volume could be reduced, this might result in more radiologists becoming eligible to report screening mammograms and provide greater flexibility for film reading. Following discussions with members of the NHSBSP Radiologists Quality Assurance Committee, we reviewed the available literature on the effect of volume of film reading on radiologists’ performance.
In the UK, radiologists involved in breast cancer screening are encouraged to participate in a voluntary self-assessment programme (PERFORMS),11 which involves reporting on a test set of cases. Such test sets of necessity include a higher proportion of cancers than is observed in real life. The majority of studies on this subject have used such test sets.
* Guarantor and correspondent: S.M. Moss, Cancer Screening Evaluation Unit, Institute of Cancer Research, 15 Cotswold Road, Sutton SM2 5NG, UK. Tel.: +44 208 722 4191; fax: +44 208 770 0802. E-mail address: [email protected] (S.M. Moss).
0009-9260/$ - see front matter. doi:10.1016/j.crad.2005.01.011
Methods and results We used the reference lists of studies already known to us to identify other potentially eligible studies. A literature search using PubMed
(unrestricted on date or language) identified no new eligible work. We included only studies where the accuracy of reporting of mammograms was related to reading volume. We did not include those that looked at factors such as radiologist experience (e.g., years of reading mammograms) but that did not also look at reading volume; nor did we include papers looking at case volume related to other outcome measures (such as treatment outcomes). Only five studies were identified, of which two were carried out in the USA,4,5 one in the USA and the UK,6 one in Canada7 and one in Italy.8 The main characteristics of the studies are described in Table 1. Four of the papers were based on PERFORMS-type data sets, including varying proportions of cancers (11% to 43%). For these reports, accuracy in terms of sensitivity and specificity was assessed against known outcomes (non-cancer cases being confirmed by negative follow-up or biopsy). One of these studies8 was based on completion of a test set, mostly as part of a training course, and volume of clinical, not screening, mammograms read. One investigation7 was based on real-life reporting, and in this study standardized referral and cancer detection rates were compared with overall programme performance.
Table 1 Studies of reading volume and accuracy: main characteristics

| Paper | Year | Selection of films | Number of films (% cancers) | Gold standard | No. of radiologists (volume range per year) | Setting | Results/conclusions (readings per year) |
|---|---|---|---|---|---|---|---|
| Beam et al.4 | 2003 | PERFORMS-type | 148 (43%) | Cancer/non-cancer confirmed by biopsy or 2-year follow-up | 110 (96% <5000) | USA | ROC curves against recent volume showed no linear relationship |
| Esserman et al.6 | 2002 | PERFORMS 2 | 60 (22%) | PERFORMS reviewed by 5 experienced film readers. Cancer/non-cancer confirmed by biopsy or 3-year follow-up | UK, 194 (≥600); USA, 19 low-volume (≤1200), 22 medium-volume (1200–3600), 18 high-volume (≥3600) | UK and USA | Sensitivity at 90% specificity: 0.785 for UK readers (>5000); 0.756 for high-vol USA readers (>3600); 0.702 for med-vol USA readers (2400); 0.648 for low-vol USA readers (<1200) |
| Ciatto et al.8 | 1999 | PERFORMS-type | 150 (11%) | Cancer/non-cancer confirmed by biopsy/follow-up | 117 (500–51,000) | Italy | Readings per year/test passes: <1000, 18%; 1000–2000, 45%; >2000, 58% |
| Kan et al.7 | 2000 | Real life | ? | N/A | 35 (2000–5199) | British Columbia, Canada | SCDR 0.89 (<2000), 0.96 (2000–2999), 0.99 (3000–3999), 1.07 (4000–5199). SAIR 1.15 (<2000), 0.81 (2000–2999), 0.78 (3000–3999), 1.09 (4000–5199) |
| Elmore et al.5 | 1998 | PERFORMS-type | 150 (18%) | 123 non-cancer cases confirmed by biopsy, mammogram or both at 3 years | 10 (200–5000) | Connecticut and New York | Total lifetime mammograms, but not mammograms/year, associated with cancer detection rate |

SCDR, age-standardized cancer detection rate; SAIR, age-standardized abnormal interpretation (recall) rate; N/A, not applicable.

Assessment of reading volume All papers included mammograms read per year (or month). The number of years over which this was assessed varied between 1 year,4 3 years7 and lifetime reading;5 for two reports6,8 the number was not stated. Only the work that included UK radiologists6 provided much information on reading volumes of >5000 cases per year, with no subdivision of reading volumes above this figure. In one study, radiologists with reading volumes >9000 per year were specifically excluded.7
Outcome measures All four PERFORMS-type investigations calculated sensitivity as the proportion of cancers recommended for referral, and specificity (where calculated) as the percentage of non-cancers not referred. Two reports4,6 calculated ROC curves for individual radiologists; both of these calculated area under the ROC curves and sensitivity at a specificity of 90% as measures of accuracy. The work from Italy,8 based on training data, looked at the proportions of readers reaching sensitivity >80% and recall rate <15%. The real-life study7 compared observed recall and cancer detection rates with those expected according to the proportions of prevalent/incident screens in different age groups, based on data from the whole programme.
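The outcome measures described above can be illustrated with a minimal Python sketch. It is not taken from any of the reviewed studies: the function names, the ten-film test set, the stratum labels and the programme-wide rates are all hypothetical, chosen only to show how sensitivity, specificity and an observed/expected (standardized) ratio could be computed.

```python
from typing import Dict, Sequence


def sensitivity(referred: Sequence[bool], is_cancer: Sequence[bool]) -> float:
    """Proportion of cancer cases that the reader recommended for referral."""
    decisions = [r for r, c in zip(referred, is_cancer) if c]
    return sum(decisions) / len(decisions)


def specificity(referred: Sequence[bool], is_cancer: Sequence[bool]) -> float:
    """Proportion of non-cancer cases that the reader did not refer."""
    decisions = [r for r, c in zip(referred, is_cancer) if not c]
    return sum(not r for r in decisions) / len(decisions)


def standardized_ratio(observed: int,
                       screens: Dict[str, int],
                       programme_rates: Dict[str, float]) -> float:
    """Observed events divided by the number expected if programme-wide
    stratum-specific rates applied to this reader's own mix of screens."""
    expected = sum(screens[s] * programme_rates[s] for s in screens)
    return observed / expected


# Hypothetical test set of 10 films: 4 cancers and 6 non-cancers.
is_cancer = [True, True, True, True, False, False, False, False, False, False]
referred = [True, True, True, False, True, False, False, False, False, False]

sens = sensitivity(referred, is_cancer)  # 3 of 4 cancers referred
spec = specificity(referred, is_cancer)  # 5 of 6 non-cancers not referred

# Hypothetical reader: screens read per stratum, and assumed programme-wide
# cancer detection rates for the same strata (illustrative values only).
screens = {"prevalent, 50-59": 1000, "incident, 50-59": 3000}
rates = {"prevalent, 50-59": 0.006, "incident, 50-59": 0.003}
ratio = standardized_ratio(12, screens, rates)  # expected = 6 + 9 = 15 cancers
```

A ratio below 1 would indicate fewer detected cancers (or recalls) than expected from the reader's case mix, and a ratio above 1 more than expected.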
Other factors Four of the studies4–6,8 looked at other factors, including years of experience of reading mammograms, although one6 presented no data on this.
Results Of the four PERFORMS-type studies, two4,5 showed no significant relationship of either sensitivity or specificity with reading volume. However, both of these papers assumed a linear relationship between volume and the chosen outcome measure.
The report on a sample of 110 USA radiologists4 showed considerable variation in accuracy, and concluded that current reading volume was not related to interpretive accuracy. This work estimated that a 1% increase in accuracy, measured by Am (the area under the ROC curve estimated non-parametrically), was associated with an increase in annual reading volume of 3000 mammograms. However, the reading volume was reported only for the previous year, and approximately 20% of the radiologists had not read the USA minimum of 480 films per year. The average volume of films read over a number of years is likely to be a more useful measure. In addition, very few of the radiologists in this study had read more than 5000 films in the previous year. The research therefore provides relatively few data relevant to a UK setting.
Another USA investigation5 included only 10 radiologists, although the range of mammograms read per year was 200 to 5000; it would therefore have had little power to show significant differences.
By comparison, the report by Esserman et al.6 showed a significant positive association between reading volume and sensitivity, but no clear association with accuracy as measured by area under the ROC curve or with specificity. Here, USA radiologists were grouped as low-volume, medium-volume or high-volume readers (≤100, 101 to 300, and ≥301 mammograms per month, respectively). The UK radiologists formed a fourth (UK high-volume) group. The average percentage of cancers detected by the four groups was 71.5%, 69.0%, 78.6% and 83.5%, respectively. The inclusion of the UK high-volume readers substantially increased the range of volumes and also provided a group likely to have fewer annual fluctuations in their reading volumes. The results of this paper support the view that a high volume of film reading is related to greater accuracy.
The Italian study8 found a significant increase in the proportion of radiologists with sensitivity >80% with increasing reading volume; however, as discussed above, the volume did not relate to screening mammograms.
The Canadian work7 showed an increasing trend in standardized cancer detection ratio with increasing reading volume, but this was based on small numbers (calculated using only 1 year’s data) and was not statistically significant. This research showed a significant curvilinear relationship between standardized abnormal interpretation ratio and reading volume, with lowest values in the medium-volume groups.
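Because the studies report volume per month in some places and per year in others, it may help to see the monthly cut-points quoted above converted to annual figures. The sketch below is illustrative only; the function names and the example reader are hypothetical and do not come from the reviewed papers.

```python
MONTHS_PER_YEAR = 12


def usa_volume_group(mammograms_per_month: int) -> str:
    """Assign a reader to a volume group using the monthly cut-points
    quoted in the text (<=100, 101 to 300, >=301 per month)."""
    if mammograms_per_month <= 100:
        return "low"
    if mammograms_per_month <= 300:
        return "medium"
    return "high"


def annual_volume(mammograms_per_month: int) -> int:
    """Convert a monthly reading volume to an annual figure."""
    return mammograms_per_month * MONTHS_PER_YEAR


# Annualised cut-points: 100 per month is 1200 per year,
# and 300 per month is 3600 per year.
low_cutoff = annual_volume(100)
high_cutoff = annual_volume(300)

# Hypothetical reader at 420 films per month: "high" on the monthly scale,
# and 5040 per year, just above the UK minimum of 5000.
group = usa_volume_group(420)
yearly = annual_volume(420)
```

On this conversion, a reader classed as high-volume in the USA grouping (3600 or more per year) could still fall well short of the UK minimum of 5000 per year.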
Other factors Of the studies that looked at years of experience of reading mammograms, only one8 reported a significant association with percentage of radiologists passing the test. The two papers5,8 that looked at total lifetime mammograms read reported a significant positive association with accuracy.
Discussion/conclusions There is relatively little evidence on the association between reading volume and accuracy of reading of mammograms. Most of the studies reviewed here used PERFORMS-type tests which included a higher proportion of cancers than would be observed in real life. Three of the five papers suggested a positive relationship between reading volume and sensitivity; the remaining two presented few data on volumes greater than 5000 per year. One of the investigations4 that reported no association between volume and accuracy included a particularly high proportion of cancers (43%). Results from test sets may be subject to context bias, whereby interpretation is biased by knowledge of likely prevalence of disease.9 For example, where a test set is based on a very high proportion of cancers, high recall of normal films may occur, resulting in apparently low specificity.10 It is also likely that differences between readers may be underestimated.
It is difficult to make recommendations for the UK on the basis of the above work. By definition, all UK screening radiologists involved are high-volume readers, reading at least 5000 cases per year. By contrast, USA readers tend to be lower-volume readers, and their accuracy may be more related to other factors such as years of experience. The studies reviewed provide little evidence on the association between reading volume and accuracy at volumes higher than 5000 per year. There is some evidence of an association between volume at lower levels and accuracy (both sensitivity and specificity), but any independent effect of years of experience cannot be assessed from the results presented.
Our interpretation of the current available evidence is that it does not provide any basis for reducing the threshold of 5000 mammograms per year for screening radiologists in the UK. Further work is needed in a UK or European setting, looking in particular at the relationship between reading volume and accuracy at higher volume levels, and at the separate effects of reading volume and reading experience. As radiographers are now working as film readers in some screening programmes, the appropriate reading volume of these staff should also be evaluated. Detailed studies investigating readers with a range of volumes from 1000 to 10,000 per year would provide more useful evidence, but would need to be conducted in a setting other than the UK.
References
1. NHS Breast Screening Radiologists Quality Assurance Committee. Quality assurance guidelines for radiologists. Sheffield, UK: NHSBSP Publications; 1997.
2. Perry N, Broeders M, Wolf C de, Törnberg S, eds. European guidelines for quality assurance in mammography screening. 3rd Ed. Luxembourg, European Commission, Europe Against Cancer programme, 2001.
3. Smith-Bindman R, Chu PW, Miglioretti DL, Sickles EA, Blanks R, Ballard-Barbash R, et al. Comparison of screening mammography in the United States and the United Kingdom. JAMA 2003;290:2129–37.
4. Beam C, Conant E, Sickles EA. Association of volume and volume-independent factors with accuracy in screening mammogram interpretation. J Natl Cancer Inst 2003;95:282–90.
5. Elmore J, Wells C, Howard D. Does diagnostic accuracy in mammography depend on radiologists’ experience? J Womens Health 1998;7:443–9.
6. Esserman L, Cowley H, Eberle C, Kirkpatrick A, Chang S, Berbaum K, et al. Improving the accuracy of mammography: volume and outcome relationships. J Natl Cancer Inst 2002;94:369–75.
7. Kan L, Olivotto IA, Warren Burhenne LJ, Sickles EA, Coldman AJ. Standardized abnormal interpretation and cancer detection ratios to assess reading volume and reader performance in a breast screening program. Radiology 2000;215:563–7.
8. Ciatto S, Ambrogetti D, Catarzi S, Morrone D, Rosselli Del Turco M. Proficiency test for screening mammography: results for 117 volunteer Italian radiologists. J Med Screen 1999;6:149–51.
9. Egglin TK, Feinstein AR. Context bias. A problem in diagnostic radiology. JAMA 1996;276:1752–5.
10. Blanks RG, Given-Wilson RM, Moss SM. Efficiency of cancer detection during routine repeat (incident) mammographic screening: two versus one view mammography. J Med Screen 1998;5:141–5.
11. Gale AG. PERFORMS – a self assessment scheme for radiologists in breast screening. In Seminars in Breast Disease: Improving and monitoring mammographic interpretative skills, 2003;6:148–152.