ARTICLE IN PRESS The Breast (2006) 15, 528–532
THE BREAST www.elsevier.com/locate/breast
ORIGINAL ARTICLE
Computer-aided detection (CAD) of cancers detected on double reading by one reader only
S. Ciatto, D. Ambrogetti, G. Collini, A. Cruciani, E. Ercolini, G. Risso, M. Rosselli Del Turco
Centro per lo Studio e la Prevenzione Oncologica, Viale A. Volta 171, 50131, Florence, Italy
Received 18 April 2005; received in revised form 12 August 2005; accepted 24 August 2005
KEYWORDS Breast cancer; Diagnosis; Screening; Mammography; Computer-aided detection
Summary
We evaluated the role of computer-aided detection (CAD) in cancers that underwent double reading and were detected by one reader only. A series of 33 cancers, originally missed by the first reader and detected by the second reader, and 75 negative controls was processed to assess CAD sensitivity, and was then read, with the help of CAD printouts, by six of the radiologists who had originally missed the cancers. CAD case-based sensitivity, specificity and positive predictive value were 51.5%, 18.6% and 21.7%, respectively. Average sensitivity of all radiologists in all cancers in the series was 74.7%, being higher for CAD+ (86.2%) than for CAD− (62.5%) cancers (P < 0.01). When reading cancer cases that they had originally missed, radiologists had a sensitivity of 75.8%, which was higher for CAD+ (100.0%) than for CAD− (58.3%) cancers. The average recall rate was 14.2%, the majority of recalls (45 out of 64) occurring for lesions marked by CAD. CAD may help in detecting at most half of the cancers missed at a single reading but detected by a second reader. © 2005 Elsevier Ltd. All rights reserved.
Introduction
Screening mammography is currently recommended, since it is effective in reducing breast cancer mortality, but its sensitivity is not optimal when measured by proficiency tests [1,2] or according to the proportional incidence of interval cancers, which shows that approximately one in four cancers is missed by biennial screening [3]. Diagnostic errors may be ascribed either to misperception or to inadequate analysis of radiological abnormalities [4–7]. Double reading is currently recommended as a measure to improve diagnostic accuracy [8–11]. Perception may be helped by computer analysis of digitised images with algorithms that identify mammographic sites for a second review by the radiologist. Computer-aided detection (CAD) may be particularly indicated in screening, in which a high number of mammograms are read with a very low prevalence of radiological abnormalities, a scenario favouring loss of attention and fatigue, both of which impair the radiologist’s perception.
Corresponding author. Tel.: +39 055 5012214; fax: +39 055 5001623. E-mail address: [email protected] (S. Ciatto).
0960-9776/$ - see front matter © 2005 Elsevier Ltd. All rights reserved. doi:10.1016/j.breast.2005.08.035
Retrospective and prospective studies have shown that computer-assisted reading is associated with a fair to moderate increase in sensitivity and a moderate drop in specificity [12–18]; retrospective studies on screen-detected or interval cancers [17,18] suggest computer-assisted reading as a possible alternative to conventional double reading, as the accuracy of the two methodologies is similar. Double reading is aimed at detecting cancers that might be missed by a single reader. Such cases correspond to cancers detected by only one of the two readers performing routine double reading. Ideally, the performance of computer-assisted reading aimed at substituting conventional double reading should thus be tested in such cases. In the present study we selected, from the Florence screening programme archives, a consecutive series of cancers detected by only one of the two readers performing double reading. A sample of negative cases (controls) was mixed with the screen-detected cancers (cases). The series obtained was analysed by a CAD system to assess the proportion of cases and controls marked by CAD. The set was then read in a blind fashion, with the help of CAD printouts, by six of the seven radiologists who had originally missed the cancers (one was not available for the study), to see whether computer-assisted reading allowed proper detection. Based on the results observed, the implications of adopting computer-assisted reading as an alternative to double reading are discussed.
Material and methods
The study series was prepared by one of the authors (SC) at the Centro per lo Studio e la Prevenzione Oncologica of Florence. The series consisted of 108 original mammograms drawn from the archives of the Florence city screening programme. Thirty-three were cancers that had been screen-detected by routine double reading, having been reported as negative by the first reader and detected by the second reader; these were designated as “cases” (five of the 38 consecutive cases originally selected were excluded as the films were damaged or had indelible marks at the cancer site). Seventy-five “controls”, matching the cases by age (±2 years) and examination date (year), were randomly selected from the screening archives according to a 2:1 control/case ratio (one of the 76 controls originally selected was excluded as the films were damaged). Cancer was excluded in controls as they had originally been reported as negative and remained unchanged on further biennial screening.
According to the screening protocol in the Florence programme, single oblique view mammography is performed at repeat screening in non-dense breasts, in accordance with the radiologist’s judgment. Single oblique view mammography had only been performed in 10 cancers and in 23 controls. Overall, the series consisted of 75 cases with two views and 33 cases with one view, making a total of 366 films and an average of 1.69 films/breast or 3.38 films/case.
The CAD system tested in the present study (Second Look 6.0 software) was developed and is presently marketed by iCAD (Nashua, NH, USA). All original films in the test series were digitised with iCAD by means of a specialised digitiser, the operators being blinded to the diagnostic outcome (case or control). Digitised images were submitted for computer analysis using iterative applications of embedded intelligence systems to identify mammographic locations that warranted a second review of the films. For each case in the series, CAD printouts were produced, each showing the mammographic views and the marks (if any) indicating the sites selected by the computer for second review (calcifications and opacities were identified on the printouts with different marks). CAD printouts were compared with the final diagnosis by one of the authors (SC), who checked each CAD mark against the cancer site, as indicated by the reader who had detected it, in order to identify true-positive and false-positive CAD marks. The sensitivity, specificity and positive predictive value (PPV) of CAD marking were determined on a film and on a lesion basis, on the overall series and separately for masses and microcalcifications.
Six expert breast radiologists (A, B, C, D, E and F; at least 20,000 screening mammograms read each), currently involved in the Florence screening programme, who had performed the first reading and missed one or more cancers in the series, read the films, displayed on a rotating viewer, with the help of CAD printouts.
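The film counts quoted above follow from simple arithmetic on the numbers of one-view and two-view cases. The short Python sketch below reproduces them; the constant and function names are illustrative, and it assumes two breasts per case, with two films per breast for two-view cases and one film per breast for single-oblique-view cases.

```python
# Film counts implied by the study design; all input figures come from the text.
TWO_VIEW_CASES = 75    # cases imaged with two views per breast
ONE_VIEW_CASES = 33    # cases imaged with a single oblique view per breast
BREASTS_PER_CASE = 2   # assumption: both breasts imaged in every case


def film_counts(two_view_cases, one_view_cases):
    """Return (cases, films, films per breast, films per case)."""
    cases = two_view_cases + one_view_cases
    films = BREASTS_PER_CASE * (2 * two_view_cases + 1 * one_view_cases)
    breasts = BREASTS_PER_CASE * cases
    return cases, films, films / breasts, films / cases


cases, films, films_per_breast, films_per_case = film_counts(
    TWO_VIEW_CASES, ONE_VIEW_CASES
)
print(cases, films, f"{films_per_breast:.3f}", f"{films_per_case:.3f}")
```

The ratios come out at about 1.694 films/breast and 3.389 films/case, which agrees with the 1.69 and 3.38 quoted in the text if the latter figure was truncated rather than rounded.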
The radiologists were blinded to the reading results until the study was completed. Readers were invited to indicate on a pre-defined paper scheme (showing the breast and the site of the abnormality) those cases which, according to current screening practice, they would have selected for diagnostic assessment because of mammographic abnormalities. Based on previous experience in CAD use, radiologists were advised not to discount lesions that they saw and were concerned about, but that were not marked by CAD. CAD was intended to assist radiologists in detecting “additional” lesions for work-up, not in determining which lesions not to work up. Test results were evaluated in terms of sensitivity and recall rate (determined only in non-cancer cases, = 1 − specificity) for each reader and on average. Sensitivity was determined in all cancers as well as in cancers marked by CAD (CAD+) or not (CAD−). Sensitivity for CAD+ and CAD− cancers was also determined for individual radiologists only in cancers they had originally missed on the first reading. The latter analysis was limited to 29 out of 33 cancers in the series because, of the seven radiologists who had originally missed the cancers on the first reading, one, who had missed four cancers, did not participate in the study. Statistical analysis of differences in the test results was based on the chi-squared test (χ²), statistical significance being set at P = 0.05.
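As an illustration of the test described above, the following Python sketch computes a chi-squared statistic for a 2×2 table with one degree of freedom. The paper does not say how the table was built or whether a continuity correction was applied, so both are assumptions here: readings are pooled over the six readers using the counts given in Table 1 (88 of 102 readings positive for CAD+ cancers, 60 of 96 for CAD− cancers), and Yates' correction is optional. The function name `chi2_2x2` is hypothetical.

```python
import math


def chi2_2x2(a, b, c, d, yates=True):
    """Chi-squared statistic and P value (1 df) for the 2x2 table [[a, b], [c, d]]."""
    n = a + b + c + d
    diff = abs(a * d - b * c)
    if yates:
        # Yates' continuity correction; an assumption, not stated in the paper.
        diff = max(diff - n / 2, 0.0)
    denom = (a + b) * (c + d) * (a + c) * (b + d)
    stat = n * diff ** 2 / denom
    # With 1 df the statistic is a squared standard normal, so P = erfc(sqrt(stat / 2)).
    p_value = math.erfc(math.sqrt(stat / 2))
    return stat, p_value


# Rows: CAD+ and CAD- cancers; columns: detected and missed readings (pooled, assumed).
stat_corrected, p_corrected = chi2_2x2(88, 14, 60, 36, yates=True)
stat_plain, p_plain = chi2_2x2(88, 14, 60, 36, yates=False)
print(f"corrected: {stat_corrected:.2f}, uncorrected: {stat_plain:.2f}")
```

Under these assumptions the corrected statistic is about 13.6 and the uncorrected one about 14.8, both well beyond the P = 0.05 threshold. The corrected value is close to the χ² = 13.5 reported in the Results, which suggests (but does not prove) that a pooled, continuity-corrected test was used.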
Results
The study series consisted of 108 cases, including 33 cancers. CAD marked 253 sites for second review (an average 2.34 per case or 1.38 per film, with 87 microcalcifications and 166 opacities). CAD marked the cancer site in at least one view in 17 out of 33 cases, with a case-based sensitivity of 51.5%. Case-based sensitivity was not different in cases with a single oblique view (5 out of 10 = 50.0%) or in cases with two views (12 out of 23 = 52.1%). CAD marked the control cases in at least one view in 61 out of 75 cases, with a case-based specificity of 18.6%. The case-based PPV of a CAD marking was 21.7%. On a film and lesion basis, CAD marked a benign mass in 105 controls and in 45 cancer cases and a malignant mass in 16 cancer cases; the PPV for masses was thus 9.6%. On a film and lesion basis, CAD marked a benign cluster of calcifications in 50 controls and in 28 cancer cases, and a malignant cluster in nine cancer cases; the PPV for calcifications was thus 10.3%. At review, 53 abnormalities (41 opacities and 12 calcifications) were evident at the cancer site: on a film and lesion basis, CAD marked 16 out of 41 (39.0%) opacities and 9 out of 12 (75.0%) calcifications at the cancer site.
Table 1 reports the results of the reading of the set by the six radiologists. Average sensitivity was 74.7% (range 57.5–93.9%) and was higher for CAD+ (86.2%, range 64.7–100.0%) than for CAD− (62.5%, range 37.5–87.5%) cancers (χ² = 13.5, df = 1, P < 0.01). When considering reports given by radiologists on cancers they had originally missed on the first reading, sensitivity was 75.8% (range 66.6–100.0%) and was higher for CAD+ (100.0%) than for CAD− (58.3%, range 0.0–100.0%) cancers. The average recall rate was 14.2% (range 5.3–32.0%). The majority of recalls were for lesions marked by CAD (45 out of 64).
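The case-based and lesion-based figures for CAD marking can be recomputed from the counts given in this section. The Python sketch below does so; the variable and function names are illustrative, and the only inputs are counts stated in the text.

```python
def pct(numerator, denominator):
    """Percentage helper."""
    return 100.0 * numerator / denominator


# Case-based counts from the text.
CANCERS_TOTAL, CANCERS_MARKED = 33, 17      # cancer site marked in at least one view
CONTROLS_TOTAL, CONTROLS_MARKED = 75, 61    # at least one (false-positive) mark

sensitivity = pct(CANCERS_MARKED, CANCERS_TOTAL)
specificity = pct(CONTROLS_TOTAL - CONTROLS_MARKED, CONTROLS_TOTAL)
case_ppv = pct(CANCERS_MARKED, CANCERS_MARKED + CONTROLS_MARKED)

# Lesion-based mark counts from the text.
mass_marks = {"benign_in_controls": 105, "benign_in_cancer_cases": 45, "malignant": 16}
calc_marks = {"benign_in_controls": 50, "benign_in_cancer_cases": 28, "malignant": 9}


def lesion_ppv(marks):
    """PPV of a mark: malignant marks over all marks of that lesion type."""
    return pct(marks["malignant"], sum(marks.values()))


mass_ppv = lesion_ppv(mass_marks)
calc_ppv = lesion_ppv(calc_marks)
total_marks = sum(mass_marks.values()) + sum(calc_marks.values())

print(f"sensitivity {sensitivity:.2f}%, specificity {specificity:.2f}%, PPV {case_ppv:.2f}%")
print(f"mass PPV {mass_ppv:.2f}%, calcification PPV {calc_ppv:.2f}%, marks {total_marks}")
```

The unrounded values are about 51.52%, 18.67% and 21.79% on a case basis, and 9.64% and 10.34% for masses and calcifications. These agree with the published 51.5%, 18.6%, 21.7%, 9.6% and 10.3% if the published figures were truncated to one decimal place. The mark counts also sum to the 253 sites reported.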
Discussion
The present study allows a further evaluation of the use of single computer-assisted reading as a possible substitute for independent double reading. Assuming that a single computer-assisted reading might replace a second reading, CAD would be of no help in cases of cancers detected by two readers, whereas it might help (a) in cases missed by both readers or (b) in cases missed by the first but detected by the second reader. The contribution of single computer-assisted reading in detecting cases missed by two readers has been investigated in several retrospective studies comparing independent double reading and single computer-assisted reading in series of negative cases interspersed with either screen-detected [12–16] or interval cancers [17,18]. CAD showed a good sensitivity in screen-detected cancers and a substantially lower sensitivity in interval cancers, and no significant difference in sensitivity was observed when independent double reading and single computer-assisted reading were compared. The limits of such retrospective simulation studies are essentially due to intrinsic biases of the retrospective, test-based study design, and the equivalence of single computer-assisted reading and conventional double reading in detecting interval cancers would require confirmation by controlled prospective studies, which are not yet available. The diagnostic performance of single CAD reading compared with conventional double reading should ideally be tested in cancers missed by one of two readers, and this is the first report, as far as we know, of an experiment using such a retrospective study design.
The impact of CAD on diagnostic accuracy cannot be assessed, though it can be predicted to some extent, on the basis of the proportion of benign and cancerous lesions marked by the system (CAD accuracy); such values may be used to predict the maximum benefit of CAD in increasing sensitivity, as well as the maximum negative effect of increasing unnecessary recalls if we assume that all CAD marks would prompt referral for further diagnostic assessment. CAD is poorly specific and generates excess false-positive marks (approximately one per film), which almost obliges the reader to discount most marks in order to avoid unacceptable recall rates, unfortunately in some cancer cases too. Thus, the contribution of computer-assisted reading to increasing sensitivity is always lower than what could be predicted on the basis of CAD marking of cancer lesions. Proper evaluation should be based on the effect of CAD on actual recalls (computer-assisted reading accuracy). In a retrospective study like the
present one, the interpretation of the results is further complicated by the test scenario (which usually influences the reader towards a higher sensitivity and a lower specificity) and by the fact that some readers had already seen (missed or detected) the cancers in the study set. Computer-assisted reading in the present study only had the aim of confronting readers with cases they had missed as a first reader in current practice.
CAD sensitivity was relatively low, as at least one abnormality in one view was marked in only 51.5% of the cancer cases, a much lower figure than the 94.1% observed for screen-detected cases in a previous study [12] using a previous algorithm release. Such a difference may be explained by the fact that screen-detected cases in the latter study were mostly detected by two readers, probably because they were associated with more easily noticeable abnormalities, which are also more likely to be perceived and marked by CAD. A different level of sensitivity of CAD in “difficult” cases has often been reported in previous studies. CAD has a lower sensitivity for interval than for screen-detected cancers [17,18], and among interval cancers CAD sensitivity varies from 90.9% for more evident lesions (reviewed as “screening errors”) to 25.0% for less evident lesions (reviewed as “minimal signs”) [18]. It must be noted that several cases in the present series had only a single oblique view. This might have been expected to bias CAD sensitivity towards lower values, as CAD is reported to be less sensitive when a single view instead of two views is available for analysis; nevertheless, this does not seem to have been the case in the present study, as CAD sensitivity was the same, irrespective of the number of views available. Computer-assisted reading sensitivity was 74.7% on average, which is not surprising as cancers in the series may have been “difficult” to see, but had in reality been detected, although by only one of the two readers.

Table 1. Results of CAD-assisted reading of the study set by six radiologists.
Column groups: (a) sensitivity % on 33 total cancers (positive cases); (b) sensitivity % for the radiologist originally missing the cancer (positive/missed cases); (c) recall rate % on 75 negative controls (positive cases), followed by the numbers of recalled controls that were CAD+ and CAD−.

        (a)         (a)         (a)        (b)           (b)            (b)          (c)        (c)       (c)
Reader  Total (33)  CAD+ (17)   CAD− (16)  Total         CAD+           CAD−         Total      CAD+ (n)  CAD− (n)
A       93.9 (31)   100.0 (17)  87.5 (14)  75.0 (6/8)    100.0 (2/2)    66.6 (4/6)   5.3 (4)    3         1
B       81.8 (27)   88.2 (15)   75.0 (12)  100.0 (4/4)   100.0 (2/2)    100.0 (2/2)  32.0 (24)  14        10
C       72.7 (24)   82.3 (14)   62.5 (10)  66.6 (2/3)    100.0 (2/2)    0 (0/1)      20.0 (15)  11        4
D       57.5 (19)   64.7 (11)   50.0 (8)   100.0 (2/2)   100.0 (2/2)    -            8.0 (6)    5         1
E       78.7 (26)   94.1 (16)   62.5 (10)  66.6 (2/3)    100.0 (2/2)    0.0 (0/1)    12.0 (9)   7         2
F       63.6 (21)   88.2 (15)   37.5 (6)   85.7 (6/7)    100.0 (5/5)    50.0 (1/2)   8.0 (6)    5         1
Mean    74.7        86.2        62.5       75.8 (22/29)  100.0 (15/15)  58.3 (7/12)  14.2       45        19

Sensitivity in cancers and recall rate in controls are reported for overall, CAD-marked (CAD+) and CAD-unmarked (CAD−) cases. In the bottom row, the CAD+ and CAD− recall columns give totals over the six readers. Only 29 of 33 cancers are considered because, of the seven radiologists who originally missed the 33 cancers in the set, one, who had missed four cancers, did not participate in the study.
Average sensitivity was higher (86.2% vs. 62.5%) in CAD+ than in CAD− cases, which may result either from CAD helping with detection or from the expected association of the CAD+ condition with more easily perceivable abnormalities. Although based on a more limited number of readings, a more interesting finding is evident when the analysis of the radiologists’ performance is limited to cancers they had originally missed; the original false-negative report may be ascribed to a systematic error, due to a reader-specific lower sensitivity for a special abnormality pattern, which thus might also be difficult to perceive at review. In fact, the sensitivity difference between CAD+ and CAD− cancers was more marked compared with the average (100% vs. 58.3%). Although a positive report in CAD+ cancers that were originally missed may also be due to other causes (intra-observer judgment variability, maximised sensitivity due to the test scenario), the difference observed is consistent with the supposed systematic error and suggests improved perception and detection in the presence of CAD marking.
The case-based specificity of CAD (18.6%) was also low, confirming previous reports [12,17,18]. This is not necessarily a major limitation of the method, as the aim of CAD is not diagnosis but alerting the reader to specific areas, with the obvious possibility of the reader discounting CAD marking and confirming a negative report. This certainly occurred in the present study. The fact that CAD marked more than one area per film and more than two areas per case created a “noise” that must have caused an extra workload for the reader and possibly influenced the reader towards a lower specificity. The latter does not seem to have been a major effect, however, as the recall rate among controls after computer-assisted reading was around 14% (not much higher than commonly reported for prevalent service screening), and this was at least partially explained by the test scenario. Nevertheless, an effect of CAD on specificity cannot be denied, as shown by the high prevalence of CAD+ cases among recalled controls.
In conclusion, this study suggests that computer-assisted reading may help the detection of cancers by radiologists who had previously missed the cancer on the first reading of screening mammograms. Even taking into account the influence on accuracy of the test scenario and of the expected intra-observer judgment variability at review, the greater differential sensitivity between CAD+ and CAD− cancers observed in radiologists who had originally missed the cancer, compared with those who had not, suggests that CAD may help to avoid systematic perception errors and may improve single conventional reading sensitivity.
However, in reality CAD marked only about half of the cancers that had originally been missed by one but detected by the other reader on conventional double reading. This finding suggests some limitation in the use of computer-assisted reading as a substitute for conventional double reading.
References
1. Ciatto S, Ambrogetti D, Catarzi S, Morrone D, Rosselli Del Turco M. Proficiency test for screening mammography: results for 117 volunteer Italian radiologists. J Med Screen 1999;6:149–51.
2. Ciatto S, Rosselli Del Turco M, Ambrogetti D, Catarzi S, Morrone D. Test per la valutazione dell’accuratezza diagnostica nella mammografia. Risultati di 103 test sostenuti da radiologi italiani. Radiol Med 1996;92:367–71.
3. Paci E, Ciatto S, Buiatti E, Cecchini S, Palli D, Rosselli Del Turco M. Early indicators of efficacy of breast screening programmes. Results of the Florence District Programme. Int J Cancer 1990;46:198–202.
4. Bird RE, Wallace TW, Yankaskas BC. Analysis of cancers missed at screening mammography. Radiology 1992;184:613–7.
5. Harvey JA, Fajardo LL, Innis CA. Previous mammograms in patients with impalpable breast carcinoma: retrospective vs. blinded interpretation. AJR Am J Roentgenol 1993;161:1167–72.
6. Kopans DB. Breast imaging, 2nd ed. Philadelphia: Lippincott-Raven; 1998. p. 793.
7. Dijck JAAM, Verbeek ALM, Hendriks JH, Holland R. The current detectability of breast cancer in a mammographic screening program: a review of the previous mammograms of interval and screen-detected cancers. Cancer 1993;72:1933–8.
8. Ciatto S, Rosselli Del Turco M, Morrone D, Catarzi S, Ambrogetti D, Cariddi A, et al. Independent double reading of screening mammograms. J Med Screen 1995;2:99–101.
9. Kirkpatrick A, Thornberg S, Thissen AI. The European guidelines for quality assurance in mammography screening. Brussels: European Community, Europe Against Cancer Programme; 1992.
10. Perry N, Broeders M, de Wolf C, Törnberg S, editors. European guidelines for quality assurance in mammographic screening. 3rd ed. Luxembourg: European Commission; 2001. p. 155–8.
11. Thurfjell EL, Lernevall KA, Taube AAS. Benefit of independent double reading in a population-based mammography screening program. Radiology 1994;191:241–4.
12. Ciatto S, Rosselli Del Turco M, Risso G, et al. Comparison of standard reading and computer aided detection (CAD) on a national proficiency test of screening mammography. Eur J Radiol 2003;45:135–8.
13. Malich A, Marx C, Facius M, Boehm T, Fleck M, Kaiser WA. Tumor detection rate of a new commercially available computer-aided detection system. ECR 2001; Abstract B0412. Eur Radiol 11:2454–9.
14. Brem RF, Schoonjans JM, Hoffmeister J, Raza S, Baum JK. Evaluation of breast cancer with a computer-aided detection system by mammographic appearance, histology and lesion size. Radiology 2000;217(P):400.
15. Warren Burhenne LJ, Wood SA, D’Orsi CJ. Potential contribution of computer-aided detection to the sensitivity of screening mammography. Radiology 2000;215:554–62.
16. Freer TW, Ulissey MJ. Screening mammography with computer-aided detection: prospective study of 12,860 patients in a community breast center. Radiology 2001;220:781–6.
17. Ciatto S, Ambrogetti D, Bonardi R, Brancato B, Catarzi S, Risso G, et al. Comparison of two commercial systems for computer-aided detection (CAD) of breast cancer by screening mammography. Radiol Med 2004;107:480–8.
18. Ciatto S, Rosselli Del Turco M, Burke P, Visioli C, Paci E, Zappa M. Comparison of standard and double reading and computer-aided detection (CAD) of interval cancers at prior negative screening mammograms blind review. Br J Cancer 2003;89:1645–9.