European Journal of Radiology 85 (2016) 808–814
Impact on the recall rate of digital breast tomosynthesis as an adjunct to digital mammography in the screening setting. A double reading experience and review of the literature

Luca A. Carbonaro ᵃ, Giovanni Di Leo ᵃ, Paola Clauser ᵇ,ᶜ,∗, Rubina M. Trimboli ᵃ, Nicola Verardi ᵃ, Maria P. Fedeli ᵈ, Rossano Girometti ᵇ, Alfredo Tafà ᵉ, Paola Bruscoli ᵉ, Gianni Saguatti ᵉ, Massimo Bazzocchi ᵇ, Francesco Sardanelli ᵃ,ᶠ

ᵃ Radiology Unit, Research Hospital Policlinico San Donato, San Donato Milanese, Milan, Italy
ᵇ Institute of Radiology, Department of Medical and Biological Sciences, University of Udine, Udine, Italy
ᶜ Department of Biomedical Imaging and Image-guided Therapy, Division of Molecular and Gender Imaging, Medical University of Vienna, Austria
ᵈ Scuola di Specializzazione in Radiodiagnostica, Università degli Studi di Milano, Milan, Italy
ᵉ Unità Operativa di Senologia, Ospedale Maggiore, AUSL Bologna, Bologna, Italy
ᶠ Dipartimento di Scienze Biomediche per la Salute, Università degli Studi di Milano, Milan, Italy
Article info

Article history: Received 20 September 2015; Received in revised form 12 January 2016; Accepted 13 January 2016

Keywords: Breast cancer; Digital breast tomosynthesis (DBT); Digital mammography (DM); Recall rate; Screening population based
Abstract

Objectives: To estimate the impact on recall rate (RR) of digital breast tomosynthesis (DBT) associated with digital mammography (DM + DBT), compared to DM alone, evaluate the impact of double reading (DR) and review the literature.

Methods: Ethics committees approved this multicenter study. Patients gave informed consent. Women recalled from population-based screening reading were included. Reference standard was histology and/or ≥1 year follow up. Negative multiple assessment was considered for patients lost at follow up. Two blinded readers (R1, R2) evaluated first DM and subsequently DM + DBT. RR, sensitivity, specificity, accuracy, positive and negative predictive values (PPV, NPV) were calculated for R1, R2, and DR. Cohen κ and χ² were used for R1-R2 agreement and RR related to breast density.

Results: We included 280 cases (41 malignancies, 66 benign lesions, and 173 negative examinations). The RR reduction was 43% (R1), 58% (R2), 43% (DR). Sensitivity, specificity, accuracy, PPV and NPV were: 93%, 67%, 71%, 33%, 98% for R1; 88%, 73%, 75%, 36%, 97% for R2; 98%, 55%, 61%, 27%, 99% for DR. The agreement was higher for DM + DBT (κ = 0.459 versus κ = 0.234). Reduction in RR was independent from breast density (p = 0.992).

Conclusion: DBT was confirmed to reduce RR, as shown by 13 of 15 previous studies (reported reduction 6–82%, median 31%). This reduction is confirmed when using DR. DBT allows an increased inter-reader agreement.

© 2016 Elsevier Ireland Ltd. All rights reserved.
∗ Corresponding author at: Department of Biomedical Imaging and Image-guided Therapy, Division of Molecular and Gender Imaging, Medical University of Vienna/General Hospital Vienna, Waehringer Guertel 18-20, Vienna, Austria. E-mail address: [email protected] (P. Clauser).
http://dx.doi.org/10.1016/j.ejrad.2016.01.004
0720-048X/© 2016 Elsevier Ireland Ltd. All rights reserved.

1. Introduction

Clinical trials have shown that screening mammography is able to reduce mortality from breast cancer [1]. Even so, there is still intense discussion and criticism regarding screening programs [2]. Being a two-dimensional imaging method, digital mammography (DM) has several limitations: small lesions can be hidden by dense breast parenchyma and thus cancers can be missed, especially in women with dense breasts [3]. On the other hand, the superimposition of normal breast glandular tissue can create false images, leading to a high percentage of unnecessary recalls. Furthermore, with the introduction of screening mammography, the diagnosis of lesions with unknown or low clinical significance has increased, raising the issues of overdiagnosis and overtreatment [4].

Recalling patients for further examinations (additional mammographic views, ultrasound, magnetic resonance imaging, or even biopsy) causes anxiety for the women and generates significant additional costs for screening programs [5]. According to European and U.S. guidelines, recall rates should be kept below 7% and 12%, respectively [6,7]; these different thresholds are due to
the variability in clinical practice, with higher recall rates in the U.S. than in European countries [8]. The site where mammography is interpreted has also been found to influence the number of women recalled [9]. These differences are not associated with a comparable variability in detection rate, which does not improve with higher recalls [10].

Digital breast tomosynthesis (DBT), used in association with DM, is able to detect a higher number of cancers, compared to DM alone. In particular, initial results of trials, both in Europe and in the U.S., showed that DBT in the screening setting is able to increase the number of invasive malignant lesions diagnosed. At the same time, thanks to its capability to obtain various images of the same breast and reduce the effect of tissue superimposition, studies showed that DBT is able to reduce the number of women recalled because of unspecific findings and false-positive mammographic images [11].

In this study we aimed to evaluate the effect on recall rate of two-view DBT in association with DM, as compared to DM alone. Furthermore, we assessed inter-reader agreement and whether breast density influenced the effect of DBT on recall rates. Finally, we performed a review of the literature published on this topic.
2. Materials and methods

2.1. Study population and screening reading protocol

This multicenter prospective study obtained ethics committee approval at all three centers involved, and the patients included signed a written informed consent. Between January 2012 and December 2013, all women recalled from the screening program and referred to the participating institutions for diagnostic work-up could be enrolled in this study. In two of the three regions involved, screening mammography is offered every second year to women between 50 and 69 years of age. In the third region, screening mammography is offered annually from 45 to 49 years and every second year from 50 to 75 years.

In all three centers involved, the readings for the population-based screening program were performed by two independent readers. When both readers identified a finding deserving further characterization, the woman was recalled. When the opinions of the two readers were discordant, the decision whether to recall the woman was taken either with a consensus reading or by showing the examination to a third, blinded, radiologist. The screening readers of the three centers were not involved in the readings of the current study.

Exclusion criteria were: lack of written informed consent, mammography performed in symptomatic patients, breast implant in the breast with the suspicious finding(s), and pregnancy. The final study population consisted of women recalled from the regional screening programs who agreed to participate in the study and met no exclusion criteria.
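The arbitration rule used by the screening programs can be written as a small decision function. The sketch below is only an illustration: the function and parameter names are hypothetical, the consensus or third-reader step is reduced to a callback, and the case in which neither reader flags a finding (no recall) is an assumption, since the text describes only concordant-positive and discordant readings.

```python
from typing import Callable


def screening_decision(reader1_recall: bool,
                       reader2_recall: bool,
                       arbitrate: Callable[[], bool]) -> bool:
    """Return True if the woman is recalled (hypothetical helper).

    arbitrate stands in for the consensus reading or the third,
    blinded radiologist; it is only called for discordant readings.
    """
    if reader1_recall and reader2_recall:
        return True  # both readers flag a finding: recall
    if not reader1_recall and not reader2_recall:
        return False  # assumption: two negative readings mean no recall
    return arbitrate()  # discordant readings go to arbitration


# Discordant reading resolved as negative by the arbiter.
print(screening_decision(True, False, arbitrate=lambda: False))
```

This differs from the double-reading rule applied to the study readers, where a recall by either reader counted as a recall without arbitration.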
2.2. DBT acquisition protocol

Patients recalled from the screening program who agreed to participate in the study underwent, after bilateral DM, DBT of the breast harboring the finding that prompted the recall. Each examination was performed in both standard views (cranio-caudal and medio-lateral oblique). The same mammographic unit (Giotto Tomo, Internazionale Medico-Scientifica, Bologna, Italy) was used at each of the three centers, equipped with an amorphous-selenium digital detector (ANRAD) with a sensitive area of 24 × 30 cm² and a square pixel pitch of 0.085 mm, resulting in an image size of 2816 × 3584 pixels. Images of the included cases were anonymized and stored on a dedicated workstation.
2.3. Study reading protocol and image analysis

When the case collection was completed, two independent breast radiologists, involved neither in the population-based screening nor in the diagnostic work-up at any of the three centers, evaluated all images. The first reader (R1) had 4 years of experience in breast imaging, including reading of screening mammograms; the second reader (R2) had 6 years of experience in clinical breast imaging. Both radiologists had clinical experience with DBT. Readers were blinded to the patients' clinical data and reference standard, but aware of the source of the cases (regional screening program).

For each case, readers had to evaluate first DM alone and subsequently DM in association with DBT. For both reading modalities (DM alone and DM with DBT), readers were asked to evaluate the images as if they were performing the reading in a screening population, and to state whether or not the examination presented a finding that prompted recall. Readers could evaluate only the current images, as no previous studies were made available. No computer-aided detection system for DM or DBT was used.

Breast density was evaluated by one of the readers (R1) according to the fifth edition of the BI-RADS®, distinguishing four classes: a (breasts almost entirely fatty), b (scattered areas of fibroglandular density), c (breasts heterogeneously dense), and d (extremely dense breasts) [7].

2.4. Reference standard

The reference standard was the histopathological evaluation of the needle biopsy sample or of the surgical specimen for lesions classified as suspicious after complete work-up. In the cases in which biopsy was not performed, the reference standard was follow up of at least 1 year. Negative multiple assessment was considered as reference standard for patients lost at follow up. The multiple assessment consisted of additional mammographic views, ultrasonography (US) and, when available, comparison with previous examinations.
In some cases, magnetic resonance imaging (MRI) was also performed to exclude malignancy.

2.5. Statistical analysis and sample size calculation

Cases that were assessed as negative or benign at multiple assessment, with or without biopsy, and remained stable at follow up were considered true negative when the study reader did not prompt recall and false positive when the study reader prompted recall. Cases were considered true positive when a patient with a malignant lesion was recalled and false negative when the reader did not prompt recall.

The number of cases recalled by each reader with DM alone was used as the reference number to calculate the reduction in recall rate when DBT was added. This calculation was made first for each reader separately and then for the two readers together (double reading, DR). For DR, a case was counted as recalled when the recall was prompted by only one or by both readers.

Diagnostic performance was estimated in terms of sensitivity, specificity, accuracy, positive predictive value (PPV), and negative predictive value (NPV), together with their 95% confidence interval (95% CI), calculated according to the binomial distribution. The comparison between the two readers in terms of performance and recall rate at DM alone and DM + DBT was evaluated using the McNemar test. The comparison of recall rates for DR was also estimated according to breast density, using the Chi-square test. Inter-reader agreement was estimated using the Cohen κ coefficient.

The study was prospectively designed and the sample size calculation was based on the capability to show a 30% reduction (from the current 5% to 3.5%) of the recall rate of DM + DBT compared to that of DM alone. With 80% power and a 0.05 α error, 270 women had to be enrolled.

3. Results

3.1. Patients

A total of 280 women were enrolled (mean age, 55 years; range 45–74). Breast density distribution was: class a in 48 cases, class b in 90 cases, class c in 82 cases, and class d in 60 cases. In 107 women (38%) a histopathological verification was obtained. Image-guided needle biopsy or surgical excision resulted in 66 (62%) benign findings and 41 (38%) cancers (Table 1). The remaining 173 cases were found to be negative or benign lesions during diagnostic work-up (multiple assessment). During follow up, 2 of these 173 patients underwent biopsy, both with a diagnosis of fibrocystic changes. Forty-one of these 173 patients (15% of the 280 enrolled) were lost at follow up, all of them with negative triple assessment. Follow up for the remaining 130 patients ranged from 1 to 3 years (median, 2 years).

Table 1
Histology of the 107 lesions with a histopathological verification. Forty-one lesions were confirmed as malignant while the remaining 66 were proven to be benign.

| Histology | Lesions | % (95% CI ᵃ) |
|---|---|---|
| Malignant | 41 | 38% (29–48%) |
| Invasive ductal carcinoma ᵇ | 27 | 66% (49–80%) |
| Ductal carcinoma in situ ᵇ | 9 | 22% (11–38%) |
| Invasive lobular carcinoma ᵇ | 5 | 12% (4–26%) |
| Benign | 66 | 62% (52–71%) |
| Fibroadenoma ᶜ | 11 | 17% (9–28%) |
| Cyst ᶜ | 10 | 15% (7–26%) |
| Lymph node ᶜ | 4 | 6% (2–15%) |
| Sclerosing adenosis ᶜ | 2 | 3% (0–10%) |
| Papilloma ᶜ | 2 | 3% (0–10%) |
| Other ᶜ | 8 | 12% (5–23%) |
| Non specified ᶜ | 29 | 44% (32–57%) |

ᵃ 95% CI = 95% confidence interval.
ᵇ Percentages are calculated on 41 malignant lesions.
ᶜ Percentages are calculated on 66 benign lesions.

3.2. Recall rate and diagnostic performance

At the evaluation of DM alone, R1 would not have recalled 79 (28%) and R2 would not have recalled 44 (16%) of the 280 enrolled patients. All further calculations were performed on the remaining 201/280 patients (72%) for R1 and 236/280 patients (84%) for R2. Recall rates for each of the two readers and for double reading are shown in Table 2.

Compared to DM alone, R1 would not have recalled a further 86/201 women (43%) when reading DM with DBT (DM + DBT). Diagnostic performance of DM + DBT for R1 showed a very high sensitivity and good specificity (Table 3). Three false negatives were found: 2 invasive ductal carcinomas (IDC) of 10 mm and 15 mm in maximum diameter and 1 ductal carcinoma in situ (DCIS) of 30 mm in maximum diameter. The PPV was 33%.

Compared to DM alone, R2 would not have recalled a further 138/236 women (58%) when reading DM + DBT. Diagnostic performance of DM + DBT for R2 showed a non-significantly lower sensitivity (P = 0.693) and a non-significantly higher specificity (P = 0.183) compared to R1, with a comparable PPV (Table 3). The five false negatives were: 3 IDC of 5 mm, 12 mm and 15 mm in maximum diameter, 1 bifocal invasive lobular cancer (ILC) of 10 and 6 mm in diameter, and 1 DCIS of 8 mm in diameter.

At the evaluation of DR, only 22 (8%) of the 280 enrolled patients would not have been recalled using DM alone (Table 2). Of the remaining 258 women, 112 (43%) would not have been recalled at DR when using DM + DBT. Double reading showed a very high sensitivity with a specificity of 55% and a PPV of 27% (Table 3). Only one false negative was identified: one IDC of 15 mm in diameter that was visible only as a faint architectural distortion on DM and DBT, while it was clearly visible on US.

Recall rate for R1 and R2 was significantly different when considering DM alone (P = 0.001), but not when considering DM + DBT (P = 0.500). Examples of suspicious screening mammography not confirmed and confirmed at DBT are given in Fig. 1 and Fig. 2, respectively. Inter-reader agreement was only fair (κ = 0.234) for DM alone and moderate for DM + DBT (κ = 0.459). No significant differences were found in recall rate for DR when cases were stratified for breast density (p = 0.992) (Table 4).

Table 3
Diagnostic performance of digital mammography in association with digital breast tomosynthesis for Reader 1, Reader 2 and for Double reading, considering as true positive only malignant lesions.

| | Reader 1, % (95% CI) | Reader 2, % (95% CI) | Double reading, % (95% CI) |
|---|---|---|---|
| Sensitivity | 93 (80–99) | 88 (74–96) | 98 (87–100) |
| Specificity | 67 (61–73) | 73 (67–78) | 55 (49–62) |
| Accuracy | 71 (65–76) | 75 (70–80) | 61 (56–67) |
| Negative predictive value | 98 (95–100) | 97 (94–99) | 99 (96–100) |
| Positive predictive value | 33 (24–42) | 36 (24–44) | 27 (20–35) |

95% CI: 95% confidence interval.
Table 2
Recalls and recall rates (RR) for Reader 1, Reader 2 and Double reading at the evaluation of digital mammography (DM) alone and of DM plus digital breast tomosynthesis (DBT), and recall reduction resulting from the addition of DBT to DM.

| | Recalls DM, N | Recalls DM, % (95% CI ᵃ) | Recalls DM plus DBT, N | Recalls DM plus DBT, % (95% CI ᵃ) | RR reduction, N | RR reduction, % (95% CI ᵃ) |
|---|---|---|---|---|---|---|
| Reader 1 | 201 | 72% (66–77%) ᵇ | 115 | 57% (50–64%) ᶜ | 86 | 43% (36–50%) ᶜ |
| Reader 2 | 236 | 84% (79–88%) ᵇ | 98 | 42% (35–48%) ᶜ | 138 | 58% (52–65%) ᶜ |
| Double reading | 258 | 92% (88–95%) ᵇ | 146 | 57% (50–63%) ᶜ | 112 | 43% (37–50%) ᶜ |

ᵃ 95% CI = 95% confidence interval.
ᵇ Percentages are calculated on the initial number of patients included in the study (280 cases).
ᶜ Percentages are calculated on the number of recalls at DM of the same reader or of the double reading.

Fig. 1. A 57-year-old woman was recalled from the regional screening program (a, b) for a mass in the left breast, better visible in the medio-lateral oblique view (b, arrow). Digital breast tomosynthesis was performed and no suspicious lesion was detected in the area of the mammographic finding (c, arrow). No suspicious lesions were detected at ultrasound (not shown). The finding was interpreted as tissue superimposition.

Fig. 2. A 49-year-old woman was recalled from the regional screening program (a, b) because of an asymmetry (b, arrow) and a probably benign lesion in the upper quadrants of the left breast (b). On digital breast tomosynthesis (DBT), the asymmetry was confirmed, appearing as a spiculated mass (c, arrow), and the probably benign mass did not show suspicious features on either DBT or ultrasound (not shown). The spiculated lesion was biopsied under ultrasound guidance and the diagnosis was invasive ductal carcinoma.

4. Discussion

We obtained a strong reduction in recall rate (about 40%) when DBT was used as an adjunct to DM, as compared to DM alone, in a blinded reading of patients recalled from a population-based screening setting. This reduction was observed for the two different readers and also for double reading, and it was independent of breast density. Furthermore, we noted that the variability between the two readers was higher when reading DM alone, i.e., the addition of DBT improved the agreement between readers. The recall rate with DM alone of R1, who had population-based screening experience, was significantly lower than that of R2, who had no population-based screening experience. This inter-reader difference was not found when evaluating DM and DBT.

Various authors have already addressed the issue of recall rate after the introduction of DBT in the clinical and screening setting (Table 5) [11–25]. These studies differed in design, number of readers and number of cases considered, with a calculated reduction in recall rate ranging between 6% and 82%.
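The recall-rate reductions and their confidence intervals can be re-derived from the counts in Table 2. The sketch below is not the authors' code: it assumes that "calculated according to the binomial distribution" means an exact (Clopper-Pearson) interval, finds the limits by bisection using only the standard library, and adds a Cohen κ helper whose 2 × 2 counts are hypothetical, because the cross-tabulation of the two readers is not reported.

```python
from math import comb
from typing import Callable, Tuple


def binom_cdf(k: int, n: int, p: float) -> float:
    """P(X <= k) for X ~ Binomial(n, p)."""
    return sum(comb(n, i) * p**i * (1 - p)**(n - i) for i in range(k + 1))


def _bisect_decreasing(f: Callable[[float], float], target: float) -> float:
    """Solve f(p) = target on [0, 1] for a function f that decreases in p."""
    lo, hi = 0.0, 1.0
    for _ in range(60):
        mid = (lo + hi) / 2
        if f(mid) > target:
            lo = mid
        else:
            hi = mid
    return (lo + hi) / 2


def exact_ci(k: int, n: int, alpha: float = 0.05) -> Tuple[float, float]:
    """Clopper-Pearson interval for k events out of n (assumed method)."""
    lower = 0.0 if k == 0 else _bisect_decreasing(
        lambda p: binom_cdf(k - 1, n, p), 1 - alpha / 2)
    upper = 1.0 if k == n else _bisect_decreasing(
        lambda p: binom_cdf(k, n, p), alpha / 2)
    return lower, upper


def cohen_kappa(both: int, only_r1: int, only_r2: int, neither: int) -> float:
    """Cohen kappa for two readers making a recall / no-recall decision."""
    n = both + only_r1 + only_r2 + neither
    observed = (both + neither) / n
    r1_yes = (both + only_r1) / n
    r2_yes = (both + only_r2) / n
    expected = r1_yes * r2_yes + (1 - r1_yes) * (1 - r2_yes)
    return (observed - expected) / (1 - expected)


# Table 2: (recalls avoided with DM + DBT, recalls at DM alone).
table2 = {"Reader 1": (86, 201), "Reader 2": (138, 236),
          "Double reading": (112, 258)}
for name, (avoided, recalled_dm) in table2.items():
    low, high = exact_ci(avoided, recalled_dm)
    print(f"{name}: {avoided / recalled_dm:.0%} ({low:.0%}-{high:.0%})")

# Hypothetical agreement counts, only to exercise the kappa helper.
print(round(cohen_kappa(both=20, only_r1=5, only_r2=10, neither=15), 3))
```

The denominator is the number of women recalled at DM alone by the same reader, as stated in the footnote to Table 2, not the 280 enrolled cases.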
Our results are in agreement with other works evaluating the recall rate in selected cases, showing a reduction from 30% to 40% [12–15,23,24]. Studies performed in the real screening setting, with a higher number of cases, showed a reduction in recall rates of 15–17% [11,19–21]. The only exceptions are the studies by Skaane et al. [17] and Lång et al. [25], in which a higher recall rate was found for DM with DBT, compared to DM alone. This could be related to an initial very low recall rate for DM where the studies were performed. Other explanations could be the readers' initial experience with DBT in one study [17] and the use of only one DBT view in the other study [25]. A significant increase in detection rate was identified in both studies. Of note, Skaane et al. [17] obtained a reduction in the false positive rate of 13% when using DM and DBT. The observation that centers with a higher number of recalls may profit more from the introduction of DBT has already been suggested by other authors [26]. Data regarding subsequent screening rounds are also needed, as the screening round has also been shown to affect the overall recall rate [27].

The variability in absolute recall rate seen in the literature mirrors the variability in recall rate in different countries and centers, already shown for mammography [8,9]. Of note, readers' agreement for recall was not evaluated in the studies listed in Table 5. Detection rate was between 5 and 6 per 1000 in the majority of these studies [14,15,18–23]. This confirms the lower variability of detection rate, compared to the variability of recall rate, shown for DM [10]. Positive predictive value shows a high variability, due to its relation with the prevalence of disease in the evaluated setting. Our results are nonetheless similar to those of other studies [16–18].

Only a few papers addressing the issue of recall rate also evaluated the diagnostic performance, with good results in terms of sensitivity and specificity [13,16], as we have obtained. Sensitivity and specificity for DBT reported in the literature range from 70% to 100% and from 64% to 89%, respectively [28–30]. We found values well within these ranges when evaluating single-reader performance but, as expected, a lower specificity in double reading: all cases considered as positive by both readers or by only one reader were counted as recalls, thus increasing the number of false positives.

Table 4
Recalls and recall rates for double reading at the evaluation of digital mammography (DM) alone and of DM plus digital breast tomosynthesis (DBT), and the recall reduction resulting from the addition of DBT to DM, according to breast density classes.

| BI-RADS density class | a | b | c | d | Total |
|---|---|---|---|---|---|
| Recalls at DM ᵃ | 46, 18% (13–23%) | 81, 31% (26–37%) | 77, 30% (24–36%) | 54, 21% (16–26%) | 258 |
| Recalls at DM + DBT ᵃ | 28, 19% (13–26%) | 46, 31% (24–39%) | 43, 29% (22–37%) | 30, 20% (14–28%) | 147 |
| Recall reduction ᵇ | 18, 39% (25–55%) | 35, 43% (32–55%) | 34, 44% (33–56%) | 24, 44% (31–59%) | 111 |

In parentheses, 95% confidence intervals.
ᵃ Percentages are calculated on total recalls.
ᵇ Percentages are calculated on recalls at DM for each BI-RADS density class.
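The trade-off described in the Discussion, where double reading counts a case as recalled if either reader recalls it, can be illustrated with a toy example. The case vectors below are hypothetical and are not the study data; the helper names are likewise illustrative. The point is only that an "either reader" rule cannot lower sensitivity and cannot raise specificity relative to each single reader.

```python
from typing import Dict, List


def double_reading(reader1: List[bool], reader2: List[bool]) -> List[bool]:
    """A case is recalled at double reading if either reader recalls it."""
    return [a or b for a, b in zip(reader1, reader2)]


def performance(recalled: List[bool], malignant: List[bool]) -> Dict[str, float]:
    """Diagnostic performance with malignancy as the positive class."""
    pairs = list(zip(recalled, malignant))
    tp = sum(r and m for r, m in pairs)
    fp = sum(r and not m for r, m in pairs)
    fn = sum((not r) and m for r, m in pairs)
    tn = sum((not r) and (not m) for r, m in pairs)
    return {
        "sensitivity": tp / (tp + fn),
        "specificity": tn / (tn + fp),
        "ppv": tp / (tp + fp),
        "npv": tn / (tn + fn),
        "accuracy": (tp + tn) / len(pairs),
    }


# Hypothetical toy data: 3 malignant and 7 non-malignant cases.
malignant = [True, True, True] + [False] * 7
r1 = [True, True, False, True, False, False, False, False, False, False]
r2 = [True, False, True, False, True, False, False, False, False, False]
dr = double_reading(r1, r2)

for name, calls in (("R1", r1), ("R2", r2), ("DR", dr)):
    stats = performance(calls, malignant)
    print(name, {key: round(value, 2) for key, value in stats.items()})
```

In a real screening program, discordant cases would go to consensus or a third reader, so the specificity loss seen under this rule is an upper bound rather than the expected outcome.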
Table 5
Results of studies evaluating the reduction in recall rate (RR) with the introduction of digital breast tomosynthesis (DBT) in association with digital mammography (DM).

| First author, year, country [reference] | Study design | Cases | Cancers | RR (DBT) | Difference in RR (DM vs DBT) | Performance |
|---|---|---|---|---|---|---|
| Poplack, 2007, USA [12] | Retrospective, screening recalls | 99 | 4 | NA | Minus 40% | NA |
| Gur, 2009, USA [13] | Retrospective, selected cases | 125 | 35 | NA | Minus 30% | Sensitivity 93%, Specificity 72% |
| Rose, 2013, USA [14] | Comparison of two periods + | 13856 DM, 9499 DM + DBT | 56 DM, 51 DBT | 5.5% | Minus 38% | DR 5.37‰, PPV 10% |
| Haas, 2013, USA [15] | Comparison of two groups, same period *,+ | 7058 DM, 6100 DM + DBT | 37 DM, 35 DBT | 8.4% | Minus 38% # | DR 5.7‰ |
| Rafferty, 2013, USA [16] | Retrospective, consecutive cases ** | 997 | 99 | FPR 20–41%, TPR 78–92% | Minus 6–67% ## | Sensitivity 79%, Specificity 84%, PPV 50%, NPV 95% |
| Skaane, 2013, Norway [17] | Prospective, screening setting | 12621 | 120 | 3.7% ◦ | Plus 22% ◦ | DR 9.4‰, PPV 26% |
| Ciatto, 2013, Italy [11] | Prospective, screening setting + | 7294 | 59 | 3.5% | Minus 17% | DR 8.1‰ |
| Destounis, 2014, USA [18] | Comparison of two groups, same period * | 524 DM, 524 DBT | 2 DM, 3 DBT | 4.2% | Minus 25–82% ## | DR 5.7‰, PPV 50% |
| McCarthy, 2014, USA [19] | Comparison of two periods | 10728 DM, 15571 DM + DBT | 49 DM, 85 DBT | 8.8% | Minus 15% | DR 5.5‰, PPV 45% |
| Greenberg, 2014, USA [20] + | Comparison of two groups, same period | 38674 DM, 20943 DM + DBT | 190 DM, 131 DBT | 13.6% | Minus 16% | DR 6.3‰, PPV 4.6% |
| Friedewald, 2014, USA [21] + | Comparison of two periods | 281187 DM, 173663 DM + DBT | 5056 DM, 3285 DBT | 9.1% | Minus 16% | DR 5.4‰, PPV 6.4% |
| Durand, 2015, USA [22] | Comparison of two groups, same period | 9364 DM, 8591 DM + DBT | 54 DM, 51 DBT | 7.8% | Minus 37% | DR 5.9‰ |
| Lourenco, 2015, USA [23] | Retrospective, comparison of two periods | 12577 DM, 12921 DBT | NA | 6.4% | Minus 31% | DR 4.6‰ |
| Hakim, 2015, USA [24] | Retrospective, selected cases | 153 | 50 | 40% | Minus 32% | ROC-AUC 0.873 |
| Lång, 2015, Sweden [25] ++ | Prospective, screening setting | 7500 | 67 | 3.8% | Plus 43% | DR 8.9‰ |
| Current study, Italy, 2015 | Retrospective, screening recalls | 280 | 44 | 42–57% ◦◦ | Minus 43–58% | Sensitivity 88–98%, Specificity 55–73% |

DBT = Digital breast tomosynthesis; DM = Digital mammography; RR = Recall rate; NA = Not available; FPR = False positive rate; TPR = True positive rate; DR = Detection rate; PPV = Positive predictive value; NPV = Negative predictive value; ROC-AUC = Receiver operating characteristic area under the curve.
+ Multicenter/Multisite.
++ One view DBT.
* With patients receiving DBT more likely to have personal or family history of breast cancer.
** Both screening and diagnostic examinations.
# Reduction in odds of recall adjusted for age, breast density and risk factors.
## Ranges in a multi-reader study.
◦ Before arbitration. Higher recall rate compared to DM alone (2.9%) but also higher detection rate.
◦◦ All patients with a DM finding were recalled; value estimated on the basis of DBT findings.
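One way to arrive at a median close to the 31% quoted for the previous studies is to take only the single-valued reductions in Table 5, leaving out the two multi-reader ranges, the two studies that reported an increase, and the current study. This is an assumption about how the summary was obtained, not something the table states, and the dictionary below simply transcribes the "Difference in RR" column.

```python
from statistics import median

# Reported recall-rate reductions (%), single-valued entries of Table 5 only.
# Excluded by assumption: ranges ([16], [18]), increases ([17], [25]),
# and the current study.
reductions = {
    "Poplack [12]": 40,
    "Gur [13]": 30,
    "Rose [14]": 38,
    "Haas [15]": 38,
    "Ciatto [11]": 17,
    "McCarthy [19]": 15,
    "Greenberg [20]": 16,
    "Friedewald [21]": 16,
    "Durand [22]": 37,
    "Lourenco [23]": 31,
    "Hakim [24]": 32,
}

median_reduction = median(reductions.values())
print(f"Median of {len(reductions)} single-valued reductions: {median_reduction}%")
```

Other conventions, such as representing each range by its midpoint, give a slightly different median, so the figure should be read as a summary of a heterogeneous set of designs rather than a pooled estimate.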
Our study has some limitations. First, all the cases included already presented a finding that prompted recall on screening DM; thus, the percentage of positive mammographic examinations and of cancers is higher than in the usual screening population. As shown by previous studies (Table 5), our result might not reflect the magnitude of recall rate reduction that can be obtained in a screening setting, though it adds to the evidence in favor of the use of DBT in screening. Second, in forty-one cases (15%) the standard of reference consisted only of multiple assessment and not of a negative follow up, as patients were lost at follow up. However, all the patients were evaluated with additional imaging (US and/or MRI) to exclude the presence of malignancy, thus assuring a reference standard to define positive and negative cases and to evaluate overall DBT performance. No interval cancer was found in any of the cases that underwent follow up. Third, the recall rate reduction we reported, as per study design, could not take into account the DBT evaluation of all those women who were not recalled at DM. This implies that we could not estimate the increase in sensitivity achievable with DBT nor the tradeoff in terms of false positives prompted by DBT. As a consequence, our reduction in recall is likely to be overestimated. However, in terms of overall diagnostic performance, the additional DBT false positives should be at least partially compensated by the additional DBT true positives. Finally, in our evaluation, all patients recalled by at least one reader were also considered as recalled at double reading. This is not necessarily true in a real screening setting, where cases might be assessed as negative at consensus reading or by the third reader, and not recalled.
With reference to Table 5, considering the previous studies and the current one, 14 of 16 showed a reduction in recall rate, highly variable depending on study design and setting, initial recall rate, and single or double reading, as already mentioned. The median recall rate reduction reported was 31%. However, when DBT is applied to population-based screening programs, especially considering the relatively low initial recall rate, the reduction in recall rate may be lower than 30%, probably in the range between 15% and 25%.

In conclusion, we showed that the association of DBT with DM allows a significant reduction in recall rate, as compared to DM alone, irrespective of breast density, in a population-based multicenter screening setting when applied to cases recalled at the first reading. Interestingly, we found a twofold increase in inter-reader agreement when DBT is added to DM. Large-scale randomized controlled studies in a population-based screening setting are needed to confirm the reduction in recall rate, the increase in sensitivity, and the impact on interval cancer rate.

Funding sources

This work had no specific funding source.

Conflict of interest

The authors declare that there are no conflicts of interest.

References

[1] L. Tabár, B. Vitak, T.H.-H. Chen, A.M.-F. Yen, A. Cohen, T. Tot, et al., Swedish two-county trial: impact of mammographic screening on breast cancer mortality during 3 decades, Radiology 260 (2011) 658–663, http://dx.doi.org/10.1148/radiol.11110469.
[2] A.B. Miller, C. Wall, C.J. Baines, P. Sun, T. To, S.A. Narod, Twenty five year follow-up for breast cancer incidence and mortality of the Canadian National Breast Screening Study: randomised screening trial, BMJ 348 (2014) g366.
[3] E.D. Pisano, C. Gatsonis, E. Hendrick, M. Yaffe, J.K. Baum, S. Acharyya, et al., Diagnostic performance of digital versus film mammography for breast-cancer screening, N. Engl. J. Med. 353 (2005) 1773–1783, http://dx.doi.org/10.1056/NEJMoa052911.
[4] C. Colin, M. Devouassoux-Shisheboran, F. Sardanelli, Is breast cancer overdiagnosis also nested in pathologic misclassification? Radiology 273 (2014) 652–655, http://dx.doi.org/10.1148/radiol.14141116.
[5] N.T. Brewer, T. Salz, S.E. Lillie, Systematic review: the long-term effects of false-positive mammograms, Ann. Intern. Med. 146 (2007) 502–510.
[6] N. Perry, M. Broeders, C. de Wolf, S. Törnberg, R. Holland, L. von Karsa, European guidelines for quality assurance in breast cancer screening and diagnosis, fourth edition, (2006).
[7] C.J. D'Orsi, E.A. Sickles, E.B. Mendelson, E.A. Morris, ACR BI-RADS® Atlas, Breast Imaging Reporting and Data System, 5th Edition, American College of Radiology, Reston, VA, 2013.
[8] R. Smith-Bindman, R. Ballard-Barbash, D.L. Miglioretti, J. Patnick, K. Kerlikowske, Comparing the performance of mammography screening in the USA and the UK, J. Med. Screen. 12 (2005) 50–54, http://dx.doi.org/10.1258/0969141053279130.
[9] J. Rothschild, A.P. Lourenco, M.B. Mainiero, Screening mammography recall rate: does practice site matter? Radiology 269 (2013) 348–353, http://dx.doi.org/10.1148/radiol.13121487.
[10] S. Hofvind, A. Ponti, J. Patnick, N. Ascunce, S. Njor, M. Broeders, et al., False-positive results in mammographic screening for breast cancer in Europe: a literature review and survey of service screening programmes, J. Med. Screen. 19 (Suppl. 1) (2012) 57–66, http://dx.doi.org/10.1258/jms.2012.012083.
[11] S. Ciatto, N. Houssami, D. Bernardi, F. Caumo, M. Pellegrini, S. Brunelli, et al., Integration of 3D digital mammography with tomosynthesis for population breast-cancer screening (STORM): a prospective comparison study, Lancet Oncol. 14 (2013) 583–589, http://dx.doi.org/10.1016/S1470-2045(13)70134-7.
[12] S.P. Poplack, T.D. Tosteson, C.A. Kogel, H.M. Nagy, Digital breast tomosynthesis: initial experience in 98 women with abnormal digital screening mammography, AJR Am. J. Roentgenol. 189 (2007) 616–623, http://dx.doi.org/10.2214/AJR.07.2231.
[13] D. Gur, G.S. Abrams, D.M. Chough, M.A. Ganott, C.M. Hakim, R.L. Perrin, et al., Digital breast tomosynthesis: observer performance study, AJR Am. J. Roentgenol. 193 (2009) 586–591, http://dx.doi.org/10.2214/AJR.08.2031.
[14] S.L. Rose, A.L. Tidwell, L.J. Bujnoch, A.C. Kushwaha, A.S. Nordmann, R. Sexton, Implementation of breast tomosynthesis in a routine screening practice: an observational study, AJR Am. J. Roentgenol. 200 (2013) 1401–1408, http://dx.doi.org/10.2214/AJR.12.9672.
[15] B.M. Haas, V. Kalra, J. Geisel, M. Raghu, M. Durand, L.E. Philpotts, Comparison of tomosynthesis plus digital mammography and digital mammography alone for breast cancer screening, Radiology 269 (2013) 694–700, http://dx.doi.org/10.1148/radiol.13130307.
[16] E.A. Rafferty, J.M. Park, L.E. Philpotts, S.P. Poplack, J.H. Sumkin, E.F. Halpern, et al., Assessing radiologist performance using combined digital mammography and breast tomosynthesis compared with digital mammography alone: results of a multicenter, multireader trial, Radiology 266 (2013) 104–113, http://dx.doi.org/10.1148/radiol.12120674.
[17] P. Skaane, A.I. Bandos, R. Gullien, E.B. Eben, U. Ekseth, U. Haakenaasen, et al., Comparison of digital mammography alone and digital mammography plus tomosynthesis in a population-based screening program, Radiology 267 (2013) 47–56, http://dx.doi.org/10.1148/radiol.12121373.
[18] S. Destounis, A. Arieno, R. Morgan, Initial experience with combination digital breast tomosynthesis plus full field digital mammography or full field digital mammography alone in the screening environment, J. Clin. Imaging Sci. 4 (2014) 9, http://dx.doi.org/10.4103/2156-7514.127838.
[19] A.M. McCarthy, D. Kontos, M. Synnestvedt, K.S. Tan, D.F. Heitjan, M. Schnall, et al., Screening outcomes following implementation of digital breast tomosynthesis in a general-population screening program, J. Natl. Cancer Inst. 10 (2014) 6, http://dx.doi.org/10.1093/jnci/dju316.
[20] J.S. Greenberg, M.C. Javitt, J. Katzen, S. Michael, A.E. Holland, Clinical performance metrics of 3D digital breast tomosynthesis compared with 2D digital mammography for breast cancer screening in community practice, AJR Am. J. Roentgenol. 203 (2014) 687–693, http://dx.doi.org/10.2214/AJR.14.12642.
[21] S.M. Friedewald, E.A. Rafferty, S.L. Rose, M.A. Durand, D.M. Plecha, J.S. Greenberg, et al., Breast cancer screening using tomosynthesis in combination with digital mammography, JAMA J. Am. Med. Assoc. 311 (2014) 2499–2507, http://dx.doi.org/10.1001/jama.2014.6095.
[22] M.A. Durand, B.M. Haas, X. Yao, J.L. Geisel, M. Raghu, R.J. Hooley, et al., Early clinical experience with digital breast tomosynthesis for screening mammography, Radiology 274 (2015) 85–92, http://dx.doi.org/10.1148/radiol.14131319.
[23] A.P. Lourenco, M. Barry-Brooks, G.L. Baird, A. Tuttle, M.B. Mainiero, Changes in recall type and patient treatment following implementation of screening digital breast tomosynthesis, Radiology 274 (2015) 337–342, http://dx.doi.org/10.1148/radiol.14140317.
[24] C.M. Hakim, V.J. Catullo, D.M. Chough, M.A. Ganott, A.E. Kelly, D.D. Shinde, et al., Effect of the availability of prior full-field digital mammography and digital breast tomosynthesis images on the interpretation of mammograms, Radiology 276 (2015) 65–72, http://dx.doi.org/10.1148/radiol.15142009.
[25] K. Lång, I. Andersson, A. Rosso, A. Tingberg, P. Timberg, S. Zackrisson, Performance of one-view breast tomosynthesis as a stand-alone breast cancer screening modality: results from the Malmö Breast Tomosynthesis Screening Trial, a population-based study, Eur. Radiol. (2015), http://dx.doi.org/10.1007/s00330-015-3803-3.
[26] F. Caumo, D. Bernardi, S. Ciatto, P. Macaskill, M. Pellegrini, S. Brunelli, et al., Incremental effect from integrating 3D-mammography (tomosynthesis) with 2D-mammography: increased breast cancer detection evident for screening centres in a population-based trial, Breast 23 (2014) 76–80, http://dx.doi.org/10.1016/j.breast.2013.11.006.
[27] M. Sala, M. Comas, F. Macià, J. Martinez, M. Casamitjana, X. Castells, Implementation of digital mammography in a population-based breast cancer screening program: effect of screening round on recall rate and cancer detection, Radiology 252 (2009) 31–39, http://dx.doi.org/10.1148/radiol.2521080696.
[28] F.J. Gilbert, L. Tucker, M.G. Gillan, P. Willsher, J. Cooke, K.A. Duncan, et al., The TOMMY trial: a comparison of TOMosynthesis with digital MammographY in the UK NHS Breast Screening Programme–a multicentre retrospective reading study comparing the diagnostic performance of digital breast tomosynthesis and digital mammography with digital mammography alone, Health Technol. Assess. 19 (i–xxv) (2015) 1–136, http://dx.doi.org/10.3310/hta19040.
[29] M. Alakhras, R. Bourne, M. Rickard, K.H. Ng, M. Pietrzyk, P.C. Brennan, Digital tomosynthesis: a new future for breast imaging? Clin. Radiol. 68 (2013) 225–236, http://dx.doi.org/10.1016/j.crad.2013.01.007.
[30] J. Lei, P. Yang, L. Zhang, Y. Wang, K. Yang, Diagnostic accuracy of digital breast tomosynthesis versus digital mammography for benign and malignant lesions in breasts: a meta-analysis, Eur. Radiol. 24 (2014) 595–602, http://dx.doi.org/10.1007/s00330-013-3012-x.