Factors affecting the rate of false positive marks in CAD in full-field digital mammography


European Journal of Radiology 81 (2012) e844–e848


European Journal of Radiology journal homepage: www.elsevier.com/locate/ejrad

Florian Engelken a,∗, Raphael Bremme b,1, Ulrich Bick a,2, Sophie Hammann-Kloss c,3, Eva M. Fallenberg a,4

a Department of Radiology, Charité Universitätsmedizin Berlin, Charitéplatz 1, 10117 Berlin, Germany
b Naugarder Str. 16, 10409 Berlin, Germany
c Groninger Strasse 8, 13347 Berlin, Germany

Article info

Article history: Received 1 November 2011. Received in revised form 9 February 2012. Accepted 29 February 2012.

Keywords: Mammography; Computer-aided detection; Specificity; Breast density; Fibroglandular tissue

Abstract

Objective: To assess the effect of breast density, fibroglandular tissue volume, and breast volume on the rate of false-positive marks of a computer-assisted detection software in digital mammography.

Materials and methods: 222 patients with normal digital mammograms and a minimum follow-up of 22 months were retrospectively identified. MLO and CC views were analyzed using a CAD software with three operating points (‘specific’, ‘balanced’, ‘sensitive’). False-positive marks were recorded. Images were analyzed by a volumetric breast density assessment software, yielding estimates of percentage density, fibroglandular tissue volume, and breast volume. Statistical analysis was performed using the Mann–Whitney U-test, the t-test for independent samples and the Poisson regression model.

Results: Patients with high fibroglandular tissue volumes had a higher mean number of false-positive mass marks than patients with low fibroglandular tissue volumes (specific setting: 0.50 vs. 0.35, respectively; balanced setting: 0.70 vs. 0.40, respectively, p < 0.05; sensitive setting: 0.89 vs. 0.58, respectively, p < 0.05). Relative risk for a false-positive mass marker increased by a factor of 1.43 (p < 0.05), 1.63 (p < 0.001) and 1.50 (p < 0.01) per 100 ml of fibroglandular tissue for the specific, balanced and sensitive settings, respectively. No significant effects of percentage density or breast volume on the number or the relative risk of false-positive mass marks were observed.

Conclusion: The volume of fibroglandular tissue present, but not the percentage density of the breast, affected the specificity for masses of the CAD software investigated. This may have implications for improving the performance of CAD systems, as the specificity of CAD may be improved by adjusting the algorithm threshold depending on the volume of fibroglandular tissue present. Considering both factors, fibroglandular tissue volume and percentage density, independently, could improve overall CAD performance in subgroups of patients, e.g. those with small, dense breasts or large breasts with low density.

© 2012 Elsevier Ireland Ltd. All rights reserved.

∗ Corresponding author. Tel.: +49 30 450 627 308; fax: +49 30 450 527 968.
1 Tel.: +49 176 60014212.
2 Tel.: +49 30 450 627 001; fax: +49 30 450 527 968.
3 Tel.: +49 179 9021832.
4 Tel.: +49 30 450 627 105; fax: +49 30 450 527 968.
http://dx.doi.org/10.1016/j.ejrad.2012.02.017

1. Introduction

Computer-aided detection (CAD) has been shown to increase the sensitivity of mammography [1–3]. However, a major drawback of CAD systems is the occurrence of false-positive results, which require radiologist attention, may increase reporting time, and lead to false recalls [4–6]. A meta-analysis of 7 large studies assessing the sensitivity and specificity of CAD in screening mammography found that 96% of women recalled based on CAD were healthy [7]. The high rate of false-positive results may be a key factor limiting the integration of CAD into the clinical workflow.

Some studies identified a negative effect of breast density on the specificity of CAD systems for masses [8,9], whereas others found that specificity was unaffected but sensitivity decreased [10,11]. These studies used visual assessment of breast density for classification. Recently, an automated volumetric breast density assessment software (R2 Quantra, Hologic Inc., MA, USA) became available, which estimates the volumes of fibroglandular tissue and of the whole breast and calculates the percentage density of glandular tissue.


The purpose of this study was to assess the effect of breast percentage density (PD), fibroglandular tissue volume (FTV) and breast volume (BV) on the number of false-positive marks produced by a CAD system in full-field digital mammography.

Table 1. Cross-table of the mean number of false positive marks in breasts of large or small volume and density. Values are given as specific/balanced/sensitive.

Breast volume          Percentage density
                       Low (8–21%)       High (22–61%)
Large (408–1564 ml)    0.41/0.49/0.67    0.50/0.74/0.95
Small (63–407 ml)      0.32/0.34/0.55    0.44/0.52/0.70

2. Materials and methods

2.1. Patient selection

We searched our records for patients with examinations on the same digital mammography unit (GE Senographe 2000D, General Electric Company, Fairfield, CT, USA) between June 1st, 2002 and August 31st, 2006 who had unremarkable reports for at least one side and normal clinical and mammographic follow-up examinations. Unremarkable reports and unremarkable mammograms were defined as absence of mass lesions, architectural distortions or microcalcifications suggestive of malignancy, including any known benign lesions displaying such features.

Inclusion criteria were: female sex, unremarkable mammography on one or both sides, and unremarkable follow-up for at least 22 months. Exclusion criteria were: previous operation or vacuum-assisted biopsy on the eligible breast, technical deficits such as inadequate positioning, insufficient compression or presence of skin folds.

Images were reported by two radiologists as part of the clinical routine. For the purpose of this study, a further review of mammography images and reports was performed by a radiologist with 5 years of experience in mammography imaging (F.E.) to ascertain that inclusion and exclusion criteria were met. In patients with two eligible breasts, one side was chosen at random for analysis.

231 patients were eligible for inclusion in the study. In 112 patients, both breasts met the criteria for inclusion and one side was chosen at random. The density assessment software failed to produce results in 9 breasts, most likely due to divergent results between the CC- and MLO-view. 222 breasts were included in the analysis. Institutional review board approval was obtained.

2.2. Image analysis: CAD

Raw image data was analyzed by a CAD software (R2 ImageChecker, Version 9.3, Hologic, Bedford, MA, USA) to identify suspicious masses and microcalcifications [11,12].
For mass detection, the software algorithm segments the breast, then attempts to find masses by evaluating structures based on their density, shape, and margin characteristics. As a parallel step, structures consisting of radiating lines are identified and, if present, the degree of spiculation is rated. To detect calcifications, the algorithm uses two filters (artificial neural networks), first evaluating each image with a shift-invariant neural network and then analyzing the calcifications found with a cluster filter, which assesses features such as calcification size and shape. The location of the lesions in the breast is determined and compared to complementary views to look for similarities and asymmetries. Results of the various analyses are combined, compared to the training database using statistical pattern recognition and ranked according to their likelihood of malignancy. Finally, the algorithm checks the operating point setting to determine which results are displayed. There are three operating points (specific, balanced and sensitive), which can be set independently for calcifications and masses [13]. An example of a CAD-report is shown in Fig. 1. Numbers and types of false-positive marks were recorded for both medio-lateral oblique (MLO) and craniocaudal (CC) views for each possible operating point.

2.3. Image analysis: volumetric breast density assessment

In addition to CAD-analysis, the raw images were analyzed by a volumetric breast density assessment software (R2 Quantra, Hologic, Bedford, MA, USA), which calculates volumetric breast density as a ratio of fibroglandular tissue and total breast volume estimates. For each pixel of the mammographic image, the software estimates the thickness of fibroglandular tissue that an X-ray must have passed through to deposit the measured energy on the detector. The estimates are based on published attenuation coefficients of breast tissue, X-ray spectra of the target material as well as tube current, tube voltage and compression thickness [14,15]. The result is given in units of length of fibroglandular tissue penetrated and is converted to a volume by multiplication with pixel dimensions. Using a similar process, the software also measures the volume of the whole breast, compensating for uncompressed regions of the breast. After both volumes are calculated, PD is calculated by division. The results of both views (CC, MLO) are then combined into a single numerical figure [16].

Both the CAD and the volumetric breast density assessment software were run on a dedicated server integrated into the picture archiving and communication system (PACS) network.

2.4. Statistical analysis

Power calculation was performed using a sample size calculation program (nQuery Advisor 7.0, Statistical Solutions, Cork, Ireland). The study was powered to detect differences in the rate of false positive marks of 0.3. Data analysis was performed using a statistical analysis program (IBM SPSS Statistics®, International Business Machines Corp., Armonk, NY, USA). For data analysis patients were grouped into tertiles according to FTV (low, <70 ml; intermediate, 70–105 ml; high, >105 ml) and percentage breast density (low, <17%; intermediate, 17–25.9%; high, >25.9%). Differences between groups were tested for statistical significance using the Mann–Whitney U-test. Differences in FTV and PD between patients with high and low numbers of false-positive mass marks were tested using a t-test for independent samples.
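The tertile grouping described above can be sketched in a few lines of Python. This is a minimal illustration, not the authors' SPSS procedure: the function names `ftv_tertile` and `pd_tertile` are hypothetical, and treating both boundary values of the intermediate range as inclusive is an assumption based on the stated cut-offs.

```python
from typing import Literal

Tertile = Literal["low", "intermediate", "high"]


def ftv_tertile(ftv_ml: float) -> Tertile:
    """Group a fibroglandular tissue volume (ml) using the stated cut-offs.

    Assumption: the intermediate range 70-105 ml includes both end points.
    """
    if ftv_ml < 70:
        return "low"
    if ftv_ml <= 105:
        return "intermediate"
    return "high"


def pd_tertile(pd_percent: float) -> Tertile:
    """Group a percentage density (%) using the stated cut-offs.

    Assumption: the intermediate range 17-25.9% includes both end points.
    """
    if pd_percent < 17:
        return "low"
    if pd_percent <= 25.9:
        return "intermediate"
    return "high"


# Example: a breast with FTV 120 ml and PD 15% falls into different tertiles
# for the two measures, which is why they can be analyzed separately.
print(ftv_tertile(120.0), pd_tertile(15.0))
```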
Loglinear regression analysis was performed using the Poisson regression model with FTV and PD as covariates.

3. Results

Mean patient age at the time of the first mammogram was 54 ± 10 years, ranging from 30 to 79 years. Mean PD was 23.7 ± 11.4%, ranging from 8% to 61%. Mean FTV was 98.5 ± 57.8 ml, ranging from 22 ml to 379 ml. Mean BV was 455 ± 242 ml, ranging from 63 ml to 1564 ml.

Patients with large breasts and high PD had the highest number of false positive marks, while patients with small breasts and low PD had the lowest (Table 1).

The mean number of false-positive mass marks per case for patients with low, intermediate, and high FTVs was 0.35, 0.41 and 0.50, respectively, for the ‘specific’ CAD setting, 0.40, 0.45, and 0.70, respectively, for the ‘balanced’ setting, and 0.58, 0.65, and 0.89, respectively, for the ‘sensitive’ setting (Fig. 2). The differences in the number of false-positive mass marks for high vs. low FTV were statistically significant for the ‘balanced’ and the ‘sensitive’ setting (p < 0.05). No statistically significant differences in the mean number of false positives were found between patients with low, intermediate, and high PD or BV.


Fig. 1. Example of a CAD-report at different operating points: (a) ‘specific’ setting: no false positive marks; (b) ‘balanced’ setting, one false positive mass mark; (c) ‘sensitive’ setting, two false positive mass marks. No false positive calcification marks and no combined mass and calcification marks at any setting.

With the ‘balanced’ and ‘sensitive’ CAD settings, patients with multiple (2 or more) false-positive mass marks had a significantly higher mean FTV than patients with no false-positive mass marks (see Table 2). There were no statistically significant differences in mean breast density between patients with multiple and no false-positive mass marks.

Loglinear regression analysis using FTV, PD, and BV as covariates showed a statistically significant increase in relative risk for a false-positive mass marker with increasing FTV, but no effect of PD or BV (Table 3). No significant effects of PD, FTV, or BV on the number of false-positive marks for microcalcifications were observed.
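To illustrate how a relative risk per unit of FTV arises from a Poisson (loglinear) regression, the following is a minimal sketch with a single covariate, fitted by Newton-Raphson in plain Python. It is not the authors' analysis, which was run in SPSS with several covariates; the function name `fit_poisson` is hypothetical and the example data are synthetic and purely illustrative.

```python
import math
from typing import Sequence, Tuple


def fit_poisson(x: Sequence[float], y: Sequence[float],
                max_iter: int = 50, tol: float = 1e-10) -> Tuple[float, float]:
    """Fit log(mu) = a + b * x by Newton-Raphson on the Poisson log-likelihood.

    Assumes sum(y) > 0 and that x is not constant.
    """
    a = math.log(sum(y) / len(y))  # start from the intercept-only fit
    b = 0.0
    for _ in range(max_iter):
        mu = [math.exp(a + b * xi) for xi in x]
        # Score vector (gradient of the log-likelihood)
        u_a = sum(yi - mi for yi, mi in zip(y, mu))
        u_b = sum((yi - mi) * xi for yi, mi, xi in zip(y, mu, x))
        # Entries of the 2x2 Fisher information matrix
        i_aa = sum(mu)
        i_ab = sum(mi * xi for mi, xi in zip(mu, x))
        i_bb = sum(mi * xi * xi for mi, xi in zip(mu, x))
        det = i_aa * i_bb - i_ab * i_ab
        # Newton step: solve the 2x2 system information * delta = score
        da = (i_bb * u_a - i_ab * u_b) / det
        db = (i_aa * u_b - i_ab * u_a) / det
        a, b = a + da, b + db
        if abs(da) < tol and abs(db) < tol:
            break
    return a, b


# Synthetic, illustrative data only (NOT taken from the study):
# FTV expressed in units of 100 ml and a count of false-positive marks.
ftv_100ml = [0.3, 0.6, 0.9, 1.2, 1.5, 2.0, 3.0]
marks = [0, 1, 0, 1, 0, 2, 1]
a_hat, b_hat = fit_poisson(ftv_100ml, marks)
print(f"relative risk per 100 ml of FTV: {math.exp(b_hat):.2f}")
```

Because the covariate is expressed in units of 100 ml, exp(b) is directly the relative risk per 100 ml, which is the quantity reported in Table 3.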


Table 2. FTV in breasts with different numbers of false-positive mass marks; ‡ and # indicate statistical significance (p < 0.05) of the differences between two groups. Values are mean FTV ± standard deviation (n).

No. of false-positive mass marks    Specific             Balanced              Sensitive
0                                   95.1 ± 54.1 (153)    92.9 ± 53.2‡ (143)    90.3 ± 51.6# (123)
1                                   99.6 ± 48.7 (48)     100.9 ± 50.1 (51)     104.4 ± 53.8# (55)
≥2                                  120.8 ± 92.0 (21)    122.6 ± 83.5‡ (28)    113.9 ± 74.0# (44)

Table 3. Relative risks of a false positive mass mark (numbers in parentheses represent 95% confidence intervals).

Variable            CAD sensitivity setting
                    Specific             Balanced             Sensitive
FTV (per 100 ml)    1.43* (1.03–1.98)    1.63* (1.24–2.15)    1.50* (1.17–1.92)
PD (per 10%)        0.96 (0.79–1.17)     0.96 (0.80–1.14)     0.93 (0.80–1.08)
BV (per 100 ml)     1.00 (0.90–1.10)     1.00 (0.92–1.10)     1.00 (0.93–1.08)

* Statistical significance (p < 0.05) of the result.
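As a reading aid for Table 3, the sketch below shows how a relative risk stated per 100 ml can be rescaled to another volume difference. It assumes the log-linear (multiplicative) form of a Poisson regression model holds across the range of interest; the helper name `rescale_relative_risk` is hypothetical, and the rescaled values are illustrative arithmetic rather than results reported by the study.

```python
def rescale_relative_risk(rr: float, from_units: float, to_units: float) -> float:
    """Convert a relative risk per `from_units` into one per `to_units`.

    Under a log-linear model the effect is multiplicative, so
    RR(to) = RR(from) ** (to / from).
    """
    return rr ** (to_units / from_units)


# Point estimates for FTV (per 100 ml) taken from Table 3
rr_per_100ml = {"specific": 1.43, "balanced": 1.63, "sensitive": 1.50}

for setting, rr in rr_per_100ml.items():
    rr_50 = rescale_relative_risk(rr, 100, 50)
    print(f"{setting}: relative risk per 50 ml of FTV = {rr_50:.2f}")
```

The same rescaling applies to the confidence limits, since they are limits on the same multiplicative scale.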

4. Discussion

In mammographic CAD systems, as in any diagnostic test, there is a trade-off between sensitivity and specificity. If subtle features suggestive of malignancy are to be used for marking suspicious lesions, some benign structures will inevitably be marked as well. An optimal balance between sensitivity and specificity is necessary for maximizing the potential of CAD systems in mammography.

The effects of breast density on the sensitivity of human observers and CAD programs have been well studied. Some studies indicate that human observers are more likely to overlook cancers in dense breasts [17,18], whereas a study by Birdwell et al. found that distracting lesions, location of lesions at the edge of glandular tissue, and breasts with a lot of superimposition of benign structures (“busy breasts”) are more important [19]. There is evidence that CAD may be more likely to miss breast cancers manifesting as mass lesions in dense breasts [8–11,20]. With regard to the specificity of the tested CAD systems, some of these studies also indicate a simultaneous decrease in specificity in dense breasts [8,9]. However, none of the studies considered the volume of fibroglandular tissue in the breast as a variable, since these data were not readily available until now.

Our data suggest that specificity is not affected by breast density, but instead by the volume of fibroglandular tissue present, with a 40–60% increase in relative risk for a false-positive mass marker per 100 ml of fibroglandular tissue. This observation may possibly be attributable to stochastic effects, as malignancy-simulating structures are more likely to occur when more mammographic densities exist. Since the volume of fibroglandular tissue and mammographic density are strongly linked, it is possible that the decrease in specificity with increasing breast density observed in previous studies was due to the confounding variable, FTV.

The high incidence of false positive marks is a major limitation of current CAD-systems [4–7]. This is particularly true for false positive mass marks, as they are more frequent and more difficult for the radiologist to refute, because the criteria for deciding that a mass lesion has a high likelihood of being benign are much less objective than those for microcalcifications. The findings of this study may indicate how the performance of CAD systems can be improved by reducing the number of false positive marks for a given algorithm threshold. While the decrease in sensitivity of CAD systems is due to masking of suspicious features by dense parenchyma [8–11,20], our data indicate that the decrease in specificity may be a function of the total volume of breast parenchyma, which correlates with the likelihood of normal structures combining to falsely suggest features of malignancy, rather than of mammographic density as previously thought. This means that both factors should be considered independently to optimize the two aspects of CAD-performance, sensitivity and specificity.

As the relationship between algorithm threshold and sensitivity and specificity is not linear, the ‘cost’ of increasing sensitivity in terms of specificity is much lower when sensitivity is low than when it is high. Therefore, the algorithm threshold of CAD systems could, for example, be lowered in small, dense breasts (intermediate FTV, low sensitivity, moderate specificity) to increase sensitivity at the cost of only a moderate decrease in specificity; in contrast, it could be increased in large breasts with low density (intermediate FTV, high sensitivity, moderate specificity) to improve specificity, thus improving the overall performance of the system.

Fig. 2. Mean number of false-positive CAD marks for masses in patients grouped according to FTV; *p < 0.05.

4.1. Limitations of the study

There are some limitations to this study, which include the retrospective design. Also, we did not assess whether the volume of fibroglandular tissue and/or breast density affected the sensitivity of the CAD system. However, this has been the subject of numerous studies [8–11,20]. It appears plausible that the mechanism underlying the failure to detect suspicious features may result from a masking effect of dense parenchyma and is different from the effects causing misinterpretation of benign structures as suggestive of malignancy. However, this assumption needs to be tested in future studies assessing the effect of FTV on the sensitivity of CAD systems.


5. Conclusion

In conclusion, the number of false-positive CAD marks for masses is affected by the total volume of fibroglandular tissue but not by the density of the breast parenchyma. The two factors, PD and FTV, should be considered independently by CAD systems to set the optimal operating point in order to achieve an optimal balance of sensitivity and specificity.
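The idea of choosing an operating point from breast characteristics could look roughly like the sketch below. It is a purely hypothetical illustration of the example given in the Discussion (small, dense breasts versus large breasts with low density), not a validated rule and not a feature of the CAD software studied. The function name `suggest_operating_point` is invented here, and the cut-offs are simply the group boundaries from Table 1, reused as an assumption.

```python
def suggest_operating_point(bv_ml: float, pd_percent: float) -> str:
    """Suggest a CAD operating point from breast volume and percentage density.

    Hypothetical rule for illustration only. Assumed cut-offs (from the
    groups in Table 1): 'large' means BV >= 408 ml, 'high density' means
    PD >= 22%.
    """
    large = bv_ml >= 408
    dense = pd_percent >= 22
    if not large and dense:
        # Small, dense breast: lower the threshold to gain sensitivity
        return "sensitive"
    if large and not dense:
        # Large breast with low density: raise the threshold to gain specificity
        return "specific"
    # All other combinations: keep the middle setting
    return "balanced"


print(suggest_operating_point(bv_ml=300, pd_percent=35))
print(suggest_operating_point(bv_ml=900, pd_percent=12))
```

Any such rule would need to be evaluated against sensitivity data, which this study did not assess.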

Conflict of interest statement

Regarding possible conflicts of interest, the authors disclose the following:
• F. Engelken and E. Fallenberg have received equipment support from Hologic, Inc.
• U. Bick has received equipment support from Hologic, Inc., holds a license agreement with Hologic, Inc. and receives royalties from Hologic, Inc.

References

[1] Brem RF, Baum J, Lechner M, et al. Improvement in sensitivity of screening mammography with computer-aided detection: a multiinstitutional trial. American Journal of Roentgenology 2003;181:687–93.
[2] Warren Burhenne LJ, Wood SA, D’Orsi CJ, et al. Potential contribution of computer-aided detection to the sensitivity of screening mammography. Radiology 2000;215:554–62.
[3] Freer TW, Ulissey MJ. Screening mammography with computer-aided detection: prospective study of 12,860 patients in a community breast center. Radiology 2001;220:781–6.
[4] Fenton JJ, Taplin SH, Carney PA, et al. Influence of computer-aided detection on performance of screening mammography. New England Journal of Medicine 2007;356:1399–409.
[5] Taylor P, Potts HW. Computer aids and human second reading as interventions in screening mammography: two systematic reviews to compare effects on cancer detection and recall rate. European Journal of Cancer 2008;44:798–807.
[6] Fenton JJ, Abraham L, Taplin SH, et al. Effectiveness of computer-aided detection in community mammography practice. Journal of the National Cancer Institute 2011;103:1152–61.
[7] Noble M, Bruening W, Uhl S, Schoelles K. Computer-aided detection mammography for breast cancer screening: systematic review and meta-analysis. Archives of Gynecology and Obstetrics 2009;279:881–90.
[8] Brem RF, Hoffmeister JW, Rapelyea JA, et al. Impact of breast density on computer-aided detection for breast cancer. American Journal of Roentgenology 2005;184:439–44.
[9] Malich A, Fischer DR, Facius M, et al. Effect of breast density on computer aided detection. Journal of Digital Imaging 2005;18:227–33.
[10] Ho WT, Lam PW. Clinical performance of computer-assisted detection (CAD) system in detecting carcinoma in breasts of different densities. Clinical Radiology 2003;58:133–6.
[11] Obenauer S, Sohns C, Werner C, Grabbe E. Impact of breast density on computer-aided detection in full-field digital mammography. Journal of Digital Imaging 2006;19:258–63.
[12] Ellis RL, Meade AA, Mathiason MA, Willison KM, Logan-Young W. Evaluation of computer-aided detection systems in the detection of small invasive breast carcinoma. Radiology 2007;245:88–94.
[13] Understanding R2 ImageChecker® CAD 9.3 MAN-01223 Rev 001. Bedford, MA, USA: Hologic, Inc.; 2009.
[14] Johns PC, Yaffe MJ. X-ray characterisation of normal and neoplastic breast tissues. Physics in Medicine & Biology 1987;32:675–95.
[15] Boone JM, Fewell TR, Jennings RJ. Molybdenum, rhodium, and tungsten anode spectral models using interpolating polynomials with application to mammography. Medical Physics 1997;24:1863–74.
[16] Understanding R2 Quantra™ 1.3 MAN-01224 Rev 001. Bedford, MA, USA: Hologic, Inc.; 2009.
[17] Bird RE, Wallace TW, Yankaskas BC. Analysis of cancers missed at screening mammography. Radiology 1992;184:613–7.
[18] Harvey JA, Fajardo LL, Innis CA. Previous mammograms in patients with impalpable breast carcinoma: retrospective vs blinded interpretation. 1993 ARRS President’s Award. American Journal of Roentgenology 1993;161:1167–72.
[19] Birdwell RL, Ikeda DM, O’Shaughnessy KF, Sickles EA. Mammographic characteristics of 115 missed cancers later detected with screening mammography and the potential utility of computer-aided detection. Radiology 2001;219:192–202.
[20] Li L, Wu Z, Salem A, et al. Computerized analysis of tissue density effect on missed cancer detection in digital mammography. Computerized Medical Imaging and Graphics 2006;30:291–7.