A comparison of the accuracy of film-screen mammography, full-field digital mammography, and digital breast tomosynthesis

Clinical Radiology 67 (2012) 976–981

Clinical Radiology journal homepage: www.clinicalradiologyonline.net

M.J. Michell a,*, A. Iqbal a, R.K. Wasan a, D.R. Evans a, C. Peacock a, C.P. Lawinski b, A. Douiri c,f, R. Wilson d, P. Whelehan e

a Breast Radiology, King’s College Hospital, London, UK
b KCARE, King’s College Hospital, London, UK
c Department of Primary Care and Public Health Sciences, King’s College London, London, UK
d Department of Clinical Radiology, The Royal Marsden NHS Foundation Trust, Surrey, UK
e Medical Research Institute, University of Dundee, Ninewells Hospital & Medical School, Dundee, UK
f National Institute for Health Research Comprehensive Biomedical Research Centre, Guy’s and St. Thomas’ NHS Foundation Trust, London, UK

Article history: Received 3 October 2011; received in revised form 15 February 2012; accepted 6 March 2012

AIM: To measure the change in diagnostic accuracy of conventional film-screen mammography and full-field digital mammography (FFDM) with the addition of digital breast tomosynthesis (DBT) in women recalled for assessment following routine screening.

MATERIALS AND METHODS: Ethics approval for the study was granted. Women recalled for assessment following routine screening with screen-film mammography were invited to participate. Participants underwent bilateral, two-view FFDM and two-view DBT. Readers scored each lesion separately for probability of malignancy on screen-film mammography, FFDM, and then DBT. The scores were compared with the presence or absence of malignancy based on the final histopathology outcome.

RESULTS: Seven hundred and thirty-eight women participated (93.2% recruitment rate). Following assessment, 204 (26.8%) lesions were diagnosed as malignant (147 invasive and 57 in situ tumours), 286 (37.68%) as benign, and 269 (35.4%) as normal. Diagnostic accuracy was evaluated using receiver operating characteristic (ROC) analysis and measurement of the area under the curve (AUC). The AUC values demonstrated a significant (p = 0.0001) improvement in diagnostic accuracy with the addition of DBT to FFDM and film-screen mammography (AUC = 0.9671) when compared to FFDM plus film-screen mammography (AUC = 0.8949) and film-screen mammography alone (AUC = 0.7882). The effect was significantly greater for soft-tissue lesions [AUC was 0.9905 with the addition of DBT and 0.9201 for FFDM with film-screen mammography combined (p = 0.0001)] than for microcalcification [AUC was 0.7920 with the addition of DBT and 0.7843 for FFDM with film-screen mammography combined (p = 0.3182)].

CONCLUSION: The addition of DBT increases the accuracy of mammography compared to FFDM and film-screen mammography combined and film-screen mammography alone in the assessment of screen-detected soft-tissue mammographic abnormalities.

© 2012 The Royal College of Radiologists. Published by Elsevier Ltd. All rights reserved.

* Guarantor and correspondent: M.J. Michell, Breast Radiology Department, King’s College Hospital, Denmark Hill, London SE5 9RS, UK. Tel.: +44 (0) 20 3299 3875; fax: +44 (0) 20 3299 4363. E-mail address: [email protected] (M.J. Michell).
0009-9260/$ – see front matter © 2012 The Royal College of Radiologists. Published by Elsevier Ltd. All rights reserved. doi:10.1016/j.crad.2012.03.009

Introduction

X-ray mammography using analogue film-screen and full-field digital mammography (FFDM) has been shown to be effective for both routine screening and symptomatic breast diagnosis.1,2 The accuracy of conventional mammography is limited by anatomical noise resulting from the superimposition of normal structures. In clinical practice, this may affect both sensitivity and specificity: cancer detection may be limited, particularly in younger women and those with dense breast patterns, because mammographic evidence of the tumour may be completely or partially obscured. Conversely, overlying normal structures may produce an appearance on conventional two-dimensional mammography that is suspicious for cancer and leads to recall for further assessment following routine screening.

Digital breast tomosynthesis (DBT) has the potential to improve the accuracy of mammography by enabling the reader to view x-ray images of the breast tissue as a series of thin reconstructed sections, so overcoming the problem of overlying tissues on conventional two-dimensional images.3 The technical features of DBT have been described in detail in previously published work.3

Recent articles and presentations at scientific meetings have addressed the application of DBT in clinical practice. Some data have shown an improvement in specificity and Breast Imaging Reporting and Data System (BIRADS) classification of lesions using DBT alone or in combination with FFDM.4–10 However, there remains some uncertainty about whether and how DBT should be used in routine screening, in the assessment and diagnostic work-up of mammographically detected abnormalities found in routine screening, and in women presenting with breast symptoms. The present study examines the impact of the addition of FFDM and DBT on the diagnostic accuracy of mammography in a population of women recalled following film-screen mammography alone.

Materials and methods

The aim of the study was to assess the additional diagnostic accuracy of FFDM and DBT over film-screen mammography in a population of women recalled for assessment in a screening programme. The study primarily measures the specificity of these two additional digital imaging techniques in an unblinded clinical setting. This design was required because, at the time, DBT was not approved for routine use in breast screening in the UK.

Research ethics committee approval for the study was obtained in November 2008 and women were invited to participate from January 2009 to July 2010. Women eligible for recruitment were those recalled for further assessment of a mammographic abnormality found on routine film-screen mammographic screening. The screening examinations were carried out at either a mobile or a static screening site. All screening mammograms were double read by radiologist or radiographer film readers meeting the UK

National Health Service Breast Screening Programme (NHSBSP) standard of reading more than 5000 mammograms per annum. In cases of discordance between reader one and reader two, the decision to recall or not was made following discussion and consensus by two or more readers. The diagnostic work-up at assessment was carried out by one of the team of five specialist breast radiologists, according to NHSBSP screening assessment guidelines.11

Women eligible for the study were provided with written information at the time of invitation to attend the breast assessment clinic. On attendance at the clinic, the study was discussed with eligible women and written consent was obtained from those who agreed to participate.

Participating women underwent bilateral two-view medial-lateral-oblique (MLO) and cranio-caudal (CC) FFDM and DBT within a single compression episode for each projection. The mammography machine was an FFDM unit with tomosynthesis capability (Hologic Selenia Dimensions, Hologic, Bedford, MA). Eleven low-dose projection images are acquired during movement of the x-ray tube through a 15° arc. Reconstructed 1 mm sections in a plane parallel to the digital image receptor surface are then produced for viewing.

During the period of this study, routine quality control measurements were made on the Dimensions system following the methods and protocols suggested by the Institute of Physics and Engineering in Medicine (IPEM) and the NHSBSP.12–14 For FFDM, the mean glandular dose (MGD) to a standard breast was calculated using software provided by the National Centre for the Coordination of Physics in Mammography.15 The dose levels varied between 1.37 and 1.57 mGy. The range was due to small differences in machine performance over time and a series of software upgrades provided by the manufacturer, which resulted in small changes in dose. Currently, a national protocol for the measurement of breast dose for DBT has not been finalized. Therefore, DBT dose levels were estimated using data provided by the manufacturer (Hologic) and varied between 1.66 and 1.90 mGy for a standard breast.

The screen-film, FFDM, and DBT examinations were viewed sequentially by the radiologist carrying out the assessment according to a standard protocol, and data were recorded on the research data collection form. The film-screen images were viewed first, followed by the FFDM and then the DBT images. The radiologist was able to use the tools available on the digital mammography workstation for adjustment of brightness and contrast, magnification, and black/white image inversion for the FFDM and DBT images. DBT images were viewed as 1 mm reconstructed sections and the facility to increase the section thickness was available.

The following data were recorded for each case: mammography features of any lesion resulting in recall to assessment, including mammographic sign (circumscribed lesion, spiculate lesion, microcalcification, parenchymal distortion, asymmetry), size in millimetres, suspicion level according to the Royal College of Radiologists Breast Group

mammography classification system (1 = normal, 2 = benign, 3 = probably benign, 4 = suspicious, 5 = malignant), site on both the MLO and CC views, and mammographic breast density (fatty, glandular, dense).16

The final outcome was recorded as normal/benign, malignant invasive, or malignant non-invasive. Cases recorded as normal/benign included all cases where further imaging and clinical assessment showed no features to suggest malignancy and those cases that underwent needle biopsy or surgery showing benign findings. Malignant invasive and malignant non-invasive results were based on final surgical histology. The diagnosis and management of all cases undergoing needle biopsy and/or surgery were discussed at the weekly prospective multidisciplinary meetings.

The inferential statistical analysis was performed using receiver operating characteristic (ROC) methods. Sub-analysis was carried out for mammographic feature and parenchymal density. ROC analysis was performed to assess and compare the diagnostic accuracy of the three imaging methods by calculating the area under the curve (AUC) as described by Hanley et al.17,18 A ROC graph, with the y-axis representing the true-positive rate and the x-axis representing the false-positive rate, was used to plot the individual and summary points of sensitivity and specificity. All statistical analyses were carried out with Stata software, version 11.0 (reference 19).
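The ROC analysis described above can be illustrated with a minimal sketch. It is not the authors' Stata procedure: the function names and the example scores are hypothetical, and the standard error uses the closed-form approximation of Hanley and McNeil (reference 17), which need not reproduce the values that Stata reports. The AUC is computed as the probability that a randomly chosen malignant lesion receives a higher suspicion score (1 to 5) than a randomly chosen normal/benign lesion, with ties counted as one half; each ROC point pairs the false-positive rate (1 - specificity) with the true-positive rate at one score threshold.

```python
from math import sqrt
from typing import List, Sequence, Tuple


def auc_from_scores(malignant: Sequence[int], benign: Sequence[int]) -> float:
    """AUC as P(malignant score > benign score), ties counted as 0.5."""
    wins = 0.0
    for m in malignant:
        for b in benign:
            if m > b:
                wins += 1.0
            elif m == b:
                wins += 0.5
    return wins / (len(malignant) * len(benign))


def roc_points(malignant: Sequence[int], benign: Sequence[int]) -> List[Tuple[float, float]]:
    """(false-positive rate, true-positive rate) for 'score >= t', t = 1..5."""
    points = []
    for t in range(1, 6):
        tpr = sum(s >= t for s in malignant) / len(malignant)
        fpr = sum(s >= t for s in benign) / len(benign)
        points.append((fpr, tpr))
    return points


def hanley_mcneil_se(auc: float, n_pos: int, n_neg: int) -> float:
    """Approximate standard error of the AUC (Hanley and McNeil, 1982)."""
    q1 = auc / (2.0 - auc)
    q2 = 2.0 * auc ** 2 / (1.0 + auc)
    variance = (
        auc * (1.0 - auc)
        + (n_pos - 1) * (q1 - auc ** 2)
        + (n_neg - 1) * (q2 - auc ** 2)
    ) / (n_pos * n_neg)
    return sqrt(variance)


# Hypothetical suspicion scores, for illustration only (not study data).
malignant_scores = [5, 5, 4, 3, 4]
benign_scores = [1, 2, 3, 2, 1, 3]

auc = auc_from_scores(malignant_scores, benign_scores)
se = hanley_mcneil_se(auc, len(malignant_scores), len(benign_scores))
points = roc_points(malignant_scores, benign_scores)
print(f"AUC = {auc:.4f}, SE = {se:.4f}")
print(points)
```

With ordinal scores such as the five-point scale used here, the pairwise definition gives the same area as joining the ROC points with straight lines (the trapezoidal rule).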

Results

Seven hundred and ninety-two women recalled to assessment because of a mammographic abnormality following routine screening using film-screen mammography were invited to participate in the study over an accrual period of 18 months. Women recalled because of a clinical symptom in whom the screening mammograms were normal were not invited to participate. Seven hundred and thirty-eight of the 792 (93.2%) women agreed to participate in the study.

Seven hundred and fifty-nine lesions were identified in the 738 participants. In 17 patients, there were two lesions (two patients with bilateral benign lesions, two with bilateral malignant lesions, one with ipsilateral benign and malignant lesions, and the remaining 12 with two ipsilateral malignant lesions). In two patients, there were three lesions (one patient with two cancers in the contralateral breast, one patient with three cancers in one breast). Two hundred and four of 759 (26.8%) lesions were malignant: 147/204 (72%) were invasive cancer and 57/204 (27.9%) were ductal carcinoma in situ (DCIS). Following assessment, 555/759 (73%) lesions were found to be either normal or benign; these included cysts and lesions with a benign outcome on either needle biopsy or surgical histology.

The additional information provided by DBT significantly improved the accuracy compared to film-screen mammography alone and FFDM with film-screen mammography combined. The diagnostic accuracy of the

imaging methods, expressed as AUC ± SE, was 0.7882 ± 0.0198 (95% CI 0.74945–0.82702) for film-screen mammography alone, 0.8949 ± 0.0124 (95% CI 0.87061–0.91915) for FFDM with film-screen mammography combined, and 0.9671 ± 0.0050 (95% CI 0.95732–0.97683) with the addition of DBT. The difference in AUC between DBT and FFDM with film-screen mammography combined was 0.0722 (p = 0.0001), and the difference in AUC between DBT and film-screen mammography alone was 0.1789 (p = 0.0001; Fig 1). The sensitivity (absolute and complete), specificity, positive predictive value (PPV), and negative predictive value (NPV) for FFDM and DBT are shown in Table 1.

The distribution of malignant (n = 204) and benign (n = 555) lesions according to mammography score and imaging method is shown in Figs 2 and 3. Of the 204 cancers, 70/204 (34.3%) were classified as malignant (M5) on film-screen mammography; this improved to 81/204 (39.7%) with the addition of FFDM and to 119/204 (58.3%) with the addition of DBT. The addition of DBT resulted in fewer malignant lesions being classified as M3/M4 compared to film-screen mammography and FFDM combined. Six of the 204 (2.94%) cancers were only visualized on DBT. Two lesions were ipsilateral additional cancers, two lesions were contralateral cancers, and one lesion was found in the ipsilateral breast in a patient recalled for benign microcalcification proven by needle biopsy.

Women with benign findings were discharged from the assessment clinic to continue routine screening. The follow-up period was 18 to 36 months. In one case, a 10 mm invasive grade 1 contralateral cancer was diagnosed and treated 24 months following initial diagnosis and treatment of the screen-detected cancer. This was detected during a routine retrospective audit of screen-detected cancers and was not diagnosed on routine follow-up on FFDM. No other cases are known to have presented with an interval breast cancer through routine interval cancer data collection by the service.

The addition of DBT resulted in more normal/benign cases being classified as M1/M2 [412/555 (74.2%)]

Figure 1 ROC curves for film-screen, FFDM, and DBT imaging scores (n = 759).

Table 1 Computed values for absolute sensitivity, specificity, positive predictive value (PPV), and negative predictive value (NPV) of histology-proven malignant lesions.

Measure                      Full-field digital mammography (%)   Digital breast tomosynthesis (%)
Absolute sensitivity (a)     39.7                                 58.3
Complete sensitivity (b)     97.5                                 100
Specificity                  51                                   74.2
PPV                          42.3                                 58.8
NPV                          98.3                                 100

(a) Absolute sensitivity = M5 counted as positive.
(b) Complete sensitivity, specificity, positive predictive value (PPV), negative predictive value (NPV) = M3 + M4 + M5 counted as positive.
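The complete-sensitivity, specificity, PPV, and NPV figures in Table 1 follow from a two-by-two table in which M3, M4, and M5 are counted as positive. The sketch below is an illustrative check, not the authors' calculation. The DBT counts come from the text (412 of 555 normal/benign lesions scored M1/M2; complete sensitivity of 100% for the 204 malignant lesions). The FFDM true-negative count (283 of 555) is also from the text, but the FFDM split of 199 true positives and 5 false negatives is an assumption inferred from the reported 97.5% sensitivity and is not stated in the paper. The function name is hypothetical.

```python
from typing import Dict


def diagnostic_metrics(tp: int, fn: int, tn: int, fp: int) -> Dict[str, float]:
    """Return sensitivity, specificity, PPV and NPV as percentages (1 d.p.)."""
    return {
        "sensitivity": round(100.0 * tp / (tp + fn), 1),
        "specificity": round(100.0 * tn / (tn + fp), 1),
        "ppv": round(100.0 * tp / (tp + fp), 1),
        "npv": round(100.0 * tn / (tn + fn), 1),
    }


MALIGNANT = 204  # malignant lesions (from the text)
BENIGN = 555     # normal/benign lesions (from the text)

# DBT: all 204 cancers scored M3 or above; 412 normal/benign scored M1/M2.
dbt = diagnostic_metrics(tp=204, fn=0, tn=412, fp=BENIGN - 412)

# FFDM: 283 normal/benign scored M1/M2 (from the text).
# ASSUMPTION: 5 cancers scored M1/M2, inferred from 97.5% sensitivity.
ffdm = diagnostic_metrics(tp=MALIGNANT - 5, fn=5, tn=283, fp=BENIGN - 283)

print("DBT ", dbt)
print("FFDM", ffdm)
```

Under these counts the computed percentages agree with Table 1 to the precision reported there.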

compared to film-screen mammography alone [21/555 (3.78%)] and FFDM with film-screen mammography combined [283/555 (50.9%)]. Fig 6 shows the distribution by mammography score on DBT of malignant and benign soft-tissue lesions classified as M3 on FFDM. There were no malignant cases classified as M1 or M2 on DBT. Twenty-two of 218 (10.1%) malignant lesions were classified as either suspicious or malignant (M4/M5) on DBT.

Sub-analysis by mammographic density did not show any significant difference for the detection of cancers in radiographically fatty, dense, or glandular breasts (Table 2).

Analysis according to mammographic features showed that in soft-tissue lesions there was a significant difference in accuracy between FFDM with film-screen mammography combined and the addition of DBT (Fig 4): AUC 0.9201 ± 0.0133 (95% CI 0.89411–0.94614) and 0.9905 ± 0.0024 (95% CI 0.98585–0.99519), respectively (difference in AUC 0.0704; p = 0.0001). However, for microcalcification, there was no significant difference between the two imaging techniques (Fig 5). The area under the ROC curve for DBT was 0.7920 ± 0.0320 (95% CI 0.72930–0.85460) and for FFDM with film-screen mammography combined was 0.7843 ± 0.0322 (95% CI 0.72115–0.84752; difference 0.0077; p = 0.3182).
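As a consistency check on the AUC values reported in the Results, the sketch below recomputes each 95% confidence interval as AUC ± 1.96 × SE and each AUC difference by subtraction. The normal approximation is an assumption made here for illustration; the paper does not state how Stata derived the intervals, so small discrepancies in the later decimal places are expected. The helper names are hypothetical.

```python
from typing import Tuple


def normal_ci(auc: float, se: float, z: float = 1.96) -> Tuple[float, float]:
    """Normal-approximation confidence interval, capped to the range [0, 1]."""
    return max(0.0, auc - z * se), min(1.0, auc + z * se)


def auc_difference(a: float, b: float) -> float:
    """Difference between two AUC values, rounded to four decimals."""
    return round(a - b, 4)


# (AUC, SE, reported 95% CI) as given in the Results.
reported = {
    "film-screen alone": (0.7882, 0.0198, (0.74945, 0.82702)),
    "FFDM + film-screen": (0.8949, 0.0124, (0.87061, 0.91915)),
    "DBT added": (0.9671, 0.0050, (0.95732, 0.97683)),
    "soft tissue, FFDM + film-screen": (0.9201, 0.0133, (0.89411, 0.94614)),
    "soft tissue, DBT added": (0.9905, 0.0024, (0.98585, 0.99519)),
}

max_gap = 0.0
for name, (auc, se, (rep_lo, rep_hi)) in reported.items():
    lo, hi = normal_ci(auc, se)
    max_gap = max(max_gap, abs(lo - rep_lo), abs(hi - rep_hi))
    print(f"{name}: computed {lo:.4f} to {hi:.4f}, reported {rep_lo} to {rep_hi}")

print("largest gap between computed and reported limits:", round(max_gap, 5))
print("DBT vs FFDM + film-screen:", auc_difference(0.9671, 0.8949))
print("DBT vs film-screen alone:", auc_difference(0.9671, 0.7882))
```

The computed limits fall within 0.001 of the reported ones, and the differences match the 0.0722 and 0.1789 quoted in the text.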

Discussion

Previously published studies have demonstrated that the use of DBT in addition to FFDM may improve the accuracy in the classification of mammographic features, as

Figure 2 The distribution of malignant lesions (n = 204) according to mammography score for film-screen, FFDM, and DBT.

measured using BIRADS or similar mammography classification systems.4–10 The improvements in diagnostic accuracy are due to the reduction in anatomical noise of the image and the greater ability of the radiologist to perceive, view, and interpret the details of both normal and abnormal features on x-ray mammograms.3

In this study of women recalled for further assessment following routine screening, FFDM was more accurate than screen-film mammography. This is consistent with the findings of trials comparing the accuracy of film-screen mammography and FFDM.1,2 The addition of DBT, however, showed a significant improvement in accuracy compared to both film-screen mammography alone and film-screen mammography and FFDM combined. The improvement in visualization of subtle signs using DBT enabled the reader to classify indeterminate lesions found on FFDM more accurately as either more suspicious for malignancy or more likely to be benign.

In this study, additional lesions were detected on DBT that had not been visualized with film-screen mammography and FFDM combined. Six additional cancers, all spiculate lesions, were found in four patients. In all of these cases, the additional foci were identified on second-look ultrasound, aided by the three-dimensional information provided by DBT. In 126/218 (57.8%) cases with low-suspicion (M3) lesions on film-screen mammography with FFDM combined, DBT correctly demonstrated that the case was normal or benign. The normal breast parenchyma has a different appearance on DBT compared to FFDM, but the radiologists in the study became accustomed to this quickly and there were no significant problems with false-positive diagnosis on DBT.

The improvement in diagnostic accuracy with the additional information from DBT was limited to soft-tissue lesions, as might be expected from the reduction in the effect of anatomical noise using DBT. For microcalcification, DBT did not improve accuracy. DBT was not compared with fine-focus magnification mammography in this study. The change in accuracy was examined prospectively. In order to compare the different techniques directly, the DBT images need to be read independently of the FFDM images, and the present authors are currently engaged in a multi-reader blinded comparison study.

There are significant implications for both screening and diagnostic practice from this and other studies.4 DBT may increase the efficiency and effectiveness of screening through improvement in both specificity and sensitivity. Better visualization of normal and benign features may improve the ability of the film reader to confidently diagnose cases without malignancy and thereby reduce the recall rate, with savings in resources for the healthcare system and fewer women suffering the anxiety and inconvenience associated with recall. Conversely, improved visualization of abnormal features may improve sensitivity for the detection of small cancers with subtle signs and decrease the false-negative interval cancer rate. However, these potential improvements in specificity and sensitivity will need to be carefully examined in large-scale trials. There are increased costs associated with DBT, related to both equipment and reading time, but a cost-effectiveness

Figure 3 The distribution of the benign and normal lesions (n = 555) according to mammography score for film-screen, FFDM, and DBT.

analysis was beyond the scope of this study and should be addressed by future research. The equipment from a single manufacturer was used in conducting this trial. The technique and features of DBT machines vary considerably between different manufacturers and clinical performance results are, therefore, not necessarily generalizable. In the diagnostic and assessment setting, imaging workup has traditionally involved both ultrasound and supplementary mammography, including spot compression and magnification views. DBT provides detailed mammographic information for soft-tissue lesions, demonstrates the whole breast, and is not dependent on the accurate positioning over the target lesion required for spot compression views. Law20 investigated the breast dose from magnification films on a range of x-ray units of various designs with a nominal magnification factor of 1.8. The ratio derived between the dose level using magnification geometry and conventional contact geometry was 2.2, giving an average magnification film dose of 5 mGy (based on an average screen film MGD of 2.3 mGy). Thus the use of FFDM with DBT can provide a considerable dose advantage over the use of screen-film magnification techniques. A comparison of DBT with ultrasound was beyond the scope of this study. Further studies are required to assess the added value of DBT in the diagnostic work-up and local staging of malignancy in the context of the full range of

current techniques, including ultrasound, magnification and spot compression mammography, and MRI. The conclusions that can be reached from this prospective preliminary study are limited by the method; the readers were unblinded, they read and scored the DBT

Figure 4 ROC curve analysis for soft-tissue lesions (n = 603).

Table 2 ROC analysis of fatty breasts (n = 115) and dense + glandular breasts (n = 644).

            Fatty breast                  Dense and glandular breast
            FFDM          DBT             FFDM          DBT
AUC         0.9342        0.9903          0.8863        0.9624
SE          0.0202        0.0062          0.0143        0.0059
95% CI      0.894–0.973   0.978–1.000     0.858–0.914   0.950–0.973
p-Value     0.0002                        0.0001

FFDM, full-field digital mammography; DBT, digital breast tomosynthesis; AUC, area under the curve.

Figure 5 ROC curve analysis for micro-calcification (n = 156).

Figure 6 Lesions classified as M3 on FFDM (n = 218). Distribution by mammography score on DBT.

images with knowledge of the film-screen mammography and FFDM findings. This study has, therefore, demonstrated improved diagnostic accuracy of film-screen mammography and FFDM combined compared to film-screen mammography alone, and improved accuracy when DBT is added to film-screen mammography and FFDM, in the interpretation of soft-tissue mammographic abnormalities in women recalled following routine screening.

Acknowledgement

This project was supported with funding from the NHSBSP.

References

1. Hambly NM, McNicholas MM, Phelan N, et al. Comparison of digital mammography and screen-film mammography in breast cancer screening: a review in the Irish breast screening program. AJR Am J Roentgenol 2009;193:1010–8.
2. Heddson B, Rönnow K, Olsson M, et al. Digital versus screen-film mammography: a retrospective comparison in a population-based screening program. Eur J Radiol 2007;64:419–25.
3. Niklason LT, Christian BT, Niklason LE, et al. Digital tomosynthesis in breast imaging. Radiology 1997;205:399–406.
4. Andersson I, Ikeda DM, Zackrisson S, et al. Breast tomosynthesis and digital mammography: a comparison of breast cancer visibility and BIRADS classification in a population of cancers with subtle mammographic findings. Eur Radiol 2008;18:2817–25.
5. Poplack SP, Tosteson TD, Kogel CA, et al. Digital breast tomosynthesis: initial experience in 98 women with abnormal digital screening mammography. AJR Am J Roentgenol 2007;189:616–23.
6. Svahn T, Andersson I, Chakraborty D, et al. The diagnostic accuracy of dual-view digital mammography, single-view breast tomosynthesis and a dual-view combination of breast tomosynthesis and digital mammography in a free-response observer performance study. Radiat Prot Dosimetry 2010;139:113–7.
7. Gennaro G, Toledano A, Di Maggio C, et al. Digital breast tomosynthesis versus digital mammography: a clinical performance study. Eur Radiol 2010;20:1545–53.
8. Gur D, Abrams GS, Chough DM, et al. Digital breast tomosynthesis: observer performance study. AJR Am J Roentgenol 2009;193:586–91.
9. Hakim CM, Chough DM, Ganott MA, et al. Digital breast tomosynthesis in the diagnostic environment: a subjective side-by-side review. AJR Am J Roentgenol 2010;195:W172–6.
10. Good WF, Abrams GS, Catullo VJ, et al. Digital breast tomosynthesis: a pilot observer study. AJR Am J Roentgenol 2008;190:865–9.
11. Digital breast tomosynthesis. Sheffield: NHSBSP; September 2010.
12. The commissioning and routine testing of mammographic X-ray systems. In: IPEM report. York: Institute of Physics and Engineering in Medicine; 1994.
13. The commissioning and routine testing of full field digital mammography systems, version 3. In: NHSBSP Equipment Report 0604. Sheffield: NHSBSP; April 2009.
14. Quality assurance guidelines for medical physics services. In: NHSBSP Publication. Sheffield: National Breast Screening Quality Assurance Coordinating Group for Physics; June 2005.
15. National Centre for the Coordination of Physics in Mammography (NCCPM).
16. Maxwell AJ, Ridley NT, Rubin G, et al. The Royal College of Radiologists Breast Group breast imaging classification. Clin Radiol 2009;64:624–7.
17. Hanley JA, McNeil BJ. The meaning and use of the area under a receiver operating characteristic (ROC) curve. Radiology 1982;143:29–36.
18. Hanley JA, McNeil BJ. A method of comparing the areas under receiver operating characteristic curves derived from the same cases. Radiology 1983;148:839–43.
19. StataCorp. Stata statistical software: Release 11. College Station, TX: StataCorp LP; 2009.
20. Law J. Breast dose from magnification films in mammography. Br J Radiol 2005;78:816–20.