ARTICLE IN PRESS
Original Investigation
Comparison of Resource Utilization and Clinical Outcomes Following Screening with Digital Breast Tomosynthesis Versus Digital Mammography: Findings From a Learning Health System
Nila H. Alsheik, MD, Firas Dabbous, PhD, Scott K. Pohlman, Kathleen M. Troeger, Richard E. Gliklich, MD, Gregory M. Donadio, Zhaohui Su, PhD, Vandana Menon, MD, PhD, Emily F. Conant, MD
Abbreviations: AHC, Advocate Health Care; BI-RADS, American College of Radiology's Breast Imaging Reporting and Data System; CI, confidence interval; DBT, digital breast tomosynthesis; DM, digital mammography; HER2, human epidermal growth factor receptor 2; IQR, interquartile range
Rationale and Objectives: To compare outcomes of breast cancer screening with digital mammography (DM) alone versus DM in combination with digital breast tomosynthesis (DBT) in a large, representative cohort.
Materials and Methods: A total of 325,729 screening mammograms from 247,431 women, performed in two healthcare systems from June 2015 to September 2017, were analyzed. Patient-level demographics, calculated risk levels, and clinical outcomes were extracted from radiology information systems and electronic medical records. Multivariable regression modeling adjusting for institution, age, breast density, and first exam was used to compare patient characteristics, recall rates, time to biopsy and final diagnosis, clinical outcomes, and diagnostic performance. Participating institutions and the Coordinating Center received Institutional Review Board approval for a waiver of consent to collect and link data and perform analysis.
Results: A total of 194,437 (59.7%) screens were performed with DBT and 131,292 (40.3%) with DM. Women with dense breasts and higher calculated risk were more likely to be screened with DBT. Recall rates were lower for DBT overall (8.83% DBT vs 10.98% DM; adjusted odds ratio 0.85, 95% confidence interval 0.83–0.87), across all age groups, races, and breast densities, and at facilities that used predominantly DBT (8.05%) versus predominantly DM (11.22%) or a combination (10.73%). The most common diagnostic pathway after recall was diagnostic mammography followed by ultrasound. Women recalled from DBT were more likely to proceed directly to ultrasound. The median time to biopsy (18 vs 22 days) and to final diagnosis (10 vs 13 days) was shorter for DBT. The adjusted cancer rate, cancer detection rate, and specificity were higher for DBT.
Acad Radiol 2018; &:1–9
Acknowledgments: This study was supported by Hologic Inc., Marlborough, Massachusetts.
Declaration of Interest: Nila H. Alsheik, MD: employee of Advocate Health and member of Hologic's scientific advisory panel; Firas Dabbous, PhD: employee of Advocate Health; Scott K. Pohlman: employee of Hologic Inc.; Kathleen M. Troeger: employee of Hologic Inc.; Richard E. Gliklich, MD: employee of OM1 Inc.; Gregory M. Donadio: employee of OM1 Inc.; Zhaohui Su, PhD: employee of OM1 Inc.; Vandana Menon, MD, PhD: employee of OM1 Inc.; Emily F. Conant, MD: grant support from Hologic and member of its scientific advisory panel.
From the Advocate Caldwell Breast Center, Advocate Lutheran General Hospital, 1700 Luther Lane, Park Ridge, IL (N.H.A.); James R. & Helen D. Russell Institute for Research & Innovation, Advocate Lutheran General Hospital, Center for Advanced Care, 1700 Luther Lane, Suite 1410, Park Ridge, IL 60068 (F.D.); Hologic Inc., 250 Campus Drive, Marlborough, MA 01752 (S.K.P., K.M.T.); OM1 Inc., 800 Boylston Street, Suite 1410, Boston, MA 02199 (R.E.G., G.M.D., Z.S., V.M.); Department of Radiology, 3400 Spruce Street, Hospital of the University of Pennsylvania, Philadelphia, PA 19104 (E.F.C.).
Received May 3, 2018; revised May 29, 2018; accepted May 30, 2018.
Address correspondence to: E.F.C. e-mail:
[email protected] © 2018 The Association of University Radiologists. Published by Elsevier Inc. All rights reserved. https://doi.org/10.1016/j.acra.2018.05.026
ARTICLE IN PRESS Academic Radiology, Vol &, No &&, && 2018
ALSHEIK ET AL
OR, odds ratio; PPV1, positive predictive value 1: the number of cancers detected per number of positive screens
Conclusion: DBT demonstrated a more efficient screening pathway and improved quality measures, with lower recall rates in all patient types, reduced use of diagnostic mammography, and shorter time to biopsy and final diagnosis.
Key Words: Tomosynthesis; Recall; Cancer detection.
PPV3, positive predictive value 3: the number of cancers diagnosed per number of biopsies performed; RIS, radiology information system; UPHS, University of Pennsylvania Health System
INTRODUCTION
Breast cancer is the most frequently diagnosed cancer among women in the United States (1,2). Breast cancer mortality rates have declined significantly as a result of early mammographic screen detection and improved treatment (3,4). Digital breast tomosynthesis (DBT) is rapidly becoming the standard of care because of its improved sensitivity and specificity (5–7). The quasi-3D data acquired with DBT improve lesion conspicuity, allowing better characterization and localization of lesions. In the United States, some imaging centers have transitioned to screening all women with DBT, while others offer either digital mammography (DM) or DBT screening depending on the availability of each modality and/or on individual patient factors such as breast density, breast cancer risk, and insurance coverage. The enhanced characterization of lesions at DBT screening may lead to more streamlined diagnostic imaging, thereby improving cost-effectiveness. The aim of this study is to compare the clinical outcomes and diagnostic pathways of screening with DBT versus DM alone. To this end, we present a systematic analysis, in a large and diverse multi-institutional population, of screening outcomes across age, ethnicity, race, and risk profile and of the diagnostic pathways that follow the index screening mammogram.
Learning health systems leverage the experience of every patient to determine the most effective and efficient care that can be offered within their organization. The core requirement of such a system is big data processing of digital health information from within and outside a healthcare organization, which provides a new approach to determining diagnostic or treatment pathways and their resultant outcomes and costs.
Once both input and outcome information are known, predictive analytics can be applied to enable optimization and personalization of healthcare choices so that every patient follows the most effective and efficient diagnostic or treatment pathway based on the individual characteristics of the patient, the evidence base, and the organization's capabilities and outcomes.
MATERIALS AND METHODS
Participating institutions and the Coordinating Center received Institutional Review Board approval for a waiver of consent to collect and link data and perform analysis. All study procedures were Health Insurance Portability and Accountability Act compliant.
Study Population and Data Collection
This analysis includes data from 39 imaging facilities in 2 large healthcare networks, Advocate Health Care (AHC) and the University of Pennsylvania Health System (UPHS), performing screening with either DBT or DM. DBT examinations were performed at 30 imaging facilities (29 facilities used the Hologic Selenia Dimensions DBT system, Marlborough, Massachusetts, and one facility used the Siemens MAMMOMAT Inspiration DBT system, Tarrytown, New York). Screening exams at AHC spanned June 1, 2015, to April 4, 2017, and those at UPHS spanned October 1, 2015, to September 29, 2017. Each participating institution provided clinical, patient, and imaging data at each imaging encounter. Data were pooled using a proprietary, high-security, cloud-based, automated big data "pipeline" built on current technologies, including distributed file systems, NoSQL document stores (nonrelational databases), relational databases, and large-scale data processing engines capable of handling trillions of rows of data, hosted on the OM1 Intelligent Data Cloud (OM1, Inc, Boston, Massachusetts). The OM1 Intelligent Data Cloud is a multisource data cloud that compiles health data from multiple sources to generate individual patient profiles and cohorts for exploration and analysis. A proprietary patient index links data to unique individuals across multiple data sources while maintaining deidentification. The study cohort consisted of 247,431 individual women who underwent 325,729 screening mammograms (DBT or DM) during the study period. Women were excluded from the analysis for any of the following reasons: a previous diagnosis of breast cancer, previous breast augmentation, or lack of clinical information during the screening encounter. Uniform operational definitions were applied across data sources for exposure and outcome variables.
The radiology data set included exam-level data on procedure types and results issued during the study period and patient-level data on patient risk factors and demographic characteristics. Family history was self-reported by the patient at each breast imaging encounter. Based upon these data, Tyrer-Cuzick scores (AHC) and Gail Model scores (UPHS) were calculated. Age was calculated as the difference between the date of the screening examination and the date of birth. Race was self-reported as Caucasian, African American, or Asian; women not reporting one of these three categories were categorized as other or unknown. Ethnicity was characterized as Hispanic, non-Hispanic, or unknown. Radiologists' assessments and recommendations were based on the American College of Radiology's Breast Imaging Reporting and Data System (BI-RADS) (8). Both participating institutions coded mammograms in their respective radiology information systems (RIS) as either screening or diagnostic examinations. The RIS also captured the BI-RADS breast density category at screening (almost entirely fatty, scattered fibroglandular, heterogeneously dense, or extremely dense) and other hormonal, breast disease, and hereditary breast cancer risk factors. UPHS assigned qualitative BI-RADS breast density categories; AHC predominantly used qualitative BI-RADS categories with Quantra (Hologic, Marlborough, Massachusetts) as an adjunct.
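The patient index used in this study is proprietary and is not described further. Purely as an illustration of the general idea, one common way to link records across sources without exposing identities is to replace identifiers with a keyed hash and group records by that token. The sketch below assumes that generic technique; the key, the identifier fields, and the helper names (`link_token`, `link_records`) are hypothetical and are not OM1's method.

```python
import hashlib
import hmac
from collections import defaultdict

# Hypothetical secret held only by the linkage service; a real deployment
# would manage this key outside the code.
LINKAGE_KEY = b"example-linkage-key"

# Hypothetical identifier fields used to build the token.
ID_FIELDS = ("last_name", "first_name", "birth_date", "sex")


def link_token(record):
    """Keyed hash (HMAC-SHA256) of whitespace-trimmed, lowercased identifiers."""
    normalized = "|".join(str(record[f]).strip().lower() for f in ID_FIELDS)
    return hmac.new(LINKAGE_KEY, normalized.encode("utf-8"), hashlib.sha256).hexdigest()


def link_records(records):
    """Group records from any source by token and drop the direct identifiers."""
    linked = defaultdict(list)
    for record in records:
        token = link_token(record)
        linked[token].append({k: v for k, v in record.items() if k not in ID_FIELDS})
    return dict(linked)


records = [
    {"last_name": "Doe", "first_name": "Jane", "birth_date": "1960-04-02",
     "sex": "F", "source": "RIS", "exam": "DBT screen"},
    {"last_name": " DOE", "first_name": "jane", "birth_date": "1960-04-02",
     "sex": "f", "source": "EMR", "exam": "biopsy"},
    {"last_name": "Roe", "first_name": "Ann", "birth_date": "1971-09-15",
     "sex": "F", "source": "RIS", "exam": "DM screen"},
]
linked = link_records(records)
print(len(linked))  # 2 linked individuals
```

Exact matching on normalized fields is the simplest possible rule; it would miss typographical errors or name changes, which production linkage systems typically handle with additional matching logic.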
Screening examinations were categorized as baseline screens if indicated electronically and as subsequent screens either if indicated in the system or if a prior screen was on record. In all other situations, the screening exam was categorized as "unknown." Diagnostic evaluation pathways were defined as the series of procedures that women underwent during the 90 days following a positive screening mammogram (BI-RADS 0; additional imaging needed). For each screening episode, any associated downstream diagnostic mammographic or ultrasound imaging, biopsy, and histopathologic results were captured. Pathways were defined empirically based on the observed sequence of imaging examinations and biopsy. Time to resolution of diagnostic evaluation was computed as the number of days between the screening mammogram and the last diagnostic procedure in the pathway. Recall rate was defined as the proportion of screening mammography examinations resulting in a recommendation for further evaluation. The operational definition of recall rate was based on the definition in the American College of Radiology's glossary of statistical terms, which estimates the percentage of examinations interpreted as positive. For the screening
COMPARISON OF RESOURCE UTILIZATION AND CLINICAL
episode, positive examinations include BI-RADS category 0, 3, 4, and 5 assessments, and the denominator includes BI-RADS category 0, 1, 2, 3, 4, and 5 assessments. Biopsy information and cancer data were obtained from the RIS at UPHS and from the RIS and local institutional tumor registries at AHC. To classify women from AHC as being at elevated risk for breast cancer, a cut point of >20% lifetime risk by the Tyrer-Cuzick model was used. Because there are no recommended cut points for the lifetime Gail Model score, for patients with only a Gail assessment a threshold was selected to match the observed distribution of lifetime Tyrer-Cuzick risk scores at AHC.
Statistical Analysis
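The recall-rate definition and the elevated-risk cut points described in the preceding paragraphs can be expressed as a short calculation. This is a sketch, not the study's code: the function names are hypothetical, and the Gail threshold uses a simple quantile-matching interpretation of "a threshold to match the observed distribution," which is an assumption.

```python
# BI-RADS assessment categories, per the Methods: recall numerator vs denominator.
POSITIVE_ASSESSMENTS = {0, 3, 4, 5}
COUNTED_ASSESSMENTS = {0, 1, 2, 3, 4, 5}

TC_ELEVATED_CUTOFF = 20.0  # lifetime Tyrer-Cuzick risk, percent


def recall_rate(assessments):
    """Share of screening exams with a positive (0, 3, 4, 5) BI-RADS assessment.
    Assessments outside 0-5 (e.g. 6, known cancer) are not counted."""
    counted = [a for a in assessments if a in COUNTED_ASSESSMENTS]
    if not counted:
        raise ValueError("no countable screening assessments")
    positives = sum(1 for a in counted if a in POSITIVE_ASSESSMENTS)
    return positives / len(counted)


def matched_gail_cutpoint(tc_lifetime_risks, gail_lifetime_risks):
    """Hypothetical quantile-matching rule: return the Gail lifetime-risk value
    such that flagging Gail scores >= it marks about the same fraction of women
    as elevated risk as the Tyrer-Cuzick >20% rule does."""
    frac = sum(r > TC_ELEVATED_CUTOFF for r in tc_lifetime_risks) / len(tc_lifetime_risks)
    ranked = sorted(gail_lifetime_risks, reverse=True)
    k = round(frac * len(ranked))  # number of Gail patients to flag
    if k == 0:
        return float("inf")  # flag nobody
    return ranked[k - 1]


print(recall_rate([1, 1, 2, 0, 1, 2, 1, 4, 1, 1, 6]))  # 0.2
tc = [5, 12, 25, 30, 8, 15, 22, 10, 9, 11]  # 3 of 10 above 20%
gail = [1.0, 2.0, 3.0, 4.0, 5.0, 6.0, 7.0, 8.0, 9.0, 10.0]
print(matched_gail_cutpoint(tc, gail))  # 8.0
```

With tied Gail scores at the returned value, the ">=" rule can flag slightly more women than the target fraction; the toy scores here have no ties.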
Screening outcomes were stratified by clinical and demographic patient characteristics and by screening modality, and differences were tested for statistical significance using the chi-square test. The unit of analysis was the screening examination. For each screening modality cohort recalled, the distribution of the most common diagnostic evaluation pathways was described. Comparisons of recall rates across strata were adjusted for institution. Because the distributions of time to biopsy and time to diagnosis were skewed, the median time to these outcomes was modeled as a function of screening modality, adjusting for institution and race. Other clinical outcomes were evaluated in a subset of women with at least one year of follow-up: cancer rate (per 1,000 exams), defined as the number of cancers within 365 days of any screening exam; cancer detection rate (per 1,000 exams), defined as the number of cancers detected by screening within 365 days of a positive screen; false negative rate, defined as the number of cancers per 1,000 negative screens; positive predictive values, including PPV1, defined as the number of cancers detected per number of positive screens, and PPV3, defined as the number of cancers diagnosed per number of biopsies performed; sensitivity, defined as the proportion of cancers correctly identified by positive screens; and specificity, defined as the proportion of noncancers correctly identified by negative screens with one year of follow-up (9). Multivariable regression modeling was performed to evaluate cancer rates and cancer detection rates by screening modality and to statistically adjust for potential differences in patient or site characteristics. The logistic regression models were specified a priori to adjust for institution, age category (40–44, 45–49, 50–59, and 60–79 years), breast density (BI-RADS density categories), and first exam.
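The outcome definitions above reduce to ratios of exam-level counts. The sketch below collects them in one place, assuming each exam falls into exactly one cell of a positive/negative screen by cancer/no cancer (within 365 days) table and that PPV3 is computed over biopsies performed after positive screens. `ScreeningCounts` and `screening_metrics` are hypothetical names, and the counts are made up for illustration; they are not study data.

```python
from dataclasses import dataclass


@dataclass
class ScreeningCounts:
    """Exam-level 2x2 counts for one modality, with 365-day cancer follow-up."""
    true_pos: int    # positive screen, cancer within 365 days (screen-detected)
    false_pos: int   # positive screen, no cancer
    false_neg: int   # negative screen, cancer within 365 days
    true_neg: int    # negative screen, no cancer
    biopsies: int            # biopsies performed after positive screens
    cancers_at_biopsy: int   # of those biopsies, the number diagnosing cancer


def screening_metrics(c):
    """Rates per 1,000 and proportions following the definitions in Methods."""
    exams = c.true_pos + c.false_pos + c.false_neg + c.true_neg
    positives = c.true_pos + c.false_pos
    negatives = c.false_neg + c.true_neg
    cancers = c.true_pos + c.false_neg
    return {
        "cancer_rate_per_1000": 1000 * cancers / exams,
        "cancer_detection_rate_per_1000": 1000 * c.true_pos / exams,
        "false_negative_rate_per_1000_negatives": 1000 * c.false_neg / negatives,
        "ppv1": c.true_pos / positives,
        "ppv3": c.cancers_at_biopsy / c.biopsies,
        "sensitivity": c.true_pos / cancers,
        "specificity": c.true_neg / (c.true_neg + c.false_pos),
    }


# Illustrative (made-up) counts, not study data.
toy = ScreeningCounts(true_pos=45, false_pos=855, false_neg=5, true_neg=9095,
                      biopsies=180, cancers_at_biopsy=45)
for name, value in screening_metrics(toy).items():
    print(f"{name}: {value:.3f}")
```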
To account for the clustering of multiple screenings within a single patient, generalized estimating equations were used to model recall and cancer outcomes (yes or no) as a function of screening mammogram modality and patient and site characteristics. Missing data were uncommon (<0.1%) for the covariates in the adjusted model except first exam, for which clinical information was incomplete in approximately 10% of cases. In the primary analysis, exams with missing information on first exam were coded as "unknown." In sensitivity analyses, exams with "unknown" first exam status were removed.
RESULTS
Overall, 194,437 (59.7%) screening examinations were performed with DBT and 131,292 (40.3%) with DM. Caucasian women, women identified as being at elevated risk for breast cancer by the Gail Model or the Tyrer-Cuzick score, women with dense breasts, and women undergoing subsequent (not baseline) screening exams were more likely to receive DBT (Table 1). These differences remained significant after adjustment for institution, which accounts for population differences between institutions. Overall, recall rates were lower for DBT than for DM (Table 2). DBT conferred the largest reduction in recall rates in women aged 40–44 years. The decrease in recall rate was also statistically significant for all known race and ethnicity
categories, all breast density categories, and women at both elevated and low calculated risk for breast cancer. Recall rates for DBT were lower at facilities that had fully or predominantly (>90%) transitioned to DBT than at facilities performing both DBT and DM (hybrid environment) (8.05% vs 10.43%, p < 0.001). Recall rates for DM screens did not differ between sites that were fully or predominantly (>90%) DM and hybrid DBT and DM sites (p = 0.15). Recall rates remained lower with DBT than with DM after adjustment for the previously described covariates (odds ratio [OR] 0.85, 95% confidence interval [CI] 0.83–0.87; p < 0.001) and in sensitivity analysis excluding missing data (OR 0.85, 95% CI 0.83–0.88; p < 0.001). Figure 1 describes the distribution of the most common diagnostic evaluation pathways after recall by screening modality. The most common evaluation following a recalled screen,
TABLE 1. Patient Characteristics by Type of Screening Mammogram
Characteristic | Digital Breast Tomosynthesis (DBT), n = 194,437 | Digital Mammography (DM), n = 131,292
Age (years): N | 194,437 | 131,292
Age (years): mean (SD) | 57.8 (10.1) | 58.6 (10.1)
Age (years): median (Q1–Q3) | 57 (50–66) | 58 (51–66)
Age categories: 40–44 years | 21,198 (10.9%) | 12,068 (9.2%)
Age categories: 45–49 years | 26,844 (13.8%) | 16,251 (12.4%)
Age categories: 50–59 years | 61,946 (31.9%) | 41,855 (31.9%)
Age categories: 60–79 years | 84,449 (43.4%) | 61,118 (46.6%)
Race*: Caucasian | 127,849 (68.8%) | 67,440 (56.1%)
Race*: African American | 38,069 (20.5%) | 39,512 (32.8%)
Race*: Asian | 9,265 (5.0%) | 4,804 (4.0%)
Race*: Other | 10,610 (5.7%) | 8,525 (7.1%)
Race*: Unknown | 8,644 | 11,011
Ethnicity: Hispanic | 8,896 (5.1%) | 11,912 (11.2%)
Ethnicity: Non-Hispanic | 167,218 (94.9%) | 94,179 (88.8%)
Ethnicity: Unknown | 18,323 | 25,201
Elevated risk for breast cancer by Tyrer-Cuzick Score*: Elevated | 9,904 (10.6%) | 7,849 (7.3%)
Elevated risk for breast cancer by Tyrer-Cuzick Score*: Low | 83,426 (89.4%) | 100,265 (92.7%)
Elevated risk for breast cancer by Tyrer-Cuzick Score*: Unknown | 10,452 | 15,238
Elevated risk for breast cancer by Gail Model*: Elevated | 5,357 (9.2%) | 400 (5.6%)
Elevated risk for breast cancer by Gail Model*: Low | 52,955 (90.8%) | 6,714 (94.4%)
Elevated risk for breast cancer by Gail Model*: Unknown | 32,343 | 826
Breast density*: Almost entirely fatty (0) | 16,080 (8.3%) | 13,355 (10.2%)
Breast density*: Scattered fibroglandular densities (1) | 85,086 (43.8%) | 66,279 (50.5%)
Breast density*: Heterogeneously dense (2) | 80,470 (41.4%) | 46,314 (35.3%)
Breast density*: Extremely dense (3) | 12,744 (6.6%) | 5,344 (4.1%)
Breast density*: Unknown | 57 | 0
Screening exam*: First | 16,278 (10.0%) | 19,672 (15.1%)
Screening exam*: Subsequent | 146,627 (90.0%) | 110,769 (84.9%)
Screening exam*: Unknown | 31,532 | 851
Institution: Advocate Health | 103,782 (53.4%) | 123,352 (94.0%)
Institution: University of Pennsylvania Health System | 90,655 (46.6%) | 7,940 (6.0%)
* Statistically significant (p < 0.001) comparing DBT and DM after adjustment for institution.
TABLE 2. Recall Rates by Patient and Facility Characteristics and Type of Screening Mammogram
Subgroup of interest | Recall rate, Digital Breast Tomosynthesis (DBT) | Recall rate, Digital Mammography (DM) | p Value for overall comparison*
Overall, n (%) | 17,165 (8.83%) | 14,415 (10.98%) | <0.001
Age categories: 40–44 | 12.86% | 16.84% | <0.001
Age categories: 45–49 | 11.02% | 14.13% | <0.001
Age categories: 50–59 | 8.85% | 10.55% | 0.007
Age categories: 60–79 | 6.99% | 9.14% | <0.001
Race categories: Caucasian | 8.70% | 10.91% | <0.001
Race categories: African American | 9.01% | 11.45% | 0.017
Race categories: Asian | 10.17% | 13.43% | <0.001
Race categories: Other | 8.42% | 10.36% | 0.001
Race categories: Unknown | 8.97% | 9.16% | 0.98
Ethnicity categories: Hispanic | 10.00% | 11.17% | 0.020
Ethnicity categories: Non-Hispanic | 8.91% | 11.76% | <0.001
Ethnicity categories: Unknown | 7.54% | 7.98% | 0.082
Breast density categories: Almost entirely fatty (0) | 5.16% | 8.20% | 0.009
Breast density categories: Scattered fibroglandular densities (1) | 7.57% | 9.79% | <0.001
Breast density categories: Heterogeneously dense (2) | 10.69% | 13.23% | <0.001
Breast density categories: Extremely dense (3) | 9.89% | 13.21% | 0.005
Breast risk categories by Tyrer-Cuzick Score#: Elevated risk | 11.61% | 13.57% | <0.001
Breast risk categories by Tyrer-Cuzick Score#: Low risk | 9.61% | 10.83% | 0.001
Breast risk categories by Tyrer-Cuzick Score#: Unknown | 13.14% | 11.41% | <0.001
Breast risk categories by Gail Model#: Elevated risk | 8.33% | 11.25% | 0.044
Breast risk categories by Gail Model#: Low risk | 6.84% | 9.43% | <0.001
Breast risk categories by Gail Model#: Unknown | 7.91% | 9.56% | 0.083
Imaging facility: Predominantly (>90%) DBT | 8.05% | NA | NA
Imaging facility: Predominantly (>90%) DM | NA | 11.22% | NA
Imaging facility: Mix of DBT and DM | 10.43% | 10.93% | 0.002
* p values comparing recall rates between DBT and DM within each stratum of age, race, ethnicity, and breast density categories were adjusted for institution.
# Breast risk categories by Tyrer-Cuzick Score are available only for Advocate Health Care, and breast risk categories by Gail Model are available only for University of Pennsylvania Health System.
regardless of modality, was a diagnostic mammogram followed by ultrasound. The next most common diagnostic pathway was evaluation with a diagnostic mammogram alone. However, women in the DBT group were less likely than those screened with DM to receive a diagnostic mammogram as part of their diagnostic evaluation (11.9% of the DBT group vs 2.8% of the DM group had a diagnostic imaging pathway that included only ultrasound, p < 0.001). Biopsy rates in the overall cohort and in the recalled cohort were 1.5% and 16.3% for DBT and 1.3% and 11.8% for DM, respectively. Of the 17,165 women recalled in the DBT group, 2,822 (16%) underwent biopsy within 90 days of a positive screening examination, compared with 1,708 (12%) of the 14,415 women recalled in the DM group. The median (IQR) time to biopsy after a positive screening exam was significantly shorter in the DBT group (18 days, IQR 11–28) than in the DM group (22 days, IQR 14–34). Ninety-four percent (n = 16,109) of the women in the DBT group and 92% (n = 13,301) in the DM group had a final diagnosis within 90 days of a positive screening examination. Median (IQR) time to diagnosis was shorter in the DBT group (10 days, IQR 5–18) than in the DM group (13 days, IQR 8–22). The differences between modalities in time to biopsy and time to diagnosis remained significant after adjustment for institution and race (p < 0.001). The difference between modalities was also observed within hybrid DBT and DM sites (time to biopsy, 19 days with DBT vs 23 days with DM, p < 0.001; time to diagnosis, 12 days vs 13 days, p < 0.001).
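The unadjusted medians and IQRs reported above summarize days from a positive screen to biopsy or diagnosis within the 90-day window. A minimal descriptive sketch follows; it does not reproduce the study's adjusted models. The function names are hypothetical, and the choice of the `"inclusive"` quantile method in Python's `statistics.quantiles` is an assumption, since different quantile conventions give slightly different IQR bounds.

```python
from datetime import date
from statistics import median, quantiles

WINDOW_DAYS = 90  # diagnostic pathway window described in Methods


def days_to_event(screen_date, event_date):
    """Days from the screening exam to a downstream event, or None if no
    event occurred within the 90-day window."""
    if event_date is None:
        return None
    days = (event_date - screen_date).days
    return days if 0 <= days <= WINDOW_DAYS else None


def summarize_days(pairs):
    """Median, Q1, and Q3 of days-to-event, skipping exams without an event
    in the window."""
    days = [d for d in (days_to_event(s, e) for s, e in pairs) if d is not None]
    if len(days) < 2:
        raise ValueError("need at least two events to compute an IQR")
    q1, _, q3 = quantiles(days, n=4, method="inclusive")
    return median(days), q1, q3


screen = date(2016, 3, 1)
pairs = [(screen, date(2016, 3, 1 + k)) for k in (4, 8, 12, 16, 20)]
pairs.append((screen, None))              # recalled, never biopsied
pairs.append((screen, date(2016, 7, 1)))  # 122 days later: outside the window
print(summarize_days(pairs))  # (12, 8.0, 16.0)
```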
Figure 1. Most common diagnostic pathways within 90 days following a positive screening digital mammography (DM) or digital breast tomosynthesis (DBT) examination.
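The pathways in Figure 1 were defined empirically from the observed sequence of imaging examinations and biopsy, but the exact grouping rules are not published. The following hypothetical sketch shows one way such pathway labels could be derived and tallied; the helper `pathway_label`, the procedure names, and the rule of merging consecutive repeats of the same procedure are all assumptions for illustration.

```python
from collections import Counter
from datetime import date

WINDOW_DAYS = 90  # pathway window following a positive screen (Methods)


def pathway_label(screen_date, procedures):
    """Collapse the procedures done within 90 days of a positive screen into an
    ordered label such as 'diagnostic mammogram > ultrasound'.

    procedures: list of (date, kind) tuples. Same-day procedures keep their
    input order because sorted() is stable and we sort on the date only.
    Consecutive repeats of the same kind are merged into one step."""
    in_window = sorted(
        (p for p in procedures if 0 <= (p[0] - screen_date).days <= WINDOW_DAYS),
        key=lambda p: p[0],
    )
    steps = []
    for _, kind in in_window:
        if not steps or steps[-1] != kind:
            steps.append(kind)
    return " > ".join(steps) if steps else "no workup in window"


screen = date(2017, 1, 2)
recalled = [
    [(date(2017, 1, 9), "diagnostic mammogram"), (date(2017, 1, 9), "ultrasound")],
    [(date(2017, 1, 10), "diagnostic mammogram")],
    [(date(2017, 1, 5), "ultrasound"), (date(2017, 1, 20), "biopsy")],
    [(date(2017, 1, 12), "diagnostic mammogram"), (date(2017, 1, 12), "ultrasound")],
    [(date(2017, 6, 1), "ultrasound")],  # 150 days later: outside the window
]
counts = Counter(pathway_label(screen, procs) for procs in recalled)
print(counts.most_common())
```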
Table 3 presents cancer outcomes by screening modality for women with one year of follow-up. The overall cancer rate, cancer detection rate, PPV1, sensitivity, and specificity were higher for DBT than for DM exams, and these differences persisted after adjustment in multivariable models and in sensitivity analysis. In addition, the cohort screened with DBT had a higher invasive cancer detection rate and fewer false negative exams; however, of these two differences, only the higher invasive cancer detection rate was statistically significant, and only in adjusted analyses. Figure 2 presents cancer detection rates for women with one year of follow-up, stratified by age, breast density, and screening modality. Cancer detection rates were consistently higher for DBT than for DM across all age groups and breast density categories. In adjusted analyses, statistically significant differences were seen in the 60–79-year age group and in the heterogeneously dense group, which were the groups with the largest sample sizes.
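For readers who want to check the order of magnitude of the unadjusted odds ratios in Table 3, a crude odds ratio with a Woolf (log-scale Wald) 95% CI can be computed directly from counts. This is a sketch of that textbook calculation, not the study's regression method: it ignores covariates and repeat screens of the same woman, so exact agreement with the published estimates is not expected. The function name is hypothetical.

```python
import math


def crude_odds_ratio(events_a, total_a, events_b, total_b, z=1.959964):
    """Unadjusted odds ratio of group A vs group B with a Woolf (log-scale Wald)
    95% CI. Ignores covariates and repeat screens of the same woman."""
    a, b = events_a, total_a - events_a
    c, d = events_b, total_b - events_b
    if min(a, b, c, d) == 0:
        raise ValueError("a zero cell makes the Woolf interval undefined")
    log_or = math.log((a * d) / (b * c))
    se = math.sqrt(1 / a + 1 / b + 1 / c + 1 / d)
    return math.exp(log_or), math.exp(log_or - z * se), math.exp(log_or + z * se)


# Total cancers from Table 3: 497 of 95,370 DBT exams vs 429 of 99,660 DM exams.
or_, lo, hi = crude_odds_ratio(497, 95_370, 429, 99_660)
print(f"OR {or_:.2f} (95% CI {lo:.2f}-{hi:.2f})")
```

This prints OR 1.21 (95% CI 1.06-1.38), close to the unadjusted cancer-rate estimate of 1.21 (1.07–1.38) in Table 3; the small difference in the lower bound is consistent with the published figures coming from the study's own models rather than this crude formula.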
Table 4 presents the clinical characteristics of the cancers identified by each screening modality. Histopathologic, stage, and receptor data were available for 807 women with cancer in the study cohort. There were no differences in tumor stage, nodal status, or estrogen and progesterone receptor status between the DBT and DM groups. The cancer detection rates per 1,000 exams for DBT by tumor grade (Grade I, Grade II, Grade III or IV, and unknown) were 1.3, 1.4, 0.7, and 1.4, respectively; the corresponding rates for DM were 0.8, 1.5, 0.9, and 0.6. A larger proportion of cancers in the DBT group than in the DM group were Grade I and human epidermal growth factor receptor 2 (HER2) negative.
DISCUSSION
This study used a learning health system to examine clinical outcomes and downstream imaging after screening with DBT or DM in two large, geographically and clinically diverse U.S. health care networks. The results confirm previous reports of recall rate reduction with DBT, together with the clinical benefits of improved PPV1, specificity, and cancer detection rate (7,9). Recall rate reduction was greatest in the 40–44-year age group, while the increase in cancer detection was most significant in the 60–79-year age group and in women with heterogeneously dense breasts. This study builds upon prior studies by demonstrating an overall reduction in recall rate as well as reductions across demographic (age, race, ethnicity) and clinical (breast cancer risk profile and density) strata and across facilities. Further, our data suggest that DBT may provide a more efficient diagnostic workup at recall and a shorter time to biopsy and final diagnosis. There were significant differences in the risk profile of women who received DBT versus DM; the former group was more likely to have dense breasts and higher calculated risk scores for breast cancer. This finding underscores the
TABLE 3. Cancer Outcomes by Screening Modality in Women with at Least 12 Months of Follow-up After a Positive Screening Examination
Cancer outcome | DBT (n = 95,370) | DM (n = 99,660) | p Value | Unadjusted OR (95% CI) | Adjusted* OR (95% CI)
Total cancers, n | 497 | 429 | | |
Invasive cancers, n | 350 | 328 | | |
Cancer rate per 1,000 | 5.2 | 4.3 | 0.004 | 1.21 (1.07–1.38) | 1.16 (1.01–1.35)
Invasive cancer rate per 1,000# | 3.7 | 3.3 | 0.16 | 1.16 (0.96–1.30) | 1.19 (1.01–1.40)
Cancer detection rate per 1,000 | 4.8 | 3.8 | 0.001 | 1.27 (1.11–1.45) | 1.22 (1.05–1.42)
Invasive cancer detection rate per 1,000# | 3.3 | 2.9 | 0.10 | 1.14 (0.97–1.34) | 1.22 (1.03–1.45)
False negative rate per 1,000 | 0.42 | 0.52 | 0.30 | 0.80 (0.53–1.21) | 0.76 (0.48–1.19)
PPV1 (cancer/recall), % | 5.4 | 3.5 | <0.001 | 1.57 (1.36–1.80) | 1.44 (1.24–1.68)
PPV3 (cancer/biopsy), % | 23.8 | 24.9 | 0.47 | 0.94 (0.79–1.11) | 0.94 (0.78–1.13)
Sensitivity, % | 92.0 | 87.9 | 0.039 | 1.58 (1.02–2.43) | 1.64 (1.01–2.64)
Specificity, % | 91.5 | 89.5 | <0.001 | 1.26 (1.22–1.30) | 1.20 (1.16–1.24)
* Adjusted for institution, age category (40–44, 45–49, 50–59, 60–79 years), breast density (the four BI-RADS density categories), and first exam.
# 15 women with cancer did not have information on whether the cancer was invasive and were therefore excluded from this analysis.
Figure 2. Cancer detection rate by age (A) and breast density (B), by screening modality, in women with at least 12 months of follow-up after a positive screening examination.
importance of adequately controlling for these population differences in comparative analyses. Our data demonstrate that recall rates are lower with DBT overall, across all age groups, all races, non-Hispanic ethnicity, all breast density categories, and in women with an elevated lifetime Tyrer-Cuzick risk score. These findings may assist clinical decision making for the groups of women most likely to be recalled, such as those with dense breasts. The observation that African American women and women of Hispanic ethnicity were less likely to receive DBT raises concerns about DBT access and potential health care disparities. As recommendations increasingly delay the start of screening mammography beyond the original cut point of 40 years of age, DBT may be of value in younger women, particularly those at higher lifetime risk of breast cancer. Facility-level data from our study indicate that facilities that fully transitioned to DBT exhibit lower recall rates than hybrid and predominantly DM screening environments. The finding that women in the DBT group were more likely to receive ultrasound alone as their diagnostic test is in line with Lourenco et al., who reported in a single-center study that 28.3% of the DBT recall cohort proceeded to ultrasound alone for diagnostic evaluation versus 2.6% of the DM recall cohort (10). A potential explanation is that the superior lesion localization, characterization, and conspicuity of the index screening DBT examination provide higher diagnostic confidence and a more direct route to ultrasound alone. Time to biopsy and time to final diagnosis were significantly shorter in the DBT group after adjustment for institution and race. This may be, in part, because the institutions with the greatest DBT utilization are tertiary referral centers within their health care systems and therefore have the most interdisciplinary resources with which to shorten time to biopsy, time to final diagnosis, and time to treatment. Within our cohort, however, institutions with greater DBT screening utilization also exhibited proportionately higher DBT diagnostic utilization at recall, suggesting that these findings may be due to improved diagnostic confidence in biopsy recommendations arising from DBT diagnostic examinations. This is further supported by our finding that, in subgroup analyses within hybrid sites, time to biopsy and time to diagnosis were shorter with DBT than with DM. In line with these findings, Raghu et al. reported a decrease in the proportion of lesions characterized as probably benign (BI-RADS 3) and an increase in the proportion of examinations characterized as benign (BI-RADS 1 or 2) in DBT versus DM cohorts (11). In our study, the cancer detection rate was 22% higher for DBT than for DM. The majority of screen-detected cancers were early stage for both DBT and DM, with no significant difference in nodal status between the two groups. There were significant differences in the distribution of tumor size and grade, with a larger proportion of Grade I tumors in the DBT group. There was a trend for DBT-detected cancers to be HER2 negative.
Similar results were reported by Kim et al., who found that luminal A-like cancers (estrogen receptor positive and/or progesterone receptor positive, human epidermal growth factor receptor 2 negative, and Ki-67 expression <14%) were more often associated with DBT screening than with DM screening alone on multivariate analysis (12). Further investigation is warranted to evaluate whether DBT screening detects earlier-stage, less advanced, and less aggressive breast cancers.
The strengths of this study include the large, geographically, ethnically, and racially diverse screening cohort from multisite academic and community health care networks. Linkage with the RIS and/or local tumor registries allowed analysis of the histopathologic characteristics of the breast cancers detected as well as of false negative screens. Limitations include linkage of some data only to the local tumor registry of one institution, rather than complete matching with a state or larger population-based tumor registry, which may affect sensitivity and specificity calculations. Additionally, although we adjusted for facility and several patient factors associated with screening outcomes when comparing DBT and DM outcomes, other factors not included in the adjustments may have influenced the results.
Our data demonstrate a streamlined diagnostic imaging evaluation in the DBT cohort and sustained recall rate reduction across all patient strata. Improved imaging efficiency, decreases
TABLE 4. Clinical Characteristics of Cancers Detected Within 12 Months of a Positive Screening Examination by Screening Modality
Characteristic | Digital Breast Tomosynthesis (n = 397) | Digital Mammography (n = 410) | p Value for overall comparison
Cancer stage, n (%) | | | 0.34
Stage 0 | 86 (26.9%) | 82 (23.8%) |
Stage I | 159 (49.7%) | 179 (52.0%) |
Stage IIA | 55 (17.2%) | 56 (16.3%) |
Stage IIB | 11 (3.4%) | 11 (3.2%) |
Stage III | 9 (2.8%) | 11 (3.2%) |
Stage IV | 0 (0.0%) | 5 (1.5%) |
Stage unknown | 77 | 66 |
Tumor size, n (%) | | | 0.018
<11 mm | 49 (39.5%) | 64 (40.3%) |
11–15 mm | 32 (25.8%) | 29 (18.2%) |
16–20 mm | 9 (7.3%) | 31 (19.5%) |
>=21 mm | 34 (27.4%) | 35 (22.0%) |
Size unknown | 273 | 251 |
Nodal status, n (%) | | | 0.87
Positive | 51 (19.5%) | 56 (18.9%) |
Negative | 211 (80.5%) | 240 (81.1%) |
Unknown | 135 | 114 |
Tumor grade, n (%) | | | <0.001
Grade I | 109 (38.5%) | 87 (25.4%) |
Grade II | 120 (42.4%) | 160 (46.8%) |
Grade III or IV | 54 (19.1%) | 95 (27.8%) |
Grade unknown | 114 | 68 |
Estrogen receptor status, n (%) | | | 0.54
Positive | 341 (89.5%) | 348 (88.1%) |
Negative | 40 (10.5%) | 47 (11.9%) |
Unknown | 16 | 15 |
Progesterone receptor status, n (%) | | | 0.28
Positive | 300 (79.2%) | 299 (75.9%) |
Negative | 79 (20.8%) | 95 (24.1%) |
Unknown | 18 | 16 |
Human epidermal growth factor receptor 2 status, n (%) | | | 0.025
Positive | 27 (9.4%) | 48 (15.3%) |
Negative | 259 (90.6%) | 261 (83.4%) |
Borderline | 0 (0.0%) | 4 (1.3%) |
Unknown | 111 | 97 |
in false positives, and improved cancer detection reinforce the value of DBT screening. Our diverse patient population and screening settings suggest that these improvements in outcomes are generalizable across practice environments. DBT allows the detection of clinically occult, early-stage breast cancers and reduces false positive screening mammograms, which is critical to maintaining patient adherence to national mammographic screening programs.
REFERENCES
1. Breast Cancer Facts and Figures 2017-2018. Atlanta: American Cancer Society; 2017 [cited 2018]. Available from: https://www.cancer.org/content/dam/cancer-org/research/cancer-facts-and-statistics/breast-cancer-facts-and-figures/breast-cancer-facts-and-figures-2017-2018.pdf.
2. Miller JD, Bonafede MM, Herschorn SD, et al. Value analysis of digital breast tomosynthesis for breast cancer screening in a US Medicaid population. J Am Coll Radiol 2017; 14:467–474.
3. Marmot MG, Altman DG, Cameron DA, et al. The benefits and harms of breast cancer screening: an independent review. A report jointly commissioned by Cancer Research UK and the Department of Health (England) October 2012. Br J Cancer 2013; 108:2205–2240.
4. Coldman A, Phillips N, Wilson C, et al. Pan-Canadian study of mammography screening and mortality from breast cancer. JNCI 2014; 106. Available at: https://doi.org/10.1093/jnci/dju261.
5. Bernardi D, Macaskill P, Pellegrini M, et al. Breast cancer screening with tomosynthesis (3D mammography) with acquired or synthetic 2D mammography compared with 2D mammography alone (STORM-2): a population-based prospective study. Lancet Oncol 2016; 17:1105–1113.
6. Conant EF, Beaber EF, Sprague BL, et al. Breast cancer screening using tomosynthesis in combination with digital mammography compared to digital mammography alone: a cohort study within the PROSPR consortium. Breast Cancer Res Treat 2016; 156:109–116.
7. Friedewald SM, Rafferty EA, Rose SL, et al. Breast cancer screening using tomosynthesis in combination with digital mammography. JAMA 2014; 311:2499.
8. Sickles EA, D'Orsi CJ. How should screening breast US be audited? The BI-RADS perspective. Radiology 2014; 272:316–320.
9. Conant EF, Beaber EF, Sprague BL, et al. Breast cancer screening using tomosynthesis in combination with digital mammography compared to digital mammography alone: a cohort study within the PROSPR consortium. Breast Cancer Res Treat 2016; 156:109.
10. Lourenco AP, Barry-Brooks M, Baird GL, et al. Changes in recall type and patient treatment following implementation of screening digital breast tomosynthesis. Radiology 2015; 274:337.
11. Raghu M, Durand MA, Andrejeva L, et al. Tomosynthesis in the diagnostic setting: changing rates of BI-RADS final assessment over time. Radiology 2016; 281:54.
12. Kim JY, Kang HJ, Shin JK, et al. Biologic profiles of invasive breast cancers detected only with digital breast tomosynthesis. Am J Roentgenol 2017; 209:1411–1418.