A Simulation Screening Mammography Module Created for Instruction and Assessment


ARTICLE IN PRESS

Innovations in Radiology Education

A Simulation Screening Mammography Module Created for Instruction and Assessment: Radiology Residents vs National Benchmarks

Jeffrey D. Poot, DO, Alison L. Chetlen, DO¹,²

Abbreviations and Acronyms
ACR, American College of Radiology
ALH, atypical lobular hyperplasia
BCSC, Breast Cancer Surveillance Consortium
BI-RADS, Breast Imaging Reporting and Data System
CAD, computer-aided detection
DCIS, ductal carcinoma in situ
FN, false negative
FP, false positive
IDC, invasive ductal carcinoma

Rationale and Objectives: To improve mammographic screening training and breast cancer detection, radiology residents participated in a simulation screening mammography module in which they interpreted an enriched set of screening mammograms with known outcomes. This pilot research study evaluates the effectiveness of the simulation module while tracking the progress, efficiency, and accuracy of radiology resident interpretations, and compares resident performance against national benchmarks.

Materials and Methods: A simulation module was created with 266 digital screening mammograms enriched with high-risk breast lesions (seven cases) and breast malignancies (65 cases). Over a period of 27 months, 39 radiology residents participated in the simulation screening mammography module. Resident sensitivity and specificity were compared to the Breast Cancer Surveillance Consortium (BCSC data through 2009) national benchmark and to the American College of Radiology (ACR) Breast Imaging Reporting and Data System (BI-RADS) acceptable screening mammography audit ranges.

Results: The sensitivity, the percentage of cancers with an abnormal initial interpretation (BI-RADS 0), among residents was 84.5%, similar to the BCSC benchmark sensitivity of 84.9% (sensitivity for tissue diagnosis of cancer within 1 year following the initial examination) and within the acceptable ACR BI-RADS medical audit range of ≥75%. The specificity, the percentage of noncancers that had a negative image interpretation (BI-RADS 1 or 2), among residents was 83.2%, compared to 90.3% reported in the BCSC benchmark data, and was lower than the suggested ACR BI-RADS range of 88%–95%.

Conclusions: Using simulation modules for interpretation of screening mammograms is a promising method for training radiology residents to detect breast cancer and to help them achieve competence toward national benchmarks.

Key Words: Screening mammography; resident education; simulation; breast imaging.
© 2016 The Association of University Radiologists. Published by Elsevier Inc. All rights reserved.

ILC, invasive lobular carcinoma
MQSA, Mammography Quality Standards Act

Acad Radiol 2016; ■:■■–■■

From the Department of Radiology, H066, Penn State Hershey Medical Center, P.O. Box 850, 500 University Drive, Hershey, PA 17033 (J.D.P.); Department of Radiology, Penn State Milton S. Hershey Medical Center, Hershey, Pennsylvania (A.L.C.). Received March 18, 2016; revised May 5, 2016; accepted July 1, 2016.

1 This author worked as a research consultant for Siemens, Inc., during November and December 2014 as an expert reader during a tomosynthesis study to apply for Food and Drug Administration (FDA) approval. This consulting work is not related to this research.

2 This author worked as a research consultant for General Electric, Inc., October 30, 2015, as an expert reader during a tomosynthesis study to apply for FDA approval. This consulting work is not related to this research.

Address correspondence to: J.D.P. e-mail: [email protected], [email protected]

http://dx.doi.org/10.1016/j.acra.2016.07.006


POOT AND CHETLEN

Academic Radiology, Vol ■, No ■■, ■■ 2016

TN, true negative
TP, true positive
2D, two-dimensional

INTRODUCTION

Screening mammography is the only breast imaging modality that is known to reduce breast cancer mortality. Instruction of radiology residents in interpreting screening mammograms is challenging, and participation of radiology trainees in screening mammographic interpretation is a critical component of residency training (1). This simulation module was created to teach interpretation of screening mammography to residents in training with the overarching goal of increasing breast cancer detection among women. To limit variability of exposure to screening mammography cases as well as to track progress, efficiency, and accuracy of interpretations of screening mammograms, we created an enriched standardized set of digital screening mammograms with known outcomes. Participation in this simulation module allowed radiology residents to obtain immediate feedback and to correlate imaging characteristics with histologic findings. The goal of this module was to improve mammographic screening training and breast cancer detection, as well as to prepare residents more fully for the rigors of private practice or academic medicine.

Additionally, the United States Mammography Quality Standards Act (2) (MQSA) mandates medical audits to track breast cancer outcome data associated with interpretive performance. Practicing breast radiologists regularly review feedback from their audit reports and are able to continually refine their interpretive skills. This screening mammography simulation experience provided residents with a similar objective feedback mechanism and the opportunity to learn about the MQSA (2) medical outcomes audit program and compare their performance against national benchmarks. This pilot research study evaluated the effectiveness of the simulation screening mammography module.
MATERIALS AND METHODS

A simulation module was created and modified over a period of three academic years (2012 to 2013, 2013 to 2014, and 2014 to 2015) from a collection of 266 digital screening mammograms obtained from 2008 to 2009, to provide radiology residents with a standardized set of screening mammograms.

Resident Workflow

Each radiology resident participated in the simulation screening mammography module while on their breast imaging rotation. At our institution, first- through fourth-year radiology residents rotate through breast imaging over a 5-week period, a total of three times during their 4-year residency. During their introductory week, residents observed and interpreted numerous diagnostic mammograms alongside attending breast radiologists and used educational videos, textbooks, journal articles, and online resources, which included RadPrimer by Amirsys, Inc. As suggested in the American College of Radiology (ACR) and Society of Breast Imaging curriculum (3), residents reviewed normal mammographic anatomy, features of characteristically benign and suspicious breast calcifications and masses, asymmetries, indirect signs of malignancy such as architectural distortion, skin thickening, or axillary adenopathy, features of the surgically altered breast, as well as Breast Imaging Reporting and Data System (BI-RADS) descriptors.

The residents began interpreting studies from the simulation screening mammography module after their introductory week curriculum was completed. Studies were interpreted in a batch reading format. Digital images were retrieved from our General Electric (Fairfield, CT) picture archiving and communication system and then interpreted by the resident on an MQSA-approved 5-megapixel mammography-specific display monitor. Residents were provided a quiet room with optimal viewing conditions, including a low ambient light environment, while participating in this simulation module. Residents were encouraged to batch read these studies during weeks 2–4 of their standard 5-week rotation in breast imaging. Computer-aided detection (CAD) was not used for the simulated cases. Remediation and supplemental cases were then offered to the residents, if needed, in the final week of their rotation. The data set for each academic year was identical for all residents.
Residents on their breast imaging rotation interpreted the data set provided for that academic year, so that if a resident returned to the breast imaging rotation in another academic year, the cancer and high-risk cases were different from previous years. Residents were required to read the ACR BI-RADS manual prior to interpretation of the simulation screening mammograms. Residents were instructed to use BI-RADS descriptors during the case evaluation and report creation. During this simulation screening mammography module, residents could assign only BI-RADS category 1 (negative), BI-RADS category 2 (benign), or BI-RADS category 0 (incomplete, need additional imaging evaluation). All studies were technically adequate; none of the studies had been a technical recall. The resident responses during this module were recorded into a secure password-protected file kept internally on the hospital network. If the residents assigned BI-RADS category 0 to a study, they also recorded laterality of callback, BI-RADS descriptor of the imaging finding, clock face, depth within the breast, and their differential diagnosis. Residents were also given a goal to interpret each exam in less than 3 minutes.

After the resident interpreted 100 screening mammograms, an attending breast radiologist provided constructive one-on-one feedback, discussing visual search patterns, morphologies of abnormalities, failures of perception, lack of knowledge, and misjudgments. Accuracy of interpretations and methods to increase efficiency were also reviewed. If necessary, remediation occurred at that time. Residents were provided with this same feedback on completing the screening mammography module. During their feedback sessions, the residents were provided with the subsequent diagnostic images and final pathologic results from the breast biopsy for the BI-RADS category 0 cases within the simulation screening mammogram case set. The final pathologic results included histology, size, and molecular classification of malignancy.

Resident answers were compared with the attending radiologist's original interpretations and with those of other residents. Junior resident performance was compared to senior resident performance. Resident sensitivity and specificity were compared to the Breast Cancer Surveillance Consortium (4) (BCSC data through 2009) national benchmark and ACR BI-RADS acceptable screening mammography audit ranges (5).
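As a rough illustration of this comparison, the sketch below computes sensitivity and specificity from a 2x2 tally of resident reads and checks them against the benchmark figures quoted in the abstract. The class and function names are hypothetical, the formulas are the standard TP/(TP + FN) and TN/(TN + FP), and the example counts are invented so that the rates equal the reported 84.5% and 83.2%; they are not the study's raw data.

```python
from dataclasses import dataclass


@dataclass
class AuditCounts:
    """Resident reads scored against the attending interpretation (hypothetical type)."""
    tp: int  # resident BI-RADS 0, attending BI-RADS 0
    fn: int  # resident BI-RADS 1 or 2, attending BI-RADS 0
    tn: int  # resident BI-RADS 1 or 2, attending BI-RADS 1 or 2
    fp: int  # resident BI-RADS 0, attending BI-RADS 1 or 2


def sensitivity(c: AuditCounts) -> float:
    """True positive rate: TP / (TP + FN)."""
    return c.tp / (c.tp + c.fn)


def specificity(c: AuditCounts) -> float:
    """True negative rate: TN / (TN + FP)."""
    return c.tn / (c.tn + c.fp)


# Benchmark values as quoted in the article's abstract.
BCSC_SENSITIVITY = 0.849
BCSC_SPECIFICITY = 0.903
ACR_MIN_SENSITIVITY = 0.75
ACR_SPECIFICITY_RANGE = (0.88, 0.95)


def compare_to_benchmarks(c: AuditCounts) -> dict:
    """Return the two rates plus their relation to the BCSC and ACR figures."""
    sens = sensitivity(c)
    spec = specificity(c)
    low, high = ACR_SPECIFICITY_RANGE
    return {
        "sensitivity": sens,
        "specificity": spec,
        "sensitivity_minus_bcsc": sens - BCSC_SENSITIVITY,
        "specificity_minus_bcsc": spec - BCSC_SPECIFICITY,
        "sensitivity_in_acr_range": sens >= ACR_MIN_SENSITIVITY,
        "specificity_in_acr_range": low <= spec <= high,
    }


# Hypothetical counts, chosen only to reproduce the reported percentages.
example = AuditCounts(tp=845, fn=155, tn=832, fp=168)
report = compare_to_benchmarks(example)
print(report)
```

With these invented counts, the sensitivity check against the ACR minimum passes and the specificity check against the 88%–95% range fails, which mirrors the pattern reported for the residents.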

First Iteration

For the first iteration of the simulation module in the academic years 2011 and 2012, residents were required to interpret 212 digital two-dimensional (2D) screening mammograms (taken in 2008) during their breast imaging rotation. All mammograms initially assigned a BI-RADS category 1 or 2 had at least 3 years' negative follow-up. All prior screen captures were removed from the simulation screening cases. Of the 212 cases, 162 (76%) were negative exams, BI-RADS category 1. The remaining 50 cases were BI-RADS category 0 cases, with 49 subsequently undergoing biopsy. The single case that did not undergo biopsy represented a simple cyst at ultrasound, BI-RADS category 2. Twenty of the biopsied cases represented breast cancer, which included ductal carcinoma in situ (DCIS), invasive ductal carcinoma (IDC), invasive lobular carcinoma (ILC), and mucinous carcinoma. Two of the biopsied cases represented noncancerous high-risk lesions, which included atypical lobular hyperplasia (ALH) and papilloma. Twenty-seven of the biopsied lesions were found at histology to be benign; examples included apocrine metaplasia, benign calcifications, chronic inflammation, columnar cell change, ductal ectasia, fat necrosis, fibroadenoma, fibrocystic change, fibrosis, hemangioma, sclerosing adenosis, a simple cyst, usual ductal hyperplasia, or a combination of these entities.

Second Iteration

For the second iteration of the simulation module in the academic years 2012 and 2013, residents were required to interpret 230 digital 2D screening mammograms (taken during 2008) during their breast imaging rotation. Therefore, all mammograms initially assigned a BI-RADS category 1 or 2 had at least 4 years' negative follow-up. All prior screen captures were removed from the simulation screening cases. The BI-RADS category 1 negative cases from the first iteration of the simulation screening module were left unchanged for the second iteration; therefore, 162 of the 230 cases were BI-RADS category 1, negative. The remaining 68 cases included all 50 BI-RADS 0 cases from the first iteration with supplementation of 18 additional cases. Fourteen additional breast cancers were added for a total of 34 cancer cases and included DCIS, IDC, male IDC, ILC, multifocal IDC, and tubular carcinoma. Four additional noncancerous high-risk lesions were supplemented for a total of six cases, which included atypical ductal hyperplasia, ALH, papilloma, and radial scar.

Third Iteration

For the third iteration of the simulation module in the academic years 2013 and 2014, residents were required to interpret 261 digital 2D screening mammograms (taken during 2008 and 2009) during their breast imaging rotation. Therefore, all mammograms initially assigned a BI-RADS category 1 or 2 had at least 4 years' negative follow-up. All prior screen captures were removed from the simulation screening cases. Of the 261 cases, 161 (62%) were BI-RADS category 1, negative. A single BI-RADS 1 case was excluded, making the total for this iteration 161 rather than the 162 used in the prior two iterations. A single high-risk lesion of ALH that had been used in the prior iterations was also excluded, as were three breast cancer cases used in the prior iterations, which included DCIS, ILC, and tubular cancer. Thirty-one additional breast cancers were added for a total of 62 cancer cases and included DCIS, IDC, inflammatory breast cancer, male IDC, recurrent DCIS in lumpectomy scar, and widely metastatic IDC. The remaining five noncancerous high-risk lesions were once again used. Additionally, two nonbreast cancer malignancies were added to the simulation module and included malignant melanoma and T-cell lymphoma. Three benign biopsied lesions were supplemented and represented a cavernous hemangioma, a columnar cell change with atypia, and a fibroadenoma.

Data Set

The simulation module, comprising the three iterations, included a total of 266 screening mammograms. Of the 266 cases, 162 (61%) were negative. The remaining 104 (39%) were classified as BI-RADS category 0, and additional views were recommended.
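The counts given for the three iterations can be tallied as a quick consistency check. The sketch below is illustrative only: the variable names are hypothetical, and the third-iteration BI-RADS 0 count of 100 is not stated in the text but is inferred here as 261 minus 161.

```python
# Case counts per iteration as stated in the text. The third-iteration
# BI-RADS 0 count (100) is an inference: 261 total minus 161 negative.
iterations = {
    "first": {"negative": 162, "birads_0": 50},
    "second": {"negative": 162, "birads_0": 68},
    "third": {"negative": 161, "birads_0": 100},
}

# Total cases read in each iteration.
totals = {name: c["negative"] + c["birads_0"] for name, c in iterations.items()}

# BI-RADS 0 cases introduced per iteration: 50 initially, 18 added in the
# second, and 31 cancers + 2 nonbreast malignancies + 3 benign in the third.
added_birads_0 = [50, 18, 31 + 2 + 3]

# Pooled data set across iterations: unique negative and BI-RADS 0 cases.
pooled_negative = 162
pooled_birads_0 = sum(added_birads_0)
pooled_total = pooled_negative + pooled_birads_0

pct_negative = round(100 * pooled_negative / pooled_total)
pct_birads_0 = round(100 * pooled_birads_0 / pooled_total)

print(totals, pooled_total, pct_negative, pct_birads_0)
```

The per-iteration totals and the pooled total agree with the 212, 230, 261, and 266 figures reported in the text.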

Although the recommended recall rate is 5%–12% per the ACR BI-RADS 2013 Atlas (5), our enriched screening mammography case set included 39% BI-RADS category 0 cases (Fig 1). The interactive training set included an enriched data set of screening mammograms with abnormalities that included masses, architectural distortions, focal asymmetries, and calcifications of varying levels of difficulty. Each module was enriched with high-risk and precancerous breast lesions, such as atypical ductal hyperplasia, lobular neoplasia, complex sclerosing lesions, and intraductal papillomas, as well as breast malignancies, including IDC, DCIS, ILC, tubular carcinoma, mucinous carcinoma, inflammatory carcinoma, and male breast cancer. Figure 2 shows the number of BI-RADS 0 case types during each simulation. Table 1 lists the breast pathology found at histology in our screening mammography simulation. Creating the screening set from cases obtained at least 3 years prior to the start of the simulation ensured that there were no false negative (FN) cases. The data set was enriched with abnormal cases due to the infrequency of breast cancer seen in radiology residency. This enriched data set was created to provide a standardized set of numerous presentations of breast cancer and high-risk lesions that may otherwise take years of clinical experience to see.

Of the 266 screening mammography cases, 103 were classified as BI-RADS category 0 and underwent biopsy, with 65 of the 103 cases representing breast cancer (63% of the biopsied BI-RADS 0 cases); hence, the data set was enriched. Of the 103 cases, 7 (7%) represented high-risk or other cancerous lesions such as T-cell lymphoma and melanoma. Thirty-one of the 103 cases (30%) represented benign or non–high-risk lesions. This finding is shown in Figure 3.

Figure 1. Percentage of BI-RADS category 0 or BI-RADS category 1 or 2 cases in the simulation screening module. BI-RADS, Breast Imaging Reporting and Data System.

Figure 2. Number of cases of breast cancers, high-risk lesions, nonbreast cancers, and benign lesions during each simulation. BI-RADS, Breast Imaging Reporting and Data System.

BI-RADS Categorization

In our simulation, the attending breast radiologist's interpretation of a screening mammogram was considered the correct answer. All cases initially assigned as a BI-RADS category 1 or 2 by the faculty breast imager had between 3 and 6 years' negative imaging follow-up. No FNs were included in our simulation screening mammography data set. All cases initially classified as BI-RADS category 0 had known histologic outcomes, except for the single BI-RADS category 0 case, which represented a simple cyst on subsequent workup.

Audit Definitions: Our Simulation

A true positive (TP) was defined as the percentage of cases the residents identified correctly as BI-RADS 0 of the cases the attending breast radiologist interpreted as BI-RADS 0. A true negative (TN) was defined as the percentage of cases the residents identified correctly as BI-RADS 1 or 2 of the cases the attending breast radiologist interpreted as BI-RADS 1 or 2. A false positive (FP) was defined as the percentage of cases the residents identified incorrectly as BI-RADS 0 of cases the attending breast radiologist interpreted as BI-RADS 1 or 2. A FN was defined as the percentage of cases the residents identified incorrectly as BI-RADS 1 or 2 of cases the attending breast radiologist interpreted as BI-RADS 0. These definitions are also summarized in Table 2. Sensitivity is defined as the TP rate and measures the percentage of positives that are correctly identified. In our simulation, this was computed as [TP/(TP + FN)]. Specificity is defined as the TN rate and measures the percentage of negatives that are correctly identified. In our simulation, this was computed as [TN/(TN + FP)]. Please see the Appendix for a comparison of our audit definitions with the BCSC (4) data as well as the ACR BI-RADS (5) atlas data.

Statistical Analysis

We use the attending radiologist's interpretation as the "gold standard." As these cases were hand selected for each iteration of the simulation screening mammography module, there were no FNs in the data set. The sensitivity of residents' interpretation is defined as the percentage of times the mammograms that are positive by the gold standard are correctly classified as positive by residents. The specificity is defined as the percentage of times the mammograms that are negative by the gold standard are correctly classified as negative. The statistical significance of the dependence of the sensitivity and specificity on the rotation number (first rotation vs second and third) is assessed by modeling the probability of correctly classifying positive mammograms and the probability of correctly classifying negative mammograms using logistic regression.

Institutional review board approval was waived by the institutional Human Subjects Protection Office.

TABLE 1. Breast Pathology Included in Our Simulation Screening Mammography Modules

Breast Cancers
• DCIS
• Inflammatory breast carcinoma
• IDC
• Invasive lobular carcinoma
• Male breast cancer
• Mucinous carcinoma
• Multifocal IDC
• Recurrent DCIS in lumpectomy scar
• Tubular carcinoma
• Widely metastatic IDC

High-Risk Lesions or Other Cancers
• Atypical ductal hyperplasia
• Complex sclerosing lesions
• Lobular neoplasia
• Malignant melanoma
• Papillomas
• Radial scar
• T-cell lymphoma

Benign or Non-High-Risk Lesions
• Apocrine metaplasia
• Benign calcifications
• Chronic inflammation
• Columnar cell change
• Ductal ectasia
• Fat necrosis
• Fibroadenoma
• Fibrocystic change
• Fibrosis
• Hemangioma
• Sclerosing adenosis
• Simple cyst
• Usual ductal hyperplasia

DCIS, ductal carcinoma in situ; IDC, invasive ductal carcinoma.

TABLE 2. Statistical Definitions for Comparison Between Resident and Attending Radiologist Interpretations of Screening Mammography Module Cases

                                         Attending Interpretation   Attending Interpretation
                                         BI-RADS 0                  BI-RADS 1 or 2
Resident interpretation BI-RADS 0        True positive              False positive
Resident interpretation BI-RADS 1 or 2   False negative             True negative

BI-RADS, Breast Imaging Reporting and Data System.

Figure 3. In our simulation screening modules, 103/104 BI-RADS category 0 lesions underwent subsequent percutaneous breast biopsy with known histologic outcomes. One of the 104 cases represented a simple cyst and did not undergo biopsy as it demonstrated benign imaging features at diagnostic ultrasound. BI-RADS, Breast Imaging Reporting and Data System.

RESULTS

Over a period of 27 months, 39 radiology residents participated in the mandatory simulation screening mammography module 48 times during their breast imaging rotation. Nine residents interpreted two separate data sets on two different rotations. Resident performance varied by level of experience, with sensitivities and specificities of those on their first rotation at 77.45% and 79.45%, second rotation at 79.41% and 82.39%, and third rotation at 82.54% and 86.01%, as depicted in Figure 4. Residents on their second rotation compared to those on their first rotation, as well as those on their third rotation compared to the first or second rotation, were more likely to have concordance with faculty for abnormal initial interpretation (BI-RADS 0) (P value = 0.00254) and negative initial interpretation (BI-RADS 1 or 2) (P value = 0.0001).

The sensitivity across all 48 simulations by 39 radiology residents, the percentage of cancers with an abnormal initial interpretation (BI-RADS 0), was 84.5%, similar to the BCSC benchmark sensitivity of 84.9% (sensitivity for tissue diagnosis of cancer within 1 year following the initial examination) and within the acceptable ACR BI-RADS medical audit range of ≥75%. The overall specificity across all 48 simulations, the percentage of noncancers that had a negative image interpretation (BI-RADS 1 or 2), among residents was 83.2%, compared to 90.3% reported in the BCSC benchmark data, and was lower than the suggested ACR BI-RADS range of 88%–95%. These findings are also depicted in Figure 5. Recall rate could not be reported as our data set was enriched and a reported "recall rate" from this study would be falsely inflated.

Figure 4. Resident performance of those on their first, second, and third rotations as compared to the initial attending breast radiologist interpretation.

Figure 5. Overall resident performance as compared to the BCSC Benchmark Data with regard to sensitivity, percentage of cancers with an abnormal initial interpretation, as well as specificity, and percentage of noncancers that had a negative image interpretation. BCSC, Breast Cancer Surveillance Consortium.

DISCUSSION

The MQSA legislation set a requirement for an annual audit for each mammographic facility and each radiologist. The medical audit is recognized as one of the best quality assurance tools (6,7). As of 2012, 40% of breast imaging fellowships did not offer training related to the practice audit (8). By instituting a simulation program, residents can be assured that they have been exposed to this important and mandated part of practicing breast imaging. Using simulation modules for purposes of resident training in interpretation of screening mammograms is a promising method for training radiology residents to detect breast cancer and for helping them achieve competence toward national benchmarks. By creating a simulation module from digital screening mammography cases that are in the archives, trainees are still able to interpret screening mammograms on proper


MQSA-approved 5-megapixel monitor workstations vs nondiagnostic tablets or nondiagnostic monitors. At our institution, this simulation module has now become a requirement for residents to complete during each breast imaging rotation while in their diagnostic radiology residency training. This experience is felt to be valuable to a learner because pathology results from the screening callbacks (the abnormal screening exams) are available from the patients who underwent a subsequent breast biopsy. Additionally, patients' diagnostic images, which may include their diagnostic mammogram and often targeted ultrasound, are also available for review and discussion during the course of the resident feedback sessions. Residents are provided with immediate feedback from attending radiologists who review the batches of cases with them. Furthermore, by utilizing a simulation module, residents are able to effectively learn the art of batch reading, which has been reported to reduce screening mammography recall rates without reducing the cancer detection rate (9).

This module also complements our workflow as our breast clinic provides online screening results to women presenting for their screening mammogram, which limits resident involvement in interpreting screening mammograms in real time. Anecdotally, patient care is improved when real-time interpretations are given to patients, as this decreases patient anxiety and allows same-day diagnostic workup and/or biopsy if necessary. Patient care may also be improved by the lower recall rate of attending-only interpretations. A recent study by Hawley et al. noted that attending radiologists reading alone had a lower overall recall rate than when involving a trainee (1). Because the de novo cases will have fewer FPs, fewer patients will have transiently increased anxiety (10,11). Hawley et al. also found that the lack of increased cancer detection rate at a significantly higher recall rate with the involvement of trainees compared to a single attending radiologist indicates an increased percentage of FP results (1). Hawley et al.'s study suggests that any expertise brought to image interpretation by the radiology trainees was insufficient to positively influence desired outcomes (that of higher cancer detection rate) (1). In our study, there is no alliterative effect, framing bias, or anchoring bias, described by Hawley et al. (1), which is known to occur between attending radiologists and their trainees.

Reading skills increased directly with mammographic interpretive volume for test subjects in the study by Nodine et al. (12), which utilized a case set similar to ours. In their study, Nodine et al. utilized a case set that included 150 mammograms composed of unilateral 2D images, of which one-third showed malignant lesions and the remaining two-thirds demonstrated no malignancy. Nodine et al. had 3 experienced mammographers, 19 radiology residents, and 9 mammography technologists indicate whether the images contained no malignant lesions or if there were suspicious lesions to indicate malignancy. Nodine et al. recorded decision time, and decisions were rated for confidence. There are three general areas that expert mammographers use in diagnosis: visual search, pattern and object recognition, and decision making. Resident performance yielded more FP results than expert performance, which is thought to be due to lack of perceptual-learning experience during training, which made differentiation of normal variations, benign lesions, and malignant lesions difficult. Our study highlights the importance of exposing residents to more pathology while in training to help improve those three areas that expert mammographers use in diagnosis.

A study by Saunders and Samei revealed that incorrect detection as well as incorrect classification decisions were associated with longer screening mammography interpretation times, suggesting that interpretation time can be incorporated into mammographic decision making to identify cases with higher probabilities of perceptual error that require further review (13). We gave residents a goal of interpreting each case in less than 3 minutes on average so that they would have clear expectations of completing a worklist. However, this recommendation was not strictly enforced: residents, particularly the more junior learners, were given the time they needed to complete the cases at their own pace, to simulate real-life situations.

Using simulation modules provides an opportunity for the faculty to evaluate their residents in a manner similar to the MQSA regulation that requires each facility to maintain a medical outcomes audit program. Residents can then track their progress and compare their statistics from the simulation screening module to national benchmarks. For those residents who require remediation, using a simulation screening module can serve as a way to monitor advancement in their fund of knowledge as it pertains to screening mammography. Using a simulation screening module also allows residents to gain a better understanding of the statistical parameters used in the medical outcomes audit program.

Limitations and Future Directions

As our case set was enriched with known cancers, we could not accurately assess recall rate or cancer detection rate. The cases within our simulation screening mammography module included an abnormal interpretation (recall) rate of 39% vs the 5%–12% national benchmark recommended by ACR BIRADS 2013 Atlas. Future studies could evaluate if selecting simulation case sets with a lower recall rate would affect resident performance. Our study did not assess the difficulty of the cases to determine if that would affect our results. A study by Grimm et al. suggested that increased resident and attending breast radiologists perceived difficulty of cases is associated with a decrease in resident performance for those cases (14). Future studies could retrospectively evaluate each case for its degree of difficulty and make sure there is an even number of each for a given simulation. A study by Lee et al. showed that having

a mammography boot camp can improve the performance in interpreting mammograms for suspicious cases but not for negative ones (15). Because residents were given feedback after 100 cases, this finding could help explain why evaluation of suspicious cases was on target for the sensitivity of detecting breast cancers with the BCSC benchmark data, but lower than suggested for specificity. Our study did not include CAD data. A study by Luo et al. showed a statistically significant difference in performance in the interpretation of mammograms with CAD than without in radiology residents (16). Future studies could compare resident interpretation with and without CAD. Our pilot study had a small sample size as only 48 attempts were made at the time of data interpretation. Additional studies could be performed after residents complete their 4 years of residency so data can be culled from residents with three attempts to see if there is improvement within individuals from rotation to rotation. As digital breast tomosynthesis is known to reduce the effect of tissue superimposition and improve mammographic interpretation, we will likely supplement our future simulation screening module case set with three-dimensional images in addition to 2D images. This will skew the data and not allow direct comparison of performance longitudinally throughout a resident’s career. Our breast center is currently performing all screening studies with tomosynthesis. A study by Zhang et al. suggests that initial performance with Digital Breast Tomosynthesis (DBT) among radiology residents is independent of years of training (17). Future studies should separate results of threedimensional interpretations from 2D interpretations. Further studies could evaluate resident increased recall rate. Anecdotally, some residents had difficulty perceiving or classifying calcifications, whereas others routinely missed architectural distortion. 
We found that a one-on-one mentorship in these situations with directed, intensive study of their “overcalls” was necessary. This would be a valuable pursuit as the recall rate for almost half of radiologists is higher than the recommended rate (18), and the recall rate seen in trainees is higher than when attending radiologists interpret screening mammograms alone (1).

CONCLUSION

By interpreting an enriched data set of standardized screening mammograms with known outcomes, residents were given meaningful feedback on their interpretive skills of screening mammography compared to national benchmarks while in radiology residency training.

ACKNOWLEDGMENTS

We would like to thank Jason Liao, PhD, for his assistance with statistics and Shelly Tuzzato for her maintenance of our simulation screening module case set.

REFERENCES

1. Hawley JR, Taylor CR, Cubbison AM, et al. Influences of radiology trainees on screening mammography interpretation. J Am Coll Radiol 2016; doi:10.1016/j.jacr.2016.01.016; S1546-1440(16)00067-3 [pii]; [Epub February 26, 2016].
2. US Food and Drug Administration. Mammography quality standards act. Available at: http://www.fda.gov/radiation-emittingproducts/mammographyqualitystandardsactandprogram/default.htm. Accessed November 20, 2015.
3. Monticciolo DL, Rebner M, Appleton CM, et al. The ACR/Society of Breast Imaging Resident and Fellowship Training Curriculum for Breast Imaging, updated. J Am Coll Radiol 2013; 10:207–210, e4. doi:10.1016/j.jacr.2012.07.026; [Epub December 23, 2012].
4. Breast Cancer Surveillance Consortium. Sensitivity and specificity for 2,061,691 screening mammography examinations from 2004–2008—based on BCSC data through 2009. 2009. Available at: http://breastscreening.cancer.gov/statistics/benchmarks/screening/2009/tableSensSpec.html. Accessed September 16, 2015.
5. Sickles EA, D’Orsi CJ. ACR BI-RADS® follow-up and outcome monitoring. In: ACR BI-RADS® Atlas, breast imaging reporting and data system. Reston, VA: American College of Radiology, 2013.
6. Spring DB, Kimbrell-Wilmot K. Evaluating the success of mammography at the local level: how to conduct an audit of your practice. Radiol Clin North Am 1987; 25:983–992.
7. Murphy WA, Jr, Destouet JM, Monsees BS. Professional quality assurance for mammography screening programs. Radiology 1990; 175:319–320.
8. Farria DM, Salcman J, Monticciolo DL, et al. A survey of breast imaging fellowship programs: current status of curriculum and training in the United States and Canada. J Am Coll Radiol 2014; 11:894–898. doi:10.1016/j.jacr.2014.02.005; [Epub May 22, 2014].

Academic Radiology, Vol ■, No ■■, ■■ 2016

9. Burnside ES, Park JM, Fine JP, et al. The use of batch reading to improve the performance of screening mammography. AJR Am J Roentgenol 2005; 185:790–796.
10. Schou Bredal I, Kåresen R, Skaane P, et al. Recall mammography and psychological distress. Eur J Cancer 2013; 49:805–811. doi:10.1016/j.ejca.2012.09.001; [Epub September 27, 2012].
11. Bond M, Pavey T, Welch K, et al. Psychological consequences of false-positive screening mammograms in the UK. Evid Based Med 2013; 18:54–61. doi:10.1136/eb-2012-100608; [Epub August 2, 2012].
12. Nodine CF, Kundel HL, Mello-Thoms C, et al. How experience and training influence mammography expertise. Acad Radiol 1999; 6:575–585.
13. Saunders RS, Samei E. Improving mammographic decision accuracy by incorporating observer ratings with interpretation time. Br J Radiol 2006; 79(Spec2):S117–S122.
14. Grimm LJ, Kuzmiak CM, Ghate SV, et al. Radiology resident mammography training: interpretation difficulty and error-making patterns. Acad Radiol 2014; 21:888–892. doi:10.1016/j.acra.2014.01.025.
15. Lee EH, Jun JK, Jung SE, et al. The efficacy of mammography boot camp to improve the performance of radiologists. Korean J Radiol 2014; 15:578–585. doi:10.3348/kjr.2014.15.5.578; [Epub September 12, 2014].
16. Luo P, Qian W, Romilly P. CAD-aided mammogram training. Acad Radiol 2005; 12:1039–1048.
17. Zhang J, Grimm LJ, Lo JY, et al. Does breast imaging experience during residency translate into improved initial performance in digital breast tomosynthesis? J Am Coll Radiol 2015; 12:728–732. doi:10.1016/j.jacr.2015.02.025.
18. Rosenberg RD, Yankaskas BC, Abraham LA, et al. Performance benchmarks for screening mammography. Radiology 2006; 241:55–66.


APPENDIX

AUDIT DEFINITIONS—BREAST CANCER SURVEILLANCE CONSORTIUM (4)

• A true positive (TP) is equal to the number of cancers that had an abnormal initial interpretation.
• A true negative (TN) is equal to the number of noncancers that had a negative initial interpretation.
• Sensitivity is equal to the percentage of cancers that had an abnormal initial interpretation (BI-RADS category 0, 4, or 5).
• Specificity is equal to the percentage of noncancers that had a negative initial interpretation (BI-RADS category 1, 2, or 3 with no recommendation for immediate follow-up).

AUDIT DEFINITIONS—ACR BI-RADS (5)

• A TP is defined as tissue diagnosis of cancer within 1 year after a positive examination (BI-RADS category 0, 3, 4, or 5 for screening).
• A TN is defined as no known tissue diagnosis of cancer within 1 year of a negative examination (BI-RADS category 1 or 2 for screening).
• A false negative (FN) is defined as tissue diagnosis of cancer within 1 year of a negative examination (BI-RADS category 1 or 2 for screening).
• A false positive (FP) has three separate definitions: 1) FP1 is defined as no known tissue diagnosis of cancer within 1 year of a positive examination; 2) FP2 is defined as no known tissue diagnosis of cancer within 1 year after recommendation for tissue diagnosis or surgical consultation on the basis of a positive examination; and 3) FP3 is defined as a concordant benign tissue diagnosis (or discordant benign tissue diagnosis and no known tissue diagnosis of cancer) within 1 year after recommendation for tissue diagnosis on the basis of a positive examination.
• Sensitivity is defined as the probability of interpreting an examination as positive when cancer exists, measured as the number of positive examinations for which there is a tissue diagnosis of cancer within 1 year of imaging examination, divided by all cancers present in the population examined in the same time period [TP/(TP + FN)].
• Specificity is defined as the probability of interpreting an examination as negative when cancer does not exist, measured as the number of negative examinations for which there is no tissue diagnosis of cancer within 1 year of examination, divided by all examinations for which there is no tissue diagnosis of cancer within the same time period [TN/(TN + FP)].
• Recall rate is defined as the percentage of examinations with an abnormal initial interpretation. The ACR BI-RADS medical audit range suggested for the abnormal interpretation (recall) rate is 5%–12%. Because our simulation module was enriched with breast cancers at a rate higher than expected in the general population, recall rate was not calculated or provided as feedback to the residents.
• Cancer detection rate: the ACR BI-RADS medical audit range suggested for cancer detection rate is ≥2.5 per 1000 examinations. As our study is limited to 266 enriched cases, this also was not calculated.
• PPV1 is defined as the percentage of examinations with an abnormal initial interpretation that result in a tissue diagnosis of cancer within 1 year. The ACR BI-RADS medical audit range suggested for PPV1 is 3%–8%. Our simulation data set could not provide meaningful PPV1 data because our simulation module was enriched.
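The ratio definitions above can be illustrated with a short computation. The following Python sketch is not part of the simulation module; the class name `AuditCounts`, the function names, and the example counts are all hypothetical and chosen only for illustration. It takes FP in the FP1 sense, treats cancer detection rate as cancers detected (TP) per 1000 examinations, and assumes every denominator is nonzero.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class AuditCounts:
    """Outcome counts for a set of screening interpretations (hypothetical)."""
    tp: int  # positive examination, cancer diagnosed within 1 year
    fn: int  # negative examination, cancer diagnosed within 1 year
    tn: int  # negative examination, no known cancer within 1 year
    fp: int  # positive examination, no known cancer within 1 year (FP1)

    @property
    def total(self) -> int:
        return self.tp + self.fn + self.tn + self.fp


def sensitivity(c: AuditCounts) -> float:
    """TP / (TP + FN)."""
    return c.tp / (c.tp + c.fn)


def specificity(c: AuditCounts) -> float:
    """TN / (TN + FP)."""
    return c.tn / (c.tn + c.fp)


def recall_rate(c: AuditCounts) -> float:
    """Share of all examinations with an abnormal initial interpretation."""
    return (c.tp + c.fp) / c.total


def ppv1(c: AuditCounts) -> float:
    """Share of abnormal interpretations followed by a cancer diagnosis."""
    return c.tp / (c.tp + c.fp)


def cancer_detection_rate_per_1000(c: AuditCounts) -> float:
    """Cancers detected (TP) per 1000 examinations (assumed definition)."""
    return 1000 * c.tp / c.total


# Hypothetical counts for illustration only; these are NOT study results.
example = AuditCounts(tp=8, fn=2, tn=80, fp=10)
print(f"sensitivity: {sensitivity(example):.1%}")
print(f"specificity: {specificity(example):.1%}")
print(f"recall rate: {recall_rate(example):.1%}")
print(f"PPV1:        {ppv1(example):.1%}")
print(f"CDR / 1000:  {cancer_detection_rate_per_1000(example):.1f}")
```

In this hypothetical set, 10 of 100 cases are cancers, far above screening prevalence. Recall rate, PPV1, and cancer detection rate all depend on that prevalence, which is why an enriched case set cannot be compared against the audit ranges for those three measures, whereas sensitivity and specificity are each computed within the cancer or noncancer group alone.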
