Improving digital breast tomosynthesis reading time: A pilot multi-reader, multi-case study using concurrent Computer-Aided Detection (CAD)


European Journal of Radiology 97 (2017) 83–89


European Journal of Radiology journal homepage: www.elsevier.com/locate/ejrad

Research article




Corinne Balleyguier (a,*), Julia Arfi-Rouche (a), Laurent Levy (b), Patrick R. Toubiana (c), Franck Cohen-Scali (c), Alicia Y. Toledano (d), Bruno Boyer (e)

(a) Department of Radiology, Gustave Roussy, 114 rue Edouard-Vaillant, 94805 Villejuif Cedex, France
(b) Institut de Radiologie de Paris, 31 Avenue Hoche, 75008 Paris, France
(c) Centre de Senologie et d’Echographie, 13 rue Beaurepaire, 75010 Paris, France
(d) Biostatistics Consulting, LLC, 10606 Wheatley Street, Kensington, MD 20895, USA
(e) Centre d’Imagerie Medicale Italie, 6 place d'Italie, 75013 Paris, France

ARTICLE INFO

Keywords: Diagnostic imaging; Breast cancer; Computer-assisted diagnosis; Digital breast tomosynthesis; Time studies

ABSTRACT

Purpose: Evaluate concurrent Computer-Aided Detection (CAD) with Digital Breast Tomosynthesis (DBT) to determine impact on radiologist performance and reading time.

Materials and methods: The CAD system detects and extracts suspicious masses, architectural distortions and asymmetries from DBT planes that are blended into corresponding synthetic images to form CAD-enhanced synthetic images. Review of CAD-enhanced images and navigation to corresponding planes to confirm or dismiss potential lesions allows radiologists to more quickly review DBT planes. A retrospective, crossover study with and without CAD was conducted with six radiologists who read an enriched sample of 80 DBT cases including 23 malignant lesions in 21 women. Area Under the Receiver Operating Characteristic (ROC) Curve (AUC) compared the readings with and without CAD to determine the effect of CAD on overall interpretation performance. Sensitivity, specificity, recall rate and reading time were also assessed. Multi-reader, multi-case (MRMC) methods accounting for correlation and requiring correct lesion localization were used to analyze all endpoints. AUCs were based on a 0–100% probability of malignancy (POM) score. Sensitivity and specificity were based on BI-RADS scores, where 3 or higher was positive.

Results: Average AUC across readers without CAD was 0.854 (range: 0.785–0.891, 95% confidence interval (CI): 0.769, 0.939) and 0.850 (range: 0.746–0.905, 95% CI: 0.751, 0.949) with CAD (95% CI for difference: −0.046, 0.039), demonstrating non-inferiority of AUC. Average reduction in reading time with CAD was 23.5% (95% CI: 7.0–37.0% improvement), from an average 48.2 (95% CI: 39.1, 59.6) seconds without CAD to 39.1 (95% CI: 26.2, 54.5) seconds with CAD. Per-patient sensitivity was the same with and without CAD (0.865; 95% CI for difference: −0.070, 0.070), and there was a small 0.022 improvement (95% CI for difference: −0.046, 0.089) in per-lesion sensitivity from 0.790 without CAD to 0.812 with CAD. A slight reduction in specificity with a −0.014 difference (95% CI for difference: −0.079, 0.050) and a small 0.025 increase (95% CI for difference: −0.036, 0.087) in recall rate in non-cancer cases were observed with CAD.

Conclusions: Concurrent CAD resulted in faster reading time with non-inferiority of radiologist interpretation performance. Radiologist sensitivity, specificity and recall rate were similar with and without CAD.

Abbreviations: ACR, American College of Radiology; ANOVA, Analysis of variance; AUC, Area Under the ROC Curve; BI-RADS®, Breast Imaging Reporting and Data System; CAD, computer-aided detection; CC, craniocaudal; CI, confidence interval; CRF, case report form; CTC, CT colonography; DBT, digital breast tomosynthesis; DMIST, Digital Mammographic Imaging Screening Trial; FFDM, full field digital mammography; HIPAA, Health Insurance Portability and Accountability Act; IDC, invasive ductal carcinoma; MLO, mediolateral oblique; MRMC, multi-reader, multi-case; POM, probability of malignancy; ROC, Receiver Operating Characteristic

⁎ Corresponding author at: Department of Radiology, Gustave Roussy, 114 rue Edouard-Vaillant, 94805 Villejuif Cedex, France. E-mail address: [email protected] (C. Balleyguier).

http://dx.doi.org/10.1016/j.ejrad.2017.10.014
Received 27 April 2017; Received in revised form 21 September 2017; Accepted 19 October 2017
0720-048X/ © 2017 Elsevier B.V. All rights reserved.

1. Introduction

Breast cancer is the most common form of cancer detected in women worldwide, excluding non-melanoma skin cancer [1]. Accurate and early detection of breast cancer is key to improved diagnosis and treatment. Early detection has been conducted with analog mammography, with a gradual shift to full field digital mammography (FFDM) after the results of the Digital Mammographic Imaging Screening Trial (DMIST) were reported in 2005 [2,3]. Currently, FFDM has largely replaced analog mammography. More recently digital breast tomosynthesis (DBT) has been added to FFDM, which has been shown to increase tissue visualization and cancer detection, result in lower recall rates and reduce false-positives compared to FFDM alone [4–6].

However, reviewing DBT in combination with FFDM takes about twice as long as reading only an FFDM exam [7–9]. A study with three radiologists each reading the same 100 screening exams, where bilateral 2-view FFDM alone exams were read first and bilateral 2-view FFDM + DBT were read 3–7 days later, showed reading FFDM + DBT exams (average 77; range 60–90 s per exam) was 2.3 times longer (p < 0.01) than FFDM alone (average 33; range 25–46 s per exam) [7]. A larger study of 12,621 screening exams with each bilateral 2-view FFDM + DBT exam interpreted by one of eight radiologists and each bilateral 2-view FFDM alone exam interpreted by a different radiologist demonstrated a mean reading time of 91 s for FFDM + DBT and 45 s for FFDM alone (2.0 times longer with the addition of DBT, p < 0.001) [8]. Another study reported a 1.5 times increase (p < 0.001) in average reading time from 1.9 min per exam (range 1.1–3.0 min) for FFDM alone to 2.8 min per exam (range 1.5–4.2 min) for FFDM + DBT with 10 radiologists and 3665 screening exams (2163 FFDM alone and 1502 FFDM + DBT), where the images from each exam were read by one of the 10 radiologists [9].

Therefore, a concurrent DBT computer-aided detection (CAD) system was developed to assist radiologists by reducing reading time while maintaining the accuracy of DBT. A concurrent CAD system is used by radiologists throughout the review of image exams, in contrast to more traditional second read CAD systems that are not intended to be used by radiologists until after an initial review without CAD has been completed. A similar strategy was used with a concurrent CAD system developed for CT colonography (CTC) [10–12]. Using this system, the radiologist fully reviews the 2D and 3D CT colonography images at the locations marked by CAD and then does a quick review of the whole CTC exam in 2D. With this reading paradigm, Iussich et al. [10,11] and Regge et al. [12] reported concurrent CTC CAD maintained or increased reader sensitivity at the same specificity, while decreasing reading time.

The generation of synthetic mammograms with CAD from DBT volumes has been previously reported by van Schie et al. [13]. They demonstrated the accuracy of readers for detecting lesions with the CAD-generated 2D synthetic images alone was slightly better than reading FFDM alone in a pilot reader study. In this paper we present a new concurrent DBT CAD system (PowerLook Tomo Detection, iCAD, Nashua, NH) for mammography that enables the radiologist to review CAD-enhanced synthetic images, in which suspicious lesions detected by CAD in DBT planes are blended into standard synthetic images, and navigate to planes to confirm or dismiss potential lesions, thereby allowing the radiologist to more quickly review the DBT planes. The purpose of this pilot multi-reader, multi-case (MRMC) study was to compare with and without CAD radiologist performance for the detection of malignant lesions as measured by area under the receiver operating characteristic (ROC) curve (AUC), reading time, sensitivity, specificity, and recall rate.

2. Materials and methods

2.1. Study cases

Four participating institutions from France (two), Germany (one) and the US (one) retrospectively collected data under the same tomosynthesis data collection protocol in compliance with each country’s laws. A central Institutional Review Board approved the protocol and waived informed consent. All cases were de-identified with Health Insurance Portability and Accountability Act (HIPAA) compliance. Eighty cases were selected from 101 cases that met the study inclusion criteria. Included cases were bilateral screening or diagnostic tomosynthesis exams from women 18 years or older. Each exam consisted of bilateral craniocaudal (CC) FFDM images and mediolateral oblique (MLO) DBT images (standard synthetic images, planes and slabs) (SenoClaire, GE Healthcare, Waukesha, WI). Cases were excluded from women with a personal history of breast cancer or imaging evidence of previous surgery because readers were not provided patient history or prior exams.

Stratified random selection of the 80 cases used in the study was based on case type (cancer, benign, recalled, negative), mammographic appearance and histopathology of lesions, breast density, detectability of soft tissue densities (masses, architectural distortions and asymmetries) or mixed lesions (soft tissue densities with calcifications) with standard synthetic images, conspicuity of soft tissue densities or mixed lesions, and American College of Radiology (ACR) Breast Imaging Reporting and Data System (BI-RADS®) assessment categories [14]. Although the CAD system does not detect calcifications, cases with calcifications as the only lesion were included in the study to evaluate the system with cases representative of the intended use population.

An expert breast imaging radiologist outlined malignant lesions in each case to establish “truth” and confirmed case type. Biopsy proof in BI-RADS 3, 4 or 5 exams was the reference standard for cancer and benign cases. The reference standard for recalled cases was BI-RADS 0 without biopsy and for negative cases was BI-RADS 1 or 2 without biopsy.

2.2. The CAD system

The concurrent DBT CAD system uses artificial intelligence technology to always detect five soft tissue densities per volume of planes (GE SenoClaire). A tomosynthesis review workstation (IDI MammoWorkstation, GE Healthcare, Buc, France) uses the CAD detections to create a CAD-enhanced synthetic image by blending the five detections from the planes into the corresponding standard synthetic image. The lesions detected by CAD are not marked or outlined in the resulting CAD-enhanced synthetic image as shown in Fig. 1. The increased lesion conspicuity on the CAD-enhanced synthetic image assists the radiologist in identifying soft tissue densities that can be selected with a mouse click, and the workstation navigates to the detected lesion in the corresponding DBT planes. The lesions are then confirmed or dismissed by the radiologist in the DBT planes. None of the cases used in this study were used in CAD system training.

Standalone performance was assessed to determine 1) CAD-enhanced synthetic image sensitivity for malignant soft tissue densities and mixed lesions at the case-level and lesion-level and 2) CAD Detection Rate for images, defined as an estimate of the number of regions detected by CAD in the planes and blended into the corresponding synthetic image. Malignant lesions were determined to be correctly detected by CAD when the centroid of a CAD detection was within the truthing radiologist’s truth volume. True positives for CAD-enhanced synthetic image sensitivity were lesions correctly detected by CAD and blended into the CAD-enhanced synthetic image and lesions not detected by CAD but determined to be visible on the standard synthetic image by the truthing radiologist. Rather than standalone sensitivity of the CAD system alone, the standalone sensitivity measured in this study is the sensitivity of the CAD-enhanced synthetic image. Since there are no CAD marks, the visibility of lesions on the CAD-enhanced synthetic images determines the impact on the radiologist. Therefore, the true positive definition for CAD-enhanced synthetic image sensitivity includes lesions that were detected by CAD and lesions that were already visible on the standard synthetic image.

2.3. Readers and training

Six radiologists who were fully certified to interpret mammograms and DBT exams and had interpreted more than 500 GE SenoClaire DBT exams in the last two years participated as study readers (CB, JA-R, LL, PRT, FC-S, BB). Five were sub-specialist breast radiologists who devoted ≥75% of their professional time to breast imaging for the last 3 years, and one was a general radiologist who devoted < 75% of her/his professional time to breast imaging. All readers were trained on the reading procedures with 30 practice tomosynthesis cases that were not part of the study set, 20 of which were read with CAD and 10 without CAD. Training focused on the use of the CAD-enhanced synthetic image to assist in identifying soft tissue densities and using the workstation to efficiently navigate to corresponding locations in the planes of potential lesions identified by the reader. During the training the readers were instructed that CAD enhances soft tissue densities and mixed lesions, but not calcifications.

Fig. 1. Use of Concurrent DBT CAD System with 71-year-old female with extremely dense breasts presenting for screening exam. No suspicious findings are identified on bilateral CC FFDM (a: two images on left) and MLO standard synthetic images (a: two images on right). Using CAD-enhanced synthetic MLO images (b: center two images) to assist in identifying soft tissue densities, the radiologist can select the architectural distortion seen superiorly on the CAD-enhanced right breast MLO image (b: left center image) and automatically navigate to the lesion in the right upper outer quadrant detected by CAD in the right MLO DBT planes (b: leftmost image). The radiologist can then evaluate this architectural distortion that was a 3 cm invasive ductal carcinoma at histopathology. Lesions are not marked or outlined by the concurrent DBT CAD system; instead, lesions detected in the DBT planes by CAD are naturally blended into the corresponding standard synthetic images to create CAD-enhanced images. Standard synthetic (c), CAD-enhanced (d) and DBT planes (e) images zoomed in to the architectural distortion more clearly show the improvement in lesion visibility with CAD-enhancement.

2.4. Image interpretation

Study readings occurred at the Hyatt Regency Etoile (Paris, France) June 4–12 and July 7–17, 2015. Each reader reviewed each case both without CAD and with CAD on the IDI MammoWorkstation during these 2 sessions separated by 4 weeks to minimize memory recall bias [15]. Each case consisted of bilateral CC FFDM and MLO DBT images. Cases were divided into two sets of 40 (Case Set A, Case Set B) based on case selection factors [16] especially case type. Each reader interpreted one case set with CAD and the other case set without CAD in the first session, and performed the complementary interpretations in the second case session, in a counterbalanced design (Fig. 2). Reading order was individually randomized for each radiologist. Readers were blinded to case type, acquisition site interpretations, prior images, and patient history for each case. All readers performed their interpretations independently.

Fig. 2. Study Design.

For each case on each read, the reader recorded whether one or more lesions were suspicious enough that recall, short-interval follow-up, or tissue diagnosis was recommended (Yes or No). If a No was recorded, the reader provided a BI-RADS assessment category [14] and a case-level probability of malignancy (POM) score, which consisted of integers ranging from 0 to 100% (0–20%: negative or benign, 21–40%: probably benign, 41–60%: possibly malignant, 61–80%: probably malignant, 81–100%: malignant). If a Yes was recorded, the reader provided for each suspicious finding: mammographic appearance (soft tissue density and/or calcifications), BI-RADS category, POM score, and location annotation (using the workstation).

Reading time in seconds was measured starting when the reader viewed the images and ending once the reader finished reviewing the images, based on readers stating “start” and “stop”, respectively. Reading time did not include the time needed to document the lesion location with the workstation or to complete the case report form (CRF). The radiologists were not able to observe the timer and were therefore blinded to the reading time.
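The counterbalanced crossover design of Section 2.4 can be illustrated with a short sketch. This is not the study's actual randomization procedure, which the paper does not specify. The function name `build_schedule`, the rule that alternating readers start with Case Set A or Case Set B under CAD, the blocking of with-CAD reads before without-CAD reads inside a session, and the case identifiers are all assumptions made for illustration.

```python
import random


def build_schedule(readers, case_sets, seed=0):
    """Return {reader: (session1, session2)}; each session is a list of (case_id, with_cad).

    Assumptions (not stated in the paper): alternating readers start with
    set A or set B under CAD, and reads are blocked by CAD mode in a session.
    """
    rng = random.Random(seed)
    schedule = {}
    for index, reader in enumerate(readers):
        # Hypothetical allocation: even-indexed readers read set A with CAD first.
        cad_set, plain_set = ("A", "B") if index % 2 == 0 else ("B", "A")

        def shuffled(cases):
            # Reading order is individually randomized for each reader.
            order = list(cases)
            rng.shuffle(order)
            return order

        # Session 1: one case set with CAD, the other without CAD.
        session1 = [(c, True) for c in shuffled(case_sets[cad_set])]
        session1 += [(c, False) for c in shuffled(case_sets[plain_set])]
        # Session 2 (about 4 weeks later): the complementary interpretations.
        session2 = [(c, True) for c in shuffled(case_sets[plain_set])]
        session2 += [(c, False) for c in shuffled(case_sets[cad_set])]
        schedule[reader] = (session1, session2)
    return schedule


if __name__ == "__main__":
    sets = {
        "A": [f"A{i:02d}" for i in range(1, 41)],  # hypothetical case IDs
        "B": [f"B{i:02d}" for i in range(1, 41)],
    }
    plan = build_schedule(["R1", "R2", "R3", "R4", "R5", "R6"], sets, seed=1)
    first, second = plan["R1"]
    print(len(first), len(second), first[0], second[0])
```

Under these assumptions every reader sees all 80 cases in each session, and each case is read once with CAD and once without CAD across the two sessions, which is the property the crossover design relies on.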

Table 1. Clinical Characteristics of Study Sample: N (%) unless otherwise noted.

All Cases                                        Case Set A (N = 40)   Case Set B (N = 40)   Total (N = 80)
Age (years), Mean (SD)                           51.6 (8.8)            52.7 (10.2)           52.2 (9.5)
BI-RADS Breast Density
  a. Almost entirely fatty                       3 (8)                 2 (5)                 5 (6)
  b. Scattered areas of fibroglandular density   18 (45)               17 (43)               35 (44)
  c. Heterogeneously dense                       16 (40)               17 (43)               33 (41)
  d. Extremely dense                             3 (8)                 4 (10)                7 (9)
Number of Lesions
  0                                              17 (43)               18 (45)               35 (44)
  1                                              22 (55)               20 (50)               42 (53)
  2                                              1 (3)                 2 (5)                 3 (4)
Case Type
  Cancer                                         10 (25)               11 (28)               21 (26)
  Benign                                         9 (23)                8 (20)                17 (21)
  Recalled                                       4 (10)                3 (8)                 7 (9)
  Negative                                       17 (43)               18 (45)               35 (44)

2.5. Statistical methods

The MRMC mixed effects analysis of variance (ANOVA) method [17,18] was used to analyze AUCs and reading times. AUCs were obtained nonparametrically [19] for each reader based on POM scores requiring correct lesion localization of at least one malignant finding to obtain credit for identifying a subject with cancer. For a reader to be scored with a correctly identified malignant lesion, the center point of the lesion location annotated by the reader was required to be within or coincident with the truthing radiologist’s outline of the malignant lesion. Statistical inference on percent difference in reading time employed a normalizing transformation, natural log of (% difference + 100), and results were back-transformed to the original scale for ease of interpretation. The ANOVA model was applied directly to the differences. Comparison of radiologist performance with and without CAD, as measured by difference in average AUC, was evaluated with two-sided significance level 0.05; a 95% confidence interval (CI) was constructed using a Student’s t-distribution [17,18] and its lower limit was compared to the negative of a pre-specified 0.05 non-inferiority margin. The average improvement in reading time and a corresponding two-sided 95% CI were computed, and its lower limit was compared with zero (0) to evaluate superiority.

MRMC methods were also used to analyze per-subject sensitivity, specificity and recall rate. Per-subject sensitivity and specificity were based on per-subject BI-RADS scores requiring correct lesion localization, with a BI-RADS score of 3 or higher constituting a positive test result. Recall rate was analyzed in cases without cancer based on whether the reader indicated there were any suspicious lesions detected. MRMC analysis of lesion-level sensitivity was based on per-lesion BI-RADS scores requiring correct lesion localization and used clustered data methods to account for the correlation between lesions in the same case [20].
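Three building blocks of the analysis described above can be sketched in a few lines: a nonparametric (Mann-Whitney) AUC for a single reader, the non-inferiority decision on the lower confidence limit, and the natural log of (% difference + 100) transformation with its back-transformation. This is a minimal illustration, not the MRMC ANOVA used in the study. All function names and the example scores are hypothetical, and the rule that a cancer case without correct localization is scored as POM 0 is an assumption about how the localization requirement is applied.

```python
import math


def credited_score(pom, localized):
    """POM score credited to a cancer case.

    Assumption: without correct localization of at least one malignant
    lesion, the case receives the lowest possible score (0).
    """
    return pom if localized else 0


def empirical_auc(cancer_scores, noncancer_scores):
    """Mann-Whitney estimate of P(cancer score > non-cancer score); ties count 1/2."""
    wins = 0.0
    for c in cancer_scores:
        for n in noncancer_scores:
            if c > n:
                wins += 1.0
            elif c == n:
                wins += 0.5
    return wins / (len(cancer_scores) * len(noncancer_scores))


def is_noninferior(ci_lower, margin=0.05):
    """Non-inferior if the lower CI limit of (with CAD - without CAD) exceeds -margin."""
    return ci_lower > -margin


def to_log_scale(pct_diff):
    """Normalizing transformation: natural log of (% difference + 100)."""
    return math.log(pct_diff + 100.0)


def from_log_scale(value):
    """Back-transform from the log scale to a % difference."""
    return math.exp(value) - 100.0


if __name__ == "__main__":
    # Hypothetical reader: (POM score, correctly localized?) for three cancer cases.
    cancers = [credited_score(p, ok) for p, ok in [(85, True), (70, False), (40, True)]]
    noncancers = [10, 30, 55, 5]  # hypothetical POM scores in non-cancer cases
    print("AUC:", round(empirical_auc(cancers, noncancers), 3))
    # Lower limit of the 95% CI for the AUC difference reported in this study.
    print("Non-inferior:", is_noninferior(-0.046))
    print("Round trip:", round(from_log_scale(to_log_scale(-23.5)), 1))
```

The reported lower limit of −0.046 lies above −0.05, so the check returns true. Note that averaging on the log scale and back-transforming generally gives a different value than averaging raw percent differences, which is why the transformed and untransformed estimates of reading-time improvement can differ.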

Table 1 (continued).

Cancer Cases                                     Case Set A (N = 10)   Case Set B (N = 11)   Total (N = 21)
Histopathology
  IDC                                            6 (60)                5 (45)                11 (52)
  IDC + ILC, 2 lesions                           0 (0)                 1 (9)                 1 (5)
  ILC                                            2 (20)                3 (27)                5 (24)
  ILC + LCIS (Benign), 2 lesions                 1 (10)                0 (0)                 1 (5)
  Mucinous carcinoma                             0 (0)                 1 (9)                 1 (5)
  DCIS                                           1 (10)                0 (0)                 1 (5)
  DCIS, 2 lesions                                0 (0)                 1 (9)                 1 (5)
Size (cm, maximum), Mean (SD)                    1.76 (1.10)           1.92 (1.15)           1.84 (1.10)
Mammographic Appearance
  Soft tissue density only                       7 (70)                6 (55)                13 (62)
  Soft tissue densities only, 2 lesions          0 (0)                 1 (9)                 1 (5)
  Soft tissue density (Malignant) +
    Calcifications (Benign), 2 lesions           1 (10)                0 (0)                 1 (5)
  Mixed lesion                                   1 (10)                2 (18)                3 (14)
  Calcifications only, 2 lesions                 0 (0)                 1 (9)                 1 (5)
  Calcifications only                            1 (10)                1 (9)                 2 (10)
Conspicuity of soft tissue density or mixed lesion better in planes than standard synthetic images
  Yes                                            9 (100)               8 (89)                17 (94)
  No                                             0 (0)                 1 (11)                1 (6)
Soft tissue density or mixed lesion detectable in standard synthetic images
  Yes                                            5 (56)                4 (44)                9 (50)
  No                                             4 (44)                5 (56)                9 (50)

Benign and Recalled Cases                        Case Set A (N = 13)   Case Set B (N = 11)   Total (N = 24)
Mammographic Appearance
  Soft tissue density only                       11 (85)               8 (73)                19 (79)
  Mixed lesion                                   1 (8)                 0 (0)                 1 (4)
  Calcifications only                            1 (8)                 3 (27)                4 (17)
Conspicuity of soft tissue density or mixed lesion better in planes than standard synthetic images
  Yes                                            11 (92)               7 (88)                18 (90)
  No                                             1 (8)                 1 (13)                2 (10)
Soft tissue density detectable in standard synthetic images
  Yes                                            3 (25)                4 (50)                7 (35)
  No                                             9 (75)                4 (50)                13 (65)

SD: standard deviation. IDC: invasive ductal carcinoma. ILC: invasive lobular carcinoma. LCIS: lobular carcinoma in situ. DCIS: ductal carcinoma in situ. cm: centimeters.

3. Results

3.1. Demographic, clinical and case characteristics

Of the 80 women (mean ± SD age, 52.2 ± 9.5 years) participating in this study 40 patients had dense breasts (BI-RADS c or d) and 45 patients had one or more malignant or benign lesions (Table 1). Fifty-nine patients did not have cancer. Of the 21 patients with biopsy-proven cancer, 19 patients had one malignant lesion and two patients had two malignant lesions. The 23 malignant lesions included in the study averaged 1.84 ± 1.10 cm in size and were predominantly invasive ductal carcinoma (IDC). Three cancer cases (3/21, 14%) were calcifications only, 15 (15/21, 71%) were soft tissue densities only and 3 (3/21, 14%) were mixed lesions of soft tissue densities and calcifications.

Table 2. Analysis of Area under Empirical ROC Curve (AUC). Estimate (Standard Error).

AUC                            Without CAD      With CAD         Difference
Reader A (Most experience)     0.891 (0.048)    0.905 (0.046)    0.014 (0.039)
Reader B                       0.811 (0.065)    0.746 (0.074)    −0.065 (0.059)
Reader C                       0.871 (0.053)    0.881 (0.047)    0.009 (0.038)
Reader D                       0.879 (0.054)    0.901 (0.047)    0.022 (0.027)
Reader E                       0.886 (0.047)    0.870 (0.053)    −0.016 (0.027)
Reader F (Least experience)    0.785 (0.064)    0.799 (0.065)    0.014 (0.060)
Average                        0.854 (0.043)    0.850 (0.050)    −0.003 (0.021)
95% CI for Average             (0.769, 0.939)   (0.751, 0.949)   (−0.046, 0.039)

Table 3. Analysis of Reading Time. Average (Standard Error) in seconds and % Difference.

                                  Without CAD         With CAD            Difference          % Difference in Time
                                  (seconds)           (seconds)           (seconds)
Reader A (Most experienced)       59.8 (2.3)          45.6 (2.4)          −14.2 (3.4)         −16.0 (5.3)
Reader B                          39.0 (1.1)          23.5 (0.5)          −15.5 (1.1)         −36.8 (1.9)
Reader C                          40.7 (1.1)          26.5 (1.1)          −14.2 (1.1)         −33.8 (2.3)
Reader D                          61.3 (2.4)          55.2 (2.5)          −6.1 (3.2)          3.1 (7.5)
Reader E                          62.3 (2.4)          55.2 (2.1)          −7.1 (2.1)          −7.2 (3.3)
Reader F (Least experienced)      45.3 (1.2)          42.1 (1.3)          −3.2 (1.6)          −3.7 (3.5)
Average                           51.4 (−)            41.3 (−)            −10.1 (2.2)         −15.7 (−)
Normalizing transformation        Natural log         Square root         None                Natural log of
  for statistical analysis                                                                    (% Difference + 100)
Back-Transformed                  48.2                39.1                Not required        −23.5%
  95% CI for Average              (39.1, 59.6)        (26.2, 54.5)        (−15.5, −4.6)       (−37.0%, −7.0%)

3.2. Change in AUC

Four (4) of the readers had slightly higher AUCs with CAD, while 2 had lower AUCs (Table 2). The largest individual reader change in AUC with CAD was a reduction of 0.065. The average AUC across readers with CAD was 0.850 (range: 0.746–0.905; 95% CI: 0.751, 0.949), similar to the average AUC across readers without CAD (0.854, range:

0.785–0.891; 95% CI: 0.769, 0.939) (Fig. 3). The two-sided 95% CI for the difference in average AUC with CAD − without CAD was −0.046, 0.039. The study is considered to have successfully demonstrated non-inferior AUC because the lower limit, −0.046, was above the negative of the pre-specified 0.05 non-inferiority margin, −0.05.

Fig. 3. Average AUC Without and With CAD.

3.3. Reading time

The average reading time was 51.4 (range: 39.0–62.3) seconds without CAD and 41.3 (range: 23.5–55.2) seconds with CAD (Table 3). The average difference in reading time with CAD was a decrease by 10.1 s (95% CI for difference: 4.6–15.5 s improvement). The average percent difference on the untransformed scale shows a 15.7% improvement; however, this probably underestimates the center of the distribution of improvement because that center is more heavily influenced by reading times that are longer without CAD. To correct for this underestimation, the natural log of (% Difference + 100) transformation was used for statistical analysis and transformed back to the percent difference scale for reporting, providing an average 23.5% reduction in reading time (95% CI for difference: 7.0–37.0% improvement). Transformations were also used to determine the average reading time as 48.2 (95% CI: 39.1, 59.6) seconds without CAD and 39.1 (95% CI: 26.2, 54.5) seconds with CAD.

Table 4. Analysis of Sensitivity, Specificity, and Recall Rate. Estimate (Standard Error).

Per-Subject Sensitivity        Without CAD      With CAD         Difference
Reader A (Most experience)     0.905 (0.064)    0.905 (0.064)    0.000 (0.067)
Reader B                       0.857 (0.076)    0.762 (0.093)    −0.095 (0.093)
Reader C                       0.857 (0.076)    0.905 (0.064)    0.048 (0.046)
Reader D                       0.857 (0.076)    0.905 (0.064)    0.048 (0.046)
Reader E                       0.952 (0.046)    0.905 (0.064)    −0.048 (0.046)
Reader F (Least experience)    0.762 (0.093)    0.810 (0.086)    0.048 (0.082)
Average                        0.865 (0.053)    0.865 (0.056)    0.000 (0.033)
95% CI for Average             (0.759, 0.971)   (0.754, 0.976)   (−0.070, 0.070)

Per-Lesion Sensitivity         Without CAD      With CAD         Difference
Reader A (Most experience)     0.826 (0.074)    0.870 (0.069)    0.043 (0.076)
Reader B                       0.783 (0.080)    0.696 (0.104)    −0.087 (0.085)
Reader C                       0.783 (0.080)    0.870 (0.069)    0.087 (0.058)
Reader D                       0.783 (0.098)    0.826 (0.093)    0.043 (0.044)
Reader E                       0.870 (0.065)    0.870 (0.069)    0.000 (0.063)
Reader F (Least experience)    0.696 (0.104)    0.739 (0.101)    0.043 (0.077)
Average                        0.790 (0.070)    0.812 (0.066)    0.022 (0.032)
95% CI for Average             (0.652, 0.928)   (0.681, 0.942)   (−0.046, 0.089)

Specificity                    Without CAD      With CAD         Difference
Reader A (Most experience)     0.627 (0.063)    0.678 (0.061)    0.051 (0.056)
Reader B                       0.373 (0.063)    0.407 (0.064)    0.034 (0.063)
Reader C                       0.678 (0.061)    0.678 (0.061)    0.000 (0.059)
Reader D                       0.627 (0.063)    0.593 (0.064)    −0.034 (0.041)
Reader E                       0.525 (0.065)    0.508 (0.065)    −0.017 (0.070)
Reader F (Least experience)    0.627 (0.063)    0.508 (0.065)    −0.119 (0.068)
Average                        0.576 (0.059)    0.562 (0.059)    −0.014 (0.029)
95% CI for Average             (0.450, 0.702)   (0.437, 0.687)   (−0.079, 0.050)

Recall Rate in Non-Cancers     Without CAD      With CAD         Difference
Reader A (Most experience)     0.390 (0.063)    0.373 (0.063)    −0.017 (0.061)
Reader B                       0.627 (0.063)    0.593 (0.064)    −0.034 (0.063)
Reader C                       0.322 (0.061)    0.322 (0.061)    0.000 (0.059)
Reader D                       0.373 (0.063)    0.407 (0.064)    0.034 (0.041)
Reader E                       0.475 (0.065)    0.525 (0.065)    0.051 (0.070)
Reader F (Least experience)    0.373 (0.063)    0.492 (0.065)    0.119 (0.068)
Average                        0.427 (0.058)    0.452 (0.058)    0.025 (0.028)
95% CI for Average             (0.301, 0.552)   (0.331, 0.573)   (−0.036, 0.087)

3.4. Sensitivity, specificity and recall rate

Per-patient sensitivity averaged across readers was the same with CAD and without CAD, 0.865 (95% CI for difference: −0.070, 0.070), with slightly different ranges (0.762–0.952 without CAD; 0.762–0.905 with CAD) and 95% CIs (0.759, 0.971 without CAD; 0.754, 0.976 with CAD) (Table 4). Average per-lesion sensitivity was 0.790 (range:

European Journal of Radiology 97 (2017) 83–89

C. Balleyguier et al.

The reading paradigm used in this study is similar to that reported for CTC colonography [10–12]. With this reading paradigm, Iussich et al. [10,11] and Regge et al. [12] showed that concurrent CAD maintained or increased reader sensitivity at the same specificity, while decreasing reading time. For DBT mammography, this concurrent CAD study demonstrates similar reader sensitivity and specificity, while decreasing reading time. Other approaches to improving the reading time of DBT include slabbing to reduce the number of planes to review by combining adjacent planes to create thicker planes [24] and reviewing single-view DBT planes (MLO views) without synthetic or FFDM images [25]. Slabbing has been suggested to reduce reading time by 20% without any significant loss in image quality [24]. An explorative analysis of single-view DBT exams demonstrated improved cancer detection rate with only a small increase in recall rate and no change in positive predictive value compared to two-view FFDM alone [25]. Slabbing and single-view DBT exams may be alternatives to further investigate. This was a retrospective reader study with a small sample size and a predominance of sub-specialist breast radiologists, which poses some limitations. A larger pivotal reader study of concurrent DBT CAD with breast imaging and general radiologists will be conducted to address broader applicability of results. This is important because DBT in combination with FFDM improves the ability to distinguish malignant from benign tumors and can detect early signs of cancer hidden by overlapping tissues compared to FFDM alone [4,6,26,27], which increases cancer detection rates and reduces call back rates. Studies [4,8,27] have indicated that DBT plus FFDM reveals about 30% more cancers and reduces false-positives approximately 15% compared to FFDM alone. 
In conclusion, the concurrent DBT CAD system used in this study to create a CAD-enhanced synthetic image enables radiologists to review tomosynthesis images 23.5% faster, without altering the interpretation performance of the radiologist in detecting breast cancer. This technique appears promising for application with tomosynthesis for breast cancer screening.

0.696–0.870; 95% CI: 0.652, 0.928) without CAD and 0.812 (range: 0.696–0.870; 95% CI: 0.681, 0.942) with CAD, indicating a small improvement with CAD of 0.022 (95% CI for difference: −0.046, 0.089). The average specificity without CAD was 0.576 (range: 0.373–0.678; 95% CI: 0.450, 0.702); a slight decrease in specificity was observed with CAD (0.562; range: 0.407–0.678; 95% CI: 0.437, 0.687) for a difference of −0.014 (95% CI for difference: -0.079, 0.050). The recall rate in non-cancer cases without CAD was 0.427 (range: 0.322–0.627; 95% CI: 0.301, 0.552) and with CAD was 0.452 (range: 0.322–0.593; 95% CI: 0.331, 0.573), a small increase of 0.025 (95% CI for difference: −0.036, 0.087). Although no formal subgroup analysis of the reader study endpoints in the three calcifications only cases was performed, no significant impact with CAD was noticed. 3.5. Standalone performance Nineteen malignant soft tissue density lesions were analyzed in 18 cancer cases. For these cases, the CAD-enhanced synthetic image sensitivity was 16/18 or 88.9% at the case-level and 16/19 or 84.2% at the lesion-level. The average CAD detection rate was five per image by the design of the CAD system. 4. Discussion This retrospective MRMC crossover study investigated the ability of a concurrent DBT CAD system to reduce the reading time of DBT without affecting the performance of the radiologist. Radiologist performance was evaluated by AUC, which combines radiologist sensitivity and specificity into a single performance metric. The AUC analysis demonstrated concurrent DBT CAD maintained the overall interpretation performance of the readers to detect malignant lesions with a 23.5% reduction in reading time. The ability of the concurrent DBT CAD system to detect malignant soft tissue densities resulted in a CADenhanced synthetic image sensitivity of 88.9% at the case level at five CAD detections per image. 
This study was planned as a pilot study for a larger pivotal study that included radiologists from other geographical areas, with a larger proportion of general radiologists. Estimates of quantities that influence the power of MRMC studies obtained from this study were used to ensure that the pivotal study was adequately powered to determine whether overall performance was maintained with reduced reading time.

As DBT is becoming more widely used in screening mammography, decreasing the reading time is a major concern to radiologists. Several studies [7–9] have indicated that reading DBT mammograms requires longer reading times than traditional FFDM. With the increase in the use of DBT for screening, we realized the need for specialized workflow tools to reduce the reading time of DBT images while maintaining radiologists’ performance and accuracy.

The radiologist uses concurrent DBT CAD by fully reviewing CAD-enhanced synthetic images and navigating to DBT planes to confirm or dismiss potential lesions. The CAD system detects soft tissue densities in the tomosynthesis planes, and a blending algorithm merges the CAD detections from the planes into the corresponding synthetic image. The radiologist then views the lesion(s) on these CAD-enhanced synthetic images and can navigate directly to the tomosynthesis plane for further visualization and characterization. Since the CAD system analysis occurs in the DBT planes, parenchymal densities above or below suspicious lesions have little impact on the CAD analysis, while they may obscure malignant lesions in traditional synthetic or FFDM images.

Unlike traditional second-read FFDM CAD [21–23], this concurrent DBT CAD system does not mark or outline detected lesions, and it does not detect calcifications. Lesions detected by concurrent DBT CAD are naturally blended into the synthetic image, which made detections of normal or benign structures easier for readers in the study to dismiss than the traditional FFDM CAD marks they encounter in their clinical practices.

Conflict of interest

Author AT is a consultant to iCAD, Inc. (Nashua, NH). The other authors declare that they have no conflicts of interest.

Acknowledgements

We appreciate Senthil Periaswamy, Ph.D., Jonathan Go, Jeffrey Hoffmeister, M.D. and Rachel Brem, M.D. for reviewing drafts of the manuscript. We also thank Meridith Peratikos, M.S. for assistance with the statistical analyses and scientific writer Andrea Gwosdow, Ph.D. for assistance writing and editing the manuscript. This work was supported by iCAD, Inc. The funding source assisted in the data collection and reimbursed the scientific writer and biostatisticians for their contributions to the study.

References

[1] Breast Cancer Statistics, World Cancer Research Fund International, 2017 http://www.wcrf.org/int/cancer-facts-figures/data-specific-cancers/breast-cancerstatistics (accessed 01.10.16).
[2] E.D. Pisano, C. Gatsonis, E. Hendrick, et al., Diagnostic performance of digital versus film mammography for breast-cancer screening, N. Engl. J. Med. 353 (2005) 1773–1783.
[3] N.T. van Ravesteyn, L. van Lier, C.B. Schechter, et al., Transition from film to digital mammography: impact for breast cancer screening through the national breast and cervical cancer early detection program, Am. J. Prev. Med. 48 (2015) 535–542.
[4] S.M. Friedewald, E.A. Rafferty, S.L. Rose, et al., Breast cancer screening using tomosynthesis in combination with digital mammography, JAMA 311 (2014) 2499–2507.
[5] ACR Statement on Breast Tomosynthesis, American College of Radiology, 2017 http://www.acr.org/About-Us/Media-Center/Position-Statements/PositionStatements-Folder/20141124-ACR-Statement-on-Breast-Tomosynthesis , 2014 (accessed 01.10.16).
[6] E.S. McDonald, A. Oustimov, S.P. Weinstein, et al., Effectiveness of digital breast tomosynthesis compared with digital mammography: outcomes analysis from 3 years of breast cancer screening, JAMA Oncol. 2 (2016) 737–743.
[7] D. Bernardi, S. Ciatto, M. Pellegrini, et al., Application of breast tomosynthesis in screening: incremental effect on mammography acquisition and reading time, Br. J. Radiol. 85 (2012) e1174–1178.
[8] P. Skaane, A.I. Bandos, R. Gullien, et al., Comparison of digital mammography alone and digital mammography plus tomosynthesis in a population-based screening program, Radiology 267 (2013) 47–56.
[9] P.A. Dang, P.E. Freer, K.L. Humphrey, et al., Addition of tomosynthesis to conventional digital mammography: effect on image interpretation time of screening examinations, Radiology 270 (2014) 49–56.
[10] G. Iussich, L. Correale, C. Senore, et al., CT colonography: preliminary assessment of a double-read paradigm that uses computer-aided detection as the first reader, Radiology 268 (2013) 743–751.
[11] G. Iussich, L. Correale, C. Senore, et al., Computer-aided detection for computed tomographic colonography screening: a prospective comparison of a double-reading paradigm with first-reader computer-aided detection against second-reader computer-aided detection, Invest. Radiol. 49 (2014) 173–182.
[12] D. Regge, G. Iussich, C. Senore, et al., Population screening for colorectal cancer by flexible sigmoidoscopy or CT colonography: study protocol for a multicenter randomized trial, Trials 15 (2014) 97.
[13] G. van Schie, R. Mann, M. Imhof-Tas, et al., Generating synthetic mammograms from reconstructed tomosynthesis volumes, IEEE Trans. Med. Imaging 32 (2013) 2322–2331.
[14] E.A. Sickles, C.J. D'Orsi, L.W. Bassett, et al., ACR BI-RADS® mammography, ACR BI-RADS® Atlas, Breast Imaging Reporting and Data System, 5th edition, American College of Radiology, Reston, VA, USA, 2013.
[15] Guidance for Industry and FDA Staff - Clinical Performance Assessment: Considerations for Computer-Assisted Detection Devices Applied to Radiology Images and Radiology Device Data - Premarket Approval (PMA) and Premarket Notification [510(k)] Submissions, U.S. Department of Health and Human Services, Food and Drug Administration, Center for Devices and Radiological Health, Division of Imaging and Applied Mathematics, Office of Science and Engineering Laboratories, Division of Radiological Devices, Office of In Vitro Diagnostic Device Evaluation and Safety, 2012 http://www.fda.gov/RegulatoryInformation/Guidances/ucm187277.htm 2012 (accessed 01.10.16).
[16] B. Lu, R. Greevy, X. Xu, C. Beck, Optimal nonbipartite matching and its statistical applications, Am. Stat. 65 (2011) 21–30.
[17] S.L. Hillis, A comparison of denominator degrees of freedom methods for multiple observer ROC analysis, Stat. Med. 26 (2007) 596–619.
[18] N.A. Obuchowski, H.E. Rockette, Hypothesis testing of diagnostic accuracy for multiple readers and multiple tests: an ANOVA approach with dependent observations, Commun. Stat. – Simulation Comput. 24 (1995) 285–308.
[19] E.R. DeLong, D.M. DeLong, D.L. Clarke-Pearson, Comparing the areas under two or more correlated receiver operating characteristic curves: a nonparametric approach, Biometrics 44 (1988) 837–845.
[20] N.A. Obuchowski, On the comparison of correlated proportions for clustered data, Stat. Med. 17 (1998) 1495–1507.
[21] R.F. Brem, J. Baum, M. Lechner, et al., Improvement in sensitivity of screening mammography with computer-aided detection: a multiinstitutional trial, AJR Am. J. Roentgenol. 181 (2003) 687–693.
[22] J.S. The, K.J. Schilling, J.W. Hoffmeister, et al., Detection of breast cancer with full-field digital mammography and computer-aided detection, AJR Am. J. Roentgenol. 192 (2009) 337–340.
[23] L.J. Warren Burhenne, S.A. Wood, C.J. D'Orsi, et al., Potential contribution of computer-aided detection to the sensitivity of screening mammography, Radiology 215 (2000) 554–562.
[24] M. Dustler, M. Andersson, D. Fornvik, et al., A study of the feasibility of using slabbing to reduce tomosynthesis review time, Proc. SPIE 8673 Medical Imaging 86731L (2013).
[25] K. Lang, I. Andersson, A. Rosso, et al., Performance of one-view breast tomosynthesis as a stand-alone breast cancer screening modality: results from the Malmo Breast Tomosynthesis Screening Trial, a population-based study, Eur. Radiol. 26 (2016) 184–190.
[26] A.M. McCarthy, D. Kontos, M. Synnestvedt, et al., Screening outcomes following implementation of digital breast tomosynthesis in a general-population screening program, J. Natl. Cancer. Inst. (2014), http://dx.doi.org/10.1093/jnci/dju316.
[27] E.F. Conant, E.F. Beaber, B.L. Sprague, et al., Breast cancer screening using tomosynthesis in combination with digital mammography compared to digital mammography alone: a cohort study within the PROSPR consortium, Breast Cancer Res. Treat. 156 (2016) 109–116.
