Whole slide imaging diagnostic concordance with light microscopy for breast needle biopsies

W. Scott Campbell PhD, MBA, Steven H. Hinrichs MD, Subodh M. Lele MD, John J. Baker MD, Audrey J. Lazenby MD, Geoffrey A. Talmon MD, Lynette M. Smith MS, William W. West MD

PII: S0046-8177(14)00160-9
DOI: 10.1016/j.humpath.2014.04.007
Reference: YHUPA 3293

To appear in: Human Pathology

Received date: 30 January 2014
Revised date: 3 April 2014
Accepted date: 9 April 2014

Please cite this article as: Campbell W. Scott, Hinrichs Steven H., Lele Subodh M., Baker John J., Lazenby Audrey J., Talmon Geoffrey A., Smith Lynette M., West William W., Whole Slide Imaging Diagnostic Concordance with Light Microscopy for Breast Needle Biopsies, Human Pathology (2014), doi: 10.1016/j.humpath.2014.04.007

This is a PDF file of an unedited manuscript that has been accepted for publication. As a service to our customers we are providing this early version of the manuscript. The manuscript will undergo copyediting, typesetting, and review of the resulting proof before it is published in its final form. Please note that during the production process errors may be discovered which could affect the content, and all legal disclaimers that apply to the journal pertain.

ACCEPTED MANUSCRIPT

Title Page

Whole Slide Imaging Diagnostic Concordance with Light Microscopy for Breast Needle Biopsies

W. Scott Campbell, PhD, MBA1; Steven H. Hinrichs, MD1; Subodh M. Lele, MD1; John J. Baker, MD1; Audrey J. Lazenby, MD1; Geoffrey A. Talmon, MD1; Lynette M. Smith, MS2; William W. West, MD1

1 University of Nebraska Medical Center, Department of Pathology and Microbiology

2 University of Nebraska Medical Center, College of Public Health, Center for Collaboration on Research, Design and Analysis

Keywords: Whole Slide Imaging, Virtual Microscopy, Telemedicine, Needle Breast Biopsy

Running head: Whole Slide Imaging for Breast Needle Biopsies

Corresponding Author:
W. Scott Campbell, PhD, MBA
Department of Pathology and Microbiology
University of Nebraska Medical Center
985900 Nebraska Medical Center
Omaha, NE 68198-5900
402-559-9593 (o)
402-559-5900 (f)
[email protected]

Conflict of Interest Statement: At the time the work described in this manuscript was performed, Ventana was a research collaborator with the Department of Pathology and Microbiology. No monetary assistance was provided by Ventana to the Department or any project participant.

Abstract

This study investigated the diagnostic accuracy of whole slide imaging (WSI) in breast needle biopsy diagnosis in comparison with standard light microscopy (LM). The study examined the effects of image capture magnification and computer monitor quality on diagnostic concordance of WSI and LM.

Four pathologists rendered diagnoses using WSI to examine 85 breast biopsies (92 parts; 786 slides) consisting of benign and malignant cases. Each WSI case was evaluated using images captured at either 20x or 40x magnification and viewed using a DICOM-grade, color-calibrated monitor or a standard desktop LCD monitor. For each combination, the WSI result was compared to the original LM diagnosis. The overall concordance rate observed between WSI and LM was 97.1% (95% CI: 94.3%-98.5%). After a washout period, all cases were reviewed a second time by each pathologist using LM, and the second LM diagnosis was compared to the WSI diagnosis rendered by the same pathologist. Intraobserver concordance between WSI and LM was 95.4% (95% CI: 92.2%-97.4%). The second LM diagnoses were also compared to the original LM diagnoses, and the observed interobserver LM concordance rate was 97.3% (95% CI: 93.1%-99.0%). The study data demonstrated that breast needle biopsy diagnoses rendered by WSI were equivalent to diagnoses rendered by LM. No diagnostic differences were detected between the underlying viewing system parameters of monitor quality and image capture resolution. The results of this study demonstrated that WSI can be effectively utilized in subspecialty diagnostic cases where a minimum amount of tissue is available.

1.0 Introduction

The U.S. Food and Drug Administration (FDA) has determined that whole slide imaging (WSI) instruments will be regulated as Class III medical devices [1,2]. The validation parameters used by commercial entities to obtain clearance are under consideration. This study investigated the diagnostic concordance rates of breast needle biopsy diagnosis when WSI was used as the primary diagnostic device in comparison with light microscopy (LM). Second, preliminary data were collected to determine whether the image capture resolution and viewing monitor characteristics utilized in this study affected diagnostic concordance rates between WSI and LM.

While the diagnostic accuracy of traditional telepathology systems is well established [3-5], WSI represents the evolution of telepathology for remote histopathology diagnostics [6]. Evidence supporting the equivalence of WSI to LM as a primary diagnostic tool is mounting, and multiple studies supporting WSI for general surgical pathology usage have been published [7-9]. WSI has been evaluated for specialty surgical pathology applications including prostate [10], gastrointestinal [11, 12] and dermatopathology specimens [13-15]. The studies, in general, followed the CAP recommendations for validation of specific WSI uses [16]. WSI has been demonstrated to assist rapid diagnostic consultation in women’s health clinics [17], but there are limited data supporting the use of WSI for primary diagnosis of breast needle biopsies. This study investigated the use of WSI for primary diagnostic interpretation of breast tissue obtained by needle biopsy. Two factors have been identified that could impact the overall accuracy of WSI: the scanning image magnification and the quality of the monitor used for viewing. A previous WSI validation study by Campbell, et al. noted that when slides were imaged at a magnification of 20x, interpretation of intranuclear detail was difficult and microorganism detection was challenging [7].

There is a paucity of research investigating the optimal image capture resolution or monitor viewing characteristics used for pathologist interpretation of WSI. While researchers agree that glass slide quality [18], spatial resolution of images [19], camera optical lens quality [20] and color standardization [21] are important factors affecting WSI, no definitive specifications have been established to determine guidelines and parameters for WSI use for primary diagnosis. For WSI to be adopted and used for primary diagnoses, the issues of image quality and optimal system performance in terms of their impact on diagnostic accuracy must be addressed.

The American College of Radiology (ACR) has published documentation and research supporting their guidelines for image management and viewing of images for diagnostic purposes [22]. The consideration and pathway established by the ACR for validation of digital imaging technologies was used to design a multi-step study to determine diagnostic accuracy of WSI and investigate the effect of image resolution and monitor characteristics on diagnostic concordance and accuracy of diagnoses rendered by WSI.

2.0 Materials and Methods

A computer review of the anatomic pathology laboratory information system was used to identify 85 breast needle biopsy cases (i.e., stereotactic, needle core, and vacuum-assisted needle core) originally interpreted and reported by a single, board-certified pathologist who serves as the senior breast subspecialty expert for the department. Seventy-eight cases consisted of one part and seven cases consisted of two parts, comprising a total of 786 hematoxylin and eosin (H&E) slides. Based on the original report, 56 benign diagnoses and 36 malignant diagnoses were rendered on the 92 parts. Each case and part was given a study number, and all H&E slides for each case were scanned at 20x and at 40x. All images were created using a Ventana iCoreo scanner and viewed using Ventana Virtuoso Express viewing software (Ventana Medical Systems, Tucson, AZ) (Figure 1). The study was approved by the University of Nebraska Medical Center Institutional Review Board.

Four board-certified pathologists, including three serving on the breast pathology subspecialty service, reviewed all 85 cases exclusively by WSI. Each reviewer had received training in the use of the WSI viewer prior to the study, and three of the four pathologists had extensive experience with WSI case examination spanning several years. The study was structured as a 2 x 2 design comparing results using images scanned at either 20x or 40x magnification and viewed on either a standard, non-calibrated desktop monitor (17”, 1.3 megapixel flat screen Dell E177FP) or a DICOM, diagnostic grade, color calibrated monitor (30”, 4 megapixel flat screen NDS Dome E4c calibrated per manufacturer specifications) (see Table 1). Using this approach, each pathologist reviewed 25% of the cases in each of the four image capture/image display categories; thereby each pathologist reviewed all the cases one time by WSI. No pathologist reviewed the same case twice by WSI.
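One way to picture the 2 x 2 allocation described above is a rotating, Latin-square-style schedule. The Python sketch below is illustrative only: the manuscript does not state how cases were actually assigned, and the function name `assign_conditions`, the pathologist labels `P1` to `P4`, and the rotation rule are all assumptions. It shows that a simple rotation lets every pathologist read every case once, in roughly one quarter of the cases per category, while each case is read under all four categories.

```python
from collections import Counter
from itertools import product

# The four image capture / image display categories of the 2 x 2 design.
MAGNIFICATIONS = ("20x", "40x")
MONITORS = ("standard", "DICOM")
CONDITIONS = list(product(MAGNIFICATIONS, MONITORS))


def assign_conditions(n_cases, pathologists):
    """Hypothetical allocation: rotate the four categories across pathologists.

    With four pathologists, each case is read once under every category and
    each pathologist reads every case exactly once.
    """
    schedule = {}
    for case in range(n_cases):
        for p_idx, pathologist in enumerate(pathologists):
            condition = CONDITIONS[(case + p_idx) % len(CONDITIONS)]
            schedule[(case, pathologist)] = condition
    return schedule


pathologists = ["P1", "P2", "P3", "P4"]  # placeholder labels
schedule = assign_conditions(85, pathologists)

# Number of cases each pathologist reads in each category.
per_pathologist = Counter((p, cond) for (_, p), cond in schedule.items())
for (pathologist, condition), n in sorted(per_pathologist.items()):
    print(pathologist, condition, n)
```

Because 85 is not divisible by four, the per-category workloads in this sketch differ by at most one case per pathologist.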

Pathologists were presented cases in batches of 20-25 cases. Pathologists reviewed cases at their own pace. A total of 368 WSI diagnoses on 92 parts were rendered over a period of six months. Each WSI diagnosis was compared to the original LM diagnosis to determine the overall WSI to LM concordance rate. The 92 WSI diagnoses rendered for each monitor quality/image resolution pairing were compared to the original LM diagnoses to render a WSI concordance rate by monitor quality and image resolution.

Cases were presented to each study physician with pre-examination information consistent with the usual operating procedures of the Department. Information included patient age, gender, gross description of the tissue specimen(s), pre-operative diagnosis and pre-operative comments by the surgeon when provided. Study pathologists rendered a diagnosis for each case in a manner consistent with departmental case sign-out. In the event that a study pathologist requested a special diagnostic stain to reach a conclusive diagnosis and the stain was available from the original LM examination, it was retrieved, scanned at the magnification indicated in the study schedule for the requesting pathologist and presented to the study pathologist. If the study pathologist requested special stains that did not exist at the time of the original LM examination, no additional slides were presented. Routine diagnostic special stains were provided (e.g., AE1/AE3, p63, e-cadherin), but no tumor biomarker stains were provided (e.g., PR-Q, ER-Q, HER2/NEU).

Scoring was conducted on a three-point scale: 1) concordant; 2) concordant with clinically insignificant differences; and 3) discordant with clinical significance. The scoring system was designed to differentiate diagnoses that were discordant with clinical significance, such as malignant vs. benign or invasive carcinoma vs. ductal carcinoma in situ (DCIS) diagnoses. By contrast, differing diagnoses using equivalent variations in descriptive terminology were assigned to the concordant with clinically insignificant differences category, as were differences in DCIS growth patterns and variations of no more than one histologic grade difference in malignant diagnoses [19-21].
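The three-point scale can be expressed compactly in code. The sketch below is a minimal illustration in Python; the names `Score`, `is_concordant` and `concordance_rate` are hypothetical and do not come from the study. It follows the grouping used when the overall rates are reported, in which the two concordant categories are counted together, and it returns a simple pooled proportion rather than the model-based estimate used in the statistical analysis.

```python
from enum import IntEnum


class Score(IntEnum):
    """Three-point scale from the scoring description (names are illustrative)."""
    CONCORDANT = 1
    CONCORDANT_MINOR = 2  # concordant with clinically insignificant differences
    DISCORDANT = 3        # discordant with clinical significance


def is_concordant(score):
    """Categories 1 and 2 are grouped together as concordant."""
    return score != Score.DISCORDANT


def concordance_rate(scores):
    """Pooled proportion of concordant diagnoses in a list of scores."""
    if not scores:
        raise ValueError("at least one scored diagnosis is required")
    return sum(is_concordant(s) for s in scores) / len(scores)


# Made-up example: four scored diagnoses, one of them discordant.
example = [
    Score.CONCORDANT,
    Score.CONCORDANT_MINOR,
    Score.CONCORDANT_MINOR,
    Score.DISCORDANT,
]
rate = concordance_rate(example)
print(f"{rate:.2f}")
```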

Following CAP recommendations for WSI validation and intraobserver variation, each study pathologist reviewed all study cases a second time using LM. To minimize recall bias, a washout period of no less than six months and no greater than 10 months was used. Case presentation details and procedures were identical to those used during WSI reviews. Each study pathologist’s subsequent LM diagnoses were compared to their individual WSI diagnoses and to the original LM sign-out diagnoses. Reproducibility of each pathologist’s interpretation relative to the diagnostic modality was evaluated in multiple ways as illustrated in Figure 2. Diagnostic comparisons included: A) interoperator diagnostic comparison of WSI diagnoses and the original LM diagnoses; B) intraoperator comparison of each study pathologist’s WSI diagnoses and their LM diagnoses of the same case; and C) interoperator diagnostic comparison of LM diagnoses of each pathologist to the original LM diagnoses. Generalized estimating equations (GEE) were used to estimate the concordance rates, as well as 95% confidence intervals (CI). SAS software Version 9.3 (SAS Institute Inc., Cary, NC) was used for the data analysis.

3.0 Results

Complete concordance between WSI and the original LM diagnoses was reached in 113 (31%) diagnoses. Discordant opinions were reported from 12 (3%) diagnoses involving nine cases. The remaining 243 (66%) of the diagnoses were scored as concordant with clinically insignificant differences. When the two concordant categories were grouped together, overall concordance rates were 97.1% (95% CI: 94.3% - 98.5%) as shown in Table 2. No significant difference between concordance rates by pathologist was noted.
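The category counts above can be tallied directly. The Python sketch below is a rough cross-check under a stated assumption: it uses a Wilson score interval, which treats all 368 diagnoses as independent. That is not the method used in the study, which estimated rates and intervals with GEE to account for the same parts being read by several pathologists. The pooled proportion, 356/368 (about 96.7%), therefore differs slightly from the GEE estimate of 97.1%. The helper name `wilson_interval` is illustrative.

```python
import math

# Diagnosis counts reported for the WSI vs. original LM comparison.
counts = {"concordant": 113, "concordant_minor": 243, "discordant": 12}
total = sum(counts.values())


def wilson_interval(successes, n, z=1.96):
    """Wilson score interval for a binomial proportion (assumes independence)."""
    p = successes / n
    denom = 1 + z ** 2 / n
    centre = (p + z ** 2 / (2 * n)) / denom
    half = z * math.sqrt(p * (1 - p) / n + z ** 2 / (4 * n ** 2)) / denom
    return centre - half, centre + half


concordant = counts["concordant"] + counts["concordant_minor"]
pooled = concordant / total
low, high = wilson_interval(concordant, total)

for name, n in counts.items():
    print(f"{name}: {n} ({n / total:.1%})")
print(f"pooled concordance: {pooled:.1%}, Wilson 95% CI: {low:.1%} to {high:.1%}")
```

The interval from this sketch is narrower on the upper side than the reported one, as expected when clustering is ignored and a different estimator is used.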

Discordant diagnostic opinions are detailed in Table 3. One diagnosis was reported as invasive ductal adenocarcinoma by WSI which had been diagnosed originally as DCIS by LM (Case 17). However, the LM diagnosis did report regions of tissue suspicious, but not diagnostic, for microinvasion. The LM diagnosis for Case 32 was invasive ductal adenocarcinoma in contrast to the diagnosis of DCIS rendered using WSI. Four cases (Cases 31, 48, 53 and 56) diagnosed as benign and non-atypical by LM were diagnosed using WSI as atypical ductal hyperplasia (ADH) and/or containing atypical cellular alterations and were categorized as discordant. In one case (Case 71), the WSI opinion of one pathologist was fibroadenomatoid changes whereas the original LM diagnosis was invasive carcinoma with lobular features. One LM diagnosis of DCIS (Case 40) was considered benign by two pathologists using WSI. One case diagnosed as benign fibrous breast tissue using LM was diagnosed as invasive carcinoma with lobular features by a single pathologist using WSI.

In 36 instances (39%), all study pathologist diagnoses were concordant with clinically insignificant differences to the original sign-out diagnosis. Four such cases were malignant diagnoses, and 31 cases were benign diagnoses. Examples of concordant diagnoses with clinically insignificant differences are reported in Table 4. Complete concordance between all study pathologists and the original sign-out diagnosis was achieved in 7 cases (3%). Six of the 7 cases were benign, and one case was malignant in nature. In no instance were all study pathologist diagnoses discordant with the original sign-out diagnosis.

A comparison of study pathologist diagnoses rendered by LM to their WSI diagnoses (Figure 2 – Comparison B), categorized using the same scoring criteria described previously, resulted in a concordance rate of 95.4% (95% CI: 92.2%-97.4%) for intraoperator diagnoses (Table 5). When the study pathologists’ LM diagnoses were compared to the original LM sign-out diagnoses (Figure 2 – Comparison C), an LM interoperator concordance rate of 97.3% (95% CI: 93.1% - 99.0%) was observed (Table 5).

Ten diagnoses rendered by WSI and classified as discordant to the originally reported LM diagnoses were reported as concordant when the same case was reviewed by LM by the same study pathologist. However, eight diagnoses rendered by WSI that were considered concordant to the original LM sign-out diagnoses were changed to discordant diagnoses by the study pathologist when LM was used as their diagnostic modality. The diagnostic changes consisted equally of upgraded diagnoses (e.g., DCIS to invasive carcinoma or benign conditions to atypia) and downgraded diagnoses (e.g., invasive carcinoma to DCIS or atypia to benign morphologies). See Table 6.

No statistically significant differences in concordance were observed among the four monitor quality/image resolution combinations. Overall diagnostic concordance rates by monitor/resolution category are reported in Table 7. No significant difference in discordant rates between WSI diagnoses and the original LM diagnoses or the study pathologist LM diagnoses was noted when the WSI case was reviewed using a standard desktop monitor vs. a higher resolution, color calibrated monitor. No significant difference in intraobserver concordance rates was observed between images scanned at 20x and 40x. A significant difference in interoperator discordant rates was observed between diagnoses rendered using images scanned at 20x and 40x (1.5% vs. 4.1%, p = 0.023). At 20x, two cases were downgraded by WSI and one case was upgraded (i.e., benign changed to malignant). Six 40x WSI diagnoses were upgraded and three were downgraded.

4.0 Discussion

The application of digital imaging to the practice of radiology required a complete evaluation of the technology including hardware, software and operator capability. Many similarities exist for digital pathology, including the impact of computer monitor quality and image production on the accuracy of diagnoses rendered by WSI. To understand the effects of image resolution and viewing monitor quality on diagnostic accuracy of WSI, LM diagnoses were compared to WSI diagnoses rendered using different image capture resolutions and different monitor quality levels. An interoperator WSI concordance rate of 97.1% and an interoperator LM concordance rate of 97.3% between study pathologists and the original sign-out pathologist were found to be equivalent to those published in the literature [26-28]. The intraobserver concordance rate of 95.4% comparing LM diagnoses and WSI diagnoses by the same pathologist was also within published LM intraoperator concordance rates. Together, the data indicate that diagnoses rendered by WSI are equivalent to those rendered by LM, independent of the examining pathologist, the original image capture resolution (20x or 40x) and the quality of the monitor used to review the WSI.

The definition of a “truth” diagnosis has proven somewhat elusive in surgical pathology due to interpretive, cognitive and linguistic variability within the pathology community [29-32]. One solution to this problem has been to establish the “truth” diagnosis by expert panel consensus. In this study, the “truth” diagnosis was the original case diagnosis, and relying on one individual’s diagnostic opinion as the standard to measure diagnostic concordance would be problematic. This issue was mitigated by comparing the original LM diagnosis with LM diagnoses rendered by each pathologist in the study. This exercise established diagnostic consistency between all pathologists who participated in the study, including the original diagnosing pathologist using LM.

Although not a surprise, diagnostic discordance in the study arose from diagnostic disagreement between pathologists over terminology and interpretive criteria. Diagnostic discordance between usual hyperplasia and atypical hyperplasia is well documented [27] but was scored as discordant due to differences in subsequent clinical intervention. Pathologists involved in this concordance study were not provided the opportunity to acquire a second opinion to achieve consensus nor review other pertinent clinical information (e.g., radiographs), and therefore did not have the opportunity to review their interpretation with a colleague. This may have impacted the discordance rate in cases where discordance was noted between atypia and malignant diagnostic interpretations and in cases where microinvasion was equivocal.

The finding that intraobserver concordance rates of pathologists with WSI and LM competence were equivalent to published results from LM interoperator and intraoperator concordance studies [26-28] suggested that intraobserver validation studies may not be necessary for purposes of WSI validation. This possibility should be further explored because intraobserver studies increase the length of time to conduct validation studies. Following CAP recommendations [16], a washout period between case reviews to reduce recall bias could extend a study up to 12 months. While early telepathology diagnostic accuracy studies used a variety of washout periods for intraobserver comparison studies [33-35], CAP and the Laboratory Quality Center were unable to identify published studies documenting the relationship of washout period length and recall bias [16], a finding which may impact the design and interpretation of intraobserver concordance studies.

Data from this study did not support the hypothesis that diagnostic concordance rates would increase from the use of a high-quality, color calibrated, DICOM monitor. While Yagi hypothesized that color calibration is important for WSI accuracy [21], data from this study do not indicate a significant relationship between color calibration and diagnostic concordance. This finding was consistent with Krupinski, et al. [36], who did not detect a relationship between diagnostic concordance and color calibration for breast biopsy tissue examination. Whereas Krupinski, et al. focused on specific regions of interest within a breast biopsy, the current study expands upon the findings by investigating the effect of color calibration on entire breast biopsy cases. Based on the high levels of concordance in this study, additional studies with larger sample sets are needed to detect differences in diagnostic concordance based on monitor quality.

Contrary to the initial hypothesis that WSI diagnosis concordance rates with LM would be higher when images were captured at higher resolutions, WSI diagnoses rendered using images captured at 40x were more likely to be discordant with LM diagnoses than WSI diagnoses rendered using images captured at 20x. The basis of this finding may be operational and not based on image clarity, as images captured at 40x often approached one minute per slide to resolve, a substantial time difference compared to the time to place a slide under an LM or to change objective lenses. The study pathologists may simply not have reviewed all the 40x images involved in each case. High throughput computer processors optimized for image viewing may eliminate this operational concern [37]. However, some 40x WSI diagnoses were discordant by calling benign LM cases malignant or atypical hyperplasia, suggesting locator issues were not the sole cause for discordances. The total number of discordant diagnoses was small (n = 9), and additional study is needed to fully investigate the effect of image magnification on diagnostic accuracy.

The high rate of concordant but clinically insignificant diagnoses (66%) highlighted the variation in diagnostic language used by pathologists. This impacted the level of complete diagnostic concordance rates for this pathologist. The listing of fibrocystic changes and particular morphologies within the fibrocystic change continuum (e.g., adenosis, fibrosis, sclerosing adenosis, or proliferative fibrocystic change) was uncommonly identical to the LM diagnosis or between study pathologists. For cases with malignant diagnoses, reporting of growth pattern, grade, and presence of microcalcifications and necrosis varied. However, analysis of the findings demonstrated that a cancer diagnosis was less likely to result in a discordant and/or clinically insignificant discordant result (p = 0.009) than a benign diagnosis such as fibrocystic change. This finding may be due to the fact that cancer diagnoses follow a structured reporting process in which the presence or absence of defining characteristics is required to be enumerated within the diagnostic report. A similar reporting structure does not exist for the wide array of benign changes in breast pathology.

The data from this study support the use of WSI technology for breast needle biopsy diagnosis. The tissue presented to pathologists in this study varied in size but generally measured 2-3 mm in width and 10-20 mm in length (Figure 3). None of the pathologists expressed concern over the small size of tissue available for diagnosis, and no reports of tissue insufficient for a diagnosis were made. Diagnostic discordance was characterized by known diagnostic challenges, variations in terminology and interpretive error. This study extends previous work demonstrating the equivalence of WSI to LM in general surgical pathology to challenging, specialty pathology diagnoses. Additional research is required to document minimum image capture resolution and computer monitor viewing characteristics for optimal WSI performance. Such data are important for the FDA and CAP to develop WSI guidelines and standards for use, as the ACR has done for radiology.

References

1. Faison TA. FDA regulation of whole slide imaging (WSI) devices: Current thoughts. U.S. Food and Drug Administration, Washington D.C., February 14, 2012. ftp://ftp.cdc.gov/pub/CLIAC_meeting_presentations/pdf/Addenda/cliac0212/Tab_15_Faison_CLIAC_2012Feb14_Whole_Slide_Imaging.pdf. Accessed 10 Jan 2014.

2. Titus K. Regulators scanning the digital scanners. CAP Today. 2012;26:1-56-62.

3. Dunn BE, Choi H, Recla DL, Kerr SE, Wagenman BL. Robotic surgical telepathology between the Iron Mountain and Milwaukee Department of Veterans Affairs medical centers: A 12-year experience. Hum Pathol. 2009;40(8):1092-1099.

4. Dunn BE, Choi H, Almagro U, Recla DL, Krupinski EA, Weinstein RS. Routine surgical telepathology in the Department of Veterans Affairs: Experience-related improvements in pathologist performance in 2200 cases. Telemedicine Journal. 1999;5(4):323.

5. Weinstein RS, Descour MR, Liang C, et al. Telepathology overview: From concept to implementation. Hum Pathol. 2001;32(12):1283-1299.

6. Weinstein RS, Graham AR, Richter LC, et al. Overview of telepathology, virtual microscopy, and whole slide imaging: Prospects for the future. Hum Pathol. 2009;40(8):1057-1069.

7. Campbell WS, Lele SM, West WW, Lazenby AJ, Smith LM, Hinrichs SH. Concordance between whole-slide imaging and light microscopy for routine surgical pathology. Hum Pathol. 2012;43:1739-1744.

8. Bauer TW, Schoenfield L, Slaw RJ, Yerian L, Sun Z, Henricks WH. Validation of whole slide imaging for primary diagnosis in surgical pathology. Arch Pathol Lab Med. 2013;137:518-524.

9. Jukic DM, Drogowski LM, Martina J, Parwani AV. Clinical examination and validation of primary diagnosis in anatomic pathology using whole slide digital images. Arch Pathol Lab Med. 2011;135:372-378.

10. Rodriguez-Urrego PA, Cronin AM, Al-Ahmadie HA, et al. Interobserver and intraobserver reproducibility in digital and routine microscopic assessment of prostate needle biopsies. Hum Pathol. 2011;42:68-74.

11. Molnar B, Berczi L, Diczhazy C, et al. Digital slide and virtual microscopy based routine and telepathology evaluation of routine gastrointestinal biopsy specimens. J Clin Pathol. 2003;56:433-438.

12. Al-Janabi S, Huisman A, Vink A, et al. Whole slide images for primary diagnostics of gastrointestinal tract pathology: A feasibility study. Hum Pathol. 2012;43:702-7.

13. Al Habeeb A, Evans A, Ghazarian D. Virtual microscopy using whole-slide imaging as an enabler for teledermatopathology: A paired consultant validation study. J Pathol Inform. 2012;3:2.

14. Koch LH, Lampros JN, Delong LK, Chen SC, Woosley JT, Hood AF. Randomized comparison of virtual microscopy and traditional glass microscopy in diagnostic accuracy among dermatology and pathology residents. Hum Pathol. 2009;40:662-667.

15. Velez N, Jukic D, Ho J. Evaluation of 2 whole-slide imaging applications in dermatopathology. Hum Pathol. 2008;39:1341-1349.

16. Pantanowitz L, Sinard JH, Henricks WH, et al. Validating whole slide imaging for diagnostic purposes in pathology: Guideline from the College of American Pathologists Pathology and Laboratory Quality Center. Arch Pathol Lab Med. 2013;137:1798-810.

17. Lopez AM, Graham AR, Barker GP, et al. Virtual slide telepathology enables an innovative telehealth rapid breast care clinic. Hum Pathol. 2009;40:1082-1091.

18. Yagi Y, Gilbertson JR. A relationship between slide quality and image quality in whole slide imaging (WSI). Diagn Pathol. 2008;3 Suppl 1:S12.

19. Clarke GM, Zubovits JT, Katic M, Peressotti C, Yaffe MJ. Spatial resolution requirements for acquisition of the virtual screening slide for digital whole-specimen breast histopathology. Hum Pathol. 2007;38:1764-1771.

20. Yagi Y, Gilbertson JR. The importance of optical optimization in whole slide imaging (WSI) and digital pathology imaging. Diagn Pathol. 2008;3 Suppl 1:S1.

21. Yagi Y. Color standardization and optimization in whole slide imaging. Diagn Pathol. 2011;6 Suppl 1:S15.

22. ACR practice guidelines and technical standards. http://www.acr.org/~/media/ACR/Documents/PGTS/toc.pdf. Accessed 10 Jan 2014.

23. Monticciolo DL. Histologic grading at breast core needle biopsy: Comparison with results from the excised breast specimen. Breast J. 2005;11:9-14.

24. Schuh F, Biazus JV, Resetkova E, Benfica CZ, Edelweiss MI. Reproducibility of three classification systems of ductal carcinoma in situ of the breast using a web-based survey. Pathol Res Pract. 2010;206:705-711.

25. Wells WA, Carney PA, Eliassen MS, Grove MR, Tosteson AN. Pathologists' agreement with experts and reproducibility of breast ductal carcinoma-in-situ classification schemes. Am J Surg Pathol. 2000;24:651-659.

26. Raab SS, Nakhleh RE, Ruby SG. Patient safety in anatomic pathology: Measuring discrepancy frequencies and causes. Arch Pathol Lab Med. 2005;129:459-466.

27. Jain RK, Mehta R, Dimitrov R, et al. Atypical ductal hyperplasia: Interobserver and intraobserver variability. Mod Pathol. 2011;24:917-923.

28. Tsuda H, Akiyama F, Kurosumi M, Sakamoto G, Watanabe T. Monitoring of interobserver agreement in nuclear atypia scoring of node-negative breast carcinomas judged at individual collaborating hospitals in the National Surgical Adjuvant Study of Breast Cancer (NSAS-BC) protocol. Jpn J Clin Oncol. 1999;29:413-420.

29. Cramer SF. Interobserver variability in surgical pathology. In: Weinstein RS, Graham AR, eds. Advances in pathology and laboratory medicine. Volume 9 ed. St. Louis, MO: Mosby; 1996:3.

30. Frable WJ. Surgical pathology--second reviews, institutional reviews, audits, and correlations: What's out there? Error or diagnostic variation? Arch Pathol Lab Med. 2006;130(5):620-625.

31. Raab SS, Grzybicki DM, Janosky JE, et al. Clinical impact and frequency of anatomic pathology errors in cancer diagnoses. Cancer. 2005;104(10):2205-2213.

32. Raab SS, Nakhleh RE, Ruby SG. Patient safety in anatomic pathology: Measuring discrepancy frequencies and causes. Arch Pathol Lab Med. 2005;129(4):459-466.

33. Weinberg DS, Allaert FA, Dusserre P, et al. Telepathology diagnosis by means of digital still images: An international validation study. Hum Pathol. 1996;27(2):111-118.

34. Piccolo D, Soyer HP, Burgdorf W, et al. Concordance between telepathologic diagnosis and conventional histopathologic diagnosis: A multiobserver store-and-forward study on 20 skin specimens. Arch Dermatol. 2002;138(1):53-58.

35. Chorneyko K, Giesler R, Sabatino D, et al. Telepathology for routine light microscopic and frozen section diagnosis. Am J Clin Pathol. 2002;117(5):783-790.

36. Krupinski EA, Silverstein LD, Hashmi SF, Graham AR, Weinstein RS, Roehrig H. Observer performance using virtual pathology slides: Impact of LCD color reproduction accuracy. J Digit Imaging. 2012;25(6):738-743.

37. Yagi Y, Yoshioka S, Kyusojin H, et al. Ultra high speed whole slide image viewing system. Anal Cell Pathol (Amst). 2011;34:265-275.

38. Zhang, Stenback, Wardrop. Interval estimation of the process capability index. Communications in Statistics: Theory and Methods. 1990;19:4455-4470.

Legends

Figure 1 – Viewing station configuration

Figure 2 – Diagnostic concordance comparisons between WSI, LM and pathologist. Comparison A: Interoperator comparison of original LM diagnoses and WSI diagnoses. Comparison B: Intraoperator comparison of pathologists' WSI diagnoses and their LM diagnoses. Comparison C: Interoperator comparison of original LM diagnoses to study pathologists' LM diagnoses.

Figure 3A – Example of tissue width (3 mm)

Figure 3B – Example of tissue length (9 mm)

Table 1 – 2 x 2 study design describing each image capture and viewing monitor characteristics.

Table 2 – Interobserver diagnostic concordance rates between WSI and original LM by pathologist and in total.

Table 3 – Discordant diagnoses listed by case number. Diagnoses listed in the second column represent the original LM diagnoses, and diagnoses listed in the third column represent the discordant WSI diagnoses. As shown, two distinct, discordant diagnoses were rendered in two cases. Each diagnosis listed is attributed to a single pathologist.

Table 4 – Examples of diagnoses scored as concordant with clinically insignificant differences

Table 5 – Intraobserver diagnostic concordance rates between WSI and LM by the same pathologist and Interobserver diagnostic concordance rates between LM and original LM by pathologist and in total.

Table 6 – Discordant diagnoses listed by case number. Diagnoses listed in the second column represent the original LM diagnoses, and diagnoses listed in the third column represent the discordant LM diagnoses rendered by study pathologists.

Table 7 – Probability of discordant WSI diagnoses to LM diagnoses by image resolution and monitor type groupings. (Generalized estimating equations (GEE) calculations [38])

Pathologist

1.3 MP monitor

4 MP monitor

Pathologist 1

Parts 1 - 23

Parts 70 - 92

Pathologist 2

Parts 24 – 46

Pathologist 3

Parts 47 – 69

Pathologist 4

Parts 70 - 92

Parts 1 - 23

Parts 24 - 46

40x image capture

20x image capture

Parts 47 - 69

Parts 47 - 69

Parts 24 - 46

Pathologist 2

Parts 70 - 92

Parts 47 - 69

Pathologist 1

Pathologist 3

Parts 70 - 92

Parts 24 - 46

Parts 1 - 23

Pathologist 4

Parts 1 - 23

Table 1 – 2 x 2 study design describing each image capture and viewing monitor characteristics.

Interobserver WSI Concordance Score to original LM diagnosis | Pathologist 1 | Pathologist 2 | Pathologist 3 | Pathologist 4 | Aggregate
Concordant | 28 (30.4%) | 16 (17.4%) | 32 (34.8%) | 37 (40.2%) | 113 (30.7%)
Clinically Insignificant Difference | 59 (64.1%) | 75 (81.5%) | 56 (60.9%) | 53 (57.6%) | 243 (66.0%)
Discordant | 5 (5.4%) | 1 (1.1%) | 4 (4.3%) | 2 (2.2%) | 12 (3.3%)

Table 2 – Interobserver diagnostic concordance rates between WSI and original LM by pathologist and in total.
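
The aggregate column of Table 2 can be cross-checked against the per-pathologist counts. The following is a minimal Python sketch of that arithmetic. The names `TABLE2_COUNTS` and `summarize` are illustrative and not part of the study, and the order of the four pathologist columns is assumed (the totals do not depend on it). The combined share of reads scored concordant, with or without clinically insignificant differences, is a derived quantity computed here only for illustration.

```python
# Per-pathologist counts from Table 2:
# (concordant, clinically insignificant difference, discordant).
# Column order is an assumption; the aggregate does not depend on it.
TABLE2_COUNTS = [
    (28, 59, 5),
    (16, 75, 1),
    (32, 56, 4),
    (37, 53, 2),
]


def summarize(counts):
    """Sum each score category and express it as a percentage of all reads."""
    totals = tuple(sum(column) for column in zip(*counts))
    n_reads = sum(totals)
    percents = tuple(round(100.0 * t / n_reads, 1) for t in totals)
    return totals, n_reads, percents


totals, n_reads, percents = summarize(TABLE2_COUNTS)

# Share of reads scored concordant, with or without clinically
# insignificant differences (derived here, for illustration only).
non_discordant_pct = round(100.0 * (totals[0] + totals[1]) / n_reads, 1)

print(totals, n_reads, percents, non_discordant_pct)
```

Each pathologist's three counts sum to 92 reads, so the aggregate is based on 368 reads in total.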

Case Number | Original Light Microscope Diagnosis | Discordant Whole Slide Image Diagnoses
6 | Left breast biopsies: benign fibrous breast tissue with foci of fibroadenomatoid change; microcalcifications identified | Invasive mammary carcinoma with lobular features.
17 | Ductal carcinoma in situ with extensive periductal sclerosis and inflammation. Focus suspicious but not diagnostic for microinvasion. Solid growth pattern with high nuclear grade. Microcalcifications within DCIS and necrosis present. | Invasive ductal adenocarcinoma, grade 3 with DCIS and without angiolymphatic invasion.
31 | Fragmented portions of focally calcified sclerosing papilloma and associated benign breast tissue. No atypia or malignancy identified | Single microscopic focus of Atypical Ductal Hyperplasia (ADH)/DCIS. Fibrocystic change with sclerosing adenosis
32 | Invasive ductal adenocarcinoma. Nottingham grade 2/3, no DCIS identified | 1. Fibrocystic changes with myoepithelial hyperplasia. Focal rupture of duct with chronic inflammation. 2. Mucocele-like lesion focally with an area with crush artifact. Differential includes atypical ductal hyperplasia, sclerosed papilloma and in situ carcinoma
40 | DCIS with cribriform to micropapillary growth pattern. Low nuclear grade without necrosis. Microcalcification in DCIS. | 1. Fibrocystic change with usual intraductal hyperplasia, multifocal microcalcification, prominent sclerosing adenosis. 2. Fibrocystic mastopathy with sclerosing adenosis and ductal hyperplasia of the usual type with associated microcalcifications within benign ductal lumina. (Possible ADH but favor usual hyperplasia)
48 | Benign breast tissue with proliferative fibrocystic changes including microcyst formation, apocrine metaplasia, and non-atypical hyperplasia | Fibrocystic mastopathy with ductal hyperplasia and areas of columnar cell and micropapillary change with atypia; associated microcalcifications noted within ductal lumina.
53 | Benign breast tissue with proliferative fibrocystic changes including non-atypical ductal hyperplasia, fibrosis and duct ectasia | 1. Atypical micropapillary and papillary ductal epithelial hyperplasia 2. Atypical ductal hyperplasia
56 | Benign breast parenchyma with proliferative fibrocystic changes including sclerosing adenosis with associated microcalcifications | Proliferative fibrocystic change with atypical ductal hyperplasia focally bordering on DCIS; flat epithelial atypia, sclerosing adenosis, microcalcifications
71 | Invasive carcinoma with features suggestive of pleomorphic lobular type. Nottingham grade 2/3. | Benign breast tissue with focal microcalcification and focal fibroadenomatoid change.

Table 3 – Discordant diagnoses listed by case number. Diagnoses listed in the second column represent the original LM diagnoses, and diagnoses listed in the third column represent the discordant WSI diagnoses. As shown, two distinct, discordant diagnoses were rendered in two cases. Each diagnosis listed is attributed to a single pathologist.

Case Number | Original Diagnosis | Concordant diagnosis with clinically insignificant differences | Reason for scoring as concordant with clinically insignificant differences
1 | Benign breast tissue with focal adenosis, non-atypical hyperplasia and associated microcalcifications | Proliferative fibrocystic changes | No mention of adenosis, microcalcifications, or ductal hyperplasia
6 | Benign breast tissue without foci of fibroadenomatoid change and microcalcifications | Fragments of benign breast tissue with fibrocystic change and focal microcalcification | No mention of fibroadenoma
9 | DCIS with papillary, micropapillary growth pattern. Low nuclear grade. No necrosis. Microcalcification in DCIS | Intraductal papilloma with DCIS. Microcalcifications identified | No mention of grade, growth pattern or necrosis. Report of papilloma not mentioned in original diagnosis
15 | DCIS with papillary, micropapillary and cribriform growth pattern. Intermediate nuclear grade. Necrosis focally present and microcalcification present in DCIS | DCIS with papillary and solid growth patterns, high nuclear grade with focal necrosis and microcalcifications | Nuclear grade different. Growth patterns different.
18 | Invasive ductal adenocarcinoma. Nottingham grade 2/3. Associated DCIS with solid growth pattern, intermediate nuclear grade, non-necrotic | Invasive ductal carcinoma, at least grade 2/3. | No mention of DCIS and characteristics of DCIS

Table 4 – Examples of diagnoses scored as concordant with clinically insignificant differences

Intraobserver WSI Concordance Score to LM diagnosis | Pathologist 1 | Pathologist 2 | Pathologist 3 | Pathologist 4 | Aggregate
Concordant | 52 (56.5%) | 39 (42.4%) | 44 (47.8%) | 29 (31.5%) | 164 (44.6%)
Clinically Insignificant Difference | 36 (39.1%) | 46 (50.0%) | 43 (46.7%) | 61 (66.3%) | 186 (50.5%)
Discordant | 4 (4.3%) | 7 (7.6%) | 5 (5.4%) | 2 (2.2%) | 18 (4.9%)

Interobserver LM Concordance to original LM diagnosis | Pathologist 1 | Pathologist 2 | Pathologist 3 | Pathologist 4 | Aggregate
Concordant | 34 (37.0%) | 17 (18.5%) | 32 (34.8%) | 33 (35.9%) | 116 (31.5%)
Clinically Insignificant Difference | 56 (60.9%) | 70 (76.1%) | 57 (62.0%) | 59 (64.1%) | 242 (65.8%)
Discordant | 2 (2.2%) | 5 (5.4%) | 3 (3.3%) | 0 (0%) | 10 (2.7%)

Table 5 – Intraobserver diagnostic concordance rates between WSI and LM by the same pathologist and Interobserver diagnostic concordance rates between LM and original LM by pathologist and in total.

Case Number | Original Light Microscope Diagnosis | Study Pathologist Diagnostic resolutions/discrepancies when cases reviewed by light microscope
6 | Left breast biopsies: benign fibrous breast tissue with foci of fibroadenomatoid change; microcalcifications identified | All WSI discrepancies resolved
9 | DCIS, low grade with papillary and cribriform growth pattern, microcalcifications present | 1. Benign fibrocystic change with usual ductal hyperplasia, intraductal papilloma, and microcalcification. 2. Intraductal papilloma with focal atypical ductal hyperplasia with calcifications 3. Atypical intraductal papilloma (with at least atypical ductal hyperplasia, bordering on low-grade DCIS) with associated microcalcifications. Recommend excision of lesion
17 | Ductal carcinoma in situ with extensive periductal sclerosis and inflammation. Focus suspicious but not diagnostic for microinvasion. Solid growth pattern with high nuclear grade. Microcalcifications within DCIS and necrosis present. | Invasive ductal carcinoma, high grade, ductal carcinoma in situ, high grade with comedo necrosis
20 | Benign breast tissue with foci of adenosis with associated microcalcifications | Flat epithelial atypia with calcifications
31 | Fragmented portions of focally calcified sclerosing papilloma and associated benign breast tissue. No atypia or malignancy identified | All WSI discrepancies resolved
32 | Invasive ductal adenocarcinoma. Nottingham grade 2/3, no DCIS identified | All WSI discrepancies resolved
40 | DCIS with cribriform to micropapillary growth pattern. Low nuclear grade without necrosis. Microcalcification in DCIS. | 1. Fibrocystic change with usual intraductal hyperplasia, multifocal microcalcification, prominent sclerosing adenosis. (Dr.1 stays discord without changes to WSI dx) 2. Fibrocystic changes with sclerosing adenosis, radial scar, and focal atypical ductal hyperplasia. Adenoma with lactational change 3. Fibrocystic mastopathy with sclerosing adenosis and ductal hyperplasia of the usual type with associated microcalcifications within benign ductal lumina. (Possible ADH but favor usual hyperplasia)
48 | Benign breast tissue with proliferative fibrocystic changes including microcyst formation, apocrine metaplasia, and non-atypical hyperplasia | All WSI discrepancies resolved
50 | Benign breast tissue with proliferative fibrocystic changes and fibroadenoma | Phyllodes tumor (two diagnoses)
53 | Benign breast tissue with proliferative fibrocystic changes including non-atypical ductal hyperplasia, fibrosis and duct ectasia | All WSI discrepancies resolved
56 | Benign breast parenchyma with proliferative fibrocystic changes including sclerosing adenosis with associated microcalcifications | All WSI discrepancies resolved
71 | Invasive carcinoma with features suggestive of pleomorphic lobular type. Nottingham grade 2/3. | All WSI discrepancies resolved

Table 6 – Discordant diagnoses listed by case number. Diagnoses listed in the second column represent the original LM diagnoses, and diagnoses listed in the third column represent the discordant LM diagnoses rendered by study pathologists.

Resolution | Monitor | Mean probability discordant | Standard Error | 95% Confidence Limits, Lower | 95% Confidence Limits, Upper
1-20x | 1-Standard | 0.007713 | 0.01052 | 0.000526 | 0.103
1-20x | 2-Calibrated | 0.03087 | 0.01513 | 0.01168 | 0.07906
2-40x | 1-Standard | 0.05041 | 0.02133 | 0.02169 | 0.1128
2-40x | 2-Calibrated | 0.03318 | 0.01886 | 0.01073 | 0.09798

Table 7 – Probability of discordant WSI diagnoses to LM diagnoses by image resolution and monitor type groupings. (Generalized estimating equations (GEE) calculations [26])
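
The confidence limits in Table 7 are not symmetric about the mean, which suggests they were computed on a transformed scale. The manuscript does not state the method. The Python sketch below assumes a 95% Wald interval on the logit scale with a delta-method standard error, an assumption that reproduces the tabulated limits to within about 1% when each mean is paired with its standard error as listed in the code. The names `TABLE7`, `logit_wald_ci` and `max_relative_error` are illustrative and do not come from the study.

```python
import math

# Rows of Table 7: (resolution, monitor, mean, standard error, lower, upper).
TABLE7 = [
    ("20x", "Standard",   0.007713, 0.01052, 0.000526, 0.103),
    ("20x", "Calibrated", 0.03087,  0.01513, 0.01168,  0.07906),
    ("40x", "Standard",   0.05041,  0.02133, 0.02169,  0.1128),
    ("40x", "Calibrated", 0.03318,  0.01886, 0.01073,  0.09798),
]


def expit(x):
    """Inverse of the logit transform."""
    return 1.0 / (1.0 + math.exp(-x))


def logit_wald_ci(p, se, z=1.959964):
    """Assumed method: 95% Wald interval on the logit scale, back-transformed."""
    logit = math.log(p / (1.0 - p))
    se_logit = se / (p * (1.0 - p))  # delta-method standard error of logit(p)
    return expit(logit - z * se_logit), expit(logit + z * se_logit)


def max_relative_error(rows):
    """Largest relative gap between recomputed and tabulated limits."""
    worst = 0.0
    for _, _, p, se, lower, upper in rows:
        lo, hi = logit_wald_ci(p, se)
        worst = max(worst, abs(lo - lower) / lower, abs(hi - upper) / upper)
    return worst


print(f"max relative error: {max_relative_error(TABLE7):.4f}")
```

Agreement under this assumption does not establish that the authors used this method; it only shows that the tabulated means, standard errors and limits are mutually consistent with it.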

Fig. 1

Figure 2 – Diagnostic concordance comparisons between WSI, LM and pathologist. Comparison A: Interoperator comparison of original LM diagnoses and WSI diagnoses. Comparison B: Intraoperator comparison of pathologists' WSI diagnoses and their LM diagnoses. Comparison C: Interoperator comparison of original LM diagnoses to study pathologists' LM diagnoses.

Fig. 3A

Fig. 3B