Evaluation of whole slide image immunohistochemistry interpretation in challenging prostate needle biopsies


Human Pathology (2008) 39, 564–572

www.elsevier.com/locate/humpath

Original contribution

Jeffrey L. Fine MD a,⁎, Dana M. Grzybicki MD, PhD b, Russell Silowash BS b, Jonhan Ho MD a,c,d, John R. Gilbertson MD a,1, Leslie Anthony MA d, Robb Wilson MA b, Anil V. Parwani MD, PhD a,d, Sheldon I. Bastacky MD a, Jonathan I. Epstein MD e, Drazen M. Jukic MD, PhD a,c,d

a Department of Pathology, University of Pittsburgh School of Medicine, Pittsburgh, PA 15213, USA
b Department of Biomedical Informatics, University of Pittsburgh School of Medicine, Pittsburgh, PA 15261, USA
c Department of Dermatology, University of Pittsburgh School of Medicine, Pittsburgh, PA 15232, USA
d Innovative Medical and Information Technologies (IMITs) Center, University of Pittsburgh Medical Center, Pittsburgh, PA 15203, USA
e Department of Pathology, Johns Hopkins Medical Institutions, Baltimore, MD 21231, USA

Received 19 June 2007; revised 2 August 2007; accepted 3 August 2007

Keywords: Digital pathology; Telepathology; Whole slide images; Virtual slides; Virtual microscopy; Prostate biopsy; Prostate carcinoma; Immunohistochemistry

Summary  Whole slide images (WSIs), also known as virtual slides, can support electronic distribution of immunohistochemistry (IHC) stains to pathologists who rely on remote sites for these services. This may lead to improvement in turnaround times, reduction of courier costs, fewer errors in slide distribution, and automated image analyses. Although this approach is practiced de facto today in some large laboratories, there are no clinical validation studies of this approach. Our retrospective study evaluated the interpretation of IHC stains performed in difficult prostate biopsies using WSIs. The study included 30 foci with IHC stains identified by the original pathologist as both difficult and pivotal to the final diagnosis. WSIs were created from the glass slides using a scanning robot (T2, Aperio Technologies, Vista, CA). An evaluation form was designed to capture data in 2 phases: (1) interpretation of WSIs and (2) interpretation of glass slides. Data included stain interpretations, diagnoses, and other parameters such as time required to diagnose and image/slide quality. Data were also collected from an expert prostate pathologist, consensus meetings, and a poststudy focus group. WSI diagnostic validity (intraobserver pairwise κ statistics) was "almost perfect" for 1 pathologist, "substantial" for 3 pathologists, and "moderate" for 1 pathologist. Diagnostic agreement between the final/consensus diagnoses of the group and those of the domain expert was "almost perfect" (κ = 0.817). Except for one instance, WSI technology was not felt to be the cause of disagreements. These results are encouraging and compare favorably with other efforts to quantify diagnostic variability in surgical pathology. With thorough training, careful validation of specific applications, and regular postsignout review of glass IHC slides (eg, quality assurance review), WSI technology can be used for IHC stain interpretation.
© 2008 Elsevier Inc. All rights reserved.

☆ This work was supported by funding from the US Air Force (contract no. DAMD17-03-2-0017) administered by the US Army Medical Research Acquisition Activity, Ft Detrick, MD. The content of the information does not necessarily reflect the position or policy of the US government, and no official endorsement should be inferred.
⁎ Corresponding author. E-mail address: [email protected] (J. L. Fine).
1 Dr Gilbertson is currently with the Department of Pathology, Case Western Reserve University, Cleveland, OH.

doi:10.1016/j.humpath.2007.08.007

1. Introduction Whole slide images (WSIs), also known as virtual slides, are digital facsimiles of entire histopathologic sections originally mounted on glass microscope slides [1,2]. These images are viewed using interactive software, termed “virtual microscopy software,” which enables the user to adjust magnification and navigate to any portion of the image (Fig. 1). Until recently, application of WSI technology has primarily been limited to education and proficiency testing [3,4], but newer studies describe its potential usefulness in quality assurance activities and in the analysis of pathologists' diagnostic decision making [5,6].

A small number of previously published validation studies, including our own, support the clinical effectiveness of WSIs in settings and situations in which static and dynamic digital imaging telepathology have been shown to be useful [7-10]. Feasibility studies demonstrating the use of WSIs in real-time anatomic pathology practice are needed before this technology can practicably be integrated into day-to-day practice, but such studies are lacking for this relatively new imaging modality. Pathologists interested in digital imaging at the University of Pittsburgh Medical Center (UPMC) have been actively involved in the development and evaluation of novel WSI applications and resources [5,7], with the goal of improving the quality and efficiency of anatomic pathology services not only for education and research but also for the routine clinical practice of anatomic pathology.

Fig. 1  WSI viewing software, also known as "virtual microscopy" software (ImageScope, version 7.3.36.1042, Aperio Technologies). This is a prostate biopsy (prostate adenocarcinoma) stained with p63 that was originally scanned at 20× objective magnification, zoomed in to maximum magnification. The configuration is typical, with a low-magnification thumbnail image in the upper right corner, a high-magnification image comprising most of the viewing area, magnification controls in the upper left area, and several "annotation" tool buttons (ie, for measuring or marking the WSI). The user can navigate to any portion of the image using the computer mouse (ie, "click and drag"); magnification can be changed by rolling the mouse wheel or by clicking on the visible magnification controls. Also present are thumbnails of other available images (left side), permitting rapid switching between different stains in this case. Finally, a magnification tool is present (thick-lined rectangle near center); this tool permits even more magnification of interesting areas and aids detection of subtle features such as faint nucleoli.

The UPMC pathology laboratory practice structure is based on a center of excellence (COE) model, with many UPMC pathologists practicing in one subspecialty area based on specific tissue types. Examination and interpretation of patient specimens are commonly performed by expert pathologists who have received subspecialty training in the histopathology of a specific tissue type (eg, pulmonary pathology, genitourinary [GU] pathology). The UPMC GU COE is located at Shadyside Hospital, 1 of 18 hospitals in the UPMC health care system. On average, 1000 prostate needle biopsy specimens are interpreted at the UPMC GU COE per year, the majority performed for prostate cancer screening. The technical part of diagnostic immunohistochemistry (IHC) tests, commonly required for accurate diagnosis of prostate needle biopsy specimens, is performed at the UPMC IHC laboratory, which is located at a different but geographically close hospital (Presbyterian Hospital). Interpretation of IHC by pathologists at the UPMC GU COE therefore requires the physical transfer of glass slides between hospitals, resulting in decreased efficiency and increased costs. Similarly, pathologists in private practice rely exclusively on distant commercial laboratories for performance of IHC because most private practice laboratories do not have on-site IHC services. In addition, newer and increasingly stringent laboratory accreditation requirements for IHC testing may result in increased outsourcing of IHC by laboratories with insufficient volume to validate some in-house tests [11]. The increasing necessity for IHC testing at distant sites supports a useful role for WSIs, with digital pathology applications improving the efficiency of transmission of IHC information to ordering pathologists.

In previously published studies, we have demonstrated the validity of WSIs for diagnosis of hematoxylin and eosin (H&E)–stained prostate needle biopsy slides [7]. To our knowledge, validity studies focused on the interpretation of IHC stained sections using WSIs have not been published. The goal of this study was to measure the diagnostic validity of prostate needle biopsy WSIs in a sample of challenging cases that required the examination and interpretation of IHC stained slides for accurate diagnosis.

2. Materials and methods

2.1. Study design

This evaluation study involved retrospective review of a targeted sample of prostate needle core biopsies obtained from the UPMC GU COE anatomic pathology archives.

2.2. Case selection

An initial list of approximately 60 consecutive prostate needle biopsies, examined 6 to 9 months before the study and for which IHC was ordered by the examining pathologist as part of the diagnostic process, was generated by retrospective review of the UPMC anatomic pathology laboratory information system (CoPath Plus, Cerner Corporation, Kansas City, MO). Cases were de-identified by eliminating all patient, pathologist, clinician, and accession identifiers (as well as any other HIPAA safe harbor data) from the slides and clinical reports. A pathologist not involved in the interpretive phases of the study reviewed the anatomic pathology reports to identify prostate biopsy cases in which (1) the IHC stains were ordered specifically to confirm or rule out cancer in an atypical or suspicious area with no areas of obvious cancer elsewhere in the slide, (2) the results of the individual stains were clearly stated in the clinical report, and (3) the resulting IHC slides contained the lesion in question (ie, the region of interest was not cut through, making stains noncontributory). Archived slides from these cases were pulled along with appropriate IHC controls. The slides were then examined to select cases in which the suspicious area(s) that prompted IHC ordering could be clearly identified, either from markings on the slide or from specific descriptions in the clinical report (the area of interest was previously marked/dotted on the H&E-stained glass slide in approximately 90% of cases) (Fig. 2). For purposes of this study, each focus constituted a "case" and included H&E levels, IHC stains, and IHC controls. Selected slides were labeled with assigned case numbers and information regarding the block and stain. In one instance, a single block contained 2 distinct diagnostic foci that otherwise met the inclusion criteria; each focus was treated as a separate case (this was clearly marked on all glass microscope slides and WSIs).

Fig. 2  Representative focus of prostate adenocarcinoma with dot, stained with p63. Low-magnification view of a typical biopsy with a focus that was marked with blue ink (WSI created at 20× objective magnification). For the purposes of this study, this single dotted focus was designated as a study case (the entire case included an H&E stain and 3 IHC stains).


2.3. Apparatus

Glass slides were scanned using an automated WSI scanning robot (T2, Aperio Technologies, Vista, CA) with a 20× microscope objective at a spatial sampling period of 0.47 μm per pixel. WSIs were viewed using desktop personal computers with vendor-provided viewer software. Both Web-based and stand-alone viewer software were used. Desktop computers were of a standard institutional configuration (eg, Pentium processor, 512 to 1024 megabytes of RAM, Microsoft Windows 2000/XP with connectivity to the institutional network). Standard surgical pathology microscopes (BX45, Olympus America, Center Valley, PA) were used for evaluation of glass slides, and a departmental multihead microscope (BX51, Olympus) was used during consensus conferences.
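To give a sense of the data volumes this sampling period implies, the back-of-the-envelope calculation below is a sketch only: the scanned tissue area is an assumed, illustrative figure, not one reported in the study.

```python
# Illustrative arithmetic only: the scanned area is assumed, not measured.
# The sampling period (0.47 um per pixel at a 20x objective) is the scanner
# setting reported above.
SAMPLING_UM_PER_PX = 0.47
AREA_MM = (15.0, 5.0)  # hypothetical scanned region (width, height) in mm

width_px = AREA_MM[0] * 1000 / SAMPLING_UM_PER_PX   # ~31,915 px
height_px = AREA_MM[1] * 1000 / SAMPLING_UM_PER_PX  # ~10,638 px
pixels = width_px * height_px                       # ~3.4e8 px

uncompressed_gb = pixels * 3 / 1024**3              # 24-bit RGB
print(f"{width_px:.0f} x {height_px:.0f} px, ~{uncompressed_gb:.2f} GB uncompressed")
```

Even a modest biopsy region therefore approaches a gigabyte uncompressed, which is why WSI systems rely on tiled, compressed file formats and on viewers that fetch only the region currently being examined.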

2.4. Study subjects

The study took place in the Division of Anatomic Pathology at UPMC Presbyterian and Shadyside Hospitals and included 5 volunteer pathologists as subjects. Subjects included 2 pathology fellows and 3 staff pathologists with fellowship training in GU pathology. A well-known prostate pathology expert from Johns Hopkins University participated as a consulting domain expert. Additional study personnel included a project manager, Laboratory Information System (LIS) staff, imaging support staff, and technology evaluation researchers.

2.5. Evaluation tool

A unique evaluation tool was developed for use in this study. The tool was a hard copy case-assessment form containing items for recording multiple case characteristics, including diagnosis, slide/image quality, case complexity, subject diagnostic confidence, and time to complete case examination. All items were recorded as nominal categorical variables except for time to complete, which was recorded as an ordinal categorical variable. Pathologic diagnosis was recorded as adenocarcinoma, high-grade prostatic intraepithelial neoplasia (PIN), ATYPIN, atypical, or benign. The term ATYPIN was defined as a "focus of high-grade PIN with adjacent small atypical glands" [12]; in other words, although these glands could represent a small focus of invasive carcinoma, they could also represent oblique (tangential) sectioning of high-grade PIN glands. Atypical was defined as a focus suspicious for, but not completely diagnostic of, neoplasia. Slide/image quality was recorded on a 3-category scale representing poor, diagnostic, and excellent. Case complexity and diagnostic confidence variables were recorded on a 3-category scale representing low/poor, medium/acceptable, and high/good. Time to complete the case was recorded in ordinal 15-minute categories of time ranging from less than 15 minutes to more than 60 minutes.

2.6. Data collection

Evaluation forms were independently completed by the 5 subjects for all 30 cases in 2 phases: (1) review of WSI H&E and the corresponding virtual (WSI) IHC stained sections and (2) review (or rather rereview) of IHC stained glass slides. To measure the validity of individual case diagnoses, a domain expert independently reviewed the same WSIs of the 30 cases and rendered his pathologic diagnosis for each case; this was considered the "gold standard." WSI images of the H&E and IHC stains for the 30 cases were transmitted to the domain expert, who also completed a case assessment form for each of the 30 cases. Data were entered into a Microsoft Access database by a technology evaluation researcher.

2.7. Consensus conferences

After subjects completed their individual assessments of the cases, they participated in several face-to-face consensus conferences in order to arrive at a diagnostic consensus for each case in which there was diagnostic disagreement between any of the subjects. Subjects collectively viewed WSIs to arrive at the consensus diagnoses, and the original glass slides were on hand if subjects wanted to view them through a multihead microscope. Viewing of glass slides from cases occurred primarily to remind subjects of the glass slide appearance of specific foci and/or to compare the appearance of specific foci across the 2 mediums. Consensus diagnoses were compared with the original UPMC-assigned diagnoses and with the gold-standard diagnoses. Discrepancies were discussed, including subjects' assessments regarding the role of the WSI medium, if any, in diagnostic discrepancies.

2.8. Focus group

A focus group involving the 5 subjects and the domain expert was conducted at the conclusion of the last consensus conference. The goal of the focus group was to obtain additional qualitative information regarding perceptions of the strengths and weaknesses of WSIs for clinical applications. The focus group was audiotaped, and hard copy written notes were recorded by technology evaluation researchers.

2.9. Data analysis

All quantitative data analysis was performed using the statistical software package SPSS (version 13.0, SPSS Inc, Chicago, IL). For cases where subjects recorded more than one diagnosis (eg, ATYPIN and adenocarcinoma), the most severe diagnosis was used for data analysis. WSI diagnostic validity was measured by comparing the level of intraobserver agreement between the WSI and glass slide diagnoses assigned to each case by each of the 5 subjects. In addition, WSI diagnostic validity was examined by comparing the level of interobserver agreement between subjects when examining WSIs (phase 1) with the level of interobserver agreement between subjects when examining glass slides (phase 2). Diagnostic validity for the study cases was measured by comparing the interobserver agreement between the group consensus diagnoses and the gold-standard diagnoses. Intra- and interobserver agreement was measured using pairwise κ statistics. κ values were interpreted as poor (less than or equal to 0), slight (0.01-0.20), fair (0.21-0.40), moderate (0.41-0.60), substantial (0.61-0.80), and almost perfect (0.81-1) [13].

Internal, external, and construct validity were measured using the Spearman rank correlation coefficient to determine whether theoretically expected statistically significant associations between specific variables could be demonstrated [14]. Internal validity was determined by examining associations between the numbers of glass slides or WSIs examined and time to complete the case. External validity was determined by examining associations between quality ratings assigned to corresponding glass slides and WSIs. Construct validity was determined by examining the association between case complexity and diagnostic confidence ratings. Statistical significance was assumed at a P value of less than or equal to .05.

Qualitative assessment of the recorded focus group discussion data was performed using content analysis aimed at identifying recurrent themes, concepts, and ideas expressed by the subjects and domain expert during the focus group discussion.
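For reference, Cohen's κ compares the observed agreement p_o with the agreement p_e expected by chance alone: κ = (p_o − p_e) / (1 − p_e). The sketch below (our illustration with toy data, not study code) shows the pairwise computation and the interpretation bands cited above [13]:

```python
from collections import Counter

def cohens_kappa(rater_a, rater_b):
    """Pairwise Cohen's kappa for two equal-length lists of category labels."""
    assert len(rater_a) == len(rater_b)
    n = len(rater_a)
    p_o = sum(a == b for a, b in zip(rater_a, rater_b)) / n      # observed agreement
    freq_a, freq_b = Counter(rater_a), Counter(rater_b)
    categories = set(freq_a) | set(freq_b)
    p_e = sum(freq_a[c] * freq_b[c] for c in categories) / n**2  # chance agreement
    return (p_o - p_e) / (1 - p_e)

def interpret(kappa):
    """Landis-Koch bands used in the study [13]."""
    for cutoff, label in [(0.0, "poor"), (0.20, "slight"), (0.40, "fair"),
                          (0.60, "moderate"), (0.80, "substantial"),
                          (1.0, "almost perfect")]:
        if kappa <= cutoff:
            return label

# Toy example (not study data): one pathologist's WSI vs glass diagnoses
wsi   = ["adenocarcinoma", "benign", "ATYPIN", "benign", "atypical"]
glass = ["adenocarcinoma", "benign", "atypical", "benign", "atypical"]
k = cohens_kappa(wsi, glass)
print(f"kappa = {k:.3f} ({interpret(k)})")  # kappa = 0.722 (substantial)
```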

Table 1  Intraobserver diagnostic agreement between WSI (phase 1) and glass microscope slides (phase 2) (the subjects are labeled "Pathologist")

Pathologist    κ (phase 1 vs phase 2)
1              0.586
2              0.813
3              0.615
4              0.726
5              0.799

3. Results

3.1. WSI diagnostic validity

Intraobserver pairwise κ statistics representing the level of subject diagnostic agreement when viewing WSIs compared to glass slides are shown in Table 1. One of the subjects exhibited almost perfect agreement, 3 exhibited substantial agreement, and one exhibited moderate agreement. Tables 2 and 3 show interobserver pairwise κ statistics representing levels of diagnostic agreement between subjects when viewing WSIs (Table 2) and glass slides (Table 3). Examination and comparison of the values revealed that interobserver agreement for the 2 mediums was highly similar, with 3 pairs of subjects showing substantial agreement when viewing either images or slides, 5 pairs (WSI) and 7 pairs (glass slides) showing moderate agreement, and 2 pairs showing fair agreement when viewing WSIs. The range of pairwise values was 0.359 to 0.678 and 0.502 to 0.680 for WSIs and glass slides, respectively. Overall, there appeared to be a tendency toward a slightly higher level of agreement between subjects when viewing glass slides.

Although individual subjects' levels of diagnostic agreement with the gold standard when viewing WSIs were variable (range, 0.516-0.730), the diagnostic agreement between the consensus diagnoses and the gold standard was almost perfect (κ = 0.817). Consensus conference review of cases with diagnostic disagreements repeatedly revealed that disagreement regarding the assignment of case final diagnoses was primarily due to variability in individual pathologists' use of diagnostic categories rather than differences in pathologists' detection of key histopathologic foci on H&E-stained sections or variable interpretations of IHC stains.

3.2. Time data

For glass slides, 86% of cases required less than 15 minutes for diagnosis and 9% of cases required from 15 to 30 minutes; 5% of responses were missing (n = 150). For WSIs, 81% of cases required less than 15 minutes for diagnosis, 12% of cases required from 15 to 30 minutes, and 1 case required 30 to 45 minutes (1%); 6% of responses were missing (n = 149).

Table 2  Interobserver diagnostic agreement (phase 1, WSI) (subjects are labeled "Path" 1 through 5)

          Path 2   Path 3   Path 4   Path 5
Path 1    0.451    0.362    0.561    0.359
Path 2             0.678    0.648    0.557
Path 3                      0.643    0.497
Path 4                               0.580

3.3. Focus group

Subjects felt that the overall optical image quality of WSIs is currently not capable of exceeding that of glass microscope slides; however, they believed that the image quality of WSIs of tissue sections stained with immunohistochemical stains was adequate for IHC stain interpretation. In only one case did they feel that the image quality was noticeably inferior to that of the glass microscope slide in terms of being able to see necessary details. Another strength of WSIs expressed by multiple subjects was that WSIs seemed to be superior in this group of cases for examination of some specific cellular details (nuclear morphology) compared with light microscopy, such as when using the magnifier tool (Fig. 1) to zoom in very closely. Furthermore, subjects felt that one's comfort level and skills for use of the system are quickly acquired with practice. Weaknesses of the system that are still in need of improvement centered on technical functionality limitations, including system speed and difficulties focusing with loss of resolution at higher magnifications.

3.4. Evaluation tool validity

Internal validity for the assessment tool was determined by testing the hypothesis that a positive correlation should exist between the number of case slides or images and subjects' recorded estimations of the time required to examine the case. A statistically significant correlation was present for only 1 of the 5 subjects (r2 = 0.327, P = .01). Review of the raw data showed that for most cases, all of the subjects reported being able to complete case examination within 15 minutes; this held true for both WSIs and glass slides.

External validity for the assessment tool was determined by testing the hypothesis that a positive correlation should exist between the quality ratings assigned to corresponding case glass and virtual slides. Examination of the subjects' assigned quality ratings for corresponding glass and virtual slides examined in phases 1 and 2 of the study revealed statistically significant positive correlations (P ≤ .01) for 3 of the 5 subjects.

As a measure of construct validity, we examined the association between subjects' assessments of case complexity and their levels of diagnostic confidence for each phase of the study; intuitively and logically, one would predict that as perceptions of case complexity increased, levels of diagnostic confidence would decrease. Statistically significant negative correlations (P ≤ .01) between these 2 variables were found for 3 of the 5 subjects during both phases of the study. The individual subjects for whom statistically significant correlations were not demonstrated differed for each of the study phases.
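The construct-validity check above reduces to a rank correlation between two ordinal ratings. A minimal self-contained sketch follows (the helper functions and toy ratings are ours, not study code); Spearman's ρ is simply the Pearson correlation of the ranks, so a negative value means higher complexity ratings tended to accompany lower confidence:

```python
def ranks(values):
    """Average ranks (1-based), with ties sharing the mean rank."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    r = [0.0] * len(values)
    i = 0
    while i < len(order):
        j = i
        while j + 1 < len(order) and values[order[j + 1]] == values[order[i]]:
            j += 1
        avg = (i + j) / 2 + 1            # mean of the tied positions, 1-based
        for k in range(i, j + 1):
            r[order[k]] = avg
        i = j + 1
    return r

def spearman_rho(x, y):
    """Spearman's rho = Pearson correlation of the rank vectors."""
    rx, ry = ranks(x), ranks(y)
    n = len(x)
    mx, my = sum(rx) / n, sum(ry) / n
    cov = sum((a - mx) * (b - my) for a, b in zip(rx, ry))
    sx = sum((a - mx) ** 2 for a in rx) ** 0.5
    sy = sum((b - my) ** 2 for b in ry) ** 0.5
    return cov / (sx * sy)

# Toy 3-point ratings (not study data): complexity up, confidence down
complexity = [1, 2, 3, 3, 2, 1, 3, 2]
confidence = [3, 2, 1, 1, 3, 3, 1, 2]
print(f"rho = {spearman_rho(complexity, confidence):.2f}")  # ~-0.93, negative as predicted
```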

4. Discussion

WSI technology has rapidly evolved over the past several years and represents a potentially powerful means of projecting anatomic pathology services across geographic distances, in addition to other possible benefits derived from digital workflow related to automation, workflow integration, and image analysis. Although WSI vendors are actively developing many of these functionalities, few groups are actively reporting progress in developing clinical applications for WSI technology [7-10]. The current study addressed IHC interpretation, a timely topic given that (at the time of submission) at least one major commercial laboratory is offering WSI-based IHC delivery as a service.

This study included difficult biopsies in which the original pathologist used IHC stains to render a diagnosis (Fig. 3). Despite a relatively high level of diagnostic difficulty, levels of intraobserver agreement between WSIs and glass slides were substantial or almost perfect for 4 of the 5 subjects. Except for 2 subject pairs (when viewing WSIs), interobserver agreement was at least moderate (κ ≥ 0.41). Because our case sample was highly representative of diagnostically difficult prostate core biopsy specimens currently examined at many tertiary care academic centers, we believe our findings support the general diagnostic validity of WSIs (generated using a comparable system) for interpretation of diagnostically difficult prostate core biopsy specimens. The interpretation of all of these cases required reliable examination of images of IHC stained tissues; therefore, we also believe our findings support the validity of WSIs for interpretation of IHC stained tissues in this specific application. Given our subjects' opinions about WSI quality, it is very likely that this technology will be valid in other IHC applications, although additional validation will be necessary.

Results of the comparison of our local (ie, UPMC subject pathologists) consensus diagnoses versus gold-standard expert interpretations showed nearly perfect agreement; this suggests that certain difficult cases could be reviewed via WSI technology by other pathologists in consultation for a consensus diagnosis. In this situation, pathologists need to recognize that the lesion in question is difficult and requires consultation. Our subjects felt that it was possible to make this type of judgment about a given biopsy focus using WSI technology.

The levels of intra- and interobserver agreement observed for these difficult biopsy cases (moderate to substantial) are consistent with levels of agreement previously reported in the literature for Gleason grading of prostatic biopsies, a similar assessment requiring pathologists to assign a numerical Gleason score (from more than 2 ordinal score groups) to malignant tissues [15-18]. Previously reported studies describing the measurement of pathologist interobserver agreement when viewing other surgical pathology specimen types using conventional light microscopy or digital images (either static or dynamic) show highly variable levels of agreement, ranging from essentially perfect [19] to fair [20]. Regardless of the specific κ statistics calculated, measurements of interobserver agreement that are essentially no better for microscopic slides than for digital images support the validity of digital images for use in telepathology-based consultation. Therefore, we believe our findings support the initiation of feasibility studies using WSIs of prostate core biopsy sections in day-to-day anatomic pathology practice.

Table 3  Interobserver diagnostic agreement (phase 2, glass microscope slides) (subjects are labeled "Path" 1 through 5)

          Path 2   Path 3   Path 4   Path 5
Path 1    0.556    0.517    0.600    0.502
Path 2             0.607    0.543    0.588
Path 3                      0.680    0.593
Path 4                               0.672

Fig. 3  Images of a focus of adenocarcinoma. The focus of carcinoma consists of small irregular glands (A) that have diffuse strong cytoplasmic racemase staining (B) and lack basal cells, demonstrated by negative staining for p63 (C) and keratin 903 (D). WSI created at 20× objective magnification.

As mentioned above, WSI technology is already being used for IHC interpretation in real-time patient care (although without published corroborating studies).

In addition to our direct study of WSI technology, we collected data for validity assessments of our evaluation tool (assessment form). Although we were able to demonstrate external and construct validity, we were not able to demonstrate internal validity. Review of the raw data involved with internal validity (time required for each case examined) suggested that this was due to suboptimal construction of the time categories available for this item. Specifically, the categories currently span a period of up to 1 hour at 15-minute intervals (ie, "less than 15 minutes," "15-30 minutes"). Because most of the cases were examined in less than 15 minutes, our assessment form did not allow differentiation of examination times between cases. Future versions of a self-scored assessment form should include revised choices regarding time (eg, "less than 1 minute"). In addition to this adjustment, we would also like to convert this variable from a self-reported one to one recorded either automatically or (less ideally) by a nonparticipant observer able to perform accurate time assessments. Despite our failure to demonstrate internal validity for our evaluation tool at this time, we believe the tool remains useful for this and future studies: we were able to demonstrate external and construct validity for most of the subjects, and our explanation for the inability to demonstrate internal validity is logically consistent with the collected data and amenable to adjustment with additional testing. Such a tool is essential for critical evaluation of potential clinical applications, including potential impact on pathologists' throughput relative to traditional methods (eg, glass slides).

With respect to study design, our group of subjects was heterogeneous and included practicing pathologists (with varying levels of practice experience) and trainee pathologists. Although this lack of homogeneity among our study subjects could be viewed as a limitation that most likely contributed to the variability in agreement we observed, we wished to examine the diagnostic validity of our system using a group of subjects representative of the day-to-day variability in anatomic pathology staff who would be expected to examine these images at our institution. Additional validity studies using a larger and more homogeneous group of subjects may result in a more accurate assessment of the level of intra- and interobserver agreement for the defined study group. The inclusion of a glass slide versus glass slide study "arm" (in addition to the glass slides versus WSIs comparison) would further strengthen future study results in that one could more directly compare WSIs to glass microscope slides with regard to intra- and interobserver variability. Such studies would be expected to show improved validity relative to our findings in the current study, thereby supporting our current conclusions and confirming the diagnostic validity of WSIs for the histopathologic examination described here.

Although validation studies are a critical component of introducing novel technology into pathologists' practice, future studies should also begin to address other critical components of digital pathology practice such as practical workflow (including viewing software and laboratory information system integration), cost justification and potential revenue, impact on and requirements from practices' pathology informatics infrastructure (image archival and disaster recovery), WSI quality, and automation (including both technical and professional components of practice). To this end, studies should continue to examine relatively specific applications of WSI technology while incorporating improvements in study design (eg, subject selection) and data capture (eg, assessment forms and automated timing). Time information gathered from future studies will be critical not only for evaluation of new potential applications but also for identification of process bottlenecks (eg, virtual microscopy software usability). These data will also be needed for pathology practices to make informed decisions about whether to proceed with WSI applications. Although turnaround time could be improved, it is also possible that overall per-pathologist work capacity is diminished if the WSI method itself is slower than comparable glass slide–based methods. This is an important possibility that should be addressed when evaluating WSI technology; it may be acceptable in light of other benefits offered by a specific application. In particular, WSI technology may enable pathologists to extract more information from IHC tests than is currently feasible. Whereas most clinical image analysis applications only examine selected areas, WSIs permit the automatic analysis of entire tissue sections or slides [2]. Such tools could be powerful and might even be initial sources of cost justification and/or revenue for WSI technology in pathology practice, similar to recent non–WSI-based (ie, semiautomated or manual) image analysis applications.
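To make the whole-section analysis idea concrete, here is a minimal sketch, assuming the open-source OpenSlide library, a hypothetical file name, and a deliberately crude brown-pixel heuristic in place of a real stain-deconvolution step:

```python
# Sketch of automated whole-section IHC analysis (our illustration, not the
# study's pipeline). Assumes the OpenSlide library and a hypothetical .svs file.
import numpy as np
import openslide

TILE = 2048
slide = openslide.OpenSlide("biopsy_racemase.svs")  # hypothetical file name
width, height = slide.dimensions                    # level-0 pixel dimensions

stained = tissue_total = 0
for y in range(0, height, TILE):
    for x in range(0, width, TILE):
        w, h = min(TILE, width - x), min(TILE, height - y)  # clamp at edges
        rgb = np.asarray(slide.read_region((x, y), 0, (w, h)).convert("RGB"))
        r = rgb[..., 0].astype(int)
        g = rgb[..., 1].astype(int)
        b = rgb[..., 2].astype(int)
        tissue = (r + g + b) < 3 * 220                      # exclude bright background
        brown = tissue & (r > g) & (g > b) & ((r - b) > 40) # crude DAB-brown heuristic
        stained += int(brown.sum())
        tissue_total += int(tissue.sum())

print(f"Positive fraction over the entire section: {stained / max(tissue_total, 1):.3f}")
```

Because the loop visits every tile at full resolution, the estimate covers the entire section rather than a pathologist-selected field, which is precisely the advantage over region-based image analysis noted above.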

4.1. Conclusions

We conclude that WSI-based technology can currently permit accurate interpretation of IHC stains in the setting of diagnostically difficult prostate biopsies for adequately trained pathologists. Our comparison of a local (ie, UPMC subject pathologists) consensus diagnosis to a gold-standard expert opinion also confirmed the well-established fact that certain difficult cases may defy confident diagnosis by a single pathologist. We believe our findings provide evidence supporting the validity of WSI technology for interpretation of IHC in prostate biopsies; furthermore, these findings suggest that development of other "virtual" IHC applications is feasible and should be pursued (including relevant specific validation studies). Finally, it is hoped that once WSI technology has become better established in terms of diagnostic validity, further work will develop the as-yet untapped potential of digital workflow/automation, including the use of automated image analysis to augment pathologists, thus extending what is currently the scarcest resource in modern anatomic pathology practice.

References

[1] Romer DJ, Yearsley KH, Ayers LW. Using a modified standard microscope to generate virtual slides. Anat Rec B New Anat 2003;272B:91-7.
[2] Chantrain CF, DeClerck YA, Groshen S, McNamara G. Computerized quantification of tissue vascularization using high-resolution slide scanning of whole tumor sections. J Histochem Cytochem 2003;51:151-8.
[3] Kumar RK, Freeman B, Velan GM, De Permentier PJ. Integrating histology and histopathology teaching in practical classes using virtual slides. Anat Rec B New Anat 2006;289B:128-33.
[4] Marchevsky AM, Khurana R, Thomas P, Scharre K, Farias P, Bose S. The use of virtual microscopy for proficiency testing in gynecologic cytopathology. Arch Pathol Lab Med 2006;130:349-55.
[5] Ho J, Parwani AV, Jukic DM, Yagi Y, Anthony L, Gilbertson JR. Use of whole slide imaging in surgical pathology quality assurance: design and pilot validation studies. Hum Pathol 2006;37:322-31.
[6] Krupinski EA, Tillack AA, Richter L, et al. Eye-movement study and human performance using telepathology virtual slides. Implications for medical education and differences with experience. Hum Pathol 2006;37:1543-56.
[7] Gilbertson JR, Ho J, Anthony L, Jukic DM, Yagi Y, Parwani AV. Primary histologic diagnosis using automated whole slide imaging: a validation study. BMC Clin Pathol 2006;6:4.
[8] Johnston DJ, Costello SP, Dervan PA, O'Shea DG. Development and preliminary evaluation of the VPS ReplaySuite: a virtual double-headed microscope for pathology. BMC Med Inform Decis Mak 2005;5:10.
[9] Weinstein RS, Descour MR, Liang C, et al. An array microscope for ultrarapid virtual slide processing and telepathology design, fabrication, and validation study. Hum Pathol 2004;35:1303-14.
[10] Molnar B, Berczi L, Diczhazy C, et al. Digital slide and virtual microscopy based routine and telepathology evaluation of routine gastrointestinal biopsy specimens. J Clin Pathol 2003;56:433-8.
[11] Wolff AC, Hammond EH, Schwartz JN, et al. American Society of Clinical Oncology/College of American Pathologists guideline recommendations for human epidermal growth factor receptor 2 testing in breast cancer. Arch Pathol Lab Med 2007;131:18-43.
[12] Epstein JI, Yang XJ. Prostate biopsy interpretation. Philadelphia, PA: Lippincott Williams & Wilkins; 2002.
[13] Sim J, Wright CC. The kappa statistic in reliability studies: use, interpretation, and sample size requirements. Phys Ther 2005;85:257-68.
[14] Dziedzic KS, Thomas E, Myers H, Hill S, Hay EM. The Australian/Canadian osteoarthritis hand index in a community-dwelling population of older adults: reliability and validity. Arthritis Rheum 2007;57:423-8.
[15] Melia J, Moseley R, Ball RY, et al. A UK-based investigation of inter- and intra-observer reproducibility of Gleason grading of prostatic biopsies. Histopathology 2006;48:644-54.
[16] Oyama T, Allsbrook Jr WC, Kurokawa K, et al. A comparison of interobserver reproducibility of Gleason grading of prostatic carcinoma in Japan and the United States. Arch Pathol Lab Med 2005;129:1004-10.
[17] Allsbrook Jr WC, Mangold KA, Johnson MH, Lane RB, Lane CG, Epstein JI. Interobserver reproducibility of Gleason grading of prostatic carcinoma: general pathologist. Hum Pathol 2001;32:81-8.
[18] Allsbrook Jr WC, Mangold KA, Johnson MH, et al. Interobserver reproducibility of Gleason grading of prostatic carcinoma: urologic pathologists. Hum Pathol 2001;32:74-80.
[19] Leinweber B, Massone C, Kodama K, et al. Teledermatopathology: a controlled study about diagnostic validity and technical requirements for digital transmission. Am J Dermatopathol 2006;28:413-6.
[20] Odze RD, Tomaszewski JE, Furth EE, et al. Variability in the diagnosis of dysplasia in ulcerative colitis by dynamic telepathology. Oncol Rep 2006;16:1123-9.