Gynecologic Oncology 111 (2008) 120 – 124 www.elsevier.com/locate/ygyno
P16INK4a immunohistochemistry improves the reproducibility of the histological diagnosis of cervical intraepithelial neoplasia in cone biopsies

Carmen M. Gurrola-Díaz a,⁎, Ángel E. Suárez-Rincón b, Gonzalo Vázquez-Camacho c, Giuseppe Buonocunto-Vázquez c, Sergio Rosales-Quintana c, Nicolas Wentzensen d, Magnus von Knebel Doeberitz d

a Instituto de Enfermedades Crónico-Degenerativas, Departamento de Biología Molecular y Genómica, Centro Universitario de Ciencias de la Salud, Universidad de Guadalajara, Guadalajara, Jalisco C.P. 44340, México
b Unidad de Investigación en Colposcopia y Patología Cervical, Hospital General Regional No. 45, Instituto Mexicano del Seguro Social, Guadalajara, Jalisco, México
c Departamento de Anatomía Patológica, U.M.A.E., Centro Medico Nacional de Occidente, Instituto Mexicano del Seguro Social, Guadalajara, Jalisco, México
d Department of Applied Tumor Biology, Institute of Pathology, University of Heidelberg, Heidelberg, Germany

Received 13 March 2008
Available online 9 August 2008
Abstract

Objective. Cervical cancer is currently the most frequently occurring cancer among women in Mexico. Mexican cervical cancer prevention programs have been unsatisfactory in part because the tests used to diagnose precursor lesions have poor reproducibility. The implementation of specific biomarkers may overcome these limitations. Here, we analyzed whether immunohistochemistry for p16INK4a could improve the reproducibility of histopathological diagnoses of cervical precancerous lesions.

Methods. Serial sections of 78 specimens were stained with H&E and for p16INK4a and independently interpreted by three Mexican pathologists. Specimens were interpreted and categorized in two ways: 1) four diagnostic categories (negative, CIN1, CIN2, and CIN3), or 2) two diagnostic categories: lesions that do not require therapy (negative, CIN1) and lesions that require therapy (≥ CIN2). The agreement in diagnoses between pairs of observers was evaluated by kappa statistics.

Results. The best concordance in diagnosis was observed with two categories and p16INK4a staining. The overall diagnostic discordances of more than one CIN grade were 26.1% for H&E and 9.2% for p16INK4a (P < 0.001). Using four diagnostic categories, weighted kappa values for each pair of observers were 0.28, 0.15, and 0.36 for H&E and 0.34, 0.35, and 0.60 for p16INK4a stains. Using two diagnostic categories, kappa values were 0.36, 0.12, and 0.18 for H&E and 0.59, 0.70, and 0.59 for p16INK4a stains.

Conclusion. These data show that p16INK4a immunohistochemistry substantially improved the reproducibility of interpreting histological slides. This approach may result in more accurate diagnoses and improved clinical management of patients with cervical precancerous lesions in Mexico and elsewhere.

© 2008 Elsevier Inc. All rights reserved.
Keywords: p16 protein/p16INK4a (cyclin-dependent kinase inhibitor); Biomarker; Immunohistochemistry; Mexico; Cervical intraepithelial neoplasia (CIN); Reproducibility
Abbreviations: H&E, hematoxylin and eosin; CIN, cervical intraepithelial neoplasia; p16INK4a, p16 protein; HPV, human papillomavirus.
⁎ Corresponding author. Fax: +52 33 10585287. E-mail address: [email protected] (C.M. Gurrola-Díaz).
0090-8258/$ - see front matter © 2008 Elsevier Inc. All rights reserved. doi:10.1016/j.ygyno.2008.06.032

Introduction
In Mexico, cervical cancer is the most frequent cancer among women, with a mortality rate of 8.3 per 100,000 women in 2003 [1]. Early detection and prevention programs for cervical cancer aim to identify asymptomatic pre-neoplastic lesions that require surgical management before progression to invasive cancer. Cervical biopsies are histologically classified as Cervical Intraepithelial Neoplasia grade 1 (CIN1), CIN2 and CIN3 [2]. CIN1 is usually regarded as benign and may result from an acute human papillomavirus (HPV) infection. CIN2–3 are regarded as precursors of invasive carcinomas and therapy is indicated. Despite clear morphological criteria for histopathological diagnoses of CIN, lesion classification suffers from poor inter- and intra-observer reproducibility [3,4]. Specific biomarkers represent a potential technical advance that may improve the reproducibility of interpreting staining patterns in histological sections [5].

Neoplastic transformation is triggered by the deregulated expression of two high-risk-HPV encoded oncogenes, E6 and E7, in the basal and parabasal cells of the cervical epithelium. This is accompanied by substantial overexpression of the p16INK4a gene product [6–8]. In accordance with these observations, recent studies demonstrated that cervical cancer and most CIN2 and CIN3 lesions expressed high levels of p16 protein (p16INK4a) that were readily detected by immunohistochemistry [9,10]. These findings suggested that p16INK4a immunohistochemistry may improve the reproducibility of histopathological classification of cervical cancer precursor lesions.

Several recent studies have addressed this point. A substantial improvement in diagnostic accuracy was reported from studies performed in Western Europe and North America. However, this approach has not been tested in emerging countries that suffer from high cervical cancer incidence due to inadequate screening technology and programs. In this study, we evaluated the inter-observer agreement in diagnoses of lesions stained by either hematoxylin and eosin (H&E) or p16INK4a immunohistochemistry in 78 cervical cone biopsy samples from colposcopy clinics in Guadalajara, Mexico.
Material and methods

Study design and interpretation of histological slides

Mexican women were recruited from colposcopy clinics at the Mexican Social Security Institute (Instituto Mexicano del Seguro Social), located in Guadalajara city, and underwent cervical cone biopsies. The mean age of the patients was 34.5 ± 8.5 years (range 18–50 years). Seventy-eight biopsy specimens were fixed in formalin, embedded in paraffin wax, cut into thin sections, and mounted on slides. Consecutive section pairs were stained with H&E and p16INK4a antibody, as described previously [10].

Three consultant pathologists with experience in cervical pathology in Mexico were invited to join the present study. First, each pathologist independently reviewed a set of 78 H&E-stained slides. Next, each pathologist was asked to review published literature regarding the interpretation of p16INK4a immunohistochemistry. Subsequently, each pathologist independently reviewed a second set of 78 slides stained for p16INK4a and numbered with a different code. There was no personal contact among the pathologists, and the application of criteria was not discussed.

The pathologists scored the H&E and p16INK4a stained sections according to four diagnostic categories: one negative, and three positive for cervical intraepithelial neoplasia (CIN1, 2, and 3) [11]. Subsequently, we regrouped the histopathological reports according to two diagnostic categories based on whether treatment was indicated: negative and CIN1 lesions did not require treatment, and CIN2–3 lesions required surgical intervention [12]. Conventional morphological criteria were used for interpreting H&E-stained sections. The criterion for a positive p16INK4a immunohistochemistry result was strong diffuse staining (nuclear and/or cytoplasmic) in the basal and parabasal cell compartments.

Hematoxylin and eosin staining and p16INK4a immunohistochemistry

Paraffin-embedded biopsies were cut in serial sections (4 μm thick), dewaxed by passage through three xylene baths, rehydrated with graded alcohols (100%, 96%, and 70%), washed with deionized and distilled water, and stained with H&E or immunostained for p16INK4a. For conventional H&E staining, we used commercially available solutions. P16INK4a immunohistochemistry was performed using the CINtec™ p16INK4a histology kit (DakoCytomation, Glostrup, Denmark), which has been optimized for use on formalin-fixed tissues. Diaminobenzidine was used as a chromogen, and slides were counterstained with hematoxylin. HeLa cells were used as positive controls for p16INK4a immunohistochemistry.

Statistical analyses

We calculated the scoring frequencies for each observer and for each staining method. To analyze the concordance between observers for scoring H&E and p16INK4a slides, we used contingency tables and calculated the chi-square statistic and odds ratio. We also measured the frequency of discordances greater than one grade for each method and compared the two methods with the chi-square test. The CIN classification was compared to p16INK4a positivity in a contingency table and evaluated for independence with the chi-square test. Finally, we used kappa statistics to measure the agreement between pairs of observers. Kappa is an index of agreement over and above that expected by chance alone.
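For reference, the chance correction behind kappa can be written out explicitly. The formulation below is the standard pairwise (Cohen's) kappa, which we assume is the statistic computed here; the symbols p_o, p_e and p_ij are notation introduced for illustration and do not appear in the original text.

```latex
% p_{ij}: proportion of slides placed in category i by one observer
%         and in category j by the other observer
% p_o:    observed proportion of agreement
% p_e:    agreement expected by chance from the two observers' marginal proportions
\kappa = \frac{p_o - p_e}{1 - p_e},
\qquad
p_o = \sum_{i} p_{ii},
\qquad
p_e = \sum_{i} p_{i\cdot}\, p_{\cdot i}
```

Under this definition, kappa equals 1 when the two observers agree on every slide and 0 when their agreement is no better than chance.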
Generally, kappa values >0.60 are regarded as good agreement beyond chance, and values <0.20 as poor agreement [13]. Weighted kappa values take into account that some disagreements are more serious than others. In a weighted analysis, a disagreement between diagnoses of negative and CIN3 is penalized more heavily than a disagreement between diagnoses of negative and CIN1 [14]. Weighted kappa values were calculated for all diagnostic categories (negative, CIN1, CIN2, and CIN3). Unweighted kappa values were calculated for treatment categories (negative + CIN1 and CIN2–3). Group kappa values for all observations were also calculated. The statistical analyses were performed using the software packages SPSS for Windows version 10.0.6 (SPSS Inc.) and Epi Info 2002.

Results

The overall scoring frequency and concordance among pairs of observers were analyzed. The three pathologists interpreted the H&E and p16INK4a stained sections and classified them into four diagnostic categories: negative, CIN1, CIN2, and CIN3 (Table 1). The primary disagreement among observers was between negative lesions and CIN1. In particular, observer 1 diagnosed a majority of sections as negative, independent of the staining method (76.9% of H&E and 76.3% of p16INK4a). Consequently, observer 1 diagnosed a lower number of CIN1, CIN2 and CIN3 lesions than observers 2 and 3 by H&E staining (23.1 vs. 88.5 and 96.2%, respectively) and by p16INK4a staining (23.7 vs. 94.7 and 97.4%, respectively). Observer 2 achieved a similar distribution of diagnoses with the two staining methods. Observer 3 achieved different distributions of CIN1 and CIN2 diagnoses with the two staining methods. Among all categories analyzed, the least intra- and inter-observer variation was achieved with CIN3 diagnoses.

Table 1 Relative frequencies (%) of scored lesions by observer

| | Observer one, H&E a | Observer one, p16INK4a b | Observer two, H&E a | Observer two, p16INK4a b | Observer three, H&E a | Observer three, p16INK4a b |
|---|---|---|---|---|---|---|
| Negative | 76.9 | 76.3 | 11.5 | 5.3 | 3.8 | 2.6 |
| CIN 1 | 6.4 | 3.9 | 50.1 | 57.8 | 38.5 | 65.8 |
| CIN 2 | 3.9 | 6.7 | 20.5 | 23.7 | 38.5 | 13.2 |
| CIN 3 | 12.8 | 13.1 | 17.9 | 13.2 | 19.2 | 18.4 |

a H&E staining. b p16INK4a immunohistochemistry.

Representative examples of the stained sections are shown in Fig. 1 (A–D). Figs. 1A and B show a low grade squamous intraepithelial lesion (LSIL) stained with H&E (Fig. 1A) and with an anti-p16INK4a monoclonal antibody (clone E6H4) (Fig. 1B). An example of a CIN3 lesion is shown in Fig. 1C (H&E stain) and Fig. 1D (p16INK4a).

Fig. 1. Cervical biopsy specimens. (A) H&E-stained section showing an example of a low grade squamous intraepithelial lesion (CIN1). (B) p16INK4a immunohistochemistry of a consecutive section of the same lesion, stained with a monoclonal antibody against p16INK4a (clone E6H4). (C) H&E-stained section showing an example of a high-grade squamous intraepithelial lesion (CIN 3). (D) Consecutive section of the same biopsy as shown in Fig. 1C, stained with a monoclonal antibody against p16INK4a (clone E6H4) and displaying diffuse and strong overexpression of p16INK4a.

The number of concordant scores observed when four or two diagnostic categories were used is shown in Table 2. We found poor inter-observer concordance with both staining methods when the lesions were classified according to four categories. In contrast, the concordance was higher when the lesions were classified according to two categories, especially with p16INK4a staining.

Table 2 Concordance in the reporting of cervical biopsies: number of concordant scores (relative frequency)

| | p16INK4a (n = 228) a | H&E (n = 234) | P value | OR (95% CI) |
|---|---|---|---|---|
| 2 categories | 192 (84%) | 142 (61%) | <0.001 | 3.46 (2.2–5.5) |
| 4 categories | 88 (39%) | 68 (29%) | 0.03 | 1.53 (1.02–2.31) |
| P value | <0.001 | <0.001 | | |
| OR (95% CI) | 8.48 (5.3–13.6) | 3.77 (2.5–5.6) | | |

a Two p16INK4a stained slides were not included because of technical problems.

Table 3 shows the inter-observer discordance of more than one grade. A significantly higher discordance was observed with H&E compared to p16INK4a staining.

Table 3 Discordances greater than one grade

| | Number of discordant scores | Relative frequency |
|---|---|---|
| H&E | 61/234 | 26.1% |
| p16INK4a ⁎ | 21/228 | 9.2% |

⁎ P < 0.001.

The relative frequency of p16INK4a positive scores (p16INK4a positivity) within the four categories is shown in Table 4. The p16INK4a positivity was directly associated with the severity of the CIN grade (P < 0.001).

Table 5 shows the resulting kappa statistics. Independent of the staining method, agreement was poor between observer 1 and observers 2 and 3. When scoring included four diagnostic categories, the H&E-stained slides achieved a group kappa value of only 0.26; in contrast, the p16INK4a immunohistochemistry stained slides achieved a group kappa value of 0.41. When scoring included two categories, the group kappa values were 0.22 and 0.62 for H&E and p16INK4a stained slides, respectively.

Discussion

The clinical management of cervical precancerous lesions is based on the histopathological diagnosis of cervical biopsies. To date, there has been little critical evaluation in Mexico of the standard diagnostic method using H&E staining, and even less of diagnostic methods that include additional biomarkers like p16INK4a. However, some studies have evaluated the agreement in scoring cytological specimens, and the results are surprisingly poor [15–18]. The H&E stain of cervical biopsy sections is currently considered the “gold standard”, although a definitive diagnosis is hampered by the poor concordance among different observers. Accordingly, the aim of this study was to assess the diagnostic reproducibility of p16INK4a stained slides in comparison to the conventional “gold standard” (H&E staining) in a series of biopsy samples from Mexican women.

Our results demonstrated an improvement in the overall concordance in diagnoses when p16INK4a staining was used. The reduction of diagnostic categories according to whether treatment was required (two categories) [12,19] also considerably improved the concordance with both staining methods (P = 0.03 for four categories versus P < 0.001 for two categories). In addition, the inter-observer concordance was lowest for diagnosing CIN1, and improved with the application of two categories. The discrepancy between observers in diagnosing negative lesions and CIN1 is consistent with reports from other authors [20]. In agreement with previous reports, we confirmed a direct relationship between inter-observer agreement and severity of the lesion [4,21]. To assess whether the discordance among observers was clinically relevant, discordances higher than one grade were analyzed.
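The contingency-table comparisons behind Tables 2 and 3 can be retraced with a short calculation. The sketch below is illustrative only: it assumes a Pearson chi-square test without continuity correction (the text does not state which variant was used), and the function names `chi_square_2x2` and `odds_ratio` are hypothetical helpers, not part of SPSS or Epi Info. The counts are taken from Tables 2 and 3.

```python
import math


def chi_square_2x2(a, b, c, d):
    """Pearson chi-square (no continuity correction) for the table [[a, b], [c, d]]."""
    n = a + b + c + d
    stat = n * (a * d - b * c) ** 2 / ((a + b) * (c + d) * (a + c) * (b + d))
    # With 1 degree of freedom, the upper-tail probability is erfc(sqrt(stat / 2)).
    p_value = math.erfc(math.sqrt(stat / 2.0))
    return stat, p_value


def odds_ratio(a, b, c, d):
    """Odds ratio (a / b) / (c / d) for the table [[a, b], [c, d]]."""
    return (a * d) / (b * c)


# Table 3: discordances of more than one grade, as (discordant, not discordant).
he_discordant, he_total = 61, 234
p16_discordant, p16_total = 21, 228
stat_disc, p_disc = chi_square_2x2(
    he_discordant, he_total - he_discordant,
    p16_discordant, p16_total - p16_discordant,
)

# Table 2: concordant scores, p16INK4a versus H&E, as (concordant, not concordant).
or_two_categories = odds_ratio(192, 228 - 192, 142, 234 - 142)
or_four_categories = odds_ratio(88, 228 - 88, 68, 234 - 68)
stat_four, p_four = chi_square_2x2(88, 228 - 88, 68, 234 - 68)

print(f"discordance > 1 grade: chi2 = {stat_disc:.2f}, P = {p_disc:.2g}")
print(f"OR, two categories:  {or_two_categories:.2f}")
print(f"OR, four categories: {or_four_categories:.2f}, P = {p_four:.2f}")
```

Under these assumptions the odds ratios round to the values listed in Table 2, and the P values fall in the reported ranges. The confidence limits are not reproduced here because the interval method used in the original analysis is not stated.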
Histological misclassification may result in the clinical mismanagement of patients with cervical lesions, and our results revealed a significantly lower discordance in diagnoses when p16INK4a staining was used. In assessing p16INK4a positivity, only diffuse staining patterns of basal and parabasal cell layers were considered positive.

Table 4 Interpretation of p16INK4a staining for biopsies scored negative and CIN1–3

| | NEG | CIN1 | CIN2 | CIN3 |
|---|---|---|---|---|
| p16INK4a negative | 63 | 94 | 23 | 4 |
| p16INK4a positive | 0 | 2 | 10 | 30 |
| p16INK4a positivity (%) ⁎ | 0 | 2% | 30% | 88% |

Chi-square 115.5, ⁎ P < 0.001.

Table 5 Kappa (κ) values between pairs of observers and group kappa values obtained for each staining method

| Staining | Observers | κ1 | κ2 |
|---|---|---|---|
| H&E | 1 vs 2 | 0.28 | 0.36 |
| H&E | 1 vs 3 | 0.15 | 0.12 |
| H&E | 2 vs 3 | 0.36 | 0.18 |
| H&E | Group kappa values | 0.26 (0.14–0.37) a | 0.22 (0.12–0.33) b |
| p16INK4a | 1 vs 2 | 0.34 | 0.59 |
| p16INK4a | 1 vs 3 | 0.35 | 0.70 |
| p16INK4a | 2 vs 3 | 0.60 | 0.59 |
| p16INK4a | Group kappa values | 0.41 (0.29–0.52) a | 0.62 (0.5–0.75) b |

κ1: weighted kappa values for four categories (negative, CIN1, CIN2 and CIN3). κ2: kappa values for two categories (negative + CIN1 and CIN2 + CIN3). Values in parentheses are 95% CI. a For weighted group kappa values. b For group kappa values.
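To make the two kinds of kappa in Table 5 concrete, the following sketch computes an unweighted and a weighted Cohen's kappa for one pair of observers. It is a minimal illustration under stated assumptions: linear disagreement weights are assumed (the weighting scheme used in the study is not reported), the function `cohen_kappa` and the helper `to_treatment_category` are hypothetical names, and the four paired ratings are toy data, not study data.

```python
def cohen_kappa(ratings_a, ratings_b, categories, weighted=False):
    """Cohen's kappa for two observers; linear disagreement weights if weighted=True."""
    if len(ratings_a) != len(ratings_b) or not ratings_a:
        raise ValueError("both observers must rate the same non-empty set of slides")
    if len(categories) < 2:
        raise ValueError("at least two categories are required")
    index = {category: i for i, category in enumerate(categories)}
    k = len(categories)
    n = len(ratings_a)

    # Cross-tabulate the paired ratings and collect each observer's marginal counts.
    observed = [[0] * k for _ in range(k)]
    for a, b in zip(ratings_a, ratings_b):
        observed[index[a]][index[b]] += 1
    row_totals = [sum(row) for row in observed]
    col_totals = [sum(observed[i][j] for i in range(k)) for j in range(k)]

    def disagreement(i, j):
        if weighted:
            return abs(i - j) / (k - 1)  # assumed linear weights
        return 0.0 if i == j else 1.0

    pairs = [(i, j) for i in range(k) for j in range(k)]
    observed_dis = sum(disagreement(i, j) * observed[i][j] for i, j in pairs) / n
    expected_dis = sum(
        disagreement(i, j) * row_totals[i] * col_totals[j] for i, j in pairs
    ) / n ** 2
    if expected_dis == 0:
        return float("nan")  # undefined when both observers use a single category
    return 1.0 - observed_dis / expected_dis


def to_treatment_category(grade):
    """Collapse four grades into the two treatment categories used in the text."""
    return "no treatment" if grade in ("NEG", "CIN1") else "treatment"


GRADES = ["NEG", "CIN1", "CIN2", "CIN3"]
# Toy ratings for four slides (illustrative only): one negative/CIN1 disagreement.
observer_x = ["NEG", "CIN1", "CIN2", "CIN3"]
observer_y = ["CIN1", "CIN1", "CIN2", "CIN3"]

kappa_unweighted = cohen_kappa(observer_x, observer_y, GRADES)
kappa_weighted = cohen_kappa(observer_x, observer_y, GRADES, weighted=True)
kappa_two = cohen_kappa(
    [to_treatment_category(g) for g in observer_x],
    [to_treatment_category(g) for g in observer_y],
    ["no treatment", "treatment"],
)
print(kappa_unweighted, kappa_weighted, kappa_two)
```

In this toy case the single negative versus CIN1 disagreement lowers the four-category kappa, is penalized less once weights are applied, and disappears entirely when the grades are collapsed into treatment categories.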
Although our results are in agreement with other studies [22], we found a somewhat lower frequency of p16INK4a positivity for CIN2, suggesting that some of the CIN2 lesions may have been misclassified based on the H&E stain. Nevertheless, it is important to recognize the substantial improvement in the inter-observer agreement for p16INK4a stained slides. This improvement is all the more notable because this was the first experience for the participating pathologists in interpreting p16INK4a stained slides.

Landis and Koch established kappa statistics as an index of observer agreement expressed as a numerical value ranging from 0 (low agreement) to 1 (perfect agreement) [13]. In this study, the diagnostic value of each technique was determined with three observers; this number of independent observers is acceptable for estimating the agreement indexes [23]. The overall inter-observer agreement on the H&E-stained slides was surprisingly poor (group kappa value 0.22). In contrast, a substantially improved group kappa value (0.62) was obtained with p16INK4a stained slides. Nevertheless, the kappa values may have improved even more with a higher number of observers. Interestingly, the reduction of diagnostic categories according to whether treatment was indicated substantially improved the inter-observer agreement for p16INK4a stained samples, but not for H&E-stained samples. This result strongly underlines the clinical implications of using p16INK4a immunohistochemistry for CIN diagnosis.

Additionally, recent reports [24,25] described the predictive value of various primary cervical cancer screening methods. They found that Pap cytology and HPV screening were substantially improved when p16INK4a immunohistochemistry was used in conjunction with the conventional H&E staining.
In conclusion, this study further substantiates the clinical utility of p16INK4a as an accurate biomarker that improves the inter-observer agreement in the histological classification of cervical biopsy specimens. The data clearly justify future plans to extend this work and analyze the clinical performance of the p16INK4a biomarker approach in substantially larger cohorts. Based on these data, additional cost-effectiveness studies should soon be undertaken, especially in patients with ≥ CIN2 lesions, before implementing the p16INK4a immunohistochemistry technique into everyday practice. Nevertheless, this research may help to reduce false positive and false negative test results, thus reducing the number of unnecessary surgical treatments without neglecting patients who require surgical treatment. Undoubtedly, the high mortality rate of patients with cervical cancer in Mexico points to the importance of more reliable screening methods for identifying precancerous lesions. Reproducible and reliable interpretation of histology sections is a key part of the diagnostic screening process.

Conflict of interest statement

MvKD is a shareholder, advisor, and member of the Board of mtm-laboratories Inc. Heidelberg, Germany. This company produces and purchases products related to the context of this article. All other authors have no conflicts of interest to declare.
Acknowledgments

We are very grateful to Ing. Rogelio Troyo Sanroman for help in the statistical analysis. We also thank Gabriela Blancas for technical assistance.

References

[1] Dirección General de Información en Salud, Secretaría de Salud, México. Mexico's mortality statistics. Deaths registered in 2003. Salud Publica Mex 2005;47:171–87.
[2] Norma Oficial Mexicana (1994) and last modification, available from: http://www.hgm.sald.gob.mx/servmed/nom_014_zssa2_1994.pdf and http://www.salud.gob.mx/unidades/cdi/nom/m014ssa24.html Accessed 4 June 2007.
[3] McCluggage WG, Bharucha H, Caughley LM, et al. Interobserver variation in the reporting of cervical colposcopic biopsy specimens: comparison of grading systems. J Clin Pathol 1996;49:833–5.
[4] Stoler MH, Schiffman M, Atypical Squamous Cells of Undetermined Significance-Low-grade Squamous Intraepithelial Lesion Triage Study (ALTS) Group. Interobserver reproducibility of cervical cytologic and histologic interpretations: realistic estimates from the ASCUS-LSIL Triage Study. JAMA 2001;285:1500–5.
[5] Shutter J, Atkins KA, Ghartey K, et al. Clinical applications of immunohistochemistry in gynecological malignancies. Int J Gynecol Cancer 2007;17:311–5.
[6] Sano T, Masuda N, Oyama T, et al. Overexpression of p16 and p14ARF is associated with human papillomavirus infection in cervical squamous cell carcinoma and dysplasia. Pathol Int 2002;52:375–83.
[7] Klaes R, Friedrich T, Spitkovsky D, et al. Overexpression of p16(INK4A) as a specific marker for dysplastic and neoplastic epithelial cells of the cervix uteri. Int J Cancer 2001;92:276–84.
[8] von-Knebel-Doeberitz M. New markers for cervical dysplasia to visualise the genomic chaos created by aberrant oncogenic papillomavirus infections. Eur J Cancer 2002;38:2229–42.
[9] Carozzi F, Cecchini S, Confortini M, et al. Role of P16(INK4a) expression in identifying CIN2 or more severe lesions among HPV-positive patients referred for colposcopy after abnormal cytology. Cancer 2006;108:119–23.
[10] Klaes R, Benner A, Friedrich T, et al. p16INK4a immunohistochemistry improves interobserver agreement in the diagnosis of cervical intraepithelial neoplasia. Am J Surg Pathol 2002;26:1389–99.
[11] Richart RM. Natural history of cervical intraepithelial neoplasia. Clin Obstet Gynecol 1967;10:748–84.
[12] Miller AB, Nazeer S, Fonn S, et al. Report on consensus conference on cervical cancer screening and management. Int J Cancer 2000;86:440–7.
[13] Landis JR, Koch GG. The measurement of observer agreement for categorical data. Biometrics 1977;33:159–74.
[14] McCluggage WG, Walsh MY, Thornton CM, et al. Inter- and intraobserver variation in the histopathological reporting of cervical squamous intraepithelial lesions using a modified Bethesda grading system. Br J Obstet Gynaecol 1998;105:206–10.
[15] de-Ruiz PA, Lazcano-Ponce EC, Duarte-Torres R, et al. Diagnostic reproducibility of Pap testing in two regions of Mexico: the need for quality control mechanisms. Bull Pan Am Health Organ 1996;30:330–8.
[16] Lazcano-Ponce EC, de-Ruiz PA, Martinez-Arias C, et al. Diagnostic concordance in gynecologic cytology. Rev Invest Clin 1997;49:111–6.
[17] Lazcano-Ponce EC, Alonso de Ruiz P, Lopez-Carrillo L, et al. Validity and reproducibility of cytologic diagnosis in a sample of cervical cancer screening centers in Mexico. Acta Cytol 1997;41:277–84.
[18] Lazcano-Ponce EC, Alonso de Ruiz P, Martinez-Arias C, et al. Reproducibility study of cervical cytopathology in Mexico: a need for regulation and professional accreditation. Diagn Cytopathol 1997;17:20–4.
[19] Holowaty P, Miller AB, Rohan T, et al. Natural history of dysplasia of the uterine cervix. J Natl Cancer Inst 1999;91:252–8.
[20] Ismail SM, Colclough AB, Dinnen JS, et al. Observer variation in histopathological diagnosis and grading of cervical intraepithelial neoplasia. BMJ 1989;298:707–10.
[21] Robertson AJ, Anderson JM, Beck JS, et al. Observer variability in histopathological reporting of cervical biopsy specimens. J Clin Pathol 1989;42:231–8.
[22] Negri G, Vittadello F, Romano F, et al. p16INK4a expression and progression risk of low-grade intraepithelial neoplasia of the cervix uteri. Virchows Arch 2004;445:616–20.
[23] Walter SD. Measuring the reliability of clinical data: the case for using three observers. Rev Epidemiol Sante Publique 1984;32:206–11.
[24] Zhang Q, Kuhn L, Denny LA, et al. Impact of utilizing p16INK4A immunohistochemistry on estimated performance of three cervical cancer screening tests. Int J Cancer 2007;120:351–6.
[25] Ekalaksananan T, Pientong C, Sriamporn S, et al. Usefulness of combining testing for p16 protein and human papillomavirus (HPV) in cervical carcinoma screening. Gynecol Oncol 2006;103:62–6.