Hepatology Research 33 (2005) 122–127
Interobserver variation in the histopathological assessment of nonalcoholic steatohepatitis
Toshio Fukusato a,∗, Junichi Fukushima a, Junji Shiga a, Yoshihisa Takahashi a, Toshiyuki Nakano b, Shiro Maeyama c, Uchikoshi Masayuki c, Makoto Ohbu d, Toshiharu Matsumoto e, Koji Matsumoto f, Hiroshi Hano g, Michiie Sakamoto h, Fukuo Kondo i, Akio Komatsu j, Takashi Ishikawa k, Hiroo Ohtake l, Hajima Takikawa m, Kenichi Yoshimura n, Liver Disease Working Group-Kanto a,b,c,d,e,f,g,h,i,j,k,l,m
a
Department of Pathology, Teikyo University School of Medicine, 2-11-1 Kaga, Itabashi-ku, Tokyo 173-8605, Japan b Division of Clinical Pathology, National Chiba Hospital, Chiba, Japan c Department of Pathology, St. Marianna University School of Medicine, Kawasaki, Japan d Department of Pathology, Kitasato University School of Medicine, Sagamihara, Kanagawa, Japan e Department of Pathology, Juntendo University School of Medicine, Tokyo, Japan f Department of Pathology, Nippon Medical School Second Hospital, Kawasaki, Japan g Department of Pathology, Jikei University School of Medicine, Tokyo, Japan h Department of Pathology, Keio University School of Medicine, Tokyo, Japan i Department of Pathology, Funabashi Central Hospital, Funabashi, Japan j Jiseikai Hospital, Funabashi, Japan k Department of Pathology, Health Service Center, University of Tokyo, Japan l Department of Medicine, Komagome Hospital, Tokyo, Japan m Department of Medicine, Teikyo University School of Medicine, Tokyo, Japan n Department of Biostatistics, School of Health Sciences and Nursing, University of Tokyo, Tokyo, Japan
Abstract
Pathological assessment is considered the gold standard for the diagnosis of nonalcoholic steatohepatitis (NASH). However, there is no agreement on the histological features required for the diagnosis of NASH. In the present study, eight experienced hepatopathologists independently read liver biopsy specimen slides of 21 cases of NASH and suspected NASH and were asked to assess histopathological features and render a diagnosis. Interobserver variation among pathologists was evaluated by kappa statistics. Significant, good agreement was present in the evaluation of the extent of steatosis and the grade of fibrosis. Agreement was moderate concerning the localization of steatosis, localization of fibrosis, and glycogen nuclei. Only slight or poor agreement was seen in the evaluation of the type of steatosis, ballooning, intralobular necroinflammatory change, portal inflammation, and degree of neutrophilic infiltration. Thus, agreement varied among the histological variables. Significant, moderate agreement was seen in the diagnosis of NASH, but agreement was poor in the diagnosis of suggestive NASH. The agreement for the diagnosis of NASH was not as high as that for the individual histological findings thought to be the basis of the diagnosis. In conclusion, some histological features in NASH might prove useful for the development of a standardized and reliable pathological diagnosis and scoring system.
© 2005 Elsevier Ireland Ltd. All rights reserved.
Keywords: Nonalcoholic steatohepatitis (NASH); Pathological diagnosis; Interobserver variation; Kappa analysis
∗ Corresponding author. Tel.: +81 33964 1211x2071; fax: +81 33964 3569. E-mail address: [email protected] (T. Fukusato).
1386-6346/$ – see front matter © 2005 Elsevier Ireland Ltd. All rights reserved. doi:10.1016/j.hepres.2005.09.018
1. Introduction
Pathological assessment is considered the gold standard for the diagnosis of nonalcoholic steatohepatitis (NASH) [1–4]. However, there is disagreement about the histological features required for the diagnosis of NASH [4,5]. The significant difference in the prevalence of NASH between similar populations has been explained mainly by the different histological criteria used for the diagnosis of NASH in clinical studies [5–7]. In the USA and Europe, a standardized approach to pathological diagnosis is hampered by the diversity of opinions among pathologists regarding the minimal criteria for a diagnosis of NASH [4,5]. In Japan, pathologists may include so-called nonalcoholic steatofibrosis in NASH because studies have shown that alcoholic fibrosis without inflammation is the most common type of alcoholic liver injury in Japan [8–10]. In this study, we present the interobserver variation in the histopathological interpretation of NASH. This approach may explain the reason for disagreement in the diagnosis of NASH and enable us to deliver a proposal for standardizing the pathological diagnosis of NASH [11,12].

2. Patients and methods

Twenty-one needle liver biopsy cases were selected for this study according to the criteria shown in Table 1. Liver biopsy specimens were obtained from fatty liver cases with abnormal liver function tests, particularly abnormal ALT, in nonalcoholic patients without other liver diseases. Liver biopsy specimens were stained with silver, hematoxylin and eosin (HE), and Azan-Mallory stains.

The study proceeded as follows. First, 4 months before the initiation of the study, we held a conference on NASH, at which the study group reviewed the definition of the pathological diagnosis and of each pathological feature using typical slides. Next, eight experienced pathologists from six universities and two hospitals in Japan independently read liver biopsy specimen slides of 21 cases of NASH and suspected NASH and were asked to assess the histopathological features and render a diagnosis. Table 2 shows the questionnaire that the pathologists were asked to fill out and return. Finally, interobserver variation among the pathologists was evaluated by kappa statistics [13,14] using SAS software [15]. Multivariate analysis was done using an ordinal logistic regression model [16] and SAS software. The kappa value is a scale for interpreting the degree of agreement. Complete agreement is 1, and a value of 0 or below means agreement no better than what could be expected by chance alone [17] (Table 3).

Table 1 Criteria for selection of liver biopsy cases in this study
1. Persistent abnormal liver function (abnormal ALT, AST values)
2. Non-alcoholic
3. No other liver diseases
4. Histologically detectable fatty change of the liver

Table 2 Questionnaire
1. Diagnosis: NASH, diagnostic; NASH, suggestive; No
2. Fatty change
   a. None, present (mild, moderate, severe)
   b. Macrovesicular, mixed microvesicular
   c. Centrilobular, intermediate, periportal, diffuse
3. Hepatocyte ballooning: None, present (mild, moderate, severe)
4. Intralobular necroinflammatory change: None, present (mild, moderate, severe)
5. Fibrosis
   a. None, present (mild, moderate, severe)
   b. Centrilobular fibrosis (none, present)
   c. Pericellular fibrosis (none, present)
   d. Periportal fibrosis (none, present)
   e. Portal fibrosis (none, present)
6. Portal inflammation: None, present (mild, moderate, severe)
7. Glycogen nuclei: None, present (a few, many)
8. Mallory body: None, present
9. Neutrophil infiltration: None, present (a few, prominent)

Table 3 Kappa coefficient and the strength of concordance (Demetris et al. [17])

| κ | Agreement |
|---|---|
| >0.70 | Excellent |
| 0.51–0.70 | Good |
| 0.31–0.50 | Moderate |
| 0.11–0.30 | Fair |
| <0.11 | Poor |
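As an illustration of how a multi-rater kappa value and its Table 3 label can be obtained, the following is a minimal sketch. It assumes a Fleiss-type kappa; the paper cites kappa statistics computed in SAS [13–15] but does not state the exact estimator, so this choice is an assumption. The function names and the example ratings are hypothetical and are not data from the study.

```python
from collections import Counter


def fleiss_kappa(ratings):
    """Fleiss-type kappa; `ratings` is a list of cases, each a list with one label per rater."""
    n_cases = len(ratings)
    n_raters = len(ratings[0])
    counts = [Counter(case) for case in ratings]
    categories = sorted({label for case in ratings for label in case})

    # Observed agreement: share of agreeing rater pairs, averaged over cases.
    p_obs = sum(
        (sum(c[cat] ** 2 for cat in categories) - n_raters)
        / (n_raters * (n_raters - 1))
        for c in counts
    ) / n_cases

    # Chance agreement from the overall proportion of each category.
    p_exp = sum(
        (sum(c[cat] for c in counts) / (n_cases * n_raters)) ** 2
        for cat in categories
    )
    if p_exp == 1.0:  # all ratings identical: kappa is undefined, report 1.0
        return 1.0
    return (p_obs - p_exp) / (1.0 - p_exp)


def strength_of_concordance(kappa):
    """Map a kappa value to the Table 3 label after rounding to two decimals."""
    k = round(kappa, 2)
    if k > 0.70:
        return "Excellent"
    if k >= 0.51:
        return "Good"
    if k >= 0.31:
        return "Moderate"
    if k >= 0.11:
        return "Fair"
    return "Poor"


# Hypothetical ratings: 4 biopsies, 3 raters each (not data from the study).
example = [
    ["NASH", "NASH", "NASH"],
    ["NASH", "NASH", "No"],
    ["No", "No", "No"],
    ["No", "NASH", "No"],
]
kappa = fleiss_kappa(example)
print(round(kappa, 3), strength_of_concordance(kappa))
```

Rounding to two decimals before labelling is a design choice made here because the Table 3 bands are stated to two decimals and would otherwise leave small gaps between bands.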
3. Results and discussions

3.1. Evaluation of consensus agreement among pathologists

The consensus agreement among pathologists about the diagnosis of NASH was evaluated. Full agreement between all eight pathologists with respect to the histopathological diagnosis was found in only two biopsies (Table 4). Five or more pathologists agreed with the final diagnosis in eight biopsies (38%). However, when both diagnostic NASH and suggestive NASH cases were included in the consensus agreement, full agreement was found in nine cases, and five or more pathologists agreed with the diagnostic or suggestive NASH results in 18 cases (86%). Fig. 1 shows histopathological features of a typical NASH case for which all pathologists were in full agreement with the diagnosis.

Table 4 Consensus agreement among pathologists about the diagnosis of NASH (total 21 cases): number of biopsies on which the eight observers' diagnoses agreed

| Diagnosis | All agree | 7 observers agree | 6 observers agree | 5 observers agree | Total 5 or more agreements (%) |
|---|---|---|---|---|---|
| NASH, diagnostic | 2 | 1 | 3 | 2 | 8 (38.1) |
| NASH, suggestive | 0 | 0 | 2 | 2 | 4 (19.0) |
| NASH, diagnostic or suggestive | 9 | 6 | 1 | 2 | 18 (85.7) |
| No | 0 | 0 | 1 | 2 | 3 (14.3) |
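The totals and percentages in Table 4 follow from simple tallies over the 21 cases. The sketch below re-derives them from the counts printed in the table; the helper names are hypothetical, and `largest_agreement` only illustrates how one biopsy's eight diagnoses would be reduced to a consensus count (the study's per-case ratings are not reproduced here).

```python
from collections import Counter

TOTAL_CASES = 21

# Counts transcribed from Table 4: number of biopsies on which exactly
# 8, 7, 6, or 5 observers gave the same diagnosis.
TABLE_4 = {
    "NASH, diagnostic": {8: 2, 7: 1, 6: 3, 5: 2},
    "NASH, suggestive": {8: 0, 7: 0, 6: 2, 5: 2},
    "NASH, diagnostic or suggestive": {8: 9, 7: 6, 6: 1, 5: 2},
    "No": {8: 0, 7: 0, 6: 1, 5: 2},
}


def majority_total(row):
    """Number of biopsies with five or more observers in agreement."""
    return sum(row.values())


def percent_of_cases(n):
    """Share of the 21 cases, rounded to one decimal as in Table 4."""
    return round(100 * n / TOTAL_CASES, 1)


def largest_agreement(diagnoses):
    """Most frequent diagnosis for one biopsy and how many observers gave it."""
    return Counter(diagnoses).most_common(1)[0]


for label, row in TABLE_4.items():
    total = majority_total(row)
    print(f"{label}: {total} ({percent_of_cases(total)}%)")
```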
3.2. Kappa analysis for interobserver variation in the diagnosis of NASH

Kappa analysis for the diagnosis of NASH by all eight pathologists was done first. As a result (Table 5), moderate agreement was seen in the diagnosis of NASH, but poor agreement was seen in the diagnosis of suggestive NASH. The agreement for the diagnosis of NASH was not as high as that for the individual histological features thought to be the basis for the diagnosis. Next, the percentage agreement and kappa coefficient for each pathologist pair in the diagnosis of NASH were obtained (Table 6). The agreement between observers was statistically significant in 8 pairs (28.6%) and not significant in 20 pairs. The percentage agreement ranged from 57 to 86% in the 8 significant pairs and from 29 to 52% in the other 20 pairs.

Table 5 Interobserver agreement in the diagnosis of NASH (κ value)

| Diagnosis | Ratio (%) | κ | p value |
|---|---|---|---|
| NASH, diagnostic | 44.64 | 0.329 | <0.0001 |
| NASH, suggestive | 37.50 | 0.068 | 0.0507 |
| No | 17.86 | 0.270 | <0.0001 |
| Overall | | 0.218 | <0.0001 |
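The pairwise comparison can be sketched as follows, assuming an unweighted Cohen's kappa for each observer pair; the paper does not name the pairwise estimator, so this is an assumption, and the function names and example labels are hypothetical. Eight observers give 28 distinct pairs, which is where the 8 of 28 (28.6%) figure comes from.

```python
from collections import Counter
from itertools import combinations


def percent_agreement(a, b):
    """Fraction of cases on which two observers gave the same label."""
    return sum(x == y for x, y in zip(a, b)) / len(a)


def cohen_kappa(a, b):
    """Unweighted Cohen's kappa for two observers rating the same cases."""
    n = len(a)
    p_obs = percent_agreement(a, b)
    count_a, count_b = Counter(a), Counter(b)
    # Chance agreement from each observer's own label frequencies.
    p_exp = sum(count_a[label] * count_b[label] for label in count_a) / n ** 2
    if p_exp == 1.0:  # both observers used a single identical label
        return 1.0
    return (p_obs - p_exp) / (1.0 - p_exp)


observers = "ABCDEFGH"
pairs = list(combinations(observers, 2))
print(len(pairs))                      # number of observer pairs
print(round(100 * 8 / len(pairs), 1))  # share of pairs with significant agreement

# Hypothetical labels for two observers over four cases (not study data).
obs_1 = ["NASH", "NASH", "No", "No"]
obs_2 = ["NASH", "No", "NASH", "No"]
print(percent_agreement(obs_1, obs_2), cohen_kappa(obs_1, obs_2))
```

In this hypothetical example the two observers agree on half of the cases, which is exactly what chance predicts from their label frequencies, so kappa is 0 even though the percentage agreement is 50%.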
Fig. 1. Histopathological features of a typical NASH case. The upper left figure shows severe steatosis and intralobular inflammation. In the upper right figure, ballooning is evident. The lower left figure shows portal inflammation. Pericellular fibrosis is shown in the lower right figure.

Table 6 Percentage agreement and kappa coefficients for each pathologist pair in the diagnosis of NASH

| Observer pair | Agreement | κ | Significance (p) |
|---|---|---|---|
| A–B | 0.571 | 0.344 | 0.014 |
| A–C | 0.619 | 0.422 | 0.0033 |
| B–C | 0.571 | 0.354 | 0.0111 |
| A–D | 0.571 | 0.403 | 0.0054 |
| B–D | 0.667 | 0.464 | 0.0021 |
| C–D | 0.476 | 0.188 | 0.1142 |
| A–E | 0.571 | 0.339 | 0.0151 |
| B–E | 0.571 | 0.325 | 0.0211 |
| C–E | 0.476 | 0.197 | 0.1035 |
| D–E | 0.476 | 0.143 | 0.1895 |
| A–F | 0.476 | 0.13 | 0.2167 |
| B–F | 0.524 | 0.155 | 0.1889 |
| C–F | 0.286 | −0.111 | 0.7591 |
| D–F | 0.524 | 0.152 | 0.2047 |
| A–G | 0.476 | 0.113 | 0.2424 |
| B–G | 0.571 | 0.221 | 0.0956 |
| C–G | 0.286 | −0.125 | 0.787 |
| A–H | 0.429 | 0.111 | 0.241 |
| B–H | 0.524 | 0.229 | 0.0791 |
| C–H | 0.381 | −0.322 | 0.9701 |

The pair assignment of the remaining eight cells (pairs D–G, D–H, E–F, E–G, E–H, F–G, F–H, and G–H) is not preserved in the source layout; the values are listed in source order.

| Agreement | κ | Significance (p) |
|---|---|---|
| 0.429 | −0.014 | 0.532 |
| 0.524 | 0.222 | 0.0906 |
| 0.524 | 0.175 | 0.1607 |
| 0.476 | 0.089 | 0.3006 |
| 0.857 | 0.66 | 0.0005 |
| 0.429 | 0.085 | 0.3 |
| 0.524 | 0.101 | 0.2903 |
| 0.429 | −0.122 | 0.7605 |

3.3. Kappa analysis for interobserver variation in the assessment of each histopathological feature

Good agreement was present in the evaluation of the extent of steatosis and the grade of fibrosis (Table 7). Agreement was moderate concerning the localization of steatosis and localization of fibrosis. Only slight or poor agreement was seen in the evaluation of the type of steatosis, ballooning, intralobular necroinflammatory change, and portal inflammation. Thus, agreement varied among the histological variables. Table 8 summarizes the interobserver variability in the assessment of each histopathological feature. Features with high kappa values included the extent of steatosis, degree of fibrosis, and location of fibrosis. Features with low kappa values included the type of steatosis, ballooning, intralobular necroinflammation, portal inflammation, and neutrophilic infiltration.

Table 7 Interobserver variation among all eight pathologists in the assessment of each histopathological feature

| Pathologic feature | κ | p |
|---|---|---|
| Steatosis: extent of steatosis | 0.535 | <0.0001 |
| Steatosis: type of steatosis | 0.116 | 0.0025 |
| Steatosis: location of steatosis | 0.317 | <0.0001 |
| Ballooning | 0.142 | <0.0001 |
| Fibrosis: degree of fibrosis | 0.553 | <0.0001 |
| Location of fibrosis: centrilobular | 0.484 | <0.0001 |
| Location of fibrosis: pericellular | 0.346 | <0.0001 |
| Location of fibrosis: periportal | 0.430 | <0.0001 |
| Location of fibrosis: portal | 0.449 | <0.0001 |
| Intralobular necroinflammatory change | 0.099 | 0.0006 |
| Portal inflammation | 0.142 | <0.0001 |
| Glycogen nuclei | 0.332 | <0.0001 |
| Mallory body | 0.223 | <0.0001 |
| Degree of neutrophilic infiltration | 0.054 | 0.0367 |

Table 8 Interobserver variability in the assessment of histopathological features

| Features that have high κ values | Features that have low κ values |
|---|---|
| Extent of steatosis | Type of steatosis |
| Degree of fibrosis | Ballooning |
| Location of fibrosis | Intralobular necroinflammation |
| Glycogen nuclei | Portal inflammation |
| | Degree of neutrophilic infiltration |

3.4. Multivariate analysis using an ordinal logistic regression and proportional odds model

We used an ordinal logistic regression with a proportional odds model in this multivariate analysis to evaluate the contribution of each histological feature to the diagnosis of NASH by each pathologist. A high odds ratio means a high probability that pathologists will make a diagnosis based on this pathological feature. Five items showed a significant and high odds ratio (Table 9). Table 10 summarizes the results of the multivariate analysis. The four items with high odds ratios were the degree of neutrophilic infiltration, the degree of steatosis, ballooning, and centrilobular or pericellular fibrosis. The items without a significant odds ratio included portal inflammation and fibrosis.

Table 9 Multivariate analysis using an ordinal logistic regression model for histological features

| Histological feature | Odds ratio | p |
|---|---|---|
| Neutrophilic infiltration | 66.03 | 0.0006 |
| Ballooning | 18.64 | 0.0053 |
| Pericellular fibrosis | 13.93 | 0.0019 |
| Extent of steatosis | 13.57 | 0.0008 |
| Centrilobular fibrosis | 12.93 | 0.0042 |
| Portal inflammation | 2.96 | 0.0527 |
| Intralobular necroinflammation | 2.22 | 0.3529 |
| Fibrosis | 0.73 | 0.8011 |

Table 10 Odds ratio of histopathological features for the diagnosis of NASH

| Features that have high odds ratios | Features that have low odds ratios |
|---|---|
| Degree of neutrophilic infiltration | Necroinflammation |
| Degree of steatosis | Portal inflammation |
| Ballooning | Portal fibrosis |
| Centrilobular or pericellular fibrosis | |

3.5. Association between the odds ratio in multivariate analysis and the kappa value in interobserver variation

Table 11 summarizes the association between the multivariate analysis and interobserver variation. The odds ratio in multivariate analysis indicates the contribution to the pathological diagnosis, and the kappa value indicates the strength of concordance in the interpretation of histopathological features. Steatosis and centrilobular or pericellular fibrosis had high odds ratios and high kappa values. Neutrophilic infiltration and ballooning had high odds ratios but low kappa values. On the other hand, intralobular inflammation had a low odds ratio and a low kappa value. Portal changes, including inflammation and fibrosis, had a low odds ratio but a moderate kappa value, suggesting that only some pathologists included the portal changes in the diagnostic criteria.

Table 11 Contribution to the pathological diagnosis and the agreement in the interpretation of histopathological features

| Category | Features |
|---|---|
| High odds ratio and high κ value | Steatosis; centrilobular or pericellular fibrosis |
| High odds ratio but low κ value | Neutrophilic infiltration; ballooning |
| Low odds ratio and low κ value | Intralobular inflammation |
| Low odds ratio but moderate κ value | Portal changes including inflammation and fibrosis |

4. Proposal

We made proposals for the standardization of the histopathological diagnosis of NASH that would make it possible for pathologists to be in excellent agreement on the diagnosis of NASH (Table 12). The first proposal was to redefine the histological definition of ballooning so that it can be evaluated with high concordance. The second proposal was to define a standard for the evaluation of intralobular necroinflammation in association with neutrophilic infiltration. The third proposal was to exclude portal changes from the diagnostic criteria of NASH in adults. In this study, we emphasized that it is possible for pathologists to make a reliable pathological diagnosis. In the future, the present results will provide useful clues for developing a standardized and reliable pathological diagnosis and scoring system [4,11,12].

Table 12 A proposal for the standardization of the histopathological diagnosis of NASH

| Proposal | Content |
|---|---|
| Proposal 1 | Redefine the histological definition of ballooning to evaluate it with high concordance |
| Proposal 2 | Define the standardized evaluation of intralobular necroinflammation in association with neutrophilic infiltration |
| Proposal 3 | Exclude portal changes from the diagnostic criteria of NASH |
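The cross-classification behind Table 11, which motivates the proposals in Table 12, can be sketched as below. The odds ratios and p values are transcribed from Table 9 and the kappa values from Table 7. The two cut-offs are assumptions made for illustration (p < 0.05 for a "high" odds ratio, and κ ≥ 0.31, i.e. "Moderate" or better in Table 3, for "high" agreement); the paper does not state its thresholds. Portal changes are left out because Table 11 groups portal inflammation and fibrosis together, whereas Tables 7 and 9 report them separately.

```python
# Odds ratios and p values from Table 9; kappa values from Table 7.
FEATURES = {
    "Extent of steatosis": {"odds_ratio": 13.57, "p": 0.0008, "kappa": 0.535},
    "Centrilobular fibrosis": {"odds_ratio": 12.93, "p": 0.0042, "kappa": 0.484},
    "Pericellular fibrosis": {"odds_ratio": 13.93, "p": 0.0019, "kappa": 0.346},
    "Neutrophilic infiltration": {"odds_ratio": 66.03, "p": 0.0006, "kappa": 0.054},
    "Ballooning": {"odds_ratio": 18.64, "p": 0.0053, "kappa": 0.142},
    "Intralobular necroinflammation": {"odds_ratio": 2.22, "p": 0.3529, "kappa": 0.099},
}

ALPHA = 0.05         # assumed significance cut-off for a "high" odds ratio
KAPPA_CUTOFF = 0.31  # assumed cut-off: "Moderate" or better in Table 3


def classify(feature):
    """Return (odds-ratio level, kappa level) for one feature record."""
    high_or = feature["p"] < ALPHA and feature["odds_ratio"] > 1.0
    high_kappa = feature["kappa"] >= KAPPA_CUTOFF
    return ("high" if high_or else "low", "high" if high_kappa else "low")


for name, record in FEATURES.items():
    or_level, kappa_level = classify(record)
    print(f"{name}: {or_level} odds ratio, {kappa_level} kappa")
```

Under these assumed cut-offs, the features with a high odds ratio but low kappa are ballooning and neutrophilic infiltration, the two features that Proposals 1 and 2 aim to define more precisely.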
5. Conclusions

In conclusion, these results indicated that interobserver variability in the diagnosis of NASH among pathologists might be caused by variability in the interpretation of histological findings and by the diversity of minimal criteria for diagnosis. Some histological features in NASH, including steatosis and fibrosis, might prove useful for the development of a standardized and reliable pathological diagnosis and scoring system.
Acknowledgement

We thank all members of Liver Study Group-Kanto for supporting this study.

References

[1] Ludwig J, Viggiano TR, McGill DB, et al. Nonalcoholic steatohepatitis: Mayo Clinic experiences with a hitherto unnamed disease. Mayo Clinic Proc 1980;55:434–8.
[2] Angulo P. Nonalcoholic fatty liver disease. N Engl J Med 2002;346:1221–31.
[3] Neuschwander-Tetri BA, Caldwell SH. Nonalcoholic steatohepatitis: summary of an AASLD Single Topic Conference. Hepatology 2003;37:1202–9.
[4] Brunt EM. Nonalcoholic steatohepatitis. Semin Liver Dis 2004;24:3–20.
[5] García-Monzón C, Fernández-Bermejo M. A wider view on diagnostic criteria of nonalcoholic steatohepatitis. Gastroenterology 2002;122:840–2.
[6] Dixon JB, Bhathal PS, O’Brien PE, et al. Nonalcoholic fatty liver disease: predictors of nonalcoholic steatohepatitis and liver fibrosis in the severely obese. Gastroenterology 2001;121:91–100.
[7] García-Monzón C, Martín-Pérez E, Lo Iacono O, et al. Characterization of pathogenic and prognostic factors of nonalcoholic steatohepatitis associated with obesity. J Hepatol 2000;33:716–24.
[8] Takada A, Nei J, Matsuda Y, et al. Clinicopathological study of alcoholic fibrosis. Am J Gastroenterol 1982;77:660–6.
[9] Sugimoto M, Hatori T, Ito T, et al. Characteristic features of liver disease in Japanese alcoholic. Am J Gastroenterol 1985;80:993–7.
[10] Nonomura A, Mizukami Y, Unoura M, et al. Clinicopathologic study of alcohol-like liver disease in non-alcoholic; non-alcoholic steatohepatitis and fibrosis. Gastroenterol Jpn 1992;27:521–8.
[11] Younossi ZM, Gramlich T, Liu YC, et al. Nonalcoholic fatty liver disease: assessment of variability in pathologic interpretations. Mod Pathol 1998;11:560–5.
[12] Brunt EM, Janney CG, Di Bisceglie AM, et al. Nonalcoholic steatohepatitis: a proposal for grading and staging the histological lesions. Am J Gastroenterol 1999;94:2467–74.
[13] Landis JR, Koch GG. An application of hierarchical κ-type statistics in the assessment of majority agreement among multiple observers. Biometrics 1977;33:363–74.
[14] O’Connell DL, Dobson AJ. General observer-agreement measures on individual subjects and groups of subjects. Biometrics 1984;40:973–83.
[15] Stokes ME, Davis CS, Koch GG. Categorical data analysis using the SAS system. Cary, NC: SAS Institute Inc.; 1995.
[16] Walker SH, Duncan DB. Estimation of the probability of an event as a function of several independent variables. Biometrika 1967;54:167–79.
[17] Demetris AJ, Belle SH, Hart J, et al. Intraobserver and interobserver variation in the histopathological assessment of liver allograft rejection. Hepatology 1991;14:751–5.