Journal of Hepatology 38 (2003) 240–242 www.elsevier.com/locate/jhep
Editorial
Assessment of liver biopsies in chronic hepatitis: how is it best done?
Peter J. Scheuer*
Department of Histopathology, Royal Free Hospital, Pond Street, London NW3 2QG, UK
See Article, pages 223–229
Scoring of histological changes on liver biopsy, with its two components of grading and staging, is a widely-used tool in chronic hepatitis. Over the past decade it has replaced the earlier classification of chronic hepatitis into chronic persistent, chronic active and chronic lobular forms [1,2], a classification devised at a time when most examples of chronic hepatitis were of unknown aetiology. In 1994, two independent international working parties agreed that the 1968 classification was no longer appropriate [3,4]. Grading of necroinflammatory changes and staging of structural alterations were seen as important requirements of therapeutic trials and as a possible supplement to verbal descriptions in routine diagnostic practice [3].
Several different scoring systems are now available, and the inter- and intra-observer variation of some of these has been investigated. In their study of 127 patients with chronic hepatitis reported in this issue, Rozario and Ramakrishna from India used two different scoring methods [5]. One was the modification by Ishak et al. [6] of the earlier Histology Activity Index [7]. The other was the algorithm devised by the METAVIR group in France [8] together with a simple staging system for fibrosis and cirrhosis [9,10].
The aim of Rozario and Ramakrishna’s study was to compare the two scoring methods and also to investigate the correlation between the necroinflammatory component of the Ishak score and serum transaminase levels. In keeping with previously published results, this correlation was poor. In order to facilitate comparison between the Ishak and METAVIR scoring systems, the Ishak necroinflammatory scores were grouped into four grades of severity – minimal, mild, moderate and severe. There was moderate agreement between the two scoring systems for necroinflammation, and excellent agreement for fibrosis and cirrhosis.
Similar results were obtained when biopsies from the 82 patients with chronic hepatitis B and those from the 45 patients with hepatitis C were separately assessed.
* Tel.: +44-20-8455-5459; fax: +44-20-8455-4383. E-mail address: [email protected] (P.J. Scheuer).
The practicalities of scoring affect the accuracy of the results. The METAVIR group [9] performed a careful study of intra- and interobserver variation and, as also noted by others [11], found that variation was greater for necroinflammatory features than for fibrosis and cirrhosis. They commented that while a single observer was sufficient for recognition of general pathological features of chronic hepatitis C, two observers were likely to obtain better results for numerical scoring. The two observers should examine the biopsy specimens together. Significantly, each feature was comprehensively discussed between all observers at the beginning of the study, and agreement reached on the definition and scoring of each feature. The present study was carried out in a similarly careful way, both authors examining all biopsies and determining scores by consensus.
Rozario and Ramakrishna conclude that both scoring methods are appropriate for use in chronic hepatitis B and C, and that the choice of system can be left to the preferences of the clinician and pathologist. They further comment that in their experience of routinely scoring all liver biopsies in their institution, the Ishak system has proved user-friendly and comprehensive, providing a quantitative value that is well understood by clinicians.
Not everyone will agree with the last comment. Are the results of scoring really quantitative? The numbers generated for each pathological feature do not represent measurements but categories. The numbers are essentially the result of subjective judgements made by pathologists, and hence inevitably influenced by the observers’ experience and bias. That is why the METAVIR group agreed on definitions and scoring for each feature at the beginning of their study. Experience shows that different observers acting independently may generate somewhat different scores for a given feature such as interface hepatitis or lobular inflammation.
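One way to put a number on such differences between observers is Cohen's kappa, which compares the agreement actually observed with the agreement expected by chance. The statistic is not named in this editorial; what follows is only a minimal sketch in Python, and the function name `cohen_kappa` and both lists of scores are hypothetical illustrations rather than data from any study.

```python
from collections import Counter


def cohen_kappa(scores_a, scores_b, categories):
    """Unweighted Cohen's kappa for two observers scoring the same biopsies."""
    if len(scores_a) != len(scores_b) or not scores_a:
        raise ValueError("need two equal-length, non-empty score lists")
    n = len(scores_a)
    # Proportion of biopsies on which the two observers gave the same category.
    observed = sum(a == b for a, b in zip(scores_a, scores_b)) / n
    # Agreement expected by chance, from each observer's category frequencies.
    freq_a = Counter(scores_a)
    freq_b = Counter(scores_b)
    expected = sum(freq_a[c] * freq_b[c] for c in categories) / (n * n)
    if expected == 1.0:
        # Both observers used a single identical category throughout.
        return 1.0
    return (observed - expected) / (1.0 - expected)


# Hypothetical scores (categories 0-3) for ten biopsies, invented for illustration.
observer_1 = [0, 1, 1, 2, 3, 2, 1, 0, 2, 3]
observer_2 = [0, 1, 2, 2, 3, 1, 1, 0, 2, 2]

kappa = cohen_kappa(observer_1, observer_2, categories=[0, 1, 2, 3])
print(f"kappa = {kappa:.2f}")
```

A kappa of 1 indicates complete agreement and a kappa of 0 indicates agreement no better than chance. For ordered categories a weighted kappa, which penalises near-misses less than gross disagreements, may be preferable.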
The raw numbers generated in different studies cannot therefore be legitimately amalgamated in a meta-analysis. Indeed, even the results obtained by a single observer at different times are subject to observer variation, suggesting that the results of routine scoring of diagnostic liver biopsies
0168-8278/02/$20.00 © 2002 European Association for the Study of the Liver. Published by Elsevier Science B.V. All rights reserved. doi:10.1016/S0168-8278(02)00406-3
can be misleading. In this author’s view routine scoring should only be carried out if clinician and pathologist agree that it is needed for a particular purpose, and then only under carefully controlled conditions.
What is the best scoring system? There can be no general answer to this question. In order to choose the most appropriate system, the clinician and pathologist should first define the purpose of scoring for their particular needs. If the purpose of scoring is simply to stratify patients into those with mild, moderate or severe disease for purposes of management, then a simple system is likely to be adequate. Several are available [8,12,13] and a new scoring system can, if necessary, be devised to suit the particular project. Simple systems have two advantages: they are easier and quicker to use, and observer error is likely to be relatively low [11].
On the other hand, if the purpose of scoring is to detect small differences in liver damage or fibrosis in a clinical trial, then a system with a substantial range of numbers and therefore greater sensitivity would be more appropriate. Because of the greater likelihood of observer variation inherent in more complex systems, all possible steps to reduce this to an acceptable level should be taken. These include use of two or more observers, agreement on criteria before scoring is undertaken, consensus on scores where observers do not agree, and consistency audit of a proportion of the scored slides to make sure that intra-observer variation is not excessive. Scoring should be carried out without knowledge of the clinical details of the individual patients. Scoring of different biopsies should as far as possible be done over a relatively short period of time in order to avoid fluctuation of criteria in the minds of the observers.
When scores are generated in a clinical trial or research project, they need to be handled according to their subjective nature.
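As a rough illustration of what handling scores as ordered categories might look like, the Python sketch below summarises two groups by category counts and by a median that is itself one of the observed scores, and compares the groups with the Mann-Whitney U statistic, which uses only the ordering of the scores. The choice of this statistic is an assumption made here, not a recommendation taken from the editorial, and the 0-6 scale, the group names and all scores are hypothetical.

```python
from collections import Counter
from statistics import median_low


def category_counts(scores, categories):
    """Number of biopsies falling in each category, including empty ones."""
    counts = Counter(scores)
    return {c: counts[c] for c in categories}


def mann_whitney_u(group_x, group_y):
    """U statistic for group_x: pairs with x > y count 1, tied pairs count 0.5."""
    u = 0.0
    for x in group_x:
        for y in group_y:
            if x > y:
                u += 1.0
            elif x == y:
                u += 0.5
    return u


# Hypothetical ordinal scores on an assumed 0-6 scale, invented for illustration.
CATEGORIES = list(range(7))
control = [2, 3, 3, 4, 4, 5, 3]
treated = [1, 2, 2, 3, 3, 4, 2]

control_counts = category_counts(control, CATEGORIES)
treated_counts = category_counts(treated, CATEGORIES)

# median_low always returns one of the observed categories, never a midpoint.
control_median = median_low(control)
treated_median = median_low(treated)

u_control = mann_whitney_u(control, treated)
u_treated = mann_whitney_u(treated, control)

print(control_counts, control_median)
print(treated_counts, treated_median)
print(u_control, u_treated)
```

Turning U into a significance level requires its reference distribution, which is omitted here; the sketch shows only that the comparison can be made without treating the scores as measurements.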
There is a temptation, not always resisted, to assume that because the results are numbers, they can be manipulated in the same way as, for example, measurements of serum bilirubin or viral load. Statistical methods used must be designed for categorical rather than numerical data. Moreover, individual components of grading scores (e.g. interface hepatitis, lobular hepatitis, confluent necrosis and portal inflammation) should be individually analysed. The scales for each of these differ from the others, and they are not linear. Combining them into a total necroinflammatory score gives an approximate idea of the severity of a patient’s hepatitis, but not of the nature of the histological lesions involved.
In the case of staging, morphometry (the measurement of tissue components in histological sections) provides an alternative approach [14]. Collagen is stained by an appropriate method such as Sirius red, and the area occupied by fibrous tissue per unit area determined with the help of appropriate computer software. Masseroli et al. [15] investigated the differences between morphometry and staging and concluded that the two techniques were complementary rather than in competition. Staging takes into account not only the amount of fibrous tissue but also the presence or
absence of structural changes such as nodule formation. Indeed, as cirrhosis develops and nodules enlarge, the amount of fibrous tissue per unit area may actually decrease rather than increase. However, morphometry does offer objective measurement as opposed to the subjective categorisation inherent in staging, and can sometimes detect changes not evident from the latter [16,17]. One obvious advantage of staging over morphometry, on the other hand, is that it needs no special equipment and is usually less time-consuming. Yet another approach to the problem has been measurement of the markers of fibrogenesis alpha-smooth muscle actin and C-terminal procollagen α1(III) propeptide [18].
Given the need for accurate, reproducible and practicable markers of necroinflammation and fibrosis in chronic viral hepatitis, the search for better methods should continue. In the meantime grading and staging must be performed critically and with due attention to the need to minimise observer variation.
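To return briefly to morphometry: the quantity it reports, the area occupied by fibrous tissue per unit area of tissue, can be sketched in a few lines of Python. This is a toy illustration only. It assumes that the stained section has already been reduced to two binary masks (one marking tissue, one marking collagen-stained pixels), which is the difficult part in practice; the function name `fibrosis_fraction` and the small masks are hypothetical.

```python
def fibrosis_fraction(stained_mask, tissue_mask):
    """Fraction of tissue pixels that are also flagged as collagen-stained."""
    tissue_pixels = 0
    stained_pixels = 0
    for stained_row, tissue_row in zip(stained_mask, tissue_mask):
        for is_stained, is_tissue in zip(stained_row, tissue_row):
            if is_tissue:
                tissue_pixels += 1
                if is_stained:
                    stained_pixels += 1
    if tissue_pixels == 0:
        raise ValueError("no tissue pixels in mask")
    return stained_pixels / tissue_pixels


# Hypothetical 4 x 4 masks: 1 marks a pixel of interest, 0 marks background.
tissue_mask = [
    [1, 1, 1, 1],
    [1, 1, 1, 1],
    [1, 1, 1, 1],
    [0, 0, 0, 0],  # bottom row lies outside the biopsy
]
stained_mask = [
    [0, 1, 0, 0],
    [0, 1, 1, 0],
    [0, 0, 0, 0],
    [1, 0, 0, 0],  # staining outside the tissue is ignored
]

fraction = fibrosis_fraction(stained_mask, tissue_mask)
print(f"fibrous tissue fraction = {fraction:.2f}")
```

Unlike a stage, the result is a measurement on a continuous scale, but it carries no information about architecture such as nodule formation.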
References
[1] De Groote J, Desmet VJ, Gedigk P, Korb G, Popper H, Poulsen H, et al. A classification of chronic hepatitis. Lancet 1968;ii:626–628.
[2] Popper H, Schaffner F. The vocabulary of chronic hepatitis. N Engl J Med 1971;284:1154–1156.
[3] Desmet VJ, Gerber M, Hoofnagle JH, Manns M, Scheuer PJ. Classification of chronic hepatitis: diagnosis, grading and staging. Hepatology 1994;19:1513–1520.
[4] Working Party. Terminology of chronic hepatitis, hepatic allograft rejection, and nodular lesions of the liver: summary of recommendations developed by an international working party, supported by the World Congresses of Gastroenterology. Los Angeles 1994. Am J Gastroenterol 1994;89:S177–S181.
[5] Rozario R, Ramakrishna B. Histopathological study of chronic hepatitis B and C: a comparison of two scoring systems. J Hepatol 2003;38:223–229.
[6] Ishak K, Baptista A, Bianchi L, Callea F, De Groote J, Gudat F, et al. Histological grading and staging of chronic hepatitis. J Hepatol 1995;22:696–699.
[7] Knodell RG, Ishak KG, Black WC, Chen TS, Craig R, Kaplowitz N, et al. Formulation and application of a numerical scoring system for assessing histological activity in asymptomatic chronic active hepatitis. Hepatology 1981;1:431–435.
[8] Bedossa P, Poynard T, The METAVIR cooperative study group. An algorithm for the grading of activity in chronic hepatitis C. Hepatology 1996;24:289–293.
[9] Bedossa P, Bioulac-Sage P, Callard P, Chevallier M, Degott C, Deugnier Y, et al. Intraobserver and interobserver variations in liver biopsy interpretation in patients with chronic hepatitis C. Hepatology 1994;20:15–20.
[10] Poynard T, Bedossa P, Opolon P. Natural history of liver fibrosis progression in patients with chronic hepatitis C. Lancet 1997;349:825–832.
[11] Goldin RD, Goldin JG, Burt AD, Dhillon P, Hubscher S, Wyatt J, et al. Intra-observer and inter-observer variation in the histopathological assessment of chronic viral hepatitis. J Hepatol 1996;25:649–654.
[12] Batts KP, Ludwig J. Chronic hepatitis. An update on terminology and reporting. Am J Surg Pathol 1995;19:1409–1417.
[13] Scheuer PJ. Classification of chronic viral hepatitis: a need for reassessment. J Hepatol 1991;13:372–374.
[14] Scheuer PJ, Standish RA, Dhillon AP. Scoring of chronic hepatitis. Clin Liver Dis 2002;6:335–347.
[15] Masseroli M, Caballero T, O’Valle F, Del Moral RM, Pérez-Milena A, Del Moral RG. Automatic quantification of liver fibrosis: design and validation of a new image analysis method: comparison with semi-quantitative indexes of fibrosis. J Hepatol 2000;32:453–464.
[16] Manabe N, Chevallier M, Chossegros P, Causse X, Guerret S, Trepo C, et al. Interferon-alpha 2b therapy reduces liver fibrosis in chronic non-A, non-B hepatitis: a quantitative histological evaluation. Hepatology 1993;18:1344–1349.
[17] Duchatelle V, Marcellin P, Giostra E, Bregeaud L, Pouteau M, Boyer N, et al. Changes in liver fibrosis at the end of alpha interferon therapy and 6–18 months later in patients with chronic hepatitis C: quantitative assessment by a morphometric method. J Hepatol 1998;29:20–28.
[18] Kweon Y-O, Goodman ZD, Dienstag JL, Schiff ER, Brown NA, Burkhardt E, et al. Decreasing fibrogenesis: an immunohistochemical study of paired liver biopsies following lamivudine therapy for chronic hepatitis B. J Hepatol 2001;35:749–755.