Comparison of blood tests for liver fibrosis specific or not to NAFLD


Journal of Hepatology 50 (2009) 165–173 www.elsevier.com/locate/jhep

Paul Calès1,2,*, Fabrice Lainé3, Jérôme Boursier1,2, Yves Deugnier3,4, Valérie Moal5, Frédéric Oberti1,2, Gilles Hunault2, Marie Christine Rousselet2,6, Isabelle Hubert1,2, Jihane Laafi2,7, Pierre Henri Ducluzeaux8, Françoise Lunel2,9

1 Service d'Hépato-Gastroentérologie, CHU, 49933 Angers Cedex 09, France
2 Laboratoire HIFIH, UPRES 3859, IFR 132, Université, Angers, France
3 CIC Inserm 0203, Hôpital Pontchaillou, CHU, Rennes, France
4 Service des Maladies du Foie, Hôpital Pontchaillou, CHU, Rennes, France
5 Laboratoire de Biochimie et Biologie Moléculaire, CHU, Angers, France
6 Département de Pathologie Cellulaire et Tissulaire, CHU, Angers, France
7 Service EFD d'Hépato-Gastroentérologie, Hôpital Ibn Sina, CHU, Rabat, Morocco
8 Service d'Endocrinologie-Diabétologie-Nutrition, CHU, Angers, France
9 Laboratoire de Virologie, CHU, Angers, France

Background/Aims: To compare blood tests of liver fibrosis specific for NAFLD, the FibroMeter NAFLD and the NAFLD fibrosis score (NFSA), with a non-specific test, APRI.

Methods: Two hundred and thirty-five NAFLD patients with liver Metavir staging and blood markers from two independent centres were randomly assigned to a test (n = 121) or a validation population (n = 114).

Results: The highest accuracy (91%) for significant fibrosis was obtained with the FibroMeter whose (i) AUROC (0.943) was significantly higher than those of NFSA (0.884, p = 0.008) and APRI (0.866, p < 10⁻³; p = 0.309 vs NFSA) in the whole population, and (ii) misclassification rate (9%) was significantly lower than those of NFSA (14%, p = 0.04) and APRI (16%, p = 0.002) and did not vary according to centre (14 vs 7%, p = 0.07), unlike those of NFSA (25 vs 9%, p = 0.001) and APRI (29 vs 11%, p < 10⁻³). By using thresholds of 90% predictive values, liver biopsy could have been avoided in most patients: FibroMeter: 97.4% vs NFSA: 86.8% (p < 10⁻³) and APRI: 80.0% (p < 10⁻³). A new classification provided three reliable diagnosis intervals: F0/1, F0/1/2, F2/3/4 with 91.4% accuracy for FibroMeter, avoiding biopsy in all patients.

Conclusions: FibroMeter NAFLD had high performance and provided reliable diagnosis for significant fibrosis, significantly outperforming NFSA and APRI.

© 2008 European Association for the Study of the Liver. Published by Elsevier B.V. All rights reserved.

Keywords: Blood fibrosis markers; NAFLD; Liver biopsy; Liver fibrosis; Sensitivity; Specificity

Received 10 April 2008; received in revised form 1 June 2008; accepted 2 July 2008; available online 7 October 2008

Assistant Editor: Silvia Fargion

Paul Calès, Frédéric Oberti, Isabelle Hubert, and Françoise Lunel have mentioned potential conflict of interest due to stock ownership in a company (BioLiveScale) recently created under the auspices of University of Angers.

* Corresponding author. Tel.: +33 2 41 35 34 10; fax: +33 2 41 35 41 19. E-mail address: [email protected] (P. Calès).

Abbreviations: AST, aspartate aminotransferase; ALT, alanine aminotransferase; APRI, aspartate aminotransferase to platelet ratio index; AUROC, area under the receiver operating characteristic; BMI, body mass index; CLD, chronic liver disease; NAFLD, non-alcoholic fatty liver disease; NFSA, NAFLD fibrosis score of Angulo et al.; NPV, negative predictive value; PPV, positive predictive value.

0168-8278/$34.00 © 2008 European Association for the Study of the Liver. Published by Elsevier B.V. All rights reserved. doi:10.1016/j.jhep.2008.07.035

1. Introduction

Several blood tests have been proposed to diagnose liver fibrosis [1]. Some tests are simple, like the aspartate aminotransferase to platelet ratio index (APRI) [2]. Others are more complex, constructed as algorithms (regression score) like the FibroMeter [3]. Most of them have been developed in chronic hepatitis C or in miscellaneous causes [4]. However, in a previous study, we observed that the cause of CLD was an independent predictor of fibrosis and thus it was preferable to develop specific tests for alcoholic or viral CLD to improve accuracy [3].

Non-alcoholic fatty liver disease (NAFLD) is an increasingly recognized condition in several countries [5] that can lead to cirrhosis or liver cancer. Some simple variables [6–8], fibrosis blood markers [9,10] and other tests [4,11] have been evaluated in NAFLD but these studies are rare or performed in only a few patients [4] and no blood test had been specifically designed for this prevalent disease until recently. We thus designed a simple algorithm in a previous study [12]. More recently, the NAFLD fibrosis score of Angulo et al. (NFSA) has been implemented in a large cohort with excellent performance [13]. However, this test was designed for severe fibrosis whereas most tests have been designed for significant fibrosis and usually for chronic hepatitis C. Some of the latter have been validated in NAFLD [11,14].

The main aim of the present study was to implement a blood test for significant liver fibrosis specifically designed for NAFLD with high diagnostic performance. The secondary aims were to compare this test to the only other specific test, the NFSA, and to a non-specific reference test, i.e. APRI, the simplest test. Other aims were to evaluate the factors influencing this diagnostic performance, such as diagnostic targets and fibrosis stages, as well as reproducibility.

2. Patients and methods

2.1. Centres

Two tertiary centres, Angers and Rennes, provided, respectively, 73 and 162 patients, for a total of 235. The centres were independent for study design, patient recruitment, blood measurements, and liver interpretation. Due to differences in size and patient characteristics, especially fibrosis stages, between the two centres (Table 1), all patients were pooled then randomly divided into test (121 patients) and validation (114 patients) populations with stratification based on Metavir fibrosis stages.
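The pooling and stratified randomization described above can be illustrated with a minimal sketch. It is not the procedure actually used by the authors, whose software and random seed are not reported; the function name `stratified_split`, the 50/50 split fraction and the seed are assumptions made for illustration. The stage counts in the usage note are back-calculated from the percentages of the whole population in Table 1.

```python
import random
from collections import defaultdict


def stratified_split(stages, test_fraction=0.5, seed=0):
    """Split patient indices into test and validation sets, stage by stage.

    `stages` holds one Metavir stage (0-4) per patient. Within each stage the
    patients are shuffled and about `test_fraction` of them go to the test set,
    so both sets keep a similar stage distribution.
    """
    rng = random.Random(seed)  # hypothetical seed, for reproducibility only
    by_stage = defaultdict(list)
    for idx, stage in enumerate(stages):
        by_stage[stage].append(idx)

    test_idx, validation_idx = [], []
    for stage in sorted(by_stage):
        members = by_stage[stage]
        rng.shuffle(members)
        n_test = round(len(members) * test_fraction)
        test_idx.extend(members[:n_test])
        validation_idx.extend(members[n_test:])
    return sorted(test_idx), sorted(validation_idx)


# Stage counts back-calculated from Table 1 (whole population, n = 235).
stages = [0] * 102 + [1] * 68 + [2] * 21 + [3] * 19 + [4] * 25
test_idx, validation_idx = stratified_split(stages)
print(len(test_idx), len(validation_idx))
```

With an equal split fraction the two groups differ by at most one patient per stage; the 121/114 split reported in the study would correspond to a slightly different fraction.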

2.2. Patients

Patients were considered as having NAFLD and prospectively included between 2001 and 2006 if they had abnormal liver blood tests or ultrasonography showing diffuse hyperechogenicity compared to that of the spleen, together with at least one of the five clinical features used in the definition of metabolic syndrome according to the Adult Treatment Panel III Working Group [12,15] detailed elsewhere [12]. In addition, liver specimen had to be compatible [16] and alcohol consumption had to be <30 g/day for the past five years according to a standard questionnaire as described elsewhere [12,17]. Patients were not included if they had another cause of CLD, complicated cirrhosis or were given putative anti-fibrotic treatment in the past 6 months. In the Angers centre, 48.8% of patients with suspected NAFLD were not included as liver biopsy had not been performed; they were characterized by less severe liver disease (data not shown). The study protocol conformed to the ethical guidelines of the current Declaration of Helsinki and was approved by a local Ethics Committee.

Table 1
Characteristics of populations.

| Characteristic | Whole | Angers | Rennes | p (by centre) | Test | Validation | p (after randomization) |
|---|---|---|---|---|---|---|---|
| N patients | 235 | 73 | 162 | – | 121 | 114 | – |
| Sex (% male) | 74.5 | 64.4 | 79.0 | 0.02 | 70.3 | 78.9 | 0.13 |
| Age (years) | 51.1 ± 11.0 | 54.8 ± 11.8 | 49.4 ± 10.2 | <10⁻³ | 51.2 ± 12.3 | 51.0 ± 9.5 | 0.88 |
| Body weight (kg) | 82.9 ± 16.0 | 87.6 ± 21.6 | 80.8 ± 12.2 | 0.003 | 83.5 ± 17.9 | 82.2 ± 13.8 | 0.55 |
| BMI (kg/m²) | 28.7 ± 4.9 | 30.8 ± 6.7 | 27.8 ± 3.5 | <10⁻³ | 29.1 ± 5.5 | 28.4 ± 4.2 | 0.28 |
| Metavir fibrosis stage: | | | | <10⁻³ | | | >0.99 |
| F0 (%) | 43.4 | 17.8 | 54.9 | – | 43.0 | 43.9 | – |
| F1 (%) | 28.9 | 23.3 | 31.5 | – | 28.9 | 29.0 | – |
| F2 (%) | 8.9 | 13.7 | 6.8 | – | 9.1 | 8.8 | – |
| F3 (%) | 8.1 | 23.3 | 1.2 | – | 9.3 | 7.9 | – |
| F4 (%) | 10.6 | 21.9 | 5.6 | <10⁻³ | 10.7 | 10.5 | 0.96 |
| Significant fibrosis (%) | 27.7 | 58.9 | 13.6 | <10⁻³ | 28.1 | 27.2 | 0.88 |
| Severe fibrosis (%) | 18.7 | 45.2 | 6.8 | <10⁻³ | 19.0 | 18.4 | 0.91 |
| Metavir fibrosis score | 1.1 ± 1.3 | 2.1 ± 1.4 | 0.7 ± 1.0 | <10⁻³ | 1.2 ± 1.4 | 1.1 ± 1.3 | 0.88 |
| Liver specimen size (mm) | 30 ± 12 | 21.5 ± 10 | 34 ± 11 | <10⁻³ | 27 ± 12 | 34 ± 11 | <10⁻³ |
| APRI | 0.53 ± 0.54 | 0.77 ± 0.66 | 0.43 ± 0.44 | <10⁻³ | 0.59 ± 0.65 | 0.47 ± 0.39 | 0.09 |
| NFSA | 0.20 ± 0.24 | 0.39 ± 0.31 | 0.12 ± 0.16 | <10⁻³ | 0.22 ± 0.27 | 0.18 ± 0.22 | 0.22 |
| FibroMeter | 0.28 ± 0.35 | 0.55 ± 0.37 | 0.15 ± 0.26 | <10⁻³ | 0.31 ± 0.36 | 0.24 ± 0.34 | 0.11 |

2.3. Methods

2.3.1. Clinical data and blood tests

Diabetes was defined as fasting glucose ≥126 mg/dL at inclusion or patient under drug treatment [18]. Fasting blood samples were taken at inclusion (date of liver biopsy ±7 days). The usual blood variables were included, especially fasting glucose at inclusion, as were the following fibrosis markers: hyaluronic acid, α2-macroglobulin, apolipoprotein-A1, prothrombin index, platelets, aspartate and alanine aminotransferases (AST, ALT), γ-glutamyltranspeptidase, total bilirubin and urea. Methods and reagents were previously described [3,12]. The NFSA (age, hyperglycaemia, body mass index (BMI), platelets, albumin, AST/ALT) and APRI were calculated according to published scores [2,13]. The inter-laboratory reproducibility of blood tests was evaluated between the Angers centre and a centralized laboratory (Merieux, Lyon, France) employing different methods on the same blood sample in an additional 33 patients with various CLD.
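The APRI formula is not spelled out in the text. As a reminder, the sketch below uses the definition published by Wai et al. [2]: AST expressed as a multiple of its upper limit of normal, multiplied by 100 and divided by the platelet count in 10⁹/l (G/l). The function and argument names are illustrative, and the example values, including an upper limit of normal of 40 IU/l, are assumptions rather than data from this study.

```python
def apri(ast_iu_l, ast_upper_limit_iu_l, platelets_g_l):
    """AST to platelet ratio index, following the definition of Wai et al. [2].

    ast_iu_l             -- serum AST (IU/l)
    ast_upper_limit_iu_l -- laboratory upper limit of normal for AST (IU/l)
    platelets_g_l        -- platelet count (10^9/l, i.e. G/l)
    """
    if ast_upper_limit_iu_l <= 0 or platelets_g_l <= 0:
        raise ValueError("upper limit of normal and platelet count must be positive")
    return (ast_iu_l / ast_upper_limit_iu_l) * 100.0 / platelets_g_l


# Hypothetical patient: AST at the upper limit of normal, 200 G/l platelets.
print(apri(40, 40, 200))  # 0.5
```

Because the upper limit of normal is laboratory-specific, the same blood sample can yield different APRI values in different laboratories.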

2.3.2. Liver biopsy

A percutaneous liver biopsy (usually using a 1.4–1.6 mm diameter needle, suction technique) was performed generally within one week (maximum 3 months) of blood sampling. The indication for liver biopsy was different in the two centres, being mainly based on the predictors of Ratziu et al. [7] in Angers and on abnormal liver blood tests in Rennes [12]. Fibrosis was graded into 5 stages from F0 to F4 by one (Rennes) or two independent (Angers) pathologist(s) according to Metavir staging [19]. Three diagnostic targets were used: significant fibrosis (main judgement criterion) including stages F2 + F3 + F4, severe fibrosis including stages F3 + F4, and cirrhosis including stage F4. Biopsy specimens were not re-examined centrally, as we have shown in a previous study (involving most of the same pathologists) that inter-observer agreement on the Metavir staging system is excellent among senior hepato-pathologists [20].

2.3.3. Statistical analysis

Data were reported according to STARD statements [21]. Quantitative variables were expressed as means ± SD, unless otherwise specified. Forward stepwise binary logistic regression was used, especially for the determination of FibroMeter as described elsewhere [3]. The diagnostic cut-off of logistic regression score probability was established in two ways: a priori at 0.5 before running the model to maximize the diagnostic accuracy according to statistical rules, except for APRI, and a posteriori according to the highest Youden index (Se + Spe − 1) or the maximum overall accuracy to accordingly optimize overall accuracy after running the model. The performance of each test was mainly expressed either by the overall accuracy, i.e. true positives and negatives, and the area under the receiver operating characteristic (AUROC), or with more detailed indices [22–24]. Among these indices, kappa was determined to reflect the agreement between blood tests and diagnostic targets. AUROC were compared by the Delong test [25]. In addition, the misclassification rate of blood tests, also called test performance profile [26], for significant fibrosis was calculated in each or in possibly combined Metavir F stage(s). Validation was performed in the validation population and in the whole population.

The size of the exploratory population was determined to show a significant difference between the FibroMeter and the NFSA. With α risk: 0.05, β risk: 0.2, significant fibrosis prevalence: 0.5, AUROC correlation: 0.7, and a bilateral test, the sample size was 222 patients for the following hypothesis of AUROC: FibroMeter: 0.90, NFSA: 0.84 according to preliminary data in one centre. The statistical software programmes used were SPSS version 11.5.1 (SPSS Inc., Chicago, IL, USA) and SAS 9.1 (SAS Institute Inc., Cary, NC, USA).

3. Results

3.1. General characteristics

Mean age of patients was 51 years, 75% were male, mean BMI was 29 and 27.7% had significant fibrosis. Type II diabetes was present in 24.1% and under drug treatment in 13.1% of the patients. Present or past consumption of alcohol and/or tobacco was noted in 67.5% and 26.8% of patients, respectively. Alcohol intake (median, interquartiles) was 13.5 (4.8–23.4) g/d for 25.8 (16.0–31.1) years. Obesity (BMI ≥ 30) was present in 31.0% of patients. Non-alcoholic steatohepatitis (NASH) was diagnosed on liver biopsy in 39.8% of patients; 89% of patients had liver specimen ≥15 mm. Most characteristics were significantly different between the two centres (Table 1), e.g. patients in the Angers population had higher BMI and more marked alterations with more fibrosis. Test and validation populations were not significantly different except for liver specimen size (Table 1).

3.2. Overall test performance

Logistic regression provided the regression score with the highest overall accuracy for significant fibrosis in the test population when seven variables were included: glucose, AST, ferritin, platelet, ALT, body weight and age. This test was named FibroMeter NAFLD. The regression function was: 0.4184 glucose (mmol/l) + 0.0701 AST (UI/l) + 0.0008 ferritin (µg/l) − 0.0102 platelet (G/l) − 0.0260 ALT (UI/l) + 0.0459 body weight (kg) + 0.0842 age (yr) + 11.6226.

3.2.1. Comparison between test and validation populations

The AUROCs of FibroMeter were very high (0.94 for significant fibrosis) and close between both populations except in cirrhosis (Table 2). The AUROCs of NFSA and APRI were high but less close between both populations.

3.2.2. Comparison between blood tests

AUROC comparisons between blood tests are presented as a function of different populations and diagnostic targets in Table 2.

Table 2
Comparison of AUROCs between blood tests according to population and diagnostic target.

| | Significant fibrosis: Test | Significant fibrosis: Valid. | Significant fibrosis: Whole | Severe fibrosis: Test | Severe fibrosis: Valid. | Severe fibrosis: Whole | Cirrhosis: Test | Cirrhosis: Valid. | Cirrhosis: Whole |
|---|---|---|---|---|---|---|---|---|---|
| AUROC | | | | | | | | | |
| FibroMeter | 0.936 | 0.952 | 0.943 | 0.928 | 0.950 | 0.937 | 0.929 | 0.888 | 0.904 |
| NFSA | 0.900 | 0.865 | 0.884 | 0.937 | 0.926 | 0.932 | 0.943 | 0.857 | 0.902 |
| APRI | 0.843 | 0.892 | 0.866 | 0.821 | 0.905 | 0.861 | 0.853 | 0.838 | 0.842 |
| Comparison (a) | | | | | | | | | |
| FibroMeter vs NFSA | 0.112 | **0.033** | **0.008** | 0.531 | 0.351 | 0.294 | 0.671 | 0.473 | 0.944 |
| FibroMeter vs APRI | **0.008** | **0.017** | **<10⁻³** | **0.002** | 0.058 | **<10⁻³** | **0.009** | 0.157 | **0.005** |
| NFSA vs APRI | 0.078 | 0.590 | 0.309 | **0.017** | 0.432 | **0.015** | 0.058 | 0.607 | **0.047** |

Significant differences are in bold characters. (a) Delong test.

3.2.2.1. Main diagnostic target. The FibroMeter AUROC for significant fibrosis was significantly higher than that of APRI or NFSA (except in the test population) (Fig. 1). AUROCs of NFSA and APRI were not significantly different. Detailed diagnostic indices are presented in Table 3. Briefly, overall accuracies were: FibroMeter: 91%, NFSA: 86% (p = 0.04 vs FibroMeter), APRI: 84% (p = 0.004 vs FibroMeter).

[Figure: ROC curves (sensitivity vs 1 − specificity) for FibroMeter, NFSA and APRI.]
Fig. 1. Overall performance: comparison of AUROCs of blood tests (p = 0.004) for significant fibrosis in the whole population.

3.2.2.2. Other diagnostic targets. AUROCs of FibroMeter and NFSA for severe fibrosis or cirrhosis were not significantly different. Globally, they were significantly superior to APRI AUROCs.

3.3. Reliable diagnosis

The ≥90% negative (NPV) and positive (PPV) predictive values for significant fibrosis, defining the traditional intervals of reliable diagnosis, were observed in the whole population when values were, respectively: FibroMeter: ≤0.611 and ≥0.715, NFSA: ≤0.227 and ≥0.514, APRI: ≤0.454 and ≥0.918. The ensuing proportion of patients with reliable diagnosis were: FibroMeter: 97.4%, NFSA: 86.8% (p < 10⁻³ vs FibroMeter), APRI: 80.0% (p < 10⁻³ vs FibroMeter and p = 0.04 vs NFSA) (Fig. 2). We have recently implemented a new classification to define the intervals of reliable diagnosis [27]. Considering FibroMeter and using the threshold of 95% NPV only (FibroMeter: 0.297) and the diagnostic cut-off (maximum accuracy: 0.490), it was possible to obtain a reliable diagnosis in 100% of patients with the following three intervals and corresponding diagnosis: ≤95% NPV: F0/1 (accuracy: 95.0%, 68.5% of population), >95% NPV to

3.4. Test performance profile

The misclassification rate (Fig. 4) was significantly lower for FibroMeter than for NFSA or APRI in combined stages of significant fibrosis, whereas it was not significantly different between the three blood tests in F0 or F1 stages (details not shown).

Table 3
Performance indices of blood tests for significant fibrosis in the whole population.

| Index | FibroMeter | NFSA | APRI |
|---|---|---|---|
| κ (a) | 0.769 ± 0.048 | 0.628 ± 0.060 | 0.584 ± 0.060 |
| Se | 78.5 (68.5–88.5) | 60.9 (49.0–72.9) | 66.1 (54.7–77.7) |
| Spe | 95.9 (92.9–98.9) | 96.3 (93.3–99.2) | 90.6 (86.2–95.0) |
| +PV | 87.9 (79.5–96.3) | 86.7 (76.7–96.6) | 72.9 (61.5–84.2) |
| −PV | 92.1 (88.1–96.1) | 86.0 (81.0–91.1) | 87.5 (82.6–92.4) |
| +LR | 19.05 (9.12–39.80) | 16.25 (7.24–36.50) | 7.03 (4.27–11.56) |
| −LR | 0.22 (0.14–0.36) | 0.41 (0.30–0.55) | 0.37 (0.27–0.53) |
| DOR | 84.8 (32.5–221.6) | 40.0 (15.4–104.3) | 18.8 (9.1–38.9) |
| OA | 91.1 (87.4–94.7) | 86.2 (81.6–90.7) | 83.8 (79.1–88.5) |
| AUROC (b) | 0.943 (0.911–0.975) | 0.884 (0.829–0.939) | 0.866 (0.812–0.919) |

95% confidence intervals are in brackets. κ: kappa index (mean ± SD), Se: sensitivity (%), Spe: specificity (%), PV: predictive value (%), LR: likelihood ratio, DOR: diagnostic odds ratio, OA: overall accuracy (%), AUROC: area under the receiver operating characteristic, NFSA: NAFLD fibrosis score of Angulo et al. Diagnostic cut-offs of blood tests were fixed a posteriori (maximum accuracy). (a) Kappa index reflecting the agreement with liver specimen. (b) AUROC is independent of diagnostic cut-off.
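Most indices in Table 3 follow from sensitivity, specificity and the prevalence of significant fibrosis. The sketch below applies the standard definitions to the FibroMeter values, with the 27.7% prevalence taken from Table 1. The function name and the dictionary keys are illustrative, and this is not the authors' computation: they worked from patient-level data, so small differences due to rounding of Se and Spe are expected.

```python
def diagnostic_indices(se, spe, prevalence):
    """Derive predictive values, likelihood ratios, DOR and overall accuracy.

    All inputs and outputs are proportions (0-1), except the likelihood
    ratios and the diagnostic odds ratio, which are unitless ratios.
    """
    tp = se * prevalence                # true positives, as a share of all patients
    fn = (1 - se) * prevalence          # false negatives
    tn = spe * (1 - prevalence)         # true negatives
    fp = (1 - spe) * (1 - prevalence)   # false positives
    lr_pos = se / (1 - spe)
    lr_neg = (1 - se) / spe
    return {
        "PPV": tp / (tp + fp),
        "NPV": tn / (tn + fn),
        "LR+": lr_pos,
        "LR-": lr_neg,
        "DOR": lr_pos / lr_neg,
        "OA": tp + tn,
    }


# FibroMeter in the whole population: Se 78.5%, Spe 95.9%, prevalence 27.7%.
for name, value in diagnostic_indices(0.785, 0.959, 0.277).items():
    print(f"{name}: {value:.3f}")
```

The overall accuracy is a prevalence-weighted average of sensitivity and specificity, which is why a test with modest sensitivity can still reach a high overall accuracy when significant fibrosis is uncommon.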

[Figure: bar chart; y-axis: patients (%) with predictive values ≥90%; bars: FibroMeter, NFSA, APRI.]
Fig. 2. Reliable diagnosis: comparison of patients with blood test values ≥ thresholds of 90% positive or negative values as a function of blood test in the whole population.

[Figure: y-axis: misclassification rate (%); x-axis: Metavir fibrosis stage (0 to 4); series: FibroMeter, NFSA, APRI.]
Fig. 4. Detailed performance: comparison of misclassification rates of blood tests for significant fibrosis as a function of Metavir fibrosis stages (test performance profiles) in the whole population.
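A test performance profile of the kind shown in Fig. 4 can be computed as the misclassification rate for significant fibrosis (F ≥ 2) within each Metavir stage. The sketch below is illustrative only: the function name is hypothetical, the eight patients are invented, and the cut-off of 0.490 is the maximum-accuracy cut-off reported for FibroMeter in Section 3.3.

```python
from collections import defaultdict


def performance_profile(stages, scores, cutoff):
    """Misclassification rate (%) for significant fibrosis in each Metavir stage.

    A patient is predicted to have significant fibrosis when score >= cutoff,
    and truly has it when the Metavir stage is F2 or higher.
    """
    counts = defaultdict(lambda: [0, 0])  # stage -> [misclassified, total]
    for stage, score in zip(stages, scores):
        predicted = score >= cutoff
        actual = stage >= 2
        counts[stage][0] += predicted != actual
        counts[stage][1] += 1
    return {s: 100.0 * wrong / total for s, (wrong, total) in sorted(counts.items())}


# Invented example data (not from the study).
stages = [0, 0, 1, 1, 2, 2, 3, 4]
scores = [0.05, 0.60, 0.20, 0.55, 0.30, 0.80, 0.90, 0.95]
print(performance_profile(stages, scores, cutoff=0.490))
# {0: 50.0, 1: 50.0, 2: 50.0, 3: 0.0, 4: 0.0}
```

In stages F0 and F1 every error is a false positive, and in F2 to F4 every error is a false negative, so the profile shows where along the fibrosis scale a test fails.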

3.5. Centre effect

As expected, the mean blood test values were significantly different between centres (Table 1). Differences in AUROCs between tests were comparable between centres (detailed data not shown), e.g. FibroMeter vs NFSA for significant fibrosis: p = 0.054 in Angers and p = 0.068 in Rennes. The overall (all F stages) misclassification rate for significant fibrosis varied significantly according to centre for NFSA (respectively in the Angers and Rennes populations): 25% vs 9% (p = 0.001), and APRI: 29% vs 10% (p < 10⁻³), but not for FibroMeter: 14% vs 7% (p = 0.09) (Table 4). The independent predictors of misclassification rate were for FibroMeter: fibrosis stage (p = 0.003); for APRI: ALT (p = 0.048), age (p = 0.025) and centre (p = 0.008); and for NFSA: fibrosis stage (p < 10⁻³) and platelets (p = 0.029).

The reliable diagnosis with 90% predictive values was observed in 75.3% of patients in Angers and 97.5% in Rennes (p < 10⁻³). The accuracy of reliable diagnosis intervals, as previously defined, was 86.3% in Angers and 93.2% in Rennes (p = 0.141). The intraclass correlation coefficient for FibroMeter NAFLD was 0.99 between the two laboratories.

3.6. Sensitivity analysis

As serum glucose is sensitive to medical intervention, we constructed another FibroMeter model where glucose was replaced by diabetes (fasting glucose ≥1.26 g/l or under drug treatment for diabetes). The diagnostic indices of this new model were: AUROC: 0.947 (p = 0.614 vs FibroMeter including raw glucose) and misclassification rate: 11.1% (p = 0.227). AUROCs for significant fibrosis in patients with liver specimens ≥15 or 20 mm were, respectively: FibroMeter: 0.929 and 0.941, NFSA: 0.869 and 0.870, APRI: 0.865 and 0.887 (p = 0.004 and p = 0.027, respectively, between the three blood tests).

[Figure: stacked bars; y-axis: Metavir fibrosis stage (%), F0 to F4; x-axis: reliable intervals (≤NPV 95%, >NPV 95% to CO, ≥CO).]
Fig. 3. Reliable diagnosis: intervals of reliable diagnosis by FibroMeter using two thresholds: NPV 95% and diagnostic cut-off (CO). The corresponding diagnosis categories are: F0/1, F0/1/2, F2/3/4.

Table 4
Comparison of misclassified patient rates by blood test (%, grey cells) for significant fibrosis, determined on liver specimen, between centres (top) and blood tests within each centre (bottom, the comparison for both centres is in Fig. 4).

4. Discussion

4.1. Methods

Metavir staging, initially developed in chronic hepatitis C, was used in the present study. We have previously demonstrated that the Metavir system correctly staged fibrosis in alcoholic steatohepatitis [28]; it was also a histological reference in NAFLD [12]. We chose significant fibrosis (F ≥ 2) as the main diagnostic target for the following reasons. First and importantly, this target is thought to be clinically significant in several circumstances, not only in viral hepatitis but also in NAFLD where periportal fibrosis is linked with the development of CLD complications [29]. Second, statistical methods used to implement blood tests, such as binary logistic regression, are more difficult to develop when the prevalence of the diagnostic target is far from 0.5, such as in viral hepatitis, but this is not the case in NAFLD. This bias is decreased by using significant fibrosis instead of severe fibrosis. Third, several blood tests with good performance in NAFLD were initially designed for significant fibrosis with other cause(s) and then applied to that diagnostic target in NAFLD [4,11,14]. Finally, septal fibrosis, defining Metavir ≥F2 stage, is the main fibrotic lesion influencing blood markers [28,30].

The Kleiner system used in NAFLD [31] includes perisinusoidal or portal/periportal fibrosis in F1 stage and a mix of them in F2 stage. Kleiner and Metavir systems are thus roughly comparable only in F3 (bridging fibrosis) and F4 (cirrhosis) stages. As there are differences between Metavir and Kleiner staging in fibrosis stages 0, 1 and 2, FibroMeter should be tested and/or adapted with the Kleiner staging. However, the performance of FibroMeter for severe fibrosis will probably be the same as in the Kleiner staging. In addition, FibroMeter NAFLD should be evaluated in other centres, especially from different countries, as was done for the NFSA score. Most of the patients had a liver specimen size considered as reliable [32] and sensitivity analysis showed that liver specimen length had only a weak influence on performance.

4.2. Performance indices

4.2.1. NFSA

The NFSA AUROC for significant fibrosis was high (0.88) and comparable to that observed in another recent study in 91 patients (0.86) [14]. The NFSA AUROC for severe fibrosis in the present series (0.93) was even higher than those of the pivotal study (0.88) or Guha's study (0.89). The present study thus provides an independent external validation of NFSA for significant fibrosis in a large series and for severe fibrosis.

4.2.2. APRI

We observed a similar performance for significant fibrosis of APRI AUROC compared to the original publication in chronic viral hepatitis C. APRI AUROC was significantly inferior to that of FibroMeter and NFSA for the three diagnostic targets (except with NFSA for significant fibrosis).

4.2.3. FibroMeter

Diagnostic performance of FibroMeter was significantly higher than that of NFSA for significant fibrosis but not for severe fibrosis or cirrhosis. The AUROC of FibroMeter NAFLD for significant fibrosis (0.94) was significantly higher than that of the FibroMeter already published [3] for viral or alcoholic CLD and applied to the present whole population (detailed data not shown): 0.89 and 0.71, respectively. This supports the interest in specifically designing blood tests for the main causes of CLD [3]. The accuracy of FibroMeter NAFLD was better in intermediate fibrosis stages (F1, F2), lesser in severe fibrosis (F3, F4) and similar in F0 compared to FibroMeter virus as measured in 1056 patients with chronic hepatitis C [27]. Thus, the high accuracy observed in the present study was not due to a high F0 prevalence (43%) but to high discrimination between F1 and F2 secondary to a high accuracy of FibroMeter NAFLD in F1 (Fig. 4).

Considering ≥90% predictive values as clinically acceptable (Fig. 2), liver biopsy could be avoided in 97.4% of patients with FibroMeter NAFLD for the diagnosis of significant fibrosis, which is significantly higher than for NFSA and APRI. Finally, a new classification [27] distinguishing patients into three diagnostic categories allowed biopsy to be avoided in 100% of patients with an overall accuracy of 91.9% (Fig. 3).

Considering that a positive likelihood ratio >5 and a negative likelihood ratio <0.2 provided strong diagnostic evidence, whereas a diagnostic odds ratio >30 provided reasonable test performance [23,24], FibroMeter NAFLD should be the only suitable test for significant fibrosis (Table 3). This interpretation should be adjusted due to the well-known misclassification rates of liver biopsy [20,33–36] especially in NAFLD [37,38], which tend to underestimate blood test accuracy.

Interestingly and as expected [39], FibroMeter NAFLD included three independent variables linked to metabolic syndrome: glucose, ferritin and body weight, the latter being more accurate than BMI. Weight gain and insulin resistance were shown to be associated with progression of liver fibrosis in NAFLD [29]. Serum AST [40], glucose [41], ferritin [41,42], ALT, BMI and age [5,7,41] have been identified as independent predictors of fibrosis in NAFLD. Platelet count was found to be an independent predictor of cirrhosis [43], as expected, but not of fibrosis [44] in NAFLD unlike in chronic viral hepatitis C [42] or in the present study. Hyaluronate has already been shown to be a predictor of liver fibrosis in NAFLD [12,39,43].

4.2.4. Other tests

Fibrotest cannot predict severity of liver fibrosis in a third of patients who have NAFLD [11]. The ELF test targeted significant fibrosis (F2 + 3 + 4) using a modified Scheuer score in 61 patients with NAFLD [4]: AUROC was 0.870. In a recent study, the AUROC of the ELF test was 0.82 for moderate fibrosis (Kleiner stages 2 + 3 + 4) and 0.90 for severe fibrosis (stages 3 + 4) in 192 patients with NAFLD [14]. Using thresholds with a sensitivity and specificity of 90%, 62% of patients would avoid a liver biopsy, with 52% correctly classified. As these thresholds include more patients than 90% thresholds of predictive values, this would indicate that the reliability interval is narrower with the ELF test than with FibroMeter. In the same study [14], the AUROC of combined NFSA and ELF was 0.93 for moderate fibrosis in a subgroup of 91 patients. The ELF test was not evaluated in the present study since we have observed a large inter-laboratory variability in the TIMP1 measurement included in this test [45], whereas this marker was measured in a single independent reference laboratory in Guha's study including two referral centres [14].

4.3. Factors influencing the test performance

4.3.1. Fibrosis stage

The test performance profile clearly shows that test performance depends significantly on the fibrosis stage (Fig. 4). This study clearly shows the interest of detailed description of performance: although the overall performance of APRI and NFSA was excellent, their profile showed poor performance in all stages of significant fibrosis, especially in F2. This apparent paradox is due to the low prevalence of significant fibrosis in NAFLD compared to viral or alcoholic CLD. Thus, a population with a high prevalence of significant fibrosis or severe fibrosis, like in Angers, will favour the performance of FibroMeter compared to NFSA or APRI, whereas a low prevalence of significant fibrosis, like in Rennes, will attenuate the differences between blood tests (Fig. 4).

4.3.2. Diagnostic target

The overall performance of the tests did not vary as a function of the diagnostic target as determined by Metavir F stage cut-offs (Table 2).

4.3.3. Centre effect

There was a difference in performance of blood tests between centres. This can be attributed to the difference in the relative prevalence of Metavir F stages. The overall misclassification rate for significant fibrosis between blood tests and liver specimens significantly varied between centres for NFSA and for APRI but not for FibroMeter (Table 4). These misclassification changes for APRI were also independently centre-linked. Finally, the inter-laboratory reproducibility of FibroMeter NAFLD was excellent in the present study. These characteristics suggest that FibroMeter NAFLD is a robust test. The reliable diagnosis defined by the new classification was far less sensitive to centre effect than the traditional 90% predictive values.

In conclusion, this study, conducted in two independent centres, demonstrates high performance and robustness, with the least variability in overall accuracy, for FibroMeter NAFLD compared to NFSA and APRI. APRI performance for significant fibrosis was similar to that of NFSA and that observed in chronic hepatitis C. The main factors influencing accuracy were the relative prevalence of Metavir F stages and fibrosis markers.

Funding

A grant was provided by the French national agency for valorisation "OSEO-ANVAR". The sponsor had no role in study design, in data collection, analysis, and interpretation, in the writing of the report or in the decision to submit the paper for publication.

Acknowledgement

We express our thanks to Anne Laure Tropet, Dermot O'Toole, Kevin Erwin, and Gwénaëlle Soulard for their contributions.


References
[1] Sebastiani G, Alberti A. Non-invasive fibrosis biomarkers reduce but not substitute the need for liver biopsy. World J Gastroenterol 2006;12:3682–3694.
[2] Wai CT, Greenson JK, Fontana RJ, Kalbfleisch JD, Marrero JA, Conjeevaram HS, et al. A simple noninvasive index can predict both significant fibrosis and cirrhosis in patients with chronic hepatitis C. Hepatology 2003;38:518–526.
[3] Cales P, Oberti F, Michalak S, Hubert-Fouchard I, Rousselet MC, Konate A, et al. A novel panel of blood markers to assess the degree of liver fibrosis. Hepatology 2005;42:1373–1381.
[4] Rosenberg WM, Voelker M, Thiel R, Becka M, Burt A, Schuppan D, et al. Serum markers detect the presence of liver fibrosis: a cohort study. Gastroenterology 2004;127:1704–1713.
[5] Clark JM. The epidemiology of non-alcoholic fatty liver disease in adults. J Clin Gastroenterol 2006;40:S5–S10.
[6] Angulo P, Keach JC, Batts KP, Lindor KD. Independent predictors of liver fibrosis in patients with non-alcoholic steatohepatitis. Hepatology 1999;30:1356–1362.
[7] Ratziu V, Giral P, Charlotte F, Bruckert E, Thibault V, Theodorou I, et al. Liver fibrosis in overweight patients. Gastroenterology 2000;118:1117–1123.
[8] Dixon JB, Bhathal PS, O’Brien PE. Non-alcoholic fatty liver disease: predictors of non-alcoholic steatohepatitis and liver fibrosis in the severely obese. Gastroenterology 2001;121:91–100.
[9] Musso G, Gambino R, Biroli G, Carello M, Faga E, Pacini G, et al. Hypoadiponectinemia predicts the severity of hepatic fibrosis and pancreatic beta-cell dysfunction in nondiabetic nonobese patients with non-alcoholic steatohepatitis. Am J Gastroenterol 2005;100:2438–2446.
[10] Bahcecioglu IH, Yalniz M, Ataseven H, Ilhan N, Ozercan IH, Seckin D, et al. Levels of serum hyaluronic acid, TNF-alpha and IL-8 in patients with non-alcoholic steatohepatitis. Hepatogastroenterology 2005;52:1549–1553.
[11] Ratziu V, Massard J, Charlotte F, Messous D, Imbert-Bismut F, Bonyhay L, et al. Diagnostic value of biochemical markers (FibroTest-FibroSURE) for the prediction of liver fibrosis in patients with non-alcoholic fatty liver disease. BMC Gastroenterol 2006;6:6.
[12] Laine F, Bendavid C, Moirand R, Tessier S, Perrin M, Guillygomarc’h A, et al. Prediction of liver fibrosis in patients with features of the metabolic syndrome regardless of alcohol consumption. Hepatology 2004;39:1639–1646.
[13] Angulo P, Hui JM, Marchesini G, Bugianesi E, George J, Farrell GC, et al. The NAFLD fibrosis score: a noninvasive system that identifies liver fibrosis in patients with NAFLD. Hepatology 2007;45:846–854.
[14] Guha I, Parkes J, Roderick P, Chattopadhyay D, Cross R, Harris S, et al. Noninvasive markers of fibrosis in non-alcoholic fatty liver disease: validating the European Liver Fibrosis Panel and exploring simple markers. Hepatology 2008;47:455–460.
[15] Ford ES, Giles WH, Dietz WH. Prevalence of the metabolic syndrome among US adults: findings from the third National Health and Nutrition Examination Survey. JAMA 2002;287:356–359.
[16] Brunt EM. Alcoholic and non-alcoholic steatohepatitis. Clin Liver Dis 2002;6:399–420.
[17] Lasfargues G, Vol S, Le Clesiau H, Bedouet M, Hagel L, Constans T, et al. Validity of a short self-administered dietary questionnaire compared with a dietetic interview. Presse Med 1990;19:953–957.
[18] Anonymous. Report of the expert committee on the diagnosis and classification of diabetes mellitus. Diabetes Care 2003;26:S5–S20.
[19] The French METAVIR Cooperative Study Group. Intraobserver and interobserver variations in liver biopsy interpretation in patients with chronic hepatitis C. Hepatology 1994;20:15–20.
[20] Rousselet MC, Michalak S, Dupre F, Croue A, Bedossa P, Saint-Andre JP, et al. Sources of variability in histological scoring of chronic viral hepatitis. Hepatology 2005;41:257–264.
[21] Bossuyt PM, Reitsma JB, Bruns DE, Gatsonis CA, Glasziou PP, Irwig LM, et al. The STARD statement for reporting studies of diagnostic accuracy: explanation and elaboration. Clin Chem 2003;49:7–18.
[22] Greenhalgh T. How to read a paper. Papers that report diagnostic or screening tests. Br Med J 1997;315:540–543.
[23] Glas AS, Lijmer JG, Prins MH, Bonsel GJ, Bossuyt PM. The diagnostic odds ratio: a single indicator of test performance. J Clin Epidemiol 2003;56:1129–1135.
[24] Parkes J, Guha IN, Roderick P, Rosenberg W. Performance of serum marker panels for liver fibrosis in chronic hepatitis C. J Hepatol 2006;44:462–474.
[25] DeLong ER, DeLong DM, Clarke-Pearson DL. Comparing the areas under two or more correlated receiver operating characteristic curves: a nonparametric approach. Biometrics 1988;44:837–845.
[26] Halfon P, Bacq Y, De Muret A, Penaranda G, Bourliere M, Ouzan D, et al. Comparison of test performance profile for blood tests of liver fibrosis in chronic hepatitis C. J Hepatol 2007;46:395–402.
[27] Calès P, de Ledinghen V, Halfon P, Bacq Y, Leroy V, Boursier J. Evaluating accuracy and increasing the reliable diagnosis rate of blood tests for liver fibrosis in chronic hepatitis C. Liver International. In press.
[28] Michalak S, Rousselet MC, Bedossa P, Pilette C, Chappard D, Oberti F, et al. Respective roles of porto-septal fibrosis and centrilobular fibrosis in alcoholic liver disease. J Pathol 2003;201:55–62.
[29] Ekstedt M, Franzen LE, Mathiesen UL, Thorelius L, Holmqvist M, Bodemar G, et al. Long-term follow-up of patients with NAFLD and elevated liver enzymes. Hepatology 2006;44:865–873.
[30] Pilette C, Rousselet MC, Bedossa P, Chappard D, Oberti F, Rifflet H, et al. Histopathological evaluation of liver fibrosis: quantitative image analysis vs semi-quantitative scores. Comparison with serum markers. J Hepatol 1998;28:439–446.
[31] Kleiner DE, Brunt EM, Van Natta M, Behling C, Contos MJ, Cummings OW, et al. Design and validation of a histological scoring system for non-alcoholic fatty liver disease. Hepatology 2005;41:1313–1321.
[32] Goldstein NS, Hastah F, Galan MV, Gordon SC. Fibrosis heterogeneity in non-alcoholic steatohepatitis and hepatitis C virus needle core biopsy specimens. Am J Clin Pathol 2005;123:382–387.
[33] Bedossa P, Dargere D, Paradis V. Sampling variability of liver fibrosis in chronic hepatitis C. Hepatology 2003;38:1449–1457.
[34] Regev A, Berho M, Jeffers LJ, Milikowski C, Molina EG, Pyrsopoulos NT, et al. Sampling error and intraobserver variation in liver biopsy in patients with chronic HCV infection. Am J Gastroenterol 2002;97:2614–2618.
[35] Colloredo G, Guido M, Sonzogni A, Leandro G. Impact of liver biopsy size on histological evaluation of chronic viral hepatitis: the smaller the sample, the milder the disease. J Hepatol 2003;39:239–244.
[36] Nord HJ. Biopsy diagnosis of cirrhosis: blind percutaneous versus guided direct vision techniques–a review. Gastrointest Endosc 1982;28:102–104.
[37] Ratziu V, Charlotte F, Heurtier A, Gombert S, Giral P, Bruckert E, et al. Sampling variability of liver biopsy in non-alcoholic fatty liver disease. Gastroenterology 2005;128:1898–1906.

[38] Merriman RB, Ferrell LD, Patti MG, Weston SR, Pabst MS, Aouizerat BE, et al. Correlation of paired liver biopsies in morbidly obese patients with suspected non-alcoholic fatty liver disease. Hepatology 2006;44:874–880.
[39] Adams LA, Angulo P. Role of liver biopsy and serum markers of liver fibrosis in non-alcoholic fatty liver disease. Clin Liver Dis 2007;11:25–35.
[40] Ong JP, Elariny H, Collantes R, Younoszai A, Chandhoke V, Reines HD, et al. Predictors of non-alcoholic steatohepatitis and advanced fibrosis in morbidly obese patients. Obes Surg 2005;15:310–315.
[41] Bugianesi E, Manzini P, D’Antico S, Vanni E, Longo F, Leone N, et al. Relative contribution of iron burden, HFE mutations, and insulin resistance to fibrosis in non-alcoholic fatty liver. Hepatology 2004;39:179–187.
[42] Bugianesi E, Marchesini G, Gentilcore E, Cua IH, Vanni E, Rizzetto M, et al. Fibrosis in genotype 3 chronic hepatitis C and non-alcoholic fatty liver disease: role of insulin resistance and hepatic steatosis. Hepatology 2006;44:1648–1655.
[43] Kaneda H, Hashimoto E, Yatsuji S, Tokushige K, Shiratori K. Hyaluronic acid levels can predict severe fibrosis and platelet counts can predict cirrhosis in patients with non-alcoholic fatty liver disease. J Gastroenterol Hepatol 2006;21:1459–1465.
[44] Iacobellis A, Marcellini M, Andriulli A, Perri F, Leandro G, Devito R, et al. Non-invasive evaluation of liver fibrosis in paediatric patients with non-alcoholic steatohepatitis. World J Gastroenterol 2006;12:7821–7825.
[45] Cales P, Veillon P, Konate A, Mathieu E, Ternisien C, Chevailler A, et al. Reproducibility of blood tests of liver fibrosis in clinical practice. Clin Biochem 2008;41:10–18.