Prediction of fatty liver disease using machine learning algorithms



Computer Methods and Programs in Biomedicine 170 (2019) 23–29

Contents lists available at ScienceDirect

Computer Methods and Programs in Biomedicine journal homepage: www.elsevier.com/locate/cmpb

Chieh-Chen Wu a,e, Wen-Chun Yeh b, Wen-Ding Hsu c, Md. Mohaimenul Islam a,e, Phung Anh (Alex) Nguyen e, Tahmina Nasrin Poly a,e, Yao-Chin Wang a,e,d, Hsuan-Chia Yang e, Yu-Chuan (Jack) Li a,e,f,∗

a Graduate Institute of Biomedical Informatics, College of Medicine Science and Technology, Taipei Medical University, Taipei, Taiwan
b Division of Hepatogastroenterology, Department of Internal Medicine, New Taipei City Hospital, Taiwan
c Division of Nephrology, Department of Internal Medicine, New Taipei City Hospital, Taiwan
d Department of Emergency, Min-Sheng General Hospital, Taoyuan, Taiwan
e International Center for Health Information Technology (ICHIT), Taipei Medical University, Taipei, Taiwan
f Department of Dermatology, Wan Fang Hospital, Taipei, Taiwan

Article info

Article history: Received 31 October 2018; Revised 21 December 2018; Accepted 28 December 2018

Keywords: Fatty liver disease; Machine learning; Classification model; Random forest

Abstract

Background and objective: Fatty liver disease (FLD) is a common clinical complication associated with high morbidity and mortality. Early prediction of FLD provides an opportunity to plan appropriate strategies for prevention, early diagnosis, and treatment. We aimed to develop a machine learning model to predict FLD that could assist physicians in classifying high-risk patients and in diagnosing, preventing, and managing FLD.

Methods: We included all patients who had an initial fatty liver screening at the New Taipei City Hospital between 1st and 31st December 2009. Classification models, namely random forest (RF), Naïve Bayes (NB), artificial neural networks (ANN), and logistic regression (LR), were developed to predict FLD. The area under the receiver operating characteristic curve (AUROC) was used to compare the performance of the four models.

Results: A total of 577 patients were included in this study; of those, 377 had fatty liver. The AUROC of RF, NB, ANN, and LR with 10-fold cross-validation was 0.925, 0.888, 0.895, and 0.854, respectively. The accuracy of RF, NB, ANN, and LR was 87.48, 82.65, 81.85, and 76.96%, respectively.

Conclusion: In this study, we developed and compared four classification models to predict fatty liver disease. The random forest model showed higher performance than the other classification models. Implementation of a random forest model in the clinical setting could help physicians to stratify fatty liver patients for primary prevention, surveillance, early treatment, and management.

© 2018 Elsevier B.V. All rights reserved.

∗ Corresponding author at: College of Medicine Science and Technology (CoMST), Taipei Medical University, Department of Dermatology, Wan Fang Hospital, 250 Wuxing Street, Xinyi District, Taipei 11031, Taiwan. E-mail address: [email protected] (Y.-C. (Jack) Li).

https://doi.org/10.1016/j.cmpb.2018.12.032
0169-2607/© 2018 Elsevier B.V. All rights reserved.

1. Introduction

Fatty liver disease (FLD) is a common clinical problem associated with high morbidity and mortality. FLD can eventually lead to noncholestatic cirrhosis and hepatocellular carcinoma [1]. Additionally, the prevalence of FLD has been increasing in parallel with that of diabetes, metabolic syndrome, and obesity [2], and this higher prevalence imposes a greater economic burden. Therefore, accurate identification of individuals at risk and early recognition of FLD could offer immense benefits for diagnosis, prevention, and treatment. Over the past decade, biopsy has been used to stratify patients and has been considered the diagnostic reference standard for assessing fatty infiltration of the liver. However, this method is highly invasive and costly, and it is prone to side effects and sampling errors. Ultrasonography is used as a practical tool for FLD diagnosis with high accuracy, but that accuracy is highly operator dependent [3].

Machine learning (ML) is a field of computer science that uses algorithms to identify patterns in large datasets and to predict outcomes based on data [4]. ML techniques have emerged as a potential tool for prediction and decision-making in a multitude of disciplines [5]. Owing to the availability of clinical data, ML has been playing a critical role in medical decision making as well [6,7]. A machine learning model could serve as a valuable aid to identify disease and support effective real-time clinical decisions. It would also allow for optimization of hospital resources by identifying patients with several significant risk factors earlier.

Many studies have already investigated medical imaging techniques such as ultrasound (US), computed tomography (CT), and magnetic resonance imaging (MRI) for fatty liver disease classification. Ultrasound imaging is noninvasive, inexpensive, easy to operate, and portable. Andrade et al. [8] evaluated the performance of three classifiers for the diagnosis of liver steatosis using several features extracted from ultrasound images. Ribeiro and Sanches [9] utilized the anatomic and echogenic information of normal and fatty liver ultrasonic images and applied a Bayesian framework to the extracted feature parameters for fatty liver diagnosis. Owjimehr et al. [10] demonstrated an automatic ROI selection and hierarchical classification method to discriminate normal liver from three stages of fatty liver: steatosis, fibrosis, and cirrhosis. Their algorithm discriminated normal patients from fatty liver patients in the first step using wavelet packet transform (WPT) features, and in the second step separated steatosis from the other stages of fatty liver using a fusion of WPT and gray-level co-occurrence matrix (GLCM) features. Moreover, Li et al. [11] analyzed texture features of B-mode ultrasonic images of fatty liver, comprising near-field light-spot density, near-far-field grayscale ratio, grayscale co-occurrence matrix, and neighborhood gray-tone difference matrix, and used a support vector machine (SVM) as the classification algorithm. However, the diagnosis of fatty liver from ultrasonic images varies with the ultrasound equipment used, the quality of the images, and physical differences among patients. A prediction model based on available clinical variables would instead help clinicians to correctly identify patients and make actionable decisions on prevention, early diagnosis, and targeted intervention.
To date, the benefits of using classification models with data from electronic medical records to predict FLD have not been evaluated on a large scale. We therefore aimed to construct a prediction model for fatty liver disease using machine learning classification approaches. To our knowledge, this is the most comprehensive study to use machine learning models to predict FLD.

2. Methods

2.1. Study population

We collected data from New Taipei City Municipal Hospital Banqiao Branch under a liver protection project. We included all patients who had received an initial fatty liver screening in December 2009. We excluded patients if they a) were ≤ 30 years old, b) had an incomplete examination, or c) were suspected cases of fatty liver on ultrasonography. This study was reviewed and approved by the institutional ethical committee board of Taipei Medical University and Taipei Medical University Hospital and was conducted in accordance with the ethical guidelines of the Declaration of Helsinki of the World Medical Association.

2.2. Clinical data and outcomes

We collected all patients' demographic and clinical data from the electronic medical records at the time of screening. All fatty and non-fatty liver patients were identified by abdominal ultrasonography. Ten predictor variables were collected from both FLD and non-FLD patients and used in our proposed models: age, gender, systolic blood pressure (SBP), diastolic blood pressure (DBP), abdominal girth (AG), glucose AC, triglyceride, HDL-C, SGOT-AST, and SGPT-ALT. The classification models were used to identify patients at risk of FLD, which could facilitate personalized medicine for fatty liver patients in the future.

Fig. 1. Machine learning pathway. It involves automated feature selection by information gain ranking and classification model building with k-fold cross-validation.
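The information gain ranking named in Fig. 1 scores a variable by how much knowing it reduces the entropy of the outcome. The following is a minimal sketch of that idea, not the authors' Weka workflow. It assumes each continuous variable is binarized at a single threshold (the text does not state how variables were discretized), and the function names, thresholds, and toy values are all invented for illustration.

```python
import math
from collections import Counter


def entropy(labels):
    """Shannon entropy (in bits) of a list of class labels."""
    total = len(labels)
    counts = Counter(labels)
    return -sum((n / total) * math.log2(n / total) for n in counts.values())


def information_gain(values, labels, threshold):
    """Entropy reduction obtained by splitting `values` at `threshold`.

    Assumption: one binary split per variable; the discretization
    actually used in the study is not described in the text.
    """
    low = [y for x, y in zip(values, labels) if x <= threshold]
    high = [y for x, y in zip(values, labels) if x > threshold]
    weighted = sum(len(part) / len(labels) * entropy(part)
                   for part in (low, high) if part)
    return entropy(labels) - weighted


# Toy data (invented, not from the study): 1 = FLD, 0 = non-FLD.
labels = [1, 1, 1, 1, 0, 0, 0, 0]
girth = [90, 88, 92, 85, 72, 70, 75, 74]  # separates the two classes
dbp = [80, 70, 80, 70, 80, 70, 80, 70]    # unrelated to the class

ig_girth = information_gain(girth, labels, threshold=80)
ig_dbp = information_gain(dbp, labels, threshold=75)
```

In this toy case `girth` removes all uncertainty about the class while `dbp` removes none, so a rule that keeps only variables with a score greater than 0 would retain the first and drop the second.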

2.3. Machine learning

The primary objective of this study was to select prognostic factors for predicting fatty liver disease using classification machine learning models. We divided our machine learning approach into four steps:

1. Data preprocessing: data cleaning, handling of missing data, data transformation, and reduction of class imbalance.
2. Variable selection: selecting the best subset of relevant variables for model building, which helps to reduce overfitting, improve accuracy, and reduce training time.
3. Model building: selecting a suitable classifier for higher prediction performance.
4. Cross-validation: splitting the entire dataset into k folds, with k − 1 folds used for training and the remaining fold for testing ((k − 1):1).

Fig. 1 illustrates the ML process.

2.3.1. Data preprocessing

A total of 577 patients were included in this study, of whom 377 were diagnosed with fatty liver disease. Because data preprocessing is an important step in machine learning, we removed all variables with more than 50% missing values. In addition, data imputation and normalization were performed to obtain a high-quality dataset. We also used the Synthetic Minority Over-sampling Technique (SMOTE) to generate synthetic samples for the minority class and balance the positive and negative cases in the training set.

2.3.2. Variable selection

We assessed the weight of each variable by information gain ranking, which evaluates the usefulness of each included variable in the training dataset. Only variables whose information gain score was > 0 were included in the final model building (Fig. 2). We used forward selection for variable reduction.

Fig. 2. Feature selection. Information gain ranking was used to evaluate the worth of each variable by measuring the entropy gain with respect to the outcome, and then to rank attributes by their individual evaluations (left to right).

2.3.3. Model building

Predictive classifier models were developed to accurately identify FLD patients. Four classification models, random forest (RF), artificial neural network (ANN), Naïve Bayes (NB), and logistic regression (LR), were used to develop prediction models. We considered these four models because of the following characteristics.

Random forest (RF) is an ensemble classification algorithm composed of a multitude of decision trees, developed by Leo Breiman and Adele Cutler in 1999 [12]. Each tree is built independently on a bootstrap sample drawn at random from the training set, using the general technique of bootstrap aggregating (i.e., bagging). The final result is determined by a simple majority vote of all trees. RF has proven to be a highly accurate algorithm in various fields, including medical diagnosis.

Artificial neural networks (ANN) are computational models that emulate biological neural networks. ANN is a very powerful nonlinear modelling technique that has already proven accurate in many clinical decision support (CDS) systems [13]. The model consists of a number of artificial neural units called "perceptrons" [14]. It resembles the biological neural cell, in which the signal is transmitted into the neuron through the dendrites: the network passes the signal from an input layer through several hidden layers to an output layer. Each layer comprises many perceptrons, and perceptrons in adjacent layers are connected by weights that are adjusted during training. Through this iterative process, the network learns from the training samples until each input maps to the correct output, in order to achieve the best prediction.

Naive Bayes is a generative model that makes dealing with missing values much easier [15]. It is a classification model that predicts a class label y given a feature vector x = [x1, x2, x3, …, xd]^T and can make an inference on a new sample xnew = [x1, x2, x3, …, xd]^T with a missing feature xm. It is a simple yet powerful model that returns not only the prediction but also the degree of certainty. It is also easy to understand and implement.
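The ease with which Naive Bayes handles a missing feature follows from its factorized form. As a brief sketch in standard notation (the symbols P and ŷ are introduced here and do not appear in the original text), assuming the features are conditionally independent given the class:

```latex
P(y \mid x) \;\propto\; P(y) \prod_{i=1}^{d} P(x_i \mid y),
\qquad
\hat{y} \;=\; \arg\max_{y} \; P(y) \prod_{i \neq m} P(x_i \mid y)
```

When the feature x_m is missing from a new sample, its factor is simply left out of the product, which corresponds to marginalizing over x_m. Normalizing the remaining products over the classes gives the posterior probability that serves as the degree of certainty.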
Logistic regression (LR) is a discrete choice model that belongs to multivariate analysis. It is one of the most commonly used methods of empirical analysis in sociology, biostatistics, clinical medicine, quantitative psychology, econometrics, and marketing, and it is often used as a comparator in machine learning studies [16]. It has many advantages, including high power and accuracy. The equation of logistic regression is:

$$ y = \frac{e^{(b_0 + b_1 x)}}{1 + e^{(b_0 + b_1 x)}} $$

Here y is the predicted output, b0 is the bias or intercept term, and b1 is the coefficient for the single input value x. Each column in the input data is associated with a coefficient b (a constant real value) that is learned from the training data.

2.3.4. Cross validation

We assessed the performance and generalization error of all classification models using stratified k-fold cross-validation (Fig. 1). This validation technique is widely used and preferred in machine learning over the conventional split-sample approach. It 1) reduces the variance in prediction error; 2) maximizes the use of data for both training and validation, without overfitting or overlap between the test and validation data; and 3) guards against testing hypotheses suggested by arbitrarily split data. The dataset was randomly divided into k equal folds (k = 3, 5, 10) with approximately the same number of events. In each iteration, one fold was used as the validation set and the remaining folds as the training set, so each fold was used exactly once for validation. The validation results from the k experiments were then combined to provide a measure of overall performance.

2.4. Statistical analysis

Continuous variables were presented as mean ± standard deviation or median and compared using the unpaired t-test. Categorical variables were presented as absolute (n) and relative (%) frequencies and compared using the chi-square test or Fisher's exact test, as appropriate. The performance of the classification models in predicting fatty liver was measured by the receiver operating characteristic (ROC) curve. We also calculated accuracy (AC), sensitivity (SN), and specificity (SP) with 95% confidence intervals. R software (version 3.4.2) and Weka (version 3.9) were used to construct the classification models [17]. Weka contains a collection of visualization tools and a graphical user interface for easily running algorithms.

2.5. Model assessment

The confusion matrix was used to determine the relationship between the actual values and the predicted values [18].
Table 1 shows the structure of the confusion matrix.

Table 1
Confusion matrix representation.

                       Positive   Negative
Predicted true (+)     TP         FP
Predicted false (−)    FN         TN

Accuracy: Accuracy is the number of correctly classified instances (true positives plus true negatives) divided by the total number of instances. It gives the proportion of correctly classified instances and is defined as

$$ \text{Accuracy} = \frac{TP + TN}{TP + FP + TN + FN} $$

Sensitivity: Sensitivity measures how well the model correctly classifies persons with the disease and is defined as

$$ \text{Sensitivity} = \frac{TP}{TP + FN} $$

Specificity: Specificity measures how well the model correctly classifies persons without the disease and is defined as

$$ \text{Specificity} = \frac{TN}{TN + FP} $$
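The three definitions above translate directly into code. The sketch below is illustrative only: the function names and the ten toy labels are invented, with 1 standing for FLD and 0 for non-FLD.

```python
def confusion_counts(actual, predicted):
    """Tally TP, FP, TN, FN for binary labels (1 = FLD, 0 = non-FLD)."""
    pairs = list(zip(actual, predicted))
    tp = sum(1 for a, p in pairs if a == 1 and p == 1)
    fp = sum(1 for a, p in pairs if a == 0 and p == 1)
    tn = sum(1 for a, p in pairs if a == 0 and p == 0)
    fn = sum(1 for a, p in pairs if a == 1 and p == 0)
    return tp, fp, tn, fn


def assessment(tp, fp, tn, fn):
    """Accuracy, sensitivity, and specificity as defined in the text."""
    return {
        "accuracy": (tp + tn) / (tp + fp + tn + fn),
        "sensitivity": tp / (tp + fn),
        "specificity": tn / (tn + fp),
    }


# Toy labels (invented): six FLD patients and four non-FLD patients.
actual = [1, 1, 1, 1, 1, 1, 0, 0, 0, 0]
predicted = [1, 1, 1, 1, 1, 0, 0, 0, 0, 1]

tp, fp, tn, fn = confusion_counts(actual, predicted)
metrics = assessment(tp, fp, tn, fn)
```

Here one FLD patient is missed and one non-FLD patient is flagged, so accuracy is 8/10 while sensitivity (5/6) and specificity (3/4) differ, which is why the three values are reported together.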

Sensitivity and specificity are also known as quality parameters and are used to characterize the quality of the predicted class. Three parameters are generally used to determine the goodness of a medical diagnosis model: accuracy, sensitivity, and specificity.

3. Results

3.1. Patient characteristics

We identified 700 participants who had received the initial fatty liver screening at New Taipei City Municipal Hospital Banqiao Branch from 1st to 31st December 2009. A total of 22 patients aged ≤ 30 years were excluded. In addition, 123 patients were excluded due to incomplete examination and suspicion of fatty liver on ultrasonography. The 577 patients who met all inclusion criteria were used for model development (Supplementary Fig. 1). Demographic and clinical characteristics of the 577 patients are summarized in Table 2. The mean age of patients with FLD was 54.1 ± 12.6 years, and that of the non-FLD group was 49.4 ± 15.2 years. There were 207 (54.9%) and 66 (33%) males in the FLD and non-FLD groups, respectively (p < 0.0001). The mean values of the other variables were significantly higher in the FLD group than in the non-FLD group, except for HDL-C, SBP, and DBP.

3.2. Model performance

Table 3 shows the performance of the classification models. The AUROC of RF with 3-, 5-, and 10-fold cross-validation was 0.915, 0.922, and 0.925, respectively, and the corresponding accuracy was 84.29, 86.35, and 86.48%. ROC curves were plotted to compare the classification models; Fig. 3 summarizes the ROC curves of the four models.

4. Discussion

The results of this study suggest that machine learning models are well suited for meaningful prediction of FLD. The random forest model showed better performance than the other classification models. Fatty liver disease is a common complication of critical illness associated with higher mortality and morbidity. In recent years, traditional diagnostic and treatment plans have contributed to an improved understanding of FLD, but they sometimes cause adverse effects and waste resources. Machine learning models can provide additional insight compared with traditional statistical models. We therefore developed and evaluated classification machine learning models to predict FLD. The random forest model with 10-fold cross-validation showed the highest performance, with a C statistic of 0.925.

To our knowledge, this is the first study to attempt to predict FLD using various classification machine learning models. Although the implementation and evaluation of machine learning models have increased rapidly in recent years, a promising model has not yet been applied to predict FLD in routine clinical care. Hence, the performance of different models is the most important consideration, along with ease of use and interpretability. Our findings suggest that the random forest model would be the best option for implementing a system that predicts fatty liver disease appropriately and effectively.

Applying machine learning models to clinical variables from the electronic medical record is an efficient approach for discovering relationships among variables that are ordinarily difficult to detect. The random forest model can be exploited to extract implicit, useful, nontrivial associations even from factors that are not direct or explicit indicators of the class. However, early prediction of the risk of developing fatty liver disease is not enough; clinicians may also want to know the main predictors responsible for developing fatty liver disease. In this study, we also ranked all predictors using information gain ranking; abdominal girth was the strongest predictor, followed by GPT-ALT, triglyceride, HDL-C, and glucose AC. BMI is also a potential risk factor, as supported by several studies [19,20]. Lin et al. reported a 1.29-fold higher risk of fatty liver disease among patients with higher BMI [21].

Several epidemiological studies reported that GPT is closely associated with the accumulation of fat in the liver [22,23]. Age, sex, TG, ALT, GOT, GPT, AST/ALT ratio, total bilirubin, and fasting blood glucose were found to be associated with FLD and have been used in various diagnostic panels [24,25]. Additionally, a substantial number of studies described FLD patients as asymptomatic and pointed out specific causes of fatty liver disease. They noted that elevated circulating concentrations of biomarkers such as serum glutamic oxaloacetic transaminase (SGOT) and serum glutamic pyruvic transaminase (SGPT) are mainly associated with hepatic damage [26,27].

Recently, several studies have reported classification results for distinguishing fatty liver patients from non-fatty liver patients. Ma et al. [28] applied machine learning techniques to find the optimal predictive clinical model of NAFLD. Among the 10,508 enrolled subjects, 2522 (24%) met the diagnostic criteria of NAFLD. A 10-fold cross-validation was used in the classification, and the Bayesian network model achieved the best performance (accuracy: 82.92%, sensitivity: 0.67, specificity: 0.878, precision: 0.636, and F-measure: 0.655) among the 11 different techniques. Moreover, Islam et al. [29] constructed four classification models [Random Forest (RF), Support Vector Machine (SVM), Artificial Neural Network (ANN), and Logistic Regression (LR)] to predict fatty liver disease; logistic regression provided the best result (accuracy 76.30%, sensitivity 74.10%, and specificity 64.90%) among the machine learning algorithms. Another classification model was developed by Birjandi et al. [30], who identified the most important factors influencing NAFLD using a classification tree (CT) to predict the probability of NAFLD.
The main variables for predicting NAFLD in the CT were BMI, WHR, triglycerides, glucose, SBP, and alanine aminotransferase, and the model achieved a prediction accuracy of 80% with an area under the receiver operating characteristic (ROC) curve of 78%. Furthermore, Jamali et al. [31] developed a model based on serum adipokines for discriminating NAFLD from healthy individuals and nonalcoholic steatohepatitis (NASH) from simple steatosis. With the NAFLD discriminant score, 86.4% of the originally grouped cases were correctly classified. Yip et al. [32] developed and validated a laboratory parameter-based machine learning model to detect NAFLD in the general population. They randomly divided 922 subjects from a population screening study into training and validation groups and considered 23 routine clinical and laboratory parameters with elastic net regularization. Their model achieved an AUROC of 0.87 (95% CI 0.83–0.90) and 0.88 (0.84–0.91) in the training and validation groups, respectively. Details of the parameters and machine learning performance of the other studies are provided in Table 4.

Table 2
Demographic characteristics of participants.

Variables                          Fatty liver (n = 377)   Non-fatty liver (n = 200)   p-value
Age, mean (SD)                     54.1 (12.6)             49.4 (15.2)                 0.001
Gender, male, N (%)                207 (54.9)              66 (33)                     <0.0001
Systolic blood pressure (mmHg)     130.2 (18.8)            119.5 (17.1)                0.203
Diastolic blood pressure (mmHg)    80.1 (11.2)             74.7 (11.1)                 0.638
Abdominal girth                    85.8 (11.2)             73.5 (7.4)                  0.001
Triglyceride (mg/dL)               146 (83.8)              87.9 (44.8)                 <0.0001
HDL-C (mg/dL)                      50.9 (13.1)             64.7 (15.4)                 0.037
Glucose AC (mg/dL)                 105.4 (28.3)            93.9 (14.4)                 <0.0001
GOT-AST                            29.4 (15.2)             24.3 (11.2)                 0.003
GPT-ALT                            35.7 (24.6)             20.6 (14.1)                 <0.0001

Table 3
Summary of four classification models with 3-, 5-, and 10-fold cross-validation.

3-fold cross-validation
Model   AUROC   AC (%)   SN (95% CI)            SP (95% CI)
RF      0.915   84.29    85.32 (81.24–88.81)    83.41 (79.48–86.86)
LR      0.892   82.75    83.66 (79.43–87.32)    81.97 (77.93–85.55)
ANN     0.903   84.17    85.67 (81.60–89.14)    82.90 (78.95–86.37)
NB      0.856   76.70    83.79 (79.04–87.84)    72.65 (68.48–76.55)

5-fold cross-validation
Model   AUROC   AC (%)   SN (95% CI)            SP (95% CI)
RF      0.922   86.35    86.92 (83.04–90.20)    85.85 (82.10–89.08)
LR      0.888   82.28    83.10 (78.83–86.82)    81.49 (77.42–85.11)
ANN     0.881   80.30    85.44 (81.06–89.14)    76.79 (72.66–80.57)
NB      0.852   76.70    84.27 (79.52–88.29)    72.30 (68.11–76.22)

10-fold cross-validation
Model   AUROC   AC (%)   SN (95% CI)            SP (95% CI)
RF      0.925   86.48    87.16 (83.29–90.41)    85.89 (82.14–89.11)
LR      0.888   82.65    83.43 (79.19–87.11)    81.93 (77.88–85.51)
ANN     0.895   81.85    81.55 (77.24–85.35)    82.13 (78.04–85.75)
NB      0.854   76.96    84.38 (79.66–88.37)    72.60 (88.41–76.51)

Note: AC = Accuracy, SN = Sensitivity, SP = Specificity, AUROC = Area under the receiver operating characteristic curve.

Fig. 3. Receiver operating characteristic curves for prediction of fatty liver. The random forest model showed better performance than the other three classification models.
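The AUROC values compared throughout this study have a simple rank interpretation: the probability that a randomly chosen FLD patient receives a higher predicted score than a randomly chosen non-FLD patient, with ties counted as one half. The sketch below illustrates that definition with invented scores; it is not the R or Weka routine used by the authors, and the function name `auroc` is hypothetical.

```python
def auroc(labels, scores):
    """Area under the ROC curve by pairwise comparison.

    Compares every positive case with every negative case; a higher
    score for the positive counts 1, a tie counts 0.5.
    """
    pos = [s for y, s in zip(labels, scores) if y == 1]
    neg = [s for y, s in zip(labels, scores) if y == 0]
    if not pos or not neg:
        raise ValueError("need at least one positive and one negative case")
    wins = 0.0
    for p in pos:
        for n in neg:
            if p > n:
                wins += 1.0
            elif p == n:
                wins += 0.5
    return wins / (len(pos) * len(neg))


# Toy predictions (invented): 1 = FLD, 0 = non-FLD.
labels = [1, 1, 1, 0, 0, 0]
scores = [0.9, 0.8, 0.4, 0.7, 0.3, 0.2]

auc = auroc(labels, scores)                                   # 8 of 9 pairs ordered correctly
auc_perfect = auroc([1, 1, 0, 0], [0.9, 0.8, 0.2, 0.1])       # every pair ordered correctly
```

The pairwise form is quadratic in the number of cases, which is acceptable for a dataset of a few hundred patients; a sort-based rank formula gives the same value more efficiently on larger data.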

Although many kinds of machine learning algorithms have been developed, including the popular Bayesian algorithm, it is hard to choose a proper algorithm for clinical decision making and clinical practice [33]. Therefore, both model performance and interpretability should be considered for appropriate clinical decisions. Among the models included in our study, random forest showed the best prediction, so it could effectively identify fatty liver disease (FLD) at initial screening without abdominal ultrasonography. Additionally, this model would provide an easy, fast, low-cost, and non-invasive method to accurately diagnose FLD [34]. The ten predictors used to predict fatty liver disease might be considered robust and concise evidence.

Table 4
Performance comparison between the proposed model and others.

Author     Year   Country     Source of data          Fatty/Non-fatty   Validation method   ACC (%)   SEN (%)   SPE (%)   AUC (%)
Ma         2018   China       Hospital (LT, PE, LU)   2522/7986         10-fold             82.92     67.5      87.8      N/A
Islam      2018   Taiwan      Hospital (LT)           593/401           10-fold             70.70     74.1      64.90     76.30
Birjandi   2016   Iran        Hospital (LT)           359/1241          N/A                 80        74        83        78
Jamali     2016   Iran        Hospital (LT)           54/54             N/A                 N/A       91        83        84.4
Yip        2017   Hong Kong   Hospital (LT)           264/658           N/A                 N/A       92        90        87
Proposed   2018   Taiwan      Hospital (LT)           377/200           10-fold             86.48     87.16     85.89     92.5

Note: LT = Laboratory test, PE = Physical examination, LU = Liver ultrasonography, N/A = Not applicable.

Healthcare data are increasing day by day, and machine learning allows massive amounts of data to be analyzed rapidly [35]. This creates an opportunity to apply machine learning models to the care of individual patients in medical practice. Using appropriate machine learning prediction models, physicians could extract the minimum data necessary to make a therapeutic decision [36]. Our model has the potential to detect FLD early, which would help to make treatment more precise and appropriate. It is very important for physicians to know the most predictive variables for the best treatment outcome. Patients' baseline characteristics might be the strongest predictors of FLD at the individual patient level [37]. Therefore, we carefully adopted a feature selection strategy and used k-fold cross-validation to repeatedly screen potential variables. Data were drawn from a medical center EMR without additional clinical assessments, so our high-performance prediction model could be easily integrated into the EMR to identify FLD risk. Our prediction model could help to identify FLD patients, which might significantly affect treatment patterns. Early prediction using this model might bring benefits through reduced treatment and lower medical costs.

4.1. Limitations

This study has several limitations that need to be addressed. First, we collected data from only one medical center; a multicenter dataset and external validation could yield better and more reliable performance. Additionally, validation of the derived risk score will be required in the future. Second, we evaluated information from only 577 patients, which is a small sample size, although most of the variables were statistically significant. We used k-fold cross-validation, which is reliable for small datasets and helps to reduce errors. In this method, the dataset is randomly divided into ten groups, and all groups are used for both training and validation [38]. It gives nearly unbiased estimates of the prediction error even if the data size is small [39]. Third, only ten variables were used to predict fatty liver disease, but the model could still assist physicians in making precise clinical decisions. Fourth, we could not classify patients into fatty and non-fatty liver disease patients due to data insufficiency. Patients' BMI information was also not included in our study because our electronic medical record database does not contain this information. Finally, we used a classification approach for automatic ML variable integration, but a deep learning approach could have been used to improve prediction.

5. Conclusion

The findings of this study show that machine learning classification models, especially the random forest model, accurately predict fatty liver disease using a minimal set of clinical variables. This method may lead to greater insights in real-world clinical practice, which would assist physicians to effectively identify FLD for diagnostic, preventive, and therapeutic purposes and so mitigate the global burden of FLD. Future studies are needed to validate our model for predicting FLD in various types of datasets.

Compliance with ethical standards

Conflict of interest

None.

Ethical approval

All procedures performed in studies involving human participants were in accordance with the ethical standards of the institutional and/or national research committee and with the 1964 Declaration of Helsinki and its later amendments or comparable ethical standards.

Informed consent

None.

Funding

This work was financially supported by the Higher Education Sprout Project of the Ministry of Education (MOE) in Taiwan (TMU DP2-107-21121-01-A06).

Acknowledgment

We would like to thank our colleague, a native English speaker, for editing our manuscript.

Supplementary material

Supplementary material associated with this article can be found, in the online version, at doi:10.1016/j.cmpb.2018.12.032.

References

[1] M. Lazo, J.M. Clark, in: The Epidemiology of Nonalcoholic Fatty Liver Disease: A Global Perspective: Seminars in Liver Disease, 28, © Thieme Medical Publishers, 2008, pp. 339–350.
[2] M.H. Le, P. Devaki, N.B. Ha, D.W. Jun, H.S. Te, R.C. Cheung, M.H. Nguyen, Prevalence of non-alcoholic fatty liver disease and risk factors for advanced fibrosis and mortality in the United States, PLoS One 12 (2017) e0173499.
[3] Q.M. Anstee, G. Targher, C.P. Day, Progression of NAFLD to diabetes mellitus, cardiovascular disease or cirrhosis, Nat. Rev. Gastroenterol. Hepatol. 10 (2013) 330–344.
[4] M. Motwani, D. Dey, D.S. Berman, G. Germano, S. Achenbach, M.H. Al-Mallah, D. Andreini, M.J. Budoff, F. Cademartiri, T.Q. Callister, Machine learning for prediction of all-cause mortality in patients with suspected coronary artery disease: a 5-year multicentre prospective registry analysis, Eur. Heart J. 38 (2016) 500–507.
[5] A. Sani, Machine Learning for Decision Making, Université de Lille 1, 2015.
[6] W. Raghupathi, V. Raghupathi, Big data analytics in healthcare: promise and potential, Health Inf. Sci. Syst. 2 (2014) 3.
[7] P. Groves, B. Kayyali, D. Knott, S.V. Kuiken, The 'Big Data' Revolution in Healthcare: Accelerating Value and Innovation, 2016.
[8] A. Andrade, J.S. Silva, J. Santos, P. Belo-Soares, Classifier approaches for liver steatosis using ultrasound images, Procedia Technol. 5 (2012) 763–770.

[9] R. Ribeiro, J. Sanches, Fatty liver characterization and classification by ultrasound, in: Iberian Conference on Pattern Recognition and Image Analysis, Springer, 2009, pp. 354–361.
[10] M. Owjimehr, H. Danyali, M.S. Helfroush, A. Shakibafard, Staging of fatty liver diseases based on hierarchical classification and feature fusion for back-scan-converted ultrasound images, Ultrason. Imaging 39 (2017) 79–95.
[11] G. Li, Y. Luo, W. Deng, X. Xu, A. Liu, E. Song, Computer aided diagnosis of fatty liver ultrasonic images based on support vector machine, in: 30th Annual International Conference of the IEEE Engineering in Medicine and Biology Society (EMBS 2008), IEEE, 2008, pp. 4768–4771.
[12] L. Breiman, Random forests, Mach. Learn. 45 (2001) 5–32.
[13] M.C. Papadopoulos, P.M. Abel, D. Agranoff, A. Stich, E. Tarelli, B.A. Bell, T. Planche, A. Loosemore, S. Saadoun, P. Wilkins, A novel and accurate diagnostic test for human African trypanosomiasis, Lancet 363 (2004) 1358–1363.
[14] F. Rosenblatt, The perceptron: a probabilistic model for information storage and organization in the brain, Psychol. Rev. 65 (1958) 386.
[15] I. Rish, An empirical study of the naive Bayes classifier, in: IJCAI 2001 Workshop on Empirical Methods in Artificial Intelligence, vol. 3, IBM, 2001, pp. 41–46.
[16] S. Dreiseitl, L. Ohno-Machado, Logistic regression and artificial neural network classification models: a methodology review, J. Biomed. Inform. 35 (2002) 352–359.
[17] M. Hall, E. Frank, G. Holmes, B. Pfahringer, P. Reutemann, I.H. Witten, The WEKA data mining software: an update, ACM SIGKDD Explor. Newslett. 11 (2009) 10–18.
[18] R. Kohavi, F. Provost, Glossary of terms, Mach. Learn. 30 (1998) 271–274.
[19] A.K. Loomis, S. Kabadi, D. Preiss, C. Hyde, V. Bonato, M. St. Louis, J. Desai, J.M. Gill, P. Welsh, D. Waterworth, Body mass index and risk of nonalcoholic fatty liver disease: two electronic health record prospective studies, J. Clin. Endocrinol. Metab. 101 (2016) 945–952.
[20] Q. Pang, J.-Y. Zhang, S.-D. Song, K. Qu, X.-S. Xu, S.-S. Liu, C. Liu, Central obesity and nonalcoholic fatty liver disease risk after adjusting for body mass index, World J. Gastroenterol. 21 (2015) 1650.
[21] Y.-C. Lin, S.-C. Chou, P.-T. Huang, H.-Y. Chiou, Risk factors and predictors of non-alcoholic fatty liver disease in Taiwan, Ann. Hepatol. 10 (2011) 125–132.
[22] G. Marchesini, S. Avagnina, E. Barantani, A. Ciccarone, F. Corica, E. Dall'Aglio, R. Dalle Grave, P. Morpurgo, F. Tomasi, E. Vitacolonna, Aminotransferase and gamma-glutamyltranspeptidase levels in obesity are associated with insulin resistance and the metabolic syndrome, J. Endocrinol. Invest. 28 (2005) 333–339.
[23] R.K. Schindhelm, M. Diamant, J.M. Dekker, M.E. Tushuizen, T. Teerlink, R.J. Heine, Alanine aminotransferase as a marker of non-alcoholic fatty liver disease in relation to type 2 diabetes mellitus and cardiovascular disease, Diabetes Metab. Res. Rev. 22 (2006) 437–443.


[24] M.G. Sanal, Biomarkers in nonalcoholic fatty liver disease-the emperor has no clothes? World J. Gastroenterol. 21 (2015) 3223.
[25] L. Castera, V. Vilgrain, P. Angulo, Noninvasive evaluation of NAFLD, Nat. Rev. Gastroenterol. Hepatol. 10 (2013) 666–675.
[26] Z.-w. Chen, L.-y. Chen, H.-l. Dai, J.-h. Chen, L.-z. Fang, Relationship between alanine aminotransferase levels and metabolic syndrome in nonalcoholic fatty liver disease, J. Zhejiang Univ.-Sci. B 9 (2008) 616–622.
[27] J.M. Clark, A.M. Diehl, Defining nonalcoholic fatty liver disease: implications for epidemiologic studies, Gastroenterology 124 (2003) 248–250.
[28] H. Ma, C.-f. Xu, Z. Shen, C.-h. Yu, Y.-m. Li, Application of machine learning techniques for clinical predictive modeling: a cross-sectional study on nonalcoholic fatty liver disease in China, BioMed Res. Int. 2018 (2018).
[29] M.M. Islam, C.C. Wu, T.N. Poly, H.C. Yang, Y.C. Li, Applications of machine learning in fatty live disease prediction, in: 40th Medical Informatics in Europe Conference, MIE 2018, IOS Press, 2018, pp. 166–170.
[30] M. Birjandi, S.M.T. Ayatollahi, S. Pourahmad, A.R. Safarpour, Prediction and diagnosis of non-alcoholic fatty liver disease (NAFLD) and identification of its associated factors using the classification tree method, Iran. Red Crescent Med. J. 18 (2016).
[31] R. Jamali, A. Arj, M. Razavizade, M.H. Aarabi, Prediction of nonalcoholic fatty liver disease via a novel panel of serum adipokines, Medicine 95 (2016).
[32] T.F. Yip, A. Ma, V.S. Wong, Y.K. Tse, H.Y. Chan, P.C. Yuen, G.H. Wong, Laboratory parameter-based machine learning model for excluding non-alcoholic fatty liver disease (NAFLD) in the general population, Aliment. Pharmacol. Ther. 46 (2017) 447–456.
[33] J. Wu, J. Roy, W.F. Stewart, Prediction modeling using EHR data: challenges, strategies, and a comparison of machine learning approaches, Med. Care 48 (2010) S106–S113.
[34] J. Kang, T. Lee, I. Yap, K. Lun, Analysis of cost-effectiveness of different strategies for hepatocellular carcinoma screening in hepatitis B virus carriers, J. Gastroenterol. Hepatol. 7 (1992) 463–468.
[35] T. Condie, P. Mineiro, N. Polyzotis, M. Weimer, Machine learning on big data, in: 2013 IEEE 29th International Conference on Data Engineering (ICDE), IEEE, 2013, pp. 1242–1244.
[36] T.B. Murdoch, A.S. Detsky, The inevitable application of big data to health care, JAMA 309 (2013) 1351–1352.
[37] G.K. Savova, P.V. Ogren, P.H. Duffy, J.D. Buntrock, C.G. Chute, Mayo clinic NLP system for patient smoking status identification, J. Am. Med. Inform. Assoc. 15 (2008) 25–28.
[38] G. McLachlan, K.-A. Do, C. Ambroise, Analyzing Microarray Gene Expression Data, John Wiley & Sons, 2005.
[39] B. Efron, Estimating the error rate of a prediction rule: improvement on cross-validation, J. Am. Statist. Assoc. 78 (1983) 316–331.