Predicting the risk of squamous dysplasia and esophageal squamous cell carcinoma using minimum classification error method



Computers in Biology and Medicine 45 (2014) 51–57

Contents lists available at ScienceDirect

Computers in Biology and Medicine journal homepage: www.elsevier.com/locate/cbm

Motahareh Moghtadaei a,1, Mohammad Reza Hashemi Golpayegani a,*, Farshad Almasganj b,2, Arash Etemadi c,d,3, Mohammad R. Akbari e,f,3, Reza Malekzadeh d

a Complex Systems and Cybernetic Control Lab., Faculty of Biomedical Engineering, Amirkabir University of Technology (Tehran Polytechnic), P.O. Box 1591634311, Tehran, Iran
b Speech Lab., Faculty of Biomedical Engineering, Amirkabir University of Technology (Tehran Polytechnic), P.O. Box 1591634311, Tehran, Iran
c Division of Cancer Epidemiology and Genetics, National Cancer Institute, National Institutes of Health, Bethesda, MD, USA
d Digestive Disease Research Center, Shariati Hospital, Tehran University of Medical Sciences, P.O. Box 1411713135, Tehran, Iran
e Women's College Research Institute, Women's College Hospital, University of Toronto, Toronto, Canada
f Dalla Lana School of Public Health, University of Toronto, Toronto, Canada

Article info

Article history: Received 3 June 2013; accepted 18 November 2013

Keywords: Early detection of cancer; Risk factors; Cancer prediction; Squamous dysplasia; Esophageal squamous cell carcinoma; Classification

Abstract

Early detection of squamous dysplasia and esophageal squamous cell carcinoma is of great importance. Computer-aided algorithms that predict cancer risk from known risk factors can help limit clinical screening to people at higher risk. In the present study, we show that an advanced classification method, the Minimum Classification Error, can considerably enhance classification performance in comparison to the logistic regression model and the variable structure fuzzy neural network, the most recent successful methods. The method yields an accuracy of 89.65% for esophageal squamous cell carcinoma and 88.42% for squamous dysplasia risk prediction. © 2013 Elsevier Ltd. All rights reserved.

1. Introduction

Esophageal cancer (EC) is the eighth most common cancer globally and the sixth most common cause of cancer death worldwide. The two most common types of EC are squamous cell carcinoma (ESCC) and adenocarcinoma (EAC). EAC is the most common type in western countries, while ESCC is still the predominant type worldwide, especially in developing countries [14]. ESCC mostly arises in the middle or upper one-third of the esophagus [14]. In recent years, incidence rates of ESCC have been steadily declining in several western countries, but they are increasing in certain Asian areas that stretch from northern Iran through the central Asian republics to north-central China [14]. The five-year relative survival of ESCC is less than 20% in the United States [15]. In Golestan Province, the five-year relative survival of ESCC is only 3.3%, with a median survival of only 7 months [7].

* Corresponding author. Tel./fax: +98 21 64542389. E-mail address: [email protected] (M.R. Hashemi Golpayegani).
1 Tel./fax: +98 21 64542389.
2 Tel.: +98 21 64542372; fax: +98 21 66495655.
3 Tel.: +98 21 82415300; fax: +98 21 82415400.

0010-4825/$ - see front matter © 2013 Elsevier Ltd. All rights reserved. http://dx.doi.org/10.1016/j.compbiomed.2013.11.011

For these reasons, diagnosing ESCC and squamous dysplasia at a very early stage, in order to prevent tumor formation, local invasion and metastasis, is of great importance [26]. Although squamous dysplasia can be detected by endoscopy and biopsy [4], endoscopic screening of the whole population is not practical because of the cost and time it would require. Any new method that can identify the people at higher risk of squamous dysplasia and ESCC is valuable because it can decrease the need for clinical screening. The complexity of the tumor system, which involves the gene, molecular, cellular, tissue, organ, body and population levels interacting via various complicated signal transduction pathways [16], makes it difficult to detect people at higher cancer risk. Mathematical modeling is a very useful tool for estimating the complicated, unknown dependency between the risk factors as inputs and the possibility of squamous dysplasia and ESCC initiation as the output.

In a 2012 study, Etemadi et al. [7] used a logistic regression model to model and predict squamous dysplasia and ESCC from their risk factors. Classification methods that try to calculate probability distributions, like the method Etemadi et al. used to estimate the regression model, often do not lead to optimal performance, especially in biological applications. That is because in complex biological applications the a priori probabilities, which express the related uncertainties before the data are taken into account, and the state-conditional densities of the classes are unknown. Moreover, the data usually cannot represent the posterior probability distributions either. In such models, the estimated probabilities are an approximation of the true probabilities, and the resulting modeling errors prevent the maximum a posteriori probability (MAP) rule from operating as accurately as it potentially could [16].

To avoid this problem, we recently suggested a non-statistical variable structure fuzzy neural network (FNN) as an approximator that does not need to estimate probabilities first [21]. To optimize the model, we adopted a hybrid global chaotic optimization algorithm (COA). This method was shown to perform better than the logistic regression model in ESCC and dysplasia risk prediction [21]. In this paper, a different strategy is adopted: we address the aforementioned shortcomings of statistical classifiers with a compensating approach, the minimum classification error (MCE) classifier. This is an efficient way to estimate the parameters of a Gaussian mixture model (GMM) that can then be used to identify the subjects at higher risk of squamous dysplasia and ESCC. We will also see that this method is more efficient than the hybrid COA. The search over the continuous region in the COA eventually leads to the optimum but is very time consuming, which is why a hybrid method was suggested to reduce the computational time [21]. The MCE method does not face this problem and is computationally more efficient.
In the following section, the database used in this study is first described; in Section 3, the MCE algorithm for classification is briefly described. Subsequently, the classification results obtained via MCE are reported in Section 4; finally, the discussion and conclusion are included in Sections 5 and 6, respectively.

2. Data description

Two datasets are used in this study. The first dataset relates to ESCC and includes 300 biopsy-proven ESCC cases and 571 age- and sex-matched neighborhood controls. This dataset comes from the Golestan Case-Control Study, which was conducted from 2003 to 2007 [22]. The second dataset relates to dysplasia and includes 26 individuals with dysplastic lesions and 698 controls. It was collected from individuals visiting Atrak Clinic, a gastroenterology research clinic in Gonbad City, Golestan Province, between 2002 and 2007. Video endoscopy with Lugol's iodine staining, a questionnaire and biopsies were used to develop this dataset [22]. The conduct of the studies that produced the datasets, including the risk factors, was reviewed and approved by the Institutional Review Boards of Tehran University Digestive Disease Research Center (DDRC), the US National Cancer Institute (NCI) and the International Agency for Research on Cancer (IARC) [22]. In each case, the dataset includes all risk factors known to be associated with dysplasia and ESCC in Golestan Province [1,2,12,13,22]. The risk factors are described below:

(i) Age.
(ii) Place of residence: The subjects who contributed to this study live in the counties of Gonbad, Minoodasht, Kalaleh, Azadshahr, and Ramian in eastern Golestan Province [22].
(iii) Ethnicity: Approximately half of the residents of the study area are of Turkmen ethnicity, and the rest are Persians, Kurds, Turks, and others [22].
(iv) Tobacco smoking: Cumulative use of tobacco (average intensity multiplied by duration of use) is considered [22].
(v) Opium use: Cumulative use of opium (average intensity multiplied by duration of use) is considered [22].
(vi) Socio-economic status: A socio-economic score is assigned to each subject based on education level, relatives and family structure. A wealth score is also assigned to each subject, using multiple correspondence analysis (MCA), based on house ownership, house structure, house size, number of people living together in the current house, ownership of household appliances and the duration of owning these appliances, and the most recent occupation of the subject [13].
(vii) Oral health: Frequency of brushing teeth is used as the most important oral hygiene factor [1].
(viii) Family history: The family history data contain information on all first- and second-degree relatives and first cousins, including the vital status of these family members and all occurrences of esophageal cancer and other cancers. In addition, current age, age at diagnosis of cancer, site of cancer, age of death, and clinical and pathological diagnosis of cancer were recorded for all first-degree relatives. The presence of parental consanguinity was also recorded for cases and controls [2].
(ix) Tea temperature: Tea temperature, estimated from the time interval between the tea being poured and drunk, was also recorded for each case [12].
(x) Water source: The subject's water source, access to piped water, and years of access to piped water were recorded [22].

In total, 21 numerical values are recorded for each subject in the ESCC and dysplasia datasets to express these risk factors.
To better describe the datasets, two important histograms for each of the ESCC and dysplasia datasets are shown in Figs. 1 and 2, respectively. Presenting histograms of all of the data is not feasible because of the large number of recorded variables.

Fig. 1. Histograms of (a) frequency of brushing teeth (1: Once a day, 2: Twice a day, 3: Three times a day, 4: Other (specify), 5: Never), and (b) tea temperature (estimated by the time interval between tea being poured and drunk, and categorized in minutes as 1: 4 min or more, 2: 2–3 min, and 3: 0–1 min), for the ESCC dataset (total number of ESCC cases = 300, total number of control subjects = 571). The vertical axes show the number of ESCC cases and the number of control subjects.

3. The MCE method

The MCE algorithm is a type of discriminative training algorithm. In the commonly used model-based classification approaches, in which individual statistical models are constructed for the different classes of data, the model parameters are chosen to minimize the error between the model and the real probability distribution. The classification decision rule therefore does not appear in the training phase of such models, and training does not necessarily minimize the overall classification error. In the MCE training algorithm, in addition to the modeling error, the final classification result also plays a role in the optimization of the models' parameters [16,27].

Consider an input vector $x_n$ and its target class $y_n$, $\{x_n, y_n\}_{n=1}^{N}$. In the general form of the MCE training algorithm, in the case of $K$ classes, the classifier makes its decision by the following decision rule

$$ y = k \quad \text{if} \quad g_{y=k}(x_n; \Lambda) = \max_{i \in K} g_{y=i}(x_n; \Lambda) \tag{1} $$

where $y$ is the decision of the classifier for input $x_n$, or equivalently

$$ y = k \quad \text{if} \quad g_{y=k}(x_n; \Lambda) - \max_{i \neq k} g_{y=i}(x_n; \Lambda) > 0 \tag{2} $$

where $g_{y=i}(x_n; \Lambda)$ is the discriminant function by which the score of the assignment of $x_n$ to class $i$ is evaluated, and $\Lambda$ is the parameter set of this function. The measure of the misclassification of $x_n$ into class $k$ can therefore be defined as the negative of the left-hand side of Eq. (2). For the $k$th class, where the known class of $x_n$ is $k$, i.e. $y_n = k$, the misclassification measure is given by

$$ d_{y_n=k}(x_n; \Lambda) = \max_{i \neq k} g_{y=i}(x_n; \Lambda) - g_{y=k}(x_n; \Lambda) \tag{3} $$
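As a minimal sketch of Eqs. (1) and (3), assuming the discriminant scores $g_{y=i}(x_n; \Lambda)$ have already been computed for one sample, the decision rule and the misclassification measure could be written as follows. The function names and the example scores are hypothetical and serve only as an illustration.

```python
def decide(scores):
    """Eq. (1): index of the class with the largest discriminant score."""
    return max(range(len(scores)), key=lambda i: scores[i])


def misclassification_measure(scores, true_class):
    """Eq. (3): best competing score minus the score of the true class."""
    best_rival = max(s for i, s in enumerate(scores) if i != true_class)
    return best_rival - scores[true_class]


# Hypothetical scores g_{y=i}(x_n; Lambda) for one sample and two classes.
scores = [-3.0, -1.5]

print(decide(scores))                        # 1
print(misclassification_measure(scores, 1))  # -1.5, negative: correctly classified
print(misclassification_measure(scores, 0))  # 1.5, positive: misclassified
```

A negative value of the measure means the true class wins the comparison in Eq. (2); a positive value means some other class scored higher.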

With a statistical model of the classes, we can use the MAP classifier, which assigns each input to its most probable class, $C(x_n)$:

$$ C(x_n) = \arg\max_{i \in K} \ln\left[ P(x_n \mid y=i)\, P(y=i) \right] \tag{4} $$

In this way, by assuming identical prior probabilities for the classes, the discriminant function is defined as

$$ g_{y=k}(x_n; \Lambda) = \ln P(x_n \mid y=k, \Lambda) \tag{5} $$

Using this, we can rewrite Eq. (3) as

$$ d_{y_n=k}(x_n; \Lambda) = \max_{i \neq k} \ln P(x_n \mid y=i, \Lambda) - \ln P(x_n \mid y=k, \Lambda) = \max_{i \neq k} \ln\left[ \frac{P(x_n \mid y=i, \Lambda)}{P(x_n \mid y=k, \Lambda)} \right] \tag{6} $$
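The step from Eq. (4) to Eq. (5) can be spelled out as follows, assuming (as the text does) that all $K$ classes share the same prior, written here as $P(y=i) = 1/K$:

```latex
\begin{aligned}
C(x_n) &= \arg\max_{i \in K}\, \ln\!\left[ P(x_n \mid y=i)\, P(y=i) \right] \\
       &= \arg\max_{i \in K}\, \left[ \ln P(x_n \mid y=i) - \ln K \right] \\
       &= \arg\max_{i \in K}\, \ln P(x_n \mid y=i).
\end{aligned}
```

The constant $-\ln K$ does not depend on $i$, so it does not change the maximizer, and the class-conditional log-likelihood alone can serve as the discriminant function.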

Fig. 2. Histograms of (a) frequency of brushing teeth (1: Once a day, 2: Twice a day, 3: Three times a day, 4: Other (specify), 5: Never), and (b) tea temperature (estimated by the time interval between tea being poured and drunk, and categorized in minutes as 1: 4 min or more, 2: 2–3 min, and 3: 0–1 min), for the dysplasia dataset (total number of dysplasia cases = 26, total number of control subjects = 698). The vertical axes show the number of dysplasia cases and the number of control subjects.

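Thus, the classification error criterion can be written as

$$ L = \frac{1}{N} \sum_{n} f\big(d_{y_n=k}(x_n; \Lambda)\big) = \frac{1}{N} \sum_{n} \Theta\left( \max_{i \neq k} \ln\left[ \frac{P(x_n \mid y=i, \Lambda)}{P(x_n \mid y=k, \Lambda)} \right] \right) \tag{7} $$

where $\Theta$ is the hard threshold function

$$ \Theta(x) = \begin{cases} 1, & x \geq 0 \\ 0, & x < 0 \end{cases} \tag{8} $$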

To implement Eq. (7), the most likely incorrect class is first found using the max function. Second, the threshold function $\Theta$ registers an error if the incorrect class is more probable than the correct one, and finally the errors are summed over the whole training set. Setting the derivative of $L$ in Eq. (7) with respect to the parameters $\Lambda$ of the discriminant function to zero would give the parameters that minimize the misclassification error. However, the function in Eq. (7) is not differentiable, because of both the max operation and the threshold function. Following [16], replacing the max operation with the softmax function and the threshold function with the sigmoid function makes the resulting error function smooth and differentiable. The converted error function is given by

$$ d_{y_n=k}(x_n; \Lambda) = \ln \sum_{i \neq k} P(x_n \mid y=i, \Lambda) - \ln P(x_n \mid y=k, \Lambda) \tag{9} $$

$$ L = \frac{1}{N} \sum_{n} \frac{1}{1 + \exp\big(-d_{y_n=k}(x_n; \Lambda)\big)} \tag{10} $$
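The difference between the hard criterion of Eqs. (7) and (8) and its smooth replacement in Eqs. (9) and (10) can be illustrated with a short sketch. It assumes that the class-conditional log-likelihoods $\ln P(x_n \mid y=i, \Lambda)$ are already available for each sample; the function names and the example numbers are hypothetical, and this is not the authors' implementation.

```python
import math


def misclassification_measure(log_lik, true_class, smooth=True):
    """d for one sample, where log_lik[i] = ln P(x_n | y=i, Lambda)."""
    rivals = [ll for i, ll in enumerate(log_lik) if i != true_class]
    if smooth:
        # Eq. (9): ln of the summed rival likelihoods (stable log-sum-exp).
        m = max(rivals)
        rival_score = m + math.log(sum(math.exp(ll - m) for ll in rivals))
    else:
        # Eq. (6): only the most likely incorrect class counts.
        rival_score = max(rivals)
    return rival_score - log_lik[true_class]


def hard_mce_loss(batch, labels):
    """Eq. (7): fraction of samples with d >= 0 (threshold of Eq. (8))."""
    d = [misclassification_measure(ll, k, smooth=False) for ll, k in zip(batch, labels)]
    return sum(1 for v in d if v >= 0) / len(d)


def smooth_mce_loss(batch, labels):
    """Eq. (10): mean sigmoid of the smoothed measure of Eq. (9)."""
    d = [misclassification_measure(ll, k, smooth=True) for ll, k in zip(batch, labels)]
    return sum(1.0 / (1.0 + math.exp(-v)) for v in d) / len(d)


# Hypothetical log-likelihoods for three samples and two classes
# (class 0 = healthy, class 1 = patient).
batch = [[-1.0, -4.0], [-3.0, -2.0], [-2.5, -2.0]]
labels = [0, 1, 0]

print(hard_mce_loss(batch, labels))    # one of the three samples is misclassified
print(smooth_mce_loss(batch, labels))  # differentiable surrogate of the same count
```

The hard loss only counts errors, so it gives no gradient; the smooth loss changes continuously with the log-likelihoods, which is what makes gradient-based training possible.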

Minimizing the loss function in Eq. (10) leads to the minimum classification error; this is known as the MCE criterion [16,27]. In the present study, to implement this approach for the classification task at hand, a GMM is employed to represent the probability density functions of the different classes of data samples. The GMM is a parametric model of the probability distribution of a specific (e.g. the $i$th) class, represented as a weighted sum of $M_i$ Gaussian densities. The complete GMM, $P(x_n \mid y=i, \Lambda)$, of a $D$-dimensional data vector $x_n$ (in our application, the extracted features) of the $i$th class is parameterized by the mean vectors ($\mu_{ij}$), covariance matrices ($\Sigma_{ij}$) and mixture weights ($w_{ij}$) of the component Gaussian densities [3], as given by

$$ P(x_n \mid y=i, \Lambda) = \sum_{j=1}^{M_i} w_{ij}\, p(x_n \mid \mu_{ij}, \Sigma_{ij}) = \sum_{j=1}^{M_i} w_{ij}\, \frac{1}{(2\pi)^{D/2}\, |\Sigma_{ij}|^{1/2}} \exp\left( -\frac{1}{2} (x_n - \mu_{ij})' \Sigma_{ij}^{-1} (x_n - \mu_{ij}) \right) \tag{11} $$

The parameters of the models are obtained by minimizing this loss function. After differentiation, the resulting equations are not linear in the model parameters, so an iterative algorithm such as steepest gradient descent must be employed to solve them. The general algorithm by which the GMM parameters can be estimated for a group of data samples is known as the EM algorithm [5]. It consists of two alternating steps, the Expectation and Maximization steps; the details can be found in [11].

To verify how the classification results generalize to an independent dataset, test datasets are employed through 10-fold cross-validation, which estimates how accurately the predictive model will perform in practice. In the 10-fold algorithm, the original sample is randomly partitioned into 10 subsamples, and the training and testing procedures are repeated 10 times (folds). In each fold, 90% of the data is used for training and the remaining 10% for evaluating the performance of the algorithm, so that each of the 10 subsamples is used exactly once as validation data [8]. In this way, one can make sure that the classifier generalizes to different datasets of the same type. The test dataset also helps to avoid overfitting during training. In principle, while the model is being trained, the prediction error on the test dataset first decreases, but at the point where overfitting begins this error starts to increase. The training procedure should be stopped at this point.

4. Results

Four standard measures have been used to evaluate the classifier's performance [6,19,23,24]: accuracy, the percentage of predictions that are correct; precision, the percentage of positive predictions that are correct; specificity, the percentage of healthy persons who are predicted negative; and sensitivity, the percentage of diseased persons who are predicted positive. The accuracy, specificity, precision and sensitivity of the model for ESCC and dysplasia risk prediction on the train and test datasets are presented in Tables 1–4.

Table 1 Classifier performance for ESCC risk prediction for the train dataset.

| Specificity (%) | Sensitivity (%) | Accuracy (%) | Precision (%) |
|---|---|---|---|
| 97.47 | 97.03 | 97.32 | 95.27 |

Table 2 Classifier performance for ESCC risk prediction for the test dataset.

| Specificity (%) | Sensitivity (%) | Accuracy (%) | Precision (%) |
|---|---|---|---|
| 91.22 | 86.66 | 89.65 | 83.87 |

Table 3 Classifier performance for dysplasia risk prediction for the train dataset.

| Specificity (%) | Sensitivity (%) | Accuracy (%) | Precision (%) |
|---|---|---|---|
| 96.97 | 76.92 | 91.54 | 90.45 |

Table 4 Classifier performance for dysplasia risk prediction for the test dataset.

| Specificity (%) | Sensitivity (%) | Accuracy (%) | Precision (%) |
|---|---|---|---|
| 97.10 | 65.38 | 88.42 | 89.47 |

The accuracy, precision, specificity and sensitivity are defined as

$$ \text{accuracy} = \frac{TP + TN}{TP + FP + FN + TN} \tag{12} $$

$$ \text{precision} = \frac{TP}{TP + FP} \tag{13} $$

$$ \text{specificity} = \frac{TN}{TN + FP} \tag{14} $$

$$ \text{sensitivity} = \frac{TP}{TP + FN} \tag{15} $$

where TP is the number of true positives, i.e. sick people correctly identified as sick; FP is the number of false positives, i.e. healthy people incorrectly identified as sick; TN is the number of true negatives, i.e. healthy people correctly identified as healthy; and FN is the number of false negatives, i.e. sick people incorrectly identified as healthy.

The receiver operating characteristic (ROC) diagrams for ESCC and dysplasia risk prediction are presented in Figs. 3 and 4, respectively. The ROC space depicts the relative trade-off between true positives and false positives. Each prediction result, or instance of a confusion matrix, represents one point in the ROC space. If the classifier has a continuous parameter, such as a threshold for indicating disease, the ROC graph is continuous, because the true positive rate (TPR) and false positive rate (FPR) are available for different values of that parameter. In the present approach, however, there is no continuous parameter such as a threshold. Two models are built for the two classes of data, one for healthy and one for patient cases. For any new case, the outputs of both models are evaluated, and the case is assigned to the class with the higher probability, namely to the model with the larger output value. There is therefore no thresholding, and the ROC diagram is plotted for the best result achieved by the described algorithm. For this reason, the ROC curves in Figs. 3 and 4 are not continuous, and there is no cut-off point on them. Similarly, no area under the curve (AUC) is calculated; moreover, the AUC has recently been questioned and criticized [9,10,18].

A comparison of the results of the proposed method with those of the multivariate logistic regression model and the variable structure FNN using the hybrid COA algorithm, for predicting the risk of ESCC and dysplasia, is given in Table 5.
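Eqs. (12)–(15) and the single ROC point of a two-model classifier can be evaluated directly from the four confusion-matrix counts. The sketch below is illustrative only: the function names are hypothetical and the counts are made-up values, not taken from the paper.

```python
def classification_metrics(tp, fp, tn, fn):
    """Accuracy, precision, specificity and sensitivity as in Eqs. (12)-(15)."""
    return {
        "accuracy": (tp + tn) / (tp + fp + fn + tn),
        "precision": tp / (tp + fp),
        "specificity": tn / (tn + fp),
        "sensitivity": tp / (tp + fn),
    }


def roc_point(tp, fp, tn, fn):
    """The single (FPR, TPR) point that one confusion matrix contributes."""
    return fp / (fp + tn), tp / (tp + fn)


# Hypothetical counts, not taken from the paper.
counts = dict(tp=40, fp=10, tn=90, fn=10)

metrics = classification_metrics(**counts)
print({name: round(100 * value, 2) for name, value in metrics.items()})
print(roc_point(**counts))  # (0.1, 0.8)
```

Because the classifier assigns each case to the model with the larger output and has no adjustable threshold, each dataset contributes exactly one such point to the ROC space.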

Fig. 3. ROC diagram comparing the sensitivity and specificity of the model to predict ESCC in Golestan Province (horizontal axis: FPR (1-specificity); vertical axis: TPR (sensitivity); points shown for the train and test datasets).

Fig. 4. ROC diagram comparing the sensitivity and specificity of the model to predict squamous dysplasia in Golestan Province (horizontal axis: FPR (1-specificity); vertical axis: TPR (sensitivity); points shown for the train and test datasets).

The accuracy and precision of the proposed MCE method are compared with those of the variable structure FNN using the hybrid COA method in Table 6.

5. Discussion

As noted in the literature, the minimum classification error (MCE) algorithm has proved helpful in enhancing the performance of classifiers in many applications [17,20,25]. In this paper, we employed this approach to improve ESCC and dysplasia risk prediction. The simulation results show that the proposed method performs better in ESCC and dysplasia risk prediction than two recent methods, i.e. logistic regression and the hybrid COA algorithm. The reproducibility and generalization of the classifier are verified by checking its performance on an independent dataset that represents the population well and yet is sufficiently distinct from the training dataset, i.e. the test datasets obtained with a cross-validation strategy, the 10-fold algorithm. Over-fitting to the training dataset is a major challenge in machine learning. In this study, we used early stopping based on the test dataset classification error in each fold of the 10-fold algorithm.

The MCE classifier is superior to the commonly used model-based classification approaches because the first step in designing those conventional classifiers is to estimate the probability distributions of the different classes from the labeled samples of the training database. Since the estimated probabilities are typically a broad approximation of the actual probabilities, they cannot exactly reflect the details of the relevant data samples, even if the modeling error is very small. Moreover, this modeling process does not guarantee that the developed models have learned the details by which the classes can be distinguished. Applying the MCE algorithm to this problem, in contrast, biases the models of the different classes in opposite directions.
That is, minor variations in the model parameters bias the models to better represent the details that differentiate the data samples of different classes. For this reason, the conventional model-based classifiers often do not lead to optimal classification performance. The loss function in the MCE algorithm represents the sum of misclassifications and guides the algorithm toward better final classification results.

Since the model is intended for screening, sensitivity is a very important index for evaluating classification performance. Based on the results in Table 5, for the ESCC dataset the sensitivity of the proposed model (86.66%) is higher than that of both previously published models, although it is close to that of the variable structure FNN model (85.76%). For the squamous dysplasia dataset, the sensitivity of the proposed model (65.38%) is higher than that of the multivariate logistic regression model (61.5%) but lower than that of the variable structure FNN model (75.83%). In this regard, the variable structure FNN is highly competitive with the proposed model. Yet, as the results in Tables 5 and 6 demonstrate, the specificity, accuracy and precision of the proposed model are always higher than those of the variable structure FNN model, by a considerable margin. Considering all this, the results in Tables 5 and 6 show that the proposed method improves on the multivariate logistic regression model and on the variable structure FNN model with the hybrid COA algorithm in discriminating between the two classes in each dataset, and consequently in predicting the risk of ESCC and dysplasia from their risk factors.

Table 5 Comparison of performance for ESCC and squamous dysplasia between GMM model using MCE algorithm, logistic regression model, and the variable structure FNN model using hybrid COA algorithm.

| Dataset | Method | Sensitivity (%) | Specificity (%) |
|---|---|---|---|
| ESCC | GMM model using MCE algorithm | 86.66 | 91.22 |
| ESCC | Variable structure FNN using hybrid COA algorithm | 85.76 | 88.27 |
| ESCC | Multivariate logistic regression | 80.6 | 82.4 |
| Squamous dysplasia | GMM model using MCE algorithm | 65.38 | 97.10 |
| Squamous dysplasia | Variable structure FNN using hybrid COA algorithm | 75.83 | 80.96 |
| Squamous dysplasia | Multivariate logistic regression | 61.5 | 69.5 |

Table 6 Comparison of the accuracy and precision of the GMM model using MCE algorithm, and the variable structure FNN model using hybrid COA algorithm for ESCC and squamous dysplasia.

| Dataset | Method | Accuracy (%) | Precision (%) |
|---|---|---|---|
| ESCC | GMM model using MCE algorithm | 89.65 | 83.87 |
| ESCC | Variable structure FNN using hybrid COA algorithm | 87.35 | 79.54 |
| Squamous dysplasia | GMM model using MCE algorithm | 88.42 | 89.47 |
| Squamous dysplasia | Variable structure FNN using hybrid COA algorithm | 70.53 | 61.11 |

The perfect classification method leads to the point (0,1) in the ROC space, representing 100% sensitivity and 100% specificity. The proposed classification method yields points above the line of no-discrimination, close to the perfect classification point in the ROC diagrams of Figs. 3 and 4, which means that the model is a good classifier and a good predictor.
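The statement about the line of no-discrimination can be checked numerically: a point lies above that line when its TPR exceeds its FPR. The sketch below converts the sensitivity and specificity values reported in Table 5 into ROC points; the function name and the short method labels ("GMM-MCE", "FNN-COA") are hypothetical abbreviations used only here.

```python
# (sensitivity %, specificity %) as reported in Table 5.
table5 = {
    ("ESCC", "GMM-MCE"): (86.66, 91.22),
    ("ESCC", "FNN-COA"): (85.76, 88.27),
    ("ESCC", "Logistic regression"): (80.6, 82.4),
    ("Squamous dysplasia", "GMM-MCE"): (65.38, 97.10),
    ("Squamous dysplasia", "FNN-COA"): (75.83, 80.96),
    ("Squamous dysplasia", "Logistic regression"): (61.5, 69.5),
}


def roc_point(sensitivity_pct, specificity_pct):
    """Return (FPR, TPR) for one operating point given in percent."""
    return (100.0 - specificity_pct) / 100.0, sensitivity_pct / 100.0


for (dataset, method), (sens, spec) in table5.items():
    fpr, tpr = roc_point(sens, spec)
    position = "above" if tpr > fpr else "not above"
    print(f"{dataset:18s} {method:20s} FPR={fpr:.4f} TPR={tpr:.4f} ({position} the diagonal)")
```

The perfect classifier corresponds to FPR = 0 and TPR = 1, so the smaller the FPR and the larger the TPR, the closer a method sits to the point (0,1).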

6. Conclusion

In this paper, we showed that algorithmic classification methods are very helpful in ESCC and dysplasia risk prediction. Moreover, we showed that the minimum classification error (MCE) algorithm, which enhances the performance of some classifiers, is also useful in this task. We employed this approach to improve ESCC and dysplasia risk prediction. The proposed MCE method proved more efficient in predicting the risk of ESCC and squamous dysplasia from their risk factors than the latest prediction methods proposed for this application. The datasets used to build this model were obtained from the case-control epidemiologic study in Golestan Province. We plan to test the performance of this method for early detection of esophageal cancer in our ongoing on-line clinical studies.

Conflict of interest statement

None declared.

References

[1] C.C. Abnet, F. Kamangar, F. Islami, D. Nasrollahzadeh, P. Brennan, K. Aghcheli, S. Merat, A. Pourshams, H.A. Marjani, A. Ebadati, M. Sotoudeh, P. Boffetta, R. Malekzadeh, S.M. Dawsey, Tooth loss and lack of regular oral hygiene are associated with higher risk of esophageal squamous cell carcinoma, Cancer Epidemiol. Biomark. Prev. 17 (11) (2008) 3062–3068, http://dx.doi.org/10.1158/1055-9965.epi-08-0558.
[2] M.R. Akbari, R. Malekzadeh, D. Nasrollahzadeh, D. Amanian, P. Sun, F. Islami, M. Sotoudeh, S. Semnani, P. Boffetta, S.M. Dawsey, P. Ghadirian, S.A. Narod, Familial risks of esophageal cancer among the Turkmen population of the Caspian littoral of Iran, Int. J. Cancer 119 (5) (2006) 1047–1051, http://dx.doi.org/10.1002/ijc.21906.
[3] C.M. Bishop, Pattern Recognition and Machine Learning, Information Science and Statistics, Springer, Singapore, 2009.
[4] S.M. Dawsey, D.E. Fleischer, G.-Q. Wang, B. Zhou, J.A. Kidwell, N. Lu, K.J. Lewin, M.J. Roth, T.L. Tio, P.R. Taylor, Mucosal iodine staining improves endoscopic visualization of squamous dysplasia and squamous cell carcinoma of the esophagus in Linxian, China, Cancer 83 (2) (1998) 220–231, http://dx.doi.org/10.1002/(sici)1097-0142(19980715)83:2<220::aid-cncr4>3.0.co;2-u.
[5] A.P. Dempster, N.M. Laird, D.B. Rubin, Maximum likelihood from incomplete data via the EM algorithm, J. R. Stat. Soc. Ser. B (Methodol.) 39 (1) (1977) 1–38.
[6] R.P.B. Eisner, D. Szafron, P. Lu, R. Greiner, Improving protein function prediction using the hierarchical structure of the gene ontology, in: Proceedings of the IEEE Symposium on Computational Intelligence in Bioinformatics and Computational Biology, 2005.
[7] A. Etemadi, C.C. Abnet, A. Golozar, R. Malekzadeh, Modeling the risk of esophageal squamous cell carcinoma and squamous dysplasia in a high risk area in Iran, Arch. Iran Med. 15 (1) (2012) 18–21.
[8] S. Geisser, Predictive Inference: An Introduction, Chapman & Hall, New York, USA, 1993.
[9] B. Hanczar, J. Hua, C. Sima, J. Weinstein, M. Bittner, E.R. Dougherty, Small-sample precision of ROC-related estimates, Bioinformatics 26 (6) (2010) 822–830, http://dx.doi.org/10.1093/bioinformatics/btq037.
[10] D. Hand, Measuring classifier performance: a coherent alternative to the area under the ROC curve, Mach. Learn. 77 (1) (2009) 103–123, http://dx.doi.org/10.1007/s10994-009-5119-5.
[11] T. Hastie, R. Tibshirani, J.H. Friedman, The EM algorithm, in: The Elements of Statistical Learning: Data Mining, Inference, and Prediction, second ed., Springer, New York, USA, 2011.
[12] F. Islami, P. Boffetta, J.-S. Ren, L. Pedoeim, D. Khatib, F. Kamangar, High-temperature beverages and foods and esophageal cancer risk - a systematic review, Int. J. Cancer 125 (3) (2009) 491–524, http://dx.doi.org/10.1002/ijc.24445.
[13] F. Islami, F. Kamangar, D. Nasrollahzadeh, K. Aghcheli, M. Sotoudeh, B. Abedi-Ardekani, S. Merat, S. Nasseri-Moghaddam, S. Semnani, A. Sepehr, J. Wakefield, H. Møller, C.C. Abnet, S.M. Dawsey, P. Boffetta, R. Malekzadeh, Socio-economic status and oesophageal cancer: results from a population-based case-control study in a high-risk area, Int. J. Epidemiol. 38 (4) (2009) 978–988, http://dx.doi.org/10.1093/ije/dyp195.
[14] A. Jemal, F. Bray, M.M. Center, J. Ferlay, E. Ward, D. Forman, Global cancer statistics, CA Cancer J. Clin. 61 (2) (2011) 69–90, http://dx.doi.org/10.3322/caac.20107.
[15] A. Jemal, R. Siegel, J. Xu, E. Ward, Cancer statistics, CA Cancer J. Clin. 60 (5) (2010) 277–300, http://dx.doi.org/10.3322/caac.20073.
[16] A.V. Lee, R. Schiff, X. Cui, D. Sachdev, D. Yee, A.P. Gilmore, C.H. Streuli, S. Oesterreich, D.L. Hadsell, New mechanisms of signal transduction inhibitor action, Clin. Cancer Res. 9 (1) (2003) 516s–523s.
[17] B.-H. Juang, W. Hou, C.-H. Lee, Minimum classification error rate methods for speech recognition, IEEE Trans. Speech Audio Process. 5 (3) (1997) 257–265.
[18] J.M. Lobo, A. Jiménez-Valverde, R. Real, AUC: a misleading measure of the performance of predictive distribution models, Global Ecol. Biogeogr. 17 (2) (2008) 145–151, http://dx.doi.org/10.1111/j.1466-8238.2007.00358.x.
[19] Z. Lu, D. Szafron, R. Greiner, P. Lu, D.S. Wishart, B. Poulin, J. Anvik, C. Macdonell, R. Eisner, Predicting subcellular localization of proteins using machine-learned classifiers, Bioinformatics 20 (4) (2004) 547–556, http://dx.doi.org/10.1093/bioinformatics/btg447.
[20] E. McDermott, T.J. Hazen, J.L. Roux, A. Nakamura, S. Katagiri, Discriminative training for large-vocabulary speech recognition using minimum classification error, IEEE Trans. Audio Speech Lang. Process. 15 (1) (2007) 203–223.
[21] M. Moghtadaei, M.R.H. Golpayegani, R. Malekzadeh, A variable structure fuzzy neural network model of squamous dysplasia and esophageal squamous cell carcinoma based on a global chaotic optimization algorithm, J. Theor. Biol. 318 (2013) 164–172, http://dx.doi.org/10.1016/j.jtbi.2012.11.013.
[22] D. Nasrollahzadeh, F. Kamangar, K. Aghcheli, M. Sotoudeh, F. Islami, C.C. Abnet, R. Shakeri, A. Pourshams, H.A. Marjani, M. Nouraie, M. Khatibian, S. Semnani, W. Ye, P. Boffetta, S.M. Dawsey, R. Malekzadeh, Opium, tobacco, and alcohol use in relation to oesophageal squamous cell carcinoma in a high-risk area of Iran, Br. J. Cancer 98 (11) (2008) 1857–1863.
[23] A. Osareh, M. Mirmehdi, B. Thomas, R. Markham, Classification and localisation of diabetic-related eye disease, in: A. Heyden, G. Sparr, M. Nielsen, P. Johansen (Eds.), Computer Vision - ECCV 2002, Lecture Notes in Computer Science, vol. 2353, Springer, Berlin/Heidelberg, 2006, pp. 325–329, http://dx.doi.org/10.1007/3-540-47979-1_34.
[24] V.P. Shah, K.K. Midha, S. Dighe, I.J. McGilveray, J.P. Skelly, A. Yacobi, T. Layloff, C.T. Viswanathan, C.E. Cook, R.D. McDowall, K.A. Pittman, S. Spector, K.S. Albert, S. Bolton, M. Dobrinska, W. Doub, M. Eichelbaum, J.W.A. Findlay, K. Gallicano, W. Garland, D.J. Hardy, J.D. Hulse, H.T. Karnes, R. Lange, W.D. Mason, G. McKay, E. Ormsby, J. Overpeck, H.D. Plattenberg, G. Shiu, D. Sitar, F. Sorgel, J.T. Stewart, L. Yuh, Analytical methods validation: bioavailability, bioequivalence and pharmacokinetic studies: sponsored by the American Association of Pharmaceutical Chemists, U.S. Food and Drug Administration, Fédération Internationale Pharmaceutique, Health Protection Branch (Canada) and Association of Official Analytical Chemists, Int. J. Pharm. 82 (1–2) (1992) 1–7.
[25] O. Vinyals, L. Deng, D. Yu, A. Acero, Discriminative pronounciation learning using phonetic decoder and minimum-classification-error criterion, in: IEEE International Conference on Acoustics, Speech and Signal Processing, ICASSP, 2009, pp. 4445–4448.
[26] G.-Q. Wang, C.C. Abnet, Q. Shen, K.J. Lewin, X.-D. Sun, M.J. Roth, Y.-L. Qiao, S.D. Mark, Z.-W. Dong, P.R. Taylor, S.M. Dawsey, Histological precursors of oesophageal squamous cell carcinoma: results from a 13 year prospective follow up study in a high risk population, Gut 54 (2) (2005) 187–192, http://dx.doi.org/10.1136/gut.2004.046631.
[27] X. Wang, K.K. Paliwal, A Modified Minimum Classification Error (MCE) training algorithm for dimensionality reduction, J. VLSI Signal Process. 32 (1) (2002) 19–28, http://dx.doi.org/10.1023/a:1016307200123.