Hybrid intelligent system for cardiac arrhythmia classification with Fuzzy K-Nearest Neighbors and neural networks combined with a fuzzy system

Hybrid intelligent system for cardiac arrhythmia classification with Fuzzy K-Nearest Neighbors and neural networks combined with a fuzzy system

Expert Systems with Applications 39 (2012) 2947–2955 Contents lists available at SciVerse ScienceDirect Expert Systems with Applications journal hom...

1MB Sizes 1 Downloads 28 Views

Expert Systems with Applications 39 (2012) 2947–2955

Contents lists available at SciVerse ScienceDirect

Expert Systems with Applications journal homepage: www.elsevier.com/locate/eswa

Hybrid intelligent system for cardiac arrhythmia classification with Fuzzy K-Nearest Neighbors and neural networks combined with a fuzzy system Oscar Castillo ⇑, Patricia Melin, Eduardo Ramírez, José Soria Tijuana Institute of Technology, Graduate Studies, Tijuana, BC, Mexico

a r t i c l e

i n f o

Keywords: Fuzzy KNN Mamdani fuzzy system Neural network Arrhythmia classification

a b s t r a c t In this paper we describe a hybrid intelligent system for classification of cardiac arrhythmias. The hybrid approach was tested with the ECG records of the MIT-BIH Arrhythmia Database. The samples considered for classification contained arrhythmias of the following types: LBBB, RBBB, PVC and Fusion Paced and Normal, as well as the normal heartbeats. The signals of the arrhythmias were segmented and transformed for improving the classification results. Three methods of classification were used: Fuzzy K-Nearest Neighbors, Multi Layer Perceptron with Gradient Descent and momentum Backpropagation, and Multi Layer Perceptron with Scaled Conjugate Gradient Backpropagation. Finally, a Mamdani type fuzzy inference system was used to combine the outputs of the individual classifiers, and a very high classification rate of 98% was achieved. Ó 2011 Elsevier Ltd. All rights reserved.

1. Introduction An electrocardiogram or ECG represents the electrical activity of the heart, as a waveform graph. An ECG signal contains important information that can help medical diagnosis, reflecting cardiac activity of a patient, if it is normal or failing heart that has certain pathologies. The ECG is the standard tool used in diagnosing heart disease (Health, 2009). The physicians get those signals easily and noninvasively by adding electrodes to the patient’s body. The Holter device is frequently used for ECG recording. Physicians apply the Holter device to the patient when ECG monitoring is required to find the existence of abnormal heartbeats in a one day ECG. A person can register about 100,000 heartbeats in one day (Health, 2009). The ECG shows each heartbeat as a series of electrical waves. The contractions that pump blood are represented by the P wave, the QRS complex and T wave. The P wave represents activity in the upper chambers of the heart. The QRS complex and T wave represents activity in the lower chambers (Health, 2009) (see Fig. 1). By arrhythmia we mean any alteration in the activity of the heart rhythm in amplitude, duration or shape of the rhythm. The MIT-BIH Arrhythmia Database is a set of 48 ECG records with 30 min duration each, and each record corresponds to a patient. In this database there are different types of arrhythmias such as: L-Left Bundle Branch Block (LBBB), R-Right Bundle Branch Block (RBBB), A-Atrial Premature Beat, a-aberrated Premature Atrial Beat

⇑ Corresponding author. E-mail address: [email protected] (O. Castillo). 0957-4174/$ - see front matter Ó 2011 Elsevier Ltd. All rights reserved. doi:10.1016/j.eswa.2011.08.156

Premature junctional Nodal J-Beat, Fusion of Ventricular and Normal Beat, I-Ventricular Flutter Wave, J-junctional Nodal Escape Beat, E-Ventricular Escape Beat Supra-ventricular Premature Beat S-, f-Fusion of an Paced Beat Normal and normal heartbeats (MIT-BIH Arrhythmia Database. PhysioBank, 2000). Many solutions have been proposed to develop automated recognition and classification of ECG. Some processing methods have been applied to the ECG signal: Statistical and Syntatic, MultiLayer Perceptron (MLP), Self-Organizing Maps (SOM), Learning Vector Quantization (LVQ), Linear Discriminant System, Fuzzy or NeuroFuzzy Systems, Support Vector Machines (SVM), Bayesian approach, Experts Systems, Markov Models, Hybrid system use a combination of different solutions to improve performance (Acharya, Kumar, & Bhat, 2004; Alzate & Giraldo, 2006; Anuradha, Suresh Kumar & Veera Reddy, 2008; Barbosa, Kleisinger, Valdez, & Monzón, 2001; Belgacem, Chikh, & Bereksi Reguig, 2003; Cepek et al., 2007; Ceylan, Ozbay, & Karlik, 2009; Clifford, Azuaje, & McSharry, 2006; de Chazal & Reilly, 1998; Engin, 2004; Khadra, Al-Fahoum, Al-Nashash, 1997; Maglaveras, Stamkopoulos, Diamantaras, Pappas, & Strintzis, 1998; Nabney, Evans, Tenner, & Gamlyn, 2001; O’Dwyer, de Chazal, & Reilly, 2000; Ozbay, Ceylan, & Karlik, 2006; Patra, Kumar Das, & Pradhan, 2009; Sun & Chan, 2000; Tsipouras & Fotiadis, 2003; Werbos, 1994). In this paper, we describe a hybrid intelligent system for classifying cardiac arrhythmia classification using three methods: Fuzzy KNN, MLP Gradient Descent with momentum Backpropagation and MLP Scaled Conjugate Gradient Backpropagation, and finally combine these outputs with a Mamdani fuzzy inference system that improves performance by achieving a very high classification rate.

2948

O. Castillo et al. / Expert Systems with Applications 39 (2012) 2947–2955

ECG Signal 115.dat 3 2

Voltage (mV)

1 0

1

1

1

1

-1 -2 -3 -4 -5 247

247.5

248

248.5

249

249.5

250

Time (Seconds)

Fig. 1. Normal heartbeat with fiducial points.

2. Problem statement and outline of the proposed method

Fig. 3. The red line represents the MLII electrode signal. (For interpretation of the references to colour in this figure legend, the reader is referred to the web version of this article.)

The problem addressed in this work is the classification of cardiac arrhythmias of the MIT-BIH Arrhythmia Database. The proposed hybrid system learns from a set of samples of arrhythmias: Normal Beat, Left Bundle Brunch Block (LBBB), Right Bundle Brunch Block (RBBB), Premature Ventricular Contraction (PVC), Fusion Paced and Normal Beat. The hybrid system is trained using several samples from each above mentioned class, and during testing it must determine the correct label for a previously unknown sample. The hybrid intelligent system consists for three classification methods: Fuzzy K Nearest Neighbors, Multi Layer Perceptron with Gradient Descent with momentum Backpropagation and Multi Layer Perceptron with Scaled Conjugate Gradient Backpropagation. Each classifier provides a matrix of memberships or activations in the case of the MLPs, this matrix is called the matrix of integration, and this is the input to the Mamdani fuzzy inference system, which combines the outputs of the three classifiers. The fuzzy system increases the classification rate given by the contribution of each classifier. See the proposed architecture in Fig. 2.

We used the MLII electrode signal of MIT-BIH Arrhythmia Database (see Fig. 3), and for this work five classes were selected: Normal Beat, LBBB, RBBB, PVC, Fusion Paced and Normal Beat. We took 500 samples in total, 100 samples for each above mentioned class. The used records for Normal were 115, 122, and 113, for LBBB 109, 111, 214, for RBBB 118, 124, 212, and for Fusion Paced and Normal class 217. For the heartbeat segmentation we took the R points documented in the database, and for this reason we determine the beginning and the end for each heartbeat. The segmentation was performed manually (see Fig. 4). A transformation of the signals was proposed, that consists in rearranging the voltage values; we just took the highest larger 70 voltage values and the smaller 70 voltage values per each heartbeat. Having a set of vectors of 140 voltage values, where each vector corresponds to a heartbeat (see Fig. 5 and 6). This transformation provided a simpler signal to be classified.

Fig. 2. Architecture of the hybrid system.

O. Castillo et al. / Expert Systems with Applications 39 (2012) 2947–2955

2949

Fig. 4. Heartbeat segmentation. (a) sample for Normal class, (b) sample for RBBB class.

Fig. 5. Heartbeat normalization.

In the following section, we present a brief review of some of the main concepts used in the proposed hybrid approach for classification.

Conjugate Backpropagation. We also discuss how the outputs from the above mentioned classifiers can be combined using a Mamdani fuzzy inference system, and we describe the heartbeats transformation that provide the input for each classifiers in the hybrid system.

3. Background theory 3.1. Fuzzy K-Nearest Neighbors algorithm In this section, we provide a general overview of Fuzzy K-Neartest Neighbors algorithm, Multi Layer Perceptrons with Gradient Descent with momentum Backpropagation and Gradient Scaled

The Fuzzy K-Nearest Neighbor algorithm assigns class membership to a sample vector rather than assigning the vector to a

2950

O. Castillo et al. / Expert Systems with Applications 39 (2012) 2947–2955

Fig. 6. Example of Heartbeats Normalized for each class.

particular class. The advantage is that no arbitrary assignments are made by the algorithm. In addition, the vector’s membership values should provide a lever of assurance to accompany the resultant classification. For example, if a vector is assigned a membership of 0.9 in one class and a membership of 0.05 in two other classes we can be reasonably sure that the class of membership of 0.9 is the class to which the vector belongs. On the other hand, if a vector is assigned a membership of 0.55 in class one, membership of 0.44 in class two, and membership of 0.01 in class three, then we should be hesitant to assign the vector based on these results. However, we can feel confident that it does not belong to class three. In such a case the vector might be examined further to determine its classification, because the vector exhibits a high degree of membership in both classes one and two. Clearly the membership assignments produced by the algorithm can be useful in the classification process (Keller, Gray, & Givens, 1985). The basis of the algorithm is to assign membership as a function of the vector’s distance from its K-nearest neighbors and those neighbors’ memberships in the possible classes. The fuzzy algorithm is similar to the crisp version in the sense that it must also search the labeled sample set for the K-nearest neighbors. Beyond obtaining these k samples, the procedures differ considerably (Keller et al., 1985). Let W = {x1, x2, . . . , xn} be the set of n labeled samples. Also let ui(x) be the assigned membership of the vector x (to be computed), and uij be the membership in the ith class of the jth vector of the labeled sample set. The algorithm is as follows:

BEGIN Input x, of unknown classification. Set K,1 6 K 6 n. Initialize i = 1. DO UNTIL (K-nearest neighbors to x found) Compute distance from x to xi IF (i 6 K) THEN Include xi in the set of K-nearest neighbors ELSE IF (xi closer to x than any previous nearest neighbor) THEN Delete the farthest of the K-nearest neighbors Include xi in the set of the K-nearest neighbors. END IF END DO UNTIL Initialize i = 1. DO UNTIL (x assigned membership in all classes) Compute ui(x) using (1). Increment i. END DO UNTIL END

Where

  1=kx  xj k2ðm1Þ  : ui ðxÞ ¼ P  k 2ðm1Þ j¼1 1=kx  xj k Pk

j¼1 uij

ð1Þ

O. Castillo et al. / Expert Systems with Applications 39 (2012) 2947–2955

As in Eq. (1), the assigned memberships of x are influenced by the inverse of distances from the nearest neighbors and their class memberships. The inverse distance serves to weight a vector’s membership more if it is closer and less if it is further from the vector under consideration. The labeled samples can be assigned class memberships in several ways. First, they can be given complete membership in their known class and non-membership in all other classes. Other alternatives are to assign the samples’ membership based on the distance from their class mean or based on the distance from the labeled samples of their own class and those of the other class or classes and then to use the resulting memberships in the classifier (Keller et al., 1985).

2951

Brío & Molina, 2007; Network for Mathematica, 2003; Samarasinghe, 2007). In GDM, the weight update takes into account previous weight changes along with the current propagated error. It can be understood as an averaging process that does not allow for weight changes that are extremely different relative to the previous ones. The update rule proposed in GDM is given by Dwm ¼ aDwm1  w ð1  aÞedm , where a is the momentum coefficient, and Dwm1 is the weight change used in the previous epoch. The e parameter represents the learning rate. In GDM the momentum coefficient a is held constant throughout the entire training process, usually set to 0.9 (Bishop, 1996; Brío & Molina, 2007; Network for Mathematica, 2003; Samarasinghe, 2007).

3.2. Multi layer perceptron 3.5. Gradient Scaled Conjugate (SC) An artificial neural network is a mathematical and computational model that simulates the abstract structure and functional aspects of biological neural networks. The basic computational elements in an ANN are known as neurons, nodes that serve as inputs, outputs or internal processing units. These neurons communicate and pass signals among themselves using what are known as synapses, which are modulated by weight parameters. The different forms in which neurons are organized and connected define the architecture of the network. Here, we will focus on the feedforward and fully connected architecture used by the MLP. In this network, the information moves in only one direction, forward, from the input nodes, through the hidden nodes (if any) to the output nodes, and there are no loops in a network of this type (Bishop, 1996; Brío & Molina, 2007; Network for Mathematica, 2003; Samarasinghe, 2007).

In second-order methods, the curvature of the error surface, denoted by the second derivative of the error surface, is used to more efficiently guide the error down the error surface. These methods provide faster solutions because of the incorporation of an extra second derivative of error information and automatic internal adjustments that are made to the learning parameters. However, this comes at a substantial computational cost of the calculation of the second derivative of error, Hessian, H, and inverse of Hessian, R, especially for a network with a large number of weights. The conjugate gradients, gradient scaled conjugate, Newton and Levenberg–Maquardt are algorithms of second-order methods (Bishop, 1996; Brío & Molina, 2007; Network for Mathematica, 2003; Samarasinghe, 2007). 4. Experiments

3.3. Backpropagation training for a multilayer perceptron The BP algorithm can me summarized in four fundamental steps that are repeated iteratively, and these are: 1. Initialize the connection weights with random values. 2. Compute the output of the ANN, denoted by yp, by propagating each input pattern xp through the network in a forward direction. 3. Compute the error between the desired output, d, and the output produced by the ANN y, this is given by Ep ¼ 12 ðdp ln  dp ln Þ2 . 4. Adjust the connection weights according to the following rule.

W ln½tþ1 ¼ W ln½t þ bdp ln xpl :

ð2Þ

In the above equation b represents the learning rate, a parameter that is used to modulate the amount by which the connection weights can be modified in each iteration of the algorithm. The network dpln is computed in two ways. First, if n is an output node then it is given by (dpln  ypln). P Otherwise, it is given by f (ypln) = dpln + 1wl + 1, where f represents the activation function of node n. The above process is repeated until a stopping criterion is met, which can be a desired minimum error or a maximum number of iterations, also known as epochs (Bishop, 1996; Brío & Molina, 2007; Network for Mathematica, 2003; Samarasinghe, 2007).

In this section we present the selected classes for the MIT-BIH Arrhythmia Database, describe the experimental set-up, and detail the experimental results that were obtained by using Fuzzy KNN, MLPs and the fuzzy inference system. 4.1. MIT-BIH Arrhythmia Database For this work, we used the MIT-BIH Arrhythmia Database. The MIT-BIH Arrhythmia Database contains 48 half-hour excerpts of two-channel ambulatory ECG recordings, obtained from 47 subjects studied by the BIH Arrhythmia Laboratory between 1975 and 1979. Approximately 60% of these recordings were obtained from inpatients. The subjects were 25 men aged from 32 to 89 years, and 22 women aged from 23 to 89 years. Two or more

3.4. Gradient descent with momentum (GDM) Another proposed improvement is the GDM algorithm, which takes a different approach towards overcoming some of the shortcomings of BP. It is equivalent to the traditional BP algorithm, with an added parameter called the momentum coefficient, which is used when the connection weights are updated (Bishop, 1996;

Fig. 7. Fuzzy inference system, Mamdani type.

2952

O. Castillo et al. / Expert Systems with Applications 39 (2012) 2947–2955

Fig. 8. Fuzzy inference system. (a) The inputs. (b) The outputs.

cardiologists independently annotated each record; disagreements were resolved to obtain the computer-readable reference annotations for each beat (approximately 110,000 annotations in all) included with the database. Sample images from the database are shown in Fig. 3 (MIT-BIH Arrhythmia Database. PhysioBank, 2000). 4.2. Experimental setup In the experimental setup we have a 70% for training, which means 350 samples and for testing 30%, which represents 150 samples. With the 70% in the training we used cross validation K-fold of 10. We have 10 random subsamples that are disjoint, we took one for validation and the rest for training and we repeat this cycle with the second subsample and so on. We apply this process of cross validation with three different parameters or architecture of respective classification method and the better result in this process determined the selections of a representative structure per each classifier. Once we have the three representative structures we training with de 70% of the samples and the rest for the test. Finally we have the tables that represent the confusion matrix for the classification. The parameters used of the first classifier Fuzzy KNN is k = 5, 4, 3 nearest neighbors. The architecture for the second classifier Multi Layer Perceptron is 140 neurons in the input layer, 10,000 epochs, learning rate = 0.3, learning method Gradient Descent with momentum 0.5 and we use 50, 100 and 150 neurons in the hidden layer, five neurons in the output layer. Each output represents the respective class. The architecture of the third classifier Multi Layer Perceptron with Scaled Conjugate Gradient Backpropagation is 140 neurons in the input layer, 10,000 epochs, lr = 0.001, also we used 50, 100, 150 neurons in the hidden layer, five neurons in the output layer. Table 1 Results for cross validation K-fold 10 of Fuzzy K-Nearest Neighbor, K = 3, first classifier. Class Normal LBBB RBBB PVC FPN

Normal 68 0 1 0 1

Classification rate = 94.28%.

LBBB 1 68 1 3 1

RBBB 0 0 67 1 1

PVC 0 2 0 64 4

We use a fuzzy inference system to combine the outputs for the three classifiers (see Fig. 7). The characteristics for this FIS are of Mamdani type with centroid deffuzification, 15 inputs with three trapezoid functions. The first five inputs represent the matrix of memberships of Fuzzy KNN classifier. The second five represent the matrix of activations for the Multi Layer Perceptron with the Gradient Descent with momentum Backpropagation learning method. The last five are the matrix of activations of the third classifier Multi Layer Perceptron with Scaled Conjugate Gradient Backpropagation. We have defined 120 fuzzy rules and five outputs with three trapezoid functions. The outputs represent the classes: Normal, LBBB, RBBB, PVC, Fusion Paced and Normal (see Fig. 8). In what follows, we present a detailed account of each of these experimental tests. 4.3. Classifiers 4.3.1. First classifier: Fuzzy K-Nearest Neighbors In the results for the first classifier, we found that the best result with 10 fold cross validation was with k = 3 nearest neighbors, being a classification rate of 94.28%, which is summarized in

Table 2 Results for test of Fuzzy Knn K-Nearest Neighbor, K = 3, First classifier. Class

Normal

LBBB

RBBB

PVC

FPN

Normal LBBB RBBB PVC FPN

30 0 0 0 0

0 28 1 1 0

0 0 29 0 1

0 2 0 28 1

0 0 0 1 28

Classification rate = 95.33%.

Table 3 Results for cross validation K-fold 10 of MLP Gradient Descent with momentum backpropagation, 150 neurons hidden layer, Second classifier.

FPN

Class

Normal

LBBB

RBBB

PVC

FPN

1 0 1 2 63

Normal LBBB RBBB PVC FPN

70 0 1 0 0

0 67 2 7 2

0 2 66 0 3

0 1 0 60 3

0 0 1 3 62

Classification rate = 92.85%.

2953

O. Castillo et al. / Expert Systems with Applications 39 (2012) 2947–2955

Table 1. We then used this parameter in the testing phase and the classification rate was 95.33%, as can be seen in Table 2. 4.3.2. Second classifier: Multi Layer Perceptron with Gradient Descent with momentum Backpropagation In the results for the second classifier, we found that the best result with 10 fold cross validation was with 150 neurons in the hidden layer, being a classification rate of 92.85% that is summarized in Table 3. We used this architecture in the testing phase and the classification rate was 96.67%, as can be seen in Table 4. 4.3.3. Third classifier: Multi Layer Perceptron with Scaled Conjugated Gradient Backpropagation In the results for the third classifier, we found that the best result with 10 fold cross validation was with 100 neurons in the hidden layer, being a classification rate of 93.62%, this is summarized in Table 5. We used this architecture in the testing phase and the classification rate was 97.33%, as can be seen in Table 6. 4.3.4. Fuzzy inference system Finally, with the fuzzy inference system mentioned in the previous section, we combine the outputs produced by each classifier, and this results in a classification rate of 98%, as can be seen in Table 7. 4.4. Statistical analysis Below are the results of statistical analysis to compare the error rate of the three classification methods, this analysis was done with software named statdisk, the comparisons that were made are shown below:

Table 4 Results for test of mlp Gradient Descent with momentum backpropagation, 150 neurons hidden layer, second classifier. Class

Normal

LBBB

RBBB

PVC

FPN

Normal LBBB RBBB PVC FPN

30 0 0 0 0

0 30 1 1 0

0 0 29 0 1

0 0 0 28 1

0 0 0 1 28

Classification rate = 96.67%.

Table 5 Results for cross validation K-fold 10 of mlp scaled conjugated gradient backpropagation, 100 neurons hidden layer, third classifier. Class

Normal

LBBB

RBBB

PVC

FPN

Normal LBBB RBBB PVC FPN

69 0 0 0 0

0 64 2 5 0

0 2 67 0 0

0 3 0 60 3

1 1 1 5 67

 Fuzzy K-Nearest Neighbor vs. MLP with Gradient Descent with momentum Backpropagation.  Fuzzy K-Nearest Neighbors vs. MLP with Scaled Conjugated Gradient Backpropagation.  MLP with Gradient Descent with momentum Backpropagation vs. MLP with Scaled Conjugated Gradient Backpropagation. 4.4.1. Fuzzy K-Nearest Neighbor vs. MLP with Gradient Descent with momentum Backpropagation Table 8 shows the results of the statistical analysis of the comparison of Fuzzy K-Nearest Neighbor with respect to the MLP with Gradient Descent with momentum Backpropagation and this is also illustrated in Fig. 9. 4.4.2. Fuzzy K-Nearest Neighbor vs. MLP Scaled Conjugated Gradient Backpropagation The Table 9 shows the results of the statistical analysis to comparison of Fuzzy K-Nearest Neighbor VS MLP Scaled Conjugated Gradient Backpropagation and this is also illustrated in Fig. 10.

Table 7 Results for fuzzy inference system combining the outputs of the classifiers. Class

Normal

LBBB

RBBB

PVC

FPN

Normal LBBB RBBB PVC FPN

30 0 0 0 0

0 30 1 0 0

0 0 29 0 1

0 0 0 30 1

0 0 0 0 28

Classification rate = 98%.

Table 8 Results of the hypothesis testing of the Fuzzy K-Nearest Neighbor vs. MLP with Gradient Descent with momentum Backpropagation. Test: l = l(hyp) Size of sample, n: 10 Difference of means, d: 1.428571 Difference of st. dev., sd: 4.312088 Statistic, t: 1.0476 Critical value of t: ±2.2622 Value of-P: 0.3221 95% Confidence interval: 4.51325 < l < 1.656108 Null hypothesis not rejected Sample does not provides sufficient statistical evidence to reject null hypothesis

Classification rate = 93.62.

Table 6 Results for test of mlp scaled conjugated gradient backpropagation, 100 neurons hidden layer, third classifier. Class

Normal

LBBB

RBBB

PVC

FPN

Normal LBBB RBBB PVC FPN

30 0 0 0 0

0 29 1 0 0

0 0 29 0 0

0 0 0 30 2

0 1 0 0 28

Classification rate = 97.33%.

Fig. 9. Hypothesis Test of Fuzzy K-Nearest Neighbor vs. MLP with Gradient Descent with momentum Backpropagation.

2954

O. Castillo et al. / Expert Systems with Applications 39 (2012) 2947–2955

Table 9 Results of the hypothesis test of Fuzzy K-Nearest Neighbor vs. MLP Scaled Conjugated Gradient Backpropagation. Test: l = l(hyp) Size of sample, n: 10 Difference of means, d: 0.8571429 Difference of st. dev., sd: 3.82141 Statistic, t: 0.7093 Critical value of t: ±2.2622 Value of-P: 0.4961 95% Confidence interval: 3.590812 < l < 1.876526 Null hypothesis not rejected Sample does not provides sufficient statistical evidence to reject null hypothesis

4.4.3. MLP with Gradient Descent with momentum Backpropagation vs. MLP Scaled Conjugated Gradient Backpropagation Table 10 shows the results of the statistical analysis to comparison of MLP with Gradient Descent with momentum Backpropagation VS MLP Scaled Conjugated Gradient Backpropagation and this is also illustrated in Fig. 11.

5. Conclusions Based on the performed experiments we noticed that in the three classifiers used we achieved good results individually. But by combining their outputs using the Mamdani fuzzy inference system with the appropriate membership functions and rules we achieved a very high classification rate of 98%. We applied the hypothesis testing to compare the error rate of the three classification methods and we found that not exist enough statistical evidence to reject the nulls hypothesis in all made comparisons, so the three methods are good, but they capture (like an expert) different knowledge about classification. Therefore, the aggregation of the three classifiers with a fuzzy system provides a better overall classifier.

References

Fig. 10. Hypothesis Test of Fuzzy K-Nearest Neighbor vs. MLP Scaled Conjugated Gradient Backpropagation.

Fig. 11. Hypothesis Test of MLP with Gradient Descent with momentum Backpropagation vs. MLP Scaled Conjugated Gradient Backpropagation.

Table 10 Results of the hypothesis test of MLP with Gradient Descent with momentum Backpropagation vs. MLP Scaled Conjugated Gradient Backpropagation. Test: l = l(hyp) Sample size, n: 10 Difference of means, d: 0.5714286 Difference of st. dev., sd: 2.950844 Statistic, t: 0.6124 Critical value of t: ±2.2622 Value of-P: 0.5554 95% Confidence interval: 1.539476 < l < 2.682333 The null hypothesis was not rejected Sample does not provides sufficient statistical evidence to reject null hypothesis

Acharya Kumar & Bhat (2004). Classification of cardiac abnormalities using heart rate signals. Medical & Biological Engineering & Computing, 42(3), 288–293. Alzate, A., & Giraldo, E. (2006). Clasificación de Arritmias utilizando ANFIS, Redes Neuronales y Agrupamiento Substractivo, 2006. available online at: redalyc.uaemex.mx/pdf/849/84911639005.pdf. Anuradha, B., & Veera Reddy, V. C. (2008). ANN for classification of cardiac Arrhythimias. ARPN Journal of Engineering and Applied Sciences, 3(3), 1–6. Barbosa, L., Kleisinger, G. H., Valdez, A. D., & Monzón, J. E. (2001). Utilización del Modelo Kohonen y del Perceptrón Multicapa para detectar Arritmias Cardiacas, 2001. available online at: . Belgacem, N., Chikh, M. A., Bereksi Reguig, F. (2003). Supervised classification of ECG using neural networks. Available online at: . Bishop, C. M. (1996). Neural networks for pattern recognition. Oxford, UK: Clarendon Press. Cepek, M., Chudácek, V., Petrik, M., Geogoulas, G., Stylios, C., & Lhotská, L. (2007). Comparison of inductive modeling method to other classification methods for holter ECG. Available at: www.gmdh.net/articles/iwim/IWIM_0.pdf. Ceylan, R., Ozbay, Y., & Karlik, B. (2009). A novel approach for classification of ECG arrhythmias: Type-2 fuzzy clustering neural network. Expert Systems with Applications, 6721–6726. Clifford, G., Azuaje, F., & McSharry, P. (2006). Advanced methods tools for ECG data analysis. Engineering medicine biology. Boston-London: Artech House. de Chazal, P., & Reilly, R. B. (1998). Automatic classification of ECG beats using wave form shape and heart beat interval features, 1998. Available at: citeseerx.ist.psu.edu/viewdoc/download?doi=10.1.1.59.2742&rep. . .. del Brío, B. M., & Sanz Molina, A. (2007). Redes Neuronales y Sistemas Borrosos, 3ª Edición. Alfaomega Ra-Ma Publisher. Engin, M. (2004). ECG beat classification using neuro-fuzzy network. Pattern Recognition Letters, 25, 1715–1722. Heart Health, (2009). National Geographic. . Keller, J. M., Gray, M. R., & Givens, J. A. Jr., (1985). A fuzzy K-nearest neighbor algorithm. IEEE Transactions on Systems Man and Cybernetics, 15(4), 580–585. Khadra Al-Fahoum & Al-Nashash (1997). Detection of life-threatening cardiac arrhythmias using wavelets transformation. Medical and Biological Engineering Computing, 35(6), 626–632. Maglaveras, N., Stamkopoulos, T., Diamantaras, K., Pappas, C., & Strintzis, M. (1998). ECG pattern recognition and classification using non-linear transformations and neural networks. Journal of Medical Informatics, 191–208. MIT-BIH Arrhythmia Database. PhysioBank, (2000). Physiologic signal archives for biomedical research. . Nabney, I. T., Evans, D. J., Tenner, J., & Gamlyn, L. (2001). Benchmarking beat classification algorithms, computers in cardiology. Netherlands: Rotterdam, pp. 529–532. Neural Network for Mathematica, Wolfram Research Inc., Chicago, IL, 2003. O’Dwyer, M., de Chazal, P., & Reilly, R. B. (2000). Beat classification for use in arrhythmia analysis, 2000. Available at: . Ozbay, Y., Ceylan, R., & Karlik, B. (2006). A fuzzy clustering neural network architecture for classification of ECG Arrhtymias. Computers in Biology and Medicine, 376–388. Patra, D., Kumar Das, M., Pradhan, S. (2009). Integration of FCM, PCA and neural networks for classification of ECG arrhythmias. IAENG International Journal of

O. Castillo et al. / Expert Systems with Applications 39 (2012) 2947–2955 Computer Science, 36(3) 2009, available online at: . Samarasinghe, S. (2007). Neural networks for applied science and engineering. Auerback Publications. Sun, Y., & Chan, K. (2000). Arrhythmia detection and recognition in ECG signals using nonlinear techniques. Annals of Biomedical Engineering, 28, 37.

2955

Tsipouras, M., & Fotiadis, D. (2003). An efficient system for the detection of arrhythmic segments in ECG recordings based on non-linear features of the RR interval signal. Computers in Cardiology, 533–536. Werbos, P. J. (1994). The roots of backpropagation form ordered derivatives to neural networks and political forecasting. Wiley series on adaptive and learning system for signal processing communication and control. New York: Wiley.