An expert system based on linear discriminant analysis and adaptive neuro-fuzzy inference system to diagnosis heart valve diseases





Expert Systems with Applications 35 (2008) 214–222
www.elsevier.com/locate/eswa

Abdulkadir Sengur
Firat University, Department of Electronics and Computer Science, 23119 Elazig, Turkey

Abstract

In the last two decades, the use of artificial intelligence methods in biomedical analysis has been increasing, mainly because the effectiveness of classification and detection systems has improved a great deal in helping medical experts with diagnosis. In this paper, we investigate the use of linear discriminant analysis (LDA) and an adaptive neuro-fuzzy inference system (ANFIS) to distinguish normal from abnormal heart valves using Doppler heart sounds. The proposed heart valve disorder detection system is composed of three stages. The first stage is pre-processing; filtering, normalization and white de-noising are the processes used in this stage. The second stage is feature extraction, in which wavelet transforms and the short-time Fourier transform were used, and wavelet entropy was then applied to these features. To reduce the complexity of the system, LDA was used for feature reduction. In the classification stage, an ANFIS classifier was chosen. To evaluate the performance of the proposed methodology, a comparative study was carried out on a data set containing 215 samples. The validity of the proposed method was measured with the sensitivity and specificity parameters; a sensitivity of 95.9% and a specificity of 94% were obtained.
© 2007 Elsevier Ltd. All rights reserved.

Keywords: Doppler heart sounds; Heart valves; Feature extraction; Wavelet decomposition; Feature reduction; Adaptive neuro-fuzzy inference system

1. Introduction

The heart consists of four chambers: two atria and two ventricles. There is a valve through which blood passes before leaving each chamber of the heart. The valves prevent the backward flow of blood. These valves are flaps located at each end of the two ventricles (http://www.healthsystem.virginia.edu/uvahealth/adult_cardiac/disvalve.cfm). Heart valve disease occurs when one or more valves in the heart are not working properly and blood does not flow through the heart as it should. This can put an extra strain on the heart and cause symptoms such as breathlessness and swollen ankles. Severe heart valve disease can cause the heart to pump less efficiently (http://hcd2.bupa.co.uk/fact_sheets/html/heart_valve_disease.html).

E-mail address: ksengur@firat.edu.tr
doi:10.1016/j.eswa.2007.06.012

Heart valve disease may be suspected if the heart sounds heard through a stethoscope are abnormal; this is usually the first step in diagnosing a heart valve disease. A characteristic heart murmur (abnormal sounds in the heart due to turbulent blood flow) can often indicate valve regurgitation. To further define the type of valve disease and the extent of the valve damage, physicians may use any of the following diagnostic procedures: electrocardiogram (ECG or EKG), chest X-ray, cardiac catheterization, transesophageal echo (TEE), radionuclide scans and magnetic resonance imaging (MRI) (Nanda, 1993). Research indicates that a large proportion of deaths worldwide are due to heart disease. For this reason, early detection of heart valve disorders is an important goal of medical research (Akay, Akay, & Welkowitz, 1992). The Doppler technique has gained considerable interest since Satomura first demonstrated the application of the Doppler effect to the measurement of blood velocity in 1959 (Keeton & Schlindwein, 1997). Doppler heart


sounds (DHS) are among the most important sounds produced by blood flow, valve motion and the vibration of other cardiovascular components (Plett, 2000). However, factors such as calcific disease or obesity often result in a diagnostically unsatisfactory Doppler assessment, and it is therefore sometimes necessary to examine the spectrogram of the Doppler shift signals to elucidate the degree of the disease (Jing, Xuemin, Mingshi, & Wie, 1997). In addition to Doppler techniques, more complex techniques have also been developed (e.g., the Laplace transform and principal component analysis). Many studies have addressed the classification of Doppler signals in the pattern recognition field (Chan, Chan, Lam, Lui, & Poon, 1997; Wright, Gough, Rakebrandt, Wahab, & Woodcock, 1997).

In this study, an expert system with three stages is proposed. The first stage consists of several pre-processing units for the DHS signals: white de-noising, normalization and filtering. The second stage is based on mathematical and statistical tools. To separate normal and abnormal heart valves, the wavelet transform, wavelet entropy and the short-time Fourier transform (STFT) are first used to extract features from the pre-processed Doppler signals (Guler, Kiymik, Kara, & Yuksel, 1992). Feature reduction is also carried out in this stage: the dimension of the obtained feature vector, which has 91 features, is reduced to 4 features using linear discriminant analysis (LDA). The third stage is the classifier stage, where the ANFIS structure is used. The performance of the proposed expert system was evaluated with several statistical validation methods, and the results were compared with previous methods that used the same data set. Our system obtained a 95.9% sensitivity rate and a 94% specificity rate.

The rest of the paper is organized as follows. In Section 2, previous related research, the Doppler heart signals, wavelet decomposition, the short-time Fourier transform, wavelet entropy and the ANFIS structure are described. The methodology is described in Section 3; it enables a large reduction of the Doppler signal data while retaining problem-specific information, which facilitates an efficient pattern recognition process. In Section 4, the performance evaluation methods are given. The effectiveness of the proposed method for the classification of Doppler signals in the diagnosis of heart valve diseases is demonstrated in Section 5. Finally, the paper is concluded in Section 6.

2. Background

2.1. Previous research

To date, several papers have addressed the classification of Doppler signals using pattern recognition approaches (Çomak, Arslan, & Türkoğlu, 2007; Guler et al., 1992; Turkoglu, Arslan, & Ilkay, 2002). Turkoglu et al. (2002) proposed an expert diagnosis system for the


interpretation of the Doppler signals of heart valve diseases based on pattern recognition (Guler et al., 1992). The proposed methodology was composed of a feature extractor and a back-propagation artificial neural network (BP-ANN) classifier. Wavelet transforms and the short-time Fourier transform were used to extract features from the Doppler signals in the time-frequency domain, and the wavelet entropy method was applied to these features. The back-propagation neural network was used to classify the extracted features. The performance of the developed system was evaluated on 215 samples. The test results showed that the system was effective in detecting Doppler heart sounds; the correct classification rate was about 94% for normal subjects and 95.9% for abnormal subjects.

The data set (215 samples) obtained by Turkoglu et al. (2002) was later used by Çomak et al. (2007), who investigated the use of a least squares support vector machine (LS-SVM) classifier for improving the performance of the proposal of Turkoglu et al. (2002), and who also intended to carry out a comparative study. The classification rates of the examined classifiers were evaluated by ROC curves in terms of sensitivity and specificity. The application results showed that, according to the ROC curves, the LS-SVM classifier performance was almost the same as that of BP-ANN. It was reported that LS-SVM was more suitable than BP-ANN since it has some advantages over BP-ANN. Another reported advantage of LS-SVM was its shorter training time: according to the experimental results, the ANN's training time was about 13 times longer than that of LS-SVM. This property was, however, the only advantage of LS-SVM over BP-ANN.

Later, Uguz, Arslan, and Türkoğlu (2007) proposed a biomedical system based on the hidden Markov model for the clinical diagnosis and recognition of heart valve disorders (Çomak et al., 2007). The proposed methodology also used the database of Turkoglu et al. (2002). In that study, a continuous HMM (CHMM) classifier system was used, and a single Gaussian model was preferred for determining the emission probabilities. The proposed methodology was composed of two stages. In the first stage, the initial values of the mean and standard deviation were calculated by separating the observation symbols into equal segments according to the state number and using the observation symbols appropriate to each segment. In the second stage, the initial values of the mean and standard deviation were calculated by separating the observation symbols into clusters (with the FCM or K-means algorithms), equal in number to the states, and using the observation symbols appropriate to the separated clusters. The experimental studies were carried out on three different classification systems: CHMM, FCM–K-means/CHMM and ANN. The specificity and sensitivity rates obtained were 92% and 94% for CHMM, and 92% and 97.26% for FCM–K-means/CHMM, respectively.



2.2. DHS signals

The DHS can be obtained directly by placing the Doppler ultrasonic flow transducer over the chest of the patient (Wright et al., 1997). A DHS signal from the aortic heart valve is given in Fig. 1. The DHS produced from echoes backscattered by moving blood cells is generally in the range of 0.5 to 10 kHz (Uguz et al., 2007).

Fig. 1. The waveform pattern of the Doppler heart sound.

DHS signal spectral estimation is now commonly used to evaluate blood flow parameters in order to diagnose cardiovascular diseases. Spectral estimation methods are particularly used in Doppler ultrasound cardiovascular disease detection. Clinical diagnosis procedures generally include the analysis of a graphical display and parameter measurements produced by blood flow spectral evaluation. Ultrasonic instrumentation typically employs Fourier-based methods to obtain the blood flow spectra and blood flow measurements (Saini, Nanda, & Maulik, 1993). A Doppler signal is not a simple signal: it includes random characteristics due to the random phases of the scattering particles present in the sample volume, and other effects such as geometric broadening and spatially varying velocity also affect the signal (Karabetsos, Papaodysseus, & Koutsouris, 1998). The Doppler equation is

\Delta f = \frac{2 v f \cos\theta}{c}    (1)
where v is the velocity of the blood flow, f is the frequency of the emitted ultrasonic signal, c is the velocity of sound in tissue (approximately 1540 m/s), \Delta f is the measured Doppler frequency shift, and \theta is the angle of incidence between the direction of blood flow and the direction of the emitted ultrasonic beam (Uguz et al., 2007).

2.3. Wavelet decomposition

The main advantage of wavelets is that they have a varying window size, being wide for slow frequencies and narrow for fast ones, thus leading to an optimal time-frequency resolution in all frequency ranges. Furthermore, owing to the fact that the windows are adapted to the transients of each scale, wavelets do not require stationarity (Quiroga, 1998).

Wavelet decomposition uses the fact that it is possible to resolve high frequency components within a small time window, while only low frequency components need large time windows. This is because a low frequency component completes a cycle over a large time interval, whereas a high frequency component completes a cycle in a much shorter interval. Therefore, slowly varying components can only be identified over long time intervals, but fast varying components can be identified over short time intervals. Wavelet decomposition can be regarded as a continuous-time wavelet decomposition sampled at different frequencies at every level or stage. The wavelet decomposition function at level m and time location t_m can be expressed as

d_m(t_m) = x(t) * \Psi_m\!\left(\frac{t - t_m}{2^m}\right)    (2)

where \Psi_m is the decomposition filter at frequency level m. The effect of the decomposition filter is scaled by the factor 2^m at stage m, but otherwise the shape is the same at all stages. The synthesis of the signal from its time-frequency coefficients can be rewritten to express the composition of the signal x[n] from its wavelet coefficients:

d[n] = x[n] * h[n],    c[n] = x[n] * g[n]    (3)

where h[n] is the impulse response of the high-pass filter and g[n] is the impulse response of the low-pass filter (Devasahayam, 2000).
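To make the decomposition concrete, the following sketch (a hedged illustration, not the author's MATLAB code; the test signal and the five-level depth are arbitrary choices) reproduces the filter-and-downsample cascade of Eqs. (2) and (3) with the PyWavelets package:

```python
import numpy as np
import pywt

# Toy DHS-like test signal: two tones plus noise, sampled at 20 kHz.
fs = 20_000
t = np.arange(0, 1.0, 1 / fs)
x = np.sin(2 * np.pi * 600 * t) + 0.5 * np.sin(2 * np.pi * 3000 * t)
x += 0.1 * np.random.randn(t.size)

# One stage of Eq. (3): d[n] = x[n] * h[n] (detail), c[n] = x[n] * g[n]
# (approximation), each followed by downsampling by two.
cA, cD = pywt.dwt(x, "db10")

# Repeating the split on the approximation branch gives the multi-level tree.
coeffs = pywt.wavedec(x, "db10", level=5)      # [cA5, cD5, cD4, ..., cD1]
print([len(c) for c in coeffs])

# The coefficient set allows (near-)perfect reconstruction of the signal.
x_rec = pywt.waverec(coeffs, "db10")
print(np.allclose(x, x_rec[: x.size]))
```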

2.4. Short-time Fourier transform

The short-time Fourier transform (STFT), also known as the time-dependent or windowed Fourier transform, attempts to analyse non-stationary signals by dividing the whole signal into shorter data frames. In short, the STFT can be compactly represented as

X(k) = \sum_{n=0}^{N-1} x(n)\, w(n - n_0)\, \exp\!\left(-\frac{j 2 \pi n k}{N}\right)    (4)

where w(n - n_0) is a window function chosen to suppress the side lobes while minimizing the main-lobe leakage. The output of successive STFTs provides a time-frequency representation of the signal. To accomplish this, the signal is truncated into short data frames by multiplying it by a window so that the modified signal is zero outside the data frame. The frequency spectrum of each data frame is then calculated using the fast Fourier transform. One limitation of the STFT is that the time frame for the analysis of the signal is fixed (Keeton & Schlindwein, 1997).
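A minimal sketch of Eq. (4) using SciPy's spectrogram routine (an assumed tool; the paper's processing was done in MATLAB, and the window length and overlap here are illustrative only):

```python
import numpy as np
from scipy.signal import spectrogram

fs = 20_000
t = np.arange(0, 2.0, 1 / fs)
x = np.sin(2 * np.pi * (500 + 1000 * t) * t)   # non-stationary (chirp-like) test signal

# Successive windowed FFTs (Hann window, 50% overlap) give |X(k)| per frame.
f, frames, Sxx = spectrogram(x, fs=fs, window="hann",
                             nperseg=1024, noverlap=512, mode="magnitude")
print(Sxx.shape)   # (frequency bins, time frames)
```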


2.5. Wavelet entropy

Entropy-based criteria describe information-related properties for an accurate representation of a given signal. Entropy is a common concept in many fields, mainly in


signal processing (Coifman & Wickerhauser, 1992). Measuring the entropy appears to be an ideal tool for quantifying the degree of order of non-stationary signals. An ordered activity (e.g. a sinusoidal signal) is manifested as a narrow peak in the frequency domain, thus having low entropy. On the other hand, a random activity has a wide-band response in the frequency domain, reflected in a high entropy value (Quiroga, Roso, & Basar, 1999). The available entropy measures include the Shannon, threshold, norm, log energy and sure entropies (Coifman & Wickerhauser, 1992).
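The contrast between ordered and random activity can be illustrated with a small sketch (the spectral Shannon entropy estimator below is an illustrative choice, not the entropy computation used later in this paper):

```python
import numpy as np

def spectral_shannon_entropy(x):
    """Shannon entropy of the normalized power spectrum of x."""
    p = np.abs(np.fft.rfft(x)) ** 2
    p = p / p.sum()
    p = p[p > 0]                       # avoid log(0)
    return -np.sum(p * np.log(p))

fs = 1000
t = np.arange(0, 1.0, 1 / fs)
sine = np.sin(2 * np.pi * 50 * t)      # ordered: narrow spectral peak
noise = np.random.randn(t.size)        # random: broadband spectrum

print(spectral_shannon_entropy(sine))   # low entropy
print(spectral_shannon_entropy(noise))  # high entropy
```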

2.6. Architecture of adaptive-network-based fuzzy inference system

An adaptive network, as its name implies, is a network structure consisting of nodes and directional links through which the nodes are connected. Moreover, part or all of the nodes are adaptive, which means that each output of these nodes depends on the parameters pertaining to the node, and the learning rule specifies how these parameters should be changed to minimize a prescribed error measure (Güler & Übeyli, 2005; Jang, 1993). For simplicity, we assume that the fuzzy inference system under consideration has two inputs, x and y, and one output z. Suppose that the rule base contains two fuzzy if-then rules of the Takagi-Sugeno type:

Rule 1: If (x is A1) and (y is B1) then (f1 = p1 x + q1 y + r1)
Rule 2: If (x is A2) and (y is B2) then (f2 = p2 x + q2 y + r2)

where Ai and Bi are the fuzzy sets, fi are the outputs within the fuzzy region specified by the fuzzy rule, and pi, qi and ri are the design parameters that are determined during the training process. The ANFIS architecture that implements these two rules is shown in Fig. 2, in which a circle indicates a fixed node, whereas a square indicates an adaptive node.

Fig. 2. ANFIS architecture.

Layer 1: Every node i in this layer is a square node with a node function

O^1_i = \mu_{A_i}(x),    i = 1, 2    (5)
O^1_i = \mu_{B_{i-2}}(y),    i = 3, 4    (6)

where x (or y) is the input to node i and A_i (or B_{i-2}) is the linguistic label (small, large, etc.) associated with this node function; \mu_{A_i}(x) and \mu_{B_{i-2}}(y) can adopt any fuzzy membership function. Usually we choose \mu_{A_i}(x) to be bell-shaped, with maximum equal to 1 and minimum equal to 0, such as

\mu_{A_i}(x) = \frac{1}{1 + \left[ \left( \frac{x - c_i}{a_i} \right)^2 \right]^{b_i}}    (7)

where {a_i, b_i, c_i} is the parameter set. Parameters in this layer are referred to as premise parameters.

Layer 2: The nodes in this layer are fixed. They are labelled M to indicate that they play the role of a simple multiplier. The outputs of these nodes are given by

O^2_i = w_i = \mu_{A_i}(x)\, \mu_{B_i}(y),    i = 1, 2    (8)

which are the so-called firing strengths of the rules.

Layer 3: Every node in this layer is a circle node labelled N. The ith node calculates the ratio of the ith rule's firing strength to the sum of all rules' firing strengths:

O^3_i = \bar{w}_i = \frac{w_i}{w_1 + w_2},    i = 1, 2    (9)

For convenience, the outputs of this layer are called normalized firing strengths.

Layer 4: The nodes in this layer are adaptive. The output of each node in this layer is simply the product of the normalized firing strength and a first-order polynomial (for a first-order Sugeno model). Thus, the outputs of this layer are given by

O^4_i = \bar{w}_i f_i = \bar{w}_i (p_i x + q_i y + r_i),    i = 1, 2    (10)

Parameters in this layer will be referred to as consequent parameters.

Layer 5: The single node in this layer is a circle node labelled \Sigma that computes the overall output as the summation of all incoming signals, i.e.

O^5_1 = \sum_{i=1}^{2} \bar{w}_i f_i = \frac{\sum_{i=1}^{2} w_i f_i}{w_1 + w_2}    (11)

It can be seen that there are two adaptive layers in this ANFIS architecture, namely the first layer and the fourth layer. In the first layer there are three modifiable parameters {a_i, b_i, c_i}, which are related to the input membership functions; these are the so-called premise parameters. In the fourth layer there are also three modifiable parameters {p_i, q_i, r_i}, pertaining to the first-order polynomial; these are the so-called consequent parameters (Güler & Übeyli, 2005; Jang, 1993).
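A minimal numerical sketch of the five-layer forward pass for the two-rule system above (Eqs. (5)-(11)); all parameter values are arbitrary placeholders, not values trained in this study:

```python
import numpy as np

def bell(x, a, b, c):
    """Generalized bell membership function of Eq. (7)."""
    return 1.0 / (1.0 + ((x - c) / a) ** 2) ** b

# Premise parameters {a, b, c} for A1, A2 (input x) and B1, B2 (input y).
A = [(2.0, 2.0, 0.0), (2.0, 2.0, 5.0)]
B = [(3.0, 2.0, 0.0), (3.0, 2.0, 5.0)]
# Consequent parameters (p, q, r) of the two first-order Sugeno rules.
P = [(1.0, 0.5, 0.2), (-0.3, 1.2, 0.8)]

def anfis_output(x, y):
    # Layers 1-2: fuzzification and rule firing strengths w_i (Eq. (8)).
    w = np.array([bell(x, *A[i]) * bell(y, *B[i]) for i in range(2)])
    # Layer 3: normalized firing strengths (Eq. (9)).
    w_bar = w / w.sum()
    # Layers 4-5: weighted rule outputs and their summation (Eqs. (10)-(11)).
    f = np.array([p * x + q * y + r for (p, q, r) in P])
    return float(np.dot(w_bar, f))

print(anfis_output(1.0, 2.0))
```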



2.7. Learning algorithm of ANFIS

The aim of the training algorithm for this architecture is to tune all the modifiable parameters so that the ANFIS output matches the training data. Note that the parameters a_i, b_i and c_i of the membership function, which describe the width, slope and centre of the bell membership function, respectively, are held fixed in the derivation below. Thus, the output of the ANFIS model can be written as (Jang, 1993)

f = \frac{w_1}{w_1 + w_2} f_1 + \frac{w_2}{w_1 + w_2} f_2    (12)

Substituting Eq. (9) into Eq. (12) yields

f = \bar{w}_1 f_1 + \bar{w}_2 f_2    (13)

Substituting the fuzzy if-then rules into Eq. (13), it becomes

f = \bar{w}_1 (p_1 x + q_1 y + r_1) + \bar{w}_2 (p_2 x + q_2 y + r_2)    (14)

After rearrangement, the output can be written as

f = (\bar{w}_1 x) p_1 + (\bar{w}_1 y) q_1 + \bar{w}_1 r_1 + (\bar{w}_2 x) p_2 + (\bar{w}_2 y) q_2 + \bar{w}_2 r_2    (15)

which is a linear combination of the modifiable consequent parameters p_1, q_1, r_1, p_2, q_2 and r_2. The least squares method can therefore be used to identify the optimal values of these parameters easily. When the premise parameters are not fixed, the search space becomes larger and the convergence of the training becomes slower. A hybrid algorithm combining the least squares method and the gradient descent method is adopted to solve this problem. The hybrid algorithm is composed of a forward pass and a backward pass. The least squares method (forward pass) is used to optimize the consequent parameters with the premise parameters fixed. Once the optimal consequent parameters are found, the backward pass starts immediately: the gradient descent method is used to optimally adjust the premise parameters corresponding to the fuzzy sets in the input domain. The output of the ANFIS is calculated by employing the consequent parameters found in the forward pass, and the output error is used to adapt the premise parameters by means of a standard back-propagation algorithm. It has been proven that this hybrid algorithm is highly efficient in training the ANFIS (Güler & Übeyli, 2005; Jang, 1993).
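The forward half of the hybrid algorithm can be sketched as follows: with the premise parameters held fixed, Eq. (15) is linear in the consequent parameters, so they can be estimated from training pairs by ordinary least squares. The premise parameters, training data and targets below are arbitrary placeholders, not quantities from this study.

```python
import numpy as np

def bell(x, a, b, c):
    return 1.0 / (1.0 + ((x - c) / a) ** 2) ** b

# Fixed premise parameters of the two rules (placeholders).
A = [(2.0, 2.0, 0.0), (2.0, 2.0, 5.0)]
B = [(3.0, 2.0, 0.0), (3.0, 2.0, 5.0)]

rng = np.random.default_rng(0)
X = rng.uniform(0, 5, size=(200, 2))              # training inputs (x, y)
target = 1.5 * X[:, 0] - 0.7 * X[:, 1] + 2.0      # training targets

# Normalized firing strengths for every training sample.
w = np.stack([bell(X[:, 0], *A[i]) * bell(X[:, 1], *B[i]) for i in range(2)], axis=1)
w_bar = w / w.sum(axis=1, keepdims=True)

# Eq. (15) written as a linear model H @ theta, theta = [p1, q1, r1, p2, q2, r2].
H = np.column_stack([w_bar[:, 0] * X[:, 0], w_bar[:, 0] * X[:, 1], w_bar[:, 0],
                     w_bar[:, 1] * X[:, 0], w_bar[:, 1] * X[:, 1], w_bar[:, 1]])
theta, *_ = np.linalg.lstsq(H, target, rcond=None)
print(theta)
```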

3. Methodology

The proposed methodology for the detection of heart valve disorders is illustrated in Fig. 3. It consists of three parts: (a) data acquisition and pre-processing, (b) feature extraction and feature reduction, and (c) classification using ANFIS.

Fig. 3. The algorithm of the expert diagnostic system: Doppler ultrasound → pre-processing (data acquisition, filtering, white de-noising, normalization) → cleaned DHS signal → feature extraction (wavelet decomposition, short-time Fourier transform, wavelet entropy; 91 features) → feature reduction (LDA; 4 features) → classification (ANFIS) → decision: normal or abnormal valve.

3.1. Data acquisition and pre-processing

DHS signals were acquired with an Acuson Sequoia 512 Model Doppler ultrasound system in the Cardiology

Department of the Firat Medical Center. The DHS signals were sampled at 20 kHz for 5 s, at a signal-to-noise ratio of 0 dB, using a sound card with 16-bit A/D conversion resolution and computer software prepared by us in MATLAB (version 6.5) (The MathWorks Inc.). The Doppler ultrasonic flow transducer (Model 3V2c) operates in 2 MHz continuous-wave mode.

Filtering: The recorded DHS signals were high-pass filtered to remove unwanted low-frequency components, because the DHS signals are generally in the range of 0.5-10 kHz. The filter is a fiftieth-order digital FIR filter with a cut-off frequency of 500 Hz and a 51-point symmetric Hamming window.

i. White de-noising: White noise is a random signal that contains equal amounts of every possible frequency, i.e. its FFT has a flat spectrum (Devasahayam, 2000). The DHS signals were filtered to remove the white noise by using wavelet packets. The white de-noising procedure contains three steps (Bakhtazad, Palazoglu, & Romagnoli, 1999):
1. Decomposition: computing the wavelet packet decomposition of the DHS signal at level 4, using the Daubechies wavelet of order 4.
2. Detail coefficient thresholding: for each level from 1 to 4, soft thresholding is applied to the detail coefficients.



3. Reconstruction: computing the wavelet packet reconstruction based on the original approximation coefficients of level 4 and the modified detail coefficients of levels 1 to 4.

ii. Normalization: The DHS signals in this study were normalized using Eq. (16), so that the expected amplitude of the signal is not affected by the rib cage structure of the patient:

\mathrm{DHS}_{\mathrm{signal}} = \frac{\mathrm{DHS}_{\mathrm{signal}}}{\left| \left( \mathrm{DHS}_{\mathrm{signal}} \right)_{\max} \right|}    (16)
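A sketch of the pre-processing chain under stated assumptions: a standard wavelet decomposition with soft thresholding stands in for the wavelet-packet de-noising step, the noise estimate and threshold rule are assumed choices, and only the settings given above (50th-order FIR, 500 Hz cut-off, Hamming window, 20 kHz sampling, level-4 db4 decomposition, normalization by the maximum absolute amplitude) are taken from the paper.

```python
import numpy as np
import pywt
from scipy.signal import firwin, lfilter

def preprocess_dhs(x, fs=20_000):
    # 1. High-pass FIR filtering: 50th order (51 taps), 500 Hz cut-off, Hamming window.
    taps = firwin(51, 500, fs=fs, pass_zero=False, window="hamming")
    x = lfilter(taps, 1.0, x)

    # 2. White de-noising: level-4 'db4' decomposition, soft thresholding of the
    #    detail coefficients, reconstruction from the untouched approximation.
    coeffs = pywt.wavedec(x, "db4", level=4)
    sigma = np.median(np.abs(coeffs[-1])) / 0.6745       # assumed noise estimate
    thr = sigma * np.sqrt(2 * np.log(len(x)))            # assumed (universal) threshold
    coeffs = [coeffs[0]] + [pywt.threshold(c, thr, mode="soft") for c in coeffs[1:]]
    x = pywt.waverec(coeffs, "db4")[: len(x)]

    # 3. Normalization by the maximum absolute amplitude, Eq. (16).
    return x / np.max(np.abs(x))

cleaned = preprocess_dhs(np.random.randn(5 * 20_000))
print(cleaned.min(), cleaned.max())
```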

3.2. Feature extraction

The DHS waveform patterns from the heart valves are rich in detail and highly non-stationary. The goal of feature extraction is to extract features from these patterns for reliable intelligent classification. After the data pre-processing has been carried out, three steps are proposed in this paper to extract the characteristics of these waveforms, using MATLAB with the Wavelet Toolbox and the Signal Processing Toolbox:

i. Wavelet decomposition: For the wavelet decomposition of the DHS waveforms, the decomposition structure and reconstruction tree at level 12 illustrated in Fig. 4 were used. Wavelet decomposition was applied to the DHS signal using the Daubechies-10 wavelet decomposition filters, yielding two types of coefficients: one set of approximation coefficients cA and twelve sets of detail coefficients cD. A representative example of the wavelet decomposition of the Doppler sound signal of the aortic heart valve is shown in Fig. 5.

ii. Short-time Fourier transform: The STFT is the most robust and best understood of the various time-frequency representations. The STFT of the waveforms of the terminal nodes was computed using a 25,000-point Hanning window, with an overlap of 12,500 points between sections and zero padding of the sections if the window length exceeds 25,000 points. A representative example of the STFT spectra of a terminal-node waveform is shown in Fig. 6.

Fig. 4. The decomposition structure at level 12 (original signal decomposed into the approximation cA12 and details cD12, ..., cD2, cD1 as terminal nodes).

Fig. 5. The terminal node waveforms of the wavelet decomposition at twelve levels of the DHS signal.

Fig. 6. The STFT spectra of a terminal node waveform.

iii. Wavelet entropy: We next calculated the norm entropy, defined in Eq. (17), of the waveforms of the STFT spectra:

E(s) = \sum_i |s_i|^{3/2}    (17)


where s is the STFT spectrum and s_i is the ith coefficient of s. The resultant entropy data, normalized by a factor of 1/50,000, are plotted in Fig. 7. The plot of the entropy data includes 91 features, obtained from 13 terminal nodes, each of which contributes the waveforms of seven frequency spectra per DHS signal. Thus, the feature vector was extracted by computing the wavelet entropy values for each DHS signal.

Fig. 7. The wavelet entropy of the DHS signal (entropy magnitude versus entropy index).
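The three feature extraction steps can be sketched as follows. Reading the "terminal node waveforms" as full-length band reconstructions is an assumption made here so that the 25,000-point window with 12,500-point overlap yields seven spectra per node, and hence 13 × 7 = 91 features for a 5 s recording at 20 kHz; the paper's exact bookkeeping may differ.

```python
import numpy as np
import pywt
from scipy.signal import spectrogram

def norm_entropy(s, p=1.5):
    """Norm entropy of Eq. (17): E(s) = sum_i |s_i|**p, with p = 3/2."""
    return np.sum(np.abs(s) ** p)

def dhs_feature_vector(x, fs=20_000, wavelet="db10", level=12):
    # i. Level-12 decomposition with Daubechies-10 filters -> 13 coefficient sets
    #    (cA12, cD12, ..., cD1), each mapped back to a full-length band waveform.
    coeffs = pywt.wavedec(x, wavelet, level=level)
    features = []
    for i in range(len(coeffs)):
        keep = [c if j == i else np.zeros_like(c) for j, c in enumerate(coeffs)]
        band = pywt.waverec(keep, wavelet)[: len(x)]
        # ii. STFT: 25,000-point Hanning window with 12,500-point overlap
        #     -> seven spectra for a 5 s recording at 20 kHz.
        _, _, S = spectrogram(band, fs=fs, window="hann",
                              nperseg=25_000, noverlap=12_500, mode="magnitude")
        # iii. One norm-entropy value per spectrum, scaled by 1/50,000.
        features.extend(norm_entropy(S[:, k]) / 50_000 for k in range(S.shape[1]))
    return np.asarray(features)

feats = dhs_feature_vector(np.random.randn(5 * 20_000))
print(feats.shape)   # 13 nodes x 7 spectra = 91 features
```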

3.3. Linear discriminant analysis (LDA) for feature reduction

Linear discriminant analysis (LDA) searches for those vectors in the feature space that best discriminate among classes, rather than those that best describe the data (Kim, Kim, & Bang, 2002; Lee, Park, Song, & Lee, 2005). The main aim of LDA is to seek a transformation matrix W that, in some sense, maximizes the ratio of the between-class scatter to the within-class scatter. The within-class scatter matrix S_W is defined as

S_W = \sum_{i=1}^{c} \sum_{x \in C_i} (x - m_i)(x - m_i)^T    (18)

where c is the number of classes, C_i is the set of data belonging to the ith class, and m_i is the mean of the ith class. The within-class scatter matrix represents the degree of scatter within classes as a summation of the covariance matrices of all classes. The between-class scatter matrix S_B is defined as

S_B = \sum_{i=1}^{c} n_i (m_i - m)(m_i - m)^T    (19)

where n_i is the number of samples in the ith class and m is the mean of all the data. The transformation matrix W that maximizes the ratio of the between-class scatter to the within-class scatter is sought. The criterion function J(W) can be defined as

J(W) = \frac{|W^T S_B W|}{|W^T S_W W|}    (20)

The transformation matrix W is obtained as the one that maximizes the criterion function J(W); given a number of independent features with respect to which the data are described, LDA creates a linear combination of these that yields the largest mean differences between the desired classes (Martinez & Kak, 2001; Lee et al., 2005).
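A generic Fisher-style sketch of Eqs. (18)-(20), not the paper's implementation: it builds the scatter matrices and takes the leading eigenvectors of S_W^{-1} S_B. Note that S_B has rank at most c − 1, so for a two-class problem only one direction carries between-class scatter; the settings that yield the paper's 4-dimensional projection are not detailed here, and the ridge term and component count below are illustrative assumptions.

```python
import numpy as np

def lda_projection(X, y, n_components, ridge=1e-6):
    """Fisher LDA: directions maximizing |W^T S_B W| / |W^T S_W W| (Eqs. (18)-(20))."""
    classes = np.unique(y)
    d = X.shape[1]
    m = X.mean(axis=0)
    Sw = np.zeros((d, d))
    Sb = np.zeros((d, d))
    for c in classes:
        Xc = X[y == c]
        mc = Xc.mean(axis=0)
        Sw += (Xc - mc).T @ (Xc - mc)                  # Eq. (18)
        Sb += len(Xc) * np.outer(mc - m, mc - m)       # Eq. (19)
    # Leading eigenvectors of Sw^{-1} Sb; the ridge keeps Sw invertible.
    evals, evecs = np.linalg.eig(np.linalg.solve(Sw + ridge * np.eye(d), Sb))
    order = np.argsort(evals.real)[::-1]
    return evecs[:, order[:n_components]].real

# Toy use: 91-dimensional features, two classes, projected to a lower dimension.
rng = np.random.default_rng(1)
X = rng.normal(size=(200, 91))
y = rng.integers(0, 2, size=200)
W = lda_projection(X, y, n_components=4)
print((X @ W).shape)   # (200, 4)
```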

3.4. Classification using ANFIS

The objective of the classification stage is to demonstrate the effectiveness of the proposed feature extraction method on the DHS signals. For this purpose, the feature vectors were applied as the input to the ANFIS classifier. The classification by ANFIS was performed using MATLAB. The training parameters and the structure of the ANFIS used in this study are shown in Table 1.

Table 1. ANFIS architecture and training parameters

Architecture
  The number of layers                   5
                                         (Input: 5, Number of rules: 32, Output: 1)
  Type of input membership functions     Bell-shaped

Training parameters
  Learning rule                          Hybrid learning algorithm (back-propagation for the
                                         nonlinear parameters (a_i, c_i) and least squares for the
                                         linear parameters (p_i, q_i, r_i, s_i, ss_i, pp_i, u_i))
  Momentum constant                      0.9
  Sum-squared error                      0.00001
  Epochs number to sum-squared error     767

4. Performance evaluation methods

Different evaluation methods were used to assess the performance of the proposed expert system: classification accuracy, sensitivity and specificity measures, and the confusion matrix. These methods are described in the following subsections.

4.1. Classification accuracy

Classification accuracy is the method most commonly used in pattern recognition applications. The classification accuracy for an experiment is taken as the ratio of the number of samples correctly classified to the total number of samples.

4.2. Sensitivity and specificity

For the sensitivity and specificity analysis, we use the following expressions:

\mathrm{Sensitivity} = \frac{TP}{TP + FN} \times 100\%    (21)

\mathrm{Specificity} = \frac{TN}{FP + TN} \times 100\%    (22)

where TP, TN, FP and FN denote true positives, true negatives, false positives and false negatives, respectively. True positive (TP): an input is detected as abnormal when the expert clinicians have also diagnosed a heart valve disorder. True negative (TN): an input is detected as normal and is labeled as a healthy subject by the expert clinicians. False positive (FP): an input is detected as abnormal although it is labeled as healthy by the expert clinicians. False negative (FN): an input is detected as normal although the expert clinicians have diagnosed a heart valve disorder.
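As a small check of these definitions (the counts are taken from Table 3 below for the proposed system):

```python
def sensitivity_specificity(tp, tn, fp, fn):
    """Eqs. (21)-(22): sensitivity = TP/(TP+FN), specificity = TN/(FP+TN), in percent."""
    return 100.0 * tp / (tp + fn), 100.0 * tn / (fp + tn)

# Counts for the proposed LDA + ANFIS system (see Table 3): 73 abnormal test
# samples, 70 detected; 50 normal test samples, 47 detected.
print(sensitivity_specificity(tp=70, tn=47, fp=3, fn=3))   # (~95.9, 94.0)
```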

4.3. Confusion matrix

A confusion matrix is composed of the actual and the predicted classifications made by a classification system; it identifies the common misclassifications of the proposed classification scheme. The performance of such a system is commonly evaluated using the data in this matrix.

5. Experimental classification results

The experimental study was performed with 215 samples, consisting of 110 samples (54 abnormal, 56 normal) of the aortic heart valve and 105 samples (66 abnormal, 39 normal) of the mitral heart valve. Thus, populations of 95 normal and 120 abnormal samples were handled. The samples were from 132 males and 83 females, whose ages were between 15 and 80 years with a mean of 48 years. The abnormal data include all diseases related to the aortic and mitral valves, such as aortic insufficiency and stenosis, and mitral insufficiency and stenosis. Normal means that there was no insufficiency or stenosis in the aortic or mitral heart valves, whereas abnormal means their presence. The data set was obtained under the supervision of expert doctors; the diagnosis of whether a patient was normal or abnormal was determined after the Doppler observations and clinical results had been discussed by the expert doctors, as in Jing et al. (1997). For the training process, 14 abnormal and 25 normal subjects were selected for the diagnosis of the aortic heart valve, and 33 abnormal and 20 normal subjects were chosen for the diagnosis of the mitral heart valve. The remainder of the data set was used as the test set. As mentioned in the earlier sections, the number of features used for characterizing the normal and abnormal mitral and aortic heart valves was 91. Therefore, instead of using these features directly, a linear projection of the feature vector was obtained by using LDA.

Table 2. Testing results of the proposed methodology (N: normal, AN: abnormal)

                              Heart mitral valve        Heart aortic valve
                              N          AN             N          AN
The number of samples         19         33             31         40
Correct classification        18         32             29         38
Incorrect classification       1          1              2          2
The accuracy (%)              94.7       96.9           93.5       95

In other words, the feature vector was reduced to 4 features with the LDA algorithm described in Section 3.3.

The classification accuracies obtained with the proposed LDA and ANFIS algorithm on the test set are given in Table 2. One abnormal and one normal heart mitral valve pattern were classified incorrectly, whereas two normal and two abnormal heart aortic valve patterns were classified incorrectly by the proposed methodology. The confusion matrix and the calculated sensitivity and specificity rates are given in Table 3, together with a performance comparison of the proposed system against the other classifiers from the literature. According to these results, the highest sensitivity rate (97.3%) was obtained by the FCM–CHMM (Uguz et al., 2007). Our proposal and the ANN method produced the same, second highest sensitivity rate (95.9%), while the SVM method gave the lowest sensitivity rate (94.5%). On the other hand, the highest specificity rate (94%) was obtained with our proposal and the ANN method; FCM–CHMM produced a 92% specificity value, and the SVM produced the lowest specificity rate (90%).

6. Discussion and conclusion

In this study, a medical decision support system with normal and abnormal classes has been developed. In the related previous works, a feature vector with 91 features was considered as the input to the classifiers; the complexity of those systems was quite high owing to the dimension of the feature vector, so a data reduction step was needed. In this work, data reduction with LDA and an ANFIS classifier were used for the diagnosis of aortic and mitral heart valve disorders from DHS signals.

Table 3. Obtained performance parameters with our proposed system and other classifiers from the literature

Method and classifier             Type        No. of      Detected as   Detected as   SN = Sensitivity (%)
                                              patients    abnormal      normal        SP = Specificity (%)
LDA and ANFIS                     Abnormal    73          70             3            SN = 95.9
                                  Normal      50           3            47            SP = 94
ANN (Turkoglu et al., 2002)       Abnormal    73          70             3            SN = 95.9
                                  Normal      50           3            47            SP = 94
SVM (Çomak et al., 2007)          Abnormal    73          69             4            SN = 94.5
                                  Normal      50           5            45            SP = 90
FCM/CHMM (Uguz et al., 2007)      Abnormal    73          71             2            SN = 97.3
                                  Normal      50           4            46            SP = 92



For this purpose, computer simulations were carried out and statistical validation indexes were used to determine the performance of the proposed methodology, and previously reported results were compared with our proposal. The proposed methodology is composed of three main stages: pre-processing, feature extraction and feature reduction with LDA, and classification. Wavelet decomposition for multi-scale analysis, the STFT for time-frequency representation, and the wavelet entropy were used for feature extraction, whereas LDA was used for feature reduction and the ANFIS classifier was adopted for efficient recognition. According to the experimental results, the proposed method is efficient for the interpretation of the disease. Our proposal and the ANN method produced a 95.9% sensitivity rate and a 94% specificity rate; this specificity rate is the highest among the compared methods. The SVM method gave the worst sensitivity and specificity rates, while FCM–CHMM produced 97.3% sensitivity and 92% specificity values, the 97.3% sensitivity rate being the highest value among the previously proposed works. The proposed system has the advantages of automation: it is rapid, easy to operate, non-invasive and cheap for clinical application, and it is well suited to clinical use, especially for the early screening of a population. However, the position of the ultrasound probe used for data acquisition from the heart valves must be taken into consideration by the physician.

Acknowledgment

The author thanks Dr. Ibrahim TURKOGLU for providing the DHS signals and for his valuable suggestions in improving the technical presentation of this paper.

References

Akay, M., Akay, Y. M., & Welkowitz, W. (1992). Neural networks for the diagnosis of coronary artery disease. International Joint Conference on Neural Networks, IJCNN, 2, 419–424.
Bakhtazad, A., Palazoglu, A., & Romagnoli, J. A. (1999). Process data de-noising using wavelet transform. Intelligent Data Analysis, 3(October), 267–285.
Chan, B. C. B., Chan, F. H. Y., Lam, F. K., Lui, P. W., & Poon, P. W. F. (1997). Fast detection of venous air embolism in Doppler heart sound using the wavelet transform. IEEE Transactions on Biomedical Engineering, 44(4), 237–245.
Coifman, R. R., & Wickerhauser, M. V. (1992). Entropy-based algorithms for best basis selection. IEEE Transactions on Information Theory, 38(2), 713–718.
Çomak, E., Arslan, A., & Türkoğlu, I. (2007). A decision support system based on support vector machines for diagnosis of the heart valve diseases. Computers in Biology and Medicine, 37, 21–27.
Devasahayam, S. R. (2000). Signals and Systems in Biomedical Engineering. Kluwer Academic Publishers.
Guler, I., Kiymik, M. K., Kara, S., & Yuksel, M. E. (1992). Application of autoregressive analysis to 20 MHz pulsed Doppler data in real time. International Journal of Biomedical Computing, 31(3–4), 247–256.
Güler, İ., & Übeyli, E. D. (2005). Adaptive neuro-fuzzy inference system for classification of EEG signals using wavelet coefficients. Journal of Neuroscience Methods, 148(2), 113–121.
http://hcd2.bupa.co.uk/fact_sheets/html/heart_valve_disease.html, accessed 01.12.2006.
http://www.healthsystem.virginia.edu/uvahealth/adult_cardiac/disvalve.cfm, accessed 01.12.2006.
Jang, J.-S. R. (1993). ANFIS: Adaptive-network-based fuzzy inference system. IEEE Transactions on Systems, Man, and Cybernetics, 23(3), 665–685.
Jing, F., Xuemin, W., Mingshi, W., & Wie, L. (1997). Noninvasive acoustical analysis system of coronary heart disease. In Biomedical Engineering Conference, Proceedings of the 1997 Sixteenth Southern (pp. 239–241).
Karabetsos, E., Papaodysseus, C., & Koutsouris, D. (1998). Design and development of a new ultrasonic Doppler technique for estimation of the aggregation of red blood cells. Measurement, 24, 207–215.
Keeton, P. I. J., & Schlindwein, F. S. (1997). Application of wavelets in Doppler ultrasound. MCB University Press, 17(1), 38–45.
Kim, H. C., Kim, D. J., & Bang, S. Y. (2002). Face recognition using LDA mixture model. In Proceedings of the 16th International Conference on Pattern Recognition (Vol. 2, pp. 925–928).
Lee, J., Park, K. L., Song, M. H., & Lee, K. J. (2005). Arrhythmia classification with reduced features by linear discriminant analysis. In Engineering in Medicine and Biology 27th Annual Conference, Shanghai, China, September 1–4, 2005.
Martinez, A. M., & Kak, A. C. (2001). PCA versus LDA. IEEE Transactions on Pattern Analysis and Machine Intelligence, 23(2), 228–233.
Nanda, N. C. (1993). Doppler Echocardiography (2nd ed.). London: Lea and Febiger.
Plett, M. I. (2000). Ultrasonic arterial vibrometry with wavelet based detection and estimation. Ph.D. Thesis, University of Washington, pp. 17–18.
Quiroga, R. Q. (1998). Quantitative analysis of EEG signals: Time-frequency methods and chaos theory. Institute of Physiology, Medical University Lübeck.
Quiroga, R. Q., Roso, O. A., & Basar, E. (1999). Wavelet entropy: A measure of order in evoked potentials. Elsevier Science, Evoked Potentials and Magnetic Fields, 49, 298–302.
Saini, V. D., Nanda, N. C., & Maulik, D. (1993). Basic principles of ultrasound and Doppler effect. In Doppler Echocardiography. Philadelphia, London: Lea and Febiger.
Turkoglu, I., Arslan, A., & Ilkay, E. (2002). An expert system for diagnosis of the heart valve diseases. Expert Systems with Applications, 23, 229–236.
Uguz, H., Arslan, A., & Türkoğlu, I. (2007). A biomedical system based on hidden Markov model for diagnosis of the heart valve diseases. Pattern Recognition Letters. Available online 11 October 2006.
Wright, I. A., Gough, N. A. J., Rakebrandt, F., Wahab, M., & Woodcock, J. P. (1997). Neural network analysis of Doppler ultrasound blood flow signals: A pilot study. Ultrasound in Medicine and Biology, 23(5), 683–690.