An expert system based on linear discriminant analysis and adaptive neuro-fuzzy inference system to diagnosis heart valve diseases





Expert Systems with Applications 35 (2008) 214–222
www.elsevier.com/locate/eswa

Abdulkadir Sengur
Firat University, Department of Electronics and Computer Science, 23119 Elazig, Turkey

Abstract

In the last two decades, the use of artificial intelligence methods in biomedical analysis has been increasing, mainly because the effectiveness of classification and detection systems has improved a great deal in helping medical experts with diagnosis. In this paper, we investigate the use of linear discriminant analysis (LDA) and an adaptive neuro-fuzzy inference system (ANFIS) to distinguish normal from abnormal heart valves using Doppler heart sounds. The proposed heart valve disorder detection system is composed of three stages. The first stage is pre-processing; filtering, normalization and white de-noising are the processes used in this stage. The second stage is feature extraction, in which wavelet transforms and the short-time Fourier transform were used, and wavelet entropy was then applied to these features. To reduce the complexity of the system, LDA was used for feature reduction. In the classification stage, an ANFIS classifier was chosen. To evaluate the performance of the proposed methodology, a comparative study was carried out on a data set containing 215 samples. The validity of the proposed method was measured with the sensitivity and specificity parameters; a sensitivity of 95.9% and a specificity of 94% were obtained.
© 2007 Elsevier Ltd. All rights reserved.

Keywords: Doppler heart sounds; Heart valves; Feature extraction; Wavelet decomposition; Feature reduction; Adaptive neuro-fuzzy inference system

1. Introduction

The heart consists of four chambers: two atria and two ventricles. There is a valve through which blood passes before leaving each chamber of the heart. The valves prevent the backward flow of blood. These valves are flaps located at each end of the two ventricles (http://www.healthsystem.virginia.edu/uvahealth/adult_cardiac/disvalve.cfm). Heart valve disease occurs when one or more valves in the heart are not working properly and blood does not flow through the heart as it should. This can put an extra strain on the heart and cause symptoms such as breathlessness and swollen ankles. Severe heart valve disease can cause the heart to pump less efficiently (http://hcd2.bupa.co.uk/fact_sheets/html/heart_valve_disease.html).

E-mail address: ksengur@firat.edu.tr
doi:10.1016/j.eswa.2007.06.012

Heart valve disease may be suspected if the heart sounds heard through a stethoscope are abnormal; this is usually the first step in diagnosing a heart valve disease. A characteristic heart murmur (abnormal sounds in the heart due to turbulent blood flow) can often indicate valve regurgitation. To further define the type of valve disease and the extent of the valve damage, physicians may use any of the following diagnostic procedures: electrocardiogram (ECG or EKG), chest X-ray, cardiac catheterization, transesophageal echo (TEE), radionuclide scans and magnetic resonance imaging (MRI) (Nanda, 1993). Research indicates that a large proportion of deaths worldwide are due to heart disease. For this reason, early detection of heart valve disorders is an important goal of medical research (Akay, Akay, & Welkowitz, 1992). The Doppler technique has gained considerable interest since Satomura first demonstrated the application of the Doppler effect to the measurement of blood velocity in 1959 (Keeton & Schlindwein, 1997). Doppler heart


sounds (DHS) are among the most important sounds produced by blood flow, valve motion and the vibration of other cardiovascular components (Plett, 2000). However, factors such as calcific disease or obesity often result in a diagnostically unsatisfactory Doppler assessment, and it is therefore sometimes necessary to examine the spectrogram of the Doppler shift signals to elucidate the degree of the disease (Jing, Xuemin, Mingshi, & Wie, 1997). In addition to Doppler techniques, more complex techniques have also been developed (e.g., the Laplace transform and principal component analysis). Many studies have addressed the classification of Doppler signals in the pattern recognition field (Chan, Chan, Lam, Lui, & Poon, 1997; Wright, Gough, Rakebrandt, Wahab, & Woodcock, 1997).

In this study, an expert system with three stages is proposed. The first stage consists of several pre-processing units for the DHS signals: white de-noising, normalization and filtering. The second stage is based on mathematical and statistical tools. To separate normal and abnormal heart valves, the wavelet transform, wavelet entropy and the short-time Fourier transform (STFT) are first used to extract features from the pre-processed Doppler signals (Guler, Kiymik, Kara, & Yuksel, 1992). Feature reduction is also carried out in this stage: the dimension of the obtained feature vector, which has 91 features, is reduced to 4 features using linear discriminant analysis (LDA). The third stage is the classifier stage, where the ANFIS structure is used. The performance of the proposed expert system was evaluated with several statistical validation methods, and the results were compared with previous methods that used the same data set. Our system obtained a 95.9% sensitivity rate and a 94% specificity rate.

The rest of the paper is organized as follows. In Section 2, previous related research, the Doppler heart signals, wavelet decomposition, the short-time Fourier transform, wavelet entropy and the ANFIS structure are described. The methodology is described in Section 3; it enables a large reduction of the Doppler signal data while retaining problem-specific information, which facilitates an efficient pattern recognition process. In Section 4, the performance evaluation methods are given. The effectiveness of the proposed method for the classification of Doppler signals in the diagnosis of heart valve diseases is demonstrated in Section 5. Finally, the paper is concluded in Section 6.

2. Background

2.1. Previous research

To date, several papers have addressed the classification of Doppler signals using pattern recognition approaches (Çomak, Arslan, & Türkoğlu, 2007; Guler et al., 1992; Turkoglu, Arslan, & Ilkay, 2002). Turkoglu et al. (2002) proposed an expert diagnosis system for the


interpretation of the Doppler signals of heart valve diseases based on pattern recognition (Guler et al., 1992). The proposed methodology was composed of a feature extractor and a back-propagation artificial neural network (BP-ANN) classifier. Wavelet transforms and the short-time Fourier transform were used to extract features from the Doppler signals in the time-frequency domain, and the wavelet entropy method was applied to these features. The back-propagation neural network was used to classify the extracted features. The performance of the developed system was evaluated on 215 samples. The test results showed that the system was effective in detecting Doppler heart sounds; the correct classification rate was about 94% for normal subjects and 95.9% for abnormal subjects.

The data set (215 samples) obtained by Turkoglu et al. (2002) was later used by Çomak et al. (2007), who investigated the use of a least squares support vector machine (LS-SVM) classifier for improving the performance of the proposal of Turkoglu et al. (2002), and who also intended to carry out a comparative study. The classification rates of the examined classifiers were evaluated by ROC curves in terms of sensitivity and specificity. The application results showed that, according to the ROC curves, the LS-SVM classifier performance was almost the same as that of BP-ANN. It was reported that LS-SVM was more suitable than BP-ANN since it has some advantages over BP-ANN. Another reported advantage of LS-SVM was its shorter training time: according to the experimental results, the ANN's training time was about 13 times longer than that of LS-SVM. This property was, however, the only advantage of LS-SVM over BP-ANN.

Later, Uguz, Arslan, and Türkoğlu (2007) proposed a biomedical system based on the hidden Markov model for the clinical diagnosis and recognition of heart valve disorders (Çomak et al., 2007). The proposed methodology also used the database of Turkoglu et al. (2002). In that study, a continuous HMM (CHMM) classifier system was used, and a single Gaussian model was preferred for determining the emission probabilities. The proposed methodology was composed of two stages. In the first stage, the initial values of the mean and standard deviation were calculated by separating the observation symbols into equal segments according to the state number and using the observation symbols appropriate to each segment. In the second stage, the initial values of the mean and standard deviation were calculated by separating the observation symbols into clusters (with the FCM or K-means algorithms), equal in number to the states, and using the observation symbols appropriate to the separated clusters. The experimental studies were carried out on three different classification systems: CHMM, FCM–K-means/CHMM and ANN. The specificity and sensitivity rates obtained were 92% and 94% for CHMM, and 92% and 97.26% for FCM–K-means/CHMM, respectively.



2.2. DHS signals

The DHS can be obtained directly by placing the Doppler ultrasonic flow transducer over the chest of the patient (Wright et al., 1997). A DHS signal from the aortic heart valve is given in Fig. 1. The DHS produced from echoes backscattered by moving blood cells is generally in the range of 0.5 to 10 kHz (Uguz et al., 2007).

Fig. 1. The waveform pattern of the Doppler heart sound.

DHS signal spectral estimation is now commonly used to evaluate blood flow parameters in order to diagnose cardiovascular diseases. Spectral estimation methods are particularly used in Doppler ultrasound cardiovascular disease detection. Clinical diagnosis procedures generally include the analysis of a graphical display and parameter measurements produced by blood flow spectral evaluation. Ultrasonic instrumentation typically employs Fourier-based methods to obtain the blood flow spectra and blood flow measurements (Saini, Nanda, & Maulik, 1993). A Doppler signal is not a simple signal: it includes random characteristics due to the random phases of the scattering particles present in the sample volume, and other effects such as geometric broadening and spatially varying velocity also affect the signal (Karabetsos, Papaodysseus, & Koutsouris, 1998). The Doppler equation is

\Delta f = \frac{2 v f \cos\theta}{c}    (1)
where v is the velocity of the blood flow, f is the frequency of the emitted ultrasonic signal, c is the velocity of sound in tissue (approximately 1540 m/s), \Delta f is the measured Doppler frequency shift, and \theta is the angle of incidence between the direction of blood flow and the direction of the emitted ultrasonic beam (Uguz et al., 2007).

2.3. Wavelet decomposition

The main advantage of wavelets is that they have a varying window size, being wide for slow frequencies and narrow for fast ones, thus leading to an optimal time-frequency resolution in all frequency ranges. Furthermore, owing to the fact that the windows are adapted to the transients of each scale, wavelets do not require stationarity (Quiroga, 1998).

Wavelet decomposition uses the fact that it is possible to resolve high frequency components within a small time window, while only low frequency components need large time windows. This is because a low frequency component completes a cycle over a large time interval, whereas a high frequency component completes a cycle in a much shorter interval. Therefore, slowly varying components can only be identified over long time intervals, but fast varying components can be identified over short time intervals. Wavelet decomposition can be regarded as a continuous-time wavelet decomposition sampled at different frequencies at every level or stage. The wavelet decomposition function at level m and time location t_m can be expressed as

d_m(t_m) = x(t) * \Psi_m\!\left(\frac{t - t_m}{2^m}\right)    (2)

where \Psi_m is the decomposition filter at frequency level m. The effect of the decomposition filter is scaled by the factor 2^m at stage m, but otherwise the shape is the same at all stages. The synthesis of the signal from its time-frequency coefficients can be rewritten to express the composition of the signal x[n] from its wavelet coefficients:

d[n] = x[n] * h[n],    c[n] = x[n] * g[n]    (3)

where h[n] is the impulse response of the high-pass filter and g[n] is the impulse response of the low-pass filter (Devasahayam, 2000).
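To make the decomposition concrete, the following sketch (a hedged illustration, not the author's MATLAB code; the test signal and the five-level depth are arbitrary choices) reproduces the filter-and-downsample cascade of Eqs. (2) and (3) with the PyWavelets package:

```python
import numpy as np
import pywt

# Toy DHS-like test signal: two tones plus noise, sampled at 20 kHz.
fs = 20_000
t = np.arange(0, 1.0, 1 / fs)
x = np.sin(2 * np.pi * 600 * t) + 0.5 * np.sin(2 * np.pi * 3000 * t)
x += 0.1 * np.random.randn(t.size)

# One stage of Eq. (3): d[n] = x[n] * h[n] (detail), c[n] = x[n] * g[n]
# (approximation), each followed by downsampling by two.
cA, cD = pywt.dwt(x, "db10")

# Repeating the split on the approximation branch gives the multi-level tree.
coeffs = pywt.wavedec(x, "db10", level=5)      # [cA5, cD5, cD4, ..., cD1]
print([len(c) for c in coeffs])

# The coefficient set allows (near-)perfect reconstruction of the signal.
x_rec = pywt.waverec(coeffs, "db10")
print(np.allclose(x, x_rec[: x.size]))
```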

2.4. Short-time Fourier transform

The short-time Fourier transform (STFT), also known as the time-dependent or windowed Fourier transform, attempts to analyse non-stationary signals by dividing the whole signal into shorter data frames. In short, the STFT can be compactly represented as

X(k) = \sum_{n=0}^{N-1} x(n)\, w(n - n_0)\, \exp\!\left(-\frac{j 2 \pi n k}{N}\right)    (4)

where w(n - n_0) is a window function chosen to suppress the side lobes while minimizing the main-lobe leakage. The output of successive STFTs provides a time-frequency representation of the signal. To accomplish this, the signal is truncated into short data frames by multiplying it by a window so that the modified signal is zero outside the data frame. The frequency spectrum of each data frame is then calculated using the fast Fourier transform. One limitation of the STFT is that the time frame for the analysis of the signal is fixed (Keeton & Schlindwein, 1997).
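A minimal sketch of Eq. (4) using SciPy's spectrogram routine (an assumed tool; the paper's processing was done in MATLAB, and the window length and overlap here are illustrative only):

```python
import numpy as np
from scipy.signal import spectrogram

fs = 20_000
t = np.arange(0, 2.0, 1 / fs)
x = np.sin(2 * np.pi * (500 + 1000 * t) * t)   # non-stationary (chirp-like) test signal

# Successive windowed FFTs (Hann window, 50% overlap) give |X(k)| per frame.
f, frames, Sxx = spectrogram(x, fs=fs, window="hann",
                             nperseg=1024, noverlap=512, mode="magnitude")
print(Sxx.shape)   # (frequency bins, time frames)
```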


2.5. Wavelet entropy

Entropy-based criteria describe information-related properties for an accurate representation of a given signal. Entropy is a common concept in many fields, mainly in


signal processing (Coifman & Wickerhauser, 1992). Measuring the entropy appears to be an ideal tool for quantifying the degree of order of non-stationary signals. An ordered activity (e.g. a sinusoidal signal) is manifested as a narrow peak in the frequency domain, thus having low entropy. On the other hand, a random activity has a wide-band response in the frequency domain, reflected in a high entropy value (Quiroga, Roso, & Basar, 1999). The available entropy measures include the Shannon, threshold, norm, log energy and sure entropies (Coifman & Wickerhauser, 1992).
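The contrast between ordered and random activity can be illustrated with a small sketch (the spectral Shannon entropy estimator below is an illustrative choice, not the entropy computation used later in this paper):

```python
import numpy as np

def spectral_shannon_entropy(x):
    """Shannon entropy of the normalized power spectrum of x."""
    p = np.abs(np.fft.rfft(x)) ** 2
    p = p / p.sum()
    p = p[p > 0]                       # avoid log(0)
    return -np.sum(p * np.log(p))

fs = 1000
t = np.arange(0, 1.0, 1 / fs)
sine = np.sin(2 * np.pi * 50 * t)      # ordered: narrow spectral peak
noise = np.random.randn(t.size)        # random: broadband spectrum

print(spectral_shannon_entropy(sine))   # low entropy
print(spectral_shannon_entropy(noise))  # high entropy
```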

2.6. Architecture of adaptive-network-based fuzzy inference system

An adaptive network, as its name implies, is a network structure consisting of nodes and directional links through which the nodes are connected. Moreover, part or all of the nodes are adaptive, which means that each output of these nodes depends on the parameters pertaining to the node, and the learning rule specifies how these parameters should be changed to minimize a prescribed error measure (Güler & Übeyli, 2005; Jang, 1993). For simplicity, we assume that the fuzzy inference system under consideration has two inputs, x and y, and one output z. Suppose that the rule base contains two fuzzy if-then rules of the Takagi-Sugeno type:

Rule 1: If (x is A1) and (y is B1) then (f1 = p1 x + q1 y + r1)
Rule 2: If (x is A2) and (y is B2) then (f2 = p2 x + q2 y + r2)

where Ai and Bi are the fuzzy sets, fi are the outputs within the fuzzy region specified by the fuzzy rule, and pi, qi and ri are the design parameters that are determined during the training process. The ANFIS architecture that implements these two rules is shown in Fig. 2, in which a circle indicates a fixed node, whereas a square indicates an adaptive node.

Fig. 2. ANFIS architecture.

Layer 1: Every node i in this layer is a square node with a node function

O^1_i = \mu_{A_i}(x),    i = 1, 2    (5)
O^1_i = \mu_{B_{i-2}}(y),    i = 3, 4    (6)

where x (or y) is the input to node i and A_i (or B_{i-2}) is the linguistic label (small, large, etc.) associated with this node function; \mu_{A_i}(x) and \mu_{B_{i-2}}(y) can adopt any fuzzy membership function. Usually we choose \mu_{A_i}(x) to be bell-shaped, with maximum equal to 1 and minimum equal to 0, such as

\mu_{A_i}(x) = \frac{1}{1 + \left[ \left( \frac{x - c_i}{a_i} \right)^2 \right]^{b_i}}    (7)

where {a_i, b_i, c_i} is the parameter set. Parameters in this layer are referred to as premise parameters.

Layer 2: The nodes in this layer are fixed. They are labelled M to indicate that they play the role of a simple multiplier. The outputs of these nodes are given by

O^2_i = w_i = \mu_{A_i}(x)\, \mu_{B_i}(y),    i = 1, 2    (8)

which are the so-called firing strengths of the rules.

Layer 3: Every node in this layer is a circle node labelled N. The ith node calculates the ratio of the ith rule's firing strength to the sum of all rules' firing strengths:

O^3_i = \bar{w}_i = \frac{w_i}{w_1 + w_2},    i = 1, 2    (9)

For convenience, the outputs of this layer are called normalized firing strengths.

Layer 4: The nodes in this layer are adaptive. The output of each node in this layer is simply the product of the normalized firing strength and a first-order polynomial (for a first-order Sugeno model). Thus, the outputs of this layer are given by

O^4_i = \bar{w}_i f_i = \bar{w}_i (p_i x + q_i y + r_i),    i = 1, 2    (10)

Parameters in this layer will be referred to as consequent parameters.

Layer 5: The single node in this layer is a circle node labelled \Sigma that computes the overall output as the summation of all incoming signals, i.e.

O^5_1 = \sum_{i=1}^{2} \bar{w}_i f_i = \frac{\sum_{i=1}^{2} w_i f_i}{w_1 + w_2}    (11)

It can be seen that there are two adaptive layers in this ANFIS architecture, namely the first layer and the fourth layer. In the first layer there are three modifiable parameters {a_i, b_i, c_i}, which are related to the input membership functions; these are the so-called premise parameters. In the fourth layer there are also three modifiable parameters {p_i, q_i, r_i}, pertaining to the first-order polynomial; these are the so-called consequent parameters (Güler & Übeyli, 2005; Jang, 1993).
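A minimal numerical sketch of the five-layer forward pass for the two-rule system above (Eqs. (5)-(11)); all parameter values are arbitrary placeholders, not values trained in this study:

```python
import numpy as np

def bell(x, a, b, c):
    """Generalized bell membership function of Eq. (7)."""
    return 1.0 / (1.0 + ((x - c) / a) ** 2) ** b

# Premise parameters {a, b, c} for A1, A2 (input x) and B1, B2 (input y).
A = [(2.0, 2.0, 0.0), (2.0, 2.0, 5.0)]
B = [(3.0, 2.0, 0.0), (3.0, 2.0, 5.0)]
# Consequent parameters (p, q, r) of the two first-order Sugeno rules.
P = [(1.0, 0.5, 0.2), (-0.3, 1.2, 0.8)]

def anfis_output(x, y):
    # Layers 1-2: fuzzification and rule firing strengths w_i (Eq. (8)).
    w = np.array([bell(x, *A[i]) * bell(y, *B[i]) for i in range(2)])
    # Layer 3: normalized firing strengths (Eq. (9)).
    w_bar = w / w.sum()
    # Layers 4-5: weighted rule outputs and their summation (Eqs. (10)-(11)).
    f = np.array([p * x + q * y + r for (p, q, r) in P])
    return float(np.dot(w_bar, f))

print(anfis_output(1.0, 2.0))
```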



2.7. Learning algorithm of ANFIS

The aim of the training algorithm for this architecture is to tune all the modifiable parameters so that the ANFIS output matches the training data. Note that the parameters a_i, b_i and c_i of the membership function, which describe the width, slope and centre of the bell membership function, respectively, are held fixed in the derivation below. Thus, the output of the ANFIS model can be written as (Jang, 1993)

f = \frac{w_1}{w_1 + w_2} f_1 + \frac{w_2}{w_1 + w_2} f_2    (12)

Substituting Eq. (9) into Eq. (12) yields

f = \bar{w}_1 f_1 + \bar{w}_2 f_2    (13)

Substituting the fuzzy if-then rules into Eq. (13), it becomes

f = \bar{w}_1 (p_1 x + q_1 y + r_1) + \bar{w}_2 (p_2 x + q_2 y + r_2)    (14)

After rearrangement, the output can be written as

f = (\bar{w}_1 x) p_1 + (\bar{w}_1 y) q_1 + \bar{w}_1 r_1 + (\bar{w}_2 x) p_2 + (\bar{w}_2 y) q_2 + \bar{w}_2 r_2    (15)

which is a linear combination of the modifiable consequent parameters p_1, q_1, r_1, p_2, q_2 and r_2. The least squares method can therefore be used to identify the optimal values of these parameters easily. When the premise parameters are not fixed, the search space becomes larger and the convergence of the training becomes slower. A hybrid algorithm combining the least squares method and the gradient descent method is adopted to solve this problem. The hybrid algorithm is composed of a forward pass and a backward pass. The least squares method (forward pass) is used to optimize the consequent parameters with the premise parameters fixed. Once the optimal consequent parameters are found, the backward pass starts immediately: the gradient descent method is used to optimally adjust the premise parameters corresponding to the fuzzy sets in the input domain. The output of the ANFIS is calculated by employing the consequent parameters found in the forward pass, and the output error is used to adapt the premise parameters by means of a standard back-propagation algorithm. It has been proven that this hybrid algorithm is highly efficient in training the ANFIS (Güler & Übeyli, 2005; Jang, 1993).
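The forward half of the hybrid algorithm can be sketched as follows: with the premise parameters held fixed, Eq. (15) is linear in the consequent parameters, so they can be estimated from training pairs by ordinary least squares. The premise parameters, training data and targets below are arbitrary placeholders, not quantities from this study.

```python
import numpy as np

def bell(x, a, b, c):
    return 1.0 / (1.0 + ((x - c) / a) ** 2) ** b

# Fixed premise parameters of the two rules (placeholders).
A = [(2.0, 2.0, 0.0), (2.0, 2.0, 5.0)]
B = [(3.0, 2.0, 0.0), (3.0, 2.0, 5.0)]

rng = np.random.default_rng(0)
X = rng.uniform(0, 5, size=(200, 2))              # training inputs (x, y)
target = 1.5 * X[:, 0] - 0.7 * X[:, 1] + 2.0      # training targets

# Normalized firing strengths for every training sample.
w = np.stack([bell(X[:, 0], *A[i]) * bell(X[:, 1], *B[i]) for i in range(2)], axis=1)
w_bar = w / w.sum(axis=1, keepdims=True)

# Eq. (15) written as a linear model H @ theta, theta = [p1, q1, r1, p2, q2, r2].
H = np.column_stack([w_bar[:, 0] * X[:, 0], w_bar[:, 0] * X[:, 1], w_bar[:, 0],
                     w_bar[:, 1] * X[:, 0], w_bar[:, 1] * X[:, 1], w_bar[:, 1]])
theta, *_ = np.linalg.lstsq(H, target, rcond=None)
print(theta)
```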

3. Methodology

The proposed methodology for the detection of heart valve disorders is illustrated in Fig. 3. It consists of three parts: (a) data acquisition and pre-processing, (b) feature extraction and feature reduction, and (c) classification using ANFIS.

Fig. 3. The algorithm of the expert diagnostic system: Doppler ultrasound → pre-processing (data acquisition, filtering, white de-noising, normalization) → cleaned DHS signal → feature extraction (wavelet decomposition, short-time Fourier transform, wavelet entropy; 91 features) → feature reduction (LDA; 4 features) → classification (ANFIS) → decision: normal or abnormal valve.

3.1. Data acquisition and pre-processing

DHS signals were acquired with an Acuson Sequoia 512 Model Doppler ultrasound system in the Cardiology

Department of the Firat Medical Center. The DHS signals were sampled at 20 kHz for 5 s, at a signal-to-noise ratio of 0 dB, using a sound card with 16-bit A/D conversion resolution and computer software prepared by us in MATLAB (version 6.5) (The MathWorks Inc.). The Doppler ultrasonic flow transducer (Model 3V2c) operates in 2 MHz continuous-wave mode.

Filtering: The recorded DHS signals were high-pass filtered to remove unwanted low-frequency components, because the DHS signals are generally in the range of 0.5-10 kHz. The filter is a fiftieth-order digital FIR filter with a cut-off frequency of 500 Hz and a 51-point symmetric Hamming window.

i. White de-noising: White noise is a random signal that contains equal amounts of every possible frequency, i.e. its FFT has a flat spectrum (Devasahayam, 2000). The DHS signals were filtered to remove the white noise by using wavelet packets. The white de-noising procedure contains three steps (Bakhtazad, Palazoglu, & Romagnoli, 1999):
1. Decomposition: computing the wavelet packet decomposition of the DHS signal at level 4, using the Daubechies wavelet of order 4.
2. Detail coefficient thresholding: for each level from 1 to 4, soft thresholding is applied to the detail coefficients.



3. Reconstruction: computing the wavelet packet reconstruction based on the original approximation coefficients of level 4 and the modified detail coefficients of levels 1 to 4.

ii. Normalization: The DHS signals in this study were normalized using Eq. (16), so that the expected amplitude of the signal is not affected by the rib cage structure of the patient:

\mathrm{DHS}_{\mathrm{signal}} = \frac{\mathrm{DHS}_{\mathrm{signal}}}{\left| \left( \mathrm{DHS}_{\mathrm{signal}} \right)_{\max} \right|}    (16)
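A sketch of the pre-processing chain under stated assumptions: a standard wavelet decomposition with soft thresholding stands in for the wavelet-packet de-noising step, the noise estimate and threshold rule are assumed choices, and only the settings given above (50th-order FIR, 500 Hz cut-off, Hamming window, 20 kHz sampling, level-4 db4 decomposition, normalization by the maximum absolute amplitude) are taken from the paper.

```python
import numpy as np
import pywt
from scipy.signal import firwin, lfilter

def preprocess_dhs(x, fs=20_000):
    # 1. High-pass FIR filtering: 50th order (51 taps), 500 Hz cut-off, Hamming window.
    taps = firwin(51, 500, fs=fs, pass_zero=False, window="hamming")
    x = lfilter(taps, 1.0, x)

    # 2. White de-noising: level-4 'db4' decomposition, soft thresholding of the
    #    detail coefficients, reconstruction from the untouched approximation.
    coeffs = pywt.wavedec(x, "db4", level=4)
    sigma = np.median(np.abs(coeffs[-1])) / 0.6745       # assumed noise estimate
    thr = sigma * np.sqrt(2 * np.log(len(x)))            # assumed (universal) threshold
    coeffs = [coeffs[0]] + [pywt.threshold(c, thr, mode="soft") for c in coeffs[1:]]
    x = pywt.waverec(coeffs, "db4")[: len(x)]

    # 3. Normalization by the maximum absolute amplitude, Eq. (16).
    return x / np.max(np.abs(x))

cleaned = preprocess_dhs(np.random.randn(5 * 20_000))
print(cleaned.min(), cleaned.max())
```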

3.2. Feature extraction

The DHS waveform patterns from the heart valves are rich in detail and highly non-stationary. The goal of feature extraction is to extract features from these patterns for reliable intelligent classification. After the data pre-processing has been carried out, three steps are proposed in this paper to extract the characteristics of these waveforms, using MATLAB with the Wavelet Toolbox and the Signal Processing Toolbox:

i. Wavelet decomposition: For the wavelet decomposition of the DHS waveforms, the decomposition structure and reconstruction tree at level 12 illustrated in Fig. 4 were used. Wavelet decomposition was applied to the DHS signal using the Daubechies-10 wavelet decomposition filters, yielding two types of coefficients: one set of approximation coefficients cA and twelve sets of detail coefficients cD. A representative example of the wavelet decomposition of the Doppler sound signal of the aortic heart valve is shown in Fig. 5.

ii. Short-time Fourier transform: The STFT is the most robust and best understood of the various time-frequency representations. The STFT of the waveforms of the terminal nodes was computed using a 25,000-point Hanning window, with an overlap of 12,500 points between sections and zero padding of the sections if the window length exceeds 25,000 points. A representative example of the STFT spectra of a terminal-node waveform is shown in Fig. 6.

Fig. 4. The decomposition structure at level 12 (original signal decomposed into the approximation cA12 and details cD12, ..., cD2, cD1 as terminal nodes).

Fig. 5. The terminal node waveforms of the wavelet decomposition at twelve levels of the DHS signal.

Fig. 6. The STFT spectra of a terminal node waveform.

iii. Wavelet entropy: We next calculated the norm entropy, defined in Eq. (17), of the waveforms of the STFT spectra:

E(s) = \sum_i |s_i|^{3/2}    (17)


where s is the STFT spectrum and s_i is the ith coefficient of s. The resultant entropy data, normalized by a factor of 1/50,000, are plotted in Fig. 7. The plot of the entropy data includes 91 features, obtained from 13 terminal nodes, each of which contributes the waveforms of seven frequency spectra per DHS signal. Thus, the feature vector was extracted by computing the wavelet entropy values for each DHS signal.

Fig. 7. The wavelet entropy of the DHS signal (entropy magnitude versus entropy index).
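The three feature extraction steps can be sketched as follows. Reading the "terminal node waveforms" as full-length band reconstructions is an assumption made here so that the 25,000-point window with 12,500-point overlap yields seven spectra per node, and hence 13 × 7 = 91 features for a 5 s recording at 20 kHz; the paper's exact bookkeeping may differ.

```python
import numpy as np
import pywt
from scipy.signal import spectrogram

def norm_entropy(s, p=1.5):
    """Norm entropy of Eq. (17): E(s) = sum_i |s_i|**p, with p = 3/2."""
    return np.sum(np.abs(s) ** p)

def dhs_feature_vector(x, fs=20_000, wavelet="db10", level=12):
    # i. Level-12 decomposition with Daubechies-10 filters -> 13 coefficient sets
    #    (cA12, cD12, ..., cD1), each mapped back to a full-length band waveform.
    coeffs = pywt.wavedec(x, wavelet, level=level)
    features = []
    for i in range(len(coeffs)):
        keep = [c if j == i else np.zeros_like(c) for j, c in enumerate(coeffs)]
        band = pywt.waverec(keep, wavelet)[: len(x)]
        # ii. STFT: 25,000-point Hanning window with 12,500-point overlap
        #     -> seven spectra for a 5 s recording at 20 kHz.
        _, _, S = spectrogram(band, fs=fs, window="hann",
                              nperseg=25_000, noverlap=12_500, mode="magnitude")
        # iii. One norm-entropy value per spectrum, scaled by 1/50,000.
        features.extend(norm_entropy(S[:, k]) / 50_000 for k in range(S.shape[1]))
    return np.asarray(features)

feats = dhs_feature_vector(np.random.randn(5 * 20_000))
print(feats.shape)   # 13 nodes x 7 spectra = 91 features
```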

3.3. Linear discriminant analysis (LDA) for feature reduction

Linear discriminant analysis (LDA) searches for those vectors in the feature space that best discriminate among classes, rather than those that best describe the data (Kim, Kim, & Bang, 2002; Lee, Park, Song, & Lee, 2005). The main aim of LDA is to seek a transformation matrix W that, in some sense, maximizes the ratio of the between-class scatter to the within-class scatter. The within-class scatter matrix S_W is defined as

S_W = \sum_{i=1}^{c} \sum_{x \in C_i} (x - m_i)(x - m_i)^T    (18)

where c is the number of classes, C_i is the set of data belonging to the ith class, and m_i is the mean of the ith class. The within-class scatter matrix represents the degree of scatter within classes as a summation of the covariance matrices of all classes. The between-class scatter matrix S_B is defined as

S_B = \sum_{i=1}^{c} n_i (m_i - m)(m_i - m)^T    (19)

where n_i is the number of samples in the ith class and m is the mean of all the data. The transformation matrix W that maximizes the ratio of the between-class scatter to the within-class scatter is sought. The criterion function J(W) can be defined as

J(W) = \frac{|W^T S_B W|}{|W^T S_W W|}    (20)

The transformation matrix W is obtained as the one that maximizes the criterion function J(W); given a number of independent features with respect to which the data are described, LDA creates a linear combination of these that yields the largest mean differences between the desired classes (Martinez & Kak, 2001; Lee et al., 2005).
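A generic Fisher-style sketch of Eqs. (18)-(20), not the paper's implementation: it builds the scatter matrices and takes the leading eigenvectors of S_W^{-1} S_B. Note that S_B has rank at most c − 1, so for a two-class problem only one direction carries between-class scatter; the settings that yield the paper's 4-dimensional projection are not detailed here, and the ridge term and component count below are illustrative assumptions.

```python
import numpy as np

def lda_projection(X, y, n_components, ridge=1e-6):
    """Fisher LDA: directions maximizing |W^T S_B W| / |W^T S_W W| (Eqs. (18)-(20))."""
    classes = np.unique(y)
    d = X.shape[1]
    m = X.mean(axis=0)
    Sw = np.zeros((d, d))
    Sb = np.zeros((d, d))
    for c in classes:
        Xc = X[y == c]
        mc = Xc.mean(axis=0)
        Sw += (Xc - mc).T @ (Xc - mc)                  # Eq. (18)
        Sb += len(Xc) * np.outer(mc - m, mc - m)       # Eq. (19)
    # Leading eigenvectors of Sw^{-1} Sb; the ridge keeps Sw invertible.
    evals, evecs = np.linalg.eig(np.linalg.solve(Sw + ridge * np.eye(d), Sb))
    order = np.argsort(evals.real)[::-1]
    return evecs[:, order[:n_components]].real

# Toy use: 91-dimensional features, two classes, projected to a lower dimension.
rng = np.random.default_rng(1)
X = rng.normal(size=(200, 91))
y = rng.integers(0, 2, size=200)
W = lda_projection(X, y, n_components=4)
print((X @ W).shape)   # (200, 4)
```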

3.4. Classification using ANFIS

The objective of the classification stage is to demonstrate the effectiveness of the proposed feature extraction method on the DHS signals. For this purpose, the feature vectors were applied as the input to the ANFIS classifier. The classification by ANFIS was performed using MATLAB. The training parameters and the structure of the ANFIS used in this study are shown in Table 1.

Table 1. ANFIS architecture and training parameters

Architecture
  The number of layers                   5
                                         (Input: 5, Number of rules: 32, Output: 1)
  Type of input membership functions     Bell-shaped

Training parameters
  Learning rule                          Hybrid learning algorithm (back-propagation for the
                                         nonlinear parameters (a_i, c_i) and least squares for the
                                         linear parameters (p_i, q_i, r_i, s_i, ss_i, pp_i, u_i))
  Momentum constant                      0.9
  Sum-squared error                      0.00001
  Epochs number to sum-squared error     767

4. Performance evaluation methods

Different evaluation methods were used to assess the performance of the proposed expert system: classification accuracy, sensitivity and specificity measures, and the confusion matrix. These methods are described in the following subsections.

4.1. Classification accuracy

Classification accuracy is the method most commonly used in pattern recognition applications. The classification accuracy for an experiment is taken as the ratio of the number of samples correctly classified to the total number of samples.

4.2. Sensitivity and specificity

For the sensitivity and specificity analysis, we use the following expressions:

\mathrm{Sensitivity} = \frac{TP}{TP + FN} \times 100\%    (21)

\mathrm{Specificity} = \frac{TN}{FP + TN} \times 100\%    (22)

where TP, TN, FP and FN denote true positives, true negatives, false positives and false negatives, respectively. True positive (TP): an input is detected as abnormal when the expert clinicians have also diagnosed a heart valve disorder. True negative (TN): an input is detected as normal and is labeled as a healthy subject by the expert clinicians. False positive (FP): an input is detected as abnormal although it is labeled as healthy by the expert clinicians. False negative (FN): an input is detected as normal although the expert clinicians have diagnosed a heart valve disorder.
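As a small check of these definitions (the counts are taken from Table 3 below for the proposed system):

```python
def sensitivity_specificity(tp, tn, fp, fn):
    """Eqs. (21)-(22): sensitivity = TP/(TP+FN), specificity = TN/(FP+TN), in percent."""
    return 100.0 * tp / (tp + fn), 100.0 * tn / (fp + tn)

# Counts for the proposed LDA + ANFIS system (see Table 3): 73 abnormal test
# samples, 70 detected; 50 normal test samples, 47 detected.
print(sensitivity_specificity(tp=70, tn=47, fp=3, fn=3))   # (~95.9, 94.0)
```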

4.3. Confusion matrix

A confusion matrix is composed of the actual and the predicted classifications made by a classification system; it identifies the common misclassifications of the proposed classification scheme. The performance of such a system is commonly evaluated using the data in this matrix.

5. Experimental classification results

The experimental study was performed with 215 samples, consisting of 110 samples (54 abnormal, 56 normal) of the aortic heart valve and 105 samples (66 abnormal, 39 normal) of the mitral heart valve. Thus, populations of 95 normal and 120 abnormal samples were handled. The samples were from 132 males and 83 females, whose ages were between 15 and 80 years with a mean of 48 years. The abnormal data include all diseases related to the aortic and mitral valves, such as aortic insufficiency and stenosis, and mitral insufficiency and stenosis. Normal means that there was no insufficiency or stenosis in the aortic or mitral heart valves, whereas abnormal means their presence. The data set was obtained under the supervision of expert doctors; the diagnosis of whether a patient was normal or abnormal was determined after the Doppler observations and clinical results had been discussed by the expert doctors, as in Jing et al. (1997). For the training process, 14 abnormal and 25 normal subjects were selected for the diagnosis of the aortic heart valve, and 33 abnormal and 20 normal subjects were chosen for the diagnosis of the mitral heart valve. The remainder of the data set was used as the test set. As mentioned in the earlier sections, the number of features used for characterizing the normal and abnormal mitral and aortic heart valves was 91. Therefore, instead of using these features directly, a linear projection of the feature vector was obtained by using LDA.

Table 2. Testing results of the proposed methodology (N: normal, AN: abnormal)

                              Heart mitral valve        Heart aortic valve
                              N          AN             N          AN
The number of samples         19         33             31         40
Correct classification        18         32             29         38
Incorrect classification       1          1              2          2
The accuracy (%)              94.7       96.9           93.5       95

In other words, the feature vector was reduced to 4 features with the LDA algorithm described in Section 3.3.

The classification accuracies obtained with the proposed LDA and ANFIS algorithm on the test set are given in Table 2. One abnormal and one normal heart mitral valve pattern were classified incorrectly, whereas two normal and two abnormal heart aortic valve patterns were classified incorrectly by the proposed methodology. The confusion matrix and the calculated sensitivity and specificity rates are given in Table 3, together with a performance comparison of the proposed system against the other classifiers from the literature. According to these results, the highest sensitivity rate (97.3%) was obtained by the FCM–CHMM (Uguz et al., 2007). Our proposal and the ANN method produced the same, second highest sensitivity rate (95.9%), while the SVM method gave the lowest sensitivity rate (94.5%). On the other hand, the highest specificity rate (94%) was obtained with our proposal and the ANN method; FCM–CHMM produced a 92% specificity value, and the SVM produced the lowest specificity rate (90%).

6. Discussion and conclusion

In this study, a medical decision support system with normal and abnormal classes has been developed. In the related previous works, a feature vector with 91 features was considered as the input to the classifiers; the complexity of those systems was quite high owing to the dimension of the feature vector, so a data reduction step was needed. In this work, data reduction with LDA and an ANFIS classifier were used for the diagnosis of aortic and mitral heart valve disorders from DHS signals.

Table 3. Obtained performance parameters with our proposed system and other classifiers from the literature

Method and classifier             Type        No. of      Detected as   Detected as   SN = Sensitivity (%)
                                              patients    abnormal      normal        SP = Specificity (%)
LDA and ANFIS                     Abnormal    73          70             3            SN = 95.9
                                  Normal      50           3            47            SP = 94
ANN (Turkoglu et al., 2002)       Abnormal    73          70             3            SN = 95.9
                                  Normal      50           3            47            SP = 94
SVM (Çomak et al., 2007)          Abnormal    73          69             4            SN = 94.5
                                  Normal      50           5            45            SP = 90
FCM/CHMM (Uguz et al., 2007)      Abnormal    73          71             2            SN = 97.3
                                  Normal      50           4            46            SP = 92



For this purpose, computer simulations were carried out and statistical validation indexes were used to determine the performance of the proposed methodology, and previously reported results were compared with our proposal. The proposed methodology is composed of three main stages: pre-processing, feature extraction and feature reduction with LDA, and classification. Wavelet decomposition for multi-scale analysis, the STFT for time-frequency representation, and the wavelet entropy were used for feature extraction, whereas LDA was used for feature reduction and the ANFIS classifier was adopted for efficient recognition. According to the experimental results, the proposed method is efficient for the interpretation of the disease. Our proposal and the ANN method produced a 95.9% sensitivity rate and a 94% specificity rate; this specificity rate is the highest among the compared methods. The SVM method gave the worst sensitivity and specificity rates, while FCM–CHMM produced 97.3% sensitivity and 92% specificity values, the 97.3% sensitivity rate being the highest value among the previously proposed works. The proposed system has the advantages of automation: it is rapid, easy to operate, non-invasive and cheap for clinical application, and it is well suited to clinical use, especially for the early screening of a population. However, the position of the ultrasound probe used for data acquisition from the heart valves must be taken into consideration by the physician.

Acknowledgment

The author thanks Dr. Ibrahim TURKOGLU for providing the DHS signals and for his valuable suggestions in improving the technical presentation of this paper.

References

Akay, M., Akay, Y. M., & Welkowitz, W. (1992). Neural networks for the diagnosis of coronary artery disease. International Joint Conference on Neural Networks, IJCNN, 2, 419–424.
Bakhtazad, A., Palazoglu, A., & Romagnoli, J. A. (1999). Process data de-noising using wavelet transform. Intelligent Data Analysis, 3(October), 267–285.
Chan, B. C. B., Chan, F. H. Y., Lam, F. K., Lui, P. W., & Poon, P. W. F. (1997). Fast detection of venous air embolism in Doppler heart sound using the wavelet transform. IEEE Transactions on Biomedical Engineering, 44(4), 237–245.
Coifman, R. R., & Wickerhauser, M. V. (1992). Entropy-based algorithms for best basis selection. IEEE Transactions on Information Theory, 38(2), 713–718.
Çomak, E., Arslan, A., & Türkoğlu, I. (2007). A decision support system based on support vector machines for diagnosis of the heart valve diseases. Computers in Biology and Medicine, 37, 21–27.
Devasahayam, S. R. (2000). Signals and Systems in Biomedical Engineering. Kluwer Academic Publishers.
Guler, I., Kiymik, M. K., Kara, S., & Yuksel, M. E. (1992). Application of autoregressive analysis to 20 MHz pulsed Doppler data in real time. International Journal of Biomedical Computing, 31(3–4), 247–256.
Güler, İ., & Übeyli, E. D. (2005). Adaptive neuro-fuzzy inference system for classification of EEG signals using wavelet coefficients. Journal of Neuroscience Methods, 148(2), 113–121.
http://hcd2.bupa.co.uk/fact_sheets/html/heart_valve_disease.html, accessed 01.12.2006.
http://www.healthsystem.virginia.edu/uvahealth/adult_cardiac/disvalve.cfm, accessed 01.12.2006.
Jang, J.-S. R. (1993). ANFIS: Adaptive-network-based fuzzy inference system. IEEE Transactions on Systems, Man, and Cybernetics, 23(3), 665–685.
Jing, F., Xuemin, W., Mingshi, W., & Wie, L. (1997). Noninvasive acoustical analysis system of coronary heart disease. In Biomedical Engineering Conference, Proceedings of the 1997 Sixteenth Southern (pp. 239–241).
Karabetsos, E., Papaodysseus, C., & Koutsouris, D. (1998). Design and development of a new ultrasonic Doppler technique for estimation of the aggregation of red blood cells. Measurement, 24, 207–215.
Keeton, P. I. J., & Schlindwein, F. S. (1997). Application of wavelets in Doppler ultrasound. MCB University Press, 17(1), 38–45.
Kim, H. C., Kim, D. J., & Bang, S. Y. (2002). Face recognition using LDA mixture model. In Proceedings of the 16th International Conference on Pattern Recognition (Vol. 2, pp. 925–928).
Lee, J., Park, K. L., Song, M. H., & Lee, K. J. (2005). Arrhythmia classification with reduced features by linear discriminant analysis. In Engineering in Medicine and Biology 27th Annual Conference, Shanghai, China, September 1–4, 2005.
Martinez, A. M., & Kak, A. C. (2001). PCA versus LDA. IEEE Transactions on Pattern Analysis and Machine Intelligence, 23(2), 228–233.
Nanda, N. C. (1993). Doppler Echocardiography (2nd ed.). London: Lea and Febiger.
Plett, M. I. (2000). Ultrasonic arterial vibrometry with wavelet based detection and estimation. Ph.D. Thesis, University of Washington, pp. 17–18.
Quiroga, R. Q. (1998). Quantitative analysis of EEG signals: Time-frequency methods and chaos theory. Institute of Physiology, Medical University Lübeck.
Quiroga, R. Q., Roso, O. A., & Basar, E. (1999). Wavelet entropy: A measure of order in evoked potentials. Elsevier Science, Evoked Potentials and Magnetic Fields, 49, 298–302.
Saini, V. D., Nanda, N. C., & Maulik, D. (1993). Basic principles of ultrasound and Doppler effect. In Doppler Echocardiography. Philadelphia, London: Lea and Febiger.
Turkoglu, I., Arslan, A., & Ilkay, E. (2002). An expert system for diagnosis of the heart valve diseases. Expert Systems with Applications, 23, 229–236.
Uguz, H., Arslan, A., & Türkoğlu, I. (2007). A biomedical system based on hidden Markov model for diagnosis of the heart valve diseases. Pattern Recognition Letters. Available online 11 October 2006.
Wright, I. A., Gough, N. A. J., Rakebrandt, F., Wahab, M., & Woodcock, J. P. (1997). Neural network analysis of Doppler ultrasound blood flow signals: A pilot study. Ultrasound in Medicine and Biology, 23(5), 683–690.