A novel seizure diagnostic model based on kernel density estimation and least squares support vector machine


Biomedical Signal Processing and Control 41 (2018) 233–241

Contents lists available at ScienceDirect

Biomedical Signal Processing and Control journal homepage: www.elsevier.com/locate/bspc

Mingyang Li, Wanzhong Chen ∗, Tao Zhang
College of Communication Engineering, Jilin University, Changchun, 130012, China

Article info
Article history: Received 19 May 2017; received in revised form 2 November 2017; accepted 10 December 2017
Keywords: EEG; WPT; Kernel density estimation (KDE); LS-SVM

Abstract
Automated systems can be effective tools for assisting neurologists in seizure detection. However, most existing methods fail to balance effectiveness and computational cost, which makes them unsuitable for on-line application. In this research, we propose a novel method for the 3-class electroencephalogram (EEG) classification problem based on kernel density estimation (KDE) and least squares support vector machine (LS-SVM). The filtered EEG is decomposed into several sub-bands by wavelet packet transform (WPT), and KDE is then used to estimate the corresponding probability densities. Five parameters are employed for EEG representation: the maximum (Max), the skewness (Ske), the kurtosis (Kur), the energy (En), and the central moment (CM). Significant features selected by analysis of variance (ANOVA) are fed to LS-SVM for pattern recognition. Furthermore, eight types of wavelet bases and four well-known kernel functions are considered for feature extraction. Experimental results show that our approach achieves satisfactory and comparable results for all validation methods when configured with the order-1 coiflet and the uniform kernel. The highest accuracies for 10-fold cross-validation and the standard 50-50 methodology are 99.40% and 99.60%, with 27 and 26 features, respectively. Compared with previous literature, our proposed scheme is more suitable for the diagnosis of epilepsy, achieving higher accuracy with fewer features that can be extracted at lower computational cost. Overall, its high accuracy, easy implementation and low computational consumption make this technique a suitable candidate for extensive clinical deployment. © 2017 Elsevier Ltd. All rights reserved.

1. Introduction
Epilepsy is a chronic neurological disorder characterized by sudden and excessive neural discharges in the brain [1,2]. It has been estimated that approximately one in every 100 people worldwide suffers from epilepsy [3,4]. Epilepsy has become a global public health problem that severely affects patients' quality of life, study, and working abilities. Electroencephalography (EEG) is an important clinical tool for epilepsy detection because the condition is related to brain activity [5,6]. However, visual inspection of EEG recordings by experienced neurologists is a time-consuming, cumbersome and subjective task [7]. Hence, automatic seizure detection technology holds great significance and promise for clinical practice: it can help doctors confirm their initial diagnosis and develop personalized treatment for patients.

∗ Corresponding author at: College of Communication Engineering, Jilin University, Ren Min Street, 5988, Changchun, China. E-mail address: [email protected] (W. Chen). https://doi.org/10.1016/j.bspc.2017.12.005 1746-8094/© 2017 Elsevier Ltd. All rights reserved.

Various feature extraction methods have been proposed for epileptic seizure detection. They can be broadly grouped into four categories, namely time domain analysis [8], frequency domain analysis [9], time-frequency domain analysis [10] and non-linear dynamics analysis [11]. In recent years there has been increasing interest in seizure detection using the wavelet transform (WT) and the wavelet packet transform (WPT). Subasi [10] used statistical features computed over the discrete wavelet transform (DWT) coefficients to classify epileptic EEG signals. Ocak [12] developed a system for two-class epilepsy detection based on DWT and approximate entropy (ApEn). Kumar et al. [13] employed DWT-based fuzzy approximate entropy as a feature for automated seizure detection using a support vector machine (SVM). Martis et al. [14] proposed an epileptic EEG classification method using WPT and non-linear parameters. Acharya et al. [15] modified the feature extraction by using WPT along with principal component analysis (PCA). Although both DWT and WPT are popular for feature extraction from EEG signals, WPT offers better local characterization and can analyze information in both high and low frequencies. We therefore adopt WPT for signal processing in this paper.


As noted in earlier works, most methods extract non-linear or statistical features in the DWT or WPT domain. However, non-linear algorithms increase computational time and complexity, and statistical features extracted only in the time-frequency domain are limited in their ability to represent subtle characteristics. These constraints make such methods hard to implement in real-time applications. Hence, a hybrid method based on kernel density estimation (KDE) and WPT is proposed in this research. To the best of our knowledge, no study in the literature has applied KDE to the diagnosis of epilepsy. KDE is a non-parametric way of estimating the probability density function (PDF) of signals, and it is extensively used in image processing because of its discriminatory power and computational simplicity. Here, KDE is deployed in combination with WPT, and five statistical features are derived in the WPT-KDE domain. In this way, both statistical properties and transient changes can be captured and localized. In particular, we not only exploit KDE for seizure detection but also evaluate experimentally the influence of different wavelet bases and kernel functions. The proposed seizure detection method therefore differs from the approaches presented in previous studies. Fig. 1 shows a block diagram of the proposed technique. The data used in this work are divided into two parts, one for model building and the other for model testing. As seen in Fig. 1, the filtered EEG signals are subjected to a 4-level wavelet packet decomposition (WPD). Both the raw EEG and the WPD coefficients are then analyzed by KDE to obtain the corresponding PDFs, from which five statistical features, namely the maximum (Max), the skewness (Ske), the kurtosis (Kur), the energy (En), and the central moment (CM), are obtained.
To eliminate redundant information, highly significant features are selected using the analysis of variance (ANOVA) test, and a least squares support vector machine (LS-SVM) is employed as the classifier that decides the class of the input features. The rest of the paper is organized as follows: Section 2 describes the data and the proposed methodology, Section 3 presents the experimental results, Section 4 discusses our approach, and the last section summarizes this research.

2. Materials and methods

2.1. EEG dataset
The data used in this paper were obtained from the Department of Epileptology, University of Bonn. Detailed information about the data, including the acquisition process, is provided in Ref. [16] and is not repeated here for conciseness. The whole database consists of five subsets denoted Z, O, N, F, and S, each containing 100 single-channel EEG segments of 23.6 s duration. These EEG signals were recorded at a sampling rate of 173.61 Hz using a 128-channel amplifier system with an average common reference. In this study, subsets Z, F and S are used, since the Z-F-S three-class problem has become a classic and challenging classification problem that has attracted considerable attention from researchers in recent years. Set Z consists of surface EEG segments collected from five healthy volunteers with their eyes open. The segments in both F and S were obtained from patients with electrodes placed in the epileptogenic zone: set F was recorded during seizure-free interictal intervals, while set S was recorded during seizure activity. The EEG segments are thus classified into three classes, namely normal (Z), interictal (F) and ictal (S). Sample EEG signals from the three sets are plotted in Fig. 2.

2.2. KDE based feature extraction

2.2.1. WPT decomposition
WPT, introduced by Coifman and Wickerhauser [17], is a further development of the discrete wavelet transform (DWT). WPT has attracted increasing attention because it provides a more flexible time-frequency decomposition, especially in the higher frequency region [18]. DWT decomposes only the approximation coefficients at each successive level, whereas WPT decomposes both the approximation and the detail coefficients at each level, so its analysis is more detailed and has more accurate local analysis ability. Decomposing the original signal yields one approximation and one detail coefficient set at the first level; these two components are further decomposed into four coefficient sets at level 2, and so on. Owing to this binary tree structure, WPT offers better resolution than DWT and is well suited to precisely revealing and localizing transient features in epileptic data. The choice of decomposition level and wavelet function directly affects the WPT results. The number of decomposition levels is fixed to 4 in this paper, the same number adopted by Zhang et al. [19]. In addition, we have performed an empirical study with different types of wavelets to find appropriate wavelet bases for this application.

2.2.2. KDE analysis
Kernel density estimation (KDE) is a well-known technique that is widely used in statistics and pattern recognition [20]. As a non-parametric model, KDE provides a smooth, continuous, and differentiable density estimate without assuming any specific underlying distribution. Owing to these advantages, we apply KDE to EEG analysis, where it can not only describe a time series but also capture the features of its distribution density. Let $x_i$ ($i = 1, \ldots, N$) be an independent sample drawn from an arbitrary probability distribution. The KDE is then given by:

$$\hat{f}(x) = \frac{1}{Nh}\sum_{i=1}^{N} G\left(\frac{x - x_i}{h}\right) \tag{1}$$

where $\hat{f}$ denotes the estimated probability density function (PDF), $N$ is the number of samples, $G$ is the kernel function and $h$ is the bandwidth. The kernel function is an important factor in the final density estimate; four popular kernel functions are summarized below:

1) Uniform kernel (Un)
$$G(x) = \begin{cases} \frac{1}{2}, & |x| \le 1 \\ 0, & \text{otherwise} \end{cases} \tag{2}$$

2) Triangle kernel (Tr)
$$G(x) = \begin{cases} 1 - |x|, & |x| \le 1 \\ 0, & \text{otherwise} \end{cases} \tag{3}$$

3) Epanechnikov kernel (Ep)
$$G(x) = \begin{cases} \frac{3}{4}\left(1 - x^2\right), & |x| \le 1 \\ 0, & \text{otherwise} \end{cases} \tag{4}$$

4) Gaussian kernel (Ga)
$$G(x) = \frac{1}{\sqrt{2\pi}}\exp\left(-\frac{1}{2}x^2\right) \tag{5}$$

To investigate the use of KDE in seizure diagnosis thoroughly, all four commonly applied kernel functions are evaluated in this research. The bandwidth $h$ is selected adaptively using the mean integrated square error (MISE) criterion, which is described in detail in [21].

Fig. 1. Block diagram of the proposed technique.

Fig. 2. Sample EEG signals taken from three different sets.

Table 1 Five statistical features.
Feature name | Formula | Eq.
Energy (En) | $\sum_{i=1}^{n} w_i^2$ | (6)
Central moment (CM) | $\frac{1}{n}\sum_{i=1}^{n}\left(w_i - \bar{w}\right)^k$ | (7)
Skewness (Ske) | $\frac{1}{n}\sum_{i=1}^{n}\left(\frac{w_i - \bar{w}}{\sigma}\right)^3$ | (8)
Kurtosis (Kur) | $\frac{1}{n}\sum_{i=1}^{n}\left(\frac{w_i - \bar{w}}{\sigma}\right)^4$ | (9)
Maximum (Max) | $\max(w_n)$ | (10)
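As a rough illustration of Eqs. (1)–(10), the NumPy sketch below estimates a PDF with each of the four kernels and computes the Table 1 statistics in the order used by Eq. (11). It is not the authors' Matlab implementation: the rule-of-thumb bandwidth stands in for the MISE-based selection of [21], the evaluation grid (`n_grid` points between a signal's minimum and maximum) is an assumption, and all helper names are hypothetical. The fractional central moment with k = 0.2 (Section 2.2.3) is handled by taking the modulus of the complex-valued mean, which is one possible reading of the text.

```python
import numpy as np

# Kernel functions G(u) of Eqs. (2)-(5)
KERNELS = {
    "Un": lambda u: np.where(np.abs(u) <= 1, 0.5, 0.0),
    "Tr": lambda u: np.where(np.abs(u) <= 1, 1.0 - np.abs(u), 0.0),
    "Ep": lambda u: np.where(np.abs(u) <= 1, 0.75 * (1.0 - u**2), 0.0),
    "Ga": lambda u: np.exp(-0.5 * u**2) / np.sqrt(2.0 * np.pi),
}


def rule_of_thumb_bandwidth(x):
    """Silverman-style bandwidth; a stand-in for the MISE-based choice of [21]."""
    return 1.06 * np.std(x) * len(x) ** (-1.0 / 5.0)


def kde(x, grid, kernel="Un", h=None):
    """Eq. (1): f_hat(g) = 1/(N h) * sum_i G((g - x_i) / h) at each grid point g."""
    x = np.asarray(x, dtype=float)
    if h is None:
        h = rule_of_thumb_bandwidth(x)
    u = (grid[:, None] - x[None, :]) / h        # shape (len(grid), N)
    return KERNELS[kernel](u).sum(axis=1) / (len(x) * h)


def table1_features(w, k=0.2):
    """Table 1 statistics of a PDF vector w, ordered as in Eq. (11): En, CM, Kur, Ske, Max."""
    w = np.asarray(w, dtype=float)
    dev = w - w.mean()
    sigma = w.std()                             # population standard deviation
    en = np.sum(w**2)                           # Eq. (6)
    # Eq. (7): a fractional k turns negative deviations into complex numbers,
    # so the modulus of the complex mean is taken (assumed reading of |CM|).
    cm = np.abs(np.mean(dev.astype(complex) ** k))
    kur = np.mean((dev / sigma) ** 4)           # Eq. (9)
    ske = np.mean((dev / sigma) ** 3)           # Eq. (8)
    mx = np.max(w)                              # Eq. (10)
    return np.array([en, cm, kur, ske, mx])


def eigenvector(signals, kernel="Un", n_grid=256):
    """Eq. (11): concatenate features of the raw EEG + 16 sub-bands (17 x 5 = 85 values)."""
    feats = []
    for s in signals:
        s = np.asarray(s, dtype=float)
        grid = np.linspace(s.min(), s.max(), n_grid)  # evaluation grid: an assumption
        feats.append(table1_features(kde(s, grid, kernel)))
    return np.concatenate(feats)
```

For example, a density returned by `kde` for a standard-normal sample on `np.linspace(-8, 8, 3201)` sums (times the grid step) to approximately 1 for every kernel, and passing a list with the raw segment plus 16 sub-band arrays to `eigenvector` yields an 85-element vector.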

2.2.3. Derived EEG features
A high-dimensional input places a significant load on the classifier and increases computational complexity. To ensure running efficiency, five statistical features are extracted from the estimated kernel density values: En, CM, Max, Ske and Kur, computed with the equations listed in Table 1. In (6)–(10), $w_n$ is the value at sample point $n$, $\sigma$ is the standard deviation of $w_n$, $\bar{w}$ is the mean value of the sample and $k$ is the order of the central moment. In this paper, $k$ is fixed to 0.2 on the basis of numerous experiments. Because a fractional order applied to negative deviations produces complex values, the absolute value of CM is used. These five statistical metrics can thus serve as features that describe epileptic activity. Besides the sixteen sub-bands (AAAA, AAAD, AADA, AADD, ADAA, ADAD, ADDA, ADDD, DAAA, DAAD, DADA, DADD, DDAA, DDAD, DDDA, DDDD) generated by the 4-level WPT decomposition, the original input EEG signal is also used for feature extraction, so the aforementioned features are derived from each of these 17 signals. The eigenvector can be expressed as:

$$\mathrm{Eig} = \left[S_1^{En}, S_2^{CM}, S_3^{Kur}, S_4^{Ske}, S_5^{Max}, AAAA_6^{En}, \ldots, DDDD_{85}^{Max}\right] \tag{11}$$

where Eig is the 85-dimensional input eigenvector (17 signals × 5 features) and S denotes the raw EEG data.

2.3. Feature selection using ANOVA
Because of the high dimension of the feature vector, some feature values are redundant or even irrelevant [22]. Such features may degrade the generalization performance of classifiers without improving the recognition rate. Therefore, a one-way ANOVA test is used to select significant and unique features: the extracted features are subjected to ANOVA to test for significant differences among the classes. An F-statistic is computed for each feature, and features with relatively higher F values provide better discrimination. The computation of the F-statistic is described in [23]. Feature ranking is then used to select a feature combination with competitive discriminative ability: the extracted features are ranked in decreasing order of their F-statistic, added one by one in that order, and classified by LS-SVM.
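The one-way ANOVA ranking and the add-one-feature-at-a-time search can be sketched as follows, assuming NumPy and the textbook F-statistic (between-class mean square divided by within-class mean square), which we take to correspond to the computation cited in [23]. The `evaluate` callback is hypothetical and stands in for LS-SVM training and testing on a feature subset.

```python
import numpy as np


def anova_f(X, y):
    """One-way ANOVA F-statistic of every feature (column of X) across the classes in y."""
    X = np.asarray(X, dtype=float)
    y = np.asarray(y)
    classes = np.unique(y)
    n, k = X.shape[0], len(classes)
    grand_mean = X.mean(axis=0)
    ss_between = np.zeros(X.shape[1])
    ss_within = np.zeros(X.shape[1])
    for c in classes:
        Xc = X[y == c]
        ss_between += len(Xc) * (Xc.mean(axis=0) - grand_mean) ** 2
        ss_within += ((Xc - Xc.mean(axis=0)) ** 2).sum(axis=0)
    return (ss_between / (k - 1)) / (ss_within / (n - k))


def rank_features(X, y):
    """Feature indices in decreasing order of their F-statistic."""
    return np.argsort(anova_f(X, y))[::-1]


def incremental_selection(X, y, evaluate):
    """Add ranked features one by one; keep the smallest set with the best score.

    evaluate(X_subset, y) -> float is a hypothetical scorer, e.g. LS-SVM accuracy.
    """
    X = np.asarray(X, dtype=float)
    order = rank_features(X, y)
    scores = [evaluate(X[:, order[:m]], y) for m in range(1, len(order) + 1)]
    best_m = int(np.argmax(scores)) + 1  # argmax picks the first (fewest-feature) maximum
    return order[:best_m], scores
```

Because `np.argmax` returns the first maximum, ties are resolved in favour of the smaller feature set, mirroring the preference for fewer features applied when comparing configurations.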

2.4. Classification using LS-SVM
LS-SVM, developed by Suykens et al., is a modification of the standard SVM [24]. LS-SVM uses equality constraints instead of inequality constraints and therefore solves a set of linear equations instead of a quadratic programming problem [25], which reduces the computational load compared with the standard SVM. In this paper, the LS-SVM classifier is used for the recognition of epileptic EEG signals.

Given a training set of $l$ data pairs $\{x_i, y_i\}_{i=1}^{l}$, where $x_i \in \mathbb{R}^n$ is the $i$th input vector and $y_i \in \{-1, +1\}$ is the corresponding output, the separating hyperplane in feature space is represented as [26,27]:

$$y(x) = \omega^T\varphi(x) + b \tag{12}$$

where $\omega$ is the weight vector, $b$ is the bias term and $\varphi(\cdot)$ is the mapping function. The optimization problem of LS-SVM can then be formulated as:

$$\min_{\omega, b, \xi} J(\omega, \xi) = \frac{1}{2}\omega^T\omega + \frac{1}{2}\gamma\sum_{i=1}^{l}\xi_i^2 \quad \text{subject to} \quad y_i\left[\omega^T\varphi(x_i) + b\right] = 1 - \xi_i,\; i = 1, \ldots, l \tag{13}$$

where $\gamma$ is the regularization parameter and $\xi_i$ is the fitting error. The Lagrange function can be constructed as:

$$L(\omega, b, \xi, \alpha) = J(\omega, \xi) - \sum_{i=1}^{l}\alpha_i\left\{y_i\left[\omega^T\varphi(x_i) + b\right] - 1 + \xi_i\right\} \tag{14}$$

where $\alpha_i$ is the Lagrange multiplier. Setting the partial derivatives of Eq. (14) with respect to each variable to zero and eliminating $\omega$ and $\xi_i$ converts the optimization problem into the following linear system:

$$\begin{bmatrix} 0 & Q^T \\ Q & PP^T + \gamma^{-1}I \end{bmatrix}\begin{bmatrix} b \\ \alpha \end{bmatrix} = \begin{bmatrix} 0 \\ \mathbf{1} \end{bmatrix} \tag{15}$$

where $P = [\varphi(x_1)y_1, \ldots, \varphi(x_l)y_l]^T$, so that $(PP^T)_{ij} = y_i y_j K(x_i, x_j)$; $\mathbf{1} = [1, 1, \ldots, 1]^T$; $Q = [y_1, \ldots, y_l]^T$; and $\alpha = [\alpha_1, \ldots, \alpha_l]^T$. The prediction model of LS-SVM is then:

$$f(x) = \operatorname{sign}\left(\sum_{i=1}^{l}\alpha_i y_i K(x, x_i) + b\right) \tag{16}$$

where $K(x, x_i) = \varphi(x)^T\varphi(x_i)$ is the kernel function. This study adopts the Radial Basis Function (RBF) kernel for classification, which involves two parameters, the regularization parameter $\gamma$ and the kernel width $\sigma$ [28]. Because parameter selection strongly affects performance, a grid search is used to obtain suitable parameter values.

2.5. Classifier performance assessment
To be rigorous, three measures, namely sensitivity, specificity and classification accuracy, are introduced to investigate the performance of LS-SVM. Their definitions are briefly recapitulated below [13]:

$$\text{Sensitivity (Sen)} = \frac{\text{True positives}}{\text{True positives} + \text{False negatives}} \times 100\%$$

$$\text{Specificity (Spe)} = \frac{\text{True negatives}}{\text{True negatives} + \text{False positives}} \times 100\%$$

$$\text{Accuracy (Acc)} = \frac{\text{Correctly classified samples}}{\text{Total number of samples}} \times 100\% \tag{17}$$
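A minimal NumPy sketch of the binary LS-SVM training system and decision rule of Eqs. (15)–(16) is given below. It assumes the RBF form $K(x, x_i) = \exp(-\|x - x_i\|^2 / (2\sigma^2))$, with the parameter `sigma2` standing for $\sigma^2$; the exact parameterization in [28] and the grid-searched values of $\gamma$ and $\sigma$ are not reproduced. How the binary classifier was extended to the three classes is not described in the text, so that step (for example one-vs-one or one-vs-rest voting) is left out, and the function names are hypothetical.

```python
import numpy as np


def rbf_kernel(A, B, sigma2):
    """K[i, j] = exp(-||A_i - B_j||^2 / (2 * sigma2)) (assumed RBF parameterization)."""
    sq = np.sum(A**2, axis=1)[:, None] + np.sum(B**2, axis=1)[None, :] - 2.0 * A @ B.T
    return np.exp(-np.maximum(sq, 0.0) / (2.0 * sigma2))


def lssvm_fit(X, y, gamma, sigma2):
    """Solve the (l+1) x (l+1) linear system of Eq. (15) for the bias b and multipliers alpha."""
    n_train = len(y)
    yy_kernel = np.outer(y, y) * rbf_kernel(X, X, sigma2)  # (PP^T)_ij = y_i y_j K(x_i, x_j)
    A = np.zeros((n_train + 1, n_train + 1))
    A[0, 1:] = y                                     # Q^T
    A[1:, 0] = y                                     # Q
    A[1:, 1:] = yy_kernel + np.eye(n_train) / gamma  # PP^T + gamma^{-1} I
    rhs = np.concatenate(([0.0], np.ones(n_train)))  # [0; 1]
    solution = np.linalg.solve(A, rhs)
    return solution[0], solution[1:]


def lssvm_predict(X_train, y_train, b, alpha, sigma2, X_new):
    """Eq. (16): sign(sum_i alpha_i y_i K(x, x_i) + b) for each row x of X_new."""
    K = rbf_kernel(X_new, X_train, sigma2)
    return np.sign(K @ (alpha * y_train) + b)
```

Training reduces to a single call to `np.linalg.solve`, which is the computational advantage over a QP-based SVM noted above; the first row of the system enforces $\sum_i \alpha_i y_i = 0$.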

3. Experimental results

3.1. Feature extraction
The proposed scheme for automated diagnosis of epilepsy is implemented in Matlab 2016b and run on a machine with a 3.40 GHz AMD CPU and 4 GB RAM. Initially, the EEG is band-limited to 0–60 Hz using a Butterworth low-pass filter, since EEG signals contain little useful information above 60 Hz. The filtered signals are decomposed by WPT up to 4 levels, and we focus our analysis on the influence of different wavelet families and orders. Eight wavelet bases (WB), namely Daubechies (Db) of orders 4, 6 and 8, coiflets (Coif) of orders 1 and 3, and Symlets (Sym) of orders 4–6, are considered in the experiments in this part. To enrich the discriminating capability of the eigenvectors, we apply KDE to both the raw EEG and the obtained sub-bands to estimate their probability densities with each of the four kernel functions (Un, Tr, Ep, Ga). For lack of space, we plot only the PDF of each segment for the case of Coif1 and the Un kernel, as depicted in Fig. 3. Notice that there are significant differences among the PDFs of the three data sets. In general, the distribution of set F is concentrated in a smaller range and has higher peak values, whereas the distribution of set S spans a wider range with smaller peak values. Hence, KDE brings out statistical patterns in the EEG, making the recordings appear more regular. After KDE, five well-known statistics, namely Max, Ske, Kur, En, and CM, are calculated over the induced PDF components as features for 3-class EEG discrimination, generating 85 values for each eigenvector before feature selection. The box plots in Fig. 4 show the discrimination provided by En, CM, Kur, Ske and Max for sets Z, F and S.
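The preprocessing chain described above (a 60 Hz Butterworth low-pass filter followed by a 4-level WPT into the 16 sub-bands AAAA, …, DDDD) might look like the following sketch. The filter order and the zero-phase `filtfilt` application are assumptions not stated in the text, and the Haar filter pair is used only because it fits in a few lines; the Db, Sym and Coif bases compared in the paper would normally come from a wavelet library such as PyWavelets (`pywt.WaveletPacket`). Sub-bands are returned in the lexicographic A/D order listed in Section 2.2.3, which is not frequency order.

```python
import numpy as np
from scipy.signal import butter, filtfilt

FS = 173.61  # sampling rate of the Bonn recordings (Hz), Section 2.1


def lowpass_60hz(x, fs=FS, order=4):
    """Zero-phase Butterworth low-pass with a 60 Hz cutoff (order and filtfilt are assumptions)."""
    b, a = butter(order, 60.0, btype="low", fs=fs)
    return filtfilt(b, a, x)


def haar_wpt(x, level=4):
    """Full wavelet packet tree; returns {'AAAA': coeffs, ..., 'DDDD': coeffs}.

    Haar filters stand in for the Db/Sym/Coif bases evaluated in the paper.
    Trailing samples beyond a multiple of 2**level are dropped so every split is even.
    """
    n = len(x) - len(x) % (2 ** level)
    nodes = {"": np.asarray(x[:n], dtype=float)}
    for _ in range(level):
        children = {}
        for path, c in nodes.items():
            even, odd = c[0::2], c[1::2]
            children[path + "A"] = (even + odd) / np.sqrt(2.0)  # low-pass (approximation)
            children[path + "D"] = (even - odd) / np.sqrt(2.0)  # high-pass (detail)
        nodes = children
    return nodes
```

For a 4097-sample Bonn segment, `haar_wpt(lowpass_60hz(segment))` returns 16 arrays of 256 coefficients; together with the filtered segment itself they form the 17 signals whose densities feed Eq. (11).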
With the goal of selecting a prominent feature combination, the extracted features are subjected to statistical tests such as ANOVA to assess the discrimination potential of each of them.

3.2. Statistical validation on extracted features
In the feature selection step, features are ranked by F-statistic and fed to the LS-SVM classifier one at a time. The F-statistics of the extracted features are shown in Fig. 5, with features indexed in the order of Eq. (11). Notice that CM obtains relatively high F values, suggesting that it contributes most to epilepsy diagnosis. With salient features, better performance can be achieved in machine learning and data mining tasks based on supervised learning of LS-SVM models. Here, the LS-SVM classifier is evaluated with two data allocation methods: a 10-fold cross-validation technique and a standard 50-50 methodology. In 10-fold cross-validation, the data set is partitioned into 10 subsets of equal size; one subset is reserved as the testing set and the rest are used as training sets. This process is repeated 10 times, with each of the 10 subsets used exactly once as the testing data. In the standard 50-50 methodology, 50% of the samples are used for training the model and the remaining 50% for testing. LS-SVM first learns the training parameters that best predict the class of the input sample; the testing set is then passed to this trained model to be classified into one of the three classes. The features are added one by one in decreasing order of their F-statistic and fed to LS-SVM. Taking Coif1 and Un as an example, the classification performance with an increasing number of features is shown in Fig. 6. From Fig. 6, we find that there is no significant change in Sen when the number of features is larger than 6, and the values of Acc and Sen show obvious fluctuations. For both 10-fold cross-validation and the standard 50-50 methodology, the configuration yielding the highest Acc with the fewest features is selected for comprehensive analysis.

Fig. 3. KDE analysis of the sample EEGs and their corresponding sub-bands.

Fig. 4. Box plots of the features extracted from the induced PDF components.
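The two data allocation schemes and the measures of Eq. (17) can be sketched as below. Several details are assumptions rather than statements from the text: the folds are stratified by class, the labels are encoded as Z = 0, F = 1, S = 2 with the ictal class S treated as positive for Sen and Spe (the text does not say how these two-class measures are applied to three classes), the ± values are read as the standard deviation over folds, and `nearest_centroid` is only a placeholder for the multi-class LS-SVM.

```python
import numpy as np


def stratified_folds(y, k=10, seed=0):
    """Split sample indices into k folds with (roughly) equal class proportions."""
    rng = np.random.default_rng(seed)
    folds = [[] for _ in range(k)]
    for c in np.unique(y):
        idx = rng.permutation(np.flatnonzero(y == c))
        for i, part in enumerate(np.array_split(idx, k)):
            folds[i].extend(part.tolist())
    return [np.array(f, dtype=int) for f in folds]


def sen_spe_acc(y_true, y_pred, positive):
    """Eq. (17) in percent; Sen/Spe treat `positive` as the positive class (an assumption)."""
    is_pos, pred_pos = (y_true == positive), (y_pred == positive)
    tp = np.sum(pred_pos & is_pos)
    fn = np.sum(~pred_pos & is_pos)
    tn = np.sum(~pred_pos & ~is_pos)
    fp = np.sum(pred_pos & ~is_pos)
    acc = np.mean(y_pred == y_true)
    return 100.0 * np.array([tp / (tp + fn), tn / (tn + fp), acc])


def cross_validate(X, y, fit_predict, k=10, positive=2, seed=0):
    """k-fold CV: each fold is the test set exactly once, the other k-1 folds train."""
    folds = stratified_folds(y, k, seed)
    scores = []
    for i, test_idx in enumerate(folds):
        train_idx = np.concatenate([f for j, f in enumerate(folds) if j != i])
        y_pred = fit_predict(X[train_idx], y[train_idx], X[test_idx])
        scores.append(sen_spe_acc(y[test_idx], y_pred, positive))
    scores = np.array(scores)
    return scores.mean(axis=0), scores.std(axis=0)  # mean and spread over folds


def split_50_50(y, seed=0):
    """Standard 50-50 methodology: half of each class trains, the other half tests."""
    train_idx, test_idx = stratified_folds(y, 2, seed)
    return train_idx, test_idx


def nearest_centroid(X_train, y_train, X_test):
    """Placeholder classifier standing in for the multi-class LS-SVM."""
    classes = np.unique(y_train)
    centroids = np.array([X_train[y_train == c].mean(axis=0) for c in classes])
    d = ((X_test[:, None, :] - centroids[None, :, :]) ** 2).sum(axis=2)
    return classes[np.argmin(d, axis=1)]
```

Swapping `nearest_centroid` for a function that trains and applies the LS-SVM on each fold, and wrapping `cross_validate` in the incremental feature loop, would reproduce the structure of the experiment summarized in Fig. 6.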

3.3. Classification performance on different KDE kernels and wavelet bases
Using the above procedure, the best performance obtained for each combination of wavelet basis and kernel function is recorded in Table 2. Some wavelet bases deliver the same accuracy; in such cases, the one requiring the fewest features is the preferable option. Following this guideline, the best Acc and the corresponding metrics are identified for each kernel function. The results in Table 2 show that the combination of Coif1 and the Un kernel achieves prominent results for both the 10-fold cross-validation technique and the standard 50-50 methodology. This demonstrates the


Fig. 5. F-statistic of the extracted features.

Fig. 6. Classification performance delivered by the combination of Coif1 and Un. (a) 10-fold cross-validation. (b) Standard 50-50 methodology.

usefulness of this combination in terms of both efficiency and effectiveness. For the 10-fold cross-validation technique, Db8 presented the second best average rate, reaching 99.33% with 30 features when configured with the Tr kernel. For the standard 50-50 methodology, the combination of Coif1 and Tr achieved a classification accuracy of 99.60% with 39 features, whereas the same accuracy is obtained by Coif1 and Un using only 26 features. These results indicate that the choice of wavelet basis and kernel function is a significant issue in EEG analysis, so finding suitable wavelet bases and kernel functions is of great importance. Another aspect worth noting is that the overall Spe delivered by the Ga kernel is relatively high compared with the other kernels, regardless of the wavelet basis. This suggests that the Ga kernel may be more appropriate for 2-class (normal vs. epileptic) EEG classification problems. Notice that Coif1 is in fact a good filter for characterizing EEG signals for epilepsy diagnosis, leading to the best results delivered by

the LS-SVM classifier. On the basis of these results, we choose Coif1 with the Un kernel as the feature extractor, achieving higher classification accuracy at lower computational cost than the other combinations.

3.4. Performance comparison with WPT-based features
To further evaluate the potential of applying KDE to epilepsy diagnosis, an additional experiment using features extracted directly from the WPT sub-bands is carried out for comparison with our WPT-KDE features. In this part, Coif1 and the Un kernel are fixed based on the results above. In addition, the computing time (CT) for one sample is analyzed. Table 3 shows the classification performance delivered by the two feature extractors. The results show that WPT-KDE outperforms the WPT features in terms of classification accuracy and number of features for all validation techniques. The WPT method


Table 2 Classification performance of the proposed method using different kernels and wavelet bases.
(Columns 3–6: 10-fold cross-validation; columns 7–10: standard 50-50 methodology.)

Kernel | WB | Sen (%) | Spe (%) | Acc (%) | Feature No. | Sen (%) | Spe (%) | Acc (%) | Feature No.
Un | Db4 | 99.40 ± 0.22 | 99.4 ± 0.89 | 98.87 ± 0.65 | 51 | 99.0 ± 0 | 99.60 ± 0.89 | 98.67 ± 0.67 | 34
Un | Db6 | 99.10 ± 0.22 | 99.0 ± 0 | 98.80 ± 0.18 | 38 | 99.0 ± 1.0 | 99.20 ± 1.10 | 98.80 ± 0.73 | 44
Un | Db8 | 99.60 ± 0.22 | 99.20 ± 0.84 | 99.13 ± 0.38 | 55 | 99.40 ± 0.89 | 98.80 ± 1.79 | 98.80 ± 0.56 | 33
Un | Sym4 | 99.30 ± 0.27 | 99.20 ± 0.45 | 98.93 ± 0.27 | 53 | 99.40 ± 0.89 | 98.80 ± 1.79 | 98.40 ± 1.01 | 35
Un | Sym5 | 99.40 ± 0.89 | 99.30 ± 0.27 | 98.93 ± 0.64 | 37 | 99.40 ± 0.89 | 98.0 ± 2.45 | 98.53 ± 0.56 | 27
Un | Sym6 | 99.30 ± 0.27 | 99.40 ± 0.55 | 98.53 ± 0.51 | 26 | 99.0 ± 0.71 | 99.60 ± 0.89 | 98.80 ± 0.87 | 35
Un | Coif1 | 99.40 ± 0.22 | 100 ± 0 | 99.40 ± 0.27 | 27 | 99.40 ± 0.55 | 100 ± 0 | 99.60 ± 0.37 | 26
Un | Coif3 | 99.20 ± 0.27 | 99.80 ± 0.45 | 98.87 ± 0.30 | 31 | 99.40 ± 0.89 | 100 ± 0 | 98.80 ± 0.56 | 24
Tr | Db4 | 99.50 ± 0 | 99.60 ± 0.55 | 99.0 ± 0.41 | 38 | 98.80 ± 0.45 | 99.60 ± 0.89 | 98.27 ± 0.60 | 41
Tr | Db6 | 99.20 ± 0.27 | 99.0 ± 0 | 98.73 ± 0.28 | 35 | 99.0 ± 0.71 | 100 ± 0 | 98.27 ± 0.37 | 62
Tr | Db8 | 99.70 ± 0.27 | 99.40 ± 0.89 | 99.33 ± 0.41 | 32 | 98.60 ± 1.14 | 100 ± 0 | 98.80 ± 0.99 | 41
Tr | Sym4 | 99.40 ± 0.22 | 99.40 ± 0.55 | 99.20 ± 0.18 | 31 | 99.40 ± 0.89 | 99.20 ± 1.10 | 98.93 ± 0.37 | 26
Tr | Sym5 | 99.40 ± 0.22 | 99.80 ± 0.45 | 99.13 ± 0.18 | 40 | 99.20 ± 0.84 | 99.60 ± 0.89 | 98.93 ± 0.37 | 69
Tr | Sym6 | 99.20 ± 0.27 | 98.80 ± 0.45 | 98.47 ± 0.30 | 20 | 99.60 ± 0.55 | 98.40 ± 1.67 | 98.13 ± 1.10 | 21
Tr | Coif1 | 99.20 ± 0.27 | 100 ± 0 | 99.27 ± 0.43 | 33 | 99.80 ± 0.45 | 99.20 ± 1.79 | 99.60 ± 0.60 | 39
Tr | Coif3 | 98.70 ± 0.27 | 100 ± 0 | 98.93 ± 0.28 | 37 | 99.20 ± 0.84 | 98.40 ± 2.19 | 98.53 ± 0.30 | 27
Ep | Db4 | 99.40 ± 0.22 | 99.40 ± 0.55 | 98.80 ± 0.51 | 41 | 99.20 ± 0.45 | 100 ± 0 | 98.80 ± 0.56 | 46
Ep | Db6 | 99.20 ± 0.27 | 99.40 ± 0.55 | 98.67 ± 0.24 | 30 | 99.20 ± 0.84 | 98.80 ± 1.10 | 98.40 ± 0.37 | 38
Ep | Db8 | 99.50 ± 0 | 99.80 ± 0.45 | 99.20 ± 0.18 | 30 | 98.80 ± 0.84 | 100 ± 0 | 98.80 ± 0.56 | 18
Ep | Sym4 | 99.30 ± 0.27 | 99.20 ± 0.45 | 99.0 ± 0.23 | 46 | 99.80 ± 0.45 | 99.20 ± 1.10 | 99.20 ± 0.87 | 37
Ep | Sym5 | 99.50 ± 0 | 99.60 ± 0.89 | 99.20 ± 0.38 | 58 | 98.60 ± 0.55 | 100 ± 0 | 98.80 ± 0.30 | 24
Ep | Sym6 | 99.20 ± 0.27 | 99.80 ± 0.45 | 98.53 ± 0.38 | 38 | 100 ± 0 | 99.20 ± 1.10 | 98.80 ± 1.19 | 21
Ep | Coif1 | 99.0 ± 0 | 100 ± 0 | 99.13 ± 0.18 | 35 | 99.0 ± 1.0 | 99.60 ± 0.89 | 98.67 ± 0.67 | 44
Ep | Coif3 | 99.80 ± 0.45 | 99.0 ± 0.35 | 98.87 ± 0.45 | 32 | 99.60 ± 0.55 | 97.60 ± 1.67 | 98.40 ± 1.12 | 21
Ga | Db4 | 99.4 ± 0.22 | 99.80 ± 0.45 | 99.0 ± 0.41 | 40 | 98.80 ± 0.84 | 100 ± 0 | 98.80 ± 0.56 | 38
Ga | Db6 | 99.10 ± 0.22 | 98.80 ± 0.45 | 98.60 ± 0.43 | 34 | 99.0 ± 0.71 | 100 ± 0 | 98.80 ± 0.87 | 57
Ga | Db8 | 99.70 ± 0.27 | 99.60 ± 0.55 | 99.33 ± 0.47 | 35 | 99.40 ± 0.55 | 100 ± 0 | 99.07 ± 0.76 | 35
Ga | Sym4 | 99.20 ± 0.27 | 99.60 ± 0.55 | 99.0 ± 0.33 | 35 | 99.20 ± 0.84 | 100 ± 0 | 99.20 ± 0.73 | 30
Ga | Sym5 | 99.50 ± 0 | 99.60 ± 0.55 | 99.13 ± 0.30 | 36 | 99.20 ± 0.84 | 100 ± 0 | 98.40 ± 1.01 | 27
Ga | Sym6 | 99.10 ± 0.22 | 99.80 ± 0.45 | 98.60 ± 0.28 | 30 | 99.80 ± 0.45 | 99.20 ± 1.10 | 98.80 ± 0.56 | 53
Ga | Coif1 | 99.20 ± 0.27 | 99.80 ± 0.45 | 99.13 ± 0.18 | 48 | 99.40 ± 0.89 | 100 ± 0 | 98.83 ± 1.53 | 34
Ga | Coif3 | 99.10 ± 0.22 | 99.80 ± 0.45 | 99.0 ± 0.24 | 32 | 98.80 ± 0.84 | 99.20 ± 1.09 | 97.73 ± 0.37 | 34

Table 3 Classification performance delivered by two feature extractors.
(Columns 2–6: 10-fold cross-validation; columns 7–11: standard 50-50 methodology.)

Feature | Sen (%) | Spe (%) | Acc (%) | No. features | CT (ms) | Sen (%) | Spe (%) | Acc (%) | No. features | CT (ms)
WPT | 99.50 | 99.40 | 98.87 | 31 | 0.230 | 99.0 | 99.20 | 98.53 | 38 | 0.103
WPT-KDE | 99.40 | 100 | 99.40 | 27 | 0.249 | 99.40 | 100 | 99.60 | 26 | 0.108

provides accuracies of 98.87% and 98.53% for 10-fold cross-validation and the standard 50-50 methodology, respectively. The performance of the recognition model improves significantly once KDE is integrated into this framework as a feature extractor, showing that features extracted in the WPT-KDE domain are remarkably superior to those in the WPT domain. Furthermore, the CT of the two methods is very similar, which implies that our proposed method achieves better results without increasing the computational burden. Notably, the CT of our method is below 0.3 ms per sample. Finally, the generalization error of our model (0.27%) indicates its good generalization ability and robustness. We have thus confirmed empirically that KDE is a good option for achieving high accuracy in discriminating epileptic from non-epileptic profiles.

4. Discussion
In this section, we compare the proposed method with the techniques reported in the literature. To ensure a fair comparison, we only reviewed studies that used the same data and the same classification problem. A summary of the existing methods is listed in Table 4. Kaya et al. [29] proposed an effective approach based on the one-dimensional local binary pattern (1D-LBP), and an accuracy of 95.67% for 10-fold cross-validation was obtained by the BayesNet classifier.

Acharya et al. [30] presented a technique that used four different entropies in combination with a fuzzy classifier for automated detection of epilepsy, achieving an accuracy of 98.10% in detecting epileptic signals. The same group [31] classified the EEG into three classes using higher order spectra (HOS) and textures extracted from the continuous wavelet transform (CWT) with SVM, obtaining 96.0% accuracy. Riaz et al. [32] used empirical mode decomposition (EMD) to process EEG and extracted spectral features from the intrinsic mode functions (IMF); the best performance, about 84.0% accuracy, was obtained in combination with SVM. In our previous work [33], a method was developed for detecting normal, interictal and epileptic signals using wavelet-based envelope analysis (EA) and a neural network ensemble (NNE), obtaining an accuracy of 98.78% with the standard 50-50 methodology. Our group [34] also classified the three classes automatically using the frequency slice wavelet transform (FSWT) and SVM, achieving an accuracy of 98.33%. Orhan et al. [35] combined WT and the K-means algorithm for feature extraction; using the computed probability distributions, an accuracy of 96.67% was reported with a multi-layer perceptron neural network (MLPNN) model. In a study by Martis et al. [36], a classification accuracy of 96.0% was obtained using EMD and the C4.5 classifier with 10-fold cross-validation.


Table 4 Classification accuracy compared with other existing methods with the same epileptic EEG data sets.

Authors | Method | Validation method | Acc (%)
Kaya et al. [29] | 1D-local binary pattern + BayesNet | 10-fold cross-validation | 95.67
Acharya et al. [30] | Entropies + Fuzzy classifier | 10-fold cross-validation | 98.10
Acharya et al. [31] | CWT + SVM | 10-fold cross-validation | 96.0
Riaz et al. [32] | EMD + SVM | 10-fold cross-validation | 84.0
Martis et al. [36] | EMD + C4.5 | 10-fold cross-validation | 95.33
Proposed method | WPT-KDE + LS-SVM | 10-fold cross-validation | 99.40
Li et al. [33] | DWT-based EA + NNE | 50-50 | 98.78
Zhang et al. [34] | FSWT + SVM | 50-50 | 98.33
Orhan et al. [35] | WT + k-means clustering + MLPNN | 50-50 | 96.67
Proposed method | WPT-KDE + LS-SVM | 50-50 | 99.60

In the present paper, a new framework based on KDE and LS-SVM is developed for the classification of normal, interictal and ictal EEG. The comparison shows that our approach achieves superior performance with both the standard 50-50 methodology and 10-fold cross-validation. The salient points of this research are listed below:
(1) One novelty of our paper is the application of KDE, which yields a considerable enhancement in accuracy compared with other methods. To the best of our knowledge, such a framework has not previously been developed in this field. KDE estimates the data distribution without prior knowledge of its form, which avoids the influence of an assumed model and its parameter estimation. Hence, by employing KDE, the distribution of the coefficients in the WPT domain can be estimated directly, exposing more representative information and data characteristics.
(2) Five statistical features are used to characterize the EEG instead of complicated non-linear features. The features extracted in the combined WPT-KDE domain provide better discrimination ability without increasing the computational cost, so the entire process is computationally simple and fast. Moreover, to draw reliable and precise conclusions, a ranking method is adopted for feature selection and the LS-SVM classifier is evaluated with two widely used validation techniques.
(3) Our major contribution is that the influence of different wavelet bases and kernel functions is taken into account. To this end, extensive experiments are conducted to find the combination that classifies the signals with higher accuracy and fewer features. According to the results, the combination of Coif1 and Un yields superior results with few features that can be extracted at low computational cost.
In summary, our method strikes a trade-off between performance and efficiency, which makes it feasible to extend to real-time EEG monitoring.

5. Conclusion

Subjective visual inspection of large volumes of EEG recordings is a complex problem. To address it, a novel automatic approach based on KDE and LS-SVM is proposed for seizure detection. In this paper, we have explored the potential of KDE for EEG processing and combined WPT with KDE as the feature extractor. With the goal of developing suitable features to distinguish multi-class EEG signals, experiments have been conducted to analyze the effects of eight wavelet bases in combination with four kernel functions. Another distinctive aspect is that we compare the classification performance of the WPT-KDE model used in our framework with that of the WPT model alone. We find that applying KDE yields a considerable improvement in accuracy. The LS-SVM classifier configured with coif1 and Un has yielded satisfactory results with fewer features that can be extracted at low computational cost. The implementation results indicate that the proposed scheme is more precise, more effective and easy to implement. This technique is therefore worth deploying in a real-time clinical setting, where it can help physicians to confirm cases and decide on the course of treatment. In the future, we will validate the method on a larger database and intend to develop an online application.

Conflict of interest

None.

Acknowledgments

This work is supported by the Fundamental Research Funds for the Central Universities (Grant No. 451170301193) and the Natural Science Foundation for Science and Technology Development Plan of Jilin Province, China (Grant No. 20150101191JC).

References

[1] J. Li, W. Zhou, S. Yuan, Y. Zhang, C. Li, Q. Wu, An improved sparse representation over learned dictionary method for seizure detection, Int. J. Neural Syst. 26 (1) (2016) 1550035.
[2] S. Ghosh-Dastidar, H. Adeli, N. Dadmehr, Mixed-band wavelet-chaos-neural network methodology for epilepsy and epileptic seizure detection, IEEE Trans. Biomed. Eng. 54 (9) (2007) 1545–1551.
[3] M. Peker, B. Sen, D. Delen, A novel method for automated diagnosis of epilepsy using complex-valued classifiers, IEEE J. Biomed. Health Inf. 20 (1) (2015) 108–118.
[4] Y. Song, J. Zhang, Discriminating preictal and interictal brain states in intracranial EEG by sample entropy and extreme learning machine, J. Neurosci. Methods 257 (2016) 45–54.
[5] J.L. Song, W. Hu, R. Zhang, Automated detection of epileptic EEGs using a novel fusion feature and extreme learning machine, Neurocomputing 175 (2015) 383–391.
[6] T.S. Kumar, V. Kanhangad, R.B. Pachori, Classification of seizure and seizure-free EEG signals using local binary patterns, Biomed. Signal Process. Control 15 (2015) 33–40.
[7] P. Swami, T.K. Gandhi, B.K. Panigrahi, M. Tripathi, S. Anand, A novel robust diagnostic model to detect seizures in electroencephalography, Expert Syst. Appl. 56 (C) (2016) 116–130.
[8] S. Altunay, Z. Telatar, O. Erogul, Epileptic EEG detection using the linear prediction error energy, Expert Syst. Appl. 37 (8) (2010) 5661–5665.
[9] K. Polat, S. Gunes, Classification of epileptiform EEG using a hybrid system based on decision tree classifier and fast Fourier transform, Appl. Math. Comput. 187 (2) (2007) 1017–1026.
[10] A. Subasi, EEG signal classification using wavelet feature extraction and a mixture of expert model, Expert Syst. Appl. 32 (4) (2007) 1084–1093.
[11] Q. Yuan, W. Zhou, S. Li, D. Cai, Epileptic EEG classification based on extreme learning machine and nonlinear features, Epilepsy Res. 96 (1–2) (2011) 29–38.
[12] H. Ocak, Automatic detection of epileptic seizures in EEG using discrete wavelet transform and approximate entropy, Expert Syst. Appl. 36 (2) (2009) 2027–2036.
[13] Y. Kumar, M.L. Dewal, R.S. Anand, Epileptic seizure detection using DWT based fuzzy approximate entropy and support vector machine, Neurocomputing 133 (8) (2014) 271–279.

[14] R.J. Martis, J.H. Tan, C.K. Chua, T.C. Loon, S.Y.W. Jie, L. Tong, Epileptic EEG classification using nonlinear parameters on different frequency bands, J. Mech. Med. Biol. 15 (3) (2015) 1550040.
[15] U.R. Acharya, S.V. Sree, A.P.C. Alvin, J.S. Suri, Use of principal component analysis for automatic classification of epileptic EEG activities in wavelet framework, Expert Syst. Appl. 39 (10) (2012) 9072–9078.
[16] R.G. Andrzejak, K. Lehnertz, F. Mormann, C. Rieke, P. David, C.E. Elger, Indications of nonlinear deterministic and finite-dimensional structures in time series of brain electrical activity: dependence on recording region and brain state, Phys. Rev. E 64 (6) (2001) 061907.
[17] R.R. Coifman, M.V. Wickerhauser, Entropy-based algorithms for best basis selection, IEEE Trans. Inf. Theory 38 (1992) 713–718.
[18] H.Z. Hosseinabadi, R. Amirfattahi, B. Nazari, H.P. Mirdamadi, S.A. Atashipour, GUW-based structural damage detection using WPT statistical features and multiclass SVM, Appl. Acoust. 86 (8) (2014) 59–70.
[19] Z. Tao, W.Z. Chen, M.Y. Li, Recognition of epilepsy electroencephalography based on AdaBoost algorithm, Acta Phys. Sin. 64 (12) (2015) 128701.
[20] C. Tai, Y.H. Liu, A robust estimator for structure from motion based on kernel density estimation, in: International Conference on Intelligent Robots and Systems (IROS 2006), Beijing, China, October 9–15, 2006, pp. 1298–1303.
[21] M. Rosenblatt, Remarks on some nonparametric estimates of a density function, Ann. Math. Stat. 27 (3) (1956) 832–837.
[22] S. Noshadi, V. Abootalebi, M.T. Sadeghi, M.S. Shahvazian, Selection of an efficient feature space for EEG-based mental task discrimination, Biocybern. Biomed. Eng. 34 (3) (2014) 159–168.
[23] A.M. Brown, A new software for carrying out one-way ANOVA post hoc tests, Comput. Methods Programs Biomed. 79 (1) (2005) 89–95.
[24] J.A.K. Suykens, J. Vandewalle, Least squares support vector machine classifiers, Neural Process. Lett. 9 (1999) 293–300.
[25] S. Siuly, Y. Li, Designing a robust feature extraction method based on optimum allocation and principal component analysis for epileptic EEG signal classification, Comput. Methods Programs Biomed. 119 (1) (2015) 29–42.


[26] H. Xu, G. Chen, An intelligent fault identification method of rolling bearings based on LSSVM optimized by improved PSO, Mech. Syst. Signal Process. 35 (1–2) (2013) 167–175.
[27] C. Lu, J. Chen, R. Hong, Y. Feng, Y. Li, Degradation trend estimation of slewing bearing based on LSSVM model, Mech. Syst. Signal Process. 76–77 (2016) 353–366.
[28] S. Ismail, A. Shabri, R. Samsudin, A hybrid model of self-organizing maps (SOM) and least square support vector machine (LSSVM) for time-series forecasting, Expert Syst. Appl. 38 (8) (2011) 10574–10578.
[29] Y. Kaya, M. Uyar, R. Tekin, S. Yildirim, 1D-local binary pattern based feature extraction for classification of epileptic EEG signals, Appl. Math. Comput. 243 (2014) 209–219.
[30] U.R. Acharya, F. Molinari, S.V. Sree, S. Chattopadhyay, K. Ng, J.S. Suri, Automated diagnosis of epileptic EEG using entropies, Biomed. Signal Process. Control 7 (4) (2012) 401–408.
[31] U.R. Acharya, R. Yanti, J.W. Zheng, Automated diagnosis of epilepsy using CWT, HOS and texture parameters, Int. J. Neural Syst. 23 (3) (2013) 1001–1007.
[32] F. Riaz, A. Hassan, S. Rehman, I.K. Niazi, K. Dremstrup, EMD based temporal and spectral features for the classification of EEG signals using supervised learning, IEEE Trans. Neural Syst. Rehabil. Eng. 24 (1) (2015) 28–35.
[33] M. Li, W. Chen, T. Zhang, Classification of epilepsy EEG signals using DWT-based envelope analysis and neural network ensemble, Biomed. Signal Process. Control 31 (2017) 357–365.
[34] T. Zhang, W. Chen, M. Li, Automatic seizure detection of electroencephalogram signals based on frequency slice wavelet transform and support vector machine, Acta Phys. Sin. 65 (3) (2016) 038703.
[35] U. Orhan, M. Hekim, M. Ozer, EEG signals classification using the K-means clustering and a multilayer perceptron neural network model, Expert Syst. Appl. 38 (10) (2011) 13475–13481.
[36] R.J. Martis, U.R. Acharya, J.H. Tan, A. Petznick, Application of empirical mode decomposition (EMD) for automated detection of epilepsy using EEG signals, Int. J. Neural Syst. 22 (6) (2012) 809–827.