Expert Systems with Applications 38 (2011) 10425–10436
Automatic feature extraction using genetic programming: An application to epileptic EEG classification

Ling Guo, Daniel Rivero, Julián Dorado, Cristian R. Munteanu, Alejandro Pazos
Department of Information Technologies and Communications, University of La Coruña, Campus Elviña, 15071 A Coruña, Spain
Keywords: Genetic programming; Feature extraction; K-nearest neighbor classifier (KNN); Discrete wavelet transform (DWT); Epilepsy; EEG classification
Abstract: This paper applies genetic programming (GP) to perform automatic feature extraction from an original feature database, with the aim of improving the discriminatory performance of a classifier while simultaneously reducing the input feature dimensionality. The tree structure of GP naturally represents the features, and a new function introduced in this work automatically decides the number of features extracted. In experiments on two common epileptic EEG detection problems, the classification accuracy on the GP-based features is significantly higher than on the original features, while the dimension of the input features for the classifier is much smaller than that of the original features. © 2011 Elsevier Ltd. All rights reserved.
1. Introduction

Feature extraction for classification seeks a transformation or mapping from the original features to a new feature space that maximizes the separability of the different classes. A classification problem cannot be properly solved if important interactions and relationships between the original features are not taken into consideration. Thus, many researchers agree that feature extraction is key to any pattern recognition and classification problem: ''The precise choice of features is perhaps the most difficult task in pattern recognition'' (Micheli-Tzanakou, 2000); ''an ideal feature extraction would yield a representation that makes the job of the classifier trivial'' (Duda, Hart, & Stork, 2001). In most cases, feature extraction is done by humans, based on the researcher's knowledge, experience, and/or intuition.

Epilepsy is a neurological disorder. About 40 to 50 million people in the world suffer from epilepsy (Kandel, Schwartz, & Jessell, 2000). The defining characteristic of epilepsy is recurrent seizures. Epilepsy can have several physical, psychological and social consequences, including mood disorders, injuries and sudden death. To date, the specific cause of epilepsy in individuals is often unknown and the mechanisms behind seizures are little understood; thus, efforts toward its diagnosis and treatment are of great importance.

This work has a multidisciplinary nature, combining evolutionary computation, feature extraction and biomedical signal processing. It applies genetic programming (GP) to automatic feature extraction with the purpose of improving the discrimination performance of a K-nearest neighbor (KNN) classifier and decreasing the input feature dimension. The tree structure of GP

* Corresponding author. Tel.: +34 981 167000x1302; fax: +34 981 167160. E-mail address:
[email protected] (L. Guo). doi:10.1016/j.eswa.2011.02.118
naturally lends itself to representing new features, and a new function introduced in this work enables automatic feature extraction. The novel method is validated on epileptic EEG classification problems: classification accuracy is greatly increased with the GP-extracted features, while the input feature dimension is drastically reduced, down to three or four features for epileptic EEG discrimination. In addition, by analyzing the expressions of the GP-extracted features, informative measures useful for EEG classification can be identified among the original features.

The paper is organized as follows. In Section 2, an introduction to EEG and epilepsy is presented to allow a general understanding of the nature of the application. Then, the discrete wavelet transform (DWT), an efficient non-stationary signal processing tool, is briefly described. Next, some basic aspects of genetic programming are presented, since this is the key technique of the proposed method; a short description of the K-nearest neighbor classifier is included in the same part. Previous work on genetic programming applied to feature extraction and on epileptic EEG classification is also reviewed in this section. The epileptic EEG classification problems considered in this work are given in Section 3. Section 4 covers the detailed explanation of the developed methodology. Subsequently, the implementation and results of the proposed methodology on two different epileptic EEG classification problems are discussed. Finally, conclusions and future work are presented.
2. State of the art

2.1. Epilepsy and electroencephalogram (EEG)

Epilepsy is the second most prevalent neurological disorder in humans after stroke. It is characterized by recurrent seizures in
which abnormal electrical activity in the brain causes altered perception or behavior. Approximately one in every 100 persons will experience a seizure at some time in their life. To date, the occurrence of an epileptic seizure remains unpredictable and its course of action is little understood.

Electroencephalography is the recording of the electrical activity of the brain, usually taken through several electrodes on the scalp. The EEG contains much valuable information relating to the different physiological states of the brain and is thus a very useful tool for understanding brain diseases such as epilepsy. The detection of epileptiform discharges occurring in the EEG is an important component in the diagnosis and treatment of epilepsy (Subasi, 2005a). However, EEG recordings usually contain huge amounts of data, and visual inspection for discriminating EEGs is a time-consuming and costly process. Much effort has therefore been devoted to developing epileptic EEG classification techniques.
Mallat (1989) developed an efficient way of implementing the DWT by passing the signal through a series of low-pass and high-pass filters. The DWT implementation procedure is shown schematically in Fig. 1, where the filters h[n] and g[n] correspond to high-pass and low-pass filters, respectively (Subasi, 2007). In the first stage, the signal is simultaneously passed through the h[n] and g[n] filters, whose cut-off frequency is one fourth of the sampling frequency. The outputs of the h[n] and g[n] filters are referred to as the detail (D1) and approximation (A1) coefficients of the first level, respectively. The same procedure is repeated on the first-level approximation coefficients to obtain the second-level coefficients. At each stage of this decomposition process, the frequency resolution is doubled through filtering and the time resolution is halved through downsampling. The coefficients A1, D1, A2, and D2 represent the frequency content of the original signal within the bands 0–fS/4, fS/4–fS/2, 0–fS/8, and fS/8–fS/4, respectively, where fS is the sampling frequency of the original signal x[n].
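The filter-bank procedure above can be sketched with the Haar wavelet, the simplest possible choice of h[n] and g[n]; the paper does not commit to a particular wavelet at this point, so Haar here is purely illustrative:

```python
# One level of Mallat's filter-bank DWT, using Haar filters as the
# simplest concrete example (illustrative; the paper does not fix the
# wavelet here). g = low-pass -> approximation; h = high-pass -> detail.
import math

def haar_dwt_level(x):
    """Split x (even length assumed) into one level of approximation and
    detail coefficients: filter with Haar low/high-pass, downsample by 2."""
    s = 1.0 / math.sqrt(2.0)
    approx = [(x[i] + x[i + 1]) * s for i in range(0, len(x) - 1, 2)]
    detail = [(x[i] - x[i + 1]) * s for i in range(0, len(x) - 1, 2)]
    return approx, detail

def haar_dwt(x, levels):
    """Repeated decomposition of the approximation branch, as in Fig. 1.
    Returns [A_levels, D_levels, ..., D2, D1]."""
    coeffs = []
    a = list(x)
    for _ in range(levels):
        a, d = haar_dwt_level(a)
        coeffs.insert(0, d)   # details collected from finest to coarsest
    coeffs.insert(0, a)       # final approximation in front
    return coeffs
```

Each call halves the time resolution of the approximation branch, which is exactly the downsampling step described above: with sampling rate fS, D1 covers fS/4–fS/2, D2 covers fS/8–fS/4, and so on.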
2.2. Discrete wavelet transform

The EEG is a complicated, non-stationary signal whose characteristics are spatio-temporally dependent. Based on these properties, the DWT is chosen in this work for pre-analyzing the epileptic EEG signals (Adeli, Zhou, & Dadmehr, 2003). Recently, the wavelet transform (WT) has been widely applied in many engineering fields to solve various real-life problems. The Fourier transform (FT) of a signal gives the frequency content of the signal but eliminates time information. The short-time Fourier transform (STFT) is a series of FTs with a fixed window size. Because a large window loses time resolution and a short window loses frequency resolution, there is always a trade-off between time and frequency resolution: on the one hand, good time resolution requires a short window; on the other hand, good frequency resolution requires a long window. The fixed window size (fixed time–frequency resolution) of the STFT is a constraint in some applications. In contrast to the STFT, the WT provides a more flexible time–frequency representation of a signal by allowing the use of variable-sized analysis windows. The attractive feature of the WT is that it provides accurate frequency information at low frequencies and accurate time information at high frequencies (Adeli et al., 2003). This property is important in biomedical applications, because most signals in this field contain high-frequency components of short time duration and low-frequency components of long time duration. Through the wavelet transform, transient features are precisely captured and localized in both the time and frequency domains. The WT is commonly considered one of the most powerful tools for EEG signal analysis.

The continuous wavelet transform (CWT) of a signal S(t) is defined as the correlation between S(t) and the wavelet function \psi_{a,b}, as follows (Chui, 1992):
CWT(a, b) = |a|^{-1/2} \int_{-\infty}^{\infty} S(t) \psi^{*}((t - b)/a) dt,    (1)
2.3. Genetic programming

Genetic programming is an evolutionary technique used to create computer programs that represent approximate or exact solutions to a problem (Koza, 1992). GP works on the evolution of a population in which every individual represents a candidate solution for the problem to be solved. GP searches for the best solution through a process based on the theory of evolution (Darwin, 1864): starting from an initial population of randomly generated individuals, new individuals are produced from old ones over subsequent generations by means of crossover, selection and mutation operations. Following natural selection, good individuals have more chances of surviving into the next generation, so after successive generations the best-so-far individual is obtained as the final solution of the problem.

The GP encoding of solutions is tree-shaped, so the user must specify the terminals (leaves of the tree) and the functions (nodes capable of having descendants) that the evolutionary algorithm can use to build complex expressions. A fitness function for measuring the appropriateness of the individuals in the population must also be defined; its design is the most critical point in building a GP system. The wide application of GP across different environments, and its success, are due to its capability to adapt to numerous different problems. Although one of the most common applications of GP is the generation of mathematical expressions (Rivero, Rabuñal, Dorado, & Pazos, 2005), it has also been used in other fields such as rule generation (Bot & Langdon, 2000), filter design (Rabuñal et al., 2003), and classification (Espejo, Ventura, & Herrera, 2010).

2.4. K-nearest neighbor classifier

The K-nearest neighbor (KNN) classifier (Cover & Hart, 1967) is a nonparametric, nonlinear and relatively simple classifier. It classi-
where a and b are called the scale (reciprocal of frequency) and translation (time localization) parameters, respectively. When a and b are taken as discrete numbers defined on powers of two, as follows:
a_j = 2^j,  b_{j,k} = 2^j k,  j, k \in \mathbb{Z},    (2)
then the discrete wavelet transform is obtained and Eq. (1) becomes (Chui, 1992):
DWT(j, k) = 2^{-j/2} \int_{-\infty}^{\infty} S(t) \psi^{*}((t - 2^j k)/2^j) dt.    (3)

Fig. 1. Sub-band decomposition of the DWT implementation.
fies a new sample on the basis of measuring its ''distance'' to a number of patterns kept in memory. The class that the KNN classifier assigns to the new sample is decided by the patterns that most resemble it, i.e. those with the smallest distance to it. The distance function commonly used in the KNN classifier is the Euclidean distance. Instead of taking the single nearest sample, a majority vote among the k nearest neighbors is normally taken. The parameter k has to be selected in practice; in this work, k is set to 3.

2.5. Previous work on genetic programming applied to feature extraction

Raymer, Punch, Goodman, and Kuhn (1996) applied GP to improve KNN classifier performance without feature reduction. For each attribute, they evolved a tree that is a function of the attribute itself and zero or more constants. They applied this to a biochemistry database and obtained better classification performance than a similar system based on a genetic algorithm. Sherrah (1998) created a feature extraction system based on evolutionary computation. The individuals in his work are multi-trees, each tree encoding one feature, which means that the number of extracted features is determined beforehand. His system was not designed for the KNN classifier but for other classification systems, such as the minimum-distance-to-means classifier, the parallelepiped classifier and the Gaussian maximum likelihood classifier. Tackett (1993) developed a processing tree derived from GP for the classification of features extracted from images: measurements from the segmented images were weighted and combined through linear and non-linear operations, and if the resulting value of the tree was greater than zero, the object was classified as a target. Ebner and co-workers have evolved image processing operators using GP (Ebner & Rechnerarchitektur, 1998; Ebner & Zell, 1999).
They posed interest point (IP) detection as an optimization problem, attempting to evolve the Moravec operator (Ebner & Rechnerarchitektur, 1998) using genetic programming, and reported a 15% localization error between the interest points detected by the evolved operator and those obtained with the Moravec detector. Bot (2001) used GP to evolve new features for classification problems, adding them one at a time to a KNN classifier whenever the newly evolved feature improved the classification performance by more than a certain amount. Bot's approach is a greedy algorithm and therefore almost certainly sub-optimal. Harvey et al. (2002) evolved pipelined image processing operations to transform multi-spectral input synthetic aperture radar (SAR) image planes into a new set of image planes, and a conventional supervised classifier was used to label the transformed features. Training data were used to derive a Fisher linear discriminant, and GP was applied to find a threshold that reduces the output of the discriminant-finding phase to a binary image. However, the discriminability is constrained in the discriminant-finding phase, and GP is only used as a one-dimensional search tool to find a threshold. Kotani, Nakai, and Akazawa (1999) used GP to evolve polynomial combinations of raw features to be fed into a KNN classifier and demonstrated an improvement in classification accuracy; however, they assumed in advance that the features were polynomial expressions built from products and sums of the original patterns. Krawiec (2002) constructed a fixed-length decision vector using GP and proposed an extended method to protect 'useful' blocks during the evolution. This protection method, however, leads to over-fitting, as his own experiments showed. Krawiec's results showed that for some datasets his feature extraction method
actually produces worse classification performance than using the raw input data. Guo, Jack, and Nandi (2005) evolved features using GP in a condition monitoring task, although it is not clear whether the elements of the decision-variable vector were evolved at the same time or hand-selected after evolution. Firpi, Goodman, and Echauz (2006) developed artificial features without physical meaning by means of GP and then applied those features to a KNN classifier for predicting epileptic seizures. They evaluated the performance of the GP artificial features on IEEG data from seven patients and obtained satisfactory prediction accuracy; however, the maximum number of artificial features evolved through GP was predefined by the authors. Recently, Sabeti, Katebi, and Boostani (2009) employed GP to select the best features from the original feature set to increase classifier performance on EEG classification problems. In that work, the aim of using GP was to pick out the most important feature elements, not to create new artificial GP-based features.

In the present work, GP is used to create new features from an original feature database to improve KNN classifier performance and simultaneously decrease the input feature dimension for the classifier. The input feature dimension is determined automatically during GP evolution, not fixed beforehand or decided by humans.

2.6. Previous work on epileptic EEG classification

In this part, other researchers' work on epileptic EEG classification is briefly reviewed. Mohseni, Maghsoudi, Kadbi, Hashemi, and Ashourvan (2006) applied short-time Fourier transform (STFT) analysis to EEG signals and extracted features based on the pseudo Wigner–Ville and the smoothed-pseudo Wigner–Ville distributions. Those features were then used as inputs to an artificial neural network (ANN) for classification.
Kalayci and Ozdamar (1995) used the wavelet transform to capture specific characteristic features of the EEG signals and combined it with an ANN to obtain satisfying classification results. Nigam and Graupe (2004) described a method for automated detection of epileptic seizures from EEG signals using a multistage nonlinear pre-processing filter to extract two features, relative spike amplitude and spike occurrence frequency, which were then fed to a diagnostic artificial neural network. In the work of Jahankhani, Kodogiannis, and Revett (2006), the EEGs were decomposed with the wavelet transform into different sub-bands and statistical information was extracted from the wavelet coefficients; radial basis function (RBF) and multi-layer perceptron (MLP) networks were used as classifiers. Subasi (2005b, 2006, 2007) decomposed the EEG signals into time–frequency representations using the discrete wavelet transform; features based on the DWT were obtained and applied to different classifiers for epileptic EEG classification, such as a feed-forward error back-propagation artificial neural network (FEBANN), a dynamic wavelet network (DWN), a dynamic fuzzy neural network (DFNN) and a mixture of experts (ME). Übeyli (2009) employed wavelet analysis with a combined neural network model to discriminate EEG signals: the EEGs were decomposed into time–frequency representations using the DWT, statistical features were calculated, and a two-level neural network model was used to classify three types of EEG signals. The results showed that the combined neural network model achieved better classification performance than a stand-alone neural network model. Ocak (2009) detected epileptic seizures based on approximate entropy (ApEn) and the discrete wavelet transform: EEG signals were first decomposed into approximation and detail coefficients using the DWT, and ApEn values for each set of coefficients were
computed. Finally, surrogate data analysis was applied to the ApEn values to classify the EEGs. Işık and Sezer (in press) investigated the wavelet transform for diagnosing epilepsy: the EEGs were decomposed into several sub-bands using the wavelet transform and a set of feature vectors was extracted; the dimensions of these feature vectors were reduced via principal component analysis, and the signals were then classified as epileptic or healthy using multi-layer perceptron and Elman neural networks.

Besides features derived from the DWT, other quantitative information from the signal time series has also been investigated as classifier input. In Güler's work (Güler, Übeyli, & Güler, 2005), Lyapunov exponents were extracted from the EEGs with Jacobi matrices and then applied as inputs to recurrent neural networks (RNNs), obtaining good classification results. Übeyli (2006b) classified the EEG signals by a combination of Lyapunov exponents and a fuzzy similarity index: fuzzy sets were obtained from the feature sets (Lyapunov exponents) of the signals under study, and the results demonstrated that the similarity between the fuzzy sets of the studied signals indicated the variabilities in the EEG signals, so the fuzzy similarity index could discriminate the different EEGs. In the work of Übeyli (2006a), the author used the computed Lyapunov exponents of the EEG signals as inputs to MLPNNs trained with backpropagation, delta-bar-delta, extended delta-bar-delta, quick propagation, and Levenberg–Marquardt algorithms; the classification accuracy of the MLPNN trained with the Levenberg–Marquardt algorithm was 95% for discriminating healthy, seizure-free and seizure EEGs. In the study presented by Übeyli and Güler (2007), decision making was performed in two stages: feature extraction by eigenvector methods, and classification using classifiers trained on the extracted features.
The inputs of these expert systems, composed of diverse or composite features, were chosen according to the network structures. The five-class classification accuracies of the expert system with diverse features (MME) and with composite features (ME) were 95.53% and 98.6%, respectively.

Besides ANNs, other types of classifiers have also been used for EEG discrimination, including linear discriminant analysis (LDA), multiclass support vector machines (SVMs), Bayesian classifiers, and nearest neighbor classifiers. LDA assumes a normal distribution of the data, with an equal covariance matrix for the two classes; the separating hyperplane is obtained by seeking the projection that maximizes the distance between the two classes' means and minimizes the within-class variance. This technique has very low computational requirements, which makes it suitable for online and real-time classification problems (Garrett, Peterson, Anderson, & Thaut, 2003). SVM also uses a discriminant hyperplane to separate classes; however, the selected hyperplane is the one that maximizes the margins, i.e., the distance to the nearest training points, and maximizing the margins is known to increase generalization capability (Blankertz, Curio, & Müller, 2002). The main weakness of SVM is its relatively low execution speed. Übeyli (2008a) presented a multiclass support vector machine (SVM) with error correcting output codes (ECOC) for EEG classification; the features were extracted by eigenvector methods and used to train the classifier (multiclass SVM with ECOC) on the EEG signals. The Bayesian classifier aims at assigning a feature vector to the class with the highest probability: Bayes' rule is used to compute the a posteriori probability that a feature vector belongs to a given class.
Using the MAP (maximum a posteriori) rule and these probabilities, the class of the feature vector can be estimated (Fukunaga, 1990). Nearest neighbor classifiers are relatively simple: they assign a feature vector to a class according to its nearest neighbor(s), and they are discriminative non-linear classifiers (Garrett et al., 2003).
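As a concrete illustration of the distance-and-vote rule used throughout this paper (Section 2.4, k = 3), a minimal KNN classifier can be sketched as follows; the training data in the usage example are invented, not taken from the paper:

```python
# Minimal K-nearest-neighbour classifier: Euclidean distance plus a
# majority vote among the k = 3 nearest stored patterns. A sketch,
# not the authors' implementation.
import math
from collections import Counter

def knn_classify(train_x, train_y, sample, k=3):
    """Label `sample` by majority vote among its k nearest training points."""
    dists = sorted(
        (math.dist(p, sample), label) for p, label in zip(train_x, train_y)
    )
    votes = Counter(label for _, label in dists[:k])
    return votes.most_common(1)[0][0]
```

For instance, with three 'normal' points near the origin and three 'seizure' points near (5, 5), a query at (0.2, 0.2) collects three 'normal' votes and is labelled accordingly.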
3. Problem description

The epileptic EEG classification problem described by Andrzejak et al. (2001) is considered in the current research. The whole dataset consists of five sets (denoted Z, O, N, F and S), each containing 100 single-channel EEG segments of 23.6 s duration, sampled at 173.6 Hz. These segments were selected and cut out from continuous multi-channel EEG recordings after visual inspection for artifacts, e.g., due to muscle activity or eye movements. Sets Z and O consist of segments taken from surface EEG recordings carried out on five healthy volunteers using a standardized electrode placement scheme; the volunteers were relaxed in an awake state with eyes open (Z) and eyes closed (O), respectively. Sets N, F and S originate from an EEG archive of presurgical diagnosis: segments in set F were recorded from the epileptogenic zone, and those in set N from the hippocampal formation of the opposite hemisphere of the brain. While sets N and F contain only activity measured during seizure-free intervals, set S contains only seizure activity; here, segments were selected from all recording sites exhibiting ictal activity.

In this work, two different classification problems are created from the described dataset and used to test our method. The first problem examines two classes, normal and seizure: the normal class includes only set Z, while the seizure class includes set S. The second problem includes three classes, normal, seizure-free and seizure: the normal class includes only set Z, the seizure-free class set F, and the seizure class set S. According to the previous description, the datasets consist of 200 and 300 EEG segments for these two problems, respectively.

4. Methodology

The current research applies genetic programming to automatically extract new features that improve classifier performance and reduce the input feature dimension simultaneously.
Here, the term 'automatically' has two meanings: first, the expressions of the new features are automatically defined by GP; second, the number of new features is also automatically determined by GP. Fig. 2 illustrates the approach of feature extraction using GP for a classification problem. The whole process consists of three stages:

Stage 1: creation of the original feature database. Feature extraction requires an original feature database as a prerequisite for the subsequent procedures. In this work, the original features are created on the basis of discrete wavelet transform analysis of the raw EEG signals.

Stage 2: genetic programming based feature extraction system. This is the main part of the proposed method. It consists of GP and a KNN classifier. GP non-linearly transforms the original features obtained in stage 1 into a set of new features that facilitate classification. The output of the system is the GP-based features.

Stage 3: classification. The purpose of this stage is to verify the efficiency of the GP-based features on the test data. The test data are processed by the GP-based features and then fed to a classifier for evaluating the classification performance.

A detailed description of these three stages follows.

4.1. Creation of original feature database

Since the EEG is a non-stationary signal, the discrete wavelet transform is chosen to analyze the EEGs and help create the original features. The basic theory of the DWT was explained in Section 2.2.
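As an illustrative sketch of stage 1 (not the authors' code), the five measures of Section 4.1 (mean, standard deviation, energy, curve length, skewness) can be computed for one sub-signal as follows; applying this to every DWT sub-signal of a segment yields that segment's row in the original feature database:

```python
# The five classic measures of Section 4.1, computed for one sub-signal
# given as a list of samples. Illustrative sketch only.
import math

def signal_measures(s):
    n = len(s)
    mean = sum(s) / n                                            # 4.1.1 mean
    std = math.sqrt(sum((v - mean) ** 2 for v in s) / (n - 1))   # 4.1.2 std (sample)
    energy = sum(v * v for v in s) / n                           # 4.1.3 mean energy
    curve_len = sum(abs(s[i + 1] - s[i]) for i in range(n - 1))  # 4.1.4 curve length
    skew = sum(((v - mean) / std) ** 3 for v in s) / n           # 4.1.5 skewness
    return [mean, std, energy, curve_len, skew]
```

With a 4-level decomposition (sub-signals A4, D4, D3, D2, D1), this gives 5 sub-signals x 5 measures = 25 original features per EEG segment.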
Fig. 2. Diagram of the GP based feature extraction for epileptic EEG classification.

The raw EEG signal is first decomposed by the DWT into several sub-signals: one approximation sub-signal and several detail sub-signals. Each sub-signal represents the original signal in a different frequency band. Then, five classic measures used in EEG signal analysis are calculated for each sub-signal. These measures were selected from the time, statistical and information-theoretic domains to reveal the most important characteristics of the EEG signal. To define them, let S be a sampled signal with N samples, where S_i is the i-th sample; \mu and \sigma denote the mean and standard deviation of the signal. The definitions of the five measures are:

4.1.1. Mean value of the signal
Average (arithmetic mean) of the signal amplitudes:
\mu = (1/N) \sum_{i=1}^{N} S_i.

4.1.2. Standard deviation of the signal
The square root of the variance, where \mu represents the mean value of the signal:
\sigma = \sqrt{ (1/(N-1)) \sum_{i=1}^{N} (S_i - \mu)^2 }.

4.1.3. Energy of the signal
Measures the average instantaneous energy of the signal:
(1/N) \sum_{i=1}^{N} S_i^2.

4.1.4. Curve length of the signal
Sum of the lengths of the vertical line segments between consecutive samples; it provides a measure of both time and frequency characteristics:
\sum_{i=1}^{N-1} |S_{i+1} - S_i|.

4.1.5. Skewness of the signal
Measure of the asymmetry of the data distribution:
(1/N) \sum_{i=1}^{N} ((S_i - \mu)/\sigma)^3.

These five measures, calculated on each sub-signal of the EEG signal, are used to create the original feature database.

4.2. Genetic programming based feature extraction system

The genetic programming based feature extraction system developed in this work consists of genetic programming and a KNN classifier, with genetic programming as its main part. Each individual in GP represents a set of new features, obtained as non-linear transformations of the original features. Those new features are then passed to the KNN classifier, and their goodness is evaluated through the misclassification output of the KNN classifier.

In the classic GP evolution process, each individual of the population represents one expression, so evaluating it yields only one feature; that is the most common situation in GP applications to feature extraction. In order to allow each individual to automatically generate more than one feature, in the current work a new function named 'F' is created and added to the function set. It is a special function used to build the output feature vector. The 'F' function can appear at any position of the tree; it has only one argument, and its output is simply a copy of its input. However, the input argument of 'F' is added to the list of new features. Thus, when an individual is evaluated, as many new features are automatically generated as there are 'F' nodes in its tree.

The usage of the function 'F' can be clearly described with the example shown in Fig. 3, where there are 10 input variables to a classifier. The GP tree in Fig. 3 contains two 'F' nodes. When the tree is evaluated, the numerical value resulting from this evaluation is not used; what matters are the sub-trees that are children of each 'F' node, since those sub-trees determine the expressions of the extracted features. The first 'F' node implies the creation of a feature with the expression x2 * x5, which is put into the new feature vector; the second 'F' node implies the creation of a feature with the expression x7 + x10, which is added to the same vector. Thus, after the evaluation of the whole tree in Fig. 3 is finished, a feature vector including two new features has been created, and the feature dimension in this example has been decreased from the original 10 down to 2.

Fig. 3. Example of a GP tree that creates a feature vector including two new features.

4.2.1. The fitness function
The fitness function is the most important point in GP evolution. In the current work, the fitness function is defined as the misclassification of the classifier on the training data, as depicted in Fig. 4.
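The 'F'-node mechanism can be sketched as follows. The nested-tuple tree encoding, the operator set and the function names are our illustrative assumptions, not the authors' implementation:

```python
# Sketch of how 'F' nodes turn one GP tree into a feature vector: while
# the tree is evaluated, every 'F' node copies its input through and
# also appends that value to the feature list. Tree = nested tuples
# (our illustrative encoding, not the paper's).
import operator

OPS = {"+": operator.add, "-": operator.sub, "*": operator.mul}

def evaluate(node, x, features):
    """Evaluate a tree node; x maps variable names (e.g. 'x2') to values."""
    if isinstance(node, str):           # terminal: input variable
        return x[node]
    if isinstance(node, (int, float)):  # terminal: constant
        return node
    op, *children = node
    if op == "F":                       # special node: record a feature
        value = evaluate(children[0], x, features)
        features.append(value)
        return value                    # 'F' passes its input through
    left, right = (evaluate(c, x, features) for c in children)
    return OPS[op](left, right)

def extract_features(tree, x):
    """Collect the values produced by all 'F' nodes, in evaluation order."""
    features = []
    evaluate(tree, x, features)         # root value itself is discarded
    return features
```

Mirroring Fig. 3, the tree `("+", ("F", ("*", "x2", "x5")), ("F", ("+", "x7", "x10")))` yields exactly two features, x2 * x5 and x7 + x10, regardless of what the root computes.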
10430
L. Guo et al. / Expert Systems with Applications 38 (2011) 10425–10436 Table 1 Frequency bands of EEG signals with 4-level DWT decomposition. Sub-signals
Frequency bands (Hz)
Decomposition level
D1 D2 D3 D4 A4
43.4–86.8 21.7–43.4 10.8–21.7 5.4–10.8 0–5.4
1 2 3 4 4
Fig. 4. Calculating the fitness of each individual in GP.

The detailed procedure for calculating the fitness value of each individual is:

1. The training data pre-processed by the original features are randomly split into a sub-training set (40% of the pre-processed training data) and a validation set (60% of the pre-processed training data).
2. The tree in GP is evaluated and the new features are extracted.
3. The KNN classifier is trained on the sub-training set processed by the new features.
4. The trained KNN classifier classifies the validation set processed by the new features, and the misclassification value $e_i$ is obtained:

$e_i = N_{validation} - N_{correct},$

where $N_{validation}$ is the number of samples in the validation set, and $N_{correct}$ is the number of samples correctly classified.

5. Steps 1–4 are repeated several times, and the mean value of $e_i$ is taken as the fitness value of the individual.
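A sketch of this fitness loop, using scikit-learn's KNeighborsClassifier as a stand-in for the paper's KNN (the value of K, the repeat count, the stratified split, and all helper names are our assumptions; extract_features stands for evaluating one GP tree):

```python
# Sketch of the fitness loop in Fig. 4 (assumed details: K = 1, stratified
# splits so the toy sub-training set contains every class).
import numpy as np
from sklearn.model_selection import train_test_split
from sklearn.neighbors import KNeighborsClassifier

def fitness(extract_features, X_train, y_train, n_repeats=5, seed=0):
    """Mean misclassification e_i = N_validation - N_correct over repeated
    random 40/60 sub-training/validation splits of the training data."""
    rng = np.random.RandomState(seed)
    Z = extract_features(X_train)          # the individual's GP-based features
    errors = []
    for _ in range(n_repeats):
        Z_sub, Z_val, y_sub, y_val = train_test_split(
            Z, y_train, train_size=0.4, random_state=rng, stratify=y_train)
        knn = KNeighborsClassifier(n_neighbors=1).fit(Z_sub, y_sub)
        n_correct = (knn.predict(Z_val) == y_val).sum()
        errors.append(len(y_val) - n_correct)   # e_i for this split
    return float(np.mean(errors))               # lower is fitter

# Toy usage: two well-separated classes, identity feature extraction.
X = np.array([[0.0], [0.1], [0.2], [1.0], [1.1], [1.2]])
y = np.array([0, 0, 0, 1, 1, 1])
print(fitness(lambda Z: Z, X, y))   # 0.0 for this separable toy data
```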
After the GP evolution procedure ends, the best-of-all individual is obtained. Consequently, the new features that best facilitate classification on the training data are used as the output of the GP based feature extraction system; they are called GP-based features. One point should be mentioned: in the current work, the misclassification produced by a KNN classifier is used to evaluate the fitness of GP individuals, so the GP-based features obtained are only optimized for the KNN classifier. Although it is possible that the GP-based features work well with other classifiers, that performance cannot be guaranteed. However, the idea of this methodology can be extended to other classification systems, such as artificial neural networks.
4.3. Classification

Because the GP-based features are derived from the training data, their performance on the classification problem has to be verified on the test data. Since the KNN classifier was selected as the component of the GP based feature extraction system, the KNN classifier is also chosen as the classification system for verifying the performance of the GP-based features on the test data.

Fig. 5. Approximation and details of a sample normal EEG signal.

Fig. 6. Approximation and details of a sample epileptic seizure-free EEG signal.

5. Results

In this section, the results of implementing the developed methodology on the specific classification problem described in Section 3 are discussed.
5.1. Original feature database

The original features are created in two steps. In the first step, the discrete wavelet transform is applied to decompose the EEG signal into several sub-signals within different frequency bands. Selecting the number of decomposition levels and a suitable wavelet function is also important for EEG signal analysis with DWT. In the current research, the number of decomposition levels is chosen as 4, as recommended in previous work (Subasi, 2006), and the wavelet function selected is the Daubechies wavelet of order 4, which was also shown to be the most suitable wavelet function for epileptic EEG signal analysis (Subasi, 2006). The frequency bands corresponding to a 4-level DWT decomposition with a sampling frequency of 173.6 Hz are shown in Table 1. Figs. 5–7 show the five sub-signals (one approximation A4 and four details D1–D4) of a sample normal EEG signal (set Z), a sample epileptic seizure-free EEG signal (set F) and a sample epileptic seizure EEG signal (set S), respectively. In the second step, after one raw EEG signal has been decomposed into the five sub-signals corresponding to the frequency bands in Table 1, the five classic measures explained in Section 4.1 are calculated on each sub-signal to form the original feature database. The dimension of the original feature space is 5 × 5 = 25. To simplify the description, a vector X containing 25 variables corresponding to the components of the original features is defined; the definition of each variable is shown in Table 2.

5.2. GP configurations

Before applying genetic programming to solve a real-world problem, several types of parameters need to be defined in advance.

5.2.1. The function set and the terminal set

Although there are many different functions that can be used in GP, normally only a small subset of them is used simultaneously, because the size of the search space increases exponentially with the size of the function set. The function set employed in this work is listed in Table 3. Since the functions need to satisfy the closure property in GP, the square root, logarithm, and division operators are implemented in a protected way. Protected division works identically to ordinary division except that it outputs the value of the numerator when the denominator is zero. In order to avoid complex values arising from negative arguments, the protected square root operator applies an absolute value to its input before taking the square root. Likewise, the protected logarithm outputs zero for an argument of zero and applies an absolute value to negative arguments. The terminal set of this work includes the 25 variables defined in Table 2 and one random constant. By essentially feeding the full original feature database to the GP, it is expected that GP can explore and exploit
Fig. 7. Approximation and details of a sample epileptic seizure EEG signal.
the EEG signals, so as to increase the chance of finding more informative and successful features to discriminate the EEGs.

Table 2
Definition of variables in the original feature space X.

Classic measure       43.4–86.8 Hz   21.7–43.4 Hz   10.8–21.7 Hz   5.4–10.8 Hz   0–5.4 Hz
Mean                  x1             x6             x11            x16           x21
Standard deviation    x2             x7             x12            x17           x22
Energy                x3             x8             x13            x18           x23
Curve length          x4             x9             x14            x19           x24
Skewness              x5             x10            x15            x20           x25

Table 3
The function set.

Name    Number of arguments   Operation
+       2                     Arithmetic add
−       2                     Arithmetic subtract
*       2                     Arithmetic multiply
/       2                     Protected division
log     1                     Protected natural logarithm
sqrt    1                     Protected square root
F       1                     Output the value of the input

5.2.2. The control parameters

These parameters are used to control the GP run. There exist many possible combinations of the control parameters; for the given problems, several different combinations were tried, and the parameters that returned the best results are shown in Table 4.

Table 4
Control parameters in GP.

Initial population generation   Ramped half-and-half method
Maximum tree depth              9
Population size                 300
Number of generations           10
Crossover probability           80%
Mutation probability            20%
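A sketch of the three protected operators described in Section 5.2.1 (function names are ours; GPLab's own implementations may differ in detail):

```python
import math

# Protected operators: each guards against inputs that would break closure.
def pdiv(a, b):
    """Protected division: returns the numerator when the denominator is 0."""
    return a if b == 0 else a / b

def psqrt(a):
    """Protected square root: absolute value taken before the root."""
    return math.sqrt(abs(a))

def plog(a):
    """Protected natural log: 0 for an argument of 0, abs for negatives."""
    return 0.0 if a == 0 else math.log(abs(a))

print(pdiv(3.0, 0.0), psqrt(-4.0), plog(-math.e))   # 3.0 2.0 1.0
```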
5.2.3. The termination criteria

The GP execution is terminated when one of the following criteria is met: the mean misclassification of the KNN classifier on the training data is zero, or the maximum number of generations is reached. The GPLab software (Silva, 2007), a popular genetic programming toolbox for Matlab, is employed in this work.

5.3. Results

Two different epileptic EEG classification problems from the dataset described in Section 3 are used to verify the developed methodology: the first is a two-class problem (normal vs. seizure classification), and the second is a three-class problem (normal, seizure-free and seizure classification). The raw dataset is pre-processed by the original features.

5.3.1. Classification performance and input feature dimension

The classification accuracies of the 50 executions on the two classification problems are shown in Fig. 8. The procedure for calculating the classification accuracy is:
Fig. 8. Classification accuracy comparison of GP–KNN classifier and KNN-alone classifier on the testing subset of two epileptic EEGs classification.
1. 30% of the whole pre-processed dataset is randomly selected and used for obtaining the GP-based features.
2. The other 70% of the pre-processed dataset is processed with and without the GP-based features. It is then randomly split into a training subset (40%) and a testing subset (60%).
3. The KNN classifier is trained on the training subset with and without the GP-based features; therefore, two types of KNN classifier are obtained for comparison.
4. The two trained KNN classifiers classify the testing subset with and without the GP-based features, respectively, and the classification accuracy is obtained.
5. Steps 2–4 are repeated 300 times, and the mean classification accuracy is taken as the result of one execution in Fig. 8.

In Fig. 8, ''GP–KNN classifier'' and ''KNN-alone classifier'' denote a KNN classifier with and without the GP-based features, respectively. The upper graph in Fig. 8 compares the classification accuracy of the two classifiers on the two-class EEG classification; the bottom graph compares them on the three-class EEG classification. From Fig. 8 it is obvious that the GP–KNN classifier has much higher classification accuracy than the KNN-alone classifier on both classification problems, which means that the GP-based features greatly improve the discrimination performance of the KNN classifier.

Another objective of the developed methodology is to reduce the input feature dimension. As already mentioned in Section 5.1, the dimension of the original features without GP based feature extraction is 25. Fig. 9 depicts the dimension of the GP-based features obtained in each of the 50 executions: for the two-class EEG classification, the maximum dimension of the GP-based features is 4, and most of them are only 2 or 3; for the three-class EEG classification, the maximum dimension is 6, and most of them are from 2 to 5.

Table 5 summarizes the results of Figs. 8 and 9; the classification accuracies and input feature dimensions in the table are mean values over the 50 executions. From the table, it is obvious that with GP based feature extraction:

1. The classification performance of the KNN classifier is significantly improved: the improvement is more than 10% for the two-class classification and more than 25% for the three-class classification.
2. The input feature dimension for the classifier is enormously reduced. The average number of input features drops to 2.32 for the two-class problem (where 2 or 3 input features are usually sufficient) and to 3.48 for the three-class problem (usually 3 or 4). It is logical that the input feature dimension for the three-class classification is higher, since that problem is more complicated and needs more features to discriminate the different patterns.

5.3.2. GP-based features expression and analysis

In most cases, each execution of genetic programming based feature extraction obtained different expressions and different
Fig. 9. The dimension of GP-based features over 50 executions for two epileptic EEGs classification.
Table 5
Classification accuracy and input feature dimension of the two KNN classifiers on two classification problems.

                        Normal and seizure EEG classification                      Normal, seizure-free and seizure EEG classification
                        Classification accuracy (std)   Input feature dimension    Classification accuracy (std)   Input feature dimension
KNN-alone classifier    88.6% (1.12)                    25                         67.2% (1.17)                    25
GP–KNN classifier       99.2% (0.49)                    2.32                       93.5% (1.20)                    3.48
Table 6
Most important measures selected by GP for epileptic EEG classification.

Measure                                                      Normal–seizure    Normal–seizure free–seizure
x24 – Curve length of the signal within 0–5.4 Hz             ✓                 ✓
x22 – Standard deviation of the signal within 0–5.4 Hz       ✓
x17 – Standard deviation of the signal within 5.4–10.8 Hz    ✓
x12 – Standard deviation of the signal within 10.8–21.7 Hz   ✓                 ✓
x19 – Curve length of the signal within 5.4–10.8 Hz          ✓                 ✓
x7 – Standard deviation of the signal within 21.7–43.4 Hz                      ✓
x9 – Curve length of the signal within 21.7–43.4 Hz                            ✓

number of GP-based features. The expressions may be as simple as just one of the variables, or may be complicated functions with many variables and constants. Several examples of the GP-based feature expressions obtained in this work are listed in the following.

Normal–seizure EEG classification:

[√x24 − √x17 + x19; √(log(x24)); x22]; [x19; x24; x14].

Normal–seizure free–seizure EEG classification:

[x24; x7/x19; √x7; log(x9); x19/0.25355]; [x19; x24; x12; x7 + x19 − x24].

From the above expressions, it can be seen that the features extracted by genetic programming are intuitively difficult to interpret. This shows that GP can find combinations of the input variables and
Table 7
The classification accuracy and the number of features required for the epileptic EEG classification of our method compared to the results of other methods.

Researchers                 Method                                                       Dataset    Accuracy (%)    Number of input features
Srinivasan et al. (2005)    Time & frequency domain features–recurrent neural network    Z, S       99.6            5
Polat and Güneş (2007)      Fast Fourier transform–decision tree                         Z, S       98.72           129
Nigam and Graupe (2004)     Nonlinear pre-processing filter–diagnostic neural network    Z, S       97.2            2
Subasi (2007)               Discrete wavelet transform–mixture of expert model           Z, S       95              16
Tzallas et al. (2007)       Time frequency analysis–artificial neural network            Z, S       99              13
This work                   GP-based feature extraction–KNN classifier                   Z, S       99.2            2.32
Sadati et al. (2006)        Discrete wavelet transform–adaptive neural fuzzy network     Z, F, S    85.9            6
Tzallas et al. (2007)       Time frequency analysis–artificial neural network            Z, F, S    98.6            22
Übeyli (2008b)              Discrete wavelet transform–mixture of experts network        Z, F, S    93.17           30
This work                   GP-based feature extraction–KNN classifier                   Z, F, S    93.5            3.48
functions that would not be found by humans. Looking into the whole set of GP-based feature expressions obtained in this work, it can be found that only several measures within the original feature database are useful for epileptic EEG classification. The selected measures are listed in Table 6.

5.4. Comparison with other works

Many other methods have been proposed for epileptic EEG signal classification, as described in Section 2.6. Table 7 compares the results of the method developed in this work with those of other proposed methods; only methods evaluated on the same dataset are included. Both the classification accuracy and the input feature dimension of the classifier are listed for comparison. For the two-class classification problem, the accuracy obtained by our method is the second best presented for this dataset, while the number of features required by our method is the smallest among methods with classification accuracy higher than 99%. For the three-class problem, the accuracy obtained by our method is also the second best presented for this dataset, and the number of features required is the smallest.

6. Conclusions

In this work, genetic programming is applied to extract new features from an original feature database for a classification problem. In comparison with other pattern classification methods, genetic programming based feature extraction automatically determines: the dimensionality of the new features; which measures in the original feature database are useful for epileptic EEG classification; and how to non-linearly transform the selected original measures into new features. Implementation results showed that the proposed method significantly improved the KNN classifier performance. Furthermore, a huge reduction of the input feature dimension was also achieved by the developed method.
For the two problems given in this paper, three and four features are sufficient to attain high classification accuracy. In addition, through natural evolution, the informative measures useful for discrimination are selected by GP from the original feature database. As for the features extracted by GP, it is hard to explain their physical meaning. This shows that GP can discover ''hidden'' relationships among the terminal and function sets, which would be difficult for humans to find. The limitation of the proposed approach is that the GP-based feature extraction system is computationally expensive. An increase in the size of the original feature database or in the number of training data would bring about a significant increase in the computation cost, which makes the developed method inappropriate for real-time applications.

7. Future work

Feature extraction using genetic programming has shown great success on epileptic EEG classification problems. The same method can be applied to a wider range of pattern recognition problems that are important to humans, such as the detection and diagnosis of Alzheimer's and Parkinson's diseases. The current methodology was derived from the combination of genetic programming and the KNN classifier; further work can investigate the performance of genetic programming combined with other classifier systems, such as artificial neural networks. These are all interesting directions for future work.

Acknowledgements

Ling Guo was financially supported through a fellowship of the Agencia Española de Cooperación International (AECI) and the Spanish Ministry of Foreign Affairs.

References

Adeli, H., Zhou, Z., & Dadmehr, N. (2003). Analysis of EEG records in an epileptic patient using wavelet transform. Journal of Neuroscience Methods, 123(1), 69–87.
Andrzejak, R., Lehnertz, K., Mormann, F., Rieke, C., David, P., & Elger, C. (2001). Indications of nonlinear deterministic and finite-dimensional structures in time series of brain electrical activity: Dependence on recording region and brain state. Physical Review E, 64(6), 061907-1–061907-8.
Blankertz, B., Curio, G., & Müller, K. (2002). Classifying single trial EEG: Towards brain computer interfacing. In Advances in neural information processing systems: Proceedings of the 2002 conference (pp. 157–164). Cambridge, Massachusetts: MIT Press.
Bot, M. (2001). Feature extraction for the k-nearest neighbour classifier with genetic programming. Lecture Notes in Computer Science, 256–267.
Bot, M., & Langdon, W. (2000). Application of genetic programming to induction of linear classification trees. In Genetic programming, proceedings of EuroGP'2000 (pp. 247–258). Berlin, Heidelberg: Springer-Verlag.
Chui, C. (1992).
An introduction to wavelets. Boston: Academic Press.
Cover, T., & Hart, P. (1967). Nearest neighbor pattern classification. IEEE Transactions on Information Theory, 13(1), 21–27.
Darwin, C. (1864). On the origin of species by means of natural selection or the preservation of favoured races in the struggle for life. Cambridge, UK: Cambridge University Press.
Duda, R., Hart, P., & Stork, D. (2001). Pattern classification. New York: Wiley.
Ebner, M., & Rechnerarchitektur, A. (1998). On the evolution of interest operators using genetic programming. In Late breaking papers at EuroGP'98: The first European workshop on genetic programming (pp. 6–10).
Ebner, M., & Zell, A. (1999). Evolving a task specific image operator. Lecture Notes in Computer Science, 74–89.
Espejo, P., Ventura, S., & Herrera, F. (2010). A survey on the application of genetic programming to classification. IEEE Transactions on Systems, Man, and Cybernetics, Part C: Applications and Reviews, 40(2), 121–144.
Firpi, H., Goodman, E., & Echauz, J. (2006). On prediction of epileptic seizures by means of genetic programming artificial features. Annals of Biomedical Engineering, 34(3), 515–529.
Fukunaga, K. (1990). Introduction to statistical pattern recognition (2nd ed.). Boston: Academic Press.
Garrett, D., Peterson, D., Anderson, C., & Thaut, M. (2003). Comparison of linear and nonlinear methods for EEG signal classification. IEEE Transactions on Neural Systems and Rehabilitative Engineering, 11(2), 141–144.
Güler, N., Übeyli, E., & Güler, I. (2005). Recurrent neural networks employing Lyapunov exponents for EEG signals classification. Expert Systems with Applications, 29(3), 506–514.
Guo, H., Jack, L., & Nandi, A. (2005). Feature generation using genetic programming with application to fault classification. IEEE Transactions on Systems, Man, and Cybernetics, Part B: Cybernetics, 35(1), 89–99.
Harvey, N., Theiler, J., Brumby, S., Perkins, S., Szymanski, J., Bloch, J., et al. (2002). Comparison of GENIE and conventional supervised classifiers for multispectral image feature extraction. IEEE Transactions on Geoscience and Remote Sensing, 40(2), 393–404.
Işik, H., & Sezer, E. (in press). Diagnosis of epilepsy from electroencephalography signals using multilayer perceptron and Elman artificial neural networks and wavelet transform. Journal of Medical Systems. doi:10.1007/s10916-010-9440-0.
Jahankhani, P., Kodogiannis, V., & Revett, K. (2006). EEG signal classification using wavelet feature extraction and neural networks. In IEEE John Vincent Atanasoff 2006 international symposium on modern computing (JVA'06) (pp. 52–57).
Kalayci, T., & Ozdamar, O. (1995). Wavelet preprocessing for automated neural network detection of EEG spikes. IEEE Engineering in Medicine and Biology Magazine, 14(2), 160–166.
Kandel, E., Schwartz, J., & Jessell, T. (2000). Principles of neural science. New York: McGraw-Hill, Health Professions Division.
Kotani, M., Nakai, M., & Akazawa, K. (1999). Feature extraction using evolutionary computation. In Proceedings of the 1999 congress on evolutionary computation, CEC 99 (Vol. 2, pp. 1230–1236).
Koza, J. (1992). Genetic programming: On the programming of computers by means of natural selection. Cambridge, Massachusetts: MIT Press.
Krawiec, K. (2002). Genetic programming-based construction of features for machine learning and knowledge discovery tasks. Genetic Programming and Evolvable Machines, 3(4), 329–343.
Mallat, S. (1989). A theory for multiresolution signal decomposition: The wavelet representation. IEEE Transactions on Pattern Analysis and Machine Intelligence, 11(7), 674–693.
Micheli-Tzanakou, E. (2000). Supervised and unsupervised pattern recognition: Feature extraction and computational intelligence. Boca Raton, FL: CRC Press.
Mohseni, H., Maghsoudi, A., Kadbi, M., Hashemi, J., & Ashourvan, A. (2006). Automatic detection of epileptic seizure using time-frequency distributions. In IET 3rd international conference on advances in medical, signal and information processing, MEDSIP 2006 (pp. 1–4).
Nigam, V., & Graupe, D. (2004). A neural-network-based detection of epilepsy. Neurological Research, 26(1), 55–60.
Ocak, H. (2009). Automatic detection of epileptic seizures in EEG using discrete wavelet transform and approximate entropy. Expert Systems with Applications, 36(2), 2027–2036.
Polat, K., & Güneş, S. (2007). Classification of epileptiform EEG using a hybrid system based on decision tree classifier and fast Fourier transform. Applied Mathematics and Computation, 187(2), 1017–1026.
Rabuñal, J., Dorado, J., Puertas, J., Pazos, A., Santos, A., & Rivero, D. (2003). Prediction and modelling of the rainfall–runoff transformation of a typical urban basin using ANN and GP. Applied Artificial Intelligence, 17(4), 329–343.
Raymer, M., Punch, W., Goodman, E., & Kuhn, L. (1996). Genetic programming for improved data mining: Application to the biochemistry of protein interactions. In Proceedings of the first annual conference on genetic programming (pp. 375–380). Cambridge, Massachusetts: MIT Press.
Rivero, D., Rabuñal, J., Dorado, J., & Pazos, A. (2005). Time series forecast with anticipation using genetic programming. In Computational intelligence and bioinspired systems, 8th international work-conference on artificial neural networks (pp. 968–975). Berlin, Heidelberg: Springer.
Sabeti, M., Katebi, S., & Boostani, R. (2009). Entropy and complexity measures for EEG signal classification of schizophrenic and control participants. Artificial Intelligence in Medicine, 47(3), 263–274.
Sadati, N., Mohseni, H., & Maghsoudi, A. (2006). Epileptic seizure detection using neural fuzzy networks. In 2006 IEEE international conference on fuzzy systems (pp. 596–600).
Sherrah, J. (1998). Automatic feature extraction for pattern recognition. Ph.D. thesis, The University of Adelaide.
Silva, S. (2007). GPLAB – a genetic programming toolbox for MATLAB.
Srinivasan, V., Eswaran, C., & Sriraam, N. (2005). Artificial neural network based epileptic detection using time-domain and frequency-domain features. Journal of Medical Systems, 29(6), 647–660.
Subasi, A. (2005a). Automatic recognition of alertness level from EEG by using neural network and wavelet coefficients. Expert Systems with Applications, 28(4), 701–711.
Subasi, A. (2005b). Epileptic seizure detection using dynamic wavelet network. Expert Systems with Applications, 29(2), 343–355.
Subasi, A. (2006). Automatic detection of epileptic seizure using dynamic fuzzy neural networks.
Expert Systems with Applications, 31(2), 320–328.
Subasi, A. (2007). EEG signal classification using wavelet feature extraction and a mixture of expert model. Expert Systems with Applications, 32(4), 1084–1093.
Tackett, W. (1993). Genetic programming for feature discovery and image discrimination. In Proceedings of the fifth international conference on genetic algorithms, ICGA-93 (pp. 303–309).
Tzallas, A., Tsipouras, M., & Fotiadis, D. (2007). A time-frequency based method for the detection of epileptic seizures in EEG recordings. In 20th IEEE international symposium on computer-based medical systems, CBMS'07 (pp. 135–140).
Übeyli, E. (2006a). Analysis of EEG signals using Lyapunov exponents. Neural Network World, 16(3), 257–273.
Übeyli, E. (2006b). Fuzzy similarity index employing Lyapunov exponents for discrimination of EEG signals. Neural Network World, 16(5), 421–431.
Übeyli, E. (2008a). Analysis of EEG signals by combining eigenvector methods and multiclass support vector machines. Computers in Biology and Medicine, 38(1), 14–22.
Übeyli, E. (2008b). Wavelet/mixture of experts network structure for EEG signals classification. Expert Systems with Applications, 34(3), 1954–1962.
Übeyli, E. (2009). Combined neural network model employing wavelet coefficients for EEG signals classification. Digital Signal Processing, 19(2), 297–308.
Übeyli, E., & Güler, I. (2007). Features extracted by eigenvector methods for detecting variability of EEG signals. Pattern Recognition Letters, 28(5), 592–603.