Expert Systems with Applications 38 (2011) 10425–10436
Automatic feature extraction using genetic programming: An application to epileptic EEG classification

Ling Guo, Daniel Rivero, Julián Dorado, Cristian R. Munteanu, Alejandro Pazos
Department of Information Technologies and Communications, University of La Coruña, Campus Elviña, 15071 A Coruña, Spain
Keywords: Genetic programming; Feature extraction; K-nearest neighbor classifier (KNN); Discrete wavelet transform (DWT); Epilepsy; EEG classification
Abstract: This paper applies genetic programming (GP) to perform automatic feature extraction from an original feature database, with the aim of improving the discriminatory performance of a classifier while simultaneously reducing the input feature dimensionality. The tree structure of GP naturally represents the features, and a new function introduced in this work automatically decides the number of features extracted. In experiments on two common epileptic EEG detection problems, the classification accuracy on the GP-based features is significantly higher than on the original features, while the dimension of the input features for the classifier is much smaller than that of the original features. © 2011 Elsevier Ltd. All rights reserved.
1. Introduction

Feature extraction for classification seeks a transformation or mapping from the original features to a new feature space that maximizes the separability of the different classes. A classification problem cannot be properly solved if important interactions and relationships between the original features are not taken into consideration. Thus, many researchers agree that feature extraction is key to any pattern recognition and classification problem: ''The precise choice of features is perhaps the most difficult task in pattern recognition'' (Micheli-Tzanakou, 2000); ''an ideal feature extraction would yield a representation that makes the job of the classifier trivial'' (Duda, Hart, & Stork, 2001). In most cases, feature extraction is done by humans, based on the researcher's knowledge, experience, and/or intuition.

Epilepsy is a neurological disorder. About 40 to 50 million people in the world suffer from epilepsy (Kandel, Schwartz, & Jessell, 2000). The defining characteristic of epilepsy is recurrent seizures. Epilepsy can have several physical, psychological and social consequences, including mood disorders, injuries and sudden death. To date, the specific cause of epilepsy in individuals is often unknown and the mechanisms behind seizures are little understood; thus, efforts toward its diagnosis and treatment are of great importance.

This work has a multidisciplinary nature, combining evolutionary computation, feature extraction and biomedical signal processing. It applies genetic programming (GP) to automatic feature extraction with the purpose of improving the discrimination performance of a K-nearest neighbor (KNN) classifier and decreasing the input feature dimension. The tree structure of GP

* Corresponding author. Tel.: +34 981 167000x1302; fax: +34 981 167160. E-mail address:
[email protected] (L. Guo). doi:10.1016/j.eswa.2011.02.118
naturally lends itself to representing new features, and a new function introduced in this work enables automatic feature extraction. The novel method is validated on epileptic EEG classification problems: classification accuracy is greatly increased with the GP-extracted features, while the input feature dimension is drastically reduced, down to three or four features for epileptic EEG discrimination. In addition, by analyzing the expressions of the GP-extracted features, informative measures useful for EEG classification can be identified among the original features.

The paper is organized as follows. In Section 2, an introduction to EEG and epilepsy is presented to allow a general understanding of the nature of the application. Then, the discrete wavelet transform (DWT), an efficient non-stationary signal processing tool, is briefly described. Next, some basic aspects of genetic programming are presented, since this is the key technique of the proposed method; a short description of the K-nearest neighbor classifier is included in the same part. Previous work on genetic programming applied to feature extraction and on epileptic EEG classification is also reviewed in this section. The epileptic EEG classification problems considered in this work are given in Section 3. Section 4 covers the detailed explanation of the developed methodology. Subsequently, the implementation and results of the proposed methodology on two different epileptic EEG classification problems are discussed. Finally, conclusions and future work are presented.
2. State of the art

2.1. Epilepsy and electroencephalogram (EEG)

Epilepsy is the second most prevalent neurological disorder in humans after stroke. It is characterized by recurrent seizures in
which abnormal electrical activity in the brain causes altered perception or behavior. Approximately one in every 100 persons will experience a seizure at some time in their life. To date, the occurrence of an epileptic seizure remains unpredictable and its course of action is little understood.

Electroencephalography is the recording of the electrical activity of the brain, usually taken through several electrodes on the scalp. The EEG contains much valuable information relating to the different physiological states of the brain and is thus a very useful tool for understanding brain diseases such as epilepsy. The detection of epileptiform discharges occurring in the EEG is an important component in the diagnosis and treatment of epilepsy (Subasi, 2005a). However, EEG recordings usually contain huge amounts of data, and visual inspection for discriminating EEGs is a time-consuming and costly process. Much effort has therefore been devoted to developing epileptic EEG classification techniques.
Mallat (1989) developed an efficient way of implementing the DWT by passing the signal through a series of low-pass and high-pass filters. The DWT implementation procedure is shown schematically in Fig. 1, where the filters h[n] and g[n] correspond to high-pass and low-pass filters, respectively (Subasi, 2007). In the first stage, the signal is simultaneously passed through the h[n] and g[n] filters, whose cut-off frequency is one fourth of the sampling frequency. The outputs of the h[n] and g[n] filters are referred to as the detail (D1) and approximation (A1) coefficients of the first level, respectively. The same procedure is repeated on the first-level approximation coefficients to obtain the second-level coefficients. At each stage of this decomposition process, the frequency resolution is doubled through filtering and the time resolution is halved through downsampling. The coefficients A1, D1, A2, and D2 represent the frequency content of the original signal within the bands 0–fS/4, fS/4–fS/2, 0–fS/8, and fS/8–fS/4, respectively, where fS is the sampling frequency of the original signal x[n].
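The filter-bank procedure above can be sketched with the Haar wavelet, the simplest possible choice of h[n] and g[n]; the paper does not commit to a particular wavelet at this point, so Haar here is purely illustrative:

```python
# One level of Mallat's filter-bank DWT, using Haar filters as the
# simplest concrete example (illustrative; the paper does not fix the
# wavelet here). g = low-pass -> approximation; h = high-pass -> detail.
import math

def haar_dwt_level(x):
    """Split x (even length assumed) into one level of approximation and
    detail coefficients: filter with Haar low/high-pass, downsample by 2."""
    s = 1.0 / math.sqrt(2.0)
    approx = [(x[i] + x[i + 1]) * s for i in range(0, len(x) - 1, 2)]
    detail = [(x[i] - x[i + 1]) * s for i in range(0, len(x) - 1, 2)]
    return approx, detail

def haar_dwt(x, levels):
    """Repeated decomposition of the approximation branch, as in Fig. 1.
    Returns [A_levels, D_levels, ..., D2, D1]."""
    coeffs = []
    a = list(x)
    for _ in range(levels):
        a, d = haar_dwt_level(a)
        coeffs.insert(0, d)   # details collected from finest to coarsest
    coeffs.insert(0, a)       # final approximation in front
    return coeffs
```

Each call halves the time resolution of the approximation branch, which is exactly the downsampling step described above: with sampling rate fS, D1 covers fS/4–fS/2, D2 covers fS/8–fS/4, and so on.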
2.2. Discrete wavelet transform

The EEG is a complicated, non-stationary signal whose characteristics are spatio-temporally dependent. Based on these properties, the DWT is chosen in this work for pre-analyzing the epileptic EEG signals (Adeli, Zhou, & Dadmehr, 2003). Recently, the wavelet transform (WT) has been widely applied in many engineering fields to solve various real-life problems. The Fourier transform (FT) of a signal gives the frequency content of the signal but eliminates time information. The short-time Fourier transform (STFT) is a series of FTs with a fixed window size. Because a large window loses time resolution and a short window loses frequency resolution, there is always a trade-off between time and frequency resolution: on the one hand, good time resolution requires a short window; on the other hand, good frequency resolution requires a long window. The fixed window size (fixed time–frequency resolution) of the STFT is a constraint in some applications. In contrast to the STFT, the WT provides a more flexible time–frequency representation of a signal by allowing the use of variable-sized analysis windows. The attractive feature of the WT is that it provides accurate frequency information at low frequencies and accurate time information at high frequencies (Adeli et al., 2003). This property is important in biomedical applications, because most signals in this field contain high-frequency components of short time duration and low-frequency components of long time duration. Through the wavelet transform, transient features are precisely captured and localized in both the time and frequency domains. The WT is commonly considered one of the most powerful tools for EEG signal analysis.

The continuous wavelet transform (CWT) of a signal S(t) is defined as the correlation between S(t) and the wavelet function \psi_{a,b}, as follows (Chui, 1992):
CWT(a, b) = |a|^{-1/2} \int_{-\infty}^{\infty} S(t) \psi^{*}((t - b)/a) dt,    (1)
2.3. Genetic programming

Genetic programming is an evolutionary technique used to create computer programs that represent approximate or exact solutions to a problem (Koza, 1992). GP works on the evolution of a population in which every individual represents a candidate solution for the problem to be solved. GP searches for the best solution through a process based on the theory of evolution (Darwin, 1864): starting from an initial population of randomly generated individuals, new individuals are produced from old ones over subsequent generations by means of crossover, selection and mutation operations. Following natural selection, good individuals have more chances of surviving into the next generation, so after successive generations the best-so-far individual is obtained as the final solution of the problem.

The GP encoding of solutions is tree-shaped, so the user must specify the terminals (leaves of the tree) and the functions (nodes capable of having descendants) that the evolutionary algorithm can use to build complex expressions. A fitness function for measuring the appropriateness of the individuals in the population must also be defined; its design is the most critical point in building a GP system. The wide application of GP across different environments, and its success, are due to its capability to adapt to numerous different problems. Although one of the most common applications of GP is the generation of mathematical expressions (Rivero, Rabuñal, Dorado, & Pazos, 2005), it has also been used in other fields such as rule generation (Bot & Langdon, 2000), filter design (Rabuñal et al., 2003), and classification (Espejo, Ventura, & Herrera, 2010).

2.4. K-nearest neighbor classifier

The K-nearest neighbor (KNN) classifier (Cover & Hart, 1967) is a nonparametric, nonlinear and relatively simple classifier. It classi-
where a and b are called the scale (reciprocal of frequency) and translation (time localization) parameters, respectively. When a and b are taken as discrete numbers defined on powers of two, as follows:
a_j = 2^j,  b_{j,k} = 2^j k,  j, k \in \mathbb{Z},    (2)
then the discrete wavelet transform is obtained and Eq. (1) becomes (Chui, 1992):
DWT(j, k) = 2^{-j/2} \int_{-\infty}^{\infty} S(t) \psi^{*}((t - 2^j k)/2^j) dt.    (3)

Fig. 1. Sub-band decomposition of the DWT implementation.
fies a new sample on the basis of measuring its ''distance'' to a number of patterns kept in memory. The class that the KNN classifier assigns to the new sample is decided by the patterns that most resemble it, i.e. those with the smallest distance to it. The distance function commonly used in the KNN classifier is the Euclidean distance. Instead of taking the single nearest sample, a majority vote among the k nearest neighbors is normally taken. The parameter k has to be selected in practice; in this work, k is set to 3.

2.5. Previous work on genetic programming applied to feature extraction

Raymer, Punch, Goodman, and Kuhn (1996) applied GP to improve KNN classifier performance without feature reduction. For each attribute, they evolved a tree that is a function of the attribute itself and zero or more constants. They applied this to a biochemistry database and obtained better classification performance than a similar system based on a genetic algorithm. Sherrah (1998) created a feature extraction system based on evolutionary computation. The individuals in his work are multi-trees, each tree encoding one feature, which means that the number of extracted features is determined beforehand. His system was not designed for the KNN classifier but for other classification systems, such as the minimum-distance-to-means classifier, the parallelepiped classifier and the Gaussian maximum likelihood classifier. Tackett (1993) developed a processing tree derived from GP for the classification of features extracted from images: measurements from the segmented images were weighted and combined through linear and non-linear operations, and if the resulting value of the tree was greater than zero, the object was classified as a target. Ebner and co-workers have evolved image processing operators using GP (Ebner & Rechnerarchitektur, 1998; Ebner & Zell, 1999).
They posed interest point (IP) detection as an optimization problem, attempting to evolve the Moravec operator (Ebner & Rechnerarchitektur, 1998) using genetic programming, and reported a 15% localization error between the interest points detected by the evolved operator and those obtained with the Moravec detector. Bot (2001) used GP to evolve new features for classification problems, adding them one at a time to a KNN classifier whenever the newly evolved feature improved the classification performance by more than a certain amount. Bot's approach is a greedy algorithm and therefore almost certainly sub-optimal. Harvey et al. (2002) evolved pipelined image processing operations to transform multi-spectral input synthetic aperture radar (SAR) image planes into a new set of image planes, and a conventional supervised classifier was used to label the transformed features. Training data were used to derive a Fisher linear discriminant, and GP was applied to find a threshold that reduces the output of the discriminant-finding phase to a binary image. However, the discriminability is constrained in the discriminant-finding phase, and GP is only used as a one-dimensional search tool to find a threshold. Kotani, Nakai, and Akazawa (1999) used GP to evolve polynomial combinations of raw features to be fed into a KNN classifier and demonstrated an improvement in classification accuracy; however, they assumed in advance that the features were polynomial expressions built from products and sums of the original patterns. Krawiec (2002) constructed a fixed-length decision vector using GP and proposed an extended method to protect 'useful' blocks during the evolution. This protection method, however, leads to over-fitting, as his own experiments showed. Krawiec's results showed that for some datasets his feature extraction method
actually produces worse classification performance than using the raw input data. Guo, Jack, and Nandi (2005) evolved features using GP in a condition monitoring task, although it is not clear whether the elements of the decision-variable vector were evolved at the same time or hand-selected after evolution. Firpi, Goodman, and Echauz (2006) developed artificial features without physical meaning by means of GP and then applied those features to a KNN classifier for predicting epileptic seizures. They evaluated the performance of the GP artificial features on IEEG data from seven patients and obtained satisfactory prediction accuracy; however, the maximum number of artificial features evolved through GP was predefined by the authors. Recently, Sabeti, Katebi, and Boostani (2009) employed GP to select the best features from the original feature set to increase classifier performance on EEG classification problems. In that work, the aim of using GP was to pick out the most important feature elements, not to create new artificial GP-based features.

In the present work, GP is used to create new features from an original feature database to improve KNN classifier performance and simultaneously decrease the input feature dimension for the classifier. The input feature dimension is determined automatically during GP evolution, not fixed beforehand or decided by humans.

2.6. Previous work on epileptic EEG classification

In this part, other researchers' work on epileptic EEG classification is briefly reviewed. Mohseni, Maghsoudi, Kadbi, Hashemi, and Ashourvan (2006) applied short-time Fourier transform (STFT) analysis to EEG signals and extracted features based on the pseudo Wigner–Ville and the smoothed-pseudo Wigner–Ville distributions. Those features were then used as inputs to an artificial neural network (ANN) for classification.
Kalayci and Ozdamar (1995) used the wavelet transform to capture specific characteristic features of the EEG signals and combined it with an ANN to obtain satisfying classification results. Nigam and Graupe (2004) described a method for automated detection of epileptic seizures from EEG signals using a multistage nonlinear pre-processing filter to extract two features, relative spike amplitude and spike occurrence frequency, which were then fed to a diagnostic artificial neural network. In the work of Jahankhani, Kodogiannis, and Revett (2006), the EEGs were decomposed with the wavelet transform into different sub-bands and statistical information was extracted from the wavelet coefficients; radial basis function (RBF) and multi-layer perceptron (MLP) networks were used as classifiers. Subasi (2005b, 2006, 2007) decomposed the EEG signals into time–frequency representations using the discrete wavelet transform; features based on the DWT were obtained and applied to different classifiers for epileptic EEG classification, such as a feed-forward error back-propagation artificial neural network (FEBANN), a dynamic wavelet network (DWN), a dynamic fuzzy neural network (DFNN) and a mixture of experts (ME). Übeyli (2009) employed wavelet analysis with a combined neural network model to discriminate EEG signals: the EEGs were decomposed into time–frequency representations using the DWT, statistical features were calculated, and a two-level neural network model was used to classify three types of EEG signals. The results showed that the combined neural network model achieved better classification performance than a stand-alone neural network model. Ocak (2009) detected epileptic seizures based on approximate entropy (ApEn) and the discrete wavelet transform: EEG signals were first decomposed into approximation and detail coefficients using the DWT, and ApEn values for each set of coefficients were
computed. Finally, surrogate data analysis was applied to the ApEn values to classify the EEGs. Işık and Sezer (in press) investigated the wavelet transform for diagnosing epilepsy: the EEGs were decomposed into several sub-bands using the wavelet transform and a set of feature vectors was extracted; the dimensions of these feature vectors were reduced via principal component analysis, and the signals were then classified as epileptic or healthy using multi-layer perceptron and Elman neural networks.

Besides features derived from the DWT, other quantitative information from the signal time series has also been investigated as classifier input. In Güler's work (Güler, Übeyli, & Güler, 2005), Lyapunov exponents were extracted from the EEGs with Jacobi matrices and then applied as inputs to recurrent neural networks (RNNs), obtaining good classification results. Übeyli (2006b) classified the EEG signals by a combination of Lyapunov exponents and a fuzzy similarity index: fuzzy sets were obtained from the feature sets (Lyapunov exponents) of the signals under study, and the results demonstrated that the similarity between the fuzzy sets of the studied signals indicated the variabilities in the EEG signals, so the fuzzy similarity index could discriminate the different EEGs. In the work of Übeyli (2006a), the author used the computed Lyapunov exponents of the EEG signals as inputs to MLPNNs trained with backpropagation, delta-bar-delta, extended delta-bar-delta, quick propagation, and Levenberg–Marquardt algorithms; the classification accuracy of the MLPNN trained with the Levenberg–Marquardt algorithm was 95% for discriminating healthy, seizure-free and seizure EEGs. In the study presented by Übeyli and Güler (2007), decision making was performed in two stages: feature extraction by eigenvector methods, and classification using classifiers trained on the extracted features.
The inputs of these expert systems, composed of diverse or composite features, were chosen according to the network structures. The five-class classification accuracies of the expert system with diverse features (MME) and with composite features (ME) were 95.53% and 98.6%, respectively.

Besides ANNs, other types of classifiers have also been used for EEG discrimination, including linear discriminant analysis (LDA), multiclass support vector machines (SVMs), Bayesian classifiers, and nearest neighbor classifiers. LDA assumes a normal distribution of the data, with an equal covariance matrix for the two classes; the separating hyperplane is obtained by seeking the projection that maximizes the distance between the two classes' means and minimizes the within-class variance. This technique has very low computational requirements, which makes it suitable for online and real-time classification problems (Garrett, Peterson, Anderson, & Thaut, 2003). SVM also uses a discriminant hyperplane to separate classes; however, the selected hyperplane is the one that maximizes the margins, i.e., the distance to the nearest training points, and maximizing the margins is known to increase generalization capability (Blankertz, Curio, & Müller, 2002). The main weakness of SVM is its relatively low execution speed. Übeyli (2008a) presented a multiclass support vector machine (SVM) with error correcting output codes (ECOC) for EEG classification; the features were extracted by eigenvector methods and used to train the classifier (multiclass SVM with ECOC) on the EEG signals. The Bayesian classifier aims at assigning a feature vector to the class with the highest probability: Bayes' rule is used to compute the a posteriori probability that a feature vector belongs to a given class.
Using the MAP (maximum a posteriori) rule and these probabilities, the class of the feature vector can be estimated (Fukunaga, 1990). Nearest neighbor classifiers are relatively simple: they assign a feature vector to a class according to its nearest neighbor(s), and they are discriminative non-linear classifiers (Garrett et al., 2003).
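As a concrete illustration of the distance-and-vote rule used throughout this paper (Section 2.4, k = 3), a minimal KNN classifier can be sketched as follows; the training data in the usage example are invented, not taken from the paper:

```python
# Minimal K-nearest-neighbour classifier: Euclidean distance plus a
# majority vote among the k = 3 nearest stored patterns. A sketch,
# not the authors' implementation.
import math
from collections import Counter

def knn_classify(train_x, train_y, sample, k=3):
    """Label `sample` by majority vote among its k nearest training points."""
    dists = sorted(
        (math.dist(p, sample), label) for p, label in zip(train_x, train_y)
    )
    votes = Counter(label for _, label in dists[:k])
    return votes.most_common(1)[0][0]
```

For instance, with three 'normal' points near the origin and three 'seizure' points near (5, 5), a query at (0.2, 0.2) collects three 'normal' votes and is labelled accordingly.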
3. Problem description

The epileptic EEG classification problem described by Andrzejak et al. (2001) is considered in the current research. The whole dataset consists of five sets (denoted Z, O, N, F and S), each containing 100 single-channel EEG segments of 23.6 s duration, sampled at 173.6 Hz. These segments were selected and cut out from continuous multi-channel EEG recordings after visual inspection for artifacts, e.g., due to muscle activity or eye movements. Sets Z and O consist of segments taken from surface EEG recordings carried out on five healthy volunteers using a standardized electrode placement scheme; the volunteers were relaxed in an awake state with eyes open (Z) and eyes closed (O), respectively. Sets N, F and S originate from an EEG archive of presurgical diagnosis: segments in set F were recorded from the epileptogenic zone, and those in set N from the hippocampal formation of the opposite hemisphere of the brain. While sets N and F contain only activity measured during seizure-free intervals, set S contains only seizure activity; here, segments were selected from all recording sites exhibiting ictal activity.

In this work, two different classification problems are created from the described dataset and used to test our method. The first problem examines two classes, normal and seizure: the normal class includes only set Z, while the seizure class includes set S. The second problem includes three classes, normal, seizure-free and seizure: the normal class includes only set Z, the seizure-free class set F, and the seizure class set S. According to the previous description, the datasets consist of 200 and 300 EEG segments for these two problems, respectively.

4. Methodology

The current research applies genetic programming to automatically extract new features that improve classifier performance and reduce the input feature dimension simultaneously.
Here, the term 'automatically' has two meanings: first, the expressions of the new features are automatically defined by GP; second, the number of new features is also automatically determined by GP. Fig. 2 illustrates the approach of feature extraction using GP for a classification problem. The whole process consists of three stages:

Stage 1: creation of the original feature database. Feature extraction requires an original feature database as a prerequisite for the subsequent procedures. In this work, the original features are created on the basis of discrete wavelet transform analysis of the raw EEG signals.

Stage 2: genetic programming based feature extraction system. This is the main part of the proposed method. It consists of GP and a KNN classifier. GP non-linearly transforms the original features obtained in stage 1 into a set of new features that facilitate classification. The output of the system is the GP-based features.

Stage 3: classification. The purpose of this stage is to verify the efficiency of the GP-based features on the test data. The test data are processed by the GP-based features and then fed to a classifier for evaluating the classification performance.

A detailed description of these three stages follows.

4.1. Creation of original feature database

Since the EEG is a non-stationary signal, the discrete wavelet transform is chosen to analyze the EEGs and help create the original features. The basic theory of the DWT was explained in Section 2.2.
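As an illustrative sketch of stage 1 (not the authors' code), the five measures of Section 4.1 (mean, standard deviation, energy, curve length, skewness) can be computed for one sub-signal as follows; applying this to every DWT sub-signal of a segment yields that segment's row in the original feature database:

```python
# The five classic measures of Section 4.1, computed for one sub-signal
# given as a list of samples. Illustrative sketch only.
import math

def signal_measures(s):
    n = len(s)
    mean = sum(s) / n                                            # 4.1.1 mean
    std = math.sqrt(sum((v - mean) ** 2 for v in s) / (n - 1))   # 4.1.2 std (sample)
    energy = sum(v * v for v in s) / n                           # 4.1.3 mean energy
    curve_len = sum(abs(s[i + 1] - s[i]) for i in range(n - 1))  # 4.1.4 curve length
    skew = sum(((v - mean) / std) ** 3 for v in s) / n           # 4.1.5 skewness
    return [mean, std, energy, curve_len, skew]
```

With a 4-level decomposition (sub-signals A4, D4, D3, D2, D1), this gives 5 sub-signals x 5 measures = 25 original features per EEG segment.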
Fig. 2. Diagram of the GP based feature extraction for epileptic EEG classification.

The raw EEG signal is first decomposed by the DWT into several sub-signals: one approximation sub-signal and several detail sub-signals. Each sub-signal represents the original signal in a different frequency band. Then, five classic measures used in EEG signal analysis are calculated for each sub-signal. These measures were selected from the time, statistical and information-theoretic domains to reveal the most important characteristics of the EEG signal. To define them, let S be a sampled signal with N samples, where S_i is the i-th sample; \mu and \sigma denote the mean and standard deviation of the signal. The definitions of the five measures are:

4.1.1. Mean value of the signal
Average (arithmetic mean) of the signal amplitudes:
\mu = (1/N) \sum_{i=1}^{N} S_i.

4.1.2. Standard deviation of the signal
The square root of the variance, where \mu represents the mean value of the signal:
\sigma = \sqrt{ (1/(N-1)) \sum_{i=1}^{N} (S_i - \mu)^2 }.

4.1.3. Energy of the signal
Measures the average instantaneous energy of the signal:
(1/N) \sum_{i=1}^{N} S_i^2.

4.1.4. Curve length of the signal
Sum of the lengths of the vertical line segments between consecutive samples; it provides a measure of both time and frequency characteristics:
\sum_{i=1}^{N-1} |S_{i+1} - S_i|.

4.1.5. Skewness of the signal
Measure of the asymmetry of the data distribution:
(1/N) \sum_{i=1}^{N} ((S_i - \mu)/\sigma)^3.

These five measures, calculated on each sub-signal of the EEG signal, are used to create the original feature database.

4.2. Genetic programming based feature extraction system

The genetic programming based feature extraction system developed in this work consists of genetic programming and a KNN classifier, with genetic programming as its main part. Each individual in GP represents a set of new features, obtained as non-linear transformations of the original features. Those new features are then passed to the KNN classifier, and their goodness is evaluated through the misclassification output of the KNN classifier.

In the classic GP evolution process, each individual of the population represents one expression, so evaluating it yields only one feature; that is the most common situation in GP applications to feature extraction. In order to allow each individual to automatically generate more than one feature, in the current work a new function named 'F' is created and added to the function set. It is a special function used to build the output feature vector. The 'F' function can appear at any position of the tree; it has only one argument, and its output is simply a copy of its input. However, the input argument of 'F' is added to the list of new features. Thus, when an individual is evaluated, as many new features are automatically generated as there are 'F' nodes in its tree.

The usage of the function 'F' can be clearly described with the example shown in Fig. 3, where there are 10 input variables to a classifier. The GP tree in Fig. 3 contains two 'F' nodes. When the tree is evaluated, the numerical value resulting from this evaluation is not used; what matters are the sub-trees that are children of each 'F' node, since those sub-trees determine the expressions of the extracted features. The first 'F' node implies the creation of a feature with the expression x2 * x5, which is put into the new feature vector; the second 'F' node implies the creation of a feature with the expression x7 + x10, which is added to the same vector. Thus, after the evaluation of the whole tree in Fig. 3 is finished, a feature vector including two new features has been created, and the feature dimension in this example has been decreased from the original 10 down to 2.

Fig. 3. Example of a GP tree that creates a feature vector including two new features.

4.2.1. The fitness function
The fitness function is the most important point in GP evolution. In the current work, the fitness function is defined as the misclassification of the classifier on the training data, as depicted in Fig. 4.
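The 'F'-node mechanism can be sketched as follows. The nested-tuple tree encoding, the operator set and the function names are our illustrative assumptions, not the authors' implementation:

```python
# Sketch of how 'F' nodes turn one GP tree into a feature vector: while
# the tree is evaluated, every 'F' node copies its input through and
# also appends that value to the feature list. Tree = nested tuples
# (our illustrative encoding, not the paper's).
import operator

OPS = {"+": operator.add, "-": operator.sub, "*": operator.mul}

def evaluate(node, x, features):
    """Evaluate a tree node; x maps variable names (e.g. 'x2') to values."""
    if isinstance(node, str):           # terminal: input variable
        return x[node]
    if isinstance(node, (int, float)):  # terminal: constant
        return node
    op, *children = node
    if op == "F":                       # special node: record a feature
        value = evaluate(children[0], x, features)
        features.append(value)
        return value                    # 'F' passes its input through
    left, right = (evaluate(c, x, features) for c in children)
    return OPS[op](left, right)

def extract_features(tree, x):
    """Collect the values produced by all 'F' nodes, in evaluation order."""
    features = []
    evaluate(tree, x, features)         # root value itself is discarded
    return features
```

Mirroring Fig. 3, the tree `("+", ("F", ("*", "x2", "x5")), ("F", ("+", "x7", "x10")))` yields exactly two features, x2 * x5 and x7 + x10, regardless of what the root computes.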
10430
L. Guo et al. / Expert Systems with Applications 38 (2011) 10425–10436 Table 1 Frequency bands of EEG signals with 4-level DWT decomposition. Sub-signals
Frequency bands (Hz)
Decomposition level
D1 D2 D3 D4 A4
43.4–86.8 21.7–43.4 10.8–21.7 5.4–10.8 0–5.4
1 2 3 4 4
Fig. 4. Calculating the fitness of each individual in GP.

The detailed procedure for calculating the fitness value of each individual is:

1. The training data pre-processed by the original features are randomly split into a sub-training set (40% of the pre-processed training data) and a validation set (60% of the pre-processed training data).
2. The tree in GP is evaluated and the new features are extracted.
3. The KNN classifier is trained on the sub-training set processed by the new features.
4. The trained KNN classifier classifies the validation set processed by the new features, and the misclassification value $e_i$ is obtained:

$e_i = N_{validation} - N_{correct},$

where $N_{validation}$ is the number of samples in the validation set, and $N_{correct}$ is the number of samples correctly classified.

5. Steps 1–4 are repeated several times, and the mean value of $e_i$ is taken as the fitness value of the individual.
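A sketch of this fitness loop, using scikit-learn's KNeighborsClassifier as a stand-in for the paper's KNN (the value of K, the repeat count, the stratified split, and all helper names are our assumptions; extract_features stands for evaluating one GP tree):

```python
# Sketch of the fitness loop in Fig. 4 (assumed details: K = 1, stratified
# splits so the toy sub-training set contains every class).
import numpy as np
from sklearn.model_selection import train_test_split
from sklearn.neighbors import KNeighborsClassifier

def fitness(extract_features, X_train, y_train, n_repeats=5, seed=0):
    """Mean misclassification e_i = N_validation - N_correct over repeated
    random 40/60 sub-training/validation splits of the training data."""
    rng = np.random.RandomState(seed)
    Z = extract_features(X_train)          # the individual's GP-based features
    errors = []
    for _ in range(n_repeats):
        Z_sub, Z_val, y_sub, y_val = train_test_split(
            Z, y_train, train_size=0.4, random_state=rng, stratify=y_train)
        knn = KNeighborsClassifier(n_neighbors=1).fit(Z_sub, y_sub)
        n_correct = (knn.predict(Z_val) == y_val).sum()
        errors.append(len(y_val) - n_correct)   # e_i for this split
    return float(np.mean(errors))               # lower is fitter

# Toy usage: two well-separated classes, identity feature extraction.
X = np.array([[0.0], [0.1], [0.2], [1.0], [1.1], [1.2]])
y = np.array([0, 0, 0, 1, 1, 1])
print(fitness(lambda Z: Z, X, y))   # 0.0 for this separable toy data
```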
After the GP evolution procedure ends, the best-of-all individual is obtained. Consequently, the new features that best facilitate classification on the training data are used as the output of the GP based feature extraction system; they are called GP-based features. One point should be mentioned: in the current work, the misclassification produced by a KNN classifier is used to evaluate the fitness of GP individuals, so the GP-based features obtained are only optimized for the KNN classifier. Although it is possible that the GP-based features work well with other classifiers, that performance cannot be guaranteed. However, the idea of this methodology can be extended to other classification systems, such as artificial neural networks.
4.3. Classification

Because the GP-based features are derived from the training data, their performance on the classification problem has to be verified on the test data. Since the KNN classifier was selected as the component of the GP based feature extraction system, the KNN classifier is also chosen as the classification system for verifying the performance of the GP-based features on the test data.

Fig. 5. Approximation and details of a sample normal EEG signal.

Fig. 6. Approximation and details of a sample epileptic seizure-free EEG signal.

5. Results

In this section, the results of implementing the developed methodology on the specific classification problem described in Section 3 are discussed.
5.1. Original feature database

The original features are created in two steps. In the first step, the discrete wavelet transform is applied to decompose the EEG signal into several sub-signals within different frequency bands. Selecting the number of decomposition levels and a suitable wavelet function is also important for EEG signal analysis with DWT. In the current research, the number of decomposition levels is chosen as 4, as recommended in previous work (Subasi, 2006), and the wavelet function selected is the Daubechies wavelet of order 4, which was also shown to be the most suitable wavelet function for epileptic EEG signal analysis (Subasi, 2006). The frequency bands corresponding to a 4-level DWT decomposition with a sampling frequency of 173.6 Hz are shown in Table 1. Figs. 5–7 show the five sub-signals (one approximation A4 and four details D1–D4) of a sample normal EEG signal (set Z), a sample epileptic seizure-free EEG signal (set F) and a sample epileptic seizure EEG signal (set S), respectively. In the second step, after one raw EEG signal has been decomposed into the five sub-signals corresponding to the frequency bands in Table 1, the five classic measures explained in Section 4.1 are calculated on each sub-signal to form the original feature database. The dimension of the original feature space is 5 × 5 = 25. To simplify the description, a vector X containing 25 variables corresponding to the components of the original features is defined; the definition of each variable is shown in Table 2.

5.2. GP configurations

Before applying genetic programming to solve a real-world problem, several types of parameters need to be defined in advance.

5.2.1. The function set and the terminal set

Although there are many different functions that can be used in GP, normally only a small subset of them is used simultaneously, because the size of the search space increases exponentially with the size of the function set. The function set employed in this work is listed in Table 3. Since the functions need to satisfy the closure property in GP, the square root, logarithm, and division operators are implemented in a protected way. Protected division works identically to ordinary division except that it outputs the value of the numerator when the denominator is zero. In order to avoid complex values arising from negative arguments, the protected square root operator applies an absolute value to its input before taking the square root. Likewise, the protected logarithm outputs zero for an argument of zero and applies an absolute value to negative arguments. The terminal set of this work includes the 25 variables defined in Table 2 and one random constant. By essentially feeding the full original feature database to the GP, it is expected that GP can explore and exploit
Fig. 7. Approximation and details of a sample epileptic seizure EEG signal.
the EEG signals, so as to increase the chance of finding more informative and successful features to discriminate the EEGs.

Table 2
Definition of variables in the original feature space X.

Classic measure       43.4–86.8 Hz   21.7–43.4 Hz   10.8–21.7 Hz   5.4–10.8 Hz   0–5.4 Hz
Mean                  x1             x6             x11            x16           x21
Standard deviation    x2             x7             x12            x17           x22
Energy                x3             x8             x13            x18           x23
Curve length          x4             x9             x14            x19           x24
Skewness              x5             x10            x15            x20           x25

Table 3
The function set.

Name    Number of arguments   Operation
+       2                     Arithmetic add
−       2                     Arithmetic subtract
*       2                     Arithmetic multiply
/       2                     Protected division
log     1                     Protected natural logarithm
sqrt    1                     Protected square root
F       1                     Output the value of the input

5.2.2. The control parameters

These parameters are used to control the GP run. There exist many possible combinations of the control parameters; for the given problems, several different combinations were tried, and the parameters that returned the best results are shown in Table 4.

Table 4
Control parameters in GP.

Initial population generation   Ramped half-and-half method
Maximum tree depth              9
Population size                 300
Number of generations           10
Crossover probability           80%
Mutation probability            20%
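A sketch of the three protected operators described in Section 5.2.1 (function names are ours; GPLab's own implementations may differ in detail):

```python
import math

# Protected operators: each guards against inputs that would break closure.
def pdiv(a, b):
    """Protected division: returns the numerator when the denominator is 0."""
    return a if b == 0 else a / b

def psqrt(a):
    """Protected square root: absolute value taken before the root."""
    return math.sqrt(abs(a))

def plog(a):
    """Protected natural log: 0 for an argument of 0, abs for negatives."""
    return 0.0 if a == 0 else math.log(abs(a))

print(pdiv(3.0, 0.0), psqrt(-4.0), plog(-math.e))   # 3.0 2.0 1.0
```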
5.2.3. The termination criteria

The GP execution is terminated when one of the following criteria is met: the mean misclassification of the KNN classifier on the training data is zero, or the maximum number of generations is reached. The GPLab software (Silva, 2007), a popular genetic programming toolbox for Matlab, is employed in this work.

5.3. Results

Two different epileptic EEG classification problems from the dataset described in Section 3 are used to verify the developed methodology: the first is a two-class problem (normal vs. seizure classification), and the second is a three-class problem (normal, seizure-free and seizure classification). The raw dataset is pre-processed by the original features.

5.3.1. Classification performance and input feature dimension

The classification accuracies of the 50 executions on the two classification problems are shown in Fig. 8. The procedure for calculating the classification accuracy is:
Fig. 8. Classification accuracy comparison of GP–KNN classifier and KNN-alone classifier on the testing subset of two epileptic EEGs classification.
1. 30% of the whole pre-processed dataset is randomly selected and used for obtaining the GP-based features.
2. The other 70% of the pre-processed dataset is processed with and without the GP-based features. It is then randomly split into a training subset (40%) and a testing subset (60%).
3. The KNN classifier is trained on the training subset with and without the GP-based features; therefore, two types of KNN classifier are obtained for comparison.
4. The two trained KNN classifiers classify the testing subset with and without the GP-based features, respectively, and the classification accuracy is obtained.
5. Steps 2–4 are repeated 300 times, and the mean classification accuracy is taken as the result of one execution in Fig. 8.

In Fig. 8, ''GP–KNN classifier'' and ''KNN-alone classifier'' denote a KNN classifier with and without the GP-based features, respectively. The upper graph in Fig. 8 compares the classification accuracy of the two classifiers on the two-class EEG classification; the bottom graph compares them on the three-class EEG classification. From Fig. 8 it is obvious that the GP–KNN classifier has much higher classification accuracy than the KNN-alone classifier on both classification problems, which means that the GP-based features greatly improve the discrimination performance of the KNN classifier.

Another objective of the developed methodology is to reduce the input feature dimension. As already mentioned in Section 5.1, the dimension of the original features without GP based feature extraction is 25. Fig. 9 depicts the dimension of the GP-based features obtained in each of the 50 executions: for the two-class EEG classification, the maximum dimension of the GP-based features is 4, and most of them are only 2 or 3; for the three-class EEG classification, the maximum dimension is 6, and most of them are from 2 to 5.

Table 5 summarizes the results of Figs. 8 and 9; the classification accuracies and input feature dimensions in the table are mean values over the 50 executions. From the table, it is obvious that with GP based feature extraction:

1. The classification performance of the KNN classifier is significantly improved: the improvement is more than 10% for the two-class classification and more than 25% for the three-class classification.
2. The input feature dimension for the classifier is enormously reduced. The average number of input features drops to 2.32 for the two-class problem (where 2 or 3 input features are usually sufficient) and to 3.48 for the three-class problem (usually 3 or 4). It is logical that the input feature dimension for the three-class classification is higher, since that problem is more complicated and needs more features to discriminate the different patterns.

5.3.2. GP-based features expression and analysis

In most cases, each execution of genetic programming based feature extraction obtained different expressions and different
Fig. 9. The dimension of GP-based features over 50 executions for two epileptic EEGs classification.
Table 5
Classification accuracy and input feature dimension of the two KNN classifiers on two classification problems.

                        Normal and seizure EEG classification                      Normal, seizure-free and seizure EEG classification
                        Classification accuracy (std)   Input feature dimension    Classification accuracy (std)   Input feature dimension
KNN-alone classifier    88.6% (1.12)                    25                         67.2% (1.17)                    25
GP–KNN classifier       99.2% (0.49)                    2.32                       93.5% (1.20)                    3.48
Table 6
Most important measures selected by GP for epileptic EEG classification.

Measure                                                      Normal–seizure    Normal–seizure free–seizure
x24 – Curve length of the signal within 0–5.4 Hz             ✓                 ✓
x22 – Standard deviation of the signal within 0–5.4 Hz       ✓
x17 – Standard deviation of the signal within 5.4–10.8 Hz    ✓
x12 – Standard deviation of the signal within 10.8–21.7 Hz   ✓                 ✓
x19 – Curve length of the signal within 5.4–10.8 Hz          ✓                 ✓
x7 – Standard deviation of the signal within 21.7–43.4 Hz                      ✓
x9 – Curve length of the signal within 21.7–43.4 Hz                            ✓

number of GP-based features. The expressions may be as simple as just one of the variables, or may be complicated functions with many variables and constants. Several examples of the GP-based feature expressions obtained in this work are listed in the following.

Normal–seizure EEG classification:

[√x24 − √x17 + x19; √(log(x24)); x22]; [x19; x24; x14].

Normal–seizure free–seizure EEG classification:

[x24; x7/x19; √x7; log(x9); x19/0.25355]; [x19; x24; x12; x7 + x19 − x24].

From the above expressions, it can be seen that the features extracted by genetic programming are intuitively difficult to interpret. This shows that GP can find combinations of the input variables and
Table 7
The classification accuracy and the number of features required for the epileptic EEG classification of our method compared to the results of other methods.

Researchers                 Method                                                       Dataset    Accuracy (%)    Number of input features
Srinivasan et al. (2005)    Time & frequency domain features–recurrent neural network    Z, S       99.6            5
Polat and Güneş (2007)      Fast Fourier transform–decision tree                         Z, S       98.72           129
Nigam and Graupe (2004)     Nonlinear pre-processing filter–diagnostic neural network    Z, S       97.2            2
Subasi (2007)               Discrete wavelet transform–mixture of expert model           Z, S       95              16
Tzallas et al. (2007)       Time frequency analysis–artificial neural network            Z, S       99              13
This work                   GP-based feature extraction–KNN classifier                   Z, S       99.2            2.32
Sadati et al. (2006)        Discrete wavelet transform–adaptive neural fuzzy network     Z, F, S    85.9            6
Tzallas et al. (2007)       Time frequency analysis–artificial neural network            Z, F, S    98.6            22
Übeyli (2008b)              Discrete wavelet transform–mixture of experts network        Z, F, S    93.17           30
This work                   GP-based feature extraction–KNN classifier                   Z, F, S    93.5            3.48
functions that would not be found by humans. Looking into the whole set of GP-based feature expressions obtained in this work, it can be found that only several measures within the original feature database are useful for epileptic EEG classification. The selected measures are listed in Table 6.

5.4. Comparison with other works

Many other methods have been proposed for epileptic EEG signal classification, as described in Section 2.6. Table 7 compares the results of the method developed in this work with those of other proposed methods; only methods evaluated on the same dataset are included. Both the classification accuracy and the input feature dimension of the classifier are listed for comparison. For the two-class classification problem, the accuracy obtained by our method is the second best presented for this dataset, while the number of features required by our method is the smallest among methods with classification accuracy higher than 99%. For the three-class problem, the accuracy obtained by our method is also the second best presented for this dataset, and the number of features required is the smallest.

6. Conclusions

In this work, genetic programming is applied to extract new features from an original feature database for a classification problem. In comparison with other pattern classification methods, genetic programming based feature extraction automatically determines: the dimensionality of the new features; which measures in the original feature database are useful for epileptic EEG classification; and how to non-linearly transform the selected original measures into new features. Implementation results showed that the proposed method significantly improved the KNN classifier performance. Furthermore, a huge reduction of the input feature dimension was also achieved by the developed method.
For the two problems given in this paper, three and four features are sufficient to attain high classification accuracy. In addition, through natural evolution, the informative measures useful for discrimination are selected by GP from the original feature database. As for the features extracted by GP, it is hard to explain their physical meaning. This shows that GP can discover ''hidden'' relationships among the terminal and function sets, which would be difficult for humans to find. The limitation of the proposed approach is that the GP-based feature extraction system is computationally expensive. An increase in the size of the original feature database or in the number of training data would bring about a significant increase in the computation cost, which makes the developed method inappropriate for real-time applications.

7. Future work

Feature extraction using genetic programming has shown great success on epileptic EEG classification problems. The same method can be applied to a wider range of pattern recognition problems that are important to humans, such as the detection and diagnosis of Alzheimer's and Parkinson's diseases. The current methodology was derived from the combination of genetic programming and the KNN classifier; further work can investigate the performance of genetic programming combined with other classifier systems, such as artificial neural networks. These are all interesting directions for future work.

Acknowledgements

Ling Guo was financially supported through a fellowship of the Agencia Española de Cooperación International (AECI) and the Spanish Ministry of Foreign Affairs.

References

Adeli, H., Zhou, Z., & Dadmehr, N. (2003). Analysis of EEG records in an epileptic patient using wavelet transform. Journal of Neuroscience Methods, 123(1), 69–87.
Andrzejak, R., Lehnertz, K., Mormann, F., Rieke, C., David, P., & Elger, C. (2001). Indications of nonlinear deterministic and finite-dimensional structures in time series of brain electrical activity: Dependence on recording region and brain state. Physical Review E, 64(6), 061907-1–061907-8.
Blankertz, B., Curio, G., & Müller, K. (2002). Classifying single trial EEG: Towards brain computer interfacing. In Advances in neural information processing systems: Proceedings of the 2002 conference (pp. 157–164). Cambridge, Massachusetts: MIT Press.
Bot, M. (2001). Feature extraction for the k-nearest neighbour classifier with genetic programming. Lecture Notes in Computer Science, 256–267.
Bot, M., & Langdon, W. (2000). Application of genetic programming to induction of linear classification trees. In Genetic programming, proceedings of EuroGP'2000 (pp. 247–258). Berlin, Heidelberg: Springer-Verlag.
Chui, C. (1992).
An introduction to wavelets. Boston: Academic Press.
Cover, T., & Hart, P. (1967). Nearest neighbor pattern classification. IEEE Transactions on Information Theory, 13(1), 21–27.
Darwin, C. (1864). On the origin of species by means of natural selection or the preservation of favoured races in the struggle for life. Cambridge, UK: Cambridge University Press.
Duda, R., Hart, P., & Stork, D. (2001). Pattern classification. New York: Wiley.
Ebner, M., & Rechnerarchitektur, A. (1998). On the evolution of interest operators using genetic programming. In Late breaking papers at EuroGP'98: The first European workshop on genetic programming (pp. 6–10).
Ebner, M., & Zell, A. (1999). Evolving a task specific image operator. Lecture Notes in Computer Science, 74–89.
Espejo, P., Ventura, S., & Herrera, F. (2010). A survey on the application of genetic programming to classification. IEEE Transactions on Systems, Man, and Cybernetics, Part C: Applications and Reviews, 40(2), 121–144.
Firpi, H., Goodman, E., & Echauz, J. (2006). On prediction of epileptic seizures by means of genetic programming artificial features. Annals of Biomedical Engineering, 34(3), 515–529.
Fukunaga, K. (1990). Introduction to statistical pattern recognition (2nd ed.). Boston: Academic Press.
Garrett, D., Peterson, D., Anderson, C., & Thaut, M. (2003). Comparison of linear and nonlinear methods for EEG signal classification. IEEE Transactions on Neural Systems and Rehabilitative Engineering, 11(2), 141–144.
Güler, N., Übeyli, E., & Güler, I. (2005). Recurrent neural networks employing Lyapunov exponents for EEG signals classification. Expert Systems with Applications, 29(3), 506–514.
Guo, H., Jack, L., & Nandi, A. (2005). Feature generation using genetic programming with application to fault classification. IEEE Transactions on Systems, Man, and Cybernetics, Part B: Cybernetics, 35(1), 89–99.
Harvey, N., Theiler, J., Brumby, S., Perkins, S., Szymanski, J., Bloch, J., et al. (2002). Comparison of GENIE and conventional supervised classifiers for multispectral image feature extraction. IEEE Transactions on Geoscience and Remote Sensing, 40(2), 393–404.
Işik, H., & Sezer, E. (in press). Diagnosis of epilepsy from electroencephalography signals using multilayer perceptron and Elman artificial neural networks and wavelet transform. Journal of Medical Systems. doi:10.1007/s10916-010-9440-0.
Jahankhani, P., Kodogiannis, V., & Revett, K. (2006). EEG signal classification using wavelet feature extraction and neural networks. In IEEE John Vincent Atanasoff 2006 international symposium on modern computing (JVA'06) (pp. 52–57).
Kalayci, T., & Ozdamar, O. (1995). Wavelet preprocessing for automated neural network detection of EEG spikes. IEEE Engineering in Medicine and Biology Magazine, 14(2), 160–166.
Kandel, E., Schwartz, J., & Jessell, T. (2000). Principles of neural science. New York: McGraw-Hill, Health Professions Division.
Kotani, M., Nakai, M., & Akazawa, K. (1999). Feature extraction using evolutionary computation. In Proceedings of the 1999 congress on evolutionary computation, CEC 99 (Vol. 2, pp. 1230–1236).
Koza, J. (1992). Genetic programming: On the programming of computers by means of natural selection. Cambridge, Massachusetts: MIT Press.
Krawiec, K. (2002). Genetic programming-based construction of features for machine learning and knowledge discovery tasks. Genetic Programming and Evolvable Machines, 3(4), 329–343.
Mallat, S. (1989). A theory for multiresolution signal decomposition: The wavelet representation. IEEE Transactions on Pattern Analysis and Machine Intelligence, 11(7), 674–693.
Micheli-Tzanakou, E. (2000). Supervised and unsupervised pattern recognition: Feature extraction and computational intelligence. Boca Raton, FL: CRC Press.
Mohseni, H., Maghsoudi, A., Kadbi, M., Hashemi, J., & Ashourvan, A. (2006). Automatic detection of epileptic seizure using time-frequency distributions. In IET 3rd international conference on advances in medical, signal and information processing, MEDSIP 2006 (pp. 1–4).
Nigam, V., & Graupe, D. (2004). A neural-network-based detection of epilepsy. Neurological Research, 26(1), 55–60.
Ocak, H. (2009). Automatic detection of epileptic seizures in EEG using discrete wavelet transform and approximate entropy. Expert Systems with Applications, 36(2), 2027–2036.
Polat, K., & Güneş, S. (2007). Classification of epileptiform EEG using a hybrid system based on decision tree classifier and fast Fourier transform. Applied Mathematics and Computation, 187(2), 1017–1026.
Rabuñal, J., Dorado, J., Puertas, J., Pazos, A., Santos, A., & Rivero, D. (2003). Prediction and modelling of the rainfall–runoff transformation of a typical urban basin using ANN and GP. Applied Artificial Intelligence, 17(4), 329–343.
Raymer, M., Punch, W., Goodman, E., & Kuhn, L. (1996). Genetic programming for improved data mining: Application to the biochemistry of protein interactions. In Proceedings of the first annual conference on genetic programming (pp. 375–380). Cambridge, Massachusetts: MIT Press.
Rivero, D., Rabuñal, J., Dorado, J., & Pazos, A. (2005). Time series forecast with anticipation using genetic programming. In Computational intelligence and bioinspired systems, 8th international work-conference on artificial neural networks (pp. 968–975). Berlin, Heidelberg: Springer.
Sabeti, M., Katebi, S., & Boostani, R. (2009). Entropy and complexity measures for EEG signal classification of schizophrenic and control participants. Artificial Intelligence in Medicine, 47(3), 263–274.
Sadati, N., Mohseni, H., & Maghsoudi, A. (2006). Epileptic seizure detection using neural fuzzy networks. In 2006 IEEE international conference on fuzzy systems (pp. 596–600).
Sherrah, J. (1998). Automatic feature extraction for pattern recognition. Ph.D. thesis, The University of Adelaide.
Silva, S. (2007). GPLAB – a genetic programming toolbox for MATLAB.
Srinivasan, V., Eswaran, C., & Sriraam, N. (2005). Artificial neural network based epileptic detection using time-domain and frequency-domain features. Journal of Medical Systems, 29(6), 647–660.
Subasi, A. (2005a). Automatic recognition of alertness level from EEG by using neural network and wavelet coefficients. Expert Systems with Applications, 28(4), 701–711.
Subasi, A. (2005b). Epileptic seizure detection using dynamic wavelet network. Expert Systems with Applications, 29(2), 343–355.
Subasi, A. (2006). Automatic detection of epileptic seizure using dynamic fuzzy neural networks.
Expert Systems with Applications, 31(2), 320–328.
Subasi, A. (2007). EEG signal classification using wavelet feature extraction and a mixture of expert model. Expert Systems with Applications, 32(4), 1084–1093.
Tackett, W. (1993). Genetic programming for feature discovery and image discrimination. In Proceedings of the fifth international conference on genetic algorithms, ICGA-93 (pp. 303–309).
Tzallas, A., Tsipouras, M., & Fotiadis, D. (2007). A time-frequency based method for the detection of epileptic seizures in EEG recordings. In 20th IEEE international symposium on computer-based medical systems, CBMS'07 (pp. 135–140).
Übeyli, E. (2006a). Analysis of EEG signals using Lyapunov exponents. Neural Network World, 16(3), 257–273.
Übeyli, E. (2006b). Fuzzy similarity index employing Lyapunov exponents for discrimination of EEG signals. Neural Network World, 16(5), 421–431.
Übeyli, E. (2008a). Analysis of EEG signals by combining eigenvector methods and multiclass support vector machines. Computers in Biology and Medicine, 38(1), 14–22.
Übeyli, E. (2008b). Wavelet/mixture of experts network structure for EEG signals classification. Expert Systems with Applications, 34(3), 1954–1962.
Übeyli, E. (2009). Combined neural network model employing wavelet coefficients for EEG signals classification. Digital Signal Processing, 19(2), 297–308.
Übeyli, E., & Güler, I. (2007). Features extracted by eigenvector methods for detecting variability of EEG signals. Pattern Recognition Letters, 28(5), 592–603.