An enhanced harmony search based algorithm for feature selection: Applications in epileptic seizure detection and prediction


Computers and Electrical Engineering 0 0 0 (2016) 1–20

Contents lists available at ScienceDirect

Computers and Electrical Engineering journal homepage: www.elsevier.com/locate/compeleceng

An enhanced harmony search based algorithm for feature selection: Applications in epileptic seizure detection and prediction R

Zarita Zainuddin a,∗, Kee Huong Lai a, Pauline Ong b

a School of Mathematical Sciences, Universiti Sains Malaysia, 11800 Pulau Pinang, Malaysia
b Faculty of Mechanical and Manufacturing Engineering, Universiti Tun Hussein Onn Malaysia, 86400 Parit Raja, Batu Pahat, Johor, Malaysia

ARTICLE INFO

Article history:
Received 5 October 2015
Revised 12 February 2016
Accepted 12 February 2016
Available online xxx

Keywords:
Harmony search
Feature selection
Wavelet neural networks
Epileptic seizure classification
Epileptic seizure prediction

Abstract

Feature selection is a well-studied problem in the areas of pattern recognition and artificial intelligence. Apart from reducing computational cost and time, a good feature subset is also imperative for improving the classification accuracy of automated classifiers. In this work, a wrapper-based feature selection approach is proposed using the evolutionary harmony search algorithm, with wavelet neural networks as the classifiers. The metaheuristic algorithm is able to find near-optimal solutions within a reasonable number of iterations. The modifications are made in two ways—initialization of the harmony memory and improvisation of solutions. The proposed algorithm is tested and verified using UCI benchmark data sets, as well as two real life binary classification problems, namely epileptic seizure detection and prediction. The simulation results show that the standard harmony search algorithm and other similar metaheuristic algorithms give comparable performance, and that the enhanced harmony search algorithm outperforms the standard harmony search algorithm. © 2016 Published by Elsevier Ltd.

1. Introduction

The task of feature selection entails the search for an optimal feature subset that can best represent a given set of data and, hence, maximizes the performance of classifiers. Feature selection algorithms offer many attractive advantages. Apart from eliminating redundant and irrelevant features [1], the proposed feature subset, with its reduced number of features, indirectly cuts down computational cost and time. Moreover, the features selected are useful for the purposes of both data mining and data analysis: the final output of the feature selection algorithm indicates which features are important in characterizing the entire data set. It should be noted that the term feature selection refers to algorithms that return a subset of the input features, whereas the term feature extraction encompasses the techniques used to derive new features from the original data set; this pre-processing step is normally accomplished through mathematical transformations. The two aforementioned terms, feature selection and feature extraction, denote related, yet distinct, fields of study. This work emphasizes the former.

Reviews processed and recommended for publication to the Editor-in-Chief by Associate Editor Dr. M. R. Daliri. Corresponding author. Tel.: +60 46533940; fax: +60 46570910. E-mail address: [email protected] (Z. Zainuddin).

http://dx.doi.org/10.1016/j.compeleceng.2016.02.009 0045-7906/© 2016 Published by Elsevier Ltd.

Please cite this article as: Z. Zainuddin et al., An enhanced harmony search based algorithm for feature selection: Applications in epileptic seizure detection and prediction, Computers and Electrical Engineering (2016), http://dx.doi.org/10.1016/j.compeleceng.2016.02.009


The problem of feature selection, which is known to be NP-hard [2], has been studied extensively in the domains of machine learning and knowledge discovery. Mathematically, the problem of feature selection can be formulated as follows: given an original set of data, A, with a features, the ultimate goal is to find a subset B, with b features, where b < a and B ⊂ A, so as to maximize the overall classification accuracy. In general, the algorithms used fall into three main categories, namely wrapper, filter, and hybrid methods [3,4]. The wrapper methods are classifier-dependent. As the name suggests, the feature selection process is "wrapped" in a learning algorithm. The classification accuracy reported by the classifiers is used as the criterion to evaluate the quality of the chosen feature subsets. As the classifiers need to be retrained for each proposed feature subset, wrapper methods incur a very high computational cost. Instead of using the classification accuracy of the classifiers to score feature subsets, the filter methods employ a predetermined evaluation metric for the same purpose. As such, these methods are not as computationally intensive as the wrapper methods, because the evaluation metric is relatively simple and easy to compute. Nonetheless, as these methods are independent of the classifiers, a feature subset with a good evaluation score does not necessarily yield high classification accuracy when the features are trained and tested using machine classifiers. Unlike the features selected by the wrapper methods, features chosen by filter methods are not specifically tuned to a particular type of classifier. Hybrid methods combine the merits of both the wrapper and filter approaches: the features are first chosen using filter methods, which are computationally fast and efficient.
The subsequent step, accomplished by wrapper methods, concerns the fine-tuning process, which aims at further narrowing down the number of candidate features. This step incorporates the classifiers in the decision-making process to determine which features are to be included or excluded.

To tackle the problem of feature selection, an exhaustive approach can be employed, but the colossal computational cost and time imposed by such an algorithm render the problem virtually intractable. To illustrate, for a problem domain with n features, as many as 2^n − 1 combinations are possible, provided that at least one feature must be selected. The number of possible feature subsets thus grows exponentially with n. To circumvent this stumbling block, metaheuristic methods can be used to find near-optimal solutions. The global search and local search features embedded in metaheuristic algorithms are able to explore the entire solution space efficiently and propose desirably good candidate solutions. The harmony search algorithm is a population-based search technique that has gained widespread popularity owing to its attractive advantages over other metaheuristic algorithms.

Feature selection has found widespread applications in various fields. In particular, it is a powerful tool in the medical domain for determining the most prominent features that best characterize a certain medical condition. For example, in [5], a novel strategy that uses a weighted combination of four feature selection scores is developed to study brain activity using functional magnetic resonance imaging (fMRI). In [6], a binary particle swarm optimization (PSO) algorithm is embedded in the proposed feature selection method to investigate four medical conditions, namely heart, cancer, diabetes, and erythemato-squamous diseases. In the present work, a new variant of harmony search based algorithm is considered for the task of feature selection.
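As a concrete illustration of the wrapper idea discussed above, the sketch below scores candidate feature subsets with a classifier-dependent evaluation function and keeps the best-scoring one. This is only a minimal sketch: the random proposal step stands in for the metaheuristic search developed later in the paper, and the `evaluate` callback is a hypothetical placeholder for cross-validated classifier accuracy (neither name comes from the paper).

```python
import random

def wrapper_score(mask, evaluate):
    """Classifier-in-the-loop score of one candidate feature subset."""
    if not any(mask):
        return float("-inf")  # at least one feature must be selected
    return evaluate(mask)

def random_wrapper_search(n_features, evaluate, n_trials=50, seed=0):
    """Propose random bit masks and keep the best-scoring subset."""
    rng = random.Random(seed)
    best_mask, best_score = None, float("-inf")
    for _ in range(n_trials):
        mask = [rng.randint(0, 1) for _ in range(n_features)]
        score = wrapper_score(mask, evaluate)
        if score > best_score:
            best_mask, best_score = mask, score
    return best_mask, best_score
```

Because the classifier (here, `evaluate`) is consulted for every candidate subset, the cost of a wrapper method scales with the number of subsets tried, which is exactly the expense noted above.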
The efficiency and robustness of the proposed algorithm are tested and verified using ten UCI benchmark problems, as well as two real life applications, namely epileptic seizure detection and prediction. The remainder of the paper is organized as follows. Section 2 gives an overview of the existing metaheuristic algorithms and their applications in the field of feature selection. Section 3 presents the novel harmony search algorithm. Section 4 details the experimental setup of the standard benchmark problems and the two real life applications. Section 5 presents the results and discussion. Lastly, Section 6 concludes the paper and provides some future research directions.

2. Metaheuristic algorithms

Metaheuristics are a class of algorithmic frameworks that attempt to find sufficiently good or near-optimal solutions by incorporating intelligence into the search process. In general, exploration (global search) and exploitation (local search), both of which are controlled by user-defined parameters, are the two most commonly used search strategies. Striking a balance between these two strategies is crucial to the success of a particular metaheuristic algorithm. Metaheuristic algorithms draw inspiration from many processes occurring in nature, such as Darwinian natural selection (genetic algorithm), the movement of swarms or groups of organisms (particle swarm optimization, ant colony optimization, artificial bee colony), the flashing behavior of fireflies (firefly algorithm), and the echolocation behavior of microbats (bat algorithm). In addition, some metaheuristics are motivated by and modeled on man-made processes, such as the heating and cooling of materials (simulated annealing) and the improvisation of harmony in music theory (harmony search).

2.1. Modifications on the existing metaheuristics

Most of the aforementioned metaheuristic algorithms were first proposed and developed for continuous optimization problems that involve real values.
For example, the algorithms are used to locate the minimum or maximum points of hypersurfaces. Owing to the success of their implementation, these metaheuristic algorithms have been adapted to mathematical problems that deal with integers and binary values. In particular, the process of feature selection can be formulated as an optimization problem in which the variables are encoded as a list of bits. These binary values, in turn, determine the inclusion or exclusion of a particular feature in the feature subset.
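A minimal sketch of this bit-vector encoding follows; the feature names are hypothetical and chosen only for illustration.

```python
def decode_subset(mask, feature_names):
    """A 1 at position i includes the ith feature; a 0 excludes it."""
    return [name for bit, name in zip(mask, feature_names) if bit == 1]

# Hypothetical EEG-style feature names, for illustration only.
features = ["mean", "variance", "entropy", "line_length"]
print(decode_subset([1, 0, 1, 0], features))  # → ['mean', 'entropy']
```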


Taking into consideration all the modifications proposed in the literature, a total of three enhancements from different aspects of the existing metaheuristic strategies are identified and summarized as follows:

(i) Initialization: In general, the first set of solutions is generated randomly. The nature of this initialization method and the limited choice of output values in binary form pose a challenge, as the candidate solutions proposed for the next iteration depend greatly on this initial set of solutions.

(ii) Search strategy: Local search (exploitation) and global search (exploration) are two important pillars of any metaheuristic algorithm. There is often a trade-off between these two elements. Careful consideration, therefore, should be put into the rules or updating mechanisms in order to achieve a delicate balance between these two search strategies. Also, instead of using a parameter with a fixed value throughout all iterations, a dynamic parameter can be introduced to govern the local and global search across the entire search space.

(iii) Evaluation function: The filter methods operate independently of the learning algorithm, as opposed to the wrapper methods, where the classification accuracy obtained from the machine classifiers is used as the evaluation metric.

3. Harmony search based feature selection algorithm

3.1. Harmony search algorithm

Originally proposed by Geem et al. [7], the metaheuristic harmony search (HS) algorithm is a population-based evolutionary algorithm inspired by the improvisation process of musicians when composing a new piece of music. The HS algorithm has gained widespread attention among researchers from various disciplines. This music-inspired algorithm has been developed and used to solve a wide range of optimization problems in science and engineering. The success of the HS algorithm stems from the fact that this metaheuristic strategy offers many attractive advantages.
The merits [7] are listed as follows:

(i) The HS algorithm employs a stochastic random search technique instead of a gradient-based search. Therefore, derivative information is not required.
(ii) The HS algorithm imposes very few mathematical requirements. Hence, initial value settings of the decision variables are not required.
(iii) The local and global search embedded in the HS algorithm are able to prevent the solutions from getting trapped at local minima.
(iv) In HS, a new solution is generated based on a set of solutions (usually 10) stored in harmony memory. In contrast, a new solution is derived from only two parent solutions in GA.
(v) The HS algorithm can handle both continuous and discrete variables.

The five main steps of the HS algorithm are listed as follows:

Step 1: Initialize parameters
Step 2: Initialize harmony memory (HM)
Step 3: Improvise a new harmony (solution)
Step 4: Update HM if a better solution is found
Step 5: Repeat steps 3 and 4 until the stopping criterion is met
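The five steps above can be sketched for a continuous minimization problem as follows. This is an illustrative skeleton only: the parameter defaults, the shared per-variable bounds, and the test function used later are assumptions, not values taken from the paper.

```python
import random

def harmony_search(fitness, n_vars, bounds, hms=10, hmcr=0.9, par=0.3,
                   alpha=0.01, ni=1000, seed=42):
    """Minimal HS loop for minimizing `fitness` over [lo, hi]^n_vars."""
    rng = random.Random(seed)
    lo, hi = bounds
    # Steps 1-2: initialize parameters and the harmony memory
    hm = [[rng.uniform(lo, hi) for _ in range(n_vars)] for _ in range(hms)]
    scores = [fitness(h) for h in hm]
    for _ in range(ni):
        # Step 3: improvise a new harmony, variable by variable
        new = []
        for i in range(n_vars):
            if rng.random() < hmcr:
                x = hm[rng.randrange(hms)][i]        # memory consideration
                if rng.random() < par:               # pitch adjustment
                    x += alpha * (hi - lo) * rng.uniform(-1.0, 1.0)
                x = min(max(x, lo), hi)              # keep inside the bounds
            else:
                x = rng.uniform(lo, hi)              # random selection
            new.append(x)
        # Step 4: replace the worst harmony if the new one is better
        worst = max(range(hms), key=scores.__getitem__)
        s = fitness(new)
        if s < scores[worst]:
            hm[worst], scores[worst] = new, s
        # Step 5: the loop itself enforces the NI stopping criterion
    best = min(range(hms), key=scores.__getitem__)
    return hm[best], scores[best]
```

Running this on a simple sphere function, the returned cost steadily improves on the initial random harmonies, which is the behavior Sections 3.1.1–3.1.5 describe in detail.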

3.1.1. Initialize parameters

The pre-defined parameters are the harmony memory size (HMS), harmony memory consideration rate (HMCR), pitch adjusting rate (PAR), distance bandwidth (BW), and number of improvisations (NI). HMS is the number of solution sets stored in the harmony memory (HM), a matrix in which each row corresponds to a solution vector. The values of HMCR and PAR are set between 0 and 1; these two parameters control the global and local search of the HS algorithm. BW refers to the step size, and its value is found using the formula $\alpha (x_i^U - x_i^L)$, where $\alpha$ is a small value (0.01 or 0.001), whereas $x_i^U$ and $x_i^L$ are the upper and lower bounds of a decision variable, respectively. NI is used as the stopping criterion: when the number of iterations reaches NI, the search process terminates.

3.1.2. Initialize harmony memory

The harmony memory (HM) is a matrix that stores the solutions generated by the HS algorithm. Initially, a total of HMS solutions are generated. The HM takes the following form:


$$\mathrm{HM} = \begin{bmatrix} x_1^1 & x_2^1 & \cdots & x_N^1 & f(\mathbf{x}^1) \\ x_1^2 & x_2^2 & \cdots & x_N^2 & f(\mathbf{x}^2) \\ \vdots & \vdots & \ddots & \vdots & \vdots \\ x_1^{\mathrm{HMS}} & x_2^{\mathrm{HMS}} & \cdots & x_N^{\mathrm{HMS}} & f(\mathbf{x}^{\mathrm{HMS}}) \end{bmatrix} \tag{1}$$

Fig. 1. Pseudo code of the standard harmony search algorithm.

The notation $x_i^j$ denotes the ith decision variable of the jth solution. In other words, the subscript denotes the position of the decision variable, whereas the superscript denotes the index of the solution. In (1), each row represents a solution vector, $[x_1, x_2, x_3, \ldots, x_N]$. The cost or fitness function $f$ evaluates the quality of a solution and stores the output $f(\mathbf{x})$ in the last column of the corresponding row.

3.1.3. Improvise a new harmony

During this step, a new harmony, or a new candidate solution, is improvised based on the existing solutions stored in HM. The new value of a decision variable, $x_i'$, is obtained using the following rules:

$$x_i' \leftarrow \begin{cases} x_i' \in [x_i^L, x_i^U], & \text{if } r > \mathrm{HMCR} \\ x_i' \in \mathrm{HM} = \{x_i^1, x_i^2, x_i^3, \ldots, x_i^{\mathrm{HMS}}\}, & \text{if } r < \mathrm{HMCR} \\ x_i' + \alpha (x_i^U - x_i^L) \,\big|\, x_i' \in \mathrm{HM}, & \text{if } r < \mathrm{PAR}, \end{cases} \qquad i = 1, 2, \ldots, N. \tag{2}$$

A random number 0 < r < 1 is generated, and its value determines the new numerical value of a decision variable, $x_i'$. The value of r is compared with the two pre-defined parameters, HMCR and PAR. If r > HMCR, the value $x_i'$ is taken randomly from the entire possible solution range for that particular decision variable. If r < HMCR, the new value $x_i'$ takes the value of one of the entries stored in the same column of HM. If, further, r < PAR, a small increment is added to the existing value. The newly generated value is also checked to ensure that it falls within the range of the solution space.

3.1.4. Update harmony memory

For each new solution generated in Step 3, its quality is evaluated using the cost function f. The output, $f(\mathbf{x}')$, is compared with all the other outputs stored in HM. If the new solution is found to be better, the new solution vector $\mathbf{x}'$ replaces the worst solution in HM. Mathematically, the rule is given as:

$$\mathbf{x}' \in \mathrm{HM} \ \wedge\ \mathbf{x}^{\mathrm{worst}} \notin \mathrm{HM}. \tag{3}$$
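Eq. (3) translates into a simple replace-the-worst update; the sketch below assumes a cost function that is to be minimized (the function name is illustrative).

```python
def update_harmony_memory(hm, scores, new, new_score):
    """Admit the new harmony and evict the current worst one,
    but only if the new harmony is better (here: lower cost)."""
    worst = scores.index(max(scores))
    if new_score < scores[worst]:
        hm[worst], scores[worst] = new, new_score
        return True
    return False
```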

3.1.5. Stopping criterion

The algorithm terminates when it reaches the maximum number of iterations, NI. The best solution is the one that gives the smallest or largest output, depending on whether the cost function is to be minimized or maximized. All the remaining sub-optimal solutions stored in HM can be used as alternative solutions. The pseudocode of the HS algorithm is given in Fig. 1.

3.2. Variants of harmony search algorithm

The simplicity and ease of implementation of the standard HS algorithm have given rise to many different improved versions of the algorithm. In [8], the authors point out the drawback of using fixed-value parameters. The justification of the


need for a dynamic parameter is given as follows: the value of PAR controls the local search, and a larger BW value is needed in the early stage to increase the diversification of solutions. Toward the end of the algorithm, a smaller BW value is required to perform the fine-tuning process more efficiently. In view of this, two formulas are proposed in the improved harmony search (IHS) algorithm, where the values of PAR and BW change in each iteration: the value of PAR increases linearly, whereas the value of BW decreases exponentially with each successive iteration. In [9], a novel variant of evolutionary algorithm, called the global-best harmony search (GHS) algorithm, is presented. The paper highlights the difficulty in tuning the value of BW, as BW can take any value in the range (0, ∞). The authors also mention the major drawback of the earlier IHS algorithm, namely that the lower and upper values for BW are generally hard to define a priori as they are problem dependent. In GHS, the BW parameter is discarded, and the new harmony simply takes the value of the best harmony stored in the HM. The enhanced GHS algorithm works well in both continuous and discrete optimization problems. The paper reveals that a large HMCR value favors optimization problems with high dimensionality. In [10], the issue of premature convergence of the GHS algorithm is addressed. An algorithm called the self-adaptive harmony search (SHS) algorithm is given, where the BW parameter is replaced and the new harmony is updated according to the maximal and minimal values stored in HM. The two formulas proposed are in accordance with the fact that the maximum and minimum values of the variables gradually approach their respective optimum values. The pitch adjustment rule proposed through this mechanism does not violate the boundary constraints imposed on the decision variables.
Besides, an initialization method based on the concept of low-discrepancy sequences is employed instead of the standard pseudorandom number based approach, the justification being that the values generated by low-discrepancy sequences are more evenly distributed. Despite this new initialization method, it is found that the classification results are not affected by the different initialization methods used. In [11], the self-adaptive global best harmony search (SGHS) algorithm is presented to solve continuous optimization problems. Three parameters, namely HMCR, PAR, and BW, are dynamically adjusted during the improvisation step to generate new harmonies. The updating mechanism for BW is designed so that a delicate balance between exploration (during the early stage) and exploitation (during the final stage) is achieved. The effect of using different values of HMS is also investigated. It is reported that the SGHS algorithm favors small values (5 and 10) over larger ones (20 and 50). The smaller values are expected because the HS algorithm imitates musicians' short-term memory capability. Having reviewed several variants of the HS algorithm, it is worth mentioning that the efforts to improve the standard HS algorithm are devoted to devising new updating mechanisms for several parameters, namely HMS, HMCR, PAR, and BW. Apart from that, different initialization strategies are utilized to enhance the effectiveness of the search algorithm.

3.3. Harmony search in feature selection

To employ the HS algorithm in the problem of feature selection, two different approaches can be formulated. These two approaches are covered and discussed in [12], where two new approaches, namely the horizontal approach and the vertical approach, are presented. In the horizontal approach, each musician is assigned one of the two binary values. In the vertical approach, a different technique is employed.
Here, each musician is treated as an individual expert who determines which feature is to be chosen and included in the feature subset. Different musicians are allowed to choose the same feature; they can also choose no feature at all. The final harmony is generated based on the decisions made by all the individual musicians. The merit of the vertical approach is that it makes good use of the principles embedded in the HS algorithm, whereby a diverse range of candidate solutions can be generated. Nevertheless, its drawback is that an additional tunable parameter needs to be introduced, which complicates the structure of the algorithm. In [13], a dynamic parameter is introduced as an alternative to the standard fixed value. The value of this adaptive parameter changes during the various stages (initialization, intermediate, and termination) of the search process. As a result, a wide range of solutions is generated and proposed during the initialization stage, which encourages deep exploration of the entire solution space. During the intermediate stage, a steady improvement in the quality of the solutions is noted. Lastly, during the termination stage, further fine-tuning takes place and a fast convergence rate is observed. In [14], an algorithm that couples the HS algorithm with the optimum-path forest (OPF) classifier is proposed. The HS algorithm is chosen for its simple rules and fast computational speed. The fitness function provided by the OPF serves as a guide for the search process of the HS algorithm. The efficiency and robustness of the algorithm are tested in the real life application of detecting non-technical losses in power distribution systems. In [15], the HS algorithm is used in conjunction with the Markov blanket. It is reported that the hybrid wrapper-filter algorithm is able to identify and eliminate redundant genes in gene selection problems performed on microarray datasets.
In addition to preserving the overall classification accuracy, the proposed algorithm also manages to reduce the number of genes (features) effectively.

3.4. An enhanced harmony search based algorithm for feature selection

The challenge of incorporating the HS algorithm into the binary feature selection problem lies in the fact that each decision variable can take only one of two possible values. Unlike binary optimization problems, the decision variables


for standard continuous optimization problems take a range of values on some intervals. The binary values give a new interpretation to the optimization problem at hand. Furthermore, some of the parameters defined in the standard HS algorithm need to be modified or discarded completely. To illustrate, BW and PAR are not entirely relevant in the context of feature selection: in the standard HS algorithm, these two parameters are used to fine-tune the values of the variables, and the concept of a neighborhood is involved, where a small step size is added to the existing value. This, however, is not the case for binary decision variables, as there exists no such relationship between the values of two adjacent decision variables. In light of this shortcoming of the standard HS algorithm, two proposals are given in an enhanced variant of the HS algorithm.

3.4.1. Initialization of harmony memory

The first idea concerns the number of features. It is noticed that the different variants of the HS algorithm reported in the literature do not take this factor into consideration. Statistically speaking, let $X_1, X_2, \ldots, X_n$ be binary random variables, each represented by the following probability distribution:

$$X_i = \begin{cases} 0, & \text{w.p. } P(X_i = 0) = \tfrac{1}{2} \\ 1, & \text{w.p. } P(X_i = 1) = \tfrac{1}{2} \end{cases} \qquad \text{for } i = 1, 2, \ldots, n. \tag{4}$$

In (4), the abbreviation w.p. stands for "with probability". As shown in (4), $P(X_i = 0) = P(X_i = 1) = 0.5$. In other words, during the random initialization stage, each decision variable has an equal probability of being assigned a value of 0 or 1. In a feature selection problem with n independent decision variables, the average number of decision variables that will be assigned a value of 1, denoted as $\bar{X}_{one}$, can be calculated as follows:

$$\bar{X}_{one} = n \cdot E(X) = n \cdot \big( 0 \cdot P(X = 0) + 1 \cdot P(X = 1) \big) = n \cdot ( 0 \cdot 0.5 + 1 \cdot 0.5 ) = \frac{n}{2} \tag{5}$$

Put in words, Eq. (5) asserts that when the solution vectors are randomly initialized in the second step of the HS algorithm, on average 50% of all the decision variables will be included and the rest excluded. The initialization stage is as important as the improvisation step because the solutions generated in this initial phase direct and guide the successive iterations; a snowball effect comes to mind if the initial pool of solutions is generated poorly. In reality, the optimum number of variables could be less than half the total number of decision variables. To circumvent this problem, a strategy is formulated such that a mixture of different $\bar{X}_{one}$ values is obtained for the different solution vectors during the initialization stage. In general, fewer than 50% of the features are sufficient to give good classification accuracy, as they contain most of the vital and useful information that characterizes a data set. In view of this, a few constants, namely 1.5, 2, 2.5, 3, and 4 (which correspond to 66.7%, 50%, 40%, 33.3%, and 25% of the features, respectively), are chosen so that the number of decision variables assigned a value of 1 is, in most cases, no more than 50% of the total number of decision variables. Specifically, the variable $\bar{X}_{one}$ has the following probability distribution:

$$\bar{X}_{one} = \begin{cases} \left\lfloor \dfrac{n}{1.5} \right\rfloor, & \text{w.p. } \tfrac{1}{5} \\[4pt] \left\lfloor \dfrac{n}{2} \right\rfloor, & \text{w.p. } \tfrac{1}{5} \\[4pt] \left\lfloor \dfrac{n}{2.5} \right\rfloor, & \text{w.p. } \tfrac{1}{5} \\[4pt] \left\lfloor \dfrac{n}{3} \right\rfloor, & \text{w.p. } \tfrac{1}{5} \\[4pt] \left\lfloor \dfrac{n}{4} \right\rfloor, & \text{w.p. } \tfrac{1}{5}, \end{cases} \tag{6}$$

where n is the number of decision variables and $\lfloor \cdot \rfloor$ is the floor function. This is achieved through an additional checking step. The solution vectors are first initialized randomly, without any restriction imposed on them. Next, the number of decision variables assigned a value of one is added up, and this count is compared with the value of $\bar{X}_{one}$ generated from Eq. (6). If the number of variables assigned one is greater than (or less than) the value of $\bar{X}_{one}$, decision variables are excluded (or included) accordingly. Instead of starting with 10 random solutions and allocating all the computational cost to generating and evaluating the successive iterations, a new division is proposed in this work, where 10% of the total computational cost is assigned to generating an initial pool of solutions. Here, Eq. (6) is used to govern the number of decision variables assigned a value of one. Each of these solution vectors is evaluated to obtain the corresponding fitness value, and the best 10 solutions are stored in the HM. The successive improvisation stage uses the remaining 90% of the computational cost. For example, for a problem with 1000 iterations, instead of initializing the HM with 10 random solutions and performing 1000 iterations,


this new division will first generate 100 random solutions. The best 10 of the 100 solutions will be stored in the HM. This is followed by the improvisation stage, where another 900 solution vectors will be generated and evaluated accordingly.

3.4.2. Improvisation stage of harmony search

The improvisation stage is also deemed a crucial step, as new solutions are proposed based on the values of two vital parameters, namely HMCR and PAR. The improvisation steps are revised through the second proposal, which is outlined below. Let $x_i^1, x_i^2, \ldots, x_i^{\mathrm{HMS}}$ be the values of the ith decision variable, stored in the same column of the HM. Denote by $|x_i|$ the sum of $x_i^1, x_i^2, \ldots, x_i^{\mathrm{HMS}}$. Because of the nature of the binary variables, a value of $|x_i| = m$ indicates that there are m variables assigned a value of 1; in other words, these features are included in the feature subset. It should be noted that the terms "features" and "decision variables" refer to the same concept and are used interchangeably in this work. Define two new variables $x_i^M$ and $x_i^m$, where the superscripts M and m stand for majority and minority, respectively. The two variables are defined as follows:

$$x_i^M = \begin{cases} 0, & \text{if } |x_i| < \dfrac{\mathrm{HMS}}{2} \\[4pt] 1, & \text{if } |x_i| \ge \dfrac{\mathrm{HMS}}{2} \end{cases} \tag{7}$$

$$x_i^m = \begin{cases} 1, & \text{if } |x_i| < \dfrac{\mathrm{HMS}}{2} \\[4pt] 0, & \text{if } |x_i| \ge \dfrac{\mathrm{HMS}}{2} \end{cases} \tag{8}$$

$$x_i^M + x_i^m = 1, \quad \forall i = 1, 2, \ldots, n. \tag{9}$$
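Eqs. (7)–(9) reduce to taking the majority and minority bit of one HM column, which can be sketched as follows (the function name is illustrative):

```python
def majority_minority(column):
    """Return (x^M, x^m) for one HM column of bits; by Eq. (9)
    the two values always sum to 1."""
    x_maj = 1 if sum(column) >= len(column) / 2 else 0
    return x_maj, 1 - x_maj
```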

Let x_i be the new value of the ith decision variable, and let r be a random value generated uniformly from the interval (0,1). In this work, the updating rules are revised as follows:

(i) If r < HMCR, x_i = x_i^M.
(ii) If r ≥ HMCR, x_i = x_i^m.

Also, the PAR acts as a bit-flip operator. Define x_i^f as:

x_i^f = \begin{cases} 0, & \text{if } x_i = 1 \\ 1, & \text{if } x_i = 0. \end{cases}    (10)

(iii) If r < PAR, x_i = x_i^f.
(iv) If r ≥ PAR, x_i remains unchanged.
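The majority/minority selection of rules (i) and (ii), followed by the PAR bit flip of rules (iii) and (iv), can be sketched for a single decision variable as follows. This is an illustrative reading of the rules, not the authors' implementation: `improvise_variable` is a hypothetical name, and a fresh uniform draw is assumed for the PAR check.

```python
import random

def improvise_variable(column, hmcr, par, hms):
    """Improvise one binary decision variable from its HM column:
    majority/minority selection via HMCR, then a PAR-governed bit flip."""
    ones = sum(column)                     # |x_i|, the count of ones in the column
    x_major = 1 if ones >= hms / 2 else 0  # Eq. (7)
    x_minor = 1 - x_major                  # Eqs. (8)-(9): the complement
    x = x_major if random.random() < hmcr else x_minor  # rules (i)-(ii)
    if random.random() < par:              # rules (iii)-(iv): bit flip
        x = 1 - x
    return x

# With HMCR = 1 and PAR = 0 the majority value is returned deterministically.
improvise_variable([1, 1, 1, 0, 0, 1, 1, 0, 1, 1], hmcr=1.0, par=0.0, hms=10)  # -> 1
```

With HMCR = 0 the minority value is always chosen, which is what drives exploration early in the search.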

Using constant values of HMCR and PAR throughout all the iterations would hamper the search capability of the algorithm. Taking this observation into account, different values of HMCR and PAR are used during different iteration phases (beginning, intermediate, and termination), as defined by the following equations:

\mathrm{HMCR}_i = \begin{cases} 0.7, & i = 1, 2, \ldots, \frac{n_{total}}{3} \\ 0.7 + \frac{i - n_{total}/3}{n_{total}/3} \times 0.3, & i = \frac{n_{total}}{3} + 1, \frac{n_{total}}{3} + 2, \ldots, \frac{2n_{total}}{3} \\ 1.0, & i = \frac{2n_{total}}{3} + 1, \frac{2n_{total}}{3} + 2, \ldots, n_{total} \end{cases}    (11)

\mathrm{PAR}_i = \begin{cases} 0.0, & i = 1, 2, \ldots, \frac{n_{total}}{3} \\ 0.3, & i = \frac{n_{total}}{3} + 1, \frac{n_{total}}{3} + 2, \ldots, \frac{2n_{total}}{3} \\ \begin{cases} 0.0, & \text{for } A_{90\%} \\ 1.0, & \text{for } A_{10\%} \end{cases} & i = \frac{2n_{total}}{3} + 1, \frac{2n_{total}}{3} + 2, \ldots, n_{total} \end{cases}    (12)

where the subscript i stands for the ith iteration and n_total is the total number of iterations. Also, A_90% is a subset containing 90% of all the decision variables of the ith iteration, selected randomly; the notation A_10% is defined similarly. The entire iteration stage is further divided into three phases of equal size. During the beginning phase, the HMCR is set to a constant value of 0.7. This relatively low value is chosen to encourage exploration of the entire solution space, so as to diversify the candidate solutions generated. Moreover, the PAR is set to 0.0. This means that PAR does not play a


Fig. 2. The changes in the values of decision variables over iterations. Shaded boxes are assigned values of 1, indicating that the features are included in the feature subset. On the contrary, unshaded boxes are assigned values of 0, meaning that the features are excluded from the feature subset. The values of the decision variables (DV) may remain unchanged (like DV1 and DV2) or flip between 0 and 1 (like DV3, DV4, and DV5) throughout the iterations.

role during this beginning phase. The value of 0.0 is chosen because, during the beginning phase, the PAR parameter, which serves as a bit-flip operator, plays little or no role in generating the solutions, partly owing to the nature of the binary variables. In other words, the task of generating solution vectors during this beginning phase is handled completely by the sole parameter, HMCR. Upon completion of the beginning phase, many sub-optimal solutions of good quality have been identified and stored in the HM. Therefore, during the intermediate phase, a linearly increasing function is adopted to define the value of HMCR. As the iterations proceed, the chance of selecting a value from those stored in the HM increases. Furthermore, the PAR is assigned a fixed value of 0.3. This implies that the values of some of the decision variables will be reversed, which helps the solutions escape from local minima. During the termination phase, the HMCR is assigned a value of 1.0. In other words, the values of all the decision variables are governed by Eqs. (7) and (8), where the concepts of majority and minority are used. The justification for selecting this constant value of 1.0 is that many good solutions have been found during the beginning and intermediate phases. As such, the fine-tuning process performed during this termination phase relies only on the value of PAR. Each solution vector is divided randomly into two disjoint subsets, namely A_10% and A_90%. This means that the values of 10% of all the decision variables of each solution vector will change, or flip, between the two values, whereas the values of the remaining 90% of the decision variables remain unchanged. A specific example is given as follows: for a problem where n_total = 900, the values of HMCR and PAR are defined by the following piecewise functions:

\mathrm{HMCR}_i = \begin{cases} 0.7, & i = 1, 2, \ldots, 300 \\ 0.7 + \frac{i - 300}{300} \times 0.3, & i = 301, 302, \ldots, 600 \\ 1.0, & i = 601, 602, \ldots, 900 \end{cases}    (13)

\mathrm{PAR}_i = \begin{cases} 0.0, & i = 1, 2, \ldots, 300 \\ 0.3, & i = 301, 302, \ldots, 600 \\ \begin{cases} 0.0, & \text{for } A_{90\%} \\ 1.0, & \text{for } A_{10\%} \end{cases} & i = 601, 602, \ldots, 900. \end{cases}    (14)
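The schedules of Eqs. (13) and (14) can be written directly as functions of the iteration counter. This is an illustrative sketch with hypothetical function names; for the last third of the run, the PAR function returns the flip probability applied to the randomly chosen A_10% subset, while variables in A_90% keep a flip probability of 0.0.

```python
def hmcr(i, n_total=900):
    """Piecewise HMCR schedule of Eq. (13): constant 0.7, then a linear
    ramp from 0.7 to 1.0, then constant 1.0."""
    third = n_total // 3
    if i <= third:
        return 0.7
    if i <= 2 * third:
        return 0.7 + (i - third) / third * 0.3
    return 1.0

def par(i, n_total=900):
    """Piecewise PAR schedule of Eq. (14). In the termination phase the
    returned value is the flip probability for variables in A_10%;
    variables in A_90% keep a flip probability of 0.0."""
    third = n_total // 3
    if i <= third:
        return 0.0
    if i <= 2 * third:
        return 0.3
    return 1.0

hmcr(450)  # midway through the ramp: 0.7 + 0.15
```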

An example of how the values of the decision variables of a candidate solution change over iterations is given in Fig. 2. The pseudo code of the proposed harmony search-based feature selection algorithm is given in Fig. 3. In the algorithm detailed in Fig. 3, lines 1–15 involve the initialization stage of the harmony memory, HM. In line 2, the entire data set is first divided into two disjoint sets, namely the training set, S_train, and the testing set, S_test. The for loop in line 3 creates a total of 0.1·NI candidate solutions. The inner for loop in lines 4–5 assigns a random integer, either 0 or 1, to each of the decision variables in a candidate solution. In line 7, the value of X̄_one is evaluated. This value is compared with the value generated from Eq. (6) in line 8, and modifications are performed accordingly. Following this step, the new training set, S′_train, and testing set, S′_test, are generated using the criterion x_i^k = 1, as given in lines 9–10. The generation, or selection, of these training and testing sets is illustrated using a simple data set, as shown in Fig. 4. In line 11, the classifier is trained using the training set, S′_train, and subsequently tested using the testing set, S′_test. In line 12, the overall classification accuracy obtained on the testing set is stored. Having evaluated all the 0.1·NI solutions, the reported classification accuracies, f(x^k), are sorted in descending order in line 14. In line 15, the best 10 solutions, namely


Fig. 3. Pseudo code of the proposed harmony search-based feature selection algorithm.

the 10 solutions with the highest classification accuracy, are identified and stored in the harmony memory matrix, HM. With this, the initialization step of the HM is complete. The second section of the algorithm, as covered by lines 16–29, is dedicated to the improvisation step, where new solutions are generated using the rules governed by the values of HMCR and PAR. Line 17 creates a total of 0.9·NI solutions. In line 18, the values of several parameters are evaluated, including the worst solution, H_worst, stored in the HM, as well as the values of |x_i|, x_i^M, and x_i^m. The inner for loop in lines 19–22 is responsible for creating a new harmony, or candidate solution, using the corresponding values of HMCR and PAR. Lines 23–26 are similar to lines 9–12, where the training and testing sets are created. For every candidate solution that is proposed, the quality of the solution is evaluated by comparing the classification accuracy obtained with those stored previously in the HM. If a better solution is found, or in other words, a higher classification accuracy is obtained, the new solution, or harmony, H_j, replaces the worst solution, H_worst. In line 29, the best solution, or harmony, H*, is identified from the HM. This solution vector represents the best feature subset, which yields the highest classification accuracy.
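The overall flow of the algorithm in Fig. 3 (an initial pool built from 10% of the evaluation budget, followed by greedy replace-the-worst improvisation) can be sketched as follows. This is a minimal illustration, not the authors' implementation: the function names are hypothetical, a toy fitness function stands in for training and testing the WNN, and the Eq. (6) adjustment of X̄_one is omitted, since its formula lies outside this excerpt.

```python
import random

def initialize_hm(n_vars, n_iter, fitness, hms=10):
    """Cf. lines 1-15 of Fig. 3: spend 10% of the evaluation budget on
    random candidates and keep the best `hms` of them as the HM."""
    pool = []
    for _ in range(int(0.1 * n_iter)):
        x = [random.randint(0, 1) for _ in range(n_vars)]
        pool.append((fitness(x), x))
    pool.sort(key=lambda t: t[0], reverse=True)  # highest accuracy first
    return pool[:hms]

def improvise(hm, n_iter, fitness, new_harmony):
    """Cf. lines 16-29 of Fig. 3: a new harmony replaces the worst stored
    one only if its fitness is higher; return the best harmony, H*."""
    hm = list(hm)
    for _ in range(int(0.9 * n_iter)):
        h = new_harmony(hm)
        f = fitness(h)
        worst = min(range(len(hm)), key=lambda k: hm[k][0])
        if f > hm[worst][0]:
            hm[worst] = (f, h)
    return max(hm, key=lambda t: t[0])

# Toy fitness: reward agreement with a hidden "good" feature subset.
good = {0, 2, 5}
fit = lambda x: sum(1 for i, v in enumerate(x) if (v == 1) == (i in good))

# Toy improvisation: flip one random bit of a random stored harmony.
def flip_one(hm):
    h = list(random.choice(hm)[1])
    j = random.randrange(len(h))
    h[j] = 1 - h[j]
    return h

random.seed(0)
hm = initialize_hm(n_vars=8, n_iter=1000, fitness=fit)
best_f, best_x = improvise(hm, 1000, fit, flip_one)
```

Because a candidate is accepted only when it beats the worst stored harmony, the best fitness in the HM is non-decreasing over the run.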

4. Experimental setup

To evaluate the effectiveness and robustness of the proposed algorithm in the task of feature selection, simulations are performed on different data sets. A total of ten benchmark data sets are selected from the UCI Machine Learning Repository [16]. Additionally, the algorithm is tested on two real-life applications, namely epileptic seizure detection and prediction, using the wavelet coefficients extracted from biomedical electroencephalography (EEG) signals. For consistency, all the experiments that employ the feature selection stage (standard HS, enhanced HS, PSO, and GA) are run for 1000 iterations.


Fig. 4. A simple data set to illustrate the generation of the training set, S′_train, and testing set, S′_test. V_i is the ith variable, whereas D_j is the jth data point or observation. (a) The small data set comprises 5 observations with 3 variables. (b)–(c) The first four observations constitute the training set, S_train, and the remaining observation is the testing set, S_test. (d)–(e) Using the binary codes, the new training set, S′_train, and testing set, S′_test, are generated.

Table 1
UCI benchmark data sets.

No.  Data set    Size  Dimension  Class
1    Glass       214   10         7
2    Heart       303   14         2
3    Hepatitis   155   19         2
4    Ionosphere  351   34         2
5    Iris        150   4          3
6    Pima        768   8          2
7    Vehicle     946   18         4
8    Votes       435   16         2
9    Wine        178   13         3
10   Zoo         101   17         7

4.1. UCI Machine Learning Repository benchmark data sets

The ten UCI benchmark data sets used for simulation are presented in Table 1. The number of samples (size), number of features (dimension), and number of classes are also listed in Table 1. Let N be the number of features, or dimension, of a particular data set. All the features in each benchmark data set are replicated in order to introduce redundancy, thereby doubling the dimension. Furthermore, a total of N irrelevant features are generated randomly using Boolean values. On top of this, an additional N irrelevant features are created, where the values of these features are random numbers between 0 and 1, generated using the rand function. Upon the inclusion of the redundant and irrelevant features, a data set with an original dimension of N is extended to a larger data set with a total dimension of 4N. In other words, with the introduction of the artificially generated features, the number of attributes has increased four-fold. The generation and inclusion of these extra pseudo sets of data are illustrated in Table 2. The ultimate goal of the proposed feature selection algorithm is to eliminate the redundant and irrelevant features, thereby increasing the overall classification accuracy.

4.2. Epileptic seizure detection data set

Epilepsy is a very common neurological disorder that arises from the abnormal firing of electrical signals in the brain. It is estimated that more than 50 million people worldwide are diagnosed with this disease [17]. With this


Table 2
The larger data set with a total dimension of 4N, derived from the original set of data and three pseudo sets of data.

Data          Description                                                    Features                           Number of features
Original set  Original data                                                  S1 = {n1, n2, ..., nN}             N
Pseudo sets   Redundancy introduced by replicating the original data         S2 = {n1, n2, ..., nN}             N
              Irrelevant data added using randomly generated Boolean values  S3 = {ni ∈ {0, 1}, i = 1, ..., N}  N
              Irrelevant data added using the rand function                  S4 = {0 < ni < 1, i = 1, ..., N}   N
              Total                                                                                             4N
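The construction summarized in Table 2 (the original features, a replicated redundant copy, random Boolean features, and random uniform features) can be sketched as follows. This is an illustrative sketch: `extend_to_4n` is a hypothetical name, and Python's standard `random` module stands in for the rand function mentioned in the text.

```python
import random

def extend_to_4n(rows, seed=0):
    """Extend each row of an m x N data set to length 4N: the original
    features, a replicated (redundant) copy, random Boolean features,
    and random uniform features in (0, 1), as in Table 2."""
    rng = random.Random(seed)
    out = []
    for row in rows:
        n = len(row)
        s2 = list(row)                              # S2: redundant copy of S1
        s3 = [rng.randint(0, 1) for _ in range(n)]  # S3: Boolean noise
        s4 = [rng.random() for _ in range(n)]       # S4: uniform noise
        out.append(list(row) + s2 + s3 + s4)
    return out

data = [[5.1, 3.5, 1.4], [4.9, 3.0, 1.4]]  # toy rows with N = 3
len(extend_to_4n(data)[0])  # -> 12, i.e. 4N
```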

Table 3
Data set for epileptic seizure detection.

Group  Class          Number of samples  Subtotal
A      1 (normal)     100                400
B      1 (normal)     100
C      1 (normal)     100
D      1 (normal)     100
E      2 (epileptic)  100                100

staggering number of patients, the need for better healthcare services in the field of epileptology is of utmost importance. First introduced by the German neuropsychiatrist Hans Berger, electroencephalography (EEG) is a versatile clinical tool used to study the mechanisms of ictogenesis. Additionally, the recorded biomedical signals provide useful insight into the dynamics of the brain. By examining the EEG signals, different segments can be identified. In general, the four segments are interictal (seizure-free period), pre-ictal (before the onset of a seizure), ictal (during the occurrence of a seizure), and post-ictal (after the onset of a seizure). In the task of epileptic seizure detection, the goal is to differentiate between interictal and ictal EEG signals. In the absence of automated machine classifiers, trained epileptologists have to resort to manual visual inspection of EEG signals that may be recorded for days, or even weeks, for pre-surgical evaluation purposes. Scrutinizing EEG signals manually is a time-consuming process, not to mention the high medical cost incurred. The development of machine classifiers or expert systems that can aid in the automated classification of EEG signals is hence of paramount importance. The benchmark data set used for epileptic seizure detection in this study is obtained from [18]. The five groups of EEG signals are labeled A to E. Each set consists of 100 segments, where each segment is an EEG signal recorded for 23.6 s at a sampling frequency of 173.61 Hz, which accounts for a total of 4097 data points. The five sets of data were recorded from different people under different conditions. Sets A and B were both recorded from healthy volunteers: set A with their eyes open and set B with their eyes closed. On the contrary, sets C to E were recorded from epileptic patients.
Both sets C and D were recorded during the interictal, or seizure-free, period. Set C was recorded from the hippocampal formation of the opposite hemisphere of the brain, whereas set D was recorded from a different region of the brain, namely within the epileptogenic zone. The last set, set E, was recorded while the patients were experiencing seizures; in other words, set E contains ictal data. In short, sets A to D belong to the class of normal EEG signals, as opposed to set E, which contains epileptic EEG signals. A summary of the data is given in Table 3. Research in epileptic seizure detection has grown exponentially over the past decades. Recent works on epileptic seizure classification reported in the literature are discussed in [19], where the issue of generalization of epileptic seizure detection is examined. Instead of designing machine classifiers that detect patient-specific seizures, the works reported in [19] are dedicated to developing expert systems that are able to perform such classification tasks across all epileptic patients. The raw EEG signals are obtained in the time domain, where the magnitude of the amplitude is recorded over a period of time. Nonetheless, the temporal nature of the signals alone does not provide much useful information. Therefore, a suitable transformation is deemed necessary for the subsequent feature extraction stage [20]. The Daubechies wavelet of order 4 (db4) is chosen to analyze the EEG signals, based on the empirical finding and recommendation given in [21]. In this work, a 4-level decomposition scheme is employed, which yields five groups of wavelet coefficients corresponding to different frequency subbands: d1 (43.4–86.8 Hz), d2 (21.7–43.4 Hz), d3 (10.8–21.7 Hz), d4 (5.4–10.8 Hz), and a4 (0–5.4 Hz). For each group, eight summary statistics are computed.
They are the (i) maximum, (ii) minimum, (iii) 90th percentile, (iv) 10th percentile, (v) mean, (vi) standard deviation, (vii) skewness, and (viii) kurtosis of the wavelet coefficients. The mean, μ, standard deviation, σ, skewness, s, and kurtosis, k, are defined as

\mu = \frac{\sum_{i=1}^{N} x_i}{N},    (15)

\sigma = \sqrt{\frac{\sum_{i=1}^{N} (x_i - \mu)^2}{N}},    (16)

s = \frac{E(x - \mu)^3}{\sigma^3},    (17)

k = \frac{E(x - \mu)^4}{\sigma^4},    (18)

Table 4
Data set for epileptic seizure prediction.

Patient  Sex  Age  Seizure origin      Seizures analyzed  Interictal duration
3        M    14   Frontal             5                  24 h
4        F    26   Temporal            5                  24 h
5        F    16   Frontal             5                  24 h
9        M    44   Temporal/Occipital  5                  24 h
10       M    47   Temporal            5                  24 h
16       F    50   Temporal            5                  24 h
17       M    28   Temporal            5                  24 h
18       F    25   Frontal             5                  25 h
20       M    33   Temporal/Parietal   5                  26 h
21       M    13   Temporal            5                  24 h

where x is the variable under study, N is the total number of values, and E(x) is the expected value of x. Therefore, each EEG signal is characterized by a total of 5 × 8 = 40 values or features. The job of the feature selection algorithm is to identify which among these 40 features are the most imperative in differentiating between normal and epileptic EEG signals.

4.3. Epileptic seizure prediction data set

While the task of epileptic seizure detection aims at differentiating between interictal and ictal EEG signals, the task of epileptic seizure prediction intends to distinguish between interictal and pre-ictal EEG signals. In [22], the use of relative combinations of sub-band spectral powers to study the EEG signals is reported. The existence of the so-called pre-ictal period has been confirmed and reported in [23], where a decrease in the value of dynamical similarity is noticed just minutes before the occurrence of a seizure in intracranial EEG signals. The localization of the pre-ictal EEG signals is extremely beneficial and of great importance to the medical community, as it could assist greatly in the development of an alarm system able to warn epileptic patients of impending seizure attacks. With the realization of such a warning system, many seizure-related injuries, such as drowning and road accidents, can be prevented, thus increasing the quality of life of patients. The data set used for epileptic seizure prediction is acquired from [24]. The publicly available data set, recorded at the Epilepsy Center of the University of Freiburg, Germany, contains invasive EEG recordings of 21 epileptic patients. The recordings were performed using a Neurofile NT digital video EEG system with 128 channels, at a sampling frequency of 256 Hz. For each of the 21 patients, six electrodes were used to acquire the EEG signals.
Three of them, termed focal electrodes, were placed on seizure onset areas, whereas the remaining three were extrafocal electrodes. The EEG recording of each patient contains a minimum of 24 h of interictal signals and 50 min of pre-ictal signals. The number of seizures experienced by each of these patients ranged from 2 to 5. For consistency purposes, only 10 patients, each with 5 seizures, were chosen for analysis. The details of the patients and the types of seizure are summarized in Table 4. For this work, a time window of 16 s is used. With a sampling frequency of 256 Hz, this gives each EEG segment a total of 4096 data points. For feature extraction, the same method as explained in Section 4.2 is adopted. Each EEG signal is hence represented by a vector of 40 values. Since the EEG recordings were made simultaneously using 6 electrodes (3 focal and 3 extrafocal), this gives a total of 6 × 40 = 240 values.

5. Results and discussion

The classifiers used in this work are wavelet neural networks (WNNs) [25]. While traditional multilayer perceptrons (MLPs) use global sigmoidal activation functions, WNNs employ localized wavelet functions in their hidden nodes. This results in a more compact network topology and a faster learning speed. The WNN model consists of three layers. As the name suggests, the input layer receives input and sends the values to the hidden layer. At the hidden layer, nonlinear


Table 5
Overall classification accuracy of the UCI benchmark data sets.

No.  Data set    Full set      Enhanced HS   Standard HS   GA            PSO
1    Glass       65.24 ± 3.21  81.43 ± 4.17  79.05 ± 4.02  80.95 ± 3.17  78.57 ± 3.37
2    Heart       83.00 ± 1.89  93.33 ± 1.57  92.67 ± 1.41  93.67 ± 1.89  92.33 ± 1.61
3    Hepatitis   76.67 ± 8.46  97.50 ± 4.37  96.25 ± 5.27  95.00 ± 4.93  95.63 ± 5.15
4    Ionosphere  80.00 ± 3.30  90.29 ± 2.00  89.43 ± 2.35  87.43 ± 2.00  88.29 ± 2.84
5    Iris        85.33 ± 2.81  96.67 ± 5.67  96.00 ± 8.43  94.67 ± 7.57  97.33 ± 6.44
6    Pima        74.68 ± 1.26  82.47 ± 1.40  81.82 ± 0.61  82.34 ± 0.91  82.60 ± 1.25
7    Vehicle     88.21 ± 0.83  94.21 ± 0.74  93.37 ± 0.71  92.95 ± 0.71  93.58 ± 0.78
8    Votes       82.50 ± 2.16  98.18 ± 0.96  97.95 ± 0.72  98.41 ± 1.10  97.73 ± 1.07
9    Wine        83.89 ± 3.15  99.44 ± 1.76  98.89 ± 3.51  98.33 ± 3.75  97.78 ± 3.88
10   Zoo         68.00 ± 6.32  99.00 ± 3.16  97.00 ± 6.75  96.00 ± 8.43  98.00 ± 6.32

Average rank     5.0           1.4           2.8           3.0           2.8
Final rank       4             1             2             3             2

mappings are performed, and the values obtained are summed and propagated to the final output layer. The activation function used in the hidden layer is the Morlet wavelet, given as follows:

\psi(x) = \cos(5x) \cdot \exp\left(-\frac{x^2}{2}\right).    (19)
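A direct transcription of the Morlet wavelet of Eq. (19), with an illustrative function name:

```python
import math

def morlet(x):
    """Morlet wavelet of Eq. (19), used as the WNN hidden-node activation."""
    return math.cos(5 * x) * math.exp(-x * x / 2)

morlet(0.0)  # -> 1.0, since cos(0) = exp(0) = 1
```

The Gaussian envelope makes the function effectively zero outside a small interval around the origin, which is the localization property contrasted with global sigmoidal activations above.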

For comparison purposes, the results obtained using two other metaheuristic algorithms, genetic algorithms (GA) and particle swarm optimization (PSO), are also presented. In addition, the standard harmony search (HS) algorithm, without any enhancement in the harmony memory (HM) initialization or during the improvisation stage, is also employed. It should be noted that performing cross validation during the training and testing stages would incur a very high computational cost and hence increase the training time tremendously. Nonetheless, cross validation is necessary to ensure that the classifier is robust, because the average values across different training and testing subsets are used. To circumvent this problem, 10-fold cross validation is performed only on the 10 best solutions obtained upon completion of the improvisation process. To illustrate, the data are divided into ten disjoint subsets of equal (or similar) size. The first nine subsets are used as the training set, whereas the last subset is used as the testing set. The process is repeated ten times, where during each cycle a different subset is used as the testing set. The average of the ten classification accuracies is then reported. The results obtained for the ten UCI benchmark data sets, as well as the data sets for epileptic seizure detection (Andrzejak data set) and prediction (Freiburg data set), are shown in Tables 5, 7, and 10, respectively.

5.1. UCI benchmark data sets

The performance metric used to evaluate the efficiency of the proposed method on the ten UCI benchmark data sets is the overall classification accuracy. The results are shown in Table 5. The best result for each case is highlighted in boldface. The final two rows of the table show the average rank and final rank of the classifiers.
As apparent from Table 5, the classifier that adopts the enhanced HS-based feature selection algorithm gives the best performance, with an average rank of 1.4. Out of the ten classification problems, the enhanced HS algorithm outperforms the other algorithms in six cases. It is interesting to observe that the classifiers that use the three other standard metaheuristic algorithms, namely HS, GA, and PSO, yield similar average rank values. This is most probably due to the fact that all three algorithms belong to the same class of metaheuristic paradigm [26]. The inspirations and names of these algorithms differ, but all of them employ certain guided local and global searches in the process of finding near-optimal solutions. Nevertheless, as evident from the simulations performed, the standard metaheuristic algorithms can be improved and enhanced in such a way that they are better suited to the problems at hand. HS, GA, and PSO were originally developed to solve continuous optimization problems. Based on the observation that the decision variables are now in the form of binary values, changes are introduced to the existing HS algorithm so that the resulting enhanced HS algorithm can tackle the problem of feature selection more efficiently. The first column of Table 5 shows the results obtained using the full sets of data, which consist of the original data set and the three pseudo sets of data. As evident from Table 5, the overall classification accuracies obtained for all ten UCI data sets are the lowest in their respective rows. This is most probably because the redundant and irrelevant data introduced in the pseudo sets have degraded the performance of the classifiers. These empirical results once again highlight the importance of feature selection as a necessary pre-processing step in machine learning.
Table 6 shows the sizes of the resultant feature subsets. The numbers listed in the 'Full set' column refer to the size of the larger data set obtained by combining the original data set and the three pseudo data sets generated, as shown in Table 2. As evident from Table 6, all the feature selection algorithms do a good job of reducing the size of the feature subset. Out of the ten cases examined, the enhanced HS feature selection algorithm yields the smallest resultant feature subsets in six cases.


Table 6
Size of resultant feature subsets.

No.  Data set    Full set  Enhanced HS  Standard HS  GA  PSO
1    Glass       40        12           12           13  12
2    Heart       56        17           18           17  18
3    Hepatitis   76        24           24           26  27
4    Ionosphere  136       43           45           45  44
5    Iris        16        5            5            5   4
6    Pima        32        11           12           11  10
7    Vehicle     72        23           23           24  23
8    Votes       64        23           22           20  22
9    Wine        52        13           13           12  14
10   Zoo         68        19           20           22  19

Fig. 5. The comparison of the rate of convergence of the standard and enhanced HS algorithms.

It should be noted that a feature subset with a smaller size does not necessarily imply a higher classification accuracy. For instance, for the wine data set, the resultant feature subset obtained using the GA algorithm has a dimension of 12, whereas the feature subset obtained by the enhanced HS algorithm has a size of 13. From Table 5, it is found that the enhanced HS algorithm gives a marginally higher classification accuracy. Examining the sizes of the feature subsets obtained using the various feature selection algorithms, it is noticed that the numbers are very close to each other. It is also interesting to observe that the size of the resultant feature subsets obtained is slightly greater than the size of the original data set. To illustrate, for the glass data set, the original data set has a dimension of 10. After combining this original data set with the three sets of pseudo data, the large data set has a dimension of 40. Using the enhanced harmony search feature selection algorithm, the final resultant feature subset has a size of 12. While the proposed feature selection algorithm is able to reduce the size of the resultant feature subset considerably, it still includes some of the additional features that were added to introduce redundancy. To examine the effect of the proposed modifications on the rate of convergence of the HS algorithms, two UCI data sets, namely the hepatitis and ionosphere data sets, are chosen. The results of the remaining eight UCI data sets follow a similar pattern, and hence their respective figures are omitted. The values of the highest classification accuracy stored in the HM are plotted against the number of iterations. The total number of iterations for the standard HS algorithm is 1000. On the other hand, the enhanced HS algorithm uses only 900 iterations. The data for the first 100 iterations are not available because the computational cost is assigned to the HM initialization stage.
The horizontal dotted line before the first jump at the beginning of the graph represents this initialization stage. As observed from Fig. 5, the enhanced HS algorithm starts the improvisation stage with a better solution, which gives a higher classification accuracy. This is evident from the fact that the best ten solutions are chosen from the initial 100 solutions. To illustrate, for the hepatitis data set, the first solution generated by the standard HS algorithm gives a classification accuracy of approximately 33%, whereas the first solution improvised by the enhanced HS algorithm gives a value of 70%. By commencing the improvisation step from a better starting point, the rate of convergence of the algorithm is subsequently improved. It can be seen from Fig. 5 that the solution obtained using the enhanced HS algorithm converges faster to the maximum classification accuracy. For the hepatitis data set, both the standard and enhanced HS algorithms yield a similar maximum classification accuracy of approximately 97%. The enhanced HS algorithm, however, converges to the maximum classification accuracy much earlier than the standard HS algorithm. Turning our attention to the ionosphere data set, a similar trend is observed. The maximum classification accuracy given by the standard HS algorithm is about 90%. This value remains constant from the 500th iteration onwards. The enhanced HS algorithm also yields a value of 90% much earlier, at the 400th iteration. Later, a better set of solutions, which gives an even higher classification accuracy, is obtained, owing to the rules and mechanisms introduced to fine-tune the solutions. In short, the two modifications incorporated in the enhanced HS algorithm are able to increase the overall classification accuracy, as well as the rate of convergence, of the feature selection algorithm.

Table 7
The performance metrics for the epileptic seizure detection problem.

Algorithms   Sensitivity   Specificity   Overall
Standard HS  94.27 ± 1.01  98.89 ± 0.12  98.02 ± 0.15
Enhanced HS  95.12 ± 0.91  99.15 ± 0.10  98.88 ± 0.12
GA           93.96 ± 1.19  97.84 ± 0.38  97.24 ± 0.23
PSO          93.56 ± 1.21  97.88 ± 0.41  97.37 ± 0.27

5.2. Andrzejak data set for epileptic seizure detection

In the literature of epileptic seizure detection, the most commonly used statistical measures are sensitivity, specificity, and overall classification accuracy. Their formulas are given as follows:

Sensitivity = TP / (TP + FN) × 100%,  (20)

Specificity = TN / (TN + FP) × 100%,  (21)

Overall accuracy = (TN + TP) / (TN + TP + FN + FP) × 100%,  (22)

where TP (an epileptic EEG signal correctly classified as epileptic by the WNN), TN (a normal EEG signal correctly classified as normal), FP (a normal EEG signal incorrectly classified as epileptic), and FN (an epileptic EEG signal incorrectly classified as normal) stand for true positive, true negative, false positive, and false negative, respectively. The simulation results are presented in Table 7.

From Table 7, all four feature selection algorithms give very high classification accuracy. With an overall classification accuracy of 98.88%, the performance metrics reported by the enhanced HS algorithm are the highest among the four algorithms. Nevertheless, the enhanced HS algorithm performs only marginally better than the rest, most probably because the Andrzejak data set is relatively small and simple.

The ROC curves, which plot sensitivity against the false positive rate (FPR) across different cut-off points or threshold values, are given in Fig. 6. The best possible ROC curve, corresponding to a perfect classification, consists of a vertical line joining the origin (0,0) and the point (0,1), and a horizontal line joining the top left point (0,1) and the top right point (1,1). The diagonal line joining the origin (0,0) and the top right point (1,1) serves as a guideline: a completely random classifier would yield this diagonal, and any classifier that yields points above it performs better than a random classifier. From Fig. 6, all four classifiers show good performance, and the classifier that employs the enhanced HS algorithm, in particular, outperforms the other three. From the original 40 features, the enhanced HS algorithm reduces the dimensionality to only 18, a reduction of 55% in terms of the number of features.
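As a minimal illustration, the three metrics of Eqs. (20)–(22) can be computed directly from the confusion-matrix counts. The counts below are hypothetical and serve only to exercise the formulas; they are not taken from Table 7.

```python
def seizure_detection_metrics(tp, tn, fp, fn):
    """Sensitivity, specificity, and overall accuracy (Eqs. (20)-(22)), in percent."""
    sensitivity = tp / (tp + fn) * 100.0
    specificity = tn / (tn + fp) * 100.0
    overall = (tn + tp) / (tn + tp + fn + fp) * 100.0
    return sensitivity, specificity, overall

# Hypothetical counts: 100 epileptic and 400 normal EEG segments.
sens, spec, acc = seizure_detection_metrics(tp=95, tn=396, fp=4, fn=5)
print(round(sens, 2), round(spec, 2), round(acc, 2))  # 95.0 99.0 98.2
```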
The features selected are listed in Table 8. The four main frequency subbands that constitute the EEG signals are the delta band (1–4 Hz), the theta band (4–8 Hz), the alpha band (8–13 Hz), and the beta band (13–22 Hz) [18]. Referring to Table 8, out of the 5 frequency subbands used, 17 of the 18 features selected are taken from the d2, d3, d4, and a4 subbands, which cover the frequency range of 0–43.4 Hz. This range corresponds to the four aforementioned delta, theta, alpha, and beta bands. It can be inferred that the useful information needed for the task of epileptic seizure detection can be extracted from these four bands. Additionally, some information can be gathered from the summary statistics chosen by the enhanced HS feature selection algorithm. From Table 8, the 90th and 10th percentiles are preferred over the conventional maximum


Fig. 6. The ROC curves for the classifiers with different feature selection algorithms.
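For reference, the (FPR, sensitivity) pairs that make up ROC curves such as those in Fig. 6 can be generated by sweeping a cut-off over the classifier outputs. The sketch below is illustrative only: it assumes scores in [0, 1] with label 1 denoting an epileptic segment, and the function name is our own.

```python
def roc_points(scores, labels, thresholds):
    """One (FPR, TPR) point per cut-off; TPR is the sensitivity of Eq. (20)."""
    points = []
    for th in thresholds:
        tp = sum(s >= th and l == 1 for s, l in zip(scores, labels))
        fn = sum(s < th and l == 1 for s, l in zip(scores, labels))
        fp = sum(s >= th and l == 0 for s, l in zip(scores, labels))
        tn = sum(s < th and l == 0 for s, l in zip(scores, labels))
        tpr = tp / (tp + fn) if tp + fn else 0.0
        fpr = fp / (fp + tn) if fp + tn else 0.0
        points.append((fpr, tpr))
    return points

# A perfectly separating classifier yields the ideal corner point (0, 1):
print(roc_points([0.9, 0.8, 0.3, 0.1], [1, 1, 0, 0], [0.5]))  # [(0.0, 1.0)]
```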

Table 8
The features selected by the enhanced HS algorithm (subtotals; 18 features in total).

By frequency subband:   d1: 1,  d2: 3,  d3: 5,  d4: 5,  a4: 4
By summary statistic:   Maximum: 0,  Minimum: 1,  90th percentile: 3,  10th percentile: 3,  Mean: 3,  Std. dev.: 2,  Skewness: 4,  Kurtosis: 2

and minimum values. It should be noted that EEG signals might be contaminated with artifacts and unwanted noise. The extreme values at the two ends, namely the maximum and minimum, often contain outliers, which in turn degrade the performance of classifiers. Hence, by utilizing the 90th and 10th percentiles, the influence of outliers can be mitigated. Three mean values are also included in the optimal feature subset, which implies that the mean is a good statistic for capturing the characteristics of the EEG signals. Furthermore, a total of four features are derived from the values of skewness across different frequency subbands. This corroborates the finding reported in [27], where 14 out of the 50 features selected by the GA are values of skewness; the remaining 34 features are in the form of mean (11), energy (7), entropy (6), standard deviation (5), kurtosis (4), and median (3). In [28], a total of 384 features are yielded from the feature extraction stage. Upon completion of the feature selection stage, the number of features is reduced to only 45, in the form of energy (12), kurtosis (8), entropy (8), mean (7), standard deviation (6), and skewness (4). Combining the two works reported in the literature and this work,


Table 9
Comparison of the features selected from three works.

Dhiman and Saini [27]: data set, Andrzejak et al. [18]; classifier, support vector machine (SVM); feature extraction, discrete wavelet transform (DWT); feature selection, genetic algorithms; original number of features, 1792; optimal feature subset, skewness (14), mean (11), energy (7), entropy (6), std. dev. (5), kurtosis (4), median (3), 50 in total; reduction in number of features, 97.21%; classification task and overall accuracy, A-E (100%).

Gandhi et al. [28]: data set, Andrzejak et al. [18]; classifier, probabilistic neural network (PNN); feature selection, discrete harmony search; original number of features, 384; optimal feature subset, energy (12), kurtosis (8), entropy (8), mean (7), std. dev. (6), skewness (4), 45 in total; reduction in number of features, 88.28%; classification task and overall accuracy, A-E (100%).

This work: data set, Andrzejak et al. [18]; classifier, wavelet neural network (WNN); feature selection, enhanced harmony search; original number of features, 40; optimal feature subset, skewness (4), 90th percentile (3), 10th percentile (3), mean (3), std. dev. (2), kurtosis (2), minimum (1), 18 in total; reduction in number of features, 55.00%; classification task and overall accuracy, ABCD-E (98.88%).

Table 10
The performance metrics for the epileptic seizure prediction problem.

Algorithms     SOP (minutes)    Sensitivity (%)    FPR (h−1)
Standard HS    10               74.32              0.16
               20               79.96              0.20
               30               83.27              0.18
Enhanced HS    10               79.52              0.14
               20               82.73              0.16
               30               86.98              0.16
GA             10               70.04              0.19
               20               75.09              0.22
               30               80.82              0.24
PSO            10               69.38              0.22
               20               76.82              0.27
               30               79.74              0.30

it is noticed that some summary statistics, such as the mean, skewness, and kurtosis, play an important role in capturing the vital information embedded in the raw EEG signals. The findings of these three works are summarized in Table 9. The different summary statistics chosen by the different feature selection algorithms in Table 9 reflect that the human brain is a highly dynamic and complicated organ. The works reported in [27] and [28] use the same benchmark data set [18]. Nonetheless, the classification task considered in these two works involves only sets A and E. Using their proposed feature selection algorithms, both [27] and [28] report a perfect classification accuracy of 100%. In this work, a more challenging classification task is considered, where sets A until D represent the normal EEG signals, whereas set E represents the epileptic EEG signals. The ABCD-E binary classification task reported in this work yields an overall classification accuracy of 98.88%.

5.3. Freiburg data set for epileptic seizure prediction

The evaluation metrics used in the literature to report the efficiency of classifiers in the task of epileptic seizure prediction are sensitivity and the false positive rate (FPR). The formula for FPR is:

FPR = FP / (TN + FP) × 100% = 1 − specificity.  (23)
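In code, Eq. (23) is a one-liner. Note that the per-hour FPR values reported in Table 10 use a different normalization (false alarms per hour of recording); the sketch below, with hypothetical counts, only illustrates the segment-level definition and its complementarity with specificity.

```python
def false_positive_rate(fp, tn):
    """Segment-level FPR of Eq. (23), in percent."""
    return fp / (tn + fp) * 100.0

# Hypothetical counts: FPR in percent equals 100 minus specificity.
fp, tn = 4, 396
specificity = tn / (tn + fp) * 100.0
print(false_positive_rate(fp, tn), 100.0 - specificity)  # 1.0 1.0
```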

In the literature of epileptic seizure prediction, a very important term is the seizure occurrence period (SOP), which refers to the time frame of the prediction window. In the survey conducted in [29], most patients prefer an SOP of less than 1 h. In this work, three SOP values are used, namely 10, 20, and 30 min. The results are shown in Table 10.


Table 11
McNemar's statistical test.

Classifier     Standard HS    Enhanced HS    GA        PSO
Standard HS    –              2.11           −1.15     −1.94
Enhanced HS                   –              −2.84     −2.89
GA                                           –         −0.50
PSO                                                    –

Table 12
The features selected by the enhanced HS algorithm.

EEG channel    Number    Frequency subband    Number    Summary statistic    Number
Focal          59        d1                   6         Maximum              1
Extrafocal     9         d2                   22        Minimum              3
                         d3                   18        90th percentile      11
                         d4                   12        10th percentile      9
                         a4                   10        Mean                 16
                                                        Std. dev.            7
                                                        Skewness             12
                                                        Kurtosis             9
TOTAL          68                             68                             68

From Table 10, the classifier that gives the best performance, with the highest value of sensitivity, is the one that uses the enhanced HS algorithm. With an SOP of 30 min, the sensitivity obtained is 86.98%, and the false positive rate (FPR) reported is 0.16 per hour. It is also evident from Table 10 that a higher value of SOP leads to a higher value of sensitivity, but at the same time increases the value of FPR as well. A higher FPR is not ideal, as too many false alarms would annoy and confuse the epileptic patients. To evaluate whether there is any significant difference between the performances of two classifiers, McNemar's test is performed. The normal test statistic of this test is:

Zij = (fij − fji) / √(fij + fji),  (24)

where Zij measures the significance of the difference between the accuracies obtained by classifiers i and j, and fij is the number of cases where the EEG signal is correctly classified by classifier i but incorrectly classified by classifier j. The notation fji is defined similarly. In this statistical test, a 5% level of significance is used: the classification accuracies of two classifiers are said to differ significantly if |Zij| > 1.96. A positive value of Zij means that classifier i performs better than classifier j.

Table 11 presents the results of McNemar's statistical test. The rows in Table 11 represent classifier i, whereas the columns represent classifier j. The entries in the lower triangle have been omitted because Table 11 is an anti-symmetric matrix, with the property mrc = −mcr, where the subscripts r and c refer to the row and column indices, respectively. Entries whose absolute values exceed 1.96 indicate a significant difference between the two classifiers. It is worth pointing out that even though the value of sensitivity obtained by the standard HS is higher than that of the GA, McNemar's test reveals no significant difference between the performance of the two classifiers. The same result is obtained when comparing the standard HS and PSO algorithms, and the GA does not statistically outperform PSO. However, in terms of statistical significance, the enhanced HS algorithm outperforms the standard HS algorithm. Similarly, the enhanced HS algorithm gives significantly better results than both the GA and PSO algorithms.

From the original 240 features, the enhanced HS feature selection algorithm reduces the dimension of the optimal feature subset to only 68, which equates to a reduction of 71.67%. The number of features selected by the enhanced HS algorithm, across different groupings, is summarized in Table 12.
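The McNemar comparison of Eq. (24) needs only the two disagreement counts. A minimal sketch follows; the counts and function names are hypothetical, not those behind Table 11.

```python
import math

def mcnemar_z(f_ij, f_ji):
    """Normal-approximation McNemar statistic, Eq. (24).

    f_ij: signals classified correctly by classifier i but not by classifier j;
    f_ji: the reverse. A positive value favours classifier i."""
    return (f_ij - f_ji) / math.sqrt(f_ij + f_ji)

def differ_significantly(z, critical=1.96):
    """Two-sided test at the 5% level of significance used in the text."""
    return abs(z) > critical

# Hypothetical disagreement counts for illustration:
z = mcnemar_z(40, 20)
print(round(z, 2), differ_significantly(z))  # 2.58 True
```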
Referring to Table 12, approximately 87% of the features selected by the enhanced HS algorithm are derived from the focal EEG channels. This demonstrates that the EEG signals recorded from the focal channels are more important than those recorded from the extrafocal channels. In terms of frequency subbands, the distribution of the selected features follows the same trend as that observed in the epileptic seizure detection problem: most of the chosen features are taken from the d2, d3, d4, and a4 subbands. With regard to the summary statistics chosen, the top three are the mean, skewness, and the 90th percentile, which illustrates that these three statistics are good representations of the EEG signals. In [30], the importance of reducing the input dimension is highlighted. One of the main findings of that work is that the most prominent features for characterizing a given EEG signal are those related to spectral information and statistical moments. In addition, it is pointed out that an optimal feature subset with a reduced number of features and channels can be obtained using suitable feature selection techniques, which corroborates the finding of this work.


6. Conclusion and future work

An efficient feature selection algorithm is imperative in the field of machine learning. By identifying the optimal feature subset, the input dimension and computational cost can be reduced. The modifications made to the standard harmony search algorithm cover two aspects, namely the initialization of the harmony memory and the improvisation of the candidate solutions. The proposed initialization strategy ensures a more diverse pool of initial solutions, whereas the use of dynamic parameters caters to the different requirements during the improvisation stage: the initial iterations encourage exploration, or global search of the entire solution space, whereas toward the end of the run the algorithm focuses more on exploitation, or local search. The efficiency and robustness of the proposed algorithm are verified using ten UCI benchmark problems and two real life data sets, namely the Andrzejak data set for the problem of epileptic seizure detection and the Freiburg data set for the task of epileptic seizure prediction. The simulation results show that the classifiers that employ the standard HS, genetic algorithm, and particle swarm optimization algorithms give comparable classification accuracy, while further statistical analysis demonstrates the superiority of the proposed enhanced HS algorithm.

For future work, a hybrid feature selection algorithm will be considered, as the proposed wrapper-based algorithm is relatively computationally expensive. Alternatively, the harmony search algorithm can be modified and used in conjunction with other feature subset evaluators as part of filter-based feature selection techniques. Also, the proposed expert systems can be readily used for classification tasks with other biomedical signals, such as electrocardiography (ECG) and electromyography (EMG) signals.
Using data sets with higher dimensionality, such as DNA microarray data, would be another possible direction to further evaluate the effectiveness and robustness of the proposed feature selection algorithm.

Acknowledgments

The authors gratefully acknowledge the financial assistance provided by the Ministry of Higher Education, Malaysia, under the Fundamental Research Grant Scheme (FRGS), and Universiti Sains Malaysia, under the USM Fellowship Scheme.

References

[1] Ding C, Peng H. Minimum redundancy feature selection from microarray gene expression data. J Bioinf Comput Biol 2005;3(2):185–205.
[2] Kohavi R, John GH. Wrappers for feature subset selection. Artif Intell 1997;97(1):273–324.
[3] Guyon I, Elisseeff A. An introduction to variable and feature selection. J Mach Learn Res 2003;3:1157–82.
[4] Chandrashekar G, Sahin F. A survey on feature selection methods. Comput Electr Eng 2014;40(1):16–28.
[5] Daliri MR. Predicting the cognitive states of the subjects in functional magnetic resonance imaging signals using the combination of feature selection strategies. Brain Topogr 2012;25(2):129–35.
[6] Daliri MR. Feature selection using binary particle swarm optimization and support vector machines for medical diagnosis. Biomed Tech/Biomed Eng 2012;57(5):395–402.
[7] Geem ZW, Kim JH, Loganathan GV. A new heuristic optimization algorithm: harmony search. Simulation 2001;76(2):60–8.
[8] Mahdavi M, Fesanghary M, Damangir E. An improved harmony search algorithm for solving optimization problems. Appl Math Comput 2007;188(2):1567–79.
[9] Omran MG, Mahdavi M. Global-best harmony search. Appl Math Comput 2008;198(2):643–56.
[10] Wang CM, Huang YF. Self-adaptive harmony search algorithm for optimization. Expert Syst Appl 2010;37(4):2826–37.
[11] Pan QK, Suganthan PN, Tasgetiren MF, Liang JJ. A self-adaptive global best harmony search algorithm for continuous optimization problems. Appl Math Comput 2010;216(3):830–48.
[12] Diao R, Shen Q. Two new approaches to feature selection with harmony search. In: International Conference on Fuzzy Systems (FUZZ), 2010 IEEE; 2010. p. 1–7.
[13] Diao R, Shen Q. Feature selection with harmony search. IEEE Trans Syst Man Cybern Part B Cybern 2012;42(6):1509–23.
[14] Ramos CC, Souza AN, Chiachia G, Falcão AX, Papa JP. A novel algorithm for feature selection using harmony search and its application for non-technical losses detection. Comput Electr Eng 2011;37(6):886–94.
[15] Shreem SS, Abdullah S, Nazri MZA. Hybridising harmony search with a Markov blanket for gene selection problems. Inf Sci 2014;258:108–21.
[16] Lichman M. UCI Machine Learning Repository. Irvine, CA: University of California, School of Information and Computer Science; 2013. http://archive.ics.uci.edu/ml.
[17] World Health Organization (WHO). Epilepsy fact sheet; 2015 (online). http://www.who.int/mental_health/neurology/epilepsy/en.
[18] Andrzejak RG, Lehnertz K, Mormann F, Rieke C, David P, Elger CE. Indications of nonlinear deterministic and finite-dimensional structures in time series of brain electrical activity: dependence on recording region and brain state. Phys Rev E 2001;64(6):061907.
[19] Fergus P, Hignett D, Hussain A, Al-Jumeily D, Abdel-Aziz K. Automatic epileptic seizure detection using scalp EEG and advanced artificial intelligence techniques. BioMed Res Int 2015;2015.
[20] Fergus P, Cheung P, Hussain A, Al-Jumeily D, Dobbins C, Iram S. Prediction of preterm deliveries from EHG signals using machine learning. PLoS ONE 2013;8(10):e77154.
[21] Adeli H, Zhou Z, Dadmehr N. Analysis of EEG records in an epileptic patient using wavelet transform. J Neurosci Methods 2003;123(1):69–87.
[22] Bandarabadi M, Teixeira CA, Rasekhi J, Dourado A. Epileptic seizure prediction using relative spectral power features. Clin Neurophysiol 2015;126(2):237–48.
[23] Le Van Quyen M, Martinerie J, Baulac M, Varela F. Anticipating epileptic seizures in real time by a non-linear analysis of similarity between EEG recordings. Neuroreport 1999;10(10):2149–55.
[24] Maiwald T, Winterhalder M, Aschenbrenner-Scheibe R, Voss HU, Schulze-Bonhage A, Timmer J. Comparison of three nonlinear seizure prediction methods by means of the seizure prediction characteristic. Phys D Nonlinear Phenom 2004;194(3):357–68.
[25] Zhang Q, Benveniste A. Wavelet networks. IEEE Trans Neural Netw 1992;3(6):889–98.
[26] Padberg M. Harmony search algorithms for binary optimization problems. In: Operations Research Proceedings 2011. Springer; 2012. p. 343–8.
[27] Dhiman R, Saini JS. Genetic algorithms tuned expert model for detection of epileptic seizures from EEG signatures. Appl Soft Comput 2014;19:8–17.
[28] Gandhi TK, Chakraborty P, Roy GG, Panigrahi BK. Discrete harmony search based expert model for epileptic seizure detection in electroencephalography. Expert Syst Appl 2012;39(4):4055–62.



[29] Schulze-Bonhage A, Sales F, Wagner K, Teotonio R, Carius A, Schelle A, et al. Views of patients with epilepsy on seizure prediction devices. Epilepsy Behav 2010;18(4):388–96.
[30] Direito B, Ventura F, Teixeira C, Dourado A. Optimized feature subsets for epileptic seizure prediction studies. In: Engineering in Medicine and Biology Society, EMBC, 2011 Annual International Conference of the IEEE; 2011. p. 1636–9.

Zarita Zainuddin is a professor at the School of Mathematical Sciences, Universiti Sains Malaysia. She obtained her B.Sc. in Mathematics from Monmouth College, Illinois, USA, and received an M.Sc. in Applied Mathematics from Ohio University, USA. She holds a Ph.D. from Universiti Sains Malaysia. Her research interests include mathematical modeling, neural network learning algorithms, and crowd dynamics.

Kee Huong Lai is a Ph.D. student at the School of Mathematical Sciences, Universiti Sains Malaysia. He received his B.Sc. (Edu.) with a major in mathematics. His research interests include neural networks, metaheuristic algorithms, and biomedical signal processing.

Pauline Ong received her Ph.D. in applied mathematics from Universiti Sains Malaysia, Penang, Malaysia, in 2011. She is currently a senior lecturer at Universiti Tun Hussein Onn Malaysia, Johor, Malaysia. Her research interests include artificial neural networks, mathematical modeling, and metaheuristic algorithms.

Please cite this article as: Z. Zainuddin et al., An enhanced harmony search based algorithm for feature selection: Applications in epileptic seizure detection and prediction, Computers and Electrical Engineering (2016), http://dx.doi.org/10.1016/j.compeleceng.2016.02.009