Channel selection by Rayleigh coefficient maximization based genetic algorithm for classifying single-trial motor imagery EEG

Channel selection by Rayleigh coefficient maximization based genetic algorithm for classifying single-trial motor imagery EEG

Neurocomputing 121 (2013) 423–433 Contents lists available at ScienceDirect Neurocomputing journal homepage: www.elsevier.com/locate/neucom Channel...

2MB Sizes 0 Downloads 37 Views

Neurocomputing 121 (2013) 423–433

Contents lists available at ScienceDirect

Neurocomputing journal homepage: www.elsevier.com/locate/neucom

Channel selection by Rayleigh coefficient maximization based genetic algorithm for classifying single-trial motor imagery EEG Lin He a,n, Youpan Hu a,b, Yuanqing Li a, Daoli Li a a b

College of Automation Science and Engineering, South China University of Technology, Guangzhou 510640, Guangdong, China Guangzhou Institutes of Advanced Technology, Chinese Academy of Sciences, Guangzhou 511458, Guangdong, China

art ic l e i nf o

a b s t r a c t

Article history: Received 22 October 2012 Received in revised form 8 April 2013 Accepted 6 May 2013 Communicated by S. Hu Available online 4 June 2013

While common spatial pattern may be the most widely used feature for discriminating motor imagery based EEG signals, Rayleigh coefficient maximization enable us to have one more effective. However, such a feature is often deteriorated by redundant electrode channels which may result in low classification accuracy, extra subsequent computational load and difficulty in understanding which part of the brain relates to classification-relevant activity. In this paper, we present a channel selection method to deal with these problems, in which an improved genetic algorithm based on the Rayleigh coefficient feature is conducted to determine the optimal subset of channels. Experiment results on two motor imagery EEG datasets verify that our method is effective in channel selection for classifying motor imagery EEG signals. & 2013 Elsevier B.V. All rights reserved.

Keywords: EEG Channel selection Rayleigh coefficient maximization Genetic algorithm (GA)

1. Introduction The brain–computer interface (BCI) offers users an alternative communication and control channel between the brain and the external devices [1]. This research area has attracted an increasing interest during the past decade due to its huge potential in practical applications. In BCI applications, many methods can be utilized to measure brain signals, such as electroencephalography (EEG), magnetonencephalography (MEG), positron emission tomography (PET), functional magnetic resonance imaging (fMRI). Among them, EEG gains the advantages of non-invasive nature, feasibility in most environments, and requiring relatively simple and inexpensive equipments [1–5]. EEG-based BCI has become a highly attractive and active research field. In the study of EEGbased BCIs, particular attention has been paid to the motor imagery based BCI. By imagining movements of different parts of the body, subjects can voluntarily regulate their mu or beta rhythms over sensorimotor cortices [3–5]. The main distinction between real movement and motor imagery is that execution would be blocked at some corticospinal level in the latter case. Some related studies has proved that, when performing motor imagination, mu and beta rhythms are found to reveal eventrelated synchronization and desynchronization (ERS/ERD) over sensorimotor cortex just like when one actually does the motor

n

Corresponding author. Tel.: +86 020 87114390. E-mail address: [email protected] (L. He).

0925-2312/$ - see front matter & 2013 Elsevier B.V. All rights reserved. http://dx.doi.org/10.1016/j.neucom.2013.05.005

tasks [1,3,6], wherein ERD represents an energy decrease of EEG signals in given frequency bands while ERS patterns occur as an energy increases. One of the challenges in motor imagery based BCI is how to correctly and efficiently extract subject-specific and task-specific features from the blurred scalp EEG for subsequently classifying the signal [4,7–9]. According to the existing literature, common spatial pattern (CSP) may be the most widely used features. CSP simultaneously diagonalizes the covariance matrices of two classes and maximizes the difference of variances of these classes. In contrast to CSP, Rayleigh coefficient maximization is capable of not only maximizing the difference of covariance matrices of two classes, but also minimizing the sum of these two covariance matrices. Thus, properly defined Rayleigh coefficient (RC) feature are more discriminating than CSP. A specific motor imagery task is associated with a set of optimal recording channels that builds the greatest discriminability of RC feature. If such an optimal channel subset is unknown, a simple approach is to directly utilize all available channels. However, this may trigger some inherent weakness, such as the overfitting of RC features to noise which leads to low classification accuracy, the extra computation load, and the difficulty to identify which part of the brain generates the class-relevant activity [10,11]. Therefore, conducting channel selection is normally necessary for EEG signal classification. Channel selection for EEG signal classification is mathematically a problem of combinatorial optimization and analytical intractable. Due to the capacity of encoding discrete variable and being not vulnerable to local optimum, evolution or genetic

424

L. He et al. / Neurocomputing 121 (2013) 423–433

algorithms can be employed to solve this problem. In fact, such methods have been applied to feature selection [12,13] and introduced to reduce data dimensions in EEG signal processing [14,15]. Lan et al. utilized mutual information maximization as the objective of genetic algorithm (GA) to select channels during mental tasks [14], but their method is not task-specific and does not take into account the discriminative of the ERD/ERS feature. Schroder et al. proposed a method combining genetic algorithm with support vector machine (SVM-GA) for the selection of features of EEG signal [15]. SVM-GA is effective to obtain the high classification accuracy, but it has to be supported by additional cross-validation, which is high computation cost and requires experienced operation. In this paper, we present an improved GA algorithm for channel selection based on RC maximization. Hereinafter, we call it RC based GA algorithm (RC-GA). Compared with aforementioned GA based methods, RC-GA is oriented by the discriminability of the RC feature directly, which offers the potential of the high classification accuracy under the lower computation cost. As the comparisons with RC-GA, we conduct SVM-GA based channel selection and additionally two channel selection methods evolved from two benchmark searching algorithms: sequential forward search (SFS) and sequential backward search (SBS). The results of experiment on two real datasets verify the effectiveness of our RC-GA channel selection method.

2. Methods 2.1. Common spatial patterns and Rayleigh coefficient maximization CSP [8,16–20] reflects the difference of motor imagery EEG data from two varied tasks. It finds projections (i.e., spatial filters) that reach maximization of the separation of two classes by supervised simultaneous diagonalizing the covariance matrices of both classes. To be more specific, suppose that L ¼ fXðiÞ; yðiÞji ¼ 1; …; N trial g is a labeled training dataset for two classes, where XðiÞ∈RMT denotes the EEG of ith trial, and M and T denote the numbers of channels and time samples per channel, respectively, Ntrial denotes the number of labeled trials, and yðiÞ∈f1; 2g represents the class label of the ith trial. Normalized spatial covariances of two classes can then be calculated as Γ1 ¼ ∑yðiÞ ¼ 1 XðiÞXT ðiÞ=trðXðiÞXT ðiÞÞ and Γ2 ¼ ∑yðiÞ ¼ 2 XðiÞXT ðiÞ=trðXðiÞXT ðiÞÞ. CSP intends to find the matrix W∈RMM , which simultaneously diagonalize the two class-specific covariance matrices WT Γ1 W ¼ D and WT Γ2 W ¼ I−D, where D ¼ diagðλ1 ; …; λM Þ and λ1 ; …λM are positive and arranged in descending order. The spatial filter wj ðj ¼ 1; …; MÞ, which is the jth column of W, yielding high variance (i.e. large eigenvalue) for Γ1 will achieve small eigenvalue for Γ2 , and vice versa. Consequently, projecting data of two classes with 2M′ spatial filters corresponding to M′ largest and M′ smallest eigenvalues will result in the most significant difference of variance between two classes. Thus, CSP features are formulated as xCSP ðiÞ ¼ diagðW′T XðiÞXT ðiÞW′Þ

ð1Þ

where W′ ¼ ½w1 ; …; wM′ ; wM−M′þ1 ; …; wM ]. From the essence of CSP, we can extend this feature extraction in the framework of Rayleigh coefficient maximization [16,21]. By defining two matrices Fd ¼ Γ1 −Γ2 and Fc ¼ Γ1 þ Γ2 , which reflect the discriminative activity and the common activity of two classes, respectively, solving the spatial filters can be equivalent to maximizing

λðwÞ ¼

wT Fd w : wT Fc w

ð2Þ

The solution to maximizing (2) is obtained by solving the following general eigenvalue problem: Fd w ¼ λFc w

ð3Þ

where λ is the general eigenvalue called Raleigh coefficient, and w is the corresponding general eigenvector. There are normally M solutions λi ði ¼ 1; 2; …MÞ for λ and thus M associated general eigenvector wi ði ¼ 1; 2; …MÞ. From the definition of Fc and Fd, we easily have   w T Γ1 w 1 w T Fd w 1 ð1Þ ð4Þ ¼ λ þ 1 ¼ ðλ þ 1Þ ¼ 2 wT Fc w 2 wT Fc w Furthermore, there is always w T Fc w wT Γ1 w wT Γ2 w ¼ T þ ¼ λð1Þ þ λð2Þ ¼ 1 T w Fc w w Fc w wT Fc w

ð5Þ

where λð1Þ and λð2Þ are the general eigenvalues from Γ1 wq ¼ λFc wq and Γ2 wq ¼ λFc wq , respectively. (4) implies that general eigenvector w maximizing λ also maximizes λð1Þ , and (5) suggests w maximizing λð1Þ and minimizing λð2Þ simultaneously. Therefore, if sorting general eigenvalues of (3) in descending order, 2M′ general eigenvector of M′ largest and M′ smallest general eigenvalues will result in the most significant difference of variances of two classes. By closer inspection of (2), (4) and (5), we find that Raleigh coefficient maximization will degenerate to solving CSP if Fc is equal to the identity matrix. solving Raleigh coefficient involves in both between-class covariance matrix Fd and within-class covariance matrix Fc, and thus solving (3) can result in feature with stronger discriminating ability than CSP. In addition, this frame of Raleigh coefficient is naturally in accordance with Fisher linear discriminant. In our channel selection method, RC features achieved by (3), instead of CSP features, are utilized. 2.2. RC-GA based channel selection To guarantee the classification accuracy on motor imagery EEG, we have to extract effective subject-specific and task-specific features from the raw blurred EEG. For motor imagery EEG, a specific task associates with a set of optimal channels that provides the greatest discriminability for the leveraged feature. Redundant or low relevant channels will decrease such an ability. In Raleigh coefficient context, involving these channels in feature selection leads to the overfitting of Raleigh coefficient to noise or artifacts and thus the subsequent low classification performance. Additionally, there are the triggered extra computation load and difficulty to identify the active brain region to the specific task. Consequently, channel selection prior to feature extraction and applying classification is beneficial. 2.2.1. Formulation of EEG channel selection Unlike feature selection which is directly followed by the classification, channel selection for EEG classification task is normally followed by both feature extraction and classification. However, we can borrow the categories organized into for feature selection, i.e. filter, wrapper and embedded techniques, to analyze channel selection method. In EEG channel selection context, filter technique only assesses the relevance inside the channel set and ignores the interaction of channel set with the subsequent feature extraction and classification, whereas the search for optimal channel subset in embedded technique is built into the classifier construction. Compared with filter and embedded technique, wapper technique evaluates each of candidate channel subsets with classification-relevant index. Due to relatively low computation cost and classification-dependent, we make use of wrapper strategy in our channel selection method.

L. He et al. / Neurocomputing 121 (2013) 423–433

Suppose M general eigenvalues solved with (3) are sorted in descending order as λi ði ¼ 1; 2; …MÞ. According to (4) and (5), the first and last M′ general eigenvalues correspond to projection directions with the greatest discriminating ability. The larger the first M′ eigenvalue λi ði ¼ 1; 2; …M′Þ is, the larger the difference between two classes is. The smaller the last M′ eigenvalue λi ði ¼ M−M′ þ 1; …MÞ are, the larger the difference between two classes is also. Therefore, channel selection can be formulated as the following optimization problem: M′

∑ ðλi −λMþ1−i Þ s:t:

max Λ

i¼1

Fd ðΛÞwq ¼ λFc ðΛÞwq

ð6Þ

where Λ is the index set of selected channels. Clearly, (6) is a problem of combinatorial optimization and analytical intractable. As GA can encode the channels directly, has high speed of convergence, and is not vulnerable to local optimum. We resort to it to solve this combinatorial optimization problem. 2.2.2. RC-GA method GA is a global search method and has been successfully applied in BCI problems [15]. GA stipulates a population of potential solutions by the survival of the fittest to obtain better and better approximations to an optimum. In each generation, a set of approximations is updated through selecting individuals according to their level of fitness and breeding them together utilizing operators borrowed from natural genetics. In this process, the evolution of populations of individuals that are better suited to their environment than the individuals that they were created from are achieved, just as in natural adaptation [22]. For the case of classifying motor imagery based EEG, we propose a method of integrating Raleigh coefficient and GA (called RC-GA) to select the optimal channels subset. In this method, Raleigh coefficient and the channel number are utilized to modulate GA. Assume there are total N channels in original EEG data. RC-GA can be described in detail as follows: (1) Preprocessing. Common average reference (CAR) and band-pass temporal filtering are first conducted. CAR is of the form xi ¼ xi −

1 M ∑ x Mj¼1 j

ð7Þ

which aims to reduce artifacts and noise where xi is the ith channel of the EEG signal, while band-pass temporal filtering is used to separate task-related component of EEG [8,16,17,19,23]. The passband of it is determined by either neurophysiologically prior knowledge (e.g., 7–30 Hz, where mu and beta rhythms normally locate in), or by maximizing the following measure: Δs ¼

ωH

∑ ∥s1 ðωÞ−s2 ðωÞ∥2

ð8Þ

ω ¼ ωL

where s1 ðωÞ and s2 ðωÞ are averages of power spectra of various channels under two different tasks, respectively, and ωH and ωL are upper and lower cutoff frequencies. Then, to improve the computation efficiency, some channels which are obviously not contribute to classification should be deleted in advance. It is shown in (3), Raleigh coefficient cannot be utilized for the situation of a single channel. We use T

diagðXðiÞX ðiÞÞ

¼ ½∥x1 ðiÞ∥22 ; …; ∥xM ðiÞ∥22 

ð9Þ

to calculate the energies and adopt them as the feature for all single channels, where ∥xj ðiÞ∥22 is the energy feature of jth channel in ith trial. Then we use F s ðjÞ ¼

Sd ðjÞ Sw ðjÞ

ð10Þ

425

Fig. 1. GA chromosome for channel selection.

to measure the discriminability of every single channel, where Sd(j) and Sw(j) are within-class variance and between-class variance of jth channel, respectively. After calculating F s ðjÞðj ¼ 1; …; MÞ, we sort all the channels in descending order in terms of F s ðjÞ and select the first NC channels as the candidate channel set for the further processing. (2) Encoding. Each individual (or chromosome) is constituted by NC various genes, each of which is represented by a 0 or 1, which denotes the channel is not selected and selected, respectively. A NC-dimensional individual encoding the selected channels is showed in Fig. 1. (3) Initializing population. An initial population of chromosomes is set. Each chromosome is a randomly generated NC-dimensional binary vector which comprises N ðiÞ S entries being equal to 1, where i ¼ 1; 2; …; N CH denotes the index of chromosomes. (4) Setting fitness function. In each of generations, a fitness function is required to evaluate whether a channel subset is “fit” for the selection. Some existing criteria, such as Scatter, InfoGain and Cfs [24,25], can be adopted, which lead to filter method. Another type of strategy is to set fitness function by the accuracy of specific classifier (e.g. SVM-GA). But such a methodology may involve in additional model selection and incur computation burden, hence being not suitable for the iteration procedure in GA. Therefore, we directly use the following quantity as the fitness function of a channel subset C: NRC

JðCÞ ¼ ∑ ðλi −λNðjÞ þ1−i Þ i¼1

ð11Þ

S

where jλi j is the ith ordered Rayleigh coefficient solved by Eq. (5). If J increases under a fixed NRC, the FD becomes better. (5) Selection. In each generation, multiple individuals are selected from the current population based on their fitness. Here, we adopt the method of tournament selection, which operates by running several “tournaments” among a few individuals randomly chosen from the population [26]. The winners of each tournament are selected for crossover and mutation. (6) Crossover and mutation. In each generation, the selected individuals or chromosomes are modified to form a new population by recombining and mutating. In our method, adaptive probabilities of crossover and mutation are adopted. The crossover probability pc and mutation probability pm are defined as ( pc1 J′ o J avg pc ¼ ð12Þ pc1 −ðpc1 −pc2 ÞnðJ′−J avg Þ=ðJ max −J avg Þ J′≥J avg and pm ¼

(

pm1

J o J avg

pm1 −ðpm1 −pm2 ÞnðJ max −JÞ=ðJ max −J avg Þ

J≥J avg

ð13Þ

where Javg denotes the average fitness value of each generation, J max represents the maximum fitness value of current generation, J′ is the larger fitness value between the parents (two selected chromos), and J stands for the fitness value of individual for

426

L. He et al. / Neurocomputing 121 (2013) 423–433

mutation. When the fitness value of individuals tends to become consistent, the value of pc and pm will be increased; and when the population fitness value is dispersed, the value of pc and pm will be decreased. The individual which has large fitness value will have low pc and pm, and it will get more protection. Therefore, this improved GA will maintain diversity of the population and guarantee the convergence of genetic algorithms [27,28]. (7) The alternating optimization procedure (for step 4 to step 6) repeats until the stopping criterion is satisfied. Here our algorithm terminates with a fixed number of generations. Table 1 shows the terminology in the context of genetic algorithm and its counterpart used in channel selection. After obtaining the optimal channel subset, test samples under selected channels are classified by a classifier such as FLD. The procedure of RC-GA is shown in Fig. 2. 2.3. RC-SFS and RC-SBS algorithm In order to make comparisons, we additionally conduct two channel selection methods based on benchmark searching strategies which are sequential forward search (SFS) and sequential backward search (SBS). Detailed descriptions of SFS and SBS can be found in [18,29]. These SBS and SFS based channel selection Table 1 Terminologies. GA context

Channel selection context

Gene Chromosome Population

Channel Encoded channel set Set of encoded channel sets

methods have the same preprocessing step and class separability criterion as above RC-GA. We hereinafter call them RC-SBS and RC-SFS, respectively.

3. Experimental results Algorithms are tested on two EEG datasets. The first is dataset IVa in BCI Competition III. The second is recorded from online motor imagery experiments in our lab. 3.1. Experiment I: dataset IVa in BCI Competition III The dataset IVa in BCI competition III was recorded from five healthy subjects (i.e., aa, al, av, aw and ay). According to the international 10/20 system, 118 channels were placed. Signals were down-sampled to 100 Hz for subsequent analysis. During each trial, the subject was required to perform one of the following motor imagery tasks for 3.5 s: right-hand and right-foot movements. The partition of training set and test set in our experiments is the same as the setting for BCI competition III [30]. EEG signals are first preprocessed via CAR and then band-pass temporal filtering. Based on maximization of the right-hand side of (8), we determine the passband of temporal filtering for each of subjects (typically in mu or beta bands). The selected frequency bands are 12.3–14.2 Hz for subjects aa, 12.2–14.2 Hz for al, 10.2– 12.3 Hz for av, 11.8–13.8 for aw, and 10.1–12.3 Hz for ay. When implementing channel selection, we first select 50 candidate channels by utilizing (9) and (10). Fig. 3 shows the Fisher ratios of every of all the 118 channels in descending order, which are calculated with (10). The first 50 channels are selected as candidate channels for further selection. In the next step, The

Original EEG of N channels

Original training EEG of N channels

Original test EEG of N channels

CAR band-pass filtering and pre-selecting to obtain candidate channel set using (10)

CAR band-pass filtering Channel selection

Initial population

Set the number of chromosomes

Theoptimal channel set

Updatethe population

RC feature extraction

Yes No

Calculate the fitness of each individual by (11)

Calculate Javg, J', and Jmax

Islagerthan maximal iteration?

FDA classifier

Select individuals

Calculate J of selected individuals

Crossover and mutation

Classification result

Fig. 2. Schema of RC-GA.

L. He et al. / Neurocomputing 121 (2013) 423–433

427

Fig. 3. Fisher ratios of single channels. (a) subject aa (b) subject al (c) subject av (d) subject aw and (e) subject ay.

Fig. 4. Maximum of J(C) in varied generation of RC-GA.

running parameters of GA are chosen as follows: N ¼ 30, Gen¼50, pc1 ¼0.9, pc2 ¼0.5, pm1 ¼0.1, pm2 ¼0.001. Fig. 4 shows the maximal fitness values of various generations obtained by RC-GA. It suggests that, with the increasing of the generation, the maximum of J (C) becomes stable and our RC-GA tends to be convergent within 50 generations. Fig. 5 demonstrates the topograph of selected channels with our RC-GA method. Figs. 6 and 7 show the maximum of J(C) under different number of selected channels by RC-SFS and RC-SBS, respectively. In these two figures, the location of the maximal values of the curve corresponds to the number of optimal selected channels by RC-SFS and RC-SBS. After channel selection, we extract the RC features of the optimal channel subset and subsequently apply FLD classifier to yield the classification accuracy. The classification accuracies resulted from RC-GA, RC-SFS, RC-SBS and SVM-GA are shown in Table 2. The

classification accuracies on all channels (i.e. without channel selection) are also in the table. For RC-GA and SVM-GA, the average of accuracies over ten runs and the associated standard variance are listed, due to inherent randomnesses of such two methods. From the table, we observe that our RC-GA achieves significant higher accuracies than those of other methods. The submission of the third author of this paper, which is depended on manually tuned parameters, ranked the second in BCI Competition III [30]. Classification accuracies achieved by our RC-GA is comparable to that result. Furthermore, compared with RC-GA method, RC-SFS and RC-SBS lead to the selected channel set comprising quite a few channels for almost all subjects. This indicates that RCSFS and RC-SBS obviously accommodate irrelevant channels and are unable to select compact channels subset which is the most effective for classification.

428

L. He et al. / Neurocomputing 121 (2013) 423–433

Fig. 5. Topographic map of selected channels by RC-GA.

In Fig. 5, selected channels by RC-GA are marked, which tend to distribute in left and central motor cortex area. From neurophysiological viewpoint, motor imagery of right hand and foot actually correspond to left and central motor cortex area [8,31,32]. Furthermore, such an channel selection result in the highest classification accuracy in four compared methods as shown in Table 2. Therefore, results from RC-GA are neurophysiologically reasonable and validate the effectiveness of this method in channel selection. 3.2. Experiment II: dataset from our motor imagery experiments Six healthy subjects (i.e., a, b, c, d, e and f) participated in our online BCI control experiments with visual feedback. The left-hand

and right-hand movement imagination was designated to control the horizontal movement of a cursor. Fig. 8(a) shows the paradigm of the feedback experiment. The EEG recording was made using Neuroscan system. According to the 10/20 international system, 32 channels were placed, using the left mastoid for reference and right mastoid as ground. Electrode layout is shown in Fig. 8(b). EEG signals were digitized at 250 Hz. For each subject, a dataset composed of 80 trials (40 trials per class) recorded in a single session was used for off-line algorithmic studies. A time period of 3 s (between 3.5 and 6.5 s) is chosen from each of trials for our processing. The EEG signal is preprocessed via CAR and then a bandpass filter whose pass frequency ranges from 7 to 30 Hz is applied to the data to restrict the analysis to ERD/ERS features. According to prior neurological knowledge, ERD/ERS

L. He et al. / Neurocomputing 121 (2013) 423–433

429

Fig. 6. Maximum of JC under various numbers of selected channels by RC-SFS.

Fig. 7. Maximum of J(C) under various numbers of selected channels by RC-SBS.

Table 2 Classification accuracy (%) under various channel selection methods. Subject

All channels

RC-SFS

RC-SBS

SVM-GA

RC-GA

aa al av aw ay

67.9 88.3 59.2 87.6 80.4

80.4 92.3 64.3 90.0 82.8

72.3 91.7 60.7 91.2 81.7

78.3 75.5 94.8 72.1 67.9 78.0 92.6 73.2 85.7 74.2

86.4 7 3.6 98.5 7 2.4 75.17 6.3 93.9 7 2.8 87.17 3.9

features appear on sensorimotor cortex area in such a frequency band. By utilizing (10), 25 channels are selected as candidate channels for the further selection. For our RC-GA algorithm, The GA running parameters are chosen as follows: N ¼ 30, Gen ¼50, pc1 ¼0.9, pc2 ¼ 0.5, pm1 ¼ 0.1, pm2 ¼ 0.001. Fig. 9 shows the maximal J(C) of each generation by RC-GA algorithm. Similarly to the first dataset, the RC-GA

algorithm can effectively accomplish the channel selection within the 50 generations. For clear illustration, the optimal channels subsets for each subject are shown in Table 3. Figs. 10 and 11 show the maximum of J(C) under various numbers of channels by RC-SFS and RC-SBS algorithm, respectively. These figures determine the numbers of selected channels by obtain the optimal channels combination corresponding to the maximal J(C). By comparison of Table 3, Figs. 10 and 11, we can see that the numbers of selected channels by RC-SFS and RC-SBS algorithms are more than those by RC-GA. The classification results under different channel selection methods are shown in Table 3. As can be seen, compared with the method without channel selection, the other four methods can achieve better results. It is also worthwhile to note that RC-SFS and RC-SBS perform worse than using all channels methods in subject c and d, which suggests RC-SFS and RC-SBS may performance poorly in practical applications. Our RC-GA achieves the highest classification accuracies. From the inspection of Table 3, it is obvious that RC-GA tends to select channels in right and left

430

L. He et al. / Neurocomputing 121 (2013) 423–433

AF3 F5

F3

AF4

FC5 FC3

FCz

C5

Cz

C3

P5

FC4 FC6 C4

CPz

CP5 CP3

C6

CP4 CP6

Pz

P3

F6

F4

Fz

P4

P6

PO1POzPO2

0.5

Maximal J (C)

Maximal J (C)

Fig. 8. (a) Paradigm of the online motor imagery experiment. During the first 3 s, the screen was blank, and the subject was in the relax state. After that, an arrow was presented on the screen, indicating the performed imagery task. At the third second, a cursor started to move horizontally according to the subject's EEG signal. At the end of the trial, a mark appeared to show the classification result of the trial, and the subject was instructed to relax and wait for the next task. (b) Channels layout.

0.45 0.4 0.35

subject a 0

10

20

30

40

4 3 2 1

50

subject b 0

10

20

1 0.5 subject c 0

0

10

20

30

40

subject d 0

10

20

Maximal J (C)

Maximal J (C)

1.5

subject e 20

30

40

50

Gen

2

10

50

0.6 0.5

50

2.5

0

40

0.7

Gen

1

30 Gen

Maximal J (C)

Maximal J (C)

Gen

30

40

50

2 1.5 1 subject f 0.5

0

10

Gen

20

30

40

50

Gen

Fig. 9. The curve of maximal J(C) under varied generation of RC-GA.

Table 3 Candidates for channel selection (initial channel set). Subject

Electrodes layout

a b c d e f

F5, F3, F4, F6, FC5, FC3, FC4, FC6, C3, Cz, C4, CP5, CP3 F5, F6, FC5, FC6, C3, Cz, C4, CP3, CP4, P3, P4 FC4, C3, Cz, C4, CP5,CP3, CPz, CP4, P3 F5, F3, FC6, FCz, FC4, C3,Cz, C4, CP5,CP3, CP4, P4 F5, F3, FC5, FC3, FC4, C3, Cz, C4, CP5, CP3, CP4, CP6 F5, F3, FC5, FC3, FC4, FC6, C3, Cz, C4, CP5, CP3, CP4, CP6

motor cortex area which correspond to motor imagery of left-hand and right-hand movement, while achieves the highest classification compared with other methods (Table 4).

3.3. Analysis Based on aforementioned experiment results, we present the following analysis. Using EEG of all channels mostly lead to the worst classification accuracy, and RC-SBS and RC-SFS result in higher accuracies, while RC-GA achieves the highest accuracies. The effectiveness of RC-GA, RC-SBS and RC-SFS on classification accuracy imply that channel selection is normally necessary or beneficial to the performance of motor imagery based EEG classification. As we know, SFS and SBS based selection methods suffer from nesting effect. We take the RC-SFS algorithm for example. When the process of channel selection by RC-SFS, initial channels have to be selected as the starting point of selection algorithm. These

L. He et al. / Neurocomputing 121 (2013) 423–433

Maximal J (C)

1.5 1 0.5

0

1 Maximal J (C)

1.5

subject a

10 20 Number of channels

0

10 20 Number of channels

10 20 Number of channels

30

subject d

0

10 20 Number of channels

30

1.5 subject e

Maximal J (C)

Maximal J (C)

0

0.5

0

30

2.5 2 1.5 1 0.5

0.5

1

subject c

0.5

0

subject b

1

0

30

Maximal J (C)

Maximal J (C)

2

431

0

10 20 Number of channels

subject f 1 0.5 0

30

0

10 20 Number of channels

30

Fig. 10. The curve of maximal J(C) under different numbers of channels by RC-SFS.

2 Maximal J (C)

Maximal J (C)

1 subject a 0.5

0

0

10 20 Number of channels

30

Maximal J (C)

Maximal J (C)

0.5

10 20 Number of channels

30

0.5 subject d 0

0

10 20 Number of channels

30

0

10 20 Number of channels

30

1.5 Maximal J (C)

3 Maximal J (C)

0

1 subject c

2 1 subject e 0

subject b 0

1

0

1

0

10 20 Number of channels

30

1 0.5 subject f 0

0

10 20 Number of channels

30

Fig. 11. The curve of maximal J(C) under different numbers of channels by RC-SBS.

initial channels are determined by the maximal Fisher ratio of feature of each channel. Once these initial channels and the subsequent selected channels are set to be initial set, they can not be discarded any longer. This makes the high possibility that

the final selected channel set will include some irrelevant channels which will affect the classification accuracy. As for RC-GA, the mechanism of random and global searching can effectively avoid such irrelevant channels.

432

L. He et al. / Neurocomputing 121 (2013) 423–433

the length of coding is N C , then fitness space HF contains 2NC elements.

Table 4 Classification accuracy (%) under various channel selection methods. Subject

All channels

RC-SFS

RC-SBS

SVM-GA

RC-GA

a b c d e f

87.3 68.7 90.5 85.4 70.6 76.5

92.5 75.9 89.1 81.5 85.4 83.7

92.5 75.4 88.5 80.6 84.3 84.2

94.17 3.1 79.5 7 4.1 90.3 7 3.4 86.6 7 6.0 85.8 7 5.5 85.17 5.9

97.4 7 2.1 81.3 73.2 92.2 7 1.9 89.8 7 4.1 87.1 7 3.3 88.5 7 3.9

Compared with RC-SFS and RC-SBS, RC-GA achieves more compact selected channel set while has significantly higher classification accuracy. This implies that selects effective channel set for discrimination while excludes useless or redundant channels. Furthermore, the inspection of the layouts of selected channels reveals their locations are in accordance with the relevant neurophysiological principles. Namely, our RC-GA achieves neurophysiologically reasonable channel selection.

4. Conclusion Channel selection often acts as a necessary or beneficial preprocessing before EEG feature extraction and classification. By selecting discriminating-relevant and redundant-free channel sets, we can not only save a great amount of computational resource in subsequent analysis, but often achieve a more effective and accurate model. Besides, channel selection can improve the comprehensibility of the resulting model and clarify the relationships among channels. In this paper, we have presented RC-GA algorithm for channel selection, aiming at better motor imagery EEG classification. In this method, energy feature of each single channel is first extracted and Fisher ratio on it is calculated to reserve part of channels to form candidate channel set for further selection. Then, an improved genetic algorithm oriented by RC Maximization is constructed to fulfill random global searching. Experimental results based on two real-world EEG datasets demonstrated that RC-GA achieves good classification performance, while reaching compact and neurophysiologically reasonable channel subset.

Acknowledgment This work is sponsored by the National High-Tech R&D Program of China (863 Program) under Grant 2012AA011601, National Natural Science Foundation of China under Grant 91120305, University High Level Talent Program of Guangdong under Grant N9120140A, and Fundamental Research Funds for the Central Universities, SCUT under Grant 2012ZM0098.

Appendix A. Proof of the convergence of RC-GA We first introduce some basic concepts. Definition 1. Supposed the length of coding is NC . Then HI ¼ f0; 1gNC is defined as individual space, and HNCH is defined as population space contains Nc individuals, where ith individual is represented with yi ¼ ðyi;1 ; yi;2 ; …; yi;NC Þ and yi1 ; yi2 ; …; yi;NC are genes with descendant weights. They determine the global and local position of the individual in the solution space. The higher power a gene has, the more globally it contributes to the solution. Definition 2. For our fitness J defined in (11), we define its domain N C −1 RC f∑N g as fitness space HF. Supposed i ¼ 1 ðλi −λN ðjÞ þ1−i Þjyi ; i ¼ 0; …; 2 S

Theorem 1. For a specific population YðtÞ ¼ ðy0 ; y1 ; …; yNCH Þ, after an operation of selection and crossover, the probability of an individual reaching the optimal solution yopt can be expressed as pscross ðyopt Þ ¼ ps ðyopt Þð1−pc Þ þ pcross ðyopt Þ

ð14Þ

where ps ðyopt Þ is the probability of selecting optimal solution, and pcross ðyopt Þ denotes the probability of yielding optimal solution by crossover. Theorem 2. For a specific population YðtÞ ¼ ðy0 ; y1 ; …; yNCH Þ, after an operation of mutation, the probability of an individual reaching a solution suspace Vji is as follows: ( 1−pm ; yb ¼ ym pðyb ¼ Mðym ÞÞ ¼ ð15Þ pm ; yb ≠ym where pm is the mutation probability mentioned to Eq. (11). If let 8 > < 1; δm ¼ pm ðyopt Þ−pm ; > : 1−2p m

yopt ¼ ym yopt ≠ym

;

where pm ðyopt Þ denotes the potability of the non-optimal individual ym mutated to be a optimal one yopt, according to (15), Then, we can deduce pmut ðyopt Þ as follows: pmut ðyopt Þ ¼ pðyb ¼ Mðym Þ; yb ∈Yðt þ 1Þym ∈YðtÞÞ ¼ pm þ δm ð1−2pm Þ ¼ 12 þ ðδm −12Þð1−2pm Þ

ð16Þ

where Mðym Þ stands for the operation of individual ym mutation. After the above illustration. The convergence of the our algorithm can be proved as follows. In our channel selection method, each generation is conducted by combination of selection, crossover and mutation. Supposed that the global optimal solution is yopt, that the individuals in Tth generation will contain global optimal solution genes and their complementary genes, Then it should prove that the probability of the appearance of the global optimal solution p⋆T will converge to 1 after T generations, detailed descriptions is as follows: Firstly, in each generation, there are selections and crossovers for r times, and mutations for s times. Thus, after the operation of sections and crossovers, the probability of optimal individual appear as p1 ¼ 1−ð1−pscross ðyopt ÞÞr

ð17Þ

and, after the operation of mutations, the probability of individual in yopt is p2 ¼ 1−ð1−pmut ðyopt ÞÞs

ð18Þ

Therefore, after a generation, pgen ðyopt Þ, the probability of individual in yopt, can be deduced as follows: pgen ðyopt Þ ¼ p1 þ p2 −p1 p2 ¼ 1−ð1−pscross ðyopt ÞÞr þ 1−ð1−pmut ðyopt ÞÞs −½1−ð1−pscross ðyopt ÞÞr ½1−ð1−pmut ðyopt ÞÞs  ¼ 1−ð1−pscross ðyopt ÞÞr ð1−pmut ðyopt ÞÞs

ð19Þ

Then in the Tth generation, the probability of getting global optimal solution is of the form T

T

t¼1

t¼1

r s p⋆ T ¼ 1− ∏ ð1−pgen ðyopt ÞÞ ¼ 1− ∏ ð1−pscross ðyopt ÞÞ ð1−pmut ðyopt ÞÞ

ð20Þ

L. He et al. / Neurocomputing 121 (2013) 423–433

If T approaches infinite, the probability p⋆T goes to 1 which means our channel selection is convergent to optimal channel subset. References [1] J. Wolpaw, N. Birbaumer, D. McFarland, G. Pfurtscheller, T. Vaughan, Brain– computer interfaces for communication and control, Clin. Neurophysiol. 113 (6) (2002) 767–791. [2] T. Ebrahimi, J.M. Vesin, G. Garcia, Brain–computer interface in multimedia communication, IEEE Signal Process. Mag. 20 (1) (2003) 14–24. [3] G. Pfurtschellera, C. Brunnera, A. Schlögla, F. Lopes da Silva, Mu rhythm (de) synchronization and eeg single-trial classification of different motor imagery tasks, NeuroImage 31 (1) (2006) 153–159. [4] G. Pfurtscheller, D. Flotzinger, J. Kalcher, Brain–computer interface-a new communication device for handicapped persons, J. Microcomp. Appl. 16 (3) (1993) 293–299. [5] N. Brodua, F. Lotteb, A. Lécuyerd, Exploring two novel features for EEG-based brain–computer interfaces: multifractal cumulants and predictive complexity, Neurocomputing 79 (1) (2012) 87–94. [6] B. Blankertz, R. Tomioka, S. Lemm, M. Kawanabe, K.-R. Müller, Optimizing spatial filters for robust EEG single-trial analysis, IEEE signal Process. Mag. 25 (1) (2008) 41–46. [7] B. Blankertz, G. Dornhege, M. Krauledat, K.-R. Mülle, G. Curio, The noninvasive berlin brain–computer interface: fast acquisition of effective performance in untrained subjects, NeuroImage 37 (2) (2007) 539–550. [8] B. Blankertz, R. Tomioka, S. Lemm, M. Kawanabe, K.-R. Muller, Optimizing spatial filters for robust EEG single-trial analysis, IEEE signal Process. Mag. 25 (1) (2008) 41–56. [9] W. Wu, X. Gao, B. Hong, S. Gao, Classifying single-trial EEG during motor imagery by iterative spatio-spectral patterns learning (ISSPL), IEEE Trans. Biomed. Eng. 55 (6) (2008) 1733–1743. [10] M. Arvaneh, C. Guan, K.K. Ang, C. Quek, Optimizing the channel selection and classification accuracy in EEG-based BCI, IEEE Trans. Biomed. Eng. 58 (6) (2011) 1865–1873. [11] T.N. Lal, M. Schröder, T. Hinterberger, J. Weston, M. Bogdan, N. Birbaumer, B. Scholkopf, Support vector channel selection in BCI, IEEE Trans. Biomed. Eng. 51 (6) (2004) 1003–1010. [12] J. Yang, V. Honavar, Feature subset selection using a genetic algorithm, IEEE Intell. Syst. Appl. 13 (2) (1998) 44–49. [13] A. Jain, D. Zongker, Feature selection: evaluation, application, and small sample performance, IEEE Trans. Pattern Anal. Mach. Intell. 19 (2) (1997) 153–158. [14] T. Lan, D. Erdogmus, A. Adami, et al., Salient EEG channel selection in brain– computer interfaces by mutual information maximization, in: Proceedings of the 27th Annual Conference of the IEEE-EMBS, 2006, pp. 7064–7067. [15] M. Schröder, M. Bogdan, T. Hinterberger, Automated EEG feature selection for brain–computer interfaces, in: Proceedings of the 1st International IEEE–EMBS Conference on Neural Engineering, 2003, pp. 626–629. [16] Y. Li, C. Guan, Joint feature re-extraction and classification using an iterative semi-supervised support vector machine algorithm, Mach. Learn. 71 (1) (2008) 33–53. [17] L. He, Z. Gu, Y. Li, Z. Yu, Classifying motor imagery EEG signals by iterative channel elimination according to compound weight, in: LNCS, volume 6320, Artificial Intelligence and Computational Intelligence, 2010, pp. 71–78. [18] L. He, Z. Yu, Z. Gu, Y. Li, Bhattacharyya bound based channel selection for classification of motor imageries in EEG signals, in: Proceedings of the China Control and Decision Conference (CCDC09), 2009, pp. 2353–2356. [19] Y. Li, K.K. Ang, C. Guan, Digital signal processing and machine learning, in: B. Graimann, B. Allison, G. Pfurtscheller (Eds.), Brain–Computer Interfaces: Revolutionizing Human Computer Interaction, Springer-Verlag, Berlin, 2010, pp. 305–330. [20] Y. Li, C. Guan, An extended em algorithm for joint feature extraction and classification in brain–computer interfaces, Neural Comput. 18 (11) (2006) 2730–2761. [21] S. Mika, Kernel Fisher Discriminants, Ph.D. Thesis, University of Technology, Berlin, 2002. [22] P. Melin, O. Castillo, E.G. Ramrez, J. Kacprzyk, W. Pedrycz, Analysis and Design of Intelligent Systems Using Soft Computing Technique, Springer Publishing Company, 2007. [23] M. Cheng, W. Jia, X. Gao, S. Gao, F. Yang, Mu rhythm-based cursor control: an offline analysis, Clin. Neurophysiol. 115 (4) (2004) 745–751. [24] F. Garca-Lpez, M. Garca-Torres, B. Melin, J.A. Moreno, J.M. Moreno-Vega, Solving feature subset selection problem by a parallel scatter search, Eur. J. Oper. Res. 169 (2) (2006) 477–489. [25] P. Gholami, A. Bazle, M. Eftekhary, P. Gholami, A new method for filtered attribute in data mining using entropy, J. Basic Appl. Sci. Res. 2 (2) (2012). [26] T. Blickle, L. Thiele, A comparison of selection schemes used in evolutionary algorithms, Evol. Comput. 4 (4) (1996) 361–394. [27] M. Srinivas, L.M. Patnaik, Adaptive probabilities of crossover and mutation in genetic algorithm, IEEE Trans. Syst. Man Cybern. 24 (4) (2008) 656–667.

433

[28] P.Y. Xia, X.Q. Ding, B.N. Jiang, A GA-based feature selection and ensemble learning for high-dimensional datasets, in: 2009 International Conference on Machine Learning and Cybernetics, 2009, pp. 7–12. [29] S. Theodoridis, K. Koutroumbas, Pattern Recognition, Academic Press, San Diego, 2003. [30] BCI Competition III. Available from: 〈http://www.bbci.de/competition/iii/〉. [31] G. Pfurtscheller, C. Neuper, C. Andrew, G. Edlinger, Foot and hand area mu rhythms, Int. J. Psychophysiol. 26 (1–3) (1997) 121–135. [32] G. Pfurtscheller, G. Neuper, Motor imagery activates primary sensorimotor area in humans, Neurosci. lett. 239 (2-3) (1997) 65–68.

Lin He received the B.S. degree in 1995 from Xi'an Institute of Technology, China, the M.S. degree in 2003 from Chongqing University, China, and the Ph.D. degree in 2007 from Northwestern Polytechnical University, China. In 2007, he joined the College of Automation Science and Engineering, South China University of Technology, China, as a Lecturer. His research interests include statistical pattern recognition, hyperspectral target detection and classification, and biomedical signal processing.

Youpan Hu is a Master student with College of Automation Science and Engineering at South china university of Technology, China. His research interest is focused on BCI technology, EEG signal processing, feature extraction and classification.

Yuanqing Li received the B.S. degree in applied mathematics from Wuhan University, Wuhan, China, in 1988, the M.S. degree in applied mathematics from South China Normal University, Guangzhou, China, in 1994, and the Ph.D. degree in control theory and applications from the South China University of Technology, Guangzhou, in 1997. Since 1997, he has been with the South China University of Technology, Guangzhou, China, where he became a Full Professor in 2004. During 2002–2004, he was a Researcher at the Laboratory for Advanced Brain Signal Processing, RIKEN Brain Science Institute, Saitama, Japan. He was a Research Scientist at the Laboratory for Neural Signal Processing, Institute for Infocomm Research, Singapore during 2004-2008. He is the author or coauthor of more than 60 scientific papers in journals and conference proceedings. His research interests include blind signal processing, sparse representation, machine learning, brain–computer interface, EEG and fMRI data analysis.

Daoli Li is a Master student with College of Automation Science and Engineering at South china university of Technology, China. His research interest is focused on BCI EEG signal processing and classification.