Classification of mental tasks from EEG data using backtracking search optimization based neural classifier

Saurabh Kumar Agarwal a,*, Saatvik Shah a, Rajesh Kumar b
a Computer Engineering Department, MNIT Jaipur, India
b Electrical Engineering Department, MNIT Jaipur, India
Article info:
Article history: Received 22 May 2014; Received in revised form 28 January 2015; Accepted 16 March 2015. Communicated by Wei Wu.
Keywords: Backtracking Search optimization Algorithm (BSA); Brain Computer Interface (BCI); Mental tasks classification; Neural network (NN); Electroencephalogram (EEG)

Abstract:
The Brain Computer Interface (BCI) has been applied to augment impaired human cognitive function by converting mental signals into control signals. This paper presents a neural classifier optimized using the Backtracking Search optimization Algorithm (BSANN) to classify three mental tasks: imagination of right or left hand movement and word generation. BSA is an Evolutionary Algorithm (EA) suitable for solving non-linear and non-differentiable problems. Its single control parameter gives BSA an edge over other EAs due to the lower degree of randomness. BSA keeps a memory of an old population to generate a new candidate set of solutions, so it benefits from the search results of previous populations. The proposed method (BSANN) has been tested on the publicly available datasets of BCI Competition III. Experimental results show that BSANN exhibits better classification accuracy for mental tasks than 21 other algorithms.
© 2015 Elsevier B.V. All rights reserved.
1. Introduction

The Brain Computer Interface (BCI), also referred to as the Brain Machine Interface (BMI), is a system that processes the brain's neural activity to provide an extra channel for communication [7]. Several BCI technologies, such as electrocorticographic (ECoG) implants [8], Magnetoencephalography (MEG) [9], Functional Magnetic Resonance Imaging (fMRI) [10] and Functional Near-Infrared systems (fNIR) [11], have been described in the literature. This paper uses a dataset of Electroencephalogram (EEG) signals. An EEG is a recording of low-voltage (about 5–100 µV) brain electrical activity, measured as the voltage difference between a scalp electrode and a reference electrode. Motor neuron diseases are a group of neurological disorders that affect muscle activities including walking, talking, swallowing and breathing [1]. BCI provides alternative communication for people suffering from motor diseases such as quadriplegia due to Spinal Cord Injury (SCI), or those affected by a Cerebrovascular Accident (CVA) [2,3]. BCI has also been used in multimedia, the game industry [4] and the control of robotic arms [5,6].
* Corresponding author. E-mail addresses: [email protected] (S.K. Agarwal), [email protected] (S. Shah), [email protected] (R. Kumar).
Over the years, several methods have been proposed to extract meaningful information from EEG signals [12–15]. All these algorithms share a common pipeline: preprocessing, feature extraction and finally classification of the data. Preprocessing generally involves reconstructing the data by filtering out noise, while feature extraction usually deals with the selection of frequency components [16]. Several mechanisms have been suggested over the years to understand neural activities in the brain, including slow cortical potentials, sensorimotor rhythms, P300 potentials and Steady-State Visual Evoked Potentials (SSVEP) [17]. Each mechanism involves different preprocessing and feature extraction techniques; the same classifier can then be used on the selected features. Three major classification approaches have been described in the literature. The first is based on statistical classifiers such as k Nearest Neighbors (kNN) or the Bayesian classifier [20,21]. The second is based on unsupervised learning methods such as Radial Basis Function (RBF) networks [18,19]. The third is based on supervised learning algorithms such as Linear Discriminant Analysis (LDA) [22,23], Neural Networks (NN) [24,25] and Support Vector Machines (SVM) [26]. Classification performance can be improved using feature extraction techniques [27,28] to reduce data dimensionality and noise, which can be combined with NN to provide better results [29]. Several optimization algorithms have been combined with NN to increase classification accuracy, such as Genetic Algorithms (GA) [37], Particle Swarm Optimization (PSO) [37] and Hidden Markov Models (HMM) [44].
http://dx.doi.org/10.1016/j.neucom.2015.03.041
0925-2312/© 2015 Elsevier B.V. All rights reserved.
This paper proposes an extension of the Backtracking Search optimization Algorithm (BSA) [30], the Backtracking Search optimization Algorithm for Neural Networks (BSANN). As the classification task, the EEG data provided by BCI Competition III [34] have been considered. Three mental tasks are studied: imagination of repetitive left-hand movement, imagination of repetitive right-hand movement and word generation. These three mental tasks can serve as three control signals for a BCI system. This paper presents a neural classifier whose weights are optimized using BSA [30]. BSA is an Evolutionary Algorithm (EA). EAs are heuristic algorithms that use nature's reproductive methods as their problem-solving strategy. Unlike gradient-based approaches, EAs are better at solving non-linear, non-differentiable and complex problems [31,32]. EAs themselves, however, suffer from problems such as strong sensitivity to control parameters, slow speed and premature convergence [33]. BSA has a unique method for generating the trial population that enables it to solve problems quickly. BSA's trial population generation uses the basic operators of selection, mutation and crossover. Unlike most Differential Evolution (DE) algorithms and their derivatives, such as adaptive DE (jDE), parameter-adaptive DE (JADE) and self-adaptive DE (SaDE), BSA's random mutation method uses only one individual from a previous generation as the search direction for each target individual, a strategy that gives BSA an edge over other DE algorithms: BSA randomly selects one of its previous populations to serve as the direction along which the current population evolves. BSA's crossover strategy is complex, non-uniform and differs from the crossover strategies used by other DE algorithms. BSA's single control parameter helps it overcome the shortcomings listed above. It keeps a memory that stores a randomly chosen previous generation, allowing the next generation to take advantage of the knowledge of previous generations; this memory gives BSA better exploration and exploitation ability. In this paper, an extensive comparison with algorithms implemented by other authors has been carried out [36,38,39–42,43,45–47]. This paper is organized as follows: Section 2 gives the reader an overview of previous research in the field of motor imagery BCI pattern detection with the respective results and analysis. Section 3 elaborates step by step on the algorithm used for classification. Section 4 gives details of how the data were acquired and processed, followed by an in-depth analysis of the detection algorithm, its pros and cons, and an evaluation against other efficient alternatives; the results of this comparison are analyzed and tabulated. Section 5 highlights the final cross-validation results and describes possible future work.
2. Literature survey

This section provides a comprehensive study of the abundant literature on the detection of motor imagery events in EEG. Sun et al. applied a Decorrelated Least Mean Square (DLMS) based method [43] in which a Bayesian classifier is coupled with a Gaussian mixture model under LMS to derive the update rule of a gradient descent algorithm. Statistical learning methods such as ensemble methods [42] and Discriminative Analysis (DA) have also been used. Several flavors of DA have emerged, such as Fisher Discriminative Analysis (FDA) [37], Regularized Discriminative Analysis (RDA) [37] and Distance Based Discriminative Analysis (DBDA) [39]. DBDA is based on Canonical Variates Transformation (CVT) and is combined with a Mental Tasks Transitions Detector (MTTD) to classify spontaneous mental activities in a brain–computer interface working under an asynchronous protocol. Authors such as Sun et al. have focused on
methods deriving better features, proposing the Random Electrode Selection Ensemble (RESE) [41] as well as Common Spatial Patterns (CSP) to obtain good accuracy. In RESE, a feature subspace is first determined by a couple of randomly selected electrodes, and Principal Component Analysis (PCA) is used to carry out dimensionality reduction. Several extensions of CSP have been implemented, such as Stationary CSP (SCSP), Windowed CSP (WCSP) and Adaptive CSP (ACSP) [45]. In these, the signal covariances are updated with weights, and the features most discriminative of the current brain states are extracted by multi-class CSP. SCSP does not update the spatial filters with a new test session; WCSP updates a single covariance; ACSP introduces a variability coefficient which represents a weighted average of the historical covariances. All three of these algorithms use a Support Vector Machine (SVM) for classification. Hierarchical forms of previously existing methods have been used to improve results, e.g. the Hierarchical Hidden Markov Model (HHMM) and Hierarchical Hidden Conditional Random Fields (HHCRF) [36]. HHCRF is a discriminative model corresponding to HHMM; it models the conditional probability of the states at the upper levels given the observations, while the states at the lower levels are hidden and marginalized out in the model definition. Two algorithms were developed for the model: a parameter learning algorithm that needs only the upper-level states in the training data, and the marginalized Viterbi algorithm, which computes the most likely upper-level state sequences by marginalizing the lower-level states. Along with these, neural networks (NN) and SVM have been very popular among researchers for EEG-based pattern detection. Various flavors of SVM, such as Least Square SVM (LS-SVM) [40], SVM [37], Transition Detection based SVM (TD-SVM) [46] and Evolved Filters (SVM) [47], have been used. In LS-SVM, decision making is performed in two stages: first, a clustering technique (CT) is used to extract representative features of the EEG data; second, a least square support vector machine is applied to the extracted features for classification. In TD+SVM, the classification problem is reformulated into two subproblems: detecting class transitions and determining the class for the sequences of samples between transitions. The Evolved Filters (SVM) algorithm automatically optimizes spatial and frequency-selection filters by means of the Covariance Matrix Adaptation Evolution Strategy (CMA-ES). Several stochastic search algorithms have been used to optimize the classification performance of NN, such as the Genetic Algorithm (GANN) [37], BackPropagation (BPNN) [37] and Particle Swarm Optimization (PSONN) [37]. Additionally, feature extraction techniques combined with these optimization algorithms have been proposed, such as Principal Component Analysis with Improved PSO (PCA-IPSONN) [37] and HMM with BPNN [44]. The IPSO method consists of the Modified Evolutionary Direction Operator (MEDO) and PSO; MEDO combines the Evolutionary Direction Operator (EDO) with migration and strengthens the search for a global solution.
3. Classification methodology

The complete classification consists of two parts: creating a model and implementing the algorithm that optimizes this model. This paper uses a neural network as the model to classify the data; BSA is applied to optimize the weights of the neural network.

3.1. Neural network

A multilayer feed-forward neural network is used for the classification of the three mental tasks. The architecture of the neural network consists of three layers: input, hidden and output.
Let $X = (x_1, x_2, \ldots, x_{N_i+1})$ be the input vector, $H = (h_1, h_2, \ldots, h_{N_h+1})$ the vector of hidden node values and $Y = (y_1, y_2, \ldots, y_{N_o})$ the output vector, where $N_i$, $N_h$ and $N_o$ are the numbers of nodes in the input, hidden and output layers respectively. $N_i$ is equal to the number of features in the data. One extra unit, the bias, is appended to the input vector $X$ and the hidden vector $H$. The sigmoid function is chosen as the activation function for both layers. The one-vs-all technique has been used for multiclass classification, in which one classifier per class discriminates that class from all others. The class whose classifier has the highest confidence score is selected, as given in the following equation:

$$ y = \max_{k \in 1 \ldots N_o} \psi_k \qquad (1) $$

where $y$ is the selected output class and the cost function $\psi_k$ is defined as the mean square error of the difference between the expected output $Y$ and the observed output $O$, as given in the following equation:

$$ \psi_k = \mathrm{mse}\big(\mathrm{sigm}(\Theta_{ho}^{T}\,\mathrm{sigm}(\Theta_{ih}^{T} X)) - Y\big) \qquad (2) $$

Here $\Theta_{ih} = (\Theta_{ih}^{1}, \Theta_{ih}^{2}, \ldots, \Theta_{ih}^{(N_i+1)\times N_h})$ represents the weight vector between the input and hidden layers, and $\Theta_{ho} = (\Theta_{ho}^{1}, \Theta_{ho}^{2}, \ldots, \Theta_{ho}^{(N_h+1)\times N_o})$ the weight vector between the hidden and output layers. $\mathrm{sigm}(x)$ is the sigmoid function given in the following equation:

$$ \mathrm{sigm}(x) = \frac{1}{1 + e^{-x}} \qquad (3) $$
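To make the classifier concrete, the following is a minimal NumPy sketch of the forward pass and cost function of Eqs. (1)–(3); the function and variable names (sigm, nn_cost, theta) are ours for illustration, not those of the released library, and a row-major data layout is assumed.

```python
import numpy as np

def sigm(x):
    # Sigmoid activation, Eq. (3)
    return 1.0 / (1.0 + np.exp(-x))

def nn_cost(theta, X, Y, n_i, n_h, n_o):
    """Mean-square-error cost of the 3-layer network, Eq. (2).
    theta : flat weight vector of length (n_i+1)*n_h + (n_h+1)*n_o
    X     : (samples, n_i) feature matrix
    Y     : (samples, n_o) one-vs-all target matrix
    """
    split = (n_i + 1) * n_h
    theta_ih = theta[:split].reshape(n_i + 1, n_h)   # input -> hidden weights
    theta_ho = theta[split:].reshape(n_h + 1, n_o)   # hidden -> output weights

    Xb = np.hstack([X, np.ones((len(X), 1))])        # append input bias unit
    H = sigm(Xb @ theta_ih)                          # hidden activations
    Hb = np.hstack([H, np.ones((len(H), 1))])        # append hidden bias unit
    O = sigm(Hb @ theta_ho)                          # observed outputs
    return np.mean((O - Y) ** 2)                     # mse(O - Y)
```

At prediction time, the one-vs-all decision amounts to picking the class whose output unit fires with the highest confidence, e.g. `np.argmax(O, axis=1)`.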
BSA is used to optimize the weights of the neural network. The concatenation of $\Theta_{ih}$ and $\Theta_{ho}$ is given as the input to BSA, so the problem dimension becomes $(N_i+1)\times N_h + (N_h+1)\times N_o$. For each subject, the dataset has been divided into training and testing sets: three-fourths of the data are selected randomly for training and the remaining one-fourth for testing. Ten different sets of training and testing data are created and tested. The class distribution of one of the 10 datasets is shown in Table 1. Fig. 2 delineates the neural network model used for training; the bias terms b1 and b2 are used in the hidden and output layers respectively. The learning algorithm plays a decisive role in the performance of a neural network. The core of the optimization problem is the minimization of a cost function which may have multiple minima. Several learning algorithms, both derivative-based and stochastic, have been used to optimize the weights of neural networks. The performance of derivative-based approaches deteriorates as the number of minima increases. EAs give better classification accuracy because the high entropy (randomness) in the algorithm prevents stagnation. A major problem with most EAs lies in their strong sensitivity to multiple control parameters [30], but BSANN has only a single control parameter, so it performs better than other EAs. Unlike algorithms that choose the best or a random member of the current population as a search direction, BSANN chooses its historical population as the search direction. Thus, the augmentation of a single control parameter with a memory component helps BSANN achieve high classification accuracy in comparison with other EAs.
3.2. Backtracking Search optimization Algorithm for neural network (BSANN)

The Backtracking Search optimization Algorithm (BSA) is based on the concept of Evolutionary Algorithms (EA). It follows an iterative design and selects the global minimum solution from the entire population, where the population is a group of candidate solutions for the given optimization problem. BSA for neural networks (BSANN) can be described by six processes: representation, initialization, selection-1, mutation, crossover and selection-2. The general structure of BSANN is outlined in Algorithm 1. The stopping conditions of the algorithm are: 1. the maximum number of epochs is reached; 2. no change in the cost function for 100 consecutive iterations.

Algorithm 1. General structure of BSANN.
1. Representation
2. Initialization
while stopping conditions are not met do
  3. Selection-1
  4. Mutation
  5. Crossover
  6. Selection-2
end while
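As a sketch of the control flow only, the two stopping conditions can be wrapped around the six processes like this, with step() standing for one pass of Selection-1, mutation, crossover and Selection-2 (all names are illustrative):

```python
def run_bsann(step, max_epochs=30000, patience=100):
    """Iterate BSA until an epoch budget is exhausted (condition 1) or the
    cost has not changed for `patience` consecutive iterations (condition 2)."""
    best, stale = float("inf"), 0
    for _ in range(max_epochs):
        cost = step()                 # one BSA generation; returns current best cost
        if cost < best:
            best, stale = cost, 0
        else:
            stale += 1
        if stale >= patience:         # no change for 100 consecutive iterations
            break
    return best
```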
Representation: Each feature of the problem set is an input to the NN, and each connection in the NN has an associated weight. The goal of the NN is to find the optimum set of weights corresponding to the minimum error value. BSA takes a population of such weight vectors as input; a weight vector is formed by concatenating the weights of all layers. A representation of the weight vector for a three-layer architecture is given in Fig. 3.
Initialization: BSANN initializes the weights of the neural network as given in the following equation:

$$ \Theta_{Gen,w} \sim U(low_w, up_w) \qquad (4) $$
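In NumPy, Eq. (4) can be realized as below (a sketch with illustrative names; low and up may be scalars or per-weight arrays):

```python
import numpy as np

def init_population(P, n_t, low, up, rng=np.random.default_rng()):
    # Eq. (4): each of the P weight vectors has n_t entries drawn from U(low_w, up_w)
    return low + (up - low) * rng.random((P, n_t))

# e.g. 100 candidate weight vectors bounded in [-10, 10], as used in Section 4.3.2:
# theta = init_population(100, n_t, -10.0, 10.0)
```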
Fig. 1. Data acquisition of one session.
Table 1. Class distribution of samples in training and testing data.

Subject   Class   Training   Validation   Testing
1         L       2969       1000         264
          R       2020       665          368
          W       2931       943          368
2         L       3042       990          456
          R       2206       754          272
          W       2552       856          272
3         L       2589       851          392
          R       2522       886          304
          W       2605       835          304
Fig. 2. Neural network model.
Fig. 3. Representation of NN weights as a population vector for BSA.
Here, $w = 1, 2, \ldots, N_t$ and $Gen = 1, 2, \ldots, P$, where $N_t$ is the total number of weights (equal to the problem dimension), $P$ is the population size, $Gen$ indexes the individuals of the population and $U(low, up)$ is the uniform distribution that initializes the weights randomly between the lower and upper bounds.
Selection-1: This process randomly selects one of the former populations as the historical population, which helps in calculating the search direction for the newly generated population. The initial historical population is generated by the same rule as Eq. (4). At the beginning of each iteration, BSANN optionally adopts the current population as its historical population using the 'if-then' rule given in the following equation:

$$ \text{if } a < b \text{ then } old\Theta := \Theta \;\big|\; a, b \sim U(0,1) \qquad (5) $$

Here, $:=$ is the update operator. Eq. (5) ensures that BSANN can select one of its previous populations as a guiding population for the new weights; thus, BSANN has a memory. Once $old\Theta$ is determined, Eq. (6) is used to change the order of the individuals in $old\Theta$:

$$ old\Theta = \mathrm{randshuffle}(old\Theta) \qquad (6) $$
where randshuffle() is a random shuffling function.
Mutation: This process perturbs the previously generated solution and produces the mutant population using the following equation:

$$ \mathrm{Mutant} = \Theta + F \cdot (old\Theta - \Theta) \qquad (7) $$

Here, $F$ is the control parameter that governs the amplitude of the search direction $(old\Theta - \Theta)$; notably, $F$ is the only control parameter in the BSA algorithm. BSANN uses $old\Theta$ in the calculation of new offspring, so its previous generations guide the new solution set; in this way, previous experience helps the new population reach a better solution. The value of $F$ is chosen as $3 \cdot \mathrm{randn}$, as suggested in [30], where randn is a normally distributed pseudorandom number.
Crossover: Crossover is the second reproduction step used by evolutionary algorithms; new population generation is guided by the individuals of previous generations with the best fitness values. BSANN's crossover strategy consists of two steps. The first step is the generation of a binary matrix map of size $P \times N_t$, where $N_t$ is the total number of weights and $P$ is the population size. The trial population is initialized as $T := \mathrm{Mutant}$; wherever $map_{Gen,w}$ (with $w \in 1, \ldots, N_t$ and $Gen \in 1, \ldots, P$) is false, the trial individual reverts to the parent, $T_{Gen,w} := \Theta_{Gen,w}$. The crossover strategy is detailed in Algorithm 2, where ⌈⌉ denotes the ceiling function. The mixrate parameter controls the number of weights to be mutated through $\lceil \mathrm{mixrate} \cdot rnd \cdot N_t \rceil$, where $rnd \sim U(0,1)$. BSA's crossover process is much more complex than the ones used by other differential evolution (DE) algorithms. BSA fills the map using two predefined strategies: the first uses mixrate and randomly selects some weights along the dimension of length $N_t$; the second chooses only one weight per individual. One of the two strategies is picked at random to set map values to true. Due to mutation, some individuals may fall outside the search-space bounds, so BSA uses a boundary control mechanism to confine the population within the given range. As shown in Algorithm 2, the boundary control mechanism replaces an out-of-bounds weight with a randomly chosen point within the search bounds.
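The following NumPy sketch shows Selection-1, mutation and the two crossover strategies together (illustrative names; the map convention follows Algorithm 2, where true entries keep the mutant value):

```python
import numpy as np
rng = np.random.default_rng()

def bsa_trial(theta, old_theta, mixrate=1.0):
    """One round of Selection-1 (Eqs. (5)-(6)), mutation (Eq. (7)) and crossover."""
    P, n_t = theta.shape
    if rng.random() < rng.random():            # Eq. (5): sometimes adopt current population
        old_theta = theta.copy()
    old_theta = old_theta[rng.permutation(P)]  # Eq. (6): shuffle individuals
    F = 3.0 * rng.standard_normal()            # F = 3*randn, as suggested in [30]
    mutant = theta + F * (old_theta - theta)   # Eq. (7)

    mapm = np.zeros((P, n_t), dtype=bool)      # binary crossover map
    if rng.random() < rng.random():            # strategy 1: mutate mixrate-many weights
        for g in range(P):
            k = int(np.ceil(mixrate * rng.random() * n_t))
            mapm[g, rng.permutation(n_t)[:k]] = True
    else:                                      # strategy 2: mutate a single weight
        mapm[np.arange(P), rng.integers(0, n_t, P)] = True
    trial = np.where(mapm, mutant, theta)      # false entries revert to the parent
    return trial, old_theta
```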
Algorithm 2. Pseudocode of BSANN.

Require: NNCostFun, P, Nt, epoch, mixrate, low(1:Nt), up(1:Nt)
Ensure: globalminimum, globalminimizer
procedure BSANN(NNCostFun, P, Nt, epoch, low, up)
  globalminimum = inf
  for Gen = 1 to P do
    for w = 1 to Nt do
      Θ(Gen,w) = rnd · (up_w − low_w) + low_w        ▹ initialization of Θ
      oldΘ(Gen,w) = rnd · (up_w − low_w) + low_w     ▹ initialization of oldΘ
    end for
  end for
  fitnessΘ = NNCostFun(Θ)
  for iteration = 1 to epoch do
    if a < b | a, b ∈ U(0,1) then                    ▹ Selection-1
      oldΘ ← Θ
    end if
    oldΘ ← randshuffle(oldΘ)                         ▹ random position change in oldΘ
    mutant ← Θ + 3 · randn · (oldΘ − Θ)              ▹ Mutation
    map(1:P, 1:Nt) = false                           ▹ initial map is a P-by-Nt matrix
    if c < d | c, d ∈ U(0,1) then                    ▹ Crossover
      for Gen = 1 to P do
        map(Gen, u(1:⌈mixrate · rnd · Nt⌉)) = true | u = randshuffle(1, 2, ..., Nt)
      end for
    else
      for Gen = 1 to P do
        map(Gen, randi(Nt)) = true
      end for
    end if
    T ← mutant                                       ▹ generation of trial population T
    for Gen = 1 to P do
      for w = 1 to Nt do
        if map(Gen,w) = false then
          T(Gen,w) ← Θ(Gen,w)
        end if
      end for
    end for
    for Gen = 1 to P do                              ▹ boundary control mechanism
      for w = 1 to Nt do
        if T(Gen,w) < low_w or T(Gen,w) > up_w then
          T(Gen,w) ← rnd · (up_w − low_w) + low_w
        end if
      end for
    end for
    fitnessT = NNCostFun(T)                          ▹ Selection-2
    for Gen = 1 to P do
      if fitnessT(Gen) < fitnessΘ(Gen) then
        fitnessΘ(Gen) ← fitnessT(Gen)
        Θ(Gen) ← T(Gen)
      end if
    end for
    fitnessΘ(best) = min(fitnessΘ) | best ∈ 1, 2, ..., P
    if fitnessΘ(best) < globalminimum then
      globalminimum ← fitnessΘ(best)
      globalminimizer ← Θ(best)
    end if
  end for
end procedure
Selection-2: BSA's selection-2 process is based on survival of the fittest. In every iteration, a trial population T is generated from the previous population Θ, and Θ is updated by greedy selection: each individual of the two populations is compared and the fitter one is kept for the next iteration. The best weight vector of the selected population is the global minimizer, and its cost value is the global minimum. BSANN's complete structure is given in Algorithm 2. Additionally, a Python library implementation is available at http://drrajeshkumar.wordpress.com/downloads/.
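Pulling the pieces together, below is a condensed sketch of the whole optimizer, reusing bsa_trial from the sketch above; it is an illustration of Algorithm 2 under our naming, not the released library:

```python
import numpy as np
rng = np.random.default_rng()

def bsann(cost_fun, n_t, P=100, epochs=30000, low=-10.0, up=10.0, mixrate=1.0):
    """Minimize cost_fun over n_t neural-network weights with BSA."""
    theta = low + (up - low) * rng.random((P, n_t))            # Eq. (4)
    old_theta = low + (up - low) * rng.random((P, n_t))
    fitness = np.array([cost_fun(w) for w in theta])
    for _ in range(epochs):
        trial, old_theta = bsa_trial(theta, old_theta, mixrate)
        out = (trial < low) | (trial > up)                     # boundary control:
        trial[out] = low + (up - low) * rng.random(out.sum())  # resample offenders
        trial_fit = np.array([cost_fun(w) for w in trial])
        better = trial_fit < fitness                           # Selection-2: greedy replacement
        theta[better], fitness[better] = trial[better], trial_fit[better]
    best = fitness.argmin()
    return fitness[best], theta[best]                          # global minimum, minimizer

# usage with the cost sketched in Section 3.1 (96 features, 3 classes):
# glob_min, weights = bsann(lambda w: nn_cost(w, X_train, Y_train, 96, n_h, 3), n_t)
```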
4. Results and discussion

This section describes the dataset used, the BSANN algorithm used for classifying the data and its working methodology.

4.1. Data acquisition

The EEG dataset was provided by the IDIAP Research Institute in BCI Competition III [34]. The apparatus was a Biosemi system with a cap integrating 32 electrodes located at the standard positions of the international 10-20 system, sampled at 512 Hz. Three normal subjects performed four non-feedback sessions. Each subject performed one of three tasks at a time (as instructed by the operator): imagination of left-hand movements, imagination of right-hand movements and thinking of words beginning with the same random letter. The subjects sat in a normal chair with their arms relaxed and resting on their legs. A subject performed a given task for about 15 s and then switched randomly to another task at the operator's request. All four sessions for each subject were acquired on the same day, each session lasting 4 min with 5–10 min breaks between sessions. The dataset was not split into trials because the tasks were performed continuously without breaks. The raw EEG data were spatially filtered by means of a surface Laplacian [35] to ensure that the signals correspond to specific neuronal clusters before the feature extraction stage. A diagrammatic representation of the data acquisition stage is given in Fig. 1.

4.2. Feature extraction

Frequency-based features were derived from the raw time-series EEG data. Every 62.5 ms (16 times per second), the Power Spectral Density (computed by the Welch periodogram method) in the 8–30 Hz band was estimated over the last second of EEG data, with a frequency resolution of 2 Hz, for the eight channels C3, Cz, C4, CP1, CP2, P3, Pz and P4. This yields EEG samples with 96 dimensions (8 channels × 12 frequency components). These features were extracted by the data provider; no additional preprocessing was done on our side. The data from all 4 min trials were concatenated and shuffled, and then segmented into training and testing sets following a 75–25% rule; in these final sets, the L class contributed 2936 training samples and 1000 testing samples. Here, L corresponds to left-hand movement imagination, R to right-hand movement imagination and W to word generation.

4.3. BSA

This paper uses BSA to optimize the weights of a NN. BSANN is able to achieve optimum classification accuracy without any additional feature extraction on our side; we have directly used the preprocessed data provided in the BCI Competition.
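The features themselves were computed by the data providers; purely as an illustration of the procedure described in Section 4.2, a sketch with SciPy (channel list and parameters taken from the text, function name ours) could look like:

```python
import numpy as np
from scipy.signal import welch

FS = 512                                    # sampling rate (Hz)
CHANNELS = ["C3", "Cz", "C4", "CP1", "CP2", "P3", "Pz", "P4"]

def psd_features(window):
    """window: (8, 512) array holding the last 1 s of EEG for the 8 channels.
    Returns 96 features: 12 PSD values (8-30 Hz at 2 Hz resolution) per channel."""
    feats = []
    for ch in window:
        f, pxx = welch(ch, fs=FS, nperseg=256)   # nperseg=256 -> 2 Hz resolution
        band = (f >= 8) & (f <= 30)              # 8, 10, ..., 30 Hz: 12 components
        feats.append(pxx[band])
    return np.concatenate(feats)                 # shape (96,)
```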
4.3.1. Description

As discussed earlier, BSANN is a derivative of evolutionary algorithms, fitted on a neural network architecture, which reduces its error rate by using old populations during mutation. Two major factors help BSANN obtain higher classification accuracy. The first is the single control parameter (mixrate), which keeps classification performance from being over-sensitive to parameter tuning. The second is BSA's unique non-uniform crossover strategy, which is much more complex than the crossover strategies used in many genetic algorithms. Although the use of a historical population enables better exploration, it reduces BSA's convergence speed. Manual setting of BSA's upper and lower limits leads to inefficient constraint handling, which is a major shortcoming; many methods can be envisioned to set these limits, such as the use of a penalty function. In future work, we will apply BSANN to constrained optimization problems.

4.3.2. Result summary

Each result is evaluated using 4-fold cross-validation and tested on 1000 random samples of the data. Since BSANN has a single control parameter (mixrate), its performance is not overly sensitive to control parameters. The population size is chosen as 100; increasing it lengthens the running time without a substantial difference in accuracy. Mixrate was set to 1. Boundary control is applied by bounding the weight values between −10 and 10. The algorithm was run for 30,000 iterations to obtain the reported results. A comparison of the learning algorithms' accuracies is compiled in Table 5, and the confusion matrices for all three subjects are tabulated in Tables 2–4. A confusion matrix gives statistical information about how well an algorithm separates one class from another. Besides accuracy, two statistical measures have been calculated. The first is the Positive Predictive Value (PPV), which gives the fraction of the predictions of a class that are correct. Mathematically, PPV is given in the following equation:

$$ \mathrm{PPV} = \frac{\text{correctly predicted cases of a class}}{\text{total predicted cases of that class}} \qquad (8) $$

Table 2. Statistical analysis of Subject 1 (rows: actual class, columns: predicted class).

         Right    Left     Word     TPR
Right    208      27       29       0.7878
Left     42       303      23       0.8234
Word     34       41       293      0.7962
PPV      0.7324   0.8167   0.8492

Table 3. Statistical analysis of Subject 2 (rows: actual class, columns: predicted class).

         Right    Left     Word     TPR
Right    295      52       109      0.6344
Left     70       160      42       0.5882
Word     62       4        206      0.7573
PPV      0.6909   0.7407   0.5770

Table 4. Statistical analysis of Subject 3 (rows: actual class, columns: predicted class).

         Right    Left     Word     TPR
Right    304      29       59       0.7755
Left     67       119      118      0.3914
Word     78       55       171      0.5625
PPV      0.6771   0.6771   0.4913
Table 5. Comparison of subject-wise mental task classification accuracy (en dashes mark cells not recoverable from the source).

Algorithm | Feature extraction | Classification method | Subject 1 | Subject 2 | Subject 3 | Average
Ensemble methods [42] | N.A. | k-NN, C4.5, DT, SVM | 70.59 | 48.85 | 40.92 | 53.45
DLMS [43] | N.A. | Bayesian + GMM + LMS | 69.23 | 48.97 | 45.80 | 54.67
RESE [41] | Electrode selection + PCA | Bayesian + GMM | 68.75 | 56.41 | 44.82 | 56.66
GANN [37] | PCA | NN with GA | 69.32 | 60.32 | 44.4 | 58.01
SCSP [45] | SCSP | SVM with RBF kernel | 63.23 | 55.54 | 55.97 | 58.25
HHMM [36] | N.A. | HHMM | 79.05 | 61.58 | 34.40 | 58.34
WCSP [45] | WCSP | SVM with RBF kernel | 64.66 | 56.91 | 55.84 | 59.14
LS-SVM [40] | Clustering technique | LS-SVM | 68.19 | 64.77 | 52.12 | 61.69
BPNN [37] | PCA | NN with BP | 76.02 | 65.89 | 51.14 | 64.34
RDA [37] | – | RDA | 78.08 | 68.83 | 52.72 | 64.91
ACSP [45] | ACSP | SVM with RBF kernel | 67.70 | 68.10 | 59.55 | 65.12
HHCRF [36] | N.A. | HHCRF | 94.58 | 70.17 | 32.11 | 65.62
FDA [37] | – | FDA | 76.03 | 69.36 | 51.61 | 65.67
PSONN [37] | PCA | NN with PSO | 75.98 | 69.78 | 53.83 | 66.33
SVM [37] | Band Power, AAR, FD | SVM | 77.85 | 66.36 | 53.44 | 65.90
PCA-IPSONN [37] | PCA | NN with IPSONN | 78.31 | 70.27 | 56.46 | 68.35
Galan [48] | – | – | 79.60 | 70.31 | 56.02 | 68.64
BCI-competition winner [48] | – | – | 79.60 | 70.31 | 56.02 | 68.65
DBDA [39] | CVT | DBDA + MTTD | 79.60 | 70.30 | 56.02 | 65.67
TD + SVM [46] | TD | SVM | 80.8 | 74.6 | 52.2 | 69.24
HMM-BPNN [44] | M-filter bank + HMM | NN with BP | 81.52 | 73.48 | 55.72 | 70.24
Evolved Filters (SVM) [47] | Evolved filters | SVM | 79.97 | 75.11 | 57.76 | 70.95
BSANN | N.A. | NN with BSA | 80.32 | 66.03 | 59.34 | 68.56
The second measure is the True Positive Rate (TPR), which gives the fraction of the actual cases of a class that are correctly identified. Mathematically, TPR is given in the following equation:

$$ \mathrm{TPR} = \frac{\text{correctly predicted cases of a class}}{\text{actual cases of that class}} \qquad (9) $$
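Both measures follow directly from the confusion matrices. As a small helper with our naming:

```python
import numpy as np

def ppv_tpr(confusion):
    """confusion[i, j] = number of samples of actual class i predicted as class j.
    Returns per-class PPV (Eq. (8)) and TPR (Eq. (9))."""
    correct = np.diag(confusion).astype(float)
    ppv = correct / confusion.sum(axis=0)   # column totals: all predictions of a class
    tpr = correct / confusion.sum(axis=1)   # row totals: all actual cases of a class
    return ppv, tpr

# Subject 1 (Table 2), classes ordered Right, Left, Word:
c1 = np.array([[208, 27, 29],
               [42, 303, 23],
               [34, 41, 293]])
# ppv_tpr(c1) -> PPV ~ (0.7324, 0.8167, 0.8492), TPR ~ (0.7878, 0.8234, 0.7962)
```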
4.3.3. Performance comparison

Table 5 lists numerous algorithms that have been applied to this motor imagery data, based on different feature extraction and classification algorithms. It can be inferred that BSANN outperforms the other applicable methods, giving testing results competitive with the winner of the BCI competition. Table 6 gives the results obtained by 4-fold cross-validation applied to the complete data, and Table 7 shows the classification accuracy obtained using samples from the testing set. We observe high performance for all three subjects with a small standard deviation, always between 0 and 1, in cross-validation accuracy. This further establishes that BSANN is not overly sensitive to the training data while simultaneously providing high accuracy.

Table 6. Cross-validation results.

Subject     Avg. (%)   Std.
Subject 1   84.43      0.61
Subject 2   72.0       0.78
Subject 3   64.35      0.95
Table 7. Testing results.

Subject     Avg. (%)
Subject 1   80.32
Subject 2   66.03
Subject 3   59.34

5. Conclusion

This paper centres its discussion on the classification of three mental tasks: right or left hand movement imagination and word generation. An extension of BSA for neural networks, BSANN, has been proposed to optimize the classification performance of the neural network. Experimental results show classification accuracies of 80.32% for Subject 1, 66.03% for Subject 2 and 59.34% for Subject 3. The high classification accuracy obtained by BSANN can lead the way to using people's cognitive abilities to restore their communication, which can be of great importance to a person with motor disabilities. BSANN, being an evolutionary algorithm fitted on a multilayer neural network, has a relatively slow speed of convergence, which makes speed a significant bottleneck for online learning. Hence, work is currently ongoing on reorganizing the BSANN algorithm for a GPU architecture instead of a CPU architecture, which will possibly improve computation speeds by 20–100 times. Currently the code, released in Python 2.7, allows multicore CPU computation in OVA-BSANN, significantly improving training speed. Our future work will also include the implementation of this algorithm for online BCI, and the use of feature extraction methods such as PCA and Independent Component Analysis (ICA), which might give better performance.
References
[1] 〈http://www.ninds.nih.gov/disorders/motor_neuron_diseases/detail_motor_neuron_diseases.htm〉.
[2] N. Birbaumer, L.G. Cohen, Brain–computer interfaces: communication and restoration of movement in paralysis, J. Physiol. 579 (3) (2007) 621–636.
[3] A. Ubeda, E. Ianez, J.M. Azorin, J.M. Sabater, E. Fernandez, Classification method for BCIs based on the correlation of EEG maps, Neurocomputing 114 (2012) 98–106.
[4] A. Nijholt, B. Reuderink, D. Oude Bos, Turning shortcomings into challenges: brain–computer interfaces for games, in: O. Akan, P. Bellavista, Cao (Eds.), Intelligent Technologies for Interactive Entertainment, 9, Springer, Berlin, Germany, 2009, pp. 153–168.
[5] D.M. Taylor, S.I.H. Tillery, A.B. Schwartz, Direct cortical control of 3D neuroprosthetic devices, Science 296 (5574) (2002) 1829–1832.
[6] J. Wessberg, C.R. Stambaugh, J.D. Kralik, P.D. Beck, M. Laubach, J.K. Chapin, J. Kim, S.J. Biggs, M.A. Srinivasan, M.A. Nicolelis, Real-time prediction of hand trajectory by ensembles of cortical neurons in primates, Nature 408 (6810) (2000) 361–365.
[7] M.A.L. Nicolelis, Actions from thoughts, Nature 409 (2001) 403–407.
[8] T. Fitzpatrick, Teenager Moves Video Icons Just by Imagination, Washington University in St. Louis. Available at: 〈http://news.wustl.edu/news/Pages/7800.aspx〉 (accessed 21.01.14).
[9] J. Mellinger, G. Schalk, C. Braun, H. Preissl, W. Rosenstiel, N. Birbaumer, A. Kübler, An MEG-based brain–computer interface, Neuroimage 36 (3) (2007) 581–593.
[10] R. Sitaram, A. Caria, R. Veit, T. Gaber, G. Rota, A. Kuebler, N. Birbaumer, fMRI brain–computer interface: a tool for neuroscientific research and treatment, Comput. Intell. Neurosci. (2007), http://dx.doi.org/10.1155/2007/25487.
[11] S.M. Coyle, T.E. Ward, C.M. Markham, Brain–computer interface using a simplified functional near-infrared spectroscopy system, J. Neural Eng. 4 (2007) 219–226.
[12] C.W. Anderson, E.A. Stolz, S. Shamsunder, Multivariate autoregressive models for classification of spontaneous electroencephalogram during mental tasks, IEEE Trans. Biomed. Eng. 45 (3) (1998) 277–286.
[13] E.C. Leuthardt, et al., A brain computer interface using electrocorticographic signals in humans, J. Neural Eng. 1 (2004) 63–71.
[14] R. Palaniappan, P. Raveendran, S. Nishida, N. Saiwaki, A new brain–computer interface design using fuzzy ARTMAP, IEEE Trans. Neural Syst. Rehabil. Eng. 10 (September (3)) (2002) 140–148.
[15] J.R. Wolpaw, N. Birbaumer, W.J. Heetderks, D.J. McFarland, P.H. Peckham, G. Schalk, E. Donchin, L.A. Quatrano, C.J. Robinson, T.M. Vaughan, Brain–computer interface technology: a review of the first international meeting, IEEE Trans. Rehabil. Eng. 8 (June (2)) (2000) 164–173.
[16] A. Bashashati, M. Fatourechi, R. Ward, G. Birch, A survey of signal processing algorithms in brain–computer interfaces based on electrical brain signals, J. Neural Eng. 4 (2007) R32–R57.
[17] A. Kubler, K.R. Muller, Toward Brain–Computer Interfacing, MIT Press, Cambridge, MA, 2007, Chapter 1.
[18] C.M. Bishop, Improving the generalization properties of radial basis function neural networks, Neural Comput. 3 (1991) 579–588.
[19] J. del R. Millan, J. Mourino, M. Franze, F. Cincotti, M. Varsta, J. Heikkonen, F. Babiloni, A local neural classifier for the recognition of EEG patterns associated to mental tasks, IEEE Trans. Neural Netw. 13 (3) (2002) 678–686.
[20] P. Shenoy, R.P.N. Rao, Dynamic Bayesian networks for brain–computer interfaces, in: Advances in Neural Information Processing Systems 17, 2005, pp. 1265–1272.
[21] F. Lotte, M. Congedo, A. Lécuyer, F. Lamarche, B. Arnaldi, A review of classification algorithms for EEG-based brain–computer interfaces, J. Neural Eng. 4 (2007) 1–13.
[22] D. Garrett, A.P. David, C.W. Anderson, M.H. Thaut, Comparison of linear, nonlinear, and feature selection methods for EEG signal classification, IEEE Trans. Rehabil. Eng. 11 (2) (2003) 141–144.
[23] C. Guger, A. Schlögl, C. Neuper, D. Walterspacher, T. Strein, G. Pfurtscheller, Rapid prototyping of an EEG-based brain–computer interface (BCI), IEEE Trans. Rehabil. Eng. 9 (2001) 49–58.
[24] G. Pfurtscheller, J. Kalcher, C. Neuper, D. Flotzinger, M. Pregenzer, On-line EEG classification during externally-paced hand movements using a neural network-based classifier, Electroen. Clin. Neurophysiol. 99 (1996) 416–425.
[25] J.R. Millan, J. Mourino, M. Franze, F. Cincotti, M. Varsta, J. Heikkonen, F. Babiloni, A local neural classifier for the recognition of EEG patterns associated to mental tasks, IEEE Trans. Neural Netw. 13 (3) (2002).
[26] L. Zhiwei, S. Minfen, Classification of mental task EEG signals using wavelet packet entropy and SVM, in: Eighth International Conference on Electronic Measurement and Instruments (ICEMI'07), IEEE, Xian, China, 2007, pp. 3-906.
[27] K.I. Diamantaras, S.Y. Kung, Principal Component Neural Networks: Theory and Applications, Wiley, Inc., New York, 1996.
[28] R.O. Duda, P.E. Hart, D.G. Stork, Pattern Classification, Wiley, Inc., New York, 2001.
[29] C. Lin, M. Hsieh, Classification of mental task from EEG data using neural networks based on particle swarm optimization, Neurocomputing 72 (2009) 1121–1130.
[30] P. Civicioglu, Backtracking Search Optimization Algorithm for numerical optimization problems, Appl. Math. Comput. 219 (15) (2013) 8121–8144.
[31] J. Branke, Evolutionary algorithms for neural network design and training, in: Proceedings of the First Nordic Workshop on Genetic Algorithms and its Applications, 1995.
[32] P.J. Angeline, G.M. Saunders, J.B. Pollack, An evolutionary algorithm that constructs recurrent neural networks, IEEE Trans. Neural Netw. 5 (1) (1994) 54–65.
[33] J. Brest, S. Greiner, B. Boskovic, V. Zumer, Self-adapting control parameters in differential evolution: a comparative study on numerical benchmark problems, IEEE Trans. Evol. Comput. 10 (6) (2006) 646–657.
[34] J. del R. Millán, On the need for on-line learning in brain–computer interfaces, in: Proceedings of the International Joint Conference on Neural Networks, 2004.
[35] D.J. McFarland, L.M. McCane, S.V. David, J.R. Wolpaw, Spatial filter selection for EEG-based communication, Electroen. Clin. Neurophysiol. 103 (1997) 386–394.
[36] T. Sugiura, N. Goto, A. Hayashi, A discriminative model corresponding to hierarchical HMMs, in: Proceedings of the Eighth International Conference on Intelligent Data Engineering and Automated Learning, Springer-Verlag, Berlin, Germany, 2007, pp. 375–384.
[37] C.-J. Lin, M.-H. Hsieh, Classification of mental task from EEG data using neural networks based on particle swarm optimization, Neurocomputing 72 (4–6) (2009) 1121–1130.
[38] J.F. Delgado Saa, M. Cetin, Discriminative methods for classification of asynchronous imaginary motor tasks from EEG data, IEEE Trans. Neural Syst. Rehabil. Eng. 21 (September (5)) (2013) 716–724.
[39] F. Galán, F. Oliva, J. Guardia, Using mental tasks transitions detection to improve spontaneous mental activity classification, Med. Biol. Eng. Comput. 45 (6) (2007) 603–609.
[40] Y. Li, P.P. Wen, Clustering technique-based least square support vector machine for EEG signal classification, Comput. Methods Prog. Biomed. 104 (3) (2011) 358–372.
[41] S. Sun, C. Zhang, Y. Lu, The random electrode selection ensemble for EEG signal classification, Pattern Recognit. 41 (2008) 1663–1675.
[42] S. Sun, C. Zhang, D. Zhang, An experimental evaluation of ensemble methods for EEG signal classification, Pattern Recognit. Lett. 28 (2007) 2157–2163.
[43] S. Sun, C. Zhang, Learning on-line classification via decorrelated LMS algorithm: application to brain computer interfaces, in: A. Hoffman, H. Motoda, T. Scheffer (Eds.), DS 2005, Lecture Notes in Artificial Intelligence, vol. 3735, 2005, pp. 215–226.
[44] S. Nasehi, H. Pourghassem, Mental task classification based on HMM and BPNN, in: 2013 International Conference on Communication Systems and Network Technologies (CSNT), IEEE, Gwalior, India, 2013, pp. 210–214.
[45] S. Sun, C. Zhang, Adaptive feature extraction for EEG signal classification, Med. Biol. Eng. Comput. 44 (10) (2006) 931–935.
[46] R. Aler, I. Galván, J. Valls, Transition detection for brain computer interface classification, presented at the International Joint Conference on Biomedical Engineering Systems and Technologies (BIOSTEC), Porto, Portugal, 2009.
[47] R. Aler, I. Galván, J. Valls, Evolving spatial and frequency selection filters for brain–computer interfaces, in: 2010 IEEE World Congress on Computational Intelligence (WCCI 2010), Barcelona, Spain, 2010, pp. 1–7.
[48] BCI Competition III Final Results, 2005. Online, available: 〈http://www.bbci.de/competition/iii/results〉.
Saurabh Kumar Agarwal received the B.Tech. degree in Computer Science and Engineering from Malaviya National Institute of Technology in 2014. He is currently working as a research engineer at C-DOT, Delhi. His research interests include neural networks, optimization algorithms and brain computer interfaces. He has published a number of papers in international journals and conferences.
Saatvik Shah is currently pursuing his B.Tech. in Computer Engineering from Malaviya National Institute of Technology, Jaipur, India. His research interests include neural networks, evolutionary algorithms, brain machine interfaces and big data analytics.
Rajesh Kumar received his B.Tech. degree from the National Institute of Technology (NIT), Kurukshetra, India, in 1994, M.E. from Malaviya National Institute of Technology (MNIT), Jaipur, India, in 1997 and Ph.D. degree from the University of Rajasthan, India, in 2005. Since 1995, he has been a Faculty Member in the Department of Electrical Engineering, MNIT, Jaipur, where he is serving as an Associate Professor. He was a Postdoctoral Research Fellow in the Department of Electrical and Computer Engineering at the National University of Singapore (NUS), Singapore, from 2009 to 2011. His fields of interest include the theory and practice of machine learning, intelligent systems, evolutionary algorithms, bio and nature inspired algorithms, fuzzy and neural methodologies, and applications of AI to image processing and bioinformatics. Dr. Kumar is a Senior Member IEEE, Member IE (INDIA), Fellow Member IETE, Senior Member IAENG and Life Member ISTE.