Pattern recognition for electroencephalographic signals based on continuous neural networks




Accepted Manuscript
M. Alfaro-Ponce, A. Argüelles, I. Chairez
PII: S0893-6080(16)00045-9
DOI: http://dx.doi.org/10.1016/j.neunet.2016.03.004
Reference: NN 3601

To appear in: Neural Networks

Received date: 13 November 2015
Revised date: 9 March 2016
Accepted date: 11 March 2016

Please cite this article as: Alfaro-Ponce, M., Argüelles, A., & Chairez, I. Pattern recognition for electroencephalographic signals based on continuous neural networks. Neural Networks (2016), http://dx.doi.org/10.1016/j.neunet.2016.03.004

This is a PDF file of an unedited manuscript that has been accepted for publication. As a service to our customers we are providing this early version of the manuscript. The manuscript will undergo copyediting, typesetting, and review of the resulting proof before it is published in its final form. Please note that during the production process errors may be discovered which could affect the content, and all legal disclaimers that apply to the journal pertain.

Pattern recognition for electroencephalographic signals based on continuous neural networks

M. Alfaro-Ponce^a, A. Argüelles^b, and I. Chairez^c

^a Escuela Superior de Tizayuca, Universidad Autonoma del Estado de Hidalgo, Tizayuca, Hidalgo, Mexico
^b Centro de Investigacion en Computacion, Instituto Politecnico Nacional, Mexico City, Mexico
^c Unidad Profesional Interdisciplinaria de Biotecnologia, Instituto Politecnico Nacional, Mexico City, Mexico

Abstract

This study reports the design and implementation of a pattern recognition algorithm to classify electroencephalographic (EEG) signals based on artificial neural networks (NN) described by ordinary differential equations (ODEs). The training method for this kind of continuous NN (CNN) was developed according to Lyapunov stability theory. A parallel structure with fixed weights was proposed to perform the classification stage. The pattern recognition efficiency was validated by two methods: generalization-regularization and k-fold cross validation (k = 5). The classifier was applied to two different databases. The first was made up of signals collected from patients suffering from epilepsy and is divided into five different classes. The second database was made up of 90 single EEG trials, divided into three classes, each corresponding to a different visual evoked potential. The pattern recognition algorithm achieved a maximum correct classification percentage of 97.2% using the information of the entire database. This value is similar to some results previously reported for this database; however, those results were obtained when only two classes were considered for testing, whereas the result reported here used the whole set of signals (five different classes). In comparison with similar pattern recognition methods, even those considering fewer classes, the proposed CNN achieved the same or better correct classification results.

Keywords: Signal classifier, Continuous Neural Networks, Electroencephalographic signals, Pattern recognition

1. Introduction

Automatic detection and classification of EEG recordings have become important fields of research for the development of brain-computer interfaces, security, interactive games and medical diagnosis systems (Bashashati et al., 2007; Birbaumer, 2006; Goel et al., 1996; Hwang et al., 2013). As a result, many algorithms for EEG classification have been proposed, but most are limited by the fact that they can classify just one characteristic of the many that are codified in EEG signals (Nicolas-Alonso and Gomez-Gil, 2012). Although different methods have been proposed to classify patterns that appear in EEG signals (for example, spindles and K-complexes for sleep staging, or epileptiform patterns for epilepsy analysis), all of them have in common an automatic system that uses characteristics extracted from the EEG recording. In general, these pattern recognition algorithms are based on different machine learning techniques such as autoregressive modeling (Ge et al., 2007), Markov chains (Boussemarta and Cummings, 2011), self-organizing maps (Allinson and Yin, 1999), fuzzy c-means clustering techniques (Roy et al., 2014), neural networks (Kannathal et al., 2007) and coefficients of the wavelet transform (Chang et al., 2014), among others. Most of these algorithms implement different pre-treatment algorithms to construct the pattern vector that is to be evaluated by the classifier. This strategy can omit certain characteristics of the information that may be relevant to the characterization of the signal.

Preprint submitted to Neural Networks, March 30, 2016

This omission can be a consequence of the preliminary manipulation of EEG signals before they are evaluated by the classifier (Riaz et al., 2015). Moreover, the continuous nature of the EEG signal is left out of the classifier structure (Akareddy and Kulkarni, 2013). Although several pattern recognition schemes have been applied over the years to EEG signals, static NN (SNN) based pattern recognition classifiers (Weng and Khorasani, 1996) have gained significant prominence with respect to other alternatives (Gotman and Wang, 1991; Lu et al., 2004). SNN have been successfully employed to determine complex, nonlinear, multidimensional mathematical relationships between noisy, uncertain sets of data of considerably dissimilar natures (Dunne, 2006). Nowadays, NN based pattern recognition solutions are frequently applied to classify EEG signals, especially in the domains of function approximation (Alotaiby et al., 2014), pattern recognition (Kasabova et al., 2013), automated medical diagnostic systems (Amato et al., 2013), decision support systems (Ubeyli, 2009), time series prediction (Coyle et al., 2005), signal processing (Sanei and Chambers, 2013), image processing (Haykin, 1999; Bose and Liang, 1996; Hassoun, 1995), wavelets (Chen, 2014) and others. The success of NN in pattern recognition is a consequence of their capability to approximate nonlinear relationships between input and output pairs (Cybenko, 1989). Therefore, the method selected to adjust the weights in the NN structure plays a key role in achieving high efficiency in the classification task. Approximation problems can be solved by employing either supervised learning (Nait-Ali, 2009), where the weights and biases of the SNN are learned in the presence of a training data set, or unsupervised learning, where inputs are classified into different clusters in a multidimensional space, in the absence of targets' training data.
Today, supervised learning remains the preferred option when pattern recognition algorithms are employed. However, most of these algorithms operate on patterns formed by vectors of characteristics. These vectors are obtained as the solution of preliminary signal processing algorithms that usually do not take into account the continuous nature of the EEG signal. Therefore, relevant information can be lost during this preliminary process, even though it simplifies the pattern recognition algorithm. Regarding EEG signal pattern classification, several types of SNNs have been proposed (Cheng-Jian and Ming-Hua, 2009). Most of these solutions employed a supervised learning mode, which implies high levels of computational effort. Additionally, the complexity and necessity of performing preliminary treatment demand an extra processing effort that may compromise the application of the pattern classifier as an on-line pattern recognition solution (Omerhodzic et al., 2013).

This study proposes an alternative method to solve the EEG signal pattern recognition problem. A class of CNN (Poznyak et al., 2001) is used to represent the relationship between the EEG signal and its particular pattern class, represented by a sigmoid type of function. The CNN concept is defined by the approximation provided by NN to the right-hand side of ODEs. The CNN structure preserves the highly parallel structure that characterizes many of the usual pattern recognition forms. By virtue of its parallel distribution, the proposed CNN is tolerant to faults and external noises, able to generalize input-output relationships well and capable of solving nonlinear approximation problems (Benvenuto and Piazza, 1992). The pattern recognition method proposed in this study was applied to the signals contained in the database taken from (NA, 2012).
The classification results obtained in this study were compared to those obtained by other researchers (Nigam, 2004; Guo et al., 2009; Tzallas et al., 2007; Guo et al., 2010), who applied signal classifiers based on SNN.

This article is organized as follows: Section 2 describes the mathematical structure of the CNN as a classifier and its parallel implementation for the classification task, as well as the training and validation processes. Section 3 details the CNN classifier structure and its training laws. Section 4 describes the two databases used to evaluate the classification performance achieved by the algorithm proposed here. Section 5 details the simulation results obtained from the implementation of the CNN on the two databases. Section 6 closes the article with some conclusions and discussion.

2. CNN EEG pattern classification

There is a general method that must be applied, including the stages of training, validation and testing (regardless of the class of NN used to perform the signal classification). The first stage of EEG signal classification requires defining

a set of targets associated with the specific class of EEG. Therefore, if the EEG signal is considered as input number j in class i, u^{i,j}, to the NN, then the output, namely x^i, corresponds to the specific class to which the signal belongs among the L available classes. Then, the state x^i corresponds to the concept of target. For this study, this target was represented by a sigmoid function described by

x_l(v) = a_l / (1 + e^{-cv})   (1)

where the variable x_l represents the target that belongs to class l (l = 1, ..., L). The positive constant a_l was modified according to the class to which the particular EEG signal belongs. These constants served to modify the amplitude of the sigmoid function and thereby to characterize each class. The positive constant c was chosen to regulate the slope of the sigmoid function. One may notice that different functions could be selected to define the characteristic of a class, but according to Cybenko's seminal paper (Cybenko, 1989), this selection (a sigmoid function) seems the most natural.

The training process consisted of comparing the output of the NN with the target x_l(v) when both are affected by the same EEG signal. The training process executed the evaluation of the NN with a percentage of all EEG signals u_r^l(v), where u_r^l(v) represents the r-th signal in class l (r ∈ [1, N_l], Σ_{l=1}^{L} N_l = N, with N the number of signals in the entire database). Then, when the EEG signal u_{r+1}^l(v) is processed, the set of weights W^{*,l} produced by the previous training step is used as part of the NN in this training stage. Once the whole set of N signals selected to perform the training process has been used, L different sets of weights W_{N_l}^{*,l} have been produced. If the training process has been correctly executed, these weights are recovered as part of a set of L non-adjustable NN with the same structure as the one used during training. This part of the process is named the validation stage. Based on the well-known generalization-regularization and k-fold cross validation methods, a percentage of the whole set of EEG signals u_r^l(v) is used to evaluate the output of the set of L NN with the corresponding sets of weights W_{1,N_l}^{*,l} and W_{2,N_l}^{*,l}. In this part of the validation, all L NN are evaluated in parallel. The output of each NN, denoted NN^l, is compared with the corresponding value a_l.
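To make (1) concrete, the class targets can be generated as in the following sketch; the specific amplitudes a_l and slope c used below are illustrative assumptions, not the values used in the paper.

```python
import numpy as np

def class_target(v, a_l, c):
    """Sigmoid target x_l(v) = a_l / (1 + exp(-c * v)) from Eq. (1)."""
    return a_l / (1.0 + np.exp(-c * v))

# Illustrative settings: L = 5 classes distinguished by their amplitudes a_l,
# with a common slope c (these numbers are assumptions, not from the paper).
amplitudes = [20.0, 40.0, 60.0, 80.0, 100.0]
c = 0.05

v = np.linspace(0.0, 200.0, 1000)
targets = {l + 1: class_target(v, a_l, c) for l, a_l in enumerate(amplitudes)}
```

Each target saturates near its own amplitude a_l, which is what lets the L parallel networks be told apart during validation.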
The mean square error a_l − x_r^l is calculated over the period of time corresponding to the length of the EEG signal, that is

J^{T,l} = T^{-1} \int_{t=0}^{T} ( a_l − x_r^l(t) )^2 dt
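In discrete time, with uniformly sampled signals, the normalized integral J^{T,l} reduces to a mean of squared errors, and the class decision picks the parallel network whose output best matches its own target amplitude. A minimal sketch (function and variable names are illustrative):

```python
import numpy as np

def class_mse(a_l, x_lr):
    """Discrete approximation of J^{T,l}: with a uniform sampling step the
    normalized integral of the squared error reduces to a sample mean."""
    return np.mean((a_l - np.asarray(x_lr, dtype=float)) ** 2)

def classify(outputs, amplitudes):
    """Pick the class whose fixed-weight network output best matches its target
    amplitude a_l; returns the 1-based index of the winning class."""
    errors = [class_mse(a_l, x) for a_l, x in zip(amplitudes, outputs)]
    return int(np.argmin(errors)) + 1

# Toy example: three parallel networks with amplitudes 1.0, 2.0 and 3.0 all
# produce an output hovering around 2.9, so class 3 wins.
amplitudes = [1.0, 2.0, 3.0]
outputs = [np.full(100, 2.9)] * 3
print(classify(outputs, amplitudes))  # -> 3
```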

One must notice that the length of all the testing signals was kept constant. The selection of T should be done according to the nature of the signal. This is still a matter of interest, and many studies have been proposed during the last 50 years. In this particular case, the window size was selected in agreement with the results presented in (Kuncheva and Zliobaite, 2009). The minimum value of this set of least mean square (LMS) errors was the indicator of the class to which the EEG signal tested at that moment belongs. The validation stage considers that all EEG signals used in this part of the analysis were previously used in the training stage. However, during the testing stage, a set of signals that have never been presented to the classifier is considered. The effect of noises and artifacts on classification results can be neglected thanks to the averaging process developed as part of the training process.

In summary, the classifier structure proposed for this work follows the structure presented in Fig. 1. The first stage is the training process; here the characteristic weights of the CNN are developed for each class. In the next stage, a parallel structure is built; this structure is made up of CNNs with their corresponding fixed weights for each class. This parallel structure uses an EEG signal as input. The signal is evaluated in parallel by the structure and, for each CNN in the parallel structure, the LMS error is obtained. The CNN with the smallest error determines the selected class.

3. The CNN classifier structure

As mentioned before, CNN structures are employed for the development of the classifier. CNN have emerged as a powerful tool to extend the classification capabilities of NN as a class of adaptive systems. To the authors' knowledge,

Figure 1: Structure of the pattern classifier (training/validation). This figure is divided into two main sections. The first one is the training, where the EEG signal fed to the CNNs is named u^{i,j}; after running all the training sets per class, the CNNs fix their weights. The second part is the validation. In this paper, two validation techniques were employed: k-fold cross validation and the generalization-regularization method. Both methods use the LMS error of the parallel CNN structure to determine the class of the signal.

CNN have never been used as EEG pattern classifiers.

3.1. The concept of signal classifier as a differential algorithm

The signal classifier information obtained from the EEG signals collected during medical examinations can be interpreted as an absolutely continuous signal, namely x. Therefore, it is possible to represent the brain classification response as the solution of an ODE as follows:

d/dt x^i(t) = f^i( x^i(t), u^{i,j}(t) ) + ξ( x^i(t), t )   (2)

Here f^i(·,·) represents the specific pattern classifier response enforced by an input EEG signal u^{i,j} ∈ R^m that belongs to class i. The term ξ( x^i(t), t ) symbolizes the external perturbations and the uncertainties produced by the EEG signals measured at the same time as the input is presented as a stimulus to the pattern classifier method.

3.2. Neural network approximation of the continuous signal classifier

The EEG signals are assumed to be continuous with respect to time. In this work the signals are represented as u^{i,j} ∈ R^m. As a result, the classifier can be represented as

d/dt x^i(t) = A x^i(t) + W_1^{0,i} Ψ_1( x^i(t) ) + W_2^{0,i} Ψ_2( x^i(t) ) u^{i,j}(t) + f̃^{i,j}( x^i(t), u^{i,j}(t) ) + ξ( x^i(t), t ),   x^i(0) fixed and bounded   (3)

where A ∈ R^{n×n} and W_1^{0,i} ∈ R^{n×l_1}, W_2^{0,i} ∈ R^{n×l_2} are the constant matrices that represent the relationship between the EEG signals u^{i,j} corresponding to the i-th class. The vector x^i ∈ R^n defines a specific target trajectory that corresponds to the specific i-th class to which the particular EEG signal u^{i,j} belongs. Notice that, regardless of the specific EEG signal in class i, all of them must produce the same set of specific NN parameters (weights). The functions Ψ_1 : R^n → R^{l_1} and Ψ_2 : R^n → R^{l_2 × m} are the sets of activation functions used to generate the CNN structure. The approximation error of the EEG signal is defined by f̃^{i,j}( x^i, u^{i,j} ) : R^{n+m} → R^n; this error is associated with the finite number of activation functions used in the network structure. The term ξ( x^i(t), t ) represents the effect of uncertainties that can affect the correct classification of signal u^{i,j} into class x^i. The constant matrices W_1^{0,i}, W_2^{0,i} are the CNN weights in (3) that characterize the relationship between the class and the particular EEG signal. These matrices are assumed to be bounded in the following manner:

W_p^{0,i} ( Λ_p^i )^{-1} [ W_p^{0,i} ]^⊤ ≤ W_p^{+,i},   p = 1, 2,   Λ_p^i = [ Λ_p^i ]^⊤ ∈ R^{n×n},   Λ_p^i > 0

According to the supervised training method, all the EEG signals belonging to a specific class i should be presented to the so-called time varying classifier. Therefore, the training process consists of designing an approximation algorithm that can obtain the same dynamic response between the input u^{i,j} and the class x^i. This approximation satisfies the structure given by

d/dt x̂^i(t) = A x̂^i(t) + W_1^i(t) Ψ_1( x̂^i(t) ) + W_2^i(t) Ψ_2( x̂^i(t) ) u^{i,j}(t),   x̂^i(0) fixed and bounded   (4)

where W_1^i ∈ R^{n×l_1}, W_2^i ∈ R^{n×l_2} are the parameters to be adjusted in order to achieve the same relationship between u^{i,j} and x^i. The introduction of this classifier has been proposed to define a well-posed approximation problem based on NN. One may notice that this condition is mandatory in order to obtain a feasible and stable training algorithm. The class obtained by the classifier is defined by the vector x̂^i ∈ R^n. The EEG signal pattern recognition algorithm proposed in this work is based on a set of parallel CNNs with the same structure but with different W_1^i and W_2^i. These sets of weights are obtained after the application of the training method. Based on the supervised learning scheme, L sets of weights are produced. From the results developed in (Poznyak et al., 2001), the following set of matrix differential equations corresponds to the so-called weight updating (learning) laws:

d/dt W_1^i(t) = −k_1^i P^i Δ^i(t) Ψ_1^⊤( x̂^i(t) ),   W_1^i(0) = W_1^{i−1}(T)
d/dt W_2^i(t) = −k_2^i P^i Δ^i(t) [ u^{i,j}(t) ]^⊤ Ψ_2^⊤( x̂^i(t) ),   W_2^i(0) = W_2^{i−1}(T)   (5)

where W_1^{i−1}(T) and W_2^{i−1}(T) are the final weights obtained when the input u^{i,j−1} was used in the training stage. The identification errors used during the training procedure are defined by the time varying vectors Δ^i ∈ R^n, Δ^i = x − x̂^i. The constant parameters k_p^i, p = 1, 2, are used to adjust the learning rates. The matrices P^i are positive definite solutions of the Riccati equations defined by

P^i A + A^⊤ P^i + P^i R^i P^i + Q^i = 0
R^i = W_1^{+,i} + W_2^{+,i} + Λ_a^i + Λ_b^i
Q^i = ( f̄_1^i + λ_max( 2Λ_b^i ) l_1 ) I_{n×n} + Q_1^0

where Λ_a^i, Λ_b^i ∈ R^{n×n} are appropriate positive definite matrices too. The positive constant f̄_1^i is used to characterize the effect of modeling errors as well as the presence of perturbations in the following sense:

‖ f̃^{i,j}( x^i(t), u^{i,j}(t) ) + ξ( x^i(t), t ) ‖^2 ≤ f̄_1^i

which is valid ∀ t ≥ 0.
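The coupled dynamics (4)-(5) can be simulated with an explicit Euler scheme. The sketch below uses the scalar parameters reported later in Section 5 (A = −2.6, W_1(0) = 2.0, W_2(0) = 7.5, n = m = 1); the activation functions, the gains k_p and P, the step size and the input signal are all placeholder assumptions, not the paper's settings.

```python
import numpy as np

def train_cnn_euler(x_target, u, A, P, k1, k2, W1_0, W2_0, dt):
    """Euler integration of the CNN state (4) and the learning laws (5),
    scalar case (n = m = l1 = l2 = 1) with assumed sigmoid activations."""
    psi = lambda x: 1.0 / (1.0 + np.exp(-x))   # placeholder for Psi_1, Psi_2
    x_hat, W1, W2 = 0.0, W1_0, W2_0
    traj = np.empty(len(u))
    for t in range(len(u)):
        delta = x_target[t] - x_hat                  # identification error
        dW1 = -k1 * P * delta * psi(x_hat)           # learning law (5)
        dW2 = -k2 * P * delta * u[t] * psi(x_hat)
        dx = A * x_hat + W1 * psi(x_hat) + W2 * psi(x_hat) * u[t]   # state (4)
        x_hat, W1, W2 = x_hat + dt * dx, W1 + dt * dW1, W2 + dt * dW2
        traj[t] = x_hat
    return traj, W1, W2

# Illustrative run: sigmoid target in the style of Eq. (1) and a synthetic input.
rng = np.random.default_rng(0)
u = rng.standard_normal(1000)
x_target = 5.0 / (1.0 + np.exp(-0.05 * np.arange(1000)))
traj, W1_f, W2_f = train_cnn_euler(x_target, u, A=-2.6, P=1.0, k1=0.01,
                                   k2=0.01, W1_0=2.0, W2_0=7.5, dt=1e-3)
```

This is only a numerical illustration of how the state and weight ODEs are coupled; the stability guarantees quoted in the remarks below depend on the Riccati-based choice of P, which this sketch does not solve for.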

Remark 1. The set of CNN used to perform the on-line signal classification satisfies the structure of the nonparametric pattern classifier proposed in (Poznyak et al., 2001). Therefore, the set of training algorithms represented by (5) can be obtained by a Lyapunov based stability analysis using the following Lyapunov functions:

V^i( Δ^i, W̃_1^i, W̃_2^i ) = [ Δ^i ]^⊤ P^i Δ^i + Σ_{p=1}^{2} k_p^i tr( W̃_p^i [ W̃_p^i ]^⊤ ),   W̃_p^i = W_p^{0,i} − W_p^i   (6)

A relatively standard method leads to proving that

lim sup_{t→∞} ‖ Δ^i(t) ‖^2 ≤ α^i

where α^i is a positive scalar that is proportional to the power of noises and uncertainties and to the inverse of the number of neurons in the network. Therefore, this method leads to a formal characterization of the training solution and proves that, under some conditions and under the selected training method, the approximation (4) can represent exactly the relationship between the EEG signal u^{i,j} and its class i.

Remark 2. If the method to adjust the weights is modified to

d/dt W_1^i(t) = −k_1^i P^i Δ^i(t) Ψ_1^⊤( x̂^i(t) ) − δ_1^i ( W_1^i − W_1^i(0) ),   W_1^i(0) = W_1^{i−1}(T)
d/dt W_2^i(t) = −k_2^i P^i Δ^i(t) [ u^{i,j}(t) ]^⊤ Ψ_2^⊤( x̂^i(t) ) − δ_2^i ( W_2^i − W_2^i(0) ),   W_2^i(0) = W_2^{i−1}(T)

with δ_1^i and δ_2^i two positive scalar constants, then it can be proven that

lim sup_{t→∞} ‖ Δ^i(t) ‖^2 ≤ α^i
lim sup_{t→∞} tr( W̃_p^i [ W̃_p^i ]^⊤ ) ≤ α^i / k_p^i

To prove this argument, the same Lyapunov candidate function presented in (6) can be used. The proof can be completed according to the results presented in (Chairez, 2009).

3.3. Validation

The validation process for the CNN was divided into two stages: the first used the so-called generalization technique (Urolagin et al., 2012) and the second used the k-fold cross validation method (k = 5). In NNs, the generalization property is the ability of the NN to handle unseen patterns (Baum and Haussler, 1989). For the generalization study, the full database of EEG trials was split into two parts: 60% for training the NNs and 40% for testing or validating them. This validation method was used in agreement with the usual manner of characterizing the efficiency of a pattern classifier based on NN. However, a second, more demanding validation technique was also employed: the k-fold cross-validation (k-cv) scheme. In order to validate the capacity of the CNN to classify the EEG signals, the regular k-cv method was implemented. Since the k-cv process is not an exhaustive cross-validation method, the data was split into training and validation samples with a predefined ratio (80/20). Validation samples were used to estimate the prediction error. In the k-cv method, a set of samples S_n is uniformly and randomly partitioned into k folds of similar sizes, ρ = {ρ_1, ..., ρ_k}. Let T_i = S_n \ ρ_i be the complement data set of ρ_i. Then, all the samples belonging to T_i are used as training data. This cross-validation procedure is repeated k times (the number of folds), with each of the k sub-samples used exactly once as the validation data. The complete set of k results obtained from the folds is averaged to produce a single estimation. Indeed, the cross-validation algorithm G(·) induces a classifier from T_i, defined by ψ_i = G(T_i), for which the prediction error based on ρ_i is estimated.
The general prediction error ε̂_k(S_n, ρ) based on the classifier ψ = G(S_n) that uses the entire set of samples is then estimated as follows:

ε̂_k(S_n, ρ) = (1/n) Σ_{i=1}^{k} Σ_{(x,c) ∈ ρ_i} 1( c, ψ_i(x) )

where n is a multiple of k and 1(i, j) = 1 if i ≠ j and zero otherwise. This error is the average of the errors committed by the ψ_i on their corresponding partitions ρ_i. In particular, the so-called 5-fold cross validation was used to perform the validation procedure of the signal classifier proposed in this study.

An additional general validation measure was computed using a modified version of the Signal to Noise plus Distortion Ratio (SNDR) (Chen, 2014). This measure was obtained with the following mathematical structure for each class:

SNDR = 20 log( [ J(t = nT) ]^{-1} \int_{t=(n−1)T}^{nT} ‖ x^j(t) ‖^2 dt )   (7)

4. Databases evaluated by the classifier

4.1. Database I

Database I was taken from (NA, 2012). This data collection contains 500 EEG recordings. The EEG data were acquired using a Neurofile NT digital video EEG system with 128 channels, a 256 Hz sampling rate, and a 16 bit analogue-to-digital converter. Notch or band pass filters were not applied during the recording of the signals, so each signal is considered raw. The database is divided into five classes, each containing 100 samples. Sets Z and O are signals recorded from the EEG surface with volunteers relaxed and awake, with eyes open and closed respectively. Set N was taken from the hippocampal formation of the opposite hemisphere of the brain, set F was recorded from the epileptogenic zone, while set S only contains seizure activity (Polat and Günes, 2007). Unfortunately, Database I has been discontinued and is no longer available for download, as it has been superseded by the new European Epilepsy Database.

4.2. Database II

Database II was made up of 3 classes; each class contains 30 EEG recordings that were acquired with an EMOTIV® device (Inc., 2014). Each class represents the visual evoked potentials (VEP) generated by stimulation with one of the three patterns represented in Figure 3. Even though the EMOTIV® device provides an interface that allows the recording of VEP data, it was not suitable for performing the visual stimulation tests. The main problem with the EMOTIV software is that the visual stimulation cannot be synchronized with the acquisition process, so it was not possible to start the stimulation and the recording of the EEG signals at the same time. In order to fulfill this requirement, independent software was developed to control the recording of the EEG data and the visual stimulation. This software used the libraries that control the EMOTIV drivers. Based on the functions included in the aforementioned library, the user is able to control acquisition parameters such as: recording time, start of stimulation, quality of the EEG electrode contact and the file format of the recorded signals. So far, a total of six volunteers have helped in developing this database; each volunteer produced a set of five trials per class. The volunteers were healthy adults between 25 and 35 years of age. Subjects with regular intelligence and without mental disorders were included in this study. All data acquisition experiments were performed in agreement with medical ethical standards. The subjects were not sleep-deprived. During the week before the experiment, they had no deviations from their usual circadian cycle, and they did not take any medicine or alcohol. All recordings were band pass filtered (fifth order) with a bandwidth from 0.2 to 45 Hz. Finally, a total of 90 records integrated the database. A set of 14 parallel acquisition channels was obtained during each trial with a sampling frequency of 256 Hz. These channels correspond to the AF3, F7, F3, FC5, T7, P7, O1, O2, P8, T8, FC6, F4, F8 and AF4 positions in the international 10/20 EEG acquisition system. Each signal was digitized with a 14 bit analog-to-digital converter. Each sequence of visual stimuli had a duration of 2 minutes.


Figure 2: Examples of the signals taken from NA (2012), one panel per class (amplitude against simulation time k). Consistent with the description in Section 4.1, classes Z and O correspond to surface EEG recordings (eyes open and closed, respectively), class N to intracranial data taken from the hippocampal formation of the opposite hemisphere, class F to intracranial data taken from the epileptogenic zone, and class S to intracranial data taken during seizure episodes.

5. Simulation results

The results presented in this section are divided as follows: the first part describes the results achieved when the classifier was applied to the information contained in database I; the second part shows the classification results when database II was considered. Two databases are presented here to show the versatility of the pattern recognition algorithm proposed in this study. Each section contains the training and the validation results of the pattern recognition. During training, the input (EEG signal) and desired (class) data were repeatedly presented to the CNN (Nigam, 2004).

5.1. Pattern classification results for database I

5.1.1. Training process

All the numerical experiments were evaluated on a dual Xeon E5-2637v3 3.5 GHz (four-core, 15 MB cache, 9.60 GT/s QPI) workstation with 192 GB DDR4-2133 REG ECC memory (12 x 16 GB DIMMs) and 2 x NVIDIA Quadro K2200 4 GB. The classifier proposed in this part of the work was evaluated with the following parameters:

A = −2.6,   W_1^1(0) = 2.0,   W_2^1(0) = 7.5,   m = n = 1

Figure 3: Stimulation patterns employed for database II; each stimulation pattern has a duration of 40 s.

To date, no formal method exists for selecting these parameters differently. Notice that the dynamic nature of the CNN classifier reduces the necessity of having hidden layers in its structure. However, the inclusion of hidden layers in the classifier structure is still a matter for further investigation. Notice also that, even though the dimension of the EEG signals is 1, the number of components of both W_1^i and W_2^i can be adjusted freely. In this part of the document, only the case where the number of components was fixed to 1 is presented, which seems to be the simplest form of the classifier that can be tested. This selection was also made in light of the well-known problem of over-fitting. This problem occurs when the training error in each trial is driven to zero or to a very small value but, during the validation process, the error remains large enough to yield low percentages of successful classification results. This condition occurs when the network has memorized the training datasets but has not acquired the ability to generalize the relationship between input and output.

Figure 4 shows both W_1^i and W_2^i resulting from the training process (upper left-hand side and upper right-hand side). Notice the separation between the weights of the five different classes in W_2^i. As mentioned before, the weights contain the information regarding the specific characteristics of each class. Therefore, the observed separation between them suggests that the classifier may work adequately (achieving high percentages of correct classifications). Each numerical evaluation in the training process took 7 minutes on average.

Figure 4 also shows the approximation performance of the CNN when training on a specific EEG signal taken from class five. The convergence between both signals is an indirect demonstration of the training efficiency generated by the learning laws proposed for the pattern classifier based on CNN. To evaluate the training quality, the integral of the LMS error was also calculated when each signal was evaluated in the training procedure. This error served to evaluate the degree of over-fitting during the training process.
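As a rough sketch of this training-quality measure, the running integral of the squared error can be accumulated as follows (the signals below are synthetic and the names illustrative):

```python
import numpy as np

def lms_error_integral(x_target, x_hat, dt):
    """Running integral of the squared approximation error. A curve that keeps
    growing signals a persistent approximation error, while a flattening tail
    suggests the CNN has converged on this training signal."""
    err2 = (np.asarray(x_target, dtype=float) - np.asarray(x_hat, dtype=float)) ** 2
    return dt * np.cumsum(err2)

# Toy check: an estimate that converges to its target yields a flattening integral.
t = np.arange(0.0, 10.0, 0.01)
J = lms_error_integral(np.ones_like(t), 1.0 - np.exp(-t), dt=0.01)
```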

Figure 4: Top: the weights W_1^i and W_2^i for database I obtained from the generalization training. Bottom left: approximation performance of the CNN (red line: the target trajectory; black line: the CNN approximation). Bottom right: integral of the LMS error obtained from the bottom left figure.

The SNDR evaluation of the classifier performance was 324.24 dB for class F, 344.81 dB for class N, 348.35 dB for class O, 354.70 dB for class S and 332.71 dB for class Z. These values are considered acceptable when compared with similar evaluations of classifiers based on artificial NN.

5.1.2. Validation process
Validation results for database I are divided into two studies: the first uses the well-known training-generalization-testing method; the second uses k-fold cross validation with k = 5. The results of the classification process obtained with the training-generalization-testing method are contained in Table 1. This method achieved 94.88% correct classification for database I. This result is in the same range as those reported in similar studies (using the same database to evaluate different classifiers), but in which only two or three classes were considered. The 5-fold cross validation applied to database I resulted in a total classification accuracy of 97.4%. For this part of the process, the total set of signals in the database was employed. Notice that even though the final classification percentage is high, some sections of the validation were less satisfactory (the 2◦ S. CfA for class S, for example). These results can be a consequence of the randomized selection of the signals participating in each experiment. From Tables 1 and 2 it is clear that the results obtained from the two validation methods differ; since the signals employed for the training process are selected randomly, in some cases the selected signals may not fully represent the characteristics of their class.
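The 5-fold protocol behind Table 2 can be sketched as below. Here `train_fn` and `predict_fn` are placeholders for the CNN training stage and the parallel fixed-weight classification stage, and the single shuffle with a fixed seed is an assumption about how the randomized selection of signals was carried out.

```python
import numpy as np

def kfold_accuracy(signals, labels, train_fn, predict_fn, k=5, seed=0):
    """Plain k-fold cross validation (k = 5 in the paper).

    Signals are shuffled once, split into k segments, and each segment
    is held out in turn; the per-segment accuracies correspond to the
    1..5 "S. CfA" rows of Table 2, and their mean to the final CfA."""
    rng = np.random.default_rng(seed)
    idx = rng.permutation(len(signals))
    folds = np.array_split(idx, k)
    per_segment = []
    for i in range(k):
        test = folds[i]
        train = np.concatenate([folds[j] for j in range(k) if j != i])
        model = train_fn(signals[train], labels[train])
        pred = predict_fn(model, signals[test])
        per_segment.append(float(np.mean(pred == labels[test])))
    return per_segment, float(np.mean(per_segment))
```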

Table 1: Results from the generalization validation method for database I.

                   F       N       O       S       Z
 Samples           100     100     100     100     100
 Training          100.0%  100.0%  100.0%  100.0%  100.0%
 Generalization    100.0%  53.3%   100.0%  100.0%  100.0%
 Independent Test  100.0%  70.0%   100.0%  100.0%  100.0%
 CfA               100.0%  74.4%   100.0%  100.0%  100.0%
*CfA: Classification Accuracy percentage, according to the number of samples per section.

Table 2: CNN results from the 5-fold cross validation process for database I.

            F       N       O       S       Z
 Samples    100     100     100     100     100
 1◦ S. CfA  100.0%  100.0%  90.0%   85.0%   100.0%
 2◦ S. CfA  100.0%  100.0%  95.0%   80.0%   100.0%
 3◦ S. CfA  100.0%  100.0%  95.0%   90.0%   100.0%
 4◦ S. CfA  100.0%  100.0%  100.0%  100.0%  100.0%
 5◦ S. CfA  100.0%  100.0%  100.0%  100.0%  100.0%
 CfA        100.0%  100.0%  96.0%   91.0%   100.0%
*CfA: Classification Accuracy percentage. *S.: Segment.

5.2. Classification results for borderline signals
In order to evaluate the classification capacity of the algorithm proposed in this study, a borderline set of evaluation signals was selected. A set of 250 signals was prepared artificially according to the following procedure: given a first signal S_i from a class C_i and a second one S_j from a class C_j, the hybrid signal S_ij was generated as the convex combination S_ij = λS_i + (1 − λ)S_j. These signals were then tested on the parallel arrangement of trained CNNs. In this particular analysis, λ = 0.8, meaning that 80% of a signal belonging to a class C_i was combined with 20% of a signal belonging to a class C_j. A second round of analysis considered borderline signals using λ = 0.7. After the complete evaluation of the classification process, the CNN-based classifier achieved 95% correct classification for database I. This result shows the detection capacity of the classifier proposed in this study. In order to detail the classification capacities of the proposed classifier, we proceeded with the analysis of the Negative Predictive Value (NPV), Positive Predictive Value (PPV), True Positive Rate (TPR) and Specificity (SPC). These values are calculated according to the following standard definitions (Fawcett, 2005):

NPV = TrueNegatives / (TrueNegatives + FalseNegatives)
PPV = TruePositives / (TruePositives + FalsePositives)
TPR = TruePositives / (TruePositives + FalseNegatives)
SPC = TrueNegatives / (TrueNegatives + FalsePositives)

Table 3 contains the results of NPV, PPV, TPR and SPC when the evaluation of the classifier was executed over the set of borderline signals with λ = 0.8. According to the accuracy results as well as the predictive analysis (with all NPV and TPR results above 90%), the classifier proposed in this study seems to be a reliable method to classify certain characteristics in EEG signals. Table 4 contains the results of NPV, PPV, TPR and SPC when the evaluation of the classifier was executed over the set of borderline signals with λ = 0.7. Even though λ decreased by 12.5%, the NPV and TPR levels of the predictive analysis remained above 90.0%, with an accuracy percentage of 96.0%.
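A minimal sketch of the borderline-signal construction and of the four predictive indices follows; the indices are written per the standard definitions cited above (Fawcett, 2005), and the binary encoding of decisions (positive = 1, negative = 0) is an assumption of this sketch.

```python
import numpy as np

def make_borderline(s_i, s_j, lam=0.8):
    """Convex combination S_ij = lam*S_i + (1 - lam)*S_j of two signals."""
    return lam * np.asarray(s_i) + (1.0 - lam) * np.asarray(s_j)

def predictive_values(y_true, y_pred):
    """NPV, PPV, TPR and SPC from binary decisions (positive = 1)."""
    tp = np.sum((y_pred == 1) & (y_true == 1))
    tn = np.sum((y_pred == 0) & (y_true == 0))
    fp = np.sum((y_pred == 1) & (y_true == 0))
    fn = np.sum((y_pred == 0) & (y_true == 1))
    return {"NPV": tn / (tn + fn),   # negative predictive value
            "PPV": tp / (tp + fp),   # positive predictive value
            "TPR": tp / (tp + fn),   # sensitivity
            "SPC": tn / (tn + fp)}   # specificity
```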

Table 3: NPV, PPV, TPR and SPC results for the borderline signals of database I with λ = 0.8.

       F80%N20%  N80%O20%  O80%S20%  S80%Z20%  Z80%F20%
 NPV   0.9377    0.9258    0.9529    0.9457    0.9423
 PPV   0.0622    0.0742    0.0470    0.0542    0.0576
 TPR   0.9868    0.9776    0.9674    0.9582    0.9692
 SPC   0.7142    0.6604    0.5641    0.5555    0.6000

Table 4: NPV, PPV, TPR and SPC results for the borderline signals of database I with λ = 0.7.

       F70%N30%  N70%O30%  O70%S30%  S70%Z30%  Z70%F30%
 NPV   0.9511    0.9234    0.9279    0.9336    0.9186
 PPV   0.0488    0.0766    0.0720    0.0663    0.0813
 TPR   0.9661    0.9613    0.9648    0.9504    0.9675
 SPC   0.3859    0.5738    0.4383    0.5357    0.5441

5.3. Pattern classification results for database II
This section describes the results achieved by the proposed pattern classification structure when working with the database that was developed with the EMOTIV® device. The classifier proposed in this part of the work was evaluated with the following parameters:

A = −2.3, W11(0) = 0.1, W21(0) = 0.3 ∗ ones(14), m = 14, n = 1,

where ones(r) is a vector of dimension r with all entries equal to one. Notice that this specific example demonstrates how to handle input signals with dimension greater than one.

5.3.1. Training process
For the generalization training process, 60% of the full database II was employed. Just one component of the resulting W1i and W2i is shown in Figure 5 (upper left-hand and upper right-hand sides): in black the results for class 1, in blue for class 2 and in red for class 3. Notice the significant differences between the weights depicted in these figures. Similar behavior was obtained for the other components. This separation supports the claim that the technique proposed in this study can classify complex signals even when the input information is organized in vectors of dimension greater than one. In the bottom left of the figure, the desired trajectory and the CNN approximation obtained during the generalization training are depicted together. The convergence between both signals is an indirect demonstration of the training efficiency generated by the learning laws proposed for the pattern classifier based on the CNN. The integral of the LMS error resulting from the CNN approximation for the same example is shown in the lower right-hand side of the same figure. The SNDR evaluation of the classifier performance was 259.34 for the signals included in class 1, 268.56 for class 2 and, finally, 266.73 for the signals included in class 3.
These values are considered acceptable when compared to similar evaluations of classifiers based on artificial NN.

5.3.2. Validation process
The generalization-validation results are shown in Table 5. Here the generalization group was made up of 30% of the full database. The total accuracy for this generalization method was 91.18%. This value can be considered high, but it is difficult to evaluate its relative success because the signals included in this database have never been evaluated by other classifiers. Nevertheless, this particular example shows how the proposed classifier can be used even when the time-varying patterns used as input have dimension greater than one. Moreover, one may notice that no significant modification has to be made to adjust the training process to this more complicated situation. The 5-fold cross validation applied to database II employed the full set of signals in the second database. This validation method obtained a correct classification accuracy of 92%, as can be seen in Table 6.
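The parallel, fixed-weight classification stage for the 14-channel signals of database II can be sketched as below. The scalar state, the Euler step `dt` and the channel-averaged reference trace are simplifying assumptions of this sketch; in the paper each trained CNN reproduces the signal of its own class, and the class whose CNN yields the smallest integrated LMS error is declared the winner.

```python
import numpy as np

def classify_parallel(signal, models, dt=1e-3, A=-2.3):
    """Winner-take-all over a bank of fixed-weight CNNs, one per class.

    `signal` has shape (T, 14): one sample per EMOTIV channel.
    Each model is (w1, w2) with w2 of length m = 14, matching the
    initialization W21(0) = 0.3*ones(14) used above."""
    sigmoid = lambda z: 1.0 / (1.0 + np.exp(-z))
    target = signal.mean(axis=1)          # stand-in reference trace
    scores = []
    for w1, w2 in models:
        x, J = 0.0, 0.0
        for u_k, y_k in zip(signal, target):
            s = sigmoid(x)
            # Euler step driven by all 14 channels through w2
            x = x + dt * (A * x + w1 * s + s * np.dot(w2, u_k))
            J += (x - y_k) ** 2 * dt      # integral of the LMS error
        scores.append(J)
    return int(np.argmin(scores)), scores
```

The design choice here mirrors the text: classification reduces to comparing the integrated approximation errors of the pre-trained CNNs, so no extra decision layer is needed.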


Figure 5: On top, the W1i and W2i for database II. On the bottom left, the desired trajectory in red and the CNN approximation in black. On the bottom right, the integral of the LMS error between the approximation and the desired trajectory.

Once again this value is only for reference. However, one may notice the individual percentages of correct classification achieved during the evaluation of these more demanding validation tests.

6. Conclusions
In this paper, the capability of a CNN to serve as an EEG signal pattern classifier was tested with the information collected in two databases. A training process based on adjusting the weights of the CNN was also proposed. The training results as well as the validation percentages were reported and evaluated for both databases. To evaluate the effectiveness of the classifier proposed in this study, the classification percentages were compared to the results achieved by other classifiers based on NN that have used similar information. Table 7 shows the results reported by other classifiers based on NN that work with the same database; the last two results listed correspond to the achievements of this paper when validating with the generalization and the 5-fold cross validation methods. It is important to remark that, in contrast to other techniques, the proposed CNN works with all five classes included in database I, whereas, for example, (Srinivasan et al., 2007) employed only 2 of the 5 possible classes. Even though the results reported by others may show higher total correct classification accuracy, they do not work with the entire database. Moreover, the method presented here used the raw EEG signal without any preliminary signal treatment. This methodology introduces a different way of thinking about how to include CNNs in the problem of pattern recognition or classification. Furthermore, according to the classification results obtained in this study, this kind of classifier can be extended to other problems where the raw signal can be more informative than one subjected to several steps of pre-treatment.
In general, the application of this kind of classifier only requires the selection of the training

Table 5: Results from the generalization validation method for database II.

                   C1      C2      C3
 Samples           30      30      30
 Training          100.0%  100.0%  100.0%
 Generalization    100.0%  100.0%  88.9%
 Independent Test  100.0%  66.7%   66.7%
 CfA               100.0%  88.9%   84.7%
*CfA: Classification Accuracy, according to the number of samples per section.

Table 6: CNN results from the 5-fold cross validation process for database II.

            C1      C2      C3
 Samples    100     100     100
 1◦ S. CfA  100.0%  100.0%  100.0%
 2◦ S. CfA  100.0%  100.0%  80.0%
 3◦ S. CfA  100.0%  60.0%   100.0%
 4◦ S. CfA  100.0%  100.0%  100.0%
 5◦ S. CfA  80.0%   100.0%  60.0%
 CfA        96.0%   92.0%   88.0%
*CfA: Classification Accuracy. *S.: Segment.

time T and the execution of an exhaustive supervised training. This is an advantage of the classifier structure proposed in this study because no particular pre-treatment had to be designed.

Table 7: Comparison between EEG signal pattern classifiers based on NN that employed database I.

 Reported in                 Validation method  Datasets from (NA, 2012)  Quality measure
 (Srinivasan et al., 2007)   not mentioned      Z, S                      99.6% (CfA)
 (Tzallas et al., 2007)      not mentioned      Z, S                      100% (CfA)
 (Nigam, 2004)               not mentioned      Z, S                      97.2% (CfA)
 In this work                Generalization     F, N, O, S, Z             94.88% (CfA)
 In this work                k-cv               F, N, O, S, Z             97.4% (CfA)
 (Liu et al., 2012)          not mentioned      S                         94.46% (Sensitivity)
 (Bajaj and Pachori, 2013)   not mentioned      S                         90% (Sensitivity)
 (Zhang et al., 2015)        not mentioned      interictal & ictal EEG    92.94% (Sensitivity)
*CfA: Classification Accuracy, according to the number of samples per section.

References
Akareddy, S., Kulkarni, P., 2013. EEG signal classification for epilepsy seizure detection using improved approximate entropy. International Journal of Public Health Science 2, 23–32.
Allinson, N., Yin, H., 1999. Kohonen Maps. Elsevier Science. Chapter 8: Self-Organising Maps for pattern recognition, pp. 111–120.
Alotaiby, T., Alshebeili, S., Alshawi, T., Ahmad, I., El-Samie, F., 2014. EEG seizure detection and prediction algorithms: a survey. EURASIP Journal on Advances in Signal Processing 2014, 1–21.


Amato, F., López, A., Peña-Méndez, E., Vanhara, P., Hampl, A., Havel, J., 2013. Artificial neural networks in medical diagnosis. Journal of Applied Biomedicine 11, 47–58.
Bajaj, V., Pachori, R., 2013. Epileptic seizure detection based on the instantaneous area of analytic intrinsic mode functions of EEG signals. Biomedical Engineering Letters 2013, 17–21.
Bashashati, A., Fatourechi, M., Ward, R., Birch, G., 2007. A survey of signal processing algorithms in brain-computer interfaces based on electrical brain signals. Journal of Neural Engineering 4, R32–R57.
Baum, E., Haussler, D., 1989. What size net gives valid generalization? Neural Computation 1, 151–160.
Benvenuto, N., Piazza, F., 1992. On the complex backpropagation algorithm. IEEE Transactions on Signal Processing 40, 967–969.
Birbaumer, N., 2006. Breaking the silence: Brain-computer interfaces (BCI) for communication and motor control. Psychophysiology 43, 517–532.
Bose, N., Liang, P., 1996. Neural Network Fundamentals with Graphs, Algorithms and Applications. Tata McGraw-Hill.
Boussemart, Y., Cummings, M., 2011. Predictive models of human supervisory control behavioral patterns using hidden semi-Markov models. Engineering Applications of Artificial Intelligence 24, 1252–1262.
Chairez, I., 2009. Wavelet differential neural network. IEEE Transactions on Neural Networks 20, 1439–1449.
Chang, C., Lin, C., Wei, M., Lin, K., Chen, S., 2014. High-precision real-time premature ventricular contraction (PVC) detection system based on wavelet transform. Journal of Signal Processing Systems 77, 289–296.
Chen, G., 2014. Automatic EEG seizure detection using dual-tree complex wavelet-Fourier features. Expert Systems with Applications 41, 2391–2394.
Cheng-Jian, L., Ming-Hua, H., 2009. Regarding EEG signal pattern classification, several types of SNNs have been proposed. Neurocomputing 72, 1121–1130.
Coyle, D., Prasad, G., McGinnity, T., 2005. A time-series prediction approach for feature extraction in a brain-computer interface. IEEE Transactions on Neural Systems and Rehabilitation Engineering 13, 461–467.
Cybenko, G., 1989. Approximation by superpositions of a sigmoidal function. Mathematics of Control, Signals and Systems 2, 303–314.
Dunne, R., 2006. A Statistical Approach to Neural Networks for Pattern Recognition. Elsevier Science.
Fawcett, T., 2005. An introduction to ROC analysis. Pattern Recognition Letters 27, 861–874.
Ge, D., Srinivasan, N., Krishnan, S., 2007. Advances in Cardiac Signal Processing. Springer-Verlag Berlin Heidelberg. Chapter 8: The Application of Autoregressive Modeling in Cardiac Arrhythmia Classification, pp. 209–224.
Goel, V., Brambrink, A., Baykal, A., Koehler, R., Hanley, D., Thakor, N., 1996. Dominant frequency analysis of EEG reveals brain's response during injury and recovery. IEEE Transactions on Biomedical Engineering 43, 1083–1092.
Gotman, J., Wang, L., 1991. State-dependent spike detection: Concepts and preliminary results. Electroencephalography and Clinical Neurophysiology 79, 11–19.
Guo, L., Rivero, D., Dorado, J., Rabuñal, R., Pazos, A., 2010. Automatic epileptic seizure detection in EEGs based on line length feature and artificial neural networks. Journal of Neuroscience Methods 191, 101–109.


Guo, L., Rivero, D., Seoane, J., Pazos, A., 2009. Classification of EEG signals using relative wavelet energy and artificial neural networks, in: GEC '09: Proceedings of the First ACM/SIGEVO Summit on Genetic and Evolutionary Computation, ACM, New York, pp. 177–184.
Hassoun, M., 1995. Fundamentals of Artificial Neural Networks. MIT Press.
Haykin, S., 1999. Neural Networks: A Comprehensive Foundation. Prentice-Hall.
Hwang, H., Kim, S., Choi, S., Im, C., 2013. EEG-based brain-computer interfaces: A thorough literature survey. International Journal of Human-Computer Interaction 29, 814–826.
EMOTIV Inc., 2014. EMOTIV EPOC Brain Computer Interface & Scientific Contextual EEG. Technical Report. EMOTIV Inc.
Kannathal, N., Rajendra, U., ChooMin, L., Suri, J.S., 2007. Advances in Cardiac Signal Processing. Springer-Verlag Berlin Heidelberg. Chapter 7: Classification of Cardiac Patient States Using Artificial Neural Networks, pp. 187–208.
Kasabov, N., Dhoble, K., Nuntalid, N., Indiveri, G., 2013. Dynamic evolving spiking neural networks for on-line spatio- and spectro-temporal pattern recognition. Neural Networks 41, 188–201.
Kuncheva, L.I., Zliobaite, I., 2009. On the window size for classification in changing environments. Intelligent Data Analysis 13, 6.
Liu, Y., Zhou, W., Chen, S., 2012. Automatic seizure detection using wavelet transform and SVM in long-term intracranial EEG. IEEE Transactions on Neural Systems and Rehabilitation Engineering 20, 749–755.
Lu, L., Shin, J., Ichikawa, M., 2004. Massively parallel classification of single-trial EEG signals using a min-max modular neural network. IEEE Transactions on Biomedical Engineering 24, 551–558.
NA, 2012. Seizure prediction project, University of Freiburg. Online: http://epilepsy.uni-freiburg.de/freiburg-seizure-prediction-project/eeg-database.
Nait-Ali, A. (Ed.), 2009. Advanced Biosignal Processing. Springer.
Nicolas-Alonso, L., Gomez-Gil, J., 2012. Brain computer interfaces, a review. Sensors 12, 1211–1279.
Nigam, V., 2004. A neural-network-based detection of epilepsy. Neurological Research 26, 55–60.
Omerhodzic, I., Avdakovic, S., Nuhanovic, A., Dizdarevic, K., 2013. Energy distribution of EEG signals: EEG signal wavelet-neural network classifier. World Academy of Science, Engineering and Technology 61, 1190–1195.
Polat, K., Günes, S., 2007. Classification of epileptiform EEG using a hybrid system based on decision tree classifier and fast Fourier transform. Applied Mathematics and Computation 187, 1017–1026.
Poznyak, A., Sanchez, E., Wen, Y., 2001. Differential Neural Networks for Robust Nonlinear Control: Identification, State Estimation and Trajectory Tracking. World Scientific Publishing Co. Pte. Ltd.
Riaz, F., Hassan, A., Rehman, S., Niazi, I., Dremstrup, K., 2015. EMD-based temporal and spectral features for the classification of EEG signals using supervised learning. IEEE Transactions on Neural Systems and Rehabilitation Engineering PP, 1.
Roy, R., Charbonnier, S., Bonnet, S., 2014. Eye blink characterization from frontal EEG electrodes using source separation and pattern recognition algorithms. Biomedical Signal Processing and Control 14, 256–264.
Sanei, S., Chambers, J., 2013. EEG Signal Processing. Wiley.


Srinivasan, V., Eswaran, C., Sriraam, N., 2007. Approximate entropy-based epileptic EEG detection using artificial neural networks. IEEE Transactions on Information Technology in Biomedicine 11, 288–295.
Tzallas, A., Tsipouras, M., Fotiadis, D., 2007. Automatic seizure detection based on time-frequency analysis and artificial neural networks. Computational Intelligence and Neuroscience 2007, 13.
Ubeyli, E., 2009. Decision support systems for time-varying biomedical signals: EEG signals classification. Expert Systems with Applications 36, 2275–2284.
Urolagin, S., Prema, K., Reddy, S., 2012. Generalization capability of artificial neural network incorporated with pruning method. Advanced Computing, Networking and Security 7135, 171–178.
Weng, W., Khorasani, K., 1996. An adaptive structure neural network with application to EEG automatic seizure detection. Neural Networks 9, 1223–1240.
Zhang, Y., Zhou, W., Yuan, S., 2015. Multifractal analysis and relevance vector machine based automatic seizure detection in intracranial EEG. International Journal of Neural Systems 25, 14.
