Journal Pre-proof

Motor imagery EEG recognition with KNN-based smooth auto-encoder
Xianlun Tang, Ting Wang, Yiming Du, Yuyan Dai

PII: S0933-3657(19)30134-4
DOI: https://doi.org/10.1016/j.artmed.2019.101747
Reference: ARTMED 101747
To appear in: Artificial Intelligence In Medicine
Received Date: 27 February 2019
Revised Date: 6 October 2019
Accepted Date: 27 October 2019
Please cite this article as: Tang X, Wang T, Du Y, Dai Y, Motor imagery EEG recognition with KNN-based smooth auto-encoder, Artificial Intelligence In Medicine (2019), doi: https://doi.org/10.1016/j.artmed.2019.101747
This is a PDF file of an article that has undergone enhancements after acceptance, such as the addition of a cover page and metadata, and formatting for readability, but it is not yet the definitive version of record. This version will undergo additional copyediting, typesetting and review before it is published in its final form, but we are providing this version to give early visibility of the article. Please note that, during the production process, errors may be discovered which could affect the content, and all legal disclaimers that apply to the journal pertain. © 2019 Published by Elsevier.
Motor imagery EEG recognition with KNN-based smooth auto-encoder
Xianlun Tang1, Ting Wang1, Yiming Du2, Yuyan Dai1
Highlights
• Based on the K-Nearest Neighbor (KNN) algorithm and the convolutional auto-encoder (CAE) network, an innovative semi-supervised model, called the KNN-based smooth auto-encoder (k-SAE), is proposed in this paper.
• k-SAE searches for the nearest neighbor values of each sample to construct a new input and learns robust feature representations by reconstructing this new input instead of the original input, which distinguishes it from the traditional auto-encoder (AE).
• A Gaussian filter is selected as the convolution kernel function in k-SAE to smooth the noise in the features.
• The data information and spatial positions of the feature maps are recorded by max-pooling and unpooling, which helps to prevent the loss of important information.
• Two sets of data are selected to verify the validity of the proposed method: one obtained from an EEG signal acquisition experiment and one public data set. Compared with other state-of-the-art EEG recognition methods, k-SAE shows superior performance.
Abstract: As a new human-computer interaction technology, the brain-computer interface has been widely used in many fields of life. The study of EEG signals can not only improve people's understanding of the brain, but also establish new ways for the brain to communicate with the outside world. This paper takes motor imagery EEG signals as the research object and proposes an innovative semi-supervised model called the KNN-based smooth auto-encoder (k-SAE). k-SAE searches for the nearest neighbor values of each sample to construct a new input and learns robust feature representations by reconstructing this new input instead of the original input, which distinguishes it from the traditional auto-encoder (AE). A Gaussian filter is selected as the convolution kernel function in k-SAE to smooth the noise in the features. Besides, the data information and spatial positions of the feature maps are recorded by max-pooling and unpooling, which helps to prevent the loss of important information. The method is applied to two data sets for feature extraction and classification experiments on motor imagery EEG signals. The experimental results show that k-SAE achieves good recognition accuracy and outperforms other state-of-the-art recognition algorithms.
Keywords: KNN-based smooth auto-encoder, BCI, motor imagery, feature extraction, EEG recognition
1. Introduction
Electroencephalogram (EEG) is an overall reflection of the electrophysiological activity of brain cells in the cerebral cortex and scalp, and it contains a large amount of physiological and disease information [1]. The brain-computer interface (BCI) based on EEG can be used as a communication system that bypasses the normal output channels of peripheral nerves and muscles to realize interaction between the brain and the outside world. BCI has attracted wide attention from scholars and researchers around the world since it was proposed [2-4], for example in sleep staging based on single-channel EEG signals [5-8] and in feature extraction of EEG signals for epileptic seizure detection [9,10]. The brain-computer interface based on motor imagery is one of the most important BCI systems. Motor imagery (MI) refers to imagining a movement of the body without performing the actual action, and it is a mental activity that can be actively controlled by humans [11]. Both imagined and actual movements produce related EEG signals, and the EEG signals generated during motor imagery exhibit event-related synchronization (ERS) and event-related desynchronization (ERD) characteristics [12].
_____________
Corresponding author: Ting Wang
[email protected]
1 Chongqing Key Laboratory of Complex Systems and Bionic Control, College of Automation, Chongqing University of Posts and Telecommunications, Nan'an District, Chongqing 400065, China
2 College of Computer Science and Technology, Chongqing University of Posts and Telecommunications, Nan'an District, Chongqing 400065, China
By analyzing MI-EEG signals, the motion intention of the subject can be inferred, thereby achieving control of external devices. Such an approach provides a new way of communicating with the outside world. Therefore, research on MI-EEG signal processing technology can accelerate the exploration of brain cognition, brain therapy and brain applications, and has been widely applied in many fields such as medical rehabilitation, entertainment and smart homes [13-17]. The huge application prospects have pushed MI-BCI research into a period of rapid development, making it one of the most popular research areas.
The core of a motor-imagery-based BCI system contains two parts: feature extraction of EEG signals and feature classification of EEG signals. Commonly used feature extraction algorithms include the wavelet transform (WT) [18, 19] and the common spatial pattern (CSP) [20]. These methods have been shown by many researchers to achieve good recognition results [21-23]. However, they either rely on a large amount of prior knowledge or are sensitive to noise, and the complexity of the mechanism of MI-EEG signal generation reduces the flexibility of their application. Linear discriminant analysis (LDA) and artificial neural networks (ANN) are often used for feature classification. Wen designed an LDA-based algorithm for online motor imagery BCI systems [24]. In 2013, Kottaimalai et al. combined principal component analysis with an artificial neural network to reduce and classify five types of imagined EEG signals from seven subjects, and obtained good recognition results [25]. The support vector machine (SVM) is widely used in the biomedical field because of its good prediction accuracy and its ability to process large amounts of data [26]. Although the SVM has strong generalization ability and nonlinear mapping ability, the choice of kernel function and parameters leads to uncertainty in its performance. In recent years, deep learning, with its powerful ability to process nonlinear and high-dimensional data without requiring a large amount of prior knowledge, has also been applied to the analysis of EEG signals. Cecotti et al. constructed a P300 recognition method based on a CNN and used a BCI Competition public data set for classification experiments, with a recognition rate of up to 95.5% [27]. Yang et al. combined CSP with a CNN to effectively analyze multichannel motor imagery data [28]. An et al. applied a deep belief network (DBN) to build a powerful classifier for left- and right-hand motor imagery EEG feature extraction and recognition [29]. However, these models suffer from parameter redundancy, and they can easily break the information structure, resulting in the loss of correlated data.
This paper proposes a new model, the KNN-based smooth auto-encoder, to achieve accurate recognition of motor imagery EEG signals. The model performs efficient feature extraction from rough raw EEG data and has powerful classification capabilities. The proposed method is evaluated on two sets of data and compared with other related state-of-the-art methods. The experimental results show that the k-SAE model can improve the recognition accuracy to a certain extent, and has strong robustness and good generalization ability.
The main contributions of this method can be summarized as follows: (1) the proposed method uses the KNN algorithm [30] to find the nearest neighbor values of the input data, and then obtains a new transformed input through a weight function. Different from the traditional auto-encoder, which reconstructs the original input, k-SAE reconstructs this new input. Such a structure makes the feature mapping vary smoothly on the manifold; (2) in the convolutional layers of the model, a Gaussian kernel function is used for the convolution operation to remove noise and further smooth the data; (3) max-pooling and unpooling are used to realize dimensionality reduction and recovery. These operations record the feature values while preserving their spatial location information, and maximize the integrity of the data to prevent the loss of important information [31].
2. Related work
2.1 Auto-encoder
The auto-encoder is an unsupervised neural network that contains only one hidden layer. It can be considered to be made up of two parts, an encoder and a decoder, as shown in Fig. 1. The encoder reduces the dimension of the original data; it is a mapping of the given input x to the hidden layer representation
h = f(Wx + b)                                  (1)

where the parameters are {W, b} and f is a nonlinear activation function, typically the sigmoid \sigma(x) = 1 / (1 + e^{-x}).
The decoding network is the inverse process of the encoder and is used to map the hidden representation back to the high-dimensional space so as to reconstruct the original input data. In the decoder, h is mapped back to the output layer

y = g(W'h + c)                                  (2)

with parameters {W', c}.
Here b and c are bias vectors, and the decoder weight matrix W' usually takes the constrained (tied) form W' = W^T. The basic idea of the auto-encoder is to minimize the error between the output of the network and the input data by adjusting the network parameters {W, b, c}, so as to discover the underlying characteristics of the input data. As the reconstruction y is required to be close to the input x, the reconstruction error follows

J_{AE}(\theta) = \sum_{x \in D} L(x, g(f(x)))                                  (3)

where L is the reconstruction loss, which can be the squared Euclidean distance L(x, y) = \|x - y\|^2 or the cross-entropy loss

L(x, y) = -\sum_{i=1}^{d_x} [x_i \log y_i + (1 - x_i) \log(1 - y_i)]                                  (4)
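To make the encoder/decoder mappings of Eqs. (1)-(3) concrete, the following is a minimal NumPy sketch of a single-hidden-layer auto-encoder with tied weights and a squared-error loss; the layer sizes, learning rate and linear output are illustrative assumptions rather than settings taken from the paper.

```python
import numpy as np

def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))

rng = np.random.default_rng(0)
n_in, n_hidden = 128, 32                     # illustrative sizes, e.g. one 1-s EEG window
W = rng.normal(0.0, 0.1, (n_hidden, n_in))   # encoder weights; decoder reuses W.T (tied)
b = np.zeros(n_hidden)                       # encoder bias
c = np.zeros(n_in)                           # decoder bias
lr = 0.01

def train_step(x):
    """One gradient-descent step on the squared-error loss of Eq. (3)."""
    global W, b, c
    h = sigmoid(W @ x + b)                    # Eq. (1): encoder
    y = W.T @ h + c                           # Eq. (2) with tied weights and a linear output
    err = y - x                               # reconstruction error
    dh = (W @ err) * h * (1.0 - h)            # backpropagate into the hidden layer
    gW = np.outer(dh, x) + np.outer(h, err)   # two contributions to the tied weight matrix
    W -= lr * gW
    b -= lr * dh
    c -= lr * err
    return 0.5 * np.sum(err ** 2)

# x_window = np.random.randn(n_in)   # stand-in for one preprocessed EEG segment
# loss = train_step(x_window)
```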
Fig. 1. Auto-encoder

2.2 Convolutional neural network (CNN)
A CNN is a non-fully-connected network consisting of two special structural layers: the convolution layer and the pooling layer. The neurons of a convolution layer that share one set of weights form a feature map, and the shared weights form a convolution kernel. Each node in the convolution kernel is connected to a local region of the previous input layer, and the result of this local weighting is passed through a nonlinear function (such as sigmoid, ReLU or ELU). The weights and bias of each feature map are constant and do not change with the input position; this is weight sharing. Each layer can have more than one convolution kernel, each convolution kernel extracts a different feature, and the connection weights in different convolution kernels are different. Usually each convolution layer is followed by a pooling layer, whose purpose is to reduce the dimension of the features obtained by the convolution layer. There are two commonly used pooling methods, max-pooling and average pooling: the former takes the maximum value of the neurons in each pooling region of the feature map, and the latter takes the average. The pooling operation is insensitive to small local translations.

2.3 Convolutional auto-encoder (CAE)
The convolutional auto-encoder [32] is an unsupervised neural network whose implementation follows the basic idea of the auto-encoder. On the basis of the AE, it also takes advantage of the convolution and pooling operations of the CNN to achieve invariant feature extraction. In the encoding and decoding parts of the CAE, stacks of convolutional layers and pooling layers are added. In this way, the number of CAE parameters is reduced while a strong noise-reduction capability is retained. For a mono-channel input x, the latent representation of the k-th feature map is given by
h^k = \sigma(x * W^k + b^k)                                  (5)

The output is then reconstructed by the corresponding convolutional decoder. Each feature map h^k is convolved with the flipped version of its convolution kernel, the results are summed, and a bias vector c is added. The reconstruction is obtained using

y = \sigma\left(\sum_{k \in H} h^k * \tilde{W}^k + c\right)                                  (6)

where H denotes the group of latent feature maps and \tilde{W}^k denotes the kernel W^k flipped over both of its dimensions. To update the weights, the cost function is defined as

E(\theta) = \frac{1}{2n} \sum_{i=1}^{n} (x_i - y_i)^2                                  (7)

As in a standard network, the backpropagation algorithm is applied to compute the gradient of the error function with respect to the parameters, which can be obtained through convolution operations:

\frac{\partial E(\theta)}{\partial W^k} = x * \delta h^k + \tilde{h}^k * \delta y                                  (8)
where δh and δy are the deltas of the hidden states and of the reconstruction, respectively. The weights are then updated using stochastic gradient descent. The main difference between the CAE and a CNN is that the former is trained in an unsupervised way, usually to extract features from the input data in order to reconstruct it. The AE forces each feature to be fully connected, so its parameters are redundant, whereas the CAE's are not. Because of its convolution operation, the CAE generates the same number of activation maps regardless of the dimension of the input data.
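As an illustration only, a small convolutional auto-encoder along these lines can be sketched in the Keras API of TensorFlow (the platform named in Section 4.4); the layer counts, kernel sizes and input shape are assumptions, and plain upsampling is used here instead of the index-based unpooling that k-SAE uses later.

```python
import tensorflow as tf
from tensorflow.keras import layers, models

def build_cae(input_shape=(128, 6, 1)):      # e.g. 128 time samples x 6 channels
    inp = layers.Input(shape=input_shape)
    # encoder: stacked convolution + max-pooling (pooling over the time axis only)
    x = layers.Conv2D(16, (3, 3), activation='relu', padding='same')(inp)
    x = layers.MaxPooling2D(pool_size=(2, 1))(x)
    x = layers.Conv2D(8, (3, 3), activation='relu', padding='same')(x)
    encoded = layers.MaxPooling2D(pool_size=(2, 1))(x)
    # decoder: mirror of the encoder (upsampling + convolution)
    x = layers.Conv2D(8, (3, 3), activation='relu', padding='same')(encoded)
    x = layers.UpSampling2D(size=(2, 1))(x)
    x = layers.Conv2D(16, (3, 3), activation='relu', padding='same')(x)
    x = layers.UpSampling2D(size=(2, 1))(x)
    out = layers.Conv2D(1, (3, 3), padding='same')(x)   # linear reconstruction layer
    return models.Model(inp, out)

cae = build_cae()
cae.compile(optimizer='adam', loss='mse')   # mean-squared reconstruction error, cf. Eq. (7)
# cae.fit(x_train, x_train, ...)            # unsupervised: the target is the input itself
```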
3. The proposed method
3.1 The architecture of k-SAE
From the structure of the auto-encoder we can see that it reconstructs the original input but pays no attention to the inherent structure of the data, so small fluctuations in the data may lead to large deviations in the results. In this paper, we design a network, called the KNN-based smooth auto-encoder (k-SAE), which integrates KNN to select similar feature data and uses the unsupervised learning of the CAE to learn nonlinear feature representations. In addition to the CAE feature-extraction pre-training component, a supervised CNN is also included to complete the classification task. This network ensures the similarity between local neighbors while performing AE-style reconstruction learning, and preserves the smoothing characteristics. The model structure is shown in Fig. 2. When training this network, we first pre-train the unsupervised network (blue block diagram). The supervised learning network (red block diagram) is then initialized with the pre-trained weights and biases, and is finally fine-tuned to complete the classification task.
First of all, according to the principle of KNN, we obtain a nearest-neighbor set Ω_i = {x_j, …, x_k} for each input x_i. Following the unsupervised learning setting, we choose the k nearest neighbors based on an appropriate distance measure, such as the Euclidean, Mahalanobis or cosine distance; in this paper, the Euclidean distance is used to define the neighbors. The k nearest neighbors are then combined with the original input x_i to make up a new input x̃_i through the weight function
\tilde{x}_i = \sum_{j=1}^{k} \omega(x_j, x_i) \, x_j                                  (9)

where ω(·, ·) is a weight function defined through a smoothing kernel, \omega(x_j, x_i) = \frac{1}{Z} K(d(x_j, x_i)), Z is a normalization term that guarantees \sum_{j=1}^{k} \omega(x_j, x_i) = 1 for all i, and d(·, ·) is a distance function. K(·) is a kernel function for which there are many choices, such as the Gaussian, triangular, tricube and uniform kernels. The Gaussian kernel has the ability to extract local feature information and is often used to suppress data noise, remove details and smooth data. In this paper, we choose the Gaussian kernel:

\omega(x_j, x_i) = \begin{cases} \frac{1}{Z} \exp\left(-\frac{\|x_i - x_j\|^2}{\sigma^2}\right) & x_j \in \Omega_i \\ 0 & \text{otherwise} \end{cases}                                  (10)

where σ represents the bandwidth of the Gaussian kernel function; following the experience of [33], the value of σ is set to 0.5 in this paper.
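A compact NumPy sketch of this input transformation (Eqs. 9-10) is given below. It assumes the trials have already been flattened into fixed-length vectors on a comparable scale, uses the quoted values k = 10 and σ = 0.5, and excludes x_i itself from its neighbour set, which is one possible reading of Ω_i.

```python
import numpy as np

def knn_smoothed_inputs(X, k=10, sigma=0.5):
    """Replace each sample by a Gaussian-weighted combination of its k nearest
    neighbours (Euclidean distance), following Eqs. (9)-(10)."""
    n = X.shape[0]
    X_new = np.empty_like(X, dtype=float)
    # pairwise squared Euclidean distances (fine for moderate n)
    sq_dists = np.sum((X[:, None, :] - X[None, :, :]) ** 2, axis=-1)
    for i in range(n):
        order = np.argsort(sq_dists[i])
        neighbours = order[order != i][:k]                  # k nearest neighbours of x_i
        w = np.exp(-sq_dists[i, neighbours] / sigma ** 2)   # Gaussian kernel weights
        w /= w.sum()                                        # normalization term Z
        X_new[i] = w @ X[neighbours]                        # Eq. (9): weighted combination
    return X_new

# X: (n_trials, n_features) array of flattened EEG trials
# X_tilde = knn_smoothed_inputs(X)
```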
Regarding the value of k: a smaller k decreases the approximation error but increases the estimation error of learning, which makes the features susceptible to noise, while a larger k reduces the prediction accuracy. Choosing the right number of neighbors is therefore important for this algorithm. In this paper k = 10, and the specific reasons are explained in Section 4.4.
The new input x̃_i is then passed to the first convolution layer, and a feature map is obtained by the convolution kernel. A pooling operation follows to obtain a smaller feature map, which is then passed to the next convolution layer; this process can be stacked multiple times. The convolution layer is a feature-extraction layer: the input of each neuron is connected to the local receptive field of the previous layer and extracts local features. Convolution operations can enhance feature signals and reduce data noise, and their main mechanism, weight sharing, greatly reduces the number of parameters and provides invariance to small changes. The convolution operation is essentially a weighted-averaging process. To achieve data smoothing, many researchers use a mean filter or a Gaussian filter as the convolution kernel. Taking a 3×3 kernel as an example, the mean filter replaces the central pixel value by the average of the nine values to achieve a smoothing effect, whereas the Gaussian filter follows a Gaussian distribution in the horizontal and vertical directions, which emphasizes the weight of the center point after smoothing. The Gaussian filter is used in this paper because it has a better smoothing effect than the mean filter. Fig. 3a shows the smoothing filter operation in the spatial domain: an m×m square window (with m odd) is moved over the given M×N matrix, each output at the position of the blue pixel is obtained from the surrounding pixels in the dashed blue window, and the red pixel denotes the repeated step. Fig. 3b and Fig. 3c show the templates of the average filter and the Gaussian filter, respectively.
After passing through a convolutional layer, the number of feature maps increases. To avoid excessive parameters and reduce computational complexity, a subsampling operation is added after each convolutional layer, that is, each convolutional layer is followed by a pooling layer. Pooling operations extract important features and reduce the number of network parameters. In this paper we use max-pooling, which records the maximum values while retaining their position information to facilitate the backpropagation of the subsequent decoder part. This structure largely preserves the local correlation of features within the neighborhood. The max-pooling working mode is depicted in Fig. 4a.
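The following NumPy sketch illustrates the pooling/unpooling pair of Fig. 4 for a single feature map: 2×2 max-pooling that records the argmax positions, and an unpooling step that writes the pooled values back to those positions and fills the rest with zeros. It is a simplified illustration, not the authors' implementation.

```python
import numpy as np

def max_pool_with_indices(x, size=2):
    """2x2 max-pooling that also records where each maximum came from."""
    h, w = x.shape
    ph, pw = h // size, w // size
    pooled = np.zeros((ph, pw))
    indices = np.zeros((ph, pw, 2), dtype=int)   # (row, col) of each maximum
    for i in range(ph):
        for j in range(pw):
            block = x[i*size:(i+1)*size, j*size:(j+1)*size]
            r, c = np.unravel_index(np.argmax(block), block.shape)
            pooled[i, j] = block[r, c]
            indices[i, j] = (i*size + r, j*size + c)
    return pooled, indices

def unpool(pooled, indices, out_shape):
    """Place pooled values back at their recorded positions, zeros elsewhere."""
    out = np.zeros(out_shape)
    ph, pw = pooled.shape
    for i in range(ph):
        for j in range(pw):
            r, c = indices[i, j]
            out[r, c] = pooled[i, j]
    return out
```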
Fig. 2. Illustration of k-SAE’s architecture
Fig. 3. (a) The smoothing filter operation in the spatial domain; (b) 3×3 average filter; (c) 3×3 Gaussian filter with σ = 0.5.
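For reference, a normalised 3×3 Gaussian template such as the one shown in Fig. 3c can be generated as follows (a sketch using the standard Gaussian filter formula):

```python
import numpy as np

def gaussian_kernel(size=3, sigma=0.5):
    """Build a normalised size x size Gaussian convolution kernel."""
    ax = np.arange(size) - size // 2            # e.g. [-1, 0, 1] for size 3
    xx, yy = np.meshgrid(ax, ax)
    kernel = np.exp(-(xx**2 + yy**2) / (2.0 * sigma**2))
    return kernel / kernel.sum()                 # weights sum to 1

print(gaussian_kernel())   # the centre weight dominates, unlike the uniform mean filter
```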
Fig. 4. Schematic diagram of max-pooling and the corresponding unpooling: (a) max-pooling; (b) unpooling

After N convolution and pooling layers, the output of the top convolutional pooling layer is equivalent to the hidden layer of an auto-encoder, and the part above corresponds to the encoder of the CAE, expressed as in Eq. (5). The working principle of the decoder in k-SAE is the same as that of the CAE described above. Corresponding to the N convolution and pooling operations in the encoder, N deconvolution and unpooling layers are set to restore the data to a matrix of the original size, and the reconstructed output is obtained by Eq. (6). In the decoder part, the deconvolution is essentially a convolution operation, and we still use the Gaussian filter for this calculation. The unpooling layer takes the values stored by the corresponding max-pooling layer together with their location information, places them back at the recorded positions, and fills the surrounding positions with zeros, so that the output has the same dimension as the pooling input. The unpooling working mode is shown in Fig. 4b.
The encoder and decoder parts of k-SAE are trained using unlabeled samples, as shown in the blue box in Fig. 2. The gradient is calculated from the loss function

J_{k\text{-}SAE}(\theta) = \sum_{i=1}^{n} L(\tilde{x}_i, g(f(\tilde{x}_i)))                                  (11)
and the parameters are optimized by Adam. We regard this part, inside the blue box, as the pre-training of the model. The data obtained from the top convolutional pooling layer in k-SAE are treated as the features extracted by the hidden layer of a traditional auto-encoder. Finally, a fully connected layer is attached and the features are sent to a Softmax classifier through forward propagation to perform the classification task. We regard the red part of Fig. 2 as a general convolutional neural network that implements the supervised training. During this training, k-SAE no longer initializes the weights and biases randomly, but initializes them with the parameters saved in the pre-training step.
3.2 Algorithm for training k-SAE
The training of k-SAE includes unsupervised pre-training and supervised fine-tuning, which are listed in detail in Algorithm 1 and Algorithm 2.
Algorithm 1 Unsupervised pre-training of k-SAE
1. Take the EEG dataset as input x;
2. Compute the transformed new input x̃_i by Eq. (9);
3. Initialize all weight matrices and bias vectors randomly;
4. while the stopping criterion is not met do
5.   Forward propagation: compute h and y according to Eqs. (1) and (2)
6.   Compute the loss function through Eq. (11)
7.   Update the parameters {W, b, c} by Adaptive Moment Estimation (Adam)
8. end while
9. Save the parameters {W, b, c}.
It has been shown that the weights of the encoder and decoder can be tied when implementing an AE [34]. Because the model structure proposed in this paper is symmetric, it meets the requirement of weight transposition: we only update the encoder weights and set the decoder weights to the transpose of the encoder weights. After training, only the weight information of the encoder needs to be saved, which also simplifies the calculation.
Algorithm 2 Supervised fine-tuning of k-SAE
1. Take the EEG dataset as input x;
2. Initialize the parameters of the part shared between the unsupervised and supervised networks with the parameters saved by Algorithm 1, and initialize the remaining part randomly;
3. Use the BP algorithm with the Adam parameter-optimization method to tune the network's parameters in the top-down direction.
It should be noted that all activation functions in this paper use the ReLU activation function:

ReLU(x) = \begin{cases} x & x > 0 \\ 0 & x \le 0 \end{cases}                                  (12)
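A compact Keras sketch of the two training stages (Algorithms 1 and 2) is given below. It reuses the illustrative build_cae() model from Section 2.3; the encoder layer indices, the placeholder arrays X_tilde_train and y_train, and the training settings are assumptions made for illustration, not the authors' exact procedure.

```python
import tensorflow as tf
from tensorflow.keras import layers, models

# Stage 1 (Algorithm 1): unsupervised pre-training on the KNN-smoothed inputs
cae = build_cae()                                    # illustrative encoder + decoder
cae.compile(optimizer='adam', loss='mse')
cae.fit(X_tilde_train, X_tilde_train, epochs=50, batch_size=32, verbose=0)

# Stage 2 (Algorithm 2): build the classifier and copy the pre-trained encoder weights
inp = layers.Input(shape=(128, 6, 1))
x = inp
for layer in cae.layers[1:5]:                        # encoder layers only (assumed indices)
    x = layer.__class__.from_config(layer.get_config())(x)
x = layers.Flatten()(x)
out = layers.Dense(2, activation='softmax')(x)       # left- vs right-hand MI
clf = models.Model(inp, out)

# initialise the shared layers with the pre-trained weights, then fine-tune end to end
for src, dst in zip(cae.layers[1:5], clf.layers[1:5]):
    dst.set_weights(src.get_weights())
clf.compile(optimizer='adam', loss='sparse_categorical_crossentropy', metrics=['accuracy'])
clf.fit(X_tilde_train, y_train, epochs=30, batch_size=32)
```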
4. Experiments on motor imagery EEG classification
In order to verify the effectiveness of the proposed k-SAE, two data sets are used for the experiments in this paper. One is obtained through our EEG signal acquisition experiments, and the other is a public data set. The proposed algorithm is compared with other advanced EEG classification algorithms on these data sets. Before describing each data set in detail, we briefly describe the EEG acquisition experiment used in this work.
4.1 EEG signal acquisition based on Emotiv
EEG signal acquisition is an essential part of research on the processing and application of EEG. The acquisition device used in this paper is the Emotiv EEG headset developed by Emotiv Systems, as shown in Fig. 5. The electrodes are placed in accordance with the international 10-20 standard electrode placement method. The 14 electrodes of the Emotiv headset are located at AF3, F7, F3, FC5, T7, P7, O1, O2, P8, T8, FC6, F4, F8 and AF4, and the reference electrodes "CMS" and "DRL" are located at P3 and P4. Fig. 6 shows the placement of the Emotiv electrodes. The sampling frequency of the device is 128 Hz. The Emotiv headset comes with the TestBench software, which can record and save the EEG signals.
Fig.5. Emotiv EEG acquisition equipment
Fig.6. The placement position of Emotiv
Fig.7. Sequence diagram of the EEG experiment
Based on the analysis of relevant papers and a large number of pilot experiments [35-38], seven healthy subjects with no history of brain disease were recruited to participate in the motor imagery EEG collection experiments. All subjects are aged between 22 and 24 years old and are right-handed, and none of them had received systematic motor imagery training. The experiments are conducted in a relatively quiet environment. After the electrode cap is placed, the subject sits at a table with both hands resting on it. At the beginning of a recording (t = 0 s), the subject sits quietly and closes the eyes. At t = 2 s, the recorder gives the subject a "left" or "right" audible prompt, and the subject then performs the corresponding left-hand or right-hand imagery task according to the prompt. At t = 4 s, the subject hears a stop instruction, stops the task, and gets ready for the next trial after a short break. Fig. 7 shows the process of one signal acquisition. Each subject repeats each type of trial 120 times; that is, for each subject, 120 left-hand motor imagery trials and 120 right-hand trials are recorded, giving a total of 240 samples per subject covering the two categories of motor imagery.
Extensive literature and experiments have shown that 4 to 12 channels over the motor brain regions are sufficient to recognize the imagined movement type [39, 40]. Compared with using the channels of the whole brain, the classification accuracy obtained with the channels over the corresponding motor regions is not much different, but the speed is qualitatively improved. Selecting a small number of the most suitable channels reduces the computational complexity of the model and the data storage space while maintaining accuracy. The F3, F4, FC5, FC6, T7 and T8 electrodes are located over the motor area of the primary sensorimotor cortex, and changes in the EEG signals acquired there reflect the corresponding changes in brain state when the subject imagines left- and right-hand movements. Therefore, in this experiment, the six channels located over the motor perception area, F3, F4, FC5, FC6, T7 and T8, are selected to identify the EEG signals of left/right movement imagination. Considering that the subjects may not be fully focused when they first start the imagery task, each channel retains only the data from the 3rd to the 4th second, that is, each sample contains 128 data points per channel.
4.2 Data set
Dataset 1 uses the raw data collected directly in the EEG acquisition experiment described above. This data set contains two classes of motor imagery EEG signals (left-hand and right-hand) obtained from seven different subjects. Each subject completes 240 trials (half for each class of MI). The sampling frequency is 128 Hz, that is, 128 points are sampled per second, and the data from the 3rd to the 4th second of each motor imagery trial are used. A total of seven 128 × 6 × 240 arrays are generated, with 6 being the number of channels. This dataset is divided into a training set and a testing set in a 5:1 ratio.
Dataset 2 is BCI Competition IV data set 2b. This dataset records the EEG data of nine subjects, each recorded in five sessions.
The first two sessions are recorded without feedback on two different days within two weeks, and the other three sessions incorporate online feedback. The imagined tasks are left-hand and right-hand movements. The EEG signals of three channels (C3, Cz, C4) are recorded for each subject at a sampling frequency of 250 Hz. The C3, Cz and C4 electrodes are chosen because they are located over the primary sensorimotor cortex and reflect the most useful information about the changes in brain state when the subject imagines left- and right-hand movements. In particular, the EEG at the C3 and C4 positions reflects the brain state of the regions representing hand movement, so the signals collected from these two channels reflect the motor imagery characteristics of the subject's thinking; Cz serves as a reference electrode between the two. The collected EEG signals are filtered with a 0.5-100 Hz band-pass filter and a 50 Hz notch filter. As in [41], only the third-session data are used for evaluation. 5-fold cross-validation is used to evaluate the performance of all experiments conducted on Dataset 2, which means that the training set and the test set are divided in a ratio of 4:1. More details about the dataset can be found online at http://www.bbci.de/competition/iv/.
4.3 Data processing
In general, the EEG signals obtained from the experimental recordings are not only accompanied by power-frequency interference from the power supply and wires, but are also contaminated by artifacts such as EMG and EOG, which are obstacles to the analysis of EEG signals. Therefore, researchers always perform a series of pre-processing steps before the experiments, such as digital filtering, artifact removal and normalization, to remove this noise and interference and properly improve the signal-to-noise ratio. In this paper, we preprocess the experimental data sets as follows:
Step 1: Remove the mean. The average amplitude is subtracted from each sample so that the mean value of the EEG signal is zero. This operation limits values that differ greatly to a certain range while keeping their relative relations unchanged. During neural network training, when the input feature values are too large, the output of the corresponding activation function changes little and the gradient tends to vanish during backpropagation. When the input values are reduced and limited to a certain range, the vanishing-gradient problem is avoided, which also helps to prevent over-fitting.
Step 2: Signal filtering [42]. Studies have shown that the frequency content of the EEG signal is correlated with the physiological activity of the human brain, and the EEG of different motor imagery tasks differs in its characteristic frequency bands. Choosing an appropriate frequency band yields signals that are more suitable for feature extraction and classification. When a left-hand or right-hand movement is imagined, the ERD phenomenon of the μ rhythm (8-13 Hz) and the β wave (13-30 Hz) is significant and appears in the contralateral sensorimotor area of the cerebral cortex. Therefore, this paper applies 8-30 Hz band-pass filtering to the EEG signals.
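Both preprocessing steps can be expressed with SciPy as in the following sketch; the Butterworth filter order is an assumption, and fs should match the data set (128 Hz for Dataset 1, 250 Hz for Dataset 2).

```python
import numpy as np
from scipy.signal import butter, filtfilt

def preprocess(trials, fs=128, low=8.0, high=30.0, order=4):
    """trials: (n_trials, n_channels, n_samples) raw EEG array."""
    # Step 1: remove the mean of each channel in each trial
    trials = trials - trials.mean(axis=-1, keepdims=True)
    # Step 2: zero-phase 8-30 Hz band-pass covering the mu and beta rhythms
    b, a = butter(order, [low / (fs / 2), high / (fs / 2)], btype='bandpass')
    return filtfilt(b, a, trials, axis=-1)
```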
4.4 Experimental results and analyses
In this section, detailed experiments are performed and the results are presented. The software and computer environment of all simulation experiments are as follows: the application platform is TensorFlow, the processor is an Intel(R) Core(TM) i5-6500 with a clock speed of 3.20 GHz, the RAM is 8 GB, and an NVIDIA GeForce GTX 1050 Ti GPU is used.
As can be seen from Section 3.1, the choice of the parameter k has a great influence on the performance of the model. We observe the recognition accuracy of the corresponding models by setting different k values; each reported result is the best value obtained over repeated runs. Fig. 8 shows the classification performance of the proposed method with different k values (k = 5, k = 10, k = 15). In most cases, k = 10 nearest neighbors performs better than the other two. Therefore, the model uses k = 10 in the subsequent experiments.
Fig. 8. The recognition accuracy corresponding to different k values

In order to verify the performance of k-SAE, which combines the KNN algorithm with the CAE network, we apply the proposed method to the samples obtained from the left- and right-hand MI EEG acquisition experiments described above. AE+softmax, CNN and CAE+softmax models are built with the same training set and tested with the same test set to verify the superiority of k-SAE. The test results of the four methods are listed in Table 1. The experiments for each method are run repeatedly, and the recognition accuracy shown in Table 1 is the highest result obtained over multiple tests of each method. It can be seen that k-SAE shows a better recognition effect: the recognition accuracies of the 7 subjects obtained by k-SAE are all 90% or above, and higher than those of the other three algorithms.

Table 1. The recognition accuracy (%) of different algorithms for seven subjects

subject        S1     S2     S3     S4     S5     S6     S7
AE+softmax     82.5   90     83.5   95     80.5   62.5   73.5
CNN            85     95     85     97.5   85     72.5   82.5
CAE+softmax    90     92.5   75     97.5   90     80     85
k-SAE          95     100    90     100    100    90     92.5
Seen from its structure, k-SAE can be considered as a standard auto-encoder with a new, transformed input. Such a variant has the same working principle as the denoising auto-encoder (DAE), which makes the model more robust by transforming the input data. However, the basic idea of the DAE is to add noise to the input by randomly corrupting it, whereas k-SAE adds adjacent data to the input data, which is a more purposeful and analytical form of learning. In order to verify that k-SAE is more effective than the DAE for the classification task, we build a DAE model (with a corruption factor of 0.01) and train it on the same data. We define the arithmetic mean of the test results of the seven subjects under each algorithm as the average recognition rate of that algorithm. The experiments show that the average recognition rate of the DAE is 82.79%, while that of k-SAE is 95.36%, so the classification performance of k-SAE is better.
In this paper, ROC curves are also used to evaluate the different classification models. Left-hand motor imagery is taken as the positive class and right-hand motor imagery as the negative class. The ROC curves generated on the test set are compared for the CNN, CAE, DAE and k-SAE methods. Fig. 9 shows the ROC curves of the four methods for all subjects. From the figures we can see that the area under the ROC curve of the proposed method is larger than that of the other three methods in every case, that is, the performance of k-SAE is better.
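For reference, the ROC curves and the area under them used in Fig. 9 can be obtained from the test-set scores with scikit-learn as in this sketch (y_test and scores are placeholder names):

```python
from sklearn.metrics import roc_curve, auc

# y_test: true labels (1 = left-hand MI, the positive class; 0 = right-hand MI)
# scores: the classifier's predicted probability for the positive class
fpr, tpr, _ = roc_curve(y_test, scores)
roc_auc = auc(fpr, tpr)
print(f"AUC = {roc_auc:.3f}")   # a larger area indicates better separation
```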
Fig. 9. The ROC curves of the subjects under different methods

To further illustrate the superior performance of k-SAE, its recognition accuracy is compared with other related state-of-the-art methods, such as the improved discriminative filter bank selection approach using mutual information (I-DFBCSP) [43], tangent space mapping using the common spatial pattern (CSP-TSM) [44], the multi-kernel extreme learning machine (MKELM) [45], the dual-tree complex wavelet transform (DTCWT) [46] and a hybrid method based on multivariate empirical mode decomposition and the short-time Fourier transform (MEMD-STFT) [47]. For I-DFBCSP and CSP-TSM, an RBF-kernel SVM is used as the classifier; DTCWT and MEMD-STFT use KNN as the classifier, with K = 7. In addition, considering that EEG is time-series data, we also build a popular LSTM [48] network to compare with the model proposed in this paper. The accuracy of the different algorithms in identifying the EEG signals of the 7 subjects is shown in Table 2. The recognition accuracy of each method is tested run by run, and Table 2 records the average results of 15 experiments.
It can be seen from Table 2 that, for all subjects, k-SAE outperforms the other six feature extraction algorithms. With k-SAE, the recognition rate of all 7 subjects reaches more than 89%, and the average recognition rate over the 7 subjects is 94.81%. In addition, the differences between the subjects' test results are small. This means that the proposed method has a higher recognition rate and better generalization ability than the previous methods.

Table 2. The recognition accuracy (%) of different algorithms for seven subjects

subject    I-DFBCSP   CSP-TSM   MKELM   DTCWT   MEMD-STFT   LSTM    k-SAE
S1         90.00      91.33     92.72   92.50   90.83       89.50   94.17
S2         71.30      70.83     92.83   70.83   72.50       94.83   99.83
S3         88.17      86.67     88.33   87.08   87.50       85.00   89.67
S4         97.86      97.50     98.74   99.17   98.33       96.50   100.00
S5         95.33      89.17     92.53   94.17   93.33       95.50   98.83
S6         75.67      70.83     82.03   78.34   77.50       87.00   89.47
S7         84.13      84.03     89.73   82.41   81.67       91.50   91.67
Average    86.07      84.34     90.99   86.36   85.95       91.40   94.81
To further evaluate and compare the computational complexity of the proposed method with that of the other algorithms, the test time taken by each method is measured. All tests are run in the same laboratory environment and with the same settings, and all samples come from subject S3. It can be seen from Table 3 that k-SAE has the shortest test time among these networks. Once a network is trained, the model is fixed, and the testing time is what matters in real-time applications. Therefore, the trained network model proposed here can be applied in a real-time BCI auxiliary system.

Table 3. The test time required by different algorithms for subject S3

Method          I-DFBCSP   MKELM   DTCWT   LSTM    k-SAE
Test time (s)   14.43      12.95   18.63   13.57   12.63
In order to verify the recognition effect of k-SAE on BCI Competition IV data set 2b, the recognition results are compared with those of I-DFBCSP, CSP-TSM, MKELM, DTCWT and MEMD-STFT; 5-fold cross-validation is used to evaluate the performance of all experiments, and the table also lists the results of the top three competitors, Chin, Gan and Coyle [49]. From the experimental results in Table 4, we can see that the classification results of the subjects differ even under the same algorithm. Motor imagery EEG signals are abstract, imagined signals that are difficult to quantify and highly susceptible to psychological, physical and environmental factors. With differences in individual imagination and exercise habits, the imagined signal data differ between subjects, and even the same individual shows large fluctuations in the data of the same imagery task at different times [21]. This means that the processing algorithms need strong generalization ability and adaptability to ensure good interaction and communication between people and machines. The results in Table 4 show that, for most subjects, k-SAE is superior to the other algorithms, and it outperforms all competing methods in average recognition accuracy. A paired t-test is further applied on the dataset to investigate the significance of the differences among the methods. The result reveals that k-SAE achieves significantly higher classification accuracies than the other algorithms (k-SAE > Chin: p = 0.002 (<0.05), k-SAE > Gan: p = 0.007 (<0.05), k-SAE > Coyle: p = 0.000 (<0.05), k-SAE > I-DFBCSP: p = 0.007 (<0.05), k-SAE > CSP-TSM: p = 0.004 (<0.05), k-SAE > MKELM: p = 0.085 (<0.1), k-SAE > DTCWT: p = 0.002 (<0.05), k-SAE > MEMD-STFT: p = 0.006 (<0.05)). Among the subjects, the six subjects B04-B09 achieve better classification results, while the classification accuracies of subjects B02 and B03 are poor under all the algorithms. The reason for the low classification accuracy of these two subjects is probably that they could not control their own thinking well enough to carry out the motor imagery tasks [50]. Therefore, the ability of the experimental subjects to control their own thinking is also very important for EEG signal processing.

Table 4. The recognition accuracy (%) of different algorithms on BCI Competition IV data set 2b
subject    Chin    Gan     Coyle   I-DFBCSP   CSP-TSM   MKELM   DTCWT   MEMD-STFT   k-SAE
B01        70.00   71.00   60.00   73.91      72.58     77.50   57.63   56.56       75.71
B02        61.00   61.00   56.00   58.37      60.56     64.40   59.82   61.25       64.29
B03        61.00   57.00   56.00   56.24      60.94     54.30   55.07   55.31       62.86
B04        98.00   97.00   89.00   99.67      99.25     99.30   97.29   96.87       100.00
B05        93.00   86.00   79.00   90.58      93.57     84.60   87.14   86.56       94.29
B06        81.00   81.00   75.00   78.81      76.99     69.50   74.70   77.50       84.29
B07        78.00   81.00   69.00   81.26      78.30     86.80   78.41   79.06       82.86
B08        93.00   92.00   93.00   93.35      92.51     89.90   72.93   71.88       94.29
B09        87.00   89.00   73.00   86.75      85.78     83.70   77.63   81.25       87.14
Average    80.22   79.44   72.22   79.88      80.05     78.89   73.40   74.03       82.86
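The paired t-tests reported above compare the per-subject accuracies of k-SAE with those of each competing method. With SciPy this can be reproduced along the following lines, here using the Table 4 values for k-SAE and Chin; note that ttest_rel returns a two-sided p-value, and whether a one- or two-sided test was used is not stated in the text.

```python
from scipy import stats

# per-subject accuracies from Table 4 (subjects B01-B09)
k_sae = [75.71, 64.29, 62.86, 100.00, 94.29, 84.29, 82.86, 94.29, 87.14]
chin  = [70.00, 61.00, 61.00, 98.00, 93.00, 81.00, 78.00, 93.00, 87.00]

t, p = stats.ttest_rel(k_sae, chin)   # paired t-test across subjects
print(f"t = {t:.3f}, p = {p:.4f}")
```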
5. Conclusion
Based on deep learning theory, this paper proposes an innovative k-SAE network to realize the feature extraction and classification of motor imagery EEG signals. The structure of the model not only retains the feature information of similar samples, but also preserves the integrity of important information to the greatest extent and achieves data smoothing. The proposed method is applied to the experimentally collected data and to one public dataset, and the model and the other classification methods are evaluated using the recognition accuracy and the ROC curve. The experiments show that k-SAE performs better on the classification tasks and has stronger robustness and better generalization ability. The k-SAE network provides a novel and effective method for the classification and recognition of motor imagery EEG signals. In the future, we plan to explore the following aspects in more depth: further expanding the number of experimental samples from different experimental populations to improve the application value of the method, and optimizing more network parameters to improve stability and accuracy so as to cope with more complex EEG signal analyses.
Acknowledgement
This work was supported by the National Natural Science Foundation of China under Projects 61673079 and 61703068, and by the Natural Science Foundation of Chongqing under Project cstc2018jcyjA0667.
Conflict of Interest Statement
We declare that there is no conflict of interest with the manuscript entitled “Motor imagery EEG recognition with KNN-based smooth auto-encoder”. We confirm that all authors of this manuscript have directly participated in planning, execution and analysis of this study. We confirm that the contents of this manuscript have not been copyrighted or published previously. In addition, the contents of this manuscript are not under consideration for publication elsewhere.
References
[1] Barry L. Jacobs, Casimir A. Fornal. Activity of Brain Serotonergic Neurons in Relation to Physiology and Behavior[J]. Handbook of Behavioral Neuroscience, 2010, 21.
[2] Lynn M. McCane, Susan M. Heckman, Dennis J. McFarland, George Townsend, Joseph N. Mak, Eric W. Sellers, Debra Zeitlin, Laura M. Tenteromano, Jonathan R. Wolpaw, Theresa M. Vaughan. P300-based brain-computer interface (BCI) event-related potentials (ERPs): People with amyotrophic
lateral sclerosis (ALS) vs. age-matched controls[J]. Clinical Neurophysiology,2015,126(11). [3] Fei Wang,Yanbin He,Jun Qu,Qiuyou Xie,Qing Lin,Xiaoxiao Ni,Yan Chen,Jiahui Pan,Steven Laureys,Ronghao Yu,Yuanqing Li. Enhancing clinical communication assessments using an audiovisual BCI for patients with disorders of consciousness[J]. Journal of Neural Engineering,2017,14(4). [4] Lazarou Ioulietta,Nikolopoulos Spiros,Petrantonakis Panagiotis C,Kompatsiaris Ioannis,Tsolaki Magda. EEG-Based Brain-Computer Interfaces for Communication and Rehabilitation of People with Motor Impairment: A Novel Approach of the 21 st Century[J]. Frontiers in human neuroscience,2018,12. [5] Hassan A R , Hassan Bhuiyan M I . Automatic sleep scoring using statistical features in the EMD domain and ensemble methods[J]. Biocybernetics and Biomedical Engineering, 2015:S0208521615000844. [6] Hassan A R , Bhuiyan M I H . Computer-aided sleep staging using Complete Ensemble Empirical Mode Decomposition with Adaptive Noise and bootstrap aggregating[J]. Biomedical Signal Processing and Control, 2016, 24:1-10. [7] Hassan A R , Bhuiyan M I H . A decision support system for automatic sleep staging from EEG signals using tunable Q-factor wavelet transform and spectral features[J]. Journal of Neuroscience Methods, 2016, 271:107-118. [8] Hassan A R , Bhuiyan M I H . Dual tree complex wavelet transform for sleep state identification from single channel electroencephalogram[C]// IEEE International Conference on Telecommunications & Photonics. IEEE, 2016. [9] Subasi A . Application of adaptive neuro-fuzzy inference system for epileptic seizure detection using wavelet feature extraction[J]. Computers in Biology and Medicine, 2007, 37(2):227-244. [10] Hassan A R , Haque M A . [IEEE TENCON 2015 - 2015 IEEE Region 10 Conference - Macao (2015.11.1-2015.11.4)] TENCON 2015 - 2015 IEEE Region 10 Conference - Epilepsy and seizure detection using statistical features in the Complete Ensemble Empirical Mode Decomposition domain[C]// Tencon IEEE Region 10 Conference. IEEE, 2015:1-6. [11] M. Jeannerod. Mental imagery in the motor context[J]. Neuropsychologia,1995,33(11). [12] Ahmed Izzidien,Sriharasha Ramaraju,Mohammed Ali Roula,Peter W. McCarthy,Do-Won Kim. Effect of Anodal-tDCS on Event-Related Potentials: A Controlled Study[J]. BioMed Research International,2016,2016. [13] Saugat Bhattacharyya,Amit Konar,D.N. Tibarewala. A differential evolution based energy trajectory planner for artificial limb control using motor imagery EEG signal[J]. Biomedical Signal Processing and Control,2014,11. [14] Yang Yu,Zongtan Zhou,Erwei Yin,Jun Jiang,Jingsheng Tang,Yadong Liu,Dewen Hu. Toward brain-actuated car applications: Self-paced control with a motor imagery-based brain-computer interface[J]. Computers in Biology and Medicine,2016,77. [15] Javier Asensio-Cubero,John Q. Gan,Ramaswamy Palaniappan. Multiresolution analysis over graphs for a motor imagery based online BCI game[J]. Computers in Biology and Medicine,2016,68. [16] Filip Škola,Fotis Liarokapis. Embodied VR environment facilitates motor imagery brain–computer interface training[J]. Computers & Graphics,2018. [17] Alrajhi W, Alaloola D, Albarqawi A. Smart home: toward daily use of BCI-based systems[C]// International Conference on Informatics, Health & Technology. IEEE, 2017:1-5. [18] Bostanov Vladimir. BCI Competition 2003--Data sets Ib and IIb: feature extraction from eventrelated brain potentials with the continuous wavelet transform and the t-value scalogram[J]. 
IEEE Transactions on Biomedical Engineering, 2004, 51(6). [19] WANG Pan, SHEN Ji-Zhong, SHEN Jin-He. Research of P300 Feature Extraction Algorithm Based on Wavelet Transform and Fisher Distance[J]. International Journal of Education and Management Engineering (IJEME), 2011, 1(6). [20] Banghua Yang, Huarong Li, Qian Wang, Yunyuan Zhang. Subject-based feature extraction by using fisher WPD-CSP in brain–computer interfaces[J]. Computer Methods and Programs in Biomedicine, 2016, 129. [21] Hsu W Y, Sun Y N. EEG-based motor imagery analysis using weighted wavelet transform features[J]. Journal of Neuroscience Methods, 2009, 176(2): 310. [22] Koles Z J, Lazar M S, Zhou S Z. Spatial patterns underlying population differences in the background EEG[J]. Brain Topography, 1990, 2(4): 275-284. [23] Hekmatmanesh A, Jamaloo F, Wu H, et al. Common spatial pattern combined with kernel linear discriminate and generalized radial basis function for motor imagery-based brain computer interface applications[C]// American Institute of Physics Conference Series. American Institute of
Physics Conference Series, 2018:020003. [24] Yukun Wen. Online Motor Imagery BCI Based on Adaptive and Incremental Linear Discriminant Analysis Algorithm[A]. IEEE Beijing Section、Guangdong University of Technology、 University of Electronic Science and Technology of China.Proceedings of 2017 9th IEEE International Conference on Communication Software and Networks (ICCSN 2017)[C].IEEE Beijing Section、Guangdong University of Technology、University of Electronic Science and Technology of China:,2017:5. [25] Kottaimalai R, Rajasekaran M P, Selvam V, et al. EEG signal classification using Principal Component Analysis with Neural Network in Brain Computer Interface applications[C]// International Conference on Emerging Trends in Computing, Communication and Nanotechnology. IEEE, 2013: 227-231. [26] Lima C A M , André L.V. Coelho. Kernel machines for epilepsy diagnosis via EEG signal classification: A comparative study[J]. Artificial Intelligence in Medicine, 2011, 53(2):83-95. [27] Cecotti H, Gräser A. Convolutional neural networks for P300 detection with application to braincomputer interfaces[J]. IEEE Transactions on Pattern Analysis & Machine Intelligence, 2011, 33(3):433-445. [28] Yang H, Sakhavi S, Kai K A, et al. On the use of convolutional neural networks and augmented CSP features for multi-class motor imagery of EEG signals classification.[C]// Engineering in Medicine and Biology Society. IEEE, 2015:2620-2623. [29] An, Xiu, et al. "A Deep Learning Method for Classification of EEG Data Based on Motor Imagery." Intelligent Computing in Bioinformatics. Springer International Publishing, 2014. 203-210 [30] Xia S, Xiong Z, Luo Y, et al. Location difference of multiple distances based k -nearest neighbors algorithm[J]. Knowledge-Based Systems, 2015, 90(C):99-110. [31] Zhao J, Mathieu M, Goroshin R, et al. Stacked What-Where Auto-encoders[J]. Computer Science, 2016, 15(1):3563-3593. [32] J. Masci, U. Meier, D. Cires¸an, and J. Schmidhuber, “Stacked convolutional auto-encoders for hierarchical feature extraction,” in International Conference on Artificial Neural Networks, pp. 52– 59, 2011. [33] Liu A , Zhao Z , Zhang C , et al. Smooth filtering identification based on convolutional neural networks[J]. Multimedia Tools and Applications, 2016. [34] Hinton G E, Salakhutdinov R R. Reducing the dimensionality of data with neural networks [J]. Science,2006,313 ( 5786) : 504-507. [35] Yu T, Xiao J, Wang F, et al. Enhanced Motor Imagery Traning Using a hybrid BCI with feedback[J]. Biomedical Engineering, IEEE Transactions on, 2015, 62(7):1706-1717. [36] Kim H C , Pang S , Je H M , et al. Pattern Classification Using Support Vector Machine Ensemble[C]// International Conference on Pattern Recognition. IEEE, 2002. [37] Lukáö Ruckay, Sovka P . Selection And Classification Of EEG Movement-Related Independent Components[J]. Analysis of Biomedical Signals & Images, 2008, 6848(2):123-127. [38] Xiao D , Mu Z , Hu J . Classification of Motor Imagery EEG Signals Based on Energy Entropy[C]// International Symposium on Intelligent Ubiquitous Computing & Education. IEEE Computer Society, 2009. [39] Dai R M. The Motor Imagery EEG Classification Based on Deep Learning. Master's Thesis, Beijing Institute Of Technology, 2015, p. 38-39. [40] Zhang S H. Analysis of Motor Imagery EEG. Master's Thesis, Shanghai Jiao Tong University, 2015, p. 41-42. [41] Zhang Y , Zhou G , Jin J , et al. Optimizing spatial patterns with sparse filter bands for motorimagery based brain-computer interface[J]. 
Journal of Neuroscience Methods, 2015: S016502701500285X. [42] Chu H, Jin C, Chen J X, Guo Y X. A 3-D Millimeter-Wave Filtering Antenna With High Selectivity and Low Cross-Polarization[J]. IEEE Trans. on Antennas and Propagation, 2015.05: 2375-2380. [43] Kumar S, Sharma A, Tsunoda T. An improved discriminative filter bank selection approach for motor imagery EEG signal classification using mutual information[J]. BMC Bioinformatics, 2017, 18(S16): 545. [44] Kumar S, Mamun K, Sharma A. CSP-TSM: Optimizing the performance of Riemannian tangent space mapping using common spatial pattern for MI-BCI[J]. Computers in Biology and Medicine, 2017, 91 (Supplement C): 231-242. [45] Zhang Y, Wang Y, Zhou G, et al. Multi-kernel extreme learning machine for EEG classification in brain-computer interfaces[J]. Expert Systems with Applications, 2018, 96: 302-310. [46] Bashar S K, Hassan A R, Bhuiyan M I H. Identification of motor imagery movements from EEG signals using dual tree complex wavelet transform[C]// 2015 International Conference on Advances in Computing, Communications and Informatics (ICACCI). IEEE, 2015: 290-296.
[47] Bashar S K, Hassan A R, Bhuiyan M I H. Motor imagery movements classification using multivariate EMD and short time Fourier transform[C]// 2015 Annual IEEE India Conference (INDICON). IEEE, 2015: 1-6.
[48] Aite Zhao, Lin Qi, Junyu Dong, Hui Yu. Dual channel LSTM based multi-feature extraction in gait for diagnosis of Neurodegenerative diseases[J]. Knowledge-Based Systems, 2018, 145.
[49] Lu N, Li T, Ren X, Miao H. A deep learning scheme for motor imagery classification based on restricted Boltzmann machines[J]. IEEE Transactions on Neural Systems and Rehabilitation Engineering, 2016, 25(6): 566-576.
[50] Zhang Y, Zhou G, Jin J, Wang X, Cichocki A. Optimizing spatial patterns with sparse filter bands for motor-imagery based brain-computer interface[J]. Journal of Neuroscience Methods, 2015, 255: 85-91. (dataset 2)
Xianlun Tang, Professor at the College of Automation, Chongqing University of Posts and Telecommunications. His research interests cover pattern recognition, intelligent systems and deep learning.
Ting Wang, Master's student at the College of Automation, Chongqing University of Posts and Telecommunications. Her research interests cover deep learning and pattern recognition.
Yiming Du, Master's student at the College of Computer Science and Technology, Chongqing University of Posts and Telecommunications. His research interests cover deep learning and image recognition.
Yuyan Dai, Master's student at the College of Automation, Chongqing University of Posts and Telecommunications. Her research interests include neural networks and load forecasting.