Extracting features from phase space of EEG signals in brain–computer interfaces




Author's Accepted Manuscript

Extracting features from phase space of EEG signals in brain-computer interfaces Yonghui Fang, Minyou Chen, Xufei Zheng

www.elsevier.com/locate/neucom

PII: S0925-2312(14)01398-8
DOI: http://dx.doi.org/10.1016/j.neucom.2014.10.038
Reference: NEUCOM14842

To appear in: Neurocomputing

Received date: 24 June 2013
Revised date: 18 June 2014
Accepted date: 13 October 2014

Cite this article as: Yonghui Fang, Minyou Chen, Xufei Zheng, Extracting features from phase space of EEG signals in brain-computer interfaces, Neurocomputing, http://dx.doi.org/10.1016/j.neucom.2014.10.038

This is a PDF file of an unedited manuscript that has been accepted for publication. As a service to our customers we are providing this early version of the manuscript. The manuscript will undergo copyediting, typesetting, and review of the resulting galley proof before it is published in its final citable form. Please note that during the production process errors may be discovered which could affect the content, and all legal disclaimers that apply to the journal pertain.

Extracting features from phase space of EEG signals in brain-computer interfaces

Yonghui Fang (a,b), Minyou Chen (b,*), Xufei Zheng (c)

(a) School of Electronic and Information Engineering, Southwest University, China
(b) State Key Laboratory of Power Transmission Equipment & System Security and New Technology, Chongqing University, China
(c) Faculty of Computer and Information Science, Southwest University, China

Abstract

Conventional feature extraction methods based on autoregressive and amplitude-frequency analysis assume stationarity of the Electroencephalogram signal along short time intervals in Brain-Computer Interface (BCI) studies. This paper proposes a feature extraction method based on phase space for motor imagery task recognition. Reconstructing the phase space of a single nonlinear sequence reveals information hidden in the original series while maintaining the continuity of the original information. Three phase space features (PSF) datasets are used to classify two Graz BCI datasets. The simulations show that linear discriminant analysis (LDA) classifiers based on the PSF outperform all the winners of the BCI Competition 2003 and other similar studies on the same Graz dataset in terms of the competition criteria of mutual information (MI) and misclassification rate. Using the combined PSF, the maximal MI was 0.67 and the minimal misclassification rate was 9.29% for the Graz2003 dataset. According to the competition criterion of maximal MI steepness, the LDA classifier based on PSF yielded a better performance than the winner of the BCI Competition 2005 and other similar research on the same Graz dataset for subject O3, with a maximal MI steepness of 0.7355.

Keywords: Brain-computer interface (BCI), Electroencephalogram (EEG), Phase Space Reconstruction (PSR), Nonlinear, Amplitude-Frequency Analysis

Corresponding author: Minyou Chen Email address: [email protected] (Minyou Chen)

Preprint submitted to Neurocomputing

October 22, 2014

(AFA), Autoregressive (AR) modeling

1. Introduction

Brain-Computer Interface (BCI) is a system that helps severely disabled individuals to drive and control external devices using only their brain activity, without participation of peripheral nerves and muscles [1, 2, 3]. Most noninvasive BCIs measure brain activity through Electroencephalogram (EEG) sensors placed on the head [4]. Although several methods for brain function analysis such as magnetoencephalography (MEG), functional magnetic resonance imaging (fMRI) and positron emission tomography (PET) have been introduced, the EEG signal is still a valuable tool for monitoring brain activity due to its relatively low cost and its convenience for the patient [5]. The real purpose of a BCI is to translate the electrical signal of brain activity into a command for a computer. To achieve this goal, feature extraction and classification algorithms are employed. Currently, the success of motor imagery-based BCI can be attributed to the underlying neurophysiological phenomena accompanying motor imagery, termed event-related desynchronization (ERD) and event-related synchronization (ERS) [6]. Processing of motor commands or somatosensory stimuli causes an attenuation of the rhythmic activity, termed ERD, while an increase in the rhythmic activity is termed ERS [7]. Both phenomena are time-locked but not phase-locked to the event, and they are highly frequency-band specific. For BCI, the important fact is that the ERD is caused also by imagined movements (in healthy users) and by intended movements in paralyzed patients [8]. The µ rhythm (8-13 Hz) and β rhythm (14-25 Hz) originating in the sensorimotor cortex have been postulated to be good signal features for EEG-based BCI [9, 10].

There have been many EEG feature extraction studies in recent years. These studies used different feature extraction techniques to evaluate different classifiers. Rodríguez-Bermúdez et al.
presented an efficient Leave-One-Out estimation method to choose the most useful features extracted by PSD, Hjorth, AR and WT. Their results showed that the proposed method is competitive against other conventional methods [11]. Palaniappan et al. compared the classification performances of univariate autoregressive (AR) and multivariate autoregressive (MAR) models for representing EEG signals that were extracted during different mental tasks. The results showed that

AR coefficients computed with Burg's algorithm are the most suitable feature extraction method in their studies [12]. Schlögl et al. applied the AAR model to single-trial classification of off-line BCI data for the analysis of EEG patterns [13, 14]. Zhang et al. used amplitude-frequency analysis (AFA) and phase synchronization techniques to realize an online BCI [15]. Xu et al. used a fuzzy support vector machine based on statistical features over the set of wavelet coefficients to classify left and right hand motor imagery tasks. The classification results outperformed the winners of the BCI Competitions 2003 and 2005 on the same datasets with the same criteria [16]. Zhou et al. stated that by using bispectrum-based features, all three classifiers (i.e., linear discriminant analysis (LDA), support vector machine (SVM) classifier, and neural network (NN) classifier) outperformed the winner of the BCI Competition 2003 on the same Graz dataset [17]. Park et al. introduced a novel method using an extension of the EMD algorithm to determine asymmetry, the lateralization of brain activity [18].

Many feature extraction methods used today assume that the EEG signals are linear. Many research studies have examined the presence of non-linearity in human EEG signals, with various outcomes. Liley et al. proposed an EEG model to describe the dynamics of neural activity in cortex [19]. Stam analyzed the nonlinear dynamics of EEG and MEG [20]. All these research results have shown that the EEG is nonlinear. Linear feature extraction methods ignore the nonlinear information contained in the signals [21]. So using non-linear methods to analyze EEG signals is appropriate. One of the popular nonlinear methods is phase space reconstruction (PSR), which is a valuable tool for the study of this kind of signal [20, 22]. Presently, many nonlinear features are used to analyze epilepsy and mental fatigue, and many good results have been obtained [21, 23, 24].
However, chaotic indices are seldom used in BCI research. Banitalebi et al. used the largest Lyapunov exponent, mutual information, correlation dimension and minimum embedding dimension as features of EEG signals and obtained some good results [25]. As is well known, the calculation of chaos features (such as the Lyapunov exponent and mutual information) is time-consuming. A practical BCI must have a high information transfer rate, so these chaos features cannot meet the real-time requirements. To address these deficits, we developed a feature extraction scheme based on phase space. The principle of PSR is to transform the properties of a time series into topological properties of a geometrical object which is embedded in a space, wherein all possible states

of the system are represented. Each state corresponds to a unique point, and this reconstructed space shares the same topological properties as the original space [26, 27]. The attractors in phase space are sensitive to the initial conditions: a little difference in the initial state may cause a large change in the trajectory of the attractor. The structure of the phase space can be seen as a filter. Every reconstructed dimension provides a similar frequency structure to the original time series, with variations in phase and amplitude. The trajectory of the EEG attractors may change noticeably when the ERD/ERS phenomenon emerges in motor imagery. So, the features extracted in the phase space of the EEG signals may contain more information than those extracted from the original EEG.

This contribution presents a feature extraction method in phase space. In order to achieve better results, a hybrid feature set is used in this study. In the first stage of the proposed scheme, the phase space of the EEG time series is reconstructed. Then the features are extracted in the phase space by AFA and AR models separately. To evaluate the effectiveness of the feature sets, the LDA classifier is adopted to classify the Graz BCI datasets which were used in the BCI Competition 2003 [28] and BCI Competition 2005 [29]. The combined feature set, merging the ARPS and AFAPS features, is also used to improve the performance of the classifier.

2. Data acquisition and description

In order to verify the effectiveness of the proposed features, two datasets from the BCI Competitions 2003 and 2005 are used in this study. The first dataset is dataset III of BCI Competition 2003 (termed the Graz2003 dataset) and the second is dataset IIIB of BCI Competition 2005 (termed the Graz2005 dataset). All the datasets are provided by the Laboratory of Brain-Computer Interfaces (BCI-Lab), Graz University of Technology [28, 29]. All EEG data were collected during a motor imagery task.
The subjects were asked to control a visual feedback setting by means of imagined left or right hand movements. The order of left and right cues was random. In BCI Competition 2003, the Graz dataset was recorded over C3, Cz and C4 from a normal female subject during a feedback session. The experiment consisted of 7 runs with 40 trials each. Of the available 280 trials, 140 labeled trials were used to train the classifiers, whereas the other 140 trials are now available in [28] for testing the generalization performance of the trained classifiers. The EEG was sampled at a 128-Hz sampling rate and was filtered between 0.5

and 30 Hz. Each trial lasted 9 s. At t=3 s, a left or right arrow was displayed as a cue, and the subject was, at the same time, asked to perform motor imagery along the direction of the cue.

The Graz dataset IIIB of BCI Competition 2005 was recorded over positions C3 and C4 from three subjects (O3, S4 and X11) during a cued motor imagery task with online feedback. The experiment consisted of 3 sessions for each subject, each session consisting of 4 to 9 runs. The recordings were made with a bipolar EEG amplifier. The EEG was sampled at 125 Hz; it was filtered between 0.5 and 30 Hz with a Notch filter on. The duration of each trial was 8 s, with a random interval between trials from 1 to 2 s. At t=3 s, a left or right arrow was displayed as a cue, and the subject was, at the same time, asked to perform motor imagery along the direction of the cue. A more detailed description can be found in [29]. The distribution of available trials in the training and testing datasets for the four subjects is summarized in Table 1.

Table 1: Distribution of trials in the training and testing datasets from the Graz2003 dataset and the Graz2005 dataset. The Graz2005 dataset was generated by different experimental settings for the three subjects.

Subject    Training dataset  Testing dataset  Classes  Feedback
Graz2003   140               140              2        Bar
O3         242               159              2        Virtual environment
S4         540               540              2        Basket
X11        540               540              2        Basket

3. Methods

3.1. Feature extraction from EEG signals

AFA and AR models are the conventional methods to extract features from EEG for BCI. However, both AFA and the AR model assume linearity, Gaussianity and minimum phase within EEG signals; the nonlinear features in EEG are ignored. Here, AFA and AR are used to extract the features in the phase space of EEG to verify the effectiveness of the proposed scheme.

3.1.1. Amplitude-frequency analysis (AFA)

AFA is a method based on the Fast Fourier transform (FFT). The Fourier transform of a sequence x(n) (n = 0, 1, ..., N − 1) with length N yields a Discrete Fourier transform sequence X(k) (k = 0, 1, ..., N/2 − 1) with the

length of N/2. When N can be represented as 2^m (m an integer), the Fourier transform can be calculated by a fast algorithm. In general, it can be expressed as Equation (1):

    X(k) = |X(k)| e^{jϕ(k)}    (1)

where |X(k)| is the amplitude-frequency characteristic, ϕ(k) is the phase-frequency characteristic, the unit of k is Fs/N Hz, and Fs is the sampling frequency.

Many studies have shown that sensory, cognitive and motor processing can result in changes of the ongoing EEG in the form of ERD or ERS. A broad-banded ERD in the 10-Hz and 20-Hz bands at electrode positions C3 and C4 shows a contralateral dominance during right/left hand imagery [9, 10]. Figure 1 shows the averaged amplitude-frequency curves of the EEG signals on C3 and C4 based on the Graz2003 dataset training trials. When imagining left hand movement, the ERD at C4 is stronger than the ERD at C3, and the peak value of the µ/β band over C3 is greater than that over C4; the contrary is true when imagining right hand movement. The change of the peak value over C3 and C4 is consistent with the contralateral dominance phenomenon of ERD. There was a significant difference between left and right motor imagery. In practice, physiologically meaningful EEG features can be extracted from various frequency bands of recorded EEG signals. Here, the peak values of the µ (8-13 Hz) and β (18-25 Hz) bands are used as the features for EEG classification in this study.

3.1.2. Autoregressive (AR) models

Autoregressive (AR) methods have been used in a number of studies to model EEG data by representing the signal at each channel as a linear combination of the signal at previous time points [30]. AR models provide a compact, computationally efficient representation of EEG signals. Furthermore, AR model parameters are invariant to scaling changes in the data that can arise from inter-subject variations, such as scalp and skull thickness. Due to these properties, AR modeling has been used extensively for different EEG analyses such as feature extraction and classification tasks, detection and classification of cardiac arrhythmias, and analysis of epilepsy data [30].
The autoregressive framework assumes that the EEG signal can be modeled as a linear combination of the signals at the previous time points. An autoregressive model of order p for a single channel can be written as:

    y(t) = ∑_{i=1}^{p} α_i y(t − i) + ε_t    (2)



Figure 1: The averaged amplitude-frequency curves for left (a) and right (b) hand motor imagery over C3 and C4 of the training trials in the Graz2003 dataset.

where p denotes the number of time points in the past that are used to model the current time point and ε_t denotes a zero-mean process with variance δ². The parameters of the AR model are the coefficients α_i (i = 1, 2, ..., p) and the noise variance δ² [31]. We use the AR parameters as features for subsequent analysis due to their desirable properties.

3.2. Reconstruction of the system's dynamics in state space

The theory of nonlinear phase space reconstruction was proposed by Packard and Takens [27]. If the system from which the measurements were taken has an attractor, the series of reconstructed vectors constitutes an 'equivalent attractor' [32]. Takens has proven that this equivalent attractor has the same dynamical properties as the true attractor [27]. To remodel a single nonlinear sequence is to reveal the information hidden in the original series by its m-dimensional reconstruction, while maintaining the continuity of the original information. It means that we can obtain valuable information about the dynamics of the system even if we do not have direct access to all the system's variables.

One of the most used methods of phase space reconstruction is time-delay embedding. Since this method does not require an explicit mathematical definition of the treated system, it fits well with 1D time series. Let {x_k : k = 1, 2, ..., N} be an observed time series. From the time series, an

m-dimensional phase space can be remodeled. The reconstructed phase space can be shown as the following trajectory matrix:

        | X_1 |   | x_1  x_{1+τ}  ...  x_{1+(m−1)τ} |
    X = | X_2 | = | x_2  x_{2+τ}  ...  x_{2+(m−1)τ} |    (3)
        | ... |   | ...  ...      ...  ...          |
        | X_M |   | x_M  x_{M+τ}  ...  x_N          |

where M = N − (m − 1)τ, τ is the delay time and m is the embedding dimension. More precisely, when an infinite amount of noise-free data is available and if m is chosen large enough, the collection of these vectors "embeds" the underlying flow for basically any choice of τ. Hence, the m-dimensional space of delay coordinates serves as a pseudo state-space which provides a natural setting to approximate the quantitative aspects of the dynamics.

According to signal theory, a time delay nτ in the time domain becomes a multiplication with the exponential function e^{iωnτ} in the frequency domain under the Fourier transformation [33]. If the Fourier transform of x(t) is X(ω), then:

    x_n(t) = x(t + nτ) ⇒ X_n(ω) = X(ω) · e^{iωnτ}    (4)

So the reconstruction can be described as a product between the original time series and a frequency-dependent function:

    X_n(ω) = X(ω)F_n(ω)    (5)

This means that one needs only the original signal X(ω) and m different reconstruction functions F_n(ω) (one function per embedding dimension) to reconstruct the attractor. This structure can also be seen as a filter: every reconstructed dimension can be seen as a filtered version of the original time series. Hence every reconstructed dimension provides the frequency structure of the original time series with some variations in phase and amplitude.

For various reasons (such as noise, the interference of electrooculographic (EoG) and electromyographic (EMG) activity, and the status of the subjects), the changes of the peak value in the µ band over C3 and C4 may not be obvious, or may even be inconsistent with the ERD, for some trials (Fig. 2), unlike in Figure 1. Trial 27 in the Graz2003 training dataset is the EEG signal for right hand movement imagery. The ERD appears in the µ band when imagining hand movement, so the amplitude of C3 in the µ band should be smaller than that

of C4 according to ERD; however, the opposite is true. The changes in the µ band are adjusted in the phase space because of the effect of the reconstruction function; this is reflected in the amplitude-frequency curve (Fig. 2(b)). From Figure 2(b), the amplitude-frequency curves of C3 and C4 are consistent with ERD after PSR. PSR can reduce the effects of noise or outliers in the EEG signals in BCI. We believe that the features extracted in phase space can help to improve the classification results.
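The trajectory-matrix construction of Equation (3) is straightforward to implement. The following is a minimal sketch in Python with NumPy (the function name is ours, not the paper's; the defaults m = 2, τ = 3 match the values used for Fig. 2):

```python
import numpy as np

def reconstruct_phase_space(x, m=2, tau=3):
    """Trajectory matrix of Eq. (3): an M x m matrix with M = N - (m-1)*tau.

    Row i is the delay vector (x_i, x_{i+tau}, ..., x_{i+(m-1)tau}).
    """
    x = np.asarray(x)
    M = len(x) - (m - 1) * tau
    # Column j holds the series delayed by j*tau samples
    return np.column_stack([x[j * tau: j * tau + M] for j in range(m)])

x = np.arange(10)                     # toy series standing in for one EEG channel
X = reconstruct_phase_space(x, m=2, tau=3)
print(X.shape)                        # (7, 2): M = 10 - (2-1)*3 = 7
print(X[0])                           # [0 3]: the first delay vector
```

Each row of the returned matrix is one reconstructed state; each column is one "filtered version" of the original series in the sense of Equation (5).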


Figure 2: Amplitude-frequency curves before PSR (a) and after PSR (b) (m = 2, τ = 3) for trial 27 in the Graz2003 training set.
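The filter interpretation of Equations (4)-(5) can be checked numerically: the DFT of a delayed copy of a series equals the DFT of the original multiplied by a pure phase factor. A small sketch (using a circular shift, so the identity holds exactly; variable names are ours):

```python
import numpy as np

# Circularly delay a series by n*tau samples and compare spectra:
# X_n(omega) = X(omega) * exp(i * omega * n * tau)   (Eq. (4))
rng = np.random.default_rng(0)
x = rng.standard_normal(128)
n, tau = 1, 3

X = np.fft.fft(x)
X_shifted = np.fft.fft(np.roll(x, -n * tau))     # x(t + n*tau), circular shift
omega = 2 * np.pi * np.fft.fftfreq(len(x))       # angular frequency per sample
F = np.exp(1j * omega * n * tau)                 # the "reconstruction function" F_n
print(np.allclose(X_shifted, X * F))             # True: delay acts as a phase filter
```

For a finite, non-circular delay (as in the trajectory matrix) the identity holds only approximately at the segment edges, which is consistent with the paper's description of "some variations in phase and amplitude".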

3.3. Parameter selection

There are several parameters which must be chosen before extracting the features, such as the order p of the AR model, the embedding dimension m and the delay time τ. To better represent these signals in their phase space, m and τ should be determined very carefully. There are many methods for choosing τ and m, such as the C-C method and the mutual information method [34, 35]. However, the EEG data are obtained from the scalp and are always contaminated by noise and artifacts due to EMG and EoG activity; it is difficult to obtain a precise choice of m and τ for each trial of the EEG. On the other hand, calculating m and τ for the EEG signal of each channel in each trial is time-consuming and cannot meet the online requirement of a practical BCI system. The goal of this study is to obtain a good feature set, not to reconstruct the entire phase space accurately for the EEG signals. Many research studies on phase space reconstruction have indicated that many chaotic systems, such as the Lorenz system, the Rössler system and EEG, are not sensitive to the selection of the embedding dimension m, but are sensitive to the selection of the time lag τ [36].

So the value 2 is selected as the embedding dimension m. On the other hand, we predefined the range of the parameters from empirical results: the scope of τ is between 2 and 5, and the range of p is between 5 and 8. Figure 3 shows the Maximal Lyapunov Exponent (MLE) of the Graz2003 dataset when m = 2. All the MLE values are greater than 0, which shows that the attractors can be reconstructed with m = 2.


Figure 3: The Maximal Lyapunov Exponent of Graz2003 dataset.

To find the optimal τ and p, we estimate the performance of the scheme by means of a ten-fold cross-validation procedure on the corresponding training dataset. The ten-fold cross-validation procedure can be described as follows. First, the training dataset is randomly partitioned into ten equal subsets; a single subset is retained as the validation data for testing the method, and the remaining nine subsets are used as training data. For a selected parameter set, the cross-validation process is repeated ten times, with each of the ten subsets used exactly once as the validation data. The ten results from the folds are then averaged to produce a single estimate. The parameters are then changed and the procedure repeated until all the parameter combinations have been validated. Finally, the parameters corresponding to the best result are chosen to evaluate the testing dataset. In order to determine the parameters, the cross-validation must be carried out repeatedly.

3.4. Linear Discriminant Analysis (LDA) classifier

In order to demonstrate the effectiveness of the proposed features in BCI applications, the Linear Discriminant Analysis (LDA) classifier is used in this paper [37].
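The grid search over τ and p via ten-fold cross-validation with an LDA classifier can be sketched as follows. This is an illustrative stand-in, not the authors' code: synthetic two-class "trials" replace the Graz EEG data, the AR fit uses ordinary least squares rather than the Burg/AAR estimators of the cited work, and all helper names are ours.

```python
import numpy as np
from sklearn.discriminant_analysis import LinearDiscriminantAnalysis
from sklearn.model_selection import cross_val_score

def arps_features(trial, tau, p, m=2):
    """AR(p) coefficients of each dimension of an m, tau delay embedding."""
    M = len(trial) - (m - 1) * tau
    feats = []
    for j in range(m):
        y = trial[j * tau: j * tau + M]
        # Least-squares fit of Eq. (2): predictors are y(t-1)..y(t-p)
        X = np.column_stack([y[p - i:len(y) - i] for i in range(1, p + 1)])
        alpha, *_ = np.linalg.lstsq(X, y[p:], rcond=None)
        feats.extend(alpha)
    return np.array(feats)

# Synthetic two-class trials (10 Hz vs 20 Hz rhythm, 2-s window at 128 Hz)
rng = np.random.default_rng(2)
def make_trial(cls):
    t = np.arange(256) / 128.0
    f = 10.0 if cls == 0 else 20.0
    return np.sin(2 * np.pi * f * t) + 0.5 * rng.standard_normal(t.size)

labels = np.array([0, 1] * 50)
trials = [make_trial(c) for c in labels]

# Ten-fold CV over the predefined grid: tau in 2..5, p in 5..8
best = None
for tau in range(2, 6):
    for p in range(5, 9):
        F = np.array([arps_features(tr, tau, p) for tr in trials])
        acc = cross_val_score(LinearDiscriminantAnalysis(), F, labels, cv=10).mean()
        if best is None or acc > best[0]:
            best = (acc, tau, p)
print(best)   # (mean CV accuracy, tau, p) of the best combination
```

On the real data the selected (τ, p) pair is then frozen and reused unchanged on the testing dataset, as described in Section 3.6.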


3.5. Feature extraction based on phase space

Reconstructed phase spaces have been proven to be topologically equivalent to the original system and are therefore capable of recovering the nonlinear dynamics of the generating system [27]. The reconstruction function can tune the amplitude and phase of the original signals while keeping their frequency structure. This implies that we can use both the information of the original signals and the phase space information. The features extracted from the phase space can potentially contain more and/or different information than common feature extraction methods. In order to obtain more efficient features, this paper proposes the following two feature extraction schemes, using the AR model and AFA separately.

3.5.1. Extracting features by AR in phase space (ARPS)

(1) Reconstruct the phase space of C3 and C4 with the selected m and τ for each trial;
(2) Use the AR model to model each dimension in the phase space of the C3 and C4 channels separately and concatenate the AR parameters to form a single feature vector. If the embedding dimension is m and the order of the AR model is p, the feature vector has 2 × m × p features for each trial.

3.5.2. Extracting features by AFA in phase space (AFAPS)

(1) Reconstruct the phase space of C3 and C4 with the selected m and τ for each trial;
(2) Transform each dimension in the phase space of C3 and C4 by FFT separately. Take the peak values in the µ band (8-13 Hz) and β band (18-25 Hz) as the features, and concatenate the parameters of each dimension to form a single feature vector. If the embedding dimension is m, the feature vector has 2 × m × 2 features for each trial.

3.6. Description of the classification procedure

Based on the description in Section 3.5, features can be extracted by ARPS and AFAPS. In order to find out whether combining the two methods produces a better result, the mixed features combining AFAPS with ARPS are also adopted in this study.
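The two per-trial feature vectors of Sections 3.5.1-3.5.2 can be sketched as follows (a sketch under the paper's description: the least-squares AR fit stands in for the unspecified AR estimator, and all function names are ours):

```python
import numpy as np

def embed(x, m=2, tau=3):
    """m-dimensional delay embedding; column j is reconstructed dimension j."""
    M = len(x) - (m - 1) * tau
    return np.column_stack([x[j * tau: j * tau + M] for j in range(m)])

def afaps(c3, c4, m=2, tau=3, fs=128.0):
    """AFAPS vector: mu/beta spectral peaks of every reconstructed
    dimension of C3 and C4 -> 2 * m * 2 features per trial."""
    feats = []
    for chan in (c3, c4):
        for dim in embed(chan, m, tau).T:
            amp = np.abs(np.fft.rfft(dim)) / len(dim)
            freqs = np.fft.rfftfreq(len(dim), d=1.0 / fs)
            for lo, hi in ((8, 13), (18, 25)):       # mu and beta bands
                band = (freqs >= lo) & (freqs <= hi)
                feats.append(amp[band].max())
    return np.array(feats)

def arps(c3, c4, p=6, m=2, tau=3):
    """ARPS vector: AR(p) coefficients (least-squares fit) of every
    reconstructed dimension of C3 and C4 -> 2 * m * p features per trial."""
    feats = []
    for chan in (c3, c4):
        for y in embed(chan, m, tau).T:
            X = np.column_stack([y[p - i:len(y) - i] for i in range(1, p + 1)])
            alpha, *_ = np.linalg.lstsq(X, y[p:], rcond=None)
            feats.extend(alpha)
    return np.array(feats)

rng = np.random.default_rng(5)
c3, c4 = rng.standard_normal(256), rng.standard_normal(256)
print(afaps(c3, c4).shape, arps(c3, c4).shape)   # (8,) (24,): 2*2*2 and 2*2*6
```

The mixed AFAPS+ARPS vector is simply the concatenation of the two outputs.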
Features were extracted at every sampling point, with a sliding window size of 2 s, for the Graz2003 dataset (Fig. 4) and the Graz2005 dataset of the BCI Competitions. Figure 4 shows the chart of the sliding window for the Graz2003 dataset. The sliding window begins at t=3 s and ends at t=9 s for the Graz2003 dataset or t=8 s for the Graz2005

dataset. The procedure of the classifiers with the features extracted in phase space can be divided into two parts: the first part trains the classifiers with the training trials, and the second part analyzes the testing trials. The general flow chart is shown in Figure 5, where the left side of the flow chart is the offline training of the classifiers and the right side is the classification of the testing trials.

Figure 4: The chart of the sliding window for the Graz2003 dataset.

Figure 5: Flow chart of the LDA classification with the phase space features on the Graz datasets in the BCI Competition 2003 and 2005.

Some explanation of the classification procedure is as follows:

(1) The parameters used on the testing dataset are the same as those used on the training dataset. All these parameters are chosen based on the training dataset. Once the values of m, τ and order p are selected for one subject, they never change during the classification procedure.
(2) The sliding window with a length of 2 s begins at t=3 s and ends at t=9 s for the Graz2003 dataset or t=8 s for the Graz2005 dataset.
(3) The features extracted from one sliding window at a sampling point train one classifier on the training dataset; this classifier is used to classify the features extracted in the sliding window at the same sampling point in the testing dataset.

4. Results and discussions

The benchmark EEG datasets which were used in the BCI Competitions 2003 and 2005 are used to verify the effectiveness of the proposed scheme. All simulations were carried out in the MATLAB 7.10 (R2010a) environment running on the same machine with 4 GB of memory and a 3.10 GHz processor. For the Graz2003 dataset, only the sliding windows from t=3 s to t=9 s were used, according to the time scheme of the 2003 experimental paradigm [28].

Because the mutual information (MI) also takes into account the magnitude of the outputs, there is a close relationship between the error rate and the MI [38]. The MI was used as the criterion to compare the performances of different methods in BCI Competition 2003. Suppose D_i is the distance measure for classification. Ideally, D_i > 0 if the i-th trial is a left trial and D_i < 0 for all right trials. In practice, other processes not correlated to the class relation also influence D_i. One can say D_i consists of two types of processes: one correlated to the movement imagery task, containing useful information (signal), and processes not related to the task (noise). The Signal-to-Noise Ratio (SNR) (for a fixed time point t within a trial) is defined as Equation (6) [38]:

    SNR_t = 2 · var(D_t^i)_{i∈(L,R)} / [var(D_t^i)_{i∈(L)} + var(D_t^i)_{i∈(R)}] − 1    (6)

where (L) and (R) are the sets of left and right trials, and var(.) is the variance over all trials i. The mutual information between the BCI output D and the class relationship can be determined by calculating the difference between the entropy for the total variance and the within-class variance of

D. This leads to the following formula for the mutual information at each possible classification time t:

    I_t = 0.5 · log2(1 + SNR_t)    (7)
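Equations (6)-(7) translate directly into code. A small sketch (a toy check, not competition code; variable names are ours):

```python
import numpy as np

def mutual_information(D, labels):
    """Eqs. (6)-(7): SNR_t and I_t from classifier outputs at one time point.

    `D` holds the signed classifier distances for all trials at time t;
    `labels` marks left (0) / right (1) trials.
    """
    D = np.asarray(D, dtype=float)
    left, right = D[labels == 0], D[labels == 1]
    snr = 2.0 * D.var() / (left.var() + right.var()) - 1.0   # Eq. (6)
    return 0.5 * np.log2(1.0 + snr)                          # Eq. (7)

# Informative outputs give high MI; class-independent noise gives ~0 bits
rng = np.random.default_rng(4)
labels = np.array([0] * 100 + [1] * 100)
good = np.where(labels == 0, 1.0, -1.0) + 0.1 * rng.standard_normal(200)
noise = rng.standard_normal(200)
print(mutual_information(good, labels) > mutual_information(noise, labels))
```

Because the between-class separation inflates the total variance relative to the within-class variances, well-separated outputs yield a large SNR and hence a large I_t, while outputs unrelated to the class give SNR near 0 and I_t near 0 bits.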

Figure 6 depicts the time course of the MI obtained by AFA and AFAPS. It is obvious that the MI of AFAPS is higher than that of AFA most of the time. The features extracted by AFAPS improved the classification results, especially between t=3 s and t=6 s. Figure 7 shows the MI time courses of AFAPS on the training trials and the test trials. The blue line is the MI time course on the training dataset and the red one is on the testing dataset. It is clearly seen that the MI of the training trials is always higher than that of the testing trials. This is reasonable because the parameters were all obtained from the training dataset and the classifiers were also trained on the training dataset. The MI courses of the LDA classifiers with the features extracted by AFAPS+ARPS, AFAPS and ARPS are shown in Figure 8. The features extracted by AFAPS+ARPS provided a better result than AFAPS or ARPS alone.

Figure 6: The MI time courses of LDA classifiers with the features extracted by AFA and AFAPS: at t=3s the cue(left or right in random order)was presented.

Table 2 gives the performances of the features extracted in phase space and in the original EEG by AFA and AR. It can be seen from Table 2 that the features in phase space are always superior to those from the original EEG data. Only the maximal MI of ARPS is equal to the maximal MI of AR, while ARPS has a smaller minimal misclassification rate of 11.43%, compared with 13.14% for the AR features. Table 3 ranks the performance of the proposed methods, the winning methods of the BCI Competition 2003, and the methods used in [16] and [17]. The combined features extracted by AFAPS+ARPS


Figure 7: The MI time courses of training trials and testing trials based on the features extracted by AFAPS; at t=3 s the cue (left or right in random order) was presented.

Table 2: Classification results of the LDA using the features extracted in phase space and in the original EEG on the Graz dataset of BCI Competition 2003 in terms of the MI criterion.

Methods      Maximal MI (bit)  Minimal misclassification rate (%)  Classification time T (s)
AFA          0.56              9.29                                5.8
AFAPS        0.63              9.29                                4.47
AR           0.57              13.14                               5.76
ARPS         0.57              11.43                               5.73
AFA+AR       0.60              10                                  6.15
AFAPS+ARPS   0.67              9.29                                5.76

have the best maximal MI of 0.67 and minimal misclassification rate of 9.29%, which outperform the results achieved by [16], [17] and the first winner of the BCI Competition 2003 on the same Graz dataset. With LDA classifiers in both cases, AFAPS has the same maximal MI of 0.63 as the method proposed in [17], while AFAPS has a smaller minimal misclassification rate of 9.29%, compared with 10.71% in [17]. For the Graz2003 dataset, the LDA classifier works better with the features extracted by AFAPS+ARPS than with AFAPS or ARPS alone. All results shown in Table 2 and Table 3 demonstrate that the features extracted in phase space improve the classification results for the Graz2003 dataset of BCI Competition 2003.

The EEG dataset used in the BCI Competition 2005 is another benchmark problem. It was also used to validate the features extracted in phase space for classifying the motor imageries. The dataset was generated by different experimental settings and three subjects (O3, S4 and X11). The sliding windows slid from t=3 s to t=8 s point by point. In BCI Competition

Figure 8: The MI time courses of the LDA classifier with the features extracted by AFAPS+ARPS (top), AFAPS (middle) and ARPS (bottom) (mutual information [bit] versus time [s]): at t=3s the cue (left or right in random order) was presented.

2005, the method with the maximum increase of the mutual information (maximum steepness, calculated as MI(t)/(t − 3s) for t > 3.5s) was used for validation. Only t > 3.5s is evaluated in order to avoid a stimulus-response mechanism. The "steepness" of the mutual information quantifies the response time [39]. It can be seen that the phase space features work better than the features in the original EEG for subject O3, while they provide only a slightly better result for subjects S4 and X11 (Table 4). According to Table 4, the greatest maximal MI steepnesses are 0.7355 for O3, 0.1184 for S4 and 0.1640 for X11. The AFAPS+ARPS feature dataset improves the maximal MI steepness for all subjects. The AFA features provided poor results for subjects S4 and X11 under this evaluation criterion. The maximal MI steepness of AFAPS is even smaller than that of AFA for subject X11, although AFAPS has a greater maximal MI than AFA in our experiments: the maximal MI with AFAPS is 0.2506, whereas it is 0.2365 with AFA. The phase space features may not always improve the results according to the maximal MI steepness criterion, while the combined features extracted by AFAPS+ARPS work well for every subject.

Table 3: Ranking order of the LDA classifier using the features extracted by the proposed method, the BCI Competition 2003 winning methods, and the methods in [16] and [17] on the Graz2003 dataset in terms of the MI criterion.

Ranking   Methods                                    Maximal MI (bit)   Minimal misclassification rate (%)   Classification time T (s)
1         AFAPS+ARPS                                 0.67               9.29                                 5.76
2         FSVM with wavelet-based features in [16]   0.66               12.14                                5.92
3         NN in [17]                                 0.64               10                                   −
4         LDA in [17]                                0.63               10.71                                −
4         AFAPS                                      0.63               9.29                                 4.47
5         BCI-Comp2003-1st winner                    0.61               10.71                                7.59
6         ARPS                                       0.57               12.14                                5.73
7         BCI-Comp2003-2nd winner                    0.46               15.71                                5.05

Table 5 ranks the performances of the proposed methods, together with the BCI Competition 2005 winning methods and the methods in [16], according to the averaged maximal MI steepness of all subjects on the Graz2005 dataset. The features extracted in phase space achieved maximal MI steepnesses of 0.6860, 0.5074 and 0.7355 on the test dataset from subject O3, all of which are greater than the one achieved by the winner of the BCI Competition 2005. However, the performances of these features are worse than those of the winning methods for subjects S4 and X11, where ARPS and AFAPS+ARPS only outperform the method proposed in [16]. A comparison between AFAPS+ARPS and the method proposed in [11] according to the maximal classification rate is shown in Table 6. The AFAPS+ARPS method achieves a better maximal classification accuracy for most of the subjects; the method in [11] only worked well for subject S4 under this criterion. The performance of the classification depends on the selection of several parameters, i.e., the dimension m of the phase space, the time lag τ and the order p of the AR model. In this study, all parameters were chosen based only on the training dataset. These parameters were not the same for each subject, because no two people are identical and EEG signals vary between subjects. Table 7 lists all the parameters of AFAPS+ARPS used when obtaining the results shown in Table 2 and Table 4. It can be seen that the value of the dimension

Table 4: Classification results of the LDA using the features extracted in phase space and in the original EEG on the Graz2005 dataset in terms of the maximal MI steepness.

Method        Maximal MI steepness (bit/s)
              O3       S4       X11      mean
AFA           0.6178   0.0391   0.0636   0.2402
AFAPS         0.6860   0.0413   0.06     0.2624
AR            0.4969   0.1023   0.1546   0.2512
ARPS          0.5074   0.1024   0.1570   0.2556
AFA+AR        0.7183   0.1130   0.1513   0.3275
AFAPS+ARPS    0.7355   0.1184   0.1640   0.3393
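The competition's steepness criterion (the maximum of MI(t)/(t − 3s) over t > 3.5s) can be sketched as follows; the function name and the toy MI course are illustrative assumptions, not taken from the competition code:

```python
import numpy as np

def max_mi_steepness(mi, times, cue_time=3.0, min_time=3.5):
    """Maximal steepness of a mutual-information time course:
    max over t > min_time of MI(t) / (t - cue_time)."""
    mi = np.asarray(mi, dtype=float)
    times = np.asarray(times, dtype=float)
    mask = times > min_time
    return float(np.max(mi[mask] / (times[mask] - cue_time)))

# Toy MI course sampled at t = 4, 5 and 6 s after trial onset
print(max_mi_steepness([0.4, 0.6, 0.65], [4.0, 5.0, 6.0]))  # 0.4
```

An early, moderate MI can thus outrank a later, larger one, which is why the criterion rewards fast responses.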

Table 5: Ranking order of the LDA classifier using the features extracted by the proposed method and the BCI Competition 2005 winning methods together with the methods in [16] on the Graz2005 dataset in terms of the maximal MI steepness.

Ranking   Method                                   Maximal MI steepness (bit/s)
                                                   O3       S4       X11      mean
1         AFAPS+ARPS                               0.7355   0.1184   0.1640   0.3393
2         AFA+AR                                   0.7183   0.1130   0.1513   0.3275
3         BCI-Comp2005-1st winner                  0.1698   0.4382   0.3489   0.3190
4         FSVM with wavelet-based features [16]    0.6711   0.0718   0.0809   0.2764
5         AFAPS                                    0.6860   0.0413   0.06     0.2624
6         ARPS                                     0.5074   0.1024   0.1570   0.2556
7         BCI-Comp2005-2nd winner                  0.1626   0.4174   0.1719   0.2506

m is two for all subjects, while the other parameters differ between subjects. It is known that EEG signals are nonlinear, yet many feature extraction methods ignore this characteristic, so the features they extract carry no nonlinear information. By reconstructing the phase space, more information about the dynamics of the system is obtained, even without direct access to all of the system's variables [27]. We believe that the features extracted in phase space can provide more valuable information to improve the classification. In the present study, only AFA features and AR coefficients in phase space are used to solve the specific tasks on the Graz datasets of the BCI Competitions 2003 and 2005. The simulations showed that the phase space features outperformed most of the existing methods.

Table 6: Classification accuracy results (in %) using the proposed method and the method in [11].

Methods                             Graz2003     O3           S4           X11
AFAPS+ARPS for cross-validation     90.12±0.84   90.63±3.99   78.52±3.89   84.81±3.56
AFAPS+ARPS for testing dataset      90.71        87.5         75.10        84.54
Method in [11]                      82.14        68.95        77.78        67.18
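The cross-validation row of Table 6 can be reproduced in outline with a k-fold split; the nearest-class-mean classifier below is a simplified stand-in for the paper's LDA, and the synthetic two-class data are placeholders for the AFAPS+ARPS features:

```python
import numpy as np

def kfold_accuracy(X, y, k=10, seed=0):
    """Mean and standard deviation of classification accuracy over k
    folds, using a nearest-class-mean classifier fit on each training
    split and evaluated on the held-out fold."""
    rng = np.random.default_rng(seed)
    idx = rng.permutation(len(X))
    accs = []
    for fold in np.array_split(idx, k):
        train = np.setdiff1d(idx, fold)
        # Class means estimated on the training split only
        means = {c: X[train][y[train] == c].mean(axis=0)
                 for c in np.unique(y)}
        pred = [min(means, key=lambda c: np.linalg.norm(x - means[c]))
                for x in X[fold]]
        accs.append(np.mean(pred == y[fold]))
    return float(np.mean(accs)), float(np.std(accs))

rng = np.random.default_rng(1)
X = np.vstack([rng.normal(0.0, 1.0, (60, 3)),    # e.g. left-hand trials
               rng.normal(2.5, 1.0, (60, 3))])   # e.g. right-hand trials
y = np.r_[np.zeros(60, int), np.ones(60, int)]
mean_acc, std_acc = kfold_accuracy(X, y, k=10)
print(mean_acc > 0.9)
```

Fitting the class statistics only on each training split, as above, is what keeps the cross-validated estimate unbiased.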

Table 7: Parameters used in AFAPS+ARPS for each subject.

Parameter   Graz2003   O3   S4   X11
m           2          2    2    2
τ           2          4    4    5
p           8          8    5    5
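With the parameters of Table 7 in hand, the phase space reconstruction itself is a standard delay embedding; the helper below is a generic sketch (not the authors' code) of how a single EEG channel would be embedded with dimension m and lag τ:

```python
import numpy as np

def delay_embed(x, m=2, tau=2):
    """Delay embedding of a 1-D series x: row i of the result is the
    phase-space point [x[i], x[i + tau], ..., x[i + (m - 1) * tau]]."""
    x = np.asarray(x, dtype=float)
    n = len(x) - (m - 1) * tau
    if n <= 0:
        raise ValueError("series too short for the chosen m and tau")
    return np.column_stack([x[i * tau:i * tau + n] for i in range(m)])

# A 10-sample toy series; m=2, tau=2 as for the Graz2003 subject
points = delay_embed(np.arange(10.0), m=2, tau=2)
print(points.shape)  # (8, 2)
```

Linear features such as AR coefficients or amplitude-frequency measures would then be computed on the embedded coordinate sequences, which is, in outline, how the AFAPS and ARPS feature sets are obtained.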

The key advantages of the proposed feature extraction method are detailed below.

(1) The proposed feature sets contain information hidden in the phase space, while the widely used conventional features capture only linear measures. As a consequence, highly nonlinear signals such as EEG cannot be correctly characterized by linear features alone. Extracting the features in phase space can improve the classification ability of conventional linear methods, because both linear and nonlinear characteristics are represented in the phase space.

(2) The proposed feature extraction method is very flexible; it can employ various linear feature extraction methods, not limited to AFA and AR.

5. Conclusion

This work presents a method to extract features in phase space for classifying EEG signals corresponding to left/right-hand motor imagery. The features were extracted in phase space using the conventional AFA and AR methods separately. Experimental results have shown that, based on the proposed features, the LDA classifiers achieved better classification performance than with the traditional features extracted by AFA and AR from the raw EEG signals. Features extracted in phase space help to improve the classification results for all subjects. According to the experimental results, the combined feature dataset of AFAPS+ARPS always outperformed the phase space features extracted solely by AFA or AR. The features extracted

by AFAPS outperformed the winners of the BCI Competition 2003, and the AFAPS+ARPS features outperform all the winners of the BCI Competition 2003 and other similar studies on the same Graz dataset in terms of either the MI or the misclassification-rate criterion. The phase space features perform well for subject O3 in the Graz2005 dataset, and the AFAPS+ARPS features perform better than the first winning method in terms of the mean maximal MI steepness criterion. For subjects S4 and X11, however, the phase space features extracted by AFAPS+ARPS only worked better than the methods in [16] and worse than the first two winners of the BCI Competition 2005. The AFA and AR features may reflect the temporal response less well than other methods for these two subjects; other features may be more suitable for them, so we will adopt other linear methods to extract features in the phase space in future work. We believe that the phase space features have significant discriminatory ability and can improve linear feature extraction methods such as AFA and AR.

Acknowledgments

This work was supported by the National Natural Science Foundation of China (No. 61303228), the Fundamental Research Funds for the Central Universities, China (XDJK2014C119), and the Doctoral Fund of Southwest University, China (No. SWU112038).

References

[1] H. J. Hwang, K. Kwon, C. H. Im, Neurofeedback-based motor imagery training for brain-computer interface (BCI), Journal of Neuroscience Methods. 179 (1) (2009) 150-156.

[2] J. R. Wolpaw, N. Birbaumer, D. J. McFarland, G. Pfurtscheller, T. M. Vaughan, Brain-computer interfaces for communication and control, Clinical Neurophysiology. 113 (6) (2002) 767-791.

[3] D. J. McFarland, J. R. Wolpaw, Brain-computer interfaces for communication and control, Communications of the ACM. 54 (5) (2011) 60-66.


[4] S. G. Mason, A. Bashashati, M. Fatourechi, K. F. Navarro, G. E. Birch, A comprehensive survey of brain interface technology designs, Annals of Biomedical Engineering. 35 (2) (2007) 137-169.

[5] E. Parvinnia, M. Sabeti, M. Zolghadri Jahromi, R. Boostani, Classification of EEG signals using adaptive weighted distance nearest neighbor algorithm, Journal of King Saud University - Computer and Information Sciences. (2013).

[6] G. Pfurtscheller, Functional brain imaging based on ERD/ERS, Vision Research. 41 (10-11) (2001) 1257-1260.

[7] G. Pfurtscheller, A. Aranibar, Evaluation of event-related desynchronization preceding and following voluntary self-paced movement, Electroencephalogr. Clin. Neurophysiol. 46 (2) (1979) 138-146.

[8] B. Blankertz, R. Tomioka, S. Lemm, M. Kawanabe, K. R. Müller, Optimizing spatial filters for robust EEG single-trial analysis, IEEE Signal Processing Magazine. 25 (1) (2008) 41-56.

[9] G. Pfurtscheller, F. H. Lopes da Silva, Event-related EEG/MEG synchronization and desynchronization: basic principles, Clinical Neurophysiology. 110 (11) (1999) 1842-1857.

[10] G. Pfurtscheller, C. Brunner, A. Schlögl, F. H. Lopes da Silva, Mu rhythm (de)synchronization and EEG single-trial classification of different motor imagery tasks, NeuroImage. 31 (1) (2006) 153-159.

[11] G. Rodríguez-Bermúdez, P. J. García-Laencina, J. Roca-González, J. Roca-Dorda, Efficient feature selection and linear discrimination of EEG signals, Neurocomputing (2013). URL <http://dx.doi.org/10.1016/j.neucom.2013.01.001>.

[12] R. Palaniappan, N. J. Huan, Effects of hidden unit sizes and autoregressive features in mental task classification, World Academy of Science, Engineering and Technology. 12 (2005) 171-176.

[13] A. Schlögl, The Electroencephalogram and the Adaptive Autoregressive Model: Theory and Applications, Shaker, Germany, 2000.


[14] G. Pfurtscheller, C. Neuper, A. Schlögl, K. Luger, Separability of EEG signals recorded during right and left motor imagery using adaptive autoregressive parameters, IEEE Transactions on Rehabilitation Engineering. 6 (3) (1998) 316-325.

[15] A. H. Zhang, B. Yang, L. Huang, W. Y. Jin, Implementation method of brain-computer interface system based on Fourier transform, Journal of Lanzhou University of Technology, China. 34 (4) (2008) 77-82.

[16] Q. Xu, H. Zhou, Y. J. Wang, J. Huang, Fuzzy support vector machine for classification of EEG signals using wavelet-based features, Medical Engineering & Physics. 31 (2009) 858-865.

[17] S. M. Zhou, J. Q. Gan, F. Sepulveda, Classifying mental tasks based on features of higher-order statistics from EEG signals in brain-computer interface, Information Sciences. 178 (6) (2008) 1629-1640.

[18] C. Park, D. Looney, P. Kidmose, M. Ungstrup, D. P. Mandic, Time-frequency analysis of EEG asymmetry using bivariate empirical mode decomposition, IEEE Transactions on Neural Systems and Rehabilitation Engineering. 19 (4) (2011) 366-373.

[19] D. T. Liley, P. J. Cadusch, M. P. Dafilis, A spatially continuous mean field theory of electrocortical activity, Network: Computation in Neural Systems. 13 (1) (2002) 67-113.

[20] C. J. Stam, Nonlinear dynamical analysis of EEG and MEG: review of an emerging field, Clinical Neurophysiology. 116 (10) (2005) 2266-2301.

[21] M. C. Casdagli, L. D. Iasemidis, R. S. Savit, R. L. Gilmore, S. N. Roper, J. C. Sackellares, Non-linearity in invasive EEG recordings from patients with temporal lobe epilepsy, Electroencephalography and Clinical Neurophysiology. 102 (2) (1997) 98-105.

[22] J. P. Dietrich, Phase Space Reconstruction Using the Frequency Domain, Diploma Thesis, University of Potsdam, 2008.

[23] U. Rajendra Acharya, S. Vinitha Sree, G. Swapna, R. J. Martis, J. S. Suri, Automated EEG analysis of epilepsy: a review, Knowledge-Based Systems. 45 (2013) 147-165.


[24] M. Azarnoosh, A. Motie Nasrabadi, M. R. Mohammadi, M. Firoozabadi, Investigation of mental fatigue through EEG signal processing based on nonlinear analysis: symbolic dynamics, Chaos, Solitons & Fractals. 44 (12) (2011) 1054-1062.

[25] A. Banitalebi, S. K. Setarehdan, G. A. Hossein-Zadeh, A technique based on chaos for brain computer interfacing, in: Computer Conference, 2009. CSICC 2009. 14th International CSI, IEEE. (2009) 464-469.

[26] B. Xu, S. Jacquir, G. Laurent, J. M. Bilbault, S. Binczak, Phase space reconstruction of an experimental model of cardiac field potential in normal and arrhythmic conditions, in: 35th Annual International Conference of the IEEE Engineering in Medicine and Biology Society, 2013.

[27] F. Takens, Detecting strange attractors in turbulence, in: Dynamical Systems and Turbulence, Warwick 1980, Springer, Berlin, Heidelberg. (1981) 366-381.

[28] BCI Competition II: 2003.

[29] BCI Competition III: 2005. URL <http://www.bbci.de/competition/iii/>.

[30] V. Lawhern, W. D. Hairston, K. McDowell, M. Westerfield, K. Robbins, Detection and classification of subject-generated artifacts in EEG signals using autoregressive models, Journal of Neuroscience Methods. 208 (2) (2012) 181-189.

[31] S. Weisberg, Applied Linear Regression, 3rd ed., John Wiley and Sons, 2005.

[32] H. Whitney, Differentiable manifolds, The Annals of Mathematics. 37 (3) (1936) 637-645.

[33] A. V. Oppenheim, R. W. Schafer, J. R. Buck, Discrete-Time Signal Processing, Prentice Hall, Upper Saddle River, 1999.

[34] A. M. Fraser, H. L. Swinney, Independent coordinates for strange attractors from time series, Phys. Rev. A. 33 (1) (1986) 1134-1140.

[35] H. S. Kim, R. Eykholt, J. D. Salas, Nonlinear dynamics, delay times, and embedding windows, Physica D: Nonlinear Phenomena. 127 (1) (1999) 48-60.

[36] Y. Rong-Yi, H. Xiao-Jing, Phase space reconstruction of chaotic dynamical system based on wavelet decomposition, Chinese Physics B. 20 (2) (2011) 020505.

[37] R. A. Fisher, The use of multiple measurements in taxonomic problems, Ann. Eugen. 7 (1936) 179-188.

[38] A. Schlögl, C. Neuper, G. Pfurtscheller, Estimating the mutual information of an EEG-based brain-computer interface, Biomedizinische Technik/Biomedical Engineering. 47 (1-2) (2002) 3-8.

[39] B. Blankertz, K. R. Müller, D. Krusienski, G. Schalk, J. R. Wolpaw, A. Schlögl, et al., The BCI competition III: validating alternative approaches to actual BCI problems, IEEE Transactions on Neural Systems and Rehabilitation Engineering. 14 (2) (2006) 153-159.
