Computers in Biology and Medicine 64 (2015) 1–11
Subject transfer BCI based on Composite Local Temporal Correlation Common Spatial Pattern

Sepideh Hatamikia (corresponding author, [email protected]), Ali Motie Nasrabadi ([email protected])
Department of Biomedical Engineering, Shahed University, Tehran, Iran
Abstract
Article history: Received 18 March 2015; accepted 1 June 2015.
In this paper, a subject transfer framework is proposed for the classification of Electroencephalogram (EEG) signals in brain–computer interfaces (BCIs). This study introduces a modification of Common Spatial Pattern (CSP) for subject transfer BCIs, in which similarity between subjects is exploited to transfer knowledge from other subjects' data. To this end, we propose a new approach based on Composite Local Temporal Correlation CSP, namely Composite LTCCSP with selected subjects, which measures the similarity between subjects using the Frobenius distance. The performance of the proposed method is compared with that of traditional CSP, Composite CSP, LTCCSP and Composite LTCCSP. Experimental results show that the proposed method outperforms all of these methods. Furthermore, our results suggest that it is worth emphasizing the data of subjects with similar characteristics in a subject transfer scheme. The suggested framework, as demonstrated by the experimental results, can achieve a positive knowledge transfer that enhances the performance of BCIs. © 2015 Elsevier Ltd. All rights reserved.
Keywords: Common Spatial Patterns; Brain–computer interface; Subject transfer; Local temporal
Contents
1. Introduction
2. Methods
   2.1. CSP algorithm
   2.2. LTCCSP algorithm
   2.3. Composite LTCCSP: generalization of the LTCCSP method to a transfer learning framework
   2.4. The proposed method based on composite LTCCSP with selected subjects
3. Experimental setup and result analysis
   3.1. Data description
   3.2. Classification result analysis
4. Discussion and conclusion
References
1. Introduction

The automatic classification of movement-related Electroencephalogram (EEG) signals is one of the most challenging fields of brain–computer interfaces (BCIs). In a BCI system, users can
manipulate the system just by thinking about what they want it to do within a limited set of choices. There are several types of EEG-based BCIs, including those based on mental tasks [1], the P300 [2], neural responses elicited by visual stimulus flickering [3] and motor imagery [4]. In BCIs based on mental tasks, different non-movement mental tasks lead to different, task-specific EEG patterns. In a P300-based BCI, in order to trigger a P300 waveform in the subject's brain activity, the subject must focus attention on a specified stimulus that appears randomly among many others. By detecting
the P300 component, the system is able to recognize the attended stimulus and hence the intended command. Some BCIs use visual evoked potentials (VEPs), which are electrical potential differences recorded from the scalp after a flickering visual stimulus such as a flashing light. The aim of VEP-based BCIs is to identify the flicker frequency reliably and with high accuracy. Motor imagery based BCIs use Sensory Motor Rhythm (SMR) information to translate a subject's motor intention into a control signal for efficient control of an output device such as a neuroprosthesis, a wheelchair, or a computer. Motor imagery tasks are associated with an increase or attenuation of localized brain rhythm activity, called Event-Related Synchronization (ERS) or Event-Related Desynchronization (ERD), respectively [5]. Fig. 1 shows the basic scheme of a general EEG-based BCI system. One of the most popular and efficient techniques for extracting ERD/ERS related features is the Common Spatial Pattern (CSP) method, which is widely used in motor imagery BCI designs [6,7]. The CSP method aims to find spatial projections (filters) that simultaneously maximize the variance of one class while minimizing the variance of the other class [8,9]. Despite the efficiency and popularity of CSP in BCI design, the algorithm has two inherent drawbacks: high sensitivity to potential outliers and artifacts, and overfitting on small training sets [10]. Traditional CSP considers each time point of all EEG channels as a vector in the feature space and maps it into another space using the average covariance matrix of all EEG signals [11]. In this setting, the temporally local structure of the EEG signals is not considered, and the covariance matrix of all EEG signals is affected by the noise of a single tiny time slot, which introduces errors into the estimation of the spatial filters [11]. To overcome this shortcoming of traditional CSP, the Local Temporal Common Spatial Patterns (LTCSP) method has been proposed [12]. LTCSP considers temporally neighboring samples and uses local temporal information by constructing a time-dependent adjacency graph. Like CSP, this method is computationally simple, but it is less sensitive to noise and artifacts. Wang and Zheng demonstrated that, in a two-class motor imagery BCI problem, LTCSP achieves better discrimination than the CSP method [12]. Another extension of CSP in the literature that considers the local structure of EEG signals is Local Temporal Correlation Common Spatial Patterns (LTCCSP). LTCCSP uses local temporal correlation information to further improve the estimation of the covariance matrices. Compared to CSP and LTCSP, the LTCCSP method has shown the best performance
under outlier conditions [13]. In the LTCSP method, the Euclidean distance between the multichannel EEG sample vectors at different time points is calculated to construct a weight matrix for the covariance matrices, while in LTCCSP a correlation measure is used to construct the weight matrix; correlation has been introduced as a more reasonable measure for this purpose [13]. Most of the CSP-based techniques proposed in the literature use subject-specific covariance matrices to construct user-specific spatial filters. Limited and user-dependent training samples may lead to overfitting or suboptimal spatial filters and decrease the performance of BCIs. To overcome such inconveniences, one idea is to add a priori information to the CSP process using regularization terms [10,14]. In this case, useful information obtained from other subjects performing the same task (referred to as the source subject group) is transferred to the target subject (the subject whose brain signals are to be classified), which is called subject-to-subject transfer [15]. Fig. 2 presents a scheme for subject transfer based BCIs. With this aim, different regularized CSP methods have been proposed in the literature. Kang et al. proposed a regularized CSP method called Composite CSP, which performs subject-to-subject transfer by regularizing the covariance matrices using other subjects' information [15]; their method uses a linear combination of covariance matrices calculated from the other subjects' data. Another approach is regularized CSP with generic learning, proposed by Lu et al. [14]. This method shrinks the covariance matrix toward both a generic matrix and the identity matrix, where the generic matrix is calculated from the covariance matrices of other subjects. Another regularized method used in the BCI literature is invariant CSP, which tries to find filters that are invariant to a given source of noise [16,10]. All of these regularized CSP methods have shown higher performance than traditional CSP, especially for subjects with small training sets [10]. In all of the above methods, the data of each subject from the source subject group plays the same role in the regularization of the covariance matrix; the similarity between the signal characteristics of the target subject and the other subjects is not considered in the regularization process. However, owing to inter-subject variability, it is unreasonable to simply add other subjects' data to the training data of the target subject. Indeed, if data from a large group of subjects are available, it may not be the best option to use all of them to regularize the covariance matrix, because of the large inter-subject variability; irrelevant features may even deteriorate the performance.
Fig. 1. Basic scheme of a general EEG-based BCI system. While a user performs mental tasks, the EEG signals are acquired and pre-processed. With feature extraction and classification stages, as parts of a machine learning system, the user intentions are predicted. These predictions can be used for controlling output devices.
Fig. 2. An illustration of subject transfer based BCIs.
One promising way to avoid this problem is to emphasize the datasets of subjects whose characteristics are similar to those of the target subject. The main challenge in such subject transfer schemes, however, is how to properly extract and transfer information from other subjects. Only a few studies have considered similarity between subjects' data. As an example, Lotte et al. [10] performed subject-to-subject transfer using the covariance matrices of a subset of selected subjects and called their method "regularized CSP with selected subjects"; they suggested a sequential forward selection procedure to find the subset of subjects that maximizes the accuracy obtained for the target subject. Tu et al. [17] proposed a subject transfer framework at both the classification and the feature extraction level. They generated a set of candidate filters using an l1-norm regularization factor, extracted subject-specific and generalized filter banks, and learned classifiers corresponding to these candidate filter sets. Kang et al. proposed a regularized CSP method called Composite CSP, which uses a linear combination of covariance matrices calculated from the other subjects' data, with all source subjects treated equally in the regularization process. They also proposed another Composite method in which the similarity between the signal characteristics of the target subject and the other subjects is taken into account; to this end, they used the Kullback–Leibler (KL) divergence between the data distributions of each subject and the target subject to compute the dissimilarity between subjects [15].

This study proposes a modification of CSP for subject transfer BCIs, in which subject-to-subject transfer is performed by regularizing the covariance matrix of the target subject using the information of other subjects with similar signal characteristics. To this end, we suggest a new approach based on Composite Local Temporal Correlation CSP, namely Composite LTCCSP with selected subjects, which measures the similarity between subjects using the Frobenius distance. We compare the proposed method with traditional CSP, Composite CSP, Composite CSP with KL-divergence, LTCCSP and Composite LTCCSP on EEG signals recorded from subjects performing motor imagery tasks. Our experimental results suggest that it is worth emphasizing the data of subjects with similar characteristics when regularizing the covariance matrices, and that the proposed method can achieve a positive knowledge transfer that enhances the performance of BCIs.
2. Methods

2.1. CSP algorithm

The CSP algorithm aims to find spatial filters that optimally differentiate two classes of EEG signals. We consider two classes of motor imagery, denoted C1 and C2 (e.g. right-hand and left-hand motor imagery). Each class includes a number of trials, where each single trial is represented as a $K \times N$ matrix $X = [x_1, x_2, \ldots, x_N]$, $N$ being the number of time samples per channel and $K$ the number of channels. For each class, the average
normalized covariance matrix can be obtained from

$$C_i = \frac{1}{T_i}\sum_{j \in T_i}\frac{X_j X_j^T}{\operatorname{trace}(X_j X_j^T)}, \quad i \in \{1,2\}, \qquad (1)$$

where $T_i$ is the number of trials corresponding to the $i$th class. The composite spatial covariance is calculated as

$$C = C_1 + C_2. \qquad (2)$$

$C$ can be factored as

$$C = U^T \Lambda U, \qquad (3)$$

where $U$ is the matrix of eigenvectors and $\Lambda$ is the diagonal matrix of eigenvalues, assumed to be sorted in descending order [18,11]. Afterward, the whitening transformation

$$P = U \Lambda^{-1/2} \qquad (4)$$

equalizes the variances spanned by $U$, so that all eigenvalues of $P^T C P$ are equal to one [11,17]. If $C_1$ and $C_2$ are transformed as $S_1 = P^T C_1 P$ and $S_2 = P^T C_2 P$, and if $S_1 = B^T \Lambda_1 B$, then $S_2 = B^T \Lambda_2 B$ with

$$\Lambda_1 + \Lambda_2 = I, \qquad (5)$$

where $I$ is the identity matrix. This means that the eigenvector associated with the largest eigenvalue has the largest variance for class C1 and the smallest variance for class C2, and vice versa. Therefore, if the whitened EEG signals are projected onto the first and last eigenvectors of $B$, the most discriminative information is obtained for the classification process. Each trial $X$ is mapped using the projection matrix $W = PB$ as

$$Z = W^T X. \qquad (6)$$

The $2m$ signals corresponding to the first and last $m$ rows of $Z$ are associated with the largest eigenvalues of each class. Using the normalized variances of these signals, $2m$ features are extracted as follows [11]:

$$f_p = \log\!\left(\frac{\operatorname{var}(Z_p)}{\sum_{i=1}^{2m}\operatorname{var}(Z_i)}\right), \quad p = 1, 2, \ldots, 2m. \qquad (7)$$
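As a concrete illustration of Eqs. (1)–(7), the following NumPy sketch computes the CSP projection matrix and the log-variance features. It is a minimal sketch under our own naming assumptions (csp_filters, log_var_features), not the authors' implementation, and details such as eigenvalue ordering may differ.

import numpy as np

def csp_filters(trials_c1, trials_c2, m=3):
    # trials_c1, trials_c2: lists of K x N arrays (channels x samples), one per trial.
    def avg_norm_cov(trials):
        # Eq. (1): average of the trial covariances, each normalized by its trace.
        return np.mean([X @ X.T / np.trace(X @ X.T) for X in trials], axis=0)

    C1, C2 = avg_norm_cov(trials_c1), avg_norm_cov(trials_c2)
    C = C1 + C2                                   # Eq. (2)

    # Eqs. (3)-(4): eigendecomposition of C and whitening transformation.
    evals, U = np.linalg.eigh(C)
    P = U @ np.diag(1.0 / np.sqrt(evals))         # P.T @ C @ P = I

    # Eq. (5): the same basis diagonalizes both whitened class covariances.
    _, B = np.linalg.eigh(P.T @ C1 @ P)
    W = P @ B                                     # full projection matrix, Eq. (6)

    # Keep the m first and m last filters (largest variance for one class,
    # smallest for the other).
    return np.hstack([W[:, :m], W[:, -m:]])

def log_var_features(W, X):
    # Eq. (7): log of the normalized variances of the projected signals.
    Z = W.T @ X
    v = np.var(Z, axis=1)
    return np.log(v / v.sum())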
2.2. LTCCSP algorithm

LTCCSP is an extension of traditional CSP that uses local temporal correlation information to further improve the estimation of the covariance matrices. This method has shown better performance under outlier conditions than the CSP and LTCSP methods [12,13]. Formally, CSP seeks the projection matrix $W$ that optimizes the function [10]

$$\max_W \frac{\operatorname{trace}(W^T C_1 W)}{\operatorname{trace}(W^T C_2 W)}, \qquad (8)$$

where $W^T (C_1 + C_2) W = I$. This eigenvalue problem can be transformed to [11]

$$\frac{\operatorname{trace}(W^T C_1 W)}{\operatorname{trace}(W^T C_2 W)} = \frac{\frac{1}{T_1}\sum_{i=1}^{T_1}\sum_{j=1}^{K} \left(w_j^T X_i X_i^T w_j\right)/\operatorname{trace}(X_i X_i^T)}{\frac{1}{T_2}\sum_{i=1}^{T_2}\sum_{j=1}^{K} \left(w_j^T X_i X_i^T w_j\right)/\operatorname{trace}(X_i X_i^T)}, \qquad (9)$$

where $T_1$ and $T_2$ are the numbers of trials for classes one and two, respectively, $w_j$ is the $j$th column of $W$, and $i$ denotes the $i$th trial. It is worth noting that the $N$ samples of each EEG channel are zero-mean, which is usually the case after frequency filtering. Omitting the indexes and using the dimension reduction strategy in [18], the term $w^T X X^T w$ can be expanded as [11]

$$w^T X X^T w = \frac{1}{2N}\sum_{l=1}^{N}\sum_{m=1}^{N}\left(w^T x_l - w^T x_m\right)^2, \qquad (10)$$

where $x_l$ and $x_m$ are the channel vectors at time points $l$ and $m$, respectively. This equation implies that the variance after filtering equals the sum of the squared pairwise distances between the projected data points. Eq. (10) considers the total information. In [12], a weight matrix $W^X_{lm}$ is added to focus on local information:

$$\frac{1}{2N}\sum_{l=1}^{N}\sum_{m=1}^{N}\left(w^T x_l - w^T x_m\right)^2 W^X_{lm}. \qquad (11)$$

The weight matrix determines the degree to which temporally neighboring samples influence the covariance matrices and prevents the noise of one time sample from affecting the estimated covariance matrix at distant time samples [11]. To this end, $W^X_{lm}$ has to decrease as the distance between two temporally adjacent data points increases. The correlation coefficient is used to calculate the weight matrix:

$$W^X_{lm} = \begin{cases} \exp\!\left(\operatorname{corr}(x_l, x_m)\right), & |l-m| \le \tau, \\ 0, & \text{otherwise}, \end{cases} \qquad (12)$$

where $\operatorname{corr}(x_l, x_m)$ denotes the correlation coefficient operator and $\tau$ is a positive number that determines the temporally local range. After some manipulation, the temporally local information appears in the average normalized covariance matrix as [11,13]

$$\tilde{C}_i = \frac{1}{T_i}\sum_{j \in T_i}\frac{X_j L^X X_j^T}{\operatorname{trace}(X_j L^X X_j^T)}, \quad i \in \{1,2\}, \qquad (13)$$

where the Laplacian matrix $L^X = D^X - W^X$ is a positive semi-definite matrix that can be decomposed as [9]

$$L^X = L^{X\,1/2}\, L^{X\,1/2}, \qquad (14)$$

and $D^X$ is the diagonal matrix whose diagonal elements are the row sums of $W^X$, i.e., $D^X_{ll} = \sum_{m=1}^{N} W^X_{lm}$. Afterward, as in the CSP method, the projection matrix $\tilde{W}$ is found by maximizing

$$\max_{\tilde{W}} \frac{\operatorname{trace}(\tilde{W}^T \tilde{C}_1 \tilde{W})}{\operatorname{trace}(\tilde{W}^T \tilde{C}_2 \tilde{W})}, \qquad (15)$$

where $\tilde{W}^T (\tilde{C}_1 + \tilde{C}_2) \tilde{W} = I$. The optimal $\tilde{W}$ is then computed as

$$\tilde{W} = \tilde{U}\, \tilde{D}^{-1/2}\, \tilde{V}, \qquad (16)$$

where $\tilde{U}$ and $\tilde{D}$ are the matrix of eigenvectors and the diagonal matrix of corresponding eigenvalues of $\tilde{C}_1 + \tilde{C}_2$, respectively, and $\tilde{V}$ is the matrix of eigenvectors of $\tilde{D}^{-1/2} \tilde{U}^T \tilde{C}_1 \tilde{U} \tilde{D}^{-1/2}$. The projected matrix is then calculated as

$$\tilde{Z} = \tilde{W}^T X L^{X\,1/2}. \qquad (17)$$

Similar to CSP, the $2m$ projected signals corresponding to the first and last $m$ rows of $\tilde{Z}$ are suitable for classification, and $2m$
features can be computed as [11]

$$\tilde{f}_p = \log\!\left(\frac{\operatorname{var}(\tilde{Z}_p)}{\sum_{i=1}^{2m}\operatorname{var}(\tilde{Z}_i)}\right), \quad p = 1, 2, \ldots, 2m. \qquad (18)$$
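The weight matrix of Eq. (12) and the covariance of Eq. (13) can be sketched as follows. This is an illustrative implementation under our own naming assumptions, not the authors' code; the resulting class covariances are then fed into the same whitening and eigendecomposition steps as in Section 2.1.

import numpy as np

def local_temporal_covariance(X, tau=3):
    # X: K x N trial (channels x samples), assumed zero-mean per channel.
    K, N = X.shape

    # Eq. (12): correlation-based weights for temporally close sample pairs.
    corr = np.corrcoef(X.T)                # N x N correlations between time-point vectors
    W = np.zeros((N, N))
    for l in range(N):
        lo, hi = max(0, l - tau), min(N, l + tau + 1)
        W[l, lo:hi] = np.exp(corr[l, lo:hi])

    # Laplacian L = D - W, with D the diagonal matrix of the row sums of W.
    L = np.diag(W.sum(axis=1)) - W

    # Eq. (13), single-trial term: trace-normalized, temporally weighted covariance.
    C = X @ L @ X.T
    return C / np.trace(C)

def ltccsp_class_covariance(trials, tau=3):
    # Average over the trials of one class (the sum in Eq. (13)).
    return np.mean([local_temporal_covariance(X, tau) for X in trials], axis=0)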
2.3. Composite LTCCSP: generalization of the LTCCSP method to a transfer learning framework

A large number of the CSP-based methods proposed in the literature exploit subject-specific covariance matrices to construct spatial filters; in these methods, the information in other subjects' data is not involved in constructing the spatial filters. Limited and user-dependent training samples may lead to overfitting or suboptimal spatial filters and deteriorate the performance of BCIs. In this case, subject transfer is recommended. In a subject transfer scheme, a priori information is added into the CSP process using regularization terms; in other words, the useful information of other subjects performing the same task is transferred to the target subject to build an efficient training set. However, owing to inter-subject variability, it is unreasonable to simply add another subject's data to the training data of the target subject, and if the subjects' data have very different signal characteristics, relevant information may be lost. Composite CSP is a regularized method first suggested by Kang et al. [15]. Composite CSP uses a linear combination of the covariance matrices corresponding to other subjects' data in order to incorporate inter-subject as well as intra-subject variations [15]. Assume that the average normalized covariance matrix for subject $k$ and class $c$ is given by

$$C^k_c = \frac{1}{|T_k \cap T_c|}\sum_{j \in (T_k \cap T_c)}\frac{X_j X_j^T}{\operatorname{trace}(X_j X_j^T)}, \quad c \in \{1,2\}, \qquad (19)$$

where $T_k$ denotes the trials of the $k$th subject. Composite CSP constructs a Composite covariance matrix $\hat{C}^k_c$ for subject $k$ as

$$\hat{C}^k_c = (1-\lambda)\,\frac{|T_k \cap T_c|}{T_c}\, C^k_c + \lambda \sum_{j \ne k}\frac{|T_j \cap T_c|}{T_c}\, C^j_c, \qquad (20)$$

where $\lambda \in [0,1]$ is the regularization parameter that adjusts the influence of the subject's own covariance matrix [15]. After computing the Composite covariance matrix $\hat{C}^k_c$, the projection matrix $W$ is calculated exactly as in Section 2.1.

In this study, we introduce the Composite LTCCSP algorithm, which generalizes the LTCCSP method to a transfer learning framework. Similar to Composite CSP, Composite LTCCSP uses other subjects' data to perform subject-to-subject transfer; however, this method uses a linear combination of temporally local covariance matrices from other subjects to construct a Composite local temporal covariance matrix.
Fig. 3. A schematic diagram of the proposed Composite LTCCSP with selected subjects for subject transfer BCIs.
In this algorithm, all the source subjects are treated equally in constructing the Composite local temporal covariance matrix. We denote the average normalized local temporal covariance matrix for subject $k$ and class $c$ by

$$C'^k_c = \frac{1}{|T_k \cap T_c|}\sum_{j \in (T_k \cap T_c)}\frac{X_j L^X X_j^T}{\operatorname{trace}(X_j L^X X_j^T)}, \quad c \in \{1,2\}. \qquad (21)$$

Composite LTCCSP constructs a Composite local temporal covariance matrix $\dot{C}^k_c$ for subject $k$ as

$$\dot{C}^k_c = (1-\lambda)\,\frac{|T_k \cap T_c|}{T_c}\, C'^k_c + \lambda \sum_{j \ne k}\frac{|T_j \cap T_c|}{T_c}\, C'^j_c. \qquad (22)$$

The above equation can also be written as

$$\dot{C}^k_c = \sum_{j} w_{jk}\, C'^j_c, \quad \text{where}\quad w_{jk} = \begin{cases}(1-\lambda)\,\dfrac{|T_k \cap T_c|}{T_c}, & j = k, \\[4pt] \lambda\,\dfrac{|T_j \cap T_c|}{T_c}, & j \ne k. \end{cases} \qquad (23)$$
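A sketch of the combination in Eq. (22) is given below. It assumes the per-subject class covariances of Eq. (21) have already been computed (the function and argument names are ours), and it takes $T_c$ as the total number of class-c trials over all subjects.

import numpy as np

def composite_local_cov(C_target, C_sources, n_target, n_sources, lam=0.5):
    # C_target: local temporal covariance of the target subject for one class (Eq. (21)).
    # C_sources: list of local temporal covariances of the source subjects.
    # n_target, n_sources: numbers of trials of this class for target and sources.
    # lam: regularization parameter lambda in [0, 1].
    n_total = n_target + sum(n_sources)
    C = (1.0 - lam) * (n_target / n_total) * C_target
    for C_j, n_j in zip(C_sources, n_sources):
        C += lam * (n_j / n_total) * C_j           # equal treatment of all source subjects
    return C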
After computing the Composite local temporal covariance matrix $\dot{C}^k_c$, the calculation of the projection matrix $\tilde{W}$ proceeds exactly as in Section 2.2.

2.4. The proposed method based on Composite LTCCSP with selected subjects

This paper proposes a new method based on Composite LTCCSP that accounts for the similarity between subjects using the Frobenius distance. In the proposed method, the source subjects do not all play an equal role in constructing the Composite local temporal covariance matrix; instead, source subjects with more similar characteristics have a larger effect. We therefore call the proposed method Composite LTCCSP with selected subjects. A schematic diagram of the proposed algorithm is illustrated in Fig. 3. The proposed method uses the temporally local covariance matrices introduced in Section 2.3 to construct a Composite covariance matrix, as in Eq. (22); the difference is that we emphasize the covariance matrices of subjects with characteristics similar to those of the target subject. To this end, the similarity between two subjects is measured by the Frobenius norm of the difference between their covariance matrices:

$$F_{C_j, C_k} = \sqrt{\operatorname{trace}\!\left((C_j - C_k)(C_j - C_k)^{*}\right)}, \qquad (24)$$

where $(\cdot)^{*}$ denotes the conjugate transpose. In traditional CSP, noise at one time point has the same effect on all other time points; LTCCSP removes this defect by using the local temporal covariance matrix. The Composite local temporal covariance matrix is given by

$$\tilde{C}^k_c = (1-\lambda)\, C'^k_c + \lambda \sum_{j \ne k} b_{jk}\, C'^j_c, \qquad (25)$$

where $C'^k_c$ is the average normalized local temporal covariance matrix defined in Eq. (21) and the $b_{jk}$ are weights that determine the influence of subjects with similar characteristics on the Composite local temporal covariance matrix. The weights $b_{jk}$ are defined by

$$b_{jk} = \frac{1}{N_k}\,\frac{1}{F_{C_j, C_k}}, \qquad (26)$$

where $N_k = \sum_{l \ne k} 1/F_{C_l, C_k}$ is the normalization constant for the $k$th subject and the notation $|\cdot|$ denotes the cardinality. If subject $k$ is the target subject, then a subject $j \ne k$ whose characteristics are more similar to the target's has a smaller value of $F_{C_j, C_k}$; therefore, a larger value of $b_{jk}$ is obtained and this subject plays a larger role in constructing the Composite covariance matrix. In this case, Eq. (25) can be written as

$$\tilde{C}^k_c = \sum_{j} w_{jk}\, C'^j_c, \quad \text{where}\quad w_{jk} = \begin{cases}1-\lambda, & j = k, \\ \lambda\, b_{jk}, & j \ne k. \end{cases} \qquad (27)$$
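The similarity weighting of Eqs. (24)–(27) can be sketched as follows (a minimal illustration with hypothetical function names; it assumes all pairwise Frobenius distances are non-zero):

import numpy as np

def frobenius_distance(C_a, C_b):
    # Eq. (24): Frobenius distance between two covariance matrices.
    D = C_a - C_b
    return np.sqrt(np.trace(D @ D.conj().T).real)

def selected_subjects_cov(C_target, C_sources, lam=0.5):
    # Eq. (26): weights b_jk proportional to the inverse Frobenius distance,
    # normalized over the source subjects.
    dists = np.array([frobenius_distance(C_j, C_target) for C_j in C_sources])
    b = (1.0 / dists) / (1.0 / dists).sum()

    # Eqs. (25)/(27): more similar subjects contribute more to the Composite covariance.
    C = (1.0 - lam) * C_target                     # w_kk = 1 - lambda
    for b_j, C_j in zip(b, C_sources):
        C += lam * b_j * C_j                       # w_jk = lambda * b_jk for j != k
    return C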
3. Experimental setup and result analysis

3.1. Data description

In this section, we describe the EEG dataset used to evaluate the proposed method for classifying EEG signals recorded during motor imagery tasks. We used dataset IVa of BCI competition III, which was recorded by the Neurophysics Group at Berlin University [19]. This dataset contains the EEG data of five subjects, labeled aa, al, av, aw, and ay, who performed two different motor imagery tasks: right hand and right foot. 280 trials were recorded from each subject. The data were down-sampled to 100 Hz, and 118 electrodes were placed according to the international 10/20 system. We segmented the EEG signals into 3.5 s windows from each visual cue onset and filtered them between 7 and 30 Hz (using a sixth-order Butterworth band-pass filter).

3.2. Classification result analysis

In this section, we present the implementation procedure and the experimental results obtained with the proposed method. As mentioned before, focusing on subjects with similar characteristics when constructing the covariance matrix of the target subject may yield more relevant features in a subject-to-subject transfer BCI. To investigate this, we evaluated the performance of the proposed method based on Composite LTCCSP with selected subjects, which considers the similarity between subjects based on the Frobenius distance, and compared the results with those of different methods, including traditional CSP, Composite CSP, Composite CSP with KL-divergence, LTCCSP, and Composite LTCCSP. Traditional CSP uses the covariance matrices on a subject-specific basis. Composite CSP and Composite CSP with KL-divergence, as proposed by Kang et al. [15], use the covariance matrices of a source subject group in addition to the covariance matrix of the target subject to construct the target's covariance matrix: Composite CSP uses a linear combination of covariance matrices calculated from the data of other subjects, with all source subjects treated equally in the regularization process, whereas Composite CSP with KL-divergence weights the other subjects according to the similarity of their data to the target subject's, measured by the KL-divergence [15]. The Composite LTCCSP introduced in Section 2.3 uses the data of other subjects to perform subject-to-subject transfer; it employs a linear combination of temporally local covariance matrices from other subjects to construct a Composite local temporal covariance matrix, with all source subjects treated equally. We applied the proposed method, based on Composite LTCCSP with selected subjects, according to the subject transfer methodology shown in Fig. 3; this method considers the similarity between subjects based on the Frobenius distance in the subject transfer process. Since the database used in this study contains the EEG data of five subjects, with the methodology of Fig. 3 the data of four subjects are used as source subject data and the remaining subject is taken as the target subject. The average classification results of applying traditional CSP, Composite CSP, Composite CSP with KL-divergence, LTCCSP, Composite LTCCSP, and the proposed Composite LTCCSP with selected subjects, in terms of their relative accuracy over all repetitions using a 10-fold cross-validation technique, are reported in Table 1.
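For reference, the preprocessing of Section 3.1 can be sketched as follows (a minimal SciPy illustration; the variable names and the exact cue alignment are our assumptions, not part of the original pipeline):

import numpy as np
from scipy.signal import butter, filtfilt

def preprocess(continuous_eeg, cue_onsets, fs=100, band=(7.0, 30.0), win_sec=3.5):
    # continuous_eeg: K x T array (channels x samples), already down-sampled to fs.
    # cue_onsets: visual cue onsets, in samples.
    b, a = butter(6, band, btype="bandpass", fs=fs)    # sixth-order Butterworth, 7-30 Hz
    filtered = filtfilt(b, a, continuous_eeg, axis=1)
    n_win = int(win_sec * fs)                          # 3.5 s window from each cue onset
    return np.stack([filtered[:, t:t + n_win] for t in cue_onsets])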
Table 1
Classification accuracies (%) obtained for each subject with traditional CSP, Composite CSP, Composite CSP with KL-divergence, LTCCSP, Composite LTCCSP, and Composite LTCCSP with selected subjects. The results are reported for 2m (m = 1, 2, 3) features using Linear Discriminant Analysis (Linear), Quadratic Discriminant Analysis (Quadratic) and Mahalanobis distance-based (Mahalanobis) classifiers.

Method / m / Classifier           aa      al      av      aw      ay      Mean

Traditional CSP
  m=1  Linear                     69.14   85.14   60.43   79.50   78.40   74.52
  m=1  Quadratic                  67.86   83.40   58.57   78.36   72.12   72.06
  m=1  Mahalanobis                66.70   84.72   55.60   78.36   75.50   72.17
  m=2  Linear                     66.32   87.20   62.25   80.00   81.07   75.36
  m=2  Quadratic                  67.86   83.93   59.93   73.57   73.82   71.82
  m=2  Mahalanobis                67.34   84.36   57.14   66.43   74.29   69.89
  m=3  Linear                     72.50   88.21   61.42   82.51   81.07   77.14
  m=3  Quadratic                  70.00   85.32   58.51   73.93   75.13   72.57
  m=3  Mahalanobis                68.10   86.30   58.93   69.30   74.29   71.38

Composite CSP
  m=1  Linear                     75.00   89.43   63.71   80.00   79.64   77.55
  m=1  Quadratic                  73.57   88.93   62.07   74.29   75.71   74.91
  m=1  Mahalanobis                73.93   88.21   59.36   78.14   78.32   75.59
  m=2  Linear                     74.21   91.43   65.30   79.20   82.14   78.45
  m=2  Quadratic                  71.86   89.29   66.29   73.93   78.92   76.05
  m=2  Mahalanobis                72.43   88.57   61.46   76.07   78.01   75.30
  m=3  Linear                     78.36   90.01   73.14   85.64   85.00   82.43
  m=3  Quadratic                  77.14   88.21   67.93   80.21   83.64   79.42
  m=3  Mahalanobis                73.50   89.21   63.57   79.62   81.36   77.45

Composite CSP with KL-divergence
  m=1  Linear                     74.29   91.43   57.01   75.71   85.71   76.83
  m=1  Quadratic                  75.36   89.29   61.26   70.31   83.93   76.03
  m=1  Mahalanobis                72.86   89.29   59.10   69.54   84.64   75.86
  m=2  Linear                     74.29   91.43   63.24   82.86   84.29   79.22
  m=2  Quadratic                  73.21   90.36   61.26   72.50   82.86   76.03
  m=2  Mahalanobis                72.43   89.29   59.36   71.07   86.07   75.64
  m=3  Linear                     76.79   91.07   68.12   85.00   87.50   81.69
  m=3  Quadratic                  72.14   88.21   65.91   77.86   86.79   78.18
  m=3  Mahalanobis                73.07   88.86   60.90   71.43   87.40   76.33

LTCCSP
  m=1  Linear                     73.21   90.71   61.70   80.00   79.20   76.96
  m=1  Quadratic                  70.36   88.93   59.40   71.64   75.71   73.20
  m=1  Mahalanobis                70.36   88.93   55.61   72.71   74.14   72.35
  m=2  Linear                     70.00   90.20   63.00   75.16   78.93   75.45
  m=2  Quadratic                  69.21   89.86   57.14   69.64   76.21   72.41
  m=2  Mahalanobis                69.64   87.14   58.10   66.50   74.50   71.17
  m=3  Linear                     76.07   91.07   64.90   74.64   81.43   77.62
  m=3  Quadratic                  70.71   87.11   59.93   70.15   77.23   73.03
  m=3  Mahalanobis                70.36   85.36   60.78   68.50   73.79   71.75

Composite LTCCSP
  m=1  Linear                     77.50   93.43   68.20   84.16   83.86   81.43
  m=1  Quadratic                  75.57   89.13   70.52   76.29   84.21   79.14
  m=1  Mahalanobis                75.57   88.93   65.44   78.14   81.71   77.95
  m=2  Linear                     75.16   94.18   68.78   83.20   87.56   81.77
  m=2  Quadratic                  73.36   92.29   70.20   80.93   82.56   79.86
  m=2  Mahalanobis                74.51   88.57   65.12   78.07   82.10   77.75
  m=3  Linear                     79.17   93.87   74.93   85.64   90.34   84.79
  m=3  Quadratic                  76.86   92.81   71.16   86.10   78.64   82.91
  m=3  Mahalanobis                77.23   92.10   69.80   78.79   84.36   80.45

Composite LTCCSP with selected subjects
  m=1  Linear                     81.24   93.15   70.31   86.20   86.72   83.52
  m=1  Quadratic                  80.78   90.28   71.14   75.12   84.21   80.30
  m=1  Mahalanobis                79.36   89.64   66.30   80.62   75.93   78.37
  m=2  Linear                     78.42   95.43   69.41   91.07   88.14   84.49
  m=2  Quadratic                  76.31   94.29   72.29   83.29   87.56   82.74
  m=2  Mahalanobis                80.20   90.10   68.50   81.30   84.10   80.84
  m=3  Linear                     85.86   96.84   76.81   91.80   94.76   89.21
  m=3  Quadratic                  82.10   97.29   72.25   89.13   92.64   86.68
  m=3  Mahalanobis                79.36   94.42   69.12   82.21   87.51   82.52
We extracted the 2m (m = 1, 2, 3) features associated with the largest eigenvalues of each class, using the log-variances of the first and last m rows of the projected matrix Z. The extracted features were used as inputs to three different classifiers: Linear Discriminant Analysis, Quadratic Discriminant Analysis, and a Mahalanobis distance-based classifier. A 10-fold cross-validation strategy was used to compare the average accuracies obtained for the regularization parameter λ within {0.1, 0.2, …, 0.9} and the local temporal range τ within {2, …, 5} on the training set, and λ and τ were then set to the values that gave the highest average accuracies.
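The hyper-parameter selection described above can be sketched as follows. This is a simplified illustration (in a strict protocol the spatial filters would be re-estimated inside each training fold); extract_features is a hypothetical placeholder returning the 2m log-variance features for given λ and τ.

import numpy as np
from itertools import product
from sklearn.discriminant_analysis import LinearDiscriminantAnalysis
from sklearn.model_selection import cross_val_score

def select_lambda_tau(extract_features, trials, labels,
                      lambdas=np.arange(0.1, 1.0, 0.1), taus=range(2, 6)):
    best_lam, best_tau, best_acc = None, None, -np.inf
    for lam, tau in product(lambdas, taus):
        F = extract_features(trials, labels, lam, tau)     # 2m features per trial
        acc = cross_val_score(LinearDiscriminantAnalysis(), F, labels, cv=10).mean()
        if acc > best_acc:                                 # keep the best (lambda, tau) pair
            best_lam, best_tau, best_acc = lam, tau, acc
    return best_lam, best_tau, best_acc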
For each subject, the highest accuracy among the six methods is obtained by the proposed Composite LTCCSP with selected subjects. Table 1 shows that the proposed method based on Composite LTCCSP with selected subjects is an efficient method for recognizing motor imagery tasks. The results demonstrate that the proposed methodology achieves a considerable increase in performance compared to traditional CSP and LTCCSP, and comparing the results of all six methods shows that the best overall performance is achieved by the proposed method. Furthermore, the results show that the Composite LTCCSP method performs better than LTCCSP. Comparing the results of Composite CSP and Composite CSP with KL-divergence shows that the results are nearly the same for both methods.
Fig. 4. Comparison of the classification performance for each subject for six different methods.
Table 1 also reveals that using m = 3 leads to better accuracies than m = 1 and m = 2. For a better comparison of the six methods, the results are summarized in Fig. 4 for m = 3 (the number of features that gives the best results). Comparing Composite LTCCSP and Composite LTCCSP with selected subjects, both of which implement a subject transfer framework, demonstrates that it is worth using subjects with similar characteristics to regularize the covariance matrices: better performance is achieved with Composite LTCCSP with selected subjects than with Composite LTCCSP, which does not consider the similarity between subjects in the subject transfer process. Fig. 4 also shows that the best performance is achieved with the Linear Discriminant Analysis classifier in all cases, which is consistent with previous studies reporting this technique as a successful classifier in motor imagery-based BCIs [5]. The first versus the last component of the feature vectors (the components offering the most discriminative information), obtained with the Composite LTCCSP with selected subjects method, are plotted in Fig. 5 for each subject and for m = 3.
4. Discussion and conclusion

We have proposed a modification of CSP for subject transfer BCIs, where similar characteristics are considered in order to transfer knowledge from other subjects' data.
Fig. 5. Plots of the magnitude of the first versus the last component of feature vectors f using Composite LTCCSP with selected subjects method for each subject. Blue stars correspond to the foot task, while red stars correspond to the hand task. (a) Subject aa (b) subject al (c) subject av (d) subject aw and (e) subject ay. (For interpretation of the references to color in this figure legend, the reader is referred to the web version of this article.)
For this aim, we suggested a new approach based on Composite Local Temporal Correlation CSP with selected subjects, which measures the similarity between subjects using the Frobenius distance. We used dataset IVa of BCI competition III to evaluate the performance of the suggested method. This dataset includes EEG signals of five subjects who performed two different motor imagery tasks.
The CSP method is one of the most successful techniques in motor imagery-based BCIs, but it has some inherent drawbacks. It does not consider the time structure of the EEG
signals; therefore, the performance of the designed system may be negatively affected by the inherent signal variability and by noise. Locally temporal CSP methods can overcome this inconvenience. In this study we used the LTCCSP method, which exploits local temporal correlation information to construct optimized spatial filters. Correlation-based LTCCSP has some advantages over Euclidean-distance-based LTCSP: it is more stable, since the correlation coefficient lies in [−1, 1] while the Euclidean distance can take values over a wide range, and it is simpler to use in practice, since it requires only one parameter to be configured whereas LTCSP requires two. On the other hand, traditional CSP constructs covariance matrices on a subject-specific basis. In this case, only intra-subject variations are considered; inter-subject variations are not captured. This decreases the performance of BCIs when training samples are user-dependent and may lead to overfitting. To overcome this issue, we used the Composite approach as a subject transfer framework, in which the useful information of a source subject group performing the same task is transferred to the target subject. Another contribution of our study is to employ the LTCCSP method in a subject-to-subject transfer scheme: we proposed the Composite LTCCSP algorithm, which generalizes the LTCCSP method to a transfer learning framework and is thus able to take the time structure of the EEG signals into account in a subject transfer approach. In our opinion, owing to inter-subject variability, it is unreasonable to simply add other subjects' data to the training set of the target subject, and irrelevant features may deteriorate the performance. Hence, how to properly extract and transfer information from a source subject group is one of the most important issues in such subject transfer schemes. In order to achieve positive transfer, we used the Frobenius norm as a similarity metric between the covariance matrices of the two subjects under comparison. In this way, we can emphasize the datasets of subjects with similar characteristics when constructing the covariance matrix of the target subject. In summary, compared with traditional CSP, our proposed method based on Composite LTCCSP with selected subjects has three advantages: robustness to noise and potential outliers, avoidance of overfitting, and the use of information from subjects with similar characteristics to build an efficient training set. The robustness to noise is provided by the locally temporal covariance matrices, and the avoidance of overfitting is achieved by applying subject transfer through the Composite approach. Furthermore, using data from subjects with similar characteristics avoids irrelevant information and helps to construct more robust covariance matrices. Experimental results show that the Composite LTCCSP with selected subjects method considerably increases the performance compared to traditional CSP. We believe that the performance and good results achieved with the proposed method are due to all of the above advantages. Comparing the results of the different methods shows that the best performance is achieved by the proposed method based on Composite LTCCSP with selected subjects. Comparing the results of Composite CSP and Composite CSP with KL-divergence shows that the results are nearly the same for both methods. This result indicates that using the KL-divergence to compute the similarity between subjects does not improve the performance compared to the case in which similarity between subjects is not considered. According to our experimental results, the Composite LTCCSP method shows better performance than LTCCSP; this improvement is due to the source subjects' information used in the subject transfer framework.
Furthermore, comparing Composite LTCCSP with selected subjects and Composite LTCCSP shows that better performance is achieved with Composite LTCCSP with selected subjects. We think this performance difference is mainly due to the definition of the Composite covariance matrix of the target subject: both methods introduce a subject transfer framework, but Composite LTCCSP with selected subjects uses the information of other subjects with similar characteristics to construct the Composite covariance matrix of the target subject, whereas Composite LTCCSP does not consider similarity between subjects when regularizing it. These results demonstrate that the suggested subject transfer framework based on the Frobenius norm, which takes the similarity between subjects into account, is an efficient method that can achieve a positive knowledge transfer and enhance the performance of BCIs. Our results suggest that it is worth emphasizing the data of subjects with similar characteristics when regularizing covariance matrices. The proposed method provides better classification accuracies than previous studies that proposed subject transfer frameworks for regularizing covariance matrices [10,15,20]. Furthermore, the results of our study show that the performance of the suggested method is very close to the best results recently reported for motor imagery EEG classification on this database [21–25]. All of those studies employed dedicated feature selection algorithms to achieve their performance, whereas our proposed method achieves similar results without an additional feature selection step before classification. Several previous studies have demonstrated that the filter band and the time interval are two crucial factors for CSP-based feature extraction; as future work, the performance of the proposed system could be improved by selecting a proper time interval and frequency band. Future work will also focus on extending our approach to multiclass problems in order to classify EEG signals related to multi-class events.

References

[1] S. Hatamikia, A.M. Nasrabadi, N. Shourie, Plausibility assessment of a subject independent mental task-based BCI using electroencephalogram signals, in: Proceedings of the 21st Iranian Conference on Biomedical Engineering (ICBME 2014), Tehran, Iran, 2014, pp. 150–155.
[2] Z. Zhou, E. Yin, Y. Liu, J. Jiang, D. Hu, A novel task-oriented optimal design for P300-based brain–computer interfaces, J. Neural Eng. 11 (5) (2014) 056003.
[3] E. Yin, Z. Zhou, J. Jiang, Y. Yu, D. Hu, A dynamically optimized SSVEP brain–computer interface (BCI) speller, IEEE Trans. Biomed. Eng. 99 (2014) 1–10.
[4] G. Pfurtscheller, C. Brunner, A. Schlögl, F. Lopes da Silva, Mu rhythm (de)synchronization and EEG single-trial classification of different motor imagery tasks, NeuroImage 31 (1) (2006) 153–159.
[5] G. Pfurtscheller, F.H. Lopes da Silva, Event-related EEG/MEG synchronization and desynchronization: basic principles, Clin. Neurophysiol. 110 (11) (1999) 1842–1857.
[6] H. Ramoser, J. Muller-Gerking, G. Pfurtscheller, Optimal spatial filtering of single trial EEG during imagined hand movement, IEEE Trans. Rehabil. Eng. 8 (4) (2000) 441–446.
[7] B. Blankertz, R. Tomioka, S. Lemm, M. Kawanabe, K.R. Muller, Optimizing spatial filters for robust EEG single-trial analysis, IEEE Signal Process. Mag. 25 (1) (2008) 41–56.
[8] J. Muller-Gerking, G. Pfurtscheller, H. Flyvbjerg, Designing optimal spatial filters for single-trial EEG classification in a movement task, Clin. Neurophysiol. 110 (1999) 787–798.
[9] P. Xu, T. Liu, R. Zhang, Y. Zhang, D. Yao, Using particle swarm to select frequency band and time interval for feature extraction of EEG based BCI, Biomed. Signal Process. Control 10 (2014) 289–295.
[10] F. Lotte, C. Guan, Regularizing common spatial patterns to improve BCI designs: unified theory and new algorithms, IEEE Trans. Biomed. Eng. 58 (2) (2011) 355–362.
[11] H. Ghaheri, A.R. Ahmadyfard, Extracting common spatial patterns from EEG time segments for classifying motor imagery classes in a Brain Computer Interface (BCI), Sci. Iran. 20 (6) (2013) 2061–2072.
[12] H. Wang, W. Zheng, Local temporal common spatial patterns for robust single-trial EEG classification, IEEE Trans. Neural Syst. Rehabil. Eng. 16 (2) (2008) 131–139.
[13] R. Zhang, P. Xu, T. Liu, Y. Zhang, L. Guo, P. Li, D. Yao, Local temporal correlation common spatial patterns for single trial EEG classification during motor imagery, Comput. Math. Methods Med. 591216 (2013).
[14] H. Lu, K. Plataniotis, A. Venetsanopoulos, Regularized common spatial patterns with generic learning for EEG signal classification, in: Proceedings of the Annual International Conference of the IEEE Engineering in Medicine and Biology Society, 2009.
[15] H. Kang, Y. Nam, S. Choi, Composite common spatial pattern for subject-to-subject transfer, IEEE Signal Process. Lett. 16 (8) (2009).
[16] B. Blankertz, M. Kawanabe, R. Tomioka, F. Hohlefeld, V. Nikulin, K.R. Muller, Invariant common spatial patterns: alleviating nonstationarities in brain–computer interfacing, in: Advances in Neural Information Processing Systems 20, MIT Press, Cambridge, MA, 2008.
[17] W. Tu, Sh. Sun, A subject transfer framework for EEG classification, Neurocomputing 82 (2012) 109–116.
[18] Y. Koren, L. Carmel, Robust linear dimensionality reduction, IEEE Trans. Vis. Comput. Graph. 10 (4) (2004) 459–470.
[19] B. Blankertz, K.R. Muller, D.J. Krusierski, G. Schalk, J.R. Wolpaw, A. Schlögl, G. Pfurtscheller, N. Birbaumer, The BCI competition III: validating alternative approaches to actual BCI problems, IEEE Trans. Neural Syst. Rehabil. Eng. 14 (2006) 153–159.
[20] H. Lu, H.L. Eng, C. Guan, K.N. Plataniotis, A.N. Venetsanopoulos, Regularized common spatial pattern with aggregation for EEG classification in small-sample setting, IEEE Trans. Biomed. Eng. 57 (12) (2010) 2936–2946.
[21] X. Yu, P. Chum, K.B. Sim, Analysis the effect of PCA for feature reduction in non-stationary EEG based motor imagery of BCI system, Optik 125 (2014) 1498–1502.
[22] L. He, Y. Hu, Y. Li, D. Li, Channel selection by Rayleigh coefficient maximization based genetic algorithm for classifying single-trial motor imagery EEG, Neurocomputing 121 (2013) 423–433.
[23] Y. Li, P. Wen, Modified CC-LR algorithm with three diverse feature sets for motor imagery tasks classification in EEG based brain–computer interface, Comput. Methods Programs Biomed. 113 (2014) 767–780.
[24] J. Meng, G. Huang, D. Zhang, X. Zhu, Optimizing spatial spectral patterns jointly with channel configuration for brain–computer interface, Neurocomputing 104 (2013) 115–126.
[25] M. Arvaneh, C. Guan, K.K. Ang, C. Quek, Mutual information-based optimization of sparse spatio-spectral filters in brain–computer interface, Neural Comput. Appl. 25 (2014) 625–634.