Local temporal common spatial patterns modulated with phase locking value

Local temporal common spatial patterns modulated with phase locking value

Biomedical Signal Processing and Control 59 (2020) 101882 Contents lists available at ScienceDirect Biomedical Signal Processing and Control journal...

1MB Sizes 0 Downloads 61 Views

Biomedical Signal Processing and Control 59 (2020) 101882

Contents lists available at ScienceDirect

Biomedical Signal Processing and Control journal homepage: www.elsevier.com/locate/bspc

Local temporal common spatial patterns modulated with phase locking value Zhenhua Yu a , Tian Ma a , Na Fang b , Haixian Wang b,∗ , Zhanli Li a,∗ , Hui Fan c a

College of Computer Science and Technology, Xi’an University of Science and Technology, Xi’an 710054, Shanxi, PR China Key Laboratory of Child Development and Learning Science of Ministry of Education, School of Biological Science & Medical Engineering, Southeast University, Nanjing 210096, Jiangsu, PR China c Co-innovation Center of Shandong Colleges and Universities: Future Intelligent Computing, Shandong Technology and Business University, Yantai 264005, Shandong, PR China b

a r t i c l e

i n f o

Article history: Received 15 August 2019 Received in revised form 21 January 2020 Accepted 8 February 2020 Keywords: Common spatial patterns (CSP) Local temporal common spatial patterns (LTCSP) Brain–computer interfaces (BCI) Phase locking value (PLV) EEG classification

a b s t r a c t In the field of electroencephalogram (EEG)-based brain–computer interfaces (BCI), the method of common spatial patterns (CSP) is formulated as a problem of eigen-decomposition of covariance matrices. By using temporally local samples to construct the covariance matrices, the approach of local temporal common spatial patterns (LTCSP) are developed, which performs manifold modeling. It is useful to consider the intrinsic structure of samples in defining the temporally local covariance matrices. From the perspective of neurophysiological knowledge, phase synchronization indicates information communication. In this paper, we apply phase locking value (PLV) to quantify the phase relationship between samples, which are then adopted as weights to define the temporally local covariance matrices. As a result, we obtain the PLV-modulated LTCSP. More discriminative features are discovered with the approach proposed. Experiments of EEG classification on three EEG data sets (i.e., the data sets IIIa and IVa of BCI competition III and the data set IIa of BCI competition IV) demonstrate the effectiveness of the proposed technique. The average classification accuracies of the proposed method on the three data sets are 90.56%, 83.25%, and 83.26%. In the case of noise introduced, the average classification accuracy of the proposed method exceeds the conventional CSP and LTCSP by nearly 10% and 6%, respectively. The hypothesis test indicates the superiority of the proposed method in terms of statistical significance. © 2020 Elsevier Ltd. All rights reserved.

1. Introduction The manifold structure of electroencephalogram (EEG) time series is useful for designing EEG-based brain–computer interfaces (BCI), in which a critical issue is to decode different mental activities as accurate as possible [12]. For this purpose, discriminative features are critically desirable. In literature, plenty of modern machine learning strategies are employed to extract classification features as discriminative as possible [15,32]. As a successful paradigm, the formulation of common spatial patterns (CSP) [6] is a classical and effective technique for extracting features by using spatial filters. Based on the neurophysiological effect of event-related desynchronization and synchronization (ERD/ERS) appeared with - and ˇ-rhythms [20], CSP maximizes a spatially filtered variance of one class and meanwhile minimizes that of another class. Due to the effectiveness of CSP, it has been

∗ Corresponding authors. E-mail addresses: [email protected] (H. Wang), [email protected] (Z. Li). https://doi.org/10.1016/j.bspc.2020.101882 1746-8094/© 2020 Elsevier Ltd. All rights reserved.

intensively explored in the field of BCI [8,17]. A large number of CSP extensions have been studied, such as common spatio-spectral patterns (CSSP) [14], regularized CSP (RCSP) [16], sparse CSP (SCSP) [3], stationary CSP (sCSP) [23], and Kullback–Leibler-based discriminant CSP (KLCSP) [4,22]. We have developed L1-norm-based CSP [28] for robust EEG classification and comprehensive CSP (cCSP) for semisupervised learning [29]. In recent years, CSP still receives increasing attention by researchers. Some new methods continue to occur including probabilistic CSP [31], regularized sensor covariance matrices [21], separable common spatio-spectral patterns [1], as well as connection with deep learning [24]. Although the CSP-based spatial filtering approaches achieve satisfying classification performance in some situations, they are global methods in terms of processing EEG samples. That is, the manifold structure of EEG time series is neglected. To remedy this shortcoming, we have developed local temporal CSP (LTCSP) [30] to address the temporally local manifold of EEG time series. By designing temporally local variances based on a basic technique of machine learning, LTCSP characterizes the temporally local structure of EEG signal. If using correlation to design the weights in the temporally local variances, a variant of LTCSP, called local temporal

2

Z. Yu, T. Ma, N. Fang et al. / Biomedical Signal Processing and Control 59 (2020) 101882

correlation CSP (LTCCSP), is developed [33]. It is, however, beneficial to discover the intrinsic manifold of EEG time series from the perspective of neurophysiological information. In neurophysiological community, it is generally agreed that phase synchronization reflects signal communication [27,25] and thus latently underlies different time points of EEG signal. The EEG samples with phase synchronization are in fact intrinsically “close”. Learning such kind of manifold may yield discriminative features. Specifically, if incorporating the phase synchronization into the framework of the temporally local manifold learning, we may obtain an enhanced spatial filtering approach. It combines the spatial amplitude and the temporal phase information. In this paper, we consider performing temporally local manifold learning by explicitly incorporating the information of phase synchronization into the framework of LTCSP. The classical index of phase locking value (PLV) [13] is employed to quantify the phase synchronization. Specifically, we reformulate the temporally local variances by designing a weight function based on PLV rather than the amplitude of EEG signal. The larger the PLV quantity between two samples is, the heavier weight is endowed. We term the proposed method as PLV-modulated LTCSP (p-LTCSP). Finally, the extracted features of p-LTCSP are fed into the classifier of Fisher linear discriminant analysis (LDA) [9]. It is worthwhile to highlight three main properties of the proposed p-LTCSP approach as follows. (a) The PLV quantity, which is a classical index of quantifying the phase synchronization, is introduced into the framework of LTCSP to discover the intrinsically local manifold. (b) p-LTCSP is an enhanced spatial filtering approach which meanwhile utilizes the temporal phase information. (c) Like LTCSP, p-LTCSP is computationally efficient by simultaneously diagonalizing two temporally local covariance matrices. The rest of this paper is organized as follows. The methods of CSP and LTCSP are briefly reviewed in Section 2. In Section 3, we introduce the p-LTCSP method, including instantaneous phase with PLV, LTCSP with PLV as weight, feature extraction and classification. The experimental results are reported in Section 4. Finally, Section 5 concludes this paper.

2. Brief review of CSP and LTCSP Both CSP and LTCSP aim to produce spatial filters based on multi-channel EEG time series for a two-class paradigm. Assume that Xi = {xik |k = 1, . . ., n} ∈ Rc×n (i = 1, . . ., Tx ) are EEG trials of j

one class (corresponding to one mental activity) while Yj = {yk |k = 1, . . ., n} ∈ Rc×n (j = 1, . . ., Ty ) the other class (corresponding to the other mental activity), where c and n denote the number of EEG channels and the number of sampling time points respectively, and Tx and Ty the numbers of trials recorded under the two conditions. The c-channel EEG signals recorded at one time point form a vector of c-dimensional sample. For simplicity of expression, the EEG samples are supposed to be band-pass filtered and mean-centered. The CSP method seeks projection vectors that maximize (or minimize) the quotient of variances of filtered samples of the two classes. Computationally, the optimization is implemented by simultaneously diagonalizing the two covariance matrices. Likewise, LTCSP performs the same optimization as CSP, in which the variance matrices are replaced with the time-dependent local temporal counterparts. Let us denote the sample covariance matrices of the two classes by Cx and Cy , which are formulated as 1 Cx = 2n2

n  l,m=1

Cy =

1 2n2

n 

(yl − ym ) (yl − ym )T ,

(2)

l,m=1

where, for simplicity, the superscript symbol of trial index is dropped off. With temporally local manifold learning, the sample covariance matrices are re-formulated by using only temporally local sample pairs, given by ˜x = 1 C 2n2

˜y = 1 C 2n2

n 

(xl − xm ) (xl − xm )T f (xl , xm ),

(3)

(yl − ym ) (yl − ym )T f (yl , ym ),

(4)

l,m=1 n  l,m=1

where f(xl , xm ) is a weight function of the temporally neighboring sample pair (xl , xm ). The weight function is usually defined such that temporally close sample pairs contribute more to the sample covariance matrices. The design of the weight function f plays an important role in LTCSP. The commonly used weight functions comprise the heat kernel with the Euclidean metric [30] and the correlation measure [33]. The spatial filters are obtained by either simultaneous diagonalization or generalized eigen-decomposition of the two averaged covariance matrices across the trials of each class. Due to the underlying effect of ERD/ERS, the difference of the two classed could be reflected by the filtered variance. Therefore, to extract discriminative spatial filters, a few leading eigenvectors associated with the largest or the smallest eigenvalues are exploited so as to contrast the difference between the two classes.

3. PLV modulated LTCSP 3.1. Motivation While LTCSP is an effective method of capturing temporally local manifold of EEG time series, the determination of the intrinsic manifold (embodied by the weight function) is still an open problem. By using the basic technique of machine learning, previously designed functions are based on the relation of amplitudes of EEG signal. The neurophysiological information, however, is not taken into account in the design of manifold learning. From the neurophysiological perspective, it is generally suggested that phase synchronization reflects signal communication and cognitive activity [27,25], since EEG epochs are essentially time series. The phase synchronization, on the other hand, relates different time points of EEG time series latently. The samples with intrinsical relationship, which are characterized by the phase synchronization, are intrinsically “close”. This kind of manifold may contain discriminative information. It is therefore beneficial to design a weight function based on the phase synchronization rather than the amplitudes of EEG signal. Specifically, intrinsically close sample pair is endowed with a heavy weight so as to highlight temporally local manifold. To design the weight function, the phase synchronization is needed to be quantified. In this paper, the classical PLV index is employed, which is then incorporated into the weight function design. Consequently, both amplitude and phase information are addressed in the temporally local covariance matrices. 3.2. Instantaneous phase and PLV

(xl − xm ) (xl − xm )T ,

(1)

The instantaneous phase of EEG time series is usually computed by the Hilbert transform [19]. For EEG time series z(t) from a spe-

Z. Yu, T. Ma, N. Fang et al. / Biomedical Signal Processing and Control 59 (2020) 101882

cific channel, the Hilbert transform constructs a complex-valued ˙ analytical signal z(t) by using z(t) as the real part, given by ˙ z(t) = z(t) + iˆz (t) = A(t)ei(t) ,

(5)

where zˆ (t) is the imaginary part, A(t) the instantaneous amplitude and (t) the instantaneous phase. With the Hilbert transform of z(t), the imaginary part zˆ (t) is computed as [26] 1 zˆ (t) = PV 





−∞

z() d, t−



zˆ (t) z(t)

(t) = arctan

(7)

  c    1   PLV(xl , xm ) =  exp(ilm (r)) , c 

j

j

Ly = Dy − Fy and Lsz = Dsz − Fsz for i = 1, . . ., Tx ;j = 1, . . ., Ty ;s = 1, . . ., Tz . 2. Covariance matrices formulation: With the Laplacian matrices given T above, we compute the covariance matrices C˜i = 1 Xi Li (Xi ) ,

0,

otherwise,

j=1

=

˜j wT1 Cy w1 ˜j tr(Cy ) wT1 C˜sz w1 tr(C˜sz )

, . . .,

, . . .,

˜j wTd Cy wd ˜j tr(Cy ) wTd C˜sz wd tr(C˜sz )

T

, for i = 1, . . ., Tx ;j = 1, . . ., Ty ;s = 1, . . ., Tz .

T ,

The features are then normalized by a log-transformation. i j 5. LDA projection: By using fx (i = 1, . . ., Tx ) and fy (j = 1, . . ., Ty ) as training samples, we learn a projection basis, which is then used to project the i j training feature vectors fx (i = 1, . . ., Tx ) and fy (j = 1, . . ., Ty ), as well as the s testing feature vectors fz (s = 1, . . ., Tz ). 6. Classification: The label of each testing trial is then assigned as the label of the training trial that is the most close with the testing trial in the LDA projected space.

ance matrices C˜x and C˜y , in which the covariance matrices are averaged across trials of each class.

In light of the property of phase synchronization in identifying intrinsically close EEG samples, we aim to introduce manifold learning by using PLV as weight between two EEG samples. The larger the PLV index is, the more the corresponding sample pair is emphasized: |l − m| ≤ ,



j

s fz

3.3. LTCSP with PLV as weight

PLV(xl , xm ),

x

generalized eigenvectors corresponding to the three largest and the three smallest values of eigenvalues are specified as the spatial filters w1 , . . ., wd (d = 6). 4. Feature extraction: For the training trials Xi and Yj as well as the testing trials Zs , we compute the corresponding feature vectors as ˜i

T wTd C˜ix wd wT1 Cx w1 i , . . ., , fx = tr(C˜ix ) tr(C˜ix )

(8)

where  lm (r) is the difference of instantaneous phases of xl and xm fixed to channel r. Here, the symbol of trial index is dropped off again. The PLV index quantifies the status of phase locking rather than phase value itself. Phase locking suggests underlying neural interaction [13]. PLV, as defined in (8), reflects neural relationship between two EEG samples. Clearly, PLV is based on the phase quantities exclusively and thus is insensitive to amplitude of EEG signal. The PLV index is one if perfect phase locking occurs and tends to zero if phases distribute randomly.

n2

T T ˜j j Cy = 12 Yj Ly (Yj ) and C˜sz = 12 Zs Lsz (Zs ) , for i = 1, . . ., Tx ;j = 1, . . ., Ty ;s = 1, . . ., n n Tz . 3. Spatial filters calculation: Solve the generalized eigenvalue decomposition of the matrices C¯x and C¯y , where C¯x and C¯y are the averaged matrices Tx ˜i ˜i Ty ˜j ˜j defined as C¯x = T1x Cx /tr(Cx ) and C¯y = T1y Cy /tr(Cy ). The

fy =

r=1

f (xl , xm ) =

1. Weights computation: For the trials Xi , Yj and Zs , we compute PLV between any two time points within each trials by (9), obtain the weight j matrices Fix , Fy and Fsz , and form the Laplacian matrices Lix = Dix − Fix ,

i=1

.

Once instantaneous phases are obtained, the PLV index can be computed [13] to directly qualify phase synchronization. Mathematically, it is the length of average vector resulted from instantaneous phase differences when projected onto the unit circle:



Input: The training trials Xi (i = 1, . . ., Tx ), Yj (j = 1, . . ., Ty ), the testing trials Zs (s = 1, . . ., Tz ), and the temporal range parameter . Output: Labels of the testing trials.

x

where PV is the Cauchy principal value. The original signal is switched /2 phase by the Hilbert transform. As a result, the instantaneous phase is obtained as



Table 1 The algorithmic procedure of the p-LTCSP method.

j

(6)

3

(9)

where  is a temporal range parameter. Let Fx be the weight matrix whose elements are f(xl , xm ) (l, m = 1, . . ., n), Dx be the diagonal matrix the lth diagonal entry of which is computed as  n f (xl , xm ), and Lx be the Laplacian matrix given as Dx − Fx . m=1 Notice that Fx is a symmetric matrix. As a result, with some procedures of matrix computation, the temporally local covariance matrix C˜x is re-written as 12 XLx XT , where X is the matrix X = [x1 , n . . ., xn ]. It can be seen that only temporally close sample pairs contribute to the formulation of covariance matrix. Likewise, the above formulation could be applied to the other class. Consequently, the temporally local covariance matrix C˜y is 12 YLy YT , where Y is the n matrix Y = [y1 , . . ., yn ]. We denote the PLV-modulated LTCSP as pLTCSP. The main advantage of p-LTCSP is that the phase locking information is implemented. As a result, the temporally local manifold is discovered. It is thus expected that more accurate features for discrimination are extracted. Computationally, the projection vectors of p-LTCSP are obtained by simultaneously diagonalizing the two temporally local covari-

3.4. Feature extraction and classification According to the principle of ERD/ERS, a few projection vectors produce discriminative features, given by the variances of the spatially filtered EEG samples. Specifically, for an EEG trial, Z = [z1 , . . ., zn ], its p-LTCSP feature vector is formulated as



fz = wT1 ZLz ZT w1 , . . ., wTd ZLz ZT wd

T

,

(10)

where w1 , . . ., wd are a few selected projection vectors yielded by p-LTCSP, and Lz is the associated Laplacian matrix. The value of d defines the reduced dimension of EEG trials, and is empirically set as in the experimental settings. Further, the features are usually they are normally distributed, i.e., fzu ←− log-transformed u  u such that u log fz / fz , where fz is the uth entry of the d-dimensional feature vector fz . The resulting feature vectors of the training and testing EEG trials are then input into the LDA classifier [9]. The algorithmic procedure of the proposed p-LTCSP method is summarized in Table 1. The block diagram is illustrated in Fig. 1. It is seen that the above algorithmic procedure is straightforward. In summary, the main idea of the proposed p-LTCSP is as follows. The intrinsic structure of each EEG trial is characterized by using the PLV index, which defines the phase relationship between samples within the EEG trial. The PLV quantities are then employed as weights to formulate the temporally local covariance matrices. Next, the spatial filters are calculated by eigen-decomposing the two classes of the temporally local covariance matrices which are

4

Z. Yu, T. Ma, N. Fang et al. / Biomedical Signal Processing and Control 59 (2020) 101882

Table 2 The classification accuracies of CSP, LTCSP, LTCCSP, and p-LTCSP on the 17 subjects of the three data sets. The optimal parameter  is given in parenthesis. Data set

IIIa of BCI competition III

Subject

s1

s2

s3

IVa of BCI competition III aa

al

av

aw

ay

CSP LTCSP LTCCSP p-LTCSP

96.67 100.00(2) 100.00(2) 100.00(2)

70.00 73.33(9) 73.33(7) 71.67(8)

98.33 100.00(7) 100.00(7) 100.00(3)

69.64 73.21(7) 75.00(6) 77.68(5)

100.00 100.00(4) 100.00(6) 100.00(4)

65.31 71.43(4) 72.45(11) 71.94(4)

93.75 89.73(5) 88.39(5) 92.41(8)

74.60 74.60(4) 78.17(3) 74.21(4)

Data set

IIa of BCI competition IV

Subject

A01E

A02E

A03E

A04E

A05E

A06E

A07E

A08E

A09E

CSP LTCSP LTCCSP p-LTCSP

89.58 90.97(10) 90.97(11) 92.36(9)

61.81 62.50(8) 61.11(2) 65.28(9)

96.53 96.53(9) 96.53(9) 97.22(9)

73.61 74.31(9) 72.22(7) 71.53(11)

67.36 77.78(4) 79.86(4) 78.47(5)

65.28 71.53(7) 71.53(10) 71.53(9)

80.56 82.64(3) 82.64(6) 85.42(9)

94.44 95.14(9) 95.14(5) 95.14(5)

91.67 93.06(10) 92.36(3) 92.36(3)

Fig. 1. The block diagram of the p-LTCSP method.

averaged across the trials within each class. With the obtained spatial filters, the spatial features are extracted. Finally, the extracted features are fed into the LDA classifier followed by classification. 4. Experiments The experiments are performed on three publicly available EEG data sets of BCI competitions: data set IIIa of BCI competition III, data set IVa of BCI competition III, and data set IIa of BCI competition IV. The classification performance of the proposed p-LTCSP is compared with the conventional LTCSP and LTCCSP methods. The experiments are further conducted when artificial noise is introduced into the data sets. 4.1. EEG data sets We use three data sets of BCI competitions to evaluate the effectiveness of the proposed p-LTCSP technique for EEG classification. Totally 17 subjects are recruited to collect EEG signals. EEG signals are recorded while motor imageries of hands and tongue activities are cued to be performed. This paper focuses on the classification of two classes, i.e., left versus right hands or right hand versus foot motor imageries. The EEG signals from the data set IIIa of BCI competition III involve three subjects (s1, s2, and s3) [5]. The numbers of electrodes and sampled frequency are 60 and 250 Hz, respectively. The numbers of training and testing trials for each subject are equal,

which are 90, 60, and 60 for s1, s2 and s3, respectively. We aim to perform classification of left and right hands motor imageries. There are five subjects (i.e., aa, al, av, aw and ay) in the data set IVa of BCI competition III [5]. The numbers of electrodes and sampled frequency are 118 and 100 Hz, respectively. The number of total trials is 280 for each subject, in which the numbers of training trials are 168, 224, 84, 56 and 28 for aa, al, av, aw and ay, respectively. In the experiment, the subjects perform motor imageries of left hand, right hand and right foot according to a visual cue that is presented for 3.5s. Right hand versus foot motor imageries are needed to be classified. The data set IIa of BCI competition IV contains nine persons (A01E, . . ., A09E) [18]. Four cued motor imageries, i.e., left hand, right hand, foot and tongue are performed. The numbers of electrodes and sampled frequency are 22 and 250 Hz, respectively. The trials are separated into two sessions. In each session, there are 72 trials for each condition. In the experiment, the trials in one session are used as training samples and the other session testing samples. We carry out the classification of left and right hands motor imageries. 4.2. Preprocessing and settings According to the neurophysiological knowledge, the motor related frequency components occur in the  and ˇ bands. We thus apply a fifth-order Butterworth filter with the frequency range [8 Hz, 30 Hz] to the EEG samples. For the three data sets, the EEG epochs are captured in the range [0.5 s, 2.5 s] when referenced to the visual cue of motor imagery. The parameter  is selected in the range {2, . . ., 12}. The number of projection vectors (i.e., the value of d) is set as six (i.e., three pairs), by following the common practice in BCI [6,16]. As a result, a six-dimensional feature vector is associated with each EEG trial. The feature vectors of the training trials are further input into the LDA classifier [9]. Since the training vectors belong to two classes, with the LDA, we learn one projection vector by maximizing the ratio of the between-scatter variance to the within-scatter variance. Computationally, the LDA is formulated as a generalized eigenvalue-decomposition problem. Once the projection basis is obtained, the six-dimensional feature vectors are projected as one-dimensional scalars. The labels of the testing EEG trials are assigned as the ones of the nearest training trials. 4.3. Noise simulation We simulate the situation that the training samples are polluted by noise so as to evaluate the classification performance of the different spatial filtering approaches. The artificially generated noise points abide by a c-dimensional Gaussian distribution N(0, ),

Z. Yu, T. Ma, N. Fang et al. / Biomedical Signal Processing and Control 59 (2020) 101882

5

Fig. 2. Mappings of spatial filters produced by CSP, LTCSP, LTCCSP and p-LTCSP on subject aa of data set IVa of BCI competition III, where only the first pair of spatial filters are displayed. (a) CSP. (b) LTCSP. (c) LTCCSP. (d) p-LTCSP.

where is the covariance matrix of the training samples and is a small quantity. In this experiment, is set as 0.005. This type of noise simulates the generation mechanism of the EEG samples, but has small magnitude. It is expected to acts at noise disturbance that affects EEG during a specific brain activity. Clearly, it cannot model all noise effects. The signal-to-noise ratio (SNR) is measured by the method of maximum signal fraction analysis (MSFA) [2,11], which boils down to a quadratic maximization problem. All the training data samples are added by the noise. For each implementation of adding noise, we repeat the experiment ten times and record the mean classification accuracy. 4.4. Experimental results The classification accuracies of different methods (i.e., CSP, LTCSP, LT-CCSP, and p-LTCSP) with the LDA as classifier are summarized in Table 2. These methods are to extract spatial features while p-LTCSP additionally uses phase synchronization information. In accordance with the setting in [6,16], three pairs of spatial filters are adopted, resulting in six spatial features. The features are further reduced to scalars when LDA is applied. As seen in the table, the classification accuracies on the seventeen subjects vary with these methods. In most cases, the classification accuracies are much larger than the chance level, which suggests the effectiveness of the spatial filtering approaches in classifying motor imagery tasks. Especially on some subjects, such as s1, s3, al, A01E, A03E, A08E, and A09E, the classification accuracies are larger than 90%. The performance of the standard CSP method is used as a benchmark. The purpose is to see whether the other improved methods can enhance the classification performance. Generally, LTCSP, LTCCSP, and p-LTCSP exhibit better performance than that of CSP. This implies that the temporally local information could deliver more discriminative information. In most cases, the p-LTCSP produces the best classification results across the 17 subjects. The median values of CSP, LTCSP, LTCCSP, and pLTCSP are 80.56, 82.64, 82.64, and 85.42, respectively. We see that p-LTCSP achieves the highest median value of classification accuracy. This suggests that the incorporation of phase synchronization into the temporally local variance can further enhance the classification performance. The advantage of the employment of phase synchronization is thus demonstrated on these subjects. For visual perception, we delineate some examples of the first pair of spatial filters produced by CSP, LTCSP, LTCCSP and p-LTCSP

on subject aa of data set IVa of BCI competition III in Fig. 2. Based on neurophysiological knowledge, the ERS/ERD rhythms occur in sensorimotor cortex, the peaks of which are during - and ˇbands for motor imagery activity [20]. Discriminative filters are thus yielded by different spatial filtering methods. These methods are formulated based on the mechanism of contralateral dominance of rhythms that occur during motor imagery. As shown in Fig. 2, the motor cortex areas of left-hemisphere and middle brain (plus some unexpected regions) are endowed with large weights, which is accordance with the neurophysiological point. The first pair of spatial filters of these methods are in fact slightly different although they look similar. The reason may be that the first pair of spatial filters may be mainly controlled by band power and the difference is slight. For the four methods of spatial filtering, it is of great importance to test the statistical significance of the differences of classification accuracies. The classification accuracies of different methods are correlated, since they are obtained on the same seventeen subjects. The nonparametric Friedman test [7] is applied to conduct multiple comparisons. Here, the classification accuracies of each method are treated as a group. There are four groups in this situation. Different subjects are treated as blocks in using the nonparametric Friedman test. The statistic of the Friedman test is computed by using the ranks of the classification accuracies within each block. With the test, the null hypothesis is rejected under the significance level 0.05. Therefore, the classification accuracies of the four methods cannot be treated as equal in terms of the statistical test. Post hoc multiple comparisons test is thus needed for pairwise comparison. For this purpose, we use the Friedman rank sum test [10]. The test reveals that the classification accuracy of the proposed p-LTCSP is superior than the other three methods (p < 0.05). The classification accuracies of the four methods on the three data sets with artificial noise introduced are presented in Table 3 and Fig. 3. The averaged MSFA values across the trials for the three data set are 2.76, 1.54, and 1.39. Because of individual difference, the classification performance is different on different subjects when artificial noise is introduced. It is observed from the figure that the classification performance of the four methods deteriorates with different extent in the noisy situation. Clearly, the conventional CSP method has the lowest classification accuracy. Especially on subjects s2, av, aw, ay, A02E, A04E and A05E, the classification accuracies are around the chance level by the impact of noise. By contrast, LTCSP, LTCCSP and p-LTCSP have relatively superior

6

Z. Yu, T. Ma, N. Fang et al. / Biomedical Signal Processing and Control 59 (2020) 101882

Table 3 The classification accuracies of CSP, LTCSP, LTCCSP, and p-LTCSP on the seventeen subjects of the three data sets with noise added. Data set

IIIa of BCI competition III

Subject

s1

s2

s3

IVa of BCI competition III aa

al

av

aw

ay

CSP LTCSP LTCCSP p-LTCSP

75.03 82.22 74.41 90.17

55.32 56.75 56.52 62.02

68.21 74.33 75.21 80.34

67.54 65.14 70.17 71.39

83.67 96.01 88.39 90.27

50.15 55.15 53.27 59.38

51.26 55.82 53.09 62.07

52.34 52.37 53.15 58.53

Data set

IIa of BCI competition IV

Subject

A01E

A02E

A03E

A04E

A05E

A06E

A07E

A08E

A09E

CSP LTCSP LTCCSP p-LTCSP

70.47 71.32 77.56 82.60

50.72 62.66 58.32 70.23

62.27 62.32 66.92 70.23

52.33 50.18 51.56 55.15

50.17 55.23 53.38 54.36

55.27 56.75 56.32 60.14

60.28 61.23 62.28 73.38

73.25 79.46 78.71 85.29

69.04 69.32 70.90 74.62

Fig. 3. Classification accuracies of CSP, LTCSP, LTCCSP, and p-LTCSP on the 17 subjects from the three real EEG data sets with artificial noise introduced. The last group in each panel indicates the mean classification accuracy across the subjects of each EEG data set. (a) Data set IIIa of BCI competition III. (b) Data set IVa of BCI competition III. (c) Data set IIa of BCI competition IV.

Fig. 4. The classification accuracies of CSP, LTCSP, and LTCCSP versus p-LTCSP on the three real EEG data sets with artificial noise introduced. The horizontal and vertical coordinates of one point are the classification accuracies of two methods on one subject. The classification of the proposed p-LTCSP method is superior than the other methods for the points below the diagonal line.

classification accuracies across the seventeen subjects. The results demonstrate that the temporally local manifold learning is beneficial in resisting noise. Generally, the proposed p-LTCSP method has the highest classification accuracies on most subjects. Statistical tests of multiple comparisons between the proposed p-LTCSP method and the other three methods indicate that the differences are significant (p < 0.05). Specifically, the rank sum statistic between CSP and p-LTCSP is 0, between LTCSP and p-LTCSP is 10, and between LTCSP and p-LTCCSP is 0. Given the number of pairedsamples being 17 and the significance level being 0.05, the critical value is 35, which is larger than the statistics computed. Again,

the advantage of implementing phase synchronization for intrinsic manifold learning is illustrated. Based on Table 3, the pair-wise comparisons between p-LTCSP and the other three methods on the seventeen subjects with artificially noise added are shown in Fig. 4. It is seen that most points are located below the diagonal line, which indicates that p-LTCSP is superior than the other three methods. The superiority is significant by the Friedman rank sum test. Thus, we conclude that the intrinsic manifold learning based on the phase synchronization extracts more discriminative information for single-trial EEG classification.

Z. Yu, T. Ma, N. Fang et al. / Biomedical Signal Processing and Control 59 (2020) 101882

It is interesting to see what the classification accuracies would be if artifacts of muscle activity and eye blink are added to the EEG signals. 10% of channels are randomly selected to be added the artifacts. The experiment is run ten times. The average classification accuracies (across the runs and the seventeen subjects of the three data sets) of CSP, LTCSP, LTCCSP, and p-LTCSP are 70.28%, 78.62%, 79.03%, and 82.56%, respectively. 5. Conclusion In this paper, a framework that incorporates PLV into LTCSP is proposed to perform single trial EEG classification in the task of motor imagery. The PLV is used to re-formulate the temporally local covariance matrices so as to discover the intrinsic manifold of EEG time series. The phase synchronization is related to signal communication and cognitive activity, and thus underlies EEG time series latently. Consequently, the proposed p-LTCSP method extracts spatial filters based on amplitude tuned by phase synchronization, in which both amplitude and phase information are addressed. We point it out that PLV is a classical index for quantifying phase synchronization. It may be useful to try other advanced indices to tune the temporally local covariance matrices. The experimental results on the three real EEG data sets show that the introduction of PLV into LTCSP is useful in improving the classification performance. Authors’ contribution HW, ZL and HF: designed the study; ZY, TM and NF: analyzed the data and performed experiments; ZY, TM and HW: drafted the manuscript and revised the manuscript. All authors approved the final version of the manuscript. Acknowledgment The authors would like to thank the anonymous referees for the constructive recommendations, which greatly improve the paper. This work was supported in part by the National Natural Science Foundation of China under grant 61773114, and the Key Research and Development Plan (Industry Foresight and Common Key Technology) of Jiangsu Province under grant BE2017007-3. Conflict of interest: None declared. References [1] A.S. Aghaei, M.S. Mahanta, K.N. Plataniotis, Separable common spatio-spectral patterns for motor imagery BCI systems, IEEE Trans. Biomed. Eng. 63 (1) (2016) 15–29. [2] C.W. Anderson, J.N. Knight, T. O’Connor, M.J. Kirby, A. Sokolov, Geometric subspace methods and time-delay embedding for EEG artifact removal and classification, IEEE Trans. Neural Syst. Rehabil. Eng. 14 (2) (2006) 142–146. [3] M. Arvaneh, C. Guan, K.K. Ang, C. Quek, Optimizing the channel selection and classification accuracy in EEG-based BCI, IEEE Trans. Biomed. Eng. 58 (6) (2011) 1865–1873. [4] M. Arvaneh, C. Guan, K.K. Ang, C. Quek, Optimizing spatial filters by minimizing within-class dissimilarities in electroencephalogram-based brain–computer interface, IEEE Trans. Neural Netw. Learn. Syst. 24 (4) (2013) 610–619. [5] B. Blankertz, K.-R. Müller, D.J. Krusienski, G. Schalk, J.R. Wolpaw, A. Schlögl, G. Pfurtscheller, J.R. Millán, M. Schröder, N. Birbaumer, The BCI competition III: validating alternative approaches to actual BCI problems, IEEE Trans. Neural Syst. Rehabil. Eng. 14 (2) (2006) 153–159.

7

[6] B. Blankertz, R. Tomioka, S. Lemm, M. Kawanabe, K.-R. Müller, Optimizing spatial filters for robust EEG single-trial analysis, IEEE Signal Process. Mag. 25 (1) (2008) 41–56. [7] J. Demˇsar, Statistical comparisons of classifiers over multiple data sets, J. Mach. Learn. Res. 7 (2006) 1–30. [8] G. Dornhege, M. Krauledat, K.-R. Müller, B. Blankertz, General signal processing and machine learning tools for BCI, in: Toward Brain–Computer Interfacing, MIT Press, Cambridge, MA, 2007, pp. 207–233. [9] R.O. Duda, P.E. Hart, D.G. Stork, Pattern Classification, Wiley, New York, 2001. [10] R. Eisinga, T. Heskes, B. Pelzer, M. Te Grotenhuis, Exact p-values for pairwise comparison of Friedman rank sums, with application to comparing classifiers, BMC Bioinformatics 18 (1) (2017), pp. 68 (18 pp.). [11] D.R. Hundley, M.J. Kirby, M. Anderle, Blind source separation using the maximum signal fraction approach, Signal Process. 82 (2002) 1505–1508. [12] H.-J. Hwang, S. Kim, S. Choi, C.-H. Im, EEG-based brain–computer interfaces: a thorough literature survey, Int. J. Hum.-Comput. Interface 29 (12) (2013) 814–826. [13] J.P. Lachaux, E. Rodriguez, J. Martinerie, F.J. Varela, Measuring phase synchrony in brain signals, Hum. Brain Mapp. 8 (4) (1999) 194–208. [14] S. Lemm, B. Blankertz, G. Curio, K.-R. Müller, Spatio-spectral filters for improved classification of single trial EEG, IEEE Trans. Biomed. Eng. 52 (9) (2005) 1541–1548. [15] S. Lemm, B. Blankertz, T. Dickhaus, K.-R. Müller, Introduction to machine learning for brain imaging, NeuroImage 56 (2) (2011) 387–399. [16] F. Lotte, C. Guan, Regularizing common spatial patterns to improve BCI designs: unified theory and new algorithms, IEEE Trans. Biomed. Eng. 58 (2) (2011) 355–362. [17] D.J. McFarland, C.W. Anderson, K.-R. Müller, A. Schlögl, D.J. Krusienski, BCI meeting 2005 – workshop on BCI signal processing: Feature extraction and translation, IEEE Trans. Neural Syst. Rehabil. Eng. 14 (2) (2006) 135–138. [18] M. Naeem, C. Brunner, R. Leeb, B. Graimann, G. Pfurtscheller, Separability of four-class motor imagery data using independent components analysis, J. Neural Eng. 3 (2006) 208–216. [19] A.V. Oppenheim, R.W. Schafer, J.R. Buck, Discrete-Time Signal Processing, 2nd ed., Prentice-Hall, Inc., Upper Saddle River, NJ, USA, 1999. [20] G. Pfurtscheller, F.H.L. da Silva, Event-related EEG/MEG synchronization and desynchronization: basic principles, Clin. Neurophysiol. 110 (1999) 1842–1857. [21] L. Roijendijk, S. Gielen, J. Farquhar, Classifying regularized sensor covariance matrices: an alternative to CSP, IEEE Trans. Neural Syst. Rehabil. Eng. 24 (8) (2016) 893–900. [22] W. Samek, M. Kawanabe, K.-R. Müller, Divergence-based framework for common spatial patterns algorithms, IEEE Reviews Biomed. Eng. 7 (2013) 50–72. [23] W. Samek, C. Vidaurre, K.-R. Müller, M. Kawanabe, Stationary common spatial patterns for brain–computer interfacing, J. Neural Eng. 9 (2012), 016013 (14pp). [24] R.T. Schirrmeister, J.T. Springenberg, L.D.J. Fiederer, M. Glasstetter, K. Eggensperger, M. Tangermann, F. Hutter, W. Burgard, T. Ball, Deep learning with convolutional neural networks for EEG decoding and visualization, Hum. Brain Mapp. 38 (11) (2017) 5391–5420. [25] A.K. Singh, H. Asoh, S. Phillips, Optimal detection of functional connectivity from high-dimensional EEG synchrony data, NeuroImage 58 (1) (2011) 148–156. [26] C.J. Stam, G. Nolte, A. Daffertshofer, Phase lag index: Assessment of functional connectivity from multi-channel EEG and MEG with diminished bias from common sources, Hum. Brain Mapp. 28 (11) (2007) 1178–1193. [27] F.J. Varela, J.P. Lachaux, E. Rodriguez, J. Martinerie, The brainweb: phase synchronization and large-scale integration, Nat. Rev. Neurosci. 2 (4) (2001) 229–239. [28] H. Wang, Q. Tang, W. Zheng, L1-norm-based common spatial patterns, IEEE Trans. Biomed. Eng. 59 (3) (2012) 653–662. [29] H. Wang, D. Xu, Comprehensive common spatial patterns with temporal structure information of EEG data: minimizing nontask related EEG component, IEEE Trans. Biomed. Eng. 59 (9) (2012) 2496–2505. [30] H. Wang, W. Zheng, Local temporal common spatial patterns for robust single-trial EEG classification, IEEE Trans. Neural Syst. Rehabil. Eng. 16 (2) (2008) 131–139. [31] W. Wu, Z. Chen, X. Gao, Y. Li, E.N. Brown, S. Gao, Probabilistic common spatial patterns for multichannel EEG analysis, IEEE Trans. Pattern Anal. Mach. Intell. 37 (3) (2015) 639–653. [32] H. Yuan, B. He, Brain–computer interfaces using sensorimotor rhythms: Current state and future perspectives, IEEE Trans. Biomed. Eng. 61 (5) (2014) 1425–1435. [33] R. Zhang, P. Xu, T. Liu, Y. Zhang, L. Guo, P. Li, D. Yao, Local temporal correlation common spatial patterns for single trial EEG classification during motor imagery, Comput. Math. Methods Med. (2013), 591216 (7 pp.).