Author’s Accepted Manuscript Sparse Bayesian multiway canonical correlation analysis for EEG pattern recognition Yu Zhang, Guoxu Zhou, Jing Jin, Yangsong Zhang, Xingyu Wang, Andrzej Cichocki www.elsevier.com/locate/neucom
PII: S0925-2312(16)31337-6
DOI: http://dx.doi.org/10.1016/j.neucom.2016.11.008
Reference: NEUCOM17706
To appear in: Neurocomputing Received date: 3 March 2016 Revised date: 3 September 2016 Accepted date: 9 November 2016 Cite this article as: Yu Zhang, Guoxu Zhou, Jing Jin, Yangsong Zhang, Xingyu Wang and Andrzej Cichocki, Sparse Bayesian multiway canonical correlation analysis for EEG pattern recognition, Neurocomputing, http://dx.doi.org/10.1016/j.neucom.2016.11.008 This is a PDF file of an unedited manuscript that has been accepted for publication. As a service to our customers we are providing this early version of the manuscript. The manuscript will undergo copyediting, typesetting, and review of the resulting galley proof before it is published in its final citable form. Please note that during the production process errors may be discovered which could affect the content, and all legal disclaimers that apply to the journal pertain.
Sparse Bayesian multiway canonical correlation analysis for EEG pattern recognition

Yu Zhang (a,∗), Guoxu Zhou (b), Jing Jin (a), Yangsong Zhang (c), Xingyu Wang (a), Andrzej Cichocki (d)

(a) Key Laboratory for Advanced Control and Optimization for Chemical Processes, Ministry of Education, East China University of Science and Technology, Shanghai 200237, China
(b) School of Automation at Guangdong University of Technology, Guangzhou 510006, China
(c) School of Computer Science and Technology, Southwest University of Science and Technology, Mianyang 621010, China
(d) Laboratory for Advanced Brain Signal Processing, RIKEN Brain Science Institute, Wako-shi, Saitama 351-0198, Japan
Abstract

L1-regularized multiway canonical correlation analysis (L1-MCCA) has been introduced for reference signal optimization in steady-state visual evoked potential (SSVEP)-based brain-computer interfaces (BCIs). The effectiveness of L1-regularization for significant trial selection depends heavily on the regularization parameter, which is typically determined by cross-validation (CV). However, CV substantially reduces the practicability of a BCI system, because it requires additional data for parameter validation and incurs a relatively high computational cost. To solve this problem, this study proposes a Bayesian version of L1-MCCA (called SBMCCA) by exploiting sparse Bayesian learning. The SBMCCA method avoids CV and can efficiently estimate the model parameters under the Bayesian evidence framework. Experimental results show that the SBMCCA method achieved recognition accuracy comparable to that of the L1-MCCA method with much higher computational efficiency.

Keywords: Brain-computer interface (BCI), Electroencephalogram (EEG), Multiway canonical correlation analysis (MCCA), Sparse Bayesian learning, Steady-state visual evoked potential (SSVEP)

∗ Corresponding author: Yu Zhang. E-mail: [email protected]

Preprint submitted to Neurocomputing, August 26, 2016

1. Introduction

A brain-computer interface (BCI) provides a new communication channel between the human brain and a computer [1, 2]. A BCI translates task-related neural responses into useful computer commands. Accordingly, severely disabled people can achieve real-time communication with the external environment through BCI systems [3, 4, 5, 6]. Electroencephalography (EEG) has been the most widely adopted technique for measuring the neural responses used in BCI applications, which typically include P300, steady-state visual evoked potential (SSVEP) and sensorimotor rhythm [7, 8, 9, 10, 11, 12, 13, 14].

In recent years, SSVEP-based BCI has been increasingly studied due to its relatively high information transfer rate [15, 16, 17, 18, 19, 20]. An SSVEP-based BCI usually presents several visual stimuli flickering at different frequencies to the user (also called the subject). As a periodic neural response, SSVEP is evoked over the occipital scalp region at the same frequency as the stimulus, together with its higher harmonics, when the subject focuses attention on one of the stimuli [21]. An SSVEP-based BCI recognizes the SSVEP frequency components in the EEG and thereby detects the command desired by the subject.

However, a challenging issue for SSVEP-based BCI is how to recognize the SSVEP accurately, since it is likely to be contaminated by background noise in the brain [22, 23, 24]. Due to the effects of volume conduction in the brain, accurate recognition of SSVEP can hardly be achieved with single-channel detection by traditional power spectral density analysis (PSDA). In the past few years, minimum energy combination (MEC) [22] and canonical correlation analysis (CCA) [23] have been proposed to improve SSVEP recognition performance by exploiting multichannel information.
The MEC method aims at finding spatial filters that remove as much of the nuisance signals and noise as possible from the multichannel EEG. The CCA method discovers the dominant frequency component by maximizing the correlation between the multichannel EEG and predefined reference signals. In recent years, several more advanced algorithms have been proposed for SSVEP-based BCI. A multivariate synchronization index (MSI)-based method [25] was proposed to calculate the synchronization index between the EEG and the reference signals for SSVEP recognition with improved performance. A multiway CCA (MCCA) [26] was introduced to optimize the reference signals in the CCA-based method through collaborative optimization between the channel-way and trial-way arrays of the EEG tensor. However, the SSVEP responses of the same subject may vary from trial to trial. For instance, a “good” trial presents distinctive SSVEP characteristics at the stimulus frequency and its higher harmonics, while a “bad” one provides poor results (see Fig. 1). As a result, MCCA without any regularization may not yield the optimal recognition accuracy. To address this problem, an L1-regularized version of MCCA (L1-MCCA) [27] has been proposed to further enhance SSVEP recognition performance by selecting significant trials. The L1-MCCA method was demonstrated to outperform both the CCA and MCCA methods. Some other algorithms recently proposed for SSVEP recognition can be found in the literature [28, 29, 30, 31, 32, 33, 34, 35, 36].

The selection of the regularization parameter plays an important role in the L1-MCCA method. Cross-validation (CV) has usually been adopted to determine the most appropriate parameter. However, CV requires additional data for parameter validation and incurs a relatively high computational cost. These limitations substantially reduce the practicability of a BCI system. Bayesian inference treats the regularization in a probabilistic framework and provides an effective approach to estimating the model parameters automatically and quickly [37, 38, 39, 40]. In recent years, some Bayesian methods have been introduced for automatic spatial filtering of EEG [41, 42, 43]. In this study, we propose a Bayesian version of L1-MCCA (called SBMCCA) for SSVEP recognition by alternately exploiting sparse Bayesian learning [44]. Compared with L1-MCCA, SBMCCA does not require CV with additional data to validate the selection of the regularization parameter.
The model parameters can be automatically estimated under the Bayesian evidence framework. As a result, higher efficiency can be achieved for the SSVEP-based BCI. The experimental study confirms that the proposed SBMCCA achieved recognition accuracy comparable to that of the L1-MCCA method at a much lower computational cost.

Figure 1: An example of SSVEP characteristics evoked at a stimulus frequency of 6 Hz in a “good” trial and a “bad” trial, respectively. The left two subfigures show the power spectra at channel Oz, while the right two present the scalp topography of the power spectrum summed at the stimulus frequency and its harmonics.

2. Methodology

2.1. CCA

As an effective multivariate statistical method, canonical correlation analysis (CCA) [45] aims at discovering the underlying correlation between two sets of data. Consider two multichannel signals $X \in \mathbb{R}^{I_1 \times J}$ and $Y \in \mathbb{R}^{I_2 \times J}$. CCA seeks two projection vectors $w \in \mathbb{R}^{I_1}$ and $v \in \mathbb{R}^{I_2}$ to maximize the correlation between the linear projections $\tilde{x} = w^T X$ and $\tilde{y} = v^T Y$ by solving:

$$\rho = \max_{w,v} \frac{E[\tilde{x}\tilde{y}^T]}{\sqrt{E[\tilde{x}\tilde{x}^T]\,E[\tilde{y}\tilde{y}^T]}} = \max_{w,v} \frac{w^T X Y^T v}{\sqrt{w^T X X^T w \; v^T Y Y^T v}}, \tag{1}$$
where $\rho$ is the maximum correlation coefficient. The optimization problem in (1) can be transformed into a generalized eigenvalue problem [46].

In recent years, CCA has been widely studied for SSVEP recognition [23]. Assume that $M$ stimulus frequencies are to be recognized in an SSVEP-based BCI. $X$ denotes the EEG data recorded from $I_1$ channels with $J$ points in each channel. $Y_m$ is the reference signal at the $m$-th stimulus frequency $f_m$ ($m = 1, 2, \ldots, M$), constructed from a series of sine-cosine waves as

$$Y_m = \begin{bmatrix} \sin(2\pi f_m t) \\ \cos(2\pi f_m t) \\ \vdots \\ \sin(2\pi H f_m t) \\ \cos(2\pi H f_m t) \end{bmatrix}, \quad t = \frac{1}{F}, \frac{2}{F}, \ldots, \frac{J}{F}, \tag{2}$$

where $H$ denotes the number of harmonics used (i.e., $I_2 = 2H$) and $F$ is the sampling rate. With the correlation coefficients $\rho_m$ computed by (1) between $X$ and each $Y_m$, the SSVEP target frequency $f_t$ is then recognized as

$$f_t = \arg\max_{f_m} \rho_m, \quad m = 1, 2, \ldots, M. \tag{3}$$
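The recognition rule in (1)-(3) can be illustrated with a small numerical sketch. The code below is not the authors' implementation: the function names (`max_canonical_corr`, `sincos_reference`, `recognize`) and the synthetic test signal are assumptions made for illustration, each channel is mean-centred before the correlation is computed (the text implicitly assumes zero-mean signals), and the canonical correlation is obtained through a QR and SVD route, which is one standard alternative to the generalized eigenvalue formulation.

```python
import numpy as np


def max_canonical_corr(X, Y):
    """Largest canonical correlation between the rows of X (I1 x J) and Y (I2 x J)."""
    # Centre every channel over time so that inner products behave like covariances.
    Xc = X - X.mean(axis=1, keepdims=True)
    Yc = Y - Y.mean(axis=1, keepdims=True)
    # Orthonormal bases for the two sets of signals (assumes full rank).
    Qx, _ = np.linalg.qr(Xc.T)
    Qy, _ = np.linalg.qr(Yc.T)
    # The singular values of Qx^T Qy are the canonical correlations.
    s = np.linalg.svd(Qx.T @ Qy, compute_uv=False)
    return float(min(s[0], 1.0))


def sincos_reference(freq, n_harmonics, n_samples, fs):
    """Sine-cosine reference of (2): shape (2H, J), with t = 1/F, ..., J/F."""
    t = np.arange(1, n_samples + 1) / fs
    rows = []
    for h in range(1, n_harmonics + 1):
        rows.append(np.sin(2 * np.pi * h * freq * t))
        rows.append(np.cos(2 * np.pi * h * freq * t))
    return np.vstack(rows)


def recognize(X, freqs, n_harmonics, fs):
    """Pick the candidate frequency with the largest correlation, as in (3)."""
    rhos = [
        max_canonical_corr(X, sincos_reference(f, n_harmonics, X.shape[1], fs))
        for f in freqs
    ]
    return freqs[int(np.argmax(rhos))], rhos


# Synthetic 8-channel signal (an assumption, not real EEG): a 9 Hz sinusoid
# with random channel gains plus white noise, 4 s at 250 Hz.
rng = np.random.default_rng(0)
fs, n_samples = 250, 1000
t = np.arange(1, n_samples + 1) / fs
source = np.sin(2 * np.pi * 9 * t + 0.3)
X = rng.normal(size=(8, 1)) * source + 0.5 * rng.normal(size=(8, n_samples))

best, rhos = recognize(X, [6, 8, 9, 10], n_harmonics=2, fs=fs)
print(best, [round(r, 3) for r in rhos])
```

The candidate list mirrors the four stimulus frequencies of dataset-1, and the number of harmonics is set to 2 only for the purpose of this sketch.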
2.2. L1-MCCA

A potential issue of the CCA method is that the artificial sine-cosine references can hardly give the optimal accuracy for SSVEP recognition, since they lack subject-specific and inter-trial information. By exploiting tensor analysis [47], multiway canonical correlation analysis (MCCA) [26] has been proposed for reference signal optimization to enhance the recognition accuracy. Let us consider a three-way tensor $\mathcal{X} \in \mathbb{R}^{I \times J \times K}$ (channel × time × trial) formed by multichannel EEG data from multiple trials with a specific stimulus frequency. Construct the original reference signal set $Y \in \mathbb{R}^{2H \times J}$ as in (2). MCCA aims at finding projection vectors $w_1 \in \mathbb{R}^{I}$, $w_3 \in \mathbb{R}^{K}$ and $v \in \mathbb{R}^{2H}$ to maximize the correlation between the linear projections $\tilde{x} = \mathcal{X} \times_1 w_1^T \times_3 w_3^T$ and $\tilde{y} = v^T Y$ through

$$\rho = \max_{w_1, w_3, v} \frac{E[\tilde{x}\tilde{y}^T]}{\sqrt{E[\tilde{x}\tilde{x}^T]\,E[\tilde{y}\tilde{y}^T]}}. \tag{4}$$

The optimization in (4) can be solved by an alternating CCA algorithm [26]. After learning the optimal linear transforms $w_1$ and $w_3$, we then obtain the optimized reference signal $z \in \mathbb{R}^{J}$ as

$$z = \mathcal{X} \times_1 w_1^T \times_3 w_3^T. \tag{5}$$
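As a small illustration of the mode products in (5), the sketch below contracts the channel mode with $w_1$ and the trial mode with $w_3$ using `numpy.einsum`. The tensor sizes and the random data are assumptions chosen only to make the example runnable; they are not values taken from the experiments.

```python
import numpy as np

rng = np.random.default_rng(0)
I, J, K = 8, 1000, 19            # channels, time points, trials (illustrative sizes)
X = rng.normal(size=(I, J, K))   # random stand-in for the EEG tensor

# Unit-norm projection vectors for the channel mode and the trial mode.
w1 = rng.normal(size=I)
w1 /= np.linalg.norm(w1)
w3 = rng.normal(size=K)
w3 /= np.linalg.norm(w3)

# z_j = sum_i sum_k w1_i * X_ijk * w3_k, i.e. X x_1 w1^T x_3 w3^T.
z = np.einsum("i,ijk,k->j", w1, X, w3)

# The same contraction in two steps: first the channel mode, then the trial mode.
X_tilde = np.einsum("i,ijk->jk", w1, X)   # shape (J, K)
z_two_step = X_tilde @ w3

print(z.shape, np.allclose(z, z_two_step))
```

Because the two contractions act on different modes, the order in which they are applied does not change the result.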
The correlation between new single-trial test data $\hat{X} \in \mathbb{R}^{I \times J}$ and the optimized reference signals $z_m$ at each stimulus frequency $f_m$ ($m = 1, 2, \ldots, M$) is maximized by (1). The SSVEP target frequency is then recognized according to (3).

Figure 2: Contour plots of the Student-t prior and the corresponding posterior in two dimensions. The Student-t prior is derived by integrating out the hyperparameters $\alpha$ for the separate Gaussian prior. Uniform hyperpriors $p(\alpha_d) = \Gamma(\alpha_d | 0, 0)$ ($d = 1, 2, \ldots, D$) are specified on $\alpha$.

More recently, a penalized MCCA with L1-regularization (L1-MCCA) has been proposed to learn sparse projection vectors, which provide automatic feature selection for reference signal optimization in SSVEP recognition. According to the relationship between CCA and least squares, we can formulate the optimization problem of L1-MCCA as

$$w_1, w_3, v = \arg\min_{w_1, w_3, v} \frac{1}{2} \left\| \mathcal{X} \times_1 w_1^T \times_3 w_3^T - v^T Y \right\|_2^2 + \lambda_1 \|w_1\|_1 + \lambda_2 \|v\|_1 + \lambda_3 \|w_3\|_1 \quad \text{s.t.} \quad \|w_1\|_2 = \|w_3\|_2 = \|v\|_2 = 1, \tag{6}$$
where $\lambda_1$, $\lambda_2$, $\lambda_3$ are regularization parameters that control the sparsity of $w_1$, $v$, $w_3$, respectively, and $\|\cdot\|_p$ denotes the $l_p$-norm. Since the optimization problem in (6) is equivalent to the LASSO estimate [48, 49] when any two of $w_1$, $w_3$ and $v$ are fixed, it can be solved by an alternating LASSO algorithm [27]. The sparsity of $w_1$, $v$ and $w_3$ results in automatic selection of channels, harmonics and trials, respectively, for the reference signal optimization. Through selecting the significant trials, the L1-MCCA method has been demonstrated to outperform both the CCA and MCCA methods for SSVEP recognition.

2.3. SBMCCA

The effectiveness of the L1-MCCA method depends to a large extent on the selection of the regularization parameters. CV has usually been used to determine the most appropriate parameters. However, CV requires additional validation data for parameter selection and takes a relatively long time for model calibration. This substantially reduces the practicability of a BCI system and may make users reluctant to use it. Bayesian inference provides a natural approach to determining the model parameters automatically using the training data alone. All of the available samples can be used for quick calibration of the model without the need for CV. As a result, higher efficiency can be achieved for the SSVEP-based BCI. In this study, we introduce a sparse Bayesian MCCA (SBMCCA) to SSVEP recognition by alternately exploiting sparse Bayesian learning.

Algorithm 1: SBMCCA algorithm for reference signal optimization in SSVEP recognition
Input: EEG tensor $\mathcal{X} \in \mathbb{R}^{I \times J \times K}$ (channel × time × trial) recorded at a specific stimulus frequency; sine-cosine signals $Y \in \mathbb{R}^{2H \times J}$ (harmonic × time) constructed as in (2).
Output: Optimized reference signal $z \in \mathbb{R}^{J}$.
    Random initialization for $w_1 \in \mathbb{R}^{I}$; do $\tilde{X} = \mathcal{X} \times_1 w_1^T$;
    repeat
        Random initialization for $v \in \mathbb{R}^{2H}$; do $\tilde{y} = v^T Y$;
        repeat
            Iteratively update $\alpha_d$ and $\sigma^2$ by (13) and (14);
            $w_3 = \sigma^{-2} \Sigma \tilde{X} \tilde{y}^T$, $w_3 = w_3 / \|w_3\|_2$;
            $\tilde{X} = \mathcal{X} \times_3 w_3^T$;
            Iteratively update $\alpha_d$ and $\sigma^2$ by (13) and (14);
            $w_1 = \sigma^{-2} \Sigma \tilde{X} \tilde{y}^T$, $w_1 = w_1 / \|w_1\|_2$;
            $\tilde{X} = \mathcal{X} \times_1 w_1^T$;
        until stop criterion is met;
        $\tilde{x} = \mathcal{X} \times_1 w_1^T \times_3 w_3^T$;
        Iteratively update $\alpha_d$ by (13) and $\sigma^2$ by: $\sigma^2 \leftarrow \|\tilde{x} - \mu^T Y\|_2^2 / (N - \sum_{d=1}^{D} \gamma_d)$;
        $v = \sigma^{-2} \Sigma Y \tilde{x}^T$, $v = v / \|v\|_2$;
    until stop criterion is met;
    $z = \mathcal{X} \times_1 w_1^T \times_3 w_3^T$.

Assume that $w_1$ and $v$ in (6) have been obtained. We have $\tilde{X} = \mathcal{X} \times_1 w_1^T$ and $\tilde{y} = v^T Y$. Our aim is now to solve for $w_3$ by

$$w_3 = \arg\min_{w_3} \frac{1}{2} \left\| w_3^T \tilde{X} - \tilde{y} \right\|_2^2 + \lambda_3 \|w_3\|_1. \tag{7}$$
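To make the role of $\lambda_3$ in (7) concrete, the following sketch solves the LASSO subproblem with the iterative soft-thresholding algorithm (ISTA). ISTA is used here only as a simple, well-known LASSO solver; it is not claimed to be the solver used in [27]. The function names, the synthetic data and the chosen value of `lam` are assumptions, $\tilde{X}$ is stored as a (trials × time) matrix so that $w_3^T \tilde{X}$ is well defined, and the unit-norm constraint of (6) is left out.

```python
import numpy as np


def soft_threshold(u, tau):
    """Element-wise soft-thresholding, the proximal operator of tau * ||.||_1."""
    return np.sign(u) * np.maximum(np.abs(u) - tau, 0.0)


def lasso_ista(X_tilde, y_tilde, lam, n_iter=500):
    """Minimise 0.5 * ||w^T X_tilde - y_tilde||_2^2 + lam * ||w||_1 by ISTA.

    X_tilde has shape (K, J), y_tilde has shape (J,), the result has shape (K,).
    """
    A = X_tilde.T                              # (J, K), so that A @ w equals w^T X_tilde
    step = 1.0 / np.linalg.norm(A, 2) ** 2     # inverse Lipschitz constant of the gradient
    w = np.zeros(A.shape[1])
    for _ in range(n_iter):
        grad = A.T @ (A @ w - y_tilde)         # gradient of the least-squares term
        w = soft_threshold(w - step * grad, step * lam)
    return w


def objective(X_tilde, y_tilde, w, lam):
    """Value of the cost function in (7)."""
    r = w @ X_tilde - y_tilde
    return 0.5 * (r @ r) + lam * np.abs(w).sum()


# Synthetic example (an assumption): only two of ten "trials" carry the target.
rng = np.random.default_rng(1)
K, J = 10, 200
X_tilde = rng.normal(size=(K, J))
w_true = np.zeros(K)
w_true[[1, 4]] = [1.0, -0.5]
y_tilde = w_true @ X_tilde + 0.1 * rng.normal(size=J)

lam = 5.0
w_hat = lasso_ista(X_tilde, y_tilde, lam)

# Above this threshold the L1 penalty forces every weight to zero.
lam_max = np.max(np.abs(X_tilde @ y_tilde))
w_zero = lasso_ista(X_tilde, y_tilde, 1.01 * lam_max)
print(np.round(w_hat, 2), np.count_nonzero(w_zero))
```

The solution changes with `lam`, from a dense least-squares fit at `lam = 0` to an all-zero vector above `lam_max`, which is why L1-MCCA needs a procedure such as CV to choose the parameter.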
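The likelihood function for (7) without the regularization term can be written as

$$p(\tilde{y} | w_3, \sigma^2) = \left( \frac{1}{2\pi\sigma^2} \right)^{N/2} \exp\left\{ -\frac{1}{2\sigma^2} \left\| \tilde{y} - w_3^T \tilde{X} \right\|_2^2 \right\}. \tag{8}$$

We exploit an elegant approach [44] to treat the sparse regularization in (7) by defining a separate Gaussian prior

$$p(w_3 | \alpha) = \prod_{d=1}^{D} \mathcal{N}(w_{d,3} | 0, \alpha_d^{-1}) = \prod_{d=1}^{D} \left( \frac{\alpha_d}{2\pi} \right)^{1/2} \exp\left\{ -\frac{1}{2} \alpha_d w_{d,3}^2 \right\}, \tag{9}$$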
where $\alpha = [\alpha_1, \ldots, \alpha_D]$ contains the hyperparameters that separately control the inverse variances of the weights $w_{1,3}, \ldots, w_{D,3}$, respectively. Fig. 2 depicts the Student-t prior derived by integrating out the hyperparameters $\alpha$ for the separate Gaussian prior, and the corresponding posterior. It can be seen that the separate Gaussian prior shrinks the posterior heavily towards the axes, thereby enforcing a sparse solution. Following Bayes' rule, we can derive the mean and covariance of the posterior $p(w_3 | \alpha, \sigma^2, \tilde{y})$ as

$$\Sigma = (\sigma^{-2} \tilde{X} \tilde{X}^T + \Lambda)^{-1}, \tag{10}$$

$$\mu = \sigma^{-2} \Sigma \tilde{X} \tilde{y}^T, \tag{11}$$
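where $\Lambda = \mathrm{diag}([\alpha_1, \ldots, \alpha_D])$. To estimate the hyperparameters, we maximize the marginal likelihood

$$p(\tilde{y} | \alpha, \sigma^2) = \int p(\tilde{y} | w_3, \sigma^2)\, p(w_3 | \alpha)\, dw_3 = \left( \frac{1}{2\pi} \right)^{N/2} |\Sigma|^{-1/2} \exp\left\{ -\frac{1}{2} \tilde{y} \Sigma^{-1} \tilde{y} \right\}. \tag{12}$$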
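The following sketch shows how the posterior statistics in (10) and (11) combine with the re-estimation rules (13) and (14) into one iterative routine for a single projection vector. It is a minimal illustration under stated assumptions rather than the authors' code: the function name `sparse_bayes_weights`, the initial values, the fixed iteration count, and the numerical safeguards (a small constant in the denominators and bounds on $\alpha_d$) are all choices made here, and $\tilde{X}$ is stored as a $(D \times N)$ matrix.

```python
import numpy as np


def sparse_bayes_weights(X_tilde, y_tilde, n_iter=200, alpha_max=1e8, eps=1e-12):
    """Sparse Bayesian estimate of w such that y_tilde is close to w^T X_tilde.

    X_tilde has shape (D, N) and y_tilde has shape (N,). Returns the posterior
    mean mu (D,), the hyperparameters alpha (D,) and the noise variance sigma2.
    """
    D, N = X_tilde.shape
    alpha = np.ones(D)                      # assumed initial precisions
    sigma2 = 0.1 * np.var(y_tilde) + eps    # assumed initial noise variance

    def posterior(alpha, sigma2):
        # Posterior covariance (10) and mean (11).
        Sigma = np.linalg.inv(X_tilde @ X_tilde.T / sigma2 + np.diag(alpha))
        mu = Sigma @ X_tilde @ y_tilde / sigma2
        return Sigma, mu

    for _ in range(n_iter):
        Sigma, mu = posterior(alpha, sigma2)
        # gamma_d = 1 - alpha_d * Sigma_dd, clipped against round-off.
        gamma = np.clip(1.0 - alpha * np.diag(Sigma), 0.0, 1.0)
        # Hyperparameter update (13); the bounds are numerical safeguards only.
        alpha = np.clip(gamma / (mu ** 2 + eps), 1e-8, alpha_max)
        # Noise variance update (14).
        resid = y_tilde - mu @ X_tilde
        sigma2 = (resid @ resid) / (N - gamma.sum()) + eps

    _, mu = posterior(alpha, sigma2)
    return mu, alpha, sigma2


# Synthetic check (an assumption): three of twenty weights are non-zero.
rng = np.random.default_rng(0)
D, N = 20, 300
X_tilde = rng.normal(size=(D, N))
w_true = np.zeros(D)
w_true[[2, 7, 15]] = [1.0, -0.8, 0.6]
y_tilde = w_true @ X_tilde + 0.1 * rng.normal(size=N)

mu, alpha, sigma2 = sparse_bayes_weights(X_tilde, y_tilde)
print(np.round(mu, 2), round(float(sigma2), 4))
```

In Algorithm 1 the resulting mean is additionally normalized to unit length before it is used as the projection vector; that step is omitted here so that the estimate can be compared directly with `w_true`. No regularization parameter has to be supplied, which is the practical difference from the LASSO formulation in (7).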
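With the evidence framework [37, 44] for maximizing (12), the iterative estimation formulas for the hyperparameters are then derived as

$$\alpha_d \leftarrow \frac{\gamma_d}{\mu_d^2}, \tag{13}$$

$$\sigma^2 \leftarrow \frac{\left\| \tilde{y} - \mu^T \tilde{X} \right\|_2^2}{N - \sum_{d=1}^{D} \gamma_d}, \tag{14}$$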
where $\gamma_d = 1 - \alpha_d \Sigma_{dd}$ and $\Sigma_{dd}$ is the $d$-th diagonal entry of the posterior covariance. After convergence, the posterior mean is computed by (11) and used as the solved projection vector $w_3$. Fixing $w_1$ with $w_3$, or $v$ with $w_3$, we can update $v$ or $w_1$ in the same way. Following this procedure, $w_1$, $w_3$ and $v$ are learned alternately, and the optimal solutions are obtained upon convergence.

The optimized reference signal $z_m$ at the $m$-th stimulus frequency ($m = 1, 2, \ldots, M$) is calculated by (5) using the optimal linear transforms. For new single-trial test data $\hat{X} \in \mathbb{R}^{I \times J}$, the SSVEP target frequency is then recognized by the CCA method formulated in (1) and (3) with the optimized reference signals.

3. Experimental Study

3.1. EEG Acquisition

3.1.1. Dataset-1

Dataset-1 was collected from ten healthy subjects (S1–S10, aged from 21 to 27 years) who had normal or corrected-to-normal vision. During the experiment, the subjects were seated in a chair 60 cm from a 17 inch CRT monitor (85 Hz refresh rate). Four red squares flickered on the screen as stimuli at 6, 8, 9 and 10 Hz, respectively. Each subject completed 20 experimental runs. In each run, the subject was asked to focus on each of the four stimuli once for 4 s. Thus, a total of 80 trials were performed by each subject. EEG data were recorded using the Nuamps amplifier (NuAmp, Neuroscan, Inc.) at a 250 Hz sampling rate with 0.1–70 Hz band-pass filtering from 30 channels. The reference and ground were placed on the two mastoid electrodes and the forehead, respectively. We used only the eight channels P7, P3, Pz, P4, P8, O1, Oz and O2 for SSVEP recognition, since SSVEP has been confirmed to appear mainly over parietal and occipital areas.
Table 1: Recognition accuracy (%) obtained by CCA, MCCA, L1-MCCA and SBMCCA, respectively, at different time windows (TW), for dataset-1.

TW   Method    S1    S2    S3    S4    S5    S6    S7    S8    S9    S10   Average
1 s  CCA       72.5  33.8  37.5  52.5  80.0  52.5  38.8  56.3  43.8  72.5  54.0±16.3
     MCCA      81.3  43.8  46.3  72.5  78.8  61.3  41.3  57.5  48.8  86.3  61.8±16.9
     L1-MCCA   86.3  43.8  56.3  80.0  78.8  62.5  43.8  62.5  48.8  88.8  65.2±17.3
     SBMCCA    86.3  46.3  53.8  81.3  78.8  62.5  46.3  62.5  50.0  88.8  65.7±16.8
2 s  CCA       90.0  58.8  57.5  76.3  91.3  81.3  67.5  81.3  43.8  90.0  73.8±16.3
     MCCA      97.5  48.8  61.3  95.0  95.0  88.8  66.3  85.0  68.8  98.8  80.5±17.8
     L1-MCCA   98.8  55.0  66.3  96.3  96.3  92.5  67.5  88.8  71.3  98.8  83.2±16.4
     SBMCCA    98.8  56.3  67.5  96.3  96.3  93.8  66.3  88.8  71.3  98.8  83.4±16.2
3 s  CCA       92.5  55.0  71.3  81.3  90.0  86.3  72.5  90.0  51.3  93.8  78.4±15.4
     MCCA      100   45.0  76.3  97.5  91.3  88.8  78.8  92.5  76.3  98.8  84.5±16.6
     L1-MCCA   100   56.3  76.3  98.8  91.3  88.8  78.8  93.8  76.3  98.8  85.9±13.9
     SBMCCA    100   55.0  76.3  98.8  91.3  88.8  78.8  93.8  76.3  98.8  85.8±14.2
4 s  CCA       95.0  58.8  72.5  90.0  88.8  85.0  77.5  90.0  51.3  97.5  80.6±15.5
     MCCA      100   62.5  82.5  98.8  93.8  92.5  91.3  96.3  90.0  100   90.8±11.3
     L1-MCCA   100   62.5  86.3  98.8  93.8  93.8  91.3  96.3  90.0  100   91.3±11.1
     SBMCCA    100   61.3  86.3  98.8  93.8  92.5  91.3  97.5  90.0  100   91.2±11.5
Table 2: Recognition accuracy (%) obtained by CCA, MCCA, L1-MCCA and SBMCCA, respectively, at different time windows (TW), for dataset-2.

TW   Subject   CCA    MCCA   L1-MCCA  SBMCCA
1 s  A1        98.0   100    100      100
     A2        82.0   84.0   85.0     86.0
     A3        52.0   68.0   71.0     72.0
     Average   77.3   85.0   85.3     85.6
2 s  A1        100    100    100      100
     A2        97.0   96.0   97.0     98.0
     A3        83.0   88.0   89.0     88.0
     Average   93.3   94.7   95.3     95.3
3.1.2. Dataset-2

Dataset-2 was acquired from three healthy subjects (A1–A3, aged 25, 31 and 34) with normal vision. Four white squares were presented as stimuli on an LCD monitor (60 Hz refresh rate). The stimuli flickered at four different frequencies: 8.5, 10, 12 and 15 Hz, respectively. Each subject completed five runs with 5 to 10 min of rest after each of them. During each run, the subject was instructed to focus on each of the four stimuli five times, for 2 s each time. That is, 20 trials were performed in each run and a total of 100 trials were completed by each subject. EEG data were recorded with a Biosemi Active Two amplifier at a 256 Hz sampling rate from the eight channels PO3, POz, PO4, PO7, PO8, O1, Oz and O2. The EEG data were referenced to the average of the eight channels and band-pass filtered between 5 and 50 Hz.

3.2. Experimental Evaluation

In this study, we compared the proposed SBMCCA method with CCA, MCCA and L1-MCCA for SSVEP recognition. For the MCCA, L1-MCCA and SBMCCA methods, leave-one-run-out CV was implemented to evaluate the average recognition accuracy. More specifically, the data from one run were used for validation while the data from the remaining 19 runs were used for reference signal optimization. This procedure was repeated until each run had served once for validation. The regularization parameter $\lambda_3$ in L1-MCCA was chosen by CV on the training data from the set of values $\lambda_3 \in \{0.01, 0.02, \ldots, 0.1\}$. The model parameters in SBMCCA were automatically determined by Bayesian inference without the need for CV. For the CCA method, the average recognition accuracy was evaluated directly on all 20 runs, since no reference signal optimization is needed. For each of the four methods, the accuracy was calculated as the ratio of the number of correctly recognized test trials to the total number of test trials.

3.3. Results

Table 1 summarizes the SSVEP recognition accuracies obtained by CCA, MCCA, L1-MCCA and SBMCCA for all ten subjects at time windows (TWs) of 1 to 4 s. The paired t-test was adopted to investigate the statistical difference between SBMCCA and each of the other three methods.
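As an illustration of the paired t-test used here, the sketch below computes the test statistic for SBMCCA versus CCA at TW = 1 s from the per-subject accuracies listed in Table 1. The helper name `paired_t_statistic` is an assumption, and only the standard library is used; the p-values reported in this section were computed by the authors and are not reproduced by this code.

```python
import math
import statistics

# Accuracies (%) at TW = 1 s for subjects S1-S10, copied from Table 1.
cca = [72.5, 33.8, 37.5, 52.5, 80.0, 52.5, 38.8, 56.3, 43.8, 72.5]
sbmcca = [86.3, 46.3, 53.8, 81.3, 78.8, 62.5, 46.3, 62.5, 50.0, 88.8]


def paired_t_statistic(a, b):
    """Paired t statistic and degrees of freedom for the differences a_i - b_i."""
    diffs = [x - y for x, y in zip(a, b)]
    n = len(diffs)
    mean_diff = statistics.mean(diffs)
    sd_diff = statistics.stdev(diffs)          # sample standard deviation (n - 1)
    return mean_diff / (sd_diff / math.sqrt(n)), n - 1


t_stat, dof = paired_t_statistic(sbmcca, cca)
print(f"t = {t_stat:.2f} with {dof} degrees of freedom")
```

The statistic exceeds 3.250, the two-sided 1% critical value of Student's t distribution with 9 degrees of freedom, which agrees with the significant difference reported for this comparison. If SciPy is available, `scipy.stats.ttest_rel` returns the same statistic together with a p-value.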
SBMCCA significantly outperformed CCA at all four TWs (p < 0.005 at TW = 1, p < 0.01 at TW = 2, p < 0.05 at TW = 3, p < 0.05 at TW = 4). SBMCCA performed significantly better than MCCA at TW = 1 (p < 0.005) and TW = 2 (p < 0.01), and yielded accuracy comparable to that of MCCA at TW = 3 (p = 0.23) and TW = 4 (p = 0.39). Comparable accuracy was achieved by SBMCCA and L1-MCCA at all four TWs (p = 0.31 at TW = 1, p = 0.32 at TW = 2, p = 0.34 at TW = 3, and p = 0.57 at TW = 4). Table 2 presents the SSVEP recognition accuracies obtained by the four methods on dataset-2. Similar to the results in Table 1, SBMCCA and L1-MCCA achieved comparable recognition accuracy, which was higher than that of CCA and MCCA.

Table 3 presents the computational time taken by each of the four methods with a 4 s TW for SSVEP recognition among the four candidate stimulus frequencies. The computations were run in Matlab R2012a on a laptop with a 2.5 GHz CPU (10 GB RAM). CCA achieved the highest computational efficiency, since no reference optimization was required. L1-MCCA took much longer than the other methods due to the time-consuming CV procedure. Note that the computational time of L1-MCCA has also been reported in [27], where the time taken by the CV procedure was not considered. In summary, SBMCCA achieved SSVEP recognition accuracy comparable to that of L1-MCCA with much higher computational efficiency.

Table 3: Computational time of CCA, MCCA, L1-MCCA and SBMCCA for SSVEP recognition. Note that CCA does not implement reference signal optimization.

Method     Computational time (s)
CCA        0.003
MCCA       0.3
L1-MCCA    292.9
SBMCCA     2.7

4. Discussion

In recent years, multiway learning [47, 50, 51, 52, 53, 54, 55] and regularization [56, 57] have shown their potential in EEG analysis, especially for BCI applications. As a combination of multiway canonical correlation analysis and sparse regularization, L1-MCCA [27] was proposed to learn the optimal reference signals for improving SSVEP recognition. L1-MCCA has been confirmed to achieve better performance than CCA and MCCA. The effectiveness of L1-MCCA depends to a large extent on the selection of the parameter for L1-regularization. Although CV is a typical way of determining the regularization parameter, it requires additional validation data and takes a relatively long time for model calibration. This substantially reduces the practicability of a BCI system and may make users reluctant to use it.
[Figure 3 panels, for L1-MCCA and SBMCCA respectively: weights in w3 versus trial index; temporal waveform, amplitude (µV) versus time (s); power spectrum versus frequency (Hz); time-frequency map, frequency (Hz) versus time (s).]
Figure 3: Example results of reference signals learned by L1-MCCA and SBMCCA, respectively. The results contain the weight distribution of $w_3$, the temporal waveform, the power spectrum obtained by FFT, and the time-frequency information obtained by the Morlet wavelet transform.
In this study, the proposed SBMCCA method alternately exploited sparse Bayesian learning to automatically determine the sparsity of the projection vectors in MCCA. In SBMCCA, all of the available training samples were used for efficient model calibration without the need for CV. The computational time for reference learning was much lower for SBMCCA than for L1-MCCA (see Table 3). Fig. 3 depicts example results of reference signals derived by L1-MCCA and SBMCCA, respectively. The sparse projection vector learned by SBMCCA was almost consistent with that learned by L1-MCCA, so that the two methods obtained similar reference signals for SSVEP recognition. Fig. 4 depicts an example of the effects of varying the model parameter on SSVEP recognition accuracy. The accuracy of L1-MCCA was evaluated by leave-one-out CV on the training data. L1-MCCA yielded the highest accuracy at $\lambda_3 = 0.03$. Thanks to the evidence framework [37, 44], all of the model parameters in SBMCCA were efficiently estimated by Bayesian inference (see formulas (10) and (11)) without the need for the time-consuming CV procedure. Thus, SBMCCA saved much computational time while achieving recognition performance comparable to that of L1-MCCA (see Table 1 and Table 3).

[Figure 4 plot: recognition accuracy (%) versus $\lambda_3$ for L1-MCCA, with the accuracy of SBMCCA shown for comparison.]

Figure 4: Example of the effects of varying the model parameter on SSVEP recognition accuracy. For L1-MCCA, the accuracy was evaluated by leave-one-out cross-validation on the training data. The highest accuracy was achieved at $\lambda_3 = 0.03$. SBMCCA yielded the same accuracy as the highest one of L1-MCCA. The model parameter of SBMCCA was automatically determined by Bayesian inference without the need for cross-validation.

Both SBMCCA and L1-MCCA implemented iterative updates for reference signal optimization. Hence, the computational efficiency was also influenced by the convergence speed. Fig. 5 shows the convergence curves of the projection vectors $w_1$, $w_3$ and $v$ obtained by L1-MCCA and SBMCCA, respectively. Considering a trade-off between accuracy and efficiency, the stop criterion for the iterative updates was defined as

$$\text{Error} = \| w(n) - w(n-1) \|_2 < 10^{-5}, \tag{15}$$

where $w$ is the projection vector to be learned and $n$ denotes the iteration number. The iterative updates did not stop until each of $w_1$, $w_3$ and $v$ satisfied (15). The two methods converged at comparable speeds in terms of the number of iterations.

It is worth noting that the separate Gaussian prior adopted by sparse Bayesian learning has no direct probabilistic relationship with L1-regularization. Instead, the Laplace prior [58, 59] is formally equivalent to L1-regularization, and hence is considered a more natural approach to sparse learning. SBMCCA with a Laplace prior may further improve the SSVEP recognition performance. However, the Laplace prior does not allow for a tractable Bayesian analysis, since it is not conjugate to the likelihood function formulated in (8). Hence, it requires more mathematical transformations and has a higher computational complexity. The effectiveness of the Laplace prior for the SBMCCA model will be investigated in our future study.
[Figure 5 panels: error versus number of iterations for $w_1$, $w_3$ and $v$, shown for SBMCCA and L1-MCCA.]
Figure 5: Convergence curves of projection vectors w1 , w3 and v obtained by L1-MCCA and SBMCCA, respectively. The value of y-axis denotes the error computed according to (15).
5. Conclusions

In this study, we introduced a sparse Bayesian multiway canonical correlation analysis (SBMCCA) to SSVEP recognition for BCI applications. By alternately exploiting sparse Bayesian learning, the SBMCCA method automatically learns the optimal reference signals of SSVEP without the need for cross-validation to determine the model parameters. Experimental results show that the SBMCCA method yielded recognition accuracy similar to that of the competing L1-MCCA method while saving much computational time. This indicates that the proposed SBMCCA could further improve the practicability of SSVEP-based BCI.

Acknowledgements

This study was supported in part by the National Natural Science Foundation of China under Grant 61305028, Grant 91420302, Grant 61573142, Grant 61673124 and Grant 81401484, by the Fundamental Research Funds for the Central Universities under Grant WH1314023, Grant WG1414005, Grant WH1516018 and Grant WH1414022, by the Shanghai Chenguang Program under No. 14CG31, and by the Guangdong Province Natural Science Foundation under Grant 2014A030308009.
References

[1] J. Wolpaw, N. Birbaumer, D. McFarland, G. Pfurtscheller, T. Vaughan, Brain-computer interfaces for communication and control, Clin. Neurophysiol. 113 (6) (2002) 767–791.
[2] Y. Zhang, Q. Zhao, J. Jin, X. Wang, A. Cichocki, A novel BCI based on ERP components sensitive to configural processing of human faces, J. Neural Eng. 9 (2) (2012) 026018.
[3] U. Hoffmann, J. Vesin, T. Ebrahimi, K. Diserens, An efficient P300-based brain-computer interface for disabled subjects, J. Neurosci. Meth. 167 (1) (2008) 115–125.
[4] S. Gao, Y. Wang, X. Gao, B. Hong, Visual and auditory brain-computer interfaces, IEEE Trans. Biomed. Eng. 61 (5) (2014) 1436–1447.
[5] J. Jin, B. Allison, Y. Zhang, X. Wang, A. Cichocki, An ERP-based BCI using an oddball paradigm with different faces and reduced errors in critical functions, Int. J. Neural Syst. 24 (8) (2014) 1450027.
[6] J. Jin, E. Sellers, S. Zhou, Y. Zhang, X. Wang, A. Cichocki, A P300 brain computer interface based on a modification of the mismatch negativity paradigm, Int. J. Neural Syst. 25 (3) (2015) 1550011.
[7] M. Wang, I. Daly, B. Z. Allison, J. Jin, Y. Zhang, L. Chen, X. Wang, A novel BCI paradigm based on P300 and SSVEP, J. Neurosci. Meth. 244 (2015) 16–25.
[8] G. Pfurtscheller, T. Solis-Escalante, R. Ortner, P. Linortner, G. Müller-Putz, Self-paced operation of an SSVEP-based orthosis with and without an imagery-based “brain switch:” a feasibility study towards a hybrid BCI, IEEE Trans. Neural Syst. Rehabil. Eng. 18 (4) (2010) 409–414.
[9] G. Bin, X. Gao, Z. Yan, B. Hong, S. Gao, An online multi-channel SSVEP-based brain-computer interface using a canonical correlation analysis method, J. Neural Eng. 6 (4) (2009) 046002.
[10] G. Pfurtscheller, C. Brunner, A. Schlögl, S. Lopes, Mu rhythm (de)synchronization and EEG single-trial classification of different motor imagery tasks, NeuroImage 31 (1) (2006) 153–159.
[11] J. Pan, Y. Li, R. Zhang, Z. Gu, F. Li, Discrimination between control and idle states in asynchronous SSVEP-based brain switches: A pseudo-key-based approach, IEEE Trans. Neural Syst. Rehabil. Eng. 21 (3) (2013) 435–443.
[12] Y. Zhang, G. Zhou, J. Jin, X. Wang, A. Cichocki, Optimizing spatial patterns with sparse filter bands for motor-imagery based brain-computer interface, J. Neurosci. Meth. 255 (2015) 85–91.
[13] J. Li, Z. Struzik, L. Zhang, A. Cichocki, Feature learning from incomplete EEG with denoising autoencoder, Neurocomputing 165 (2015) 23–31.
[14] Z. Qiu, J. Jin, H. Lam, Y. Zhang, X. Wang, A. Cichocki, Improved SFFS method for channel selection in motor imagery based BCI, Neurocomputing (2016) In Press.
[15] Y. Wang, R. Wang, X. Gao, B. Hong, S. Gao, A practical VEP-based brain-computer interface, IEEE Trans. Neural Syst. Rehabil. Eng. 14 (2) (2006) 234–239.
[16] B. Allison, D. McFarland, G. Schalk, S. Zheng, M. Jackson, J. Wolpaw, Towards an independent brain-computer interface using steady state visual evoked potentials, Clin. Neurophysiol. 119 (2) (2008) 399–408.
[17] B. Allison, T. Lüth, D. Valbuena, A. Teymourian, I. Volosyak, A. Gräser, BCI Demographics: How many (and what kinds of) people can use an SSVEP BCI?, IEEE Trans. Neural Syst. Rehabil. Eng. 18 (2) (2010) 107–116.
[18] Y. S. Zhang, L. Dong, R. Zhang, D. Yao, Y. Zhang, P. Xu, An efficient frequency recognition method based on likelihood ratio test for SSVEP-based BCIs, Comput. Math. Methods Med. (2014) Article ID 908719.
[19] D. Zhang, B. Huang, W. Wu, S. Li, An idle-state detection algorithm for SSVEP-based brain-computer interfaces using a maximum evoked response spatial filter, Int. J. Neural Syst. 25 (7) (2015) 1550030.
[20] J. Cruz, F. Wan, C. Wong, T. Cao, Adaptive time-window length based on online performance measurement in SSVEP-based BCIs, Neurocomputing 149 (2015) 93–99.
[21] G. Müller-Putz, R. Scherer, C. Brauneis, G. Pfurtscheller, Steady-state visual evoked potential (SSVEP)-based communication: impact of harmonic frequency components, J. Neural Eng. 2 (4) (2005) 123–130.
[22] O. Friman, I. Volosyak, A. Gräser, Multiple channel detection of steady-state visual evoked potentials for brain-computer interfaces, IEEE Trans. Biomed. Eng. 54 (4) (2007) 742–750.
[23] Z. Lin, C. Zhang, W. Wu, X. Gao, Frequency recognition based on canonical correlation analysis for SSVEP-based BCIs, IEEE Trans. Biomed. Eng. 53 (12) (2006) 2610–2614.
[24] Y. Zhang, G. Zhou, J. Jin, X. Wang, A. Cichocki, SSVEP recognition using common feature analysis in brain-computer interface, J. Neurosci. Meth. 244 (2015) 8–15.
[25] Y. S. Zhang, P. Xu, K. Cheng, D. Yao, Multivariate synchronization index for frequency recognition of SSVEP-based brain-computer interface, J. Neurosci. Meth. 221 (2014) 32–40.
[26] Y. Zhang, G. Zhou, Q. Zhao, A. Onishi, J. Jin, X. Wang, A. Cichocki, Multiway canonical correlation analysis for frequency components recognition in SSVEP-based BCIs, In: 18th Int’l Conf. on Neural Information Processing (ICONIP2011) (2011) 287–295.
[27] Y. Zhang, G. Zhou, J. Jin, M. Wang, X. Wang, A. Cichocki, L1-regularized multiway canonical correlation analysis for SSVEP-based BCI, IEEE Trans. Neural Syst. Rehabil. Eng. 21 (6) (2013) 887–896.
[28] C. Wu, H. Chang, P. Lee, K. Li, J. Sie, C. Sun, C. Yang, P. Li, H. Deng, K. Shyu, Frequency recognition in an SSVEP-based brain computer interface using empirical mode decomposition and refined generalized zero-crossing, J. Neurosci. Meth. 196 (1) (2011) 170–181.
[29] J. Pan, X. Gao, F. Duan, Z. Yan, S. Gao, Enhancing the classification accuracy of steady-state visual evoked potential-based brain-computer interfaces using phase constrained canonical correlation analysis, J. Neural Eng. 8 (3) (2011) 036027.
[30] Y. Zhang, G. Zhou, J. Jin, X. Wang, A. Cichocki, Frequency recognition in SSVEP-based BCI using multiset canonical correlation analysis, Int. J. Neural Syst. 24 (3) (2014) 1450013.
[31] X. Chen, Y. Wang, S. Gao, T. Jung, X. Gao, Filter bank canonical correlation analysis for implementing a high-speed SSVEP-based brain-computer interface, J. Neural Eng. 12 (4) (2015) 046008.
[32] P. Yuan, X. Chen, Y. Wang, X. Gao, S. Gao, Enhancing performances of SSVEP-based brain-computer interfaces via exploiting inter-subject information, J. Neural Eng. 12 (4) (2015) 046006.
[33] M. Nakanishi, Y. Wang, Y. Wang, T. Jung, A comparison study of canonical correlation analysis based methods for detecting steady-state visual evoked potentials, PloS One 10 (10) (2015) e0140703.
[34] H. Wang, Y. Zhang, N. Waytowich, D. Krusienski, G. Zhou, J. Jin, X. Wang, A. Cichocki, Discriminative feature extraction via multivariate linear regression for SSVEP-based BCI, IEEE Trans. Neural Syst. Rehabil. Eng. 24 (5) (2016) 532–541.
[35] E. Kalunga, S. Chevallier, Q. Barthélemy, K. Djouani, E. Monacelli, Y. Hamam, Online SSVEP-based BCI using Riemannian geometry, Neurocomputing 191 (2016) 55–68.
[36] Y. S. Zhang, D. Guo, P. Xu, Y. Zhang, D. Yao, Robust frequency recognition for SSVEP-based BCI with temporally local multivariate synchronization index, Cogn. Neurodyn. (2016) In press.
[37] D. MacKay, Bayesian interpolation, Neural Comput. 4 (3) (1992) 415–447.
[38] M. Tipping, Bayesian inference: An introduction to principles and practice in machine learning, In: Advanced Lectures on Machine Learning, Lecture Notes in Computer Science, Springer 3176 (2004) 41–62.
[39] Y. Zhang, G. Zhou, J. Jin, Q. Zhao, X. Wang, A. Cichocki, Sparse Bayesian classification of EEG for brain-computer interface, IEEE Trans. Neural Netw. Learn. Syst. (2016) In press.
[40] Y. Zhang, Y. Wang, J. Jin, X. Wang, Sparse Bayesian learning for obtaining sparsity of EEG frequency bands based feature vectors in motor imagery classification, Int. J. Neural Syst. (2016) In press.
[41] W. Wu, Z. Chen, S. Gao, E. Brown, A hierarchical Bayesian approach for learning sparse spatio-temporal decompositions of multichannel EEG, NeuroImage 56 (4) (2011) 1929–1945.
[42] W. Wu, C. Wu, S. Gao, B. Liu, Y. Li, X. Gao, Bayesian estimation of ERP components from multicondition and multichannel EEG, NeuroImage 88 (2014) 319–339.
[43] W. Wu, Z. Chen, X. Gao, Y. Li, E. Brown, S. Gao, Probabilistic common spatial patterns for multichannel EEG analysis, IEEE Trans. Pattern Anal. Mach. Intell. 37 (3) (2015) 639–653.
[44] M. Tipping, Sparse Bayesian learning and the relevance vector machine, The Journal of Machine Learning Research 1 (2001) 211–244.
[45] H. Hotelling, Relations between two sets of variates, Biometrika 28 (3/4) (1936) 321–377.
[46] O. Friman, J. Cedefamn, P. Lundberg, M. Borga, H. Knutsson, Detection of neural activity in functional MRI using canonical correlation analysis, Magn. Reson. Med. 45 (2) (2001) 323–330.
[47] A. Cichocki, R. Zdunek, A.-H. Phan, S. Amari, Nonnegative Matrix and Tensor Factorization: Applications to Exploratory Multi-way Data Analysis and Blind Source Separation, New York: Wiley, 2009.
[48] R. Tibshirani, Regression shrinkage and selection via the lasso, J. R. Stat. Soc. B 58 (1) (1996) 267–288.
[49] Y. Zhang, J. Jin, X. Qing, B. Wang, X. Wang, LASSO based stimulus frequency recognition model for SSVEP BCIs, Biomed. Signal Process. Control 7 (2) (2012) 104–111.
[50] A. Cichocki, Y. Washizawa, T. Rutkowski, H. Bakardjian, A.-H. Phan, S. Choi, H. Lee, Q. Zhao, L. Zhang, Y. Li, Noninvasive BCIs: Multiway signal-processing array decompositions, IEEE Computer 41 (10) (2008) 34–42.
[51] Y. Zhang, G. Zhou, Q. Zhao, J. Jin, X. Wang, A. Cichocki, Spatial-temporal discriminant analysis for ERP-based brain-computer interface, IEEE Trans. Neural Syst. Rehabil. Eng. 21 (2) (2013) 233–234.
[52] G. Zhou, A. Cichocki, Y. Zhang, D. Mandic, Group component analysis for multiblock data: common and individual feature extraction, IEEE Trans. Neural Netw. Learn. Syst. (2016) In press.
[53] G. Zhou, Q. Zhao, Y. Zhang, S. Xie, A. Cichocki, Linked component analysis from matrices to high order tensors: Applications to biomedical data, Proceedings of the IEEE 104 (2) (2016) 310–331.
[54] Y. Zhang, G. Zhou, Q. Zhao, A. Cichocki, X. Wang, Fast nonnegative tensor factorization based on accelerated proximal gradient and low-rank approximation, Neurocomputing 198 (2016) 148–154.
[55] G. Zhou, A. Cichocki, Q. Zhao, S. Xie, Nonnegative matrix and tensor factorizations: An algorithmic perspective, IEEE Signal Process. Mag. 31 (3) (2014) 54–65.
[56] B. Blankertz, S. Lemm, M. Treder, S. Haufe, K. Müller, Single-trial analysis and classification of ERP components – A tutorial, NeuroImage 56 (2) (2011) 814–825.
[57] Y. Zhang, G. Zhou, J. Jin, Q. Zhao, X. Wang, A. Cichocki, Aggregation of sparse linear discriminant analysis for event-related potential classification in brain-computer interface, Int. J. Neural Syst. 24 (1) (2014) 1450003.
[58] S. Babacan, R. Molina, A. Katsaggelos, Bayesian compressive sensing using Laplace priors, IEEE Trans. Image Process. 19 (1) (2010) 53–63.
[59] M. Figueiredo, Adaptive sparseness for supervised learning, IEEE Trans. Pattern Anal. Mach. Intell. 25 (9) (2003) 1150–1159.