Accepted Manuscript

Improvement of mental tasks with relevant speech imagery for brain-computer interfaces
Li Wang, Xiong Zhang, Xuefei Zhong, Zhaowen Fan

PII: S0263-2241(16)30211-1
DOI: http://dx.doi.org/10.1016/j.measurement.2016.05.054
Reference: MEASUR 4070
To appear in: Measurement
Received Date: 22 July 2014
Revised Date: 12 May 2016
Accepted Date: 13 May 2016

Please cite this article as: L. Wang, X. Zhang, X. Zhong, Z. Fan, Improvement of mental tasks with relevant speech imagery for brain-computer interfaces, Measurement (2016), doi: http://dx.doi.org/10.1016/j.measurement.2016.05.054
Improvement of mental tasks with relevant speech imagery for brain-computer interfaces

Li Wanga, Xiong Zhangb, Xuefei Zhongb, Zhaowen Fanb
a School of Mechanical and Electric Engineering, Guangzhou University, Guangzhou 510006, China
b School of Electronic Science and Engineering, Southeast University, Nanjing 210096, China
E-mail: [email protected]; Tel: +86 13813363667; Fax: +86-25-84536903

Abstract
Brain-computer interfaces (BCIs) based on electroencephalography (EEG) have attracted considerable attention from researchers. To determine whether silent reading can improve mental tasks for BCI systems, this paper proposes a two-step experiment: mental tasks with speech imagery and mental tasks without speech imagery. Silently reading Chinese characters serves as the speech imagery. Since Chinese characters are monosyllabic, they can conveniently be read in mind while performing related mental tasks. Ten Chinese subjects are trained in the two steps of this experiment. Feature vectors of the EEG signals are extracted and classified by common spatial patterns (CSP) and a support vector machine (SVM), respectively. Compared with the mental tasks alone, the classification accuracies between the two tasks are significantly improved by appending speech imagery, and the average accuracy of the ten subjects increases from 76.3% to 82.3%. During the imagery period, the temporal stability of the EEG signals is evaluated by Cronbach's alpha coefficients. The steadiness of the signals differs between mental tasks, and the EEG signals are more stable with speech imagery. Stable brain activity is conducive to the operation of BCIs.

Keywords Brain-computer interface (BCI); Electroencephalogram (EEG); Mental tasks; Speech imagery
1 Introduction
With the rapid development of computer science and biomedical technology, human bio-potential signals can be extracted to support assisted living and entertainment. Brain-computer interfaces (BCIs) provide a new channel to help patients with motor dysfunction [1], such as amyotrophic lateral sclerosis (ALS), stroke, spinal cord injury and traumatic brain injury. Bypassing peripheral nerve and muscle tissue, this channel can directly control assistive technologies through various facilities, effectively improving the quality and convenience of patients' lives. There are existing applications of BCIs for the disabled, e.g. movement within virtual reality environments [2], operation of computer cursors [3], virtual spellers [4] and wheelchair operation [5]. After experiencing the virtual reality systems afforded by BCIs, not only patients but also healthy people can enjoy them [6]. BCI systems can be divided into two categories: noninvasive and invasive. Though invasive BCIs can acquire signals with a higher signal-to-noise ratio, they still present many technical difficulties and clinical risks. Noninvasive BCIs, especially those based on
electroencephalography (EEG), can provide multidimensional control securely and cheaply [7]. Several brain activities can be used to control EEG signals, e.g. event-related (de)synchronization (ERD/ERS) [8], mental arithmetic tasks [9], steady-state evoked potentials (SSEPs) [10], P300 evoked potentials [11], auditory imagery and spatial navigation imagery [12]. SSEPs and the P300 are used in visually and auditorily based BCIs. Because of their higher communication speeds, they have become popular [13]. However, they must be operated with additional equipment that produces stimuli, which is not very convenient, especially for patients. Since a higher information transfer rate (ITR) can be obtained by classifying single-trial EEG, this approach has attracted considerable attention from researchers. ERD/ERS is calculated from single-trial EEG signals, and both overt motor execution and motor imagery can induce ERD/ERS of sensorimotor rhythms. To the best of our knowledge, motor imagery-based BCIs can provide satisfactory classification accuracy and abundant applications [14], and they have advanced significantly in flight control in recent years, such as controlling virtual helicopters and physical quadcopters [15-17]. However, their maximum number of categories is limited to four (right hand, left hand, tongue and foot) [18]. Besides these main methods, other mental tasks have also been proposed [19], e.g. visualizing words being written on a board, non-trivial multiplication and mentally rotating a 3D object. Novel BCI systems based on speech imagery have been proposed too. Leuthardt et al. used the electrocorticographic (ECoG) speech network to control a BCI [20]; in their study, the letter sounds OO, EE, AH and EH were selected as the speech imagery. DaSalla et al. proposed /a/ and /u/ as vowel speech imagery for an EEG-based BCI [21].
Beyond exploring new experimental paradigms, it is a big challenge to produce reliable and consistent EEG signals when users operate BCI systems. EEG signals are closely related to users' mental states: anxiety, fatigue and frustration may lead to emotional instability, which may be accompanied by unstable EEG signals. In daily life, in order to concentrate while reading, people often read each word with covert speech, or even read it out in a low voice. During mental arithmetic, they may silently read through the whole calculation process if the arithmetic is complicated. From these phenomena, it can be speculated that silent reading helps to enhance attention. Metacognitive regulation can be enhanced by mindfulness meditation; accordingly, meditation training can improve the accuracy of BCIs [22]. R. Hebert found that the time-domain phase synchrony of the alpha rhythm could be enhanced during Transcendental Meditation [23]. Attention can be promoted by meditation, and the EEG signals also become more stable. Meditation includes concentrative-based meditation and mindfulness-based meditation. Keeping silently reading a Chinese character is similar to concentration-based meditation, which focuses the attention on a single stimulus. In our previous study [24], EEG signals from speech imagery based on two Chinese characters were analyzed and classified. Unlike letters of the alphabet and vowels, a single Chinese character expresses a specific meaning. Furthermore, it is very natural to perform mental tasks while reading relevant characters. For BCI systems, it is therefore meaningful to study whether silent reading can improve the classification accuracies of mental tasks. The accuracy of a hybrid BCI can be improved by performing motor imagery and SSVEP tasks simultaneously [25], but there has been no study on performing mental tasks and speech imagery simultaneously.
In this paper, we intend to analyze and classify EEG signals from the mental tasks with and without character speech imagery. Continuing the previous study, “左(left)” and “壹(one)” are again selected as the characters for the speech imagery. “左” is pronounced “zuo” in the third tone and means left in English. “壹” is pronounced “yi” in the first tone and means one in English. Each Chinese character has its specific strokes. Educated subjects are good at visualizing the Chinese strokes being written on a board, which is similar to visualizing English words being written [19]. The strokes of “壹(one)” are shown in Fig. 1.
Fig. 1. The order of strokes of “壹(one)”.
It is very common to do something and read related characters at the same time, e.g. in boxing (striking and shouting simultaneously) or writing a word while reading it. In the experiment, subjects complete two steps so that the effect of speech imagery on the mental tasks can be estimated. In the first step, they are required to imagine rotating the body to the left while reading “左(left)” in mind. They are also asked to visualize writing the strokes of “壹(one)” one by one, following the stroke order, while simultaneously reading “壹(one)” in mind. In order to avoid interference between the two steps, the second step is carried out several months later. In the second step, subjects respectively imagine rotating the body to the left and visualize writing the strokes of “壹(one)” without reading the relevant characters in mind. The mental tasks are related to each character, so it is not hard to perform speech imagery and the mental tasks simultaneously. Compared with the mental tasks alone, the burden of training is not noticeably increased; this point is confirmed by questionnaires after the experiment. In order to judge the effect of speech imagery, the classification accuracy and the temporal stability of the EEG signals are calculated for the different mental tasks. Imagining the body rotating to the left and visualizing the strokes of “壹(one)” being written are totally different mental tasks, so common spatial patterns (CSP) and a support vector machine (SVM) are used successively to extract features from and to separate the EEG signals of the two tasks. CSP is a state-of-the-art algorithm for extracting discriminant spatial features. As different mental tasks are handled by different areas of the cerebral cortex, CSP is very suitable here: spatial filters can be constructed that maximize the variance of one imagination while simultaneously minimizing the variance of the other.
SVM is robust with respect to the curse of dimensionality, so it can obtain satisfactory results even when trained on a small set of high-dimensional feature vectors [26]. The following section introduces the methods of acquisition, analysis, feature extraction and classification for the EEG signals. Section 3 gives the results of the analysis and classification. The experimental results are discussed further in Section 4, and Section 5 concludes the paper.
2 Methods
2.1. Data acquisition
Ten Chinese subjects (seven males and three females), right-handed students of Southeast University, participate in this no-feedback experiment. Aged 22-28 (mean 23.6 years, standard deviation 1.7 years), they are in good health with normal vision. Drinking alcohol within 24 h, or coffee or tea within 4 h, before the test is not allowed. Seven subjects have attended a speech imagery experiment before, but none of the subjects has prior experience with imagining body rotation or visualizing Chinese strokes. The experimental protocol has been approved by the Academic Ethics Committee of Southeast University. After the purpose and instructions of the experiment are explained, the subjects sign informed consent forms. The subjects are seated in a comfortable chair, approximately 1 m in front of a 22 inch LCD monitor. When performing the tasks, they are required to remain as still as possible with their arms resting on the arms of the chair. As shown in Fig. 2, an electrode cap with 35 channels corresponding to the international 10-20 system is placed on the head to record the EEG signals. The electrodes are distributed over the cerebral cortex, including the Broca's area, Wernicke's area, superior parietal lobule and primary motor area (M1). The EEG signals are recorded by a SynAmps 2 system (Neuroscan Co., Ltd.). Vertical and horizontal electrooculogram (EOG) signals are recorded by two bipolar channels to monitor eye movements and blinks. After EOG correction, channel-level preprocessing of the EEG is performed. During recording, the reference electrode is attached to the top of the head and the ground electrode to the forehead. The EEG signals are recorded after passing through a 0.1-100 Hz band-pass filter, the sampling frequency is set to 250 Hz, and the skin impedance is maintained below 5 kΩ.
Fig. 2. Electrode positions of the EEG setup. 35 channels are distributed over the scalp according to the international 10-20 system.
2.2 Experimental paradigm
As shown in Fig. 3 (A), there are two steps in the whole experiment: the first step is mental tasks with speech imagery and the second is mental tasks without speech imagery. The training paradigms of the two steps are almost the same except for the imagery period. The time intervals between the two steps for the 10 subjects are shown in Table 1. The background of the LCD monitor is kept black. The training paradigm consists of repeated cue-based trials; the sequence in time is shown in Fig. 3 (B). At the start of each trial, a fixed asterisk is displayed on the screen, prompting subjects to stay relaxed for two seconds. After that comes a ready period of 1 s with a fixation cross. Then a cue appears for 1 s on the screen. In the next 4 s, if the cue is “左(left)”, the subjects keep imagining rotating the body to the left while reading the Chinese character in mind in the first step of the experiment, and they just imagine rotating the body to the left in the second step. If the cue is “壹(one)”, they visualize writing the strokes of “壹(one)” while reading it in mind in the first step, and they just visualize writing the strokes in the second step. They can write the strokes at their own pace, and it is not necessary to finish the strokes within the 4 s. During the whole trial, subjects must not move their lips or make a sound. After finishing a trial, a short break is taken (the break lasts from 0.5 s to 1.5 s at random, with an average of 1 s). Each of the two cues is randomly displayed 15 times in each run, and the entire experiment includes 5 runs. Between runs, the subjects have sufficient time (about 5 minutes) to rest.
Fig. 3. (A) The two steps of the experiment. (B) Timing of a trial of the training paradigm. The 5-7 s segment of every trial (imagery period) and the 2 s relax period before the imagery period of each Chinese character (regarded as Rest) are the two fragments of EEG data extracted and classified by the CSP spatial filters and the SVM classifier, respectively.
Table 1
The time intervals between the two steps for the 10 subjects.

Subject            S1  S2  S3  S4  S5  S6  S7  S8  S9  S10
Interval (months)   5   7   7   5   6   7   6   5   6   6
2.3. ERD/ERS analyses
The EEG signals are re-referenced by the common average reference (CAR) [27], which reduces low-frequency artifacts of the EEG. For this reason, both the signal-to-noise ratio and the classification results of the EEG signals can be improved by CAR. ERD/ERS is calculated from each event-related EEG trial after bandpass filtering. The filter range is determined from the time-frequency diagram, so the range may differ between subjects. ERD/ERS is defined as the percentage power decrease/increase of the EEG during the trials. The band power of the EEG within the 8 s trial is denoted A, and the baseline power R is taken from 1-2 s after trial onset. The computational formula is

$\mathrm{ERD/ERS} = \frac{A - R}{R} \times 100\%$   (1)
More details can be found in [28]. In order to evaluate the temporal stability of ERD/ERS from the mental tasks with and without speech imagery, consistency coefficients of the ERD/ERS values are calculated over trials by means of Cronbach's alpha. These coefficients have previously been used to evaluate seven mental tasks [29]. As a model of internal consistency, Cronbach's alpha is based on the averaged inter-item correlations [30]

$\alpha = \frac{n}{n-1}\left(1 - \frac{\sum_i V_i}{V_t}\right)$   (2)

where n is the number of trials, $V_t$ is the variance of the ERD/ERS over all trials and $V_i$ is the variance of the ERD/ERS of trial i. The values of the Cronbach's alpha coefficient range from negative infinity to 1. Negative values are not defined, and they are indicated as "x" in Table 3. A value of 0 means no consistency and 1 means maximum consistency. A coefficient >0.7 indicates acceptably high, >0.8 good and >0.9 excellent reliability.
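As a concrete illustration, Eqs. (1) and (2) can be sketched in a few lines of Python. This is a minimal sketch under our own naming: the band-power estimates would come from the filtered EEG, and the alpha implementation follows the standard total-score form of Cronbach's coefficient, treating each trial as one "item".

```python
import numpy as np

def erd_ers(power, baseline):
    """Eq. (1): percentage power change relative to the baseline power R."""
    R = np.mean(baseline)
    return (power - R) / R * 100.0

def cronbach_alpha(X):
    """Eq. (2): Cronbach's alpha over trials.

    X has shape (n_trials, n_timepoints): each row is the ERD/ERS curve
    of one trial within a given time interval.
    """
    X = np.asarray(X, dtype=float)
    n = X.shape[0]                      # number of trials
    Vi = X.var(axis=1, ddof=1)          # variance of each trial (V_i)
    Vt = X.sum(axis=0).var(ddof=1)      # variance of the summed score (V_t)
    return n / (n - 1.0) * (1.0 - Vi.sum() / Vt)
```

With perfectly consistent trials the coefficient reaches its maximum of 1, and a flat power trace yields an ERD/ERS of 0%, matching the interpretation above.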
2.4. Feature extraction
The alpha (8-12 Hz) and beta (14-30 Hz) frequency ranges of EEG signals are widely distributed throughout the cerebral cortex. The α-rhythm can be measured over the occipital cortex, and the ERD/ERS phenomenon is the suppression or augmentation of the α-rhythm and β-rhythm signal power in the cerebral cortex. Before classification of the EEG data in this paper, the raw signals from the total of 150 trials are filtered in a 6-30 Hz band with a zero-phase forward/backward filter. As a supervised method, the goal of CSP [31] is to design spatial filters from the simultaneous diagonalization of the two corresponding covariance matrices. The EEG signals of two different mental tasks can be projected into low-dimensional subspaces by the CSP spatial filters, and CSP has been successfully applied to classify movement-related EEG [32]. The filtered EEG data of one single trial is represented as the N×T matrix E, where N is the number of channels and T is the number of samples per channel. The normalized spatial covariance matrices of trial i from classes 1 and 2 are denoted R1(i) and R2(i), and they can be obtained from
$R_n(i) = \frac{E E^T}{\mathrm{trace}(E E^T)}$,  n = 1, 2   (3)
The averaged spatial covariance matrices over trials are
$\bar{R}_1 = \frac{1}{n_1} \sum_{i=1}^{n_1} R_1(i)$,  $\bar{R}_2 = \frac{1}{n_2} \sum_{i=1}^{n_2} R_2(i)$   (4)
The composite spatial covariance R is given as $R = \bar{R}_1 + \bar{R}_2$. As R is a symmetric matrix, it can be factored into its eigenvectors as $R = U_c \Lambda_c U_c^T$, where $U_c$ is the matrix of eigenvectors and $\Lambda_c$ is the diagonal matrix of eigenvalues. The eigenvalues are arranged in descending order, and the corresponding eigenvectors are rearranged accordingly. The whitening transformation is

$P = \Lambda_c^{-1/2} U_c^T$   (5)
It equalizes the variances in the space spanned by $U_c$, which means that all eigenvalues of $P R P^T$ are equal to one. The covariance matrices $\bar{R}_1$ and $\bar{R}_2$ are transformed to

$S_1 = P \bar{R}_1 P^T$,  $S_2 = P \bar{R}_2 P^T$   (6)
$S_1$ and $S_2$ share common eigenvectors: if $S_1 = B \Lambda_1 B^T$, then $S_2 = B \Lambda_2 B^T$ and $\Lambda_1 + \Lambda_2 = I$, where B holds the common eigenvectors of $S_1$ and $S_2$ and I is the identity matrix. Since $\Lambda_1 + \Lambda_2 = I$, the eigenvector with the largest eigenvalue for $S_1$ has the smallest eigenvalue for $S_2$. Because of this useful property, the two distributions can be separated along the eigenvectors B. With the projection matrix $W = B^T P$, the decomposition of each trial E is given as $Z = W E$. The common spatial patterns correspond to the columns of $W^{-1}$, which can be treated as time-invariant EEG source distribution vectors. For each trial, only a small number 2m of the variances are suitable for discrimination and used to construct the classifier: after whitening, the EEG signals are projected onto the first m and last m eigenvectors (columns) of B. Feature vectors are then calculated from the signals $Z_p$ (p = 1, …, 2m) as

$f_p = \log\left(\frac{\mathrm{var}(Z_p)}{\sum_{i=1}^{2m} \mathrm{var}(Z_i)}\right)$   (7)

As in [31], m is also set to 2 in this paper. var(·) denotes the variance of a vector's elements, and the logarithmic transformation makes the distributions of the variances close to Gaussian.
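The CSP pipeline of Eqs. (3)-(7) can be sketched as follows. This is a minimal NumPy sketch with our own function names; trial data are assumed to be band-pass filtered N×T arrays as defined above.

```python
import numpy as np

def csp_filters(trials1, trials2, m=2):
    """Compute 2m CSP spatial filters from two lists of (N, T) trials."""
    def avg_cov(trials):
        covs = []
        for E in trials:
            C = E @ E.T
            covs.append(C / np.trace(C))      # normalized covariance, Eq. (3)
        return np.mean(covs, axis=0)          # averaged over trials, Eq. (4)

    R1, R2 = avg_cov(trials1), avg_cov(trials2)
    lam, Uc = np.linalg.eigh(R1 + R2)         # composite covariance R
    order = np.argsort(lam)[::-1]             # descending eigenvalues
    lam, Uc = lam[order], Uc[:, order]
    P = np.diag(1.0 / np.sqrt(lam)) @ Uc.T    # whitening, Eq. (5)
    S1 = P @ R1 @ P.T                         # Eq. (6)
    w, B = np.linalg.eigh(S1)
    B = B[:, np.argsort(w)[::-1]]             # largest-for-S1 first
    W = B.T @ P                               # projection matrix W = B^T P
    return np.vstack([W[:m], W[-m:]])         # first m and last m filters

def csp_features(W, E):
    """Log-variance features of one trial, Eq. (7)."""
    Z = W @ E
    v = np.var(Z, axis=1)
    return np.log(v / v.sum())
```

On synthetic two-class data where each class has excess variance on a different channel, the first feature is systematically larger for the first class, which is the property the classifier exploits.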
2.5. Feature classification
The EEG feature vectors are classified by a support vector machine (SVM) [33]. SVM is based on statistical learning theory, and its goal is to find a hyperplane that separates the training data X with labels Y. For a two-class problem ($x_{+1}$ and $x_{-1}$), a discriminant function f serves as the classification rule: a training sample $x_M$ is assigned to $x_{+1}$ ($x_{-1}$) if $f(x_M) = +1$ ($-1$), according to the more robust decision rule with a margin. The discriminant function can be written as

$w^T x_i + b \geq +1, \quad f(x_i) = +1$
$w^T x_i + b \leq -1, \quad f(x_i) = -1$   (8)
In order to allow a high-dimensional model for the classification function, the training data are transformed by a Mercer reproducing kernel K. A radial basis kernel, defined by its width g, is an effective choice for K. Slack variables $\xi_i$ and a penalty factor c are introduced to allow mislabeled samples:

$\min_{w,b,\xi} \; \frac{1}{2}\|w\|^2 + c \sum_i \xi_i \quad \text{subject to} \quad y_i(w^T x_i + b) \geq 1 - \xi_i, \; \xi_i \geq 0$   (9)
LIBSVM [34] is selected to implement the SVM classifier. For each subject, the optimal values of c and g are obtained by a grid search with 10-fold cross-validation in both steps of the experiment. The search ranges of c and g are both from $2^{-10}$ to $2^{10}$.
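The classifier training can be sketched with scikit-learn's SVC, which wraps LIBSVM; the parameters C and gamma correspond to c and g. This is an illustrative sketch on synthetic feature vectors, and the coarse grid below is only a subset of the full $2^{-10}$ to $2^{10}$ range used in the paper.

```python
import numpy as np
from sklearn.model_selection import GridSearchCV
from sklearn.svm import SVC

# toy CSP-style feature vectors for two imagery classes (illustrative data)
rng = np.random.default_rng(0)
X = np.vstack([rng.normal(-1.0, 0.5, size=(30, 4)),
               rng.normal(+1.0, 0.5, size=(30, 4))])
y = np.array([0] * 30 + [1] * 30)

# grid search over c and g with 10-fold cross-validation
grid = {"C": [2.0 ** k for k in range(-10, 11, 4)],
        "gamma": [2.0 ** k for k in range(-10, 11, 4)]}
search = GridSearchCV(SVC(kernel="rbf"), grid, cv=10)
search.fit(X, y)
```

After fitting, `search.best_params_` holds the selected (c, g) pair and `search.best_score_` the corresponding cross-validated accuracy.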
3 Results
Each trial lasts at least 8 s, and useful information is only contained in the signals of the imagery period. Time-frequency analysis shows that the EEG signals of the imagery period stand out from the whole time range, so the choices of time window and frequency band in this study are guided by the time-frequency representation. Before the next analysis, the signals of all channels are re-referenced by CAR. In order to compare with the previous work on speech imagery [24], channels FC3 and CP3, near the Broca's area and Wernicke's area respectively, are again selected to analyze the event-related spectral perturbation (ERSP). After convolution with complex Morlet wavelets, the ERSP is obtained by superimposing the energy spectrum distributions of single trials. In this paper, the ERSP is plotted by EEGLAB [35]. The ERSPs of the two channels are presented in Fig. 4 (A) and (B), which are from subject S2 performing the first and second steps of the experiment, respectively.
(A) Mental tasks with speech imagery (B) Mental tasks without speech imagery
Fig. 4. ERSPs of the two cues from channels FC3 and CP3 for subject S2. (A) and (B) are from the first and second steps of the experiment, respectively. The bootstrap significance level is 0.01. t = 0 s corresponds to t = 3 s of Fig. 3 (B), when the cue appears. The left and right columns correspond to FC3 and CP3 in (A) and (B). The horizontal axis of each subgraph shows time and the vertical axis shows frequency.
As shown in Fig. 4, the cue appears at 0 s, and subjects do not imagine at this time. One second later, the imagery period begins and lasts 4 s, during which subjects are required to imagine. The ERSPs of the two different imaginations are shown in the first and second rows. The relative power spectra are depicted by colors in the time-frequency domain. At the beginning of the imagery period, subjects need a certain reaction time to start the tasks. The reaction time of subject S2 is more than one second, so the power of his EEG starts to increase after the second 2. The imagery period ends at the second 5. S2 also needs a certain reaction time to end the tasks, and the ERSP vanishes at 5.5 s in Fig. 4. When subject S2 engages in the mental tasks with speech imagery, the EEG power in the frequency range between 9 and 16 Hz increases more in channel FC3. Without speech imagery, only the energy of channel CP3 increases in the same frequency range when he visualizes the strokes of “壹(one)”. After filtering by a sixth-order Butterworth bandpass filter (6-30 Hz), the power spectrum of the EEG signals in the imagery period is calculated by an autoregressive (AR) model. The power spectra are averaged over trials, and the channel with the greatest difference between the two tasks is chosen. For subject S2, channel P6 is selected. Fig. 5 presents the power spectrum waveforms of channel P6 when he performs the first step of the experiment.
Fig. 5. The waveforms of power spectrum are calculated by AR model for channel P6 as subject S2 engages in the mental tasks with speech imagery. Legend “—” stands for “左(left)” and “--” stands for “壹(one)”.
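The AR power-spectrum step can be sketched with a Yule-Walker estimator. This is a minimal sketch: the function name, model order and FFT length are our own illustrative choices, not values reported in the paper.

```python
import numpy as np

def ar_psd(x, order=16, nfft=512, fs=250.0):
    """Power spectral density from a Yule-Walker AR model of the signal."""
    x = np.asarray(x, dtype=float)
    x = x - x.mean()
    n = len(x)
    # biased autocorrelation estimates r[0..order]
    r = np.array([np.dot(x[:n - k], x[k:]) / n for k in range(order + 1)])
    # solve the Yule-Walker equations for the AR coefficients a_1..a_p
    R = np.array([[r[abs(i - j)] for j in range(order)] for i in range(order)])
    a = np.linalg.solve(R, r[1:])
    sigma2 = r[0] - np.dot(a, r[1:])          # driving-noise variance
    freqs = np.fft.rfftfreq(nfft, d=1.0 / fs)
    k = np.arange(1, order + 1)
    # A(f) = 1 - sum_k a_k exp(-2*pi*i*f*k/fs)
    A = 1.0 - np.exp(-2j * np.pi * np.outer(freqs, k) / fs) @ a
    psd = sigma2 / (np.abs(A) ** 2 * fs)
    return freqs, psd
```

For a narrowband rhythm buried in noise, the AR spectrum shows a sharp peak at the rhythm's frequency, which is what makes the per-frequency comparison between the two cues possible.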
The two waveforms differ most at 12 Hz in Fig. 5. The significance of these points is assessed by a paired-samples t-test over the 75 trials. The absolute value of t is 3.937, so the p-value is less than 0.01. After all ten subjects have performed the two steps of the experiment, the above procedure is executed for each of them; the results are shown in Table 2. For eight subjects, the absolute value of t is greater when they perform the mental tasks with speech imagery. Most of the selected channels are located over the parietal cortex. A more significant difference between the tasks predicts a higher classification accuracy.

Table 2
The most discrepant points between the two waveforms from the selected channels, assessed by paired-samples t-tests for the ten subjects (Sub); the sample number is 75. |t| > 2.64 corresponds to p < 0.01, and |t| > 1.99 corresponds to p < 0.05 (*: p<0.05, **: p<0.01).

      Mental tasks with speech imagery    Mental tasks without speech imagery
Sub   Channel   |t|                       Channel   |t|
S1    P5        4.002 **                  F5        2.177 *
S2    P6        3.937 **                  P6        2.042 *
S3    P4        3.045 **                  P6        1.776
S4    P6        2.781 **                  P6        2.966 **
S5    P4        1.495                     P6        2.403 *
S6    P6        3.513 **                  P4        2.206 *
S7    F5        3.342 **                  C6        1.368
S8    P4        1.927                     F5        1.718
S9    P6        3.193 **                  FC6       2.603 *
S10   P1        3.336 **                  P5        2.649 **
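The significance test behind Table 2 can be reproduced in outline with SciPy's paired t-test. The power values below are illustrative synthetic data, not the recorded EEG.

```python
import numpy as np
from scipy import stats

# hypothetical per-trial band power at the most discrepant frequency bin
rng = np.random.default_rng(0)
power_left = rng.normal(10.0, 2.0, size=75)   # trials cued with 左(left)
power_one = rng.normal(12.0, 2.0, size=75)    # trials cued with 壹(one)

# paired-samples t-test over the 75 trial pairs
t_stat, p_value = stats.ttest_rel(power_left, power_one)
```

With 75 paired samples, |t| > 1.99 corresponds to p < 0.05 and |t| > 2.64 to p < 0.01, matching the thresholds quoted in the caption of Table 2.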
The ERD/ERS of the two steps are compared in Fig. 6 for subject S2. Channel P6 is selected because its ERD/ERS is the most obvious and its power spectra also show the greatest discrepancy between the mental tasks. According to the ERSP, the frequency range is determined as 9-16 Hz. When subject S2 performs the mental tasks with speech imagery, the ERSs of the two tasks are more obvious in channel P6. Furthermore, the ratio of the EEG power values between the two tasks during the imagery period differs more in Fig. 6 (A). A more obvious difference of this ratio suggests a higher classification accuracy. Besides, the accuracies are also affected by the stability of the EEG. The temporal stability of ERD/ERS is evaluated by Cronbach's alpha coefficients. At the beginning of the imagery period, subjects need a certain reaction time to start the tasks; hence, the following analysis starts 0.5 s after the beginning of the imagery period. According to Fig. 3 (B), the EEG signals are divided into three time intervals: 4.5-5.5 s, 5.5-6.5 s and 6.5-7.5 s. Analysis shows that the channels with a large discrepancy between the two tasks usually have an obvious ERD/ERS phenomenon; for this reason, the channels are the same as in Table 2. The frequency ranges of the EEG signals are determined from their ERSPs. Cronbach's alpha coefficients of the ten subjects are presented in Table 3.
(A) Mental tasks with speech imagery (B) Mental tasks without speech imagery
Fig. 6. ERD/ERS of the two-step experiment in channel P6 for subject S2. Both frequency ranges are 9-16 Hz, and the time axis is the same as in Fig. 3 (B).
Table 3
Temporal stability of ERD/ERS evaluated by Cronbach's alpha coefficients, calculated over trials for the three time intervals (4.5-5.5 s, 5.5-6.5 s and 6.5-7.5 s, listed left to right for each task). Negative coefficients are not defined and are marked as "x". The frequency (Fre) ranges of the EEG signals are determined from their ERSPs.

                 Mental tasks with speech imagery       Mental tasks without speech imagery
Sub   Fre (Hz)   左(left)          壹(one)              左(left)          壹(one)
S1    7-15       0.85 0.62 0.54   0.98 0.95 0.42       0.90 0.94 0.84   0.94 0.02 0.74
S2    9-16       0.80 0.81 0.68   0.99 0.44 x          0.64 0.61 0.45   x 0.66 x
S3    8-27       0.91 x x         0.81 x x             0.62 x x         0.96 0.80 0.88
S4    8-26       0.12 0.04 x      0.80 0.81 0.68       x x 0.70         0.92 x 0.58
S5    7-24       0.26 0.34 0.69   x x 0.58             x 0.17 0.76      0.46 0.29 0.21
S6    8-25       0.82 x x         0.86 0.19 x          0.85 0.39 0.43   0.94 0.67 0.46
S7    6-18       0.94 0.09 0.36   0.99 0.79 0.66       0.93 0.42 x      0.97 x 0.58
S8    6-15       0.94 0.67 0.46   0.98 x x             0.82 x x         x x x
S9    8-13       0.85 0.60 x      0.57 0.10 x          0.88 x 0.37      0.96 x x
S10   7-14       0.78 x 0.61      0.14 x x             x 0.65 x         0.39 x x
The stability of ERD/ERS differs among the three time intervals, and the first time interval shows the highest stability across the four mental tasks. Compared with no speech imagery, the ERD/ERS of the mental tasks is more stable with speech imagery. When the cue is “壹(one)”, the change of the EEG signals is more stable than that of “左(left)” in both steps. According to the questionnaires, the reason is that subjects are more skilled at visualizing writing the strokes. After all, writing the strokes is the first step in learning to write Chinese characters. As these educated subjects have had long-term training in character writing, it is very common for them to visualize writing the strokes without a pen. Additionally, many people have the habit of visualizing writing the strokes while silently reading the number of strokes; in the first step of the experiment, reading the number is replaced by reading the character. Hence, visualizing writing with speech imagery can be trained easily. The EEG power from 2-4 s shows the most obvious change in Fig. 4, and this time frame corresponds to 5-7 s of Fig. 3 (B). Therefore, this fragment of EEG data is analyzed for single-trial classification. Another data fragment comes from the 2 s "Relax" period at the beginning of each trial, which is regarded as Rest. The EEG signals from the imagery period and Rest are compared to estimate whether the signals change when subjects perform the tasks. To properly calculate the classification accuracy, the EEG signals of each subject are divided into training and testing sets by 10×10 cross-validation: the dataset is randomly divided into ten equal parts, and each part in turn is used to test the spatial filters and classifier while the other parts are used to build the four CSP spatial filters, extract the feature values by (7) and train the classifier. The averaged result over the ten partitions is the accuracy of tenfold cross-validation. To make the result more reliable, this training/testing procedure is repeated 10 times with random repartitioning; all accuracies over the 10 repetitions are averaged again, and the standard deviation is also calculated [36]. The accuracy rates of the ten subjects are presented in Table 4.
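The 10×10 cross-validation scheme can be sketched as follows. This is a minimal NumPy illustration with a nearest-class-mean stand-in for the CSP+SVM pipeline; in the paper's pipeline, the spatial filters and classifier would be refit on each training split.

```python
import numpy as np

def ten_by_ten_cv(X, y, fit, predict, n_folds=10, n_repeats=10, seed=0):
    """Repeat n_folds-fold cross-validation n_repeats times,
    repartitioning at random each time; return mean and std accuracy."""
    rng = np.random.default_rng(seed)
    accs = []
    for _ in range(n_repeats):
        idx = rng.permutation(len(y))
        folds = np.array_split(idx, n_folds)
        for k in range(n_folds):
            test = folds[k]
            train = np.concatenate([folds[j] for j in range(n_folds) if j != k])
            model = fit(X[train], y[train])
            accs.append(np.mean(predict(model, X[test]) == y[test]))
    return np.mean(accs), np.std(accs)

# nearest-class-mean stand-in classifier (illustrative only)
def fit_means(X, y):
    return {c: X[y == c].mean(axis=0) for c in np.unique(y)}

def predict_means(model, X):
    classes = sorted(model)
    d = np.stack([np.linalg.norm(X - model[c], axis=1) for c in classes])
    return np.array(classes)[np.argmin(d, axis=0)]
```

Fitting inside each fold, rather than once on all data, is what keeps the reported accuracy±std figures in Table 4 free of training-set leakage.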
Table 4
The classification results of the two-step experiment: accuracy±std (%).

       Mental tasks with speech imagery              Mental tasks without speech imagery
Sub    Rest vs 左    Rest vs 壹    左 vs 壹           Rest vs 左    Rest vs 壹    左 vs 壹
S1     83.7±2.6     81.6±2.1     82.3±2.8           69.5±2.5     80.7±1.9     78.7±2.9
S2     80.7±2.7     88.6±1.5     84.7±1.9           75.3±2.0     82.3±1.5     75.4±3.9
S3     80.1±3.7     84.3±3.9     81.3±2.2           75.3±3.4     80.7±4.2     72.0±2.1
S4     79.8±3.0     89.9±1.5     79.7±4.6           78.7±2.6     82.3±1.7     76.7±4.2
S5     73.9±3.5     81.7±2.1     78.5±3.8           72.5±2.9     76.9±3.2     82.3±3.8
S6     77.3±3.1     91.1±1.2     89.3±1.8           84.9±2.4     75.4±1.1     74.0±3.3
S7     82.3±2.1     86.7±1.7     85.9±2.5           75.1±2.1     74.3±4.0     73.3±3.2
S8     73.7±3.8     75.9±4.2     71.6±2.7           72.9±2.4     73.9±3.9     71.3±3.6
S9     81.4±3.2     86.0±1.7     87.3±1.9           71.4±2.7     76.6±3.4     80.3±1.6
S10    84.9±2.4     87.3±1.9     82.7±2.2           80.5±3.8     77.7±2.0     79.1±2.8
Mean   79.8±3.8     85.3±4.6     82.3±5.1           75.6±4.6     78.1±3.2     76.3±3.7
When the subjects perform the mental tasks with speech imagery, the validation accuracies of “左(left)” vs “壹(one)” range between 71.6% and 89.3% in Table 4. Just three results (S4, S5 and S8) are below 80%. With speech imagery, the average accuracy of “左(left)” vs “壹(one)” is 82.3%, better than that without speech imagery (76.3%); the accuracies are significantly improved (p<0.01). The results of Rest vs “左(left)” and Rest vs “壹(one)” are also enhanced by speech imagery. Hence, the accuracies of mental tasks can be effectively improved by appending speech imagery. The average accuracies of Rest vs “壹(one)” are better than those of Rest vs “左(left)” in both steps, which is consistent with the results in Table 3. This phenomenon can also be seen in Fig. 6: the EEG power of both tasks increases during the imagery period, and the power from “壹(one)” increases more than that of the other. The reason for these results is that the educated subjects are good at visualizing the Chinese strokes, and visualizing writing the strokes of characters is an important step in learning to write them. As it is common in daily life to speak along with related actions, the ten subjects can successfully complete the training paradigm of this experiment. Furthermore, combining motor imagery and mental tasks is the purpose of our future research; after appending relevant speech imagery, the difference between motor imagery and mental tasks may be increased. The accuracies of subject S6 are the best among the ten subjects, so his EEG signals will be analyzed in the ensuing discussion.
4 Discussion Imagining rotating the body to the left and visualizing writing the strokes of a character were selected as the mental tasks in this study. Compared with visualizing the writing of strokes, imagining rotation is harder to complete, and post-experiment questionnaires confirm this observation. In the first step, the two tasks of different complexity were each combined with relevant speech imagery; in the second step, the same tasks were trained without speech imagery. With speech imagery, the accuracies of the mental tasks are better: not only is the discrimination between the two tasks improved, but the EEG signals of each task are also more clearly differentiated from Rest. Because Chinese is an ideographic script, relevant meanings are easily associated with its characters. Additionally, Chinese characters are monosyllabic, so each sound can carry a particular meaning. It is therefore natural to read a character in mind while imagining the related meaning simultaneously. Moreover, when people visualize writing the strokes of characters, they often cannot help counting the strokes. Imagining rotation and writing Chinese strokes are both familiar behaviors in daily life, and the questionnaires show that most subjects could smoothly perform the mental tasks while reading the relevant character in mind. As the experimental results show, the accuracies between the two mental tasks are significantly improved by appending relevant speech imagery. A likely reason is that speech imagery enhances the subjects' attention and activates more of the cerebral cortex. Combining motor imagery with other mental tasks is an especially meaningful way to increase the operational dimensions of BCIs [19]. As motor imagery is processed by the sensorimotor cortex while speech is processed by Broca's area and Wernicke's area [37], we have extended motor imagery with speech imagery for BCI in another study [38]. Furthermore, this method can increase the number of dimensions available to operate BCI systems without long training. A second motivation of our research is to offer another option to patients with spinal-cord injuries and to amputees: unlike able-bodied persons, they may find it difficult to perform motor imagery after long paralysis [39]. With the help of speech imagery, their brain activity may be more stable while they perform mental tasks. According to the ERSPs, the energy of the EEG signals changes only within a specific frequency range; for this reason, the frequency range should be determined before drawing ERD/ERS maps.
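ERD/ERS maps quantify the band-power change during imagery relative to a reference (rest) interval. A small sketch of the standard ERD/ERS% computation on synthetic alpha-band data (the sampling rate, window boundaries and signal parameters are assumptions for illustration, not values from the paper):

```python
import numpy as np

fs = 250  # assumed sampling rate in Hz (not stated in this excerpt)
rng = np.random.default_rng(1)

def band_power(x, fs, lo, hi):
    # power restricted to [lo, hi] Hz via the FFT periodogram
    spec = np.abs(np.fft.rfft(x)) ** 2 / len(x)
    freqs = np.fft.rfftfreq(len(x), 1 / fs)
    return spec[(freqs >= lo) & (freqs <= hi)].sum()

def erd_ers_percent(trials, fs, lo, hi, ref, task):
    # ERD/ERS% = (A - R) / R * 100, averaged over trials;
    # negative values indicate ERD, positive values ERS
    vals = [(band_power(x[task], fs, lo, hi) - band_power(x[ref], fs, lo, hi))
            / band_power(x[ref], fs, lo, hi) * 100 for x in trials]
    return float(np.mean(vals))

# synthetic trials: the 10 Hz amplitude doubles after t = 2 s, i.e. an ERS
t = np.arange(4 * fs) / fs
trials = []
for _ in range(30):
    amp = np.where(t < 2, 1.0, 2.0)
    trials.append(amp * np.sin(2 * np.pi * 10 * t)
                  + 0.3 * rng.standard_normal(len(t)))

ers = erd_ers_percent(trials, fs, 8, 12,
                      ref=slice(0, 2 * fs), task=slice(2 * fs, 4 * fs))
```

Since amplitude doubling quadruples band power, the expected value here is around +300%, i.e. a strong ERS, matching the power increase described for the imagery period.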
As Table 3 shows, the frequency range differs greatly between subjects. Therefore, stability is calculated not over the standard EEG bands (theta, alpha and beta) but within each subject's specific frequency range, taken from the most representative channel. Friedrich et al. examined the stability of ERD/ERS values with Cronbach's alpha coefficients and found that word association, mental subtraction and spatial navigation showed the highest consistency [29]. Consistent with that conclusion, the "壹(one)" task shows higher steadiness in both steps. Besides depending on the specific task, the temporal stability of ERD/ERS also differs among the three time intervals. Based on the coefficients, the ERD/ERS of the mental tasks is most stable in the first time interval of the imagery period, which is consistent with [29]. This may suggest that subjects tend to maintain high attention at the beginning of imagery; as a task continues, the temporal stability of ERD/ERS gradually decreases. According to the results of Rest vs "左(left)" and Rest vs "壹(one)" in Table 4, the task with higher stability achieves higher classification accuracy. More reliable brain activation patterns make for better BCI systems, so selecting appropriate tasks and imagery periods matters for practical applications. From the ERSPs alone, it is hard to distinguish the EEG signals of the two tasks in the same channel: as in Fig. 4 (A), the ERSPs of channels FC3 and CP3 are similar for both tasks. However, when the ERD/ERS of the two tasks is plotted in the same figure, the difference becomes visible. The EEG energy increases in both tasks, but the amplitude for "壹(one)" is greater in Fig. 6. The same result is obtained from the power spectra in Fig. 5. Speech information is mainly processed by the left cerebral cortex (Broca's area and Wernicke's area). Semantic processing is related to several areas of the left frontal lobe, which can also monitor and extract semantic information from the posterior temporal lobes through a semantic execution system [40]. Writing strokes is a skill acquired over long practice, and it may be processed by a wide range of cortical areas. The mental task of rotating the body involves both motor imagery and mentally rotating an object. The motor-cortex area related to whole-body movement is very small, so imagining body movement changes the EEG signals only weakly. For this reason, BCIs based on motor imagery do not use imagined whole-body movement, but rather imagined movements of the right hand, left hand, tongue and feet. Imagined rotation is related to the superior parietal lobule [41], where the EEG signals can be distinguished clearly. For most subjects, the channels selected in Table 2 lie over the parietal lobule, and channel P6 is frequently analyzed in the above results. To further demonstrate the distribution of EEG over the cerebral cortex during the imagery period of the first step, the power spectra of channels F5, F6, C5, C6, P5 and P6 are analyzed in the frequency domain. These six channels are distributed over the left and right sides of the frontal, central and parietal cortex. Channel CZ, located at the center of the head, is set as the reference. The power spectra are calculated as above, and the result is shown in Fig. 7. When subject S6 imagines rotating the body to the left while reading "左(left)" in mind, the energy of the EEG signals is greater from 10 to 30 Hz in channels P5 and P6 (p<0.01). Additionally, the divergence at 11 Hz is more obvious in channel P6 than in P5.
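The Cronbach's alpha coefficient used above to assess the temporal stability of ERD/ERS treats the time intervals of each trial as "items" of a test. A sketch of the coefficient on simulated data (the trial count, interval count and ERD levels are illustrative assumptions):

```python
import numpy as np

def cronbach_alpha(X):
    # X: trials x time-intervals of ERD/ERS values;
    # alpha = k/(k-1) * (1 - sum(interval variances) / var(total score))
    k = X.shape[1]
    return k / (k - 1) * (1 - X.var(axis=0, ddof=1).sum()
                          / X.sum(axis=1).var(ddof=1))

rng = np.random.default_rng(2)
n_trials, k = 40, 10

# "stable" task: one shared ERD level per trial plus small interval noise
base = rng.normal(-30.0, 10.0, size=(n_trials, 1))
stable = base + rng.normal(0.0, 3.0, size=(n_trials, k))

# "unstable" task: intervals fluctuate almost independently
unstable = rng.normal(-30.0, 10.0, size=(n_trials, k))

alpha_stable = cronbach_alpha(stable)
alpha_unstable = cronbach_alpha(unstable)
```

A task whose ERD/ERS level is consistent across intervals yields an alpha near 1, while independent fluctuations pull it toward 0 — the same contrast used here to argue that "壹(one)" is the steadier task.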
There are obvious differences among the results of the ten subjects. Possible reasons are the subjects' various educational backgrounds and different understandings of the experimental task; thus, they do not perform the mental tasks in exactly the same way. In spite of this, they share a common feature: the power spectra of the EEG signals differ greatly between the two imagery tasks over the parietal cortex, in agreement with the result of Table 2. The results of channels P5 and P6 for subject S6 in Fig. 7 also reveal this feature. This feature can guide the choice of channels for future BCIs, since fewer channels mean cheaper and more convenient BCI systems. Algorithms for channel selection have been extensively studied, e.g. genetic algorithms [42], sparse common spatial patterns (SCSP) [43] and mutual information [44].
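Of the cited channel-selection ideas, mutual information is the simplest to sketch: rank channels by how informative a per-trial feature is about the class label. A toy histogram-based estimate (the feature setup and the "informative parietal channel" are invented for illustration; [44] uses a more sophisticated joint optimization):

```python
import numpy as np

rng = np.random.default_rng(4)

def mutual_info(feature, labels, bins=8):
    # histogram estimate of I(feature; label) in nats
    edges = np.histogram_bin_edges(feature, bins)
    f = np.digitize(feature, edges[1:-1])
    mi = 0.0
    for fv in np.unique(f):
        for lv in np.unique(labels):
            p_joint = np.mean((f == fv) & (labels == lv))
            if p_joint > 0:
                mi += p_joint * np.log(
                    p_joint / (np.mean(f == fv) * np.mean(labels == lv)))
    return mi

# synthetic per-trial features for 6 channels; only the last channel
# (standing in for an informative parietal channel) depends on the class
n = 200
labels = rng.integers(0, 2, n)
feats = rng.standard_normal((n, 6))
feats[:, 5] += 1.5 * labels  # class-dependent shift on the informative channel

scores = [mutual_info(feats[:, c], labels) for c in range(6)]
best = int(np.argmax(scores))
```

Keeping only the top-ranked channels is what makes a reduced-montage BCI cheaper without giving up the discriminative parietal information.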
Fig. 7. Superimposed average power spectra in the frequency domain of channels F5, F6, C5, C6, P5 and P6 when subject S6 performs mental tasks with speech imagery. Channel CZ is set as the reference. The electrode positions of the EEG setup are shown in the center of the figure. The horizontal axis of each surrounding subgraph shows frequency and the vertical axis shows power spectrum. The legend "—" stands for "左(left)" and "--" stands for "壹(one)". **: p<0.01 according to the t-test.
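The per-channel comparison marked with ** in Fig. 7 amounts to comparing band power across trials with a t-test. A sketch with Welch spectra on synthetic trials (the sampling rate, trial counts and 15 Hz component are assumptions, not the study's parameters):

```python
import numpy as np
from scipy.signal import welch
from scipy.stats import ttest_ind

fs = 250  # assumed sampling rate in Hz
rng = np.random.default_rng(3)

def band_power_welch(x, fs, lo, hi):
    # Welch power spectrum, then mean power inside [lo, hi] Hz
    f, p = welch(x, fs=fs, nperseg=fs)
    return p[(f >= lo) & (f <= hi)].mean()

def make_trials(gain, n=30, dur=2):
    # trials with a 15 Hz component of given amplitude plus white noise
    t = np.arange(dur * fs) / fs
    return [gain * np.sin(2 * np.pi * 15 * t + rng.uniform(0, 2 * np.pi))
            + rng.standard_normal(len(t)) for _ in range(n)]

# per-trial 10-30 Hz power for the two imagery conditions
power_a = [band_power_welch(x, fs, 10, 30) for x in make_trials(1.0)]
power_b = [band_power_welch(x, fs, 10, 30) for x in make_trials(2.0)]

t_stat, p_val = ttest_ind(power_a, power_b)
```

Doubling the oscillatory amplitude quadruples the band power, so the test rejects equality decisively — the same logic behind the p<0.01 markers on channels P5 and P6.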
Although this experiment is only applicable to Chinese, it still has value. With the aging of China's 1.3 billion people, BCIs have great potential application value in China, and this experiment can help promote their adoption. Although the letters of an alphabet cannot express specific meanings by themselves, subjects who speak other languages (English, for instance) could execute mental tasks while reading the related words in mind, and the results of such an analysis could be compared with this paper.
5 Conclusions Speech can inspire morale and raise attention, and speech imagery likewise improves the stability of EEG signals during mental tasks. To allow comparison with our previous study of Chinese speech imagery, we continue to use CSP and SVM to extract and classify the feature vectors in this paper. The effectiveness of the experiment is also demonstrated by analyzing power spectra and Cronbach's alpha coefficients. In daily life, it is a universal behavior for a literate person to imagine speech and to visualize words being written on a board. After a short training, subjects gradually become proficient at imagining rotating the body to the left. Based on these cognitive operations, an easier way of operating BCI systems can therefore be offered, and sufficient accuracy can be achieved without long-term training.
Acknowledgments This work was supported by the National Key Basic Research Program of China (No. 2010CB327705), the National High Technology Research and Development Program of China (No. 2012AA03A302), and the Fundamental Research Funds for the Central Universities under Grant CXLC12_0095. This work was also supported by Shenzhen China Star Optoelectronics Technology Co., Ltd.
References [1] J. R. Wolpaw, N. Birbaumer, D. J. McFarland, G. Pfurtscheller, T. M. Vaughan, Brain-computer interfaces for communication and control, Clin. Neurophysiol. 113 (2002) 767–791. [2] R. Leeb, F. Lee, C. Keinrath, R. Scherer, H. Bischof, G. Pfurtscheller, Brain-computer communication: motivation, aim, and impact of exploring a virtual apartment, IEEE Trans. Neural Syst. Rehabil. Eng. 15(4) (2007) 473-482. [3] J. Long, Y. Li, T. Yu, Z. Gu, Target selection with hybrid feature for BCI-based 2-D cursor control, IEEE Trans. Biomed. Eng. 15(1) (2012) 132-140. [4] E. Yin, Z. Zhou, J. Jiang, F. Chen, Y. Liu, D. Hu, A novel hybrid BCI speller based on the incorporation of SSVEP into the P300 paradigm, J. Neural Eng. 10(2) (2013) 026012. [5] J. Long, Y. Li, H. Wang, T. Yu, J. Pan, F. Li, A hybrid brain computer interface to control the direction and speed of a simulated or real wheelchair, IEEE Trans. Neural Syst. Rehabil. Eng. 20(5) (2012) 720-729. [6] S. B. Badia, A. G. Morgade, H. Samaha, P. F. M. J. Verschure, Using a hybrid brain computer interface and virtual reality system to monitor and promote cortical reorganization through motor activity and motor imagery training, IEEE Trans. Neural Syst. Rehabil. Eng. 21(2) (2013) 174-181. [7] D. McFarland, D. Krusienski, W. Sarnacki, J. Wolpaw, Emulation of computer mouse control with a noninvasive brain-computer interface, J. Neural Eng. 5 (2008) 101-110. [8] G. Pfurtscheller, C. Brunner, A. Schlögl, F. H. Lopes da Silva, Mu rhythm (de)synchronization and EEG single-trial classification of different motor imagery tasks, NeuroImage 31 (2006) 153-159. [9] Q. Wang, O. Sourina, Real-time mental arithmetic task recognition from EEG signals. IEEE Trans. Neural Syst. Rehabil. Eng. 21(2), (2013) 225-232. [10] P. Lee, C. Yeh, J. Y. Cheng, C. Yang, G. Lan, An SSVEP-based BCI using high duty-cycle visual flicker, IEEE Trans. Biomed. Eng. 58 (2011) 3350-3359. [11] M. Salvaris, C. Cinel, L. Citi, R. 
Poli, Novel protocols for P300-based brain-computer interfaces, IEEE Trans. Neural Syst. Rehabil. Eng. 20 (2012) 8-17. [12] A. F. Cabrera, D. Farina, K. Dremstrup, Comparison of feature selection and classification methods for a brain–computer interface driven by non-motor imagery, Med.
Biol. Eng. Comput. 48 (2010) 123-132. [13] S. Gao, Y. Wang, X. Gao, B. Hong, Visual and auditory brain-computer interfaces, IEEE Trans. Biomed. Eng. 61 (2014) 1436-1447. [14] H. Yuan, B. He, Brain-computer interfaces using sensorimotor rhythms: current state and future perspectives, IEEE Trans. Biomed. Eng. 61(5) (2014) 1425-1435. [15] A. Royer, A. Doud, M. L. Rose, B. He, EEG Control of a Virtual Helicopter in 3-Dimensional Space Using Intelligent Control Strategies, IEEE Trans. Neural Syst. Rehabil. Eng. 18(6) (2010) 581-589. [16] A. J. Doud, J. P. Lucas, M. T. Pisansky, B. He, Continuous three-dimensional control of a virtual helicopter using a motor imagery based brain-computer interface, PLoS One 6(10) (2011) e26322. [17] K. LaFleur, K. Cassady, A. Doud, K. Shades, E. Rogin, B. He, Quadcopter control in three-dimensional space using a noninvasive motor imagery-based brain-computer interface, J. Neural Eng. 10(4) (2013) 046003. [18] M. Naeem, C. Brunner, R. Leeb, B. Graimann, G. Pfurtscheller, Seperability of four-class motor imagery data using independent components analysis, J. Neural Eng. 3 (2006) 208-216. [19] F. Faradji, R. K. Ward, G. E. Birch, Toward development of a two-state brain-computer interface based on mental tasks, J. Neural Eng. 8(4) (2011) 046014. [20] E. C Leuthardt, C. Gaona, M. Sharma, N. Szrama, J. Roland, Z. Freudenberg, J. Solis, J. Breshears, G. Schalk, Using the electrocorticographic speech network to control a brain–computer interface in humans, J. Neural Eng. 8(3) (2011) 036004. [21] C. S. DaSalla, H. Kambara, M. Sato, Y. Koike, Single-trial classification of vowel speech imagery using common spatial patterns, Neural Netw. 22 (2009) 1334-1339. [22] L. F. Tan, Z. Dienes, A. Jansari, S. Y. Goh, Effect of mindfulness meditation on brain-computer interface performance. Consciousness and Cognition 23 (2014) 12-21. [23] R. Hebert, D. Lehmann, G. Tan, F. Travis, A. 
Arenander, Enhanced EEG alpha time-domain phase synchrony during Transcendental Meditation: implications for cortical integration theory. Signal Processing 85(11) (2005) 2213-2232. [24] L. Wang, X. Zhang, X. Zhong, Y. Zhang, Analysis and classification of speech imagery EEG for BCI, Biomed. Signal Proces. 8 (2013) 901-908. [25] B. Z. Allison, C. Brunner, V. Kaiser, G. R. Müller-Putz, C. Neuper, G. Pfurtscheller, Toward a hybrid brain-computer interface based on imagined movement and visual attention, J. Neural Eng. 7(2) (2010) 026007. [26] F. Lotte, M. Congedo, A. Lecuyer, F. Lamarche, B. Arnaldi, A review of classification algorithms for eeg-based brain computer interfaces, J. Neural Eng. 4(2) (2007) R1–R13. [27] D. J. McFarland, L. M. McCane, S. V. David, J. R. Wolpaw, Spatial filter selection for EEG-based communication, Electroencephalography and clinical Neurophysiology 103(3) (1997) 386-394. [28] G. Pfurtscheller, F. H. Lopes da Silva, Event-related EEG/MEG synchronization and desynchronization: basic principles, Clin. Neurophysiol. 110(11) (1999) 1842-1857. [29] E. V. Friedrich, R. Scherer, C. Neuper, Stability of event-related (de-) synchronization during brain-computer interface-relevant mental tasks, Clin. Neurophysiol. 124(1) (2013) 61-69.
[30] L. J. Cronbach, Coefficient alpha and the internal structure of tests, Psychometrika 16(3) (1951) 297-334. [31] H. Ramoser, J. Müller-Gerking, G. Pfurtscheller, Optimal spatial filtering of single trial EEG during imagined hand movement, IEEE Trans. Rehabil. Eng. 8 (2000) 441-446. [32] J. Müller-Gerking, G. Pfurtscheller, H. Flyvbjerg, Designing optimal spatial filters for single-trial EEG classification in a movement task, Clin. Neurophysiol. 110(11) (1999) 787-798. [33] T. N. Lal, M. Schröder, T. Hinterberger, J. Weston, M. Bogdan, N. Birbaumer, B. Schölkopf, Support vector channel selection in BCI, IEEE Trans. Biomed. Eng. 51 (2004) 1003-1010. [34] C.-C. Chang, C.-J. Lin, LIBSVM: a library for support vector machines, 2011 http://www.csie.ntu.edu.tw/cjlin/libsvm. [35] A. Delorme, S. Makeig, EEGLAB: an open source toolbox for analysis of single-trial EEG dynamics, J. Neurosci. Methods 134 (2004) 9-21. [36] C. Guger, H. Ramoser, G. Pfurtscheller, Real time EEG analysis with subject specific spatial patterns for a brain-computer interface, IEEE Trans. Rehabil. Eng. 8(4) (2000) 447–456. [37] R. L. Billingsley-Marshall, T. Clear, W. E. Mencl, P. G. Simos, P. R. Swank, D. Men, S. Sarkari, E. M. Castillo, A. C. Papanicolaou, A comparison of functional MRI and magnetoencephalography for receptive language mapping, J. Neurosci. Methods 161 (2007) 306-313. [38] L. Wang, X. Zhang, Y. Zhang, Extending motor imagery by speech imagery for brain-computer interface, in: Proceedings of the 35th Annual Conference of the IEEE Engineering in Medicine and Biology Society, Osaka, Japan, 2013, pp. 7056-7059. [39] G. Birch, Z. Bozorgzadeh, S. Mason, Initial on-line evaluations of the LF-ASD brain–computer interface with able-bodied and spinal-cord subjects using imagined voluntary motor potentials, IEEE Trans. Neural Syst. Rehab. Eng. 10(4) (2002) 219–224. [40] R. A. Poldrack, A. D. Wagner, M. W. Prull, J. E. Desmond, G. H. Glover, J. D. E.
Gabrieli, Functional specialization for semantic and phonological processing in the left inferior prefrontal cortex, NeuroImage 10 (1999) 15–35. [41] K. Hugdahl, T. Thomsen, L. Ersland, Sex differences in visuo-spatial processing: An fMRI study of mental rotation, Neuropsychologia 44(9) (2006) 1575-1583. [42] J. Yang, H. Singh, E. L. Hines, F. Schlaghecken, D. D. Iliescu, M. S. Leesonc, N. G. Stocks, Channel selection and classification of electroencephalogram signals: An artificial neural network and genetic algorithm-based approach, Artif. Intell. Med. 55 (2012) 117-126. [43] M. Arvaneh, C. Guan, K. K. Ang, C. Quek, Optimizing the channel selection and classification accuracy in EEG-based BCI, IEEE Trans. Biomed. Eng. 58(6) (2011) 1865-1873. [44] J. Meng, G. Huang, D. Zhang, X. Zhu, Optimizing spatial spectral patterns jointly with channel configuration for brain-computer interface, Neurocomputing 104 (2013) 115-126.
Highlights A novel imagery mode is proposed: visualizing writing the strokes of a Chinese character. The accuracies and stability of the EEG signals from two mental tasks are significantly improved by relevant speech imagery. The stability of the EEG signals differs markedly between mental tasks.