Designing optimal spatial filters for single-trial EEG classification in a movement task

Designing optimal spatial filters for single-trial EEG classification in a movement task

Clinical Neurophysiology 110 (1999) 787±798 Designing optimal spatial ®lters for single-trial EEG classi®cation in a movement task Johannes MuÈller-G...

477KB Sizes 2 Downloads 115 Views

Clinical Neurophysiology 110 (1999) 787±798

Designing optimal spatial ®lters for single-trial EEG classi®cation in a movement task Johannes MuÈller-Gerking a,*, Gert Pfurtscheller b,1, Henrik Flyvbjerg c, d a

b

HLRZ, Forschungszentrum JuÈlich, D-52425 JuÈlich, Germany Department of Medical Informatics, Graz University of Technology, Brockmanngasse 41, A-8010 Graz, Austria c Optics and Fluid Dynamics Department, Risù Nat'l Lab, DK-4000 Roskilde, Denmark d The Niels Bohr Institute. Blegdamsvej 17, DK-2100 Copenhagen é, Denmark Accepted 8 September 1998

Abstract We devised spatial ®lters for multi-channel EEG that lead to signals which discriminate optimally between two conditions. We demonstrate the effectiveness of this method by classifying single-trial EEGs, recorded during preparation for movements of the left or right index ®nger or the right foot. The classi®cation rates for 3 subjects were 94, 90 and 84%, respectively. The ®lters are estimated from a set of multichannel EEG data by the method of Common Spatial Patterns, and re¯ect the selective activation of cortical areas. By construction, we obtain an automatic weighting of electrodes according to their importance for the classi®cation task. Computationally, this method is parallel by nature, and demands only the evaluation of scalar products. Therefore, it is well suited for on-line data processing. The recognition rates obtained with this relatively simple method are as good as, or higher than those obtained previously with other methods. The high recognition rates and the method's procedural and computational simplicity make it a particularly promising method for an EEG-based brain±computer interface. q 1999 Elsevier Science Ireland Ltd. All rights reserved. Keywords: EEG classi®cation; Mu rhythm; Voluntary movement; Sensorimotor cortex; Human; Event-related desynchronization

1. Introduction The study of surface EEG as a possible new communication channel for severely disabled persons has a long history (Nirenberg et al., 1971), and has received increased attention recently (e.g. Farwell and Donchin, 1988; Sutter and Tran, 1990; Wolpaw and McFarland, 1994; Kalcher et al., 1996; Pfurtscheller et al., 1997). The EEG allows the observation of gross electrical ®elds of the brain, and re¯ects changes in neural mass activity associated with various mental processes. Since some mental processes result in distinguishable EEGs, a person that can produce such mental processes at will, has the potential to use them for communication. The feasibility of this communication depends on the extent to which the EEGs associated with these mental processes can be reliably recognized automatically. The electro-physiological phenomena investigated most in the quest for an automatic discrimination of mental * Corresponding author. Tel.: 1 49-2461-612-318; fax: 1 49-2461612-430. E-mail address: [email protected] (J. MuÈller-Gerking) 1 Supported in part by the Austrian `Fonds zur FoÈrderung der wissenschaftlichen Forschung', project P11208MED.

states are event-related potentials (EP) (Farwell and Donchin, 1988; Sutter and Tran, 1990), and localized changes in spectral power of spontaneous EEG related to sensorimotor processes (see e.g. Wolpaw and McFarland, 1994; Kalcher et al., 1996; Pfurtscheller et al., 1997). It is well known that planning and execution of movement leads to a short-lasting and circumscribed attenuation known as event-related desynchronization (ERD, Pfurtscheller and Aranibar, 1979) of rhythmic EEG components in the alpha and beta band (Gastaut, 1952; Chatrian et al., 1959; Kuhlman, 1978; Pfurtscheller et al., 1996). In the case of ®nger or hand movement, the desynchronization starts in the contralateral sensorimotor cortex during the planning phase and stays asymmetrical over both hemispheres until movement onset. Recordings from subdural electrodes show similar behavior, but the responses are more localized and the changes in spectral power are much enhanced (Toro et al., 1994). The ERD of mu and central beta rhythms can be seen as a correlate of exited or activated sensorimotor areas, where thalamo-cortical information exchange and processing takes place (Steriade and Llinas, 1988). An interesting observation is that at the same moment in time, different cortical

1388-2457/99/$ - see front matter q 1999 Elsevier Science Ireland Ltd. All rights reserved. PII: S 1388-245 7(98)00038-8

CLINPH 98577

788

J. MuÈller-Gerking et al. / Clinical Neurophysiology 110 (1999) 787±798

areas can display focal attenuated (ERD) and focal enhanced mu and beta components. The latter phenomenon is known as event-related synchronization (ERS) and may be seen as a correlate of active inhibited or deactivated cortical areas (Pfurtscheller, 1992). It may be hypothesized that the structure controlling the patterns of simultaneous cortical desynchronization and synchronization, and hence, the gating of the thalamo-cortical information transfer, is the reticular thalamic nucleus (Yingling and Skinner, 1977). In the case of hand movement, ERD can be found over the hand area and ERS over the foot area. Foot or toe movement can result in a foot area ERD and simultaneously in a hand area ERS (Pfurtscheller et al., 1996). The observation of simultaneously attenuated and enhanced EEG rhythms can be used to classify brain states related to the planning or even imagination of different types of limb movements. Recently, imagined movements of the right and left hand were classi®ed correctly in 80% of single trials in 3 trained subjects (Pfurtscheller et al., 1997). Such imagination prompted changes of sensorimotor rhythms have been suggested as a possible means to reestablish communication in patients with severe motor disturbances (Wolpaw et al., 1991; Wolpaw and McFarland, 1994). In order to be useful in practical applications, such a system must achieve close to 100% classi®cation accuracy. In many of these experiments, the EEG is recorded on a multitude of channels placed in a dense grid covering large parts of the brain. Given that the sensorimotor rhythms originate from very localized areas in the cortex, we expect that not all signals recorded from different sites contribute the same amount of information to the classi®cation, and some may only contribute noise. The central electrodes overlying primary sensorimotor areas will be most important for discrimination. Since the skull and the scalp cause a spatial smearing of the cortical signals, electrodes close to sensorimotor areas will also contain relevant information. However, with increasing distance from sensorimotor areas, the recorded signal will be increasingly contaminated by cortical activity unrelated to the movement to be discriminated. Two consequences arise from this situation: ®rst, the signals from different electrodes have to be weighted in some way, in order to re¯ect their relevance for the classi®cation task. Second, the correlations between signals from neighboring electrodes can be used to suppress the noise in individual channels. The weighting problem has been attacked mostly with ad hoc procedures, i.e. the features derived from an electrode are weighted or selected not by a criterion determined from the data directly, but a posteriori by their importance for the classi®er. This is done, for example, by the modi®cation to the Learning Vector Quantization scheme (LVQ) introduced by Pregenzer et al. (1994), in order to render it distinction sensitive (DSLVQ). Another example is Peters et al. (1997), who trained neural networks on the AR coef®cients of each individual channel. Those networks that perform best on a validation set of trials become members

of a committee, the size of which is chosen for optimal performance of the committee. Classi®cation is decided by a `vote' among the committee members. This classi®er works well only when some channels have a high signal-tonoise ratio. The channels left out of the committee are those that bring more noise than signal to the decision-making. In a situation of low signal-to-noise ratios in all channels, this method will fail, even if a high signal-to-noise ratio can be achieved by spatial ®ltering. The possibility to achieve better signal-to-noise ratios by making use of the inherent correlation between neighboring channels has so far not been exploited. On the contrary, one of the commonly extracted features, spectral properties from the individual time-series, no longer contain this a priori information. The method we advocate here is based on a decomposition of the raw signals into spatial patterns that are extracted from the data of two populations of EEGs in a manner that maximizes their differences. These spatial patterns provide a weighting of the electrodes, which is derived directly from the data. We will show how these patterns re¯ect the underlying physiological processes. The method used for the extraction of the patterns from the data is based on the method of Common Spatial Patterns (CSP) which was introduced in the ®eld of EEG analysis by Koles et al. (1990). They used the method to classify normal versus abnormal EEGs (Koles et al., 1994), to both extract abnormal components from EEGs (Koles, 1991), and to localize sources (Koles et al., 1995; Soong and Koles, 1995). In brief, this method takes as input two sets of spatial patterns representing two classes into which other sets of spatial patterns are later to be classi®ed. Below, an element in such a set, a spatial pattern will be the amplitudes of an Nchannel EEG at a given instant in time. A set of patterns may consist of the T spatial patterns that make up the EEG of a single trial, recorded at T consecutive points in time. Or a set may consist of the union of recordings from several trials. The latter is the case for the sets used to calibrate the method, that is, the sets of spatial patterns from which the method extracts the spatial features later used for classi®cation. The former is the case when the EEG of a single trial is to be classi®ed. In either case, we note that time is no more than an index distinguishing different patterns recorded at different times. Due to the temporal high-pass ®ltering, the signals have zero mean. Therefore, average patterns are unsuited to distinguish the classes and covariances have to be used instead. As output, calibration of the method gives an ordered list of characteristic spatial patterns. These characteristic patterns de®ne directions in pattern space that are optimally suited for distinguishing between the two classes. A timeseries of patterns that belongs to either one or the other class, will, after an appropriate transformation, scatter maximally along the ®rst direction and minimally along the last, if it belongs to the ®rst class, and vice versa if it belongs to

J. MuÈller-Gerking et al. / Clinical Neurophysiology 110 (1999) 787±798 Table 1 The number of artifact-free trials analyzed below, for each of 3 subjects, A4, B6, and B8 and 3 movements, L, R, F, denoting respectively left index ®nger, right index ®nger, and right foot Subject

L

R

F

Total

A4 B6 B8

73 55 37

73 54 48

77 64 39

223 173 124

the other class. The second and the second-to-last directions in the list are the second best directions for this discrimination, and so on for the other directions in the list. Given an EEG, one then treats it as a set of T spatial patterns, projects these onto the most discriminatory characteristic patterns, and calculates the variance of the T values resulting from each projection. These variances are the features on which classi®cation of the EEG is based, using a simple linear classi®er. Typically only a few directions in pattern space, i.e. a few characteristic patterns, are suf®cient for the discrimination task, so these patterns may be thought of as spatial ®lters that select the most relevant spatial aspects for the discrimination task. We thus obtain a drastic reduction in the dimensionality of the problem while making use of the information contained in all channels. The goal of the work presented in this paper was to pioneer the application of optimal spatial ®lters to the task of single trial EEG classi®cation. The data used were recorded during the planning phase of 3 different types of movement; left and right index ®nger movement and right foot movement. Some of the results presented here were reported in preliminary form in MuÈller-Gerking et al., (1997).

2. Methods and materials 2.1. Experiment and data The experimental protocol followed a classical memorized delay task. Three subjects were asked to perform one of 4 movements (pressing a micro-switch with the left or right index ®nger, ¯exing the toes of the right foot, or moving the tongue to the upper gum) after a series of stimuli. Each trial started with a short warning tone (warning stimulus, WS). One second after a WS, a visual cue (CUE) appeared on a computer screen in front of the subject, indicating which movement was to be done. The movements asked for were chosen at random among the 4 movements studied. The actual movement was to be performed only after a third, acoustic stimulus (reaction stimulus, RS) which occurred 2 s after the cue. Reaction times between RS and the actual movement were broadly distributed between 0.75 and 1.75 s. A total of 150 trials of each class of movement were recorded in blocks of 50 trials, at inter-trial intervals of about 12 s. Here, we only use a subset of these data,

789

comprising the movements of left index ®nger (in the sequel denoted by `L') right index ®nger (`R'), and toes (`F'). The recordings were checked for artifacts, leaving a total of 223, 173, and 124 trials for the 3 subjects, (Table 1). The EEG was recorded monopolarly from 56 Ag/AgCl scalp electrodes against a reference electrode ®xed on the tip of the nose. The electrodes were arranged in a grid of 2.5 cm spacing, covering the pre- and post central areas. The EEG was ®ltered between 0.15 and 60 Hz, and sampled at 128 Hz. In addition, horizontal and vertical electro-oculograms (EOGs) were recorded to detect eye-movements. For further details about the experiment and recordings, see Pregenzer et al. (1994), Pfurtscheller et al. (1994a,b). 2.2. Data analysis 2.2.1. The method of Common Spatial Patterns A classi®cation task is usually broken up into two main parts. The ®rst part is the extraction of relevant features that capture the class-invariant characteristics from some trial. The second part is the classi®cation proper, performed on the extracted features. The core of the classi®cation method that we propose here is that the feature extraction is realized by projections of the high-dimensional, spatial-temporal raw signals onto very few speci®cally designed spatial ®lters. These ®lters are designed in such a way that the variances of the resulting signals carry the most discriminative information. The adjunct of the ®lters are called Common Spatial Patterns, and they are obtained from a set of calibration data by the method of CSP. The features for the classi®cation proper are vectors whose elements are the variances of the projected signals. These feature vectors are ®nally classi®ed by simple linear or quadratic Bayes classi®ers, whose parameters are obtained again from the same set of calibration data, after projection onto the CSPs obtained in the ®rst step. An intrinsic limitation of the method of CSPs is that the design of the most discriminative spatial ®lters is possible only for the distinction between two conditions. So, we will ®rst consider only the design of optimal ®lters for this case, and defer the extension to the classi®cation of 3 and more conditions to the end of the next subsection. We begin with two groups A and B of recordings of relevant EEG data, represented as two groups of matrices. The rows of each matrix contain the signals on one recording electrode, while the columns contain the recordings of all electrodes at some particular point in time. The dimension of each matrix is, therefore, N £ T, with N the number of channels and T the number of samples in time. In our case, we take the recording on 56 electrodes during the 500 ms interval immediately preceding movement onset, or 64 samples. Let V i denote such a matrix containing the raw data of trial i. The method of CSPs now ®nds a decomposition of the two groups of recordings into modes that are common to both groups, but maximally suited to distinguish between the groups. Mathematically, the method relies on the simul-

790

J. MuÈller-Gerking et al. / Clinical Neurophysiology 110 (1999) 787±798

taneous diagonalization of two matrices closely related to the covariance matrices (Fukunaga, 1990). To summarize brie¯y, for each of the two groups the columns of all matrices are pooled to form two origin-centered point clouds in an N-dimensional space. It is always possible to ®nd a linear transformation that renders a point cloud isotropic. This transformation is called the `whitening transformation'. It is composed of a change of coordinate frame to the frame de®ned by the point cloud's principal axes, followed by stretching/compression of the cloud along the directions of these axes by amounts changing all of its principal moments to unit value. The essence of the CSP method can now be stated as follows: if the transformation that whitens the union of the two point clouds is applied to the individual clouds, the two resulting clouds will have the same principal axes, and their corresponding eigenvalues will add up to the value one. This is easily seen: since the whitening transformation is a linear operation, it commutes with summation of the two point clouds. So since the whitening transformation turns their union into an isotropic distribution, the union of the individually transformed point clouds is also isotropic. This is only possible if the two individually transformed point clouds complement each other as just described. In consequence, the directions with the largest and the smallest eigenvalues of one of the point clouds will be those with the smallest and largest eigenvalues for the other cloud. Similarly, the directions with the m largest eigenvalues of one of the point clouds will be those with the m smallest eigenvalues for the other cloud, and vice versa. So, in these directions we see the largest differences between the properties of the two clouds, i.e. these directions are optimally

Fig. 1. The most important (upper row) and second most important (lower row) common spatial patterns for the distinction of left ®nger movement from right ®nger movement. Data set B6, ®ltered to 8±30 Hz. Left ®nger movement leads, when compared with right ®nger movement, to enhanced EEG amplitudes over C3, where the right hand representation is located (left column, top). Similarly, right ®nger movement leads to increased EEG amplitudes over the left hand representation (right column). See Section 4.1 for further explanation.

Fig. 2. The most important common spatial patterns for the distinction right ®nger movements (left column) versus right foot movements (right column). Same notations as in Fig. 1. Right ®nger movements show comparatively high EEG amplitudes over mid-central areas (left column, bottom), while movements of the right foot are characterized by enhanced amplitudes over C3.

suited to project data on, when the task is to discriminate whether the data has the statistics of one cloud or the other. So an additional rotation onto these directions yields the ®nal transformation sought. For a detailed mathematical discussion see Appendix A. The decomposition of a trial V i can be written as h i21 V i ˆ P² Z i

…1†

that is, each raw EEG V i is represented as a linear combination of time-invariant modes, the CSP in the N columns of matrix [P ²] 21, with the new time series Z i as expansion coef®cients. [P ²] 21 is a square N £ N matrix, and Z i has the same dimensions as V i, namely, N £ T. As described above, the method of CSPs determines the modes in such a way that the variance of the ®rst row of Z i is maximal for the trials of group A and at the same time minimal for the trials of group B. On the other hand, the variance of the last row of Z i is minimal for the trials of group A, and at the same time maximal for the trials of group B. Similar, but weaker properties characterize the variance of the ®rst and last few rows of Z i. This property of the decomposition into CSPs ensures that the variances of the ®rst and last rows of Z i contain the most relevant information for discriminating between the two conditions A and B. Note that the variance of a row of Z i can be high only if the spatial distribution of the raw signals' amplitudes is similar to the corresponding pattern, or mode. Since the CSPs are obtained from movement-related EEG, these patterns themselves provide information about the characteristic EEG amplitude distribution relative to a comparison of two types of movement. Examples of the most discriminative CSPs for the pairwise comparison of ®nger and foot movements are shown in Figs. 1±3. See Appendix A and the ¯ow-diagram in Fig. 4 for a detailed mathematical discussion.

J. MuÈller-Gerking et al. / Clinical Neurophysiology 110 (1999) 787±798

791

trial is counted as `L' only if both the L-R and the L-F classi®cation returns `L'.

Fig. 3. Same as Fig. 2, but for the distinction left ®nger movements versus foot movements. In this comparison, right foot movements lead to higher EEG amplitudes over C4, while movements of the left ®nger shows higher amplitudes over mid-central areas, as did right ®nger movement in Fig. 2, where that was the ®nger movement to be distinguished from foot movement. See Section 4.1 for further explanation.

2.2.3. Cross-validation statistics In order to test the method optimally on the data available to us, we performed the usual cross-validation procedure, i.e. we repeatedly separated the whole data set into calibration- and test-sets, and classi®ed the data from the test set with the classi®er obtained from the calibration set. A set of calibration data was selected at random from the whole data set, using about 80% of the data for this purpose, and the projection matrix P ² was calculated from this subset. More precisely, we randomly selected 45 trials in the case of subjects A4 and B6 (15 trials from each movement class), and 30 trials in the case of subject B8 for testing. From the remaining trials we randomly selected the maximal number

2.2.2. Classi®cation using the method of CSP The features we use for classi®cation are obtained from the variances of the ®rst and last m rows of the expansion coef®cients Z i, since, by construction, they are the best suited to distinguish between the two conditions. From a set of reference or calibration data for two conditions we calculate the projection matrix P ² as described above. From this matrix we retain the ®rst and last m rows. They are our spatial ®lters onto which all reference data are projected. Let varpi be the variance of the p-th row of Z i, i.e. the variance of the expansion to mode p. The feature vector for trial i is composed of the 2 m variances varpi for p running from 1...m and from N 2 m 1 1¼N, normalized by the total variance of the projections retained, and log-transformed, 1 0 fpi

C B B varip C C B ˆ logB 2m C A @ P i varp

…2†

pˆ1

The transformation to logarithmic values is done in order to make the distribution of the elements in f i normal. The feature vectors from the calibration data are used to estimate the parameters of a linear (or quadratic) Bayesian classi®er (Fukunaga, 1990). To classify new data, we again obtain the feature vector Eq. (2) for the new data using the spatial ®lters obtained from the calibration data, and feed this feature vector into the classi®er. See Appendix B and Fig. 5 for more details on the classi®cation procedure. The method of CSPs only works for the distinction of two groups of data. For 3 and more conditions, we therefore perform independent pairwise classi®cations between all conditions in the way described above. In our case, we classify L versus R, L versus F, and R versus F. A trial is recognized for some of the classes, only if that class obtained the majority in all pairwise classi®cations. Otherwise, the trial is classi®ed as indecisive. So, in our case a

Fig. 4. Illustration of how the classi®ers are obtained from the calibration data in two steps. The ®rst step obtains the most discriminating spatial ®lters from two sets of calibration data by the method of CSPs (a). These spatial ®lters are subsequently used to transform the same calibration data into feature space. The two sets of feature vectors obtained in this way are used to build a linear or quadratic classi®er (b). Mathematical details are explained in Appendix A. The complete classi®er consists of the spatial ®lters (*) and the classi®ers for the feature vectors (**). Both are obtained from the same set of calibration data.

792

J. MuÈller-Gerking et al. / Clinical Neurophysiology 110 (1999) 787±798

distant next-nearest neighbors. Such electrodes in the monopolar montage that do not have 4 equidistant neighbors (boundary electrodes) are omitted from further processing. Therefore, in the case of the small Laplacian derivation, we obtain 30 re-referenced electrodes from the original 56 ones. In the case of the large Laplacian, ten channels remain. Projection to spatial patterns obtained from all monopolarly recorded channels amounts to a common average reference, since the patterns are reference-free up to a constant. Alternatively we also applied the CSP method to data that have ®rst been transformed to a Laplacian derivation.

Fig. 5. The process of classi®cation of some EEG recording. After temporal ®ltering of all individual channels, the spatial-temporal recording is ®ltered spatially by scalar multiplication with the 2m most discriminative spatial patterns (*), obtained in step (a) of the classi®er design procedure (see Fig. 4). We obtain the feature vector by calculating the log of the normalized variances of the thus obtained expansion coef®cients. This 2m dimensional feature vector is classi®ed using the linear or quadratic classi®er (**) from step (b) of the classi®er design procedure. For the discrimination of 3 classes, use the same scheme for each pair of conditions.

of trials for calibration in such a way that the calibration sets were of equal size for all classes. The surplus trials were left out of this run of the cross-validation. All data were then transformed into feature space, a classi®er was built on calibration data in feature space, and the test data were classi®ed. The classi®cation rates reported here were averaged over 50 repetitions of this procedure. 2.2.4. Reference electrode An important point in EEG analysis is the question of the reference electrode. Pfurtscheller reports that the desynchronization of sensorimotor rhythms is enhanced and best localized when using the surface Laplacian derivation (Pfurtscheller, 1988). Kuhlman already reported that m activity was rarely seen in referential recordings and that closely spaced bipolar derivations should be used instead (Kuhlman, 1978). Most recently, McFarland et al. (1997) compared different electrode references, performance in EEG classi®cation, and found either common-averagereference or large Laplacian derivation to yield the best results. The common-average-reference is obtained by re-referencing each electrode to the mean of all electrodes. The Laplacian derivation is calculated by Hjorth's method (Hjorth, 1975), i.e. each electrode is re-referenced to the average over its 4 nearest neighbors, assuming the electrodes are arranged in the pattern of a square lattice, as in the ®gures below. In contrast to this small Laplacian, the large Laplacian is obtained by re-referencing to the 4 equi-

2.2.5. Temporal ®ltering The common spatial patterns are sensitive only to the spatial distribution of the variances of the EEGs from the two populations; all time structure has been lost by the averaging. The only information contained in those patterns is where the variance of the EEG varies most when comparing two conditions. Given the organization of the cortical motor areas, this is the most relevant information when we try to distinguish the types of movement in our experiment: we expect the ®nger movements to mainly activate the respective contralateral hand area, while foot movement should mainly activate the medial foot area. However, as already stated, the effects of these activations on the measured EEG are not re¯ected to the same degree in all frequency bands. In order to make use of this knowledge, we worked with data which had been ®ltered to various frequency bands. With ®ltered data, the common spatial patterns provide insight into how the typical spatial distribution of EEG variance looks for speci®c frequency bands. Classi®cation accuracy was estimated for data ®ltered in 6 different frequency bands: a (8±12 Hz), lower a (La, 8±10 Hz); upper a (Ua, 10±12 Hz); b (19±26 Hz); g (38±42 Hz), and a broad band of 8±30 Hz. The ®lters we use are 25-point wide ideal-band-pass ®lters, which are convoluted with the signals (Percival and Walden, 1993). These ®nite impulse response ®lters are causal, i.e. they use only data prior in time. Times given below are relative to the un®ltered recordings. 2.2.6. Implementation All analysis and classi®cation routines were implemented in LISP using Tierney's freely available XLISP-STAT program (Tierney, 1990). For ®ltering, we used slightly modi®ed routines from Percival and Walden's SAPA package (Percival and Walden, 1993). 3. Results In this paper, we concentrate on the time segment 0.5±1 s after RS. For most trials, this segment immediately precedes the actual movement; in some trials, movement actually happens within this time window. Classi®cation rates as functions of experimental time will be presented else-

J. MuÈller-Gerking et al. / Clinical Neurophysiology 110 (1999) 787±798 Table 2 The recognition rates (%, rounded) and sample standard deviations (in parentheses) for 3 types of movement, based on features from the 4 most important common spatial patterns. Data with common reference ®ltered into different frequency bands. Bayes linear classi®er Subject

a

La

Ua

b

g

8±30 Hz

A4 B6 B8

89 (4) 82 (5) 69 (7)

84 (6) 81 (5) 55 (9)

90 (4) 81 (5) 73 (8)

79 (5) 83 (4) 60 (8)

64 (7) 73 (7) 35 (9)

91 (4) 90 (4) 82 (6)

where (MuÈller-Gerking, Pfurtscheller, and Flyvbjerg, in preparation). 3.1. Common Spatial Patterns Figs. 1±3 show the most important common spatial patterns for the pairwise distinction of all types of movement. Fig. 1 presents the four most distinctive patterns for the discrimination of left versus right ®nger movements, Fig. 2 shows those for the discrimination of right ®nger versus foot movement, and Fig. 3 indicates those for the distinction left ®nger versus foot movements. These patterns were obtained with all data from subject B6, time-segment 0.5±1 s after RS, and ®ltered in the wide range 8 to 30 Hz. The area shown is covered by the 56 recording electrodes, which are arranged in a rectangular grid, 11 in each row. The third row of electrodes (from bottom) comprises the central electrodes; the missing frontal electrodes are left blank. Each pattern has electrode positions marked with small squares, except for 4 electrode positions marked with larger black dots. The latter 4 positions correspond to C3, Cz, C4 and Fz in the international 10-20 system. The contour maps have been obtained by interpolation. Within each pattern, the zero line is rarely crossed, and then only by very small amounts. Due to this, and the fact that the sign of the pattern is irrelevant, the textures representing the isovalues were chosen symmetrical around zero. This gave clearer pictures. The top row of each ®gure shows the most important pattern for the conditions to be discriminated (corresponding to ®rst and last column of matrix [P ²] 21 in Eqs. (1) and (14). The second row shows the second most important patterns (second and second to last columns of matrix [P ²] 21) The left column in each ®gure shows the two patterns that are most important for the characterization of the ®rst condition, the right column shows the patterns most important for the characterization of the second condition. In Fig. 1, we see that the most important pattern for distinguishing condition L from condition R ± i.e. the upper left pattern ± shows a ¯at region over the right hemisphere, where the sensorimotor representation of the left hand is located and a strong spot of increased EEG activity right over C3, where the sensorimotor representation of the right hand is approximately located. The two most important patterns for distinguishing condition R from condition L

793

show a ¯at region of EEG signal over the left cortical areas, and strongly enhanced activity on the right hemisphere, in the one pattern frontal to C4 (right column, top), in the other more parietal (right column, bottom). The second most important pattern for condition L is less conclusive. In brief, when comparing the EEG amplitudes for the conditions L versus R, we ®nd the characteristic patterns to show enhanced amplitudes over the respective ipsilateral somatosensory representations. In the distinction of condition R to condition F, shown in Fig. 2, the most important pattern for R shows a diffusely increased amplitude over the right hemisphere shifted towards parietal areas (left column, top), while the second pattern shows strongly enhanced EEG amplitude over midcentral sites, with a maximum at Cz (left column, bottom). The corresponding ®rst pattern for condition F shows highly enhanced amplitude over the left somatosensory area with a maximum at electrode C3 (right column, top). The second pattern of condition F is less conclusive (right column, bottom). So, in the comparison R versus F, movement of the right index ®nger is characterized by enhanced EEG amplitude over central or right post-central areas, while movement of the toes is characterize by increased amplitudes over the left hand representation in sensorimotor areas. Fig. 3 concludes this series with the comparison L±F. The most important patterns in the top row show the mirrored behavior to the R±F patterns in Fig. 2: movement of the left ®nger results in highly enhanced EEG amplitude over the central area, while movement of the toes is characterized by greater EEG amplitudes near and over C4. The second most important patterns show only little modulation. Note, that since the quantity of interest is the variance of the raw signals projected onto the respective pattern, the sign of the modulation does not play any role; the only relevant feature is the magnitude and location of the modulation. The patterns obtained from data of subjects A4 and B8 are qualitatively the same. Patterns obtained from the data ®ltered to the lower a , upper a , and b bands all show the same general behavior, justifying our use of the wide frequency range 8 to 30 Hz. 3.2. Classi®cation results 3.2.1. In¯uence of the frequency band Table 2 shows the mean recognition rates (in percent, rounded) obtained for discrimination of 3 types of movement in different frequency bands. Values in parentheses are sample standard deviations over 50 repetitions of the crossvalidation procedure (see Section 2.2.3). Features are from projections of the ear-referenced data to the 4 most important spatial patterns (m ˆ 2 in Eq. 2). In most cases, this led to best performance. The best results (91 ^ 4, 90 ^ 4 and 82 ^ 6% for subjects A4, B6 and B8, respectively) were obtained with a broad band-pass of 8±30 Hz. In the case of subject A4, however, the information contained in the upper

J. MuÈller-Gerking et al. / Clinical Neurophysiology 110 (1999) 787±798

794

Table 3 The recognition rates (%, rounded) and sample standard deviations (in parentheses) for 3 types of movement, based on features from the 4 most important common spatial patterns. CSP method on data with common reference (column CSP-CR), small Laplacian derivation (CSP-SL) and large Laplacian derivation (CSP-LL). Columns VAR-SL and VAR-LL list recognition rates with variances on channels C3, Cz and C4 as features, referenced to small and large Laplacian, respectively. Bayes linear classi®er Subject

CSP-CR

CSP-SL

CSP-LL

VAR-SL

VAR-LL

A4 B6 B8

91 (4) 90 (4) 82 (6)

94 (4) 90 (4) 84 (7)

94 (3) 89 (4) 79 (7)

87 (4) 74 (6) 66 (8)

88 (4) 85 (5) 77 (7)

alpha band (10±12 Hz) leads to classi®cation results almost as good as those obtained in the 8 ±30 Hz band (90 ^ 4 vs. 91 ^ 4%). In the case of the two other subjects taking the broad band clearly improves the result. 3.2.2. In¯uence of the electrode reference Table 3 shows a comparison of ®ve classi®cation schemes for the data ®ltered in the wide band 8±30 Hz. The ®rst column (CSP-CR) repeats the last column of Table 2 for ease of comparison. Column CSP-SL lists the recognition rates obtained with the CSP method for data after application of the small Laplacian ®lter, while column CSP-LL reports these rates for the case of data ®ltered by the large Laplacian. In order to compare our method with a simple alternative, columns VAR-SL and VAR-LL list the classi®cation rates obtained with the logarithm of the variances in the band 8±30 Hz of channels C3, Cz, and C4 as features in a linear classi®er, after application of the small and large Laplacian, respectively. This corresponds approximately to the approach taken by McFarland et al. (1997). Overall, the CSP method on data pre-®ltered by the small Laplacian performs best, followed by the CSP method applied to the common referenced data. When we take the variances of the channels overlying sensorimotor areas as features, we see that indeed the large Laplacian references leads to sensible improvements over the small Laplacian, as has been found by McFarland et al. (1997). Taking the CSPs as additional spatial ®lters improves the classi®cation rates by 5±7% over the simple alternative. Similarly, Table 4 shows the recognition rates for the same classi®cation methods as in Table 3, but for movements L and R only. Here, columns VAR-SL and VAR-LL are based on the variances of channels C3 and C4 only. The Table 4 Same notations as in Table 3, but for two-class (L-R) discriminations. VAR-SL and VAR-LL are based on channels C3 and C4 only Subject

CSP-CR

CSP-SL

CSP-LL

VAR-SL

VAR-LL

A4 B6 B8

92 (5) 94 (4) 86 (7)

94 (5) 93 (4) 88 (7)

93 (4) 92 (5) 82 (9)

86 (6) 84 (6) 72 (10)

87 (6) 90 (4) 81 (7)

improvement compared with the simple method is sensible for subjects A4 and B8. 3.2.3. In¯uence of the numbers of patterns The main free parameter of the classi®cation scheme is the number of projections to common spatial patterns used to build the feature vector. Table 5 lists the mean scores for all subjects and two frequency ranges, using different numbers of common spatial patterns: from 2 £ 1 to 2 £ 5 (m ˆ 1 to m ˆ 5). These results show that it barely matters how many patterns are used when the number falls in the range shown. But m ˆ 2 and m ˆ 3 seems a slightly better choice than the other values shown. 3.2.4. Linear versus quadratic Bayes classi®er We also tested a Bayes quadratic classi®er against the linear one discussed so far. Classi®cation successes turned out to be almost identical between the two, with the linear classi®er giving the better result with few exceptions. Therefore, we give no results for the Bayes quadratic classi®er here. 4. Discussion 4.1. Spatial patterns The most discriminative spatial patterns shown in Figs. 1±3 are easily related to the ERD patterns found with movement preparation and execution (e.g. Pfurtscheller et al., 1996). For example, Fig. 1 shows that movements of the left index ®nger as compared with the movements of the right index ®nger are characterized by increased EEG activity on the electrodes overlying the ipsilateral hand representations of sensorimotor cortex, and vice versa for right ®nger movements. Stated equivalently, movements of an index ®nger lead to comparatively little EEG amplitude over the contralateral hand representation within sensorimotor cortex. Comparing ®nger movements to movements of the toes, we see that ®nger movement is now characterized by comparatively high EEG amplitudes over mid-central areas, while foot movements lead to high EEG amplitudes over the right (left) sensorimotor cortex, when compared with movements of the left (right) index ®nger. Turned around, we see that foot movement leads to comparatively little EEG activity over mid-central areas, and ®nger movements are characterized by decreased EEG amplitudes over the respective contralateral areas. It is also seen that ®nger movements lead to more localized patterns than movements of the foot. This is due to the fact that hand representations in sensorimotor areas are found on the cortical surface, while foot representations are deep in the medial ®ssure, leading to a wider spreading over the scalp of these cortical signals. These patterns are the basis for our feature extraction. Used as spatial ®lters, these patterns lead to new signals

J. MuÈller-Gerking et al. / Clinical Neurophysiology 110 (1999) 787±798

795

Table 5 The recognition rates as function of the number of common spatial patterns (®rst column) for two frequency ranges. Data referenced to small Laplacian. # 2 2 2 2 2

£ £ £ £ £

1 2 3 4 5

A4 Ua

A4 8±30 Hz

B6 Ua

B6 8±30 Hz

B8 Ua

B8 8±30 Hz

91 92 93 93 93

93 (4) 94 (4) 94 (4) 94 (4) 94 (4)

78 (5) 81 (5) 82 (5) 81 (5) 80 (5)

89 (4) 90 (4) 90 (4) 88 (4) 87 (4)

73 (8) 75 (8) 74 (6) 73 (8) 72 (8)

82 (8) 84 (7) 82 (6) 81 (7) 81 (6)

(4) (4) (4) (3) (3)

that are linear combinations or weighted averages of the recordings from all electrodes in such a way that the amplitudes of the new signals vary maximally between pairs of conditions. The weighting coef®cients are obtained automatically from the calibration data with respect to the importance of each electrode in discriminating two conditions. Compared with other methods of selection and weighting of electrodes, the method proposed here does not need their extra classi®cation step used for ®nding the relevant electrodes. The results, however, are compatible. Comparing, for example, our Fig. 1 with Fig. 3 in Pregenzer et al. (1994), we see that the latter is similar to a merge of the most important CSPs for the discrimination of the two ®nger movements. While ®nding the same areas important for classi®cation, our method also ®nds the classi®cation relevant features of the signal from these areas. 4.2. Recognition rates The recognition rates that we obtained are better than those obtained previously with other methods. In the case of discrimination of 3 brain states (preparation of left hand, right hand and foot movement) we found a classi®cation rate of 84±94% for the 3 subjects (randomness level for the discrimination of 3 states is 33.3%). In a similar paradigm with feedback presentation, Kalcher et al. (1996) reported a classi®cation accuracy of 60% for 3 class discriminators and 3 bipolar EEG channels. Differentiation between two brain states (preparation of left vs. right hand movement increased the classi®cation rate to 78±88% (Pregenzer et al., 1994). This comparison between our results obtained with spatial pre®ltering and linear discrimination and the results obtained with single-trial band power estimates and DSLVQ algorithm (Pregenzer et al., 1994) for a two-class problem shows that higher classi®cation rates can be achieved with the spatial ®ltering technique. We have to acknowledge, however, that there is a high variability of the classi®cation rates between subjects, and even within the same dataset, e.g. when varying the time segments' length or the segments' position in time. A fair comparison of methods is, therefore, dif®cult in general, and a comparison of recognition rates alone will not provide full insight into some methods' capabilities. With this reservation, we believe that the comparison of methods presented here gives a valid account of the methods' abilities. Initially, we were surprised to ®nd that the time evolution

of the EEG is of almost no importance as long as we restrict the signals to the broad frequency band of 8±30 Hz. Inspection of the EEG data, however, demonstrated that in all 3 subjects, the desynchronization was present in both the alpha and beta bands. Therefore, a good classi®cation result for the broad band could be expected. This broad band desynchronization is also the reason why almost as good recognition rates were observed with the simple alternative method and only two or 3 channels. It must be kept in mind that many subjects do not display a signi®cant ERD in the alpha or beta band. The fact that the quadratic classi®er performed slightly worse than the linear one, suggests that the data are very well separable by a linear separator. More sophisticated classi®cation schemes, such as neural networks, will most probably not only introduce unnecessary complications and computational demands, but will often also perform worse, because of the bias-variance dilemma. Comparison of a linear classi®er with a neural network based classi®er, both applied to single-trial EEG data recorded during right and left hand movement, revealed similar results (Lugger et al., 1998). The method we propose here is more complicated than the simple alternatives considered in Tables 3 and 4. However, there is ®rstly a sensible gain in recognition accuracy. Second, the increased effort does not induce longer processing times for classi®cation. Once the characteristic patterns are calculated, the only additional processing required are a few scalar products, which any signal processing hardware can perform in real time. A software solution would induce at most a minimal delay. In particular for twoclass discriminations, e.g. right versus left ®nger movements, the additional effort is negligible. Thus, this method seems quite a promising ingredient for a system for online classi®cation. Furthermore, the method is highly robust to changes in its parameters. Therefore is no need for ®netuning to a particular data set, a fact which suggests the use of this method as a standard tool. The only problem that we encountered using our method is connected with the estimation of the covariance matrices. Evidently, these matrices are crucial for the derivation of the spatial ®lters. So far, we only used the sample covariance as estimator. This estimator is non-robust with unbounded in¯uence function, which basically means that a single misbehaving trial or even a recording channel in the calibration set can make its results meaningless. Obviously, the chance of some recording artifact increases linearly with the

796

J. MuÈller-Gerking et al. / Clinical Neurophysiology 110 (1999) 787±798

number of channels. Temporal ®ltering, in particular the lower cut off, alleviates some common sources of EEG artifacts, such as sudden shifts in DC level. Nevertheless, the use of a robust estimator would be highly advisable. A second dif®culty arises from the high dimension of these covariance matrices. The number of parameters to be estimated goes with the square of the number of recording channels. In consequence, recording from a larger number of electrodes does not necessarily lead to better recognition results, simply due to the number of parameters that have to be estimated rises much faster than the information gained by additional electrodes. The solution to this dif®culty would be a structured estimator that explicitly uses a priori information about the correlations of neighboring electrodes. Currently such a robust structured estimator of the spatial covariance, especially tailored for surface EEG recordings, does not exist. Its development would be highly desirable. To conclude, we have demonstrated a method to devise spatial ®lters that extract features relevant for the classi®cation of movement-related multi-channel EEG recordings. These ®lters are optimal, in the sense that data projected onto these ®lters differ maximally (in the least squares sense) between pairwise conditions. While already providing excellent recognition rates, the method presented here probably still has margins of improvement, in particular in the use of robust statistics and the use of a tailored estimator for the spatial covariance matrix.

it reads Ria ˆ

Vai Vai²   trace Vai Vai²

…3†

The matrix products in this expression amounts to averaging over time. Let Rbi denote the corresponding normalized covariance matrices for the trials of condition b. The normalization is done in order to eliminate trial-to-trial variations in the absolute values of the moments. Next, the normalized covariance matrices are averaged over trials, Ra ˆ kRia ltrials

…4†

and Rb equivalently for condition B. We then build the composite covariance matrix Rc ˆ Ra 1 Rb

…5†

which can be factored into its eigenvectors by Rc ˆ Bc lB²c

…6†

where Bc is an N £ N matrix of normalized eigenvectors, BcBc² ˆ 1N£N, and l is the corresponding diagonal N £ N matrix of eigenvalues. The whitening transformation W ˆ l21=2 B²c

…7†

equalizes the variances in the space spanned by the eigenvectors in Bc. Now (Fukunaga, 1990), if the individual covariance matrices Ra and Rb are transformed by

Acknowledgements

Sa ˆ WRa W ² and Sb ˆ WRb W ²

J.M.G. gratefully acknowledges fruitful discussions with P. Grassberger, J. Martinerie, C. Neuper, B. Peters, B. Renault and F. Varela. H.F. thanks W. Bialek for useful discussions. The research was partially supported by the `Fonds zur FoÈrderung der wissenschaftlichen Forschung' in Austria, project P11208MED.

then Sa and Sb share the same eigenvectors, since Sa 1 Sb ˆ WRcW² ˆ 1N£N. That is, if the eigendecomposition of Sa reads

Appendix A. The method of Common Spatial Patterns We describe here the mathematical part of the method of Common Spatial Patterns as used in the present article. See Fig. 4 for a ¯ow-diagram of the procedure of classi®er design, and Fig. 5 for ¯ow-diagram of the process of classi®cation. Let Vai denote the raw data of trial i, condition a represented as an N £ T matrix with N the number of channels and T the number of samples in time. Thus, the recording at a given point in time can be represented as point in N-dimensional Euclidean space, and one EEG can be seen as a distribution of T such points. Since the constant part of the EEGs has been removed by frequency ®ltering, the mean of this distribution is zero. So the ®rst place to look for characteristic information is in its second moments, or covariance matrix. After normalizing with its total variance,

Sa ˆ U c a U ²

…8†

…9†

with orthonormal U, then S b ˆ U cb U ²

…10†

and the corresponding eigenvalues for the two matrices sum up to 1

ca 1 cb ˆ I

…11†

In consequence, the projection of whitened EEG epochs on U will give us feature vectors that are optimal in the least squares sense for discriminating between the two populations. With the projection matrix P² ˆ U ² W

…12†

the decomposition of each trial writes Z i ˆ P² V i

…13†

Inverting this equation h i21 V i ˆ P² Z i

…14†

J. MuÈller-Gerking et al. / Clinical Neurophysiology 110 (1999) 787±798

we see that each raw EEG is decomposed into the set of CSPs (columns of [P ²] 2l) with the new time series Zi as expansion coef®cients. The CSPs can, therefore, be seen as source distribution matrix with Zi the corresponding source wave form matrix. Appendix B. Classi®er design We recall brie¯y the equations of the quadratic and linear Bayes classi®er for Gaussian distributed data that we used to ! obtain the present results. Let M i be the center of the calibration vectors for class i, Si their sample covariance matrix, and Pi the a priori probability of this class. In our particular case, we perform only two-class discriminations on this level of processing, and the a priori probabilities are equal. But the equations hold for any number of classes. The classi®cation rule for the quadratic Bayes classi®er can be written in distance notation. That is, we de®ne a ! distance between a sample point x that is to be classi®ed, and each of the classes i:  ² X   ! ! 1 ! 1 X 21 ~x 2 M i 1 lnu i u 2 lnPi x 2Mi diQ ˆ i 2 2 …15† where ² means the transpose, and u´udenotes the determinant. x~ is classi®ed into that class i, for which diQ is minimal over all classes. Eq. (15) can be linearized by setting Si ˆ S for all classes, i.e. estimating S as the covariance matrix of all calibration vectors from all classes. A simple calculation reduces Eq. 15 to ! X 1 ! X 21 ! 21 ! x 2 M ²i M i 1 lnPi …16† diL ˆ M ²i 2 !

In the linearized case, some sample point x is classi®ed into that class i, for which diL is maximized (due to a change in sign). For more information about classi®er design, see Fukunaga (1990). References Chatrian GE, Petersen MC, Lazarete JA. The blocking of the rolandic wicket rhythm and some central changes related to movement. Electroenceph clin Neurophysiol 1959;11:497±510. Farwell LA, Donchin E. Talking off the top of your head: toward a mental prosthesis utilizing event-related brain potentials. Electroenceph clin Neurophysiol 1988;70:510±523. Fukunaga K. Introduction to statistical pattern recognition. second ed.. Boston: Academic Press, 1990. Gastaut H. EÂtude eÂlectrocorticographique de la reÂactivite des rythmes rolandique. Rev Neurol (Paris) 1952;87:176±182. Hjorth B. An on-line transformation of EEG scalp potentials into orthogonal source derivations. Electroenceph clin Neurophysiol 1975;39:526± 530. Kalcher J, Flotzinger D, Neuper C, GoÈlly S, Pfurtscheller G. Graz brain± computer interface II: towards communication between humans and computers based on online classi®cation of three different EEG patterns. Med Biol Eng Comput 1996;34:1±7.

797

Koles ZJ. The quantitative extraction and topographic mapping of the abnormal components in the clinical EEG. Electroenceph clin Neurophysiol 1991;79:440±447. Koles ZJ, Lazar MS, Zhou SZ. Spatial patterns underlying population differences in the background EEG. Brain Topogr 1990;2(4):275± 284. Koles ZJ, Lind JC, Flor-Henry P. Spatial patterns in the background EEG underlying mental disease in man. Electroenceph clin Neurophysiol 1994;91:319±328. Koles ZJ, Lind JC, Soong ACK. Spatio-temporal decomposition of the EEG: a general approach to the isolation and localization of sources. Electroenceph clin Neurophysiol 1995;95:219±230. Kuhlman WN. Functional topography of the human mu rhythm. Electroenceph clin Neurophysiol 1978;44:83±93. Lugger K, Flotzinger D, SchloÈgl A, Pregenzer M, Pfurtscheller G. Feature extraction for on-line EEG classi®cation using principal components and linear discriminants. Med Biol Eng Comp 1998;36:309±314. McFarland DJ, McCane LM, David SV, Wolpaw JR. Spatial ®lter selection for EEG-based communication. Electroenceph clin Neurophysiol 1997;103:386±394. MuÈller-Gerking J, Pfurtscheller G, Flyvbjerg H. Prompt classi®cation of EEG signals based on their spatial distribution prior to movement. Soc Neurosci Abstr 1997;23(2):1278. Nirenberg LM, Hanley J, Stear EB. A new approach to prosthetic control: EEG motor signal tracking with an adaptively designed phase-locked loop. IEEE Trans Biomed Eng 1971;18(6):389±398. Percival DB, Walden AT. Spectral analysis for physical applications: multitaper and conventional univariate techniques. Cambridge : Cambridge University Press. Package available from http://lib.stat.cmu.edu/sapaclisp/. Peters BO, Pfurtscheller G, Flyvbjerg H. Prompt recognition of brain states by their EEG signals. Theory Biosci 1997;116:290±301. Pfurtscheller G. Mapping of event-related desynchronization and type of derivation. Electroenceph clin Neurophysiol 1988;70:190±193. Pfurtscheller G. Event-related synchronization (ERS): an electrophysiological correlate of cortical areas at rest. Electroenceph clin Neurophysiol 1992;83:62±69. Pfurtscheller G, Aranibar A. Evaluation of event-related desynchronization (ERD) preceding and following self-paced movement. Electroenceph clin Neurophysiol 1979;46:138±146. Pfurtscheller G, Flotzinger D, Neuper C. Differentiation between ®nger, toe and tongue movement in man based on 40 Hz EEG. Electroenceph clin Neurophysiol 1994a;90:456±460. Pfurtscheller G, Pregenzer M, Neuper C. Visualization of sensorimotor areas involved in preparation for hand movement based on classi®cation of m and central b rhythms in single EEG trials in man. Neurosci Lett. 1994b;181:43±46. Pfurtscheller G, StancaÂk Jr A, Neuper C. Event-related synchronization (ERS) in the alpha band-an electrophysiological correlate of cortical idling: a review. Int J Psychophysiol 1996;24(1/2):39±46. Pfurtscheller G, Neuper C, Flotzinger D, Pregenzer M. EEG-based discrimination between imagination of right and left hand movement. Electroenceph clin Neurophysiol 1997;103(5):1±10. Pregenzer M, Pfurtscheller G, Flotzinger D. Selection of electrode positions for an EEG-based Brain Computer Interface (BCI). Biomed. Technik 1994;39(10):264±269. Soong ACK, Koles ZJ. Principal-component localization of the sources of the background EEG. IEEE Trans Biomed Eng 1995;42(1):59±67. Steriade M, Llinas RR. The functional states of the thalamus and the associated neuronal interplay. Physiol Rev 1988;68(3):649±742. Sutter, E.E. Tran, D. Communication through visually induced electrical brain responses. In: Proceedings of the 2nd International Conference: È sterComputers for Handicapped Persons, vol. 55. Schriftenreihe der O reichischen Computer Gesellschaft, 1990;279-288. Tierney L. LISP-STAT an object-oriented environment for statistical computing and dynamic graphics. New York: John Wiley. Program available from http://stat.umn.edu/ , luke/xls/xlsinfo/xlsinfo.html.

798

J. MuÈller-Gerking et al. / Clinical Neurophysiology 110 (1999) 787±798

Toro C, Deuschl G, Thatcher R, Sato S, Kufta C, Hallett M. Event-related desynchronization and movement-related cortical potentials on the ECoG and EEG. Electroenceph clin Neurophysiol 1994;93:380±389. Wolpaw JR, McFarland DJ. Multichannel EEG-based brain-computer communication. Electroenceph clin Neurophysiol 1994;90:444±449. Wolpaw JR, McFarland DJ, Neat GW, Forneris CA. An EEG-based brain

computer interface for cursor control. Electroenceph clin Neurophysiol 1991;78:252±259. Yingling CD, Skinner JE. Gating of thalamic input to cerebral cortex by nucleus; reticularis thalami. In: Desmedt JE, editor. Attention, voluntary contractions and ERPs, Karger: Basel, 1977. pp. 70±96.