Journal of Neuroscience Methods 116 (2002) 179–187 www.elsevier.com/locate/jneumeth
Classification of neural signals by a generalized correlation classifier based on radial basis functions

Alexander Kremper *, Thomas Schanze, Reinhard Eckhorn

Neurophysics Group, Physics Department, Philipps-University, Renthof 7, D-35032 Marburg, Germany

Received 4 October 2001; received in revised form 28 February 2002; accepted 1 March 2002

* Corresponding author. Tel.: +49-6421-2824186; fax: +49-6421-2827034. E-mail address: [email protected] (A. Kremper).
Abstract

A common problem in neuroscience is to identify the features by which a set of measurements can be segregated into different classes, for example into different responses to sensory stimuli. A main difficulty is that the derived distributions are often high-dimensional and complex. Many multivariate analysis techniques, therefore, aim to find a simpler low-dimensional representation. Most of them either involve huge efforts in implementation and data handling or ignore important structures and relationships within the original data. We developed a dimension reduction method by means of radial basis functions (RBF), where only a system of linear equations has to be solved. We show that this approach can be regarded as an extension of a linear correlation-based classifier. The validity and reliability of this technique are demonstrated on artificial data sets. Its practical relevance is further confirmed by discriminating recordings from monkey visual cortex evoked by different stimuli. © 2002 Elsevier Science B.V. All rights reserved.

Keywords: Multi-channel analysis; Dimension reduction; Discriminant analysis; Receiver operating characteristic; Visual cortex
1. Introduction

A common task in neuroscience is to identify features responsible for segregating sets of observations into different groups (Hotelling, 1936). For example, multiple-site responses in visual cortex to identical stimuli shall be classified into those leading to correct and to false perceptions. To achieve the most effective processing of sensory data it is important to identify the neural response components associated with the perceptual parameters. With conventional methods this requires many stimulus repetitions, leading to huge data sets. Instead of working with the original high-dimensional distributions, one often tries to perform a suitable transformation to a lower dimension. Occasionally, it can be useful or even necessary to reduce the dimension of the data to a manageable size, while keeping as much of the original information as possible, and then feed the reduced data to a discriminant analysis.
Often, a phenomenon that appears high-dimensional and complex can actually be captured by a few variables. Among the numerous published techniques for dimension reduction, multidimensional scaling (MDS) and principal component analysis (PCA) are most widely used (review: Carreira-Perpinan, 1997). MDS seeks a low-dimensional representation, usually in two or three dimensions, such that the inter-sample distances are as close as possible, in a least-squares sense, to the given dissimilarities (Sammon, 1969), which are not necessarily defined by a metric (Borg and Groenen, 1997). PCA, on the other hand, performs a linear variance-maximizing transformation. It is possible to perform some sort of dimension reduction by selecting only a few principal components (Jolliffe, 1986; Oja, 1989). However, a major shortcoming of both methods is their computational burden, which depends on the number of observations (Carreira-Perpinan, 1997). Furthermore, it is not clear to what extent the dimension should be reduced, that is, how many principal components are relevant (Oja, 1989). We propose radial basis functions (RBF) for dimension reduction, where the effort increases only linearly with the number of observations.
As with MDS, the starting point of the RBF method is the set of distances between observations. For the nonlinear transformation of the high-dimensional distributions into real numbers (as with PCA), a set of basis/feature vectors plays the decisive role. After the observations have been converted into real values, these 1-dimensional distributions can easily be analyzed by common statistical methods. Since we are primarily interested in discrimination and in a state description relative to a reference, we use a measure from signal detection theory, the receiver operating characteristic (ROC) (Green and Swets, 1966; Egan, 1975). Our paper surveys the context of feature extraction by RBF approximation in combination with ROC, and compares their performance in dimension reduction with common basic approaches. In Section 2, we briefly review the central elements of RBF and explain the Ansatz we developed for discriminant analysis and classification of neural multi-channel recordings. In Section 3, we show that the RBF approximation can be regarded as an extension of a linear correlation classifier. We further show how this approach can be used to characterize and segregate spatio-temporal patterns from multi-electrode recordings in monkey visual cortex.
2. Methods

2.1. Function approximation by RBF

Originally, RBF have been used in approximation theory (review: Powell, 1992). Suppose that f(x), x ∈ R^d, is an unknown real-valued function in d dimensions to be approximated. An RBF approximation to f(x) in its simplest form is a series expansion:

s(x) = \sum_{j=1}^{M} \lambda_j \, \Phi(\|x - z_j\|)   (1)
with weights λ_j, where ‖·‖ denotes the L_p-norm with 1 ≤ p ≤ 2 and Φ(r), r ≥ 0, denotes a real-valued, parameter-dependent (basis) function, e.g. the Gaussian Φ(r) = exp(−cr²) or the multiquadric Φ(r) = (r² + c²)^{1/2} (Powell, 1992). The points {z_j ∈ R^d | j = 1, ..., M} are called the centers or prototypes of the RBF approximation. Suppose the function f(x) is given at N different data points {(x_i, f(x_i)) ∈ R^d × R | i = 1, ..., N}; then we can construct a function s(x) of the form of Eq. (1) which is centered at the data points x_i. Setting M = N and replacing z_j by x_j yields:

s(x_i) = f(x_i), \quad i = 1, \ldots, N   (2)
Solving these interpolation conditions is equivalent to solving a linear system of equations:
A \lambda = F   (3)
with A_ij = Φ(‖x_i − x_j‖), λ = (λ_1, ..., λ_N)^T and F = (f(x_1), ..., f(x_N))^T. It has been shown (Schoenberg, 1938; Micchelli, 1986; Powell, 1992) for a large class of basis functions Φ(r), and under very weak conditions on the geometry of the centers x_i (provided they are distinct), that the distance matrix A is non-singular. If the weights in Eq. (1) are set according to the solution of the linear system in Eq. (3), s(x) represents a continuous hyper-surface passing exactly through each data point x_i. Important for the approximation are the choice of an appropriate basis function, its shape parameter c, and the dependence on the position and number of the centers. Unsuitable choices can lead to an ill-conditioned, fully occupied matrix. In our analyses, we used in most cases basis functions of the form Φ(r) = ‖r‖². In applications to neuronal data the matrices are of moderate size (N ≈ 1000). We solved the linear system of equations by a preconditioned conjugate gradient method (Barrett et al., 1994), but other methods are also practicable (Beatson et al., 1999; Buhmann, 2000).

2.2. RBF for pattern classification

In recent years RBF have become popular in pattern classification (e.g. Evgeniou et al., 2000 and the references therein). In our applications we concentrate on two-class problems. Suppose that two groups C1 = {x_i ∈ R^d | i = 1, ..., n} and C2 = {y_j ∈ R^d | j = 1, ..., m} of feature vectors are given. Here, we are interested in the cortical activity of a monkey looking repeatedly at one of two different visual stimuli, labeled C1 and C2. Our goal is to compare the signals from the two groups and to determine whether they can be discriminated. In order to achieve this by RBF approximation, the function s(x) has to become a kind of indicator function: each point x of the observation space will be assigned to one class according to its value s(x). The question is how to construct such an indicator function, provided there exists only a finite set of observations with a broad overlap between the distributions of the two groups. Dividing the observations of each group into a learning set L and a test set T is a widely used first preparation (e.g. Gammerman et al., 1998). Vectors x_i and y_j from the learning set L are then used to construct a continuous RBF s(x), such that, e.g., s(x_i) = 1 for vectors from class C1 and s(y_j) = −1 for vectors from class C2 (Bishop, 1995). Given the weights and the centers, the labeled elements of the test set T can be mapped by s(x). Thereby, each single observation from the test set can be classified according to its real value. Thus, the initial high-dimensional problem is reduced to a low-dimensional problem by the nonlinear RBF transformation. Note that the resulting 1-dimensional distributions can easily be used to determine to what extent the two classes C1 and C2 are separable.
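As an illustration of the construction described in Sections 2.1 and 2.2, the following Python sketch builds the indicator-type RBF by strict interpolation and maps test vectors to real values. It is a schematic reconstruction under stated assumptions (multiquadric basis with shape parameter c, plain dense solver), not the authors' original code.

```python
import numpy as np

def multiquadric(r, c=1.0):
    """Multiquadric basis Phi(r) = (r**2 + c**2)**0.5; one of the choices named in Section 2.1."""
    return np.sqrt(r ** 2 + c ** 2)

def fit_rbf_indicator(centers, targets, phi=multiquadric):
    """Strict interpolation (Eqs. (2), (3)): solve A lam = F with A_ij = Phi(||x_i - x_j||).

    centers : (N, d) array of learning-set vectors from both classes
    targets : (N,) array of indicator values, +1 for class C1 and -1 for class C2
    """
    dists = np.linalg.norm(centers[:, None, :] - centers[None, :, :], axis=-1)
    A = phi(dists)                         # N x N distance matrix of Eq. (3)
    return np.linalg.solve(A, targets)     # weights lambda_j of Eq. (1)

def map_with_rbf(x, centers, weights, phi=multiquadric):
    """Evaluate s(x) = sum_j lambda_j Phi(||x - x_j||) for test vectors x of shape (M, d)."""
    dists = np.linalg.norm(x[:, None, :] - centers[None, :, :], axis=-1)
    return phi(dists) @ weights
```

For the matrix sizes used here a dense solve is sufficient; for larger N one would switch to the preconditioned conjugate gradient iteration mentioned above.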
2.3. Discriminant analysis
For the 1-dimensional discriminant analysis we use a nonparametric distance measure from signal detection theory: the area under the ROC curve (Green and Swets, 1966; Egan, 1975). This value, which ranges from 0 to 1, is a measure of separability. If the value is 0.5, i.e. if the ROC curve lies on the diagonal, the distributions are equal. The more the area under the ROC curve (AROC) differs from 0.5, the better the two distributions can be separated from each other. Therefore, a value of 1 or 0 indicates no overlap, whereas a value of 0.5 indicates full overlap of the distributions. The whole procedure (Sections 2.1, 2.2 and 2.3) is as follows:
Step 1: Specification of the number of centers N and the basis function Φ. Positioning of the basis functions at data points x_i, i = 1, ..., N, sampled randomly from the underlying distributions (Lowe, 1995). Calculation of the weights λ_i by strict interpolation and transformation of the test set T by the RBF.

Step 2: Discriminant analysis of the two 1-dimensional distributions obtained from T by ROC.

Step 3: Repetition of the whole procedure many times, in order to estimate to what extent the classification properties depend on the positions of the centers.

Note: In general, the properties of s(x) depend on the number and the positions of the centers x_i and also on other parameters. In technical applications the main effort is to choose these quantities properly, in order to achieve a good separation (e.g. Orr, 1998). In our applications we want to estimate how reliable our analysis is, i.e. to estimate the confidence of the RBF–ROC prediction.
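A minimal sketch of Steps 2 and 3: the AROC is computed from the two 1-dimensional test-set distributions via the rank statistic P(s(x) > s(y)), and the whole procedure is repeated with randomly drawn centers. The functions fit_rbf and map_rbf are placeholders for the strict-interpolation and mapping steps (e.g. as sketched at the end of Section 2.2), and the equal learning/test split is an assumption.

```python
import numpy as np

def area_under_roc(s1, s2):
    """AROC of two 1-D distributions: P(s1 > s2) + 0.5 * P(s1 == s2).

    Values near 1 or 0 indicate separability, 0.5 indicates full overlap.
    """
    s1 = np.asarray(s1)[:, None]
    s2 = np.asarray(s2)[None, :]
    return np.mean(s1 > s2) + 0.5 * np.mean(s1 == s2)

def rbf_roc(class1, class2, n_centers, n_repeats, fit_rbf, map_rbf, seed=0):
    """Steps 1-3: repeat the RBF-ROC procedure with randomly selected centers."""
    rng = np.random.default_rng(seed)
    arocs = []
    for _ in range(n_repeats):
        p1, p2 = rng.permutation(len(class1)), rng.permutation(len(class2))
        L1, T1 = class1[p1[:n_centers]], class1[p1[n_centers:]]   # learning / test split, C1
        L2, T2 = class2[p2[:n_centers]], class2[p2[n_centers:]]   # learning / test split, C2
        centers = np.vstack([L1, L2])
        targets = np.concatenate([np.ones(len(L1)), -np.ones(len(L2))])
        w = fit_rbf(centers, targets)
        arocs.append(area_under_roc(map_rbf(T1, centers, w), map_rbf(T2, centers, w)))
    return np.mean(arocs), np.std(arocs)    # mean and S.D. over center selections
```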
3. Results

3.1. Comparison of RBF with a linear correlation classifier

In this section, we compare our approach with standard approaches and explain why the application of the RBF approximation can be regarded as an extension of a linear correlation classifier. As in Section 2.2, suppose there are two data sets C1 and C2. Assume y ∈ R^d is a new cortical response that has to be classified with respect to the visual stimuli C1 or C2. A common approach in neuroscience is to look at the mean square distances ‖y − x̄‖² and ‖y − ȳ‖² between y and the mean responses of the two groups, x̄ = (1/n) Σ_{i=1}^{n} x_i and ȳ = (1/m) Σ_{j=1}^{m} y_j (e.g. Becker and Krüger, 1996; Suppes et al., 1997, 1998). To compare this similarity measure with our approach we construct a function of the form:

\varphi(y) = \|y - \bar{x}\|^2 - \|y - \bar{y}\|^2   (4)

representing the conventional approach. If y belongs to C1 this function has to be positive, otherwise negative. Reformulating this expression and eliminating ‖x̄‖² and ‖ȳ‖², which only changes the threshold, leads to:

\varphi(y) = \frac{1}{n} \sum_{i=1}^{n} \langle y, x_i \rangle - \frac{1}{m} \sum_{j=1}^{m} \langle y, y_j \rangle   (5)
where ⟨·, ·⟩ denotes the standard scalar product. Restricting ourselves to the case of normalized responses ‖y‖ = ‖x_i‖ = ‖y_j‖ = 1 and Euclidean distances makes it obvious that the basis function Φ (Eq. (1)) can be regarded as a nonlinear function f of the correlations between y and x_i:

\Phi(\|y - x_i\|) = \Phi\!\left(\sqrt{2(1 - \langle y, x_i \rangle)}\right) = f(\langle y, x_i \rangle)   (6)

and, respectively, between y and y_j. The corresponding RBF (Eq. (1)), given by:

s(y) = \sum_{i=1}^{n} \lambda_i f(\langle y, x_i \rangle) + \sum_{j=1}^{m} \mu_j f(\langle y, y_j \rangle)   (7)
is a nonlinear extension of the linear correlation classifier of Eq. (5). Comparing Eqs. (5) and (7) shows that the correlation classifier uses only first-order properties: the coefficients in Eq. (5) are constant and depend only on the number of samples. The coefficients λ_i and μ_j in the RBF expansion of Eq. (7), in contrast, depend both on the data and on the mapping. For this reason the RBF approximation is often better adapted to the classification problem. To confirm this statement, we applied our method to multivariate data sets of known characteristics. We found (not shown) that the RBF classifier yields good results in cases where the linear correlation classifier works well. Beyond that, the RBF classifier indicates a distinction in cases where the first-order moments are identical and where there is also a strong overlap of the distributions. As expected, separability depends on the type and stationarity of the data and on the size of the training set. Furthermore, we investigated signals differing in their frequency-phase properties. We performed a discriminant analysis by comparing signal segments of different duration in the time domain. In this case, we found that with increasing dimensionality the segregation becomes monotonically better. That is, the RBF application works well independently of dimensionality.
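For comparison, a minimal sketch of the linear correlation classifier of Eq. (5), with the threshold terms dropped as described above; the array layout (one response per row) and the function name are assumptions for illustration.

```python
import numpy as np

def correlation_classifier(y, class1, class2):
    """Eq. (5): phi(y) = <y, mean(C1)> - <y, mean(C2)>, i.e. only first-order statistics.

    y      : (d,) response to be classified
    class1 : (n, d) responses of class C1; class2 : (m, d) responses of class C2
    Positive values are assigned to C1, negative values to C2.
    """
    return y @ class1.mean(axis=0) - y @ class2.mean(axis=0)
```

The RBF expansion of Eq. (7) replaces the constant coefficients 1/n and 1/m by the fitted weights λ_i, μ_j and the scalar products by the nonlinear function f, which is what allows it to exploit structure beyond the class means.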
Applying our method to artificial data, for which a separate investigation of amplitude or phase-density spectra indicates no difference but which are separable anyhow, showed that the nonlinear RBF transformation takes high-dimensional relations into account. In addition, we were able to confirm a well known property of RBF: they can be successfully applied for the discrimination of non-contiguous data which are not linearly separable. Although this method makes no explicit use of the characteristic features, it indicated an obvious difference whenever there was one.

3.2. Spatio-temporal pattern classification in cortical recordings

Here, we show how our approach can be used to compare cortical spatio-temporal activity patterns generated by different visual stimuli. We used local field potentials (LFP) recorded with five parallel µ-electrodes from primary visual cortex (V1) of an awake monkey during visual stimulation by (i) a continuous sinusoidal grating (no object), compared to (ii) a grating in which an object was defined by a shifted rectangle (Gail et al., 2000) (Fig. 1). Before stimulus-onset the monkey fixated a gray screen and continued fixation throughout stimulation. For each stimulus 120 response repetitions were recorded. We fixed the number of RBF prototypes at 24 from each class, which leads to a system of 48 linear equations for determining the coefficients in the RBF expansion.
Fig. 1. Visual stimuli used to elicit cortical activity in V1 recorded with five collinearly arranged µ-electrodes. The positions of the corresponding receptive fields (RFs) are indicated by contours (modified from Gail et al., 2000).
Fig. 2. Time-resolved multivariate discriminant analysis of simultaneously recorded signals from five electrodes under different visual stimulation (Fig. 1). Stimulus-onset at 0 ms. Analysis window: 12 ms. Error bars: mean ± standard deviation (S.D.). Dashed line: chance level.
The length of the vectors that we investigated first was 30. This value arises from the time series of the five electrodes, using epochs of six time samples at a 2 ms sampling interval. We shifted this spatio-temporal window over the cortical recordings in 12 ms steps. The result of our global, time-resolved discrimination analysis is shown in Fig. 2. Before stimulus-onset the areas under the ROC curve (AROC) vary around 0.5 ± 0.06, indicating that there is no significant difference between the two signal groups. About 50 ms after stimulus-onset, however, there is a steep increase in discriminability. The AROC rises up to a value of 0.97 ± 0.01 near 160 ms after stimulus-onset. After this peak, discrimination decreases, whereas at 240 and 360 ms two weak maxima occur. To estimate the contributions of single electrode recordings, we repeated our analysis with all combinations of four out of the five signals (Fig. 3, gray curves). Obviously, the first peak is not an effect of activities from a single electrode: any pooled combination of four out of five electrode signals shows the same rapid increase in discrimination about 50 ms after stimulus-onset. Before onset the AROC values fluctuate around 0.5. On the other hand, the local maximum about 240 ms after stimulus-onset is primarily due to the signals of one electrode (one combination shows a strong reduction during the 200–300 ms epoch).
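The spatio-temporal feature construction used above (five channels × six samples at 2 ms, windows advanced in 12 ms steps) can be sketched as follows; the array layout and function name are assumptions, not the original analysis code.

```python
import numpy as np

def spatio_temporal_windows(lfp, n_samples=6, step=6):
    """Cut multi-channel LFP trials into 30-dimensional spatio-temporal vectors.

    lfp : (n_trials, n_channels, n_time) array sampled at 2 ms.
    Each window covers n_samples time points (12 ms) and all channels;
    successive windows are shifted by `step` samples (12 ms for step=6).
    Returns an array of shape (n_windows, n_trials, n_channels * n_samples).
    """
    n_trials, n_channels, n_time = lfp.shape
    starts = range(0, n_time - n_samples + 1, step)
    return np.stack([lfp[:, :, s:s + n_samples].reshape(n_trials, -1) for s in starts])
```

For each window, the vectors of the object and no object trials are passed to the RBF–ROC procedure of Section 2, which yields the time-resolved AROC curve of Fig. 2.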
Fig. 3. RBF–ROC discriminant analysis of recordings from four electrodes out of five (gray). Presented are the mean values of the corresponding five curves and the mean curve from Fig. 2 (black). The S.D. of the single curves (not shown) are comparable to the situation in Fig. 2.
Irrespective of this variation, all other discrimination curves show a similar behavior. We performed the same analysis with the signals of a single electrode (Fig. 4, gray curves) and found that the discrimination using the signals from all electrodes in parallel leads to higher AROC values. Moreover, comparing the object and no object data for each electrode separately shows only a moderate increase after stimulus-onset, and the maxima are lower and less broad. This distinction between the global and the local analysis was further confirmed when we compared the result obtained by the simultaneous inclusion of all channels with the mean over the single-channel analyses (not shown). In order to compare the results of our method with those of standard approaches, we computed the averaged event-related potentials over all five electrodes for the object and the no object class. Looking at their variability over time (Fig. 5) shows that there exists a broad overlap, which is minimal about 180 ms after stimulus-onset. In the interval 80–240 ms the mean value for the object class is higher than that for the no object class. Subsequently, this relation changes, and at 400 ms after stimulus-onset both distributions are nearly identical.
Fig. 4. RBF–ROC discriminant analysis of the single electrode recordings (gray curves) compared to the RBF–ROC analysis using all recordings simultaneously (solid black curve). Presented are the mean values. The S.D. of the gray curves (not shown) are comparable to the S.D. in Fig. 2.
Fig. 6. ROC analysis (gray) evaluated at each sampling point (every 2 ms) of the superimposed signals from five electrodes, compared to the result from the five-parallel-channel RBF–ROC analysis (solid black, see also Fig. 2). Dashed curve: mirrored RBF–ROC result.
To quantify the separability we computed the AROC for each sample point in time (gray curve in Fig. 6). As with our method (black curve), significant differences exist from 50 to 400 ms. Overall, the black curve deviates more from chance level (AROC: 0.5) than the gray curve (dashed line). In contrast to the results of our method, the AROC is lower than 0.5 during the 240–400 ms epoch. This transition indicates that the mean responses of the two classes change their relative magnitude (see Fig. 5). Interestingly, about 240 ms after stimulus-onset no difference is present in the event-related potentials. However, our technique shows a significant difference about 240 ms after stimulus-onset, which must be an effect of some bias in the two high-dimensional distributions, comparable to the results reported in Section 3.1. The benefit of the multidimensional RBF–ROC analysis over the 1-dimensional ROC analysis becomes clearer if one compares the areas under the two discrimination curves from stimulus-onset to 450 ms post stimulus: the absolute value of the integral over the 1-dimensional curve, taken relative to chance level, amounts to only 55% of the corresponding area covered by the RBF–ROC curve. In summary, before stimulus-onset our method shows no separability, as expected. Using the parallel recordings reveals better discrimination than using their mean time course or a single-channel signal. Using longer analysis windows (up to 100 ms) leads to an even clearer discrimination in the multivariate case (data not shown; see also Zohary et al., 1990; Becker and Krüger, 1996).
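The comparison of the two discrimination curves uses the absolute area relative to chance level; a minimal sketch of this quantity, assuming AROC values sampled on a regular time grid:

```python
import numpy as np

def area_relative_to_chance(aroc, dt=12.0):
    """Absolute value of the integral of (AROC(t) - 0.5) over the analysis interval.

    aroc : 1-D array of time-resolved AROC values; dt : sampling step in ms.
    """
    return abs(np.sum(np.asarray(aroc) - 0.5) * dt)
```

The 55% figure quoted above would then be the ratio of this quantity for the 1-dimensional ROC curve to that of the RBF–ROC curve over 0–450 ms.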
3.3. Time-resolved differences in signal coupling due to the visual stimulation
Fig. 5. Variability (S.D.) of the cortical responses recorded with five electrodes according to the object and the no object stimuli (see Fig. 1).
Further insight into the state description and the dynamical behavior can be obtained by investigating the global coupling structure under the two visual stimuli. For both stimulus situations (object versus no object), we computed the rectified correlation coefficient matrix:
M_{ij} = \frac{|\langle x_i, x_j \rangle|}{\|x_i\| \, \|x_j\|}, \quad i, j = 1, \ldots, 5   (8)
between the signals from the five electrodes in a sliding time window of 80 ms duration. The entries above the diagonal are arranged in a 10-dimensional vector. This leads to two 10-dimensional distributions for the object and the no object class, which we subsequently analyzed with our method. The time-resolved result is presented in Fig. 7 (black curve). Just as in the previous case (Section 3.2), before stimulus-onset no significant discrimination is obtained. About 40 ms after stimulus-onset the AROC rises, which indicates an increasing difference between the two correlation distributions. About 100 ms after stimulus-onset the discrimination reaches a first peak (AROC: 0.76 ± 0.04). A second peak appears about 240 ms post stimulus (AROC: 0.86 ± 0.02). After that the AROC curve decreases monotonically towards chance level. To find out whether this discrimination behavior is only an effect of stimulus-locked components, we repeated the analysis with shift-predictor corrected data (Perkel et al., 1967; Palm et al., 1988). As Fig. 7 (gray curve) shows, this preprocessing delays the rising phase. In addition, the first peak vanishes and the second one is not fully reached. Nevertheless, there remains a significant separability about 200 ms after stimulus-onset (AROC: 0.8 ± 0.03). This peak indicates differences in the global correlation structure which are caused by internal processes.
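A sketch of the coupling features of Eq. (8): the rectified correlation coefficients of all electrode pairs in a sliding 80 ms window, arranged as 10-dimensional vectors per trial; the array shapes and the window step are assumptions for illustration.

```python
import numpy as np

def rectified_correlation_features(lfp, win=40, step=6):
    """Upper-triangle entries of M_ij = |<x_i, x_j>| / (||x_i|| ||x_j||), Eq. (8).

    lfp : (n_trials, n_channels, n_time) array sampled at 2 ms; win=40 samples = 80 ms.
    Returns an array of shape (n_windows, n_trials, n_pairs), n_pairs = 10 for 5 channels.
    """
    n_trials, n_channels, n_time = lfp.shape
    iu = np.triu_indices(n_channels, k=1)                 # the 10 electrode pairs
    feats = []
    for s in range(0, n_time - win + 1, step):
        seg = lfp[:, :, s:s + win]
        norms = np.linalg.norm(seg, axis=-1)              # (n_trials, n_channels)
        dots = np.abs(np.einsum('tcn,tdn->tcd', seg, seg))
        M = dots / (norms[:, :, None] * norms[:, None, :])
        feats.append(M[:, iu[0], iu[1]])
    return np.stack(feats)
```

These 10-dimensional correlation vectors for the object and no object trials are then fed to the same RBF–ROC procedure as the raw spatio-temporal vectors.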
Fig. 7. Black: RBF–ROC discriminant analysis of the global correlation patterns for the two different stimuli from Fig. 1. Gray: separability after shift-predictor correction. Presented are the mean values. Error bars (not shown) are in the same range as in Fig. 2.
Fig. 8. Comparison of the differences in the 10-dimensional correlation distributions (solid black) to the mean segregation over the single correlation pairs (gray). Presented are the mean values. Dashed curve: mirrored result of the global analysis.
In order to compare the results of our multi-channel analysis with those of a pairwise analysis, we separately computed ROC curves for the ten pairwise correlations. Averaging over all ten curves leads to the gray curve in Fig. 8. It differs from the RBF–ROC result (black curve) in three points. First, the RBF–ROC curve reaches much higher values. Second, the values of the gray curve are both higher and lower than 0.5: in the range where the gray curve is greater than 0.5 the mean correlation in the object class is higher than that in the no object class, and in the range where the AROC is lower than 0.5 this relation is reversed. The values of the RBF–ROC curve in the corresponding range are always greater than 0.5. Third, at 240 ms the RBF–ROC result indicates a significant segregation (AROC = 0.8), whereas the mean correlation in both classes is nearly the same; the corresponding AROC value for the mean correlation at this time is about 0.5. The absolute value of the integral (0–450 ms) over the mean correlation curve, taken relative to chance level, is only 30% of the area covered by the RBF–ROC curve (see also Section 3.2). Therefore, as in the previous section (Fig. 6), these results indicate that the multi-channel analysis supplies further information which cannot be obtained at the single-channel level.
4. Discussion

4.1. General remarks

We have demonstrated by examples that our method performs better than a linear correlation classifier. In contrast to MDS, only a linear system of equations has to be solved, and the computational effort increases only linearly with the number of observations. This makes our approach very fast. For example, the analysis presented in Fig. 2 took about 15 min on a workstation (DEC Alpha 500, 500 MHz). Our technique is particularly well suited for the investigation of spatio-temporal processes, as in multiple-site cortical recordings. Although we have applied it here to parallel signals from five electrodes in monkey visual cortex, it will work similarly well for 100 parallel recordings. Using more centers for the RBF analyses improves the discriminability but also increases the computational effort (see Section 3.1). Under very weak conditions on the geometry of the centers a unique solution for the coefficients exists (Powell, 1992). This requires that all centers are distinct, which is generally fulfilled in high-dimensional spaces due to the relatively low numbers of data samples available in neural recordings. However,
with low dimensions (< 4) this may pose problems; but low-dimensional problems may not require dimension reduction. We have restricted our method to two-class problems. This has the advantage that the choice of the indicator values (here +1, −1) has no influence. When using RBF for n-class problems the target space should be (n − 1)-dimensional, in order to preclude that the classification result is influenced by the choice of the indicator values (see also Section 2.2). In a four-class problem, for example, the indicator values should be the corners of a 3-dimensional tetrahedron. However, a huge number of questions asked in neuroscience are two-class problems, i.e. two states have to be differentiated. Although RBF is a globally working method, its behavior is local, which is probably the reason for its robustness against noise and outliers. Our method is nonparametric, i.e. independent of the underlying distributions, and it takes higher-order moments into account. In addition, it does not need large data sets for reliable separation, and it is practical for handling extensive ones. Instead of using RBF for classification, it is also possible to estimate the distributions directly by kernel density estimation, which can be interpreted as a special form of RBF approximation (Traven, 1991; Girosi et al., 1995). Kernel density estimation has the advantage that all data can be used at once, instead of dividing them into learning and test sets as is necessary in our approach. However, adapting the density estimation to the data leads in general to a difficult nonlinear minimization problem, and the subsequent discrimination has to be done in multidimensional space; both steps are computationally expensive.

4.2. Choice of the basis function

Discrimination performance relies largely on the choice of the basis function used for composing the RBF. In the presented examples, we found that the multiquadric function led to a better segregation than the Gaussian, which is commonly used in RBF networks (Bishop, 1995; Poggio and Girosi, 1990). To date, it is not clear to what extent the original data relations, including the degree of overlap of the multidimensional distributions, will be preserved after a nonlinear RBF transformation. In this context, it remains to be investigated thoroughly how to choose the proper RBF for a given set of data.

4.3. Size of the training set

For our recordings from the visual cortex we found that 25 prototypes (response trials) from each of the two classes (object vs. no object) were usually sufficient for significant segregations. It may well be appropriate to use more prototypes from each class for discrimination, because an adequate number depends to a considerable degree on the data, particularly on the complexity of the underlying distributions. In our examples, the signals contain sufficient relevant stimulus-related components for using a low number of RBFs for learning and good discrimination. We used identical numbers of learned response trials for the two classes. This balance is neither a restriction nor a necessary condition. However, if the numbers of prototypes and the statistical quality of the signals (e.g. their content of relevant signal components, often quantified as signal-to-noise ratio) are considerably different in the two classes, one has to adjust the split into learning sets correspondingly. This means that data with a lower content of relevant components will require more prototypes for learning.

4.4. AROC as discrimination measure

Using the AROC as a measure of separation can lead to misinterpretations. For example, in Sections 3.2 and 3.3 we have shown that before stimulus-onset the AROC values vary around 0.5. This is always the case if the two distributions are identical and Gaussian, for which the ROC lies on the diagonal. However, with arbitrary distributions the ROC can differ from the diagonal while the corresponding AROC value is still 0.5. In such cases, it is useful to compute the area relative to the chance level. Instead of the ROC analysis, our RBF approach is open to a large variety of other discrimination measures or tests, e.g. mutual information (Shannon, 1948) or χ², at low computational cost.

4.5. Signal components relevant for discrimination

In the example calculations, we applied our method exclusively to intra-cortically recorded LFP, which are amplitude-continuous data recorded discretely in time according to the sampling theorem. Hence, our method can also be applied to any other type of amplitude-continuous neural (and other) data, including EEG, MEG, and envelope conversions of multiple- and single-unit spike trains. Detecting a difference with RBF–ROC does not include an identification of the recording channels and signal components responsible for the separation. However, with preprocessing of the data we can find out indirectly which features are relevant for the segregation, if we extract different components prior to the RBF–ROC analyses and search for those giving the best discriminability. For example, in Section 3.2 we have analyzed visual cortical signals evoked by two different stimuli. We found that both recording position and signal response dynamics influence the separability of
the data, and we determined their contributions quantitatively (see Fig. 3). The results show that discrimination is particularly supported by certain recording positions, and how it evolves over post-stimulus time. In addition, we demonstrated that channel interactions (present in mutual correlations), even if they are not due to common components phase-locked to stimulus events, can increase the sensitivity for discrimination (Section 3.3). Moreover, these interactions contribute to segregation even if they are highly nonlinear. Such multi-channel effects of interactions are due to couplings in the neural network and can, therefore, not be captured in repetitive single-channel analyses, even if all single-channel analyses are carried out. Thus, the inclusion of simultaneously recorded, non-averaged multi-channel data, which is possible with our method, reveals new effects caused by global (high-dimensional) interactions within the neural networks that cannot be seen at the local single-channel level.
4.6. Single-trial classification

Without modifications our approach can also be used for single-trial classification. This is relevant, for example, with recordings from sensory systems, because perceptual decisions are generally made on the basis of single stimulus presentations (not on many identical repetitions). However, reliable single-trial classification requires that the recorded single-trial responses contain the relevant signal components for discrimination at a sufficiently good signal-to-noise level. This is fulfilled the better, the higher the mutual correlations of the relevant signal components in the simultaneous recordings are (because this redundancy improves their overall signal-to-noise level). Additional improvements in the relative content of relevant components are possible by choosing an optimal duration and post-stimulus time for the analysis window and the ROC threshold. We did this in our example recordings (e.g. Fig. 2) by concentrating the RBF analysis around the 160 ms post-stimulus epoch and choosing an appropriate threshold for the ROC analysis. In this way we were able to assign all single-trial responses correctly to the two stimuli (object vs. no object; note that the AROC at 160 ms is close to 1 for every random training set). However, for such high-sensitivity discriminations the choice of the RBF plays an even more important role compared to the easier situations in which all mapped response trials are used for a ROC analysis.

4.7. Stimulus discrimination without knowledge of stimulus times

In our analyses we kept the temporal stimulus-response relation fixed, i.e. the RBF–ROC segregation of the two stimulation classes has been calculated relative to stimulus-onset (which is known to the experimenter). However, the visual system has no direct access to stimulus timing. Instead, for perception it has to derive stimulus events exclusively from its single-trial neural activities. Our analysis is able to indicate the presence and timing of two stimuli on the basis of continuous recordings under the following restrictions: (i) the signal components relevant for segregation are present in the recordings at a sufficient signal-to-noise ratio, as discussed above for single-trial analyses; (ii) initial learning of the RBF was possible with known reference to stimulus-onset and stimulus class. In a consecutive sliding-window analysis the direction and time of the crossing of the ROC threshold will then enable the estimation of the stimulus-onset times and the stimulus class.

5. Conclusions

Our RBF–ROC method is particularly sensitive in the discrimination of sensory stimuli on the basis of simultaneous multiple-site recordings if several of the recording channels contain correlations with the stimulus. Even if these correlations are due to signal components not phase-locked to stimulus events, they will contribute to discrimination. We demonstrated this high sensitivity with multiple-site recordings from monkey visual cortex by showing that the use of rather short analysis windows is possible, so that the segregation dynamics are captured at good temporal resolution. In addition, with these recordings already five channels were sufficient to discriminate two stimuli from single-trial recordings with a high degree of certainty, while the computational efforts were rather low.
Acknowledgements We thank Professor M. Buhmann, A. Bruns and A. Gabriel for inspiring discussions and helpful comments and H.J. Brinksmeyer and A. Gail for kindly providing the experimental data.
References

Barrett R, Berry M, Chan TF, Demmel J, Donato JM, Dongarra J, Eijkhout V, Pozo R, Romine C, van der Vorst H. Templates for the Solution of Linear Systems: Building Blocks for Iterative Methods. Philadelphia: SIAM, 1994.
Beatson RK, Cherrie JB, Mouat CT. Fast fitting of radial basis functions: methods based on preconditioned GMRES iteration. Adv Computation Math 1999;11:253–70.
Becker JD, Krüger J. Recognition of visual stimuli from multiple neuronal activity in monkey visual cortex. Biol Cybern 1996;74:287–98.
Bishop MC. Neural Networks for Pattern Recognition. Oxford: Clarendon Press, 1995.
Borg I, Groenen P. Modern Multidimensional Scaling: Theory and Applications. New York: Springer, 1997.
Buhmann MD. Radial basis functions. Acta Numerica 2000:1–38.
Carreira-Perpinan MA. A review of dimension reduction techniques. Technical report CS-96-09, Department of Computer Science, University of Sheffield, 1997.
Egan J. Signal Detection Theory and ROC Analysis in Pattern Recognition. New York: Academic Press, 1975.
Evgeniou T, Pontil M, Poggio T. Regularization networks and support vector machines. Adv Computation Math 2000;13(1):1–50.
Gail A, Brinksmeyer HJ, Eckhorn R. Contour decouples gamma activity across texture representation in monkey striate cortex. Cerebral Cortex 2000;10:840–50.
Gammerman A, Vovk V, Vapnik V. Learning by transduction. In: Proceedings of Uncertainty in AI, Madison, Wisconsin, 1998:148–55.
Girosi F, Jones M, Poggio T. Regularization theory and neural networks architectures. Neural Computation 1995;7:219–69.
Green DM, Swets JA. Signal Detection Theory and Psychophysics. New York: Wiley, 1966 (reprinted by Krieger, Huntington, NY).
Hotelling H. Relations between two sets of variates. Biometrika 1936;28:321–77.
Jolliffe IT. Principal Component Analysis. New York: Springer, 1986.
Lowe D. Radial basis function networks. In: Arbib MA, editor. The Handbook of Brain Theory and Neural Networks. Cambridge, MA: MIT Press, 1995.
Micchelli CA. Interpolation of scattered data: distance matrices and conditionally positive definite functions. Constr Approx 1986;2:11–22.
Oja E. Neural networks, principal components, and subspaces. Int J Neural Syst 1989;1:61–8.
Orr M. Optimising the widths of radial basis functions. Fifth Brazilian Symposium on Neural Networks, Belo Horizonte, Brazil, 1998.
Palm G, Aertsen AMHJ, Gerstein GL. On the significance of correlations among neuronal spike trains. Biol Cybern 1988;59:1–11.
Perkel DH, Gerstein GL, Moore GP. Neuronal spike trains and stochastic point processes. II. Simultaneous spike trains. Biophysical J 1967;7:419–40.
Poggio T, Girosi F. Networks for approximation and learning. Proc IEEE 1990;78:1481–97.
Powell MJD. The theory of radial basis function approximation in 1990. In: Light W, editor. Advances in Numerical Analysis II: Wavelets, Subdivision Algorithms and Radial Functions. Oxford: Oxford University Press, 1992:105–210.
Sammon JW, Jr. A nonlinear mapping for data structure analysis. IEEE Trans Pattern Anal Machine Intelligence 1969;11(3):401–9.
Schoenberg IJ. Metric spaces and completely monotone functions. Ann Math 1938;39(4):811–41.
Shannon CE. A mathematical theory of communication. Bell Syst Tech J 1948;27:379–423.
Suppes P, Lu Z, Han B. Brain wave recognition of words. Proc Natl Acad Sci USA 1997;94:14965–9.
Suppes P, Han B, Epelboim J, Lu Z. Invariance of brain-wave representations of simple visual images and their names. Proc Natl Acad Sci USA 1998;96:14658–63.
Traven HGC. A neural network approach to statistical pattern classification by semiparametric estimation of probability density functions. IEEE Trans Neural Networks 1991;2(3):366–77.
Zohary E, Hillman P, Hochstein S. Time course of perceptual and single neuron reliability. Biol Cybern 1990;62:475–86.