Cognitive Brain Research 22 (2004) 1 – 12 www.elsevier.com/locate/cogbrainres
Research report
Relationship between stimulus complexity and neuronal activity in the inferotemporal cortex of the macaque monkey
Gy. Sáry a, Z. Chadaide b, T. Tompa a, Gy. Kovács c, K. Köteles a, K. Boda d, L. Raduly e, Gy. Benedek a,*
a Department of Physiology, University of Szeged, H-6720, Dóm tér 10, Szeged, Hungary
b Neurology Department, University of Szeged, H-6720, Semmelweis u. 6. Szeged, Hungary
c Budapest University of Technology and Economics, Center for Cognitive Sciences, Budapest, Műegyetem rkp 9 R/203, Hungary
d Department of Medical Informatics, University of Szeged, H-6720, Korányi fasor 9, Szeged, Hungary
e Department of Psychiatry, University of Medicine and Pharmacy, Tirgu-Mures, Romania
Accepted 29 June 2004 Available online 11 September 2004
Abstract
The single-unit activity of 217 cells was recorded from the inferotemporal cortex (IT) of two awake macaque monkeys while they performed a fixation task. The stimuli were coloured geometrical shapes or coloured representations of natural or artificial objects. To determine whether the stimuli could be separated into groups on the basis of neuronal population behaviour, the responses to the images were analysed by factor analysis and cluster analysis. Each analysis showed that, on the basis of the neuronal responses, the stimulus set could be separated into two groups, despite the lack of a difference in the mean response rate to them. Similar groups were formed when only the first part of the responses was analysed. The results suggest a differential coding of the images of simple geometrical shapes and of the images of complex, real (photographic) objects. We found significant differences between the two stimulus groups in physical features other than size or luminance. Our results suggest that the same neurone population might respond differently to simple and complex images in the first 150 ms of their responses. The differences might be attributed to "non-obvious" physical features of the stimuli, such as the amount of internal lines in the images, the colourfulness and the length of the perimeter of the stimuli. © 2004 Elsevier B.V. All rights reserved.
Theme: Sensory systems
Topic: Visual cortex: extrastriate
Keywords: Stimulus coding; IT cortex; Object vision; Selectivity
1. Introduction
The cells in the inferotemporal cortex (IT) of the monkey brain have been investigated extensively (for reviews see Refs. [7,27,42]). This area of the ventral visual stream contains cells responding to complex stimuli [8,2,43,6,18,38,12]. Moreover, these cells display remarkable stimulus selectivity and various degrees of invariance to image
* Corresponding author. Tel.: +36 62 545 101; fax: +36 62 545 842. E-mail address:
[email protected] (G. Benedek). 0926-6410/$ - see front matter © 2004 Elsevier B.V. All rights reserved. doi:10.1016/j.cogbrainres.2004.06.015
transformations, such as size and position [36], partial occlusion [21] or defining cue [33]. Previous studies have indicated that IT neurones respond differentially to complex visual stimuli [8,2,31,4]. It has also been shown that there is a complexity gradient in the IT, i.e., the more anterior regions respond to more complex features [43,42]. Tanaka et al. [43] suggested that cells are clustered in cellular columns (modules) in the IT, containing cells selective for moderately complex features. They tested this idea by gradually reducing the images in their experiments to the simplest configuration, the critical features,
G. Sáry et al. / Cognitive Brain Research 22 (2004) 1–12
which could still drive the neurons. They hypothesized that general classes of objects are represented not by the activity of a single cell, but by the activity across different IT modules. Detailed discrimination would require the detection of small differences in the activity of the neurones within the modules. Thus, IT cells form ensembles, involving different modules depending on the visual stimulus. New stimuli require a new recruitment of modules, which gives an infinite variability for coding novel stimuli using a relatively small number of modules. Op de Beeck et al. [28] used parameterised stimuli to show that IT neurones represent similar shapes in a metrically biased but ordinally faithful way. Furthermore, Sigala and Logothetis [39] suggested that stimulus features important for visual categorization, for instance, are represented in the activity of single units in the primate IT cortex. In their opinion, neurones in the IT signal the diagnostic features for categorization via their firing rate, and are thus especially selective for them. During our experiments we presented 20 non-parameterised images, similar in size and luminance but differing in complexity (Fig. 1), to behaving monkeys. The stimuli were either geometrical shapes (GI) or photos of real objects (RI). The monkeys performed a fixation task while neuronal activity was recorded from the IT. The responses were subjected to cluster analysis and factor analysis, and the images were analysed with regard to their physical features. Our aim was to provide evidence, with a set of statistical measures, for the hypothesis that the simple changes in the
neuronal firing rate are not the only way for the neurons in the IT to code similarity. We tried to establish different kinds of physical measures to find which one (or more) of them can account for the similarities of the responses. The possibility of a semantic influence on the neuronal activity was also considered. We suggest that, within the frame of the current experimental paradigm, the strongest influence is likely to be of a physical nature; in some cases, however, a semantic influence (probably in the form of feedback from higher areas) cannot be ruled out. We also show that similarity information can be derived from the variances of the neuronal firing, a kind of information that is not evidently present in the firing rate. Our results suggest that a given neuronal population in the IT may respond to simple and complex stimuli differently, and that the clustering of the neuronal responses to them reflects the similarity between images. The present study is partly based on the further analysis of a previously recorded neuron set [22,45,46].
2. Methods
Two male macaques (Macaca mulatta, weighing 5.5 and 6.3 kg, respectively) served as subjects. Prior to the recordings, the monkeys underwent two aseptic operations. In the first, a stainless steel head post was attached to the skull and a scleral search coil was implanted in the eye [15]. After recovery, a second surgery was performed to place
Fig. 1. Stimuli used in the study. Images #1–11 are geometric shapes (GI) while images #12–20 are photos of real-world objects (RI). It should be noted that these images serve for illustration purposes only and are black and white versions of the ones used in this study.
a recording chamber on the skull over the IT. The coordinates of the chamber position were determined by using MRI images and a stereotaxic atlas of the monkey brain [30]. The monkeys subsequently worked under a controlled water access paradigm in daily recording sessions. Body weights were recorded weekly. At the weekends, the animals were allowed to drink water ad libitum, and fruits and vegetables were added to their diet. All procedures conformed to the guidelines of the National Institutes of Health Guide for the Care and Use of Laboratory Animals and the guidelines of the University of Szeged Animal Care and Use Committee. A PC-based data acquisition system was used to control the trials, record spike times (sampling rate: 1 kHz), record eye position (sampling rate: 200 Hz) and present the stimuli. Extracellular single-unit activity was recorded with parylene-coated tungsten microelectrodes having an impedance of 1–3 MΩ (FHC). Cellular responses were amplified, band-pass filtered (FHC) and fed into a window discriminator (FHC), a spike sorter (SPS), an audio monitor and an oscilloscope for monitoring. The triggered spikes were sampled and displayed online as a raster histogram and a peristimulus time histogram. Stimuli were presented on a standard computer screen (17 in. in diameter, refresh rate: 74 Hz) on a uniform grey background square (sides: 18°; luminance = 8 cd/m²) positioned in the centre of the screen, 57 cm from the monkeys. A set of 20 colour stimuli was used (Fig. 1). Eleven of the figures were simple geometrical images (GI) filled with a coloured, textured pattern, created by commercial image-processing software. The stimuli occupied the same area (6×5°) and had a mean luminance of 7.9 cd/m². The other 9 images were photos of complex, natural or artificial real objects (RI) having similar areas and luminances, chosen randomly from the image pool of the laboratory.
Stimuli were presented centrally during the fixation of a small blue fixation spot (0.1° radius and 5.5 cd/m² luminance) that remained on screen throughout the trial. A fixation paradigm was used during the recording sessions. Initially, the screen was black. Trials started with
the presentation of the fixation spot. If the animal fixated the fixation spot, the uniform grey background pattern was presented for 500 ms, after which the stimulus appeared for another 500 ms. Stimuli were presented in a semirandom fashion, each at least four times. Only cells responding to at least one of the stimuli were collected and analysed in this study. Animals were rewarded for maintaining fixation within a 0.8×0.8° square window until stimulus offset. If they left the fixation window before stimulus offset, the trial was aborted and excluded from further analysis. The inter-trial interval was 1000 ms. The response magnitude in the individual trials was taken as the difference in firing rates between the 500 ms period before the stimulus onset (baseline) and a 500 ms period during stimulus presentation, starting 50 ms after the stimulus onset. At the end of the experiment, penetrations were made into the cortex with metal reference wires at selected coordinates of the recording chamber. After a lethal overdose of barbiturate, the animals were perfused with 0.9% saline in 0.1 M phosphate buffer, and then with a solution containing 4% paraformaldehyde and 0.4% glutaraldehyde in 0.1 M phosphate buffer. The brains were removed and left in sucrose overnight. Frozen sections (30–50 µm thick) were cut from the temporal lobe, and the penetration sites were identified. Histological evaluation showed that the recordings were performed in the lower bank of the superior temporal sulcus and in TEa, at stereotaxic coordinates around anterior 17 (Monkey I) and anterior 12 (Monkey II), in a relatively small anteroposterior range (Fig. 2).
2.1. Statistics
The data files for the analysis contained 20 variables (columns) of the net responses to the 20 images (in spikes/s) and 117 and 100 rows corresponding to cells from Monkey I and Monkey II, respectively. The data from the two animals were analysed separately.
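The net response defined above (firing rate in a 500 ms window starting 50 ms after stimulus onset, minus the firing rate in the 500 ms before onset) can be illustrated with a minimal sketch. The function name, the argument names and the representation of a trial as a list of spike times in milliseconds are assumptions made for illustration only; they are not taken from the acquisition software used in the study.

```python
def net_response(spike_times_ms, onset_ms, window_ms=500.0, delay_ms=50.0):
    """Net response (spikes/s) for one trial.

    spike_times_ms: spike times in ms on the trial clock (assumed format).
    onset_ms: stimulus onset on the same clock.
    """
    # Baseline: the 500 ms immediately preceding stimulus onset.
    base_start, base_end = onset_ms - window_ms, onset_ms
    # Response: a 500 ms window that starts 50 ms after stimulus onset.
    resp_start = onset_ms + delay_ms
    resp_end = resp_start + window_ms

    n_base = sum(base_start <= t < base_end for t in spike_times_ms)
    n_resp = sum(resp_start <= t < resp_end for t in spike_times_ms)

    # Convert the difference in spike counts to spikes per second.
    return (n_resp - n_base) * 1000.0 / window_ms


# Hypothetical trial: 2 baseline spikes, 5 spikes in the response window.
spikes = [100, 300, 520, 560, 600, 700, 900, 1040]
print(net_response(spikes, onset_ms=500))  # 6.0
```

Averaging such values over the repetitions of each image would give one row (cell) of the 20-column data matrix described in Section 2.1; negative values correspond to cells whose firing decreased during stimulation.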
Differences in all statistical procedures were considered significant at a probability
Fig. 2. Recording sites in monkey I. (A) Lateral view of the left hemisphere. The range of the anterio-posterior recording positions is shown by the two vertical lines. (B) Coronal section at AP 17 mm. The lines indicate the tracks of two reconstructed penetrations. Most neurons were recorded from penetrations lateral to the more medial reconstructed penetration. AMTS: anterior middle temporal sulcus, RS: rhinal sulcus, STS: superior temporal sulcus.
level of p<0.05 where applicable. Our analysis involved the following tests.
2.2. Cluster analysis (Ward method)
Cluster analysis is used to find meaningful patterns in data [16,29] and has a wide range of applications, from clinical research [17,44] to the analysis of neuronal systems [34,23,25]. It has frequently been used to analyse data obtained from sensory systems [11,37,13], and has also been used to analyse stimulus features in predator behaviour [35] and the similarities in neuronal representations of shapes in the IT [28]. Cluster analysis may be used to classify a set of variables into groups (clusters) on the basis of their similarity (i.e., correlation, variance) or distance. Hierarchical cluster methods handle each variable as a single cluster in the first step. In each step, the two most similar clusters (variables) are collapsed, and similarities are recomputed. This process can be illustrated by a hierarchical tree or dendrogram. From the several methods available, we used the Ward method, which uses an analysis of variance approach to evaluate the distances between clusters. This method attempts to minimize the sum of squares of any two (hypothetical) clusters that can be formed in each step. The method is regarded as a very efficient one; however, it tends to create clusters that are small in size. Several definitions for distances can be applied; in this study, the squared Euclidean distance definition was used.
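The agglomeration procedure described above can be sketched in a few lines. The following is a minimal illustration of Ward-type merging with squared Euclidean distances, treating each stimulus (column of the data matrix) as a point whose coordinates are the responses of the recorded cells. It is not the statistical package used in the study; the function name and the toy data are assumptions, and real packages may differ in details such as tie handling.

```python
def ward_clusters(points, k=2):
    """Merge `points` (equal-length vectors) into k clusters, Ward style.

    At each step the pair of clusters whose merger gives the smallest
    increase in the within-cluster sum of squares is collapsed.
    """
    # Each cluster is (member indices, centroid).
    clusters = [([i], list(p)) for i, p in enumerate(points)]
    while len(clusters) > k:
        best = None
        for a in range(len(clusters)):
            for b in range(a + 1, len(clusters)):
                (ma, ca), (mb, cb) = clusters[a], clusters[b]
                na, nb = len(ma), len(mb)
                # Squared Euclidean distance between the two centroids.
                d2 = sum((x - y) ** 2 for x, y in zip(ca, cb))
                # Increase in the sum of squares if a and b were merged.
                cost = na * nb / (na + nb) * d2
                if best is None or cost < best[0]:
                    best = (cost, a, b)
        _, a, b = best
        (ma, ca), (mb, cb) = clusters[a], clusters[b]
        na, nb = len(ma), len(mb)
        centroid = [(na * x + nb * y) / (na + nb) for x, y in zip(ca, cb)]
        clusters = [c for i, c in enumerate(clusters) if i not in (a, b)]
        clusters.append((ma + mb, centroid))
    return sorted(sorted(members) for members, _ in clusters)


# Hypothetical net responses: rows are cells, columns are stimuli.
responses = [[10, 12, 2, 1],
             [5, 6, 20, 22],
             [8, 7, 3, 4]]
stimulus_vectors = [list(col) for col in zip(*responses)]
print(ward_clusters(stimulus_vectors, k=2))  # [[0, 1], [2, 3]]
```

In this toy example stimuli 0 and 1 evoke similar response patterns across the three cells, as do stimuli 2 and 3, so they end up in separate main clusters. The indices are 0-based, unlike the image numbering in Fig. 1.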
physical domain. It is known, for example, that the available surface information and the amount of internal lines influence the discrimination and recognition of images [1], i.e., more complex objects, with more internal lines (hence, in this theory, more geons), do not require more time to be recognized than simple objects, and subjects make fewer errors in recognising them. Trying to define complexity by non-obvious physical features, we assigned the following parameters to each stimulus: the surface area (SA), the perimeter length (PL), the total length of all the lines on the perimeter and inside the stimulus (AL), and the perimeter length over the surface area (PL/SA). The number of internal lines within a stimulus has been shown to influence IT neuronal responses [4], so we calculated a measure of internal line density: the total length of all lines over the surface area (AL/SA). Humphrey et al. [14] have shown that colour improves the recognition of natural objects but not of artificial ones. Since our colour stimuli contained both natural and artificial objects, chromatic features were also analysed, and the amount of change of colour information in the images was characterized in two ways. In one method, the change
2.3. Factor analysis
Factor analysis is a useful tool to analyse spike train data [5]. The goal of factor analysis is to identify underlying variables (factors) that explain the pattern of correlations within variables. It involves the assumptions that the data should have a bivariate normal distribution for each pair of variables, and that the observations should be independent. Following a test of normality, the factor analysis was performed with the principal components extraction method and varimax rotation. The number of factors was taken as the number of components with eigenvalues over 1. The above methods do not result in significance levels, but are all useful to describe relationships between variables. If the same or a complementary structure could be derived using different techniques, it would support the reality of the identified structure (i.e., groups of images). Since the techniques differ in method, we consider that, provided the outcomes are similar, together they might lend support to our conclusions.
2.4. Physical features of the images
Since our stimuli did not differ in the obvious features such as mean luminance or surface area (see above), we looked for "non-obvious" characteristics, still in the
Fig. 3. Distribution of the mean responses in Monkey I (A) and Monkey II (B) given to all of the stimuli used among the neurons. Arrows indicate the mean of the distribution.
in energy of hue, saturation and brightness was calculated on the stimulus surfaces in the following way. First, the stimuli were transformed to HSV (hue, saturation, value) images with commercially available software. Next, the differences in the HSV values for neighbouring pixels were calculated along the horizontal and vertical axes of the images. Finally, we took the sum of squares of these values. This procedure resulted in three values for each image, as an expression of how much variability there is on the surface in the hue, saturation and brightness domains, respectively. We also used a "colourfulness index" [41] for every image. Every pixel in the RGB images was given three values, according to the intensity of the red, green and blue components. Then the following formula was used:

$$\text{Colourfulness} = \left\{ \sum_{p} \left[ (\text{red} - \text{mean})^2 + (\text{green} - \text{mean})^2 + (\text{blue} - \text{mean})^2 \right] \right\} \Big/ M$$

where the sum runs over the pixels p, mean was (red+green+blue)/3 and M was the image size in pixels (720*540). This procedure resulted in a single value, a "colourfulness index", for every image.
2.5. Sparseness index, selectivity
To quantify the stimulus selectivity of the tested neurons we used the sparseness (SP) measure introduced in Ref. [32]. This is a measure of the proportion of effective stimuli based on the response to each of the 20 stimuli. The sparseness indicates the length of the tail of the distribution of the net firing rates for the different stimuli [49]. Low values indicate a long tail of the distribution, with only a few stimuli eliciting high response rates. SP for n stimuli is computed by using the following formula:

$$SP = \left[ \sum_{i=1}^{n} (R_i / n) \right]^2 \Bigg/ \left[ \sum_{i=1}^{n} (R_i^2 / n) \right]$$

where $R_i$ is the response to the ith stimulus of a stimulus set containing n stimuli. SP may range up to 1.0, the value reached when a neuron responds equally to all of the stimuli in the stimulus set. It should be noted that, when calculating SP, negative net responses were clipped to zero.
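The two chromatic measures of Section 2.4 (the "change in energy" in the HSV domains and the colourfulness index) can be illustrated with a short sketch. It assumes that an image is a rectangular list of rows of (red, green, blue) tuples scaled to the range 0 to 1; the intensity scale used in the study is not stated, so absolute values from this sketch are not comparable with the reported ones. Hue is treated here as an ordinary number, ignoring its circular nature, which is also an assumption.

```python
import colorsys


def change_in_energy(image):
    """Sum of squared neighbour differences in the H, S and V planes.

    image: list of rows, each row a list of (r, g, b) tuples in [0, 1]
    (assumed format). Returns (hue, saturation, value) energies.
    """
    hsv = [[colorsys.rgb_to_hsv(*px) for px in row] for row in image]
    energy = [0.0, 0.0, 0.0]
    for y, row in enumerate(hsv):
        for x, px in enumerate(row):
            for c in range(3):
                if x + 1 < len(row):   # horizontal neighbour
                    energy[c] += (row[x + 1][c] - px[c]) ** 2
                if y + 1 < len(hsv):   # vertical neighbour
                    energy[c] += (hsv[y + 1][x][c] - px[c]) ** 2
    return tuple(energy)


def colourfulness(image):
    """Mean over pixels of the squared deviations of R, G, B from their mean."""
    total, n_pixels = 0.0, 0
    for row in image:
        for r, g, b in row:
            mean = (r + g + b) / 3.0
            total += (r - mean) ** 2 + (g - mean) ** 2 + (b - mean) ** 2
            n_pixels += 1
    return total / n_pixels


# Hypothetical 1x2 image: one saturated red pixel next to a mid-grey pixel.
img = [[(1.0, 0.0, 0.0), (0.5, 0.5, 0.5)]]
print(change_in_energy(img))  # (0.0, 1.0, 0.25)
print(colourfulness(img))
```

A uniformly grey image gives zero for all four values, whereas saturated or strongly varying surfaces increase them.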
Fig. 4. Peristimulus time histograms of a cell (KAZ009), recorded from IT. Each window contains the responses given to one of the 20 stimuli arranged in the same matrix as the stimuli in Fig. 1. Dotted lines indicate stimulus onset. Notes indicate the firing rates in a 500 ms time window, starting at 50 ms after stimulus onset.
optimal case, at the end of a block of trials one would have several "candidates" for the latency values. The time across the trials at which this change was most likely to occur was given by the mode of the "time candidates" and was taken as the latency value of the response to visual stimulation. In those cases where for some reason there was no latency "candidate", the trial was not included.
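The final step of the latency estimate, taking the mode of the candidate times collected across trials, can be sketched as follows. The detection of the candidate times themselves (the Poisson spike train analysis) is not reproduced here. The function name, the input format and the rule of taking the earliest value when several times are equally frequent are assumptions made for illustration.

```python
from statistics import multimode


def response_latency(candidates_per_trial):
    """Latency (ms) as the mode of the pooled candidate times.

    candidates_per_trial: one list of candidate times (ms after stimulus
    onset) per trial; a trial without candidates is an empty list and
    therefore contributes nothing.
    """
    pooled = [t for trial in candidates_per_trial for t in trial]
    if not pooled:
        return None  # no candidate in any trial
    # Assumption: if several times are equally frequent, take the earliest.
    return min(multimode(pooled))


# Hypothetical block of five trials (times in ms, 1 ms resolution).
trials = [[112, 240], [], [112, 130], [115, 240], [112]]
print(response_latency(trials))  # 112
```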
3. Results
We present data on 217 cells, 117 from Monkey I and 100 from Monkey II.
3.1. General findings
Fig. 5. Distribution of the response latency data for Monkey I (A) and Monkey II (B), respectively. Arrows indicate median of the distribution.
Also, the selectivity indices (SI) were calculated for each neuron, using the following formula:

$$SI = \frac{R_{max} - R_{min}}{R_{max} + R_{min}}$$

where $R_{max}$ is the maximal and $R_{min}$ is the minimal response to a stimulus from the stimulus set. The closer this index is to 1, the more selective the cell is, i.e., the bigger the difference between a "preferred" and a "non-preferred" stimulus.
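As an illustration of the two selectivity measures of Section 2.5, the sketch below computes SP and SI from the net responses of one cell to a stimulus set. The function names are hypothetical. Negative net responses are clipped to zero for SP, as stated in the text; for SI the text does not say how negative responses were handled, so the raw values are used here, which is an assumption.

```python
def sparseness(responses):
    """SP = (mean of R_i)^2 / (mean of R_i^2), negative responses clipped."""
    clipped = [max(r, 0.0) for r in responses]
    n = len(clipped)
    mean_sq = sum(r * r for r in clipped) / n
    if mean_sq == 0:
        return None  # undefined when the cell gave no positive response
    mean = sum(clipped) / n
    return mean * mean / mean_sq


def selectivity_index(responses):
    """SI = (R_max - R_min) / (R_max + R_min) over the stimulus set."""
    r_max, r_min = max(responses), min(responses)
    if r_max + r_min == 0:
        return None  # undefined
    return (r_max - r_min) / (r_max + r_min)


# Hypothetical net responses (spikes/s) to a four-image stimulus set.
print(sparseness([4.0, 4.0, 4.0, 4.0]))   # 1.0  (equal response to all)
print(sparseness([8.0, -3.0, 0.0, 0.0]))  # 0.25 (one effective stimulus)
print(selectivity_index([9.0, 3.0]))      # 0.5
```

The first two calls show the two extremes described in the text: SP reaches 1.0 when all stimuli are equally effective and falls towards 1/n when only one stimulus drives the cell.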
The average firing rate was 8.15 spikes/s for Monkey I and 11.6 spikes/s for Monkey II, which differed significantly from the baseline. No differences were found between the firing rates in response to GI and RI. The values for GI and RI were 7.9 spikes/s (S.D. ±10.3) and 8.4 spikes/s (S.D. ±6.4) for Monkey I, and 11.9 spikes/s (S.D. ±12.5) and 11.4 spikes/s (S.D. ±10.4) for Monkey II, respectively. The distribution of the responses is shown in Fig. 3. The response levels are low, since many neurons decreased their firing rate in response to the stimuli we used. An example of the activity of an IT neuron is shown in Fig. 4. This cell had a mean firing rate of 23.9 spikes/s and there was no difference between the firing rates in response to GI and RI (t-test, p=0.22). The mean of the response latency values in Monkey I was 114 ms (S.D. ±21), identical to that in Monkey II: 114 ms (S.D. ±16), t-test, p=0.99. In neither of the monkeys did we observe significant differences in latency between the responses to GI and RI (t-test, p=0.93 and p=0.89 for Monkey I and Monkey II, respectively). The distribution of the latency times is shown in Fig. 5.
Table 1
Response segment                   Animal  Proximity      Method  Clusters containing number of images
The whole (500 ms) response        M I.    Sq. Euclidean  Ward    (1, 5, 2, 9, 8, 11, 3, 10, 4, 7, 6)–(12, 15, 20, 19, 13, 18, 14, 16, 17)
The whole (500 ms) response        M II.   Sq. Euclidean  Ward    (1, 5, 11, 2, 6, 8, 7, 3, 10, 4, 9)–(12, 15, 19, 17, 14, 16, 20, 13, 18)
First part (150 ms) of the response  M I.    Sq. Euclidean  Ward    (1, 5, 11, 2, 8, 7, 3, 10, 9, 4, 6)–(12, 18, 13, 12, 16, 15, 19, 20, 17)
First part (150 ms) of the response  M II.   Sq. Euclidean  Ward    (1, 5, 11, 2, 7, 6, 8, 3, 16, 10, 4, 9)–(12, 15, 19, 20, 17, 13, 18, 14)
Rest (350 ms) of the response      M I.    Sq. Euclidean  Ward    (1, 5, 8, 4, 3, 9, 18, 2, 11, 10)–(6, 7, 15, 13, 19, 20, 14, 16, 12, 17)
Rest (350 ms) of the response      M II.   Sq. Euclidean  Ward    (1, 7, 6, 8, 20, 2, 18, 17, 16, 11, 15, 3, 4, 5, 9)–(10, 13, 19, 12, 14)

2.6. Latency times
A Poisson spike train analysis was used to detect and measure the latency of significant changes in activity in the epoch following target onset. In this method the actual number of spikes is compared to the number of spikes predicted by the Poisson distribution given the mean discharge rate of the cell. If the distribution of spikes is non-random in the spike train being analysed, the Poisson spike train analysis determines the times of significant changes in the spike train [10,24]. There might be several times of modulation in a single trial. For each trial, times of significant neuronal modulation were collected. In an
Fig. 6. Result of the cluster analysis in Monkey I. The dendrogram was made by using squared Euclidean distances (horizontal axis) and the Ward method. The numbers on the dendrogram represent the stimuli and correspond to the images in Fig. 1. In the upper half, stimuli #1–11 (geometrical shapes) are clustered, while the real-world images are in the bottom half.
3.2. Results of cluster analysis
Table 1 and Figs. 6 and 7 show the different cluster solutions for Monkey I and Monkey II, respectively. A common property of the clusters is that generally images #1–11 and images #12–20 are "close" to each other and form two main clusters. The division between the two clusters corresponds to the GI and RI stimuli. In the case of Monkey I, the separation follows the stimulus set: each of the two main clusters contains only GI or only RI stimuli. Also, in the case of Monkey II, GI and RI images are completely separated. Interestingly, some images form pairs in both monkeys: stimuli #13 and #18 (the cross and the torso-like figurine) are next to each other in the dendrogram, as are stimuli #1 and #5 (the triangle and the square), stimuli #3 and #10 (the star and the building blocks), and stimuli #14 and #16 (the cactus and the chalice).
Fig. 7. Results of the cluster analysis in Monkey II. The dendrogram was made by the same method as in Fig. 6. The numbers represent the stimuli and correspond to the images in Fig. 1. In the upper half, stimuli #1–11 (geometrical shapes) are clustered; the real-world images are in the bottom half.
Table 2
Factors  Stimulus number
1        15, 12, 13, 20, 17, 18, 19, 14
2        10, 11, 8, 1
3        4, 9, 16, 2, 5
4        7
5        6, 3
To see which portion of the cellular responses carries the information for clustering, the responses were divided into two segments, the first containing the first 150 ms of the response, and the second the remainder (the stimulus exposure time was 500 ms). We then repeated the cluster analysis for the two segments separately in both monkeys. We found that the first part gave consistent results in both animals, similar to the original observation, i.e., GI and RI formed separate clusters (Table 1), whereas the second part of the response did not display the same clustering. The clusters contained a mixture of GI and RI and the two image groups were no longer separated.
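The division of a response into an early and a late segment can be sketched as follows. The text does not state whether the 150 ms segment was counted from stimulus onset or from the start of the response window, nor how the baseline was handled for the segments; the sketch assumes that both segments are measured from stimulus onset and returns raw firing rates. The function and argument names are hypothetical.

```python
def segment_rates(spike_times_ms, onset_ms, split_ms=150.0, duration_ms=500.0):
    """Firing rates (spikes/s) in the early and late part of a response.

    Early segment: [onset, onset + split_ms)
    Late segment:  [onset + split_ms, onset + duration_ms)
    Both are measured from stimulus onset (an assumption).
    """
    split = onset_ms + split_ms
    end = onset_ms + duration_ms
    n_early = sum(onset_ms <= t < split for t in spike_times_ms)
    n_late = sum(split <= t < end for t in spike_times_ms)
    early_rate = n_early * 1000.0 / split_ms
    late_rate = n_late * 1000.0 / (duration_ms - split_ms)
    return early_rate, late_rate


# Hypothetical trial with stimulus onset at 500 ms on the trial clock.
early, late = segment_rates([510, 560, 640, 700, 900, 1100], onset_ms=500)
print(early)  # 20.0 (3 spikes in 150 ms)
print(late)   # 2 spikes in 350 ms
```

Two data matrices built from such segment values, one per segment, could then be clustered separately in the same way as the matrix for the whole response.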
We did not find any difference in the "change in energy" in the hue and brightness domains over the surface between GI and RI (hue: 0.3728 and 0.3728, t-test, p=0.24; brightness: 1.1221 and 0.7660, t-test, p=0.27, respectively), but detected a significant difference in the saturation between GI and RI (saturation: 1.3601 and 0.3728, t-test, p=0.04). Analysis of the images in the RGB format, however, revealed that GI and RI differed in the mean colourfulness index (see Section 2). This index ranged from 0.27 to 4.41, and GI on average had a higher index (2.52) than RI (1.45). This difference was significant (t-test, p=0.04).
3.5. Sparseness, selectivity index
Sparseness (SP) and selectivity indices (SI) did not differ between the two animals (Table 4). Also, there were no differences between the SP of GI and RI stimuli. For SI, however, a small but significant difference was found: the indices were larger for GI than for RI (t-test, p=0.04). The distributions of SP and SI are shown in Figs. 9 and 10, respectively.
3.3. Results of the factor analysis
3.3.1. Monkey I
Factor analysis resulted in five factors with eigenvalues over 1, which explained 69.16% of the total variance. The rotated component matrix shows that the following images "form" a new factor (Table 2). Images #1–11 and #12–20 generally did not mix, and GI and RI remained separated, except in factor 3, where image #16 (chalice) was among the GI stimuli.
3.3.2. Monkey II
Factor analysis resulted in six factors with eigenvalues over 1, explaining 72.37% of the total variance. According to the rotated component matrix, the following images "form" a new factor (Table 3). In this animal, some "outliers" were found: stimulus #15 (the duck) was among the GI images, and the last factor was formed by two images belonging to different classes: stimulus #7 belongs to GI, while #20 (the statue) is a member of RI. Other than that, GI and RI were separated.
3.4. Physical features
We grouped our stimuli according to the results of the previous analysis; thus, stimuli #1–11 (GI) and stimuli #12–20 (RI) were put into separate groups. The physical features of the two groups were then compared. The mean surface size (in pixels) was 54440 in GI and 59220 in RI. No significant difference (t-test, p=0.33) was found, which was not surprising, since the stimuli were created in this way. On average, however, RI had longer perimeter lines than GI (805 vs. 633 pixels, t-test, p=0.02), and the total length of lines (perimeter+internal lines) was also greater in RI than in GI (2265 vs. 1357 pixels, p=0.002) (Fig. 8).
3.6. Discussion
Similarities of the responses from 217 cells in the IT given to 20 images were analysed by factor analysis and by cluster analysis. A common result of each method used was that, on the basis of an analysis of the net cellular responses given to our image set, the geometrical shapes could be distinguished from the photos of real-world objects. Tanaka et al. [43] reported that the complexity required to drive the cells in the IT increased as the recording sites were moved from the posterior portion to the anterior part of the IT. Their study suggested that there might be a "complexity gradient" along the IT, where neurones requiring different stimulus complexities are distributed in different anatomical locations. Optimal or non-effective stimuli were selected on the basis of the firing rates, i.e., stimuli were considered effective if they triggered high discharge rates, while non-effective ones caused only moderate increases in the firing rate or inhibition. Here, we report that cells in the IT might code stimuli with different complexities in other ways than just the amplitude of the mean firing rate. The response amplitudes given to GI and RI did not differ; however, we found differences in the
Table 3
Factors  Stimulus number
1        12, 14, 13, 19, 18, 16, 17
2        3, 15, 10
3        8, 6, 2, 5
4        11, 1
5        4, 9
6        7, 20
Fig. 8. Comparison of the physical features of the images used in the study. Since the values spanned several orders of magnitude, they were normalized to the maximum value within their corresponding class and averaged (means ± S.E. are shown). Values on the graph serve for comparison between the two image groups. For the real values see Section 3. Filled columns indicate values belonging to GI while empty columns indicate values of RI. Asterisks indicate that there was a significant difference between the GI and RI values.
Table 4
                    SP simple (GI)    SP complex (RI)   SI simple (GI) ± Std  SI complex (RI) ± Std
Monkey I. (n=117)   0.42 (0.09–0.86)  0.51 (0.11–0.97)  0.56 (±0.17)          0.50 (±0.19)
Monkey II. (n=100)  0.44 (0.09–0.87)  0.51 (0.11–0.92)  0.60 (±0.18)          0.53 (±0.18)

selectivity indices (SI) in both animals between GI and RI images, i.e., selectivity was higher for the geometrical shapes than for the real-world photos. It is possible that this difference is due to the recording locations. The GI images are simpler in several respects than the real-world photos, and it is known that neurons in the anterior IT have a higher selectivity for more complex images [43,42]. Our electrode penetrations were located, and our data were collected, more in the posterior IT, where cells might prefer less complex stimuli. In the present experiments, awake, behaving monkeys were used, while the Tanaka study used anaesthetized monkeys. It has been reported previously that the ratio of the responsive cells and the mean response magnitude are higher in awake than in anaesthetized animals [43,41]. In our study, however, the mean firing rate was rather low. One possible reason for this might be the low luminance level of the images (around 8 cd/m² in our study as compared with around 35 cd/m² in the others) and the fact that many of our cells were inhibited by some of the stimuli (i.e., gave negative net responses). Also, we have to note that the monkeys were highly overtrained and had seen the stimuli several thousand times by the time the data were collected. On the other hand, the stimuli themselves were irrelevant
regarding the task, i.e., the monkeys only had to perform a simple fixation task. Furthermore, instead of trying to find the optimum stimulus containing the critical feature for the neurone, we presented all of our stimuli to all of the recorded IT neurones. This procedure could in theory produce differences in firing rates and separate the images on the basis of the amplitude of the neuronal activity, but this was not the case in our study, which concerns the population response and not individual cellular reactions. It could be argued that differences in size or mean luminance between GI and RI can explain our results. The net responses and the latencies of the responses to GI and RI stimuli were compared and no significant differences were found (see Section 3). We consider that the differences observed are due to factors other than size or luminance differences, since the simple ("first-order") physical features did not differ in the two stimulus groups. We hypothesized the presence of other, non-obvious, higher-order differences between the images used. An attempt was made to identify some of the factors that could explain the clustering of the responses to our stimuli. It is known, for example, that the available surface information and the amount of internal lines influence the discrimination and recognition of images [1,14]. Indeed, RI and GI differed both in the length of the perimeter and in the total length of the lines visible in the stimuli. Since the surfaces in the two stimulus groups did not differ, the line density per unit surface area was higher in RI (0.04 vs. 0.025, t-test, p=0.02, Fig. 8). The number of black and white transitions on the figure surface (essentially, internal lines) has been found to modulate the cellular
activity in IT neurones [4], and thus the line density might be another factor responsible for the differences observed. Obviously, if two images have the same area, the one with the longer perimeter will appear more complex (e.g., a circle or square vs. a cross or star, see Section 3). Since these parameters add to the complexity of the images, we suggest that IT neurones might change their response characteristics (e.g., response variability) if the images differ in these features. On the other hand, these factors might serve to express the similarity (in the physical domain) of images that do not differ in features that normally influence response levels.
IT cells are best driven by complex, colourful images. Since our two stimulus groups differed in colourfulness index (2.52 and 1.45 for GI and RI, respectively, t-test, p=0.04) and in saturation (1.3601 and 0.3728 for GI and RI, respectively, t-test, p=0.04), it seems that colour might be another factor for clustering.

Fig. 9. Distribution of the sparseness of GI and RI of Monkey I (A) and Monkey II (B), respectively. Arrows indicate the median value of the distribution.
Fig. 10. Distribution of the Selectivity Indices of GI and RI of Monkey I (A) and Monkey II (B), respectively. Arrows indicate the median value of the distribution.

Shape and form can be coded in the firing rate as responsivity and selectivity for shape; colour information, on the other hand, cannot always be detected in the simple firing rate. There is evidence that many IT cells are selective for colours (e.g., Ref. [19]) and are sensitive to the saturation of a particular colour [20,9], and a recently published study states that changing the colour of the stimuli strongly affected the responses of IT neurones in one monkey [3]. Nevertheless, our earlier studies gave strong evidence that the removal of colour information from the stimuli did not affect the responses of the neurones at the population level at all [46]. Our studies suggested that, at the level of the IT, colour and shape are processed in different ways, although how this is implemented is not yet clear. One possibility is that the same cells can handle both kinds of information, with different coding strategies. Another is that the colour-sensitive cells are clustered in the cortex as they
actually do in the earlier areas [26,50]. This possibility is supported by a recently published study, which found that areas with “colour-biased activity” have a patchy organisation in IT [47]. Our finding here shows that colour, either independently or, most probably, in combination with other characteristics, plays a role in the clustering of cellular responses to similar and different stimuli. Colour combined with other surface attributes might be a strong enough cue to cause differences, but it might not alter the neuronal responses significantly when the same stimulus is presented in a coloured and a non-coloured version.
It has been reported that the early phase of the neuronal responses in the IT contains most of the useful information [48]. Following this logic, we divided our responses into two segments, and the cluster analysis was repeated for the two segments separately in both monkeys. We found that the first part gave consistent results in both animals, i.e., the geometrical shapes and real-world images formed separate clusters. The second part did not display the same clustering; the clusters were mixed and the two image groups were no longer clearly separated. This finding supports the idea of Tovee et al. [48] that IT neurones carry significant information about the visual stimulus in the first 100–200 ms of their responses. On the other hand, Sugase et al. [40] demonstrated that IT cells can code information needed for different levels of categorization in the early and late parts of the neuronal responses, respectively. Our results, namely the difference in information content between the early and late parts, may result from a similar coding strategy.
In both animals, there were stimulus pairs that grouped consistently in the dendrogram: the torso-like statue (#13) and the crucifix (#18, Figs. 3 and 4).
It is tempting to speculate that this grouping was a result of some implicit categorization mechanism, and that the images were situated close together in the dendrogram because of some semantic influence from higher association areas. We do not think, however, that our limited set of stimuli could provide evidence for such a sophisticated mechanism. Further experiments are needed to clarify this question. Our data clearly indicate the importance of physical features. In Monkey I, for instance, stimuli #2 and #9 (the square–ellipse and polygon–circle) clustered, which might indicate the importance of the outline [22] or building elements (see also stimuli #3 and #10 in both monkeys). Other examples could also be demonstrated, such as the internal pattern (images #1 and #5 in both monkeys) or the similar colours in stimuli #14 and #16 in both monkeys. These examples might be a manifestation of the coding mechanism suggested by Tanaka et al., i.e., that individual neurones code similar features, and small changes in the responses of individual neurones in a neuronal ensemble might result in the coding of the whole stimulus. It may also lend support to the results of Vogels [51], who showed that whole neurone populations code a particular category (tree vs. non-tree). It is plausible to suggest that exemplars belonging to the same
category share similar physical features (colourfulness, internal pattern, building elements, colours, etc.), and the “higher-order” categorisation manifests in the first part of the response, as Sugase et al. [40] suggested.
The sparseness values in our work did not differ much from those obtained in other studies, such as those of Vogels [51] or Rolls and Tovee [52]. Vogels [51] provides a relationship between the anteroposterior (AP) level of IT and the sparseness. The values obtained in the present study suggest two things: (1) the recording positions in our animals might be very similar to the most posterior electrode penetrations in Ref. [51]; (2) the lack of difference in sparseness values between our two monkeys provides additional evidence of similar electrode locations in both cases.
Our results suggest that IT cells respond differently to simple and complex images. Single cells might code the complexity of images reliably, even if the images do not differ in simple physical features. Since the results from the two monkeys were highly similar, there might be general rules that determine the responses of IT cells to stimuli that differ in complexity.
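As an illustration of the sparseness measure compared above, the following minimal sketch assumes the form given by Rolls and Tovee [52], a = (Σ r_i / n)² / (Σ r_i² / n), where r_i is the firing rate of a cell to stimulus i and n is the number of stimuli. The function name and the example rates are hypothetical, and the formula actually used for the SP values in Table 4 is the one described in the Methods, which may differ (for example, in normalisation or in the handling of negative net responses).

```python
def sparseness(rates):
    """Sparseness of one cell's responses across a stimulus set.

    Assumed form (Rolls and Tovee [52]):
        a = (sum(r_i) / n) ** 2 / (sum(r_i ** 2) / n)
    `rates` holds non-negative firing rates, one per stimulus.
    a is 1.0 when all stimuli evoke the same rate and 1/n when
    only a single stimulus evokes a response.
    """
    n = len(rates)
    if n == 0:
        raise ValueError("at least one firing rate is required")
    if any(r < 0 for r in rates):
        # Assumption: raw rates, not net (baseline-subtracted) responses.
        raise ValueError("firing rates must be non-negative")
    mean_rate = sum(rates) / n
    mean_square = sum(r * r for r in rates) / n
    if mean_square == 0:
        raise ValueError("sparseness is undefined when all rates are zero")
    return mean_rate ** 2 / mean_square


# Hypothetical rates (spikes/s) for a four-stimulus set.
print(sparseness([5, 5, 5, 5]))   # 1.0: equal response to every stimulus
print(sparseness([20, 0, 0, 0]))  # 0.25: response to one stimulus only (1/n)
```

Under this assumed definition, lower values indicate a cell that responds to fewer stimuli in the set.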
Acknowledgments
This work was supported by the following grants: McDonnell JSMF 96-44, OTKA T-042610, ETT 574/2000 and FKFP 0455/2000. The authors thank G. Dósai and P. Liszli for technical assistance, K. Hermann for maintaining the laboratory equipment, M. Janáky and T. Gyetvai for help in the eye surgery, and E. Vörös for the NMR images. We also thank R. Vogels at KU Leuven for his comments on the original version of the manuscript.
References
[1] I. Biederman, Recognition-by-components: a theory of human image understanding, Psychol. Rev. 94 (2) (1987) 115–147.
[2] R. Desimone, T.D. Albright, C.G. Gross, C. Bruce, Stimulus selective properties of inferior temporal neurons in the macaque, J. Neurosci. 4 (1984) 2051–2062.
[3] R. Edwards, D. Xiao, C. Keysers, P. Foldiak, D. Perrett, Color sensitivity of cells responsive to complex stimuli in the temporal cortex, J. Neurophysiol. 90 (2) (2003) 1245–1256.
[4] E.N. Eskandar, B.J. Richmond, L.M. Optican, Role of inferior temporal neurons in visual memory: I. Temporal encoding of information about visual images, recalled images and behavioural context, J. Neurophysiol. 68 (1992) 1277–1295.
[5] D. Fotheringhame, R. Baddeley, Nonlinear principal components analysis of neuronal spike train data, Biol. Cybern. 77 (4) (1997) 283–288.
[6] I. Fujita, K. Tanaka, M. Ito, K. Cheng, Columns for visual features of objects in monkey inferotemporal cortex, Nature 360 (6402) (1992) 343–346.
[7] C.G. Gross, How the inferotemporal cortex became a visual area, Cereb. Cortex 5 (1994) 455–469.
[8] C.G. Gross, C.E. Rocha-Miranda, D.B. Bender, Visual properties of neurons in inferotemporal cortex of the macaque, J. Neurophysiol. 35 (1) (1972) 96–111.
[9] A. Hanazawa, H. Komatsu, I. Murakami, Neural selectivity for hue and saturation of colour in the primary visual cortex of the monkey, Eur. J. Neurosci. 12 (5) (2000) 1753–1763.
[10] P.D. Hanes, K.G. Thompson, J.D. Schall, Relationship of presaccadic activity in frontal eye field and supplementary eye field to saccadic initiation in macaque: Poisson spike train analysis, Exp. Brain Res. 103 (1995) 85–96.
[11] G. Hellekant, V. Danilova, Y. Ninomiya, Primate sense of taste: behavioral and single chorda tympani and glossopharyngeal nerve fiber recordings in the rhesus monkey Macaca mulatta, J. Neurophysiol. 77 (2) (1997) 978–993.
[12] K. Hikosaka, Responsiveness of neurons in the posterior inferotemporal cortex to visual patterns in the macaque monkey, Behav. Brain Res. 89 (1997) 275–283.
[13] T.E. Holy, C. Dulac, M. Meister, Responses of vomeronasal neurons to natural stimuli, Science 289 (2000) 1569–1572.
[14] G.K. Humphrey, M.A. Goodale, L.S. Jakobson, P. Servos, The role of surface information in object recognition: studies of a visual form agnosic and normal subjects, Perception 23 (12) (1994) 1457–1481.
[15] S.J. Judge, B.J. Richmond, F.C. Chu, Implantation of magnetic search coils for measurement of eye position: an improved method, Vis. Res. 20 (1980) 535–538.
[16] L. Kaufman, P.J. Rousseeuw, Finding Groups in Data: An Introduction to Cluster Analysis, Wiley Interscience, New York, 1990.
[17] K.J. Killian, R. Watson, J. Otis, T.A. St. Amand, P.M. O’Byrne, Symptom perception during acute bronchoconstriction, Am. J. Respir. Crit. Care Med. 162 (2) (2000) 490–496.
[18] E. Kobatake, K. Tanaka, Neuronal selectivities to complex object features in the ventral visual pathway of the macaque cerebral cortex, J. Neurophysiol. 71 (3) (1994) 856–867.
[19] H. Komatsu, Y. Ideura, Relationship between color, shape and pattern selectivity of neurons in the inferotemporal cortex of the monkey, J. Neurophysiol. 70 (1993) 677–694.
[20] H. Komatsu, Y. Ideura, S. Kaji, S. Yamane, Color selectivity of neurons in the inferior temporal cortex of the awake macaque monkey, J. Neurosci. 12 (2) (1992) 408–424.
[21] G. Kovács, R. Vogels, G.A. Orban, Selectivity of macaque inferior temporal neurons for partially occluded shapes, J. Neurosci. 15 (1995) 1984–1997.
[22] G. Kovács, G. Sáry, K. Köteles, Z. Chadaide, T. Tompa, R. Vogels, G. Benedek, Effects of surface cues on macaque inferior temporal cortical responses, Cereb. Cortex 13 (2) (2003) 178–188.
[23] A. Laakso, G. Cottrell, Content and cluster analysis: assessing representational similarity in neural systems, Philos. Psychol. 13 (2000) 47–76.
[24] C.R. Legendy, M. Salcman, Bursts and recurrences of bursts in the spike trains of spontaneously active striate cortex neurons, J. Neurophysiol. 53 (1985) 926–939.
[25] W.C. Li, A. Roberts, The classification of spinal interneurons using cluster analysis in Xenopus laevis tadpoles, J. Physiol. (Lond.) 533 (2001) 56–57.
[26] M.S. Livingstone, D.H. Hubel, Anatomy and physiology of a color system in the primate visual cortex, J. Neurosci. 4 (1984) 309–356.
[27] N.K. Logothetis, D.L. Sheinberg, Visual object recognition, Annu. Rev. Neurosci. 19 (1996) 577–621.
[28] H. Op de Beeck, J. Wagemans, R. Vogels, Inferotemporal neurons represent low-dimensional configurations of parameterized shapes, Nat. Neurosci. 4 (12) (2001) 1244–1252.
[29] S.E. Palmer, Vision Science: Photons to Phenomenology, MIT Press, Cambridge, MA, 1999, pp. 390–391.
[30] G. Paxinos, H. Xu-Feng, A.W. Toga, The Rhesus Monkey Brain in Stereotaxic Coordinates, Academic Press, San Diego, CA, 2000.
[31] B.J. Richmond, L.M. Optican, M. Podell, H. Spitzer, Temporal encoding of two-dimensional patterns by single units in primate inferior-temporal cortex: I. Response characteristics, J. Neurophysiol. 57 (1987) 132–146.
[32] E.T. Rolls, A. Treves, The relative advantages of sparse versus distributed encoding for associative neuronal networks in the brain, Network 1 (1990) 407–421.
[33] Gy. Sáry, R. Vogels, G.A. Orban, Cue-invariant shape selectivity of macaque inferior temporal neurons, Science 260 (1993) 995–997.
[34] J.W. Scannell, G.A.P.C. Burns, C.C. Hilgetag, M.A. O’Neil, M.P. Young, The connectional organization of the cortico-thalamic system of the cat, Cereb. Cortex 9 (3) (1999) 277–299.
[35] N. Schülert, U. Dicke, The effect of stimulus features on the visual orienting behavior of the salamander Plethodon jordani, J. Exp. Biol. 205 (2002) 241–251.
[36] E.I. Schwartz, R. Desimone, T.D. Albright, C.G. Gross, Shape recognition and inferior temporal neurons, Proc. Natl. Acad. Sci. U. S. A. 80 (1983) 5776–5778.
[37] T.R. Scott, B.K. Giza, J. Yan, Gustatory neural coding in the cortex of the alert cynomolgus macaque: the quality of bitterness, J. Neurophysiol. 81 (1) (1999) 60–71.
[38] D.L. Sheinberg, N.K. Logothetis, The role of temporal cortical areas in perceptual organization, Proc. Natl. Acad. Sci. U. S. A. 94 (7) (1997) 3408–3413.
[39] N. Sigala, N.K. Logothetis, Visual categorization shapes feature selectivity in the primate temporal cortex, Nature 415 (2002) 318–320.
[40] Y. Sugase, S. Yamane, S. Ueno, K. Kawano, Global and fine information coded by single neurons in the temporal visual cortex, Nature 400 (6747) (1999) 869–873.
[41] H. Tamura, K. Tanaka, Visual response properties of cells in the ventral and dorsal parts of the macaque inferotemporal cortex, Cereb. Cortex 11 (2001) 384–399.
[42] K. Tanaka, Inferotemporal cortex and object vision, Annu. Rev. Neurosci. 19 (1996) 109–139.
[43] K. Tanaka, H.-A. Saito, Y. Fukada, M. Moriya, Coding visual images in the inferotemporal cortex of the macaque monkey, J. Neurophysiol. 66 (1991) 170–189.
[44] T. Tango, A test for spatial disease clustering adjusted for multiple testing, Stat. Med. 19 (2) (2000) 191–204.
[45] T. Tompa, Z. Chadaide, L. Lenti, G. Csifcsák, Gy. Kovács, Gy. Benedek, Invariances of shape-processing for reduced surface cues: how IT neurons and psychophysics correlate in the macaque, Proceedings of the Third International Conference on Cognitive Science (ICCS 2001), Beijing, China, Press of USTC, 2001, pp. 40–44.
[46] T. Tompa, Z. Chadaide, L. Lenti, G. Csifcsák, G. Kovács, Gy. Benedek, Colour independent shape selectivity in inferior temporal cortex, Perception 31 (2002) 67.
[47] R.B.H. Tootell, K. Nelissen, W. Vanduffel, G.A. Orban, Search for color ‘center(s)’ in macaque visual cortex, Cereb. Cortex 14 (2004) 353–363.
[48] M.J. Tovee, E.T. Rolls, A. Treves, R.P. Bellis, Information encoding and the responses of single neurons in the primate temporal visual cortex, J. Neurophysiol. 70 (2) (1993) 640–654.
[49] A. Treves, E.T. Rolls, What determines the capacity of autoassociative memories in the brain? Network 2 (1991) 371–379.
[50] D.Y. Ts’o, C.D. Gilbert, The organization of chromatic and spatial interactions in the primate striate cortex, J. Neurosci. 8 (1988) 1712–1727.
[51] R. Vogels, Categorization of complex visual images by rhesus monkeys: part 2. Single-cell study, Eur. J. Neurosci. 11 (1999) 1239–1255.
[52] E.T. Rolls, M.J. Tovee, Sparseness of the neuronal representation of stimuli in the primate temporal visual cortex, J. Neurophysiol. 73 (1995) 713–726.