Neuroscience Letters 276 (1999) 45–48 www.elsevier.com/locate/neulet
The role of the posterior parietal cortex in human object recognition: a functional magnetic resonance imaging study

Takeshi Sugio a,*, Toshio Inui a, Kayako Matsuo b, Masako Matsuzawa b, G.H. Glover c, Toshiharu Nakai b

a Department of Intelligence Science and Technology, Graduate School of Informatics, Kyoto University, Yoshida-hommachi, Sakyo-ku, Kyoto 606-8501, Japan
b Magnetic Resonance Science Laboratory, Life Electronics Research Center, Electrotechnical Laboratories, Tsukuba 300-4201, Japan
c Lucas MRS/MRI Center, Department of Radiology, Stanford University, Stanford, CA, USA

Received 9 August 1999; received in revised form 23 September 1999; accepted 23 September 1999
Abstract

The mechanisms involved in visual object recognition from non-canonical viewpoints were investigated using functional magnetic resonance imaging (fMRI). We used a passive observation task and found three areas that were activated more strongly in the non-canonical viewing condition than in the canonical viewing condition. First, the fusiform gyrus and the posterior part of the inferior temporal cortex were involved in the processing of shape information. Second, the posterior parietal cortex (mainly the superior parietal lobule) and the ventral part of the premotor area were involved in visuospatial processing and in accessing sensorimotor knowledge. These results may indicate that recognition from non-canonical viewpoints is supported by using functional properties of the object, which require more real-time processing for object manipulation. © 1999 Elsevier Science Ireland Ltd. All rights reserved.

Keywords: Functional magnetic resonance imaging; Object recognition; Canonicality; Visuospatial processing; Sensorimotor knowledge; Superior parietal lobule; Premotor area
* Corresponding author. Tel.: +81-75-753-3146; fax: +81-75-722-0985. E-mail address: [email protected] (T. Sugio)
0304-3940/99/$ - see front matter © 1999 Elsevier Science Ireland Ltd. All rights reserved. PII: S0304-3940(99)00788-0

Although the image of an object projected onto the retina changes greatly depending on the spatial relation between the object and the observer, our ability to recognize objects is usually not affected by such retinal changes. How objects are represented in the brain to allow for such changes, that is, to achieve object invariance, has been a central issue in visual object recognition. Recent findings in neuropsychology suggest that identity and orientation are dissociable types of information. This may reflect the existence of two separate visual pathways in the brain [12]. Computing the precise orientation of an object irrespective of its identity is necessary, since one of the main goals of visual object recognition is to act upon the perceived object. Goodale and Milner proposed that the role of the dorsal visual pathway is to transform visual information into an appropriate motor command, so that the pathway could be referred to as a 'how' system [4]. Such an action-related system may need to process information in real time and may underlie the immediate and robust nature of recognition.

The aim of this research was to investigate the role of the 'how' system in the recognition of objects from unconventional, or non-canonical, viewpoints by comparing the brain regions activated during the recognition of objects from canonical and non-canonical viewpoints. In a positron emission tomography (PET) study, Kosslyn et al. [6] examined brain activation during an object name verification task from canonical and non-canonical viewpoints. They found that dorsolateral prefrontal areas were strongly activated when identifying objects from non-canonical viewpoints and proposed that these areas are involved in searching for distinctive properties of objects. However, because the object names were given in advance, the subjects knew which distinctive properties to search for in the perceived view, and this may have led them to adopt a particular strategy.

In the task used in this experiment, the subjects were instructed to passively view pictures of realistically drawn objects from canonical and non-canonical viewpoints. The objects were presented one at a time and the subjects tried to identify each object silently. This
approach was considered unbiased toward any particular type of object recognition strategy and is presumed to reflect the cognitive processes of daily life.

Twelve neurologically normal subjects (three of them female; all right-handed; aged between 21 and 39 years) took part in the experiment. All subjects participated in both the canonical and the non-canonical view conditions, and all gave written informed consent approved by the institutional review board (IRB) before explanation of the procedures.

For each three-dimensional object model, two images were rendered, one from a canonical and one from a non-canonical viewpoint. Examples of the picture stimuli are presented in Fig. 1. All stimuli were presented in gray-level on a white background to eliminate depth cues as much as possible. Objects viewed from a canonical perspective were assumed to provide their structural information most efficiently. Views in which the major axes of the objects were not foreshortened and the discriminative parts were not occluded were selected as canonical. In contrast, non-canonical views were defined as views with major axes foreshortened or discriminative parts occluded, so the structural descriptions of the objects were difficult to derive from such views. Empirical studies have shown that such qualitative differences in object views often give rise to a noticeable change in recognition performance [8]. Our pilot study using a name verification task showed that subjects were able to match images with their correct names sufficiently well for both types of images (87.7% for canonical views and 76.6% for non-canonical views; the difference was not statistically significant).

Both task sessions (canonical viewing and non-canonical viewing) used a block design (five rest blocks and four task blocks of 30 s each, giving 270 s for a single task session). All volunteers participated in both sessions in random order. A trial in a task block was organized as follows.
Each trial began with the presentation of a fixation mark (+) on the projection screen for 1 s, followed by the visual stimulus for 3 s. Subjects were instructed to keep viewing the picture until it disappeared. After a 1-s blank period, the next trial began. Thus, six different pictures were shown in one task block, which resulted in 24 pictures per session, randomly selected from a list of 54 objects. Most objects were graspable in size or at least could be interpreted as such; for example, an airplane can be interpreted as graspable when a toy airplane is imagined. Subjects were instructed to silently give the correct name of the displayed picture at the basic-category level. During the rest block, only a fixation point was displayed for 30 s and subjects were instructed to look at it throughout the block. In the canonical viewing condition, all pictures displayed were rendered from a canonical viewpoint, and in the non-canonical viewing condition, all pictures were rendered from a non-canonical viewpoint, as described earlier. The fixation point was needed in both the task and the rest blocks to control eye movement.

Fig. 1. Sample images of an object (can) employed in the experiment (left, canonical view; right, non-canonical view).

Images were acquired using an MRI scanner operating at 3 T (GE 3T Signa). Before the task sessions, anatomical images were acquired using a conventional spin echo sequence for T1 images (TR 500/TE 40) and a fast spin echo sequence for T2 images (TR 3000/TE 80/ETL 16). The number and the thickness of slices were identical to those of the functional images. Anatomical images were used for normalization among individual subjects. Functional images were acquired using a T2*-weighted gradient recalled echo spiral k-space trajectory sequence with navigator echo correction [3]. The imaging parameters were TR 1500 ms, four shots, TE 30 ms, FA 60°, 20 axial slices, 6 mm slice thickness and FOV 24 cm. Forty-five images per slice were acquired in a 270 s scan time. The obtained images were reconstructed to an in-plane resolution of 3.33 mm (72 × 72 effective matrix, reconstructed into 128 × 128).

Both image analyses and statistical tests were performed using SPM96 (Wellcome Department of Cognitive Neurology, London, UK). The functional images were realigned to adjust for head movements, and the realigned images were then transformed into the standard stereotaxic Talairach space [11] using the same MNI template to accommodate inter-subject variability in anatomy. The normalized images were then smoothed with an isotropic Gaussian kernel of 4 mm.
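The timing and acquisition figures quoted in the design and imaging descriptions can be cross-checked with a little arithmetic. The following is a minimal sketch, assuming that every rest or task block lasts 30 s and that one functional volume takes TR multiplied by the number of shots to acquire; the variable names are illustrative and are not part of the original protocol.

```python
# Cross-check of the block design and acquisition numbers quoted in the text.
# Assumptions (not given as formulas in the paper): every block lasts 30 s,
# and one volume takes TR x shots with the interleaved spiral sequence.

FIXATION_S, STIMULUS_S, BLANK_S = 1, 3, 1   # one trial: fixation, picture, blank
BLOCK_S = 30                                # duration of each rest or task block
REST_BLOCKS, TASK_BLOCKS = 5, 4
TR_MS, SHOTS = 1500, 4
FOV_MM, EFFECTIVE_MATRIX = 240, 72          # FOV 24 cm, 72 x 72 effective matrix

trial_s = FIXATION_S + STIMULUS_S + BLANK_S
pictures_per_block = BLOCK_S // trial_s
pictures_per_session = pictures_per_block * TASK_BLOCKS
session_s = (REST_BLOCKS + TASK_BLOCKS) * BLOCK_S

volume_ms = TR_MS * SHOTS                   # time to collect all shots of one volume
volumes_per_session = session_s * 1000 // volume_ms
in_plane_mm = FOV_MM / EFFECTIVE_MATRIX

print("trial duration (s):", trial_s)
print("pictures per block / session:", pictures_per_block, pictures_per_session)
print("session length (s):", session_s)
print("volumes per session:", volumes_per_session)
print("in-plane resolution (mm):", round(in_plane_mm, 2))
```

Under these assumptions the sketch reproduces the six pictures per block, 24 pictures per session, 45 images per slice and 3.33 mm in-plane resolution reported in the text.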
Linear contrasts were used to test hypotheses about regionally specific condition effects. In addition, a statistical parametric map of the t statistic was generated for each voxel. Corresponding Z-values were thresholded at P < 0.001 and corrected for multiple comparisons based on the spatial extent of activated voxels (P < 0.05). For each of the canonical and non-canonical viewing conditions, both group and individual analyses were performed. Individual analyses were performed to determine whether the activations revealed by the group analyses for each condition differed significantly between the two conditions.

First, group analytical results showed several significant areas for the non-canonical viewing condition (Table 1; P < 0.001, corrected for multiple comparisons). The thresholded fMRI surface activation images superimposed on structural images for 12 subjects in the non-canonical condition are shown in Fig. 2. The pixels indicate levels of statistical significance above P < 0.001.

Table 1
Coordinates of local peaks in the non-canonical condition (a)

Region                           BA      Talairach coordinates (x, y, z)   Z-score
R middle frontal gyrus           6/9     48, 10, 34                        8.01
L middle frontal gyrus           9       -46, 10, 38                       7.62
Medial superior frontal gyrus    6       0, 10, 56                         5.94
R inferior frontal gyrus         6/44    46, 4, 28                         7.77
R superior parietal lobule       7       20, -70, 56                       7.90
L superior parietal lobule       7       -28, -68, 54                      7.31
R fusiform gyrus                 19/37   34, -64, -16                      5.92

(a) Coordinates are in millimeters; right (+) or left (-) for x; anterior (+) or posterior (-) for y; above (+) or below (-) for z.

For individual analyses, we first calculated the number of activated voxels (the size of a cluster) occurring in regions of interest (ROIs) in the two conditions for each of the twelve subjects. The ROIs were the bilateral middle frontal gyri, the medial superior frontal gyrus (supplementary motor area), the bilateral premotor areas, the bilateral superior parietal lobules and the bilateral fusiform gyri. The significance of the difference between the two conditions was tested for each ROI by paired t-tests. Activation was significantly larger in the non-canonical viewing condition than in the canonical viewing condition for the bilateral premotor areas (Brodmann's area, BA, 4/6, 44) and the left superior parietal lobule (BA 7; t(11) = 2.30, P = 0.042; t(11) = 1.96, P = 0.019, respectively). For the premotor area, activation was noted in four (left) and five (right) subjects for the canonical viewing condition and in nine subjects (both left and right) for the non-canonical viewing condition. For the posterior parietal cortex, activation was noted in two (left) and five (right) subjects for the canonical viewing condition and in seven (left) and nine (right) subjects for the non-canonical viewing condition. No areas showed statistically larger activation in the canonical viewing condition than in the non-canonical viewing condition. As a result, two regions were found to be
more deeply involved in object recognition from a non-canonical viewpoint: the ventral part of the premotor area and the posterior parietal cortex including the superior parietal lobule. Evidence for laterality was not clearly observed among individual subjects.

Fig. 2. The thresholded fMRI surface activation images superimposed on structural images for group data (non-canonical viewing condition).

One plausible account of these results is that subjects mentally rotated the perceived view of an object to access the canonical representation of the object. The activated areas within the posterior parietal cortex observed in our study were compatible with neuroimaging studies of mental rotation [1]. In addition, neuropsychological evidence showed that real-time object recognition is performed in area 37 (posterior temporal area) [2], which is compatible with our results. Presumably, shape information processed in this area is transmitted to the superior parietal lobule, where mental rotation is performed.

However, a mental rotation account of object recognition from non-canonical views may not be sufficient, for the following reason. The three-dimensional block objects employed in most previous studies are quite special in that an arbitrary view of the object can easily be predicted, since all the characteristic feature points remain visible throughout the entire viewing space, which is uncommon in daily situations. This may have induced subjects to rely on bottom-up visuospatial processing independent of stored object-specific knowledge. In other words, when we recognize objects from non-canonical viewpoints, the visual information available is often insufficient to generate or access a canonical object representation. We may need to access more stable information about object identity associated with visual shapes. One candidate for such information is sensorimotor experience with objects. It is plausible that object invariance is acquired through the association of visual shape with haptically perceived object structure.
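As an aside on the individual analyses reported earlier, the paired comparison of ROI cluster sizes between the two viewing conditions can be sketched as follows. This is only an illustration of a paired t-test, not the authors' analysis code: the function name `paired_t` is hypothetical, and the voxel counts are made-up placeholder values, not data from the study.

```python
import math
from statistics import mean, stdev


def paired_t(cond_a, cond_b):
    """Return (t, df) for a paired t-test on two equal-length samples."""
    if len(cond_a) != len(cond_b):
        raise ValueError("paired samples must have the same length")
    # Work on within-subject differences, one per subject.
    diffs = [a - b for a, b in zip(cond_a, cond_b)]
    n = len(diffs)
    # t = mean difference divided by its standard error (sample SD / sqrt(n)).
    t = mean(diffs) / (stdev(diffs) / math.sqrt(n))
    return t, n - 1


# HYPOTHETICAL activated-voxel counts in one ROI for 12 subjects
# (placeholder numbers for illustration only, not taken from the paper).
non_canonical = [40, 12, 0, 55, 23, 31, 8, 60, 17, 0, 45, 29]
canonical = [22, 0, 0, 30, 25, 10, 0, 41, 9, 0, 20, 15]

t, df = paired_t(non_canonical, canonical)
print(f"t({df}) = {t:.2f}")
```

With twelve subjects the test has 11 degrees of freedom, which matches the t(11) values quoted in the results. Converting t to a P value additionally requires the Student t distribution, which the Python standard library does not provide.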
Several findings support the claim that activation in the dorsal visual pathway may reflect sensorimotor experience with an object. Grafton et al. [5] showed that brain activity in the ventral region of the premotor cortex is related to the observation of graspable objects without any overt motor response. Similar findings have been reported in neurophysiological studies [10]. Furthermore, Riddoch and Humphreys [9] reported a patient with optic aphasia, who had difficulty in naming objects perceived visually. The patient was able to gesture the use of an object that he could not name, and he was often capable of naming an object whose use he could correctly gesture. This might represent a case in which the object was identified using procedural knowledge when semantic knowledge was inaccessible from visual information alone.

The mental-rotation and sensorimotor knowledge accounts are not mutually exclusive. Presumably, bottom-up visuospatial processing and top-down utilization of sensorimotor knowledge operate complementarily. The existence of multiple routes to access object representations makes recognition robust in situations in which visual information is degraded. After a shape-matching process in the inferior temporal region fails, action-related knowledge is utilized in the dorsal visual pathway to access canonical object representations from non-canonical views of objects.

It may be argued that the activation associated with the non-canonical viewing condition is simply related to increased activation in the frontal eye field due to more intensive visual scanning of non-canonical views. Although we cannot deny the potential role of eye movement, there are two main reasons to believe that factors other than eye movement were involved in non-canonical object viewing. First, the peak of activation in the precentral sulcus was located more ventrally than the previously reported location (e.g. 48, 5, 44; BA 6/8) [7]. Second, a fixation mark (+) was presented for 1 s before the presentation of an image to induce subjects to fixate on the center of the image.

In conclusion, the superior parietal lobule and the premotor area are involved in the retrieval of visually-guided action knowledge necessary for recognizing objects from non-canonical viewpoints.
These results might indicate that recognition from non-canonical viewpoints is accomplished using functional properties of the object, such as which part of the object is suited as a handle.

This study was supported in part by grants from JSPS Research Fellow.

[1] Alivisatos, B. and Petrides, M., Functional activation of the human brain during mental rotation. Neuropsychologia, 35 (1997) 111–118.
[2] Biederman, I., Gerhardstein, P.C., Cooper, E.E. and Nelson, C.A., High level object recognition without an anterior inferior temporal lobe. Neuropsychologia, 35 (1997) 271–287.
[3] Glover, G.H. and Lai, S., Self-navigated spiral fMRI: interleaved versus single-shot. Magn. Reson. Med., 39 (1998) 361–368.
[4] Goodale, M.A. and Milner, A.D., Separate visual pathways for perception and action. Trends Neurosci., 15 (1992) 20–25.
[5] Grafton, S.T., Fadiga, L., Arbib, M.A. and Rizzolatti, G., Premotor cortex activation during observation and naming of familiar tools. NeuroImage, 6 (1997) 231–236.
[6] Kosslyn, S.M., Alpert, N.M., Thompson, W.L., Chabris, C.F., Rauch, S.L. and Anderson, A.K., Identifying objects seen from different viewpoints. A PET investigation. Brain, 117 (1994) 1055–1071.
[7] Luna, B., Thulborn, K.R., Strojwas, M.H., McCurtain, B.J., Berman, R.A., Genovese, C.R. and Sweeney, J.A., Dorsal cortical regions subserving visually guided saccades in humans: an fMRI study. Cereb. Cortex, 8 (1998) 40–47.
[8] Palmer, S., Rosch, E. and Chase, P., Canonical perspective and the perception of objects. In J. Long and A. Baddeley (Eds.), Attention and Performance IX, Lawrence Erlbaum Associates, Hillsdale, NJ, 1981, pp. 135–151.
[9] Riddoch, M.J. and Humphreys, G.W., Visual object processing in optic aphasia: a case of semantic access agnosia. Cognit. Neuropsychol., 4 (1987) 131–185.
[10] Sakata, H. and Taira, M., Parietal control of hand action. Curr. Opin. Neurobiol., 4 (1994) 847–856.
[11] Talairach, J. and Tournoux, P., Co-Planar Atlas of the Human Brain, Thieme, New York, 1988.
[12] Turnbull, O.H., A double dissociation between knowledge of object identity and object orientation. Neuropsychologia, 35 (1997) 567–570.