The neural representation of face space dimensions


Neuropsychologia 51 (2013) 1787–1793


The neural representation of face space dimensions

Xiaoqing Gao*, Hugh R. Wilson

Centre for Vision Research, York University, 4700 Keele Street, Toronto, Ontario, Canada M3J 1P3

Article history: Received 1 April 2013; Received in revised form 17 June 2013; Accepted 1 July 2013; Available online 10 July 2013

Keywords: Facial identity; Face space dimension; PCA; Multi-voxel pattern analysis; fMRI

Abstract

Functional neural imaging studies have identified a network of brain areas that are more active to faces than to other objects. However, it remains largely unclear how these areas encode individual facial identity. To investigate the neural representations of facial identity, we constructed a multidimensional face space structure whose dimensions were derived from the geometric information of faces using Principal Component Analysis (PCA). Using fMRI, we recorded participants' neural responses when viewing blocks of faces that differed on only one dimension within a block. Although the response magnitudes to different blocks of faces did not differ in a univariate analysis, multi-voxel pattern analysis revealed distinct patterns related to different face space dimensions in brain areas that have a higher response magnitude to faces than to other objects. The results indicate that dimensions of the face space are encoded in the face-selective brain areas in a spatially distributed way. © 2013 Elsevier Ltd. All rights reserved.

1. Introduction

Recognizing faces is among the most important basic skills for social interaction. Although a typical human adult can identify a person from his/her face in a fraction of a second, this seemingly simple ability surpasses any computer system in its efficiency and robustness. Valentine (1991) suggested that underlying the ability to individuate faces is a system that encodes faces as points in a multidimensional space (the face space). This hypothesis has received support from numerous behavioral studies (e.g., Webster, Kaping, Mizokami, & Duhamel, 2004; Leopold, Rhodes, Müller, & Jeffery, 2005; Rhodes et al., 2011; Said & Todorov, 2011). However, little is known about how this multidimensional face space is represented in the human brain.

Neuroimaging studies have identified a network of brain areas involved in face perception. By comparing the magnitude of the brain response to faces with the response to other categories of objects (e.g., houses), studies consistently report that the fusiform face area (FFA; Kanwisher, McDermott, & Chun, 1997; Sergent, Ohta, & MacDonald, 1992) and the occipital face area (OFA; Gauthier et al., 2000; see Pitcher, Walsh, & Duchaine, 2011 for a review) are more active to faces than to other objects. Although the heightened response to faces in these brain areas does not directly demonstrate a role in individuating faces, later studies have shed light on their contribution to encoding individual facial identity. Grill-Spector, Knouf, and Kanwisher (2004) reported a positive correlation between the blood oxygen level-dependent (BOLD) response magnitude in the FFA and behavioral performance in identifying faces. Using the fMRI-adaptation

* Corresponding author. Tel.: +1 416 736 2100x33325; fax: +1 416 736 5857. E-mail address: [email protected] (X. Gao).

0028-3932/$ - see front matter © 2013 Elsevier Ltd. All rights reserved. http://dx.doi.org/10.1016/j.neuropsychologia.2013.07.001

paradigm, which exploits the observation that the BOLD response is reduced after repeated presentation of a stimulus, studies have shown that the FFA and the OFA are sensitive to changes of facial identity (Rotshtein, Henson, Treves, Driver, & Dolan, 2005; Grill-Spector et al., 1999; Loffler, Yourganov, Wilkinson, & Wilson, 2005). Another line of evidence comes from studies of prosopagnosic patients with focal lesions in the FFA (e.g., Barton, Press, Keenan, & O'Connor, 2002) or the OFA (e.g., Bouvier & Engel, 2006; Rossion et al., 2003; Steeves et al., 2006). Furthermore, temporarily disrupting the function of the OFA with transcranial magnetic stimulation (TMS) reduces people's accuracy in recognizing individual faces (Pitcher, Walsh, Yovel, & Duchaine, 2007). Collectively, these findings suggest that the FFA and the OFA play important roles in individuating faces. The FFA and the OFA are therefore good candidates for the current investigation of the neural representations of the multidimensional face space.

A face space structure consists of two basic elements: the origin of the space and the dimensions. Valentine (1991) suggests that the origin of the face space represents the central tendency of all the faces encountered in one's life. Two encoding mechanisms have been proposed in relation to the origin of the face space. One hypothesis holds that individual facial identities are encoded relative to the origin of the space (norm-based coding). The other holds that faces are encoded relative to existing exemplars (exemplar-based coding), without reference to the origin of the space. A recent study (Loffler et al., 2005) demonstrated that BOLD responses to faces in the FFA increase with increasing distance between a face and the origin of the face space (the average face), as would be predicted by the norm-based coding hypothesis but not by the exemplar-based coding hypothesis. The results indicate that the distance between an individual face and the origin of the face space is encoded as BOLD response amplitude in the FFA.



Valentine (1991) suggests that the dimensions of the face space are formed through experience with faces, but no specific mechanism was proposed. The neural representation of the face space dimensions in the human brain thus remains largely unclear. Recent neuroimaging studies investigating the neural representations of individual facial identities have found that individual facial identities are encoded as patterns of neural responses in distributed cortical areas. Kriegeskorte, Formisano, Sorger, and Goebel (2007) recorded neural responses to one female and one male face, both in three-quarter view. The two faces elicited different neural response patterns in the anterior inferotemporal cortex (aIT). To maximize the discriminability of the facial identities, Natu et al. (2010) included face/anti-face pairs in their experiments. They found that a pattern classifier could reliably discriminate the neural response patterns to different facial identities in the ventral temporal cortex, including the fusiform gyrus and the lateral occipital areas, despite changes in the viewpoint of the faces. A recent study (Nestor, Plaut, & Behrmann, 2011) confirmed the role of the FFA in individuating faces: a searchlight analysis revealed that the fusiform area is one of the most informative areas in the neural response patterns for discriminating four facial identities with varying facial expressions. Since the face space dimensions are the basic elements encoding individual facial identity, one possibility is that the face space dimensions are also encoded as distributed patterns of neural activation in the face-selective cortical areas. Alternatively, different face space dimensions may be encoded at different loci in the brain, with the collective pattern of activation of these loci encoding individual facial identity.
To test these two hypotheses, we took both a univariate and a multivariate approach to analyzing the neural responses to changes of facial identity along different face space dimensions. We defined the face space dimensions based on statistical regularities of a set of faces. Specifically, we ran Principal Component Analysis (PCA) on the geometric information of a set of male Caucasian faces. We used the average face as the origin and the resulting Principal Components (PCs) as dimensions to set up a face space structure. PCs have proved effective in encoding images of faces for computer recognition (Sirovich & Kirby, 1987; Turk & Pentland, 1991) and in modeling human perception (Hancock, Burton, & Bruce, 1996; O'Toole, Deffenbacher, Abdi, & Bartlett, 1991). The dimensions represented by the PCs are orthogonal. They do not represent local facial features, such as the eyes or the nose; instead, they represent the global configuration of the faces, which has been demonstrated to be important in face recognition (e.g., Tanaka & Farah, 1993; Maurer, Le Grand, & Mondloch, 2002). One important feature of the PCA approach is that the PCs explain different amounts of variation in the face set: some PCs are more "prominent" than others because they explain more of the variation. In the current study, besides investigating how the brain encodes the face space dimensions defined by PCA, we were also interested in comparing brain responses to PCs of different importance. We compared brain responses between two PCs, one with a high eigenvalue (PC1) and one with a low eigenvalue (PC16); in the current set of faces, PC1 explained eight times as much of the variance as PC16. By collecting both behavioral and functional neural imaging data, we were able to measure both perceptual and neural sensitivity to changes of facial identity along these two statistically distinct face space dimensions.
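The PCA construction described above can be sketched as follows: center the parameter matrix on the average face and take the principal axes of the centered data. This is an illustrative re-implementation on random stand-in data (the study used 41 faces × 37 geometric parameters), not the authors' code.

```python
import numpy as np

def face_space_pca(params):
    """PCA on geometric face parameters (n_faces x n_params).

    Returns the mean face (origin of the space), the principal
    components (orthonormal face-space dimensions, sorted by explained
    variance), and the fraction of variance each PC explains.
    """
    origin = params.mean(axis=0)            # average face = origin
    centered = params - origin
    # SVD of the centered data: rows of vt are the PCs
    u, s, vt = np.linalg.svd(centered, full_matrices=False)
    var_explained = s ** 2 / np.sum(s ** 2)
    return origin, vt, var_explained

# Stand-in data: 41 "faces" x 37 parameters, as in the study
rng = np.random.default_rng(0)
faces = rng.normal(size=(41, 37))
origin, pcs, var = face_space_pca(faces)
```

Because the singular values come back in descending order, PC1 always explains at least as much variance as PC16, and any two PCs are orthogonal.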

2. Material and methods

2.1. Participants

Participants were nine adults (4 females, mean age = 31 years, SD = 3.6 years). All participants except one male were right-handed. None of the participants reported any history of psychiatric or neurological disorders, or current use of any psychoactive medications. The data from one male participant were excluded from the final data analysis because this participant had unusually large ear canals, which caused artifacts in the BOLD signal in the ventral part of the temporal lobe. The study was approved by the York University Research Ethics Board. We obtained informed written consent from all participants.

2.2. Stimuli

2.2.1. Synthetic faces

We used synthetic faces derived from digital photographs of 41 Caucasian males. A detailed description of the design of the synthetic faces has been reported in a previous study (Wilson, Loffler, & Wilkinson, 2002). Briefly, each synthetic face is defined by 37 parameters: 23 define the head shape and hairline, while the remaining 14 define the locations and sizes of the facial features. All 37 measures were normalized so that a unit change on each measure represents a percentage of the mean head radius of the 41 synthetic faces. The reconstructed synthetic faces were grayscale and were filtered with a bandpass difference-of-Gaussians (DOG) filter centered on 10 cycles per face with a bandwidth of two octaves, to keep the most important information for facial identity (Gao & Maurer, 2011; Gold, Bennett, & Sekuler, 1999; Näsänen, 1999). The synthetic faces capture the major geometric information of individual faces while leaving out fine details such as color and skin texture. Compared with pixel-based coding, the synthetic faces simplify the representation of real faces while still carrying sufficient information about individual identity, as demonstrated by high accuracy in matching identities between synthetic faces and photographs of individual faces (Wilson et al., 2002).

2.2.2. Face space structure

We submitted the 41 synthetic faces to PCA. Unlike the original 37 parameters, which are correlated to some degree, the resulting 37 PCs are orthogonal to each other, making them good candidates for face space dimensions.
We set up a multidimensional face space structure centered on the mean of the 41 synthetic faces, with the 37 PCs as the dimensions. Distance in this face space structure is defined as the Euclidean distance between two faces in the 37-dimensional space, expressed as a fraction of the mean head radius of the faces.

2.2.3. Experimental stimuli

We created synthetic faces along two dimensions (PC1 and PC16). PC1 explained 13.2% of the total variance in the original 41 faces, while PC16 explained only 1.7%. In each direction (+ or −) of each PC dimension, the synthetic faces had distances of 0.1, 0.17, and 0.24 from the average face. We chose these three distances because a previous study (Wilson et al., 2002) showed that the 75%-accuracy discrimination threshold for the synthetic faces was at a distance of 0.06. Therefore, in the current stimuli, the faces closest to the average face (a distance of 0.1) can still be discriminated from it, while the two most similar faces (separated by a distance of 0.07) can be discriminated from each other. There were 12 faces in total (Fig. 1): 0.1PC1+, 0.17PC1+, 0.24PC1+, 0.1PC1−, 0.17PC1−, 0.24PC1−, 0.1PC16+, 0.17PC16+, 0.24PC16+, 0.1PC16−, 0.17PC16−, and 0.24PC16−. The face stimuli were generated in Matlab with custom-written code and presented using the Psychophysics Toolbox (Brainard, 1997; Pelli, 1997). They were back-projected onto a projection screen by an MRI-compatible LCD projector and viewed by the participant through a mirror placed within the RF head coil at a viewing distance of 43 cm. The face stimuli subtended on average 10.9 × 8.1 degrees of visual angle at this viewing distance. The position of each stimulus was randomly jittered by one degree of visual angle (44 pixels) away from the center of the screen to prevent apparent motion between trials.

Fig. 1. A face space structure constructed based on PCA. The origin of the space (red) is the average of 41 Caucasian male faces. The two dimensions of the space are derived from PC1 (green) and PC16 (blue) of the 41 Caucasian male faces. In each direction of each dimension, three faces were created at a distance of 0.1, 0.17, or 0.24 from the average face, with distance defined as the Euclidean distance between two faces in the 37-dimensional face space as a fraction of the mean head radius of the original 41 faces. (For interpretation of the references to color in this figure legend, the reader is referred to the web version of this article.)
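Under this distance convention, and given unit-length PCs, a stimulus placed a given distance along a PC from the origin sits at exactly that Euclidean distance from the average face. The sketch below illustrates this with stand-in vectors; the names and values are hypothetical, not the stimulus-generation code.

```python
import numpy as np

def face_space_distance(face_a, face_b):
    """Euclidean distance between two faces in the 37-D face space.

    Coordinates are assumed to be expressed as fractions of the mean
    head radius, so the distance is in the same normalized units.
    """
    return float(np.linalg.norm(np.asarray(face_a) - np.asarray(face_b)))

def face_along_pc(origin, pc, distance, direction=+1):
    """A face at a given distance from the origin along a unit-length PC."""
    return origin + direction * distance * pc

# Stand-in origin and unit PC vector
origin = np.zeros(37)
pc1 = np.eye(37)[0]
stim = face_along_pc(origin, pc1, 0.17)
```

With this construction, `face_space_distance(stim, origin)` recovers the intended stimulus distances (0.1, 0.17, or 0.24).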


2.3. Procedures

2.3.1. Face discrimination task

We used a block design for the face discrimination task. There were four types of block conditions: PC1+, PC1−, PC16+, and PC16−. Within each block, faces were from the same direction of the same dimension and differed only in their distance from the average face. Participants completed six face discrimination runs in the scanner. Each run contained 12 blocks, consisting of three repetitions of each of the four block types. The blocks were presented in a pseudo-random order so that adjacent blocks were always different. Within each block, there were 10 presentations of faces, with a 20% probability that a face would be identical to the previous one. Participants performed a one-back task in which they pressed a predefined key when the current facial identity matched the previous one. On average, participants achieved an accuracy of 0.83 (SD = 0.03) on the one-back task. Within a block, each presentation began with a fixation cross lasting 300 ms, followed by a face presented for 1200 ms. Each block was 15 s long. The face blocks were separated by resting periods of 15 s, during which participants were instructed to keep looking at a fixation cross at the center of the screen. Each run started and ended with a 15 s resting block, making the total length of each run 375 s. Following the face discrimination runs, participants completed one control run with the same task structure as the face discrimination runs, except that the face images were Fourier phase-scrambled. These phase-scrambled images have the same Fourier power spectrum as the face images but lack their spatial structure.

2.3.2. Functional localizer task

The functional localizer task had three types of block conditions: faces, houses, and Fourier phase-scrambled images. Participants completed two face localizer runs with a one-back task. Each run contained 12 blocks, consisting of four repetitions of each of the three block types. Each block had 10 presentations with a total length of 15 s. As in the face discrimination task, blocks were separated by 15 s resting periods, and each run started and ended with a 15 s resting period. The total length of each face localizer run was 375 s.

2.3.3. Retinotopic mapping

We ran a standard retinotopic mapping procedure (Warnking, 2002) with rotating wedges (one clockwise run and one counterclockwise run) and expanding/contracting rings.

2.3.4. Perceptual sensitivity

To measure perceptual sensitivity to changes of facial identity along PC1 and PC16, we tested a different group of ten adults (5 males, mean age = 30.6 years, SD = 7.5 years) with a delayed match-to-sample task using the method of constant stimuli. On each trial, following a 500 ms fixation cross, a target face was displayed for 500 ms. The target face was then replaced by a mask of white Gaussian noise, which lasted for 300 ms. After the noise mask, two faces were displayed side by side until the participant pressed a key to answer. The participant was instructed to press a predefined key to indicate which of the two faces was exactly the same as the target face. The target face differed from the distracter face on only one PC dimension, with the amount of difference varied at four levels (0.03, 0.06, 0.09, and 0.12). For each direction of each PC dimension, there were 15 repetitions at each difference level, giving 240 trials in total (2 PC dimensions (PC1 vs. PC16) × 2 directions (±) × 4 levels of difference × 15 repetitions). We calculated the proportion of correct responses at each level of difference on each PC dimension, averaged across the two directions of that dimension, and then fit Weibull functions to estimate the threshold level of difference at 75% accuracy.

2.4. MR acquisition

We collected all the data in a 3 T SIEMENS MAGNETOM TrioTim scanner (software version B17, Siemens, Erlangen, Germany) using a 12-channel phased-array head coil.

2.4.1. Anatomy

Anatomical data were collected using a high-resolution T1-weighted magnetization-prepared gradient-echo sequence (MP-RAGE) with the following parameters: 192 sagittal slices, TR = 1900 ms, TE = 2.5 ms, slice thickness = 1 mm, FA = 9°, FoV = 256 × 256 mm², matrix size = 256 × 256.

2.4.2. Functional

BOLD data were collected using an Echo Planar Imaging (EPI) sequence with the following parameters: 21 oblique slices covering the ventral part of the temporal lobe, TR = 1500 ms, TE = 30 ms, FA = 90°, slice thickness = 3 mm, in-plane resolution = 3 × 3 mm², zero gap, FoV = 210 × 210 mm², matrix size = 64 × 64, interleaved acquisition.

2.5. Preprocessing

We ran the following data preprocessing procedures using the FMRIB Software Library (FSL, version 4.1.6): (1) 3D motion correction using rigid-body translation and rotation via intra-modal volume linear registration (FSL command: mcflirt); (2) slice timing correction for interleaved acquisition using sinc interpolation (FSL command: slicetimer); and (3) skull stripping (FSL command: bet). For the face discrimination runs, no further preprocessing was performed. For the functional localizer runs and the retinotopic runs, we spatially smoothed the data with a Gaussian kernel with FWHM = 6 mm (FSL command: fslmaths). For each run, we discarded the first 5 volumes (7.5 s) to avoid magnetic saturation effects. We created a three-dimensional surface model of each individual brain from the T1-weighted high-resolution structural image using Freesurfer (Dale, Fischl, & Sereno, 1999; Fischl, Sereno, & Dale, 1999). To map the functional data to the cortical surface, we first aligned the functional images with the high-resolution structural image using a 12-degree-of-freedom affine transformation (FSL command: flirt); we then used the two-surface method in the SUMA software package to map the functional data to the cortical surface (Argall, Saad, & Beauchamp, 2006). The two-surface method maps the absolute maximum value along the segment connecting the white matter surface and the pial surface to the surface node, so that activation along the entire gray matter thickness is mapped.

2.6. Analysis

2.6.1. Functional localizer

We estimated the voxel-wise BOLD response amplitudes to each of the three categories (faces, houses, and scrambled images) by fitting the data with a general linear model (GLM) convolved with a hemodynamic response function (canonical difference-of-gammas HRF, SPM8) with temporal derivatives. We tested the face > house contrast to obtain the t statistical map for each individual. We identified voxels that had significantly higher response amplitudes for faces than for houses in lateral occipital and ventral temporal areas, with t thresholds

Table 1
Number of voxels and decoding accuracy (in parentheses) of each ROI.

Participant   lFFA          lOFA          rFFA          rOFA          lV1           rV1
M1            9 (0.389)     N.A.          41 (0.347)    364 (0.444)   364 (0.708)   438 (0.875)
F1            35 (0.306)    58 (0.375)    33 (0.403)    166 (0.306)   451 (0.625)   229 (0.667)
F2            110 (0.181)   536 (0.417)   522 (0.292)   97 (0.486)    295 (0.583)   235 (0.5)
M2            18 (0.347)    60 (0.333)    19 (0.403)    25 (0.347)    190 (0.611)   215 (0.583)
M3            134 (0.208)   106 (0.222)   197 (0.278)   68 (0.292)    216 (0.514)   290 (0.556)
F3            353 (0.417)   165 (0.611)   586 (0.389)   191 (0.514)   276 (0.944)   250 (0.944)
M4            120 (0.306)   166 (0.417)   117 (0.306)   337 (0.347)   413 (0.417)   364 (0.583)
F4            10 (0.389)    N.A.          20 (0.25)     N.A.          180 (0.458)   244 (0.361)

Note: The decoding accuracies were calculated in classifying four block conditions (PC1+, PC1−, PC16+, and PC16−). N.A.: no voxel survived an FDR-corrected t-test with q < 0.05.


corresponding to a false discovery rate (FDR) of q < 0.05. Table 1 summarizes the number of voxels in the face-selective areas for each individual. We used the face-selective areas identified here as region-of-interest (ROI) masks for the subsequent analyses. We also estimated the BOLD response amplitude (% signal change) to changes of facial identity along the different face space dimensions, averaged across the voxels in each ROI.

2.6.2. Retinotopic maps

We estimated the voxel-wise amplitude and phase of the BOLD signal at the stimulation frequency of the retinotopic mapping procedure by Fourier transformation. The hemodynamic response delay was adjusted by combining the estimated phase maps from stimuli moving in opposite directions (Warnking, 2002). The final phase map was thresholded to keep the voxels in the top 5% of the amplitude estimates at the stimulation frequency. Individual phase maps were mapped to their corresponding cortical surfaces. Boundaries between the retinotopic areas were then manually labeled following standard procedures (Warnking, 2002).

2.6.3. BOLD response patterns

We estimated the voxel-wise BOLD response amplitudes for each stimulus block by fitting a GLM convolved with a hemodynamic response function with temporal derivatives to the preprocessed time series. Unlike in the localizer runs, where we estimated a BOLD response amplitude for each category, here we estimated the BOLD response amplitude for each experimental block. There were 72 beta coefficients in total for each voxel, representing the estimated BOLD response amplitudes for 18 blocks (3 repetitions × 6 runs) of each of the four block conditions (PC1+, PC1−, PC16+, and PC16−). The resulting beta coefficient maps were used as patterns for the subsequent multi-voxel pattern analysis.

2.6.4. Multi-voxel pattern analysis (MVPA)

For each of the four face-selective ROIs, we conducted MVPA on the beta coefficient maps of the selected voxels using the Matlab-based Princeton MVPA toolbox with a linear Support Vector Machine (SVM) classifier with c = 1 (Chang & Lin, 2011). Before submitting the data to MVPA, we normalized the data to zero mean and unit standard deviation. Decoding accuracy was calculated through an 18-fold leave-one-out cross-validation. We separated the beta maps by their corresponding block conditions (PC1+, PC1−, PC16+, and PC16−) and sorted them according to the temporal order of the blocks. We then grouped the beta maps by their temporal ranks, so that beta maps from different block conditions with the same temporal rank fell in the same group, yielding 18 groups in total, each containing one beta map for each of the four block conditions. For each cross-validation iteration, we trained the SVM on the data from 17 groups and tested on the remaining group. The final accuracy was calculated by averaging the prediction accuracy across the 18 cross-validation iterations (Table 1). We also estimated the prediction accuracy on the control run, in which participants only saw phase-scrambled face images. For the control run, we trained the SVM classifier on the 18 groups of beta maps from the face discrimination blocks and tested the classifier on the beta maps of the control run. The mean prediction accuracy for the control run in each of the four ROIs did not differ significantly from chance (0.25), ps > 0.05. The null results for the control run suggest that the Fourier power spectrum of the face images does not provide information for decoding the changes of facial identity along different PC dimensions. However, differences in low-level image features such as local contrast and edges may still provide information for decoding. To test this possibility, we ran an MVPA analysis with voxels (Table 1) from the primary visual cortex (V1).
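The cross-validation scheme described above (18 temporal-rank folds, one pattern per condition per fold, linear SVM with c = 1, voxel-wise normalization) can be sketched as follows. This is a hedged Python/scikit-learn re-implementation on synthetic stand-in data; the study itself used the Matlab-based Princeton MVPA toolbox, and the data shapes here are assumptions.

```python
import numpy as np
from sklearn.svm import SVC

def mvpa_decoding_accuracy(betas, labels, groups, C=1.0):
    """Leave-one-group-out linear-SVM decoding.

    betas  : (n_blocks, n_voxels) array of beta-coefficient patterns
    labels : block condition of each pattern (e.g. 'PC1+', 'PC16-')
    groups : temporal rank of each block; each rank holds one block per
             condition and serves as one cross-validation fold
    """
    # Normalize each voxel to zero mean and unit standard deviation
    betas = (betas - betas.mean(axis=0)) / betas.std(axis=0)
    accuracies = []
    for g in np.unique(groups):
        train, test = groups != g, groups == g
        clf = SVC(kernel="linear", C=C)
        clf.fit(betas[train], labels[train])
        accuracies.append(clf.score(betas[test], labels[test]))
    return float(np.mean(accuracies))

# Synthetic stand-in data: 72 blocks (18 temporal ranks x 4 conditions),
# 50 voxels, each condition with its own mean pattern plus noise.
rng = np.random.default_rng(1)
conditions = np.array(["PC1+", "PC1-", "PC16+", "PC16-"])
cond_idx = np.tile(np.arange(4), 18)
labels = conditions[cond_idx]
groups = np.repeat(np.arange(18), 4)
base = rng.normal(size=(4, 50))
betas = base[cond_idx] + 0.3 * rng.normal(size=(72, 50))
acc = mvpa_decoding_accuracy(betas, labels, groups)
```

On clearly separable synthetic patterns the accuracy approaches 1.0; on real beta maps it would be compared against the 0.25 chance level for four conditions.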

3. Results

3.1. Perceptual sensitivity

In the behavioral test, on average, a difference of 0.08 was needed to achieve 75% accuracy in detecting changes of facial identity on PC1, while a difference of only 0.06 was needed on PC16 to achieve the same accuracy. This result indicates that observers are more sensitive to physical changes on PC16 than on PC1 (p = 0.02, two-tailed t-test).

3.2. BOLD response amplitude

In Fig. 2, we plot the percentage of BOLD signal change, averaged across voxels in each ROI, in response to changes of facial identity in each direction (±) of PC1 and PC16. For each ROI, we ran a repeated-measures ANOVA on the BOLD response amplitude


Fig. 2. BOLD responses to changes of facial identity in each direction (±) of PC1 and PC16 in the face-selective ROIs.

with dimension (PC1 vs. PC16) and direction (+ vs. −) as repeated measures. For all four ROIs, none of the main effects or interactions was significant (all ps n.s.). The results suggest that changes of facial identity along different face space dimensions do not modulate BOLD response amplitude in the face-selective areas. To test whether such changes modulate BOLD response magnitudes in areas beyond the face-selective ROIs, we ran a whole-brain univariate general linear model analysis with subject as a random effect in the model. No voxel survived an FDR-corrected threshold of q = 0.05. This result suggests that BOLD response magnitudes at the single-voxel level do not provide reliable information to differentiate changes of facial identity along different face space dimensions in any brain area.

3.3. Multi-voxel pattern analysis

Using MVPA, we found that the patterns of activation in all four ROIs provided information for decoding the block conditions (PC1+, PC1−, PC16+, and PC16−). The mean decoding accuracies for voxels in all four ROIs were significantly higher than chance (0.25), as indicated by one-tailed t-tests (mean = 0.32, 0.40, 0.33, 0.39; p = 0.02, 0.01, 0.003, 0.003, uncorrected, for lFFA, lOFA, rFFA, and rOFA, respectively; Fig. 3). The decoding accuracy of lFFA became marginally significant (p = 0.08) after Bonferroni correction, while the others remained significant at the p < 0.05 level. To test whether the patterns of activation in different ROIs provide redundant or complementary information for encoding the face space dimensions, we calculated the MVPA decoding accuracy with an ROI mask that contained the voxels of all four of the original face-selective ROIs. The decoding accuracy of the combined ROI was significantly higher than chance (0.41, p = 0.003, one-tailed t-test). However, the decoding accuracy of the combined ROI did not differ significantly from the decoding accuracy of any of the four face-selective ROIs (ps > 0.05, corrected for multiple comparisons). To test whether low-level image features such as local contrast and edges provide information for discriminating the changes of facial identity along different PC dimensions, we ran an MVPA analysis with voxels from V1. Interestingly, BOLD response patterns in V1 could also be decoded at above-chance accuracies (left V1, mean accuracy = 0.61; right V1, mean accuracy = 0.63; ps < 0.01, one-tailed t-tests against chance). We also analyzed the pixel-level differences among face images representing changes on different PC dimensions. For each stimulus block, we calculated an average image of the ten stimulus presentations, with the location of each image randomly jittered by one degree of visual angle (44 pixels). We applied an SVM classifier to the resulting average images, representing pixel-level changes in each direction of each PC dimension. The classifier achieved an above-chance average accuracy of 28.1% across ten


Fig. 3. MVPA decoding accuracy in the face-selective ROIs. The dashed line represents chance level (0.25). *p < 0.05; **p < 0.01 (one-tailed t-tests against chance, corrected for multiple comparisons).

Fig. 4. MVPA decoding accuracy for PC1 and PC16. The dashed line represents chance level (0.5). *p < 0.05 (two-tailed t-tests between decoding accuracies for PC1 and PC16).

simulated runs (SD = 2.25%, p < 0.01, one-tailed t-test against chance). However, if we aligned the face images in each block by removing the location jitter, the decoding accuracy of the classifier rose to 100%. We also ran MVPA for PC1 and PC16 separately to calculate the accuracy in classifying changes of facial identity in the two directions (±) within each dimension. The decoding accuracies for both PC1 and PC16 in all four ROIs were above chance (ps < 0.01, corrected for multiple comparisons; Fig. 4). The decoding accuracy between the two directions of PC16 was significantly higher than that between the two directions of PC1 in lOFA (p < 0.05, corrected for multiple comparisons). There was no significant difference between the decoding accuracies for PC1 and PC16 in any other ROI.
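The pixel-level control described above can be sketched as follows: each block's ten jittered presentations are averaged on a larger canvas before classification, which smears out identity-specific pixel structure. This is a hypothetical re-implementation for illustration; the image sizes and contents are stand-ins, not the analysis code.

```python
import numpy as np

def block_average_image(image, n_presentations=10, jitter=44, rng=None):
    """Average of spatially jittered presentations of one stimulus image.

    Each presentation is shifted by up to `jitter` pixels (about one
    degree of visual angle in the study) in x and y on a blank canvas.
    """
    rng = rng if rng is not None else np.random.default_rng()
    h, w = image.shape
    canvas = np.zeros((h + 2 * jitter, w + 2 * jitter))
    for _ in range(n_presentations):
        dy, dx = rng.integers(-jitter, jitter + 1, size=2)
        canvas[jitter + dy : jitter + dy + h,
               jitter + dx : jitter + dx + w] += image
    return canvas / n_presentations

# A stand-in 64x64 "face"; the jittered average preserves total image
# energy but blurs spatial structure, which is why classifying jittered
# averages is much harder than classifying aligned images.
rng = np.random.default_rng(0)
face = rng.random((64, 64))
avg = block_average_image(face, rng=rng)
```

Removing the jitter (shift of zero on every presentation) would reproduce the aligned case, where the average equals the original image and pixel-level classification becomes trivial.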

4. Discussion

The face space hypothesis holds that individual facial identities are encoded as distances (from the origin) and directions in a multidimensional space. A previous study (Loffler et al., 2005) demonstrated that the distance between an individual face and the origin of the face space (the average face) is encoded as BOLD response amplitude in the FFA. Here we show that directions in the face space, as represented by the face space dimensions, are encoded as patterns of neural activity in the face-selective areas, including the FFA and OFA. The current findings confirm the role of the FFA and OFA in encoding individual facial identities. The fact


that multi-voxel response patterns, but not single-voxel response magnitudes, differed for different face space dimensions indicates that information regarding face space dimensions is encoded in a spatially distributed way. The current findings of distributed neural representations of face space dimensions are consistent with a recent study showing that cells in the middle face patch of macaque monkeys are selective to geometric changes of schematic faces along different feature dimensions (Freiwald, Tsao, & Livingstone, 2009). Although the nature of the multi-voxel patterns measured by fMRI is still not well understood (Op de Beeck, 2010; Kriegeskorte, Cusack, & Bandettini, 2010), one possible explanation is that the current findings reflect an uneven distribution of cells tuned to different face space dimensions. As a result, voxels sampled by fMRI show biases toward different face space dimensions.

Previous studies have shown that individual facial identities are encoded as patterns of neural activation in distributed brain areas (Kriegeskorte et al., 2007; Natu et al., 2010; Nestor et al., 2011). The current study had two advantages over these studies. In the previous studies, facial identities were kept constant despite changes on other dimensions (e.g., facial expression (Nestor, Plaut, & Behrmann, 2011) or point of view (Natu et al., 2010)). Therefore, the neural activities represent the encoding of fixed facial identities. In the current study, the facial identities always varied along a certain face space dimension within a block, with the amount of variation as large as the difference between two facial identities (the maximum variation was at the 34th percentile of the pairwise differences among the original 41 facial identities). Therefore, the neural activities did not represent fixed facial identities as points in the face space structure.
Instead, what was measured in the current study were the neural responses to changes of facial identities along face space dimensions. This methodology allowed us to investigate the neural representation of the dimensions forming the face space structure rather than the neural representation of individual facial identities. Another advantage is that defining the face space dimensions based on PCA allowed us to link image statistical properties to perceptual and neural sensitivities. We studied changes of facial identities along two dimensions, with PC1 explaining eight times as much of the variance in the face set as PC16. We found that observers were more sensitive to physical changes on PC16 than on PC1. This result is not surprising if we scale the discrimination thresholds relative to the variance on PC1 and PC16. The standard deviation of the original 41 faces was 0.081 on PC1 and 0.022 on PC16. Therefore, the relative thresholds, when scaled to the corresponding standard deviation on each PC, were 0.97 for PC1 and 2.72 for PC16. Observers could detect changes of less than 1 standard deviation on PC1, whereas they needed a difference of more than 2 standard deviations to detect changes on PC16. We found that the neural sensitivity to PC1 and PC16 was consistent with the perceptual sensitivity. In left OFA, the decoding accuracy was significantly higher for changes on PC16 than on PC1 when the same amount of physical difference was available. However, we did not see such a difference between PC1 and PC16 in the other three face-selective ROIs. The current findings suggest that, for the same amount of physical difference, there is higher perceptual and neural sensitivity to changes on PC dimensions with smaller eigenvalues. However, this conclusion is limited by the fact that only one PC dimension with a high eigenvalue and one with a low eigenvalue were used.
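The SD-scaling computation above can be sketched numerically. The absolute thresholds below are back-computed from the reported ratios (0.97 and 2.72) for illustration; they are not figures reported in the paper:

```python
# Standard deviations of the 41 original faces along each PC (values from the text).
sd = {"PC1": 0.081, "PC16": 0.022}

# Approximate absolute discrimination thresholds, back-computed from the
# reported SD-scaled values; illustrative, not measured data.
threshold = {"PC1": 0.97 * 0.081, "PC16": 2.72 * 0.022}

# Scale each threshold by how much real faces vary along that dimension.
relative = {pc: threshold[pc] / sd[pc] for pc in sd}

print(relative)  # PC1 ~ 0.97 SD (sub-1-SD), PC16 ~ 2.72 SD (> 2 SD)
```

The point of the normalization is that a physically small threshold on PC16 is large relative to how little real faces vary on that dimension.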
We cannot rule out the possibility that the relation between perceptual and neural sensitivity to PC dimensions and the eigenvalues of those dimensions is not monotonic.

The fact that pooling all the voxels in the four ROIs did not increase the MVPA decoding accuracy suggests that similar



information regarding the face space dimensions is present in all four ROIs as distributed patterns. Different ROIs may carry redundant information about changes on different face space dimensions. It is possible that these face-selective areas carry the same information but use it at different stages of processing. Because pooling the voxels in all four ROIs increased the dimensionality of the observations, more observations than we currently have might be required to train the classifier to the same level of decoding accuracy. On the other hand, the result might simply reflect a ceiling effect on the sensitivity of the methodology used in the current study, which could be limited by the spatial and temporal resolution of fMRI.

The fact that information can be extracted from early visual areas for accurate decoding of changes of facial identity along different PC dimensions suggests that low-level features of the stimuli may provide information for discrimination among these changes. The pixel-level image analysis provided evidence for the existence of such low-level features, although decoding accuracy based on them was low when the locations of the images were jittered. However, since the participants were allowed to view the images freely, the effect of location jittering may have been compromised by eye movements. If the images were well aligned, a classifier was able to decode the face images at 100% accuracy; therefore, higher decoding accuracy based on low-level features could be achieved if the images were realigned through eye movements. To remove this potential confound of low-level features, future studies could vary the size or the point of view of the face images. On the other hand, it is possible that face-specific information in V1 reflects a feedback mechanism from higher-level visual areas.
Such feedback connections from higher visual areas to the early visual areas have been suggested in previous studies (Bar, 2003; Galuske, Schmidt, Goebel, Lomber, & Payne, 2002; Hupé et al., 1998; Rossion, Dricot, Goebel, & Busigny, 2011).

A previous study (Loffler et al., 2005) showed that BOLD responses to faces in the FFA increase with increasing distance between the face and the average face. It would also be interesting to investigate whether distance in the face space affects BOLD response patterns. Although we had faces at three levels of distance from the average face, because of the nature of the block design we were not able to separate the patterns of responses for faces at different distances from the average face. Therefore, we were not able to investigate how changes in BOLD response patterns are related to changes in the distance from the average face. Future studies with an event-related design would be able to provide more information on how neural response patterns encode distance in the face space structure.
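The role of location jitter in the pixel-level analysis discussed above can be illustrated with a toy sketch (the image size and shift amount here are arbitrary): perfectly aligned copies of an image correlate at 1.0 at the pixel level, while even a small positional shift destroys the pixel-wise correspondence that a pixel-based classifier exploits.

```python
import numpy as np

rng = np.random.default_rng(0)

# A synthetic "image": random pixel values stand in for a face stimulus.
img = rng.normal(size=(32, 32))

def pixel_corr(x, y):
    """Pearson correlation between two images at the pixel level."""
    return np.corrcoef(x.ravel(), y.ravel())[0, 1]

# Perfectly aligned copies correlate at 1.0 ...
aligned = pixel_corr(img, img)

# ... while a small positional jitter (here a circular 3-pixel shift)
# breaks the pixel-wise correspondence.
jittered = pixel_corr(img, np.roll(img, shift=(3, 3), axis=(0, 1)))

print(aligned, jittered)
```

This is why free viewing matters: eye movements can effectively realign the images, restoring the pixel-level correspondence that the jitter was intended to remove.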

5. Conclusions

We defined face space dimensions based on statistical regularities in a group of Caucasian male faces. We found that changes of facial identities along different face space dimensions were represented as distributed neural response patterns in the FFA and the OFA. Within the current stimulus set, perceptual sensitivities to changes of facial identities were linked to statistical properties of the face space dimensions, such that human observers were more sensitive to physical changes on dimensions along which faces vary less. The same pattern was present in neural sensitivity: higher neural sensitivity to changes of facial identity was observed on dimensions along which faces vary less. Collectively, the current findings provide evidence for the representation of face space dimensions in face-selective cortical areas.

Acknowledgments

This research was supported in part by CIHR grant #172103 and a grant from the Canadian Institute for Advanced Research to HRW.

References

Argall, B. D., Saad, Z. S., & Beauchamp, M. S. (2006). Simplified intersubject averaging on the cortical surface using SUMA. Human Brain Mapping, 27, 14–27.
Bar, M. (2003). A cortical mechanism for triggering top-down facilitation in visual object recognition. Journal of Cognitive Neuroscience, 15, 600–609.
Barton, J. J. S., Press, D. Z., Keenan, J. P., & O'Connor, M. (2002). Lesions of the fusiform face area impair perception of facial configuration in prosopagnosia. Neurology, 58, 71–78.
Bouvier, S. E., & Engel, S. A. (2006). Behavioral deficits and cortical damage loci in cerebral achromatopsia. Cerebral Cortex, 16, 183–191.
Brainard, D. H. (1997). The psychophysics toolbox. Spatial Vision, 10, 433–439.
Chang, C. C., & Lin, C. J. (2011). LIBSVM: a library for support vector machines. ACM Transactions on Intelligent Systems and Technology, 2, 27.
Dale, A. M., Fischl, B., & Sereno, M. I. (1999). Cortical surface-based analysis. I. Segmentation and surface reconstruction. NeuroImage, 9, 179–194.
Fischl, B., Sereno, M. I., & Dale, A. M. (1999). Cortical surface-based analysis. II: inflation, flattening, and a surface-based coordinate system. NeuroImage, 9, 195–207.
Freiwald, W. A., Tsao, D. Y., & Livingstone, M. S. (2009). A face feature space in the macaque temporal lobe. Nature Neuroscience, 12, 1187–1196.
Galuske, R. A. W., Schmidt, K. E., Goebel, R., Lomber, S. G., & Payne, B. R. (2002). The role of feedback in shaping neural representations in cat visual cortex. Proceedings of the National Academy of Sciences of the United States of America, 99, 17083–17088.
Gao, X., & Maurer, D. (2011). A comparison of spatial frequency tuning for the recognition of facial identity and facial expressions in adults and children. Vision Research, 51, 508–519.
Gauthier, I., Tarr, M. J., Moylan, J., Skudlarski, P., Gore, J. C., & Anderson, A. W. (2000). The fusiform "face area" is part of a network that processes faces at the individual level. Journal of Cognitive Neuroscience, 12, 495–504.
Gold, J., Bennett, P. J., & Sekuler, A. B. (1999). Identification of band-pass filtered letters and faces by human and ideal observers. Vision Research, 39, 3537–3560.
Grill-Spector, K., Kushnir, T., Edelman, S., Avidan, G., Itzchak, Y., & Malach, R. (1999). Differential processing of objects under various viewing conditions in the human lateral occipital complex. Neuron, 24, 187–203.
Grill-Spector, K., Knouf, N., & Kanwisher, N. (2004). The fusiform face area subserves face perception, not generic within-category identification. Nature Neuroscience, 7, 555–562.
Hancock, P. J. B., Burton, A. M., & Bruce, V. (1996). Face processing: human perception and principal components analysis. Memory and Cognition, 24, 26–40.
Hupé, J. M., James, A. C., Payne, B. R., Lomber, S. G., Girard, P., & Bullier, J. (1998). Cortical feedback improves discrimination between figure and background by V1, V2 and V3 neurons. Nature, 394, 784–787.
Kanwisher, N., McDermott, J., & Chun, M. M. (1997). The fusiform face area: a module in human extrastriate cortex specialized for face perception. The Journal of Neuroscience, 17, 4302–4311.
Kriegeskorte, N., Cusack, R., & Bandettini, P. (2010). How does an fMRI voxel sample the neuronal activity pattern: compact-kernel or complex spatiotemporal filter? NeuroImage, 49, 1965–1976.
Kriegeskorte, N., Formisano, E., Sorger, B., & Goebel, R. (2007). Individual faces elicit distinct response patterns in human anterior temporal cortex. Proceedings of the National Academy of Sciences of the United States of America, 104, 20600–20605.
Leopold, D. A., Rhodes, G., Müller, K. M., & Jeffery, L. (2005). The dynamics of visual adaptation to faces. Proceedings of The Royal Society B: Biological Sciences, 272, 897–904.
Loffler, G., Yourganov, G., Wilkinson, F., & Wilson, H. R. (2005). fMRI evidence for the neural representation of faces. Nature Neuroscience, 8, 1386–1390.
Maurer, D., Le Grand, R., & Mondloch, C. J. (2002). The many faces of configural processing. Trends in Cognitive Sciences, 6, 255–260.
Näsänen, R. (1999). Spatial frequency bandwidth used in the recognition of facial images. Vision Research, 39, 3824–3833.
Natu, V., Jiang, F., Narvekar, A., Keshvari, S., Blanz, V., & O'Toole, A. J. (2010). Dissociable neural patterns of facial identity across changes in viewpoint. Journal of Cognitive Neuroscience, 22, 1570–1582.
Nestor, A., Plaut, D. C., & Behrmann, M. (2011). Unraveling the distributed neural code of facial identity through spatiotemporal pattern analysis. Proceedings of the National Academy of Sciences of the United States of America, 108, 9998–10003.
Op de Beeck, H. P. (2010). Probing the mysterious underpinnings of multi-voxel fMRI analyses. NeuroImage, 50, 567–571.
O'Toole, A. J., Deffenbacher, K., Abdi, H., & Bartlett, J. C. (1991). Simulating the "other-race effect" as a problem in perceptual learning. Connection Science, 3, 163–178.
Pelli, D. G. (1997). The VideoToolbox software for visual psychophysics: transforming numbers into movies. Spatial Vision, 10, 437–442.
Pitcher, D., Walsh, V., & Duchaine, B. (2011). The role of the occipital face area in the cortical face perception network. Experimental Brain Research, 209, 481–493.


Pitcher, D., Walsh, V., Yovel, G., & Duchaine, B. (2007). TMS evidence for the involvement of the right occipital face area in early face processing. Current Biology, 17, 1568–1573.
Rhodes, G., Jaquet, E., Jeffery, L., Evangelista, E., Keane, J., & Calder, A. J. (2011). Sex-specific norms code face identity. Journal of Vision, 11(1), 1–11.
Rossion, B., Caldara, R., Seghier, M., Schuller, A., Lazeyras, F., & Mayer, E. (2003). A network of occipito-temporal face-sensitive areas besides the right middle fusiform gyrus is necessary for normal face processing. Brain, 126, 2381–2395.
Rossion, B., Dricot, L., Goebel, R., & Busigny, T. (2011). Holistic face categorization in higher order visual areas of the normal and prosopagnosic brain: toward a non-hierarchical view of face perception. Frontiers in Human Neuroscience, 4, 225.
Rotshtein, P., Henson, R. N. A., Treves, A., Driver, J., & Dolan, R. J. (2005). Morphing Marilyn into Maggie dissociates physical and identity face representations in the brain. Nature Neuroscience, 8, 107–113.
Said, C. P., & Todorov, A. (2011). A statistical model of facial attractiveness. Psychological Science, 22, 1183–1190.
Sergent, J., Ohta, S., & MacDonald, B. (1992). Functional neuroanatomy of face and object processing: a positron emission tomography study. Brain, 115, 15–36.
Sirovich, L., & Kirby, M. (1987). Low-dimensional procedure for the characterization of human faces. Journal of the Optical Society of America A, 4, 519–524.


Steeves, J., Culham, J. C., Duchaine, B., Cavina, C., Valyear, K., Humphry, S. I., et al. (2006). The fusiform face area is not sufficient for face recognition: evidence from a patient with dense prosopagnosia and no occipital face area. Neuropsychologia, 44, 596–609.
Tanaka, J. W., & Farah, M. J. (1993). Parts and wholes in face recognition. The Quarterly Journal of Experimental Psychology Section A, 46, 225–245.
Turk, M., & Pentland, A. (1991). Eigenfaces for recognition. Journal of Cognitive Neuroscience, 3, 71–86.
Valentine, T. (1991). A unified account of the effects of distinctiveness, inversion, and race in face recognition. The Quarterly Journal of Experimental Psychology, 43, 161–204.
Warnking, J. (2002). fMRI Retinotopic mapping—step by step. NeuroImage, 17, 1665–1683.
Webster, M. A., Kaping, D., Mizokami, Y., & Duhamel, P. (2004). Adaptation to natural facial categories. Nature, 428, 557–561.
Wilson, H. R., Loffler, G., & Wilkinson, F. (2002). Synthetic faces, face cubes, and the geometry of face space. Vision Research, 42, 2909–2923.