NeuroImage xx (xxxx) xxxx–xxxx
Recognizing approaching walkers: Neural decoding of person familiarity in cortical areas responsive to faces, bodies, and biological motion ⁎
Carina A. Hahn⁎, Alice J. O'Toole
The University of Texas at Dallas, 800 W. Campbell Road, Richardson, TX 75080, USA
ARTICLE INFO

Keywords: Familiarity; Person motion; Spatiotemporal processing; Gait

ABSTRACT
In natural viewing environments, we recognize other people as they move through the world. Behavioral studies indicate that the face, body, and gait all contribute to recognition. We examined the neural basis of person recognition using a decoding approach aimed at discriminating the patterns of neural activity elicited in response to seeing visually familiar versus unfamiliar people in motion. Participants learned 30 identities by viewing multiple videos of the people in action. Recognition was tested inside a functional magnetic resonance imaging (fMRI) scanner using 8-s videos of 60 people (30 learned and 30 novel) approaching from a distance (~13 m). Full brain images were taken while participants watched the approach. These images captured neural activity at four time points (TRs) corresponding to progressively closer views of the walker. We used pattern classification techniques to examine familiarity decoding in lateralized ROIs and the combination of left and right (bilateral) regions. Results showed accurate decoding of familiarity at the farthest distance in the bilateral posterior superior temporal sulcus (bpSTS). At a closer distance, familiarity was decoded in the bilateral extrastriate body area (bEBA) and left fusiform body area (lFBA). The most robust decoding was found in the time window during which the average behavioral recognition decision was made – and when the face came into clearer view. Multiple regions, including the right occipital face area (rOFA), bOFA, bFBA, bpSTS, and broadly distributed face- and body-selective voxels in the ventral temporal cortex decoded walker familiarity in this time window. At the closest distance, the lFBA decoded familiarity. These results reveal a broad system of ventral and dorsal visual areas that support person recognition from face, body, and gait. 
Although the face has been the focus of most person recognition studies, these findings remind us of the evolutionary advantage of being able to differentiate the people we know from strangers at a safe distance.
1. Introduction

The process of recognizing someone often begins at a distance, before we are able to get a close look at the person's face. If someone approaches, successful social interaction requires us to determine, as quickly as we can, whether or not we know the person. We accomplish this task easily, even though the quality and resolution of the face, body, and gait vary continuously across distance. It makes sense, therefore, that for recognition we rely not only on the face but also on the body and gait (Burton et al., 1999; Hahn et al., 2016; O'Toole et al., 2011; Pilz et al., 2011; Rice et al., 2013a, 2013b; Robbins and Coltheart, 2015, 2012; Yovel and O'Toole, 2016). Although the face supports the most accurate recognition (Burton et al., 1999; Hahn et al., 2016; O'Toole et al., 2011; Robbins and Coltheart, 2012), recent studies point to an important role for the body in this process (Hahn et al., 2016; O'Toole et al., 2011; Rice et al., 2013a; Robbins and Coltheart, 2012; Simhi and Yovel, 2015). The body is especially
important when the person is distant or when the face is uninformative (Hahn et al., 2016; Rice et al., 2013a, 2013b). In addition to the face and body, the natural motions of a person's gait also provide identity cues that support above chance recognition performance (Cutting and Kozlowski, 1977; Loula et al., 2005; Simhi and Yovel, 2015; Stevenage et al., 1999). In natural viewing conditions, when all of these cues are available in the form of whole people in motion, behavioral studies demonstrate that faces and bodies are processed as a single holistic unit (Bernstein et al., 2014; Pilz et al., 2011; Robbins and Coltheart, 2012). Moreover, there is strong evidence to indicate that recognition is most accurate when all cues for identity from the face, body, and gait are available (Hahn et al., 2016; O'Toole et al., 2011; Pilz et al., 2011; Simhi and Yovel, 2015; Yovel and O'Toole, 2016). Although we use all cues when possible, what we know about the neural coding of person familiarity comes almost exclusively from studying neural responses to static images of the face. Multiple neural regions have been implicated in the recognition of newly learned,
⁎ Correspondence to: The University of Texas at Dallas, 800 W. Campbell Road, School of Behavioral and Brain Sciences, GR 4.1, Richardson, TX 75080, USA. E-mail address: [email protected] (C.A. Hahn).
http://dx.doi.org/10.1016/j.neuroimage.2016.10.042 Received 7 June 2016; Accepted 24 October 2016 Available online xxxx 1053-8119/ © 2016 Elsevier Inc. All rights reserved.
visually familiar faces (for reviews, see Gobbini and Haxby, 2007 and Natu and O'Toole, 2011). Models of familiar face recognition have divided these regions into core and extended systems of familiar face processing (Gobbini and Haxby, 2007; cf. also Haxby et al., 2000). The extended face-recognition system consists of cortical and subcortical regions that process emotional information and semantic knowledge about people who are successfully recognized. Core system areas in the ventral visual stream process the invariant face features useful for recognition, independent of associated semantic and emotional information. Our interest was in examining neural responses to visually familiar, previously unknown, people. Therefore, we focused on these core regions in the present study. These regions include face-selective areas in the inferior occipital gyrus (Pitcher et al., 2011b; Puce et al., 1996) and in the lateral fusiform gyrus (Kanwisher et al., 1997). The posterior superior temporal sulcus (pSTS) in the dorsal visual stream is also posited as a core region, but with the function of processing dynamic or changeable information from faces (Allison et al., 2000; Gobbini and Haxby, 2007; Pitcher et al., 2014, 2011a).

It is well known that neural responses in face-selective regions of the core system are modulated by face familiarity. Most studies compare the magnitude of the neural response to visually familiar versus unfamiliar static faces. These studies have reported smaller responses to familiar than to unfamiliar faces in the fusiform face area (FFA) and occipital face area (OFA) (Dubois et al., 1999; Gobbini and Haxby, 2006; Kosaka et al., 2003; Leveroni et al., 2000; Rossion et al., 2003, 2001; Schwartz et al., 2003; though see Katanoda et al., 2000; Lehmann et al., 2004; Wiser et al., 2000; for a review, see Natu and O'Toole, 2011). In the dorsal visual stream, Gobbini and Haxby (2006) likewise found a smaller response in the right pSTS (rpSTS) to well-learned, visually familiar faces than to unfamiliar faces. Similarly, Bobes et al. (2013) found a reduced hemodynamic response to newly learned (visually familiar) faces, but not to unfamiliar faces, in the rpSTS. These studies point to an attenuated activation of neural resources in processing familiar versus unfamiliar faces in core system regions. Going beyond magnitude-based responses, one recent study used multi-voxel pattern analysis (MVPA) to decode familiarity from neural responses to faces that varied in the degree to which they were visually familiar (Natu and O'Toole, 2015). Above-chance decoding was found with a combination of face-selective voxels in ventral visual cortex, specifically in the OFA and FFA. Classification accuracy increased with increasing levels of face familiarity. Owing to the sensitivity of MVPA techniques, Natu and O'Toole (2015) showed that the neural response to familiar and unfamiliar people in the OFA and FFA is sensitive to the level of face familiarity.

By comparison to the face, considerably less is known about the neural coding of body familiarity. Candidate body-selective regions that might code familiarity from bodies include the extrastriate body area (EBA) (Downing et al., 2001; Vangeneugden et al., 2014) in the posterior region of the middle temporal gyrus and the fusiform body area (FBA) (Peelen and Downing, 2005a), a region adjacent to and partially overlapping the FFA (Peelen and Downing, 2005a; Schwarzlose et al., 2005). The neural response to familiar versus unfamiliar bodies has been compared in only one study. Hodzic et al. (2009) found larger responses to visually familiar than to unfamiliar bodies in the right FBA, but not in the EBA. On the neural coding of person familiarity from biological motion, almost nothing is known. Sensitivity to biological motion in the pSTS (Allison et al., 2000; Grossman and Blake, 2002; Herrington et al., 2011; Pitcher et al., 2014, 2011a; Vangeneugden et al., 2014) makes that region a candidate area for coding person familiarity from gait and body dynamics. Although responses in the pSTS are modulated by the familiarity and identity of static faces (Bobes et al., 2013; Fox et al., 2009; Gobbini and Haxby, 2006), no studies have examined the sensitivity of the pSTS to familiarity using dynamic stimuli of any sort (faces, bodies, or gait). The strong preference of the pSTS for dynamic over static stimuli, therefore, leaves open the question of its role in signaling person familiarity from biological motion cues.

The goal of the present study was to examine the neural signaling of visual familiarity when cues from the face, body, and biological motion are available simultaneously. The visual system has access, in this case, to temporally varying information from the face, body, and gait. We used a video showing the approach of another person from a distance, a commonly experienced context for recognition. We probed the spatiotemporal course of the neural signaling of familiarity and asked: When during the approach, and where in the brain, does the neural response differentiate visually familiar from unfamiliar people? The relative quality and diagnostic value of the diverse identity cues from the face, body, and gait vary with distance (Hahn et al., 2016). Therefore, we predicted that multiple brain regions responsive to faces, bodies, and biological motion would contribute to recognition. We also expected the relative contribution of different regions to vary across viewing distances as the walker approached. Our strategy was to familiarize participants with a set of identities outside of the scanner in a familiarization session and to test for recognition inside of the scanner. We submitted voxels from multiple regions of interest (ROIs) to a pattern classifier. Over the time course of the person approaching the camera (four separate time points), we measured the discriminability of neural activity patterns elicited in response to viewing familiar and unfamiliar people. The goal was to determine where, and when, accurate familiarity coding occurs across this network of high-level visual regions responsive to faces, bodies, and biological motion.
2. Method

2.1. Participants
A total of 20 participants volunteered to participate. One female participant requested to be removed from the scanner during the structural scan and did not participate in the rest of the experiment. Therefore, we included 19 participants in the analysis (mean age=25.05; 13 female). All participants were healthy, cognitively normal English speakers. Seventeen of the 19 participants self-identified as right-handed, and two self-identified as ambidextrous. Each participant was compensated $45 in cash for his or her time. Consent was obtained following standard IRB protocol for both The University of Texas at Dallas and The University of Texas Southwestern Medical Center.

2.2. Experimental stimuli
Videos for both the familiarization session and the recognition test came from the Human ID Database (O'Toole et al., 2005). We selected 60 identities from the database (half male/half female). For familiarization, we selected four videos of each of 30 identities from the same filming session. Each video showed a person performing one of four actions without audio: 1) smiling, 2) talking, 3) walking toward the camera, or 4) rotating his/her head 180° (Fig. 1). Multiple exposures to each identity, in both close-up face views and whole-body views, allowed for the formation of a robust representation of each person. This allowed for accurate and generalized recognition in novel viewing conditions (Hahn et al., 2016; Roark et al., 2006). For the recognition test, a second video of each person walking toward a camera was selected. An additional 30 identities were selected to serve as test stimuli for unfamiliar people. Each person was shown walking toward the camera from a distance of approximately 13.6 m, until they veered off from view to the left of the camera. Videos in the recognition test were trimmed to be exactly 8 s in duration. All videos were shown without audio. These recognition test videos were filmed in
different sessions than those used for familiarization. These sessions spanned days or weeks. Therefore, participants could not use cues such as hair, clothes, etc., when making recognition decisions. Example frames from a test video appear in Fig. 2. The most distant segment of the video shows the outline of the body, with good information about body shape and stature. The gait of the walker is also clearly perceptible at this distance, but there is very limited facial detail. The second segment again shows body shape and gait, with the face becoming clearer. There is still no detailed information available in the face. In the third segment, the face is visible in more detail, and the quality of body shape and gait cues declines, as parts of the walker are now out of the camera view. Finally, by the fourth segment, the body and face are only partially in view.

Fig. 1. Example frames from videos used for the familiarization session. Participants saw each person perform all four motion-based actions: A) smiling, B) talking, C) walking toward the camera, and D) rotating his/her head 180°.

2.3. Procedure
We modeled the procedure from a previous behavioral study that tested recognition from the face, body, and whole people over varying distances (Hahn et al., 2016), adapting the procedure for the scanner. Participants were first familiarized with a set of identities using videos. Next, in the scanner, we localized voxels for analysis and tested participants for recognition while they were being scanned.
2.3.1. Familiarization session
Outside of the scanner, participants viewed all 30 identities performing each of the four motion types once, for a total of 120 trials (30 people × 4 motion types). Presentation order was completely
Fig. 2. Example frames from each segment of a video stimulus used in the recognition test. The first, middle, and last frames from each TR are shown. All 8-s long videos were shown uninterrupted. Videos depicted a person (familiar or unfamiliar) walking toward the camera. A full brain image was collected every 2 s during the approach. Frames shown correspond to the approximate distance of the walker captured in each of the TRs.
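The correspondence between video time and brain images described in this caption (an 8-s video sampled with a 2-s TR yields four full brain images) can be sketched as follows. This is an illustration only: the frame rate (assumed 30 fps here) is hypothetical and not taken from the stimulus files.

```python
# Assign frames of an 8-s test video to the four 2-s TR windows.
# Assumed: 30 fps (hypothetical); TR = 2 s; 8-s video -> 4 full brain images.
FPS, TR_S, VIDEO_S = 30, 2, 8
frames_per_tr = FPS * TR_S                      # 60 frames per TR window

tr_windows = {
    tr + 1: range(tr * frames_per_tr, (tr + 1) * frames_per_tr)
    for tr in range(VIDEO_S // TR_S)
}

# TR1 covers the most distant view; TR4 the closest, partial view
print({tr: (w.start, w.stop - 1) for tr, w in tr_windows.items()})
```

In practice, the hemodynamic response lags the stimulus, so the volume acquired in a given TR reflects somewhat earlier visual input; the figure pairs each TR with the approximate walker distance only.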
randomized. The entire familiarization procedure lasted approximately 18 minutes.

2.3.2. Structural scan
A high-resolution MP-RAGE structural scan lasting about three minutes was obtained (see Section 2.4 below for details).

2.3.3. Localizer session
The purpose of the localizer session was to localize voxels that were selectively responsive to faces and bodies. The localizer procedure was adapted from Haxby et al. (2001) and was identical to that used in previous work (cf. the Methods section of Natu and O'Toole, 2015). In this session, completed within a single run, participants viewed 12-s blocks of different stimulus classes of grayscale images: faces, objects, headless bodies, or scrambled images. Stimulus class block order was randomized. Within each block, one stimulus class was shown in three consecutive presentations of 12 images, in random order. Images appeared for 800 ms, with a 200-ms inter-stimulus interval. A 10-s fixation followed each block. This process was replicated six times. To encourage participants to focus on the stimuli, they performed a 1-back task in which they indicated whether the image currently on the screen was identical to the preceding image.

2.3.4. Recognition test
Following the localizer session, participants completed the recognition test during a single run. Participants viewed videos of 60 identities (half familiar/half unfamiliar) in randomized order in an event-related design. The 8-s videos were shown one at a time, with 10 s of fixation between videos to allow the hemodynamic response to return to baseline before the onset of the following trial. Because the scanning TR was 2 s, each 8-s video allowed for four full brain images. For each trial, participants were instructed to indicate whether the person in the video was familiar or unfamiliar "as soon as [they felt] confident." Participants could respond at any point during the time course of the video by pressing the appropriate button on a button box. Only one response was obtained per trial. Accuracy and reaction times were collected.

2.4. Imaging parameters
The functional imaging was done using a 3 T MR system (Achieva; Philips Medical Systems, Best, The Netherlands) with a 32-channel SENSE head coil at the University of Texas Southwestern Medical Center. Structural scans were high-resolution, T1-weighted, whole-brain MP-RAGE scans (TR=8.1 ms, TE=3.7 ms, voxel size=1×1×1 mm). Whole-brain functional scans were obtained with echo-planar imaging (EPI) transverse images (TR=2 s, TE=30 ms, flip angle=80°, FOV=220 mm, 38 slices, voxel size=3.44×3.44×4.00 mm). Scans were slice-time corrected, realigned, and co-registered using SPM8. No spatial normalization was performed.

2.5. Analysis

2.5.1. Voxel selection and ROI definitions
We examined activity in two types of voxel masks. In distributed masks, we functionally defined face- and body-selective collections of voxels broadly distributed across ventral-temporal cortex. In classically defined ROIs, we functionally localized face- and body-selective regions to examine the extent to which modular regions across the ventral-temporal and dorsal visual streams contributed to accurate familiarity decoding.

2.5.1.1. Distributed masks. The first type of mask consisted of face- and body-selective voxels distributed across the ventral-temporal cortex (VTC). These are considered distributed because they are not anatomically constrained beyond the requirement that they are included in VTC. This mask has been used in previous studies and can be used to examine how ensemble regions code faces and bodies (cf. Natu et al., 2011, 2010; Natu and O'Toole, 2015; O'Toole et al., 2014). To locate face-selective voxels, a whole-brain ANOVA was computed, separately for each subject, using the contrast faces > objects + scrambled images (p < .00001, uncorrected). Voxels in parietal and frontal lobe areas were removed, retaining only voxels in ventral and temporal cortex. We created an analogous body-selective mask by localizing the voxels that responded more to bodies than to objects or scrambled images (p < .00001, uncorrected).

This classical way to localize face- and body-selective voxels involves a contrast with objects or scrambled images (e.g., Kanwisher et al., 1997). This method does not explicitly eliminate voxels responsive to bodies in face-selective regions or voxels responsive to faces in body-selective regions. To examine the extent to which dedicated face- and body-responsive voxels contributed to accurate classification, we excluded overlapping voxels in a similar manner as done in previous studies (Bernstein et al., 2014; Brandman and Yovel, 2016).1 For the face-specific mask, we began with the face-selective mask of voxels and excluded any voxel that was also responsive to bodies at p < .05. Similarly, the body-specific mask consisted of body-selective voxels, eliminating any voxels that were also responsive to faces at p < .05. This produced a collection of distributed face- and body-specific voxels across VTC.

1 Bernstein et al. (2014) and Brandman and Yovel (2016) refer to these areas as face-exclusive and body-exclusive. We refer to them here as face-specific and body-specific, which may entail less ambiguity about which voxels are excluded.

2.5.1.2. Classically defined ROIs. We also examined familiarity coding in classically defined regions implicated in the processing of faces, bodies, and bodies in motion. All ROIs were localized in native space. We functionally localized the OFA and FFA using the same contrast applied for the distributed mask of face-selective voxels (faces > objects + scrambled images). These functionally defined ROIs were required to be at least two contiguous, side-by-side voxels, originating at the peak voxel (based on the F ratio from the contrast analysis). Clusters were restricted to a maximum radius of 8 mm, following previous work (Peelen and Downing, 2005b; Ross et al., 2014). Anatomical constraints were applied as follows. For the OFA, we found the peak face-selective cluster on the lateral surface of the inferior occipital gyrus. The FFA was defined as the peak face-selective cluster on the middle, lateral fusiform gyrus. For body-selective ROIs, we defined two regions: the EBA and the FBA. These regions were obtained using the same contrast applied for the distributed mask of body-selective voxels (bodies > objects + scrambled images). We defined the EBA as the peak body-selective cluster in extrastriate cortex, on the lateral occipital lobe (e.g., Brandman and Yovel, 2016; Pitcher et al., 2012). The FBA was defined as the peak body-selective cluster on the fusiform gyrus. The pSTS was an anatomically localized sphere with a 10-mm radius centered on the posterior portion of the superior temporal sulcus, following previous work (O'Toole et al., 2014).

Table 1 shows a summary of the ROIs tested, including the average number of voxels and the MNI coordinates of the response peak for each classically defined ROI. Compared to some studies, the average number of voxels found per ROI is low, due to a high functional threshold (p < .00001, uncorrected) and relatively large voxel sizes. This number is consistent, however, with other studies that have used similar scanner parameters and functional ROI definitions (cf. Natu and O'Toole, 2015). The distributed masks are larger than the classically defined ROIs, as would be expected given the less anatomically constrained operational definition of the distributed regions. This larger size does not assure a classifier advantage over the smaller classically defined
ROIs, however, because the quality of the information in the region, rather than overall size, is the major factor in determining classification performance.

Table 1
Mean number of voxels and peak MNI coordinates of functionally and anatomically localized face- and body-selective regions. Values are means, with standard deviations in parentheses.

Region (subjects successfully localized/19 total)   Hemisphere   Mean # voxels (SD)   x               y               z
Classically defined
FFA (18)            R   9.89 (6.04)      36.65 (4.43)    −49.98 (8.17)   −19.57 (3.11)
FFA (19)            L   8.63 (5.60)      −37.28 (5.93)   −49.12 (5.51)   −19.07 (4.58)
OFA (15)            R   10.67 (9.98)     33.99 (5.15)    −82.74 (6.93)   −13.41 (6.93)
OFA (16)            L   7.56 (5.55)      −38.07 (8.10)   −82.53 (7.84)   −13.47 (5.04)
FBA (16)            R   8.69 (8.58)      29.61 (8.09)    −56.12 (5.87)   −15.51 (2.90)
FBA (15)            L   7.87 (7.39)      −32.41 (5.57)   −53.76 (7.08)   −16.65 (5.42)
EBA (19)            R   14.21 (12.20)    42.79 (7.80)    −73.43 (9.38)   −0.29 (6.85)
EBA (18)            L   12.56 (10.25)    −44.85 (9.78)   −78.45 (8.48)   0.48 (7.71)
pSTS (19)           R   88.26 (3.05)     56.08 (2.45)    −41.46 (8.18)   10.35 (5.67)
pSTS (19)           L   88.11 (2.42)     −57.83 (3.42)   −45.62 (7.13)   8.38 (4.62)
Distributed masks
Face-selective          389.68 (328.70)
Body-selective          373.63 (233.42)
Face-specific           216.58 (257.80)
Body-specific           210.00 (179.20)

Note. All ROIs were localized in native space. To obtain MNI coordinates, functional scans with ROIs were normalized to the MNI template using SPM8.

2.5.2. Pattern classification
The cross-validation pattern classifier procedure was adapted from previous work (cf. Natu et al., 2011, 2010), using custom functions written in MATLAB. We applied the pattern classifier to each participant's brain in native space. Input voxels for the classifier were defined by applying the appropriate ROI or distributed mask of interest to a participant's functional scan. This produced an I×J matrix, where I corresponded to the number of voxels (which varied according to the mask) and J corresponded to the number of trials (60). From this matrix, we produced the training and test data. Training data consisted of 58 of the 60 trials (29 familiar trials/29 unfamiliar trials). The two "left-out" validation trials were used as test data. Each dataset was centered along its rows and columns. The two test trials were always of the same gender, to prevent classification based on stimulus gender instead of familiarity condition. For selecting the test trials, we iterated through trials for each participant and sampled according to the requirements (one familiar/one unfamiliar; both the same gender). All trials served as test data once. Because we conducted the experiment in a single event-related run with randomized stimuli, this cross-validation method eliminated the possibility that local hemodynamics could provide a cue to the stimulus condition. Although randomizing the stimulus presentation allowed for small variations in the temporal distance between sampled test trials, this randomization assured that there was no systematic link between time point and condition.

Training data were submitted to a principal component analysis (PCA). Coordinates or "points" in the PC space represented individual scans. A linear discriminant analysis was applied to the representations of the training scans to learn a mapping between these representations and the scan status as familiar or unfamiliar. Single dimensions of the PC space were then tested to determine which components discriminated between the training scans obtained during familiar versus unfamiliar trials, with an accuracy threshold of d′ > 0. PCs that met this criterion were retained and recombined to produce a subspace for an "optimal" classifier. Individual scans were represented as their projection scores on these retained PCs. The average number of PCs retained scaled roughly with the number of voxels input into the classifier. To summarize, the mean numbers of PCs retained were as follows: for the distributed masks, M=41.88, SE=0.26; for the anatomical ROIs, M=42.19, SE=0.06; and for the classically defined ROIs, M=9.30, SE=0.63.

For cross-validation, the left-out test trials were projected individually into the optimal PC space. Using the same linear discriminant analysis, an estimate of familiarity was obtained for each trial. This process was repeated 30 times, until every trial had served as test data. Classifier accuracy was assessed with the signal detection measure d′, computed as z(hit rate) − z(false alarm rate). Hits were defined as correctly classified familiar trials. False alarms were defined as unfamiliar trials incorrectly classified as familiar. The d′ values were computed separately for each participant, and the average accuracy across participants is reported. As noted, in each ROI, discriminability was tested at four time points throughout the video (8-s videos with a TR of 2 s).

For inferential purposes, above-chance classification was determined using permutation tests (e.g., Etzel, 2015). We created a null distribution of d′ values for each ROI as follows. In each case, the familiarity condition labels were permuted for each participant, and the d′ was computed. These values were averaged across participants to yield a group-level d′. This process was repeated 100 times to produce a null distribution of 100 permuted group-level d′ values, against which we applied a one-tailed cutoff of p < .05. For the classically defined ROIs, we computed classification accuracy in the left and right hemispheres separately, as well as bilaterally. The bilateral analysis was achieved by concatenating the input vectors from the two lateralized ROIs into one "bilateral" vector. Testing ROIs bilaterally was done for completeness and for comparability with previous literature (Natu and O'Toole, 2015; Said et al., 2010a, 2010b). Also, combining ROIs bilaterally can boost a classifier's power for detecting effects when the left and right ROIs each carry a weak signal.
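As a rough illustration of this pipeline (not the authors' MATLAB code), the sketch below runs a leave-two-out procedure on simulated voxel-by-trial data: PCA on the training scans, retention of PCs whose one-dimensional training d′ exceeds 0, and classification of the held-out trials by the nearer class mean in the retained subspace (a simplified stand-in for the paper's linear discriminant analysis), followed by a small label-permutation test. All data and parameters are simulated, and the gender-matched selection of test pairs is omitted.

```python
import numpy as np
from statistics import NormalDist

def dprime(hit_rate, fa_rate, eps=1e-3):
    """d' = z(hit rate) - z(false-alarm rate), with rates clipped away from 0/1."""
    z = NormalDist().inv_cdf
    clip = lambda r: min(max(r, eps), 1 - eps)
    return z(clip(hit_rate)) - z(clip(fa_rate))

def fit_and_classify(train_X, train_y, test_X):
    """PCA on training scans; keep PCs whose 1-D training d' > 0; label test
    scans by the nearer class mean in the retained subspace (a simplified
    stand-in for the linear discriminant used in the paper)."""
    mu = train_X.mean(axis=1, keepdims=True)
    U, _, _ = np.linalg.svd(train_X - mu, full_matrices=False)
    proj = U.T @ (train_X - mu)                       # PCs x training trials
    keep = []
    for k in range(proj.shape[0]):
        m1, m0 = proj[k, train_y].mean(), proj[k, ~train_y].mean()
        sign, thr = (1.0 if m1 >= m0 else -1.0), (m1 + m0) / 2.0
        pred = sign * proj[k] > sign * thr            # "familiar" side of midpoint
        if dprime(pred[train_y].mean(), pred[~train_y].mean()) > 0:
            keep.append(k)
    sub = proj[keep]
    c1 = sub[:, train_y].mean(axis=1, keepdims=True)  # familiar class mean
    c0 = sub[:, ~train_y].mean(axis=1, keepdims=True)
    t = U[:, keep].T @ (test_X - mu)                  # project held-out trials
    return (np.linalg.norm(t - c1, axis=0) <
            np.linalg.norm(t - c0, axis=0))           # True -> "familiar"

def cross_validated_dprime(X, y):
    """Leave-two-out cross-validation: one familiar and one unfamiliar test trial."""
    preds = np.zeros(y.size, dtype=bool)
    fam, unfam = np.where(y)[0], np.where(~y)[0]
    for i in range(y.sum()):
        test = np.array([fam[i], unfam[i]])
        train = np.setdiff1d(np.arange(y.size), test)
        preds[test] = fit_and_classify(X[:, train], y[train], X[:, test])
    return dprime(preds[y].mean(), preds[~y].mean())

# Simulated ROI: 40 voxels x 60 trials, with a weak familiarity signal in 8 voxels
rng = np.random.default_rng(7)
y = np.r_[np.ones(30), np.zeros(30)].astype(bool)     # first 30 trials "familiar"
X = rng.normal(size=(40, 60))
X[:8, y] += 1.0

observed = cross_validated_dprime(X, y)

# Permutation test: shuffle condition labels to build a null distribution
null = [cross_validated_dprime(X, rng.permutation(y)) for _ in range(20)]
p = (1 + sum(d >= observed for d in null)) / (1 + len(null))  # one-tailed
print(round(observed, 2), round(p, 2))
```

Note that the toy retention rule keeps any PC with a positive training d′; the mean retained-PC counts reported above (e.g., M=9.30 for classically defined ROIs) come from the authors' implementation and data, not from this sketch.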
3. Results

3.1. Behavioral recognition accuracy and response latency
Accuracy of participants' behavioral recognition responses was assessed using d′.2 Correct familiar decisions were considered hits, and incorrect familiar responses were counted as false alarms. Table 2 lists the full pattern of behavioral results. As expected, a Mann-Whitney-Wilcoxon rank test showed that behavioral accuracy was high and well above chance (M=1.82, Mdn=1.76, W=190, p=.00013). The average response latency for all trials was 5.11 s (SD=2.18 s, Mdn=5.31 s). Also as expected, people were faster on familiar trials (Mdn=5.02 s) than on unfamiliar trials (Mdn=5.64 s, W=254, p=.0319)

2 Due to timing constraints for the scan, participants had 8 s to respond on each trial. There were a small number of trials on which participants failed to make a response (28 trials total across 8 participants; range: 1-10 trials). For the first 13 participants, a data collection error failed to record the participant's response on 3 of 60 trials; the neural data were unaffected by this error. These trials were omitted from the behavioral analysis.
These behavioral results are consistent with previous findings (e.g., Hahn et al., 2013). Note that the high behavioral accuracy, which was an intentional part of the experimental design, precluded a meaningful analysis directly tying accuracy to neural decoding at the level of individual trials. As noted previously, the familiarization regime was designed to produce high accuracy, with the goal of being able to decode person familiarity. A more complete picture of behavioral recognition for people in motion can be found in Hahn et al. (2016).

Table 2
Signal detection theory measures of behavioral accuracy.

Measure    d′      Hit rate    FA rate
M          1.82    0.81        0.23
SD         0.65    0.11        0.12

Note. M and SD represent the mean and standard deviation, respectively.

3.2. Neural familiarity decoding: from far to near

Accuracy of neural decoding was also measured using d′. We computed 30 iterations of the cross-validation procedure described previously for each participant. From this procedure, we obtained one d′ measure for each participant in each ROI and at each time point. In what follows, we report mean classifier d′ across all participants. Reported standard errors (SE) reflect the variance in classifier accuracy (d′) across participants. Our goal was to determine whether the pattern of voxel activity, collectively within ROIs, could discriminate familiarity.

3.2.1. Familiarity at a distance (TR1)

Fig. 3a shows clear and accurate neural decoding at the farthest distance (TR1) in a dorsal stream ROI, the bpSTS (Md′=0.39, SE=0.14, p=.01). The rpSTS was marginally significant at this time point (Md′=0.25, SE=0.07, p=.06). No ventral stream regions decoded familiarity at this distance (TR1).

3.2.2. Familiarity at mid-range distances (TR2)

Fig. 3b shows decoding of familiarity during TR2 in two body-selective regions: the bEBA (Md′=0.35, SE=0.16, p=.03) and lFBA (Md′=0.37, SE=0.16, p=.03). No other classically defined ROIs accurately decoded familiarity at this time point.

3.2.3. Familiarity at closer-range distances (TR3)

The most robust decoding was found in TR3, as the face came into closer view and in the same time window as the average behavioral response. Fig. 3c shows familiarity decoding in both ventral and dorsal regions. In the ventral stream, significant decoding was found in the bFBA, with a strong peak in accuracy at TR3 (Md′=0.50, SE=0.15, p < .01). At this same distance, we found accurate decoding in the rOFA (Md′=0.49, SE=0.13, p < .01) and in the bOFA (Md′=0.36, SE=0.20, p < .01). In the dorsal stream, the bpSTS also decoded familiarity in this time window (Md′=0.49, SE=0.14, p < .01).

3.2.4. Familiarity at the closest distance (TR4)

Only the lFBA decoded familiarity at the closest distance, where the person was partially out of view (Md′=0.34, SE=0.17, p=.03) (Fig. 3d).

Fig. 3. Each panel shows decoding accuracy (d′) in an ROI over the four distances (TRs 1–4), with above-chance (p < .05) performance indicated with an asterisk. Error bars represent ± 1 standard error. At the farthest distance (TR1), accurate decoding of familiarity was found in the bpSTS. At TR2, the bEBA and lFBA provided information about person familiarity. At TR3, the bFBA, rOFA, bOFA, and bpSTS accurately decoded person familiarity. At the closest distance (TR4), the lFBA coded familiarity.

Fig. 4 shows visualizations of all areas that successfully decoded familiarity over distances.

Fig. 4. A schematic of the time course of familiarity decoding as a function of the distance at which the walker is seen. Examples from one participant's classically defined ROIs show that familiarity decoding over distances occurs in the ventral stream EBA (red), FBA (blue), and OFA (green) regions, and in the dorsal stream pSTS (purple). Accurate familiarity decoding at the farthest distance (TR1) occurred in the bpSTS; in TR2, decoding occurred in the lFBA and bEBA; in TR3, decoding was found in the bFBA, rOFA, bOFA, and bpSTS; and in TR4, decoding occurred in the lFBA. Refer to Table 1 for averaged peak MNI coordinates for each ROI tested. (For interpretation of the references to color in this figure legend, the reader is referred to the web version of this article.)

3.2.5. Distributed face- and body-selective masks

Taking a broader look at familiarity decoding, Fig. 5a shows the discriminability of the neural response to familiar and unfamiliar people in the two distributed face- and body-selective areas in ventral-temporal cortex. The pattern of means for the distributed mask of face-selective voxels indicates above-chance decoding at the mid-range distances tested (TR2: Md′=0.39, SE=0.14, p=.04; TR3: Md′=0.47, SE=0.09, p=.01). Distributed body-selective voxels followed a similar pattern, with accurate classification at all but the most distant time point tested (TR2: Md′=0.41, SE=0.13, p=.04; TR3: Md′=0.56, SE=0.14, p=.02; and TR4: Md′=0.45, SE=0.10, p=.02).
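The ROI-based analysis reported above – iterated cross-validation of a pattern classifier on voxel responses, with accuracy summarized as a classifier d′ – can be illustrated with a toy sketch. This is not the authors' implementation: the correlation-based nearest-class-mean classifier, the stratified split, and the synthetic data are assumptions made for the example; only the 30-iteration, d′-scored cross-validation structure comes from the text.

```python
import numpy as np
from scipy.stats import norm

rng = np.random.default_rng(0)

def classifier_dprime(X, y, n_iter=30, n_test_per_class=6):
    """Cross-validated familiarity decoding within one ROI (sketch).

    X: trials x voxels activity patterns; y: 1 = familiar, 0 = unfamiliar.
    Each iteration holds out trials from both classes, classifies them by
    correlation with the training-set class means, and scores the held-out
    predictions as hits / false alarms to yield a classifier d'.
    """
    fam, unf = np.flatnonzero(y == 1), np.flatnonzero(y == 0)
    dprimes = []
    for _ in range(n_iter):
        test = np.concatenate([rng.choice(fam, n_test_per_class, replace=False),
                               rng.choice(unf, n_test_per_class, replace=False)])
        train = np.setdiff1d(np.arange(len(y)), test)
        mu_fam = X[train][y[train] == 1].mean(axis=0)
        mu_unf = X[train][y[train] == 0].mean(axis=0)
        pred = np.array([int(np.corrcoef(x, mu_fam)[0, 1] >
                             np.corrcoef(x, mu_unf)[0, 1]) for x in X[test]])
        hit = np.clip(pred[y[test] == 1].mean(), 0.01, 0.99)  # avoid infinite z
        fa = np.clip(pred[y[test] == 0].mean(), 0.01, 0.99)
        dprimes.append(norm.ppf(hit) - norm.ppf(fa))
    return float(np.mean(dprimes))

# Synthetic ROI: 60 trials (30 familiar, 30 novel) x 200 voxels, with a
# weak shared pattern added to the familiar trials.
y = np.repeat([1, 0], 30)
X = rng.standard_normal((60, 200))
X[y == 1] += 0.4 * rng.standard_normal(200)  # shared "familiarity" pattern

print(round(classifier_dprime(X, y), 2))  # reliably > 0 given the injected signal
```

Running this on the synthetic data yields a clearly positive mean classifier d′, while shuffling the familiarity labels drives it toward zero – the logic behind the chance-level comparisons reported above.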
3.2.6. Distributed face- and body-specific masks

After explicitly eliminating overlapping face- and body-selective voxels, we re-examined the contribution of voxels that were exclusively dedicated to processing faces or bodies. The pattern of results from these face-specific and body-specific masks is informative (Fig. 5b). Although these results roughly mirror those found with the face- and body-selective masks, decoding was above chance only in body-specific voxels at TR3 (Md′=0.47, SE=0.14, p=.04). For face-specific voxels, classification was at chance at all four distances.

Fig. 5. Familiarity decoding accuracy (d′) over distances (TRs 1–4) in distributed masks of face- and body-selective voxels. Error bars represent ± 1 standard error. Above-chance (p < .05) classification is designated with an asterisk. (A) For face- and body-selective voxels, accurate decoding was found at all but the farthest distance. (B) Overlapping voxels of the "other" category were removed to produce face- and body-specific voxels. For these face- and body-specific voxels, only the body-specific voxels decoded familiarity, and only at the third time point (TR3).

3.3. Results summary

These experiments yielded four novel findings. First, the familiarity of an approaching person was signaled at a substantial distance by the dorsal stream pSTS, bilaterally. Second, at closer distances, both ventral and dorsal stream face- and body-selective areas (bEBA, lFBA, bFBA, rOFA, bOFA, and bpSTS) carried information about person familiarity. Third, a distributed combination of ventral stream voxels selective for faces showed above-chance decoding at mid-range distances. A distributed combination of voxels selective for the body showed above-chance decoding of familiarity from all but the most distant view. Both the face-selective and body-selective voxels decoded most strongly at the third time point. This was the time point when the face was most clearly visible, just prior to the person exiting the field of view. Fourth, when the face- and body-selective voxel masks were edited to eliminate voxels that responded to both faces and bodies, familiarity was signaled at above-chance levels only in the body-specific mask at the third time point, suggesting a critical role for the interactive nature of face- and body-selective voxels in coding person familiarity.

4. Discussion

The results of this study map out a network of cortical regions that signal familiarity in response to a person approaching from a distance. The neural coding of familiarity was distributed across dorsal visual areas sensitive to biological motion and across classically defined face- and body-selective regions in the ventral cortex. Although the behavioral recognition response tended to occur later in the video, neural signaling of person familiarity was present throughout the time course of the approach. The approaching walker stimulus that we used to test recognition included rich identity information from the face, body, and gait. The use of this complex, natural stimulus does not support direct conclusions linking successful familiarity decoding to particular identity cues (face, body, or gait). With this caveat in mind, in what follows we sketch out the spatiotemporal course of decoding in the context of the known properties of the stimulus during the approach and the known response properties of the ROIs we tested.

Beginning at the farthest distance, where biological motion information from gait is prominent, it is perhaps not surprising that the pattern of neural activity in the pSTS was effective in discriminating the familiarity of the walker. Notwithstanding, this is the first study to demonstrate a role for the pSTS in recognition memory for moving people. It is possible that the familiarity signal at this far distance was based on the biological motion of the person – sometimes referred to as the dynamic identity signature (O'Toole et al., 2002). This finding is largely consistent with the utility of dynamic identity signatures for identification proposed in O'Toole et al. (2002). That model extended Haxby et al.'s (2000) distributed system model for face perception by positing a memory function for dynamic facial expression and gesture in the pSTS. The present findings are consistent with this proposal and potentially broaden the role of the pSTS to include whole-person body motion from gait.

Use of naturalistic stimuli also allowed us to examine the neural coding of familiarity as the walker approached and the resolution of the face and body increased. At these closer distances, additional ROIs contributed to the determination of familiarity. In the mid-range of the approach, familiarity was signaled in the body-selective bEBA and lFBA. At a closer distance, we found a strong and reliable familiarity signal in the body-selective bFBA, the face-selective rOFA and bOFA, and the dorsal stream pSTS. The involvement of the OFA face regions is consistent with behavioral data indicating the increased importance of the face for recognition with closer proximity (Hahn et al., 2016). At the closest distance tested, the body-selective lFBA strongly decoded familiarity as the person approached the camera and walked out of view.

Although we did not predict the strong familiarity decoding in the body-selective ROIs at these closer distances, this result is not inconsistent with behavioral data. Specifically, Hahn et al. (2016) found that person recognition was far better with the face alone than with the body alone at a distance approximately equal to the one seen during TR3. Recognition accuracy with the body alone, however, was above chance. This presents a puzzle. Clearly, the neural findings indicate that body-selective voxels contain valuable information about the familiarity of the person, but the behavioral data point to more accurate recognition of the person from the face. This puzzle is partially resolvable when we consider the familiarity decoding results in the broader distributed face- and body-selective voxels versus those found in the face- and body-specific areas. Recall that we found above-chance familiarity decoding at all but the farthest distance using the ensemble of body-selective voxels in VTC. Similarly, for the ensemble of face-selective voxels in VTC, familiarity coding was possible at both mid-range distances. When the face-selective areas were edited by eliminating body-responsive voxels (i.e., to produce the face-specific mask), familiarity decoding failed at all distances. For the body-specific mask, familiarity decoding failed at all but the third time point. In combination, therefore, we found more robust familiarity signaling in collections of voxels that enabled interactions between face- and body-selective voxels. This is strongly suggestive of the importance of these interactions in the neural coding of person familiarity.

The observation that the signaling of person familiarity may occur across a broad network of occipito-temporal areas is largely consistent with reports of abnormal familiarity judgments based on the face. In one case study (Van den Stock et al., 2012), a patient was unable to recognize her face as her own. She could, however, recognize her own body. Coupled with this behavioral observation, face-selective activity was absent in the right fusiform gyrus, whereas body-selective activity in the right fusiform remained intact. The authors also observed hypo-metabolism in a wide range of brain regions, particularly in a broad area of occipito-temporal cortex. Another study (Van den Stock et al., 2013a) showed the complement: hyper-metabolism in temporo-parietal cortex was associated with hyper-familiarity and hyper-animacy judgments. Specifically, toy dolls were judged by a patient to be familiar and alive when the faces of the dolls were visible. Together with the current study, these findings indicate a broad network of regions devoted to processing the familiarity of faces and bodies. They also suggest that, despite the potentially interactive nature of dedicated face- and body-selective voxels, these voxels may still play a partially independent role in face and body recognition.

The complex interactions we found among regions make it difficult to map the behavioral pattern of recognition onto neural substrates in a way that makes clear the unique contributions of face, body, and gait over distance. Similarly, in the behavioral person recognition study of Hahn et al. (2016), recognition responses were not linked solely to the quality of the information provided, but also reflected a decision-making process. For example, when presented with a video of a person approaching, participants used a response strategy whereby the recognition decision was "withheld" in experimental conditions in which the participant anticipated a closer view of the person. This did not occur when the participant knew that only a relatively distant view would be made available. This strategy, therefore, reflects flexibility in response bias (criterion), rather than changes in recognition accuracy. Concomitantly, there is evidence that people use the body for recognition when the quality of the face is poor – notably without conscious awareness of the role the body is playing in recognition (Rice et al., 2013a). It is evident, therefore, that a participant's explicit behavioral strategy can be complex and sensitive to task demands.

The complexity of the task in natural visual environments reminds us that there is a broader system of cognitive decision making at work. As such, the emotional valence of bodies and the surrounding environment itself influences how well faces are remembered (Van den Stock and de Gelder, 2012), and how faces are represented in the fusiform gyrus (Van den Stock et al., 2013b). This task also reminds us that we detect visual familiarity in the complex context of retrieving broader person-knowledge about others with whom we are semantically and emotionally familiar (Gobbini and Haxby, 2007). The limited awareness we have of the role the body plays in person recognition is one possible reason for the emphasis in the literature on studying face recognition, rather than whole-person recognition. However, there is now a critical mass of behavioral and neural evidence that demonstrates an important role for the body and for dynamic identity signatures in recognition. Here, we found powerful familiarity signaling from brain regions that code faces and bodies, as well as biological motion.

In the context of evolution, the importance of recognizing a person at a substantial distance is clear. In a hostile environment, survival might well hinge on the ability to accurately distinguish friends from strangers and enemies at a safe distance. The present study indicates that a network of regions responsive to faces, bodies, and biological motion effectively supports this recognition process over a broad range of distances. Although we focused here on familiarity signals in targeted brain regions and at discrete time points, our approach presents the opportunity to apply neural decoding methods across brain regions and across time as a person approaches. Although beyond the scope of the current work, future studies could use these methods to formulate and test specific hypotheses about human neural codes for recognizing familiar people.
Acknowledgements

We thank Asal Baragchizadeh for assistance in preprocessing neuroimaging data and James Ryland for developing the brain visualization software, Volumetric 3, used to create Fig. 4. We also thank Dr. P. Jonathon Phillips for useful feedback and discussions.

References

Allison, T., Puce, A., McCarthy, G., 2000. Social perception from visual cues: role of the STS region. Trends Cogn. Sci. 4, 267–278. http://dx.doi.org/10.1016/S1364-6613(00)01501-1.
Bernstein, M., Oron, J., Sadeh, B., Yovel, G., 2014. An integrated face-body representation in the fusiform gyrus but not the lateral occipital cortex. J. Cogn. Neurosci., 1–11. http://dx.doi.org/10.1162/jocn_a_00639.
Bobes, M.A., Lage Castellanos, A., Quiñones, I., García, L., Valdes-Sosa, M., 2013. Timing and tuning for familiarity of cortical responses to faces. PLoS One 8, 1–10. http://dx.doi.org/10.1371/journal.pone.0076100.
Brandman, T., Yovel, G., 2016. Bodies are represented as wholes rather than their sum of parts in the occipital-temporal cortex. Cereb. Cortex 26, 530–543. http://dx.doi.org/10.1093/cercor/bhu205.
Burton, A.M., Wilson, S., Cowan, M., Bruce, V., 1999. Face recognition in poor-quality video: evidence from security surveillance. Psychol. Sci. 10, 243–248.
Cutting, J., Kozlowski, L., 1977. Recognizing friends by their walk: gait perception without familiarity cues. Bull. Psychon. Soc. 9, 353–356. http://dx.doi.org/10.3758/BF03337021.
Downing, P.E., Jiang, Y., Shuman, M., Kanwisher, N., 2001. A cortical area selective for visual processing of the human body. Science 293, 2470–2473. http://dx.doi.org/10.1126/science.1063414.
Dubois, S., Rossion, B., Schiltz, C., Bodart, J.M., Michel, C., Bruyer, R., Crommelinck, M., 1999. Effect of familiarity on the processing of human faces. NeuroImage 9, 278–289. http://dx.doi.org/10.1006/nimg.1998.0409.
Etzel, J.A., 2015. MVPA permutation schemes: permutation testing for the group level. Proc. 2015 Int. Work Pattern Recognit. NeuroImaging, 65–68. http://dx.doi.org/10.1109/PRNI.2015.29.
Fox, C.J., Moon, S.Y., Iaria, G., Barton, J.J.S., 2009. The correlates of subjective perception of identity and expression in the face network: an fMRI adaptation study. NeuroImage 44, 569–580. http://dx.doi.org/10.1016/j.neuroimage.2008.09.011.
Gobbini, M.I., Haxby, J.V., 2007. Neural systems for recognition of familiar faces. Neuropsychologia 45, 32–41. http://dx.doi.org/10.1016/j.neuropsychologia.2006.04.015.
Gobbini, M.I., Haxby, J.V., 2006. Neural response to the visual familiarity of faces. Brain Res. Bull. 71, 76–82. http://dx.doi.org/10.1016/j.brainresbull.2006.08.003.
Grossman, E.D., Blake, R., 2002. Brain areas active during visual perception of biological motion. Neuron 35, 1167–1175. http://dx.doi.org/10.1016/S0896-6273(02)00897-8.
Hahn, C.A., Hart, E., Flanagan, K., Phillips, P.J., O'Toole, A.J., 2013. Time course of person recognition in a naturalistic environment. J. Vis. 13, 975. http://dx.doi.org/10.1167/13.9.975.
Hahn, C.A., O'Toole, A.J., Phillips, P.J., 2016. Dissecting the time course of person recognition in natural viewing environments. Br. J. Psychol. 107, 117–134. http://dx.doi.org/10.1111/bjop.12125.
Haxby, J., Gobbini, M.I., Furey, M.L., Ishai, A., Schouten, J.L., Pietrini, P., 2001. Distributed and overlapping representations of faces and objects in ventral temporal cortex. Science 293, 2425–2430. http://dx.doi.org/10.1126/science.1063736.
Haxby, J.V., Hoffman, E.A., Gobbini, M.I., 2000. The distributed human neural system for face perception. Trends Cogn. Sci. 4, 223–233. http://dx.doi.org/10.1016/S1364-6613(00)01482-0.
Herrington, J.D., Nymberg, C., Schultz, R.T., 2011. Biological motion task performance predicts superior temporal sulcus activity. Brain Cogn. 77, 372–381. http://dx.doi.org/10.1016/j.bandc.2011.09.001.
Hodzic, A., Kaas, A., Muckli, L., Stirn, A., Singer, W., 2009. Distinct cortical networks for the detection and identification of human body. NeuroImage 45, 1264–1271. http://dx.doi.org/10.1016/j.neuroimage.2009.01.027.
Kanwisher, N., McDermott, J., Chun, M.M., 1997. The fusiform face area: a module in human extrastriate cortex specialized for face perception. J. Neurosci. 17, 4302–4311.
Katanoda, K., Yoshikawa, K., Sugishita, M., 2000. Neural substrates for the recognition of newly learned faces: a functional MRI study. Neuropsychologia 38, 1616–1625. http://dx.doi.org/10.1016/S0028-3932(00)00069-5.
Kosaka, H., Omori, M., Iidaka, T., Murata, T., Shimoyama, T., Okada, T., Sadato, N., Yonekura, Y., Wada, Y., 2003. Neural substrates participating in acquisition of facial familiarity: an fMRI study. NeuroImage 20, 1734–1742. http://dx.doi.org/10.1016/S1053-8119(03)00447-6.
Lehmann, C., Mueller, T., Federspiel, A., Hubl, D., Schroth, G., Huber, O., Strik, W., Dierks, T., 2004. Dissociation between overt and unconscious face processing in fusiform face area. NeuroImage 21, 75–83. http://dx.doi.org/10.1016/j.neuroimage.2003.08.038.
Leveroni, C.L., Seidenberg, M., Mayer, A.R., Mead, L.A., Binder, J.R., Rao, S.M., 2000. Neural systems underlying the recognition of familiar and newly learned faces. J. Neurosci. 20, 878–886.
Loula, F., Prasad, S., Harber, K., Shiffrar, M., 2005. Recognizing people from their movement. J. Exp. Psychol. Hum. Percept. Perform. 31, 210–220. http://dx.doi.org/10.1037/0096-1523.31.1.210.
Natu, V., Jiang, F., Narvekar, A., Keshvari, S., Blanz, V., O'Toole, A.J., 2010. Dissociable neural patterns of facial identity across changes in viewpoint. J. Cogn. Neurosci. 22, 1570–1582. http://dx.doi.org/10.1162/jocn.2009.21312.
Natu, V., O'Toole, A.J., 2015. Spatiotemporal changes in neural response patterns to faces varying in visual familiarity. NeuroImage 108, 151–159. http://dx.doi.org/10.1016/j.neuroimage.2014.12.027.
Natu, V., O'Toole, A.J., 2011. The neural processing of familiar and unfamiliar faces: a review and synopsis. Br. J. Psychol. 102, 726–747. http://dx.doi.org/10.1111/j.2044-8295.2011.02053.x.
Natu, V., Raboy, D., O'Toole, A.J., 2011. Neural correlates of own- and other-race face perception: spatial and temporal response differences. NeuroImage 54, 2547–2555. http://dx.doi.org/10.1016/j.neuroimage.2010.10.006.
O'Toole, A.J., Harms, J., Snow, S.L., Hurst, D.R., Pappas, M.R., Ayyad, J.H., Abdi, H., 2005. A video database of moving faces and people. IEEE Trans. Pattern Anal. Mach. Intell. 27, 812–816. http://dx.doi.org/10.1109/TPAMI.2005.90.
O'Toole, A.J., Natu, V., An, X., Rice, A., Ryland, J., Phillips, J., Phillips, P.J., 2014. The neural representation of faces and bodies in motion and at rest. NeuroImage 91, 1–11. http://dx.doi.org/10.1016/j.neuroimage.2014.01.038.
O'Toole, A.J., Phillips, P.J., Weimer, S., Roark, D.A., Ayyad, J., Barwick, R., Dunlop, J., 2011. Recognizing people from dynamic and static faces and bodies: dissecting identity with a fusion approach. Vision Res. 51, 74–83. http://dx.doi.org/10.1016/j.visres.2010.09.035.
O'Toole, A.J., Roark, D.A., Abdi, H., 2002. Recognizing moving faces: a psychological and neural synthesis. Trends Cogn. Sci. 6, 261–266. http://dx.doi.org/10.1016/S1364-6613(02)01908-3.
Peelen, M.V., Downing, P.E., 2005a. Selectivity for the human body in the fusiform gyrus. J. Neurophysiol. 93, 603–608. http://dx.doi.org/10.1152/jn.00513.2004.
Peelen, M.V., Downing, P.E., 2005b. Within-subject reproducibility of category-specific visual activation with functional MRI. Hum. Brain Mapp. 25, 402–408. http://dx.doi.org/10.1002/hbm.20116.
Pilz, K.S., Vuong, Q.C., Bülthoff, H.H., Thornton, I.M., 2011. Walk this way: approaching bodies can influence the processing of faces. Cognition 118, 17–31. http://dx.doi.org/10.1016/j.cognition.2010.09.004.
Pitcher, D., Dilks, D.D., Saxe, R.R., Triantafyllou, C., Kanwisher, N., 2011a. Differential selectivity for dynamic versus static information in face-selective cortical regions. NeuroImage 56, 2356–2363. http://dx.doi.org/10.1016/j.neuroimage.2011.03.067.
Pitcher, D., Duchaine, B., Walsh, V., 2014. Combined TMS and fMRI reveal dissociable cortical pathways for dynamic and static face perception. Curr. Biol., 1–5. http://dx.doi.org/10.1016/j.cub.2014.07.060.
Pitcher, D., Goldhaber, T., Duchaine, B., Walsh, V., Kanwisher, N., 2012. Two critical and functionally distinct stages of face and body perception. J. Neurosci. 32, 15877–15885. http://dx.doi.org/10.1523/JNEUROSCI.2624-12.2012.
Pitcher, D., Walsh, V., Duchaine, B., 2011b. The role of the occipital face area in the cortical face perception network. Exp. Brain Res. 209, 481–493. http://dx.doi.org/10.1007/s00221-011-2579-1.
Puce, A., Allison, T., Asgari, M., Gore, J.C., McCarthy, G., 1996. Differential sensitivity of human visual cortex to faces, letterstrings, and textures: a functional magnetic resonance imaging study. J. Neurosci. 16, 5205–5215.
Rice, A., Phillips, P.J., Natu, V., An, X., O'Toole, A.J., 2013a. Unaware person recognition from the body when face identification fails. Psychol. Sci., 1–9. http://dx.doi.org/10.1177/0956797613492986.
Rice, A., Phillips, P.J., O'Toole, A.J., 2013b. The role of the face and body in unfamiliar person identification. Appl. Cogn. Psychol. 27, 761–768. http://dx.doi.org/10.1002/acp.2969.
Roark, D.A., O'Toole, A.J., Abdi, H., Barrett, S.E., 2006. Learning the moves: the effect of familiarity and facial motion on person recognition across large changes in viewing format. Perception 35, 761–773. http://dx.doi.org/10.1068/p5503.
Robbins, R.A., Coltheart, M., 2015. The relative importance of heads, bodies, and movement to person recognition across development. J. Exp. Child Psychol. 138, 1–14. http://dx.doi.org/10.1016/j.jecp.2015.04.006.
Robbins, R.A., Coltheart, M., 2012. The effects of inversion and familiarity on face versus body cues to person recognition. J. Exp. Psychol. Hum. Percept. Perform. 38, 1098–1104. http://dx.doi.org/10.1037/a0028584.
Ross, P.D., de Gelder, B., Crabbe, F., Grosbras, M.-H., 2014. Body-selective areas in the visual cortex are less active in children than in adults. Front. Hum. Neurosci. 8, 941. http://dx.doi.org/10.3389/fnhum.2014.00941.
Rossion, B., Schiltz, C., Crommelinck, M., 2003. The functionally defined right occipital and fusiform "face areas" discriminate novel from visually familiar faces. NeuroImage 19, 877–883. http://dx.doi.org/10.1016/S1053-8119(03)00105-8.
Rossion, B., Schiltz, C., Robaye, L., Pirenne, D., Crommelinck, M., 2001. How does the brain discriminate familiar and unfamiliar faces?: a PET study of face categorical perception. J. Cogn. Neurosci. 13, 1019–1034. http://dx.doi.org/10.1162/089892901753165917.
Said, C.P., Dotsch, R., Todorov, A., 2010a. The amygdala and FFA track both social and non-social face dimensions. Neuropsychologia 48, 3596–3605. http://dx.doi.org/10.1016/j.neuropsychologia.2011.02.028.
Said, C.P., Moore, C.D., Engell, A.D., Todorov, A., Haxby, J.V., 2010b. Distributed representations of dynamic facial expressions in the superior temporal sulcus. J. Vis. 10, 1–12. http://dx.doi.org/10.1167/10.5.11.
Schwartz, C.E., Wright, C.I., Shin, L.M., Kagan, J., Whalen, P.J., McMullin, K.G., Rauch, S.L., 2003. Differential amygdalar response to novel versus newly familiar neutral faces: a functional MRI probe developed for studying inhibited temperament. Biol. Psychiatry 53, 854–862. http://dx.doi.org/10.1016/S0006-3223(02)01906-6.
Schwarzlose, R.F., Baker, C.I., Kanwisher, N., 2005. Separate face and body selectivity on the fusiform gyrus. J. Neurosci. 25, 11055–11059. http://dx.doi.org/10.1523/JNEUROSCI.2621-05.2005.
Simhi, N., Yovel, G., 2015. Seeing people in motion enhances person recognition. J. Vis. 15, 695. http://dx.doi.org/10.1167/15.12.695.
Stevenage, S., Nixon, M.S., Vince, K., 1999. Visual analysis of gait as a cue to identity. Appl. Cogn. Psychol. 13, 513–526. http://dx.doi.org/10.1002/(SICI)1099-0720(199912)13:6<513::AID-ACP616>3.0.CO;2-8.
Van den Stock, J., de Gelder, B., 2012. Emotional information in body and background hampers recognition memory for faces. Neurobiol. Learn. Mem. 97, 321–325. http://dx.doi.org/10.1016/j.nlm.2012.01.007.
Van den Stock, J., de Gelder, B., De Winter, F.-L., Van Laere, K., Vandenbulcke, M., 2012. A strange face in the mirror. Face-selective self-misidentification in a patient with right lateralized occipito-temporal hypo-metabolism. Cortex 48, 1088–1090. http://dx.doi.org/10.1016/j.cortex.2012.03.003.
Van den Stock, J., de Gelder, B., Van Laere, K., Vandenbulcke, M., 2013a. Face-selective hyper-animacy and hyper-familiarity misperception in a patient with moderate Alzheimer's Disease. J. Neuropsychiatry Clin. Neurosci., 24. http://dx.doi.org/10.1017/CBO9781107415324.004.
Van den Stock, J., Vandenbulcke, M., Sinke, C.B.A., Goebel, R., de Gelder, B., 2013b. How affective information from faces and scenes interacts in the brain. Soc. Cogn. Affect. Neurosci. 9, 1481–1488. http://dx.doi.org/10.1093/scan/nst138.
Vangeneugden, J., Peelen, M.V., Tadin, D., Battelli, L., 2014. Distinct neural mechanisms for body form and body motion discriminations. J. Neurosci. 34, 574–585. http://dx.doi.org/10.1523/JNEUROSCI.4032-13.2014.
Wiser, A.K., Andreasen, N., O'Leary, D.S., Crespo-Facorro, B., Boles-Ponto, L.L., Watkins, G.L., Hichwa, R.D., 2000. Novel vs. well-learned memory for faces: a positron emission tomography study. J. Cogn. Neurosci. 12, 255–266. http://dx.doi.org/10.1162/089892900562084.
Yovel, G., O'Toole, A.J., 2016. Recognizing people in motion. Trends Cogn. Sci., 1–13. http://dx.doi.org/10.1016/j.tics.2016.02.005.