Neuron
Report A Channel for 3D Environmental Shape in Anterior Inferotemporal Cortex Siavash Vaziri,1,2 Eric T. Carlson,1,2 Zhihong Wang,2 and Charles E. Connor2,3,* 1Department
of Biomedical Engineering, Johns Hopkins University School of Medicine, 720 North Wolfe Street, Baltimore, MD 21205, USA Krieger Mind/Brain Institute, Johns Hopkins University, 3400 North Charles Street, Baltimore, MD 21218, USA 3Solomon H. Snyder Department of Neuroscience, Johns Hopkins University School of Medicine, 725 North Wolfe Street, Baltimore, MD 21205, USA *Correspondence:
[email protected] http://dx.doi.org/10.1016/j.neuron.2014.08.043 2Zanvyl
SUMMARY
Inferotemporal cortex (IT) has long been studied as a single pathway dedicated to object vision, but connectivity analysis reveals anatomically distinct channels, through ventral superior temporal sulcus (STSv) and dorsal/ventral inferotemporal gyrus (TEd, TEv). Here, we report a major functional distinction between channels. We studied individual IT neurons in monkeys viewing stereoscopic 3D images projected on a large screen. We used adaptive stimuli to explore neural tuning for 3D abstract shapes ranging in scale and topology from small, closed, bounded objects to large, open, unbounded environments (landscape-like surfaces and cave-like interiors). In STSv, most neurons were more responsive to objects, as expected. In TEd, surprisingly, most neurons were more responsive to 3D environmental shape. Previous studies have localized environmental information to posterior cortical modules. Our results show it is also channeled through anterior IT, where extensive cross-connections between STSv and TEd could integrate object and environmental shape information.
INTRODUCTION ‘‘Inferotemporal cortex’’ (IT) is a general designation for the final stages in the ventral, ‘‘what’’ pathway of monkey and human visual cortex, which progresses anteriorly through the inferior temporal lobe (Ungerleider and Mishkin, 1982; Mishkin et al., 1983). A recent meta-analysis of anatomical evidence concluded that monkey IT actually comprises three channels, with very different connectivities to prefrontal cortex, basal ganglia, and medial temporal lobe structures (Kravitz et al., 2013). Here, we observed a major functional distinction between two of these channels, STSv and TEd. As in our recent studies of 3D object shape coding (Yamane et al., 2008; Hung et al., 2012), we used an adaptive stimulus algorithm to localize and densely sample sparse IT tuning functions within the virtually infinite domain of 3D shape. The algorithm was initialized with random, abstract shapes, which evolved through
multiple generations, guided by differential neural responses to stimuli. Stimuli were segregated into two independent lineages, which could then be compared to evaluate repeatability and significance of results. This is an efficient search method for characterizing pure shape coding, free from semantic confounds and statistical biases in photographic stimuli. Here, we used an adaptive algorithm to test stimuli ranging continuously from small object shapes to large environmental shapes. ‘‘Object’’ stimuli were topologically spheroidal—closed (i.e., like a ball rather than a bowl) and bounded (completely contained) within the field of view. ‘‘Environmental’’ stimuli were topologically planar—open and extending beyond the field of view, like landscapes. Overall surface curvature of some environments, like small caves, could imply closure behind the viewer. Since the goal was to contrast responses to objects and environments, the adaptive method was critical for a fair comparison between optimized object stimuli and optimized environmental stimuli. A random selection of objects and environments could easily fail to sample the maximum response range in the dominant category. In addition, the parallel, independent lineages were essential for testing consistency of any response differences. While previous IT research has focused on objects, we found that the large majority of neurons in the TEd channel were more responsive to environmental shape. These responses were critically dependent on 3D shape cues, especially texture flows and shading. In contrast to TEd, the majority of neurons in the STSv channel were more responsive to object shape. This object/environment dichotomy is a functional counterpart to differences in anatomical connectivity with other brain regions. In particular, object-sensitive STSv, unlike TEd, provides major inputs to ventrolateral prefrontal cortex (VLPFC) regions involved in object working memory and orbitofrontal cortex (OFC) regions that process object value (Kravitz et al., 2013). The response bias in TEd shows that environmental shape information, which has been localized previously to more posterior and dorsal cortical modules (Epstein and Kanwisher, 1998; Epstein et al., 1999; Park and Chun, 2009; Ward et al., 2010; Nasr et al., 2011; Kornblith et al., 2013; Kravitz et al., 2011b), is also propagated through the final stages of the ventral pathway. As with face and color information (Tsao et al., 2003; Lafer-Sousa and Conway, 2013), environmental shape information remains at least partially segregated at these advanced processing stages. (Our anatomical sampling range largely spared regions where face and color modules have been identified.) However, the strong cross-connectivity between STSv and TEd would Neuron 84, 55–62, October 1, 2014 ª2014 Elsevier Inc. 55
Neuron A Channel for 3D Environmental Shape in IT
A 8.4o
17.3o
36.2o
65.1o
126.5o
95.2o
B
Ancestor
Descendants
E
C
STS
TE d
Lineage 1
Lineage 2
Lineage 1
spk s-1 69
r=0.83
Lineage 2
spk s-1 92
r=0.96
3 42
0 54
0.75
0.97
0
0 28
39 0.94
0.96
0
0
33
57 0.89
0.88
0
7 60
60
40
40
20
F Response (spk s−1)
Response (spk s−1)
D
20
0 4
22
0 80 142 4 22 Angular subtense (deg)
60
60
40
40
20
20
0 80
142
4
22
0 4 22 80 142 Angular subtense (deg)
80
142
Figure 1. Stimuli and Example Results (A) Example stimulus shape presented at six scales. Inset values specify the maximum visual angle subtended by the stimulus. The fixation point (red dot) remains at the same point on the stimulus surface. As scale increases, the stimulus extends beyond the screen borders, and the visible portion becomes landscape-like. (B) Example morphing transformations. From left to right: surface distortion removal, x axis rotation, translation/rotation of latitudinal ring, change in surface distortion height function, y axis rotation. (C) High-response stimuli for example TEd neurons. The coronal MRI image shows the approximate extent of TEd (cyan tint) and recording location of example neurons (colored dots). For each neuron, high-response stimuli from independent lineages (left and right columns) are indicated by arrows of the corresponding color. The border color around each stimulus indicates response rate (averaged across the 750 ms presentation period and five repetitions) based on the scale bars at right, which index the minimum-to-maximum response range of each neuron in sp/s. As in our previous studies of IT shape coding (Yamane et al., 2008; Hung et al., 2012), we observed that high-response stimuli differed in global shape but shared partial structure, consistent with fragment-based ensemble coding of shape. In these previous studies, we used multidimensional parameterization of surface and medial axis shape to construct linear/nonlinear models of 3D shape fragment tuning validated by correspondence across lineages. Similar analyses can be applied to these data but will require extensive presentation in a separate report. Precise characterization of shape tuning is not critical for our purpose here, which is to analyze overall selectivity for stimulus scale. The essential cross-validation here is between scale-tuning functions in separate lineages. (D) Scale-tuning functions for example TEd neurons. Average response (±SEM) across all stimuli as a function of stimulus scale is plotted for each example (indexed by color) in each lineage (left versus (legend continued on next page)
56 Neuron 84, 55–62, October 1, 2014 ª2014 Elsevier Inc.
Neuron A Channel for 3D Environmental Shape in IT
support integration of object information into environmental contexts. RESULTS We studied 76 TEd neurons and 65 STSv neurons in two monkeys performing a passive fixation task. We measured responses to abstract 3D shapes, projected stereoscopically on a screen subtending 77 of visual angle horizontally and 61 vertically. Position and structure in depth were conveyed by binocular disparity (stereopsis), shading (consistent with a light source from the viewing direction at infinite distance), and surface texture (hexagonal grids applied to the stimulus surface). 3D stimulus shapes were generated by randomly distorting an initially spherical nonuniform rational b-spline (NURBS) surface. Stimuli varied in scale and topology (Figure 1A) from small, closed objects subtending a few degrees of visual angle to large, open surfaces resembling landscapes or cave interiors (these were the visible portions of large stimuli extending beyond the screen boundaries). We studied each neuron with two independently evolving stimulus lineages in order to verify repeatability of results. The first generation of each lineage comprised 40 random 3D shapes. Successive stimulus generations included ‘‘descendant’’ stimuli that were partially morphed versions of ‘‘ancestor’’ stimuli from previous generations. The ‘‘fitness’’ of an ancestor stimulus—its probability of producing one or more descendants in the current generation—was defined by its neural response level, averaged over five presentations. Descendants were created by morphing the shape, position, orientation, and/or scale of the ancestor (Figure 1B). Scale changed probabilistically, but the distribution of scales within each generation was constrained for equal sampling of object-sized stimuli (subtending 4 –22 of visual angle), environmental stimuli (subtending 80 –142 , i.e., greater than screen width), and intermediate stimuli (subtending 22 –80 ). Neurons were typically studied with 5–8 stimulus generations, for a total of 400–640 stimuli. TEd neurons were typically selective for large structures extending past the screen boundary (i.e., subtending > 80 of visual angle), as illustrated by the examples in Figures 1C and 1D (see also Figure S1A available online). Response rates (averaged across the 750 ms presentation period) of these example neurons were maximal in the environmental scale range and fell to background levels for object-scale stimuli. Scale-tuning functions were strongly correlated across lineages (r values in Figure 1C), demonstrating consistent evolution in the adaptive stimulus algorithm. The response bias of each neuron toward environmental stimuli (subtending > 80 ) over object stimuli (subtending < 22 ) was highly significant (p < 0.0001, Wilcoxon rank-sum test). In contrast, STSv neurons were typically selective for small, object-like stimuli with closed, bounded surfaces, as illustrated by the examples in Figures 1E and 1F (see also Figure S1B). Response rates of these example neurons were maximal in the
object scale range and fell to background levels for environmental stimuli. Scale-tuning functions were strongly correlated across lineages (Figure 1F), and the response bias of each neuron toward object stimuli was highly significant (p < 0.0001, Wilcoxon rank-sum test). Responsiveness in STSv to 3D, object-like stimuli, defined in part by their surface curvatures, is consistent with previous studies (Janssen et al., 2000; Yamane et al., 2008). Differential processing of object versus environmental shape between the two channels was strong and significant at the population level (Figure 2). The scale-tuning function for each neuron in our sample is depicted by a colored strip (Figure 2A), in which color represents the stimulus range (red = objects, green = environments) and brightness represents average response strength at each point along the stimulus continuum. Neurons are ordered according to recording location around the circumference of IT in the coronal plane. Thus, from left to right, recording locations progress laterally across ventral bank STS, from fundus to lip, and then ventromedially across the surface of the gyrus (see MRI images). In the anterior-posterior direction, these recording locations spanned a range from +8 to +22 mm (anterior to the interaural line), mostly between the expected locations of the middle and anterior face and color patches. Only two recording sites fell within the expected location of the AL face patch (Figure S2A) (Freiwald and Tsao, 2010). There were no significant anteriorposterior or mediolateral trends in scale tuning in either STS or TEd (Figure S2A caption). There was, however, a major difference between channels (Figure 2B). The majority of STSv neurons (49/65; 75%) were significantly (p < 0.05, Wilcoxon rank-sum test) more responsive to object stimuli (red dots, Figure 2B). Only eight STSv neurons were significantly more responsive to environmental stimuli (green dots). Conversely, the majority of TEd neurons (50/76; 66%) were significantly (p < 0.05, Wilcoxon rank-sum test) more responsive to environmental stimuli (green dots). Only 16 TEd neurons were significantly more responsive to object stimuli (red dots). In both channels, scale-tuning functions were highly correlated across independent stimulus lineages (Figure 2C). This correlation was significant (p < 0.01) for 60/65 STSv neurons and 71/76 TEd neurons. The average normalized response across STSv neurons (Figure 2D, red) was highest in the small object range and declined gradually to a minimum at stimulus diameters near 70 . In contrast, the average normalized response across TEd neurons (Figure 2D, green) was low, up to 70 , then climbed abruptly, at precisely the point at which stimuli became large enough to exceed the viewing frame, thus becoming topologically open and unbounded, i.e., environmental. Neurons in both channels were highly selective for stimulus shape within these different ranges (Figures S1C and S1D). We conclude that the STSv channel was predominantly sensitive to object shape, as expected; the TEd channel contained some objectresponsive neurons but was predominantly sensitive to environmental shape.
right). The correlations between scale-tuning functions across lineages are given by the r values in (C). These correlations were each highly significant (p < 0.001). Baseline activity ± SEM for each example neuron (colored points next to vertical axis) was averaged across randomly scheduled null stimulus presentation periods during the fixation task. (E) High-response stimuli for example STSv neurons. Details are as in (C). (F) Scale-tuning functions for example STSv neurons. Details are as in (D). See also Figure S1.
Neuron 84, 55–62, October 1, 2014 ª2014 Elsevier Inc. 57
Neuron A Channel for 3D Environmental Shape in IT
A
STS
160 132
78 50 40 30 20 10
Normalized Response
0 −1
D
Number of Cells
Number of Cells
C
Wilcoxon Rank−Sum Value
TEd
B
−0.5
0 0.5 Pearson r
1
40 30 20 10 0 −1
−0.5
0 0.5 Pearson r
1
0.7 0.6
TEd
0.5 0.4 0.3
STSv
0.2 0.1 4
22
80
142
Angular subtense (deg)
Figure 2. Population Results (A) The scale-tuning function of each neuron (n = 141) is plotted as a vertical strip. Color represents scale, ranging from small objects (red) to large environments (green). Brightness at each point along the vertical strip is proportional to the average normalized response strength across all stimuli at the corresponding scale in the top half of the neuron’s response range. Neurons are arranged along the horizontal axis based on their positions in the coronal plane along the STSv and TEd channels (see MRI images and arrows). (B) For each neuron, we plot the result of a Wilcoxon rank-sum test based on the ten highest response environmental stimuli (subtending > 80 ) as compared to the ten highest response objects (subtending < 22 ). The rank-sum value can vary from 55 (if the environmental stimuli occupy all the lower ranks, 1–10) to 155 (if the environmental stimuli occupy all the higher ranks, 11–20). Dashed lines indicate significance thresholds (p < 0.05, two-tailed). Significant neurons are plotted in red (object) or green (environment). Neurons are ordered along the horizontal axis as in (A). (C) Distributions of Pearson correlations between scale-tuning functions in the two independent lineages for each neuron. Filled bars represent significant correlations (p < 0.01). (D) Average normalized
58 Neuron 84, 55–62, October 1, 2014 ª2014 Elsevier Inc.
Since the prevalence of environmental shape processing in TEd was unexpected, we performed a number of additional tests (when recording time permitted) to confirm that TEd neurons were responding to 3D environmental shape rather than some associated, lower-level stimulus factor. We selected a highresponse 3D environmental stimulus from the adaptive search and tested how responses to this stimulus depended on multiple factors. First, we compared responses to equivalent 2D stimuli, in which 3D structure was destroyed by eliminating binocular disparities, shading, and texture flows (by alternatively flattening the hexagonal grid onto the plane of the projection screen, by randomizing the orientations of the constituent texture lines, or by removing texture entirely). Destroying 3D structure in this way largely abolished responses in most cases (Figure 3A; similar results for STSv neurons are shown in Figure S3A). Thus, these neurons were specifically responsive to shape-indepth, which is especially definitive for environments. This shows that responses did not depend simply on the presence of a large 2D shape or texture. It also shows that responses did not depend simply on stimulation of more peripheral parts of the visual field. In other words, large receptive fields, while certainly necessary, are not sufficient to explain selective responses to environmental stimuli. This was further confirmed by testing responses to high-response object stimuli at peripheral locations (Figure S3B). We further tested dependence on 3D cues by presenting all combinations of binocular disparity, shading, and texture flows (Figure S3C). Texture flow was the most significant 3D cue for environmental stimuli (F(1,21) = 16.8; p < 0.002, three-way repeated-measures ANOVA), followed by shading (F(1,21) = 10.4; p < 0.005). Stereoscopic cues did not have a significant effect on environmental shape responses (F(1,21) = 1,8; p = 0.194), which makes sense because binocular disparity differences become negligible at large distances and thus do not provide precise information about large-scale shape-in-depth. Texture flows provide strong cues for 3D shape independent of stereoscopic depth (Ben-Shahar and Zucker, 2001, 2003; Li and Zaidi, 2000, 2004). The low responses to 2D stimuli that preserved but distorted textures helps address the concern that neurons might simply respond to higher spatial frequencies in environmental stimulus textures (Rajimehr et al., 2011). To further rule out this concern, we tested responses when the spatial frequency of the texture in the 3D stimuli was decreased or increased by a factor of 2 (Figure 3B). These manipulations did not significantly affect responses (F(2,32) = 0.16; p = 0.85), showing that the neurons were not merely sensitive to a specific spatial frequency range. Another concern could be that IT responses are known to be somewhat size invariant (Ito et al., 1995; Brincat and Connor, 2004). Given this, high-response shapes might by chance evolve at a particular scale but might evoke equally strong responses at a very different scale. This concern is addressed by the strong and in most cases (131/141) significant (p < 0.01) correlation between scale-tuning functions across entirely independent response levels in STSv (red) and TEd (green) as a function of stimulus diameter. The shaded regions represented the 95% confidence interval of average response values. See also Figure S2.
Neuron
Modulation strength (2D)
A 25.3
B
1
n=27
0
−1 −1
0
Medium−density texture (spk s −1 )
A Channel for 3D Environmental Shape in IT
50
40
30
30
20
20
10
10
0
0 spk s
0 0
1
Modulation strength (3D)
-1
50
n=17
40
10 20 30 40 50 Low−density texture (spk s−1 )
0
10 20 30 40 50 High−density texture (spk s−1 )
C 23.8o
37.7o
51.6o
94.4o
65.5o
104.4o
40 20 0 0
20
40
60 −1
Right light (spk s )
60 40 20 0 0
20
40
60 −1
Left light (spk s )
60
Front light (spk s−1)
n=22
Front light (spk s−1)
Front light (spk s )
60
−1
Front light (spk s−1)
D
40 20 0 0
20
40
60 −1
Bottom light (spk s )
60 40 20 0 0
20
40
60
Top light(spk s−1)
Figure 3. Control Tests on TEd Neurons with Significant Selectivity for Environmental Stimuli Example results are presented for a single TEd neuron (cyan dot in population plots). (A) Sensitivity to 3D shape-in-depth versus 2D shape. One high-response and one low-response environmental stimulus were selected from the adaptation experiment. Modulation strength is the response difference divided by the maximum. 3D modulation strength (x axis) is based on the original stimuli with all depth cues. 2D modulation strength (y axis) was based on stimuli with no disparity cues, no shading, and frontoparallel hexagonal texture, random line texture, or no texture (silhouettes), whichever produced the highest modulation value. Stimuli for the example neuron are shown next to the corresponding axes, with borders indicating response rate (see scale bar). Removing cues for shapein-depth largely abolished differential responses. The average 3D modulation strength of 0.86 was significantly greater than the average 2D modulation strength of –0.041 (paired t test, p < 0.0001). (B) Responses do not depend on texture density. Responses to the original texture density (medium) are plotted against the vertical scale. Responses remained consistent when texture density decreased (low) or increased (high). (C) Scale-tuning test. The high-response environmental stimulus for the example neuron was presented at six scales, ranging from object to environmental. (The original stimulus from the adaptive experiment was the largest scale, at right.) Preference for environmental-scale stimuli was maintained even for this optimal shape. Similar consistency of scale tuning for optimal shapes was observed for other neurons in TEd (Figure S3D) and STSv (Figure S3E). (D) Consistency of response across lighting directions. Responses to the original lighting direction (from the direction of the viewer, at infinite distance) are plotted against the vertical scale. Responses remained consistent for spotlights positioned to the right, left, bottom, or top, which produced substantially different image contrast patterns. See also Figure S3.
lineages (Figure 2C). But we also selected high-response stimuli for some neurons and tested them across the entire scale range (see example in Figure 3C). Scale selectivity remained strong and consistent with the results of the adaptive experiment in all cases (Figures S3D and S3E). We also tested the effect of lighting direction, which determines the pattern of shading across the stimulus. During the adaptive search procedure, the implicit light source was at infinite distance from the viewer’s direction. We additionally tested responses when the light source was above, below, to the right,
or to the left. The results (Figure 3D) show that the 3D environmental shape selectivity of these TEd neurons was remarkably consistent (F(4,84) = 0.64, p = 0.64) across these manipulations, which produced very different image contrast patterns. DISCUSSION Our results reveal a surprising specialization for environmental shape processing in the middle channel (TEd) of anterior IT cortex in macaque monkeys. The majority of TEd neurons were Neuron 84, 55–62, October 1, 2014 ª2014 Elsevier Inc. 59
Neuron A Channel for 3D Environmental Shape in IT
substantially more responsive to large, open, unbounded 3D surface shapes. These responses were critically dependent on depth cues, especially texture flows (Ben-Shahar and Zucker, 2001, 2003; Li and Zaidi, 2000, 2004). This makes sense since environments are more defined by 3D shape-in-depth than by 2D self-occlusion boundaries. Texture flows are likely to be an essential 3D shape cue for environments, which occupy more distant depth ranges where binocular disparity differences become negligible. A previous study showed that TEd neurons are not sensitive to stereoscopic depth in object-size stimuli and concluded that TEd represents only 2D shape (Janssen et al., 2000). To the contrary, we found that TEd is very sensitive to 3D shape but on the scale of environments, a scale on which stereoscopic depth cues provide little shape information. TEd selectivity for environments contrasted with STS selectivity for objects, which have long been considered the primary domain of IT processing. This is a clear functional distinction between the anatomically defined channels in anterior ventral pathway described by Kravitz and colleagues (2013). These channels differ in connectivity with prefrontal cortex, basal ganglia, and medial temporal lobe memory structures. Our finding that STS is more specialized for object shape is consistent with its stronger connection to object-sensitive ventrolateral prefrontal cortex (Kravitz et al., 2013). It should be noted that we observed some selectivity for objects in TEd as well, consistent with prior studies of object processing in this region. Channeling of environmental and object shape processing in anterior IT is a new feature of modular organization in ventral pathway cortex. Face and color modules occur at intervals along the inferior temporal lobe (Tsao et al., 2003; Moeller et al., 2008; Lafer-Sousa and Conway, 2013). Our anterior-posterior sampling range fell mostly in between reported locations for middle and anterior face and color patches. Our sampling also largely spared the lip of the sulcus, where face patches tend to be located. Thus, our findings expand on rather than conflict with prior understanding of anterior IT modularity. Our discovery of environmental shape processing in anterior IT contrasts with previous work that localized processing of place information (scenes and buildings) mainly to more posterior and dorsal visual regions. Beginning with the identification of the parahippocampal place area (PPA) (Epstein and Kanwisher, 1998; Epstein et al., 1999), human fMRI studies have revealed multiple posterior/dorsal areas involved in structural and/or semantic processing of place information (Epstein 2008; Park and Chun, 2009; Ward et al., 2010; Epstein and Ward, 2010; Kornblith et al., 2013; Park et al., 2011; Mullally and Maguire, 2011; Kravitz et al., 2011a; Harel et al., 2013). Posterior scene-sensitive patches in macaque monkey cortex have been identified with fMRI, and specialization of individual neurons in posterior IT (TEO and/or TFO) for place processing has recently been confirmed by Kornblith and colleagues (Kornblith et al., 2013; see also Sato and Nakamura, 2003). These posterior IT neurons were particularly sensitive to surface textures, analogous to our finding in TEd that texture flows were the most critical cues for shape-in-depth. Our data reveal that in addition to these posterior modules, information about place structure is also processed in anterior stages of the ventral pathway. The ventral pathway is specialized 60 Neuron 84, 55–62, October 1, 2014 ª2014 Elsevier Inc.
for fine shape discrimination and thus could support accurate recognition of familiar environments—rooms, neighborhoods, landscapes—even when the objects or landmarks they contain are rearranged or removed. Perhaps more importantly, object information must ultimately be organized within representations of the spatial environment to support scene comprehension and physical interactions with the world. The strong connectivity between object-sensitive STS and environment-sensitive TEd would enable such integration. Consistent with this idea, MacEvoy and Epstein (2011) have shown that, in human cortex, ventral pathway area LO, but not PPA, combines information about scenes and scene-associated objects. Our findings are consistent with some fMRI evidence for more anterior sensitivity to scene information. In addition to finding a homolog to human PPA in monkey posterior temporal cortex (mPPA), Nasr and colleagues (2011) showed some activation in more anterior locations (Figure 8 in Nasr et al. 2011). In humans, a recent study shows that semantic information about scene categories is represented beyond the boundaries of previously described functional regions (Stansbury et al., 2013). It is also worth noting that neural recording can reveal substantial selectivity in regions where fMRI differences are weak or absent: Kornblith and colleagues (2013) found that neural place selectivity was prominent not only in a lateral place patch (LPP), which emerged in an fMRI contrast between places and other stimuli, but also in a more medial place patch (MPP), which was identified only through stimulation of LPP. Our results might also relate to the parallel organization of large and small object representations in human cortex described by Konkle and Oliva (2012). They found that smaller objects are represented in superior/lateral ventral pathway, while larger object representations occur in a more inferior/medial swath. This organization depended on the real-world size of the familiar object stimuli, not on their retinal size. This result seems consistent with our finding that small, abstract objects are represented in a more superior channel and large environments in a more inferior channel. Representations of large objects and large environments are likely to colocalize, since they are statistically related in the natural world. Also, both depend on inputs from more peripheral parts of the visual field. Anterior IT does not appear to be retinotopic (Lafer-Sousa and Conway, 2013), but the ventral location of large object/environment processing might reflect the ventral origin of peripheral upper field inputs from early visual cortex (Kravitz et al., 2013). Under natural viewing conditions where nearby objects are foveated at ground level, the upper field periphery is likely to contain the most information about surrounding environmental structure. Our results leave open the question of whether environmental information in TEd is more structural or categorical/semantic in nature. These two domains are difficult to dissociate because they are statistically correlated in the natural world and necessarily entwined at the neural processing level and in perception (Kourtzi and Connor, 2011). Our stimuli were entirely abstract and thus serve to show that there is at least some level of structural processing in TEd. But further experiments involving photographs of familiar environments would be required to distinguish representation of structure from structural processing in support of place recognition and categorization. It is also conceivable
Neuron A Channel for 3D Environmental Shape in IT
that photographic object and scene stimuli would reveal a different pattern of anatomical organization. Our results establish a clear dichotomy in structural processing, but semantic representation of familiar stimuli can be independent of structural metrics, including scale (Konkle and Oliva, 2012). EXPERIMENTAL PROCEDURES Behavioral Task, Stimulus Presentation, and Electrophysiological Recording Two head-restrained male rhesus monkeys (Macaca mulatta) were trained to maintain fixation within 1 (radius) of a 0.1 diameter spot for 4 s to obtain a juice reward. Eye position was monitored with an infrared eye tracker (EyeLink). 3D shape stimuli were rendered with shading, surface texture, and binocular disparity cues using openGL. Two precisely aligned high-resolution (1,400 3 1,050) color projectors were used to back project differentially polarized right and left eye images on a screen subtending 77 of visual angle horizontally and 61 vertically with the monkey seated at a viewing distance of 55 cm. The fixation target appeared at screen center, with one of four right/left image disparities corresponding to fixation at 55 cm (screen depth), 100 cm, 10 m, or 100 m. The stimulus was always tangent to the fixation point, at 0 disparity, so the monkey was always fixating a point on the stimulus surface. During fixation, randomly ordered stimuli were flashed on the screen with a period of 1 s (250 ms prestimulus blank, 750 ms stimulus presentation), for a total of four stimuli per trial. Prior to each experimental session, binocular fusion and stereoscopic depth perception were verified with a random dot stereogram saccade target detection task. The electrical activity of well-isolated single neurons was recorded with epoxy-coated tungsten electrodes (Microprobe or FHC). Action potentials of individual neurons were amplified and electrically isolated using a Tucker-Davis Technologies recording system. Recording positions were determined on the basis of structural MRIs and the sequence of sulci and response characteristics observed while lowering the electrode. All animal procedures were approved by the Johns Hopkins Animal Care and Use Committee and conformed to U.S. National Institutes of Health and U.S. Department of Agriculture guidelines. Stimulus Construction and Adaptive Algorithm Each stimulus was based on a 3D ellipsoidal shape defined by a dense polar grid of nonuniform rational b-splines (NURBS) control points, which can be rendered as a smooth surface with the OpenGL NURBS utility. Large-scale shape variation was created by randomly altering the positions, orientations, sizes, and aspect ratios of two to five latitudinal rings in the polar grid and interpolating the positions of intervening NURBs control points. Smaller-scale variation in surface shape was created by defining points and paths where surface height was altered with random polynomial functions, creating hills, valleys, ridges, canyons, etc. Global surface smoothness was varied to span from the curvilinear structure of natural landscapes to the rectilinear structure of manmade environments. Each successive generation in the adaptation procedure comprised 40 stimuli in each of two lineages, for a total of 80 stimuli. In each generation, 80% of stimuli were partially morphed descendants of ancestors drawn from the entire pool of previously tested stimuli, and 20% were new, randomly defined stimuli. The probability of a given stimulus producing a descendant was based on its average response rate. The current response range of the neuron was divided into five percentile ranges. Thirty percent of ancestors were from the top range (90%–100% of maximum response), 20% each from the three middle ranges (70%–90%, 50%–70%, 30%–50%), and 10% from the bottom range (0%– 30%). To produce each descendant, one to three morphing transformations were applied. The number and magnitude of morphing transformations were probabilistic functions of response rate, to produce more alteration of low-response ancestors and less alteration of high-response ancestors. The probabilities of specific morphing transformations were heavily weighted toward scaling, since this produced the transition from object to environmental shapes. The scale distribution within each generation was constrained for equal sampling in the object range (4 –22 of visual angle), intermediate range (subtending
22 –80 ), and environmental range (80 –142 ). The morphing probabilities were scaling 35%, fixation depth 5%, rotation 10% (equal probability for x, y, and z axis rotation), translation 10%, longitudinal scaling of the NURBs grid 5%, translation/rotation of one latitudinal ring 5%, change in elliptical aspect ratio of one latitudinal ring 5%, global latitudinal curvature 5%, global longitudinal curvature 5%, height function of a surface distortion 5%, path shape of a surface distortion 5%, position of a surface distortion 2.5%, and removal of a surface distortion 2.5%. All stimuli were adjusted in depth such that the surface was tangent to the fixation point at 0 disparity. Closer fixation depths were prohibited when they would cause the stimulus surface to intersect the surface of the projection. Data Analysis and Statistics Response rates for each stimulus were averaged across the 750 ms presentation period and across five repetitions. Baseline response was averaged over randomly scheduled null stimulus presentations during the adaptive experiment. Baseline was subtracted from each stimulus response for the population analyses presented in Figure 2. Selectivity for object versus environmental stimuli was characterized with a Wilcoxon rank-sum test. Pearson correlation was used to evaluate consistency of scale tuning across independent lineages. Repeated-measures ANOVA was used to evaluate the effects of control test manipulations. Details of analyses are described in Results. SUPPLEMENTAL INFORMATION Supplemental Information includes three figures and can be found with this article online at http://dx.doi.org/10.1016/j.neuron.2014.08.043. ACKNOWLEDGMENTS We thank William Nash, William Quinlan, Lei Hao, and Virginia Weeks for technical assistance. This work was supported by NIH Grant EY024028. Accepted: August 14, 2014 Published: September 18, 2014 REFERENCES Ben-Shahar, O., and Zucker, S.W. (2001). On the perceptual organization of texture and shading flows: From a geometrical model to coherence computation. In Computer Vision and Pattern Recognition, 2001. CVPR 2001. Proceedings of the 2001 IEEE Computer Society Conference, Volume 1. IEEE. I–1048. Ben-Shahar, O., and Zucker, S.W. (2003). The perceptual organization of texture flow: A contextual inference approach. IEEE Trans. Pattern Anal. Mach. Intell. 25, 401–417. Brincat, S.L., and Connor, C.E. (2004). Underlying principles of visual shape selectivity in posterior inferotemporal cortex. Nat. Neurosci. 7, 880–886. Epstein, R.A. (2008). Parahippocampal and retrosplenial contributions to human spatial navigation. Trends Cogn. Sci. 12, 388–396. Epstein, R., and Kanwisher, N. (1998). A cortical representation of the local visual environment. Nature 392, 598–601. Epstein, R.A., and Ward, E.J. (2010). How reliable are visual context effects in the parahippocampal place area? Cereb. Cortex 20, 294–303. Epstein, R., Harris, A., Stanley, D., and Kanwisher, N. (1999). The parahippocampal place area: recognition, navigation, or encoding? Neuron 23, 115–125. Freiwald, W.A., and Tsao, D.Y. (2010). Functional compartmentalization and viewpoint generalization within the macaque face-processing system. Science 330, 845–851. Harel, A., Kravitz, D.J., and Baker, C.I. (2013). Deconstructing visual scenes in cortex: gradients of object and spatial layout information. Cereb. Cortex 23, 947–957. Hung, C.C., Carlson, E.T., and Connor, C.E. (2012). Medial axis shape coding in macaque inferotemporal cortex. Neuron 74, 1099–1113.
Neuron 84, 55–62, October 1, 2014 ª2014 Elsevier Inc. 61
Neuron A Channel for 3D Environmental Shape in IT
Ito, M., Tamura, H., Fujita, I., and Tanaka, K. (1995). Size and position invariance of neuronal responses in monkey inferotemporal cortex. J. Neurophysiol. 73, 218–226.
Moeller, S., Freiwald, W.A., and Tsao, D.Y. (2008). Patches with links: a unified system for processing faces in the macaque temporal lobe. Science 320, 1355–1359.
Janssen, P., Vogels, R., and Orban, G.A. (2000). Selectivity for 3D shape that reveals distinct areas within macaque inferior temporal cortex. Science 288, 2054–2056.
Mullally, S.L., and Maguire, E.A. (2011). A new role for the parahippocampal cortex in representing space. J. Neurosci. 31, 7441–7449.
Konkle, T., and Oliva, A. (2012). A real-world size organization of object responses in occipitotemporal cortex. Neuron 74, 1114–1124. Kornblith, S., Cheng, X., Ohayon, S., and Tsao, D.Y. (2013). A network for scene processing in the macaque temporal lobe. Neuron 79, 766–781. Kourtzi, Z., and Connor, C.E. (2011). Neural representations for object perception: structure, category, and adaptive coding. Annu. Rev. Neurosci. 34, 45–67. Kravitz, D.J., Peng, C.S., and Baker, C.I. (2011a). Real-world scene representations in high-level visual cortex: it’s the spaces more than the places. J. Neurosci. 31, 7322–7333. Kravitz, D.J., Saleem, K.S., Baker, C.I., and Mishkin, M. (2011b). A new neural framework for visuospatial processing. Nat. Rev. Neurosci. 12, 217–230. Kravitz, D.J., Saleem, K.S., Baker, C.I., Ungerleider, L.G., and Mishkin, M. (2013). The ventral visual pathway: an expanded neural framework for the processing of object quality. Trends Cogn. Sci. 17, 26–49. Lafer-Sousa, R., and Conway, B.R. (2013). Parallel, multi-stage processing of colors, faces and shapes in macaque inferior temporal cortex. Nat. Neurosci. 16, 1870–1878. Li, A., and Zaidi, Q. (2000). Perception of three-dimensional shape from texture is based on patterns of oriented energy. Vision Res. 40, 217–242. Li, A., and Zaidi, Q. (2004). Three-dimensional shape from non-homogeneous textures: carved and stretched surfaces. J. Vis. 4, 860–878. MacEvoy, S.P., and Epstein, R.A. (2011). Constructing scenes from objects in human occipitotemporal cortex. Nat. Neurosci. 14, 1323–1329. Mishkin, M., Ungerleider, L.G., and Macko, K.A. (1983). Object vision and spatial vision: two cortical pathways. Trends Neurosci. 6, 414–417.
62 Neuron 84, 55–62, October 1, 2014 ª2014 Elsevier Inc.
Nasr, S., Liu, N., Devaney, K.J., Yue, X., Rajimehr, R., Ungerleider, L.G., and Tootell, R.B. (2011). Scene-selective cortical regions in human and nonhuman primates. J. Neurosci. 31, 13771–13785. Park, S., and Chun, M.M. (2009). Different roles of the parahippocampal place area (PPA) and retrosplenial cortex (RSC) in panoramic scene perception. Neuroimage 47, 1747–1756. Park, S., Brady, T.F., Greene, M.R., and Oliva, A. (2011). Disentangling scene content from spatial boundary: complementary roles for the parahippocampal place area and lateral occipital complex in representing real-world scenes. J. Neurosci. 31, 1333–1340. Rajimehr, R., Devaney, K.J., Bilenko, N.Y., Young, J.C., and Tootell, R.B. (2011). The ‘‘parahippocampal place area’’ responds preferentially to high spatial frequencies in humans and monkeys. PLoS Biol. 9, e1000608. Sato, N., and Nakamura, K. (2003). Visual response properties of neurons in the parahippocampal cortex of monkeys. J. Neurophysiol. 90, 876–886. Stansbury, D.E., Naselaris, T., and Gallant, J.L. (2013). Natural scene statistics account for the representation of scene categories in human visual cortex. Neuron 79, 1025–1034. Tsao, D.Y., Freiwald, W.A., Knutsen, T.A., Mandeville, J.B., and Tootell, R.B. (2003). Faces and objects in macaque cerebral cortex. Nat. Neurosci. 6, 989–995. Ungerleider, L.G., and Mishkin, M. (1982). Two cortical visual systems. In Analysis of Visual Behaviour, D.J. Ingle, M.A. Goodale, and R.J.W. Mansfield, eds. (Cambridge: MIT Press), pp. 549–586. Ward, E.J., MacEvoy, S.P., and Epstein, R.A. (2010). Eye-centered encoding of visual space in scene-selective regions. J. Vis. 10, 6. Yamane, Y., Carlson, E.T., Bowman, K.C., Wang, Z., and Connor, C.E. (2008). A neural code for three-dimensional object shape in macaque inferotemporal cortex. Nat. Neurosci. 11, 1352–1360.