Accepted Manuscript

Creatures great and small: Real-world size of animals predicts visual representations beyond taxonomic category

Marc N. Coutanche, Griffin E. Koch

PII: S1053-8119(18)30764-X
DOI: 10.1016/j.neuroimage.2018.08.066
Reference: YNIMG 15227
To appear in: NeuroImage
Received Date: 5 June 2018
Revised Date: 15 August 2018
Accepted Date: 27 August 2018

Please cite this article as: Coutanche, M.N., Koch, G.E., Creatures great and small: Real-world size of animals predicts visual representations beyond taxonomic category, NeuroImage (2018), doi: 10.1016/j.neuroimage.2018.08.066.

This is a PDF file of an unedited manuscript that has been accepted for publication. As a service to our customers we are providing this early version of the manuscript. The manuscript will undergo copyediting, typesetting, and review of the resulting proof before it is published in its final form. Please note that during the production process errors may be discovered which could affect the content, and all legal disclaimers that apply to the journal pertain.
Creatures great and small: Real-world size of animals predicts visual representations beyond taxonomic category

Marc N. Coutanche (1,2,3)* and Griffin E. Koch (1,2)

1 Department of Psychology, University of Pittsburgh, Pittsburgh, PA, USA
2 Learning Research & Development Center, University of Pittsburgh, Pittsburgh, PA, USA
3 Brain Institute, University of Pittsburgh, Pittsburgh, PA, USA

* Corresponding author. E-mail: [email protected]

Keywords: size, ventral stream, visual, animals, RSA, MVPA
Abstract

Human occipitotemporal cortex contains neural representations for a variety of perceptual and conceptual features. We report a study examining neural representations of real-world size along the visual ventral stream, while carefully accounting for taxonomic categories that typically co-vary with size. We recorded brain activity during a functional Magnetic Resonance Imaging (fMRI) scan from eighteen participants as they were presented with images of twelve animal species. The animals were selected to vary on a number of dimensions, including taxonomic group, real-world size, and prior familiarity. We applied multivariate analysis methods, including representational similarity analysis (RSA) and machine learning classifiers, to probe the distributed patterns of neural activity evoked by these presentations. We find that the real-world size of visually presented animate items is represented in posterior, but not anterior, regions of the ventral stream. A significant linear relationship is present for real-world size representation along the ventral stream. These representations remain after controlling for factors such as taxonomic category, familiarity, and models of visual similarity, and even after restricting examinations to within-taxonomic-category comparisons, suggesting size information is found within, as well as between, taxonomic categories. These findings are consistent with real-world size having an influence on activity patterns in early regions of the visual system.
Introduction
The human occipitotemporal cortex contains neural representations for visual features and concepts. Certain categories of concepts and specific features are marked by particularly strong (univariate) responses, including faces (Kanwisher, McDermott, & Chun, 1997), places (Epstein & Kanwisher, 1998), objects (Malach et al., 1995), and words (Fiez & Petersen, 1998). Information about specific types of objects, such as chairs versus hammers (Haxby et al., 2001), and between different object exemplars (Eger, Ashburner, Haynes, Dolan, & Rees, 2008) is, in contrast, often contained only in distributed activity patterns. Certain features of visual concepts, such as orientations (Kamitani & Tong, 2005) and shapes (Drucker & Aguirre, 2009), are represented in occipital activity patterns. On the other hand, dimensions that are considered more conceptual, such as an animal's taxonomic category (Connolly et al., 2012) and predacity (Connolly et al., 2016), are represented in more anterior regions, within ventral temporal cortex.

A dimension falling at the intersection of perception and conception is real-world size. An item's size is a perceptual feature, akin to shape and color, but (unlike shape and color) cannot be extracted from an item's retinal imprint alone; instead, additional information (such as size-knowledge, comparisons to other items, or spatial context) is needed to translate the retinal imprint into real-world size, which is closer to a conceptual feature. Studies of cortical areas that are influenced by real-world size have varied in the extent to which they find early versus late parts of the ventral stream to be modulated by size. Investigations of how univariate responses differ based on size have identified that medial and lateral areas of ventral temporal cortex give stronger responses to large and small man-made objects, respectively (Konkle & Caramazza, 2013; Konkle & Oliva, 2012), possibly because they differ in their potential role as a landmark (Julian, Ryan, & Epstein, 2017). In contrast, animate items (which are mobile and thus not reliable landmarks) do not show the same univariate distinctions between medial and lateral ventral temporal areas (Konkle & Caramazza, 2013).
One recent study examining the influence of real-world size on early visual cortex taught participants to associate geometric shapes with different sizes, finding that after learning, early visual cortex activity came to reflect the associated size of the shapes (even with identical presentation sizes; Gabay, Kalanthroff, Henik, & Gronau, 2016). Another recent study probed how various semantic and perceptual dimensions for well-known concepts are represented in the ventral stream as people read words for different concepts (e.g., "camel"; Borghesani et al., 2016). The authors used representational similarity analysis to examine how pattern similarity reflects perceptual and conceptual dimensions while controlling for others. The real-world size of concepts was reflected in the similarity of patterns in early visual cortex (Brodmann Area (BA) 17), suggesting that reading words induces early visual cortex patterns that reflect the real-world size of the referenced items. The broader finding (that early visual cortex activity reflects more than retinotopic stimuli in the current visual field) is consistent with the emerging understanding that early visual cortex can be influenced by non-retinotopic information, such as position-invariant object information (Williams et al., 2008) or the prototypical color of grayscale images (Bannert & Bartels, 2013).
Animate items are valuable stimuli for probing the representation of real-world size because small and large animals are not confounded with the 'landmark versus manipulation' differences that are present for man-made objects. In contrast to most man-made objects, large animals are not used as landmarks (because they move) and small animals are rarely manipulated. There is, nonetheless, typically a confounding relationship between taxonomic category and real-world size: mammals are on average larger than birds, which are on average larger than insects. In order to isolate real-world size, it is thus important to carefully extract the size dimension from this taxonomic difference. Here, we examine how the real-world size of visually presented items is represented in multivoxel patterns along the ventral stream using animate concepts that lack landmark and manipulation confounds, with a stimulus set that has been selected to control for taxonomic confounds. To examine size independent of the typical taxonomic category association with size, we recorded the blood-oxygen-level-dependent (BOLD) response using functional magnetic resonance imaging (fMRI) as participants viewed different exemplars of animals that were large or small for their taxonomic category. Importantly, we presented stimuli that break the typical size gradient between insects, birds, and mammals, using insects that are larger than some birds, and birds that are larger than some mammals.
We find that early regions of the visual system represent the real-world size of animals after controlling for a variety of factors, including taxonomic category, familiarity, and models of visual similarity. On the other hand, real-world size becomes less influential as the ventral stream progresses anteriorly.
Materials and Methods
Participants

Twenty participants (10 females; mean (M) age = 22.56, standard deviation (SD) = 2.81; right-handed, English speakers without a learning or attentional disorder, and with normal or corrected-to-normal vision) took part in the study. Two participants were removed for excessive head motion. The remaining 18 participants were included in all analyses and results. An a priori power analysis was conducted using effect sizes from Borghesani et al. (2016), who conducted an RSA of real-world size in visual cortex. This analysis determined that a sample size of 16 participants provides power of 0.95, supporting our ability to detect equivalent effects with our sample. Prior to beginning the study, participants provided written informed consent and, upon completion of the study, were compensated for their time. The University of Pittsburgh Institutional Review Board approved all procedures.
Stimuli

The stimuli for the study consisted of images of 12 different animals, four from each of the following taxonomic categories: insects, birds, and mammals (Table 1; Figure 1). Within each taxonomic category were two animals that were large, and two animals that were small, for their particular category. One animal of each relative size and category was well known, and another was less familiar. The relative size (large versus small for category) and familiarity (well-known versus unfamiliar) groupings were based on norming ratings from an independent sample of 40 participants. This animal grouping was later validated in the scanned participants' responses during post-scan debriefing. For each animal, 15 high-resolution digital images of exemplars of each species (collected online) were edited to remove backgrounds and then scaled so that the longest side was 504 pixels. The resulting image was then centered on a white background, and the full image (including animal and white background) was adjusted to be 720 pixels by 405 pixels. Each of the 15 images was left-right flipped to give a total of 30 unique images per animal.
               SMALL                                      LARGE
               More familiar    Less familiar             More familiar    Less familiar
Insects        acacia ant       checkered beetle          praying mantis   giant weta
Birds          bee hummingbird  violet-tailed sylph       ostrich          shoebill
Mammals        capuchin monkey  pygmy marmoset            gorilla          gelada

Table 1: Species presented for each taxonomic category and size. Within categories, animals were either small or large, and more familiar or less familiar.
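The image preparation described above can be sketched as follows. This is an illustrative reconstruction rather than the authors' code: a random array stands in for a photograph, nearest-neighbour resampling stands in for whatever resizing method was actually used, and the sketch assumes the scaled image fits within the 720 x 405 canvas.

```python
import numpy as np

def prepare_stimulus(img, long_side=504, canvas_w=720, canvas_h=405):
    # Scale so the longest side is `long_side` pixels (nearest-neighbour
    # resampling here; the study's resizing method is not specified).
    h, w = img.shape[:2]
    scale = long_side / max(h, w)
    new_h, new_w = int(round(h * scale)), int(round(w * scale))
    rows = (np.arange(new_h) / scale).astype(int).clip(0, h - 1)
    cols = (np.arange(new_w) / scale).astype(int).clip(0, w - 1)
    resized = img[rows][:, cols]
    # Centre the resized animal image on a white 720 x 405 canvas
    # (assumes new_h <= canvas_h and new_w <= canvas_w).
    canvas = np.full((canvas_h, canvas_w, 3), 255, dtype=img.dtype)
    top = (canvas_h - new_h) // 2
    left = (canvas_w - new_w) // 2
    canvas[top:top + new_h, left:left + new_w] = resized
    return canvas

# A random landscape-format array stands in for one downloaded photograph.
original = np.random.randint(0, 255, (300, 600, 3), dtype=np.uint8)
stim = prepare_stimulus(original)
mirrored = stim[:, ::-1]  # the left-right flip that doubles the image set
```

Applied to all 15 images per species, the flip yields the 30 unique images per animal described above.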
Figure 1: Example stimuli for each of the twelve animals. All images reproduced with permission from copyright holders. Images were resized and edited to remove backgrounds. Attributions for images: Image of checkered beetle provided via https://www.flickr.com/photos/37546322@N00/7454045640/; Author: Joan Quintana. Image of giant weta provided via https://www.flickr.com/photos/sidm/5601688959; Author: Sid Mosdell. Image of violet-tailed sylph provided via https://www.flickr.com/photos/francesco_veronesi/16300406862; Author: Francesco Veronesi. Image of gelada provided via https://www.flickr.com/photos/adavey/2447517427; Author: A.Davey.
To include models of visual similarity in the partial correlations conducted, each stimulus image was processed using the GIST model (Oliva & Torralba, 2001) and a more basic pixel-area analysis. Following the procedure used in Rice et al. (2014), we computed the image statistics of each of our images using the GIST descriptor (http://people.csail.mit.edu/torralba/code/spatialenvelope/). This model produced a vector of 512 values for each image, representing the image across a variety of features, including spatial frequency and orientation. These vectors were used to quantify the GIST model's visual similarity between each of the twelve species by conducting Pearson correlations between GIST vectors of the animal images and averaging the resulting Fisher-transformed r-values. This gave a GIST-model visual similarity value for each animal pairing. To conduct a pixel-area analysis, we first quantified the number of pixels that represented each animal in every image. The absolute differences in pixel-area for each pair of animals were calculated, z-scored, and then used as predictors in RSAs and multiple regressions.
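The pairwise GIST-similarity computation can be sketched as follows (an illustration, not the authors' code): random vectors stand in for real GIST descriptor outputs, and only five images per species are used here rather than the study's 30.

```python
import numpy as np

rng = np.random.default_rng(1)
n_species, n_images, n_feats = 12, 5, 512
# Random stand-ins for the 512-value GIST descriptor of each image.
gist = rng.standard_normal((n_species, n_images, n_feats))

def species_similarity(a, b):
    # Pearson-correlate every image of one species with every image of the
    # other, Fisher-transform the r-values, and average them into a single
    # GIST-model similarity value for the pairing.
    zs = [np.arctanh(np.corrcoef(x, y)[0, 1]) for x in a for y in b]
    return float(np.mean(zs))

pairs = [(i, j) for i in range(n_species) for j in range(i + 1, n_species)]
gist_sim = {p: species_similarity(gist[p[0]], gist[p[1]]) for p in pairs}
```

The result is one similarity value per animal pairing (66 unique pairings for 12 species), which can then enter the partial correlations as a covariate.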
To anticipate key analyses (described below), we will compare neural patterns for small and large animals within taxonomic categories. The modelled visual similarity of animals did not differ with their real-world size: animal pairings of the same size (e.g., small insect – small insect) did not differ from animal pairings of different sizes within the same taxonomic category (e.g., small insect – big insect) in their GIST visual similarity (t(14) = 0.30, p = .77) or pixel areas (t(14) = 0.26, p = .80). This is consistent with the within-category manipulation (e.g., comparing ant to praying mantis, and hummingbird to ostrich, but not ant to ostrich) successfully reducing visual differences that might otherwise co-vary with size.
Experimental Procedure

The day before scanning, participants were shown brief (35-second) nature videos of each of the twelve species in their natural habitat, in a randomized order, to ensure they had an understanding of the real-world size of each animal. The scanning session consisted of an anatomical scan (details below) followed by 10 functional runs. Prior to beginning each run, participants were instructed to pay attention, as they would later be asked a question about the presented animals or about the fixation cross shown between blocks. Images were presented using MATLAB (R2016a) and the Psychophysics Toolbox Version 3 (Brainard, 1997; Kleiner et al., 2007). In every run, participants were shown a block for each of the twelve animals. During each block, participants were shown three unique images of an animal consecutively (1.333 s each), which were then immediately repeated. A fixation cross ('+' or 'x') was shown between blocks (for 12 seconds). To ensure participants paid attention during the task and fixation, at the end of each run, participants were presented with an image of an animal and asked if it had been presented during the study. The probed animal was equally likely to be old or new. After answering this question, participants were asked if the fixation cross had changed from '+' to 'x' at any point during the previous run.
Following each session, participants completed assessments, including a real-world size estimation task, a size comparison task, and an animal familiarity questionnaire in which participants rated how familiar they had been with each species prior to the study on a 1-to-7 Likert scale. To assess participants' estimation of the real-world size of each animal, participants were first shown an animal's exemplar image (isolated on a background) for one second as a cue for which animal they would be judging. Participants chose whether to compare the animal to the size of an average human body, a human hand, or both, and were given a piece of graph paper containing an outline of a body or hand superimposed on 1 x 1 cm squares. Participants shaded the number of squares corresponding to their perceived size of the animal in relation to the body or hand that provided a relative scale (e.g., the size of a gorilla relative to a human). Each participant's size estimate for each animal was calculated from the surface area of the squares they shaded, relative to the area encompassed by the human body or hand, which allowed it to be converted to real-world size. When a participant chose to use both the human body and human hand to indicate an animal's size, we calculated the average of both estimates. Five participants' behavioral data from the real-world size task were removed because they skipped an animal or did not follow the task instructions (i.e., estimated the size of the animal on the screen instead of the animal in real life). To replace these missing data, we calculated the average size ratings from all other participants and used those instead.
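The conversion from shaded squares to a real-world size estimate can be sketched as follows. This is a hypothetical illustration: the reference areas (the real-world and on-paper areas of the body and hand outlines) are made-up placeholder values, not figures from the study.

```python
# Placeholder reference areas (assumed values, in cm^2): the real-world area
# represented by each outline, and the area the outline occupies on the sheet.
REF_REAL_AREA = {"body": 7000.0, "hand": 150.0}
REF_DRAWN_AREA = {"body": 350.0, "hand": 60.0}

def size_estimate(shaded_squares_cm2, reference):
    # Scale the shaded area by the ratio of the reference's real-world area
    # to its drawn area, converting sheet area into real-world area.
    return shaded_squares_cm2 * REF_REAL_AREA[reference] / REF_DRAWN_AREA[reference]

def combined_estimate(shaded_by_reference):
    # When a participant used both the body and the hand reference,
    # average the two resulting estimates.
    ests = [size_estimate(a, ref) for ref, a in shaded_by_reference.items()]
    return sum(ests) / len(ests)
```

For excluded participants, each missing value would then be replaced with the mean estimate across the remaining participants, as described above.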
Image acquisition

Participants were scanned using a Siemens 3-T head-only Allegra magnet and a standard radio-frequency coil equipped with a mirror device to allow for stimulus presentation during fMRI. T1-weighted anatomical scans were conducted at the beginning of each scanning session (TR = 1540 ms, TE = 3.04 ms, voxel size = 1.00 x 1.00 x 1.00 mm). T2-weighted functional scans collected blood-oxygenation-level-dependent (BOLD) signals using a one-shot EPI pulse sequence (TR = 2000 ms, TE = 25 ms, field of view = 200 mm, voxel size = 3.125 x 3.125 x 3.125 mm, 36 slices).
Image preprocessing

All imaging data were preprocessed using the Analysis of Functional NeuroImages (AFNI) software (Cox, 1996). Preprocessing included the following steps: slice-time correction, motion correction (registration), high-pass filtering, and scaling voxel activation values to have a mean of 100 (maximum limit of 200). Structural and functional images were also converted to standardized space (Talairach & Tournoux, 1988). Data were not smoothed. Each participant's preprocessed and standardized functional data were imported into MATLAB, such that the values reflect the preprocessed BOLD response for each voxel at every time-point (TR).
Regions of interest

Regions of interest (ROIs) were created to represent the pathway along the visual ventral stream, as well as the ventral temporal (VT) cortex as a whole (Figure 2). Following the procedure used by Borghesani and colleagues (2016), we examined six Brodmann areas along the ventral pathway (based on anatomical criteria using a standard Talairach AFNI atlas; Cox, 1996): BA 17 (primary visual area), BA 18 (secondary visual areas), BA 19 (lateral and superior occipital gyri), BA 37 (occipito-temporal cortex), BA 20 (inferior temporal gyrus), and BA 38 (temporal pole). The definition of the VT ROI was based on criteria in prior studies (Haxby et al., 2001): extending 70 to 20 mm posterior to the anterior commissure in Talairach coordinates, incorporating the lingual, fusiform, parahippocampal, and inferior temporal gyri.

Figure 2. Regions of interest. Brodmann areas and VT cortex are indicated by color displayed on a standardized brain.
Pattern Classification

We first tested discriminability using a machine-learning classifier. First, each voxel's values were z-scored across the time-course of each run. Next, the condition label associated with each TR was shifted by three TRs to account for the hemodynamic delay. A Gaussian Naïve Bayes (GNB) classifier was then trained and tested on the activity patterns recorded for each TR, which were labeled (post-shift) by the animal presented. A leave-one-run-out cross-validation procedure ensured that training and testing data were kept independent. Averaging classification accuracy across the 10 cross-validation folds gave a single classification accuracy.
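A minimal sketch of this decoding pipeline, run on synthetic data, is below. The Gaussian Naive Bayes implementation, data shapes, and signal strength are illustrative rather than the authors' code, and the three-TR label shift is assumed to have been applied already.

```python
import numpy as np

rng = np.random.default_rng(0)
n_runs, n_classes, trs_per_class, n_vox = 10, 12, 3, 50
labels = np.repeat(np.arange(n_classes), trs_per_class)   # labels within one run
class_means = rng.standard_normal((n_classes, n_vox))     # synthetic class signal
runs = [class_means[labels] + 0.5 * rng.standard_normal((len(labels), n_vox))
        for _ in range(n_runs)]
# z-score each voxel across the time-course of its run
runs = [(X - X.mean(0)) / X.std(0) for X in runs]

def gnb_fit(X, y):
    mu = np.array([X[y == c].mean(0) for c in range(n_classes)])
    var = np.array([X[y == c].var(0) for c in range(n_classes)]) + 1e-6
    return mu, var

def gnb_predict(X, mu, var):
    # Per-class Gaussian log-likelihood summed over voxels (uniform priors).
    ll = -0.5 * (((X[:, None, :] - mu) ** 2) / var + np.log(var)).sum(-1)
    return ll.argmax(1)

# Leave-one-run-out cross-validation, averaged into a single accuracy.
accs = []
for test in range(n_runs):
    train_X = np.vstack([runs[r] for r in range(n_runs) if r != test])
    train_y = np.tile(labels, n_runs - 1)
    mu, var = gnb_fit(train_X, train_y)
    accs.append(np.mean(gnb_predict(runs[test], mu, var) == labels))
accuracy = float(np.mean(accs))
```

With 12 classes, chance accuracy is 1/12; the synthetic signal here is strong enough that the classifier performs well above that level.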
Neural similarity

To measure neural similarity, we implemented a representational similarity analysis (RSA) approach (Kriegeskorte, Mur, & Bandettini, 2008). The condition label associated with each run's TRs was first shifted by three TRs to account for the hemodynamic delay. For each subject and animal, a mean activity pattern was calculated by averaging the TRs associated with each animal (i.e., averaging vectors of BOLD activity patterns). The mean pattern for each animal was correlated with the mean pattern for each other animal using a Pearson correlation. Each correlation was then converted to a z-score through a Fisher transform and compared through an ANOVA. To investigate how neural activity patterns are affected by real-world size, we ran two primary RSAs.
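The construction of the pairwise neural similarity values can be sketched as follows, with random arrays standing in for the preprocessed (and label-shifted) BOLD patterns:

```python
import numpy as np

rng = np.random.default_rng(2)
n_animals, n_trs, n_vox = 12, 60, 50
# Stand-in activity patterns: the TRs associated with each animal.
patterns = rng.standard_normal((n_animals, n_trs, n_vox))

mean_patterns = patterns.mean(axis=1)     # one mean pattern per animal
r = np.corrcoef(mean_patterns)            # 12 x 12 Pearson similarity matrix
iu = np.triu_indices(n_animals, k=1)      # the 66 unique animal pairings
sims = np.arctanh(r[iu])                  # Fisher-transformed z-values
```

The vector of 66 Fisher-transformed similarities is the quantity entered into the partial correlation and regression analyses described next.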
First, we conducted partial correlations between each participant's set of pattern similarities for each ROI (66 rows, reflecting the 66 potential animal pairings) and their behaviorally reported size differences between animals (e.g., ant–gorilla will be a large value, while ant–beetle will be smaller). This partial correlation allowed us to partial out other factors (for another example see Borghesani et al., 2016): models of visual similarity (GIST and pixel-area), animals belonging to the same or different taxonomic category (a binary vector of the two animals falling within or between taxonomic categories), and whether animals are both well-known or unfamiliar (a binary vector of within- or between-familiarity). The resulting partial correlation value reflects the correspondence between pattern similarity and real-world size (for a participant and ROI) while controlling for models of visual similarity (GIST and pixel-area), taxonomic category, and familiarity. A negative correlation from this analysis would reflect the presence of real-world size information; specifically, higher pattern similarity for pairs of animals that have smaller differences in size. We also ran a multiple regression analysis that predicts pattern similarity with the five predictors (real-world size, GIST, pixel-area, taxonomic category, and familiarity) to examine the predictive power of each factor in the context of the others.
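The partial correlation at the heart of this analysis can be sketched as follows (an illustration, not the authors' code): residualize both variables against the covariates, then Pearson-correlate the residuals. The synthetic "pattern similarity" below is built to decrease with size difference, so the partial correlation comes out negative, as predicted for a size-representing region.

```python
import numpy as np

def partial_corr(x, y, covariates):
    # Residualize x and y against the covariates (plus an intercept) with
    # least squares, then correlate the residuals: a partial correlation.
    Z = np.column_stack([np.ones(len(x))] + list(covariates))
    resid = lambda v: v - Z @ np.linalg.lstsq(Z, v, rcond=None)[0]
    return float(np.corrcoef(resid(x), resid(y))[0, 1])

rng = np.random.default_rng(3)
n_pairs = 66                                  # all unique animal pairings
size_diff = rng.random(n_pairs)               # behavioral size differences
gist, pixel = rng.random(n_pairs), rng.random(n_pairs)
same_taxon = rng.integers(0, 2, n_pairs)      # binary within/between category
same_famil = rng.integers(0, 2, n_pairs)      # binary within/between familiarity
# Synthetic pattern similarity: higher for pairs with smaller size differences.
similarity = -0.8 * size_diff + 0.3 * gist + 0.1 * rng.standard_normal(n_pairs)

pr = partial_corr(similarity, size_diff,
                  [gist, pixel, same_taxon, same_famil])
```

The same design matrix, with the size-difference column included as a fifth predictor, also yields the multiple regression version of the analysis.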
For our second primary RSA approach, we conducted partial correlations for similarity within each taxonomic category. We were able to do this because we carefully selected our stimulus set to include animals that are small and large for their category. By excluding animal pairings that cross a taxonomic boundary (e.g., ant–ostrich), and restricting analyses to small/large pairings within each category (e.g., ant–praying mantis; hummingbird–ostrich), we can examine the presence of real-world size without the taxonomic category confound (e.g., birds tend to be larger than insects). In the same manner as the first RSA, models of visual similarity and familiarity were partialed out in the analysis. As with the first RSA, a multiple regression analysis was also conducted.
Results

Behavioral Results
To confirm our classifications of animals based on relative size (e.g., small vs. large) and familiarity (e.g., familiar vs. unfamiliar), we analyzed the behavioral ratings for each participant on both of these measures. We conducted an ANOVA to determine whether the small animals were rated as being smaller than the large animals during the real-world size estimation task. There was a main effect of category (F = 35.92, p < .001), as well as a main effect of size (F = 76.61, p < .001). These results suggest a size continuum, progressing from insects (M = 0.51, SD = 0.54) to birds (M = 10.61, SD = 13.63) to mammals (M = 14.10, SD = 13.57). This further validated the importance of holding the taxonomic category constant while comparing real-world size. Additionally, within each taxonomic category, the small animals were indeed rated as smaller than the large animals (Insects: Small (M = 0.22, SD = 0.20), Large (M = 0.79, SD = 0.62); Birds: Small (M = 2.34, SD = 1.57), Large (M = 18.89, SD = 15.3); Mammals: Small (M = 4.80, SD = 3.35), Large (M = 23.40, SD = 13.58)).
288
To confirm our classification of animals based on familiarity, we conducted an ANOVA to determine whether the well-known animals were rated as being more familiar than the
290
unfamiliar animals during the familiarity questionnaire. There was no main effect of taxonomic
291
category (F = 2.04, p = .13), indicating there were no differences between insects, birds, and
292
mammals in terms of familiarity. There was a main effect of familiarity (F = 101.04, p < .001),
293
indicating that the well-known animals were rated as more familiar than the less-familiar items
294
within each category (Well-known: M = 5.99, SD = 1.47; Less familiar: M = 3.17, SD = 2.04)
M AN U
SC
RI PT
289
Pattern Classification

We first investigated classification performance in each of the six ROIs that span the ventral stream to examine where the animals can be decoded, before we later examine discriminability while controlling for other factors. We first conducted an ANOVA to determine if the regions differed in discriminability. The ANOVA indicated significant differences between the seven regions (six ROIs spanning the ventral stream and the VT): F(6, 102) = 25.99, p < .001. We then conducted post-hoc t-tests to investigate which regions were successful at discriminating the 12 animals. Five of the seven regions had activity patterns that were classified at a level above chance (1/12; .08; Figure 3). These included posterior ventral stream regions BA 17 (M = 0.13, SD = 0.02; t(34) = 12.46, p < .001), BA 18 (M = 0.15, SD = 0.02; t(34) = 13.40, p < .001), and BA 19 (M = 0.12, SD = 0.03; t(34) = 5.62, p < .001), as well as more anterior BA 37 (M = 0.10, SD = 0.03; t(34) = 3.36, p = .002) and VT (M = 0.11, SD = 0.02; t(34) = 5.03, p < .001). In contrast, patterns in BA 20 (M = 0.09, SD = 0.01; t(34) = 0.87, p = .39) and BA 38 (M = 0.08, SD = 0.01; t(34) = -0.20, p = .84) could not be decoded.
Figure 3. Decoding performance across the ventral stream. Values reflect the results of a GNB classifier that was trained and tested to decode activity patterns in TRs that were associated with perceiving exemplars of one of twelve animals. Colors correspond to the regions shown in Figure 2. Error bars reflect standard error of the mean. Chance is indicated with a dashed red line (1/12). The asterisks indicate above-chance performance (p < 0.05).
RSA: Across and within category
We first asked how neural similarity reflects real-world size across the entire stimulus set. We conducted partial correlations, for all possible animal pairings, between pattern similarity in our ROIs and the size difference between each pairing. Differences in models of visual similarity, whether the animals belong to the same or different taxonomic categories, and whether animals are both well-known or unfamiliar, were partialed out through this analysis. First, we conducted an ANOVA to determine if differences were present between the seven regions. There was a main effect of region (F(6,102) = 17.76, p < .001), prompting us to examine the individual regions further. There was a significant pattern similarity / real-world size relationship in the two most posterior ventral stream ROIs (Figure 4): BA 17 (M = -0.262, SD = 0.135; t(17) = -8.22, p < .001) and BA 18 (M = -0.237, SD = 0.216; t(17) = -4.67, p < .001). The negative mean value reflects a negative correlation between pattern similarity and size differences (i.e., items with smaller differences in size have more similar activity patterns). BA 20 showed a significant positive relationship (M = 0.065, SD = 0.124; t(17) = 2.24, p = .04). In contrast, BA 19 (M = -0.109, SD = 0.254; t(17) = -1.81, p = .09), BA 37 (M = 0.028, SD = 0.178; t(17) = 0.68, p = .51), BA 38 (M = 0.030, SD = 0.085; t(17) = 1.51, p = .15), and VT (M = -.028, SD = 0.180; t(17) = -0.66, p = .52) did not have significant relationships between pattern similarity and real-world size. Additionally, although not the primary focus of the study, we examined if pattern similarity tracked the familiarity of the items. We followed the same procedure as used in the real-world size analyses, but correlated neural similarity with differences in participants' familiarity ratings between each animal pairing. One ROI showed a trending relationship between pattern similarity and familiarity: BA 17 (M = -.060, SD = .124; t(17) = -2.06, p = .055); all other regions were not significant (p > .25).
342
each predicting factor and ROI (Supplementary Table 1) as well as representational similarity
343
matrices and the first two dimensions of a multi-dimensional scaling analysis of each ROI
344
(Supplementary Figures 1-7).
RI PT
341
EP
346
TE D
M AN U
SC
345
Figure 4. Representation of real-world size in activity patterns for regions of the ventral stream. The y-axis reflects partial correlations between activity pattern similarity and real-world size differences while partialing out models of visual similarity, taxonomic category, and familiarity. Negative values reflect negative correlations between pattern similarity and size differences (i.e., more similar activity patterns for items with smaller differences in size). Colors correspond to the regions shown in Figure 2. Error bars reflect standard error of the mean. The asterisks indicate above-chance partial correlations (p < 0.05).
To further complement our partial correlation analyses, we conducted a multiple regression analysis to predict neural similarity within regions, using the same predictors as in the partial correlation analyses: real-world size difference, GIST, pixel-area difference, familiarity, and taxonomic category. The resulting sets of group beta coefficients (for each predictor) were first compared in an ANOVA to determine whether there were differences between regions. The real-world size predictor differed significantly between ROIs (F(6,102) = 19.96, p < .001), as did GIST (F(6,102) = 53.57, p < .001), pixel-area difference (F(6,102) = 6.89, p < .001), and taxonomic category (F(6,102) = 21.40, p < .001), prompting further comparisons of the individual ROIs. Familiarity did not differ as a predictor between the regions (F(6,102) = 0.80, p = .57). The real-world size predictor followed a similar pattern of results as in the partial correlation model for the early visual regions: significant beta coefficients were present in BA 17 (M = -.07, SD = 0.04; t(17) = -7.99, p < .001) and BA 18 (M = -.05, SD = 0.04; t(17) = -4.71, p < .001); however, BA 20 did not show a significant positive relationship (M = .01, SD = 0.03; t(17) = 1.88, p = .08). Again similar to the partial correlation model, there were no significant relationships in BA 19 (M = -.02, SD = 0.06; t(17) = -1.78, p = .09), BA 37 (M = .00, SD = 0.04; t(17) = 0.60, p = .55), BA 38 (M = .01, SD = 0.02; t(17) = 1.41, p = .18), or VT (M = -.00, SD = 0.03; t(17) = -0.54, p = .59).
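The regress-then-test logic of this analysis can be sketched as follows, again with synthetic data and hypothetical variable names. For each participant, neural dissimilarity is regressed on the predictors; the per-participant beta for a given predictor is then tested against zero across the group with a one-sample t-test (the between-region ANOVA follows the same logic, with one set of betas per ROI).

```python
import numpy as np
from scipy import stats

rng = np.random.default_rng(1)
n_subjects, n_pairs = 18, 66

betas = []
for _ in range(n_subjects):
    # Hypothetical predictors: size, GIST, pixel area, familiarity, category
    X = rng.random((n_pairs, 5))
    design = np.column_stack([np.ones(n_pairs), X])
    # Synthetic neural dissimilarity driven by the size predictor (column 0)
    y = 0.5 * X[:, 0] + 0.1 * rng.standard_normal(n_pairs)
    coef = np.linalg.lstsq(design, y, rcond=None)[0]
    betas.append(coef[1])  # beta for the size predictor

# Group-level one-sample t-test on the size betas (df = n_subjects - 1 = 17)
t, p = stats.ttest_1samp(betas, 0.0)
print(round(float(t), 2))
```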
To investigate whether the results from each of the six BA ROIs reflected a linear relationship within the ventral stream, we conducted a linear regression using the y-coordinates of the center of mass of each ROI (BA 17: -88.75; BA 18: -83.80; BA 19: -76.29; BA 37: -54.78; BA 20: -20.85; BA 38: 12.36). These center-of-mass values were used in a regression to predict the neural-to-behavior real-world size correlation value for each participant (i.e., predicting the correspondence between neural similarity and size similarity from the ROIs' y-coordinates). The resulting set of group beta coefficients was significantly greater than zero (one-sample t-test: t(17) = 5.71, p < .001), indicating a linear progression of decreasing real-world size representation (i.e., less negative r-values) as one progresses from posterior to anterior regions.
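The posterior-to-anterior trend test can be sketched like this: for each participant, the six ROI y-coordinates predict that participant's six neural-size correlation values, and the per-participant slopes are then tested against zero across the group. The y-coordinates below are the reported centers of mass; the correlation values are synthetic.

```python
import numpy as np
from scipy import stats

# Center-of-mass y-coordinates reported for the six BA ROIs
y_coords = np.array([-88.75, -83.80, -76.29, -54.78, -20.85, 12.36])

rng = np.random.default_rng(2)
slopes = []
for _ in range(18):  # 18 participants
    # Synthetic neural-size correlations: strongly negative posteriorly,
    # approaching zero anteriorly, plus participant noise
    r_vals = -0.05 + 0.002 * y_coords + 0.02 * rng.standard_normal(6)
    slope, intercept = np.polyfit(y_coords, r_vals, 1)
    slopes.append(slope)

# A positive group slope means the (negative) size-correlation weakens,
# i.e., becomes less negative, moving from posterior to anterior ROIs
t, p = stats.ttest_1samp(slopes, 0.0)
print(t > 0 and p < 0.001)
```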
RSA: Within category

We next examined whether a relationship between pattern similarity and real-world size remains after removing differences across taxonomic category. Our design allowed us to compare animals within each taxonomic category that are (relatively) small or large. We therefore repeated the above RSA, but removed pairings that cross a taxonomic boundary (e.g., ant – gorilla), leaving only within-category comparisons (e.g., ant – praying mantis). Models of visual similarity and familiarity were partialed out in the analysis. Again, we conducted an ANOVA to determine whether differences were present between the seven regions. There was a main effect of region (F(6,102) = 4.54, p < .001), prompting us to examine the regions further. Two of the posterior ventral stream ROIs (Figure 5) showed a significant relationship between neural similarity and size: BA 17 (M = -0.141, SD = 0.167; t(17) = -3.59, p = .002) and BA 18 (M = -0.095, SD = 0.180; t(17) = -2.25, p = .04). There was no significant relationship in the remaining regions: BA 19 (M = -0.000, SD = 0.253; t(17) = -0.004, p = .997), BA 37 (M = -0.009, SD = 0.148; t(17) = -0.26, p = .80), BA 20 (M = 0.091, SD = 0.307; t(17) = 1.26, p = .23), BA 38 (M = 0.096, SD = 0.275; t(17) = 1.47, p = .16), or VT (M = .040, SD = 0.248; t(17) = 0.68, p = .50).
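Restricting an RSA to within-category pairs amounts to masking out every cell of the pairwise comparison matrix whose two items come from different taxonomic categories before computing the size-neural correlation. A minimal sketch with hypothetical category labels and synthetic RDMs:

```python
import numpy as np

# Hypothetical stimulus set: 4 insects, 4 birds, 4 mammals
categories = np.array([0] * 4 + [1] * 4 + [2] * 4)
n = len(categories)

rng = np.random.default_rng(3)
size_rdm = rng.random((n, n))              # placeholder size-difference RDM
size_rdm = (size_rdm + size_rdm.T) / 2     # symmetrize
neural_rdm = size_rdm + 0.1 * rng.standard_normal((n, n))
neural_rdm = (neural_rdm + neural_rdm.T) / 2

# Take upper-triangle pairs only, then keep pairs sharing a category
iu = np.triu_indices(n, k=1)
same_cat = categories[iu[0]] == categories[iu[1]]

within_size = size_rdm[iu][same_cat]
within_neural = neural_rdm[iu][same_cat]
r = np.corrcoef(within_size, within_neural)[0, 1]

print(same_cat.sum())  # 18 within-category pairs: 3 categories x C(4,2)
```

The same masking generalizes to the partialed analysis: each covariate model vector is filtered with the identical `same_cat` mask before the partial correlation is computed.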
Figure 5. Representation of real-world size in activity patterns for regions of the ventral stream within taxonomic categories. The y-axis reflects partial correlations between activity pattern similarity and real-world size differences for animal pairings within taxonomic categories, while partialing out models of visual similarity and familiarity. Negative values reflect negative correlations between pattern similarity and size differences (i.e., more similar activity patterns for items with smaller differences in size). Colors correspond to the regions shown in Figure 2. Error bars reflect standard error of the mean. Asterisks indicate above-chance partial correlations (p < 0.05).
We again conducted a multiple regression analysis to complement our partial correlation analyses within category. In this model, we predicted neural similarity using real-world size (within categories only), GIST, pixel-area difference, and familiarity. Again, we compared the resulting sets of group beta coefficients (for each predictor) in an ANOVA to determine whether there were differences between the regions. The relative size predictor differed significantly between the regions (F(6,102) = 4.74, p < .001), as did GIST (F(6,102) = 43.44, p < .001), but not pixel area (F(6,102) = 1.19, p = .32) or familiarity (F(6,102) = 0.55, p = .77). The relative size predictor produced findings similar to the partial correlations, with significant beta coefficients in BA 17 (M = -.07, SD = 0.08; t(17) = -3.71, p < .001) and BA 18 (M = -.04, SD = 0.06; t(17) = -2.57, p = .02). There was no significant relationship in BA 19 (M = -.001, SD = 0.10; t(17) = -.05, p = .96), BA 20 (M = .03, SD = 0.12; t(17) = 0.98, p = .34), BA 37 (M = -.006, SD = 0.08; t(17) = -0.33, p = .74), BA 38 (M = .04, SD = 0.11; t(17) = 1.44, p = .17), or VT (M = .001, SD = 0.09; t(17) = .03, p = .97).
We next conducted a regression analysis examining how the posterior-anterior ROI continuum predicted the representation of real-world size. A linear regression used the center-of-mass y-coordinates of each BA ROI to predict the correlation between neural similarity and size similarity (only within taxonomic categories) for each participant. The group's resulting beta coefficients were significant (one-sample t-test: t(17) = 3.69, p = .002), suggesting a decreasing representation of real-world size (i.e., less negative correlation values) as one progresses from posterior to anterior ventral stream regions.
Discussion
We report findings from an investigation of how real-world size is represented across the human ventral stream. In order to examine real-world size without the influence of an item's potential to act as a landmark or be manipulated (associated with large and small man-made objects, respectively), we presented images of animals, which do not (typically) have either association. To further eliminate the impact of taxonomic category (which typically correlates with real-world size), we presented animals that are large and small for their category, with large items in one category (e.g., insects) being larger than the small items in the next (e.g., birds). Examining representational similarity in regions of the ventral stream, as well as decoding results, we found that real-world size was apparent in early visual regions of occipital cortex (BA 17 and 18), but not in more anterior temporal regions. Removing comparisons that cross a taxonomic boundary (e.g., insect – bird) supported BA 17 and BA 18 as containing size information outside of taxonomic influences. Our findings also suggest a linear progression for the correspondence between neural similarity and real-world size along the visual ventral stream, where the relationship starts out strong but decreases as one moves anteriorly.
Our finding that real-world size is represented in multi-voxel patterns of early visual cortex is consistent with several recent studies that examined how real-world size is represented for concepts presented in word form and learned shapes. Specifically, the real-world size of concepts presented as words (e.g., "camel") has been found to be represented in the pattern similarity of BA 17 (i.e., words for similarly sized concepts evoke similar activity patterns; Borghesani et al., 2016). Similarly, geometric shapes associated with larger or smaller sizes evoke differing levels of activity in early visual cortex (Gabay et al., 2016). Our findings suggest that, in addition to symbolic representations (words and shapes, respectively), visually presented concepts can produce activity patterns that reflect real-world size in early visual cortex, even when taxonomic category distinctions are removed. This is consistent with suggestions that information in the ventral stream becomes more conceptual as one moves from posterior to anterior regions, with real-world size being among several perceptual dimensions represented in early visual cortex (Borghesani et al., 2016; Coutanche, Solomon, & Thompson-Schill, 2016).
In contrast to early visual cortex, we did not find size information in VT cortex. This is consistent with findings from Konkle and Caramazza (2013), who found that areas of VT cortex differ in how they respond to large versus small inanimate objects (i.e., some areas are more responsive to large than small objects, and vice versa) but do not differ in their response to large versus small animals. We too did not find animal size represented in higher-level areas, but did find this information in the multi-voxel patterns of early visual cortex. Like a number of other visual dimensions (Coutanche et al., 2016; Tong & Pratte, 2012), real-world size might be a property (for animate items at least) that is more detectable in multi-voxel patterns (not probed by Konkle and Caramazza, 2013). Differences in how the size of animate and inanimate items is represented in cortex might relate to their different behavioral roles, as "the size of an inanimate object causally influences how we interact with it (with our hands or whole body), whereas for animals, our primary interactions are not related to real-world size" (Konkle & Caramazza, 2013, p. 10241).
Our decoding analysis, which classified brain data based on the species being viewed, identified species-relevant information in patterns of early visual regions (BA 17, 18, and 19), as well as in BA 37 and the broader VT area. This is in line with prior work showing that animacy is represented in VT areas (Connolly et al., 2012; Konkle & Caramazza, 2013; Sha et al., 2015). The finding of significant species classification performance without real-world size information in BA 37 and VT is consistent with higher-level areas containing species-relevant information above and beyond distinctions that might be based on real-world size. Interestingly, our finding that familiarity did not predict pattern similarity in BA 37 or VT raises the possibility that species-relevant information in these regions might not be strongly driven by existing knowledge of particular species.
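Species decoding of this kind is typically implemented as a cross-validated classifier on voxel patterns. The sketch below uses synthetic data and a simple correlation-based nearest-centroid classifier (not necessarily the algorithm used in the study) to show the leave-one-run-out logic:

```python
import numpy as np

rng = np.random.default_rng(4)
n_runs, n_species, n_voxels = 6, 12, 50

# Synthetic voxel patterns: each species has a stable template plus run noise
templates = rng.standard_normal((n_species, n_voxels))
data = np.stack([templates + 0.5 * rng.standard_normal((n_species, n_voxels))
                 for _ in range(n_runs)])  # shape: (runs, species, voxels)

correct = 0
for test_run in range(n_runs):
    train = np.delete(data, test_run, axis=0)  # all remaining runs
    centroids = train.mean(axis=0)             # one mean pattern per species
    for s in range(n_species):
        # Classify the held-out pattern by correlation with each centroid
        rs = [np.corrcoef(data[test_run, s], c)[0, 1] for c in centroids]
        correct += int(np.argmax(rs) == s)

accuracy = correct / (n_runs * n_species)
print(accuracy)  # well above the 1/12 chance level for these synthetic patterns
```

Significance of the group accuracies would then be assessed against the chance level (here 1/12), e.g., with a one-sample t-test or permutation test across participants.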
In this work, we removed a typical confound of taxonomic category by presenting large and small animals for mammals, birds, and insects, and examining pattern similarity without crossing the taxonomic boundary. The importance of this approach for RSA – selecting a stimulus set that varies in a key dimension without crossing other dimensions – is apparent from our results: examining size in animals across all taxonomic categories gives a significant positive relationship between pattern similarity and size differences in BA 20. However, once pattern similarity comparisons are restricted to within-category, this region is no longer significant. Why might BA 20 show more similar patterns for items with greater size differences? The finding that this relationship is present when taxonomic boundaries can be crossed (e.g., mammal–insect), but not when size comparisons are kept within-category (mammal–mammal and insect–insect), suggests that a factor co-varying with category differences might be driving this initial result. One possibility is that a dimension such as "can fly" versus "cannot fly" might influence patterns in this region. Among the presented species, all the mammals and three of the four insects move along the ground (and it is unlikely that participants knew that the fourth insect – the checkered beetle – flies, as it was an unfamiliar species, has no visible wings, and is not shown flying at any point during the experiment). A shared feature such as "flies" can bring patterns for the (on average) larger mammals and small insects closer together. Consistent with this, the multi-dimensional scaling plot for BA 20 (Supplementary Figure 5) reveals that the three flying birds are grouped together in the ROI's representational space. The ostrich (the only non-flying bird) is grouped with the insects and mammals. This supports the importance of completely controlling for taxonomic category by restricting RSA comparisons to items that share a category where possible, to avoid unintended correlations (positive or negative) between taxonomic category and other dimensions. A related point was recently made by Popov and colleagues, who argued that significant results can occur when representational spaces are correlated (in this case, possibly taxonomic category and flying; Popov, Ostarek, & Tenison, 2018). In contrast to BA 20, BA 17 and BA 18 continued to reflect real-world size after we restricted similarity analyses to within-category comparisons, giving confidence that their size information is not due to systematic differences across taxonomic categories.
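The multi-dimensional scaling used for such plots takes a representational dissimilarity matrix and embeds items so that plotted distances approximate the dissimilarities. A classical (Torgerson) MDS can be sketched in a few lines; the two-cluster toy distances below are illustrative, not the actual BA 20 dissimilarities:

```python
import numpy as np

def classical_mds(D, k=2):
    """Embed a symmetric dissimilarity matrix D into k dimensions."""
    n = D.shape[0]
    J = np.eye(n) - np.ones((n, n)) / n     # centering matrix
    B = -0.5 * J @ (D ** 2) @ J             # double-centered Gram matrix
    vals, vecs = np.linalg.eigh(B)
    order = np.argsort(vals)[::-1][:k]      # top-k eigenvalues
    return vecs[:, order] * np.sqrt(np.clip(vals[order], 0, None))

# Toy distances: items 0-2 mutually close (e.g., "flying"), 3-5 mutually close
D = np.array([[0, 1, 1, 5, 5, 5],
              [1, 0, 1, 5, 5, 5],
              [1, 1, 0, 5, 5, 5],
              [5, 5, 5, 0, 1, 1],
              [5, 5, 5, 1, 0, 1],
              [5, 5, 5, 1, 1, 0]], dtype=float)

X = classical_mds(D)
# In the 2-D embedding, within-group distances stay smaller than between-group
within = np.linalg.norm(X[0] - X[1])
between = np.linalg.norm(X[0] - X[3])
print(within < between)
```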
One common problem for behavioral and imaging studies is how to account for visual similarities or differences when examining real-world photographic stimuli, which cannot be constructed according to predefined visual parameters. This is a concern for studies that compare categories of visual stimuli (e.g., faces and objects), or stimuli that vary continuously in some property. We submitted the presented images to several models of visual similarity to quantify this characteristic. The strong relationship we observed between results from these models and early visual cortex activity patterns supports the models' ability to reflect the visual properties of images. Nonetheless, it is still possible that some visual characteristics remain unaccounted for in the visual models we employed. To fully explain our study's findings, these additional visual characteristics would need to co-vary with real-world size in the different taxonomic categories that we controlled (i.e., a visual characteristic that is shared between large insects, large birds, and large mammals, but not small insects, small birds, and small mammals). A direction for future studies would be to examine how information about real-world size is maintained when images are presented across different locations in the visual field, retinal sizes, and so on (though findings that object identity can feed back across retinotopic cortex (e.g., Williams et al., 2008) would need to be taken into account).
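Of the visual controls used here, the pixel-area model is the simplest to make concrete: count the foreground pixels each image occupies and take pairwise absolute differences as the model RDM. A toy sketch with synthetic binary masks (hypothetical sizes, not the study's stimuli):

```python
import numpy as np

def pixel_area(mask):
    """Foreground pixel count of a binary (object vs. background) mask."""
    return int(mask.sum())

# Toy binary masks standing in for segmented animal photographs
rng = np.random.default_rng(5)
masks = [rng.random((64, 64)) < p for p in (0.1, 0.2, 0.4)]

areas = np.array([pixel_area(m) for m in masks])
# Pairwise absolute area differences form the pixel-area model RDM
area_rdm = np.abs(areas[:, None] - areas[None, :])

print(area_rdm.shape)
```

Richer descriptors such as GIST follow the same recipe: compute a feature vector per image, take pairwise distances, and compare the resulting model RDM to the neural RDM.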
To summarize, we find that the real-world size of visually presented animate items is represented in posterior, but not anterior, regions of the ventral stream. This finding is maintained when restricting examinations to within-taxonomic-category comparisons.
Acknowledgements

The authors thank Oluwatoyin Ajayi, Alex Pang, Ashley Revels, and Travis Slopek for their assistance in editing stimuli, and Mark Vignone for his assistance in running participants. We also thank Heather Bruett, Lauren Hallion, and John Paulus for comments on an earlier version of the manuscript.
References

Bannert, M. M., & Bartels, A. (2013). Decoding the yellow of a gray banana. Current Biology, 23(22), 2268–2272. https://doi.org/10.1016/j.cub.2013.09.016

Borghesani, V., Pedregosa, F., Buiatti, M., Amadon, A., Eger, E., & Piazza, M. (2016). Word meaning in the ventral visual path: A perceptual to conceptual gradient of semantic coding. NeuroImage, 143, 128–140. https://doi.org/10.1016/j.neuroimage.2016.08.068

Brainard, D. H. (1997). The Psychophysics Toolbox. Spatial Vision, 10(4), 433–436.

Connolly, A. C., Guntupalli, J. S., Gors, J., Hanke, M., Halchenko, Y. O., Wu, Y.-C., … Haxby, J. V. (2012). The representation of biological classes in the human brain. The Journal of Neuroscience, 32(8), 2608–2618. https://doi.org/10.1523/JNEUROSCI.5547-11.2012

Connolly, A. C., Sha, L., Guntupalli, J. S., Oosterhof, N., Halchenko, Y. O., Nastase, S. A., … Haxby, J. V. (2016). How the human brain represents perceived dangerousness or "predacity" of animals. The Journal of Neuroscience, 36(19), 5373–5384. https://doi.org/10.1523/JNEUROSCI.3395-15.2016

Coutanche, M. N., Solomon, S. H., & Thompson-Schill, S. L. (2016). A meta-analysis of fMRI decoding: Quantifying influences on human visual population codes. Neuropsychologia, 82, 134–141. https://doi.org/10.1016/j.neuropsychologia.2016.01.018

Cox, R. W. (1996). AFNI: Software for analysis and visualization of functional magnetic resonance neuroimages. Computers and Biomedical Research, 29(3), 162–173.

Drucker, D. M., & Aguirre, G. K. (2009). Different spatial scales of shape similarity representation in lateral and ventral LOC. Cerebral Cortex, 19(10), 2269–2280. https://doi.org/10.1093/cercor/bhn244

Eger, E., Ashburner, J., Haynes, J.-D., Dolan, R. J., & Rees, G. (2008). fMRI activity patterns in human LOC carry information about object exemplars within category. Journal of Cognitive Neuroscience, 20(2), 356–370. https://doi.org/10.1162/jocn.2008.20019

Epstein, R., & Kanwisher, N. (1998). A cortical representation of the local visual environment. Nature, 392(6676), 598–601. https://doi.org/10.1038/33402

Fiez, J. A., & Petersen, S. E. (1998). Neuroimaging studies of word reading. Proceedings of the National Academy of Sciences, 95(3), 914–921. https://doi.org/10.1073/pnas.95.3.914

Gabay, S., Kalanthroff, E., Henik, A., & Gronau, N. (2016). Conceptual size representation in ventral visual cortex. Neuropsychologia, 81, 198–206. https://doi.org/10.1016/j.neuropsychologia.2015.12.029

Haxby, J. V., Gobbini, M. I., Furey, M. L., Ishai, A., Schouten, J. L., & Pietrini, P. (2001). Distributed and overlapping representations of faces and objects in ventral temporal cortex. Science, 293(5539), 2425–2430. https://doi.org/10.1126/science.1063736

Julian, J. B., Ryan, J., & Epstein, R. A. (2017). Coding of object size and object category in human visual cortex. Cerebral Cortex, 27(6), 3095–3109. https://doi.org/10.1093/cercor/bhw150

Kamitani, Y., & Tong, F. (2005). Decoding the visual and subjective contents of the human brain. Nature Neuroscience, 8(5), 679–685. https://doi.org/10.1038/nn1444

Kanwisher, N., McDermott, J., & Chun, M. M. (1997). The fusiform face area: A module in human extrastriate cortex specialized for face perception. The Journal of Neuroscience, 17(11), 4302–4311.

Kleiner, M., Brainard, D., Pelli, D., Ingling, A., Murray, R., & Broussard, C. (2007). What's new in Psychtoolbox-3. Perception, 36(14), 1–16.

Konkle, T., & Caramazza, A. (2013). Tripartite organization of the ventral stream by animacy and object size. The Journal of Neuroscience, 33(25), 10235–10242. https://doi.org/10.1523/JNEUROSCI.0983-13.2013

Konkle, T., & Oliva, A. (2012). A real-world size organization of object responses in occipitotemporal cortex. Neuron, 74(6), 1114–1124. https://doi.org/10.1016/j.neuron.2012.04.036

Kriegeskorte, N., Mur, M., & Bandettini, P. A. (2008). Representational similarity analysis - connecting the branches of systems neuroscience. Frontiers in Systems Neuroscience, 2. https://doi.org/10.3389/neuro.06.004.2008

Malach, R., Reppas, J. B., Benson, R. R., Kwong, K. K., Jiang, H., Kennedy, W. A., … Tootell, R. B. (1995). Object-related activity revealed by functional magnetic resonance imaging in human occipital cortex. Proceedings of the National Academy of Sciences, 92(18), 8135–8139. https://doi.org/10.1073/pnas.92.18.8135

Oliva, A., & Torralba, A. (2001). Modeling the shape of the scene: A holistic representation of the spatial envelope. International Journal of Computer Vision, 42(3), 145–175. https://doi.org/10.1023/A:1011139631724

Popov, V., Ostarek, M., & Tenison, C. (2018). Practices and pitfalls in inferring neural representations. NeuroImage, 174, 340–351. https://doi.org/10.1016/j.neuroimage.2018.03.041

Sha, L., Haxby, J. V., Abdi, H., Guntupalli, J. S., Oosterhof, N. N., Halchenko, Y. O., & Connolly, A. C. (2015). The animacy continuum in the human ventral vision pathway. Journal of Cognitive Neuroscience, 27(4), 665–678. https://doi.org/10.1162/jocn_a_00733

Talairach, J., & Tournoux, P. (1988). Co-planar stereotaxic atlas of the human brain: 3-dimensional proportional system. Thieme.

Williams, M. A., Baker, C. I., Op de Beeck, H. P., Mok Shim, W., Dang, S., Triantafyllou, C., & Kanwisher, N. (2008). Feedback of visual object information to foveal retinotopic cortex. Nature Neuroscience, 11(12), 1439–1445. https://doi.org/10.1038/nn.2218