Prior auditory information shapes visual category-selectivity in ventral occipito-temporal cortex


NeuroImage 52 (2010) 1592–1602


Ruth Adam ⁎, Uta Noppeney
Max Planck Institute for Biological Cybernetics, Spemannstr. 41, 72076 Tuebingen, Germany
⁎ Corresponding author. Fax: +49 7071 601 616. E-mail address: [email protected] (R. Adam).

Article history: Received 2 December 2009; Revised 28 April 2010; Accepted 3 May 2010; Available online 7 May 2010
Keywords: Multisensory integration; Congruency; Categorization; Selective attention; fMRI

Abstract

Objects in our natural environment generate signals in multiple sensory modalities. This fMRI study investigated the influence of prior task-irrelevant auditory information on visually-evoked category-selective activations in the ventral occipito-temporal cortex. Subjects categorized pictures as landmarks or animal faces, while ignoring the preceding congruent or incongruent sound. Behaviorally, subjects responded more slowly to incongruent than congruent stimuli. At the neural level, the lateral and medial prefrontal cortices showed increased activations for incongruent relative to congruent stimuli, consistent with their role in response selection. In contrast, the parahippocampal gyri combined visual and auditory information additively: activation was greater for visual landmarks than animal faces and for landmark-related sounds than animal vocalizations, resulting in increased parahippocampal selectivity for congruent audiovisual landmarks. Effective connectivity analyses showed that this amplification of visual landmark-selectivity was mediated by increased negative coupling of the parahippocampal gyrus with the superior temporal sulcus for congruent stimuli. Thus, task-irrelevant auditory information influences visual object categorization at two stages. In the ventral occipito-temporal cortex auditory and visual category information are combined additively to sharpen visual category-selective responses. In the left inferior frontal sulcus, as indexed by a significant incongruency effect, visual and auditory category information are integrated interactively for response selection. © 2010 Elsevier Inc. All rights reserved.

Introduction

Visual object recognition relies on category-selective activations in the ventral occipito-temporal cortices. Most notably, faces, and to some extent animal faces (Kanwisher et al., 1997; Kanwisher et al., 1999; Maguire et al., 2001), induce activations in the fusiform gyri (fusiform face area, FFA), while houses, landmarks and scenes are associated with selective responses in the parahippocampal gyri (parahippocampal place area, PPA) (Epstein and Kanwisher, 1998; Maguire et al., 2001). However, in our natural environment, objects commonly emit signals in multiple sensory modalities. For instance, we may see and hear a dog barking. Even landmarks that do not genuinely produce sounds are often associated with characteristic sounds, such as the view of a church with the chime of its bell. The human brain is thus challenged to integrate semantic object information from multiple senses to more reliably infer an object's category. This raises the question of whether category-selective activations that are commonly observed for visual object stimuli can be influenced by concurrent or prior auditory object input.


While numerous studies have investigated category-selective activations in the visual (Epstein and Kanwisher, 1998; Kanwisher et al., 1997; Noppeney et al., 2006; Pitcher et al., 2009; Rhodes et al., 2004; Rotshtein et al., 2005; Tootell et al., 2008) or auditory (Belin et al., 2000; Lewis et al., 2004; Staeren et al., 2009) domains alone, only a few studies have focused on influences of category-selective information across sensory modalities (for a review on multisensory object processing see Amedi et al., 2005). For instance, familiar voices have been shown to elicit activations not only in temporal voice recognition areas but also in the fusiform face area (von Kriegstein et al., 2005). Furthermore, after being trained with ‘face–voice’ or ‘cellphone–ringing tone’ pairs, subjects showed increased activation in the fusiform face area for voices (when presented alone) but no enhanced activation in visual object-selective areas for ringing tones (when presented alone) (von Kriegstein and Giraud, 2006). Yet, in a recent study (Engel et al., 2009), mechanical sounds induced parahippocampal activations without any prior training. Thus, it is currently unclear under which circumstances sounds elicit category-selective activations in the ventral occipito-temporal cortex. In particular, since those studies presented sounds alone and hence in the focus of attention, the observed activations are susceptible to task-induced strategic effects such as imagery or action simulations.


Indeed, voice-induced activations in the fusiform face area were observed primarily during speaker rather than speech recognition (von Kriegstein et al., 2005), suggesting that crossmodal activations may emerge to facilitate specific task requirements. In contrast, evidence for a more automatic spread of multisensory semantic activations comes from intersensory selective attention paradigms showing that object processing in the task-relevant sensory modality co-activates object representations or associated features in the task-irrelevant sensory modality (Molholm et al., 2007; Murray et al., 2004). Furthermore, audiovisual semantic (in)congruency manipulations demonstrated that task-irrelevant sensory stimuli can modify activations in another sensory modality as a function of semantic congruency (Laurienti et al., 2003; Molholm et al., 2004; Noppeney et al., 2008). For instance, in an intersensory auditory selective attention paradigm, prior incongruent visual information increased activation in the auditory processing stream relative to congruent information (Noppeney et al., 2008). Similarly, in an intersensory visual selective attention paradigm (i.e. presentation of stimuli in visual and auditory modalities with only the visual modality being task-relevant), activations in the ventral occipito-temporal cortex for object pictures (e.g. a cow) were increased in the presence of a semantically conflicting source sound (e.g. the sound of an alarm clock) (Laurienti et al., 2003). These (in)congruency effects demonstrate that ventral occipito-temporal ‘object’ areas are sensitive to auditory inputs and their semantic content even when it is task-irrelevant. They converge with previous studies on response conflict or incongruency in the visual modality alone showing that unattended words induce response amplification in the fusiform gyrus for face recognition (Egner and Hirsch, 2005). However, incongruency effects (as in Egner and Hirsch, 2005; Laurienti et al., 2003; Noppeney et al., 2008) may rely on different neural mechanisms than visual co-activations elicited by auditory stimuli when presented alone (Engel et al., 2009; von Kriegstein and Giraud, 2006). For instance, since incongruent stimulus pairs violate our expectations based on our life-long exposure to natural statistics, they may induce error detection mechanisms. Thus, incongruency effects may actually serve as a prediction error signal (Friston, 2009; Rao and Ballard, 1999). Importantly, in contrast to crossmodal co-activations observed when presented with inputs from one sensory modality alone, congruency effects require that the influence of an auditory object stimulus depends on and interacts with the category of the visual stimulus. Hence, sensory (e.g. auditory) input may influence processing of input from another sensory (e.g. visual) modality via two distinct neurobiological mechanisms. In the additive case, auditory category input induces co-activations in the corresponding visual category-selective areas irrespective of the category of the concurrent visual input, resulting in additive effects of visual and auditory category. For instance, a sound associated with a certain landmark will co-activate the parahippocampal place area irrespective of whether it is presented alone, together with a congruent landmark or an incongruent face. In this additive case, the parahippocampal place area would show additive effects of visual category (i.e. landmarks > face pictures) and auditory object category (i.e. landmark sounds > voices) resulting in a stepwise activation profile (see Fig. 1A, additive case). These additive effects do not depend on the relationship (i.e.
congruent or incongruent) of visual and auditory object category stimuli. In the interactive case, auditory category input induces activations in the corresponding visual category-selective areas depending on the semantic category of the visual input. For instance, a particular landmark sound may induce an activation increase when paired with a face picture, but an activation decrease when paired with its corresponding landmark picture (or vice versa) resulting in a crossover interaction between visual and auditory factors (n.b. as a consequence, the auditory stimulus may induce no activation increase when averaged across congruent and incongruent trials). This crossover interaction is then formally identical to (in)congruency effects (e.g. an activation increase in the incongruent and a decrease in the congruent case, see Fig. 1A, interactive case).
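To make the two cases concrete (a sketch of the logic in our own notation, not taken from the original): let Lv and La be indicator variables coding whether the visual and the auditory stimulus, respectively, is a landmark, and model the response y of a landmark-selective region as

y = β0 + βV·Lv + βA·La + βVA·(Lv·La) + ε

In the additive case βVA = 0, so the four conditions yield the stepwise profile β0 (FvFa), β0 + βA (FvLa), β0 + βV (LvFa) and β0 + βV + βA (LvLa). In the interactive case βVA ≠ 0, and the congruency contrast (LvLa + FvFa) − (LvFa + FvLa) equals βVA, i.e. the (in)congruency effect is formally the visual-by-auditory interaction.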


To summarize, object representations from one sensory modality can influence object processing in another sensory modality via two distinct neurobiological mechanisms: (i) additive co-activations that do not depend on the relationship of the auditory and visual signals, and (ii) interactive effects that are determined by the (in)congruency relationship of the two signals. To our knowledge, none of the previous studies was able to dissociate additive and interactive effects of visual and auditory category, since they used categories that induced responses in mostly overlapping brain areas (e.g. tools and musical instruments). Thus, these studies were only able to characterize interactive (= (in)congruency) effects of visual and auditory category input but could not reveal additive mechanisms of AV interplay, which crucially require a difference in net activation across different categories. To reveal both additive (i.e. crossmodal co-activations) and interactive (i.e. congruency effects) influences of task-irrelevant semantic sounds on visual processing in the ventral occipito-temporal cortex, the current experiment exploited the fact that faces and landmarks are associated with spatially selective activations in the ventral occipito-temporal cortex (Eger et al., 2005; Epstein and Kanwisher, 1998; Grill-Spector et al., 2004; Kanwisher et al., 1997; Kanwisher et al., 1999; Maguire et al., 2001; Rotshtein et al., 2007). Subjects were presented with pictures of landmarks or animal faces that were preceded by animal vocalizations or landmark sounds. Thus, the experiment factorially manipulated visual (landmark vs. animal face) and auditory (landmark sound vs. animal vocalization) category. In an intersensory visual selective attention paradigm, subjects categorized the pictures as landmarks or animal faces while ignoring the prior congruent or incongruent sounds. This 2 × 2 factorial design enabled us to dissociate additive and interactive effects of task-irrelevant semantic source sounds on activations in the ventral occipito-temporal cortex and higher order prefrontal cortices. In particular, we investigated whether activations in the fusiform and parahippocampal gyri were influenced by the category of the auditory stimuli in an additive or interactive fashion. In the additive case, the auditory stimulus (e.g. a ‘landmark sound’) would increase the activation in the corresponding category-selective area within the ventral occipito-temporal cortex (e.g. parahippocampal gyrus) irrespective of the category of the visual stimulus. In the interactive case, the category-selective effect of the auditory stimulus depends on the category of the visual stimulus, resulting in activation differences for incongruent relative to congruent stimulus combinations (= (in)congruency effect). Based on previous studies of conflict within and between the senses, we also expected incongruency (= interactive) effects in the lateral (i.e. inferior frontal sulcus (IFS)) and medial prefrontal cortex (mPFC/anterior cingulate (AC)) (Botvinick et al., 2001; Brown and Braver, 2005; Duncan and Owen, 2000; Hein et al., 2007; Kerns et al., 2004; Noppeney et al., 2008; Noppeney et al., 2010; Paus, 2001). Effective connectivity analyses (i.e. psychophysiological interactions) were used to further characterize interactions amongst brain regions that mediate the influence of auditory category information on visual object categorization.

Materials and methods

Subjects
28 healthy right-handed students (11 females, mean age 27.2 years, range 21–37 years) participated in this fMRI study. All subjects had normal or corrected-to-normal vision and reported normal hearing. Subjects gave informed written consent prior to the study, which was approved by the joint human research review committee of the Max Planck Society and the University of Tübingen. Data from two female subjects were excluded from the study because of poor performance.


Fig. 1. Experimental rationale, design and example stimuli. A. Additive and interactive mechanisms that mediate the influence of auditory category representations on the processing of object information in the visual modality. In the additive case (left), auditory category signals will co-activate the corresponding visual category-selective area irrespective of the co-occurring visual signal, resulting in a staircase-like activation profile. In the interactive case, auditory category signals induce a response in the corresponding visual category-selective area depending on the semantic congruency of the auditory and visual signals. F: animal face; L: landmark; a: auditory; v: visual; C: congruent; I: incongruent. B. The 2 × 2 factorial design with the factors (i) auditory prime: landmark sound vs. animal vocalization, and (ii) visual target: animal face vs. landmark. C. Example run and timing of two trials, congruent and incongruent (stimuli only for illustrational purposes).

Stimuli
Stimuli were 12 grayscale photographs (6 animal faces, 6 landmarks) and their semantically associated sounds (see full list of stimuli in Supplementary material, Table S1). Animal faces and landmarks were selected as semantic categories, since they are associated with spatially dissociable activations: animal faces are associated with activations in the fusiform gyrus, landmarks are known to activate the parahippocampal gyrus. We used animal faces rather than human faces to match the two categories with respect to variability and familiarity (see Supplementary material). Visual stimuli were 12 photographs obtained from www.PantherMedia.net (© PantherMedia), transformed into grayscale images (size 5.03° × 6.96° visual angle). The reliability of the grayscale photographs was manipulated by applying different degrees of Fourier phase-scrambling. The degree of phase scrambling was selected individually for each stimulus to adjust the available object information and maximize the incongruency effects (n.b. auditory information interferes more strongly with visual object recognition when the visual signal is rendered unreliable, Yuval-Greenberg and Deouell, 2007, 2009). The original images and uniform random noise images were separated into spatial frequency amplitude spectra and phase components, using the Fourier transform.

The amplitude spectrum of the original image was recombined with a linear interpolation between the original and the random noise phase spectrum (see the sketch below). This linear interpolation preserved 30–45% of the original phase components. Based on initial piloting, the level of interpolation was determined individually for each stimulus to maximize the incongruency effect in terms of reaction times and minimize this effect in terms of accuracy. To prevent subjects from using low-level visual cues for the categorization task, we matched the photographs from the two classes with respect to their mean luminance (t(8.642) = 1.34, p = 0.21) and root-mean-square (RMS) contrast (t(10) = −0.75, p = 0.47). Auditory stimuli were animal vocalizations corresponding to the animal faces and sounds semantically associated with the landmarks shown in the photographs. To maximize the interference effects of the sounds on visual categorization, the sound stimuli were not degraded. The sounds from the two categories were matched with respect to their RMS (t(10) = −1.38, p = 0.20). Identical sound files (1.3 s duration, 44,100 Hz sampling rate) were presented to both ears.
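A minimal sketch of such a phase-scrambling step (our illustration only: the function name, the default phase fraction and the simple linear mixing are assumptions; in particular, circular wrap-around of the phase angles is ignored here):

import numpy as np

def phase_scramble(image, preserve=0.35, seed=0):
    """Mix the image's phase spectrum with random-noise phase while keeping its
    amplitude spectrum; `preserve` is the fraction of original phase retained."""
    rng = np.random.default_rng(seed)
    spectrum = np.fft.fft2(image)
    amp = np.abs(spectrum)                                        # amplitude spectrum of the original image
    phase_orig = np.angle(spectrum)                               # original phase spectrum
    phase_noise = np.angle(np.fft.fft2(rng.random(image.shape)))  # phase of a uniform random-noise image
    phase_mix = preserve * phase_orig + (1.0 - preserve) * phase_noise  # linear interpolation of phases
    return np.real(np.fft.ifft2(amp * np.exp(1j * phase_mix)))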


Experimental design and procedure (main experiment inside the scanner)
In an intersensory visual selective attention paradigm, subjects categorized the degraded pictures as landmarks or animal faces while ignoring the preceding semantically congruent or incongruent intact sound. The 2 × 2 factorial design manipulated (i) auditory prime: sound associated with landmark (auditory landmark: La) vs. animal vocalization (auditory face: Fa), and (ii) visual target: landmark (Lv) vs. animal face (Fv) (Fig. 1B). The interaction between visual and auditory category tests explicitly for the semantic audiovisual (in)congruency effect. Please note that in ‘classical’ audiovisual congruency designs the categories are not counted as different levels in each factor and our experiment would be collapsed into a simple design with two conditions, i.e. congruent vs. incongruent. Yet, the separation of congruent and incongruent trials according to visual and auditory categories is important to explicitly dissociate additive and interactive effects of category-selective inputs from different sensory modalities. Please note that the experimental design can also be rearranged into a 2 × 2 factorial design manipulating (i) visual target: landmark (Lv) vs. animal face (Fv) and (ii) (in)congruency: incongruent vs. congruent auditory input. In this design, the effect of auditory category would emerge as the interaction between visual category and (in)congruency. These different descriptions of the experimental design lend themselves more easily to one or another interpretation, but are basically equivalent configurations of our experimental paradigm. We will make use of the two perspectives when discussing our regional activation and effective connectivity results. The auditory prime was played for 1.3 s, followed by the static presentation of the visual target for 1.3 s, subsequently followed by 1.55 s fixation. Hence, the stimulus onset asynchrony (SOA) was 4.15 s (i.e. 1.3 s + 1.3 s + 1.55 s) (Fig. 1C). Subjects responded from the onset of the task-relevant visual stimulus till the end of the fixation period, resulting in a response time interval of 2.85 s. Subjects were instructed to respond as quickly and accurately as possible, with a special emphasis placed on accuracy. The mapping from stimulus category to button/finger was counterbalanced across subjects. 50% of the trials required a ‘landmark’ response, 50% an ‘animal face’ response. 50% of the trials were congruent, 50% incongruent. Based on our previous study showing equivalent incongruency effects for categorization irrespective of whether the response dimension was orthogonal to the stimulus manipulation (Noppeney et al., 2008), we used a natural object categorization task (i.e. landmarks vs. animal faces) rather than a less common task with a categorization dimension orthogonal to the stimulus manipulation (i.e. where stimuli from both object categories can potentially be mapped onto the same response, e.g. “country origin?” of landmark or animal). Further, a natural object categorization task that combines semantic incongruency (= non-matching of visual and auditory object category) and response (= non-matching of response based on visual target and auditory prime) conflict is expected to maximize the overall crossmodal interference effects. Blocks of 8 activation trials (block duration ∼ 33 s) were interleaved with 7 s fixation. A randomized sequence of stimuli and activation conditions was generated for each subject. Subjects participated in two sessions inside the scanner (∼ 10 min each). Summed over the two sessions, each of the 12 visual and auditory stimulus components (6 landmarks and 6 animal faces/vocalizations) was presented 10 times in 2 of the 4 conditions (e.g.
a visual landmark was presented 10 times in LvLa and 10 times in LvFa), amounting to a total of 240 trials (i.e. 12 × 10 × 2). Data in two additional scanning sessions were acquired for an experiment that is not reported in this communication.

Training and familiarization with stimuli (prior to main experiment)
To prevent subjects from stimulus and association learning during the course of the main experiment, they were overfamiliarized with the stimulus pairs prior to the main experiment. First, outside the scanner, subjects passively viewed the congruent stimuli (photographs presented together with their matching sounds).


Second, outside the scanner, they participated in a behavioral training session with a design that was similar to the main experiment, but additionally manipulated the onset asynchrony between the auditory and visual stimuli and whether the visual or auditory input was to be categorized (i.e. task-relevant). Finally, inside the scanner, subjects passively viewed the audiovisual congruent stimuli again, twice in random order.

Experimental setup
Visual and auditory stimuli were presented using Cogent 2000 v1.25 (developed by the Cogent 2000 team at the FIL and the ICN, and Cogent Graphics developed by John Romaya at the LON at the Wellcome Department of Imaging Neuroscience, UCL, London, UK) running under MATLAB (Mathworks Inc., Natick, MA, USA) on a Windows PC. Visual stimuli were back-projected onto a Plexiglas screen using an LCD projector (JVC Ltd., Yokohama, Japan) visible to the subject through a mirror mounted on the MR head coil. Auditory stimuli were presented at approximately 80 dB SPL, using MR-compatible headphones (MR Confon GmbH, Magdeburg, Germany). The subjects performed the behavioral task using an MR-compatible custom-built button device connected to the stimulus computer.

MRI data acquisition
A 3 T Siemens Magnetom Trio System (Siemens, Erlangen, Germany) was used to acquire both T1-weighted anatomical images and T2*-weighted axial echoplanar images with blood oxygenation level-dependent (BOLD) contrast (gradient echo, TR = 3080 ms, TE = 40 ms, flip angle = 90°, FOV = 192 mm × 192 mm, image matrix 64 × 64, 38 transversal slices acquired sequentially in ascending direction, voxel size = 3.0 mm × 3.0 mm × 2.5 mm + 0.5 mm interslice gap) using a 12-channel head coil (Siemens, Erlangen, Germany). Each subject participated in four experimental sessions. Only two of those sessions (with 208 volume images per session) are reported in the current study. The first three volumes were discarded to allow for T1 equilibration effects. A three-dimensional high-resolution T1 anatomical image was acquired (TR = 2300 ms, TE = 2.98 ms, TI = 1100 ms, flip angle = 9°, FOV = 256 mm × 240 mm × 176 mm, isotropic spatial resolution 1 mm) for each subject.

Data analysis

Conventional fMRI data analysis
The functional MRI data were analyzed with statistical parametric mapping (SPM5, Wellcome Department of Imaging Neuroscience, London, UK; www.fil.ion.ucl.ac.uk/spm) (Friston et al., 1995). Scans from each subject were realigned using the first as a reference, unwarped, spatially normalized into MNI standard space (Evans et al., 1992; Talairach and Tournoux, 1988), resampled to 2 × 2 × 2 mm³ voxels and spatially smoothed with a Gaussian kernel of 8 mm FWHM. The time series in each voxel were high-pass filtered with a cut-off of 1/128 Hz. The fMRI experiment was modeled in an event-related fashion with regressors entered into the design matrix after convolving each event-related unit impulse function (locked to the onset of the auditory prime) with the canonical hemodynamic response function and its first temporal derivative. The statistical model included the 4 conditions in our 2 × 2 factorial design separately for each session. Nuisance covariates included the realignment parameters (to account for residual motion artifacts). Condition-specific effects for each subject were estimated according to the general linear model and passed to a second-level analysis as contrasts. This involved creating 4 contrast images (i.e. each of the 4 conditions summed over the two sessions) for each subject and entering them into a second-level ANOVA.
Inferences were made at the second level to allow for a random effects analysis and inferences at the population level (Friston et al., 1999).
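As a rough illustration of this first-level modeling step (a simplified sketch: the onset times are made up, and the double-gamma function below only approximates SPM's canonical HRF):

import numpy as np
from math import factorial

TR, n_scans, dt = 3.08, 208, 0.1                     # repetition time (s), scans per session, model resolution (s)
t = np.arange(0, 32, dt)
hrf = t**5 * np.exp(-t) / factorial(5) - t**15 * np.exp(-t) / (6 * factorial(15))  # approx. canonical double-gamma HRF
dhrf = np.gradient(hrf, dt)                          # first temporal derivative

def make_regressors(onsets):
    """Unit impulses at event onsets (s), convolved with the HRF and its derivative, sampled at the TR."""
    neural = np.zeros(int(np.ceil(n_scans * TR / dt)))
    neural[np.round(np.asarray(onsets) / dt).astype(int)] = 1.0
    scan_idx = np.round(np.arange(n_scans) * TR / dt).astype(int)
    return np.convolve(neural, hrf)[scan_idx], np.convolve(neural, dhrf)[scan_idx]

# hypothetical auditory-prime onsets (s) for one condition, e.g. congruent landmark trials (LvLa)
reg_LvLa, dreg_LvLa = make_regressors([7.0, 11.15, 19.45, 23.6])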


The following statistical comparisons were evaluated at the second level:
(1) visual landmark > visual animal face
(2) visual animal face > visual landmark
(3) incongruent > congruent
(4) congruent > incongruent
(5) auditory animal face (i.e. animal vocalization) > auditory landmark
(6) auditory landmark > auditory animal face (i.e. animal vocalization).
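Expressed as weights over the four condition regressors (an illustrative sketch; the column order [LvLa, LvFa, FvLa, FvFa] and the Python representation are our own, not taken from the original analysis scripts):

# columns ordered as [LvLa, LvFa, FvLa, FvFa]
contrasts = {
    "visual landmark > visual animal face":                     [ 1,  1, -1, -1],  # (1)
    "visual animal face > visual landmark":                     [-1, -1,  1,  1],  # (2)
    "incongruent > congruent":                                  [-1,  1,  1, -1],  # (3)
    "congruent > incongruent":                                  [ 1, -1, -1,  1],  # (4)
    "auditory animal face (vocalization) > auditory landmark":  [-1,  1, -1,  1],  # (5)
    "auditory landmark > auditory animal face (vocalization)":  [ 1, -1,  1, -1],  # (6)
}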

Psychophysiological interactions
Effective connectivity analyses were used to address the question of how category-selective activations in the ventral occipito-temporal cortex are influenced by category-selective responses in the auditory system (i.e. superior temporal sulcus). To investigate whether the category-selective regions in the ventral occipito-temporal cortex were differentially coupled with the superior temporal cortices during congruent and incongruent trials, a psychophysiological interaction analysis (Friston et al., 1997) was performed in which the BOLD signal time course in a region within the superior temporal sulcus was the physiological factor and audiovisual semantic (in)congruency was the psychological factor. For this, we first identified a region within the superior temporal sulcus that was differentially activated for animal vocalizations vs. landmark sounds (superior temporal sulci (STS); right: 62, −24, −2 and left: −66, −26, 2). Region-specific time series comprised the first eigenvariate of all voxels within a 4 mm radius sphere centered on these maxima (Friston et al., 2003; Noppeney et al., 2008; Noppeney et al., 2006). The BOLD signal time courses of the left and right STS (after deconvolution) were then multiplied with the psychological factor (= (in)congruency) to form the psychophysiological interaction term. For each subject, the physiological variable (i.e. the signal time course in either the right or the left superior temporal sulcus), the psychological variable (i.e. (in)congruency) and the psychophysiological interaction term (together with the realignment parameters as confounds) were entered into a new first-level general linear model, separately for the right and left hemispheric STS seed regions. The parameter estimates for the psychophysiological interaction term were entered into second-level one-sample t-tests (separately for left and right STS) to enable a random effects analysis and generalization to the population. A significant psychophysiological interaction reflects a change in coupling or effective connectivity between the region in the superior temporal sulcus and a region identified by the psychophysiological interaction as a function of audiovisual semantic (in)congruency. For further characterization of the psychophysiological interaction effect (i.e. not for statistics), we also estimated a model that replaced the psychophysiological interaction and the physiological regressors by two regressors generated by multiplying the physiological variable with a psychological variable modelling (i) the congruent or (ii) the incongruent trials. This analysis enabled us to plot the parameter estimates of the regression slopes (= effective connectivity) separately for congruent and incongruent conditions. Hence this additional analysis provided insights into the coupling between STS and a potential PPI region for each of the congruent and incongruent conditions.
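The core of the PPI regressor construction can be sketched as follows (illustrative only: the data are random stand-ins and SPM's deconvolution of the seed time course to the 'neural' level is omitted):

import numpy as np

rng = np.random.default_rng(1)
n_scans, TR = 208, 3.08
seed_neural = rng.standard_normal(n_scans)            # stand-in for the (deconvolved) STS eigenvariate
congruency = np.repeat([1.0, -1.0], n_scans // 2)     # stand-in psychological variable: +1 congruent, -1 incongruent
t = np.arange(0, 32, TR)
hrf = t**5 * np.exp(-t) / 120.0                       # crude single-gamma HRF, for illustration only

conv = lambda x: np.convolve(x, hrf)[:n_scans]
ppi = conv(seed_neural * congruency)                  # psychophysiological interaction term (formed at the neural level)
phys = conv(seed_neural)                              # physiological regressor (seed time course)
psych = conv(congruency)                              # psychological regressor ((in)congruency)
X = np.column_stack([ppi, phys, psych, np.ones(n_scans)])  # plus realignment parameters in the full first-level model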

Search volume constraints
Each effect was tested for within the neural systems engaged in audiovisual processing (i.e. all stimuli > fixation at p < 0.05 uncorrected, including 49,683 voxels). In addition, given our a priori hypotheses based on previous functional imaging findings, the following effects were tested for within anatomical search volumes (= regions of interest), defined by the AAL library (Tzourio-Mazoyer et al., 2002) using the MarsBaR toolbox (http://marsbar.sourceforge.net/) (Brett et al., 2002). All search volumes were additionally constrained to the neural systems activated relative to fixation. (1) Since animal vocalizations were hypothesized to evoke increased activation in the fusiform gyrus, the effect of animal vocalizations (i.e. contrast 5) was tested for selectively in the L. and R. fusiform gyrus. (2) Conversely, the effect of ‘landmark sounds’ (i.e. contrast 6) was tested for selectively in the L. and R. parahippocampal gyrus. (3) Given extensive evidence for a role of the lateral (i.e. inferior frontal sulcus (IFS)) and medial prefrontal cortex (mPFC/anterior cingulate (AC)) in crossmodal incongruency (Hein et al., 2007; Noppeney et al., 2008; Noppeney et al., 2010) and related conflict or Stroop effects (Botvinick et al., 2001; Brown and Braver, 2005; Duncan and Owen, 2000; Kerns et al., 2004; Paus, 2001), the incongruency effects (i.e. contrast 3) were tested for selectively within two search masks: L. inferior frontal sulcus (AAL: middle/inferior (triangular part) frontal gyri) and medial prefrontal cortex (AAL: bilateral anterior and median cingulate and paracingulate gyri, and bilateral supplementary motor areas). Unless otherwise stated, activations are reported at p < 0.05 at the voxel level, corrected for multiple comparisons (family-wise error rate, FWE) within the appropriate search volume. Results were superimposed onto an averaged normalized brain using the MRIcroN software (http://www.cabiatl.com/mricro/mricron/). For full characterization of the data, we also report activations after whole-brain correction in the Supplementary material, Table S3.

Results

In the following, we report the behavioral results inside the scanner, the conventional fMRI analysis and the effective connectivity analysis (psychophysiological interaction).

Behavioral results
Fig. 2 displays performance accuracy and reaction times (RT) across the four conditions (see also Supplementary Table S2). A 2 × 2 repeated measures ANOVA of accuracy with within-subject factors of auditory prime (landmark vs. animal vocalization) and visual target (landmark vs. animal face) revealed no significant main effects or interaction.

Fig. 2. Accuracy (A) and response times (B) for the four conditions (across subjects' mean ± SEM). F: animal face; L: landmark.


A 2 × 2 repeated measures ANOVA of reaction times with auditory prime (landmark vs. animal vocalization) and visual target (landmark vs. animal face) revealed a significant main effect of auditory prime (F(1,25) = 28.613, p < 0.001) and a significant interaction between visual and auditory category (F(1,25) = 19.7, p < 0.001). Please note that an interaction between visual and auditory category is equivalent to a main effect of (in)congruency (see Materials and methods section for further explanation). Post-hoc comparisons testing for the simple main effects showed that visual faces were discriminated more slowly when paired with incongruent landmark sounds (FvLa) than when paired with congruent animal vocalizations (FvFa) (t(1,25) = 13.6, p < 0.001, Bonferroni corrected). In contrast, visual landmarks were not discriminated significantly more slowly when paired with incongruent animal vocalizations (LvFa) than when paired with congruent landmark sounds (LvLa) (t(1,25) = −0.4, p = 0.7, Bonferroni corrected). Please note that these results do not necessarily imply that the incongruency manipulation is stronger for faces than landmarks. Because of the specific nature of our design, the simple main effects of (in)congruency involve a change in category of the auditory stimulus. Thus, a potential (in)congruency effect for visual landmarks may be reduced because of longer processing times for ‘congruent’ landmark sounds relative to ‘incongruent’ animal vocalizations (as indicated by a main effect of auditory category). In other words, a simple main effect of (in)congruency is not distinguishable from a simple main effect of auditory category. Therefore, we will refrain from further interpreting the simple main effects of (in)congruency for the behavioral data (i.e. incongruency effects limited to one visual category only) and focus on the general main effect of (in)congruency where the effects of categories are fully counterbalanced.

fMRI: conventional SPM analysis

Main effect of visual category
Category-selective visual regions were identified by directly comparing visual landmarks and animal faces. Pictures of landmarks relative to animal faces increased activations in the bilateral parahippocampal gyri, the right collateral sulcus and the bilateral middle occipital gyri. In contrast, pictures of animal faces relative to landmarks enhanced activations bilaterally in the fusiform and the middle occipital gyri (Table 1). This activation pattern nicely conforms to the well-known category-selective organization within the ventral occipito-temporal cortex, most notably the double dissociation between face selectivity in the fusiform gyri and landmark selectivity in the parahippocampal gyri (Eger et al., 2005; Epstein and Kanwisher, 1998; Grill-Spector et al., 2004; Kanwisher et al., 1997; Kanwisher et al., 1999; Maguire et al., 2001; Rotshtein et al., 2007).

The double dissociation of activations for animal faces and landmarks therefore validates our stimuli and shows that animal faces are sufficiently potent to selectively activate the fusiform gyri.

Main effect of auditory category
Category-selective auditory regions were identified by directly comparing landmark sounds and animal vocalizations. Animal vocalizations relative to landmark sounds increased activations bilaterally in the superior temporal gyri and sulci (Table 1, Supplementary material Fig. S1). Even though the sounds between the two classes were equated with respect to RMS contrast, these activation increases might be attributed to differences in the higher order statistical properties of the sounds from the two categories (e.g. differences in spectrotemporal structure or mean amplitude). In contrast, landmark sounds relative to animal vocalizations did not induce significant activations when correcting for multiple comparisons within the areas involved in audiovisual processing. Based on our a priori hypothesis that the category-selective fusiform and parahippocampal activations may be influenced by auditory category information, we then tested for effects of auditory category selectively in these two regions of interest (see Materials and methods). Contrary to our expectations, no significant effects of auditory category (animal vocalization > landmark sound) were observed when correcting for multiple comparisons within the left and right fusiform gyri, possibly because human fusiform gyri are specialized selectively for human voices rather than animal vocalizations. However, landmark sounds relative to animal vocalizations significantly increased activations in the left and right anterior parahippocampal gyri (p = 0.103 in left and p = 0.014 right, corrected for multiple comparisons within the region of interest; Table 2, Fig. 3A). As shown in the parameter estimate plots, the effects of auditory and visual category combine in an additive fashion in both the left and right anterior parahippocampal gyri, leading to a gradual staircase increase in BOLD response for FvFa < FvLa < LvFa < LvLa (Fig. 3B). In other words, the anterior parahippocampal gyri show main effects of visual and auditory category in the absence of an interaction (= no (in)congruency effect).

Table 1
Category-selective activations (x, y, z: MNI coordinates in mm).

Visual landmark > visual animal face
  Parahippocampal gyrus/collateral sulcus, L:  −26, −46, −12;  z-score > 7;     pFWE < 0.001
  Parahippocampal gyrus/collateral sulcus, R:   28, −44, −10;  z-score > 7;     pFWE < 0.001
  Collateral sulcus, R:                          30, −34, −16;  z-score > 7;     pFWE < 0.001
  Middle occipital gyrus, R:                     38, −82,  18;  z-score = 6.44;  pFWE < 0.001
  Middle occipital gyrus, L:                    −32, −90,  20;  z-score = 6.12;  pFWE < 0.001

Visual animal face > visual landmark
  Fusiform gyrus, R:                             40, −48, −20;  z-score > 7;     pFWE < 0.001
  Fusiform gyrus, L:                            −40, −48, −20;  z-score > 7;     pFWE < 0.001
  Middle occipital gyrus, R:                     46, −78,  −6;  z-score > 7;     pFWE < 0.001
  Middle occipital gyrus, R:                     46, −76,   0;  z-score > 7;     pFWE < 0.001
  Middle occipital gyrus, L:                    −40, −82,  −8;  z-score > 7;     pFWE < 0.001
  Middle occipital gyrus, L:                    −44, −80,  −2;  z-score > 7;     pFWE < 0.001

Animal vocalization > auditory landmark
  Superior temporal sulcus, L:                  −66, −26,   2;  z-score > 7;     pFWE < 0.001
  Superior temporal sulcus, R:                   62, −24,  −2;  z-score = 6.69;  pFWE < 0.001
  Superior temporal gyrus, L:                   −62, −12,   2;  z-score = 6.80;  pFWE < 0.001
  Superior temporal gyrus, R:                    62,  −4,  −4;  z-score = 6.17;  pFWE < 0.001

p-values are corrected for multiple comparisons within the search volume defined by stimuli > fixation at p < 0.05 uncorrected (49,683 voxels).

Table 2
fMRI activations (region of interest analysis; x, y, z: MNI coordinates in mm).

Auditory landmark > animal vocalization
  Anterior parahippocampal gyrus/collateral sulcus, R:   26, −28, −20;  z-score = 3.23;  pFWE = 0.014
  Anterior parahippocampal gyrus/collateral sulcus, R:   30, −30, −18;  z-score = 2.90;  pFWE = 0.037
  Anterior parahippocampal gyrus/collateral sulcus, L:  −22, −32, −12;  z-score = 2.20;  pFWE = 0.103

Incongruent stimuli > congruent stimuli
  Middle frontal gyrus, L:                              −42,  34,  34;  z-score = 3.78;  pFWE = 0.009
  Inferior frontal sulcus, L:                           −34,  32,  20;  z-score = 3.64;  pFWE = 0.015
  Medial prefrontal/cingulate sulcus:                     0,  12,  48;  z-score = 3.77;  pFWE = 0.005
  Medial prefrontal/cingulate sulcus, R:                  6, −22,  30;  z-score = 3.12;  pFWE = 0.041

Psychophysiological interaction
 Using R. STS as the physiological seed region
  Anterior parahippocampal gyrus/collateral sulcus, L:  −22, −32, −18;  z-score = 2.62;  pFWE = 0.048
  Anterior parahippocampal gyrus/collateral sulcus, R:   26, −36, −12;  z-score = 2.37;  pFWE = 0.156
 Using L. STS as the physiological seed region
  Anterior parahippocampal gyrus/collateral sulcus, L:  −20, −32, −16;  z-score = 2.56;  pFWE = 0.054
  Anterior parahippocampal gyrus/collateral sulcus, R:   22, −38, −12;  z-score = 2.53;  pFWE = 0.107

p-values are corrected for multiple comparisons within the anatomical search volumes according to the AAL library; see Materials and methods for further details.


Fig. 3. Additive effects of visual and auditory category bilaterally in the parahippocampal gyri. A. Visual landmark (yellow), visual animal face (red), and auditory landmark (light green) selective effects are displayed on sagittal and axial slices of a mean structural image created by averaging the subjects' normalized structural images. Activations are shown at p < 0.05, FWE-corrected at the voxel level for multiple comparisons within the appropriate search volume. For illustrational purposes only, the auditory landmark selective effects are also displayed at a height threshold of p < 0.01 uncorrected, masked with all stimuli > fixation at p < 0.05 uncorrected (dark green). B. Parameter estimates (mean ± 90% CI) at the given co-ordinates. The bar graphs represent the size of the effect in non-dimensional units (corresponding to % whole brain mean). F: animal face; L: landmark.

Interaction between visual and auditory category: audiovisual (in)congruency effect
As expected, increased activations for incongruent relative to congruent audiovisual stimuli were observed in the left inferior frontal sulcus, the left middle frontal gyrus, and the medial prefrontal cortex/anterior cingulate sulcus and gyrus (Table 2, Fig. 4). This pattern of activation is consistent with previous studies investigating audiovisual incongruency (Hein et al., 2007; Noppeney et al., 2008; van Atteveldt et al., 2007) and conflict-related processing in general (Botvinick et al., 2001; Brown and Braver, 2005; Duncan and Owen, 2000; Hein et al., 2007; Kerns et al., 2004; Noppeney et al., 2008; Olivetti Belardinelli et al., 2004; Orr and Weissman, 2009; Paus, 2001; van Atteveldt et al., 2007). For completeness, no activation increases were observed for congruent relative to incongruent trials.

Effective connectivity analysis: psychophysiological interactions

To better understand how auditory category information influences visually-evoked category-selective activations in the ventral occipito-temporal cortex, we employed a psychophysiological interaction analysis with a region within the right (or left) superior temporal sulcus (STS) as the physiological seed (n.b. this is not an a priori region of interest, Figs. 5A, B) and (in)congruency as the psychological factor. The right STS was more negatively coupled bilaterally with the parahippocampal gyri for congruent than incongruent trials (Fig. 5, Table 2). A similar connectivity profile was observed when using the left STS as the physiological variable (Table 2). This suggests that congruent auditory category input leads to a stronger differentiation between landmark and face responses in the parahippocampal gyri via increased negative effective connectivity between STS and parahippocampal gyri. More specifically, if the auditory and visual inputs are incongruent (i.e. LvFa and FvLa), the coupling between STS and parahippocampal gyri is negligible (Fig. 5C). Thus, category-selective inputs from the auditory processing system exert only small influences on parahippocampal activity, leading to reduced category-selectivity in the parahippocampal gyri (i.e. in the incongruent case, LvFa > FvLa is reduced, Fig. 5D). Hence, in the incongruent case, the activation differs only slightly for pictures of landmarks (LvFa) and pictures of animal faces (FvLa) in the parahippocampal gyri. In contrast, when auditory and visual inputs are congruent, the STS couples negatively with activity in the parahippocampal gyri (Fig. 5C). This negative coupling during congruent conditions means that the STS amplifies parahippocampal activation to landmark pictures and reduces activation to pictures of animal faces (i.e. in the congruent case, LvLa > FvFa is increased, Fig. 5D). As a consequence, congruent auditory category input increases category-selectivity in the parahippocampal gyri via increased negative coupling between STS and parahippocampal gyri.


Fig. 4. Left: Audiovisual incongruency effects in (A) the left inferior frontal sulcus and (B) the medial prefrontal/cingulate sulcus are displayed on coronal, axial and sagittal slices of a mean structural image created by averaging the subjects' normalized structural images. Activations in light yellow are shown at p < 0.05, FWE-corrected at the voxel level for multiple comparisons within the appropriate search volume. For illustrational purposes only, the activations are also displayed at a height threshold of p < 0.01 uncorrected, masked with all stimuli > fixation at p < 0.05 uncorrected (dark yellow). Right: Parameter estimates (mean ± 90% CI) at the given co-ordinates. The bar graph represents the size of the effect in non-dimensional units (corresponding to % whole brain mean). F: animal face; L: landmark.


Discussion

The current study investigated where and how task-irrelevant auditory semantic stimuli influence the processing of task-relevant visual object stimuli within the cortical hierarchy. In an intersensory visual selective attention paradigm, subjects categorized pictures as landmarks or animal faces while ignoring prior sounds that were semantically congruent or incongruent. Behaviorally, subjects were slower for incongruent than congruent trials indicating that auditory information is processed at a semantic level and influences response selection processes despite being task-irrelevant. Using fMRI, we then investigated at which hierarchical level task-irrelevant auditory semantic information influences the visual processing stream. Using categories associated with selective activations in the ventral occipito-temporal cortex, our experimental design enabled us to dissociate two types of auditory influences. First, auditory and visual object signals converged and combined additively in category-selective areas within the ventral occipito-temporal cortices. Second, auditory and visual inputs from different categories combined interactively, hence inducing ‘classical audiovisual incongruency effects’ in components of a higher order executive system.

It is well established that landmark and face stimuli presented in the visual modality elicit category-selective activations in the ventral occipito-temporal cortex. So far, only few studies have investigated and demonstrated the influence of auditory semantic stimuli on category-selective activations in the visual processing stream. For instance, voices that had been paired with faces in a training phase elicited activation in the fusiform face area (von Kriegstein and Giraud, 2006); mechanical sounds were associated with activation in the parahippocampal gyri (Engel et al., 2009). Furthermore, selective attention to speech (relative to melodies) increased activations in the visual word form area (Yoncheva et al., 2009). Collectively, these studies have demonstrated that auditory stimuli elicit activations in brain areas involved in processing visual representations that are associated with the auditory stimuli either via semantics or prior associative learning. However, since these fMRI studies presented subjects with auditory stimuli alone, they cannot characterize how neural responses to visual and auditory object stimuli from distinct semantic categories combine within those regions. Using auditory and visual stimulus categories that are associated with selective activations in the ventral occipito-temporal cortex enabled us to investigate whether auditory and visual signals combine in an additive or interactive fashion in the ventral occipito-temporal cortex, most notably the fusiform and the parahippocampal gyri. Indeed, activation in the anterior parahippocampal gyrus was significantly influenced by both visual and auditory object category.


Fig. 5. Activation results pertaining to the psychophysiological interactions using the right STS region (62, −24, −2) as the seed and audiovisual (in)congruency as the psychological factor. A. The right STS (red) is shown on a rendered brain. Activations pertaining to the psychophysiological interaction effects (white) and auditory landmark selective activations (green) are shown on coronal and axial slices of the average structural image. Height threshold of p < 0.01 uncorrected (for illustrational purposes, masked with all stimuli > fixation at p < 0.05 uncorrected). B. Activation time course in the seed region [62, −24, −2] of the right superior temporal sulcus is shown for a representative subject (one session only). C. Coupling strength in terms of regression coefficients. Parameter estimates (mean ± 90% CI) for the regression of activation in the left anterior parahippocampal gyrus (−22, −32, −18) on the neural activity in the rSTS region (62, −24, −2) for congruent and incongruent conditions. The rSTS shows a negative coupling with the left parahippocampal gyrus only for congruent stimuli. The bar graph represents the size of the effect in non-dimensional units (corresponding to % whole brain mean). F: animal face; L: landmark. D. Parameter estimates (mean ± 90% CI) for condition-specific activations at the peak co-ordinate identified by the psychophysiological interaction analysis. The activation difference between visual landmarks and animal faces is amplified in the context of congruent auditory stimuli. The bar graphs represent the size of the effect in non-dimensional units (corresponding to % whole brain mean). F: animal face; L: landmark; v: visual.

Increased activations were observed for landmark pictures > animal face pictures and for landmark sounds > animal vocalizations. More specifically, our results revealed a functional dissociation within the parahippocampal gyrus with the posterior part being affected only by the semantic category of the visual pictures and the anterior part showing additive effects of both picture and sound categories. As shown in Fig. 3, activation in the anterior parahippocampal gyri gradually increased in a staircase fashion with FvFa < FvLa < LvFa < LvLa. Landmark sounds induce category-selective activations in the parahippocampal gyrus irrespective of whether they are paired with landmark or animal face pictures via mechanisms of co-activation. Yet, an alternative perspective on our results can be obtained by rearranging and relabelling our experimental factors as visual category (landmark vs. animal face) and incongruency (incongruent vs. congruent). From this perspective, the parahippocampal activation profile can be characterized as visual category-selective activations that are modulated by the congruency of auditory input. As shown in the parameter estimate plots, visual category-selectivity is greater when the auditory category is congruent (FvFa < LvLa) than incongruent (FvLa < LvFa) with the visual category. This interpretation fits nicely with our additional psychophysiological interaction analysis showing a stronger negative coupling of the auditory cortex with the anterior parahippocampal gyri for congruent than incongruent trials.

Thus, category-dependent variation in STS activation influences visual category-selective activations primarily when the auditory input is congruent with the visual input. An increase in STS activation suppresses parahippocampal responses to congruent animal face pictures. Conversely, a decrease in STS activation amplifies parahippocampal activations to congruent landmark pictures. Collectively, stronger negative coupling between auditory STS and anterior parahippocampal gyri for congruent stimuli increases the landmark-selectivity in the parahippocampal gyrus, leading to a sharpening and differentiation of visual category representations in the context of congruent auditory category input. Contrary to our expectations, activation in the fusiform gyrus was modulated only by the category of the visual but not the auditory input. This null result may reflect the more modular organization of the fusiform face area that renders it relatively immune and insensitive to task-irrelevant auditory inputs. In line with this conjecture, a recent study (von Kriegstein et al., 2005) demonstrated fusiform activations for voices only during speaker but not sentence recognition. Furthermore, the fusiform gyri in humans may be more selectively specialized for human voices rather than animal vocalizations in general (Fecteau et al., 2004; see Petkov et al., 2008 for a species-specific area in monkeys). Clearly, future studies are needed to further investigate the determinants of auditory-evoked activations in the fusiform gyrus.


Our experimental design enabled us to identify not only additive, but also interactive response combinations of auditory and visual category inputs, i.e. classical audiovisual (in)congruency effects. (In) congruency effects have often been used as an index of multisensory integration, both in human EEG and fMRI studies. EEG studies have revealed audiovisual (in)congruency effects in the human brain for arbitrary AV stimulus pairs (Giard and Peronnet, 1999) and complex natural AV objects (Molholm et al., 2004). fMRI studies have implicated a widespread neural system encompassing sensory (van Atteveldt et al., 2004; van Atteveldt et al., 2007) and higher order executive regions (Hein et al., 2007; Noppeney et al., 2008; Olivetti Belardinelli et al., 2004; for review see Doehrmann and Naumer, 2008) in AV (in)congruency. Interestingly, during passive listening and viewing, congruent audio-visual stimuli that allow successful binding of sensory inputs are associated with increased activation relative to incongruent or uni-modal stimuli (Calvert et al., 2000; van Atteveldt et al., 2004). In contrast, during intersensory selective attention tasks, incongruent stimulus pairs induced activation increases in areas along the visual or auditory processing streams and prefrontal cortices (Laurienti et al., 2003; Noppeney et al., 2008; Sadaghiani et al., 2009). These opposite activation patterns highlight the role of task-context and attention on the neural processes underlying multisensory integration. Activation increases for congruent relative to incongruent stimuli are observed primarily when both auditory and visual signals are attended, relevant and integrated into a unified percept. In contrast, in intersensory selective attention paradigms activation increases for incongruent stimuli are observed along the attended and task-relevant sensory processing streams. Thus, previous intersensory visual selective attention paradigms reported incongruency effects in the ventral occipito-temporal cortex (Laurienti et al., 2003; Taylor et al., 2009; Weissman et al., 2004), while auditory selective attention paradigms reported them along the auditory processing stream (Noppeney et al., 2008). To our surprise, the current study did not observe incongruency effects (i.e. interactions between visual and auditory category) anywhere within the ventral occipito-temporal cortex (particularly in the fusiform or parahippocampal gyri). While caution needs to be applied when interpreting null results, there are two main factors that may account for these differences in BOLD response profile across studies. First, in previous studies, auditory and visual signals were often presented concurrently rather than sequentially as in the current immediate priming paradigm. Second, in contrast to previous experiments, the current experiment deliberately selected two stimulus categories (animal face vs. landmark) that are associated with spatially dissociable activations within the ventral occipito-temporal cortex. By design, incongruency effects for landmarks and faces would thus be located in spatially non-overlapping brain areas rendering the current experimental design less sensitive to a main effect of incongruency within the ventral occipito-temporal cortex. 
However, ‘classical’ audiovisual incongruency effects were observed selectively in the left inferior frontal sulcus and cingulate sulcus/mPFC that have previously been implicated in conflict and response selection processes (Botvinick et al., 2001; Brown and Braver, 2005; Duncan and Owen, 2000; Kerns et al., 2004; Paus, 2001). More recently, the left inferior frontal sulcus has also been shown to be influenced by audiovisual (in)congruency of visual and auditory inputs in selective attention and passive paradigms (Hein et al., 2007; Noppeney et al., 2008; van Atteveldt et al., 2007). Collectively, these studies suggest that the left inferior frontal sulcus may accumulate evidence for response alternatives from multiple senses (Kiani et al., 2008; Roitman and Shadlen, 2002). If the auditory and visual senses disagree and opt for different responses, this neural accumulation process is protracted, leading to an increased BOLD response (Werner and Noppeney, 2010; Noppeney et al., 2010).


In conclusion, factorially manipulating the semantic category of the auditory and visual inputs enabled us to dissociate two processing stages at which task-irrelevant auditory category input influences the visual processing stream. First, in the anterior parahippocampal gyri, auditory inputs influence visual ‘landmark’-selective activations in an additive fashion, leading to a staircase profile with activation gradually increasing for FvFa < FvLa < LvFa < LvLa. Category-selectivity in the ventral occipito-temporal cortex is thus not only visual-based but a multisensory phenomenon by which category-selective information from the auditory and visual senses is summed for object categorization. Based on our effective connectivity analysis, the increased landmark-selectivity for congruent (FvFa < LvLa) than incongruent (FvLa < LvFa) stimuli is mediated by increased negative coupling between STS and parahippocampal gyrus for congruent auditory input. Second, as indicated by the incongruency effects in the lateral and medial prefrontal cortices, auditory and visual category information are then interactively integrated into a decision variable to enable selection of an appropriate response.

Acknowledgments
This work was supported by the Max-Planck Society. We thank all members of the Cognitive Neuroimaging Group, in particular Johannes Tuennerhoff and Sebastian Werner for their help.

Appendix A. Supplementary data
Supplementary data associated with this article can be found, in the online version, at doi:10.1016/j.neuroimage.2010.05.002.

References
Amedi, A., von Kriegstein, K., van Atteveldt, N.M., Beauchamp, M.S., Naumer, M.J., 2005. Functional imaging of human crossmodal identification and object recognition. Exp. Brain Res. 166, 559–571. Belin, P., Zatorre, R.J., Lafaille, P., Ahad, P., Pike, B., 2000. Voice-selective areas in human auditory cortex. Nature 403, 309–312. Botvinick, M.M., Braver, T.S., Barch, D.M., Carter, C.S., Cohen, J.D., 2001. Conflict monitoring and cognitive control. Psychol. Rev. 108, 624–652. Brett, M., Anton, J.L., Valabregue, R., Poline, J.B., 2002. Region of interest analysis using an SPM toolbox. Neuroimage 16 (2). Presented at the 8th International Conference on Functional Mapping of the Human Brain. Brown, J.W., Braver, T.S., 2005. Learned predictions of error likelihood in the anterior cingulate cortex. Science 307, 1118–1121. Calvert, G.A., Campbell, R., Brammer, M.J., 2000. Evidence from functional magnetic resonance imaging of crossmodal binding in the human heteromodal cortex. Curr. Biol. 10, 649–657. Doehrmann, O., Naumer, M.J., 2008. Semantics and the multisensory brain: how meaning modulates processes of audio-visual integration. Brain Res. 1242, 136–150. Duncan, J., Owen, A.M., 2000. Common regions of the human frontal lobe recruited by diverse cognitive demands. Trends Neurosci. 23, 475–483. Eger, E., Schweinberger, S.R., Dolan, R.J., Henson, R.N., 2005. Familiarity enhances invariance of face representations in human ventral visual cortex: fMRI evidence. Neuroimage 26, 1128–1139. Egner, T., Hirsch, J., 2005. Cognitive control mechanisms resolve conflict through cortical amplification of task-relevant information. Nat. Neurosci. 8, 1784–1790. Engel, L.R., Frum, C., Puce, A., Walker, N.A., Lewis, J.W., 2009. Different categories of living and non-living sound-sources activate distinct cortical networks. Neuroimage 47, 1778–1791. Epstein, R., Kanwisher, N., 1998. A cortical representation of the local visual environment. Nature 392, 598–601. Evans, A.C., Marrett, S., Neelin, P., Collins, L., Worsley, K., Dai, W., Milot, S., Meyer, E., Bub, D., 1992.
Acknowledgments

This work was supported by the Max-Planck Society. We thank all members of the Cognitive Neuroimaging Group, in particular Johannes Tuennerhoff and Sebastian Werner, for their help.

Appendix A. Supplementary data

Supplementary data associated with this article can be found, in the online version, at doi:10.1016/j.neuroimage.2010.05.002.

References

Amedi, A., von Kriegstein, K., van Atteveldt, N.M., Beauchamp, M.S., Naumer, M.J., 2005. Functional imaging of human crossmodal identification and object recognition. Exp. Brain Res. 166, 559–571.
Belin, P., Zatorre, R.J., Lafaille, P., Ahad, P., Pike, B., 2000. Voice-selective areas in human auditory cortex. Nature 403, 309–312.
Botvinick, M.M., Braver, T.S., Barch, D.M., Carter, C.S., Cohen, J.D., 2001. Conflict monitoring and cognitive control. Psychol. Rev. 108, 624–652.
Brett, M., Anton, J.L., Valabregue, R., Poline, J.B., 2002. Region of interest analysis using an SPM toolbox. Presented at the 8th International Conference on Functional Mapping of the Human Brain. Neuroimage 16 (2).
Brown, J.W., Braver, T.S., 2005. Learned predictions of error likelihood in the anterior cingulate cortex. Science 307, 1118–1121.
Calvert, G.A., Campbell, R., Brammer, M.J., 2000. Evidence from functional magnetic resonance imaging of crossmodal binding in the human heteromodal cortex. Curr. Biol. 10, 649–657.
Doehrmann, O., Naumer, M.J., 2008. Semantics and the multisensory brain: how meaning modulates processes of audio-visual integration. Brain Res. 1242, 136–150.
Duncan, J., Owen, A.M., 2000. Common regions of the human frontal lobe recruited by diverse cognitive demands. Trends Neurosci. 23, 475–483.
Eger, E., Schweinberger, S.R., Dolan, R.J., Henson, R.N., 2005. Familiarity enhances invariance of face representations in human ventral visual cortex: fMRI evidence. Neuroimage 26, 1128–1139.
Egner, T., Hirsch, J., 2005. Cognitive control mechanisms resolve conflict through cortical amplification of task-relevant information. Nat. Neurosci. 8, 1784–1790.
Engel, L.R., Frum, C., Puce, A., Walker, N.A., Lewis, J.W., 2009. Different categories of living and non-living sound-sources activate distinct cortical networks. Neuroimage 47, 1778–1791.
Epstein, R., Kanwisher, N., 1998. A cortical representation of the local visual environment. Nature 392, 598–601.
Evans, A.C., Marrett, S., Neelin, P., Collins, L., Worsley, K., Dai, W., Milot, S., Meyer, E., Bub, D., 1992. Anatomical mapping of functional activation in stereotactic coordinate space. Neuroimage 1, 43–53.
Fecteau, S., Armony, J.L., Joanette, Y., Belin, P., 2004. Is voice processing species-specific in human auditory cortex? An fMRI study. Neuroimage 23, 840–848.
Friston, K., 2009. The free-energy principle: a rough guide to the brain? Trends Cogn. Sci. 13, 293–301.
Friston, K.J., Buechel, C., Fink, G.R., Morris, J., Rolls, E., Dolan, R.J., 1997. Psychophysiological and modulatory interactions in neuroimaging. Neuroimage 6, 218–229.
Friston, K.J., Harrison, L., Penny, W., 2003. Dynamic causal modelling. Neuroimage 19, 1273–1302.
Friston, K.J., Holmes, A., Worsley, K., Poline, J., Frith, C.D., Frackowiak, R., 1995. Statistical parametric mapping: a general linear approach. Hum. Brain Mapp. 2, 189–210.
Friston, K.J., Holmes, A.P., Price, C.J., Buchel, C., Worsley, K.J., 1999. Multisubject fMRI studies and conjunction analyses. Neuroimage 10, 385–396.


Giard, M.H., Peronnet, F., 1999. Auditory-visual integration during multimodal object recognition in humans: a behavioral and electrophysiological study. J. Cogn. Neurosci. 11, 473–490.
Grill-Spector, K., Knouf, N., Kanwisher, N., 2004. The fusiform face area subserves face perception, not generic within-category identification. Nat. Neurosci. 7, 555–562.
Hein, G., Doehrmann, O., Muller, N.G., Kaiser, J., Muckli, L., Naumer, M.J., 2007. Object familiarity and semantic congruency modulate responses in cortical audiovisual integration areas. J. Neurosci. 27, 7881–7887.
Kanwisher, N., McDermott, J., Chun, M.M., 1997. The fusiform face area: a module in human extrastriate cortex specialized for face perception. J. Neurosci. 17, 4302–4311.
Kanwisher, N., Stanley, D., Harris, A., 1999. The fusiform face area is selective for faces not animals. NeuroReport 10, 183–187.
Kerns, J.G., Cohen, J.D., MacDonald III, A.W., Cho, R.Y., Stenger, V.A., Carter, C.S., 2004. Anterior cingulate conflict monitoring and adjustments in control. Science 303, 1023–1026.
Kiani, R., Hanks, T.D., Shadlen, M.N., 2008. Bounded integration in parietal cortex underlies decisions even when viewing duration is dictated by the environment. J. Neurosci. 28, 3017–3029.
Laurienti, P.J., Wallace, M.T., Maldjian, J.A., Susi, C.M., Stein, B.E., Burdette, J.H., 2003. Cross-modal sensory processing in the anterior cingulate and medial prefrontal cortices. Hum. Brain Mapp. 19, 213–223.
Lewis, J.W., Wightman, F.L., Brefczynski, J.A., Phinney, R.E., Binder, J.R., DeYoe, E.A., 2004. Human brain regions involved in recognizing environmental sounds. Cereb. Cortex 14, 1008–1021.
Maguire, E.A., Frith, C.D., Cipolotti, L., 2001. Distinct neural systems for the encoding and recognition of topography and faces. Neuroimage 13, 743–750.
Molholm, S., Martinez, A., Shpaner, M., Foxe, J.J., 2007. Object-based attention is multisensory: co-activation of an object's representations in ignored sensory modalities. Eur. J. Neurosci. 26, 499–509.
Molholm, S., Ritter, W., Javitt, D.C., Foxe, J.J., 2004. Multisensory visual-auditory object recognition in humans: a high-density electrical mapping study. Cereb. Cortex 14, 452–465.
Murray, M.M., Michel, C.M., Grave de Peralta, R., Ortigue, S., Brunet, D., Gonzalez Andino, S., Schnider, A., 2004. Rapid discrimination of visual and multisensory memories revealed by electrical neuroimaging. Neuroimage 21, 125–135.
Noppeney, U., Josephs, O., Hocking, J., Price, C.J., Friston, K.J., 2008. The effect of prior visual information on recognition of speech and sounds. Cereb. Cortex 18, 598–609.
Noppeney, U., Ostwald, D., Werner, S., 2010. Perceptual decisions formed by accumulation of audiovisual evidence in prefrontal cortex. J. Neurosci. 30, 7434–7446.
Noppeney, U., Price, C.J., Penny, W.D., Friston, K.J., 2006. Two distinct neural mechanisms for category-selective responses. Cereb. Cortex 16, 437–445.
Olivetti Belardinelli, M., Sestieri, C., Di Matteo, R., Delogu, F., Del Gratta, C., Ferretti, A., Caulo, M., Tartaro, A., Romani, G.L., 2004. Audio-visual crossmodal interactions in environmental perception: an fMRI investigation. Cogn. Process. 5, 167–174.
Orr, J.M., Weissman, D.H., 2009. Anterior cingulate cortex makes 2 contributions to minimizing distraction. Cereb. Cortex 19, 703–711.
Paus, T., 2001. Primate anterior cingulate cortex: where motor control, drive and cognition interface. Nat. Rev. Neurosci. 2, 417–424.
Petkov, C.I., Kayser, C., Steudel, T., Whittingstall, K., Augath, M., Logothetis, N.K., 2008. A voice region in the monkey brain. Nat. Neurosci. 11, 367–374.
Pitcher, D., Charles, L., Devlin, J.T., Walsh, V., Duchaine, B., 2009. Triple dissociation of faces, bodies, and objects in extrastriate cortex. Curr. Biol. 19, 319–324.

Rao, R.P., Ballard, D.H., 1999. Predictive coding in the visual cortex: a functional interpretation of some extra-classical receptive-field effects. Nat. Neurosci. 2, 79–87.
Rhodes, G., Byatt, G., Michie, P.T., Puce, A., 2004. Is the fusiform face area specialized for faces, individuation, or expert individuation? J. Cogn. Neurosci. 16, 189–203.
Roitman, J.D., Shadlen, M.N., 2002. Response of neurons in the lateral intraparietal area during a combined visual discrimination reaction time task. J. Neurosci. 22, 9475–9489.
Rotshtein, P., Henson, R.N., Treves, A., Driver, J., Dolan, R.J., 2005. Morphing Marilyn into Maggie dissociates physical and identity face representations in the brain. Nat. Neurosci. 8, 107–113.
Rotshtein, P., Vuilleumier, P., Winston, J., Driver, J., Dolan, R., 2007. Distinct and convergent visual processing of high and low spatial frequency information in faces. Cereb. Cortex 17, 2713–2724.
Sadaghiani, S., Maier, J.X., Noppeney, U., 2009. Natural, metaphoric, and linguistic auditory direction signals have distinct influences on visual motion processing. J. Neurosci. 29, 6490–6499.
Staeren, N., Renvall, H., De Martino, F., Goebel, R., Formisano, E., 2009. Sound categories are represented as distributed patterns in the human auditory cortex. Curr. Biol. 19, 498–502.
Talairach, J., Tournoux, P., 1988. Co-planar Stereotaxic Atlas of the Human Brain. Thieme, Stuttgart.
Taylor, K.I., Stamatakis, E.A., Tyler, L.K., 2009. Crossmodal integration of object features: voxel-based correlations in brain-damaged patients. Brain 132, 671–683.
Tootell, R.B., Devaney, K.J., Young, J.C., Postelnicu, G., Rajimehr, R., Ungerleider, L.G., 2008. fMRI mapping of a morphed continuum of 3D shapes within inferior temporal cortex. Proc. Natl. Acad. Sci. U. S. A. 105, 3605–3609.
Tzourio-Mazoyer, N., Landeau, B., Papathanassiou, D., Crivello, F., Etard, O., Delcroix, N., Mazoyer, B., Joliot, M., 2002. Automated anatomical labeling of activations in SPM using a macroscopic anatomical parcellation of the MNI MRI single-subject brain. Neuroimage 15, 273–289.
van Atteveldt, N., Formisano, E., Goebel, R., Blomert, L., 2004. Integration of letters and speech sounds in the human brain. Neuron 43, 271–282.
van Atteveldt, N.M., Formisano, E., Goebel, R., Blomert, L., 2007. Top-down task effects overrule automatic multisensory responses to letter-sound pairs in auditory association cortex. Neuroimage 36, 1345–1360.
von Kriegstein, K., Giraud, A.L., 2006. Implicit multisensory associations influence voice recognition. PLoS Biol. 4, e326.
von Kriegstein, K., Kleinschmidt, A., Sterzer, P., Giraud, A.L., 2005. Interaction of face and voice areas during speaker recognition. J. Cogn. Neurosci. 17, 367–376.
Weissman, D.H., Warner, L.M., Woldorff, M.G., 2004. The neural mechanisms for minimizing cross-modal distraction. J. Neurosci. 24, 10941–10949.
Werner, S., Noppeney, U., 2010. Distinct functional contributions of primary sensory and association areas to audiovisual integration in object categorization. J. Neurosci. 30, 2662–2675.
Yoncheva, Y.N., Zevin, J.D., Maurer, U., McCandliss, B.D., 2009. Auditory selective attention to speech modulates activity in the visual word form area. Cereb. Cortex.
Yuval-Greenberg, S., Deouell, L.Y., 2007. What you see is not (always) what you hear: induced gamma band responses reflect cross-modal interactions in familiar object recognition. J. Neurosci. 27, 1090–1096.
Yuval-Greenberg, S., Deouell, L.Y., 2009. The dog's meow: asymmetrical interaction in cross-modal object recognition. Exp. Brain Res. 193, 603–614.