
NeuroImage 37 (2007) 1445–1456. www.elsevier.com/locate/ynimg

Audiovisual integration of emotional signals in voice and face: An event-related fMRI study

Benjamin Kreifelts a,⁎, Thomas Ethofer a,b, Wolfgang Grodd b, Michael Erb b and Dirk Wildgruber a,b

a Department of Psychiatry and Psychotherapy, University of Tuebingen, Osianderstrasse 24, 72076 Tuebingen, Germany
b Section of Experimental MR of the CNS, Department of Neuroradiology, University of Tuebingen, Tuebingen, Germany

Received 23 April 2007; revised 8 June 2007; accepted 25 June 2007. Available online 4 July 2007.

In a natural environment, non-verbal emotional communication is multimodal (e.g. speech melody, facial expression) and multifaceted concerning the variety of expressed emotions. Understanding these communicative signals and integrating them into a common percept is paramount to successful social behaviour. While many previous studies have focused on the neurobiology of emotional communication in the auditory or visual modality alone, far less is known about the multimodal integration of auditory and visual non-verbal emotional information. The present study investigated this process using event-related fMRI. Behavioural data revealed that audiovisual presentation of non-verbal emotional information resulted in a significant increase in correctly classified stimuli when compared with visual or auditory stimulation alone. This behavioural gain was paralleled by enhanced activation in bilateral posterior superior temporal gyrus (pSTG) and right thalamus when contrasting the audiovisual to the auditory and visual conditions. Furthermore, a characteristic of these brain regions, substantiating their role in the emotional integration process, is a linear relationship between the gain in classification accuracy and the strength of the BOLD response during the bimodal condition. Additionally, enhanced effective connectivity between audiovisual integration areas and associative auditory and visual cortices was observed during audiovisual stimulation, offering further insight into the neural process accomplishing multimodal integration. Finally, we were able to document an enhanced sensitivity of the putative integration sites to stimuli with emotional non-verbal content as compared to neutral stimuli.
© 2007 Elsevier Inc. All rights reserved.

Keywords: Audiovisual; Integration; Emotion; STS; Fusiform; Prosody; Facial expression; Effective connectivity

⁎ Corresponding author. Fax: +49 7071 294141. E-mail address: [email protected] (B. Kreifelts).
Available online on ScienceDirect (www.sciencedirect.com).
1053-8119/$ - see front matter © 2007 Elsevier Inc. All rights reserved. doi:10.1016/j.neuroimage.2007.06.020

Introduction

Taking part in social interactions requires us to integrate a variety of different inputs from several sense organs into a single percept of the situation we are dealing with. Inability to perceive and understand non-verbal emotional signals (e.g. speech melody, facial expression, gestures) within this process will often result in impaired communication. Over the past years, a plethora of neuroimaging studies have addressed the processing of faces (e.g. Haxby et al., 1994; Kanwisher et al., 1997; Sergent et al., 1992) and emotional facial expressions (e.g. Blair et al., 1999; Breiter et al., 1996; Morris et al., 1996; Phillips et al., 1997; reviewed in Posamentier and Abdi, 2003) as well as the processing of voices (e.g. Belin et al., 2000; Belizaire et al., 2007; Fecteau et al., 2004; Giraud et al., 2004; Kriegstein and Giraud, 2004) and emotional prosody (e.g. Buchanan et al., 2000; Ethofer et al., 2006b,c; George et al., 1996; Grandjean et al., 2005; Imaizumi et al., 1997; Kotz et al., 2003; Mitchell et al., 2003; Wildgruber et al., 2002, 2004, 2005; reviewed in Wildgruber et al., 2006), identifying neural networks subserving the perception of visual and auditory non-verbal emotional signals. Yet, to date only very few neuroimaging studies on audiovisual integration of non-verbal emotional communication are available (Dolan et al., 2001; Ethofer et al., 2006a,d; Pourtois et al., 2005).

Behavioural studies demonstrated that congruence between facial expression and prosody facilitates reactions to stimuli carrying emotional information (De Gelder and Vroomen, 2000; Dolan et al., 2001; Massaro and Egan, 1996). This parallels findings from audiovisual integration of non-emotional information, which indicate shortened response latencies and heightened perceptual sensitivity upon audiovisual stimulation (Miller, 1982; Schroger and Widmann, 1998). Moreover, affective signals received within one sensory channel can affect information processing in another. For instance, the perception of a facial expression can be altered by accompanying emotional prosody (Ethofer et al., 2006a; Massaro and Egan, 1996). Since these crossmodal biases occur mandatorily and irrespective of attention (De Gelder and Vroomen, 2000; Ethofer et al., 2006a; Vroomen et al., 2001), one might assume that the audiovisual integration of non-verbal affective information is an automatic process.



This assumption gains further support from the results of electrophysiological experiments providing evidence for multisensory crosstalk during an early perceptual stage, about 110–220 ms after stimulus presentation (De Gelder et al., 1999; Pourtois et al., 2000, 2002).

Neuroimaging data on audiovisual integration of non-verbal emotional information highlight stronger activation in the left middle temporal gyrus (MTG) (Pourtois et al., 2005) and left posterior superior temporal sulcus (pSTS) (Ethofer et al., 2006d) during audiovisual stimulation as compared to either unimodal stimulation. These findings of activations adjacent to the superior temporal sulcus (STS) correspond well with reports of enhanced responses in the pSTS during audiovisual presentation of animals (Beauchamp et al., 2004b), tools (Beauchamp et al., 2004a,b), speech (Calvert et al., 2000; van Atteveldt et al., 2004; Wright et al., 2003) and letters (van Atteveldt et al., 2004). Moreover, data from several electrophysiological and functional neuroimaging studies (Ghazanfar et al., 2005; Giard and Peronnet, 1999; von Kriegstein and Giraud, 2006) document early audiovisual integration processes within modality-specific cortices. To date, it remains controversial to what extent these audiovisual response modulations arise from feedback connections of the STS with the respective unimodal auditory and visual cortices, or from direct crosstalk between auditory and visual associative cortices.

So far, none of the neuroimaging studies on audiovisual integration of emotional prosody and facial expression has employed dynamic visual stimuli. Also, the stimulus material in these studies portrayed only two exemplary emotions (happy, fearful) and did not contain emotionally neutral stimuli. In the present study, we used functional magnetic resonance imaging (fMRI) to delineate the audiovisual integration sites for dynamic non-verbal emotional information. Participants were scanned while they performed a classification task on audiovisual (AV), auditory (A) or visual (V) presentations of people speaking single words expressing different emotional states in voice and face. Dynamic stimulation in combination with a broad variety of emotions was chosen in order to approximate real-life conditions of social communication. The classification task was applied to ascertain constant attention to the stimuli and to acquire a behavioural measure of the audiovisual integration effect. In a prestudy, the stimuli used in the fMRI experiment were tested for a relevant behavioural integration effect during the bimodal condition.

This experiment was designed to investigate audiovisual integration of non-verbal communication signals in the context of an explicit emotional categorization task, which embraces the class of stimuli with emotionally neutral prosody and facial expression as an "emotional" category. In other words, the main focus of the present study lies on audiovisual integration of non-verbal communication under the top-down influence of an emotional classification task, rather than on bottom-up (stimulus-driven) effects of emotional content on the audiovisual integration of non-verbal communication.
For the identification of brain regions contributing to multimodal integration of emotional signals, responses during bimodal stimulation were compared to both unimodal conditions. Areas characterized by stronger responses to audiovisual than to either unimodal stimulation were considered candidate regions for the integration process.

We expected that in such a region, associated with audiovisual integration, the gain in classification accuracy under audiovisual stimulation as compared to either unimodal stimulation might be paralleled by increased cerebral activation. Moreover, Macaluso and colleagues (2000) demonstrated that a perceptual gain during bimodal as compared to unimodal stimulation was paralleled by enhanced effective connectivity between associative sensory cortices and supramodal integration sites for vision and touch. Accordingly, we expected enhanced effective connectivity between putative audiovisual integration areas and voice-sensitive (Belin et al., 2000) as well as face-sensitive (Haxby et al., 1994; Kanwisher et al., 1997; Sergent et al., 1992) areas during the audiovisual condition as compared to either unimodal condition, as a possible correlate of the perceptual gain in congruent audiovisual integration. A final point of interest was whether the putative integration sites exhibit a different sensitivity to stimuli with emotional prosody/facial expression as compared to stimuli with neutral prosody/facial expression.

In summary, our fMRI study was designed to delineate brain areas specifically involved in the process of audiovisual integration of dynamic emotional signals from voice and face on the basis that they exhibit stronger responses to audiovisual than to either unimodal stimulation. Further expectations on the response pattern of regions subserving audiovisual integration of non-verbal emotional communication were: (a) an increase of cerebral responses in correlation with the gain in classification accuracy under audiovisual stimulation as compared to either unimodal stimulation; (b) enhanced effective connectivity with voice-sensitive and face-sensitive cortices during bimodal stimulation as compared to either unimodal stimulation. In order to gain additional information about a possible differential sensitivity of the integration sites to the emotionality of stimulus content, we compared the responses to stimuli with emotional non-verbal content with those to stimuli with neutral non-verbal content. Based on recent neuroimaging studies on audiovisual integration (Beauchamp et al., 2004a,b; Calvert et al., 2000; Ethofer et al., 2006d; van Atteveldt et al., 2004; Wright et al., 2003), we hypothesised that a region featuring the aforementioned characteristics might be located in the pSTS.

Materials and methods

Subjects

Thirty right-handed subjects (15 male, 15 female; mean age 23 years, S.D. 3 years) participated in the behavioural prestudy. Twenty-four right-handed volunteers (12 male, 12 female; mean age 26 years, S.D. 5 years) who did not take part in the behavioural experiment were included in the fMRI experiment. All participants were native speakers of German and had no history of neurological or psychiatric illness, substance abuse, or impaired vision or hearing. None of the participants was on medication. Handedness was assessed using the Edinburgh Inventory (Oldfield, 1971).


The study was performed according to the Code of Ethics of the World Medical Association (Declaration of Helsinki), and the protocol of human investigation was approved by the local ethics committee. All subjects gave their written informed consent prior to inclusion in the study.

Stimuli

The original stimulus material consisted of single words spoken in either a neutral intonation or one of six emotional intonations (alluring, angry, disgusted, fearful, happy, or sad) with a congruent emotional facial expression. The words were spoken by two female and two male professional actors and captured in two-second video sequences with a resolution of 720 × 576 pixels. Stimuli were presented under three experimental conditions: auditory (A), visual (V) and audiovisual (AV). Participants were instructed to judge the stimuli according to the expressed emotional category using only non-verbal cues (prosody, facial expression) for their decision while completely disregarding semantic information. To ensure an adequate proportion of correct classifications in the final stimulus set, a body of 630 stimuli (30 words × 7 emotional categories × 3 experimental conditions) was tested in a prestudy. This prestudy yielded 61% (A), 72% (V) and 83% (AV) correct classifications, with P < 0.001 for AV > A and AV > V, indicating a significant behavioural gain through crossmodal integration in the classification task. Additionally, participants were asked to rate the semantic valence of the stimulus words on a nine-point scale (1 = highly positive, 5 = neutral, 9 = highly negative) in a questionnaire. From the original body of stimuli, the eight words with the highest percentage of correct classifications in the AV condition were selected under the restriction that they were balanced for identity and gender of actor, semantic valence and number of syllables. Proportions of correct classifications for the final stimulus set in the prestudy were: 57% (A), 70% (V), 86% (AV).


Each actor spoke an equal number of words (4 actors × 2 words × 7 emotional categories), each word consisting of two syllables (mean duration: 733 ms, standard deviation: 368 ms). Half of the stimuli were neutral in word content while the other half was positively or negatively valenced in equal parts (mean valence scores ± S.D.: neutral: 4.9 ± 0.4, positive: 1.3 ± 0.7, negative: 8.0 ± 0.9). These 56 video sequences were presented under three different conditions (A, V, AV), totaling 168 stimuli per subject.

Experimental design

For the fMRI experiment, we employed an event-related design consisting of four sessions with 42 trials each. Stimuli in the auditory and audiovisual conditions were presented binaurally via magnetic-resonance-compatible headphones with piezoelectric signal transmission (Jancke et al., 2002). In the visual and audiovisual conditions, the video sequences were back-projected (NEC MT 1030+) onto a translucent screen (picture size ∼ 80 × 65 cm) in the scanner room placed approximately 2.5 m from the subject's head. The participants viewed the stimuli via a mirror system mounted on the head coil. The experimenter ascertained that every participant was able to see the whole screen. The order of stimulus presentation was balanced and pseudorandomized over sessions and randomized within sessions. Stimulus onset was jittered relative to the scan onset in steps of 0.5 s, resulting in inter-stimulus intervals ranging from 20 to 24 s (10–12 repetition times (TR)). The classification task was performed on a circular scale with seven categories which was shown for 6 s directly after stimulus offset (see Fig. 1). Participants conveyed their decision via a fiber-optic system which allowed them to move a white dot on the circular scale clockwise or counter-clockwise by pressing corresponding buttons with their right or left hands. To avoid lateralization effects caused by motor responses, or possible laterality effects in the perception of emotionally valenced information, the arrangement of categories on the scale was mirrored horizontally for half of the subjects, as was the association between the right/left buttons and the clockwise/counter-clockwise direction of dot movement.
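To illustrate the trial timing, here is a minimal sketch, not the authors' code: the 0.5 s jitter grid, the 20–24 s inter-stimulus range and the 42 trials per session are taken from the text, while the equal per-session condition counts and all function names are assumptions made for illustration.

```python
import random

TR = 2.0           # repetition time in seconds (from the Image acquisition section)
BASE_ISI = 20.0    # minimum inter-stimulus interval: 10 TRs
JITTER_STEPS = 9   # onsets jittered in 0.5 s steps: 0.0, 0.5, ..., 4.0 s -> ISIs of 20-24 s

def make_session_schedule(n_trials=42, seed=0):
    """Return a list of (condition, onset_seconds) pairs for one session.

    Assumes the three conditions are equally frequent within a session
    (14 trials each); the paper only states that presentation order was
    balanced and pseudorandomized over sessions and randomized within.
    """
    rng = random.Random(seed)
    conditions = ["A", "V", "AV"] * (n_trials // 3)
    rng.shuffle(conditions)                         # randomize order within the session
    schedule, onset = [], 0.0
    for condition in conditions:
        onset += BASE_ISI + 0.5 * rng.randrange(JITTER_STEPS)
        schedule.append((condition, onset))
    return schedule

print(make_session_schedule()[:3])  # first three trials of the toy schedule
```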

Fig. 1. Experimental design. Stimuli, balanced for experimental condition (A, V, AV) and emotional category, were presented across four imaging sessions comprising 485 EPI images each. Stimulus presentation was randomized within sessions. Inter-stimulus intervals ranged from 20 to 24 s. The classification task was performed on a circular scale with seven emotional categories ("EROTIK" = eroticism, "FREUDE" = happiness, "ANGST" = fear, "EKEL" = disgust, "AERGER" = anger, "TRAUER" = sadness, "NEUTRAL" = neutral). The scale was shown for 6 s, 250 ms after stimulus offset.



The starting point of the white dot was randomized for every trial. To accustom the participants to the use of the circular scale, a short training experiment was run outside the scanner.

Analysis of behavioural data

In order to estimate the subjects' performance in correctly classifying the different stimuli, we applied the unbiased hit rate (Hu), which was devised to include false alarms and biases in the use of response categories in the analysis of non-verbal behaviour (Wagner, 1993). This measure is defined as "the joint probability that a stimulus category is correctly identified given that it is presented at all and that a response is correctly used given that it is used at all". The unbiased hit rate is obtained by multiplying the raw hit rate by the positive predictive value. Thus, this behavioural measure captures not only how sensitively but also how specifically the categorization task is carried out, resulting in a more precise estimate of behavioural performance (a schematic computation is given in the code sketch at the end of the Image analysis subsection). Separate paired t-tests were used to compare Hu for the bimodal condition to either one of the unimodal conditions within every emotional category. In order to test for interactions between experimental condition and stimulus type at the behavioural level, the results of the categorization task were submitted to a two-way analysis of variance (ANOVA) for repeated measures with experimental condition (A, V, AV) and stimulus type (emotional vs. neutral) as within-subject factors. All resulting P values were corrected for heterogeneous correlations (Geisser and Greenhouse, 1958). Independent-samples t-tests for heterogeneous variances were applied to compare classification performance between the preparatory behavioural study and the functional imaging study in all three experimental conditions.

Image acquisition

Functional MR images covering the whole cerebrum (field of view [FOV] = 192 mm × 192 mm, 24 slices, 4 mm slice thickness and 1 mm gap) were acquired on a 1.5 T whole-body scanner (Siemens AVANTO; Siemens, Erlangen, Germany) using an echo-planar imaging (EPI) sequence (TR = 2 s, echo time [TE] = 40 ms, matrix = 64 × 64, flip angle = 90°). High-resolution T1-weighted images were obtained using a magnetization prepared rapid acquisition gradient echo (MPRAGE) sequence (FOV = 256 mm × 256 mm, 176 slices, 1 mm slice thickness, no gap, flip angle = 15°, TR = 1980 ms, TE = 3.93 ms, matrix = 256 × 256). To enable offline correction of distortions of the EPI images, a static field map (TR = 487 ms, TEs = 5.28 and 10.04 ms) was acquired in all subjects prior to the functional measurements.

Image analysis

Image analysis was carried out with SPM2 software (Wellcome Department of Imaging Neuroscience, London, UK; http://www.fil.ion.ucl.ac.uk/spm). The first five EPI images of each session were discarded to exclude measurements preceding T1 equilibrium. Preprocessing of the functional MR images included motion correction, unwarping by use of a static field map, slice-time correction to the middle slice (12th slice) and coregistration with the anatomical data. The transformation matrix for normalization to Montreal Neurological Institute (MNI) space (Collins et al., 1994) was calculated based on the structural T1-weighted 3-D data set of each subject and subsequently applied to the functional images. Before statistical analysis, the functional MR images were smoothed using a Gaussian filter with 10-mm full width at half maximum (FWHM). Separate regressors were defined for each trial using a stick function convolved with the hemodynamic response function. Events were time-locked to stimulus onset. To correct for low-frequency components, a high-pass filter with a cutoff frequency of 1/256 Hz was used. Serial autocorrelations of the fMRI data were accounted for by modeling the error term as an autoregressive process with a coefficient of 0.2 (Friston et al., 2002) and an additional white-noise component (Purdon and Weisskoff, 1998).

To identify brain regions showing stronger responses during the bimodal condition (AV) than during either unimodal condition, a conjunction analysis (AV > A) ∩ (AV > V) was used. Statistical parametric maps were thresholded according to a conjunction null hypothesis (Nichols et al., 2005; a minimal schematic of this minimum-statistic test follows Table 1 in the Results section). Statistical evaluation of group data was based on a second-level random effects analysis. Activations are reported at a height threshold of P < 0.001, uncorrected, and an extent threshold of k > 10 voxels. Only clusters exceeding an extent threshold of k > 70 voxels, corresponding to P < 0.05, corrected for multiple comparisons across the whole brain, were considered for further analysis.

To rule out the possibility that the observed integration effect was driven by stimuli from a single category or a minority of categories, we calculated the integration term (AV > A) ∩ (AV > V) for every emotional category. This was accomplished by comparing the parameter estimates from the respective peak activation voxels of the candidate regions to the parameter estimates from each of the two unimodal conditions within every emotional category using paired t-tests. In a similar fashion, effect sizes for emotional and neutral stimuli were compared using β-estimates from the most significantly activated voxel within the candidate regions. Data were subjected to a two-way ANOVA for repeated measures with experimental condition (A, V, AV) and stimulus type (emotional vs. neutral) as within-subject factors. Resulting P values were corrected for heterogeneous correlations (Geisser and Greenhouse, 1958). Separate post-hoc paired t-tests were employed for the comparison of parameter estimates for neutral and emotional stimuli within each experimental condition.

To test the behavioural and imaging data for a relationship concerning the audiovisual integration effect, the behavioural integration effect, defined as the difference between the hit rate (H) in the AV condition and the maximum hit rate of the A and V conditions (HAV − max(HA, HV)), was calculated for every AV stimulus. The simple hit rate (H, correctly classified stimuli / total number of stimuli) was used in this instance because it is impossible to calculate Hu for a single stimulus. The behavioural integration effect and the parameter estimates for the AV condition in the candidate regions were submitted to a simple regression analysis. The resulting regression slopes for single subjects were compared to zero in a one-sample t-test.
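To make the two behavioural measures concrete, the following is a minimal sketch, not the authors' code, of how the unbiased hit rate (Wagner, 1993) and the per-stimulus integration gain HAV − max(HA, HV) could be computed; the confusion-matrix and hit-rate values are invented for illustration.

```python
import numpy as np

def unbiased_hit_rate(confusion):
    """Hu per category (Wagner, 1993): raw hit rate x positive predictive value.

    confusion[i, j] = number of stimuli of category i classified as category j.
    (Categories never chosen as a response would need special handling.)
    """
    confusion = np.asarray(confusion, dtype=float)
    hits = np.diag(confusion)
    hit_rate = hits / confusion.sum(axis=1)  # row sums: presentations per category
    ppv = hits / confusion.sum(axis=0)       # column sums: uses of each response
    return hit_rate * ppv

def integration_gain(h_av, h_a, h_v):
    """Behavioural integration effect per stimulus: H_AV - max(H_A, H_V)."""
    return h_av - np.maximum(h_a, h_v)

# Toy 3-category confusion matrix and per-stimulus hit rates (invented values)
confusion = [[20, 3, 1],
             [4, 18, 2],
             [2, 2, 20]]
print(unbiased_hit_rate(confusion))
print(integration_gain(np.array([0.9, 0.8]),
                       np.array([0.5, 0.7]),
                       np.array([0.7, 0.6])))  # approximately [0.2, 0.1]
```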
Finally, six psychophysiological interaction analyses (PPI; Friston et al., 1997) were performed to locate brain regions showing enhanced connectivity with the putative audiovisual integration areas during audiovisual stimulation as compared to either unimodal condition. Two separate PPIs (AV > A and AV > V) were computed for each cluster significantly activated in the initial conjunction analysis (AV > A) ∩ (AV > V) comprising all stimuli regardless of emotional category. In these analyses, the time course of the BOLD response in the putative integration areas, based on a sphere with a radius of 3 mm around the individual peak-activation voxel as determined within a radius of 8 mm from the peak-activation voxel of the second-level analysis of the conjunction (AV > A) ∩ (AV > V), was defined as the physiological variable. The psychophysiological interaction was calculated as the product of the deconvolved activation time course (Gitelman et al., 2003) and the vector of the psychological variable (AV > A or AV > V); a schematic of this interaction-term construction is sketched after the behavioural results below. The deconvolution procedure enables the investigation of interactions at the neuronal level and is especially useful in experimental settings with low-frequency stimulation, such as event-related designs with long inter-stimulus intervals. The results of the PPI analyses were then submitted to a conjunction analysis (AV > A) ∩ (AV > V). Each PPI analysis was based on a general linear model with separate regressors for the psychological, physiological and psychophysiological variables. In parallel to the initial conjunction analysis, the statistical evaluation of group data was based on a second-level random effects analysis. Results are reported at a height threshold of P < 0.001 (uncorrected) and an extent threshold of k > 65 voxels, corresponding to P < 0.05, corrected for multiple comparisons across the whole brain. In a second step, regions of interest (ROI) were defined on the basis of studies describing the location of voice-sensitive (Belin et al., 2000; Fecteau et al., 2004) and face-sensitive (Haxby et al., 1994; Kanwisher et al., 1997; Sergent et al., 1992) areas within the human brain. These regions of interest, comprising the superior temporal and fusiform gyri, were determined with the automated anatomical labeling tool integrated in the SPM software (Tzourio-Mazoyer et al., 2002). Significance levels of the results from the PPI conjunction analysis were recalculated using a small volume correction (Worsley et al., 1996) based on the predefined ROIs. In the single case of the PPI analysis with the right thalamus as probe region, the statistical threshold for the ROI analysis within the right STG was lowered to P < 0.01 in order to test for enhanced effective connectivity with heightened sensitivity.

Results

Behavioural data

Mean unbiased hit rates (Hu) in the classification task (± standard error of the mean, S.E.M.) were 0.35 ± 0.02 (A), 0.58 ± 0.02 (V) and 0.76 ± 0.02 (AV), corresponding to 56% (A), 75% (V) and 86% (AV) correct classifications. A two-way ANOVA with experimental condition (A, V, AV) and stimulus type (emotional vs. neutral) as within-subject factors indicated significant differences regarding experimental condition (F(1.8, 43.2) = 94.0, P < 0.001) and stimulus type (F(1, 24) = 24.0, P < 0.001), while there was no significant interaction between these factors (F(1.7, 40.8) = 2.0, P > 0.05). Separate t-tests evidenced a significantly better classification performance in the bimodal condition than in either of the unimodal conditions: AV > A (T(23) ≥ 6.8, P < 0.001) and AV > V (T(23) ≥ 2.9, P ≤ 0.009) for each of the seven emotional categories. The comparison between Hu in the pilot study (A: 0.4 ± 0.03, V: 0.57 ± 0.04, AV: 0.78 ± 0.04) and in the functional imaging study, as assessed by independent-samples t-tests, showed no significant differences in any of the experimental conditions (T ≤ 1.7, df ≥ 41, P ≥ 0.05).
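As announced in the Materials and methods, the following is a minimal, illustrative sketch (not the authors' SPM implementation) of how a PPI interaction regressor is formed; the HRF parameters and toy data are invented, and the deconvolution step of Gitelman et al. (2003) is omitted for brevity.

```python
import numpy as np
from scipy.stats import gamma

def hrf(tr=2.0, duration=32.0):
    """Double-gamma HRF sampled at the TR (canonical-style; parameters illustrative)."""
    t = np.arange(0.0, duration, tr)
    h = gamma.pdf(t, 6.0) - gamma.pdf(t, 16.0) / 6.0
    return h / h.sum()

def ppi_regressor(neural_seed, psy):
    """Interaction regressor: elementwise product of the seed's neural-level
    time course and the psychological vector, convolved back to BOLD space.

    In the study the neural time course is recovered from the measured BOLD
    signal by hemodynamic deconvolution (Gitelman et al., 2003); that step is
    omitted here, so this only schematizes the interaction-term construction.
    psy: +1 on AV scans, -1 on A (or V) scans, 0 elsewhere.
    """
    interaction = neural_seed * psy
    return np.convolve(interaction, hrf(), mode="full")[: len(neural_seed)]

# Toy example: 100 scans at TR = 2 s (all values invented)
rng = np.random.default_rng(0)
seed_signal = rng.standard_normal(100)     # stand-in for the deconvolved seed signal
psy = np.zeros(100)
psy[10::20], psy[20::20] = 1.0, -1.0       # AV vs. A condition coding
print(ppi_regressor(seed_signal, psy).shape)  # (100,)
```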


fMRI analysis

The conjunction analysis (AV > A) ∩ (AV > V) revealed several brain areas, including bilateral posterior STG (pSTG), right thalamus, bilateral temporal pole, right hippocampus and right posterior cingulum (see Table 1, Figs. 2 and 3), with stronger activation during the bimodal condition (AV) than during either of the unimodal conditions (A, V). Activation in bilateral pSTG and right thalamus survived correction for multiple comparisons across the whole brain. A two-way ANOVA with experimental condition (A, V, AV) and stimulus type (emotional vs. neutral) as within-subject factors, calculated on the contrast estimates from the peak activation voxels of these clusters, indicated a significant main effect of experimental condition (all F ≥ 8.8, all P ≤ 0.001) and a significant main effect of stimulus type (left pSTG: F = 12.0, df = 1, P = 0.002; right pSTG: F = 31.5, df = 1, P < 0.001; right thalamus: F = 5.9, df = 1, P = 0.02) in all three candidate regions. There was no interaction between experimental condition and stimulus type in any of the candidate regions (F ≤ 0.1, P ≥ 0.83). Separate inspection of the parameter estimates for the single conditions and emotional categories revealed that this integration effect ((AV > A) ∩ (AV > V)) was significant for every emotional category in the left pSTG (see Fig. 2d, Table 2), while it was significant for six out of seven categories in the right pSTG (see Fig. 2c, Table 2). In the right thalamus, the integration effect was significant for three out of seven categories (Fig. 3b, Table 2). A post-hoc analysis of the main effect of stimulus type showed that activation to emotional stimuli was stronger than to neutral stimuli in every experimental condition for bilateral pSTG (left: T ≥ 2.1, P ≤ 0.03; right: T ≥ 3.1, P ≤ 0.003; see Figs. 2g and h), whereas the differences in the right thalamus were significant for auditory and audiovisual stimuli and exhibited a clear trend towards significance for the visual condition (T ≥ 1.6, P ≤ 0.06; Fig. 3c). Regression of the AV contrast estimates on the behavioural integration effect (HAV − max(HA, HV)) indicated a significant relation between the physiological and the behavioural measure within the left pSTG (P = 0.05) and right thalamus (P = 0.02) and a strong tendency towards significance within the right pSTG (P = 0.08; for details see Table 3, Figs. 2i, j and 3d).

The conjunction of the two PPI analyses, performed separately for each cluster significantly activated in the conjunction analysis (AV > A) ∩ (AV > V), revealed a network of brain regions exhibiting enhanced effective connectivity with both pSTG and the right thalamus during the process of audiovisual integration, including posterior and inferior parts of the occipital lobes extending into the inferior temporal cortex, superior temporal gyri, medial frontal gyri, precentral gyri and dorsolateral prefrontal cortex, intraparietal sulcus, and anterior insula extending into the inferior frontal gyrus (see Table 4).

Table 1
Increase of activation during audiovisual integration

Anatomical definition       MNI coordinates   Z score   Cluster size
Left pSTG                   −54 −51  18       4.78       96 ⁎
Right thalamus               12 −27   6       4.25      118 ⁎
Right pSTG                   51 −33   9       4.14      165 ⁎
Right temporal pole          54   6 −21       4.13       19
Left temporal pole          −48  15 −27       4.05       14
Right hippocampus            21 −12  −9       3.46       15
Right posterior cingulum      3 −48  27       3.35       12

Results of the conjunction analysis of activation during audiovisual stimulation as compared to auditory and visual stimulation: (AV > A) ∩ (AV > V); cluster size k > 10.
⁎ P < 0.05, corrected for multiple comparisons across the whole brain.
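To make the conjunction criterion behind Table 1 concrete, the following is a minimal sketch of the minimum-statistic test under the conjunction null (Nichols et al., 2005), assuming voxel-wise t-maps for the two contrasts; the t values and the threshold are invented for illustration, not the study's exact critical value.

```python
import numpy as np

def conjunction_mask(t_av_gt_a, t_av_gt_v, t_crit):
    """Minimum-statistic conjunction under the conjunction null (Nichols et
    al., 2005): a voxel survives only if BOTH contrasts individually exceed
    the critical t value, i.e. min(t1, t2) > t_crit.
    """
    return np.minimum(t_av_gt_a, t_av_gt_v) > t_crit

# Toy voxel-wise t-maps for (AV > A) and (AV > V); values and threshold invented
t_av_a = np.array([4.2, 1.0, 3.8])
t_av_v = np.array([3.9, 3.5, 0.5])
print(conjunction_mask(t_av_a, t_av_v, t_crit=3.5))  # [ True False False ]
```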



Fig. 2. Increased activation during audiovisual stimulation as compared to either auditory or visual stimulation ((AV > A) ∩ (AV > V)) within right (a) and left (b) pSTG (P < 0.001, uncorrected, cluster size k > 70, corresponding to P < 0.05 corrected). Inspection of contrast estimates for auditory (red), visual (green) and audiovisual (blue) stimulation reveals a significant (P ≤ 0.05) integration effect within bilateral pSTG for all emotional categories with the exception of happiness in the right pSTG (c + d). Asterisks mark significant differences. Event-related responses for auditory (red), visual (green) and audiovisual (blue) stimulation depict a stronger and slightly prolonged activation for bimodal stimulation in right (e) and left (f) pSTG. Error bars represent S.E.M. Both regions exhibit stronger responses to emotional than to neutral stimuli (P ≤ 0.05) under every experimental condition (A, V, AV) (g + h). The positive correlation between contrast estimates during the AV condition and behavioural gain, estimated as the difference between the classification hit rate during the bimodal condition and the maximum of the hit rates during the unimodal conditions, was significant (P < 0.05) over subjects in left pSTG (i) and showed a tendency towards significance (P = 0.08) in right pSTG (j). Data shown from a typical subject.

The ROI approach confirmed that effective connectivity between the pSTS and the ipsilateral fusiform gyrus, as well as between the pSTS and the STG, as predefined regions of interest, was significantly enhanced during audiovisual integration (see Table 5, Fig. 4). For the right thalamus, effective connectivity with the fusiform gyrus was enhanced in parallel to the findings for the integration sites in the pSTS, but enhanced effective connectivity with the right STG could only be confirmed using a more lenient threshold of P < 0.01 (see Tables 4 and 5).

Discussion

Behavioural integration

At the behavioural level, we found a perceptual gain in the emotional classification task when contrasting the bimodal condition to either of the unimodal conditions. This is in keeping with results from a series of experiments conducted by De Gelder and Vroomen (2000) which demonstrate that emotional facial expressions are more easily classified if accompanied by congruent emotional prosody. One of the most noticeable differences between the present study and the one performed by De Gelder and Vroomen is the multitude of emotional categories employed in the present study.



Yet, the facilitation effect in stimulus classification observed during the audiovisual condition was detectable for every single emotional category, indicating the comprehensive nature of this integration process.

In the fMRI study, which used the same paradigm as the behavioural prestudy, comparable hit rates indicated that the scanning procedure did not significantly impair the comprehension of the non-verbal emotional signals. This is important insofar as it is unlikely that the application of sparse temporal sampling (Hall et al., 1999) to our experimental paradigm would have improved stimulus comprehension decisively. On the contrary, sparse sampling noticeably decreases the temporal resolution of the BOLD response measurement, which could be detrimental in an event-related design.

Fig. 3. Increased activation during audiovisual stimulation as compared to either auditory or visual stimulation ((AV > A) ∩ (AV > V)) within the right thalamus (P < 0.001, uncorrected, cluster size k > 70, corresponding to P < 0.05 corrected) (a). Inspection of contrast estimates for auditory (red), visual (green) and audiovisual (blue) stimulation reveals a significant (P ≤ 0.05) integration effect within the right thalamus for three emotional categories and a non-significant tendency towards enhanced activation during the audiovisual condition in the other four categories (b). Asterisks mark significant differences. Stronger responses to emotional than to neutral stimuli (P ≤ 0.05) under auditory and audiovisual stimulation (A, AV) and a tendency towards significance during visual stimulation (P = 0.06) (c). A positive correlation between contrast estimates during the AV condition and behavioural gain, estimated as the difference between the classification hit rate during the bimodal condition and the maximum of the hit rates during the unimodal conditions, was significant (P < 0.05) over subjects; data shown from a typical subject (d). Event-related responses for auditory (red), visual (green) and audiovisual (blue) stimulation depicting a stronger and slightly prolonged activation for bimodal stimulation (e).

Table 2
Statistical analysis of hemodynamic responses during bimodal as compared to either unimodal stimulation within bilateral pSTG and right thalamus

                   Left pSTG [−54 −51 18]      Right pSTG [51 −33 9]       Right thalamus [12 −27 6]
                   AV > A        AV > V        AV > A        AV > V        AV > A        AV > V
Emotional category T     P       T     P       T     P       T     P       T     P       T     P
Anger              5.2  <0.001   2.1   0.03    4.0   0.001   4.7  <0.001   2.6   0.01    0.4   0.33
Disgust            1.7   0.05    2.6   0.01    2.0   0.03    6.4  <0.001   1.4   0.08    1.1   0.15
Eroticism          3.1   0.005   2.9   0.008   2.7   0.006   5.8  <0.001   2.0   0.03    2.3   0.02
Fear               1.8   0.04    3.1   0.005   4.7  <0.001   4.3  <0.001   2.9   0.004   1.5   0.07
Happiness          2.3   0.02    2.4   0.01    0.1   1       4.7  <0.001   1.6   0.06    3.0   0.004
Sadness            1.9   0.04    2.5   0.01    3.0   0.003   5.4  <0.001   3.1   0.003   4.1  <0.001
Neutral            3.4   0.002   3.6   0.002   2.7   0.006   7.2  <0.001   2.3   0.02    1.9   0.04

Paired t-tests (df = 23) comparing AV with A and AV with V within each emotional category at the peak voxel of each region (MNI coordinates in brackets). All P values one-tailed.



Table 3
Significant positive relationship between behavioural gain in classification accuracy (HAV − max(HA, HV)) and AV contrast estimates within candidate regions for audiovisual integration of non-verbal emotional information

Anatomical definition       MNI coordinates   Mean regression slope ± S.E.M.   T (df = 23)   P value
Left pSTG                   −54 −51  18        0.94 ± 0.54                      1.7          0.05
Right thalamus               12 −27   6        0.82 ± 0.40                      2.1          0.02
Right pSTG                   51 −33   9        0.53 ± 0.36                      1.5          0.08
Right temporal pole          54   6 −21       −0.18 ± 0.70                     −0.3          0.60
Left temporal pole          −48  15 −27        0.22 ± 0.58                      0.4          0.35
Right hippocampus            21 −12  −9        0.12 ± 0.61                      0.2          0.42
Right posterior cingulum      3 −48  27       −0.9 ± 0.53                      −1.7          0.95

All P values one-tailed.

fMRI correlates of audiovisual integration of non-verbal signals: conjunction analysis and connection with anatomical and electrophysiological studies

The behaviourally measured perceptual gain during the audiovisual condition was paralleled by enhanced BOLD responses in bilateral pSTG adjacent to the pSTS and in the right thalamus when comparing the bimodal condition to either unimodal condition. This finding of enhanced responses in areas bordering on the pSTS during audiovisual integration of non-verbal emotional information is in agreement with earlier reports by Pourtois et al. (2005) and Ethofer et al. (2006d), who found stronger activations to audiovisual presentation of non-verbal emotional information in the left MTG (Pourtois et al., 2005) and left pSTS (Ethofer et al., 2006d). In addition, our findings of enhanced activations adjacent to the pSTS during audiovisual presentation of non-verbal emotional signals parallel results from studies on audiovisual integration of speech (Calvert et al., 2000; van Atteveldt et al., 2004; Wright et al., 2003). Given that we used words as carriers for emotional speech melody, one might ask whether the activation found in bilateral pSTG might simply be the product of multimodal integration of spoken words. However, the results of the regression analysis clearly indicate a relationship between the behavioural integration of non-verbal emotional information and the hemodynamic response in these regions. In addition, one has to bear in mind that the audiovisual integration of letters (van Atteveldt et al., 2004), as well as of animals (Beauchamp et al., 2004b) and tools (Beauchamp et al., 2004a,b), is also associated with activations of the pSTS, suggesting that this region may play a fundamental role in the audiovisual integration of stimuli from many different classes. This notion is further supported by neuroanatomical studies demonstrating that the STS constitutes a convergence zone for projections from auditory and visual sensory cortices (Jones and Powell, 1970; Seltzer and Pandya, 1978) and conforms to the electrophysiological "integration rules" formulated by Stein and colleagues (Stein and Meredith, 1993), and thus is well suited to integrate auditory and visual signals. Another neuronal structure which meets these anatomical (Mufson and Mesulam, 1984) and electrophysiological (Komura et al., 2005) integration criteria is the thalamus. Prior to the present study, Ethofer and colleagues (2006d) found a stronger activation to audiovisual than to auditory or visual emotional stimuli in the right thalamus. The fact that the activation found by Ethofer and colleagues, unlike that in the present study, did not reach significance may be due to differences in stimulus material (static vs. dynamic visual stimulation) and statistical power (24 vs. 12 subjects).

Further investigation of the parameter estimates within the putative temporal and thalamic integration sites brought about three results which allow us to define the function of the respective regions during the process of audiovisual integration more exactly. In the following, we discuss these results in detail.

fMRI correlates of audiovisual integration of emotional signals: parametric analysis

The integration effect (AV > A) ∩ (AV > V) was significant for all seven emotional categories in the left pSTG and for six out of seven categories in the right pSTG. In the right thalamus, we found a constant tendency towards enhanced audiovisual activation, although it was non-significant in some cases. Beyond these obvious similarities in the pattern of cerebral activation and behavioural categorization performance across emotional categories, we found a linear relationship between the behavioural and imaging data which was significant in the left pSTG and right thalamus and close to significant in the right pSTG. This relationship is a clear argument for a participation of these three brain regions in the integration process, as it directly links the gain in emotional categorization performance following audiovisual stimulus presentation to the cerebral responses during this experimental condition. Yet, our measure of the behavioural integration effect has to be critically addressed: the dichotomous nature of the task response (correct vs. incorrect) makes it impossible to use this measure in a regression analysis without prior parameterization. This was achieved by stimulus-wise averaging of the integration effect across subjects.

Table 4
Increase of effective connectivity during audiovisual integration

Anatomical definition              MNI coordinates   Z score   Cluster size

Right posterior superior temporal sulcus [51 −33 9] (source area of PPI)
Right middle occipital gyrus        30 −84   6       4.63       723
Left middle occipital gyrus        −24 −90   6       4.57       780
Bilat. supplementary motor area      0   9  48       4.19       291
Left anterior insula               −30  27   3       3.95        71
Right precentral gyrus              48  −3  54       3.84        96
Right superior temporal gyrus       66 −12   3       3.8         86
Left superior parietal gyrus       −30 −60  54       3.65        83

Left posterior superior temporal sulcus [−54 −51 18] (source area of PPI)
Left superior temporal gyrus       −60 −39  18       4.89       278
Bilat. supplementary motor area      6  12  54       4.48       427
Left lingual gyrus                 −24 −75  −3       4.45      1774
  Right middle occipital gyrus      24 −87   6       4.33       (same cluster)
Right superior temporal gyrus       60 −33  15       3.86       236
Right inferior frontal gyrus        54  24  −3       3.83       113
Right precentral gyrus              39   0  42       3.79       112
Right inferior frontal gyrus        48  15  24       3.79        77
Left precentral gyrus              −30  −6  57       3.79        99
Left inferior parietal gyrus       −33 −51  54       3.65        79

Right thalamus [12 −27 6] (source area of PPI)
Left fusiform gyrus                −24 −78  −6       3.81       249
Right lingual gyrus                 15 −84  −9       3.59       300
Bilat. supplementary motor area      0  15  45       3.47        69

Results of the PPI conjunction analysis (AV > A) ∩ (AV > V) locating areas with stronger effective connectivity to the respective source region under bimodal stimulation than under either unimodal stimulation. Results are given for separate analyses of right and left pSTS as well as right thalamus, with P < 0.05, corrected for multiple comparisons across the whole brain.

Table 5
Increased effective connectivity during audiovisual integration

Anatomical definition            MNI coordinates   Z score   Cluster size

Right posterior superior temporal sulcus [51 −33 9] (source area of PPI)
Right superior temporal gyrus     66 −12   3       3.8         86
Right fusiform gyrus              36 −69  −9       3.77       114

Left posterior superior temporal sulcus [−54 −51 18] (source area of PPI)
Left superior temporal gyrus     −60 −39  18       4.89       165
Left fusiform gyrus              −24 −81  −6       4.43       100

Right thalamus [12 −27 6] (source area of PPI)
Right fusiform gyrus              24 −72  −6       3.58        66
Right superior temporal gyrus     57 −24   6       2.94        88 †

Results of the PPI conjunction analysis (AV > A) ∩ (AV > V) locating areas with stronger effective connectivity to bilateral pSTS and right thalamus under bimodal stimulation than under either unimodal stimulation, within predefined ipsilateral regions of interest, with P < 0.05, corrected.
† Cluster size given at P < 0.01, uncorrected, instead of P < 0.001.

A drawback of this step is that it dilutes the function of the parameter as a measure of the individual integration effect, since it may introduce variance into the data which cannot be explained in a single-subject analysis. Thus, the parameterization itself reduces the probability of detecting an existing correlation. A second factor which might impede the detection of a correlation is the definition of the integration effect as the minimum of the differences between audiovisual and auditory and between audiovisual and visual stimulation. On the one hand, this is the change in behaviour which cannot be explained otherwise, but on the other hand this conservative approach might underestimate the integration process. Consequently, these limitations of the analytical strategy could be a reason why the observed linear relationship between BOLD response and behavioural integration effect was non-significant in the right pSTG. Concerning future research on non-verbal emotional communication, the limitations evident in the present analytical strategy document the necessity of finding more elaborate approaches for the parametric assessment of integration effects, including reaction times in forced-choice paradigms with no more than two or three alternative responses, and the parametric modification of stimulus material.


Enhanced effective connectivity during audiovisual integration of non-verbal information

Following the observation by Macaluso and colleagues (2000) that a perceptual gain during visuohaptic stimulation was accompanied by enhanced effective connectivity between the respective associative sensory cortices and a supramodal region in the parietal lobe, we hypothesised that a similar enhancement of connectivity would occur between the associative auditory and visual cortices in the STG and fusiform gyrus and the audiovisual integration areas for non-verbal emotional information. Based on the results of the effective connectivity analysis, we were able to confirm this hypothesis. Moreover, we detected that bimodal stimulation enhanced effective connectivity between the audiovisual integration sites and several other regions, including supplementary and premotor areas, dorsolateral prefrontal cortex, intraparietal sulcus and insula. One possible explanation for the emergence of these areas in the context of the connectivity analysis is that they are part of a network of brain regions subserving response selection and response preparation (Cunnington et al., 2002; Elsinger et al., 2006; Rushworth et al., 2004). In that case, their enhanced effective connectivity with the audiovisual integration areas might represent a correlate of the behavioural gain observed when comparing the bimodal condition to either unimodal condition. However, since the design of the present study was clearly focused on perceptual and not on decisional processes, this interpretation remains speculative and offers a starting point for future research.

Sensitivity to emotional stimulus content within audiovisual integration sites

A further point which provides support for the notion that non-verbal emotional information is processed in bilateral pSTG and right thalamus is that these areas exhibit a significantly stronger reaction to stimuli with emotional non-verbal content than to stimuli with neutral non-verbal content in every experimental condition (A, V, AV), with the exception of visual stimuli in the right thalamus, which narrowly failed to reach significance. However, one has to be aware that using an experimental paradigm including a classification task can influence such a response pattern. Previous studies have shown that brain structures sensitive to emotional information exhibit a more pronounced response during implicit stimulus processing (Hariri et al., 2000, 2003; Lange et al., 2003). Thus, one might ask whether the integration areas delineated in this study are activated differently during explicit and implicit processing of signals with emotional non-verbal content, or in other words, whether an interaction between stimulus emotionality and type of task can be found in bilateral pSTG and right thalamus during audiovisual integration. This question remains to be addressed in future research, ideally within the framework of a factorial design.

Fig. 4. Enhanced effective connectivity between right pSTG and right middle STG (y = −15) (a) as well as right fusiform gyrus (y = −54) (b) during audiovisual stimulation as compared to either unimodal stimulation. (c) and (d) demonstrate a similar enhancement in connectivity between left pSTG and left middle STG (y = −18) (c) as well as left fusiform gyrus (y = −57) (d). Activations thresholded at P < 0.001 and cluster size k > 65, corresponding to P < 0.05, corrected. Red contours mark the regions of interest (STG and fusiform gyrus).



Linking the present study to current models of audiovisual integration

Comparing the three areas activated in the initial conjunction analysis on the basis of the results from the subsequent analyses, we find that only the left pSTG fully conforms to our a priori assumptions about the characteristics of a brain region which integrates emotional audiovisual information. The right thalamus, and even more so the right pSTG, show at least a tendency towards the postulated effect in all tests, while not reaching significance on all occasions. Again, this is in concordance with prior studies on audiovisual integration of non-verbal emotional signals (Ethofer et al., 2006d; Pourtois et al., 2005), which found the most consistent activations in the left temporal lobe.

Returning to the question of whether audiovisual integration derives from the interaction between unimodal auditory and visual cortices and higher-order supramodal cortices or from direct crosstalk between the auditory and visual associative cortices, our results chiefly support the former model. Yet, one has to bear in mind that convincing evidence also exists for the latter concept (Ghazanfar et al., 2005; Giard and Peronnet, 1999; von Kriegstein and Giraud, 2006). Thus, the coexistence of both pathways within a network comprising feedforward as well as feedback and lateral connections between unimodal and supramodal sensory cortices, as suggested recently by Foxe and Schroeder (2005), might offer an explanation for these conflicting findings.

Models of sensory integration and perceptual binding (John, 2002; Llinas and Ribary, 2001) implicate both cortex and thalamus in these processes on the basis of findings from single-cell, EEG and MEG studies. These models propose synchronized oscillations in thalamo-cortical feedback loops as the correlate of binding several sensory inputs into a common percept. Applying this theory to our results, parallel activations in thalamus and pSTG could be seen as equivalents of synchronously oscillating thalamo-cortical loops during the process of audiovisual integration. On the other hand, keeping in mind that the thalamus is an early node in sensory processing, our evidence for the involvement of this neural structure in the audiovisual integration process could be interpreted as a neuroimaging correlate of the early integration processes within auditory and visual cortices reported in electrophysiological studies (reviewed in Foxe and Schroeder, 2005), which cannot be explained via feedback from supramodal cortices. This interpretation gains even more support from the enhanced effective connectivity between the right thalamus and ipsilateral associative cortices during audiovisual integration, as shown in the present study.

Conclusion

Bilateral pSTG and right thalamus responded more strongly to audiovisual than to visual and auditory stimuli. Of these areas, the left pSTG conformed best to the further a priori assumed characteristics of an integrative brain region for non-verbal emotional information, namely a positive linear relationship of the BOLD response under audiovisual stimulation with the behaviourally documented perceptual gain during the bimodal condition, and enhanced effective connectivity with auditory as well as visual associative cortices, which possibly reflects the mechanism by which the bimodal percept is formed. Moreover, we were able to demonstrate a heightened sensitivity of the audiovisual integration sites to the emotionality of the non-verbal stimulus content, providing further evidence that emotional signals are processed within these regions.

Acknowledgments

This study was supported by the Deutsche Forschungsgemeinschaft (SFB 550/B10).

References

Beauchamp, M.S., Argall, B.D., Bodurka, J., Duyn, J.H., Martin, A., 2004a. Unraveling multisensory integration: patchy organization within human STS multisensory cortex. Nat. Neurosci. 7, 1190–1192.
Beauchamp, M.S., Lee, K.E., Argall, B.D., Martin, A., 2004b. Integration of auditory and visual information about objects in superior temporal sulcus. Neuron 41, 809–823.
Belin, P., Zatorre, R.J., Lafaille, P., Ahad, P., Pike, B., 2000. Voice-selective areas in human auditory cortex. Nature 403, 309–312.
Belizaire, G., Fillion-Bilodeau, S., Chartrand, J.P., Bertrand-Gauvin, C., Belin, P., 2007. Cerebral response to 'voiceness': a functional magnetic resonance imaging study. NeuroReport 18, 29–33.
Blair, R.J., Morris, J.S., Frith, C.D., Perrett, D.I., Dolan, R.J., 1999. Dissociable neural responses to facial expressions of sadness and anger. Brain 122 (Pt. 5), 883–893.
Breiter, H.C., Etcoff, N.L., Whalen, P.J., Kennedy, W.A., Rauch, S.L., Buckner, R.L., Strauss, M.M., Hyman, S.E., Rosen, B.R., 1996. Response and habituation of the human amygdala during visual processing of facial expression. Neuron 17, 875–887.
Buchanan, T.W., Lutz, K., Mirzazade, S., Specht, K., Shah, N.J., Zilles, K., Jancke, L., 2000. Recognition of emotional prosody and verbal components of spoken language: an fMRI study. Brain Res. Cogn. Brain Res. 9, 227–238.
Calvert, G.A., Campbell, R., Brammer, M.J., 2000. Evidence from functional magnetic resonance imaging of crossmodal binding in the human heteromodal cortex. Curr. Biol. 10, 649–657.
Collins, D.L., Neelin, P., Peters, T.M., Evans, A.C., 1994. Automatic 3D intersubject registration of MR volumetric data in standardized Talairach space. J. Comput. Assist. Tomogr. 18, 192–205.
Cunnington, R., Windischberger, C., Deecke, L., Moser, E., 2002. The preparation and execution of self-initiated and externally-triggered movement: a study of event-related fMRI. NeuroImage 15, 373–385.
De Gelder, B., Vroomen, J., 2000. The perception of emotions by ear and by eye. Cogn. Emot. 14, 289–311.
De Gelder, B., Vroomen, J., Pourtois, G., Weiskrantz, L., 1999. Nonconscious recognition of affect in the absence of striate cortex. NeuroReport 10, 3759–3763.
Dolan, R.J., Morris, J.S., de Gelder, B., 2001. Crossmodal binding of fear in voice and face. Proc. Natl. Acad. Sci. U. S. A. 98, 10006–10010.
Elsinger, C.L., Harrington, D.L., Rao, S.M., 2006. From preparation to online control: reappraisal of neural circuitry mediating internally generated and externally guided actions. NeuroImage 31, 1177–1187.
Ethofer, T., Anders, S., Erb, M., Droll, C., Royen, L., Saur, R., Reiterer, S., Grodd, W., Wildgruber, D., 2006a. Impact of voice on emotional judgment of faces: an event-related fMRI study. Hum. Brain Mapp. 27, 707–714.
Ethofer, T., Anders, S., Erb, M., Herbert, C., Wiethoff, S., Kissler, J., Grodd, W., Wildgruber, D., 2006b. Cerebral pathways in processing of affective prosody: a dynamic causal modeling study. NeuroImage 30, 580–587.
Ethofer, T., Anders, S., Wiethoff, S., Erb, M., Herbert, C., Saur, R., Grodd, W., Wildgruber, D., 2006c. Effects of prosodic emotional intensity on activation of associative auditory cortex. NeuroReport 17, 249–253.
Ethofer, T., Pourtois, G., Wildgruber, D., 2006d. Investigating audiovisual integration of emotional signals in the human brain. Prog. Brain Res. 156, 345–361.

Fecteau, S., Armony, J.L., Joanette, Y., Belin, P., 2004. Is voice processing species-specific in human auditory cortex? An fMRI study. NeuroImage 23, 840–848.
Foxe, J.J., Schroeder, C.E., 2005. The case for feedforward multisensory convergence during early cortical processing. NeuroReport 16, 419–423.
Friston, K.J., Buechel, C., Fink, G.R., Morris, J., Rolls, E., Dolan, R.J., 1997. Psychophysiological and modulatory interactions in neuroimaging. NeuroImage 6, 218–229.
Friston, K.J., Glaser, D.E., Henson, R.N., Kiebel, S., Phillips, C., Ashburner, J., 2002. Classical and Bayesian inference in neuroimaging: applications. NeuroImage 16, 484–512.
Geisser, S., Greenhouse, S.W., 1958. An extension of Box's results on the use of the F-distribution in multivariate analysis. Ann. Math. Stat. 29, 885–891.
George, M.S., Parekh, P.I., Rosinsky, N., Ketter, T.A., Kimbrell, T.A., Heilman, K.M., Herscovitch, P., Post, R.M., 1996. Understanding emotional prosody activates right hemisphere regions. Arch. Neurol. 53, 665–670.
Ghazanfar, A.A., Maier, J.X., Hoffman, K.L., Logothetis, N.K., 2005. Multisensory integration of dynamic faces and voices in rhesus monkey auditory cortex. J. Neurosci. 25, 5004–5012.
Giard, M.H., Peronnet, F., 1999. Auditory–visual integration during multimodal object recognition in humans: a behavioral and electrophysiological study. J. Cogn. Neurosci. 11, 473–490.
Giraud, A.L., Kell, C., Thierfelder, C., Sterzer, P., Russ, M.O., Preibisch, C., Kleinschmidt, A., 2004. Contributions of sensory input, auditory search and verbal comprehension to cortical activity during speech processing. Cereb. Cortex 14, 247–255.
Gitelman, D.R., Penny, W.D., Ashburner, J., Friston, K.J., 2003. Modeling regional and psychophysiologic interactions in fMRI: the importance of hemodynamic deconvolution. NeuroImage 19, 200–207.
Grandjean, D., Sander, D., Pourtois, G., Schwartz, S., Seghier, M.L., Scherer, K.R., Vuilleumier, P., 2005. The voices of wrath: brain responses to angry prosody in meaningless speech. Nat. Neurosci. 8, 145–146.
Hall, D.A., Haggard, M.P., Akeroyd, M.A., Palmer, A.R., Summerfield, A.Q., Elliott, M.R., Gurney, E.M., Bowtell, R.W., 1999. "Sparse" temporal sampling in auditory fMRI. Hum. Brain Mapp. 7, 213–223.
Hariri, A.R., Bookheimer, S.Y., Mazziotta, J.C., 2000. Modulating emotional responses: effects of a neocortical network on the limbic system. NeuroReport 11, 43–48.
Hariri, A.R., Mattay, V.S., Tessitore, A., Fera, F., Weinberger, D.R., 2003. Neocortical modulation of the amygdala response to fearful stimuli. Biol. Psychiatry 53, 494–501.
Haxby, J.V., Horwitz, B., Ungerleider, L.G., Maisog, J.M., Pietrini, P., Grady, C.L., 1994. The functional organization of human extrastriate cortex: a PET-rCBF study of selective attention to faces and locations. J. Neurosci. 14, 6336–6353.
Imaizumi, S., Mori, K., Kiritani, S., Kawashima, R., Sugiura, M., Fukuda, H., Itoh, K., Kato, T., Nakamura, A., Hatano, K., Kojima, S., Nakamura, K., 1997. Vocal identification of speaker and emotion activates different brain regions. NeuroReport 8, 2809–2812.
Jancke, L., Wustenberg, T., Scheich, H., Heinze, H.J., 2002. Phonetic perception and the temporal cortex. NeuroImage 15, 733–746.
John, E.R., 2002. The neurophysics of consciousness. Brain Res. Brain Res. Rev. 39, 1–28.
Jones, E.G., Powell, T.P., 1970. An anatomical study of converging sensory pathways within the cerebral cortex of the monkey. Brain 93, 793–820.
An anatomical study of converging sensory pathways within the cerebral cortex of the monkey. Brain 93, 793–820. Kanwisher, N., McDermott, J., Chun, M.M., 1997. The fusiform face area: a module in human extrastriate cortex specialized for face perception. J. Neurosci. 17, 4302–4311. Komura, Y., Tamura, R., Uwano, T., Nishijo, H., Ono, T., 2005. Auditory thalamus integrates visual inputs into behavioral gains. Nat. Neurosci. 8, 1203–1209. Kotz, S.A., Meyer, M., Alter, K., Besson, M., von Cramon, D.Y., Friederici,

1455

Kriegstein, K.V., Giraud, A.L., 2004. Distinct functional substrates along the right superior temporal sulcus for the processing of voices. NeuroImage 22, 948–955.
Lange, K., Williams, L.M., Young, A.W., Bullmore, E.T., Brammer, M.J., Williams, S.C., Gray, J.A., Phillips, M.L., 2003. Task instructions modulate neural responses to fearful facial expressions. Biol. Psychiatry 53, 226–232.
Llinas, R., Ribary, U., 2001. Consciousness and the brain. The thalamocortical dialogue in health and disease. Ann. N. Y. Acad. Sci. 929, 166–175.
Macaluso, E., Frith, C., Driver, J., 2000. Selective spatial attention in vision and touch: unimodal and multimodal mechanisms revealed by PET. J. Neurophysiol. 83, 3062–3075.
Massaro, D.W., Egan, P.B., 1996. Perceiving affect from the voice and the face. Psychon. Bull. Rev. 3, 215–221.
Miller, J., 1982. Divided attention: evidence for coactivation with redundant signals. Cogn. Psychol. 14, 247–279.
Mitchell, R.L., Elliott, R., Barry, M., Cruttenden, A., Woodruff, P.W., 2003. The neural response to emotional prosody, as revealed by functional magnetic resonance imaging. Neuropsychologia 41, 1410–1421.
Morris, J.S., Frith, C.D., Perrett, D.I., Rowland, D., Young, A.W., Calder, A.J., Dolan, R.J., 1996. A differential neural response in the human amygdala to fearful and happy facial expressions. Nature 383, 812–815.
Mufson, E.J., Mesulam, M.M., 1984. Thalamic connections of the insula in the rhesus monkey and comments on the paralimbic connectivity of the medial pulvinar nucleus. J. Comp. Neurol. 227, 109–120.
Nichols, T., Brett, M., Andersson, J., Wager, T., Poline, J.B., 2005. Valid conjunction inference with the minimum statistic. NeuroImage 25, 653–660.
Oldfield, R.C., 1971. The assessment and analysis of handedness: the Edinburgh inventory. Neuropsychologia 9, 97–113.
Phillips, M.L., Young, A.W., Senior, C., Brammer, M., Andrew, C., Calder, A.J., Bullmore, E.T., Perrett, D.I., Rowland, D., Williams, S.C., Gray, J.A., David, A.S., 1997. A specific neural substrate for perceiving facial expressions of disgust. Nature 389, 495–498.
Posamentier, M.T., Abdi, H., 2003. Processing faces and facial expressions. Neuropsychol. Rev. 13, 113–143.
Pourtois, G., de Gelder, B., Vroomen, J., Rossion, B., Crommelinck, M., 2000. The time-course of intermodal binding between seeing and hearing affective information. NeuroReport 11, 1329–1333.
Pourtois, G., Debatisse, D., Despland, P.A., de Gelder, B., 2002. Facial expressions modulate the time course of long latency auditory brain potentials. Brain Res. Cogn. Brain Res. 14, 99–105.
Pourtois, G., de Gelder, B., Bol, A., Crommelinck, M., 2005. Perception of facial expressions and voices and of their combination in the human brain. Cortex 41, 49–59.
Purdon, P.L., Weisskoff, R.M., 1998. Effect of temporal autocorrelation due to physiological noise and stimulus paradigm on voxel-level false-positive rates in fMRI. Hum. Brain Mapp. 6, 239–249.
Rushworth, M.F., Walton, M.E., Kennerley, S.W., Bannerman, D.M., 2004. Action sets and decisions in the medial prefrontal cortex. Trends Cogn. Sci. 8, 410–417.
Schroger, E., Widmann, A., 1998. Speeded responses to audiovisual signal changes result from bimodal integration. Psychophysiology 35, 755–759.
Seltzer, B., Pandya, D.N., 1978. Afferent cortical connections and architectonics of the superior temporal sulcus and surrounding cortex in the rhesus monkey. Brain Res. 149, 1–24.
Sergent, J., Ohta, S., MacDonald, B., 1992. Functional neuroanatomy of face and object processing. A positron emission tomography study. Brain 115 (Pt. 1), 15–36.
Stein, B.E., Meredith, M.A., 1993. The Merging of the Senses. MIT Press, Cambridge.
Tzourio-Mazoyer, N., Landeau, B., Papathanassiou, D., Crivello, F., Etard, O., Delcroix, N., Mazoyer, B., Joliot, M., 2002. Automated anatomical labeling of activations in SPM using a macroscopic anatomical parcellation of the MNI MRI single-subject brain. NeuroImage 15, 273–289.
van Atteveldt, N., Formisano, E., Goebel, R., Blomert, L., 2004. Integration of letters and speech sounds in the human brain. Neuron 43, 271–282.
von Kriegstein, K., Giraud, A.L., 2006. Implicit multisensory associations influence voice recognition. PLoS Biol. 4, e326.
Vroomen, J., Driver, J., de Gelder, B., 2001. Is cross-modal integration of emotional expressions independent of attentional resources? Cogn. Affect. Behav. Neurosci. 1, 382–387.
Wagner, H.L., 1993. On measuring performance in category judgment studies of nonverbal behavior. J. Nonverbal Behav. 17, 3–28.
Wildgruber, D., Pihan, H., Ackermann, H., Erb, M., Grodd, W., 2002. Dynamic brain activation during processing of emotional intonation: influence of acoustic parameters, emotional valence, and sex. NeuroImage 15, 856–869.
Wildgruber, D., Hertrich, I., Riecker, A., Erb, M., Anders, S., Grodd, W., Ackermann, H., 2004. Distinct frontal regions subserve evaluation of linguistic and emotional aspects of speech intonation. Cereb. Cortex 14, 1384–1389.
Wildgruber, D., Riecker, A., Hertrich, I., Erb, M., Grodd, W., Ethofer, T., Ackermann, H., 2005. Identification of emotional intonation evaluated by fMRI. NeuroImage 24, 1233–1241.
Wildgruber, D., Ackermann, H., Kreifelts, B., Ethofer, T., 2006. Cerebral processing of linguistic and emotional prosody: fMRI studies. Prog. Brain Res. 156, 249–268.
Worsley, K., Marrett, S., Neelin, P., Vandal, A.C., Friston, K.J., Evans, A., 1996. A unified statistical approach for determining significant signals in images of cerebral activation. Hum. Brain Mapp. 4, 74–90.
Wright, T.M., Pelphrey, K.A., Allison, T., McKeown, M.J., McCarthy, G., 2003. Polysensory interactions along lateral temporal regions evoked by audiovisual speech. Cereb. Cortex 13, 1034–1043.