Integration of cross-modal emotional information in the human brain: An fMRI study


Research report

Ji-Young Park a, Bon-Mi Gu b, Do-Hyung Kang c, Yong-Wook Shin c, Chi-Hoon Choi d, Jong-Min Lee e and Jun Soo Kwon a,b,c,*

a Interdisciplinary Program in Cognitive Science, Seoul National University, Seoul, Republic of Korea
b Interdisciplinary Program in Brain Science, Seoul National University, Seoul, Republic of Korea
c Department of Psychiatry, Seoul National University Hospital, Seoul, Republic of Korea
d Department of Radiology, National Medical Center, Seoul, Republic of Korea
e Department of Biomedical Engineering, Hanyang University School of Medicine, Seoul, Republic of Korea

Article info

Article history: Received 19 April 2008; Reviewed 27 May 2008; Revised 1 June 2008; Accepted 20 June 2008; Published online 29 June 2008. Action editor: Maurizio Corbetta.

Keywords: Bimodal integration; Emotion; Functional magnetic resonance imaging (fMRI); Interaction analysis

Abstract

The interaction of information derived from the voice and facial expression of a speaker contributes to the interpretation of the emotional state of the speaker and to the formation of inferences about information that may have been merely implied in the verbal communication. Therefore, we investigated the brain processes responsible for the integration of emotional information originating from different sources. Although several studies have reported possible sites for integration, further investigation using a neutral emotional condition is required to locate emotion-specific networks. Using functional magnetic resonance imaging (fMRI), we explored the brain regions involved in the integration of emotional information from different modalities in comparison to those involved in integrating emotionally neutral information. There was significant activation in the superior temporal gyrus (STG); inferior frontal gyrus (IFG); and parahippocampal gyrus, including the amygdala, under the bimodal versus the unimodal condition, irrespective of the emotional content. We confirmed the results of previous studies by finding that the bimodal emotional condition elicited strong activation in the left middle temporal gyrus (MTG), and we extended this finding to locate the effects of emotional factors by using a neutral condition in the experimental design. We found anger-specific activation in the posterior cingulate, fusiform gyrus, and cerebellum, whereas we found happiness-specific activation in the MTG, parahippocampal gyrus, hippocampus, claustrum, inferior parietal lobule, cuneus, middle frontal gyrus (MFG), IFG, and anterior cingulate. These emotion-specific activations suggest that each emotion uses a separate network to integrate bimodal information and shares a common network for cross-modal integration.
© 2008 Elsevier Srl. All rights reserved.

* Corresponding author. Department of Psychiatry, Seoul National University College of Medicine and Hospital, 28 Yongon-dong, Chongno-gu, Seoul 110-744, Republic of Korea. E-mail address: [email protected] (J.S. Kwon). 0010-9452/$ – see front matter © 2008 Elsevier Srl. All rights reserved. doi:10.1016/j.cortex.2008.06.008

1. Introduction

Humans receive information simultaneously from multiple sensory modalities (e.g., sight, hearing, touch, smell); the perceptual system extracts the information that is relevant for a given situation, and these data are processed into a unified representation. Thus, multi-sensory integration is essential for daily living. Indeed, it has been suggested that multi-sensory systems contribute to faster and more accurate appraisals than unimodal sensory systems (Bushara et al., 2003) and that they can supplement such systems (Calvert et al., 1999). Furthermore, specialized distributed networks within the brain are implicated in these multi-sensory integrative processes, and these networks depend on the nature and extent of the correspondences among various sensory inputs (Calvert et al., 2000).

Integrated perceptions of emotional information deriving from multi-sensory inputs serve a central function in social life. Successful interpersonal interactions depend on successful communication, which in turn rests on the use of multiple sources of information, both verbal and non-verbal, about relevant emotional factors. Thus, actual communication relies on non-verbal inputs such as prosody and facial expression. Data gathered from all sources are compared and contrasted; redundancy among perceptions may reduce ambiguity, whereas conflicts among perceptions may increase bias.

Evidence from behavioral and electrophysiological studies clearly shows that information from the visual system influences perceptions of auditory information, and vice versa, in regard to the perception of emotions (de Gelder and Vroomen, 2000). The increases in response latencies following bimodal stimuli provide evidence that subjects automatically and necessarily integrate information from both sources (Massaro and Egan, 1996; de Gelder and Vroomen, 2000). Consistent with research on the multimodal integration of other phenomena, emotional information derived from facial expression and vocal characteristics is not reducible to the simple sum of the unimodal inputs.

The first study to use functional magnetic resonance imaging (fMRI) to investigate these issues found that the amygdala modulates the cross-modal binding effects of explicit and congruent fearful face-voice pairs (Dolan et al., 2001). In a follow-up study using positron emission tomography (PET) with an implicit emotional processing task, activation under unimodal conditions was compared to that under bimodal pairs to identify the areas involved in the integration of positive (happy) versus negative (fear) emotional information (Pourtois et al., 2005). The authors reported that the multi-sensory integration of positive emotions is mediated by different neuroanatomical substrates from that of negative emotions and that affective information from faces and voices converges in heteromodal regions of the human brain during multi-sensory perceptions of emotion.

However, no direct comparison between the audiovisual perception of emotional stimuli and the audiovisual perception of non-emotional stimuli has been performed, because neutral emotional conditions have been lacking. Consequently, conclusions about the exact role of the middle temporal gyrus (MTG) as a convergence area must await further investigation. The use of neutral stimuli (both unimodal and bimodal) is essential to discriminate emotion-specific effects from non-emotional effects on cross-modal binding. Although the ability to detect anger in the context of interpersonal interactions serves not only to facilitate communication, but also to promote safety and social appropriateness (Grandjean et al., 2005), the audiovisual network in the brain that is responsible for processing anger-related stimuli remains incompletely investigated and understood. Although Kreifelts et al. (2007) included a neutral condition, they used an explicit emotion categorization task that is unusual in everyday life, so their results cannot indicate how emotions influence and interact with cognition.

Therefore, we used an fMRI methodology with a cross-modal implicit emotional processing paradigm (angry vs happy vs neutral) to investigate the brain regions activated by combinations of face and voice stimuli. We then compared these regions to those activated by either visual or auditory stimuli presented alone to clarify where information about emotions originating from different sensory modalities interacts and becomes integrated. By comparing data from the neutral and the emotional conditions, we also investigated whether these convergence regions are emotion-specific or more general.

2. Materials and methods

2.1. Subjects

Eleven healthy adults (seven males, four females; age range: 19–33 years; mean age: 23.3 years; SD: 4.2 years) participated after providing written informed consent. All subjects were right-handed and had normal or corrected-to-normal vision. None had hearing problems or a history of neurological or psychiatric problems. Subjects were paid for their participation. The study was approved by the Institutional Review Board of Seoul National University Hospital, Korea.

2.2. Stimuli

We presented three emotional states (happy, neutral, and angry) in three ways (visual, auditory, and audiovisual; Fig. 1). The three emotional states were presented in photos showing the facial expressions of three men and three women. All of the face stimuli were hair-masked and gray-scaled using Adobe Photoshop 7.0 (Adobe, San Jose, CA, USA). Vocal presentations of the three emotional states were obtained by recording the voices of three men and three women while speaking semantically neutral sentences (e.g., ‘‘There’s a book on the desk’’) in three different emotional tones. All voices were recorded and normalized using Sound Forge 5.0 (Sony, Tokyo, Japan). In the pilot study, subjects initially named the emotion associated with a face or a voice to categorize these stimuli for the full study. We selected 20 sentences and six faces for each emotion.
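The paper states only that the voice recordings were normalized in Sound Forge 5.0, without specifying the criterion. Purely as an illustration of one common approach, the minimal Python sketch below applies RMS-based level normalization; the file names and the soundfile dependency are assumptions, not part of the original study.

```python
import numpy as np
import soundfile as sf   # assumed WAV I/O library; any equivalent would do

def normalize_rms(in_path, out_path, target_rms=0.1):
    """Scale a recording so that its root-mean-square amplitude equals target_rms."""
    data, rate = sf.read(in_path)
    rms = np.sqrt(np.mean(np.square(data)))
    sf.write(out_path, data * (target_rms / rms), rate)

# Hypothetical usage for one of the recorded sentences:
# normalize_rms("angry_speaker1_sentence01.wav", "angry_speaker1_sentence01_norm.wav")
```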


Fig. 1 – Experimental design consisting of three emotions (happy, neutral and angry) and three conditions (face-only, voice-only, and face and voice), with a stimulus duration of 1300 msec for each trial.

Each stimulus was presented for 1300 msec, followed by the presentation of "?" as a prompt for 500 msec. A blank screen then appeared for 200 msec. This three-step sequence constituted a trial; each block was composed of 10 trials and thus lasted for 20 sec. One run consisted of 10 blocks, and each block was followed by a 12-sec rest period. The blocks were presented in a pseudorandomized order under the restriction that the same emotion was not presented consecutively. During fMRI scanning, participants were asked to identify the gender of each face or voice, regardless of any emotional content. After scanning, the subjects performed the emotional discrimination task to confirm that they had correctly perceived the emotional valence of each stimulus.
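For illustration, the block structure described above can be expressed as a short script. The following Python sketch (hypothetical function names; not the authors' presentation code) draws a block order in which the same emotion never repeats consecutively and derives the block onsets implied by the stated timing.

```python
import random

# Timing constants from the paradigm (msec).
STIM_MS, PROMPT_MS, BLANK_MS = 1300, 500, 200
TRIAL_MS = STIM_MS + PROMPT_MS + BLANK_MS      # 2000 msec per trial
TRIALS_PER_BLOCK = 10                          # 10 trials -> 20 sec per block
REST_MS = 12000                                # rest period after each block

# Nine block types: three emotions x three presentation modes.
CONDITIONS = [(emo, mod)
              for emo in ("happy", "neutral", "angry")
              for mod in ("face", "voice", "face+voice")]

def pseudorandom_blocks(n_blocks=10, seed=0):
    """Draw a block order in which the same emotion never appears twice in a row."""
    rng = random.Random(seed)
    while True:
        order = [rng.choice(CONDITIONS) for _ in range(n_blocks)]
        if all(a[0] != b[0] for a, b in zip(order, order[1:])):
            return order

def block_onsets(order):
    """Return (onset_sec, duration_sec, label) per block, including the rest gaps."""
    onsets, t = [], 0.0
    block_dur = TRIALS_PER_BLOCK * TRIAL_MS / 1000.0   # 20 sec
    for emo, mod in order:
        onsets.append((t, block_dur, f"{emo}_{mod}"))
        t += block_dur + REST_MS / 1000.0
    return onsets

if __name__ == "__main__":
    for onset, dur, label in block_onsets(pseudorandom_blocks()):
        print(f"{onset:6.1f}s  {dur:4.1f}s  {label}")
```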

2.3. Image acquisition

Functional and structural images were acquired using a 1.5 T whole-body scanner (AVANTO; Siemens, Erlangen, Germany). Functional images were acquired with a multi-slice echo planar imaging (EPI) sequence covering the whole cerebrum [24 axial slices acquired in an interleaved order, 5-mm slice thickness, repetition time (TR) = 2 sec, echo time (TE) = 41 msec, flip angle (FA) = 90°, field of view (FOV) = 230 mm, 64 × 64 matrix]. High-resolution anatomical images were acquired in 176 contiguous axial slices for purposes of anatomical localization and co-registration.

2.4. Data analysis

Functional imaging analysis was performed using SPM2 software (Wellcome Department of Imaging Neuroscience, London, UK; www.fil.ion.ucl.ac.uk/spm). The first three images in each run were discarded to eliminate the non-equilibrium effects of magnetization. For each subject, a set of 450 fMRI scans was realigned to correct for interscan movement and was stereotactically normalized, using sinc interpolation, into the standard space defined by the Montreal Neurological Institute (MNI) template. The scans were then smoothed using a Gaussian kernel of 10 mm full-width at half-maximum to account for residual inter-subject differences. Low-frequency signal drifts were removed using a 128-sec high-pass filter, and temporal autocorrelation in the fMRI time series was corrected using a first-order autoregressive model. The data were then analyzed using a canonical hemodynamic response function in SPM2.

Three interaction analyses [Aa - (A + a), Nn - (N + n), Hh - (H + h)] were defined at the first level of analysis. Individual subject maps were created using one-sample t-tests; the three resultant contrast images were then entered into second-level (random effects) analyses. We carried out conjunction analyses using the results of the three interaction analyses, namely: (1) [Aa - (A + a)] AND [Nn - (N + n)] AND [Hh - (H + h)] to determine the convergence area for cross-modal processing, and (2) [Aa - (A + a)] AND [Hh - (H + h)] to determine the convergence area for emotional information. The threshold of the resulting statistical parametric map for conjunction analysis was set at p < .05 false discovery rate (FDR) with at least 10 contiguous voxels. Because interaction analysis can produce false positives due to deactivations in the unimodal conditions, we report only those activation clusters for which (A - rest) and (V - rest) were significantly greater than zero. Finally, we compared the results of the angry and happy interaction analyses with those of the neutral condition using the following contrasts: (1) [Aa - (A + a)] - [Nn - (N + n)] and (2) [Hh - (H + h)] - [Nn - (N + n)]. A statistical threshold of p < .005 (uncorrected) and an extent threshold of k > 30 voxels were set to identify regions that showed significant activation in response to each emotional condition as compared to the neutral condition.
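To make the contrast definitions concrete, the following Python sketch builds the interaction and emotion-versus-neutral contrast vectors over a hypothetical design-matrix column order; it mirrors the formulas above but is not the SPM2 batch actually used in the study.

```python
import numpy as np

# Hypothetical column order of the first-level design matrix
# (one regressor per block type; nuisance regressors omitted).
COLUMNS = ["A", "a", "Aa", "N", "n", "Nn", "H", "h", "Hh"]

def contrast(weights):
    """Build a contrast vector from a {regressor: weight} mapping over COLUMNS."""
    c = np.zeros(len(COLUMNS))
    for name, w in weights.items():
        c[COLUMNS.index(name)] = w
    return c

# Interaction contrasts: bimodal response minus the sum of the unimodal responses.
angry_interaction   = contrast({"Aa": 1, "A": -1, "a": -1})   # Aa - (A + a)
neutral_interaction = contrast({"Nn": 1, "N": -1, "n": -1})   # Nn - (N + n)
happy_interaction   = contrast({"Hh": 1, "H": -1, "h": -1})   # Hh - (H + h)

# Emotion-specific comparisons against the neutral interaction.
anger_vs_neutral = angry_interaction - neutral_interaction    # [Aa - (A + a)] - [Nn - (N + n)]
happy_vs_neutral = happy_interaction - neutral_interaction    # [Hh - (H + h)] - [Nn - (N + n)]

print(anger_vs_neutral)   # e.g. [-1. -1.  1.  1.  1. -1.  0.  0.  0.]
```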


3. Results

3.1. Behavioral data

During scanning, subjects were instructed to identify the gender of the presented face or voice; the mean accuracy exceeded 97% under all conditions (97.6%, 98.0%, and 98.3% for the face, voice, and face-voice conditions, respectively). Reaction times (RTs) were slower for voices (mean RT: 1249 msec) than for faces (mean RT: 746 msec) and face-voice pairs (mean RT: 1044 msec). The difference in RT among the three modalities was significant [F(2,261) = 655.491; p < .001]. Two-way analysis of variance (ANOVA; emotion × modality) indicated a significant main effect of emotion [F(2,261) = 11.286; p < .001]. Post-hoc analysis indicated that the mean difference in RT between the angry condition and the neutral or happy condition was significant at p < .001, but there was no significant difference between the neutral and happy conditions (t = .23; p = .82). After scanning, half of the subjects were instructed to identify the emotions; accuracy was 96% for voices (95.8% for angry, 92.2% for neutral, and 98.4% for happy) and 97% for faces (95.8% for angry, 98.4% for neutral, and 96.8% for happy).
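As a worked example of the reported statistics, a two-way emotion × modality ANOVA on trial-wise RTs could be run as sketched below; the data file and column names are hypothetical, and the paper does not state which software performed the behavioral analysis.

```python
import pandas as pd
import statsmodels.api as sm
from statsmodels.formula.api import ols

# Hypothetical trial-wise data: one row per trial with columns
#   rt (msec), emotion in {angry, neutral, happy}, modality in {face, voice, face+voice}.
df = pd.read_csv("behavioral_rt.csv")

# Two-way ANOVA with interaction: RT ~ emotion * modality.
model = ols("rt ~ C(emotion) * C(modality)", data=df).fit()
print(sm.stats.anova_lm(model, typ=2))
```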

3.2. fMRI data

We first performed three interaction analyses to locate multi-sensory integration sites for each emotion by studying activation patterns; we defined these as positive interactions (bimodal > face + voice; Calvert and Thesen, 2004). The interaction analyses investigated the differences between the hemodynamic responses obtained during bimodal stimulation and the sum of the responses obtained during the unimodal conditions. We then conducted a conjunction analysis (Nichols et al., 2005) using the results of the three interaction analyses to determine which brain regions are associated with the perception of audiovisual information regardless of the emotional content, i.e., [Aa - (A + a)] AND [Nn - (N + n)] AND [Hh - (H + h)]. However, Calvert and Thesen (2004) and Ethofer et al. (2006b) point out that this analysis allows for false positives due to deactivations in the unimodal conditions. To rule out this possibility, we report only those activation clusters that measured greater than zero in both unimodal conditions, (A - rest) and (V - rest) (Fig. 2). Viewing faces and listening to voices engaged the superior temporal gyrus (STG); inferior frontal gyrus (IFG); and the parahippocampal gyrus, including the amygdala (Table 1). These regions were engaged in the binding of physical information derived from pairs of faces and voices.

Following the focus of previous studies (Pourtois et al., 2005; Kreifelts et al., 2007), we used conjunction analysis to investigate the convergence area for the binding of emotional information from different sources. We confirmed previous results by finding strong activation in the left MTG during the presentation of bimodal emotional information (Table 2). We then compared the results of the interaction analyses for each emotional condition with those for the neutral condition to investigate the brain regions subserving the integration of audiovisual emotional information.

Fig. 2 – Brain areas showing significantly strong activation in bimodal conditions compared to unimodal conditions and the pattern of the percent signal change in each cluster across modality.


Table 1 – The result of conjunction analysis: [Aa - (A + a)] AND [Nn - (N + n)] AND [Hh - (H + h)].

Region                            x    y    z    Z score   Cluster size
STG                               40   18   22   3.55      50
STG                               48   14   34   3.88      50
IFG                               50   40   6    3.71      25
Parahippocampal gyrus/amygdala    28   6    18   3.61      10

H: happy face; h: happy voice; Hh: happy face + voice; A: angry face; a: angry voice; Aa: angry face + voice; N: neutral face; n: neutral voice; Nn: neutral face + voice. The threshold was set to p < .05 FDR and cluster size k > 10.

The contrasts [Hh - (H + h)] - [Nn - (N + n)] and [Aa - (A + a)] - [Nn - (N + n)] revealed stronger responses to bimodal presentations of happiness and anger than to the bimodal neutral condition. There was greater activation in the posterior cingulate, fusiform gyrus, and cerebellum for anger than for neutral (Table 3, Fig. 3). In contrast, the MTG, parahippocampal gyrus, hippocampus, claustrum, inferior parietal lobule, cuneus, MFG, IFG, and anterior cingulate showed greater activation for happiness than for neutral (Table 4, Fig. 4).

4. Discussion

4.1. Behavioral results

The behavioral results were consistent with those of previous studies (de Gelder et al., 2004; Pourtois et al., 2005). The RT was slower under the bimodal condition than under the face condition, but faster under the bimodal condition than under the voice condition. Furthermore, the RT was faster under the face condition than under the voice condition. Thus, face and voice information interacts under bimodal conditions. Visual information speeds up the processing of auditory signals, as is the case in audiovisual speech processing (van Wassenhove et al., 2005).

The average RT for negative emotion was longer than that for neutral or positive emotion. This result is consistent with that of Simpson et al. (2000), who also reported that the RT for negative pictures was slower than that for neutral pictures. Our paradigm also confirmed that a negative voice is associated with a slower RT during a cognitive task, given the identical linguistic content across the three conditions.

Table 2 – Brain areas showing significantly strong activation in the emotional condition and the comparison with the neutral condition: [Aa - (A + a)] AND [Hh - (H + h)].

Region   x    y    z    Z score   Cluster size
MTG      48   65   25   4.97      161

H: happy face; h: happy voice; Hh: happy face + voice; A: angry face; a: angry voice; Aa: angry face + voice; N: neutral face; n: neutral voice; Nn: neutral face + voice. The threshold for conjunction analysis was set to p < .05 FDR and cluster size k > 10.

Table 3 – Brain areas showing significantly strong activation in bimodal angry condition compared to neutral condition.

Region                BA   x    y    z    Z score   Cluster size
Posterior cingulate   30   14   52   10   3.03      32
Fusiform gyrus        37   30   36   14   3.05      35
Fusiform gyrus        19   36   64   4    3.22      43
Cerebellum                 14   52   18   3.21      43

This is the result of the contrast [Aa - (A + a)] - [Nn - (N + n)]. The threshold was set to p < .005 uncorrected and cluster size k > 30. BA = Brodmann area.

These results suggest that emotional information influences cognitive processes.

4.2. fMRI results

We examined whether the activation of heteromodal multi-sensory areas during multi-sensory perception of emotion is specific to the integration of emotional information and whether emotions share a common network or engage discrete emotion-specific subregions in the integration of information from different sources. The activation of multi-sensory integration sites differs from the arithmetic sum of the activations in response to unimodal stimuli. That is, when the response to a bimodal stimulus exceeds the sum of the unimodal responses [bimodal > unimodal1 + unimodal2], it is defined as a positive interaction (Calvert and Thesen, 2004; Ethofer et al., 2006a; Wildgruber et al., 2006). Interaction analyses indicated the areas used for each emotion in the integration of bimodal information.

All three bimodal conditions evoked significantly strong activation in the STG, IFG, and parahippocampal gyrus, including the amygdala (Table 1). This suggests that bimodal conditions share a network to form a unified representation, regardless of the particular emotions involved in the stimulus. This cross-modal perceptual binding network (Roskies, 1999) links the emotion attached to a voice with the associated visual percept, as well as with a percept of the speaker. Thus, both are automatically perceived as aspects of a single event.

Conjunction analysis indicated the area for the integration of bimodal information derived from faces and voices irrespective of emotion. In prior studies, the minimum statistic has been compared to the global null (MS/GN) to test for an AND conjunction. However, the problem with the MS/GN method is that it does not test for an AND conjunction, but instead tests for an overall effect (Nichols et al., 2005). Thus, to test for a logical AND, we compared the minimum statistic to the conjunction null (MS/CN); this method detects the voxels that show significantly strong activation under bimodal conditions as compared to their activation under each unimodal condition. Furthermore, we ruled out the possibility of detecting voxels that show deactivation in the unimodal conditions by using the pattern of the percent signal change in each region of interest (ROI) for confirmation (Fig. 2). According to our results, the presentation of a face-voice pair elicits activation of the STG, IFG, and parahippocampal gyrus, including the amygdala.
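The MS/CN logic amounts to thresholding the voxel-wise minimum of the contrast statistics: a voxel survives only if every contrast exceeds the threshold on its own. A minimal numpy sketch of that idea (not SPM2's implementation; the map variables and the threshold value are placeholders) is given below.

```python
import numpy as np

def min_stat_conjunction(stat_maps, threshold):
    """
    Minimum-statistic conjunction against the conjunction null (MS/CN):
    a voxel is declared active only if every input statistic map exceeds
    the (corrected) threshold, i.e. the voxel-wise minimum exceeds it.
    """
    stacked = np.stack(stat_maps)            # shape: (n_contrasts, n_voxels)
    min_map = stacked.min(axis=0)            # minimum statistic per voxel
    return min_map, min_map > threshold      # conjunction map and binary mask

# Placeholder usage with three flattened interaction Z-maps:
# z_angry, z_neutral, z_happy = ...          # loaded first-level contrast images
# min_map, mask = min_stat_conjunction([z_angry, z_neutral, z_happy], threshold=2.3)
```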


Fig. 3 – Brain areas showing significantly strong activation in bimodal angry condition compared to neutral condition.

Animal studies have shown that the superior temporal sulcus (STS) is a plausible site for multi-sensory integration because its neurons exhibit supra-additive responses (i.e., greater than the sum of the unimodal responses; Barraclough et al., 2005). Moreover, evidence from human studies suggests that the STS plays a key role in the integration of audiovisual speech (Calvert et al., 2000). Ethofer et al. (2006b) and Kreifelts et al. (2007) reported that the left posterior STG responds more to audiovisual than to unimodal stimuli and shows enhanced connectivity with auditory and visual associative cortices. Most previous studies reported multi-sensory effects in more posterior regions of the STS than we did. This difference in activation pattern may be attributable to the fact that we included a neutral condition and asked our subjects to discriminate the gender of the stimuli. The anterior STG is reported to be generally involved in multimodal representations of individuals (Gainotti et al., 2003; Gorno-Tempini et al., 1998; Snowden et al., 2004; Tsukiura et al., 2006; Von Kriegstein and Giraud, 2006).

Table 4 – Brain areas showing significantly strong activation in bimodal happy condition compared to neutral condition: MTG (BA 19), parahippocampal gyrus, hippocampus, insular/claustrum, inferior parietal lobule (BA 40), postcentral gyrus (BA 3), cuneus (BA 19), MFG (BA 6), IFG (BA 47), and anterior cingulate (BA 24); Z scores ranged from 2.9 to 4.68 and cluster sizes from 34 to 2180 voxels (individual peak coordinates not reproduced here). This is the result of the contrast [Hh - (H + h)] - [Nn - (N + n)]. The threshold was set to p < .005 uncorrected and cluster size k > 30. BA = Brodmann area.

Neurons in the IFG and dorsolateral prefrontal cortex (DLPFC) are also known to integrate sensory stimuli from different modalities, such as sight and sound, as well as across time. Drawing on single-cell recordings from monkeys, Fuster et al. (2000) reported that this area includes audiovisual cross-modal association neurons, supported by connections from the sensory cortices (Romanski et al., 1999). Amygdalectomized monkeys show deficits in a cross-modal version (tactual-to-visual) of an object memory task, even though they perform well on the intramodal versions (e.g., visual-to-visual and tactual-to-tactual; Murray and Mishkin, 1985).

Conjunction analysis of the bimodal emotional conditions demonstrated strong activation in the left MTG, which confirms the finding of Pourtois et al. (2005) that the left MTG is a convergence area for emotional information emanating from different sources, as shown in Table 2. However, the identification of brain regions that contribute to the integration of emotional information requires comparisons with neutral conditions to determine the effects of emotional stimuli on the integration of bimodal stimuli. By using a bimodal neutral condition, emotion-specific activation patterns can be distinguished from those evoked by neutral conditions. The contrast [Aa - (A + a)] - [Nn - (N + n)] revealed activation in the posterior cingulate, fusiform gyrus, and cerebellum (Table 3), whereas [Hh - (H + h)] - [Nn - (N + n)] showed activation in the MTG, parahippocampal gyrus, hippocampus, claustrum, inferior parietal lobule, cuneus, MFG, IFG, and anterior cingulate (Table 4).

The activation pattern in the happy condition is consistent with previous studies (Pourtois et al., 2005; Ethofer et al., 2006b; Johnstone et al., 2006) that reported significantly strong activation in the left MTG region related to happy voices only when they were paired with happy faces. Several previous studies have suggested that the MTG is involved in multi-sensory integration and is a possible convergence region (Streicher and Ettlinger, 1987; Pourtois et al., 2005). A previous PET study reported supra-additive activation of the MTG in both fearful and happy conditions (Pourtois et al., 2005). Other studies also reported that the left posterior STS (Ethofer et al., 2006b) and bilateral STG (Kreifelts et al., 2007) show stronger activation during audiovisual integration of non-verbal emotional information.


Fig. 4 – Brain areas showing significantly strong activation in bimodal happy condition compared to neutral condition.

However, these latter two studies used an explicit emotion categorization task, whereas we and Pourtois et al. (2005) asked the subjects to identify the gender and thus to process the emotional information implicitly. The difference in activation patterns may be explained by the fact that the STG is reported to be significantly active during explicit emotional processing (Critchley et al., 2000) and during top-down (attention-driven) processing of the salience of faces (Gallagher and Frith, 2003). According to these results, the STG is more involved in the integration of explicit emotional information than of implicit information; the latter is understood as automatic and independent of attentional factors.

The claustrum is reported to receive projections from the various sensory systems, to send projections back to these systems (Pearson et al., 1982), and to play a critical role in cross-modal matching (Calvert, 2001). This area also shows stronger activation during presentations of bimodal happiness than during either of the unimodal conditions (Ethofer et al., 2006b), which is consistent with our result.

Damage to the IFG has been shown to induce changes in personality and affect (Hecaen and Albert, 1978; Rolls et al., 1994; Zald and Kim, 1996). Recent neuroimaging studies have reported enhanced activity in the IFG in response to different kinds of emotional stimuli, such as sad and angry visual expressions (Morris et al., 1996), film-generated emotions (Reiman et al., 1997), pleasant touch (Francis et al., 1999), pleasant music (Blood et al., 1999), and aversive odorants or tastes (Zald and Pardo, 1997; Royet et al., 2000). Thus, our data confirm and extend these findings by showing that the IFG is involved in the processing of cross-modal stimuli from the ear and the eye.

Our results regarding the bimodal happy condition confirm those of previous studies; we also demonstrate that the integration of facial and vocal expressions of anger did not elicit as significant an activation in the MTG as did the happy or fearful conditions (Pourtois et al., 2005; Kreifelts et al., 2007). Anger is considered to differ from other negative emotions (Strauss et al., 2005). Angry faces are regarded as "aversive stimuli" that are more likely to produce harm than are fearful faces (or happy and neutral faces). Moreover, anger evokes sensitization, whereas fear evokes response habituation. Thus, the differences between these two negative emotions may be related to the different activation patterns involved in integration.

We observed that the simultaneous perception of an angry face and an angry voice elicits strong activation in the bilateral fusiform gyrus; this is consistent with the modulation of the fusiform gyrus during cross-modal fear processing (Dolan et al., 2001). Recent fMRI studies have focused on the feedforward convergence from modality-specific cortices to a heteromodal area. However, the enhanced activity in the fusiform gyrus under emotional conditions indicates that multi-sensory phenomena affect not only heteromodal regions, but also unimodal areas. Nonetheless, whether such audiovisual response modulations arise from feedback connections between heteromodal areas and the respective unimodal auditory and visual cortices, or from direct crosstalk between the auditory and visual associative cortices, remains controversial (Macaluso and Driver, 2005). Taken together, the conjunction analysis and the emotion-specific interaction analyses yield results consistent with those of previous studies supporting the conclusion that audiovisual integration emerges from the interaction of unimodal and heteromodal areas. Future connectivity analyses will offer clearer explanations of the feedforward and feedback pathways.

Our study is limited insofar as the results were not corrected for multiple comparisons in the emotion-specific interaction analyses. Despite this lenient statistical threshold, our findings with regard to the bimodal happy condition are consistent with those of previous studies (Pourtois et al., 2005; Ethofer et al., 2006b), confirming that these results are not arbitrary. An additional potential limitation is related to the neutral content of the sentence used in all conditions; this may have produced conflicting responses in the emotional conditions because of the content-emotion incongruency. However, the neutral conditions elucidate emotional effects via direct comparisons between the emotional and neutral conditions.

Despite these limitations, this is the first study to identify emotion-specific integration areas by directly comparing emotional and neutral conditions. Previous studies have instead used ROI analyses to distinguish the areas showing greater sensitivity for binding non-verbal emotional information from those showing greater activity in the neutral condition. Unlike previous studies that evaluated event-related potentials (ERPs) or fMRI responses to the final words of sentences, we measured responses to the entire sentence as the stimulus.


A sentence can convey more emotional information with greater clarity than a single word, thus enhancing the ability of subjects to perceive the emotion embedded in the stimulus. Because the propositional content of the sentences was identical across emotions, we can rule out the influence of linguistic content. However, because we did not include incongruent blocks, it remains unclear whether the area that our paradigm identifies for the integration of emotional information also shows sub-additive activation under incongruent conditions in addition to supra-additive activation under congruent conditions. This question will be investigated in the future using explicit tasks and an event-related design.

5. Conclusion

We observed significantly greater activation in the STG, IFG, and parahippocampal gyrus, including the amygdala, under bimodal conditions compared to unimodal conditions, irrespective of the emotional content of the stimulus. These regions are plausible nodes of a network in the human brain that binds information from different sources. Our results confirm earlier studies by indicating that emotional information (i.e., regarding happiness or anger) deriving from different sources elicits activation in the left MTG, and extend these findings by distinguishing responses to the emotional conditions from those to the neutral condition. We found anger-specific activation in the posterior cingulate, fusiform gyrus, and cerebellum, whereas the MTG, parahippocampal gyrus, hippocampus, claustrum, inferior parietal lobule, cuneus, MFG, IFG, and anterior cingulate were differentially activated in the happiness condition. The emotion-specific activations underscored by our results suggest that each emotion uses a separate network to integrate bimodal information, as well as a common network to integrate cross-modal information.

Acknowledgements

This research was supported by The Cognitive Neuroscience Program (M10644020003-06N4402-00310) funded by the Ministry of Science and Technology, the Republic of Korea. The authors thank Yong-Sik Jung for his help with MR scanning.

References

Barraclough NE, Xiao D, Baker CI, Oram MW, and Perrett DI. Integration of visual and auditory information by superior temporal sulcus neurons responsive to the sight of actions. Journal of Cognitive Neuroscience, 17: 377–391, 2005.
Blood AJ, Zatorre RJ, Bermudez P, and Evans AC. Emotional responses to pleasant and unpleasant music correlate with activity in paralimbic brain regions. Nature Neuroscience, 2: 382–387, 1999.
Bushara KO, Hanakawa T, Immisch I, Toma K, Kansaku K, and Hallett M. Neural correlates of cross-modal binding. Nature Neuroscience, 6: 190–195, 2003.
Calvert GA, Brammer MJ, Bullmore ET, Campbell R, Iversen SD, and David AS. Response amplification in sensory-specific cortices during crossmodal binding. NeuroReport, 10: 2619–2623, 1999.
Calvert GA, Campbell R, and Brammer MJ. Evidence from functional magnetic resonance imaging of crossmodal binding in the human heteromodal cortex. Current Biology, 10: 649–657, 2000.
Calvert GA. Crossmodal processing in the human brain: Insights from functional neuroimaging studies. Cerebral Cortex, 11: 1110–1123, 2001.
Calvert GA and Thesen T. Multisensory integration: Methodological approaches and emerging principles in the human brain. Journal of Physiology, Paris, 98: 191–205, 2004.
Critchley H, Daly E, Phillips M, Brammer M, Bullmore E, Williams S, et al. Explicit and implicit neural mechanisms for processing of social information from facial expressions: A functional magnetic resonance imaging study. Human Brain Mapping, 9: 93–105, 2000.
Dolan RJ, Morris JS, and de Gelder B. Crossmodal binding of fear in voice and face. Proceedings of the National Academy of Sciences of the United States of America, 98: 10006–10010, 2001.
Ethofer T, Anders S, Wiethoff S, Erb M, Herbert C, Saur R, et al. Effects of prosodic emotional intensity on activation of associative auditory cortex. NeuroReport, 17: 249–253, 2006a.
Ethofer T, Pourtois G, and Wildgruber D. Investigating audiovisual integration of emotional signals in the human brain. Progress in Brain Research, 156: 345–361, 2006b.
Francis S, Rolls ET, Bowtell R, McGlone F, O'Doherty J, Browning A, et al. The representation of pleasant touch in the brain and its relationship with taste and olfactory areas. NeuroReport, 10: 453–459, 1999.
Fuster JM, Bodner M, and Kroger JK. Cross-modal and cross-temporal association in neurons of frontal cortex. Nature, 405: 347–351, 2000.
Gainotti G, Barbier A, and Marra C. Slowly progressive defect in recognition of familiar people in a patient with right anterior temporal atrophy. Brain, 126: 792–803, 2003.
Gallagher HL and Frith CD. Functional imaging of 'theory of mind'. Trends in Cognitive Sciences, 7: 77–83, 2003.
de Gelder B and Vroomen J. The perception of emotions by ear and by eye. Cognition and Emotion, 14: 289–311, 2000.
de Gelder B, Vroomen J, and Pourtois G. Multisensory perception of affect, its time course and its neural basis. In Calvert G, Spence C, and Stein BE (Eds), Handbook of Multisensory Processes. Cambridge, MA: MIT, 2004: 581–596.
Gorno-Tempini ML, Price CJ, Josephs O, Vandenberghe R, Cappa SF, Kapur N, et al. The neural systems sustaining face and proper-name processing. Brain, 121: 2103–2118, 1998.
Grandjean D, Sander D, Pourtois G, Schwartz S, Seghier ML, Scherer KR, et al. The voices of wrath: Brain responses to angry prosody in meaningless speech. Nature Neuroscience, 8: 145–146, 2005.
Hecaen H and Albert ML. Human Neuropsychology. New York: Wiley, 1978.
Johnstone T, Van Reekum CM, Oakes TR, and Davidson RJ. The voice of emotion: An fMRI study of neural responses to angry and happy vocal expressions. Social Cognitive and Affective Neuroscience, 1: 242–249, 2006.
Kreifelts B, Ethofer T, Grodd W, Erb M, and Wildgruber D. Audiovisual integration of emotional signals in voice and face: An event-related fMRI study. NeuroImage, 37: 1445–1456, 2007.
Macaluso E and Driver J. Multisensory spatial interactions: A window onto functional integration in the human brain. Trends in Neurosciences, 28: 264–271, 2005.
Massaro DW and Egan PB. Perceiving affect from the face and the voice. Psychonomic Bulletin and Review, 3: 215–221, 1996.
Morris JS, Fletcher PC, Kapur N, Frith CD, and Dolan RJ. Brain regions involved in implicit processing of facial emotion. NeuroImage, 3: 235, 1996.
Murray EA and Mishkin M. Amygdalectomy impairs crossmodal association in monkeys. Science, 228: 604–606, 1985.
Nichols T, Brett M, Andersson J, Wager T, and Poline JB. Valid conjunction inference with the minimum statistic. NeuroImage, 25: 653–660, 2005.
Pearson RC, Brodal P, Catter KC, and Powell TP. The organization of the connections between the cortex and the claustrum in the monkey. Brain Research, 234: 435–441, 1982.
Pourtois G, de Gelder B, Bol A, and Crommelinck M. Perception of facial expressions and voices and of their combination in the human brain. Cortex, 41: 49–59, 2005.
Reiman EM, Lane RD, Ahern GL, Schwartz GE, Davidson RJ, Friston KJ, et al. Neuroanatomical correlates of externally and internally generated human emotion. American Journal of Psychiatry, 154: 918–925, 1997.
Rolls ET, Hornak J, Wade D, and McGrath J. Emotion-related learning in patients with social and emotional changes associated with frontal lobe damage. Journal of Neurology, Neurosurgery and Psychiatry, 57: 1518–1524, 1994.
Romanski LM, Tian B, Fritz J, Mishkin M, Goldman-Rakic PS, and Rauschecker JP. Dual streams of auditory afferents target multiple domains in the primate prefrontal cortex. Nature Neuroscience, 2: 1131–1136, 1999.
Roskies AL. The binding problem. Neuron, 24: 7–9, 111–125, 1999.
Royet JP, Zald D, Versace R, Costes N, Lavenne F, Koenig O, et al. Emotional responses to pleasant and unpleasant olfactory, visual, and auditory stimuli: A positron emission tomography study. Journal of Neuroscience, 20: 7752–7759, 2000.
Simpson JR, Öngür D, Akbudak E, Conturo TE, Ollinger JM, Snyder AZ, et al. The emotional modulation of cognitive processing: An fMRI study. Journal of Cognitive Neuroscience, 12: 157–170, 2000.
Snowden JS, Thompson JC, and Neary D. Knowledge of famous faces and names in semantic dementia. Brain, 127: 860–872, 2004.
Strauss MM, Makris N, Aharon I, Vangel MG, Goodman J, Kennedy DN, et al. fMRI of sensitization to angry faces. NeuroImage, 26: 389–413, 2005.
Streicher M and Ettlinger G. Cross-modal recognition of familiar and unfamiliar objects by the monkey: The effects of ablation of polysensory neocortex or of the amygdaloid complex. Behavioral Brain Research, 23: 95–107, 1987.
Tsukiura T, Mochizuki-Kawai H, and Fujii T. Dissociable roles of the bilateral anterior temporal lobe in face-name associations: An event-related fMRI study. NeuroImage, 30: 617–626, 2006.
Von Kriegstein K and Giraud AL. Implicit multisensory associations influence voice recognition. PLoS Biology, 4: e326, 2006.
van Wassenhove V, Grant KW, and Poeppel D. Visual speech speeds up the neural processing of auditory speech. Proceedings of the National Academy of Sciences of the United States of America, 102: 1181–1186, 2005.
Wildgruber D, Ackermann H, Kreifelts B, and Ethofer T. Cerebral processing of linguistic and emotional prosody: fMRI studies. Progress in Brain Research, 156: 249–268, 2006.
Zald DH and Kim SW. Anatomy and function of the orbital frontal cortex, II: Function and relevance to obsessive-compulsive disorder. Journal of Neuropsychiatry and Clinical Neurosciences, 8: 249–261, 1996.
Zald DH and Pardo JV. Emotion, olfaction, and the human amygdala: Amygdala activation during aversive olfactory stimulation. Proceedings of the National Academy of Sciences of the United States of America, 94: 4119–4124, 1997.