Research Report
Listen up! Processing of intensity change differs for vocal and nonvocal sounds

Annett Schirmer a,⁎, Elizabeth Simpson b, Nicolas Escoffier a

a Department of Psychology, Faculty of Arts and Social Sciences, National University of Singapore, Block AS6, Level 3, 11 Law Link, 117570, Singapore
b Department of Psychology, University of Georgia, USA

⁎ Corresponding author. Fax: +65 6773 1843. E-mail address: [email protected] (A. Schirmer).
doi:10.1016/j.brainres.2007.08.008
ARTICLE INFO

Article history: Accepted 4 August 2007; Available online 10 August 2007

Keywords: Prosody; Attention; Empathy; P3a; P300; MMN; Gender

ABSTRACT

Changes in the intensity of both vocal and nonvocal sounds can be emotionally relevant. However, as only vocal sounds directly reflect communicative intent, intensity change of vocal but not nonvocal sounds is socially relevant. Here we investigated whether a change in sound intensity is processed differently depending on its social relevance. To this end, participants listened passively to a sequence of vocal or nonvocal sounds that contained rare deviants which differed from standards in sound intensity. Concurrently recorded event-related potentials (ERPs) revealed a mismatch negativity (MMN) and P300 effect for intensity change. Direction of intensity change was of little importance for vocal stimulus sequences, which recruited enhanced sensory and attentional resources for both loud and soft deviants. In contrast, intensity change in nonvocal sequences recruited more sensory and attentional resources for loud as compared to soft deviants. This was reflected in markedly larger MMN/P300 amplitudes and shorter P300 latencies for the loud as compared to soft nonvocal deviants. Furthermore, while the processing pattern observed for nonvocal sounds was largely comparable between men and women, sex differences for vocal sounds suggest that women were more sensitive to their social relevance. These findings extend previous evidence of sex differences in vocal processing and add to reports of voice-specific processing mechanisms by demonstrating that simple acoustic change recruits more processing resources if it is socially relevant.

© 2007 Elsevier B.V. All rights reserved.
1. Introduction
The emergence of language during human evolution dramatically changed the way humans communicated and the extent to which they could coordinate their actions. Indeed, many researchers view language as a key factor that allowed humans to dominate other species and to successfully spread across the globe (Pinker, 2003; Corballis, 2004). While this linguistic revolution was certainly to our advantage, it also meant that
other – nonverbal – forms of communication moved into the background. For example, while our primate relatives still rely to a relatively large degree on odor information for communication and devote relatively large genetic and cortical areas to representing this information, humans make seemingly little of their sense of smell (Gilad et al., 2003). Moreover, although other nonverbal signals such as gestures, facial expressions or tone of voice always accompany human spoken language, both the speaker and the listener tend to be less aware of this type of
information than they are aware of the verbal message (Prinz, 2006; Schirmer et al., 2005a). Nevertheless, behavioral and neurophysiological studies indicate that humans have not completely lost the nonverbal skills inherited from their ancestors. On the contrary, researchers found dedicated neural mechanisms for the processing of nonverbal cues, which require few attentional resources and may call for attention if the perceived nonverbal cues are of emotional relevance (for a review see Palermo and Rhodes, 2007). For example, a dedicated face processing region has been identified in the inferior temporal lobe. The perception of human faces activates this region to a larger degree than the perception of animal faces or inanimate objects (Kanwisher et al., 1997; Kanwisher and Yovel, 2006). Likewise, the auditory system seems specifically equipped to process human vocalizations. Regions along the superior temporal sulcus (STS) respond more strongly to human vocalizations than to animal or environmental sounds (Belin et al., 2000), and vocalizations elicit even greater activation if they are emotional as compared to neutral (Ethofer et al., 2006; Grandjean et al., 2005; for a review see Schirmer and Kotz, 2006). Additionally, localized brain damage to structures implicated in vocal processing can induce a specific deficit in recognizing speaker identity (Van Lancker et al., 1989) or emotion (Pell, 2006) while leaving language comprehension intact. This evidence supports the idea that nonverbal signals, and in particular vocalizations, represent important stimuli that are processed by dedicated neural circuits and that have primacy over the processing of other information. Moreover, this primacy suggests that although nonverbal signals may receive little direct attention in human social interactions, they are nevertheless of great relevance.

While the brain structures that mediate the primacy of vocal over nonvocal sounds during auditory processing have been relatively well established (for a review see Belin et al., 2004), little is known about when these structures are activated and whether there are differences in the temporal course of vocal and nonvocal processing. One might speculate that sensory processing and attention capture are facilitated for vocal as compared to nonvocal sounds. However, no such evidence is available as of today. Furthermore, it is unclear whether the primacy of vocal processing is restricted to the identification of vocal as compared to nonvocal sounds or whether it extends to the processing of acoustic changes in vocal and nonvocal sounds, respectively. Drawing upon the evidence discussed above, one might speculate that the same acoustic change (e.g., change in intensity) is more relevant to listeners if it concerns a vocal stimulus as compared to a nonvocal stimulus. As a consequence, acoustic change in a vocal stimulus may be processed faster and more thoroughly than the same acoustic change in a nonvocal stimulus.

To address these issues, the present study employed a mismatch negativity (MMN) paradigm. The MMN is a scalp-recorded event-related potential (ERP) elicited to rare and unattended acoustic change. Typically, participants are asked to watch a silent movie with subtitles or read a book while passively listening to an auditory oddball sequence. Such a sequence contains standard sounds that are comparable in their acoustic properties and rare deviant sounds that differ from standards in one or more acoustic properties. Subtracting
the ERP elicited to standards from that elicited to deviants reveals an MMN that peaks around 200 ms following stimulus onset with a fronto-central distribution (Sams et al., 1985). The MMN has been interpreted to reflect a basic change detection mechanism that is triggered by any noticeable acoustic change in the environment and that allows humans to respond appropriately (e.g., Näätänen et al., 2005; Schroeger and Wolff, 1998). This mechanism is believed to be fairly automatic as it has been evoked in sleeping participants (Martynova et al., 2003) and participants who were engaged in a demanding visual task such as discriminating between rapidly appearing odd and even numbers (Escera et al., 2002). As such the MMN represents a useful tool to study early sensory processes elicited by vocal and nonvocal change.

Under certain conditions, the MMN is followed by a positivity of the P300 family, which serves as an additional marker of change detection (e.g., Schroeger and Wolff, 1998). The P300 was initially observed in active target detection paradigms by comparing the ERPs to targets and non-targets. Subsequently, it has been observed for rare non-targets as well as in MMN paradigms during passive listening (for a review see Polich, 2007). Because the amplitude of the P300 is larger in active as compared to passive paradigms, it has been proposed that it may reflect attentional processes such as the orientation towards a target or rare non-target stimulus (Alho et al., 1998; Friedman et al., 2001; Mecklinger et al., 1998). Moreover, larger P300 amplitudes have been associated with increased attention capture and enhanced stimulus encoding. The temporal course of stimulus evaluation or classification is believed to correlate with P300 latency (Kutas et al., 1977; Magliero et al., 1984; Polich, 2007).

The present study used both the MMN and the P300 to shed light on the sensory and attentional mechanisms underlying the detection of acoustic change in vocal and nonvocal sounds. Task-irrelevant vocal and nonvocal oddball sequences were presented to participants who were engaged in watching a silent movie with subtitles. The vocal sounds were meaningless syllables, whereas the nonvocal sounds were synthesized sine waves with the same temporal, fundamental frequency and amplitude patterns as the original vocal sounds. This resulted in stimuli that were largely comparable to the human vocalizations but sounded synthetic, which is distinctly nonhuman. Both the vocal and the nonvocal sounds were presented in separate blocks with rare increases or decreases in sound intensity. Based on prior functional magnetic resonance imaging (fMRI) evidence indicating a primacy for vocal processing (Belin et al., 2004), we predicted that intensity changes should receive primacy if they occurred in vocal as compared to nonvocal stimulus sequences. This should be reflected by increased MMN and P300 amplitudes. Additionally, we speculated that vocal change is processed faster than nonvocal change. If true, we should find respective differences in MMN and P300 latencies.

A further issue that we aimed to address in the present study concerns the universality of vocal processing mechanisms. Because it is largely agreed that engaging in social interactions is fundamental to being human (Pinker, 2003; Corballis, 2004), the neural mechanisms underlying nonverbal processing are considered to be more or less comparable across individuals (Belin et al., 2004; Palermo and Rhodes,
2007; Kanwisher and Yovel, 2006). While such a proposal is appealing, it should nevertheless leave room for interindividual variation. Humans differ in the degree to which they are interested in social intercourse and this variation likely translates into nonverbal processing differences. One variable that has been linked repeatedly with interindividual differences in social interest is sex. Compared to men, women have been reported to be more strongly interested in other individuals and to value social relationships to a larger degree (Kashima et al., 2004; Gilligan, 1982; but see Jaffee and Hyde, 2000). Furthermore, women consistently show more affiliative behaviors in social interactions such as matching the mannerisms of the interaction partner or smiling (Berenbaum and Rotter, 1992; LaFrance et al., 2003). In line with these observations, behavioral studies frequently found women to be more sensitive to nonverbal signals than men. These studies show a female superiority in the recognition of and memory for emotional expressions in a number of different nonverbal channels, including the voice (Hall, 1978; Hall et al., 2006). Additionally, there is evidence that women process nonverbal cues more automatically than men do. This has been demonstrated in a recent MMN study that employed happily and angrily spoken syllables as deviants and neutrally spoken syllables as standards, and vice versa (Schirmer et al., 2005b). The MMNs obtained by subtracting happy standards from happy deviants and angry standards from angry deviants were both larger than the MMN obtained by subtracting neutral standards from neutral deviants. Moreover, these effects were significant in female listeners only, indicating that females were more likely than males to discriminate the emotional and/or social significance of vocalizations preattentively.

Taken together, behavioral and neurophysiological evidence suggests that while vocalizations may be more special than other sounds to both men and women, they may be particularly important for women (Hall et al., 2006; Schirmer and Kotz, 2006). While this appears to be a sensible conclusion, it is compromised by the fact that past research on sex differences in nonverbal communication largely focused on emotion recognition and used communicative stimuli only. Moreover, research investigating sex differences in other domains, such as the processing of emotional pictures (e.g., snakes), also revealed a heightened sensitivity in women as compared to men (Bradley et al., 2001b). While these latter findings do not preclude that nonverbal signals are more important to women than to men, they raise the possibility that these sex differences reflect a more generally increased emotional sensitivity that is not specific to social stimuli.

To address this issue, we compared male and female listeners in the processing of intensity change in nonvocal and vocal sequences — controlling for the emotional relevance of intensity change while modulating its social relevance. For both nonvocal and vocal sequences, change in intensity was emotionally relevant. Like animals, humans are highly sensitive to large physical contrasts as such contrasts can signal danger (e.g., something falling over; Grandin and Johnson, 2005). In particular, a sudden increase in sound intensity is emotionally relevant as sound intensity correlates with distance and can indicate how fast danger is approaching.
Accordingly, sudden loud noises are known to elicit a startle response in humans and nonhuman animals, which serves to
protect the body from potential harm (Bradley et al., 2001a). In contrast to intensity change in nonvocal sequences, intensity change in vocal sequences is not only emotionally relevant (Scherer and Banse, 1996); it also carries social information as it signals a speaker's communicative intent. Accordingly, it has been proposed that speakers employ vocal intensity and other acoustic parameters to modulate listener behavior (Bachorowski and Owren, 2001). Hence, changes in vocal intensity – specifically when they are associated with increased speaker arousal – are highly socially relevant as they underline the significance of the message and induce dynamic changes in social interactions.

Given the emotional significance of vocal and nonvocal sound intensity, both vocal and nonvocal intensity deviants should elicit increased MMN and P300 amplitudes when they are louder as compared to when they are softer than the standard (Schirmer et al., 2005a,b; Rinne et al., 2006). Furthermore, if previously reported sex differences in picture processing (Bradley et al., 2001a,b) extend to the auditory modality, women may be more sensitive than men to intensity change. However, if women, besides being more emotionally responsive than men, also value social signals more than men do, we should expect sex differences to be larger for vocal as compared to nonvocal sounds.
2. Results

2.1. EEG experiment
EEG results are illustrated in Figs. 1A, B and 2A, B. Mean latency of the MMN peak averaged across electrodes and subjects was 175, 210, 183 and 212 ms for loud and soft nonvocal and loud and soft vocal stimuli, respectively. Peak latencies and amplitudes were subjected to separate ANOVAs with intensity (loud/soft), stimulus (vocal/nonvocal), electrode position (anterior/posterior) and hemisphere (left/right) as repeated measures factors and sex as a between subjects factor.

Analysis of peak latencies revealed main effects of intensity and sex. The MMN peaked earlier for high as compared to low intensity sounds (F(1,38) = 55.38, p < .0001) and earlier in female as compared to male listeners (F(1,38) = 5.09, p < .05). All other main effects and interactions were non-significant (F < 1).

Analysis of peak amplitudes revealed a main effect of intensity (F(1,38) = 27.11, p < .0001) and stimulus (F(1,38) = 11.91, p < .005) indicating that loud and vocal stimuli elicited a larger MMN than soft and nonvocal stimuli. Additionally, there was an interaction of intensity, stimulus, and sex (F(1,38) = 4.54, p < .05). Follow-up analysis for each type of stimulus revealed a simple main effect of intensity (F(1,38) = 21.62, p < .0001) for the nonvocal stimulus while the intensity by sex interaction was non-significant (F < 1). The simple main effect of intensity was also significant for the vocal stimulus (F(1,38) = 9.3, p < .01) as was the intensity by sex interaction (F(1,38) = 6.34, p < .05). Separate analyses for each sex indicated that women (F(1,19) = 23.02, p < .001) but not men (F < 1) showed increased MMN amplitudes to the loud as compared to the soft vocal stimulus. Moreover, an additional comparison of MMN amplitudes between the sexes revealed that while amplitudes were comparable between men and women for the soft vocal stimulus (F(1,38) = 1.99, p = .16), women showed larger MMN
Fig. 1 – (A) Difference waves were obtained by subtracting standards from physically identical deviants. Difference waves of loud vocal sounds (solid black line), soft vocal sounds (solid gray line), loud nonvocal sounds (dotted black line) and soft nonvocal sounds (dotted gray line) are presented for male and female listeners, respectively. (B) Difference waves presented in panel A are presented again for a single electrode site.
amplitudes than men to the loud vocal stimulus (F(1,38) = 3.87, p < .05).

The P300 peaked at 340, 427, 389, and 382 ms following stimulus onset for loud and soft nonvocal and loud and soft vocal stimuli, respectively. Statistical analysis of peak latencies revealed a main effect of intensity (F(1,38) = 11.09, p < .01), an interaction between stimulus and intensity (F(1,38) = 25.45, p < .0001) as well as a stimulus by intensity by electrode position interaction (F(1,38) = 4.06, p < .05). Follow-up analysis of the three-way interaction for each level of electrode position confirmed a significant stimulus by intensity interaction for both anterior (F(1,38) = 37.79, p < .0001) and posterior sites (F(1,38) = 10.78, p < .01), thereby indicating a larger effect over anterior sites. Over both anterior and posterior sites, nonvocal sounds elicited an earlier P300 peak when they were loud as
compared to when they were soft (anterior, F(1,38) = 40.38, p < .0001; posterior, F(1,38) = 18.47, p < .001). In contrast, the P300 peaked equally early for loud and soft vocal sounds (anterior, F(1,38) = 1.44, p = .24; posterior, F < 1).

Analysis of peak amplitudes revealed main effects of intensity (F(1,38) = 14.5, p < .001), electrode position (F(1,38) = 74.1, p < .0001) and sex (F(1,38) = 8.21, p < .01). Generally, P300 amplitudes were larger for loud as compared to soft stimuli, larger over posterior as compared to anterior sites and larger in female as compared to male listeners. There were also interactions of intensity, stimulus and electrode position (F(1,38) = 4.35, p < .05), of electrode position and sex (F(1,38) = 4.53, p < .05) and of hemisphere and sex (F(1,38) = 6.59, p < .05). Follow-up analysis of the intensity by stimulus by electrode position interaction revealed a significant intensity by stimulus interaction
Fig. 2 – (A, B) Difference waves were obtained by subtracting standards from physically identical deviants. Difference waves for loud (black line) and soft (gray line) sounds as elicited in female (solid line) and male (dotted line) listeners are presented for vocal and nonvocal stimulus sequences. (B) Difference waves presented in panel A are presented again for a single electrode site.
over anterior (F(1,38) = 3.93, p = .05) but not posterior sites (F < 1). Over anterior sites, nonvocal sounds elicited a larger P300 when they were loud as compared to soft (F(1,38) = 11.19, p < .01), whereas the P300 elicited by loud and soft vocal sounds was equally large (F < 1). Follow-up analysis of the interactions involving sex indicated that P300 amplitudes were larger in women as compared to men over posterior (F(1,38) = 17.91, p < .001) but not anterior regions (F(1,38) = 1.85, p = .18). Moreover, this sex difference was larger over the left (F(1,38) = 10.87, p < .01) as compared to the right hemisphere (F(1,38) = 4.98, p < .05).
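For readers who wish to mirror this style of follow-up test, the between-sex comparison at a single condition (e.g., MMN amplitude to the loud vocal deviant) reduces to a one-way between-subjects ANOVA over per-subject peak values. Below is a minimal Python sketch assuming those per-subject peaks have already been extracted (see Section 5.5); the arrays and numbers are placeholders, not the study's data.

```python
import numpy as np
from scipy import stats

# Placeholder per-subject MMN peak amplitudes (in microvolts) for the loud
# vocal deviant, one value per listener (20 women, 20 men). In practice these
# values come from the peak-picking step described in Section 5.5.
rng = np.random.default_rng(0)
mmn_loud_vocal_women = rng.normal(loc=-3.0, scale=1.2, size=20)
mmn_loud_vocal_men = rng.normal(loc=-2.2, scale=1.2, size=20)

# One-way between-subjects ANOVA (equivalent to a squared independent-samples
# t statistic), analogous in form to the reported F(1,38) sex comparison.
f_val, p_val = stats.f_oneway(mmn_loud_vocal_women, mmn_loud_vocal_men)
df_error = len(mmn_loud_vocal_women) + len(mmn_loud_vocal_men) - 2
print(f"F(1,{df_error}) = {f_val:.2f}, p = {p_val:.3f}")
```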
2.2. Stimulus rating
After completing the EEG experiment, each participant was asked to listen to a set of sounds and rate their emotional
valence and arousal on a six-point scale in two separate blocks (see Fig. 3). Unknown to the participants, these sounds included the vocal and nonvocal stimuli presented during the EEG experiment. Valence and arousal values obtained for these latter stimuli were subjected to separate ANOVAs with intensity (0 dB/10 dB/20 dB) and stimulus (vocal/nonvocal) as repeated measures factors and sex as a between subjects factor.

Analysis of valence ratings revealed a main effect of stimulus indicating that the vocal stimulus was perceived as more negative than the nonvocal stimulus (F(1,38) = 17.43, p < .0001). Importantly for the present study, the intensity main effect (F(1,38) = 1.7, p = .19) as well as interactions involving intensity were non-significant (Fs < 1).

Analysis of arousal ratings revealed a main effect of stimulus (F(1,38) = 72.69, p < .0001) and a stimulus by sex interaction
Fig. 3 – Results from the arousal ratings (mean, standard error of means) for the vocal and nonvocal stimulus used in the EEG experiment. To illustrate the linear increase in perceived emotional arousal with increasing stimulus intensity, the 10 dB intensity increase (which was not used during the EEG experiment) was included in the figure and the statistical analysis of the rating results.
(F(1,38) = 6.92, p < .05). Follow-up analysis of the interaction indicated that arousal ratings for the vocal stimulus were higher in female than male listeners (F(1,38) = 5.36, p < .05), while ratings for the nonvocal stimulus were comparable across groups (F < 1, see Fig. 3). The main effect of intensity (F(1,38) = 18.49, p < .0001) was significant, indicating that louder stimuli were associated with increased emotional arousal (0 dB vs 10 dB: F(1,38) = 11.05, p < .01; 0 dB vs 20 dB: F(1,38) = 25.62, p < .0001; 10 dB vs 20 dB: F(1,38) = 12.59, p < .01). As the interactions involving intensity were non-significant (F < 1), we can conclude that the effect of intensity on perceived emotionality was comparable for vocal and nonvocal stimuli and for male and female listeners.
3. Discussion
The present study aimed at comparing the processing of socially relevant and socially irrelevant intensity change. To this end, participants listened passively to a sequence of naturally spoken or synthesized stimuli that contained rare deviants that differed from standards in intensity. As predicted, social relevance influenced both the time course and the amplitude of ERP effects associated with the processing of intensity change. Moreover, while women showed a generally enhanced processing of intensity change relative to men, these sex differences were particularly salient when intensity change was socially relevant as compared to irrelevant. The exact nature of the observed effects and their implications will be discussed in more detail in the following paragraphs.
3.1. Are vocal sounds special?
The present study investigated vocal processing by comparing the ERP to intensity changes in vocal and nonvocal stimulus sequences. Like other types of acoustic change, intensity change elicited an MMN—a thoroughly investigated ERP component. It is widely believed that the MMN reflects the comparison between incoming sensory information and an
existing short-lived sensory memory representation, which is updated in case of a mismatch (Näätänen et al., 2005). This presumably cognitive process is accompanied by refractory effects due to the activation of “fresh” neurons that have not been excited by prior sensory stimulation. Depending on whether ERPs elicited by auditory deviants are compared with a physically different standard or a physically identical control stimulus, the contributions of refractoriness to the MMN may be larger or smaller, respectively.

The brain structures that mediate the scalp-recorded MMN have been investigated via dipole modeling of EEG and/or magnetoencephalography (MEG) data. This research converges on a main generator in the bilateral temporal lobe (Rinne et al., 2000; Pulvermuller et al., 2006) and is consistent with fMRI results. Due to the better spatial resolution afforded by fMRI, potential MMN generators have been further specified to encompass primary auditory cortex in Heschl's gyrus and higher order auditory regions in the superior temporal cortex (Rinne et al., 2005; Rinne, Schroeger, and von Cramon, 2005). Moreover, it has been proposed that primary auditory cortex subserves the refractoriness component of the MMN, while higher order auditory areas mediate its mnemonic component (Rinne, Schroeger, and von Cramon, 2005).

The present study suggests that the latency of activating one or both of these components differs as a function of stimulus intensity but is unaffected by whether or not the sensory information is socially relevant (i.e., vocal) or irrelevant (i.e., nonvocal). In other words, the sensory processing of vocalizations is not facilitated relative to the sensory processing of nonvocal sounds. However, our findings indicate that the extent to which listeners engage auditory change detection mechanisms reflected by the MMN differs as a function of social relevance. Specifically, MMN amplitudes were larger for intensity change in vocal as compared to nonvocal stimulus sequences. Furthermore, in male listeners, the processing of loud and soft vocal deviants elicited comparatively large MMN amplitudes, whereas the MMN was larger for loud as compared to soft nonvocal deviants. This suggests that, in male listeners, MMN amplitude differences between vocal and nonvocal
sounds were primarily driven by soft vocal sounds eliciting a larger MMN than soft nonvocal sounds. Based on these results, one can conclude that although sensory processing may be equally fast for vocal and nonvocal sounds, the processing resources recruited for both differ in that vocal intensity deviants, in particular when they are soft, recruit more processing resources than comparable nonvocal intensity deviants. This is in line with previous fMRI reports of increased activity to vocal as compared to nonvocal sounds (Belin et al., 2004). Moreover, as this activity is typically found outside the primary auditory cortex in the auditory association areas in the superior temporal cortex, these latter regions most likely contribute to the differences between vocal and nonvocal sounds observed in the present study.

Besides investigating sensory processes reflected by the MMN, we were also interested in how effectively vocal and nonvocal intensity deviants capture a listener's attention. To this end, we analyzed P300 latencies and amplitudes. While the P300 was initially thought of as a single ERP component, researchers now use the term to refer to a family of ERP components which may be evoked depending upon experimental conditions (for a review see Polich, 2007). For example, both passive and active processing of auditory deviants may elicit a component referred to as P3a. However, active more than passive deviant processing is associated with a component referred to as P3b (Squires et al., 1975; Simons et al., 2001). While the P3a has a fronto-central distribution, the P3b shows its maximum over parietal sites. In line with this, frontal brain damage has been shown to affect the P3a while leaving the P3b intact (Knight, 1984), whereas the temporo-parietal junction appears most critical for the integrity of the P3b (Knight et al., 1989). Based on this evidence, one can conclude that P3a and P3b reflect functionally distinct mechanisms. Moreover, it has been proposed that the P3a indicates the involuntary capture of attention with larger amplitudes reflecting increased allocation of processing resources (Alho et al., 1998; Friedman et al., 2001; Mecklinger et al., 1998; Rinne et al., 2006). The P3b, on the other hand, may be elicited when attention capture secures resources for stimulus encoding and memory storage (Polich, 2007).

Here we show that P300 latency and amplitude differ as a function of stimulus intensity and social relevance. While the P300 was most pronounced over parietal regions, effects of social relevance on P300 latency and amplitude were most salient over frontal regions and hence most likely associated with the P3a component. Moreover, the latency effects observed over frontal sites suggest that attention capture was equally fast for soft and loud vocal sounds, but delayed for soft relative to loud nonvocal sounds. P300 amplitude effects matched those of P300 latency and indicated that while soft and loud vocal sounds recruited comparable attentional resources, soft nonvocal sounds recruited fewer attentional resources than loud nonvocal sounds. Thus, similar to sensory processing, attention capture mediated by frontal regions seems to be enhanced and less dependent on stimulus intensity for vocal as compared to nonvocal sounds. The spreading of these effects from frontal to parietal sites may suggest an enhancement of subsequent stimulus encoding (Polich, 2007).

Taken together, the present results support the idea that humans are especially sensitive to vocal sounds. Intensity
change recruits more sensory processing resources and attracts attention more efficiently if it occurs in a vocal as compared to a nonvocal sound. Moreover, these effects are less dependent on the direction of intensity change (i.e., increase or decrease) but appear to be driven by the sound structure of vocalizations, which is distinctly human.
3.2. Are vocal sounds more special to female as compared to male listeners?

Prior research on nonverbal processing revealed sex differences in the recognition of nonverbally conveyed emotions. Moreover, women were found to process nonverbal emotional information more accurately and more automatically than men (Hall, 1978; Schirmer et al., 2005b). These findings have been interpreted in the context of women being more socially interested than men and it has been proposed that they reflect sex differences in the significance of nonverbal (i.e., social) cues (Hall et al., 2006; Schirmer and Kotz, 2006). However, in order to substantiate such a proposal it is necessary to demonstrate that the observed sex differences are indeed specific to nonverbal cues and not simply a reflection of a generally increased emotional responsiveness to events in the environment.

The present study clearly links sex differences in the processing of nonverbal expressions to the social significance of these expressions. We found that both male and female listeners associated an increase in stimulus intensity with an increase in emotional arousal. Moreover, while the emotional arousal of nonvocal stimuli was perceived equally in female and male listeners, female listeners gave higher arousal ratings than male listeners in response to vocal stimuli. Similarly, ERP measures indicated that both sexes engaged sensory processing resources equally when an unattended nonvocal stimulus sequence was interrupted by a sudden soft or loud nonvocal deviant. In this case, MMN amplitudes failed to show sex differences. However, MMN amplitudes were larger in female than in male listeners if a vocal stimulus sequence was interrupted by a sudden loud, but not soft, vocal deviant.

While intensity changes were of comparable emotional significance in the nonvocal and the vocal stimulus sequence, they were socially significant only for the latter. Given that sex differences in sensory processing emerged for vocal but not for nonvocal sounds, one may infer that these sex differences reflect differential sensitivity to social information. Specifically, vocal sounds are unique in that they provide information about the emotional state, intentions and behaviors of other individuals (e.g., Schirmer and Kotz, 2006). Moreover, a perceived increase (but not decrease) in speaker arousal has been associated with a heightened interest to engage listeners in an interaction (Knapp and Hall, 2006). Based on the present results, it appears that women more so than men are sensitive to such speaker intentions and hence may be better prepared to adjust their own intentions and behaviors appropriately. Although somewhat speculative, this interpretation is in accordance with other findings in the literature and fits well with the idea that women more so than men empathize with other individuals (for a review see Baron-Cohen, 2002).

It is interesting to note that, while the MMN amplitude showed sex differences for vocal stimuli only, other aspects of
the ERP showed sex differences for both vocal and nonvocal stimuli. Specifically, the MMN peaked generally earlier in female as compared to male listeners and female listeners showed a larger P300 across stimulus conditions. Thus, the temporal course of brain responses to intensity changes as well as the probability of attention capture seems to be enhanced in women relative to men independently of the social relevance of the stimulus. This is in line with other research indicating a generally greater responsiveness to emotional events in women than in men. More specifically, it matches a relatively recent proposal that compared to men, women have increased defensive reactivity associated with “heightened sensory intake and attention” (Bradley et al., 2001b). This proposal was derived from an investigation of physiological responses to affective pictures revealing sex differences in cardiac deceleration, startle potentiation and facial reactivity. These differences have been interpreted in the context of visual processing only. We show in the present study that they may also apply to auditory processing.
4. Conclusions
Past research suggests that human vocalizations engage dedicated processing mechanisms in the brain. Here we provide further support for such a notion by demonstrating that intensity changes recruit more sensory processing resources if they are associated with vocal as compared to nonvocal sounds. Moreover, while vocal intensity deviants may call for sensory and attentional resources regardless of whether they are loud or soft, comparable resources are recruited for nonvocal intensity deviants only if they are loud and hence physically salient. The present study furthermore suggests that, while the latency of early sensory processing is comparable for vocal and nonvocal sounds, intensity decreases capture attention more rapidly if they occur within vocal as compared to nonvocal sound sequences.

Given the importance of vocalizations in human interactions, it is not surprising that the primacy of vocal over nonvocal signals is seen across individuals. Nevertheless, the present results indicate that there are subtle interindividual differences in vocal processing. Specifically, women appear more sensitive than men to sudden loud as compared to soft vocal expressions. As these sex differences emerge above and beyond sex differences in the sensory and attentional facilitation of socially relevant and irrelevant intensity change, one can conclude that while vocal sounds may be special to both men and women, they are particularly special to women. Moreover, the present results support the view that sex differences in nonverbal processing reflect greater social interest and interpersonal sensitivity in women as compared to men (Hall et al., 2006; Schirmer and Kotz, 2006).
5. Experimental procedures

5.1. Participants
The participants were 20 male (mean age = 20.04, SD = 2.3) and 20 female (mean age = 19.38, SD = 1.1) volunteers recruited from the University of Georgia undergraduate research pool. All participants had normal or corrected-to-normal vision and were without any known hearing impairments.
5.2. Stimuli
The present study made use of the stimulus material employed by Schirmer and colleagues (2005a,b). It comprised a set of four meaningless syllables (i.e., "dada") spoken once with a happy, once with an angry and twice with a neutral tone of voice. The two neutral stimuli were selected such that they matched the happy and angry syllables, respectively, in length and mean sound intensity. Praat (Boersma, 2001) was used to create a corresponding set of four nonvocal sounds that retained many of the characteristics of the original spoken syllables. Specifically, for each original sound, a sine waveform was synthesized at a modulated frequency following the original fundamental frequency contour, and the original sound envelope was applied to this synthesized sound. The intensity of both the original and the synthesized sounds was first normalized at the same root mean square value and then increased by 0 dB, 10 dB and 20 dB, making a total of 24 sounds. Only four out of the twenty-four sounds (i.e., angrily spoken syllables (0 and 20 dB) and their synthesized counterparts (0 and 20 dB)) were selected for the oddball paradigm employed in the present study. Based on a pilot study, intensity changes in these stimuli were found to be most effective in eliciting corresponding emotional changes in the listener. However, all 24 stimuli were presented in a stimulus rating subsequent to the EEG experiment to assess the effect of intensity change on perceived valence and arousal of vocal and nonvocal stimuli.
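The synthesis logic described above can also be approximated outside of Praat. The sketch below illustrates it in Python/NumPy, under the assumption that the fundamental frequency contour and amplitude envelope of a recording have already been extracted and resampled to the audio rate; the toy contour and envelope at the bottom are placeholders rather than the study's stimuli.

```python
import numpy as np

def synthesize_nonvocal(f0_contour_hz, envelope, sr=44100):
    """Nonvocal analogue of a spoken syllable: a sine wave whose instantaneous
    frequency follows the F0 contour and whose amplitude follows the original
    envelope. Both inputs are assumed to be sampled at the audio rate, with
    unvoiced frames interpolated."""
    phase = 2 * np.pi * np.cumsum(f0_contour_hz) / sr   # integrate frequency
    return np.sin(phase) * envelope

def set_rms(signal, target_rms):
    """Normalize a signal to a target root-mean-square value."""
    return signal * (target_rms / np.sqrt(np.mean(signal ** 2)))

def boost_db(signal, db):
    """Raise intensity by a fixed number of decibels."""
    return signal * 10 ** (db / 20)

# Example: build 0, 10 and 20 dB versions of one synthesized sound.
# A real F0 contour and envelope would come from analyzing the recording
# (the study used Praat for this step); the values here are toy data.
sr = 44100
t = np.arange(int(0.6 * sr))
f0 = 200 + 20 * np.sin(2 * np.pi * 3 * t / sr)   # toy F0 contour (Hz)
env = np.hanning(len(t))                          # toy amplitude envelope
base = set_rms(synthesize_nonvocal(f0, env, sr), target_rms=0.05)
versions = {db: boost_db(base, db) for db in (0, 10, 20)}
```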
5.3. Procedure
Participants passively listened to a sequence of angrily spoken and synthesized stimuli while watching a silent movie with subtitles. Ongoing EEG recordings were made while sounds were presented over headphones. There were four blocks: A, B, C and D. In blocks A and B, standards had a low sound intensity and deviants had a high sound intensity. In blocks C and D, standards had a high intensity and deviants had a low intensity. In blocks A and C, standards and deviants consisted of vocal stimuli, whereas in blocks B and D nonvocal stimuli were presented as standards and deviants. The order of blocks A through D was counterbalanced. Each block consisted of 700 standards and 100 deviants, which were presented in a pseudo-randomized order such that there was a minimum of 3 and a maximum of 11 standards between two successive deviants. Sounds were presented with a stimulus onset asynchrony (SOA) of 1200 ms such that the duration of each block totaled 16 min.

After participants had completed the experimental blocks, they were asked to rate a total of 24 sounds (including the ones presented in the EEG study) with respect to their valence and arousal. Sounds were presented in a self-paced manner and in a randomized order. During both the valence and the arousal rating each sound occurred twice and the ratings from the second presentation were used for the stimulus analysis, as by
then participants had heard all sounds and were better able to rate them in comparison to one another. Participants were instructed to evaluate each sound on a 7-point scale ranging from very positive to very negative or from unaroused to highly aroused for the valence and arousal rating, respectively. The order in which participants rated valence and arousal was counterbalanced. In all, each participant heard each sound four times, twice for the valence rating and twice for the arousal rating. The entire experiment, including set up and debriefing, lasted approximately 90 min.
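The pseudo-randomization constraint described above (3 to 11 standards between successive deviants) can be satisfied in several ways; one straightforward option is sketched below in Python. The exact procedure used in the study is not specified beyond the constraint, so this is an illustration rather than the original randomization code.

```python
import random

def make_oddball_block(n_standards=700, n_deviants=100,
                       min_gap=3, max_gap=11, seed=None):
    """Build a pseudo-randomized oddball sequence with at least `min_gap` and
    at most `max_gap` standards before each deviant. Returns 'std'/'dev' labels."""
    rng = random.Random(seed)
    while True:
        # Draw the number of standards preceding each deviant and resample
        # if the drawn gaps would require more standards than are available.
        gaps = [rng.randint(min_gap, max_gap) for _ in range(n_deviants)]
        remainder = n_standards - sum(gaps)
        if remainder >= 0:
            break
    sequence = []
    for gap in gaps:
        sequence += ['std'] * gap + ['dev']
    sequence += ['std'] * remainder   # any leftover standards end the block
    return sequence

block = make_oddball_block(seed=1)
assert block.count('dev') == 100 and block.count('std') == 700
```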
5.4. ERP measurements
The EEG was recorded from 64 electrodes mounted in an elastic cap according to the modified 10–20 system. The electrooculogram (EOG) was recorded using four electrodes, which were attached above and below the right eye and at the outer canthus of each eye. Additionally, one recording channel was placed on the nose tip. The data were recorded reference-free using the ActiveTwo system from Biosemi. Offline, the scalp recordings were referenced against the recording from the nose and bipolar signals were computed from the vertical and horizontal EOG channels, respectively. The EEG was sampled at 256 Hz and a 0.5 to 20 Hz bandpass filter was applied offline. Artifactual epochs caused by drifts or muscle movements were rejected via visual inspection of the data. Eye movement artifacts were corrected using the algorithm developed by Gratton et al. (1983). ERPs were computed for an 800 ms time window starting at stimulus onset and using a 100 ms prestimulus baseline.
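For orientation, an analogous preprocessing pipeline could be written with the open-source MNE-Python package. The original analysis was not carried out with MNE, and the file name, channel labels and trigger codes below are placeholders, so this is only a sketch of the recording parameters described above.

```python
import mne

# Placeholder file; the Biosemi ActiveTwo system writes .bdf files with
# triggers on the 'Status' channel.
raw = mne.io.read_raw_bdf("subject01.bdf", preload=True)

# Re-reference the reference-free recording to the nose channel, then apply
# the 0.5-20 Hz band-pass used in the study.
raw.set_eeg_reference(ref_channels=["Nose"])   # assumes a channel named 'Nose'
raw.filter(l_freq=0.5, h_freq=20.0)

# Epoch from -100 to 800 ms around sound onset with a 100 ms baseline.
events = mne.find_events(raw, stim_channel="Status")
event_id = {"standard": 1, "deviant": 2}       # placeholder trigger codes
epochs = mne.Epochs(raw, events, event_id, tmin=-0.1, tmax=0.8,
                    baseline=(-0.1, 0.0), preload=True)
# Note: the study rejected artifacts by visual inspection and corrected eye
# movements with the Gratton et al. (1983) algorithm; neither step is shown.

# Condition averages; deviant-minus-standard difference waves follow in 5.5.
evoked_standard = epochs["standard"].average()
evoked_deviant = epochs["deviant"].average()
```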
5.5. Data analysis
Difference waves were computed by subtracting standards from physically identical deviants. That is, loud standards were subtracted from loud deviants and soft standards were subtracted from soft deviants. Visual inspection of these difference waves revealed an MMN that peaked approximately 200 ms following stimulus onset and a P300 that peaked approximately 400 ms following stimulus onset. Accordingly, a 100 to 300 ms time window and a 200 to 600 ms time window were selected and for each subject, electrode and condition the voltage minimum and maximum were determined within these two time windows, respectively. Voltage minima for the MMN time window and voltage maxima for the P300 time window as well as their respective latencies were subjected to separate ANOVAs with intensity (loud/soft), stimulus (vocal/nonvocal), electrode position (anterior/posterior) and hemisphere (left/right) as repeated measures factors and sex as a between subjects factor. Electrodes were grouped as follows: anterior-left: AF3 AF7 F3 F5 FC5 FC3; anterior-right: AF4 AF8 F4 F6 FC4 FC6; posterior-left: CP5 CP3 P5 P3 PO7 PO3; posterior-right: CP6 CP4 P6 P4 PO8 PO4.
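The peak-picking step described above can be expressed compactly; a minimal Python/NumPy sketch is given below. The window boundaries follow the values stated in this section, while the synthetic difference wave at the bottom merely illustrates the usage and is not the study's data.

```python
import numpy as np

def peak_measures(diff_wave, times, window, polarity):
    """Return (amplitude, latency) of the most negative (polarity=-1, MMN) or
    most positive (polarity=+1, P300) point of a deviant-minus-standard
    difference wave within `window` (seconds)."""
    mask = (times >= window[0]) & (times <= window[1])
    segment = diff_wave[mask] * polarity
    idx = np.argmax(segment)                     # index of the extremum
    return diff_wave[mask][idx], times[mask][idx]

# Illustrative usage with a synthetic difference wave sampled at 256 Hz:
# a negative deflection near 200 ms and a positive deflection near 400 ms.
sr = 256
times = np.arange(-0.1, 0.8, 1 / sr)
diff_wave = (-2e-6 * np.exp(-((times - 0.2) ** 2) / 0.002)
             + 3e-6 * np.exp(-((times - 0.4) ** 2) / 0.005))

mmn_amp, mmn_lat = peak_measures(diff_wave, times, (0.1, 0.3), polarity=-1)
p300_amp, p300_lat = peak_measures(diff_wave, times, (0.2, 0.6), polarity=+1)
print(f"MMN: {mmn_amp * 1e6:.2f} uV at {mmn_lat * 1000:.0f} ms; "
      f"P300: {p300_amp * 1e6:.2f} uV at {p300_lat * 1000:.0f} ms")
```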
Acknowledgments

The authors would like to thank Trevor Penney for helpful comments on earlier versions of the manuscript. This research
was supported by an Academic Research Grant (R-581-000-112/133) awarded to Annett Schirmer.

REFERENCES
Alho, K., Winkler, I., Escera, C., 1998. Processing of novel sounds and frequency changes in the human auditory cortex: magnetoencephalographic recordings. Psychophysiology 35, 211–224.
Bachorowski, A., Owren, M.J., 2001. Sounds of emotion: production and perception of affect-related vocal acoustics. Ann. N. Y. Acad. Sci. 1000, 244–265.
Baron-Cohen, S., 2002. The extreme male brain theory of autism. Trends Cogn. Sci. 6, 248–254.
Belin, P., Fecteau, S., Bedard, C., 2004. Thinking the voice: neural correlates of voice perception. Trends Cogn. Sci. 8, 129–135.
Belin, P., Zatorre, R.J., Lafaille, P., Ahad, P., Pike, B., 2000. Voice-selective areas in human auditory cortex. Nature 403, 309–312.
Berenbaum, H., Rotter, A., 1992. The relationship between spontaneous facial expressions of emotion and voluntary control of facial muscles. J. Nonverbal Behav. 16, 179–190.
Boersma, P., 2001. Praat, a system for doing phonetics by computer. Glot Int. 5 (9/10), 341–345.
Bradley, M.M., Codispoti, M., Cuthbert, B.N., Lang, P.J., 2001a. Emotion and motivation I: defensive and appetitive reactions in picture processing. Emotion 1, 276–298.
Bradley, M.M., Codispoti, M., Sabatinelli, D., Lang, P.J., 2001b. Emotion and motivation II: sex differences in picture processing. Emotion 1, 300–319.
Corballis, M.C., 2004. The origins of modernity: was autonomous speech the critical factor? Psychol. Rev. 111, 543–552.
Escera, C., Corral, M.J., Yago, E., 2002. An electrophysiological and behavioral investigation of involuntary attention towards auditory frequency, duration and intensity changes. Cogn. Brain Res. 14, 325–332.
Ethofer, T., Anders, S., Wiethoff, S., Erb, M., Herbert, C., Saur, R., Grodd, W., Wildgruber, D., 2006. Effects of prosodic emotional intensity on activation of associative auditory cortex. NeuroReport 17, 249–253.
Friedman, D., Cycowicz, Y.M., Gaeta, H., 2001. The novelty P3: an event-related brain potential (ERP) sign of the brain's evaluation of novelty. Neurosci. Biobehav. Rev. 25, 355–373.
Gilad, Y., Man, O., Pääbo, S., Lancet, D., 2003. Human specific loss of olfactory receptor genes. Proc. Natl. Acad. Sci. U. S. A. 100, 3324–3327.
Gilligan, C., 1982. On "in a different voice": an interdisciplinary forum: reply. Signs 11, 324–333.
Grandin, T., Johnson, C., 2005. Animals in Translation. Harvest Books, Orlando.
Grandjean, D., Sander, D., Pourtois, G., Schwartz, S., Seghier, M.L., Scherer, K.R., Vuilleumier, P., 2005. The voices of wrath: brain responses to angry prosody in meaningless speech. Nat. Neurosci. 8, 145–146.
Gratton, G., Coles, M.G.H., Donchin, E., 1983. A new method for off-line removal of ocular artifact. Electroencephalogr. Clin. Neurophysiol. 55, 468–484.
Hall, J.A., 1978. Gender effects in decoding nonverbal cues. Psychol. Bull. 85, 845–857.
Hall, J.A., Murphy, N.A., Mast, M.S., 2006. Recall of nonverbal cues: exploring a new definition of interpersonal sensitivity. J. Nonverbal Behav. 30, 141–155.
Jaffee, S., Hyde, J.S., 2000. Gender differences in moral orientation: a meta-analysis. Psychol. Bull. 126, 703–726.
Kanwisher, N., Yovel, G., 2006. The fusiform face area: a cortical region specialized for the perception of faces. Philos. Trans. R. Soc. Lond., B Biol. Sci. 361, 2109–2128.
Kanwisher, N., McDermott, J., Chun, M.M., 1997. The fusiform face area: a module in human extrastriate cortex specialized for face perception. J. Neurosci. 17, 4302–4311.
Kashima, Y., Kokobu, T., Kashima, E.S., Boxall, D., Yamaguchi, S., Macrea, K., 2004. Culture and self: are there within-culture differences in self between metropolitan areas and regional cities? Personal. Soc. Psychol. Rev. 30, 816–823.
Knapp, M.L., Hall, J.A., 2006. Nonverbal Communication in Human Interaction. Thomson Wadsworth, Belmont, CA.
Knight, R.T., 1984. Decreased response to novel stimuli after prefrontal lesions in man. Electroencephalogr. Clin. Neurophysiol. 59, 9–20.
Knight, R.T., Scabini, D., Woods, D., Clayworth, C., 1989. Contributions of temporal parietal junction to the human auditory P3. Brain Res. 502, 109–116.
Kutas, M., McCarthy, G., Donchin, E., 1977. Augmenting mental chronometry: P300 as a measure of stimulus evaluation time. Science 197, 792–795.
LaFrance, M., Hecht, M.A., Paluck, E.L., 2003. The contingent smile: a meta-analysis of sex differences in smiling. Psychol. Bull. 129, 305–334.
Magliero, A., Bashore, T.R., Coles, M.G.H., Donchin, E., 1984. On the dependence of P300 latency on stimulus evaluation processes. Psychophysiology 21, 171–186.
Martynova, O., Kirjavainen, J., Cheour, M., 2003. Mismatch negativity and late discriminative negativity in sleeping human newborns. Neurosci. Lett. 340, 75–78.
Mecklinger, A., Maess, B., Opitz, B., Pfeifer, E., Cheyne, D., Weinberg, H., 1998. A MEG analysis of the P300 in visual discrimination task. Electroencephalogr. Clin. Neurophysiol. 108, 45–56.
Näätänen, R., Jacobsen, T., Winkler, I., 2005. Memory-based or afferent processes in mismatch negativity (MMN): a review of the evidence. Psychophysiology 42, 25–32.
Palermo, R., Rhodes, G., 2007. Are you always on my mind? A review of how face perception and attention interact. Neuropsychologia 45, 75–92.
Pell, M.D., 2006. Cerebral mechanisms for understanding emotional prosody in speech. Brain Lang. 96, 221–234.
Pinker, S., 2003. Language as an adaptation to the cognitive niche. In: Christiansen, M., Kirby, S. (Eds.), Language Evolution: States of the Art. Oxford University Press, New York.
Prinz, W., 2006. Measurement contra appearance: Oskar Pfungst examines Clever Hans. Psychol. Rundsch. 57, 106–111.
Polich, J., 2007. Updating P300: an integrative theory of P3a and P3b. Clin. Neurophysiol. doi:10.1016/j.clinph.2007.04.019.
Pulvermuller, F., Shtyrov, Y., Ilmoniemi, R.J., Marslen-Wilson, W.D., 2006. Tracking speech comprehension in space and time. NeuroImage 31, 1297–1305.
Rinne, T., Alho, K., Ilmoniemi, R.J., Virtanen, J., Näätänen, R., 2000. Separate time behaviors of the temporal and frontal mismatch negativity sources. NeuroImage 12, 14–19.
Rinne, T., Degerman, A., Alho, K., 2005. Superior temporal and inferior frontal cortices are activated by infrequent sound duration decrements: an fMRI study. NeuroImage 26, 66–72.
Rinne, T., Sarkka, A., Degerman, A., Schroeger, E., Alho, K., 2006. Two separate mechanisms underlie auditory change detection and involuntary control of attention. Brain Res. 1077, 135–143.
Sams, M., Paavilainen, P., Alho, K., Näätänen, R., 1985. Auditory frequency discrimination and event-related potentials. Electroencephalogr. Clin. Neurophysiol. 62, 437–448.
Scherer, K.R., Banse, R., 1996. Acoustic profiles in vocal emotion expression. J. Pers. Soc. Psychol. 70, 614–634.
Schirmer, A., Kotz, S.A., 2006. Beyond the right hemisphere: brain mechanisms mediating vocal emotional processing. Trends Cogn. Sci. 10, 24–30.
Schirmer, A., Kotz, S.A., Friederici, A.D., 2005a. On the role of attention for the processing of emotions in speech: sex differences revisited. Cogn. Brain Res. 24, 442–452.
Schirmer, A., Striano, T., Friederici, A.D., 2005b. Sex differences in the pre-attentive processing of vocal emotional expressions. NeuroReport 16, 635–639.
Schirmer, A., Lui, A., Maess, B., Escoffier, N., Chan, M., Penney, T.B., 2006. Task and sex modulate the brain response to emotional incongruity in Asian listeners. Emotion 6, 406–417.
Schroeger, E., Wolff, C., 1998. Behavioral and electrophysiological effects of task-irrelevant sound change: a new distraction paradigm. Cogn. Brain Res. 7, 71–87.
Simons, R.F., Graham, F.K., Miles, M.A., Chen, X., 2001. On the relationship of P3a and the Novelty-P3. Biol. Psychol. 56, 207–218.
Squires, N.K., Squires, K.C., Hillyard, S.A., 1975. Two varieties of long-latency positive waves evoked by unpredictable auditory stimuli in man. Electroencephalogr. Clin. Neurophysiol. 39, 387–401.
Van Lancker, D.R., Kreiman, J., Cummings, J., 1989. Voice perception deficits: neuroanatomical correlates of phonagnosia. J. Clin. Exp. Neuropsychol. 11, 665–674.