Cognitive Brain Research 23 (2005) 429–435. www.elsevier.com/locate/cogbrainres
Research report
Seeing and hearing others and oneself talk

Mikko Sams*, Riikka Möttönen, Toni Sihvonen

Laboratory of Computational Engineering, Helsinki University of Technology, PO Box 9203, FIN-02015 HUT, Finland

Accepted 19 November 2004; available online 8 January 2005

* Corresponding author. Fax: +358 0451 4830. E-mail address: [email protected] (M. Sams). doi:10.1016/j.cogbrainres.2004.11.006
Abstract

We studied the modification of auditory perception in three different conditions in twenty subjects. Observing another person's discordant articulatory gestures deteriorated identification of acoustic speech stimuli and modified the auditory percept, causing a strong McGurk effect. A similar effect was found when the subjects watched their own silent articulation in a mirror while acoustic stimuli were simultaneously presented to their ears. Interestingly, a smaller but significant effect was obtained even when the subjects just silently articulated the syllables without visual feedback. On the other hand, observing another person's or one's own concordant articulation, and silently articulating a concordant syllable, improved identification of the acoustic stimuli. We suggest that both the visual and the articulatory modifications of auditory percepts are due to altered activity in the auditory cortex. Our findings support the idea of a close relationship between speech perception and production.

© 2004 Elsevier B.V. All rights reserved.

Theme: Neural basis of behavior. Topic: Cognition.

Keywords: Articulation; Audiovisual speech; McGurk effect; Multisensory integration; Sensorimotor interactions; Speech perception
1. Introduction

In normal conversation, we both hear our companion's speech and see some of the corresponding articulatory gestures. Perceptually, the audiovisual nature of speech is manifested in two ways. First, concordant visual information improves the intelligibility of auditory speech. This is especially evident when speech is presented with a poor signal-to-noise ratio [27,36], but seeing speech also improves identification of a difficult acoustic message presented without any noise [2]. Second, observing discordant articulatory gestures can change the auditory percept phonetically, as occurs in the McGurk effect [19]. When the acoustic syllable /pa/ is dubbed onto the visual presentation of the articulatory gestures of /ka/, subjects typically hear /ta/ or /ka/ [35]. This change in perception occurs even when the acoustic syllables are identified perfectly when presented alone. In many experiments
studying the McGurk effect, the proportion of correctly identified acoustic syllables, indicating the strength of the effect, is 10% or less [16,35]. The change in perception is clearly auditory in nature, and subjects seldom recognize the discrepancy between the auditory and visual components. However, the strength of the effect varies considerably across individuals and depends on the specific stimuli used in the experiments [35].

Observing the articulatory movements of a talker has been shown to modulate activity of the human auditory (primary and/or non-primary) cortex, which accords with the perceptual "auditoriness" of the McGurk effect. Magnetoencephalographic (MEG) studies suggest that visual information about a speaking face has access to the auditory cortex within 200 ms from stimulus onset during audiovisual speech perception [21,33,34]. Functional magnetic resonance imaging (fMRI) studies have also shown that lip-reading may modify activity in the primary and secondary auditory cortices during audiovisual speech perception [5]. Thus, visual speech could influence auditory speech perception by modifying the activation of auditory cortical areas. Evidence of primary auditory cortex
activation during silent lip-reading has been found in some but not all recent imaging studies [3,4,28].

MEG studies have also indicated that a speaker's own utterances can modulate the reactivity of the human auditory cortex [7,13,24,25]. When subjects read aloud, the auditory cortex responses to probe tones were small and delayed in comparison with those obtained when subjects read silently [25]. Curio and coworkers [7] recorded MEG responses to self-produced vowels. The M100 response, peaking at 100 ms after voice onset, was delayed in the left hemisphere relative to the right. No such asymmetry was observed when the utterances were taped and replayed to the subjects. Numminen and Curio [24] showed that even subjects' silent articulation can influence the processing of speech sounds in the auditory cortex. The M100 response was damped in the left auditory cortex when the subjects silently produced the same vowel as the one presented to their ears. The effect was specific to the stimulus type and was not found when the utterance and the presented vowel did not match.

The activation of auditory cortical areas during speech production has also been demonstrated in PET studies [12,29,30]. The rate at which subjects whispered syllables correlated with the increase of cerebral blood flow in the left planum temporale and the left posterior perisylvian cortex even when the auditory input was totally masked by white noise [30]. These areas contain secondary auditory areas and are known to be involved in the perception of speech sounds [31]. Paus and coworkers [30] suggested that Broca's area and/or the left primary face motor area modulates activity in the secondary auditory cortical areas.

In the present psychophysical study, we examined whether auditory perception of speech stimuli is modified by the subjects' own silent articulation. Such a modification would be expected on the basis of the neurophysiological studies described above. We also studied whether auditory percepts are modified when subjects silently articulate and simultaneously see their own articulation in a mirror. For comparison, our subjects also identified both concordant (acoustic and visual /pa/, acoustic and visual /ka/) and discordant (acoustic /pa/ dubbed onto /ka/ articulation) audiovisual utterances of another speaker. Seeing a concordant utterance was expected to improve identification of the acoustic syllable. In contrast, seeing a discordant utterance was expected to decrease identification of the acoustic syllable and produce a strong McGurk effect.
2. Methods

Twenty healthy volunteers (native speakers of Finnish, 8 females, 21–33 years old, two left-handed) with normal or corrected-to-normal vision participated in the experiments. None of them were aware of the purpose of the experiment. They silently articulated or observed articulation of the Finnish syllables /ka/ or /pa/ while the acoustic /ka/ or /pa/ was simultaneously presented via earphones. The observed or articulated syllables were either concordant with the acoustic syllable (observed/articulated /ka/ + acoustic /ka/, observed/articulated /pa/ + acoustic /pa/) or discordant with it (observed/articulated /ka/ + acoustic /pa/). The subjects were required to report what they heard in the following conditions (Fig. 1).

Fig. 1. The four main experimental conditions. (A) Audiovisual condition, (B) mirror condition, (C) articulation condition, and (D) auditory control condition. See text for details.

In the audiovisual condition, the subjects saw an unknown female talker (height of the face about 18° of visual angle) articulate either /pa/ or /ka/ while the acoustic /pa/ or /ka/ was synchronously presented via earphones. In 2/3 of the stimuli, the syllables were concordant: visual /pa/ with acoustic /pa/, and visual /ka/ with acoustic /ka/. In 1/3 of the tokens, the acoustic syllable /pa/ was dubbed onto the articulation of /ka/. The subjects indicated on a piece of paper whether they heard "ka", "pa", or "ta".

In the mirror condition, the subjects were asked to observe their own articulation in a mirror. A written articulation instruction, either "ka" or "pa" (height about 8°), was displayed on a computer screen. After reading the instruction, the subjects turned their gaze about 40° to the left to see themselves in the mirror; the height of their own face was about 12° of visual angle. They then pressed the space bar to trigger the acoustic stimulus and silently articulated the syllable indicated on the screen, in synchrony with the acoustic stimulus presented via earphones. In 2/3 of the stimuli, the acoustic stimulus and the articulation were concordant. In 1/3 of the tokens, the acoustic syllable /pa/ was presented while the subject articulated /ka/. The subjects reported what they heard by pressing one of the response buttons: "ka", "pa", or "ta".

The articulation condition was similar to the mirror condition, but the subjects did not see their articulatory movements. After reading the instruction, they turned their gaze about 40° to the left to fixate on a cross.
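The stimulus sizes above are specified as visual angles; converting them to physical sizes requires a viewing distance, which is not reported here. The snippet below is only a generic geometry sketch, assuming a hypothetical 60 cm distance; the function name is ours, not part of the original study.

```python
import math

def size_for_angle_cm(angle_deg: float, distance_cm: float) -> float:
    """Physical size that subtends a given visual angle at a given viewing distance."""
    return 2.0 * distance_cm * math.tan(math.radians(angle_deg) / 2.0)

# Hypothetical viewing distance (not stated in the paper).
DISTANCE_CM = 60.0
for label, angle in [("talker's face", 18.0), ("own face in mirror", 12.0), ("written syllable", 8.0)]:
    print(f"{label}: {angle:.0f} deg corresponds to about "
          f"{size_for_angle_cm(angle, DISTANCE_CM):.1f} cm at {DISTANCE_CM:.0f} cm")
```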
The auditory control condition was similar to the articulation condition, but the subjects identified the acoustic syllables without executing any articulatory gestures and responded as in the mirror and articulation conditions. The subjects saw the written "pa" or "ka" as in the articulation and mirror conditions and listened to the acoustic syllables after they had turned their gaze to the fixation cross. The baseline condition was similar to the auditory control condition, except that the letter X was presented on the computer screen instead of the written "ka" or "pa". To prevent the subjects from guessing the real nature of the stimuli in the critical conditions and doubting their auditory percepts, the last two conditions were always the auditory control condition and the baseline condition, in this order. The order of the other conditions was randomized.

The intensity of the acoustic syllables was about 80 dB SPL (duration 123 ms, digitized at 22 256 Hz). They were embedded in continuous white noise (80 dB SPL) to mask any sounds caused by the subject's own articulation and to decrease identification of the stimuli in the baseline condition. We did not want identification of the syllables in the latter condition to be perfect, because we were interested both in improved identification due to concordant additional information (observing articulation, own silent articulation) and in decreased identification due to conflicting additional information.

Each stimulus type (concordant /pa/; concordant /ka/; discordant stimulus consisting of acoustic /pa/ and visual/articulated /ka/) was presented ten times to each subject in random order in each experimental condition. Before the experiment, silent articulation in synchrony with the acoustic stimuli was practiced without background noise. The experiment was started only when the subject executed the articulatory gestures silently and the investigator could not notice any asynchrony between them and the acoustic stimuli. The synchrony between a subject's articulation and the acoustic stimulus was monitored via a video camera and TV monitor. An extra articulation condition was run before the experimental conditions for additional practice. The study also included a condition similar to the articulation condition, except that only acoustic /pa/ stimuli were presented with /pa/ or /ka/ articulation; in this condition, each stimulus type was presented 15 times. Because this stimulus presentation differed considerably from that in the other experimental conditions, the results are not reported here.

The effect of the experimental condition on perception was analyzed with repeated-measures one-way ANOVAs (conditions: audiovisual, mirror, articulation, control, baseline). Differences between specific means were analyzed with Newman–Keuls post hoc tests.
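For readers who want to redo the statistics, the design above maps onto a long-format table with one correct-identification proportion per subject and condition. The sketch below is illustrative rather than the authors' analysis code: the column names and mock data are invented, and because a Newman–Keuls procedure is not available in statsmodels, Tukey's HSD is shown as a rough stand-in for the pairwise comparisons.

```python
import numpy as np
import pandas as pd
from statsmodels.stats.anova import AnovaRM
from statsmodels.stats.multicomp import pairwise_tukeyhsd

rng = np.random.default_rng(0)
conditions = ["audiovisual", "mirror", "articulation", "control", "baseline"]

# Mock long-format data: proportion of correct /pa/ responses (out of 10 discordant
# trials) for each of the 20 subjects in each condition; real data would go here.
df = pd.DataFrame([
    {"subject": s, "condition": c, "p_correct": rng.integers(0, 11) / 10}
    for s in range(1, 21) for c in conditions
])

# Repeated-measures one-way ANOVA with condition as the within-subject factor.
print(AnovaRM(df, depvar="p_correct", subject="subject", within=["condition"]).fit())

# Pairwise post hoc comparisons (Tukey HSD used as a stand-in for Newman-Keuls;
# note that it ignores the repeated-measures structure).
print(pairwise_tukeyhsd(df["p_correct"], df["condition"]))
```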
3. Results

Fig. 2 illustrates the proportions of correctly identified acoustic /pa/ syllables in the different experimental conditions (white bars = discordant stimuli, grey bars = concordant stimuli). The horizontal grey line indicates that in the baseline condition the subjects correctly identified 68 ± 6% (mean ± SEM) of the syllables. Visual inspection of Fig. 2 shows that seeing the articulatory gestures, either of another person (the audiovisual condition) or of oneself (the mirror condition), had a very strong effect on stimulus identification. In comparison with the baseline condition, discordant visual stimuli strongly deteriorated correct identification of the acoustic syllable, whereas concordant stimuli improved it. A similar, somewhat smaller effect was found in the articulation condition: silently articulating /ka/ decreased correct identification of acoustic /pa/, whereas concordant articulation increased it. The written syllable in the auditory control condition also influenced identification of the following /pa/ syllable: identification was less accurate when the written syllable was discordant and more accurate when it was concordant.

Fig. 2. The proportions of correctly identified acoustic /pa/ syllables. The white bars show correct identification of acoustic /pa/ when the observed and/or own articulation was the discordant /ka/; the grey bars show correct identification when the observed and/or own articulation was the concordant /pa/. A grey horizontal line shows the identification of acoustic /pa/ in the baseline condition.

3.1. Perception of discordant syllables

The proportions of correctly identified acoustic /pa/ syllables in the different experimental conditions are shown in Fig. 3. ANOVA indicated a significant main effect of condition (F(4,76) = 29.10; P < 0.00001). Significant differences in the proportion of /pa/ responses between conditions are indicated by stars in Fig. 3. When the acoustic /pa/ syllable was presented simultaneously with a /ka/ articulation in the audiovisual condition, only 6% of the acoustic syllables were identified correctly. The proportion of correct responses was slightly higher in the mirror condition (17%), but did not differ significantly from that in the audiovisual condition. In the articulation condition, the proportion of correctly identified acoustic syllables (32%) was significantly higher than in the mirror condition, but significantly lower than in the control condition, where the proportion of correctly identified /pa/ syllables was 50%.
Fig. 3. The mean proportions (±SEM) of correctly identified acoustic /pa/ syllables in the different experimental conditions. Significant differences between the means (Newman–Keuls tests) are indicated by stars (*P < 0.05, **P < 0.001). Circles denote the condition to which the other conditions were compared.
The proportion of correct identifications was significantly smaller (by 18 percentage points) in the control condition than in the baseline condition, indicating that merely seeing the written syllable on the screen influenced the responses.

The proportions of discordant stimuli that were perceived as /ta/ and /ka/ are depicted in Fig. 4. The experimental condition had no effect on the proportion of /ta/ responses (F(4,76) = 1.10; P = 0.3613). ANOVA revealed a significant main effect of the experimental condition on the proportion of /ka/ responses (F(4,76) = 26.35; P < 0.00001). Post hoc comparisons showed that the proportion of /ka/ percepts did not differ between the audiovisual (64%) and mirror (61%) conditions, but all other differences were significant (Fig. 4). The effects of the experimental condition on the proportions of /pa/ and /ka/ responses are mirror images of each other (cf. Figs. 3 and 4).
Fig. 4. The mean proportions (±SEM) of acoustic /pa/ perceived as /ta/ or /ka/. Significant differences between the means (Newman–Keuls tests) for the /ka/ percepts are indicated by stars as in Fig. 3.
Fig. 5. Identification of concordant /pa/ stimuli. The significant differences between the means (Newman–Keuls tests) for /ta/ percepts are indicated by stars as in Fig. 3.
3.2. Perception of concordant syllables

Perception of concordant /pa/ syllables is shown in Fig. 5. Identification of the acoustic /pa/ syllables was perfect or close to it in the audiovisual, mirror, and articulation conditions. Experimental condition influenced the proportion of /pa/ identifications (F(4,76) = 19.67; P < 0.00001). Post hoc comparisons showed that the proportion of /pa/ responses in the audiovisual, mirror, and articulation conditions differed significantly from that in the control (81%) and baseline (68%) conditions (P < 0.01). In addition, the proportion of /pa/ responses differed significantly between the control and baseline conditions (P < 0.01). The rather poor identification of /pa/ syllables in the baseline condition (68%) is due to the white-noise masker.

Experimental condition influenced the proportion of /ta/ responses (F(4,76) = 12.29; P < 0.00001). Post hoc comparisons (Fig. 5) showed that the proportion of /ta/ responses in the audiovisual, mirror, and articulation conditions differed significantly from that in the control (14%) and baseline (23%) conditions. In addition, the proportion of /ta/ responses differed significantly between the control and baseline conditions.

Experimental condition also influenced the proportion of /ka/ identifications (F(4,76) = 5.97; P < 0.0003). Newman–Keuls comparisons showed that the proportion of /ka/ responses in the baseline condition (9%) was significantly higher than in the audiovisual, mirror, and articulation conditions (P < 0.01). In addition, the proportion of /ka/ responses differed between the articulation (1%) and control (6%) conditions (P < 0.05).

Fig. 6 shows the identification of concordant /ka/ syllables in the different conditions.
Fig. 6. Identification of concordant /ka/ stimuli. The significant differences between the means (Newman–Keuls tests) for the /ka/ percepts are indicated by stars as in Fig. 3. The significant differences for the /pa/ percepts are otherwise identical, but the proportions in the control and baseline conditions differed at P < 0.05.
The very poor identification of acoustic /ka/ syllables in the baseline condition was due to strong masking by the background noise. We did not check identification of the acoustic /ka/ in the absence of noise. Experimental condition had a significant effect on the proportions of both /ka/ (F(4,76) = 25.75; P < 0.00001) and /pa/ responses (F(4,76) = 20.42; P < 0.00001). Post hoc comparisons (see Fig. 6) showed that, for both responses, identification in the control and baseline conditions was significantly different from that in all other conditions. Identification did not differ between the audiovisual and mirror conditions, nor between the mirror and articulation conditions. The effect of the experimental condition on the proportion of /ta/ responses was not statistically significant (F(4,76) = 1.67; P = 0.1668).
4. Discussion

As expected, observing another person's articulation improved identification of concordant acoustic syllables and deteriorated identification of discordant ones. Seeing oneself articulate in a mirror produced very similar effects. Moreover, a similar but slightly smaller effect was obtained when the subjects only silently articulated the stimuli: discordant utterances decreased, and concordant ones improved, identification of the acoustic syllables. Complicating the interpretation of the results, the subjects' responses were also significantly influenced by merely seeing the written syllable.

In the audiovisual condition, seeing a conflicting /ka/ articulation dropped the proportion of correct identifications of the acoustic /pa/ stimuli close to zero. Most of the stimuli were perceived as /ka/, a smaller proportion as /ta/. In our previous study, using a different speaker, we also dubbed acoustic /pa/ (without noise) onto /ka/ articulation [35]. In that study, the proportions of /pa/, /ka/, and /ta/ percepts were 3%, 49%, and 32%, quite similar to the present study (the response alternatives also included the category "other"). Well-articulated Finnish syllables /ka/ and /ta/ are quite distinct and are not easily confused [26]. The main difference between them is that during /ta/ articulation the tongue is clearly visible between the teeth (unlike when articulating /ta/ in English). The relatively high proportion of "visual" responses to the combination of acoustic /pa/ and visual /ka/ found with Finnish stimuli might thus be due to the obvious difference between /ta/ and /ka/.

Watching concordant articulation in the audiovisual condition improved identification of both /pa/ and /ka/ syllables. The improvement (60 percentage points) was much larger for /ka/, obviously because it was identified very poorly in the baseline condition. In general, the more ambiguous the acoustic signal, the stronger the influence of seeing speech.

Quite intriguingly, even seeing one's own articulation in a mirror influenced the identification of the acoustic stimuli.
Moreover, this effect was very similar to that in the audiovisual condition. This similarity might reflect a common underlying perceptual mechanism. However, there is one crucial difference between the two conditions: in the mirror condition, in addition to seeing one's own articulating face, the subjects also articulated the syllables. Silent articulation per se also influenced identification of the acoustic syllables. However, it appears that seeing one's own articulation had an additional influence. The effect of the discordant syllable was significantly stronger in the mirror than in the articulation condition. A non-significant trend toward stronger effects in the mirror than in the articulation condition was also seen for the concordant /ka/ syllables. (For concordant /pa/ syllables, the proportion of correct /pa/ identifications was almost 100% in both conditions.) We tentatively suggest that the McGurk effect in the audiovisual condition and at least part of the effect obtained by seeing oneself in a mirror are produced by the same mechanism. If it is assumed that the effect of silent articulation is similar in the articulation and mirror conditions, the strength of the McGurk effect in the latter condition would be quite small. However, the assumption that the effects of silent articulation and of seeing one's own articulation in a mirror are linearly additive may well be incorrect. In the audiovisual condition, the visual stimuli were the same for each subject, but in the mirror condition they were different. In addition, the synchrony of the visual and auditory stimuli was constant in the audiovisual condition but most probably varied considerably in the mirror condition. These differences in the stimuli make the comparison of the effects in the audiovisual and mirror conditions difficult.

Surprisingly, identification of the acoustic stimuli was modified by one's own silent articulation. For discordant stimuli, the proportion of /pa/ responses was about 20 percentage points lower in the articulation than in the control condition (the proportion of /ka/ percepts being about 20 percentage points higher). For concordant /ka/ stimuli, the proportion of correctly identified /ka/ stimuli was also about 20 percentage points higher than in the control condition. For concordant /pa/ stimuli, the proportion of correctly identified /pa/ stimuli was 14 percentage points (not significantly) higher than in the control condition; the proportion of /ka/ percepts was slightly (5 percentage points) but significantly higher in the control than in the articulation condition. The modifications during silent articulation were thus qualitatively similar to those obtained during observation of another person's articulation, but smaller.

Previous neurophysiological studies have shown that silent articulation modifies the activity of the auditory cortex (see Introduction). Our present findings can be explained by assuming that an efference copy of the motor command to the speech effectors is sent in parallel to the speech processing mechanisms, perhaps to the auditory cortex. This copy might be involved in a comparator mechanism, anticipating the sensory consequences of the action [1,15,40]. If the auditory consequences of the action match the anticipation, the action was successful. If a mismatch is registered, the action and the underlying motor commands have to be adjusted.
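As a purely illustrative caricature (not a model proposed in this paper), the comparator idea can be sketched as a forward model that predicts the sensory consequence of the articulated syllable and pulls the noisy acoustic evidence toward that prediction; the feature values and weights below are invented.

```python
import numpy as np

rng = np.random.default_rng(1)

# Toy 2-D "auditory feature" prototypes; the values are invented for illustration.
PROTOTYPES = {"pa": np.array([1.0, 0.0]),
              "ta": np.array([0.5, 0.5]),
              "ka": np.array([0.0, 1.0])}

def perceive(acoustic, articulated=None, noise_sd=0.4, efference_weight=0.5):
    """Combine noisy acoustic evidence with the prediction carried by an efference copy."""
    evidence = PROTOTYPES[acoustic] + rng.normal(0.0, noise_sd, size=2)
    if articulated is not None:
        # Efference copy: shift the representation toward the predicted sensory
        # consequence of one's own (silent) articulation.
        evidence = (1 - efference_weight) * evidence + efference_weight * PROTOTYPES[articulated]
    # The reported percept is the nearest prototype.
    return min(PROTOTYPES, key=lambda s: float(np.linalg.norm(evidence - PROTOTYPES[s])))

# Discordant case: acoustic /pa/ while silently articulating /ka/ is pulled away from
# /pa/, qualitatively mimicking the drop in correct /pa/ identifications reported above.
counts = {s: 0 for s in PROTOTYPES}
for _ in range(1000):
    counts[perceive("pa", articulated="ka")] += 1
print(counts)
```

Increasing the efference weight in this toy shifts the percept further toward the articulated syllable, which is the qualitative pattern suggested by the mirror and articulation conditions.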
Articulation effects in the auditory cortex are probably phoneme specific [25] and influence the activation pattern caused by the acoustic stimulus. When the articulation and the acoustic syllable are concordant, the corresponding perception is enhanced; when they are discordant, the auditory percept is distorted toward the articulated syllable. The articulatory effect thus resembles the McGurk effect. In both cases, auditory perception is changed, and the possible reason is modified activation of the auditory cortex. We suggest that both effects are generated, at least partly, by the same "efference copy" mechanism.

A good candidate for influencing the auditory cortex is Broca's speech area, which is known to be involved in speech production. Its monkey homologue (area F5) contains "mirror neurons", which are activated during both perception and execution of hand and mouth actions [9,32]. Similar mirror neurons seem to exist also in Broca's area [14,22,23]. There is also evidence that Broca's area is involved in lip-reading [6,28]. Broca's area thus seems to be activated not only when speakers articulate, but also when they observe someone else's articulatory gestures. Broca's area could then, either directly or indirectly, send an efference copy to the auditory cortical areas during both production and perception of articulatory gestures.

Our present results support the idea of a close connection between speech production and perception. The results agree with the idea that during visual speech perception a speaker's articulatory gestures are internally reproduced, probably in the mirror neuron system. A similar suggestion was recently made by Kerzel and Bekkering [17]. They studied how seeing a video of an articulating mouth influenced the cued production of similar or dissimilar articulations. The articulatory movements on the video were irrelevant to the task. The authors found reliable interference of the irrelevant articulation on the speed of the motor response, reaction times being shorter for concordant stimuli. Strong support for sensorimotor interactions during speech perception comes from recent transcranial magnetic stimulation (TMS) studies. Motor potentials recorded from the articulatory muscles are enhanced not only during visual [37,39] but also during auditory speech perception [8,39] when the mouth area of the left motor cortex is stimulated. The enhancement seems to be specific to those muscles that are used in articulating the perceived stimuli [8,39]. These findings have been proposed to support the so-called motor theory of speech perception [18].

Perception of the syllables was also significantly influenced by the written cue syllable the subjects saw before they heard the acoustic stimuli. Such an irrelevant cue might bias the subjects' responses without influencing the perceptual mechanisms. When observers try to make sense of an ambiguous stimulus, they might use all available cues to restrict the number of response alternatives. Previously, it has been shown that an orthograph presented simultaneously with an acoustic stimulus influences
perception marginally but not significantly [10]. In the present study, the acoustic stimuli were embedded in noise, which probably increased the effectiveness of the written syllable.

If a response bias explained the perceptual modifications in the control condition, could it also explain the modifications in the mirror and articulation conditions? Although we cannot rule out this possibility, we consider it unlikely. The modifications were clearly stronger in the mirror and articulation conditions than in the control condition. In the latter two conditions, the subjects had to keep the to-be-articulated syllable in memory until the moment of articulation. It could be argued that this strengthens the effect of the orthograph and increases the response bias. In that case, however, the perceptual modifications should have been equal in the articulation and mirror conditions, but they were not. On the other hand, the results obtained in the audiovisual condition cannot be explained by a response bias, because no written syllables were presented in that condition. Nevertheless, the pattern of modifications was quite similar to that in the mirror and articulation conditions. This is in line with the interpretation that the effects in the three conditions share a common mechanism.

We recently studied the effect of silently articulating a Finnish vowel /&/ or /K/ on the perception of acoustic vowels on the /&/–/K/ continuum [38]. Silent articulation of /&/ shifted the phoneme boundary significantly toward /K/. Importantly, a similar shift was not obtained when the same subjects were instructed to position their articulatory system as if they were going to say /&/ or /K/, but not to silently articulate the vowel. The response bias should be quite similar in the two conditions, so these results support our interpretation that silent articulation modifies perception.

We also suggest another possible explanation for the effect of the written syllable on auditory percepts. There is evidence from imaging studies that when subjects read real words or pseudowords, motor brain structures involved in articulation are activated [11,20]. It is quite possible that the orthographic effect found in this study is due to automatic activation of the same motor speech structures that are activated in the articulation and mirror conditions.

In conclusion, our results suggest that the McGurk effect occurs when we see our own articulatory gestures, conflicting with the acoustic stimuli, in a mirror. In addition, even our own silent articulation, without visual feedback, may alter auditory percepts. Both modifications could be explained by alteration of the activity in auditory cortical areas.
Acknowledgments This study is dedicated to the late Alvin Liberman. The authors thank Yurii Alexandrov and Riitta Hari for comments on the manuscript. The study was supported by the Academy of Finland.
References

[1] P.K. Anokhin, Biology and Neurophysiology of the Conditioned Reflex and Its Role in Adaptive Behavior, Pergamon, Oxford, 1974.
[2] P. Arnold, F. Hill, Bisensory augmentation: a speechreading advantage when speech is clearly audible and intact, Br. J. Psychol. 92 (2001) 339–355.
[3] L. Bernstein, E. Auer, J. Moore, C. Ponton, M. Don, M. Singh, Visual speech perception without primary auditory cortex activation, NeuroReport 13 (2002) 311–315.
[4] G. Calvert, E. Bullmore, M. Brammer, R. Campbell, S. Williams, P. McGuire, P. Woodruff, S. Iversen, A. David, Activation of auditory cortex during silent lipreading, Science 276 (1997) 593–596.
[5] G.A. Calvert, M.J. Brammer, E.T. Bullmore, R. Campbell, S.D. Iversen, A.S. David, Response amplification in sensory-specific cortices during crossmodal binding, NeuroReport 10 (1999) 2619–2623.
[6] R. Campbell, M. MacSweeney, S. Surguladze, G. Calvert, P. McGuire, J. Suckling, M. Brammer, A.S. David, Cortical substrates for the perception of face actions: an fMRI study of the specificity of activation for seen speech and for meaningless lower-face acts (gurning), Cogn. Brain Res. 12 (2001) 233–243.
[7] G. Curio, G. Neuloh, J. Numminen, V. Jousmäki, R. Hari, Speaking modifies voice-evoked activity in the human auditory cortex, Hum. Brain Mapp. 9 (2000) 183–191.
[8] L. Fadiga, L. Craighero, G. Buccino, G. Rizzolatti, Speech listening specifically modulates the excitability of tongue muscles: a TMS study, Eur. J. Neurosci. 15 (2002) 399–402.
[9] P. Ferrari, V. Gallese, G. Rizzolatti, L. Fogassi, Mirror neurons responding to the observation of ingestive and communicative mouth actions in the monkey ventral premotor cortex, Eur. J. Neurosci. 17 (2003) 1703–1714.
[10] C.A. Fowler, D.J. Dekle, Listening with eye and hand: cross-modal contributions to speech perception, J. Exp. Psychol. Hum. Percept. Perform. 17 (1991) 816–828.
[11] P. Hagoort, P. Indefrey, C. Brown, H. Herzog, H. Steinmetz, R. Seitz, The neural circuitry involved in the reading of German words and pseudowords: a PET study, J. Cogn. Neurosci. 11 (1999) 383–399.
[12] S. Hirano, H. Kojima, Y. Naito, I. Honjo, Y. Kamoto, H. Okazawa, K. Ishizu, Y. Yonekura, Y. Nagahama, H. Fukuyama, J. Konishi, Cortical processing mechanism for vocalization with auditory verbal feedback, NeuroReport 8 (1997) 2379–2382.
[13] J.F. Houde, S.S. Nagarajan, K. Sekihara, M.M. Merzenich, Modulation of the auditory cortex during speech: an MEG study, J. Cogn. Neurosci. 14 (2002) 1125–1138.
[14] M. Iacoboni, R.P. Woods, M. Brass, H. Bekkering, J.C. Mazziotta, G. Rizzolatti, Cortical mechanisms of human imitation, Science 286 (1999) 2526–2528.
[15] M. Jeannerod, The 25th Bartlett Lecture. To act or not to act: perspectives on the representation of actions, Q. J. Exp. Psychol. A 52 (1999) 1–29.
[16] J.A. Jones, K.G. Munhall, The effects of separating auditory and visual sources on audiovisual integration of speech, Can. Acoust. 25 (1997) 13–19.
[17] D. Kerzel, H. Bekkering, Motor activation from visible speech: evidence from stimulus response compatibility, J. Exp. Psychol. Hum. Percept. Perform. 26 (2000) 634–647.
[18] A. Liberman, I. Mattingly, The motor theory of speech perception revised, Cognition 21 (1985) 1–36.
[19] H. McGurk, J. MacDonald, Hearing lips and seeing voices, Nature 264 (1976) 746–748.
[20] A. Mechelli, M.L. Gorno-Tempini, C.J. Price, Neuroimaging studies of word and pseudoword reading: consistencies, inconsistencies, and limitations, J. Cogn. Neurosci. 15 (2003) 260–271.
[21] R. Möttönen, C.M. Krause, K. Tiippana, M. Sams, Processing of changes in visual speech in the human auditory cortex, Cogn. Brain Res. 13 (2002) 417–425.
[22] N. Nishitani, R. Hari, Temporal dynamics of cortical representation for action, Proc. Natl. Acad. Sci. U. S. A. 97 (2000) 913–918.
[23] N. Nishitani, R. Hari, Viewing lip forms: cortical dynamics, Neuron 36 (2002) 1211–1220.
[24] J. Numminen, G. Curio, Differential effects of overt, covert and replayed speech on vowel-evoked responses of the human auditory cortex, Neurosci. Lett. 272 (1999) 29–32.
[25] J. Numminen, R. Salmelin, R. Hari, Subject's own speech reduces reactivity of the human auditory cortex, Neurosci. Lett. 265 (1999) 119–122.
[26] J.-L. Olivés, R. Möttönen, J. Kulju, M. Sams, Audio-visual speech synthesis for Finnish, in: D. Massaro (Ed.), AVSP'99, Auditory-Visual Speech Processing, Santa Cruz, California, USA, 1999, pp. 157–162.
[27] J.J. O'Neill, Contributions of the visual components of oral symbols to speech comprehension, J. Speech Hear. Disord. 19 (1954) 429–439.
[28] E. Paulesu, D. Perani, V. Blasi, G. Silani, N.A. Borghese, U. De Giovanni, S. Sensolo, F. Fazio, A functional–anatomical model for lipreading, J. Neurophysiol. 90 (2003) 2005–2013.
[29] T. Paus, S. Marrett, K. Worsley, A. Evans, Imaging motor-to-sensory discharges in the human brain: an experimental tool for the assessment of functional connectivity, NeuroImage 4 (1996) 78–86.
[30] T. Paus, D.W. Perry, R.J. Zatorre, K.J. Worsley, A.C. Evans, Modulation of cerebral blood flow in the human auditory cortex during speech: role of motor-to-sensory discharges, Eur. J. Neurosci. 8 (1996) 2236–2246.
[31] C.J. Price, The functional anatomy of word comprehension and production, Trends Cogn. Sci. 2 (1998) 281–288.
[32] G. Rizzolatti, M.A. Arbib, Language within our grasp, Trends Neurosci. 21 (1998) 188–194.
[33] M. Sams, S. Levänen, A neuromagnetic study of the integration of audiovisual speech in the brain, in: Y. Koga (Ed.), Brain Topography Today, Elsevier, Amsterdam, 1998, pp. 47–53.
[34] M. Sams, R. Aulanko, M. Hämäläinen, R. Hari, O.V. Lounasmaa, S.-T. Lu, J. Simola, Seeing speech: visual information from lip movements modifies activity in the human auditory cortex, Neurosci. Lett. 127 (1991) 141–145.
[35] M. Sams, P. Manninen, V. Surakka, P. Helin, R. Kättö, McGurk effect in Finnish syllables, isolated words, and words in sentences: effects of word meaning and sentence context, Speech Commun. 26 (1998) 75–87.
[36] W. Sumby, I. Pollack, Visual contribution to speech intelligibility in noise, J. Acoust. Soc. Am. 26 (1954) 212–215.
[37] M. Sundara, A.K. Namasivayam, R. Chen, Observation–execution matching system for speech: a magnetic stimulation study, NeuroReport 12 (2001) 1341–1344.
[38] J. Tuomainen, R. Hari, R. Möttönen, M. Sams, Motor and auditory interactions: silent articulation affects vowel categorization, First Dutch Neuro-Endo-Psycho Meeting, Doorwerth, The Netherlands, 4–7 June 2002.
[39] K.E. Watkins, A.P. Strafella, T. Paus, Seeing and hearing speech excites the motor system involved in speech production, Neuropsychologia 41 (2003) 989–994.
[40] E. von Holst, H. Mittelstaedt, Das Reafferenzprinzip: Wechselwirkungen zwischen Zentralnervensystem und Peripherie, Naturwissenschaften 37 (1950) 464–475.