Brain and Language 104 (2008) 159–169

Inadequate and infrequent are not alike: ERPs to deviant prosodic patterns in spoken sentence comprehension

Anja Mietz a, Ulrike Toepel b,c,*, Anja Ischebeck d, Kai Alter b,e

a Department of Linguistics, University of Potsdam, Germany
b Max Planck Institute for Cognitive and Brain Sciences, Leipzig, Germany
c Neuropsychology and Neurorehabilitation Service, CHUV Lausanne and University of Lausanne, Switzerland
d Medical University Innsbruck, Austria
e School of Neurology, Neurobiology & Psychiatry, University of Newcastle, UK

* Corresponding author. Address: Neuropsychology and Neurorehabilitation Service, Centre Hospitalier Universitaire Vaudois, Rue Pierre Decker 5, CH-1011 Lausanne, Switzerland. Fax: +41 21 3141319. E-mail address: [email protected] (U. Toepel).

Accepted 7 March 2007; available online 10 April 2007
doi:10.1016/j.bandl.2007.03.005
Abstract

The current study on German investigates Event-Related brain Potentials (ERPs) for the perception of sentences with intonations which are infrequent (i.e. vocatives) or inadequate in daily conversation. These ERPs are compared to the processing correlates for sentences in which the syntax-to-prosody relations are congruent and used frequently during communication. Results show that perceiving an adequate but infrequent prosodic structure does not result in the same brain responses as encountering an inadequate prosodic pattern. While an early negative-going ERP followed by an N400 was observed for both the infrequent and the inadequate syntax-to-prosody association, only the inadequate intonation also elicited a P600.

© 2007 Elsevier Inc. All rights reserved.

Keywords: Speech processing; Vocative; Syntax–prosody mismatch; ERP; N400; P600; Early negativity
1. Introduction

Prosody, or suprasegmental phonology, is an inherent aspect of spoken language. In particular, pitch, intensity, and duration variations, as well as speech pauses, are prosodic features which co-occur with other linguistic information (e.g. semantic and syntactic) in spoken language (Selkirk, 1984; Shattuck-Hufnagel & Turk, 1996). Although the term "prosody" is commonly used (especially in the literature on language processing), the expression "suprasegmental phonology" better emphasises that prosodic phenomena often relate to elements in speech which are larger than a single segment. In particular, syllables present the relevant domains for word-level prosodic features like stress (récord vs. recórd). However, the suprasegmental
course of the global fundamental frequency of a speaker also spans complete syntactic phrases or sentences, allowing, e.g., for a differentiation between questions and statements in intonation languages like English, Dutch or German.

Numerous psycholinguistic studies have revealed prosodic influences on the processing of the syntactic structure of spoken utterances. The majority of these studies report that the congruence between the prosodic and syntactic structure of sentences facilitates parsing (Marslen-Wilson, Tyler, Warren, Grenier, & Lee, 1992; Pynte & Prieur, 1996; Schafer, Speer, Warren, & White, 2000; Schepman & Rodway, 2000). On the other hand, inconsistencies between the prosodic and the syntactic interpretation source can cause garden path phenomena resulting in processing difficulties (Marslen-Wilson et al., 1992; Warren, Grabe, & Nolan, 1995; Speer, Kjelgaard, & Dobbroth, 1996). Moreover, studies have shown that prosody is used by listeners very early during language comprehension to determine the continuation of sentences by syntactic means
(Marslen-Wilson et al., 1992; Warren et al., 1995). In a cross-modal naming paradigm, Marslen-Wilson et al. tested listeners' abilities to use suprasegmental prosodic cues for the disambiguation of the following sentences:

(1) The workers considered the last offer from the management was a real insult.
(2) The workers considered the last offer from the management of the factory.

For this purpose, volunteers were first presented acoustically with sentence fragments like the italicized parts of examples (1) and (2). Second, the visual probe word 'was' followed. This probe was an appropriate continuation of the prosody of sentence fragment (1), but not of (2). The naming latencies for both continuations were measured. Results showed that the naming latencies for the inappropriate probe word (2) were longer than for the probe consistent with the preceding prosody (1). This result provided evidence against a default syntactic parsing mechanism in auditory language comprehension as suggested by Frazier (1979, 1987, 1990). Frazier suggested that a default parse would always be in favour of a minimal attachment construction or a direct object continuation, respectively. The visual probe 'was', however, only allows for a complement clause continuation, thus a non-minimal attachment option. According to these assumptions, subjects are required to reparse the sentence in favour of the non-preferred option. In turn, it should take subjects longer to name the probe after the presentation of sentence fragment (1). Yet, naming latencies were longer for the probe following the prosody of stimulus (2), the minimal attachment option. This finding suggests that listeners do not automatically construct a purely syntactically driven minimal attachment structure but incorporate prosodic cues to syntactic structures already at initial parsing stages.

Further behavioural studies reveal that prosody can also guide the computation of syntactic phrase structures in globally ambiguous sentences (Lehiste, 1972, 1973; Price, Ostendorf, Shattuck-Hufnagel, & Fong, 1991; Ferreira, Anes, & Horine, 1996). Listeners were able to identify the intended meaning of a globally ambiguous sentence by using the prosodic correlates of syntactic boundaries (e.g. prefinal lengthening of syllables, pitch contour variation, and speech pauses). Thus, prosodic factors substantially contribute to the structuring and interpretation of spoken utterances. Prosody may sometimes even guide the syntactic analysis of single sentences. In contrast, a prosodic pattern which is not congruent with a particular syntactic structure can also induce processing difficulties or garden path effects, respectively.

Event-Related brain Potentials (ERPs) have been shown to be a valuable tool for the investigation of the time course of prosodic influences on language perception and the temporal dimensions of processing difficulties caused by syntax–prosody mismatches.
One of the first ERP studies concerned with these aspects was conducted on German by Steinhauer, Alter, and Friederici (1999). Listeners were presented with sentence conditions differing in their underlying syntactic structure, which in turn led to varying prosodic phrasing patterns.

(3) [Peter verspricht Anna zu arbeiten]IPh1 [und das Büro zu putzen.]IPh2 'Peter promises Anna to work and the office to clean.' (literal)
(4) [Peter verspricht]IPh1 [Anna zu entlasten]IPh2 [und das Büro zu putzen.]IPh3 'Peter promises Anna to support and the office to clean.' (literal)
(5) *[Peter verspricht]IPh1 [Anna zu arbeiten]IPh2 [und das Büro zu putzen.]IPh3 'Peter promises Anna to work and the office to clean.' (literal)

In example (3) the noun phrase 'Anna' is the object of the first verb 'verspricht' ('promises') due to the intransitivity of the second verb 'zu arbeiten' ('to work'). As a result, one Intonational Phrase (IPh; Selkirk, 1984) is formed by the complete fragment. The IPh boundary (as proven by acoustic analyses of pitch, duration, and pause patterns) is expressed at and after the second verb. In example (4) the transitivity of the second verb induces the formation of two within-sentence IPh boundaries, namely at the right edges of the first verb 'verspricht' ('promises') and the second verb 'zu entlasten' ('to support'). The ERP data of the listeners revealed a centro-posterior positive-going waveform with a latency of 500 ms at the position of each IPh boundary. This result was interpreted as evidencing the on-line structuring of spoken sentences by means of the prosodic boundaries, i.e. the closure of a major prosodic phrase. In accordance with these findings, the ERP component was termed Closure Positive Shift (CPS).

An additional condition (5) in the experimental setting served to determine whether an inadequate prosodic phrasing is sufficient to elicit mismatch effects (Steinhauer et al., 1999; Friederici, 2004). This syntax–prosody mismatch condition (5) was created by combining the IPh1 from example (4) with the consecutive noun + verb complex 'Anna zu arbeiten' ('Anna to work') from example (3). In turn, the syntactic structure of the sentence requires the attachment of the noun 'Anna' to the first verb. However, the prosodic phrasing indicates that the noun is attached to the second verb. The behavioural and ERP data confirmed that the mismatch between prosody and syntax was detected by listeners. In particular, a biphasic N400–P600 pattern was elicited on the verb 'zu arbeiten' ('to work'). This result indicates that prosody–syntax mismatches can indeed induce garden path effects as previously reported in studies employing syntactic violations proper (Osterhout & Holcomb, 1992) or manipulating the ease of syntactic integration (Kaan, Harris, Gibson, & Holcomb, 2000). Steinhauer et al. (1999) interpreted the N400–P600 complex in their study as reflecting a lexical re-access due to
the violation of the intransitive argument structure of the second verb (N400) followed by a revision of the attachment site of the noun 'Anna' (P600). Hence, mismatches between the syntactic and the prosodic structure of sentences can induce the same ERP deflections as the detection and revision of semantic–syntactic violations proper (i.e. N400–P600 effects; Steinhauer et al., 1999).

Yet, incongruities in sentence-level prosodic contours have also been shown to elicit ERP responses apart from N400/P600 patterns. For example, Magne et al. (2005) report a sustained negativity from 150 to 1050 ms for French accentuation patterns which are unexpected and inappropriate on the final words of dialogues. Although the effect is initially (between 150 and 300 ms) allocated to frontal electrode sites, later time periods yield effects which are largely distributed over the scalp surface. In sentence-medial positions, the polarity of the effect is reversed and occurs somewhat later than the negativity. This positive-going ERP to inappropriate sentence-medial accentuation reaches statistical significance in the time range of 450–1050 ms. A further early deflection for prosodic mismatch detection has been reported by Schön, Magne, and Besson (2004). In their study on French, changes in the pitch of sentence endings elicited a negative-going ERP between 50 and 200 ms over temporal electrode sites. This effect has also been replicated with 8-year-old children (Magne, Schön, & Besson, 2006). With respect to sentence-level prosodic processing in German, Eckstein and Friederici (2006) describe a similarly widely distributed negativity starting around 100 ms and yielding statistical differences between 300 and 500 ms. This particular ERP is evoked when listeners expect the continuation of a sentence's intonation contour but encounter a sentence-final prosody. On the other hand, over-pronounced pitch accents in sentence-initial positions have been shown to result in a widely distributed expectancy-related negativity between 250 and 350 ms (Heim & Alter, 2006). The given interpretations of these findings can be summarized briefly as follows: the early prosody-driven negativities are supposed to reflect automatic aspects of prosodic processing on the sentence level. In particular, they are thought to arise from mismatches between expectations about an intonation contour which are built up on-line and the prosodic pattern listeners actually encounter (i.e. a target-actual comparison).

To our knowledge, no ERP study has previously investigated whether sentence-level prosodic patterns which occur infrequently in everyday language bear the same processing consequences as syntactic–prosodic mismatches proper and/or mere prosodic incongruities. Thus, the current study was designed to investigate whether perceiving such an infrequent intonation (i.e. a vocative intonation) elicits similar ERP deflections to the processing of an inadequate prosodic pattern or syntax–prosody mismatch, respectively.

Vocatives or so-called calling contours are, as opposed to declarative sentences or questions, prosodically rather
stereotyped or stylized expressions (Ladd, 1978). Vocatives have been described as spoken chants which convey a call with either a warning character or an attempt to attract attention or help (Pike, 1945; Fox, 1969; Lewis, 1970; Leben, 1976; Ladd, 1978). In phonological terms, the intonation contour of a vocative typically comprises a downstepped tonal sequence from one stable pitch level to another (Gussenhoven, 1993; Ladd, 1996).

Caspers (2000) conducted a behavioural experiment on vocative structures in Dutch concerned with the question of their general communicative intentions. Volunteers were presented with written contexts which did or did not support a vocative or a default (new information) interpretation, together with target words realized with different intonations. Subjects had to indicate on a 1–4 scale whether the spoken vocative suited the preceding context. Caspers (2000) found that vocatives are readily accepted when accompanying information which is not necessarily new for the listeners. The highest acceptability rates were achieved when the vocative served to single out a particular element from background information, thus increasing its salience. However, when vocatives were associated with information which was necessarily new for listeners, the acceptability ratings were substantially lower.

The current study is concerned with additional aspects of speech processing and employs an experimental methodology allowing for more fine-grained assertions about the temporal dimensions of perception, i.e. Event-Related brain Potentials (ERPs). Moreover, the employment of vocative intonation contours results in a processing condition with a somewhat intermediate status between correct and frequently used prosodic forms (i.e. declarative and coordination sentences) and syntactic–prosodic mismatches (as derived by cross-splicing). We hypothesize that prosodic patterns which are infrequent but correct for listeners (i.e. vocatives) do not hinder the processes of syntactic structure building as inadequate syntax–prosody mismatches do (see Steinhauer et al., 1999). However, the uncommon intonation of vocative contours might decrease the ease with which successive words can be integrated into a semantic phrase structure. Thus, infrequent prosodic patterns (vocatives) are expected to elicit an N400 similar to words with a low frequency of occurrence (Van Petten & Kutas, 1987). On the other hand, truly inadequate prosodic contours (syntax–prosody mismatches) should induce integration difficulties as reflected by an N400. In contrast to infrequent prosodies they should, however, also promote the reanalysis of a formerly assigned syntactic structure (reflected by a P600) when a syntactically–prosodically non-matching word is perceived.

2. Materials and methods

2.1. Participants

Twenty-four (12 female) volunteers took part in the ERP study (mean age 24.8 years; SD = 3.1). All participants
were native speakers of German, had no hearing or neurological impairment and were right-handed as assessed by a German version of the Edinburgh Handedness Inventory (Oldfield, 1971).
2.2. Materials

Overall, four experimental sentence conditions were created (see Table 1). Each of the four conditions consisted of 40 sentences in German. In all conditions, the combination of the first nouns and verbs (e.g. 'Anton bites') was held constant. However, all verbs (e.g. 'bites') had two potential subcategorization frames, i.e. they could be of either intransitive or transitive valence. These differences in verb valence bear consequences for the syntactic and the prosodic structure of the sentences. The prosodic consequences are discussed in detail in Section 2.3. The syntactic outcomes are summarized in the following.

In conditions COO (Coordination) and VOC (Vocative) the verb is used in its intransitive mode, with, however, varying consequences. In condition COO, the noun following the verb ('bites') becomes the subject of a consecutive coordination phrase. In condition VOC, however, the postverbal noun is realized as a vocative or 'calling contour'. In contrast to declaratives and coordinations, which are constantly used by speakers, vocatives are used rather infrequently and with stylized prosodies. In condition DEC (Declarative), the verb ('bites') is used in a transitive manner and followed by a direct object noun ('Patricia'). Condition DEC served as filler and provided material for the consecutive cross-splicing explained below.

The three conditions (DEC, COO, and VOC) were produced in a sound-attenuated room by a trained female speaker of Standard German. After the recordings, the sentences were digitized at a sampling rate of 44.1 kHz (16 bit/mono) and normalized in amplitude. For condition CRS (Cross-spliced), the first part of condition COO ('Anton bites') including the pause after the first verb was cross-spliced with the second part of condition DEC ('Patricia...'). To avoid artefacts, the point of cross-splicing was within the pause just before the onset of the second noun. This merging of conditions COO and DEC resulted in syntactically correct sentences containing a prosodic violation, namely an inadequate prosodic
boundary between the verb ('bites') and its direct object ('Patricia').

For the analyses of the ERP deflections, two main comparisons were drawn. First, the responses to the prosodically adequate and frequently used condition COO were compared to the adequate but infrequent condition VOC. Second, the adequate and frequent condition COO was compared to the generally inadequate mismatch condition CRS. Condition COO was chosen for the ERP comparisons due to the syntactic similarity between the conditions COO, VOC, and CRS up to the critical second noun ('Patricia').

2.3. Acoustic analyses

For the analysis of the tonal and durational properties of the speech material, sentences were divided into three parts of interest. The first part includes the first noun ('Anton'), the second fragment the optionally transitive verb ('bites'), and the last part consists of the second noun ('Patricia'). For each part of interest, four fundamental frequency (F0) values were computed separately per sentence and condition: the F0 value at the onset, the maximal F0, the minimal F0, and the F0 value at the offset. In a second step, the four values per fragment were averaged across the 40 realizations per condition. With respect to the F0 properties, please note that the utilized verbs ended in one or two voiceless consonants due to language-specific requirements of German. In these voiceless parts, no F0 can be measured. Thus, the durations of the constituents as well as of the pauses apparent from Figs. 1–3 superficially differ from those in Fig. 4. The analyses of the tonal features are visualized in Figs. 1–3, and the results for the durational properties are illustrated in Fig. 4. Calculations of the statistical differences in F0 values between conditions were restricted to the segments and comparisons relevant for the interpretation of the ERP data, namely the first noun, the first verb and the second noun ('Anton bites Patricia').
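The F0 measures just described can be illustrated with a short analysis sketch. The following Python fragment is not the original analysis script; it assumes that an F0 track (with unvoiced frames marked as NaN) and hand-labelled fragment boundaries are already available, and the function and variable names are chosen here for illustration only.

```python
import numpy as np
from scipy import stats

def f0_measures(f0_track, times, seg_start, seg_end):
    """Onset, maximal, minimal, and offset F0 (Hz) within one sentence fragment.
    f0_track: F0 per frame (np.nan for unvoiced frames); times: frame times (s)."""
    in_segment = (times >= seg_start) & (times <= seg_end)
    voiced = f0_track[in_segment]
    voiced = voiced[~np.isnan(voiced)]
    if voiced.size == 0:                      # e.g. verb-final voiceless consonants
        return None
    return {"onset": voiced[0], "max": voiced.max(),
            "min": voiced.min(), "offset": voiced[-1]}

# Collect the four measures for each of the 40 realizations per condition and
# fragment, then compare two conditions with an independent-samples t-test
# (40 + 40 - 2 = 78 degrees of freedom, as reported below), e.g.:
# t, p = stats.ttest_ind(max_f0_noun2_coo, max_f0_noun2_voc)
```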
Table 1
Sentence materials: overview and examples for the experimental conditions, and literal English translations of the German materials

(DEC) Declarative: Anton beisst Patricia und Carola lügt. ('Anton bites Patricia and Carola lies.')
(COO) Coordination: Anton beisst, Patricia petzt und Carola lügt. ('Anton bites, Patricia squeals and Carola lies.')
(VOC) Vocative: Anton beisst, Patricia. ('Anton bites, Patricia.')
(CRS) Cross-spliced: [Anton beisst]COO [Patricia und Carola lügt]DEC ('Anton bites Patricia and Carola lies.')
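As a concrete illustration of how condition CRS can be derived from the recordings (cf. Section 2.2 and Table 1), the following sketch concatenates the initial fragment of a COO item, cut within the pause after the verb, with the remainder of the corresponding DEC item. It assumes 44.1 kHz mono WAV files and hand-labelled splice times; the file names, splice times and the soundfile library are illustrative choices, not a description of the original stimulus preparation.

```python
import numpy as np
import soundfile as sf  # assumed audio I/O library

def cross_splice(coo_wav, dec_wav, splice_coo_s, splice_dec_s, out_wav):
    """Join the COO sentence up to a point inside the pause after the verb with
    the DEC sentence from its own splice point onwards ('Patricia und ...')."""
    coo, sr_coo = sf.read(coo_wav)
    dec, sr_dec = sf.read(dec_wav)
    assert sr_coo == sr_dec == 44100, "both recordings digitized at 44.1 kHz"
    head = coo[: int(splice_coo_s * sr_coo)]   # 'Anton beisst,' plus part of the pause
    tail = dec[int(splice_dec_s * sr_dec):]    # 'Patricia und Carola luegt.'
    sf.write(out_wav, np.concatenate([head, tail]), sr_coo)

# Hypothetical usage with made-up file names and splice times (in seconds):
# cross_splice("item01_COO.wav", "item01_DEC.wav", 1.20, 0.95, "item01_CRS.wav")
```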
Fig. 1. F0 contours in condition DEC (grey line) and condition COO (black line).
Fig. 2. F0 course of condition COO (black solid line) as compared to condition VOC (black dotted line).
Fig. 4. Verb (a) and pause (b) durations in condition DEC, condition COO, condition VOC, and condition CRS.
Fig. 3. F0 course in condition COO (black solid line) as compared to condition CRS (black dashed line).
t-Tests were performed to validate differences in segment durations and F0 values.

Fig. 1 illustrates the average F0 course in the two highly frequent but syntactically differing sentence types. While the noun ('Patricia') is the object of the first verb in condition DEC, it is the subject of the subsequent syntactic phrase in condition COO. Fig. 1 shows that the prosodic structure of the two conditions varies as a function of the transitive usage of the verb. In condition DEC, the verb ('bites') does not exhibit a high boundary tone while it does in condition COO. Instead, the boundary tone in condition DEC is shifted to the second noun ('Patricia') since the noun then establishes the right edge of the syntactic phrase. The durational properties (see Fig. 4) reveal significantly longer durations for condition COO in verb position (t[78] = 23.75; p ≤ .01) and for the consecutive pause (t[78] = 17.12; p ≤ .01). Taken together, the prosodic features manifest the existence of an IPh boundary on the verb ('bites') in condition COO but not in condition DEC. The IPh boundary is signalled by a falling pitch pattern, the verb duration and the consecutive pause (Selkirk, 1984).
The following F0 course on the second noun 'Patricia' conveys a low F0 followed by a rising pitch pattern in condition DEC. In condition COO, the noun onset also yields a low F0, but the following F0 course is a falling one.

Fig. 2 compares the average F0 courses in condition COO and condition VOC. It is evident that the infrequent but syntactically well-formed condition VOC initially displays a compressed pitch contour which is only interrupted by a strong tonal movement towards a high tone in verb position ('bites'). Subsequently, the second noun ('Patricia') bears a low tone and a falling contour. The statistical analysis reveals differences in the maximal (t[78] = 4.62; p < .01) and minimal (t[78] = 9.67; p < .01) F0 values of the first noun, in the maximal (t[78] = 2.66; p < .01) and minimal (t[78] = 3.65; p < .01) F0 values of the first verb, and reliable differences in the maximal (t[78] = 2.30; p < .05) and minimal (t[78] = 5.14; p < .01) F0 values of the second noun. As apparent from Fig. 4, conditions COO and VOC also differ significantly in the verb and pause durations. The durations of the verb (t[78] = 8.67; p < .01) and the pause (t[78] = 7.06; p < .01) are both longer in condition COO. Based on these prosodic properties it is concluded that conditions COO and VOC both convey an IPh boundary after the verb 'beisst' ('bites'). The IPh boundary in
condition COO bears a low boundary tone, while the IPh boundary in condition VOC bears a high boundary tone. However, the prosody throughout the initial sentence fragment strongly differs between conditions.

Fig. 3 illustrates the average F0 course for the condition with matching syntactic–prosodic properties (condition COO) and for the condition with inadequate syntax-to-prosody associations (condition CRS). As condition CRS only differs from condition COO after the point of cross-splicing, the tonal and durational features preceding the noun ('Patricia') are physically identical. In the position of the second noun, on the other hand, the F0 courses differ between conditions. While condition COO conveys a falling F0 contour, condition CRS comprises the rising intonation of the underlying condition DEC. The actual prosodic inappropriateness of condition CRS is induced by the initially low F0 on the second noun, which is not expected due to the closure of the preceding IPh. The statistical analysis reveals that only the maximal F0 value differs between conditions (t[78] = 2.66; p < .01).

3. Methods

3.1. EEG recordings

The electroencephalogram (EEG) was recorded in a sound-proof and electromagnetically shielded cabin. The recordings were sampled at a rate of 250 Hz from 25 Ag/AgCl cap-mounted electrodes (FP1 + 2, FZ, F3 + 4, F7 + 8, FT3 + 4, FT7 + 8, T7 + 8, CZ, C3 + 4, CP5 + 6, PZ, P3 + 4, P7 + 8, O1 + 2) according to the international 10–20 system (Jasper, 1958). Recordings were referenced online to the left mastoid and re-referenced offline to averaged mastoids. The electrooculogram (EOG) was recorded in order to control for horizontal and vertical eye movements. The EOG electrodes were placed at the outer canthus of each eye, and below and above the right eye. All electrode impedances were kept below 5 kΩ.
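For readers who want to reproduce this preprocessing with current tools, the offline re-referencing and EOG handling could look roughly as follows in MNE-Python. This is a sketch under assumptions: the recording format, the mastoid channel names ('M1', 'M2') and the EOG channel names are placeholders, not the labels used in the original EEP-based pipeline.

```python
import mne

# Hypothetical raw file; the original data were recorded at 250 Hz from 25
# Ag/AgCl electrodes, referenced online to the left mastoid.
raw = mne.io.read_raw_brainvision("subject01.vhdr", preload=True)

# Mark the ocular channels so they can later be used for artefact rejection.
raw.set_channel_types({"HEOG": "eog", "VEOG": "eog"})

# Re-reference offline to the average of the two mastoids.
raw.set_eeg_reference(ref_channels=["M1", "M2"])
```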
3.2. Testing procedure

After electrode application, participants were seated in a chair in front of a computer screen (distance 150 cm). Volunteers were instructed about their task before the experiment. In particular, they were asked to listen carefully in order to indicate the acceptability of the intonation after each sentence by button-press on a palm box. The stimulus sentences were presented via loudspeakers. Throughout each sentence playback, a fixation cross was present on the computer screen to avoid eye movements. Each trial was followed by an inter-stimulus interval (ISI) of 4000 ms after button-press. Participants were asked to avoid movements and to blink their eyes only within the ISI. In order to familiarize participants with the task, each experimental session started with a block of eight practice trials containing two sentences similar to each condition. After the training session, four blocks followed which comprised 40 sentences each. The experimental conditions and blocks were presented once in pseudo-randomized order during the measurements. The entire experiment (including electrode preparation) lasted 1.5 h.

3.3. EEG data analysis

The EEG data were analysed with the EEP 3.2 software package (MPI-CBS Leipzig). All trials containing eye blinks or movement artefacts were rejected from further computations. First, an automatic rejection procedure identified the trials with voltage changes above 40 µV at the ocular electrodes. A consecutive manual control assured that all trials afflicted with artefacts were indeed rejected. The number of rejected trials per subject was on average 10.85% (SD = 6.11), and did not differ significantly across conditions. Averages were first computed for each single subject, condition, and electrode. These averages then entered the grand average across subjects. Average ERPs were calculated from the onset of the second noun ('Patricia') relative to a baseline of 200 ms preceding the onset of the second noun.

Overall, two comparisons were computed. Due to the short length of the sentences in condition VOC, the averages for the comparison between conditions COO and VOC were computed for 800 ms in four time windows (TW) between 50–200, 200–400, 400–600, and 600–800 ms. The ERPs for condition COO vs. CRS, however, were analysed in seven TW within a time range of 1800 ms between 50–120, 220–400, 400–600, 600–800, 800–1000, 1000–1400, and 1400–1800 ms. The first two TWs for each analysis were chosen by visual inspection; the later TWs provide fixed epochs in succession. Two ANOVAs were computed with the factors Condition (COO vs. VOC, COO vs. CRS), Region (frontal, central, posterior), and Hemisphere (left, right).
Fig. 5. ERPs at nine representative electrodes (7 Hz low-pass filtered) for condition COO (black solid line) vs. condition VOC (black dotted line) from the onset of the second noun (‘Patricia’).
For this purpose, the lateral electrodes were grouped into six regions of interest (ROI): left anterior (F7, F3, FC3), left central (T7, C3, CP5), left posterior (P7, P3, O1), right anterior (F8, F4, FC4), right central (T8, C4, CP6) and right posterior (P8, P4, O2). Only effects of the factor Condition and interactions involving this factor are reported in Section 4. All interactions involving more than one degree of freedom were automatically adjusted for repeated measurements by means of Greenhouse–Geisser epsilon correction. Effects following the decomposition of interactions are only reported when they yield significant results. Moreover, all statistical analyses were computed on unfiltered data. For illustration purposes (see Figs. 5 and 6), however, a 7 Hz low-pass filter was applied.
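To make the analysis steps of Section 3.3 concrete, the sketch below epochs the continuous data around the onset of the second noun, applies the 40 µV ocular rejection criterion, averages per condition and extracts mean amplitudes per region of interest and time window. It assumes an MNE Raw object `raw` (e.g. prepared as in the sketch above), and the event codes are invented for the example; the original analysis was run in EEP 3.2, and the repeated-measures ANOVAs with Greenhouse–Geisser correction would be computed on the resulting per-subject ROI means in a separate statistics package.

```python
import numpy as np
import mne

# `raw` is the re-referenced continuous recording (see sketch above); the event
# codes 1/2/3 for the onset of the second noun are invented for this example.
events, _ = mne.events_from_annotations(raw)
epochs = mne.Epochs(raw, events, event_id={"COO": 1, "VOC": 2, "CRS": 3},
                    tmin=-0.2, tmax=1.8, baseline=(-0.2, 0.0),
                    reject=dict(eog=40e-6),   # reject trials exceeding 40 µV at EOG
                    preload=True)

# Regions of interest used for the ANOVAs (lateral electrodes only).
rois = {"left anterior":  ["F7", "F3", "FC3"], "right anterior":  ["F8", "F4", "FC4"],
        "left central":   ["T7", "C3", "CP5"], "right central":   ["T8", "C4", "CP6"],
        "left posterior": ["P7", "P3", "O1"],  "right posterior": ["P8", "P4", "O2"]}

def roi_mean_amplitude(evoked, channels, tmin, tmax):
    """Mean amplitude (in volts) over a channel group and a time window."""
    picked = evoked.copy().pick(channels).crop(tmin, tmax)
    return picked.data.mean()

# Per-subject averages per condition, e.g. for the 600-800 ms time window:
for condition in ("COO", "VOC", "CRS"):
    evoked = epochs[condition].average()
    for roi, channels in rois.items():
        print(condition, roi, roi_mean_amplitude(evoked, channels, 0.6, 0.8))
```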
4. Results

4.1. Behavioural data

Participants show high acceptability rates for the highly frequent adequate prosodies of conditions DEC and COO (96.7% vs. 94.3%). A two-tailed t-test revealed no differences. For the infrequent (condition VOC) and inadequate (condition CRS) syntax–prosody associations, however, acceptability judgements are lower. While the intonation of condition CRS is judged as acceptable in 81.7% of all trials, condition VOC is rated as acceptable in only 60.3% (significant difference: t[46] = 2.95; p ≤ .01). The t-test between the acceptability ratings for condition COO vs. VOC also yielded significant differences (t[46] = 5.01; p ≤ .01). Likewise, the difference between the ratings for condition COO vs. CRS was significant (t[46] = 2.48; p ≤ .05).

5. ERP data

5.1. Processing frequent adequate as opposed to infrequent adequate sentence prosodies

In Fig. 5, the comparison of the ERPs to the frequent adequate condition COO vs. the infrequent adequate condition VOC is displayed. The onset of the averages is congruent with the onset of the second noun ('Patricia'). As apparent from Fig. 5, the ERPs for condition COO as compared to condition VOC display a widely distributed Early Negativity (EN) peaking at 120 ms. In addition, a second negativity is evident which peaks around 400 ms. The statistical analysis attests to main effects of Condition in the TW from 50 to 200 ms (F(1, 23) = 15.55; p ≤ .01), in the TW between 200 and 400 ms (F(1, 23) = 14.30; p ≤ .01), and in the TW from 400 to 600 ms (F(1, 23) = 12.36; p ≤ .01).

5.2. Processing frequent adequate as opposed to inadequate sentence prosodies
Fig. 6. ERPs at nine representative electrodes (7 Hz low-pass filtered) for condition COO (black solid line) vs. condition CRS (black dashed line) from the onset of the second noun ('Patricia'). The upper part displays the same TW as analysed in Fig. 5. The lower part, however, displays the complete TW analysed for condition COO vs. CRS.
Fig. 6 illustrates the ERP responses for the comparison of the frequent adequate condition COO vs. the inadequate condition CRS. Again, the onset of the averages is congruent with the noun onset. For reasons of comparability with Fig. 5, the upper part, Fig. 6(a), displays an identical TW of 800 ms. The lower part, Fig. 6(b), on the other hand, displays the complete TW analysed for the comparison of condition COO vs. CRS. It is apparent from Fig. 6 that condition CRS elicits a widely distributed Early Negativity (EN) peaking around 120 ms after the onset of the second noun. The EN is immediately followed by a second negative ERP deflection peaking around 350 ms. The negative components are followed by a strong centro-posterior positive shift which starts around 500 ms after noun onset and reaches its peak at around 900 ms. The statistical analysis attests to main effects of Condition in the TW from 50 to 120 ms (F(1, 23) = 11.63; p ≤ .01) and in the TW between 220 and 400 ms (F(1, 23) = 5.08; p ≤ .05).
In succession, the TW between 600 and 800 ms reveals a main effect of Condition (F(1, 23) = 6.79; p ≤ .05), and an interaction of the factors Condition × Region (F(2, 46) = 9.22; p ≤ .01). The decomposition of the interaction attests to an effect of Condition in the posterior ROI (F(1, 23) = 13.03; p ≤ .01). In addition, the TW from 800 to 1000 ms also shows a main effect of Condition (F(1, 23) = 5.39; p ≤ .05), and an interaction Condition × Region (F(2, 46) = 41.56; p ≤ .01). The decomposition of the interaction attests to effects of Condition in the central ROI (F(1, 23) = 5.32; p ≤ .05) and in the posterior ROI (F(1, 23) = 25.15; p ≤ .01). Last, the TW from 1000 to 1400 ms displays an effect of Condition (F(1, 23) = 4.32; p ≤ .05).

6. Discussion

The current study aimed at investigating the consequences of perceiving prosodic, or suprasegmental phonological, patterns which are infrequent or inadequate with respect to a syntactic phrase structure. Behavioural and ERP correlates for the perception of the deviant prosodies were compared to responses to adequate syntax–prosody relations constantly used in everyday speech.

With respect to the behavioural results, the high acceptability rates for the conditions COO and DEC are in congruence with their correct syntax-to-prosody associations and their frequent occurrence in everyday talk. What is striking, however, is the high acceptance of the cross-spliced condition CRS. This result is not congruent with, e.g., the data of Steinhauer et al. (1999), who used a similar task and stimuli. The outcome is potentially related to the unacceptability of vocative prosodic contours on information which is new with respect to a preceding context (Caspers, 2000), as our experimental materials comprised context-free sentences which necessarily convey new information (Ladd, 1996). More than 60% of our participants commented on the idiosyncrasy of the vocatives in a post-test questionnaire. We thus hypothesize that listeners' attention might have been so attracted by the infrequent prosody of condition VOC that they did not consistently recognize the oddity of the cross-spliced condition CRS. Yet, the event-related brain responses indicate that both conditions (VOC and CRS) were processed as prosodic deviations.

In line with our ERP hypotheses, infrequent prosodic patterns (condition VOC) elicited N400 responses. The latency and scalp distribution of the observed component are similar to the N400 in studies which contrasted, e.g., the processing of high-frequency and low-frequency words (Van Petten & Kutas, 1987). As apparent from our materials (cf. Fig. 2), the IPh on the actual warning call ('Anton bites!') is closed after the verb. There is no prosodic indication for listeners that a subsequent sentence element still has to be integrated into the IPh. We propose that the premature closure of the IPh impedes the incorporation of the consecutive noun ('Patricia') into the semantic
computation of the complete vocative containing the warning call and the person addressed by it. The N400 is thus interpreted as reflecting integration difficulties induced by an infrequent intonation. The biphasic N400/P600 pattern elicited by syntax–prosody mismatches in the study of Steinhauer et al. (1999) could thus not be replicated with infrequent intonation patterns.

Yet, our ERP data on the processing of inadequate prosodic contours (condition CRS) do replicate the biphasic N400/P600 pattern of Steinhauer et al. (1999). Moreover, the data resemble findings on the electrophysiological consequences of perceiving syntactic violations proper (Osterhout & Holcomb, 1992; Hagoort & Brown, 2000b) and of increased efforts in syntactic integration (Kaan et al., 2000). The intonation of the initial part of condition CRS ('Anton bites') is congruent with condition COO (cf. Fig. 3). The acoustic analyses show an IPh boundary at and after the verb. Thus, a new IPh is established by the consecutive sentence fragment. However, this fragment is cross-spliced from condition DEC, and its prosody signals the attachment of the noun ('Patricia') to the verb ('bites'). Listeners are thus faced with a conflict since the noun 'Patricia' cannot be easily integrated into the former IPh. We propose that the N400 in condition CRS is indicative of listeners' attempts to integrate the noun ('Patricia') into this former IPh. We suggest that the consecutive P600 signals the conflict between the attempt to attach the noun to the preceding verb and the attempt to start a new IPh, which in turn leads to difficulties in the formation of a syntactic hierarchy. The relatively high acceptability rates for condition CRS, however, indicate that listeners can derive a meaningful sentence structure from the prosodically deviant input. Thus, we suggest that the P600 reflects the effortful integration of the noun into the syntactic sentence structure (Kaan et al., 2000).

In addition to the N400 effects elicited by both prosodically deviant conditions (infrequent: VOC; inadequate: CRS), and the P600 effect evoked during the perception of the inadequate prosody, our data show further, earlier ERPs. Figs. 5 and 6 illustrate that perceiving an infrequent adequate prosody (condition VOC) as well as an inadequate prosody (condition CRS) elicits early, broadly distributed negative-going waveforms with a latency of 120 ms. These so-termed Early Negativities (EN) clearly precede the N400 and yield separate peaks. We thus claim that the EN should not be interpreted as the early onset of an N400 as suggested by Holcomb and Neville (1990, 1991) and Diaz and Swaab (in press). Yet, the latency of the EN component is also substantially shorter than previously reported N200/PMN deflections for deviating segmental phonological input (Connolly, Steward, & Phillips, 1990; Connolly, Phillips, Stewart, & Brake, 1992; Connolly & Philipps, 1994; Hagoort & Brown, 2000a; Van den Brink, Brown, & Hagoort, 2001; D'Arcy, Connolly, Service, Hawco, & Houlihan, 2004). In brief, these studies show that words which either match or do not match a sentential context
start to be differentiated before their semantic content has been fully encountered. Connolly et al. (1990) were the first to report this ERP response, which peaks at 250 ms and displays a fronto-central scalp distribution. In particular, the ERP was observed for semantic mismatches which were at the same time phonological word-onset errors. Connolly et al. (1992) interpreted the negativity as reflecting the acoustic-phonological pre-processing of words. A follow-up study by Connolly and Philipps (1994) served to further dissociate the ERP responses to semantic and phonological mismatches. The results proved the functional dissimilarity of the N400 and the preceding N200 (then re-termed Phonological Mismatch Negativity; PMN). While the peak latency for pure violations of sentence semantics was around 400 ms, the pure segmental phonological mismatch effect peaked at 275 ms, predominantly at centro-parietal sites, and lacked a consecutive N400.

However, similar early negativities have not only been described for the processing of deviant segmental phonological properties (i.e. phoneme mismatches) but also for the perception of suprasegmental phonological alterations (i.e. sentence-level prosody or accentuation, respectively). The influence of accent patterns on listeners' predictions about the subsequent prosodic structure of an utterance has been the subject of many behavioural studies (Beach, 1991; Marslen-Wilson et al., 1992; Warren et al., 1995). Furthermore, several ERP studies reported on early prosody-driven negativities (Schön et al., 2004; Pannekamp, Toepel, Alter, Hahne, & Friederici, 2005; Magne et al., 2005, 2006; Heim & Alter, 2006; Eckstein & Friederici, 2006). They are generally interpreted in terms of rather automatic mismatch detection between expectations about the prosodic form of an utterance built up on-line and the actually perceived prosodic pattern.

The onset latency of the EN in the current study strongly suggests that listeners process the prosodic contours of the conditions VOC and COO based on expectancies about an upcoming intonation structure. Thereby, the prosody of the first sentence fragment ('Anton bites') serves as a context for the consecutive noun ('Patricia'). Based on the prosody of the context, listeners expect a specific continuation of the prosodic contour. Following IPh boundaries, the fundamental frequency contour (F0) of a speaker usually undergoes a reset (Shattuck-Hufnagel & Turk, 1996). In turn, the expected F0 onset is rather high at the beginning of a subsequent IPh. In both of our prosodically deviant conditions, however, the onset of the F0 is low. Listeners' expectancies about the continuation of the F0 contour after the IPh boundary are thus not met. We propose that listeners' sensitivity to these inconsistencies manifests itself in the early negative-going waveform (EN).

Due to its time course, the EN in our study should, however, also be discussed in relation to ERPs which are elicited still earlier during speech perception (i.e. the N100). Sanders and Neville (2003) have shown that not only the onsets of sentence-initial words elicit N100 responses but
also word onsets which are embedded into connected natural speech. However, the amplitudes of the N100 were smaller for word onsets in sentence-medial than in sentence-initial positions. The authors interpreted the negative-going component as an indicator of on-line speech segmentation and acoustic word onset detection. In our study, the critical noun ('Patricia') is always preceded by a pause. These pauses differ significantly between condition COO and the infrequent prosody of condition VOC, but are identical between condition COO and the inadequate prosody of condition CRS. However, both comparisons (COO vs. VOC and COO vs. CRS) yield ENs with similar latencies and scalp distributions. Loudness had been adapted between conditions, making it unlikely that the EN was evoked by acoustic factors. In addition, the temporal dynamics of the EN component (cf. Figs. 5 and 6) suggest that the EN is a slow oscillation (surviving the 7 Hz low-pass filtering applied for illustrative purposes and yielding significant effects in time windows of 150 ms). We thus argue that the EN in our study cannot simply be explained in terms of an N100 triggered by rather exogenous physical factors of the speech signal (Rugg & Coles, 1995).

In line with the early formulation of the PMN as driven by segmental phonological mismatches (Connolly et al., 1992), but at the same time considering the ERP evidence available for prosodic mismatches (Eckstein & Friederici, 2006; Heim & Alter, 2006; Magne et al., 2005, 2006; Pannekamp et al., 2005; Schön et al., 2004), we propose that the EN in our study indeed reflects an acoustic-phonological pre-processing step. The elicitation of the EN is assumed to be driven by the evaluation of the suprasegmental phonology (or prosody, respectively) of the critical noun 'Patricia' against expectations about its prosodic pattern as formed by the sentence context. It is thus proposed that the context influences listeners' expectations about the continuation of the prosodic contour top-down and on-line.

In summary, we showed that the processing of infrequent and of inadequate syntax-to-prosody relations yields diverging electrophysiological consequences. Both prosodies which deviate from sentence intonations used frequently in everyday speech elicit an early negative-going ERP (EN). This EN is assumed to reflect a comparison between a target intonation and an actually perceived prosody during the pre-processing of suprasegmental phonological (prosodic) information. The EN is invariably followed by an N400 which is proposed to signal semantic integration difficulties induced by the idiosyncratic sentence prosodies. Yet, only the perception of ostensible syntax-to-prosody mismatches seems to result in effortful syntactic structure building yielding a P600 deflection.

Acknowledgments

Both first authors contributed equally to the paper. We thank Angela Friederici for the opportunity to collect the ERP data at the Max Planck Institute for Cognitive and
Brain Sciences in Leipzig. Moreover, we thank Caroline Féry for discussions on the prosodic aspects of the study, and Sylvia Stasch for her support in ERP data collection. We also thank two anonymous reviewers for helpful comments on an earlier version of the paper. This project has been supported by the Human Frontier Science Program (grant awarded to Kai Alter; HFSP, RGP 0053/2002).

References

Beach, C. M. (1991). The interpretation of prosodic patterns at points of syntactic structure ambiguity: Evidence for cue trading relations. Journal of Memory and Language, 30, 644–663.
Caspers, J. (2000). Experiments on the meaning of four types of single-accent intonation patterns in Dutch. Language and Speech, 43, 127–161.
Connolly, J. F., & Philipps, N. A. (1994). Event related brain potential components reflect phonological and semantic processing of the terminal word of spoken sentences. Journal of Cognitive Neuroscience, 6, 256–266.
Connolly, J. F., Phillips, N. A., Stewart, S. H., & Brake, W. G. (1992). Event-related potential sensitivity to acoustic and semantic properties of terminal words in sentences. Brain and Language, 43, 1–18.
Connolly, J. F., Steward, S. H., & Phillips, N. A. (1990). The effects of processing requirements on neuropsychological responses to spoken sentences. Brain and Language, 39, 302–318.
D'Arcy, R. C. N., Connolly, J. F., Service, E., Hawco, C., & Houlihan, M. (2004). Separating phonological and semantic processing in auditory sentence processing: A high-resolution event-related brain potential study. Human Brain Mapping, 22, 40–51.
Diaz, M. T., & Swaab, T. Y. (in press). Electrophysiological differentiation of phonological and semantic integration in word and sentence contexts. Brain Research.
Eckstein, K., & Friederici, A. D. (2006). It's early: Event-related potential evidence for initial interaction of syntax and prosody in speech comprehension. Journal of Cognitive Neuroscience, 18(10), 1696–1711.
Ferreira, F., Anes, M. D., & Horine, M. D. (1996). Exploring the use of prosody during language comprehension using the auditory moving window technique. Journal of Psycholinguistic Research, 25, 273–290.
Fox, A. (1969). A forgotten English tone. Le Maître Phonétique, 84, 13–14.
Frazier, L. (1979). On comprehending sentences: Syntactic parsing strategies. Bloomington: Indiana University Linguistics Club.
Frazier, L. (1987). Sentence processing: A tutorial review. In M. Coltheart (Ed.), Attention and performance XII: The psychology of reading. London: Erlbaum.
Frazier, L. (1990). Exploring the architecture of the language-processing system. In G. T. M. Altmann (Ed.), Cognitive models of speech processing: Psycholinguistic and computational perspectives. Cambridge, MA: MIT Press.
Friederici, A. D. (2004). Event-related brain potential studies in language. Current Neurology and Neuroscience Reports, 4, 466–470.
Gussenhoven, C. (1993). The Dutch foot and the chanted call. Journal of Linguistics, 29, 37–63.
Hagoort, P., & Brown, C. M. (2000a). ERP effects of listening to speech: Semantic ERP effects. Neuropsychologia, 38, 1518–1530.
Hagoort, P., & Brown, C. M. (2000b). ERP effects of listening to speech compared to reading: The P600/SPS to syntactic violations in spoken sentences and rapid serial visual presentation. Neuropsychologia, 38, 1531–1549.
Heim, S., & Alter, K. (2006). Prosodic pitch accents in language comprehension and production: ERP data and acoustic analyses. Acta Neurobiologiae Experimentalis, 66, 55–68.
Holcomb, P. J., & Neville, H. J. (1990). Auditory and visual semantic priming in lexical decision: A comparison using event-related brain potentials. Language and Cognitive Processes, 5, 281–312.
Holcomb, P. J., & Neville, H. J. (1991). Natural speech processing: An analysis using event-related potentials. Psychobiology, 19, 286–300.
Jasper, H. H. (1958). Report of the committee on the methods of clinical examination in electroencephalography. Electroencephalography and Clinical Neurophysiology, 10, 370–375.
Kaan, E., Harris, A., Gibson, E., & Holcomb, P. (2000). The P600 as an index of syntactic integration difficulty. Language and Cognitive Processes, 15(2), 159–201.
Ladd, D. R. (1978). Stylized intonation. Language, 54, 517–539.
Ladd, D. R. (1996). Intonational phonology. Cambridge, UK: Cambridge University Press.
Leben, W. (1976). The tones in English intonation. Linguistic Analysis, 2, 69–107.
Lehiste, I. (1972). The timing of utterances and linguistic boundaries. Journal of the Acoustical Society of America, 51, 2018–2024.
Lehiste, I. (1973). Phonetic disambiguation of syntactic ambiguity. Glossa, 7, 107–122.
Lewis, J. W. (1970). The tonal system of remote speech. Le Maître Phonétique, 85, 31–36.
Magne, C., Astésano, C., Lacharet-Dujour, A., Morel, M., Alter, K., & Besson, M. (2005). On-line processing of "Pop-Out" words in spoken French dialogues. Journal of Cognitive Neuroscience, 17(5), 740–756.
Magne, C., Schön, D., & Besson, M. (2006). Musician children detect pitch violations in both music and language better than nonmusician children: Behavioral and electrophysiological approaches. Journal of Cognitive Neuroscience, 18(2), 199–211.
Marslen-Wilson, W. D., Tyler, L. K., Warren, P., Grenier, P., & Lee, C. S. (1992). Prosodic effects in minimal attachment. Quarterly Journal of Experimental Psychology, 45A, 73–87.
Oldfield, R. C. (1971). The assessment and analysis of handedness: The Edinburgh inventory. Neuropsychologia, 9, 97–113.
Osterhout, L., & Holcomb, P. J. (1992). Event-related potentials and syntactic anomaly. Journal of Memory and Language, 31, 785–804.
Pannekamp, A., Toepel, U., Alter, K., Hahne, A., & Friederici, A. D. (2005). Prosody-driven sentence processing. Journal of Cognitive Neuroscience, 17(3), 407–421.
Pike, K. (1945). The intonation of American English. Ann Arbor: University of Michigan Press.
Price, P. J., Ostendorf, M., Shattuck-Hufnagel, S., & Fong, C. (1991). The use of prosody in syntactic disambiguation. Journal of the Acoustical Society of America, 90, 2956–2970.
Pynte, J., & Prieur, B. (1996). Prosodic breaks and attachment decisions in sentence parsing. Language and Cognitive Processes, 11, 165–191.
Rugg, M. D., & Coles, M. G. H. (1995). Electrophysiology of mind: Event-related brain potentials and cognition. New York: Oxford University Press.
Sanders, L. D., & Neville, H. J. (2003). An ERP study of continuous speech processing: I. Semantics, syntax, and prosody in native English speakers. Cognitive Brain Research, 15, 228–240.
Schafer, A., Speer, S., Warren, P., & White, S. (2000). Intonational disambiguation in sentence production and comprehension. Journal of Psycholinguistic Research, 29, 169–182.
Schepman, A., & Rodway, P. (2000). Prosody and parsing in coordination structures. Quarterly Journal of Experimental Psychology A, 53(2), 377–396.
Schön, D., Magne, C., & Besson, M. (2004). The music of speech: Music training facilitates pitch processing in both music and language. Psychophysiology, 41(3), 341–349.
Selkirk, E. (1984).
Phonology and syntax: The relation between sound and structure. Cambridge: MIT Press.
Shattuck-Hufnagel, S., & Turk, A. E. (1996). A prosody tutorial for investigators of auditory sentence processing. Journal of Psycholinguistic Research, 25(2), 193–247.
Speer, S. R., Kjelgaard, M. M., & Dobbroth, K. M. (1996). The influence of prosodic structure on the resolution of temporary syntactic closure ambiguities. Journal of Psycholinguistic Research, 25, 247–268.
Steinhauer, K., Alter, K., & Friederici, A. D. (1999). Brain potentials indicate immediate use of prosodic cues in natural speech processing. Nature Neuroscience, 2, 191–196.
Van den Brink, D., Brown, C., & Hagoort, P. (2001). Electrophysiological evidence for early contextual influences during spoken word recognition: N200 versus N400 effects. Journal of Cognitive Neuroscience, 13, 967–985.
Van Petten, C., & Kutas, M. (1987). Ambiguous words in context: An event-related potential analysis of the time-course of meaning activation. Journal of Memory and Language, 26, 188–208.
Warren, P., Grabe, E., & Nolan, F. (1995). Prosody, phonology, and parsing in closure ambiguities. Language and Cognitive Processes, 10, 457–486.