How and when predictability interacts with accentuation in temporally selective attention during speech comprehension

Neuropsychologia 64 (2014) 71–84

Xiaoqing Li a,*, Yong Lu b, Haiyan Zhao a

a Key Laboratory of Behavioral Science, Institute of Psychology, Chinese Academy of Sciences, Beijing, China
b Academy of Psychology and Behavior, Tianjin Normal University, Tianjin, China
* Corresponding author: Key Laboratory of Behavioral Science, Institute of Psychology, Chinese Academy of Sciences, No. 16 Lincui Road, Chaoyang District, Beijing 100101, China. Tel.: +86 10 64864012; fax: +86 10 64872070. E-mail: [email protected] (X. Li).

Article info

Article history: Received 4 March 2014; Received in revised form 10 September 2014; Accepted 12 September 2014; Available online 22 September 2014

Keywords: Speech processing; Temporally selective attention; Accentuation; Predictability

Abstract

The present study used EEG to investigate how and when top-down prediction interacts with bottom-up acoustic signals in temporally selective attention during speech comprehension. Mandarin Chinese spoken sentences were used as stimuli. We systematically manipulated the predictability and de-/accentuation of the critical words in the sentence context. Meanwhile, a linguistic attention probe ‘ba’ was either presented concurrently with the critical words or not. The results showed that, first, words with a linguistic attention probe elicited a larger N1 than those without a probe. The latency of this N1 effect was shortened for accented or lowly predictable words, indicating that more attentional resources were allocated to these words. Importantly, prediction and accentuation showed a complementary interplay on the latency of this N1 effect, demonstrating that when the words had already attracted attention due to low predictability or to the presence of pitch accent, the other factor no longer modulated attention allocation. Second, relative to the lowly predictable words, the highly predictable words elicited a reduced N400 and enhanced gamma-band power increases, especially under the accented conditions; moreover, under the accented conditions, shorter N1 peak latencies were found to correlate with larger gamma-band power enhancement, which indicates that a close relationship might exist between early selective attention and later semantic integration. Finally, the interaction between top-down selective attention (driven by prediction) and bottom-up selective attention (driven by accentuation) occurred before lexical-semantic processing, namely before the N400 effect evoked by predictability, a finding that is discussed with regard to models of language comprehension.

1. Introduction

The speech signal provides a large amount of information and unfolds rapidly in time, which presents significant challenges to the auditory perception and comprehension systems. Listeners must determine which time points of the rapidly changing acoustic signal need to be processed in detail. Temporally selective attention is therefore critical for speech comprehension, since it helps to allocate attentional resources to time windows that contain more important information (Astheimer and Sanders, 2009). A striking feature of selective attention is that it depends not only on “bottom-up” signals, such as sensory salience, but also on “top-down” information, such as our prior knowledge or predictions. With regard to the relationship between these top-down and bottom-up processes, previous studies have mainly focused on visual–spatial attention. How people use temporally selective attention
to process speech is still a relatively underdeveloped area. Thus, the present study aimed to investigate how the bottom-up process driven by acoustic salience (namely, accentuation) and the top-down process driven by prior prediction interact with each other in temporally selective attention during speech comprehension.

Accentuation is one type of prosodic information in the speech signal; it reflects the relative prominence of a particular syllable, word, or phrase in an utterance and is realized mainly by modulations of pitch (Shattuck-Hufnagel and Turk, 1996). In West-Germanic languages, it is by now well known that speakers tend to place a pitch accent on new or focused information, while leaving given information de-accented (Gussenhoven, 1983; Ladd, 1996). In Chinese, new or focused information is also encoded via accentuation, which is mainly realized by the expansion of the pitch range of the lexical tone (Chen, 2006; Chen and Gussenhoven, 2008; Xu, 1999; Jia et al., 2008, 2006; Liu and Xu, 2006; Wang et al., 2002). Previous psycholinguistic studies on accentuation mainly focused on the correspondence between accentuation and information structure. Behavioral studies found that speech processing is facilitated when new information is accented and given information de-accented (e.g., Bock and
Mazzella, 1983; Dahan et al., 2002; Terken and Noteboom, 1987). Studies using EEG (electroencephalogram) have also reported immediate ERP (event-related potential) effects (a broadly distributed negativity, N400, or P300) for a missing pitch accent on new information or a superfluous pitch accent on given information (e.g., Dimitrova et al., 2012; Hruska et al., 2000; Johnson et al., 2003; Li et al., 2008b; Magne et al., 2005; Toepel et al., 2007). These results suggest that accentuation can influence the ease with which the current speech signal is processed.

As to the specific mechanisms by which accentuation affects speech processing, some researchers propose that accentuation can modulate attention allocation (e.g., Cutler, 1976; Li and Ren, 2012; Sanford et al., 2006). For example, using a phoneme monitoring task, Cutler (1976) showed heightened attention (as indicated by faster phoneme monitoring responses) to a word that received a pitch accent. Subsequently, using a change detection task, Sanford et al. (2006) found that listeners' ability to detect a one-word alteration between two presentations of a spoken discourse was better under the contrastive accent condition than under the non-contrastive accent condition. Using EEG, Li and Ren (2012) also found that semantically incongruent words elicited a larger N400 than semantically congruent words when the corresponding words were accented; however, no significant difference was observed between the incongruent and congruent words when they were de-accented. These results indicate that accentuation can modulate listeners' selective attention: accentuation guides listeners to allocate more attention to accented information and to process it more deeply. Therefore, listeners are better able to detect a semantic incongruence, a word alteration, or a phoneme probe under the accented condition (Cutler, 1976; Li and Ren, 2012; Sanford et al., 2006).

Besides bottom-up sensory signals such as accentuation, prior knowledge or predictions also influence language comprehension. The effect of predictability on language comprehension has been studied extensively in both spoken language comprehension and reading. Eye-tracking studies revealed that highly predictable words are read more quickly and skipped more often than lowly predictable words (e.g., Frisson et al., 2005). ERP studies have also demonstrated that, in a sentence or discourse context, a word with a high cloze probability (namely, a high level of predictability) elicits a reduced N400 compared with a word with a low cloze probability (e.g., DeLong et al., 2005; Van Berkum et al., 2005; Federmeier, 2007; Laszlo and Federmeier, 2009; Thornhill and Van Petten, 2012). These results indicate that highly predictable words are more easily processed and integrated into the sentence or discourse context. Some fMRI and MEG studies have also examined how predictability influences early perceptual processing and found that predicted stimuli evoke reduced neural responses in early visual/auditory cortex (Alink et al., 2010; den Ouden et al., 2010; Todorovic et al., 2011; Sohoglu et al., 2012). For example, Sohoglu et al. (2012) manipulated the predictability of speech content by presenting matching, mismatching, or neutral text before speech onset. They found that the provision of prior knowledge reduces activity in the superior temporal gyrus, a region considered to be involved in perceptual aspects of speech processing.
Although the precise relationship between predictive sensory coding and attention is still the subject of ongoing debate, the sensory attenuation of predicted signals is consistent with the possibility that the processor directs less attention to predicted external signals. However, other studies have reported inconsistent results, revealing that prediction sometimes seems to enhance rather than reduce sensory signals (Doherty et al., 2005; Chaumon et al., 2008). To resolve this inconsistency, Kok et al. (2012) further examined how predictability and cued spatial attention
affect early visual perceptual processing. They found that, at the unattended location, predicted stimuli reduced the neural response in early visual cortex; in contrast, at the attended location, predicted stimuli enhanced the neural response in early visual cortex. That is, the effect of predictability on early perceptual processing is modulated by the amount of attentional resources already allocated to the external stimuli.

Until now, only a few studies have directly examined how predictability influences temporally selective attention during speech processing. In an ERP study, Astheimer and Sanders used an attention probe paradigm to investigate whether listeners allocate attentional resources to time windows that contain highly important acoustic information, such as word onsets. Attention probes were presented concurrently with word onsets, beginning 50 and 100 ms before and after word onsets, and at random control intervals. They found that the linguistic probe ‘ba’ presented at word onset elicited a larger-amplitude N1 than probes presented at other time points, suggesting that listeners direct attention to moments that contain word onsets (Astheimer and Sanders, 2009). Subsequently, Astheimer and Sanders (2011) further explored why listeners attend to word onsets in speech. Based on transitional probabilities, word onsets are relatively unpredictable (Aslin et al., 1999). It might be that listeners tend to allocate more resources to times at which unpredictable information is presented, since unpredictable segments are highly informative. To test this hypothesis, they measured ERPs elicited by syllable onsets in an artificial language. Participants listened to a stream of artificial nonsense words arranged in pairs, such that the second word in each pair was completely predictable. After recognition training, the unpredictable first words elicited a larger N1; this enhancement was absent for the completely predictable second word in each pair. These results provided solid evidence that listeners are most likely to attend to the segments of speech that are less predictable (Astheimer and Sanders, 2011).

Taken together, the above findings make it clear that both predictability and sensory salience, such as accentuation, influence temporally selective attention during speech processing. However, several questions remain to be explored. First, although previous studies have shown that accentuation plays a role in modulating attention allocation during spoken language comprehension, their findings are based on behavioral measures or on semantic congruence effects (e.g., Cutler, 1976; Li and Ren, 2012; Sanford et al., 2006). Therefore, these studies cannot tell us whether accentuation can modulate selective attention at an early stage of information processing, that is, before lexical-semantic processing or decision making. Second, with the help of an artificial-word training paradigm, the study by Astheimer and Sanders (2011) demonstrated that listeners allocate more resources to less predictable moments in the speech signal. However, we do not know whether, during natural speech comprehension, predictions derived from sentence context have the same effect on selective attention.
Third and most importantly, it is completely unknown how, or even whether, the bottom-up process driven by sensory salience and the top-down process driven by prior prediction interact with each other in temporally selective attention during speech comprehension. If they do, at what functional stage does the top-down process begin to affect the bottom-up process? As to the functional stage at which top-down knowledge interacts with bottom-up sensory signals, different models have been put forward. One proposal assumes that language processing is strictly feedforward, with semantic contextual information and bottom-up sensory information integrated only at a later decision stage (Fodor, 1983; McQueen et al., 2006; Norris et al., 2000). In contrast, the TRACE (McClelland and Elman, 1986; McClelland
et al., 2006; Friston, 2010) and “predictive coding” (Gagnepain et al., 2012; Sohoglu et al., 2012; Wild et al., 2012; Spratling, 2008) models argue that higher-level knowledge can directly influence pre-lexical processes through feedback connections. Recently, some neuroimaging studies have further examined the interaction between top-down knowledge and bottom-up sensory signals, and their results were mixed, with some findings supporting the early-interaction account (Guediche et al., 2013; Obleser and Kotz, 2010; Sohoglu et al., 2012) and others supporting the later-interaction account (Davis et al., 2011). In the field of language comprehension, the functional stage at which top-down knowledge begins to interact with bottom-up signals remains unsettled.

Therefore, the aims of the present study were to further investigate the effects of accentuation and prediction on temporally selective attention during speech comprehension; more importantly, to determine how, or even whether, the top-down process driven by prior prediction interacts with the bottom-up process driven by acoustic signals; and, if it does, at which functional stage the two processes interact with each other.

To study temporally selective attention, EEG and an attention probe paradigm were used in the present study. Selective auditory attention has long been studied with the help of event-related brain potentials (ERPs). Whether selected on the basis of location or time, attended transient auditory stimuli elicit an N1 that increases in amplitude (Hillyard et al., 1973; Hink and Hillyard, 1976; Näätänen and Picton, 1987; Näätänen and Winkler, 1999) and/or decreases in latency (Folyi et al., 2012; Lagemann et al., 2010; Obleser and Kotz, 2011) as compared with unattended stimuli. However, not all portions of the unfolding speech signal include abrupt acoustic changes comparable to such transient auditory stimuli. Therefore, as in earlier studies (Astheimer and Sanders, 2009; Coch et al., 2005; Stevens et al., 2006), a temporal variant of the Posner probe paradigm (Posner, 1980) was used to index selective attention at different times during speech processing. In the Posner probe paradigm, an attention probe (namely, a target stimulus) is presented at the attended or unattended location that is oriented by a preceding cue. In the temporal attention probe paradigm, an auditory probe is superimposed at different time points of the speech signal. Based on previous studies, the N1 elicited by the probe is considered a correlate of focal attention, with an enhancement of N1 amplitude and/or a shortening of N1 latency reflecting more attention allocated to the corresponding time point.

In addition to the N1 component, we were also interested in an ERP component and oscillatory brain activity related to semantic processing or integration: the N400 and the gamma frequency band. The N400 has been found to reflect the ease with which a word is integrated into the current context, with an enhanced N400 observed for the more difficult condition (e.g., Hagoort and Brown, 2000; Federmeier, 2007; Kutas and Federmeier, 2000). Gamma-band power has been found to increase for semantically or pragmatically congruent words as compared with words that are incongruent with respect to their sentence context (30–45 Hz or 35–45 Hz: Hald et al., 2006; Hagoort et al., 2004). It appears that normal sentence-level semantic integration is accompanied by increases in gamma-band synchronization.
Although both the N400 and the low gamma frequency band can index semantic processing or integration, there are differences between them: the N400 reflects only phase-locked neural activity, whereas oscillatory brain activity reflects both phase-locked and non-phase-locked activity; the N400 also has a higher temporal resolution, whereas gamma activity has a relatively lower temporal resolution. An earlier EEG study that examined the relationship between early perceptual processing and later semantic processing reported a significant N1–gamma correlation, but no N1–N400 correlation (Obleser and Kotz, 2011). Therefore, oscillatory brain activity can provide additional information about the underlying cognitive processes.


The materials used in the present study were isolated Chinese spoken sentences, i.e., sentences presented without explicitly provided discourse-level contextual information. With such isolated sentences, we could investigate how accentuation influences selective attention without interference from the correspondence between accentuation and the information state derived from a discourse context. In the present study, the critical words in the sentence were highly predictable or lowly predictable. Meanwhile, the critical words were accented or de-accented. Additionally, a linguistic attention probe ‘ba’ was either presented concurrently with the critical words or not. On the one hand, we compared the N400 components and oscillatory activities elicited by the four conditions without an attention probe to examine how prediction and accentuation influence the later stage of semantic processing. On the other hand, we subtracted the ERP waveforms elicited by the critical words without a probe from those elicited by the corresponding words (namely, the words with the same degree of predictability and accentuation) with a probe, which isolated the N1 effect triggered by the attention probe. By examining how prediction and accentuation modulated the N1 effect elicited by the attention probe, we could determine how and when the two sources of information interact with each other in temporally selective attention. If their interaction occurs at the pre-lexical stage, the time point at which their interaction occurs should be earlier than the onset of the semantic processing effect (as indicated by the onset of the N400 effect).

2. Methods

2.1. Participants

Twenty-four right-handed university students (12 males), all of whom were native Mandarin Chinese speakers, participated in this experiment. The mean age was 23 years (range 21–26). None reported any medical, neurological, or psychiatric illness, and all gave informed consent. The data of 4 participants (2 males) were removed from the analysis because of excessive artifacts.

2.2. Stimuli

Mandarin Chinese sentences, spoken by a female speaker and recorded at a sampling rate of 22050 Hz, were used as stimuli. The experimental materials consisted of 148 sets of sentences, with each sentence including a critical word. The critical words were all double-character nouns and were not in the sentence-final position. First, the sentences in each set had the same critical word (e.g., ROSES), which was highly predictable (e.g., On Valentine's Day, he bought some ROSES for his girlfriend.) or lowly predictable (e.g., On holidays, he bought some ROSES for his girlfriend.) given the preceding sentence context. By using exactly the same critical words under the high- and low-prediction conditions, we prevented differences between words from becoming a confounding factor. The critical words were congruent not only in the high-prediction context but also in the low-prediction context. Secondly, the critical words in the sentence were accented or de-accented. For the de-accented condition, the sentence accent was naturally placed on the sentence-final words. In Mandarin Chinese, except for accentuation, there are also some lexical markers for focus constituents, such as shi. Therefore, when constructing the materials, it was ensured that no lexical marker of focus appeared in the sentences. Thirdly, a linguistic attention probe was either added to the critical word in each sentence or not. The linguistic attention probe was created by extracting a 50 ms excerpt of the narrator pronouncing the syllable “ba” spoken with a light tone. This probe was added 100 ms after the acoustic onset of the critical words, since at the very beginning of the critical word the acoustic parameters of accentuation have not yet been distinctly pronounced. The probe had an intensity of 64 dB under the accented condition and 63 dB under the de-accented condition, since the intensity of the accented words was almost 1 dB higher than that of the de-accented words (see the following paragraphs for the acoustic analysis of the critical words). Taken together, this resulted in a full factorial design with all combinations of the factors Prediction (high vs. low), Accentuation (accented vs. de-accented), and Probe (with vs. without) (see Table 1 for example sentences under the four experimental conditions: ‘High-prediction, accented’, ‘High-prediction, de-accented’, ‘Low-prediction, accented’, ‘Low-prediction, de-accented’).
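To make the probe construction concrete, the following is a minimal sketch, not the authors' actual stimulus-preparation script, of how a 50 ms probe could be mixed into a sentence waveform 100 ms after the critical-word onset, with an optional ~1 dB attenuation for the de-accented condition. The sampling rate matches the recordings; the waveforms, onset time, and gain below are synthetic placeholders.

```python
import numpy as np

FS = 22050  # sampling rate of the recordings (Hz)

def add_probe(sentence, probe, word_onset_s, delay_s=0.100, gain_db=0.0):
    """Mix a short probe into the sentence waveform, starting `delay_s`
    after the acoustic onset of the critical word. `gain_db` allows the
    ~1 dB attenuation used for the de-accented condition."""
    out = sentence.copy()
    start = int(round((word_onset_s + delay_s) * FS))
    out[start:start + len(probe)] += probe * 10.0 ** (gain_db / 20.0)  # dB -> linear gain
    return out

# synthetic stand-ins for a 3-s sentence and a 50-ms 'ba' excerpt
rng = np.random.default_rng(0)
sentence = 0.05 * rng.standard_normal(3 * FS)
probe = 0.1 * np.sin(2 * np.pi * 220 * np.arange(int(0.05 * FS)) / FS)

with_probe = add_probe(sentence, probe, word_onset_s=1.2, gain_db=-1.0)
```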


Table 1
Illustrations of the experimental materials used in the present study.

Conditions                     Example sentences
High-prediction; accented      On Valentine's Day, Xiao Li bought some ROSES for his girlfriend. 小李在情人节那天买了一些玫瑰给女朋友。
High-prediction; de-accented   On Valentine's Day, Xiao Li bought some roses for his girlfriend. 小李在情人节那天买了一些玫瑰给女朋友。
Low-prediction; accented       On holidays, Xiao Li bought some ROSES for his girlfriend. 小李在过节的那天买了一些玫瑰给女朋友。
Low-prediction; de-accented    On holidays, Xiao Li bought some roses for his girlfriend. 小李在过节的那天买了一些玫瑰给女朋友。

Note. The critical word in each example sentence is roses (玫瑰); capitals mark the conditions in which this word carried the pitch accent (in the original table the critical words were underlined and pitch accent was indicated by bold italics).

To validate the degree of predictability of the critical word in the sentence context, we conducted a cloze probability test by presenting the sentence frames up to the word before the critical word position. Twenty-four participants who did not take part in the EEG experiment or the other pre-tests were instructed to fill in the first completion that came to mind and made the sentence meaningful. Under the high-prediction condition, the cloze probabilities of the critical words averaged 79% (range 50–100%). Under the low-prediction condition, the cloze probabilities of the critical words averaged 4% (range 0–41%). Meanwhile, even the best completion under the low-prediction condition, which sometimes differed from the critical word we used, had a cloze probability of only 24% (range 8–41%). Paired t-tests revealed that the cloze probability of the critical words under the high-prediction condition was not only higher than the cloze probability of the critical words under the low-prediction condition (t(147) = 51.63, p < 0.0001) but also higher than the cloze probability of the best completion under the low-prediction condition (t(147) = 37.87, p < 0.0001).

To confirm that the critical words were congruent under both the high- and low-prediction conditions, we conducted a congruence pre-test by presenting the written sentence up to and including the critical word. Twenty-four participants who did not take part in the EEG experiment or the other pre-tests were instructed to rate the semantic congruence of the last word in each sentence on a 5-point scale (from −2 to 2). Negative values indicated that the last word was incongruent given the sentence context, and positive values indicated that it was congruent; the larger the score, the more congruent the last word. The mean scores for the high-prediction and low-prediction conditions were 1.82 (STDEV = 0.26, range 0.4–2) and 1.32 (STDEV = 0.44, range 0.3–2) respectively. One-sample t-tests showed that both the rating scores under the high-prediction condition (t(147) = 94.04, p < 0.0001) and those under the low-prediction condition (t(147) = 36.26, p < 0.0001) were significantly larger than 0. The rating results indicated that the critical words were congruent under both the high- and low-prediction conditions.

To establish that our speaker had succeeded in correctly accenting the critical words, ANOVAs were performed on the corresponding acoustic measurements, with Prediction (high vs. low) and Accentuation (accented vs. de-accented) as independent factors. Mandarin Chinese is a tone language with four lexical tones: HH (High tone), LH (Rising tone), LL (Low tone), and HL (Falling tone). The pitch correlate of the lexical tone in Mandarin Chinese is not a single point but a pitch contour, which is called the pitch register. Previous studies found that, in Mandarin Chinese, focus accent is realized by the expansion of the pitch range of the lexical tone (pitch register) and by the lengthening of syllable duration (Chen, 2006; Chen and Gussenhoven, 2008; Xu, 1999; Jia et al., 2008, 2006; Liu and Xu, 2006; Wang et al., 2002). As to what gives rise to the pitch range expansion, some studies found that, for two-character words, focus accent mainly raises the pitch maximum of H tones and has no significant effect on L tones (Jia et al., 2008, 2006).
The critical words in the present study were all two-character words. Even when the two characters were both Low tones (LL), third-tone sandhi changed the first of the two Low tones into LH; that is, all of the critical words in the present study included at least one H tone. Given the acoustic realization of focus accent in Mandarin Chinese and the materials used in the present study, the dependent measures for the ANOVAs were the pitch maximum, pitch range, and syllable duration of the critical words (namely, from the acoustic onset to the acoustic offset of the critical words). Although intensity is not a reliable acoustic cue to accentuation, we added intensity as an additional dependent measure. Pitch values were converted to semitones:

Pitch maximum = 12 × log2(maximum pitch / 100)

Pitch range = 12 × log2(maximum pitch / minimum pitch)

The results of the ANOVAs (with the duration, intensity, pitch maximum, or pitch range of the critical words as dependent measures) revealed a significant main effect of Accentuation (F(1, 147) = 1828.94, p < 0.0001; F(1, 147) = 21.90, p < 0.0001; F(1, 147) = 688.45, p < 0.0001; F(1, 147) = 128.31, p < 0.0001 for duration, intensity, pitch maximum, and pitch range respectively), indicating significant increases in syllable duration, intensity, pitch maximum, and pitch range from de-accented words to accented words. Importantly, both the main effect of
Prediction (all p > 0.5) and the two-way interaction between Accentuation and Prediction failed to reach significance. The acoustic measurements thus confirmed that the experimental sentences were spoken with the intended accentuation pattern (see Fig. 1). Moreover, the results of the acoustic analysis demonstrated that there was no difference between the acoustic parameters (duration, intensity, and pitch) of highly predictable words and those of lowly predictable words.

The experimental materials (148 sets, with each set including 8 versions of a sentence) were grouped into 4 lists of 296 sentences according to a Latin square procedure based on the four experimental conditions (combinations of Accentuation and Probe). That is, the high-prediction sentence and the low-prediction sentence from the same stimulus set were included in the same list, since they had different sentence-level meanings. Consequently, the same critical word was presented twice in each list, because the sentences under the high- and low-prediction conditions shared the same critical word. In each list, there were an equal number of sentences (37) for each of the eight experimental conditions, plus an additional 60 filler sentences. Subjects were divided into 4 groups, with each group listening to only one list of materials. Meanwhile, the whole list of sentences (356 sentences) was divided into four blocks, with the first and second presentations of the same critical word separated by one block; the order of the two presentations of the same critical word was counterbalanced between subjects.
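For illustration, the semitone conversions used in the acoustic analysis above can be written as a short sketch; the 100 Hz reference follows the pitch-maximum formula, and the example F0 values are hypothetical.

```python
import numpy as np

def pitch_max_semitones(max_pitch_hz, ref_hz=100.0):
    """Pitch maximum in semitones relative to a 100 Hz reference."""
    return 12.0 * np.log2(max_pitch_hz / ref_hz)

def pitch_range_semitones(max_pitch_hz, min_pitch_hz):
    """Pitch range (register span) of a word in semitones."""
    return 12.0 * np.log2(max_pitch_hz / min_pitch_hz)

# e.g., a critical word whose F0 spans 180-320 Hz
print(pitch_max_semitones(320.0))           # ~20.1 semitones above 100 Hz
print(pitch_range_semitones(320.0, 180.0))  # ~10.0 semitones
```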

2.3. Procedure

After the electrodes were positioned, subjects were asked to listen to each sentence for comprehension while their EEG was recorded. At the end of each of 56 sentences distributed across the materials, the subjects were asked to judge the correctness of a question sentence about the meaning of the sentence they had just heard. Each trial consisted of a 300 ms auditory warning tone, followed by 700 ms of silence and then the target sentence. To inform subjects of when to fixate and sit still for the EEG recording, an asterisk was displayed from 500 ms before the onset of the sentence until 1000 ms after its offset. After a short practice session consisting of 10 sentences, the trials were presented in four blocks of approximately 13 min each, separated by brief rest periods.

2.4. EEG acquisition

EEG was recorded (0.05–100 Hz, sampling rate 500 Hz) from 64 Ag/AgCl electrodes mounted in an elastic cap, referenced on-line to the left mastoid and re-referenced off-line to the algebraic mean of the left and right mastoids. EEG and EOG data were amplified with AC amplifiers (SynAmps, Neuroscan Inc.). Vertical eye movements were monitored via a supra- to sub-orbital bipolar montage, and a right-to-left canthal bipolar montage was used to monitor horizontal eye movements. All electrode impedances (EEG and EOG) were kept below 5 kΩ.

2.5. ERP analysis

For the ERP analysis, the raw EEG data were first corrected for eye-blink artifacts using the ocular artifact reduction algorithm in the Neuroscan v. 4.3 software package. The EEG data were then band-pass filtered at 0.1–40 Hz. Subsequently, the filtered data were divided into epochs ranging from 100 ms before to 1000 ms after the acoustic onset of the critical words. The 100 ms preceding the onset of the critical words was used for baseline correction. Trials contaminated by eye movements, muscle artifacts, electrode drift, amplifier saturation, or other artifacts were identified with semiautomatic artifact rejection (automatic criterion: signal amplitude exceeding ±75 µV, followed by a manual check). Trials containing such artifacts were rejected (13.5% overall), and rejected trials were evenly distributed among conditions. Finally, averages were computed for each participant, each condition, and each electrode site, before grand averages were calculated across all participants.
Fig. 1. The acoustic parameters (duration, intensity, pitch maximum, and pitch range) of the critical words under the four experimental conditions: High-accent indicates ‘High-prediction, accented’; High-deaccent indicates ‘High-prediction, de-accented’; Low-accent indicates ‘Low-prediction, accented’; Low-deaccent indicates ‘Low-prediction, de-accented’.

Fig. 2 shows overlays of the ERP waveforms elicited by the critical words with an attention probe and the ERP waveforms elicited by the critical words without a probe. We subtracted the ERP waveforms elicited by the critical words without a probe from those elicited by the corresponding critical words with a probe, which isolated the N1 effect triggered by the attention probe and resulted in four difference waveforms. Fig. 3 shows the difference waveforms under the four experimental conditions (‘High-prediction, accented’, ‘High-prediction, de-accented’, ‘Low-prediction, accented’, and ‘Low-prediction, de-accented’). As seen in Fig. 2B, relative to the de-accented and highly predictable words, the critical words under the other three conditions all elicited larger negative deflections, which reached their maximum around 400–500 ms post-word onset (namely, relative to the acoustic onset of the critical words). We classified these negative effects as N400 effects, since their latency and topography (see Fig. 2) fit the characteristics of an N400 effect. Meanwhile, as seen in Fig. 3, the N1 effect elicited by the attention probe reached its maximum around 180 ms after the onset of the probe (namely, 280 ms post-word onset).

For the N1 effect elicited by the attention probe, statistical analyses were conducted on the mean amplitudes and latencies respectively, using analyses of variance over a selection of midline electrodes and lateral electrodes. First, to establish whether an N1 effect was elicited by the attention probe, ANOVAs-1 were conducted on the original ERP waveforms under the eight conditions: for the analysis over the midline electrodes, the independent factors were Prediction (high vs. low), Accentuation (accented vs. de-accented), Probe (with vs. without), and Anteriority (anterior: FZ/FCZ, central: CZ/CPZ, and posterior: PZ/POZ); for the lateral electrodes, the mean amplitude values were entered into the analysis with Hemisphere (left, right) as an additional factor and the lateral electrodes (F5/F3/FC3; F4/F6/FC4; C5/C3/CP3; C4/C6/CP4; P5/P3/PO3; P4/P6/PO4) nested under Hemisphere. Secondly, to examine how Accentuation and Prediction modulated the probe-related N1 effects, ANOVAs-2 were performed on the difference waveforms: over the midline electrodes, the independent factors were Prediction, Accentuation, and Anteriority; over the lateral electrodes, the independent factors were Prediction, Accentuation, Anteriority, and Hemisphere. Finally, to examine how Accentuation and Prediction affected the later stage of semantic processing, ANOVAs-3 were conducted on the original ERP waveforms elicited by the critical words without an attention probe. Relative to the ‘High-prediction, de-accented’ condition, the N400 effect evoked by the other three conditions lasted a relatively long time. We were mainly
interested in the statistical analysis conducted within the time window reflecting this N400 effect (namely, 300–650 ms post-word onset). In addition, statistical analyses were also performed on two early latency windows: 130–170 ms (the N1 elicited by the critical words) and 200–300 ms post-word onset. The independent factors were the same as in the ANOVAs-2; the dependent measure was the mean amplitude (µV) in the corresponding time window. Whenever the degrees of freedom in the numerator exceeded one, the Greenhouse-Geisser correction was applied.
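The ERP pipeline described above (0.1–40 Hz band-pass, −100 to 1000 ms epochs, 100 ms baseline, ±75 µV rejection, per-condition averages, and with-minus-without-probe difference waves) can be sketched as follows. This is an illustrative re-implementation in MNE-Python with synthetic data and made-up event codes, not the Neuroscan procedure the authors actually used.

```python
import numpy as np
import mne  # stand-in toolbox for illustration; the authors used Neuroscan software

sfreq = 500.0
rng = np.random.default_rng(0)

# synthetic stand-in for one subject's continuous EEG (60 channels, 5 minutes)
info = mne.create_info([f"EEG{i:02d}" for i in range(60)], sfreq, ch_types="eeg")
raw = mne.io.RawArray(rng.standard_normal((60, int(sfreq * 300))) * 5e-6, info)
raw.filter(0.1, 40.0)  # band-pass as in the ERP analysis

# hypothetical event codes: 1 = critical word with probe, 2 = without probe
onsets = np.arange(1000, 140000, 2000)
events = np.column_stack([onsets, np.zeros_like(onsets),
                          np.tile([1, 2], len(onsets) // 2)])

epochs = mne.Epochs(raw, events, {"with_probe": 1, "without_probe": 2},
                    tmin=-0.1, tmax=1.0,        # -100 to 1000 ms around word onset
                    baseline=(None, 0.0),       # 100-ms pre-word baseline
                    reject=dict(eeg=75e-6),     # +/-75 uV amplitude criterion
                    preload=True)

# condition averages and the with-minus-without-probe difference wave,
# which isolates the N1 effect triggered by the attention probe
n1_diff = mne.combine_evoked([epochs["with_probe"].average(),
                              epochs["without_probe"].average()],
                             weights=[1, -1])
```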

2.6. EEG time-frequency analysis

Event-related spectral perturbation (ERSP) was used to characterize oscillatory brain activity. The ERSP represents the time–frequency representations (TFRs) averaged across single trials, and thus contains both phase-locked and non-phase-locked brain activity. As with the statistical analysis of the N400 component, the time–frequency analysis was conducted on the four experimental conditions without an attention probe. The raw EEG data were first corrected for eye-blink artifacts using the ocular artifact reduction algorithm in the Neuroscan v. 4.3 software package, band-pass filtered at 1–80 Hz, and divided into epochs ranging from 1000 ms before to 1500 ms after the acoustic onset of the critical words. Trials contaminated by artifacts were rejected automatically using the artifact-rejection algorithm in the Neuroscan v. 4.3 software package (signal amplitude exceeding ±75 µV). We then removed residual ocular and muscle artifacts using the ‘reject data using ICA’ procedure of the EEGLAB toolbox (EEGLAB 10.2.5.5b, http://www.sccn.ucsd.edu/eeglab). Subsequently, ERSPs were calculated using EEGLAB running under Matlab 7.5 (MathWorks, Natick, MA, USA). We applied Morlet wavelet decomposition (Goupillaud et al., 1984) to the 2500-ms data epochs, using 150 linearly-spaced time points and a series of 32 log-spaced frequencies ranging from 29 Hz to 60 Hz, with 25 cycles at the lowest frequency and 29 cycles at the highest frequency. Power values were normalized with respect to a −500 to 0 ms pre-word-onset baseline and transformed to a decibel scale (10 × log10 of the signal), yielding the ERSPs.

The cluster-based permutation test implemented in the FieldTrip software package (http://fieldtrip.fcdonders.nl; Maris and Oostenveld, 2007) was used to identify the specific frequency bands and time windows in which the ERSP values differed significantly between conditions. This non-parametric statistical procedure optimally handles the multiple-comparisons problem. The permutation test was performed within 0–800 ms post-stimulus onset (16 time points) on the 30–60 Hz frequency range. Sixty electrodes (PO7 and PO8 were excluded, since PO8 was connected to the right mastoid during the on-line EEG recording) were included in this test.
Fig. 2. Grand-average ERPs and topographies of the Prediction effects. A: Grand-average ERPs time-locked to the critical words that were added with a linguistic attention probe under the four experimental conditions; B: Grand-average ERPs time-locked to critical words that were not added with a linguistic attention probe under the four experimental conditions; C: Topography of the Prediction effects (ERP effects time-locked to the critical words when these words were not added with an attention probe) at different levels of Accentuation.

For every data point (electrode by time by frequency) of two conditions, a simple dependent-samples t test is performed. All adjacent data points exceeding a preset significance level (0.05) are grouped into clusters. Cluster-level statistics are calculated by taking the sum of the t-values within every cluster, and the significance probability of the clusters is estimated by means of the Monte Carlo method with 1000 random draws. The permutation test revealed that, under the accented conditions, the contrast of “high-prediction” and “low-prediction” showed that highly predictable words induced larger gamma power increases around 35–50 Hz within 250–550 ms post-word onset (p < 0.05); under the de-accented conditions, the same contrast revealed no reliable effect in the gamma frequency range.

In addition, to further confirm the results of the cluster-based permutation test, we conducted ANOVAs with mean gamma-band power (35–50 Hz, dB) as the dependent measure in consecutive 50-ms time windows from 0 to 800 ms post-word onset. The ERSP values were averaged for the left frontal (F5, F3, FC3), right frontal
(F4, F6, FC4), left central (C5, C3, CP3), right central (C4, C6, CP4), left posterior (P5, P3, PO3), and right posterior (P4, P6, PO4) regions. The data were then analyzed by repeated-measures ANOVA with Prediction, Accentuation, Anteriority, and Hemisphere as independent factors. The results showed that both the main effect of Prediction and the simple main effect of Prediction under the accented conditions were consecutively significant or marginally significant (all p < 0.06) from the 300–350 ms window until the 500–550 ms window post-word onset; in addition, in the 250–300 ms window, the main effect of Prediction was marginally significant (p = 0.067) and the simple main effect of Prediction under the accented conditions reached significance (p < 0.05). That is, the cluster-based permutation test and the consecutive 50-ms ANOVAs revealed the same pattern of results.

Based on the above results, the ERSP values for each participant, over each electrode, and under each condition were averaged in the 30–50 Hz band within 250–550 ms. The data were then analyzed by repeated-measures ANOVA with Prediction, Accentuation, Anteriority (frontal, central, vs. parietal), and Hemisphere (left vs. right) as independent factors, with electrodes (F5/F3/FC3; F4/F6/FC4; C5/C3/CP3; C4/C6/CP4; P5/P3/PO3; P4/P6/PO4) nested under Hemisphere.
Fig. 3. Difference ERP waveforms and peak latencies. A: Difference ERP waveforms under the four experimental conditions (with-probe minus without-probe). B: The peak latency (ms, post-probe onset) of the N1 effect elicited by the attention probe under the four experimental conditions; left, the averaged peak latencies over the lateral electrodes (F5/F3/FC3; F4/F6/FC4; C5/C3/CP3; C4/C6/CP4; P5/P3/PO3; P4/P6/PO4); right, the averaged peak latencies over the midline electrodes (FZ/FCZ, CZ/CPZ, PZ/POZ). High-acc indicates ‘High-prediction, accented’; High-deacc indicates ‘High-prediction, de-accented’; Low-acc indicates ‘Low-prediction, accented’; Low-deacc indicates ‘Low-prediction, de-accented’.

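A compact sketch of the single-trial Morlet decomposition, trial averaging, and dB baseline normalization described in Section 2.6 is given below. It uses MNE-Python routines and random stand-in data rather than the EEGLAB/FieldTrip tools actually used, and the wavelet-cycle rule and array sizes are assumptions made only for illustration.

```python
import numpy as np
from mne.time_frequency import tfr_array_morlet
from mne.stats import permutation_cluster_1samp_test

sfreq = 500.0
times = np.arange(-1.0, 1.5, 1.0 / sfreq)            # epoch: -1000 to 1500 ms
rng = np.random.default_rng(0)

# random stand-in for one subject/condition: trials x channels x samples
epochs_data = rng.standard_normal((20, 12, times.size)) * 1e-6

freqs = np.logspace(np.log10(29), np.log10(60), 32)  # 32 log-spaced gamma-band frequencies
n_cycles = freqs / 2.0                                # assumed cycle rule, for illustration only

power = tfr_array_morlet(epochs_data, sfreq=sfreq, freqs=freqs,
                         n_cycles=n_cycles, output="power")
power = power.mean(axis=0)                            # average over trials -> total (ERSP-type) power

# normalize to the -500 to 0 ms pre-word baseline and convert to decibels
baseline = (times >= -0.5) & (times < 0.0)
ersp_db = 10.0 * np.log10(power / power[..., baseline].mean(axis=-1, keepdims=True))

# group-level cluster-based permutation test (Maris and Oostenveld, 2007) on a
# high- minus low-prediction ERSP difference; X is a random stand-in with shape
# (n_subjects, n_freqs, n_timebins), e.g. 16 bins covering 0-800 ms
X = rng.standard_normal((20, freqs.size, 16))
t_obs, clusters, cluster_pv, _ = permutation_cluster_1samp_test(X, n_permutations=1000)
```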

3. Results

3.1. Behavioral results

For all 20 participants, the accuracy rate for the question sentences was higher than 85%, indicating that the participants indeed listened to the sentences for comprehension during the on-line EEG recording.

3.2. ERP results: the N1 effect elicited by the attention probe

First, for the ANOVAs-1 based on the original waveforms under the eight conditions, the dependent measure was the mean amplitude (µV) obtained from a 40 ms-wide time window around the peak of the corresponding N1 effect. Specifically, the latency windows used to obtain the mean amplitudes were 152–192 ms, 176–216 ms, 152–192 ms, and 160–200 ms post-probe onset for the ‘High-prediction, accented’, ‘High-prediction, de-accented’, ‘Low-prediction, accented’, and ‘Low-prediction, de-accented’ conditions respectively. The ANOVAs-1 revealed a significant main effect of Probe (Fmidline(1,19) = 35.64, p < 0.0001; Flateral(1,19) = 44.99, p < 0.0001), indicating that the critical words with a probe (Mean = −2.515 µV, STDEV = 0.960; mean and STDEV values were calculated over all of the selected lateral and midline electrodes) elicited a larger N1 than those without a probe (Mean = −0.683 µV, STDEV = 0.623).


Table 2
The N1 effect evoked by the attention probe under different combinations of Prediction and Accentuation: p values in consecutive latency windows (ms, post-probe onset).

Source        Electrodes   140–150   150–160   160–170   170–180   180–190   190–200   200–210
Pr            Midline      0.078     0.001     0.0001    0.0001    0.0001    0.0001    0.0001
Pr            Lateral      0.031     0.0001    0.0001    0.0001    0.0001    0.0001    0.0001
Pr × P × A    Midline      0.005     0.012     0.082     0.180     0.341     0.469     0.982
Pr × P × A    Lateral      0.016     0.038     0.228     0.256     0.361     0.585     0.789
Pr on P1A1    Midline      0.001     0.118     0.161     0.002     0.001     0.0001    0.0001
Pr on P1A1    Lateral      0.001     0.334     0.051     0.001     0.001     0.0001    0.0001
Pr on P1A2    Midline      0.023     0.002     0.0001    0.001     0.0001    0.002     0.002
Pr on P1A2    Lateral      0.053     0.002     0.0001    0.0001    0.0001    0.003     0.005
Pr on P2A1    Midline      0.014     0.001     0.001     0.0001    0.0001    0.0001    0.013
Pr on P2A1    Lateral      0.010     0.001     0.0001    0.0001    0.0001    0.0001    0.004
Pr on P2A2    Midline      0.060     0.005     0.004     0.008     0.020     0.032     0.035
Pr on P2A2    Lateral      0.029     0.002     0.002     0.007     0.016     0.036     0.034

Note: the ANOVAs were based on the mean amplitude in the different latency ranges after the onset of the attention probe. The experimental variables were Pr (Probe: with vs. without), P (Prediction: high vs. low), and A (Accentuation: accented vs. de-accented). P1A1 = ‘High-prediction, de-accented’; P1A2 = ‘High-prediction, accented’; P2A1 = ‘Low-prediction, de-accented’; P2A2 = ‘Low-prediction, accented’.

In addition, there was a significant two-way Probe × Anteriority interaction (Fmidline(2,38) = 26.40, p < 0.0001; Flateral(2,38) = 24.91, p < 0.0001), due to the fact that the effect of Probe (the probe-related N1 effect) had a mainly frontal–central distribution, although it reached significance over the anterior, central, and posterior electrodes alike (anterior: Fmidline(1,19) = 46.75, p < 0.0001, Flateral(1,19) = 51.76, p < 0.0001; central: Fmidline(1,19) = 40.95, p < 0.0001, Flateral(1,19) = 49.08, p < 0.0001; posterior: Fmidline(1,19) = 11.27, p < 0.005, Flateral(1,19) = 15.55, p < 0.001).

To establish the onsets of the above probe-related N1 effects, we conducted ANOVAs-1 with mean amplitude as the dependent measure in consecutive 10-ms time windows from 120 ms post-probe onset (e.g., 120–130 ms, 130–140 ms, etc.). Significance or marginal significance (p < 0.06) of the same effect in 4 consecutive bins was taken as evidence for the existence of that effect. According to this criterion, the probe-related N1 effect started from 160–170 ms post-probe onset in the ‘High-prediction, de-accented’ condition, but started from 140–150 ms post-probe onset under the other three conditions (‘High-prediction, accented’, ‘Low-prediction, accented’, and ‘Low-prediction, de-accented’) (see Table 2). That is, the N1 effect under the accented condition started earlier than that under the de-accented condition only when the critical words were highly predictable.

Then, based on the difference waveforms, we further examined the peak latency of the probe-related N1 effects. After visual inspection of the ERP waveforms, we defined the time window used to detect the N1 peak latency (ms). To exclude detection bias, the same time window, namely 130–240 ms post-probe onset, was used for the different conditions. The N1 peak latency was detected automatically in the corresponding time window, with occasional manual adjustment when the detected time point was not at the negative peak. As seen in Fig. 3, the ANOVAs-2 with peak latency (ms) as the dependent measure revealed a significant main effect of Accentuation (Fmidline(1,19) = 15.23, p < 0.001; Flateral(1,19) = 23.21, p < 0.0001) and a main effect of Prediction that was significant over the lateral electrodes (Fmidline(1,19) = 3.75, p = 0.068; Flateral(1,19) = 11.32, p < 0.01), indicating that the N1 effect under the accented condition (Mean = 173.74 ms, STDEV = 7.53) peaked earlier than that under the de-accented condition (Mean = 186.40 ms, STDEV = 11.11), and that the N1 effect under the low-prediction condition (Mean = 175.60 ms, STDEV = 11.88) peaked earlier than that under the high-prediction condition (Mean = 184.55 ms, STDEV = 7.49).

Importantly, the two-way Prediction × Accentuation interaction reached significance (Fmidline(1,19) = 6.63, p < 0.05; Flateral(1,19) = 9.06, p < 0.01). Further simple-effects analyses found that, on the one hand, relative to the de-accented condition (Mean = 195.17 ms, STDEV = 12.47), the N1 effect peaked earlier under the accented condition (Mean = 173.94 ms, STDEV = 11.04) when the critical words were highly predictable (Fmidline(1,19) = 20.75, p < 0.0001; Flateral(1,19) = 25.87, p < 0.0001), but not when the critical words were lowly predictable (Fmidline(1,19) = 0.72, p = 0.406; Flateral(1,19) = 1.39, p = 0.254) (Mean = 173.56 ms, STDEV = 13.80 and Mean = 177.64 ms, STDEV = 14.83 for the accented and de-accented conditions respectively). On the other hand, relative to the high-prediction condition, the N1 effect peaked earlier under the low-prediction condition when the critical words were de-accented (Fmidline(1,19) = 19.92, p < 0.0001; Flateral(1,19) = 21.85, p < 0.0001), but not when the critical words were accented (Fmidline(1,19) = 0.08, p = 0.780; Flateral(1,19) = 0.11, p = 0.739). That is, the peak latency analysis revealed the same pattern of results as the onset latency analysis.

In addition, based on the difference waveforms, we also examined the amplitude of the probe-related N1 effects. As mentioned above, the mean amplitude (µV) was obtained from a 40 ms-wide time window around the peak of the corresponding probe-related N1 effect. The ANOVAs-2 with mean amplitude as the dependent measure yielded neither a significant effect of Prediction nor a significant effect of Accentuation (all p > 0.1).

3.3. ERP results: the N400 effect related to semantic processing

Finally, based on the original ERP waveforms elicited under the four experimental conditions without an attention probe, we examined how Accentuation and Prediction modulated the N400 component (namely, the late stage of semantic processing). In the latency windows of 130–170 ms and 200–300 ms post-word onset, the ANOVAs-3 revealed neither significant main effects of Accentuation and Prediction nor interactions of these factors with other factors (all p > 0.1). Within 300–650 ms post-word onset, consecutive 50-ms ANOVAs-3 revealed that the first three 50-ms time windows (300–350 ms, 350–400 ms, and 400–450 ms) showed the same pattern of results and the latter four windows (from 450–500 ms to 600–650 ms) showed a similar pattern of results. Therefore, ANOVAs-3 were performed on the 300–450 ms and 450–650 ms latency windows respectively. First, in the 300–450 ms window, the ANOVAs-3 showed a significant main effect of Prediction (Fmidline(1,19) = 13.98, p < 0.0001; Flateral(1,19) = 11.97, p < 0.005), indicating that the low-prediction condition (Mean = −1.805 µV, STDEV = 0.966) elicited a
larger N400 than the high-prediction condition (Mean = −0.893 µV, STDEV = 0.663). Importantly, there was a significant two-way Prediction × Accentuation interaction (Fmidline(1,19) = 10.65, p < 0.005; Flateral(1,19) = 9.08, p < 0.01). Further simple-effects analysis showed that the simple main effect of Prediction reached significance when the critical words were accented (Fmidline(1,19) = 33.74, p < 0.0001; Flateral(1,19) = 36.87, p < 0.0001) (Mean = −2.103 µV, STDEV = 0.917 and Mean = −0.676 µV, STDEV = 0.834 for the low- and high-prediction conditions respectively), but not when the critical words were de-accented (Fmidline(1,19) = 1.13, p = 0.300; Flateral(1,19) = 1.30, p = 0.268). Secondly, in the 450–650 ms latency window, the ANOVAs-3 showed a significant main effect of Prediction (Fmidline(1,19) = 15.05, p < 0.001; Flateral(1,19) = 12.53, p < 0.005) and a main effect of Accentuation that was significant over the lateral electrodes (Fmidline(1,19) = 3.03, p = 0.098; Flateral(1,19) = 6.99, p < 0.05), indicating that the low-prediction condition (Mean = −2.064 µV, STDEV = 0.849) elicited a larger N400 than the high-prediction condition (Mean = −1.223 µV, STDEV = 0.910), and that the accented words (Mean = −1.853 µV, STDEV = 0.945) elicited a larger N400 than the de-accented words (Mean = −1.333 µV, STDEV = 0.735).

To establish the onsets of the N400 effects, we conducted ANOVAs-3 with mean amplitude as the dependent measure in consecutive 10-ms time windows from 200 ms post-word onset (e.g., 200–210 ms, 210–220 ms, etc.). Significance or marginal significance (p < 0.06) of the same effect in 4 consecutive bins was taken as evidence for the existence of that effect. According to this criterion, the earliest semantic effect, namely the simple main effect of Prediction under the accented conditions, started from 310–320 ms post-word onset (Fmidline(1,19) = 10.00, p < 0.005; Flateral(1,19) = 7.00, p < 0.05).

3.4. Results of time–frequency representations

As seen in Fig. 4, in the 250–550 ms gamma (35–50 Hz) window, the ANOVA revealed a significant main effect of Prediction (F(1,19) = 4.99, p < 0.05), indicating that the high-prediction condition (Mean = 0.377 dB, STDEV = 0.575) induced larger gamma power increases than the low-prediction condition (Mean = 0.052 dB, STDEV = 0.558). There was a marginally significant two-way Prediction × Accentuation interaction (F(1,19) = 3.96, p = 0.061) and a significant four-way Prediction × Accentuation × Anteriority × Hemisphere interaction (F(2,38) = 4.71, p < 0.05). Simple-effects analysis revealed that the effect of Prediction reached significance under the accented conditions (F(1,19) = 11.84, p < 0.005) (Mean = 0.613 dB, STDEV = 0.760 and Mean = 0.029 dB, STDEV = 0.670 for the high- and low-prediction conditions respectively) but not under the de-accented conditions (F(1,19) = 0.10, p = 0.757). Further analysis showed that the simple effect of Prediction under the accented conditions reached significance over the frontal–central electrodes (F(1,19) = 5.26, p < 0.05; F(1,19) = 3.99, p = 0.060; F(1,19) = 15.81, p < 0.001; F(1,19) = 14.16, p < 0.001 for the left-frontal, right-frontal, left-central, and right-central regions respectively).

3.5. Correlation analysis

The above results showed that the facilitating effect of high prediction on later semantic integration (as indicated by the reduced N400 and enhanced gamma-band power increases for highly predictable words) was more pronounced when the critical words were accented. One-tailed correlation analyses were performed to further examine the relation between early selective attention (the peak latency of the probe-related N1 effect) and later semantic integration (the N400 reduction or gamma-band power enhancement induced by highly predictable words) under the accented conditions. The correlation analysis was based on the averaged probe-related N1
peak latencies and the averaged gamma power enhancement over the frontal and central electrodes (F5, F3, FC3, F4, F6, FC4, C5, C3, CP3, C4, C6, CP4) and the averaged N400 reduction over the central and parietal electrodes (C5, C3, CP3, C4, C6, CP4, P5, P3, PO3, P4, P6, PO4), since the former two had a frontal–central distribution and the latter had a central–parietal distribution. No significant correlation was found between the probe-related N1 peak latency and the later N400 reduction. However, shorter N1 peak latencies were significantly correlated with greater gamma power enhancement (35–50 Hz within 250–550 ms post-word onset) when the critical words were accented (r = −0.45, p < 0.05) (see Fig. 5). As seen in Fig. 5A, this significant correlation might seem to be driven by only a few data points. However, descriptive statistics demonstrated that all of the data points lie within 2.5 standard deviations of the distribution; hence none of the data points is an outlier (see Fig. 5B).
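A minimal sketch of the one-tailed correlation and the 2.5-SD outlier screen is given below, using hypothetical per-subject values in place of the real N1 peak latencies and gamma enhancements (the variable names and simulated relationship are illustrative only).

```python
import numpy as np
from scipy import stats

# hypothetical per-subject values (n = 20): probe-related N1 peak latency (ms) and
# high-minus-low-prediction gamma power enhancement (dB) under the accented condition
rng = np.random.default_rng(1)
n1_latency = rng.normal(174, 11, 20)
gamma_gain = 0.6 - 0.02 * (n1_latency - 174) + rng.normal(0, 0.3, 20)

r, p_two = stats.pearsonr(n1_latency, gamma_gain)
p_one = p_two / 2 if r < 0 else 1 - p_two / 2   # one-tailed test of a negative relation

# outlier screen: flag points farther than 2.5 SD from the mean on either measure
outliers = (np.abs(stats.zscore(n1_latency)) > 2.5) | (np.abs(stats.zscore(gamma_gain)) > 2.5)
print(f"r = {r:.2f}, one-tailed p = {p_one:.3f}, outliers flagged: {outliers.sum()}")
```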

4. Discussion

In this experiment, we used the attention probe paradigm to investigate how and when temporally selective attention driven by top-down prediction interacts with that driven by bottom-up acoustic signals, such as accentuation, during speech comprehension. The major results were as follows. First, the N1 effect evoked by the attention probe started and peaked earlier for accented or lowly predictable words; more importantly, prediction and accentuation showed a complementary interplay on the latency of this N1 effect. Secondly, relative to the lowly predictable words, the highly predictable words elicited a smaller N400 and induced larger gamma-band power increases, and this effect was more pronounced when the corresponding words were accented. Finally, top-down attention interacted with bottom-up attention already within 240–250 ms post-word onset (namely, 140–150 ms post-probe onset, as indicated by the probe-related N1 effect), which was earlier than the semantic integration process (starting around 310–320 ms post-word onset, as indicated by the N400 effect). These results are discussed in more detail below.

4.1. The effect of accentuation on temporally selective attention during speech comprehension

The present results showed that, when the critical words were highly predictable, the N1 effect evoked by the attention probe started and peaked earlier under the accented condition than under the de-accented condition. These results indicate that the presence of a pitch accent captured more attentional resources, so that the linguistic probe ‘ba’ was detected earlier under the accented condition. As described in Section 2, the critical words used in the present study were all two-character words, and their accentuation was always realized by raising the pitch maximum, lengthening syllable duration, and increasing intensity. One might argue that the present results are due to these differences in the acoustic parameters of accented and de-accented words, rather than to a functional process such as selective attention. We argue that this purely acoustic account of the early negativity is far-fetched, for the following reasons.

First, in a pilot experiment, we examined whether the attention probe paradigm used in the present study indeed indexed the selective attention process or was merely influenced by differences in the acoustic cues of accented/de-accented words. The critical words in the pilot experiment were all one-character words carrying a High tone (HH), a Falling tone (HL), or a Low tone (LL). For the words with a High tone or a Falling tone, accentuation is realized by raising the pitch maximum; in contrast, for the words with a Low tone, accentuation is realized by lowering the pitch minimum.

Fig. 4. Time–frequency analysis of the electroencephalogram under the four experimental conditions without a linguistic attention probe. A: Event-related spectral perturbation (ERSP) from electrodes F3 and C4. Black square frames indicate the time window and frequency band for which there were significant differences between the high-prediction and low-prediction conditions. B: Topographies of the prediction effect on gamma-band power (35–50 Hz, dB) within 250–550 ms post-word onset.

Fig. 5. Correlation between the peak latency of the N1 effect (elicited by the attention probe) and the gamma-band power increases (induced by highly predictable words as compared with lowly predictable words). Shorter N1 peak latencies were correlated with greater gamma power enhancement when the critical words were accented. A: peak latency (ms, post-probe onset) and gamma power enhancement (dB), not z-normalized. B: peak latency and gamma power enhancement, z-normalized.

The results of the pilot experiment showed that, for both words with a High/Falling tone and words with a Low tone, the peak latency of the N1 effect elicited by the attention probe was shortened in the accented condition as compared with the de-accented condition, which cannot be explained by differences in the acoustic cues. Second, in the present study, the intensity of the accented words was only 1 dB higher than that of the de-accented words. It is unlikely that such a small intensity difference (1 dB) would by itself modulate the N1 response, since earlier studies found that N1 intensity-discrimination thresholds range from 2 to 4 dB in the normal speech frequency band (Harris et al., 2007; McCandless and Rose, 1970; Martin and Boothroyd, 2000). Moreover, under the accented condition of the present study, the probe's relative intensity (the intensity of the probe minus the intensity of the concurrent word) was exactly the same as under the de-accented condition. Third, although the accented words had a higher pitch maximum and a longer syllable duration than the de-accented words, when these words were not accompanied by an attention probe we did not observe significant differences in the ERP waveforms within the early latency windows (0–300 ms post-word onset). When a transient attention probe was added to these words, the linguistic probe should, if anything, have been less salient under the accented condition than under the de-accented condition, given the acoustic parameters of the background speech signal. Therefore, if the N1 effect were related to the physical properties of the two conditions, the probe under the de-accented condition should have elicited a larger or earlier N1 than that under the accented condition, since previous studies demonstrated that N1 components generally increase in amplitude and decrease in latency with increments in the salience of auditory stimuli (e.g., Alain et al., 1997; Roberts et al., 2000). However, our results showed the reverse pattern. Finally and importantly, in the present study we did not directly compare the raw N1 components elicited by the attention probe. Instead, we subtracted the ERP waveforms elicited by the critical words without a probe from those elicited by the same critical words (namely, the words with the same degree of predictability and accentuation) with a probe, which isolated the probe-related N1 effect and canceled out the differences in the acoustic cues. Meanwhile, in previous studies, a shortening of the N1 latency has been associated with the allocation of more attentional resources (Folyi et al., 2012; Lagemann et al., 2010; Obleser and Kotz, 2011). Therefore, a more plausible interpretation of the shortened N1 latency on accented words is that, since accentuation usually indicates the presence of important new or focused information, listeners allocate more attentional resources to the moment that carries a pitch accent during speech comprehension, so that the phoneme 'ba' presented at that moment receives additional endogenous processing and is detected earlier. Previous studies have already shown that accentuation can modulate listeners' selective attention during speech processing (Cutler, 1976; Li and Ren, 2012; Sanford et al., 2006).
However, these earlier studies examined the selective attention process by measuring behavioral responses (Cutler, 1976; Sanford et al., 2006) or the semantic congruence effect (Li and Ren, 2012), and therefore could not tell us whether accentuation affects selective attention at an early stage of information processing. The present study demonstrated that the presence/absence of accentuation began to modulate the N1 effect elicited by the attention probe at around 240–250 ms post-word onset, which was earlier than the semantic processing effect (starting around 310–320 ms post-word onset). Therefore, the present study extends previous results by showing that, during on-line speech comprehension, the effect of accentuation on selective attention occurs at an early functional stage, before lexical-semantic processing and decision making.
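As a concrete illustration of the isolation logic described above, the following minimal sketch shows how a probe-related difference wave and its N1 peak latency could be computed from condition-averaged ERPs. The variable names (erp_probe, erp_no_probe, srate), the search window, and the toy data are illustrative assumptions; this is not the authors' exact analysis pipeline.

```python
# Minimal sketch: isolate the probe-related N1 effect as a difference wave
# and estimate its peak latency. erp_probe / erp_no_probe are hypothetical
# condition-averaged ERPs (channels x samples) for the same critical words
# with and without the attention probe; srate is the sampling rate in Hz.
import numpy as np

def probe_n1_peak_latency(erp_probe, erp_no_probe, srate,
                          probe_onset_s=0.0, search=(0.08, 0.20)):
    """Return the N1 peak latency (s, relative to probe onset) of the
    probe-minus-no-probe difference wave, averaged over channels."""
    diff = erp_probe - erp_no_probe            # cancels shared acoustic ERPs
    wave = diff.mean(axis=0)                   # collapse over channels
    t = np.arange(wave.size) / srate - probe_onset_s
    mask = (t >= search[0]) & (t <= search[1]) # e.g., 80-200 ms post-probe
    peak_idx = np.argmin(wave[mask])           # N1 is a negative deflection
    return t[mask][peak_idx]

# Toy usage with random data (32 channels, 1 s epoch at 500 Hz)
rng = np.random.default_rng(1)
srate = 500
erp_probe = rng.normal(0, 1e-6, (32, srate))
erp_no_probe = rng.normal(0, 1e-6, (32, srate))
print(probe_n1_peak_latency(erp_probe, erp_no_probe, srate))
```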

4.2. The effect of prediction on speech comprehension and temporally selective attention

As to the effect of prior prediction on speech comprehension, the present study revealed that, relative to the lowly predictable words, the highly predictable words evoked a reduced N400 both when these words were accented and when they were de-accented. Moreover, under the accented conditions, the highly predictable words induced larger gamma-band power increases (35–50 Hz within 250–550 ms post-word onset) than the lowly predictable words. One point needs clarification: the related N400 effect has been found to start around 200 ms (DeLong et al., 2005), 250 ms (Kutas and Hillyard, 1984), or 300 ms post-word onset (Federmeier, 2007; Thornhill and Van Petten, 2012), whereas in the present study the N400 effect elicited by variations in predictability started around 310–320 ms post-word onset. This might be related to the materials we used: the critical words in the present study are all two-character words that take about 660 ms to unfold, and many two-character words in Mandarin Chinese share the same first character. Therefore, the N400 effect evoked by variations in prediction started somewhat later in the present study than in previous studies. The N400 component has been taken as an index of the difficulty of semantic integration (e.g., Hagoort and Brown, 2000; Van Berkum et al., 1999; Van Berkum et al., 2003; Kutas and Federmeier, 2000), and increases in gamma-band power have been correlated with successful sentence-level semantic integration (e.g., Hald et al., 2006; Hagoort et al., 2004). The present results are thus in line with previous studies on predictability and language comprehension (e.g., DeLong et al., 2005; Federmeier, 2007; Thornhill and Van Petten, 2012), indicating that semantic processing of the corresponding words is facilitated by prior prediction derived from the given context. The present study also demonstrated how top-down prediction affects temporally selective attention during speech comprehension. That is, the latency of the N1 effect evoked by the attention probe was shortened when the critical words were lowly predictable as compared with when these words were highly predictable. In our study, the critical words under the high- and low-prediction conditions were exactly the same words and had exactly the same acoustic properties. Therefore, the shortened N1 latency indicates that listeners tend to allocate more attentional resources to the lowly predictable, hence more informative, moments in speech, which confirms the findings of Astheimer and Sanders (2009, 2011). Moreover, the present study further demonstrated that the effect of prediction on selective attention can also be observed for knowledge-based prediction derived from sentence context during natural speech comprehension.

4.3. How and when top-down prediction and bottom-up acoustic signals interact in selective attention during speech comprehension

The most important aim of the present study was to investigate how and when prediction (top-down) and acoustic signals (bottom-up) interact with each other in selective attention during speech comprehension. As discussed in the previous sections, listeners tend to pay more attention to lowly predictable words or to accented words, as indicated by a shortened probe-related N1 latency.
The present results further revealed that the shortened N1 latency for accented words (compared with de-accented words) was observed only when the critical words were highly predictable, and the shortened N1 latency for lowly predictable words (compared with highly predictable words) was observed only under the de-accented conditions. That is, when a word in the sentence context had already captured attentional resources due to its low predictability or due to the presence of pitch accent, the other factor did not modulate attention allocation any more.
Therefore, top-down predictability and bottom-up acoustic signals showed a complementary interplay in the process of attention allocation. This complementary interaction is interesting because it reveals, for the first time, how top-down and bottom-up information interact in modulating attention allocation along the temporal dimension during speech comprehension. As mentioned in Section 1, previous fMRI and MEG studies have examined the effect of predictability on sensory perceptual processing by recording neural responses in early perceptual cortex, with some results indicating that predicted sensory signals are attenuated (e.g., Alink et al., 2010; den Ouden et al., 2010; Todorovic et al., 2011; Sohoglu et al., 2012) and others suggesting that prediction enhances rather than reduces sensory signals (e.g., Doherty et al., 2005; Chaumon et al., 2008). Kok et al. (2012) further found that the effect of predictability on silencing sensory signals is observed only when the stimuli are unattended; in contrast, when the stimuli have already attracted attention due to a preceding cue, predictability enhances early perceptual processing. The results of the present study showed that, under the de-accented conditions (namely, when the acoustic signals did not attract attention), listeners allocated fewer attentional resources to the highly predicted speech signal than to the lowly predicted one, which is broadly consistent with the sensory-silencing effect of prediction (e.g., Kok et al., 2012; Sohoglu et al., 2012). That is, when the stimulus is highly predictable and there are no other cues to attract attention, the attention allocated to this external stimulus, or the early sensory neural responses to it, are reduced. However, under the accented conditions of the present study (namely, when the acoustic signal attracted attention), the effect of predictability on selective attention disappeared, which is inconsistent with the enhancing effect of predictability observed by Kok et al. (2012). This inconsistency might be related to differences between Kok and colleagues' study and the present one: first, the former asked participants to detect external stimuli, whereas the present study instructed participants to listen to sentences for comprehension; second, the former manipulated attention with a cue preceding the stimuli, whereas in the present study attention was modulated by the acoustic signal of the current stimuli. In summary, the present study, combined with earlier studies, indicates that the effect of predictability on attention allocation, or on early sensory processing, interacts with other factors that modulate selective attention. Further studies are needed to explore the circumstances under which each specific pattern of prediction effects occurs. In any case, the present study suggests that, during speech comprehension, top-down predictability and bottom-up acoustic salience show a complementary interplay in the process of attention allocation. As to the temporal characteristics of the interaction between top-down and bottom-up information during language comprehension, two kinds of models have been put forward.
One kind of model argues that the two types of information are integrated only at a later decision stage (Fodor, 1983; McQueen et al., 2006; Norris et al., 2000); in contrast, the TRACE and "predictive coding" models propose that they interact with each other before lexical-semantic processing (e.g., McClelland et al., 2006; Friston, 2010; Gagnepain et al., 2012; Sohoglu et al., 2012). When investigating the interaction between top-down and bottom-up processes, previous studies usually examined how high-level knowledge directly influences the processing of sensory signals. In contrast to those studies, the present study examined how high-level prediction and bottom-up acoustic signals modulate a third factor, namely temporally selective attention. Our results showed that selective attention driven by prediction and that driven by acoustic signals had already interacted with each other around 240–250 ms post-word onset (namely, 140–150 ms post-probe onset).
Although time–frequency analysis found that the gamma-band power enhancement for the highly predictable words started around 250–300 ms post-word onset, oscillatory brain activity has a lower temporal resolution and therefore cannot tell us the precise time course of lexical-semantic processing. The high-temporal-resolution ERP results demonstrated that the earliest lexical-semantic processing of the current critical words started only around 310–320 ms post-word onset. That is, the interaction between top-down prediction and bottom-up acoustic signals occurred before lexical-semantic processing, which is consistent with the hypotheses of the TRACE and "predictive coding" models. What is the relationship between early selective attention and later semantic processing during speech comprehension? The present results revealed that prediction and acoustic signals also interacted with each other at the later stage of semantic processing. The reduced N400 evoked by highly predictable words started earlier under the accented conditions than under the de-accented conditions, and the gamma power enhancement for highly predictable words was observed only under the accented conditions. That is, the facilitating effect of prediction on semantic processing was more pronounced when the corresponding words were accented, namely, when these words captured more attention as indicated by the shortened probe-related N1 latency. Furthermore, although the correlation analysis did not find a significant correlation between the probe-related N1 latency and the later N400 effect, it did reveal an N1–gamma correlation. That is, under the accented conditions, the amount of resources allocated at the early stage (more, as indexed by a shortened N1 peak latency) was a significant predictor of the depth of semantic processing at the later stage (deeper, as indexed by a larger gamma power enhancement). The N1–gamma correlation observed in the present study is consistent with an earlier study by Obleser and Kotz (2011). That study showed that, while the N1 at sentence onset was generally reduced for less degraded speech, its reduced amplitude (indicating better signal quality) correlated with a larger gamma power enhancement (indicating a larger facilitating effect of prediction on semantic processing) at the sentence-final keywords (Obleser and Kotz, 2011). Overall, the N1–gamma correlation observed in the present study indicates that a close relationship might exist between the early stage of selective attention and the later stage of semantic integration, with deeper semantic processing for words that attracted more attentional resources. However, the present study mainly focused on how and when top-down prediction and bottom-up acoustic signals interact with each other in temporally selective attention during speech comprehension, and was not directly designed to examine the relationship between attention allocation and semantic processing. That is, in the present study we did not manipulate the degree of semantic congruence of the critical words with respect to the same sentence context, and therefore could not directly examine how accentuation and predictability affect a third factor, namely, the semantic congruence effect. Moreover, although the N1–gamma correlation observed here reached significance, it was not very strong. The precise relationship between early attention allocation and later semantic processing needs to be studied more directly in future studies.
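For readers who wish to see how a band-limited power measure of the kind discussed here (gamma power within 35–50 Hz and 250–550 ms post-word onset) can be quantified, the sketch below gives one common approach based on band-pass filtering and the Hilbert envelope. The epoch array, sampling rate, baseline window, and dB conversion are illustrative assumptions; this simplified stand-in does not reproduce the event-related spectral perturbation pipeline reported in the paper (cf. Fig. 4).

```python
# Minimal sketch of a band-limited (35-50 Hz) power measure for one channel,
# using band-pass filtering plus the Hilbert envelope. This is a simplified
# stand-in for the ERSP analysis reported in the paper, not the authors' code.
import numpy as np
from scipy.signal import butter, filtfilt, hilbert

def gamma_power_db(epochs, srate, t0, band=(35.0, 50.0),
                   baseline=(-0.3, 0.0), window=(0.25, 0.55)):
    """epochs: (n_trials, n_samples) single-channel data; t0: epoch start (s).
    Returns mean power (dB, relative to baseline) in the analysis window."""
    b, a = butter(4, [band[0], band[1]], btype="bandpass", fs=srate)
    filt = filtfilt(b, a, epochs, axis=-1)          # zero-phase band-pass
    power = np.abs(hilbert(filt, axis=-1)) ** 2     # instantaneous power
    avg = power.mean(axis=0)                        # average over trials
    t = t0 + np.arange(avg.size) / srate
    base = avg[(t >= baseline[0]) & (t < baseline[1])].mean()
    win = avg[(t >= window[0]) & (t < window[1])].mean()
    return 10.0 * np.log10(win / base)              # dB change vs. baseline

# Toy usage: 40 trials of 1.3 s epochs (-0.3 to 1.0 s) sampled at 500 Hz
rng = np.random.default_rng(2)
srate, t0 = 500, -0.3
epochs = rng.normal(0, 1.0, (40, int(1.3 * srate)))
print(gamma_power_db(epochs, srate, t0))
```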
In addition, our results provide important information about the nature of the N400 component. Within the language domain, one account holds that the N400 is a signature of top-down pre-activation, reflecting facilitated activation of features of the long-term memory representation associated with a lexical item (Federmeier, 2007; Lau et al., 2008). The more the semantics of the context activate the lexical features of an incoming word, the smaller the amplitude of the N400 evoked by that word. As mentioned in Section 1, quite a few studies indeed show that, in sentence context, a highly predictable word elicits a
reduced N400 as compared with a lowly predictable word (e.g., Van Berkum et al., 2005; Federmeier, 2007; Thornhill and Van Petten, 2012). Another account holds that the N400 is a signature of unification load, reflecting the process of semantic integration of the critical word into the preceding context (Brown and Hagoort, 1993; Hagoort, 2008). Words that do not differ in pre-activation but require additional effort to be combined with the current contextual meaning (sometimes due to differences in the bottom-up signals) also enhance the amplitude of the N400 (e.g., Baggio et al., 2010; Li et al., 2008a; Li and Yang, 2013). The results of the present experiment demonstrated that, on the one hand, although the highly and lowly predictable words did not differ in their bottom-up acoustic signals, the highly predictable words evoked a reduced N400 due to their high predictability; on the other hand, the accented words elicited a larger N400 than the de-accented words, even though the sentence context preceding the two kinds of words was exactly the same. We think that the enhanced N400 for the accented words reflects the fact that listeners pay more attention to accented words and make a greater effort to integrate these words into the sentence context. Therefore, the present results suggest that both the pre-activation view and the unification view of the N400 are reasonable, and that the N400 is sensitive to both top-down and bottom-up cues.

5. Conclusions

The current study is the first to take a comprehensive look at how and when top-down prediction and bottom-up acoustic signals interact with each other in temporally selective attention during speech comprehension. Our results showed that listeners tend to allocate more attentional resources to words with prominent acoustic signals (namely, accented words) or to lowly predictable words, which is consistent with previous studies (e.g., Astheimer and Sanders, 2009, 2011; Cutler, 1976; Li and Ren, 2012; Sanford et al., 2006). Our results also provide new insights by showing that top-down predictability and bottom-up acoustic signals have a complementary interplay in temporally selective attention during speech comprehension. Moreover, this complementary interaction occurs before lexical-semantic processing, which is in line with the TRACE and "predictive coding" models. In addition, the present results indicate that there might be a close relationship between early selective attention and later semantic integration.

Acknowledgments

This research was supported by a grant from the National Natural Science Foundation of China (31271091).

References

Alain, C., Woods, D., Covarrubias, D., 1997. Activation of duration-sensitive auditory cortical fields in humans. Electroencephalogr. Clin. Neurophysiol./Evoked Potentials Sect. 104 (6), 531–539.
Alink, A., Schwiedrzik, C.M., Kohler, A., Singer, W., Muckli, L., 2010. Stimulus predictability reduces responses in primary visual cortex. J. Neurosci. 30 (8), 2960–2966.
Aslin, R.N., Saffran, J.R., Newport, E.L., 1999. Statistical learning in linguistic and nonlinguistic domains. In: The Emergence of Language. Erlbaum, Mahwah, NJ, pp. 359–380.
Astheimer, L.B., Sanders, L.D., 2009. Listeners modulate temporally selective attention during natural speech processing. Biol. Psychol. 80 (1), 23–34.
Astheimer, L.B., Sanders, L.D., 2011. Predictability affects early perceptual processing of word onsets in continuous speech. Neuropsychologia 49 (12), 3512–3516.
Baggio, G., Choma, T., van Lambalgen, M., Hagoort, P., 2010. Coercion and compositionality. J. Cogn. Neurosci. 22 (9), 2131–2140.

Bock, J.K., Mazzella, J.R., 1983. Intonational marking of given and new information: some consequences for comprehension. Mem. Cogn. 11 (1), 64–76.
Brown, C., Hagoort, P., 1993. The processing nature of the N400: evidence from masked priming. J. Cogn. Neurosci. 5 (1), 34–44.
Chaumon, M., Drouet, V., Tallon-Baudry, C., 2008. Unconscious associative memory affects visual processing before 100 ms. J. Vis. 8 (3), 1–10.
Chen, Y., 2006. Durational adjustment under corrective focus in standard Chinese. J. Phon. 34 (2), 176–201.
Chen, Y., Gussenhoven, C., 2008. Emphasis and tonal implementation in standard Chinese. J. Phon. 36 (4), 724–746.
Coch, D., Sanders, L.D., Neville, H.J., 2005. An event-related potential study of selective auditory attention in children and adults. J. Cogn. Neurosci. 17 (4), 605–622.
Cutler, A., 1976. Phoneme-monitoring reaction time as a function of preceding intonation contour. Percept. Psychophys. 20 (1), 55–60.
Dahan, D., Tanenhaus, M.K., Chambers, C.G., 2002. Accent and reference resolution in spoken-language comprehension. J. Mem. Lang. 47 (2), 292–314.
Davis, M.H., Ford, M.A., Kherif, F., Johnsrude, I.S., 2011. Does semantic context benefit speech understanding through "top-down" processes? Evidence from time-resolved sparse fMRI. J. Cogn. Neurosci. 23 (12), 3914–3932.
DeLong, K.A., Urbach, T.P., Kutas, M., 2005. Probabilistic word pre-activation during language comprehension inferred from electrical brain activity. Nat. Neurosci. 8 (8), 1117–1121.
den Ouden, H.E.M., Daunizeau, J., Roiser, J., Friston, K.J., Stephan, K.E., 2010. Striatal prediction error modulates cortical coupling. J. Neurosci. 30 (9), 3210–3219.
Dimitrova, D.V., Stowe, L.A., Redeker, G., Hoeks, J.C., 2012. Less is not more: neural responses to missing and superfluous accents in context. J. Cogn. Neurosci. 24 (12), 2400–2418.
Doherty, J.R., Rao, A., Mesulam, M.M., Nobre, A.C., 2005. Synergistic effect of combined temporal and spatial expectations on visual attention. J. Neurosci. 25 (36), 8259–8266.
Federmeier, K.D., 2007. Thinking ahead: the role and roots of prediction in language comprehension. Psychophysiology 44 (4), 491–505.
Fodor, J., 1983. The Modularity of Mind: An Essay on Faculty Psychology. The MIT Press, Cambridge, MA.
Folyi, T., Fehér, B., Horváth, J., 2012. Stimulus-focused attention speeds up auditory processing. Int. J. Psychophysiol. 84 (2), 155–163.
Frisson, S., Rayner, K., Pickering, M.J., 2005. Effects of contextual predictability and transitional probability on eye movements during reading. J. Exp. Psychol.: Learn. Mem. Cogn. 31 (5), 862.
Friston, K., 2010. The free-energy principle: a unified brain theory? Nat. Rev. Neurosci. 11, 127–138.
Gagnepain, P., Henson, R.N., Davis, M.H., 2012. Temporal predictive codes for spoken words in auditory cortex. Curr. Biol. 22 (7), 615–621.
Goupillaud, P., Grossmann, A., Morlet, J., 1984. Cycle-octave and related transforms in seismic signal analysis. Geoexploration 23 (1), 85–102.
Guediche, S., Salvata, C., Blumstein, S.E., 2013. Temporal cortex reflects effects of sentence context on phonetic processing. J. Cogn. Neurosci. 25 (5), 706–718.
Gussenhoven, C., 1983. Focus, mode and the nucleus. J. Linguist. 19 (02), 377–417.
Hagoort, P., 2008. The fractionation of spoken language understanding by measuring electrical and magnetic brain signals. Philos. Trans. R. Soc. B: Biol. Sci. 363 (1493), 1055–1069.
Hagoort, P., Brown, C.M., 2000. ERP effects of listening to speech: semantic ERP effects. Neuropsychologia 38 (11), 1518–1530.
Hagoort, P., Hald, L., Bastiaansen, M., Petersson, K.M., 2004. Integration of word meaning and world knowledge in language comprehension. Science 304 (5669), 438–441.
Hald, L.A., Bastiaansen, M., Hagoort, P., 2006. EEG theta and gamma responses to semantic violations in online sentence processing. Brain Lang. 96 (1), 90–105.
Harris, K.C., Mills, J.H., Dubno, J.R., 2007. Electrophysiologic correlates of intensity discrimination in cortical evoked potentials of younger and older adults. Hear. Res. 228 (1–2), 58–69.
Hillyard, S.A., Hink, R.F., Schwent, V.L., Picton, T.W., 1973. Electrical signs of selective attention in the human brain. Science 182 (4108), 177–180.
Hink, R.F., Hillyard, S., 1976. Auditory evoked potentials during listening to dichotic speech messages. Percept. Psychophys. 20 (4), 236–242.
Hruska, C., Steinhauer, K., Alter, K., Steube, A., 2000. ERP effects of sentence accents and violations of the information structure. In: Poster Presented at the 13th Annual CUNY Conference on Human Sentence Processing, San Diego, CA.
Jia, Y., Li, A., Chen, Y., 2008. Pitch and durational patterns of five-syllable constituents in standard Chinese. Appl. Linguist. 4, 53–61.
Jia, Y., Xiong, Z., Li, A., 2006. Phonetic and phonological analysis of focal accents of disyllabic words in Standard Chinese. In: Chinese Spoken Language Processing. Springer, Singapore, pp. 55–66.
Johnson, S.M., Clifton, C., Breen, M., Morris, J., 2003. An ERP investigation of prosodic and semantic focus. In: Poster Presented at Cognitive Neuroscience, New York City.
Kok, P., Rahnev, D., Jehee, J.F.E., Lau, H.C., de Lange, F.P., 2012. Attention reverses the effect of prediction in silencing sensory signals. Cerebr. Cortex 22, 2197–2206.
Kutas, M., Federmeier, K.D., 2000. Electrophysiology reveals semantic memory use in language comprehension. Trends Cogn. Sci. 4 (12), 463–470.
Kutas, M., Hillyard, S.A., 1984. Brain potentials during reading reflect word expectancy and semantic association. Nature 307 (5947), 161–163.
Ladd, D.R., 1996. Intonational Phonology. Cambridge University Press, Cambridge.

Lagemann, L., Okamoto, H., Teismann, H., Pantev, C., 2010. Bottom-up driven involuntary attention modulates auditory signal in noise processing. BMC Neurosci. 11, 156.
Laszlo, S., Federmeier, K.D., 2009. A beautiful day in the neighborhood: an event-related potential study of lexical relationships and prediction in context. J. Mem. Lang. 61, 326–338.
Lau, E.F., Phillips, C., Poeppel, D., 2008. A cortical network for semantics: (de)constructing the N400. Nat. Rev. Neurosci. 9 (12), 920–933.
Li, X., Hagoort, P., Yang, Y., 2008a. Event-related potential evidence on the influence of accentuation in spoken discourse comprehension in Chinese. J. Cogn. Neurosci. 20 (5), 906–915.
Li, X., Ren, G., 2012. How and when accentuation influences temporally selective attention and subsequent semantic processing during on-line spoken language comprehension: an ERP study. Neuropsychologia 50 (8), 1882–1894.
Li, X., Yang, Y., 2013. How long-term memory and accentuation interact during spoken language comprehension. Neuropsychologia 51 (5), 967–978.
Li, X., Yang, Y., Hagoort, P., 2008b. Pitch accent and lexical tone processing in Chinese discourse comprehension: an ERP study. Brain Res. 1222 (30), 192–200.
Liu, F., Xu, Y., 2006. Parallel encoding of focus and interrogative meaning in Mandarin intonation. Phonetica 62, 70–87.
Magne, C., Astésano, C., Lacheret-Dujour, A., Morel, M., Alter, K., Besson, M., 2005. On-line processing of "pop-out" words in spoken French dialogues. J. Cogn. Neurosci. 17 (5), 740–756.
Maris, E., Oostenveld, R., 2007. Nonparametric statistical testing of EEG- and MEG-data. J. Neurosci. Methods 164 (1), 177–190.
Martin, B.A., Boothroyd, A., 2000. Cortical, auditory, evoked potentials in response to changes of spectrum and amplitude. J. Acoust. Soc. Am. 107, 2155–2161.
McCandless, G.A., Rose, D.E., 1970. Evoked cortical responses to stimulus change. J. Speech Hear. Res. 13, 624–634.
McClelland, J.L., Elman, J.L., 1986. The TRACE model of speech perception. Cogn. Psychol. 18 (1), 1–86.
McClelland, M.M., Acock, A.C., Morrison, F.J., 2006. The impact of kindergarten learning-related skills on academic trajectories at the end of elementary school. Early Child. Res. Q. 21 (4), 471–490.
McQueen, J.M., Cutler, A., Norris, D., 2006. Phonological abstraction in the mental lexicon. Cogn. Sci. 30 (6), 1113–1126.
Näätänen, R., Picton, T., 1987. The N1 wave of the human electric and magnetic response to sound: a review and an analysis of the component structure. Psychophysiology 24 (4), 375–425.
Näätänen, R., Winkler, I., 1999. The concept of auditory stimulus representation in cognitive neuroscience. Psychol. Bull. 125 (6), 826.
Norris, D., McQueen, J.M., Cutler, A., 2000. Merging information in speech recognition: feedback is never necessary. Behav. Brain Sci. 23 (03), 299–325.
Obleser, J., Kotz, S.A., 2010. Expectancy constraints in degraded speech modulate the language comprehension network. Cerebr. Cortex 20, 633–640.
Obleser, J., Kotz, S.A., 2011. Multiple brain signatures of integration in the comprehension of degraded speech. NeuroImage 55, 713–723.
Posner, M.I., 1980. Orienting of attention. Q. J. Exp. Psychol. 32 (1), 3–25.
Roberts, T.P.L., Ferrari, P., Stufflebeam, S.M., Poeppel, D., 2000. Latency of the auditory evoked neuromagnetic field components: stimulus dependence and insights toward perception. J. Clin. Neurophysiol. 17 (2), 114–129.
Sanford, A.J., Sanford, A.J., Molle, J., Emmott, C., 2006. Shallow processing and attention capture in written and spoken discourse. Discourse Process. 42 (2), 109–130.
Shattuck-Hufnagel, S., Turk, A.E., 1996. A prosody tutorial for investigators of auditory sentence processing. J. Psycholinguist. Res. 25 (2), 193–247.
Sohoglu, E., Peelle, J.E., Carlyon, R.P., Davis, M.H., 2012. Predictive top-down integration of prior knowledge during speech perception. J. Neurosci. 32 (25), 8443–8453.
Spratling, M.W., 2008. Reconciling predictive coding and biased competition models of cortical function. Front. Comput. Neurosci. 2 (4), 1–8.
Stevens, C., Sanders, L., Neville, H., 2006. Neurophysiological evidence for selective auditory attention deficits in children with specific language impairment. Brain Res. 1111 (1), 143–152.
Terken, J., Noteboom, S.D., 1987. Opposite effects of accentuation and deaccentuation on verification latencies for given and new information. Lang. Cogn. Process. 2 (3–4), 145–163.
Thornhill, D.E., Van Petten, C., 2012. Lexical versus conceptual anticipation during sentence processing: frontal positivity and N400 ERP components. Int. J. Psychophysiol. 83, 382–392.
Todorovic, A., Van Ede, F., Maris, E., De Lange, F.P., 2011. Prior expectation mediates neural adaptation to repeated sounds in the auditory cortex: an MEG study. J. Neurosci. 31 (25), 9118–9123.
Toepel, U., Pannekamp, A., Alter, K., 2007. Catching the news: processing strategies in listening to dialogs as measured by ERPs. Behav. Brain Funct. 3, 53.
Van Berkum, J.J., Brown, C.M., Hagoort, P., 1999. Early referential context effects in sentence processing: evidence from event-related brain potentials. J. Mem. Lang. 41 (2), 147–182.
Van Berkum, J.J., Brown, C.M., Zwitserlood, P., Kooijman, V., Hagoort, P., 2005. Anticipating upcoming words in discourse: evidence from ERPs and reading times. J. Exp. Psychol.: Learn. Mem. Cogn. 31 (3), 443–467.
Van Berkum, J.J., Zwitserlood, P., Hagoort, P., Brown, C.M., 2003. When and how do listeners relate a sentence to the wider discourse? Evidence from the N400 effect. Cogn. Brain Res. 17 (3), 701–718.
Wang, B., Lu, S., Yang, Y., 2002. The pitch movement of stressed syllable in Chinese sentences. Acta Acustica 27 (3), 234–240.
Wild, C.J., Davis, M.H., Johnsrude, I.S., 2012. Human auditory cortex is sensitive to the perceived clarity of speech. NeuroImage 60 (2), 1490–1502.
Xu, Y., 1999. Effects of tone and focus on the formation and alignment of f0 contours. J. Phon. 27 (1), 55–105.