Neuropsychologia 63 (2014) 165–174
The time course of spoken word recognition in Mandarin Chinese: A unimodal ERP study

Xianjun Huang a,*, Jin-Chen Yang b,c,**, Qin Zhang a, Chunyan Guo a
a Beijing Key Laboratory of Learning and Cognition and Department of Psychology, Capital Normal University, Beijing, China
b Center for Mind and Brain, University of California Davis, Davis, CA 95618-5412, USA
c Department of Neurology, School of Medicine, University of California Davis, Sacramento, CA 95618-5412, USA
Article info
Article history: Received 27 March 2014; received in revised form 8 August 2014; accepted 14 August 2014; available online 27 August 2014.
Keywords: Spoken word recognition; Unimodality; Mandarin Chinese; P2; N400

Abstract
In the present study, two experiments were carried out to investigate the time course of spoken word recognition in Mandarin Chinese using both event-related potentials (ERPs) and behavioral measures. To address the hypothesis that there is an early phonological processing stage independent of semantics during spoken word recognition, a unimodal word-matching paradigm was employed, in which both prime and target words were presented auditorily. Experiment 1 manipulated the phonological relations between disyllabic primes and targets, and found an enhanced P2 (200–270 ms post-target onset) as well as a smaller early N400 to word-initial phonological mismatches over fronto-central scalp sites. Experiment 2 manipulated both phonological and semantic relations between monosyllabic primes and targets, and replicated the phonological mismatch-associated P2, which was not modulated by semantic relations. Overall, these results suggest that P2 is a sensitive electrophysiological index of early phonological processing independent of semantics in Mandarin Chinese spoken word recognition.
© 2014 Elsevier Ltd. All rights reserved.
1. Introduction

In an individual's daily life, a great deal of information is received through the auditory modality. In speech comprehension, listeners have to decode a continuous and rapidly changing acoustic signal, which involves processes such as phonological encoding, word parsing, and semantic integration. The time courses of visual and speech processing differ greatly (Mattys, 1997; McQueen & Cutler, 2001; Fei-Fei, Iyer, Koch, & Perona, 2007; Magnuson, Dixon, Tanenhaus, & Aslin, 2007; O'Callaghan, 2008). Visual information from a scene can be grasped at a glance, whereas the speech signal unfolds over time and is typically accessed incrementally. Because of the continuous and transient nature of the speech signal, spoken words must be rapidly mapped onto words stored in long-term memory. A handful of cognitive models have been proposed to explain spoken word processing, including the Cohort Model (Marslen-Wilson, 1987), the Distributed Cohort Model (DCM, Gaskell & Marslen-Wilson, 1997, 1999), the TRACE Model (McClelland & Elman, 1986), and the Neighborhood Activation Model (NAM, Luce & Pisoni, 1998), to name a few.
* Corresponding author at: Beijing Key Laboratory of Learning and Cognition and Department of Psychology, Capital Normal University, Beijing 100048, China.
** Co-corresponding author at: Center for Mind and Brain, University of California Davis, 267 Cousteau Place, Davis, CA 95618-5412, USA.
E-mail addresses:
[email protected] (X. Huang),
[email protected] (J.-C. Yang).
http://dx.doi.org/10.1016/j.neuropsychologia.2014.08.015
0028-3932/© 2014 Elsevier Ltd. All rights reserved.
These models generally suppose that an ambiguous speech input will activate a set of word candidates simultaneously, and that these multiple candidate words will then compete during lexical access. However, as Malins and Joanisse (2012) noted, spoken word recognition models have been predominantly based on studies of Indo-European languages, and many aspects concerning tonal languages such as Mandarin Chinese remain unknown.

Event-related brain potentials (ERPs) possess exquisite temporal resolution (millisecond level), thus offering an excellent noninvasive tool for measuring the time course of the ongoing neural events underlying cognitive processes like speech comprehension (Nunez & Srinivasan, 2006). Furthermore, several ERP components are well known to be associated with spoken word recognition, namely the phonological mismatch negativity (PMN) and the N400.

It has been well-established that the N400 component is associated with word recognition, indexing real-time semantic/lexical processing and/or semantic integration (as reviewed in Kutas and Federmeier (2011)). A prominent centroparietally distributed N400 component is typically elicited when reading a written sentence in which the final word is incongruent with the context (Kutas & Hillyard, 1980, 1984). The N400 has also been shown to be sensitive to phonological processing during lexical access. In phonological priming studies, when the rhyme of the target word differs from that of the prime, a negative N400-like component can be elicited (Rugg, 1984; Praamstra, Meyer, & Levelt, 1994). This phonological N400 has a similar scalp distribution but smaller amplitude compared to the semantic N400 (Radeau, Besson,
Fonteneau, & Castro, 1998; Perrin & García-Larrea, 2003), and has been shown to be modulated by stimulus modality. Whereas visual stimuli elicit the phonological N400 only in active rhyme judgment tasks (Rugg, 1984; Barrett & Rugg, 1989), auditory stimuli elicit the phonological N400 in both direct rhyme judgment tasks and indirect lexical decision tasks (Praamstra & Stegeman, 1993; Praamstra et al., 1994; Dumay et al., 2001). Using a cross-modal task, Zhao, Guo, Zhou, and Shu (2011) found larger N400 responses to syllable-mismatched monosyllabic Mandarin spoken words, but not to phonetic-segment mismatches, a result that differed from those of other studies (Brown-Schmidt & Canseco-Gonzalez, 2004; Schirmer, Tang, Penney, Gunter, & Chen, 2005; Malins & Joanisse, 2012). Dumay et al. (2001) observed a phonological priming effect for pseudowords and suggested that the phonological N400 may reflect a prelexical-level process. Recent studies have demonstrated that orthographic information is activated automatically during spoken word recognition not only in alphabetic languages but also in nonalphabetic languages (Chereau, Gaskell, & Dumay, 2007; Perre, Midgley, & Ziegler, 2009; Zou, Desroches, Liu, Xia, & Shu, 2012). For prime-target pairs sharing orthographic features, reduced N400 amplitudes were observed in comparison with unrelated pairs (Perre et al., 2009; Zou et al., 2012). Hence, the N400 can be viewed as a family of components sensitive to multiple dimensions of information in language processing.

On the other hand, an earlier ERP component appearing about 200–250 ms after the onset of a spoken word has been thought to be specifically related to speech processing (McCallum, Farmer, & Pocock, 1984; Holcomb & Neville, 1990; Connolly & Phillips, 1994; Hagoort & Brown, 2000; Van Den Brink, Brown, & Hagoort, 2001). Some researchers regard this component as the PMN, a component distinct from the N400, whereas others view it as an early portion of the N400. Connolly and Phillips (1994) investigated the time course of spoken word recognition in spoken sentences and found a divergence between phonological and semantic ERP effects. They found that when the initial phoneme of the sentence-terminal word was different from that of the expected word but the terminal word was semantically appropriate, a PMN was elicited. When the initial phoneme of the terminal word was identical to that of the expected word but the terminal word was semantically anomalous, a delayed N400 was obtained. When the terminal word had unexpected initial phonemes as well as anomalous semantics, both a PMN and an enhanced N400 were evident (Connolly & Phillips, 1994). These authors thus suggested that the PMN reflects the acoustic-phonetic processing of the initial phoneme of a spoken word. Using a picture-spoken word matching paradigm, Desroches, Newman, and Joanisse (2009) examined the effect of phonological similarity on the time course of spoken word recognition. Their results showed that unrelated spoken words elicited both a PMN and an N400 (e.g., CONE-fox); the rhyme condition produced a PMN and a reduced N400 (e.g., CONE-bone); and a delayed and enhanced N400 was recorded under the cohort condition (e.g., CONE-comb). They suggested that the PMN and N400 in spoken word recognition reflect pre-lexical and lexical processes, respectively. Malins and Joanisse (2012) studied Mandarin Chinese using the same paradigm as in Desroches et al.
(2009), with the mismatches between picture names and auditory words being segmental (i.e., with identical phonemes but different tones; e.g., picture: hua1 ‘flower’; sound: hua4 ‘painting’), cohort (e.g., picture: hua1 ‘flower’; sound: hui1 ‘gray’), rhyme (e.g., picture: hua1 ‘flower’; sound: gua1 ‘melon’), tonal (i.e., with identical tones but different phonemes; e.g., picture: hua1 ‘flower’; sound: jing1 ‘whale’), or unrelated (e.g., picture: hua1 ‘flower’; sound: lang2 ‘wolf’). Three phonological processing-related ERP components, namely PMN, N400 and Late N400, were identified in their results. However, as stated above, some investigators consider the early negativity during spoken word recognition as an early portion of
the N400. Hagoort and Brown (2000) studied semantic violation effects at spoken sentence-final and sentence-medial positions and found an early negative component peaking around 250 ms, followed by a typical N400. They proposed that this "N250" component is associated with word selection from multiple lexical candidates, while the N400 is associated with semantic integration. Van Den Brink et al. (2001) investigated terminal word recognition in spoken sentences with ERPs. In addition to the N400 effect, they found an N200 under the fully incongruent condition, which was broadly distributed across the whole head. These authors suggested that this N200 is not a PMN, but an earlier form of the N400 reflecting lexical selection processes. In a study of spoken word recognition in Mandarin Chinese sentences, Liu, Shu, and Wei (2006) observed earlier negativities under the unrelated and rhyme-incongruous conditions when compared to the cohort-incongruous condition. They also interpreted these earlier negativities as a form of the N400.

Taken together, ERP studies on the time course of spoken word recognition generally agree that there are two negative components (i.e., the PMN and the N400) associated with the recognition processes. However, it remains controversial whether these components (especially the PMN) reflect phonological processing or lexical selection during recognition. The conflicting findings reviewed above likely arise from discrepancies among the materials and methods employed in different studies. It has been well-established that word processing can be modulated by stimulus modality (Rugg, 1984; Barrett & Rugg, 1989; Praamstra & Stegeman, 1993; Praamstra et al., 1994; Dumay et al., 2001). A cross-modal design (e.g., a picture-word matching task) may not activate the most typically used spoken word processing routes (Ventura, Morais, Pattamadilok, & Kolinsky, 2004). Therefore, two ERP experiments were carried out in the present study to examine the time course of spoken word recognition in Mandarin Chinese by utilizing a unimodal word-matching design in which target words were kept identical across conditions. Disyllabic words were used as stimuli in experiment 1, with the same target word preceded by identical, unrelated, or cohort primes (i.e., primes sharing the same initial syllable as the target). By manipulating the phonological relations between primes and targets in experiment 1, we tested the hypothesis that there is an early phonological processing stage and that an early, phonological processing-specific component like the PMN is elicited by word-initial mismatches in Mandarin Chinese spoken word recognition. Monosyllabic words were used as stimuli in experiment 2; besides the identical and unrelated primes, phonologically unrelated antonymous primes were used to examine whether the phonological mismatch effect is associated with semantic activation. Because of the unimodal (auditory) nature of the paradigms employed, we expected the current study to provide electrophysiological evidence for a phonological processing stage prior to semantic processing in Mandarin Chinese spoken word recognition, whose time course has not been fully characterized.
2. Experiment 1

In experiment 1, the time course of disyllabic spoken word recognition was investigated using electrophysiological and behavioral measures.

2.1. Methods

2.1.1. Participants
Sixteen right-handed undergraduate and graduate students (8 male, 8 female, mean age = 20.2, age range = 17–26) from the Capital Normal University in Beijing participated in the experiment.
They were all native Mandarin Chinese speakers and reported no speech or hearing problems and no neurological disorders. Each subject was paid for participating in the experiment.

2.1.2. Materials
Spoken word stimuli were disyllabic words selected from the sixth edition of the Contemporary Chinese Dictionary (Institute of Linguistics, 2012). The stimuli were divided into three conditions according to the phonological and semantic relations within each word pair. As mentioned earlier, one target word was used across all three conditions but was preceded by a different prime under each condition. (1) Cohort: the first syllables of the prime and the target were the same but the meanings of the two words were unrelated, e.g., ge2bi4—ge2shi4 ("the next door"—"format"). (2) Identical: the target repeated the prime, e.g., ge2shi4—ge2shi4 ("format"—"format"). (3) Unrelated: the two words were completely unrelated, neither phonologically nor semantically, e.g., ran2hou4—ge2shi4 ("then"—"format"). A total of 48 words were selected as targets. For each target, three prime words were created, one for each condition described above. The mean frequencies of the disyllabic cohort primes, identical primes/targets, and unrelated primes were 873, 604, and 745 times per million, respectively (Liu et al., 1990). Another 48 words served as fillers. In the filler trials, the primes and the targets were always identical. Auditory words were read by a male radio broadcaster at a normal pace. The sounds were digitally recorded at 16 bits with a sampling rate of 44,100 Hz and edited with Praat 5.1.20 (http://www.fon.hum.uva.nl/praat/download_win.html). The mean duration was 265 ms for the first syllable of the targets and 314 ms for the second syllable. A total of 192 trials were constructed and split into 4 blocks.

2.1.3. Procedure
Participants were individually tested in a dimly lit, acoustically and electrically shielded room. Presentation 0.71 (Neurobehavioral Systems, Albany, California, USA) was used for stimulus presentation and behavioral response collection. The procedure is illustrated in Fig. 1. At the beginning of each trial, a fixation cross appeared for 300–600 ms (jittered, mean = 450 ms) on a PC monitor; then a prime-target pair was played, with a 1500–2000 ms jittered stimulus onset asynchrony (SOA, mean = 1750 ms). The fixation cross remained on the screen for 3500 ms after prime onset, followed by a blank screen of 1000–1500 ms before the next trial. The two words were presented at a comfortable level through a pair of headphones. When hearing the
second word in each pair, participants were asked to judge whether the second word was the same as the first one, by pressing the left or the right mouse button with their thumbs as quickly and accurately as possible. The assignment of the left/right buttons to the yes/no responses was counterbalanced across subjects. Participants were instructed to fixate on the cross at the center of the screen and to minimize head movements and eye blinks during the experiment. Before the test session, subjects completed a practice session of 20 trials to become familiar with the procedure. The order of stimulus presentation was pseudo-randomized, with the first two trials in each block always being fillers. The entire experiment lasted about 30 min.

Fig. 1. The procedure of experiment 1.

2.1.4. ERP recording and data analysis
Electroencephalography (EEG) was recorded using a 64-channel Quik-cap (Neuroscan Inc., El Paso, Texas, USA) with electrodes positioned according to the international 10–20 system. Between-electrode impedances were kept below 5 kΩ. The EEG signal was sampled at a rate of 500 Hz. All electrodes were referenced to the left mastoid during online recording and re-referenced offline to the linked mastoids. Bipolar vertical and horizontal electrooculograms (EOG) were also recorded. The continuous EEG data were segmented into epochs of −100 to 800 ms relative to the onset of the second spoken word (i.e., the target) within each pair. Data were then filtered offline using a 24 dB zero-phase-shift digital bandpass filter (0.05–30 Hz). Epochs were baseline corrected with the average voltage of the prestimulus interval. Epochs from trials with incorrect behavioral responses were excluded from data analysis. Trials containing eye blinks and other excessive artifacts were also rejected using a maximum-voltage criterion of ±50 μV at all scalp electrodes. On average, over 36 clean epochs were included in the average for each condition (cohort = 38.6 ± 8, identical = 38.6 ± 6, unrelated = 38.9 ± 7). Based on visual inspection of the averaged ERP waveforms (Fig. 2), mean amplitudes were measured for four ERP components: the N1 (110–160 ms), the P2 (200–270 ms), the early N400 (350–450 ms), and the Late N400 (450–650 ms). Statistical analyses were conducted using SPSS 16.0 (SPSS Inc., Chicago, Illinois, USA). Three-way repeated-measures analyses of variance (ANOVAs) were performed on the mean amplitude of each component, with three within-subjects variables: Condition (cohort, identical, and unrelated), Hemisphere (left, right), and Region (anterior, central, and posterior). For the factors Hemisphere and Region, 30 selected electrodes were divided into 6 clusters: left anterior (AF3, F7, F5, F3, and FC5); right anterior (AF4, F8, F6, F4, and FC6); left central (T7, C5, C3, FC1, and CP1); right central (T8, C6, C4, FC2, and CP2); left posterior (CP5, P3, P5, P7, and PO3); right posterior (CP6, P4, P6, P8, and PO4). In addition, the PMN (250–320 ms) was analyzed within a fronto-central cluster of 10 electrodes (F3, F1, Fz, F2, F4, C3, C1, Cz, C2, and C4). Greenhouse–Geisser correction (Greenhouse & Geisser, 1959) was applied to correct for violations of the sphericity assumption when appropriate.
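For readers who wish to approximate this preprocessing pipeline with open-source tools, the sketch below shows how the steps described above (epoching from −100 to 800 ms around target onset, 0.05–30 Hz bandpass filtering, baseline correction, artifact rejection, per-condition averaging, and extraction of cluster mean amplitudes and global field power) could be expressed with MNE-Python. This is only an illustrative sketch under stated assumptions, not the authors' actual analysis code: the file name, trigger codes, and mastoid channel labels are hypothetical, and MNE's peak-to-peak rejection only approximates the absolute-amplitude criterion reported above.

```python
# Illustrative MNE-Python sketch of the ERP pipeline in Section 2.1.4.
# File name, trigger codes, and channel labels are hypothetical.
import mne

raw = mne.io.read_raw_cnt("sub01_exp1.cnt", preload=True)    # hypothetical Neuroscan recording
raw.set_eeg_reference(["M1", "M2"])                           # offline re-reference to linked mastoids (assumed labels)
raw.filter(l_freq=0.05, h_freq=30.0)                          # 0.05-30 Hz zero-phase bandpass

events, _ = mne.events_from_annotations(raw)
event_id = {"cohort": 1, "identical": 2, "unrelated": 3}      # assumed trigger codes for target onsets

epochs = mne.Epochs(raw, events, event_id=event_id,
                    tmin=-0.1, tmax=0.8,                      # -100 to 800 ms around target onset
                    baseline=(None, 0),                       # prestimulus baseline correction
                    reject=dict(eeg=100e-6),                  # peak-to-peak proxy for a +/-50 uV absolute criterion
                    preload=True)

left_anterior = ["AF3", "F7", "F5", "F3", "FC5"]              # one of the six electrode clusters

for name in event_id:
    evoked = epochs[name].average()                           # per-condition average of clean epochs
    p2_mean = (evoked.copy()
                     .pick(left_anterior)
                     .crop(tmin=0.200, tmax=0.270)            # P2 window
                     .data.mean() * 1e6)                      # mean amplitude in microvolts
    gfp = evoked.data.std(axis=0)                             # global field power across channels
    print(f"{name}: P2 (left anterior) = {p2_mean:.2f} uV, peak GFP = {gfp.max() * 1e6:.2f} uV")
```

A full reanalysis would additionally exclude incorrect-response trials before averaging and feed the cluster mean amplitudes into the three-way repeated-measures ANOVAs described above.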
2.2. Results

2.2.1. Behavioral results
Reaction time (RT) and accuracy data for the different conditions are presented in Table 1.

Table 1
Mean RT and accuracy for each condition (Mean (SD)) in experiment 1.

Condition    RT (ms)     ACC (%)
Cohort       785 (88)    98.2 (2.3)
Identical    713 (86)    98.4 (2.9)
Unrelated    734 (116)   99.7 (.7)

A repeated-measures ANOVA showed a significant main effect of Condition on the RTs: F(2,30) = 21.10, p < .001, η² = .58. Post-hoc comparisons (Tukey HSD) revealed that the RTs under the cohort condition were significantly longer than those under the identical (t(15) = 10.30, p < .001, two-tailed) and unrelated conditions (t(15) = 4.03, p < .01, two-tailed). No difference in RTs between the identical and unrelated conditions was found (t(15) = 1.61, p = .13, two-tailed). ANOVAs
performed on accuracy data showed no significant effect of Condition (F(2,30) = 2.48, p = .10, η² = .14).
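As a rough, self-contained illustration of the behavioral analysis described above (a repeated-measures ANOVA over Condition followed by two-tailed pairwise comparisons), the sketch below uses statsmodels and SciPy on a hypothetical long-format table of per-subject mean RTs. The file and column names are assumptions, plain paired t-tests stand in for the Tukey HSD adjustment, and no Greenhouse–Geisser correction is applied in this simplified version.

```python
# Sketch of the behavioral RT analysis: repeated-measures ANOVA over Condition
# (cohort / identical / unrelated) plus pairwise two-tailed t-tests.
# Data layout and file name are hypothetical.
from itertools import combinations

import pandas as pd
from scipy.stats import ttest_rel
from statsmodels.stats.anova import AnovaRM

rt = pd.read_csv("exp1_behavior.csv")            # expected columns: subject, condition, rt

# one-way repeated-measures ANOVA on per-subject mean RTs
anova = AnovaRM(rt, depvar="rt", subject="subject", within=["condition"]).fit()
print(anova.anova_table)                         # F and p for the Condition main effect

# pairwise follow-ups (the paper reports Tukey HSD; simple paired t-tests shown here)
wide = rt.pivot(index="subject", columns="condition", values="rt")
for a, b in combinations(wide.columns, 2):
    t_val, p_val = ttest_rel(wide[a], wide[b])
    print(f"{a} vs. {b}: t({len(wide) - 1}) = {t_val:.2f}, p = {p_val:.3f}")
```

The same skeleton, with mean component amplitudes in place of RTs and with Hemisphere and Region added as within-subjects factors, corresponds to the three-way ANOVAs applied to the ERP data.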
2.2.2. ERP results
Grand average ERP waveforms are illustrated in Fig. 2.

Fig. 2. (A) Average ERP waveforms to disyllabic targets under the three conditions: cohort (red line), identical (blue line), and unrelated (black line). (B) Topographic distribution of ERP differences: unrelated minus identical (top row) and cohort minus identical (bottom row). (C) Global field power (GFP) waveforms to disyllabic targets under the three conditions.

Analyses for the N1 (110–160 ms) revealed no significant Condition × Hemisphere × Region interaction (F(4,60) = .70, p = .61, η² = .04), Condition × Hemisphere interaction (F(2,30) = 1.76, p = .19, η² = .11), or Condition × Region interaction (F(4,60) = 1.78, p = .14, η² = .11). The main effects of Condition and Hemisphere were not significant either (F(2,30) = 1.61, p = .22, η² = .10 and F(1,15) = 1.10, p = .32, η² = .06, respectively). Only a significant main effect of Region was obtained (F(2,30) = 39.36, p < .001, η² = .72), with more negativity distributed anteriorly in the N1 time window.

For the P2 (200–270 ms), ANOVAs showed neither a significant Condition × Hemisphere × Region interaction (F(4,60) = 1.33, p = .27, η² = .08) nor a significant Condition × Hemisphere interaction (F(2,30) = 1.76, p = .188, η² = .11). Condition interacted with Region (F(4,60) = 16.78, p < .001, η² = .53). Follow-up analyses comparing the three conditions separately at anterior, central, and posterior sites found significant main effects of Condition at anterior and central sites (F(2,30) = 15.56, p < .001, η² = .51 and F(2,30) = 6.98, p < .01, η² = .32, respectively), with significantly larger positivities elicited by targets under the unrelated condition than under the cohort and identical conditions, while no significant differences were found between the cohort and the identical
conditions (e.g., mean amplitudes at anterior sites: unrelated = 1.65 μV, cohort = 1.25 μV, identical = 1.21 μV; see Table 2 for t-test results). The enhanced P2 component under the unrelated condition can also be seen from the global field power (GFP) waveforms in Panel C of Fig. 2. No main effect of Condition was observed for the P2 at posterior sites (F(2,30) = 1.024, p = .37, η² = .06), indicating a fronto-central scalp distribution of this component, as shown by the ERP topographic maps in Fig. 2.

Table 2
Comparison of each condition for the P2 component in experiment 1.

P2 component                Site         t
Unrelated vs. identical     Anterior     4.33**
                            Central      2.89*
                            Posterior    .56
Cohort vs. identical        Anterior     .07
                            Central      .26
                            Posterior    1.06

* P < .05 (two-tailed). ** P < .01 (two-tailed).

For the PMN (250–320 ms) within the fronto-central cluster, the ANOVA showed a significant Condition effect (F(2,30) = 7.35, p < .01, η² = .33). Post-hoc analyses showed reduced negativities under the unrelated condition as compared to the cohort and identical conditions, and no significant differences between the cohort and identical conditions.

For the early N400 (350–450 ms), no significant Condition × Hemisphere × Region interaction was obtained (F(4,60) = .53, p = .72, η² = .03). Neither the Condition × Hemisphere interaction nor the Hemisphere × Region interaction was significant (F(2,30) = .74, p = .48, η² = .05 and F(2,30) = 1.89, p = .17, η² = .11, respectively). Condition interacted with Region (F(4,60) = 17.01, p < .001, η² = .53). Follow-up analyses found significant main effects of Condition in all three regions (F(2,30) = 20.26, p < .001, η² = .58; F(2,30) = 8.31, p < .01, η² = .36; F(2,30) = 4.53, p < .05, η² = .23, respectively). Post-hoc analyses showed that, at anterior and central sites, the cohort and identical conditions elicited larger negativities than the unrelated condition, while there was no difference between the cohort and the identical conditions. At posterior sites, the early N400 was largest under the cohort condition, with no significant difference between the other two conditions.

ANOVAs for the Late N400 (450–650 ms) demonstrated no significant Condition × Hemisphere × Region interaction (F(4,60) = .67, p = .62, η² = .04). Neither the Condition × Hemisphere interaction nor the Hemisphere × Region interaction was significant (F(2,30) = 1.40, p = .26, η² = .09 and F(2,30) = 1.09, p = .35, η² = .07, respectively). Condition interacted with Region (F(4,60) = 13.69, p < .001, η² = .48). Follow-up analyses found significant main effects of Condition in all three regions (F(2,30) = 6.89, p < .01, η² = .32; F(2,30) = 15.18, p < .001, η² = .50; F(2,30) = 37.72, p < .001, η² = .72, respectively). Post-hoc analyses showed that, at anterior and central sites, the cohort condition elicited a larger Late N400 than the other two conditions. At posterior sites, the three conditions differed from one another, with the largest Late N400 under the cohort condition and the smallest under the identical condition.

2.3. Discussion

In this experiment, we tested the hypothesis that there is an early phonological processing stage in spoken word recognition and that an early, phonological processing-specific component like the PMN would be elicited by word-initial mismatch. By
manipulating the phonological similarity of the first syllables in the word pairs, we found, instead of a PMN, a phonological mismatch-associated P2 effect at the early stage, accompanied by early N400 and Late N400 components.

2.3.1. Reaction time
RTs under the cohort condition were the longest, indicating processing difficulties. It has been well-established that behavioral priming effects are determined by the location and the extent of phonological overlap between prime and target. Word-final overlap generally produces facilitatory effects on reaction times in both lexical decision and shadowing tasks (Radeau, Morais, & Segui, 1995; Praamstra et al., 1994; Slowiaczek, McQueen, Soltano, & Lynch, 2000). However, the effects of word-initial overlap are more complex, with facilitation, inhibition, or, more often, null effects being reported in various experimental tasks including lexical decision, shadowing, and identification in noise (Goldinger, 1998; Praamstra et al., 1994; Radeau et al., 1995). In experiment 1, we observed an RT inhibition under the cohort condition only, suggesting additional processing costs for resolving the ambiguity caused by the word-initial overlap.

2.3.2. P2
The P2 is another electrophysiological correlate of phonological processing in speech perception (Dorman, 1974). To our knowledge, this is the first study reporting a P2 phonological mismatch effect during spoken word recognition. The occurrence of a P2 instead of a PMN is likely due to the unimodal auditory priming paradigm used in this experiment. Compared to a cross-modal design, there would be more phonological links between the prime and target when both are presented in the auditory channel, as Gaskell and Marslen-Wilson (2002) have pointed out. To interpret the results of experiment 1, we suggest that an auditorily presented prime word likely leads to a more direct activation of its phonological representations than a cross-modal prime does. The activated phonological representations of the prime are then temporarily stored in the phonological buffer of working memory (Baddeley, Lewis, & Vallar, 1984; Martín-Loeches, Schweinberger, & Sommer, 1997; Jacquemot & Scott, 2006) and interact with the subsequent phonological input from the target word during the word-matching task. When processing the first syllable of targets under both the cohort and identical conditions, the bottom-up speech input would be temporarily consistent with the prime's phonological representation in working memory, so there would be no need to retrieve additional word-initial phonological information from long-term memory. Under the unrelated condition, in contrast, phonological representations for the target have to be accessed from the beginning, and the newly activated phonological representations of the target will compete with, and be subject to interference from, those of the prime still stored in working memory. Thus, we propose that the enhanced P2 reflects a higher phonological processing load under the unrelated condition as compared to the cohort and identical conditions.

2.3.3. PMN
As reviewed earlier, studies using picture-spoken word matching and shadowing paradigms often obtain an early component, the PMN, preceding the N400. The PMN has been thought to reflect a mismatch between bottom-up speech input and activated phonological representations.
In the present study, smaller fronto-central negativities to the unrelated condition were found in the PMN window, but this modulation of the ERP waveforms by word-initial mismatch was in the opposite direction to that of the typical PMN (which is more negative to phonological mismatches). This reversed PMN effect can be primarily attributed to overlap with the preceding P2 mismatch effect (Fig. 2).
2.3.4. Early N400
In the 350–450 ms time window, target words under the cohort and identical conditions elicited larger negativities than under the unrelated condition over frontal scalp sites. Prior studies using the picture-spoken word matching paradigm or sentences as context generally found a larger N400 under the unrelated condition than under the identical condition (e.g., Connolly & Phillips, 1994; Desroches et al., 2009; Malins & Joanisse, 2012). In this experiment, the first syllable of targets under the cohort and identical conditions was the same, leaving lexical selection ambiguous until the second syllable was processed. Under the unrelated condition, however, the first syllable of a target word was different from that of the prime, so participants did not need a disambiguating process based on the second syllable, resulting in the smallest early N400. Therefore, the early N400 likely reflects disambiguating processes around the divergence point between the cohort and the identical words. This early N400 was prominent at frontal sites, indicating that it is a distinct component rather than an early portion of the centroparietally distributed semantic N400. In fact, our results suggest that the processing indexed by the early N400 is still phonological in nature.

2.3.5. Late N400
The cohort condition and the unrelated condition elicited a larger Late N400 than the identical condition, with the classic centroparietal topographic distribution of the semantic N400 (Fig. 2, panel B). At many sites, the Late N400 effect under the cohort condition was larger than under the unrelated condition, suggesting that phonological processing could influence the Late N400. In particular, when the first syllables under the cohort and the identical conditions were the same, the semantic N400 effect was delayed. These findings are consistent with prior studies (Connolly & Phillips, 1994; Desroches et al., 2009; Malins & Joanisse, 2012). There were also prominent positivities after 500 ms (apparent from visual inspection) to the repeated words (i.e., the identical condition), which are consistent with the well-documented late positive component (LPC) associated with episodic/declarative memory (see Olichney, Yang, Taylor, and Kutas (2011) for a review).
3. Experiment 2

In experiment 1, a reduced P2 component was elicited by phonological similarity of the first syllables in the prime-target pairs. Experiment 2 took advantage of the high proportion of monosyllabic words in Mandarin Chinese to further examine whether the P2 component obtained in the unimodal spoken word-matching task is exclusively associated with phonological processing. Since the P2 is sensitive to word-initial mismatch, monosyllabic Mandarin words provide a window for probing the potential interaction between semantic and phonological processing without introducing multiple syllables. Besides the identical and unrelated conditions used in experiment 1, phonologically unrelated antonymous word pairs, a type of stimulus with high semantic constraint, were used as a third condition. If the cognitive processes associated with the P2 are phonological in nature, we would expect no semantic effect of the antonymous primes on the P2.

3.1. Methods

3.1.1. Participants
Another sixteen right-handed undergraduate and graduate students (8 male, 8 female, mean age = 21.1, age range = 17–25) from the Capital Normal University in Beijing were recruited for this
experiment. All participants were native Mandarin Chinese speakers and reported no speech or hearing problems and no other neurological abnormalities. Each subject was paid for participation.

3.1.2. Materials
A total of 192 monosyllabic words were selected from the same Chinese dictionary as in experiment 1. The stimuli were divided into three conditions according to the phonological and semantic relations within each monosyllabic word pair. Again, the same target word was used across all three conditions but was primed by a different monosyllabic word each time. The three experimental conditions were: (1) Antonymous: the two words were antonyms, e.g., da4—xiao3 ("big"—"small"). (2) Identical: e.g., xiao3—xiao3 ("small"—"small"). (3) Unrelated: e.g., cu4—xiao3 ("vinegar"—"small"). The mean frequencies of the monosyllabic antonymous primes, identical primes/targets, and unrelated primes were 5613, 4898, and 3764 times per million, respectively (Liu et al., 1990). There were 48 monosyllabic word pairs for each condition, as well as another set of 48 monosyllabic words as fillers. Auditory words were recorded and edited in the same way as in experiment 1. The mean duration of the target monosyllabic words was 517 ms.

3.1.3. Procedure
The experimental procedure was similar to that of experiment 1, except that the SOA in experiment 2 was jittered between 1200 and 1800 ms.

3.1.4. ERP recording and data analysis
EEG was recorded in the same way as in experiment 1. Continuous EEG to target words was segmented into epochs of 800 ms (from 100 ms pre-stimulus onset to 700 ms post-stimulus onset). Data were then filtered, baseline-corrected, and artifact-rejected in the same manner as in experiment 1. The numbers of clean epochs included in the average for each condition did not differ (antonymous = 42 ± 4, identical = 40 ± 5, unrelated = 41 ± 4). Based on the grand average waveforms (Fig. 3, panel A), mean amplitudes of three ERP components were quantified for each condition in the current experiment: the N1 (110–160 ms), the P2 (200–270 ms), and the N400 (270–500 ms). Data were subjected to the same set of statistical analyses as in experiment 1.

3.2. Results

3.2.1. Behavioral results
RT and accuracy under the different conditions are presented in Table 3.

Table 3
Mean RT and accuracy for each condition (Mean (SD)) in experiment 2.

Condition     RT (ms)     ACC (%)
Antonymous    646 (150)   97.8 (3.1)
Identical     626 (132)   97.9 (2.3)
Unrelated     658 (137)   98.3 (2.0)

A one-way repeated-measures ANOVA showed a significant main effect of Condition on RTs (F(2,30) = 3.71, p < .05, η² = .20). Post-hoc comparisons (Tukey HSD) revealed that responses under the identical condition were significantly faster than under the unrelated condition (t(15) = 2.91, p < .05, two-tailed). However, no significant differences in RTs were found either between the antonymous and identical conditions (t(15) = 1.30, p = .21, two-tailed) or between the antonymous and unrelated conditions (t(15) = 1.54, p = .15, two-tailed). No significant main effect of Condition was found for accuracy (F(2,30) = .31, p = .74, η² = .02).

Fig. 3. (A) Average ERP waveforms to monosyllabic targets under the three conditions: antonymous (red line), identical (blue line), and unrelated (black line). (B) Topographic distribution of ERP differences: unrelated minus identical (top row) and antonymous minus identical (bottom row). (C) Global field power (GFP) waveforms to monosyllabic targets under the three conditions.

3.2.2. ERP results
ERP results of this experiment are presented in Fig. 3. Analyses for the N1 (110–160 ms) revealed no significant Condition × Hemisphere × Region interaction (F(4,60) = .14, p = .97, η² = .01), Condition × Hemisphere interaction (F(2,30) = .29, p = .75, η² = .02), or Condition × Region interaction (F(4,60) = .52, p = .72,
η² = .03). The main effects of Condition and Hemisphere were not significant either (F(2,30) = 1.23, p = .31, η² = .08 and F(1,15) = .01, p = .99, η² = .001, respectively). Only the main effect of Region was significant (F(2,30) = 38.68, p < .001, η² = .72), with more negativity distributed at anterior electrodes in the N1 time window.

For the P2 (200–270 ms), ANOVAs revealed no significant Condition × Hemisphere × Region interaction (F(4,60) = .54, p = .71, η² = .04). The Condition × Hemisphere interaction was not significant either (F(2,30) = .22, p = .80, η² = .01). Condition interacted with Region (F(4,60) = 21.72, p < .001, η² = .59). Follow-up analyses comparing the three conditions separately at anterior, central, and posterior sites found significant main effects of Condition at anterior and central sites (F(2,30) = 18.37, p < .001, η² = .55 and F(2,30) = 8.53, p < .001, η² = .36, respectively), with significantly larger positivities elicited by targets under the unrelated and antonymous conditions than under the identical condition (e.g., mean amplitudes at anterior sites: unrelated = .10 μV, antonymous = .85 μV, identical = 2.76 μV; see Fig. 3, panel B, for topographic distributions and Table 4 for t-test results). The differences in the P2 component across the three conditions can also be seen from the global field power (GFP) waveforms in Panel C of Fig. 3. No significant difference between the antonymous and unrelated conditions was evident.

Table 4
Comparison of each condition for the P2 component in experiment 2.

P2 component                 Site         t
Antonymous vs. identical     Anterior     4.08**
                             Central      3.20**
                             Posterior    .79
Unrelated vs. identical      Anterior     4.82***
                             Central      3.18**
                             Posterior    .32

** P < .05 (two-tailed). *** P < .01 (two-tailed).

Analyses for the N400 (270–500 ms) demonstrated no significant Condition × Hemisphere × Region interaction (F(4,60) = .61, p = .66, η² = .04). Neither the Condition × Hemisphere interaction nor the Hemisphere × Region interaction was significant (F(2,30) = .34, p = .71,
η² = .02 and F(2,30) = 1.23, p = .31, η² = .08, respectively). Condition interacted with Region (F(4,60) = 11.61, p < .001, η² = .44). Follow-up analyses comparing the three conditions separately at anterior, central, and posterior sites found significant main effects of Condition at central and posterior sites (F(2,30) = 18.93, p < .001, η² = .56; F(2,30) = 20.89, p < .001, η² = .58, respectively). Post-hoc analyses showed that the three conditions differed from one another posteriorly, with the largest negativities under the unrelated condition and the smallest under the identical condition. No significant main effect of Condition was revealed at anterior sites (F(2,30) = 1.02, p = .37, η² = .06).

3.3. Discussion

The proportion of monosyllabic words in Mandarin Chinese is very high. To investigate whether the P2 effect can also be elicited by monosyllabic words, and whether it is associated with semantic activation, monosyllabic words were used and the semantic relation between the prime and target words was manipulated in experiment 2. P2 effects similar to those in experiment 1 were observed under both the unrelated and the antonymous conditions: again, when the initial segments or phonemes of the target words were different from those of the primes, larger P2 components were elicited. When the prime and the target words were neither phonologically nor semantically related (i.e., the unrelated condition), the N400 amplitude was the largest among the three conditions, indicating the highest semantic processing load for completely unrelated targets. This is consistent with the prior findings of Desroches et al. (2009) and Malins and Joanisse (2012). The posterior N400 elicited under the antonymous condition was smaller than under the unrelated condition but larger than under the identical condition, suggesting that semantic activation of the prime could spread to its antonym and reduce the amount of semantic information needed for the target word. However, even though they were semantically activated to some extent by the prime, antonymous targets exhibited no P2 reduction relative to the completely unrelated targets, indicating that the P2 is independent of the semantic interactions between primes and targets and that the P2 component observed in the present study is an electrophysiological marker of phonological rather than semantic processing during spoken word recognition.
4. General discussion

The present study represents the first ERP study using a unimodal auditory priming paradigm to investigate the time course of spoken word recognition in Mandarin Chinese. This set of experiments addressed the hypothesis that there is an early phonological processing stage independent of semantics during spoken word recognition. In experiment 1, we mainly manipulated
the phonological relations between disyllabic primes and targets, and found a larger P2 as well as a smaller early N400 over fronto-central scalp sites elicited by word-initial phonological mismatches. In experiment 2, both phonological and semantic relations between monosyllabic primes and targets were manipulated, and the phonological mismatch-associated P2 was replicated. Moreover, we found that the P2 was not modulated by the semantic relation between primes and targets. Overall, these results suggest that there is indeed an early phonological processing stage, and that the phonological mismatch effect (i.e., the P2) obtained in our auditory unimodal paradigm occurs earlier than the PMN effect observed in studies using cross-modal priming tasks.

4.1. P2

The phonological mismatch-associated P2 is a novel finding. Prior ERP studies on phonological priming generally found a negative-going ERP component associated with phonological mismatch, regardless of whether this negative component was regarded as the PMN (e.g., Connolly & Phillips, 1994; Desroches et al., 2009; Malins & Joanisse, 2012) or as an early portion of the N400 (Hagoort & Brown, 2000; Van Den Brink et al., 2001; Liu et al., 2006). In the present study, larger fronto-centrally distributed positivities (i.e., the P2) were elicited by greater word-initial discrepancies, while smaller early negativities (i.e., the early N400 in experiment 1) were elicited by word-initial mismatches. The phonological mismatch P2 effect found in the present study and the typical PMN appear to be two distinct ERP components, given their similar fronto-central scalp distributions but opposite polarities.

Possible cognitive mechanisms underlying the P2 mismatch effect in the current study are linked to phonological processing load, as described earlier. Specifically, auditory prime words could directly activate stronger phonological representations of auditory target words than cross-modal primes do (Gaskell & Marslen-Wilson, 2002). The activated phonological representations are temporarily available in the phonological buffer of working memory (Baddeley et al., 1984; Martín-Loeches et al., 1997; Jacquemot & Scott, 2006) and reduce the phonological activation demand for target words under the related conditions. In contrast, under the unrelated/antonymous conditions, phonological representations for the target have to be accessed from the beginning, which poses a higher phonological activation demand due to the need to access a majority (if not all) of the phonological information for the target.

The specific task employed probably accounts for the phonological mismatch effects being more prominent on the P2 than on the PMN in the current study. Firstly, to accomplish the Same/Different judgment in the current study, participants needed to hold the phonological representations of the prime word in working memory to enable subsequent comparison against the target, but they normally did not need to form a strong phonological expectation, which is thought to be the key factor in inducing a PMN (e.g., Newman & Connolly, 2009).
Secondly, in the present study, participants were required to focus on a direct phonological comparison between the primes and the targets, which were both presented auditorily, whereas in studies using tasks such as lexical decision and cross-modal priming, primes and targets are usually linked more through their semantic attributes than through their phonological ones (Hagoort & Brown, 2000; Van Den Brink et al., 2001; Brown-Schmidt & Canseco-Gonzalez, 2004; Schirmer et al., 2005; Liu et al., 2006; Desroches et al., 2009; Zhao et al., 2011; Malins & Joanisse, 2012). The Same/Different judgment in the present experiments can be made based on phonological activations, while other tasks such as cross-modal priming more likely require processing involving interactions between phonological and semantic representations. Thus, we suggest that the reduced
phonological P2 observed in the present study reflects the facilitatory effect of directly activated phonological representations at the pre-lexical (phonological) processing stage. This early phonological modulation of the P2 is in line with studies showing that spoken word recognition may occur very rapidly (e.g., within 50 ms post-word onset in MacGregor, Pulvermüller, van Casteren, and Shtyrov (2012); also see Friederici (2012) for a review). The phonological mismatch P2 likely overlaps with, and thus reduces, the PMN in the current study. Further studies that can dissociate the phonological P2 and PMN effects are highly desirable.

Pre-lexical processing-related P2 effects have also been documented outside the domain of spoken word recognition, with a handful of reports linking the P2 component to phonetic/phonological processing in both visual and auditory tasks. Using visual Chinese words as stimuli, several studies (Liu, Perfetti, & Hart, 2003; Chen, Liu, Wang, Peng, & Perfetti, 2007; Hsu, Tsai, Lee, & Tzeng, 2009) have found orthographic/phonological processing-associated P2 components. Barnea and Breznitz (1998) reported a P2 sensitive to both phonological and orthographic processing during Hebrew word recognition. Friedrich, Alter, and Kotz (2001) found a stronger P2 to initially unstressed than to initially stressed pitch contours between 200 and 280 ms.

4.2. Early N400 and Late N400

In later time windows, we observed a double dissociation in experiment 1 between negativities in the earlier N400 time window and those in the later N400 time window. The early N400 was more negative to the identical targets than to the unrelated ones, whereas the Late N400 was more negative under the unrelated condition than under the identical condition, thus behaving more like the semantic N400. Similar early N400 amplitudes were elicited under the cohort and identical conditions as a result of the word-initial phonemes shared between primes and targets, suggesting that the early N400 reflected disambiguation/selection amongst the multiple phonological candidates activated by the repeated word-initial phonemes under these two conditions. Therefore, we believe that the processing indexed by the early N400 is still phonological in nature. It should be noted that the early N400 appears to be a component distinct from the PMN, since its amplitude was modulated by phonological mismatch in the opposite direction compared to the PMN.

Consistent with the previous literature (as reviewed in Kutas and Federmeier (2011)), the more posteriorly distributed N400 showed strong associations with semantic processing, with more negative amplitudes obtained for the unrelated targets. This semantic N400 was delayed by the word-initial similarities in experiment 1, which is consistent with prior studies (Connolly & Phillips, 1994; Liu et al., 2006; Desroches et al., 2009; Malins & Joanisse, 2012) suggesting that the onset time of the N400 is influenced by the phonological similarity between primes and targets. The N400 amplitude to antonymous targets in experiment 2 was significantly larger than that to identical words but smaller than that to unrelated targets, suggesting that semantic activation of the prime could spread to semantically related targets (including opposites) and reduce the amount of semantic processing needed for these target words.
One limitation of the present study is the inconsistency in the number of syllables in the stimuli used in the two experiments, which may confound the experimental effects obtained. Further investigations with semantic manipulation in disyllabic word pairs are needed to replicate the current findings. The temporally and functionally different ERP effects (e.g., P2 and semantic N400) in the present study lend support to the existence of an early pre-lexical (phonological) processing stage
independent of semantic processing in spoken word recognition. A large body of neuroimaging studies has revealed a dual-processing stream in the auditory language system, with the dorsal stream involved in mapping speech sounds to articulation and the ventral stream supporting speech-to-meaning correspondences (Démonet, Thierry, & Cardebat, 2005; Hickok & Poeppel, 2007; as reviewed in Price (2012)). A recent intracerebral ERP study confirmed such a dichotomy (Trébuchon, Démonet, Chauvel, & Liégeois-Chauvel, 2013).

Taken together, by employing a unimodal auditory priming paradigm to investigate the time course of Mandarin Chinese spoken word recognition, the present ERP study is the first to find a P2 component modulated by phonological mismatch. The phonological mismatch effect (i.e., the P2) obtained in our auditory unimodal paradigm, reflecting the processing load at the level of phonological analysis, occurs earlier than the PMN effect observed in studies using cross-modal priming tasks. The current findings support the existence of an early phonological processing stage independent of semantic processing in spoken word recognition. To further examine whether the P2 effect is a language-universal index of pre-lexical phonological processing during spoken word recognition, cross-language studies using the unimodal paradigm employed in the present study are recommended. Furthermore, given the high temporal and spatial resolution of magnetoencephalography (MEG), this technique may help shed light on the underlying generators of the phonological P2 component.
Acknowledgments This study was supported by Grants from the National Natural Science Foundation of China (Grant #31100816), Research Fund for the Doctoral Program of Higher Education (Grant #20101108120005), MOE (Ministry of Education in China) Project of Humanities and Social Sciences (Grant #09YJCLX021), and the Importation and Development of High-Caliber Talents Project of Beijing Municipal Institutions (Grant #CIT&TCD201404168). We thank Cheyanne Barba for proofreading the manuscript. References Baddeley, A., Lewis, V., & Vallar, G. (1984). Exploring the articulatory loop. The Quarterly Journal of Experimental Psychology, 36(2), 233–252. Barnea, A., & Breznitz, Z. (1998). Phonological and orthographic processing of Hebrew words: electrophysiological aspects. The Journal of Genetic Psychology, 159(4), 492–504. Barrett, S. E., & Rugg, M. D. (1989). Asymmetries in event-related potentials during rhyme-matching: Confirmation of the null effects of handedness. Neuropsychologia, 27(4), 539–548. Brown-Schmidt, S., & Canseco-Gonzalez, E. (2004). Who do you love, your mother or your horse? An event-related brain potential analysis of tone processing in Mandarin Chinese. Journal of Psycholinguistic Research, 33(2), 103–135. Chen, B., Liu, W., Wang, L., Peng, D., & Perfetti, C. A. (2007). The timing of graphic, phonological and semantic activation of high and low frequency Chinese characters: an ERP study. Progress in Natural Science, 17, 62–70. Chereau, C., Gaskell, M. G., & Dumay, N. (2007). Reading spoken words: orthographic effects in auditory priming. Cognition, 102(3), 341–360. Connolly, J. F., & Phillips, N. A. (1994). Event-related potential components reflect phonological and semantic processing of the terminal word of spoken sentences. Journal of Cognitive Neuroscience, 6(3), 256–266. Démonet, J. F., Thierry, G., & Cardebat, D. (2005). Renewal of the neurophysiology of language: functional neuroimaging. Physiological Reviews, 85, 49–95. Desroches, A. S., Newman, R. L., & Joanisse, M. F. (2009). Investigating the time course of spoken word recognition: electrophysiological evidence for the influences of phonological similarity. Journal of Cognitive Neuroscience, 21(10), 1893–1906. Dorman, M. F. (1974). Auditory evoked potential correlates of speech sound discrimination. Perception and Psychophysics, 15(2), 215–220. Dumay, N., Benraïss, A., Barriol, B., Colin, C., Radeau, M., & Besson, M. (2001). Behavioral and electrophysiological study of phonological priming between bisyllabic spoken words. Journal of Cognitive Neuroscience, 13(1), 121–143. Fei-Fei, L., Iyer, A., Koch, C., & Perona, P. (2007). What do we perceive in a glance of a real-world scene? Journal of Vision, 7(1), 1–29 (article 10).
Friederici, A. D. (2012). The cortical language circuit: from auditory perception to sentence comprehension. Trends in Cognitive Sciences, 16(5), 262–268. Friedrich, C. K., Alter, K., & Kotz, S. A. (2001). An electrophysiological response to different pitch contours in words. NeuroReport, 12(15), 3189–3191. Gaskell, M. G., & Marslen-Wilson, W. D. (1997). Integrating form and meaning: a distributed model of speech perception. Language and Cognitive Processes 12(5–6), 613–656. Gaskell, M. G., & Marslen-Wilson, W. D. (2002). Representation and competition in the perception of spoken words. Cognitive Psychology, 45(2), 220–266. Goldinger, S. D. (1998). Signal detection comparisons of phonemic and phonetic priming: the flexible-bias problem. Perception and Psychophysics, 60(6), 952–965. Greenhouse, S. W., & Geisser, S. (1959). On methods in the analysis of profile data. Psychometrika, 24, 95–112. Hagoort, P., & Brown, C. M. (2000). ERP effects of listening to speech: semantic ERP effects. Neuropsychologia, 38(11), 1518–1530. Hickok, G., & Poeppel, D. (2007). The cortical organization of speech processing. Nature Reviews, 8, 393–402. Holcomb, P. J., & Neville, H. J. (1990). Auditory and visual semantic priming in lexical decision: a comparison using event-related brain potentials. Language and Cognitive Processes, 5(4), 281–312. Hsu, C. H., Tsai, J. L., Lee, C. Y., & Tzeng, O. J. L. (2009). Orthographic combinability and phonological consistency effects in reading Chinese phonograms: an eventrelated potential study. Brain and Language, 108(1), 56–66. Institute of Linguistics, Chinese Academy of Social Sciences. (2012). 现代汉语词典 (Modern Chinese Dictionary) (6th ed.). Beijing: Commercial Press. Jacquemot, C., & Scott, S. K. (2006). What is the relationship between phonological short-term memory and speech processing? Trends in Cognitive Sciences, 10(11), 480–486. Kutas, M., & Federmeier, K. D. (2011). Thirty years and counting: finding meaning in the N400 component of the event-related brain potential (ERP). Annual Review of Psychology, 62, 621–647. Kutas, M., & Hillyard, S. A. (1980). Reading senseless sentences: brain potentials reflect semantic incongruity. Science, 207(4427), 203–205. Kutas, M., & Hillyard, S. A. (1984). Brain potentials during reading reflect word expectancy and semantic association. Nature, 307, 161–163. Liu, Y., Liang, N. Y., Wang, D. J, Zhang, S. Y., Yang, T. Y., Jie, C. Y., et al. (1990). 现代汉语 常用词词频词典(Word Frequency dictionary of the Commonly used words in modern Chinese). Beijing: Yu Hang Press. in Chinese. Liu, Y., Perfetti, C. A., & Hart, L. (2003). ERP evidence for the time course of graphic, phonological, and semantic information in Chinese meaning and pronunciation decisions. Journal of Experimental Psychology: Learning, Memory, and Cognition, 29(6), 1231–1247. Liu, Y., Shu, H., & Wei, J. (2006). Spoken word recognition in context: evidence from Chinese ERP analyses. Brain and Language, 96(1), 37–48. Luce, P. A., & Pisoni, D. B. (1998). Recognizing spoken words: the neighborhood activation model. Ear and Hearing, 19(1), 1–36. MacGregor, L. J., Pulvermüller, F., van Casteren, M., & Shtyrov, Y. (2012). Ultra-rapid access to words in the brain. Nature Communications, 3, 711. Magnuson, J. S., Dixon, J. A., Tanenhaus, M. K., & Aslin, R. N. (2007). The dynamics of lexical competition during spoken word recognition. Cognitive Science, 31(1), 133–156. Malins, J. G., & Joanisse, M. F. (2012). 
Setting the tone: an ERP investigation of the influences of phonological similarity on spoken word recognition in Mandarin Chinese. Neuropsychologia, 50(8), 2032–2043. Marslen-Wilson, W. D. (1987). Functional parallelism in spoken word-recognition. Cognition, 25(1), 71–102. Martín-Loeches, M., Schweinberger, S. R., & Sommer, W. (1997). The phonological loop model of working memory: an ERP study of irrelevant speech and phonological similarity effects. Memory and Cognition, 25(4), 471–483. Mattys, S. L. (1997). The use of time during lexical processing and segmentation: a review. Psychonomic Bulletin and Review, 4(3), 310–329.
McCallum, W. C., Farmer, S. F., & Pocock, P. V. (1984). The effects of physical and semantic incongruities on auditory event-related potentials. Electroencephalography and Clinical Neurophysiology/Evoked Potentials Section, 59(6), 477–488. McClelland, J. L., & Elman, J. L. (1986). The TRACE model of speech perception. Cognitive Psychology, 18(1), 1–86. McQueen, J. M., & Cutler, A. (2001). Spoken word access processes: an introduction. Language and Cognitive Processes, 16(5–6), 469–490. Newman, R. L., & Connolly, J. F. (2009). Electrophysiological markers of pre-lexical speech processing: evidence for bottom-up and top-down effects on spoken word processing. Biological Psychology, 80(1), 114–121. Nunez, P. L., & Srinivasan, R. (2006). Electric fields of the brain: the neurophysics of EEG (2nd ed.). New York: Oxford University Press. O'Callaghan, C. (2008). Object perception: vision and audition. Philosophy Compass, 3(4), 803–829. Olichney, J. M., Yang, J.-C., Taylor, J., & Kutas, M. (2011). Cognitive event-related potentials: biomarkers of synaptic dysfunction across the stages of Alzheimer's disease. Journal of Alzheimer's Disease, 26(Suppl 3), S215–S228. Perre, L., Midgley, K., & Ziegler, J. C. (2009). When beef primes reef more than leaf: orthographic information affects phonological priming in spoken word recognition. Psychophysiology, 46(4), 739–746. Perrin, F., & García-Larrea, L. (2003). Modulation of the N400 potential during auditory phonological/semantic interaction. Cognitive Brain Research, 17(1), 36–47. Praamstra, P., Meyer, A. S., & Levelt, W. J. (1994). Neurophysiological manifestations of phonological processing: latency variation of a negative ERP component time-locked to phonological mismatch. Journal of Cognitive Neuroscience, 6(3), 204–219. Praamstra, P., & Stegeman, D. F. (1993). Phonological effects on the auditory N400 event-related brain potential. Cognitive Brain Research, 1(2), 73–86. Price, C. J. (2012). A review and synthesis of the first 20 years of PET and fMRI studies of heard speech, spoken language and reading. NeuroImage, 62(2), 816–847. Radeau, M., Besson, M., Fonteneau, E., & Castro, S. L. (1998). Semantic, repetition and rime priming between spoken words: behavioral and electrophysiological evidence. Biological Psychology, 48(2), 183–204. Radeau, M., Morais, J., & Segui, J. (1995). Phonological priming between monosyllabic spoken words. Journal of Experimental Psychology: Human Perception and Performance, 21(6), 1297–1311. Rugg, M. D. (1984). Event-related potentials and the phonological processing of words and non-words. Neuropsychologia, 22(4), 435–443. Schirmer, A., Tang, S. L., Penney, T. B., Gunter, T. C., & Chen, H. C. (2005). Brain responses to segmentally and tonally induced semantic violations in Cantonese. Journal of Cognitive Neuroscience, 17(1), 1–12. Slowiaczek, L. M., McQueen, J. M., Soltano, E. G., & Lynch, M. (2000). Phonological representations in prelexical speech processing: evidence from form-based priming. Journal of Memory and Language, 43(3), 530–560. Trébuchon, A., Démonet, J. F., Chauvel, P., & Liégeois-Chauvel, C. (2013). Ventral and dorsal pathways of speech perception: an intracerebral ERP study. Brain and Language, 127(2), 273–283. Van Den Brink, D., Brown, C. M., & Hagoort, P. (2001). Electrophysiological evidence for early contextual influences during spoken-word recognition: N200 versus N400 effects. Journal of Cognitive Neuroscience, 13(7), 967–985. Ventura, P., Morais, J., Pattamadilok, C., & Kolinsky, R. (2004).
The locus of the orthographic consistency effect in auditory word recognition. Language and Cognitive Processes, 19(1), 57–95. Zhao, J., Guo, J., Zhou, F., & Shu, H. (2011). Time course of Chinese monosyllabic spoken word recognition: evidence from ERP analyses. Neuropsychologia, 49(7), 1761–1770. Zou, L., Desroches, A. S., Liu, Y., Xia, Z., & Shu, H. (2012). Orthographic facilitation in Chinese spoken word recognition: an ERP study. Brain and Language, 123(3), 164–173.