Brain & Language 123 (2012) 30–41
The phonotactic influence on the perception of a consonant cluster /pt/ by native English and native Polish listeners: A behavioral and event related potential (ERP) study

Monica Wagner a,b,*, Valerie L. Shafer a, Brett Martin a, Mitchell Steinschneider c

a The City University of New York – Graduate School and University Center, Program in Speech-Language-Hearing Sciences, 365 Fifth Avenue, New York, NY 10016, USA
b St. John’s University, 8000 Utopia Parkway, Queens, NY 11439, USA
c Albert Einstein College of Medicine, Rose F. Kennedy Center, 1410 Pelham Parkway South, Bronx, NY 10461, USA
Article history: Accepted 15 June 2012. Available online 4 August 2012.

Keywords: Speech perception; Native-language phonotactics; Event-related potentials; Late positive component; P3a and P3b; English–Polish; Consonant clusters; /pt/ cluster; /st/ cluster
Abstract

The effect of exposure to the contextual features of the /pt/ cluster was investigated in native-English and native-Polish listeners using behavioral and event-related potential (ERP) methodology. Both groups experience the /pt/ cluster in their languages, but only the Polish group experiences the cluster in the context of word onset examined in the current experiment. The /st/ cluster was used as an experimental control. ERPs were recorded while participants identified the number of syllables in the second word of nonsense word pairs. The results showed that only the Polish listeners accurately perceived the /pt/ cluster, and perception was reflected within a late positive component of the ERP waveform. Furthermore, evidence of discrimination of /pt/ and /pEt/ onsets in the neural signal was found even for non-native listeners who could not perceive the difference. These findings suggest that exposure to phoneme sequences in highly specific contexts may be necessary for accurate perception.

© 2012 Elsevier Inc. All rights reserved.
* Corresponding author. Address: St. John’s University, Communication Sciences and Disorders, St. John’s Hall, Room 344 E1, 8000 Utopia Parkway, Queens, NY 11439, USA. Fax: +1 718 990 2435. E-mail address: [email protected] (M. Wagner).

http://dx.doi.org/10.1016/j.bandl.2012.06.002

1. Introduction

Evidence indicates that phonotactic sequences that do not occur in a language can be difficult to produce and perceive (Davidson & Stone, 2004; Dupoux, Hirose, Kakehi, Pallier, & Mehler, 1999; Halle, Segui, Frauenfelder, & Meunier, 1998). Studies examining behavioral perception suggest that the failure to accurately produce these sequences may, in part, be due to difficulty perceiving the information. For example, Japanese does not allow non-nasal consonants in syllable-final position. When presented with a word that violates this constraint, Japanese listeners misperceive the form by inserting a vowel (e.g., perceiving “ebuzo” for “ebzo”) (Dupoux et al., 1999). The misperception takes the form of a phonotactic pattern that does occur in the listener’s native language. For example, Halle et al. (1998) divided illegal onset clusters in French (i.e., /tl/ and /dl/ within nonsense words such as “tlabdo”) into ten acoustic segments that became progressively longer. The initial segment was approximately 10 ms and the final
segment was a whole syllable of approximately 190 ms in duration. Participants perceived the initial segments as dentals (i.e., /t/ or /d/), but once features of the /l/ segment appeared, the clusters were perceived as velar clusters (i.e., /kl/ or /gl/), which are acceptable phonotactic patterns in syllable onset in French.

In addition to native-language input, the psychoacoustic salience of phoneme contrasts appears to affect perception (Burnham, 1986; Strange & Shafer, 2008). Burnham (1986) proposed that phonological contrasts vary on a robust/fragile continuum that correlates with the degree of acoustic difference between sounds. Salience has also been considered in terms of sonority (Berent, Steriade, Lennertz, & Vaknin, 2007). Sonority is defined as “a scalar property of segments correlated with acoustic intensity. Louder segments (e.g., /l/) are more sonorous than quieter segments (e.g., /p/, /t/)” (Berent et al., 2007, p. 593). Examination of phonotactic structures across the world’s languages suggests that sonority rises from the onset consonant of a word to the vowel and then declines. For example, in “plank” sonority rises from -p- to -l- to -a- and then declines from -a- to -n- to -k. Phoneme sequences within lexical units that follow sonority rules are termed unmarked and are common in the world’s languages; phoneme sequences that do not follow sonority rules are termed marked and are rare. Berent and colleagues showed that English listeners’ accuracy in identifying
non-native nonsense words (e.g., “pnaef,” “ptaef,” “rpaef”) as having one syllable became significantly worse as the sonority of the consonant cluster types became more marked. This suggests that, in addition to the presence or absence of a phoneme sequence in the native language, salience is a factor influencing perception.

The current research examines the influence of exposure to differing native-language phonotactic patterns within a specific context to determine whether presence of a phoneme sequence within that context is a necessary prerequisite for speech perception. Isolating the native-language phoneme sequence within a particular context permits a natural investigation of the acoustic features of phonemes (i.e., the frequency and intensity of the signal over time) that change with context. These context-specific acoustic features are termed contextual features. As an example, when /p/ is heard in syllable onset (e.g., “pat”), the phone consists of a rapid rise in the first formant frequency during opening of the lips. When /p/, on the other hand, is heard in syllable coda (e.g., “tap”), the phone consists of a rapid fall in the first formant frequency, related to lip closure for the /p/ (Borden, Harris, & Raphael, 2003, p. 113). The current study investigates the consonant cluster /pt/, which occurs in both English and Polish, but only in Polish in word onset. Hence, we highlight the contextual features of the phoneme sequence.

1.1. Contextual features of phonemes

Contextual features are unique to each phoneme sequence or word because the phoneme’s combination of features depends on its context within the word and on the word’s pattern of stress. These features are the acoustic input of the native language.
Contextual feature processing underlying speech perception is feasible because features of phonemes in a word’s sound sequence constrain the possibilities in word recognition, partially explaining rapid lexical access. Furthermore, acoustic features are processed in core auditory cortex (Engineer et al., 2008; Mesgarani, David, Fritz, & Shamma, 2008; Steinschneider et al., 2005), supporting a role for contextual feature processing in speech perception. Research has demonstrated that sub-phonemic features influence speech perception (Connine & Pinnow, 2006; Diehl, Lotto, & Holt, 2004; Holt & Lotto, 2008; Luce, Goldinger, Auer, & Vitevitch, 2000; Vitevitch, 2007; Warren & Marslen-Wilson, 1987; see McQueen, Dahan, & Cutler, 2003, for review). As an example, Holt and colleagues demonstrated that perception of ambiguous syllables (/ga/ and /da/) was influenced by the preceding context (i.e., /ar/ versus /al/, or tones mimicking the syllables) (Diehl et al., 2004; Holt & Lotto, 2008) and, through manipulation of tonal frequencies, demonstrated that the relative frequency of the context, rather than the absolute value of the signal, altered perception (Holt, 2006). There is controversy, however, as to whether sub-phonemic features of the native language are the basic units that form perception (Connine & Pinnow, 2006) or whether these features are later integrated into phonemic units that eschew detailed contextual information and that directly serve perception. For example, Kharlamov et al. argue for a single phonological representation for the acoustically distinct forms [asna] and [astna] because the Russian language permits “optional deletion of the medial segment in word-internal three consonant clusters” (Kharlamov, Campbell, & Kazanina, 2011, p. 3333).
Russian listeners, in contrast to English listeners, were unable to accurately perceive the intervocalic contrast /stn/ versus /sn/ and did not show a pre-attentive neural discriminative response (mismatch negativity, MMN) to the contrast, even though the contrast occurs in the Russian language. While both /stn/ and /sn/ are present in Russian in intervocalic position, the authors acknowledged that deletion of /t/ in the /stn/ sequence is more frequent and robust in Russian than in English. They state, “Notably, in many cases where a [t] was apparent
in the spectrogram (for Russian listeners), the realization of the segment was quite weak and difficult to identify on the basis of auditory analysis alone” (p. 3340). Hence, the acoustic input of their native language may explain the failure of the Russian listeners to perceive the /stn/ vs. /sn/ contrast. A phonological-level representation, as suggested by Kharlamov et al., presumes integration of contextual features for perception, whereby detailed features are not accessed for perception.

Data from behavioral measures alone do not sufficiently explain poor perception of non-native phonotactic sequences. Specifically, it is possible that listeners can perceive the non-native /pt/ onset, but that the nature of the task precludes demonstrating this ability. Electrophysiological measures can complement behavioral measures and provide objective information regarding the time course of processing underlying the subjective behavioral perception.

1.2. ERP correlates of speech processing

Few studies of phonological processing have used priming-type designs to examine access to phonological information of word forms. One reason to use such a design is that several studies suggest that the N400 component and the late positive component (LPC) can index phonological discrimination. The N400 has generally been interpreted as an index of lexical access to semantic information (Holcomb, Grainger, & O’Rourke, 2002; McCallum, Farmer, & Pocock, 1984; Nobre & McCarthy, 1994). However, in a few studies N400 modulation has been observed to phonological differences between spoken words, with a greater negativity elicited for words (and nonsense words) that are more different from a phonological prime (Praamstra, Meyer, & Levelt, 1994; Praamstra & Stegeman, 1993; Shafer, Schwartz, & Kessler, 2002).
Both task and lexical status of the words (real or nonsense) appear to influence whether an N400-like modulation will be observed (Nagy & Rugg, 1989; Praamstra & Stegeman, 1993; Praamstra et al., 1994; Rugg, 1987; Shafer et al., 2002). Phonological differences between pairs of words in a same-different task (Shafer et al., 2002) and in a rhyme-judgment task (Praamstra et al., 1994), regardless of lexical status of the words, resulted in N400 modulation. However, phonological differences between word pairs in a lexical decision task led to N400 modulation only when illegal clusters were in offset and not onset position (e.g., “peil–dleil”; “peil” is translated as “level” and “dleil” is a non-word) (Praamstra & Stegeman, 1993). Praamstra and Stegeman suggested that listeners did not need to process the illegal-onset word beyond the onset cluster to determine whether it was a real or a nonsense word, and thus aborted the lexical access process in this task. These findings suggest that N400 modulation will only be observed if the task requires listeners to process a word fully.

Tasks using nonsense words are less influenced by lexical factors than those using real words (Nagy & Rugg, 1989; Rugg, 1987; Shafer et al., 2002; Vitevitch & Luce, 1998, 1999) and are more affected by segmental measures, including phonotactic probability (Vitevitch & Luce, 1998). Phonotactic probability is an indirect measure of lexical information, however, because it is derived from the statistical probabilities of phoneme sequences of words in an individual’s lexicon. Illegal phoneme sequences have a phonotactic probability of zero in the lexicon. The late positive component has also been elicited in tasks of lexical processing.
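As a concrete illustration of the phonotactic-probability notion just described, a positional probability can be sketched over a toy lexicon. This is only in the spirit of the Vitevitch and Luce (2004) calculator, not that tool (which uses a 20,000-word database and phonemic transcriptions); the mini-lexicon and orthographic segments below are assumptions of the sketch.

```python
# Sketch: positional biphone probability over a toy lexicon.
# An onset absent from every lexical entry gets probability zero.
from collections import Counter

LEXICON = ["pat", "tap", "stop", "step", "pet", "slept", "sat"]  # illustrative

def onset_biphone_probability(biphone):
    """Proportion of lexicon entries whose first two segments match."""
    onsets = Counter(word[:2] for word in LEXICON)
    return onsets[biphone] / len(LEXICON)

print(onset_biphone_probability("st"))  # /st/ occurs word-initially
print(onset_biphone_probability("pt"))  # illegal onset -> 0.0
```

In a real calculator the probabilities are computed over phoneme transcriptions and weighted by word frequency, but the zero-probability property of an illegal onset holds in the same way.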
Studies of repetition priming using orthographically presented word and nonsense word pairs found a late positive component (LPC) that was more positive in response to the second word of different pairs relative to the second word of same pairs (Nagy & Rugg, 1989; Rugg, 1987). Furthermore, a larger LPC effect in response to real words relative to nonsense words suggested that the LPC reflected a process related to lexical form
(Rugg, 1987). A few studies examining phonological contrasts within spoken words have observed a late positive component (LPC) that increased in amplitude with a greater phonological difference (Dehaene-Lambertz, Dupoux, & Gout, 2000). The LPC is a positive wave peaking at 300 ms or later and has been shown to reflect a conscious process that occurs following perceptual analysis of stimuli and preceding response selection (Verleger, Jaskowski, & Wascher, 2005). The LPC waveform is derived from activity in multiple brain sources (Bledowski, Prvulovic, Goebel, Zanella, & Linden, 2004; Steinschneider & Dunn, 2002) and is affected by numerous variables involved in stimulus evaluation and response selection (Leuthold & Sommer, 1998), including stimulus value and complexity, response certainty, and attention (Kok, 2001; Picton & Hillyard, 1974; Squires, Hillyard, & Lindsay, 1973).
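In practice, components such as the LPC are quantified as mean amplitude over a late time window at a subset of electrode sites (the present study reports a 500–700 ms voltage analysis at parietal sites). A minimal sketch on synthetic data follows; the sampling rate matches this study, but the epoch shape and the site indices are illustrative assumptions, not the authors' montage.

```python
# Sketch: mean-amplitude quantification of a late component.
import numpy as np

FS = 250           # sampling rate in Hz, as in this study
BASELINE_MS = 200  # epoch begins 200 ms before stimulus onset

def mean_amplitude(epoch, sites, t_start_ms, t_end_ms):
    """Mean voltage over `sites` (rows of a channels x samples array)
    in [t_start_ms, t_end_ms] relative to stimulus onset."""
    s0 = (BASELINE_MS + t_start_ms) * FS // 1000
    s1 = (BASELINE_MS + t_end_ms) * FS // 1000
    return epoch[sites, s0:s1].mean()

rng = np.random.default_rng(0)
epoch = rng.normal(size=(65, 450))               # 65 channels, 1800 ms epoch
lpc = mean_amplitude(epoch, [33, 37], 500, 700)  # hypothetical parietal sites
```

Condition effects are then tested by comparing such window means across same/different targets and groups, as in the ANOVAs reported below.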
We hypothesized that the Polish listeners would determine whether pt nonsense words had two or three syllables with greater accuracy than the English listeners, while both groups should perform the task for the st nonsense words with the same level of accuracy. We predicted an N400 to the target word in different pairs, compared to the target word in same pairs, only if the phonological contrast was linguistically relevant for the language group (Praamstra & Stegeman, 1993; Praamstra et al., 1994; Shafer et al., 2002). We also hypothesized that the phonological contrast within our nonsense word stimuli might be reflected in the LPC (Dehaene-Lambertz et al., 2000) only for the language group for which the contrast was linguistically relevant.
1.3. Overview

The consonant cluster /pt/ in word onset was examined in native-English and native-Polish listeners to determine whether language-specific phonotactic patterns are a necessary prerequisite for accurate perception and to determine when in time the absence of the onset cluster affected neural correlates of processing this information. A phonotactic generalization of English is that the /pt/ consonant cluster is not allowed in word onset; the probability of /pt/ occurring in word onset in English is zero using a phonotactic probability calculator based on a 20,000-word database (Vitevitch & Luce, 2004). In contrast, the /pt/ cluster occurs at the end of words (e.g., “except”) and as a regular and irregular past tense form in words (e.g., “jumped”; “slept”). In conversational speech, English listeners hear sequences of phonemes such as ptina in sentences like “He slept in a bed”; however, sequences of phonemes like ptina are never heard without a preceding vowel.

Nonsense word stimuli in the current experiment were presented within same and different word pairs. Different word pairs (e.g., ptima–pEtima/“pteema–peteema”) contain an onset contrast that occurs in the Polish language, but not the English language. In Polish, the /pt/ cluster occurs in word onset (e.g., “ptak,” translated as “bird”). The contrast sequence within the nonsense words, /pEt/, occurs in both English and Polish (e.g., “petunia”; “petycja,” translated as “petition”; “Petronella,” a Polish fairy tale character). The consonant cluster /st/ (e.g., “stop”; “stal,” translated as “steel”) and the phoneme sequence /sEt/ (e.g., “sateen”; “setka,” translated as “hundred”) occur in both languages in word onset and were examined as experimental controls.

A behavioral task conducted while recording ERPs required participants to determine whether the target nonsense word (i.e., the second word in each pair) had two or three syllables. This design was chosen to ensure that participants listened to the whole word. The ERP phonological priming design, in which participants heard prime–target nonsense word pairs, was selected because comparison of the neural response to the target words of the same and different word pairs enabled us to determine the stages at which the phonological contrast was detected by the brain. Specifically, when the target word is repeated in the same pairs with an inter-stimulus interval (ISI) of 250 ms, neurons are in a relatively refractory state, and neural firing in response to the sounds and features of the “same word” is reduced (Sussman, Steinschneider, Gumenyuk, Grushko, & Lawson, 2008). In the different conditions, if the phonological contrast is registered, then new neural firing should occur in response to the features of the target word that are different. If a listener detects the contrast within the different word pairs, we reasoned, based on the literature, that detection of the contrast should be reflected within ERP components indexing phonological/lexical processing (N400) and conscious processing (LPC).

2. Method

Participants: Thirteen native-English listeners (8 female) between the ages of 21 and 35 years (mean = 29 years) and 12 native-Polish listeners (8 female) between the ages of 23 and 34 years (mean = 30 years) were included in the study. Data from one English participant were excluded from ERP analysis because the participant passed the hearing screening in one ear only; ERP analysis was therefore conducted on 12 participants from each language group. The Polish listeners were bilingual Polish–English speakers who had emigrated from Poland to the United States as young adults, after 15 years of age. The English listeners reported no exposure to Polish or other Slavic languages that contain /pt/ as a consonant cluster in word onset. Participants had no history of speech, language, or cognitive impairment, and all passed a hearing screening at 25 dB HL. Two Polish and one English participant were left-handed; all others were right-handed.

Stimuli: Two- and 3-syllable nonsense words beginning with /pt/, /pEt/, /st/, and /sEt/ were recorded (Table 1). The stimuli were potential real words in Polish and English, with the exception of the nonsense words beginning with /pt/, which is an illegal phonotactic form in English. The vowels in the penultimate syllable of the nonsense words are present in both Polish and English. Nonsense words in the pt and st conditions were matched for rhyme as closely as possible: each nonsense word in the pt conditions had a counterpart matched for rhyme in the st conditions (e.g., ptesa, pEtesa, stesa, sEtesa) or a counterpart closely matched for rhyme (e.g., ptIva, pEtIva, stIfa, sEtIfa). The stimuli were recorded by a male bilingual Polish–English speaker who came to the United States at the age of six years. The stimuli were recorded at a sampling rate of 22,050 Hz; selected nonsense word stimuli were copied to separate files, and preceding and following silence intervals were removed. Selected stimuli were normalized to 20 dBVU and DC offset was set to zero. The mean and range of duration for each word type are as follows: 3-syllable pt words, mean = 552 ms, range 481–675 ms; 2-syllable pt words, mean = 490 ms, range 417–604 ms; 3-syllable st words, mean = 698 ms, range 633–801 ms; and 2-syllable st words, mean = 622 ms, range 551–698 ms. The mean durations of the /st/ and /pt/ onset clusters reported in Table 2 differed by 97 ms, and the /sEt/ and /pEt/ onset segments differed by 112 ms (p < .01). The number of pitch periods for the vowel was significantly different for the /sEt/ and the correctly perceived
Table 1
Eight experimental conditions: 4 pt and 4 st conditions.

Condition      pt                  st
SAME 2         /ptuka–ptuka/       /stesa–stesa/
SAME 3         /pEtima–pEtima/     /sEtIla–sEtIla/
DIFFERENT 2    /pEtuka–ptuka/      /sEtesa–stesa/
DIFFERENT 3    /ptima–pEtima/      /stIla–sEtIla/
Table 2
Mean duration of each onset segment in milliseconds (standard deviations in parentheses).

/st/          /pt/          /sEt/        /pEt/
232 (23.5)    135 (11.6)    281 (24)     168 (16)
/pEt/ sequences (i.e., /sEt/ = 3.42, range 1–5; /pEt/ = 2.53, range 1–5; p < .01).

Four pt and four st conditions were created, each consisting of 100 nonsense word pairs (Table 1). Words within the same pairs were different productions of the same word, and words within same/different pairs were matched for pitch contour. Word pairs were delivered with a 250 ms inter-stimulus interval (ISI) and a 2000 ms inter-trial interval (ITI). E-Prime software (version 1.1) was used for stimulus presentation and data collection. Stimuli were delivered in a sound-treated, electrically shielded room through speakers, at an intensity ranging from 51 to 63 dB SPL (sound level meter: B&K 2203 with a B&K 4144 microphone, “A” weighting).

2.1. Behavioral procedure

Participants were instructed to decide whether the second word in the pair (i.e., the target) had two or three syllables. A response key labeled “two” on the left side and “three” on the right side of the keypad was pressed for 2- or 3-syllable words. Participants received verbal instruction and practice with feedback on twenty nonsense pairs. Word pairs for the experimental portion were presented in 10 blocks of 80 pairs each. Stimulus pairs were randomized within blocks, and the blocks were presented in random order.

Mean, median, and standard deviation (SD) scores were calculated for the behavioral data. The Fisher exact test, similar to chi-square but appropriate for small samples (Siegel & Castellan, 1988), was used to examine differences in behavioral accuracy between the groups. Behavioral judgments were also transformed using the A′ calculation. A′ is a non-parametric analogue of d′ and is a measure of syllable judgment accuracy that takes response bias into account by incorporating both correct responses and false alarms (Macmillan & Creelman, 2004). T-tests were used for group comparisons of accuracy.

2.2. EEG data acquisition

The Geodesic System (Electrical Geodesics, Inc.) with a 65-channel Sensor Net of silver/silver-chloride (Ag/AgCl) plated electrodes encased in soft sponges was used for data collection. Prior to recording, we confirmed that impedance levels were at or below 40 kΩ, an acceptable level for the high-impedance amplifiers used (Ferree, Luu, Russell, & Tucker, 2001). Electrodes placed above and below the eyes monitored eye movements and eye blinks. The EEG was recorded at a sampling rate of 250 Hz, bandpass filtered between 0.1 and 30 Hz, and referenced to Cz. The continuous EEG was then processed for segmenting, averaging, artifact rejection, and baseline correction using Net Station software (version 4.1.2). The continuous recording was segmented into epochs consisting of a 200 ms baseline and a 1600 ms post-onset segment. Artifact rejection was set at ±70 µV. Data from bad channels were replaced by spline interpolation. Data were baseline corrected from −100 ms to 0 ms and re-referenced to an average reference.
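The preprocessing steps described above (epoching, rejection at ±70 µV, baseline correction over −100 to 0 ms, and re-referencing to the average reference) can be sketched on synthetic data. Net Station's exact algorithms are not reproduced here; the helper below is a simplified illustration, and the channel count and epoch length follow the description in the text.

```python
# Sketch: epoch-level EEG preprocessing as described in the Method.
import numpy as np

FS = 250              # sampling rate, Hz
PRE, POST = 0.2, 1.6  # 200 ms baseline + 1600 ms post-onset epoch

def preprocess(epoch):
    """epoch: channels x samples array (uV), sample 0 at -200 ms.
    Returns a cleaned copy, or None if the artifact criterion is exceeded."""
    if np.abs(epoch).max() > 70.0:  # artifact rejection at +/-70 uV
        return None
    b0, b1 = int(0.1 * FS), int(0.2 * FS)  # the -100..0 ms window
    epoch = epoch - epoch[:, b0:b1].mean(axis=1, keepdims=True)  # baseline
    return epoch - epoch.mean(axis=0, keepdims=True)  # average reference

rng = np.random.default_rng(1)
raw = rng.normal(scale=5.0, size=(65, int((PRE + POST) * FS)))
clean = preprocess(raw)
# after re-referencing, the mean across channels is ~0 at every sample
```

Spline interpolation of bad channels, performed before re-referencing in the actual pipeline, is omitted here for brevity.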
2.3. ERP data analysis

Four comparisons were analyzed in the ERP data in response to the target word stimuli. The ERP response to the target word of the same pairs (e.g., pt same, 3-syllable, /pEtima–pEtima/) was compared to the response to the target word of the different pairs (e.g., pt different, 3-syllable, /ptima–pEtima/). The target word in these two stimulus conditions is the same; the only difference is what precedes the target word, a different production of the same word or a different word. A greater response to the target word of the different pair relative to the same pair would reflect detection of the sound contrast within the different pair.

2.4. Global field power (GFP) analysis

Global field power (GFP) was calculated to determine temporal regions of interest (Lehmann & Skrandies, 1980; Michel et al., 2004; Murray, Brunet, & Michel, 2008; Shafer, Ponton, Datta, Morr, & Schwartz, 2007; Skrandies, 1990). In the current study, GFP is the standard deviation of the amplitude calculated from all 65 channels at each data point across time. Mixed ANOVAs with condition (same, different), time (48 ms time windows), and group (English, Polish) were used to examine the GFP data.

2.5. Current source density (CSD), voltage analysis

Current source density (CSD) values were obtained by calculating the second spatial derivative of the voltage potential (Pernier, Perrin, & Bertrand, 1989) using Brain Electric Source Analysis (BESA EEG V5.1; 2005). This operation emphasizes shallow-depth current flow, which is likely to correspond to sources in neocortex, and it reflects current sinks and sources within auditory cortex more accurately than raw voltage (Pernier et al., 1989; Steinschneider, Kurtzberg, & Vaughan, 1992; see Steinschneider, Liegeois-Chauvel, & Brugge, 2011, chapter 25, for review). CSD maps were constructed from this operation via spline interpolation and were used to identify spatial regions corresponding to the time intervals of interest identified through GFP.

The LPC was examined at midline-posterior parietal sites (sites 34 and 38), right-posterior parietal sites (sites 42 and 46), and left-posterior parietal sites (sites 29 and 28). A fronto-central positivity that was larger for the different targets peaked at 400 ms (P400); this P400 was examined at the midline-frontal site 4, left-frontal site 9, and right-frontal site 58. In addition, CSD maps revealed an increased temporal negativity (TN) to the different targets that peaked at 350 ms at left and right temporal sites (site 20, left-anterior temporal, AL; site 24, left-posterior temporal, PL; site 56, right-anterior temporal, AR; and site 52, right-posterior temporal, PR). Fig. 1 shows the 64-channel Sensor Net map (Electrical Geodesics, Inc.).

To allow comparison of the Laplacian-based analysis to previously published results, ANOVAs were also performed on raw voltage amplitude values between 500 and 700 ms, using 100 ms time intervals, for the LPC in response to the pt contrast (3-syllable). Mixed ANOVAs with group (12 native-English and 12 native-Polish participants) as the between-subject factor and stimulus, time, and site as within-subject factors were undertaken. Significant three-way interactions were followed with a step-down analysis, and two-way interactions with Tukey’s HSD post hoc tests (p < .05). Greenhouse–Geisser correction was applied for three or more levels of site.

3. Results

3.1. Behavioral results

Behavioral data revealed that the English and Polish listeners identified the st nonsense words as having 2 or 3 syllables with
Table 4
Mean A′ values for the English and Polish groups in response to the st and pt stimuli. Language-group differences were significant for all pt conditions.

                            st                   pt
Condition              Polish   English     Polish   English
2-Syllable same pairs   .965     .948        .738     .545
2-Syllable diff pairs   .983     .973        .782     .541
3-Syllable same pairs   .955     .950        .738     .545
3-Syllable diff pairs   .982     .971        .784     .542
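The A′ values reported in Table 4 are a signal-detection measure of sensitivity. A commonly used closed form is Grier's formula; the sketch below covers only the case where the hit rate is at least the false-alarm rate, and the authors may have used a variant (the study cites Macmillan & Creelman, 2004), so treat this as an illustration rather than their exact computation.

```python
# Sketch: A' (A-prime), a non-parametric sensitivity index,
# for hit rate H >= false-alarm rate F (Grier's formula).
def a_prime(hit_rate, fa_rate):
    h, f = hit_rate, fa_rate
    if h == f:
        return 0.5  # chance-level sensitivity
    assert h > f, "this sketch only covers H >= F"
    return 0.5 + ((h - f) * (1 + h - f)) / (4 * h * (1 - f))

print(a_prime(0.9, 0.1))  # high sensitivity -> close to 1
print(a_prime(0.5, 0.5))  # chance -> 0.5
```

A′ ranges from 0.5 (no discrimination) to 1.0 (perfect discrimination), which is why the English pt values near .54 in Table 4 indicate near-chance performance.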
Fig. 1. Geodesic Net of 64 electrodes and VREF (electrode 65) (Electrical Geodesics, Inc.).

Table 3
Mean percentage correct, median percentage correct, and standard deviation (SD) for the syllable identification task.

                   ST conditions          PT conditions
                   Polish    English      Polish    English
SAME 2
  Mean              98         97           90        43
  Median            99         99           97        49
  SD                1.8        4.7          18.4      33
DIFFERENT 2
  Mean              99         98           94        42
  Median            99         99           98        50
  SD                0.9        3.2          8.4       30.1
SAME 3
  Mean              99         98           81        85
  Median            100        99           82        90
  SD                2.4        2.2          7.9       20.4
DIFFERENT 3
  Mean              99         99           84        82
  Median            100        100          83        88
  SD                1.1        1.8          5.6       20.8

at least 97% accuracy, indicating that they differentiated the 2- and 3-syllable st word forms with ease (non-parametric Fisher exact test, 90% criterion, p = .48) (Table 3). Furthermore, small standard deviation values for the st conditions suggest a consistent pattern of response by all listeners. The behavioral response to the pt stimuli revealed that most of the Polish group identified the 2- and 3-syllable pt words with greater than 80% accuracy and with a consistent pattern of response. In contrast, all but one of the English listeners were unable to distinguish the 2- and 3-syllable pt words, and most showed a bias toward hearing all the word forms as 3-syllable (80% criterion, non-parametric Fisher exact test, p = .005). A comparison of the language groups’ A′ values found that only the Polish group was able to differentiate the 2- and 3-syllable pt words (between-group t-test on A′ values, p < .01) (Table 4).

3.2. ERP results
3.2.1. Global field power (GFP)

ERP analysis was initiated by computing the global field power (GFP) across all electrode sites. GFP is an objective means of identifying peaks in the AEP waveforms and their latencies. Analysis of the experimental control conditions (st 3-syllable) found two peaks of activity that showed greater power for the different target relative to the same target in the English and Polish listeners: a sharp peak between 280 and 424 ms, followed by a dominant peak between 472 and 712 ms (F (4, 88) = 4.149, p = .004, partial eta squared = .159, Tukey HSD, p < .05; F (1, 22) = 3.662, p = .006, partial eta squared = .294, respectively) (Fig. 4). The 2-syllable st conditions showed one large peak beginning at 375 ms, greater for the different target than the same for both language groups (376–424 ms: F (1, 22) = 12.648, p = .002, partial eta squared = .365; 472–712 ms: F (1, 22) = 4.058, p = .056, partial eta squared = .156). The effect of condition in the 2-syllable conditions, however, was larger for the English group than for the Polish group (376–424 ms: F (1, 11) = 9.929, p = .009, partial eta squared = .474; 472–712 ms: F (1, 22) = 4.382, p = .048, partial eta squared = .166). GFP analysis of the 3-syllable pt conditions found a similar sharp peak between 232 and 424 ms, greater for the different target than the same, for the Polish group but not the English group (Polish: F (1, 11) = 6.565, p = .026, partial eta squared = .374; English: p = .82) (Fig. 5). Analysis of the 2-syllable pt conditions found the different target to be larger than the same target for the Polish group, but not the English group, in the 424 ms interval (F (1, 22) = 5.021, p = .035, partial eta squared = .186). The dominant peak at 500 ms did not show significant effects for the pt conditions.
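GFP as defined in the Method is simply the standard deviation of voltage across all 65 channels at each time sample; peaks in that curve mark the latencies analyzed above. A minimal sketch on synthetic data (the channel count follows this study; the epoch length is illustrative):

```python
# Sketch: global field power = per-sample standard deviation across channels.
import numpy as np

def gfp(erp):
    """erp: channels x samples array -> GFP value per time sample."""
    return erp.std(axis=0)

rng = np.random.default_rng(2)
erp = rng.normal(size=(65, 450))  # 65 channels, synthetic epoch
power = gfp(erp)                  # one non-negative value per sample
peaks = np.argsort(power)[-5:]    # sample indices of the largest GFP values
```

In the study, condition and group effects were then tested on GFP within 48 ms windows, rather than on the raw per-sample values.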
In summary, GFP revealed two peaks of activity in the ERP waveform to the st and pt contrasts: (1) a sharp peak between 232 and 424 ms and (2) a large dominant peak between 472 and 712 ms.

3.2.2. Cortical ERP responses between 232 and 424 ms

Spatial regions corresponding to the GFP peak latencies were determined from grand-mean current source density (CSD) maps and analyzed using CSD-transformed values. In addition, CSD values from temporal sites were analyzed because language-related differences may be reflected at lateral temporal sites (Friedrich, Schild, & Roder, 2009), and secondary auditory cortical activity on the lateral surface of the superior temporal gyrus has a small projection area on the scalp surface and, therefore, may not be evident in the overall GFP. The GFP peak between 232 and 424 ms was associated with a temporal negativity (TN) and a fronto-central positivity (P400) in CSD maps. The expected increased negativity to the different targets over posterior parietal sites (N400) was not seen in the CSD maps.

CSD TN: A greater left-posterior temporal negativity to the different target relative to the same was found in response to the experimental control, the 3-syllable st contrast, for English and Polish listeners (F (3, 66) = 3.053, p = .048; partial eta squared = .122) in
M. Wagner et al. / Brain & Language 123 (2012) 30–41
(a)
35
CSD TOPOGRAPHICAL MAPS OF THE TEMPORAL NEGATIVITY Polish English
(b)
Fig. 2. CSD maps depict the language groups’ greater TN (a) to the 3-syllable st different target (lower) relative to the st same target (upper) and (b) to the 3-syllable pt different target (lower) relative to the pt same target (upper) blue = negative, red = positive.
the time interval between 328 and 376 ms (F (1, 22) = 5.788; p = .025; partial eta squared = .208; Tukey HSD, p < .05) (Fig. 2a). Significance was not found for the 2-syllable st conditions. The same analyses for the 3-syllable pt conditions found a greater negativity to the different target at bilateral temporal sites (AL, AR, PL, PR) for the English and Polish listeners between 328 and 376 ms (F (1, 22) = 8.783; p = .007; partial eta squared = .285; Tukey HSD, p < .05) (Fig. 2b). The response to the 2-syllable pt conditions in the English group was greater to the different target than the same target at the left-posterior temporal site in the 376 ms time interval (F (1, 11) = 6.623; p = .026; partial eta squared = .376; Tukey HSD p < .05). Importantly, native-language speech perception differences for the /pt–pEt/ contrast were not found to be reflected within the TN. CSD P400: CSD maps revealed a focal fronto-central positivity in response to the 3-syllable /st–sEt/ contrast in both the English and Polish listeners for the time interval between 376 and 424 ms. The positivity was significant only at the midline site (F (2, 44) = 4.897; p = .015; partial eta squared = .176; Tukey HSD, p < .05), as shown in Fig. 3a. The response to the 2-syllable st conditions did not show significant main effects or conditions. The response to 3-syllable pt conditions showed a more right lateralized positivity in CSD maps for both the English and Polish listeners during the same time period (Fig. 3b). The 3-syllable pt different target was greater than the same target only from the right and midline sites (Right: F (1, 22) = 7.603; p = .011; partial eta squared = .257; Midline: F (1, 22) = 5.372; p = .030) with no interactions including group. Analysis of the midline site found that the Polish group showed an earlier positivity at 376 ms relative to the English group (condition x time interaction: F (1, 22) = 4.285; p = .05; partial eta squared = .163). 
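The CSD (scalp current density) transform used for these maps sharpens voltage topographies by estimating the surface Laplacian of the scalp potential. Published CSD maps are typically computed with spherical-spline methods (e.g., Pernier, Perrin, & Bertrand, 1989); purely as an illustration of the idea, Hjorth's nearest-neighbor approximation subtracts the mean of each electrode's neighbors. The electrode layout and values below are hypothetical:

```python
import numpy as np

def hjorth_csd(voltages, neighbors):
    """Nearest-neighbor (Hjorth) approximation to the surface Laplacian:
    CSD at each electrode = its voltage minus the mean of its neighbors.
    `voltages`: 1-D array, one value per electrode.
    `neighbors`: dict mapping electrode index -> list of neighbor indices.
    This is only a sketch; spline Laplacians are used in practice."""
    csd = np.empty_like(voltages, dtype=float)
    for i, nbrs in neighbors.items():
        csd[i] = voltages[i] - np.mean([voltages[j] for j in nbrs])
    return csd

# Toy 3-electrode line: the middle electrode neighbors the outer two.
v = np.array([1.0, 3.0, 1.0])
neighbors = {0: [1], 1: [0, 2], 2: [1]}
csd = hjorth_csd(v, neighbors)  # the middle site stands out as a current source
```

Because the Laplacian emphasizes local current sources and sinks, focal generators such as the fronto-central P400 become easier to separate from broad far-field activity.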
The earlier positivity to the /pt–pEt/ contrast by the Polish listeners may reflect an effect of language experience. The 2-syllable condition response did not reach significance.

In summary, ERP waveforms for both the Polish and English listeners showed a temporal negativity that peaked at 350 ms and a fronto-central positivity that peaked at 400 ms to both the /st–sEt/ and /pt–pEt/ contrasts. Figs. 4 and 5 depict ERP waveforms for the st and pt contrasts, respectively. The language group differences observed in the GFP to the pt contrast in the 232–424 ms time interval were not reflected independently at the fronto-central or temporal sites, except that the P400 amplitude was greater for the different compared to the same target in an earlier interval for the Polish than the English listeners.

3.2.3. Late latency ERP responses

CSD LPC: The late latency peak in GFP corresponded to a posterior parietal positivity in the CSD maps, evident in Fig. 6. English and Polish listeners showed a greater response to the 3-syllable st different target relative to the same target between 424 and 712 ms only at the midline and right-posterior parietal sites (Midline: F (1, 22) = 4.607; p = .043; partial eta squared = .173; Right: F (1, 22) = 12.525; p = .002; partial eta squared = .363). Also, the English participants showed a larger response at the right-posterior parietal sites than the Polish participants (F (1, 22) = 6.421; p = .019; partial eta squared = .226; Tukey HSD, p < .05). The topography of the 2-syllable conditions differed from that of the 3-syllable conditions: the different target was greater than the same for English and Polish listeners at the left-posterior parietal sites for the 2-syllable st conditions (F (2, 44) = 6.487; p = .004; partial eta squared = .228; Tukey HSD, p < .05) between 472 and 616 ms (F (6, 132) = 3.918; p = .021; partial eta squared = .151; Tukey HSD, p < .05). CSD analysis of the 3-syllable /pt–pEt/ contrast revealed a more positive response to the different target relative to the same for only the Polish group in the time interval between 376 and 712 ms at the posterior parietal sites (Condition (2) × Time (8) × Laterality (3) × Group: F (1, 22) = 3.96; p = .059; partial eta squared = .145) (Fig. 6b).
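The partial eta squared effect sizes reported alongside each F statistic can be recovered directly from the F value and its degrees of freedom, which makes the reported pairs easy to cross-check. A quick verification against two values reported above:

```python
def partial_eta_squared(f_value, df_effect, df_error):
    """Partial eta squared from a reported F statistic:
    eta_p^2 = (F * df_effect) / (F * df_effect + df_error)."""
    return (f_value * df_effect) / (f_value * df_effect + df_error)

# Midline LPC effect: F(1, 22) = 4.607, reported partial eta squared = .173
midline = partial_eta_squared(4.607, 1, 22)

# Right-parietal LPC effect: F(1, 22) = 12.525, reported partial eta squared = .363
right = partial_eta_squared(12.525, 1, 22)
```

Rounded to three decimals, both computed values match the values reported in the text.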
Analysis of the 2-syllable pt conditions did not reveal significant results.

An analysis of raw voltage amplitudes for the LPC was conducted for comparison with previous literature (Dehaene-Lambertz et al., 2000; Praamstra & Stegeman, 1993; Praamstra et al., 1994) using amplitude values for the same and different pt targets (3-syllable) across 100 ms time windows. This analysis revealed that behavioral perception of the /pt–pEt/ contrast was reflected within the LPC, as the Polish group showed a greater response to the different target relative to the English group (Condition (2) × Time (500, 600 ms) × Laterality (3) × Group: F (1, 22) = 6.34; p = .02; partial eta squared = .224) (Fig. 5). Furthermore, comparison of the st and pt difference waves (3-syllable conditions) for the late positivity revealed that the English listeners treated the st and pt contrasts differently, but the Polish listeners did not (F (1, 22) = 4.013; p = .058; partial eta squared = .154; Tukey HSD: English p < .05; Polish p = .914).

Fig. 3. CSD maps depict the language groups' greater fronto-central positivity (arrow) (a) to the 3-syllable st different target (lower) relative to the st same target (upper) and (b) to the 3-syllable pt different target (lower) relative to the pt same target (upper). Notice the right lateralization only for the pt condition (blue = negative, red = positive).

Fig. 4. ERP voltage waveforms in response to the 3-syllable st contrast. Notice the larger TN, P400 and LPC to the different target for the Polish and English participants (positive = up; vertical line = target word onset; horizontal line at zero crossing). Boxed graphs in the upper corners depict GFP.

Fig. 5. ERP voltage waveforms to the 3-syllable pt contrast. The TN, P400 and LPC are labeled. Note the larger LPC to the different target at the posterior parietal sites for the Polish participants, which was not found for the English participants (positive = up; vertical line = target word onset; horizontal line at zero crossing). Boxed graphs in the upper corners depict GFP.

Fig. 6. CSD maps depict the posterior parietal positivity of the LPC, greater for the different targets relative to the same targets, (a) for both language groups in response to the 3-syllable st contrast and (b) only for the Polish group in response to the 3-syllable pt contrast (blue = negative, red = positive).

4. Discussion

The current research asked whether exposure to the word onset /pt/ was a necessary prerequisite for perception and when in
time the absence of the onset cluster was reflected in neural processing of this information. We found that Polish listeners were able to distinguish nonsense words having /pt/ and /pEt/ in word onset, whereas English listeners were not. Experience with word endings containing /pt/ and /pEt/ in the language (e.g., "trumped" and "trumpet") was not sufficient for English listeners to accurately perceive the word onset /pt/ in nonsense words. These results suggest that exposure to phoneme sequences in highly
specific contexts (e.g., onsets) is necessary for accurate perception. Native-language patterns of speech perception were reflected in a late positive component (LPC) of the ERP waveform, and language group differences were found at earlier stages of processing. Furthermore, we found evidence of discrimination of /pt/ and /pEt/ in the neural signal even for non-native listeners who could not perceive the difference. These findings are addressed in greater detail below.

4.1. Native-language phonotactics

Our findings are consistent with other studies showing the influence of native-language phonotactic patterns on speech perception (Berent et al., 2007; Dehaene-Lambertz et al., 2000; Dupoux et al., 1999; Halle et al., 1998) and provide additional support by confirming sensitivity to syllable-specific position. Specifically, the spectro-temporal features of the /pt/ sequence and the contrast sequence (i.e., /pEt/) present in syllable-final position in English did not lead to good perception of these sequences in word onset for our English listeners. In contrast, the presence of word-initial /pt/ and /pEt/ in the Polish language allowed for good perception. These results suggest that the contextual features of the /pt/ sound sequence influenced perceptual behavior. These findings provide additional support for research that has demonstrated the influence of context on perception (Diehl et al., 2004; Holt, 2006; Holt & Lotto, 2008) and for studies finding that exposure to the phonotactic patterns of the language shapes perceptual processes (Connine & Pinnow, 2006). The current findings are also consistent with developmental speech perception research. For example, Jusczyk and colleagues demonstrated sensitivity to language-specific phonotactic patterns early in development, before word meanings are formed (Jusczyk, 2000; Jusczyk, Friederici, Wessels, Svenkerud, & Jusczyk, 1993; Jusczyk, Luce, & Charles-Luce, 1994).
Contextual features are unique to each phoneme sequence, and mapping of these features onto auditory networks may explain infants' detection of three-syllable nonsense words from a continuous stream of syllables after 2 min of exposure (Saffran, Aslin, & Newport, 1996a; Saffran, Newport, & Aslin, 1996b). Mapping of the acoustic features of the phoneme sequence may precede "fast mapping" of word meaning, whereby thirteen-month-old infants comprehend novel words after hearing the words only nine times (Woodman, Markman, & Fitzsimmons, 1994).

English and Polish listeners all have experience with /st/ and /sEt/ word onsets in the native language, and their behavior revealed excellent perception of these syllable onsets. This finding further supports the claim that the better perception found for the Polish participants for the /pt/ and /pEt/ onsets was related to native-language experience rather than to some other factor, although this possibility cannot be entirely ruled out. It is possible that a factor such as salience (e.g., longer durations) of the /st–sEt/ contrast relative to the /pt–pEt/ contrast might allow for better perception by non-native listeners. In this case, the difference between Polish and English listeners could be that we, by chance, selected English listeners who were particularly poor and Polish listeners who were particularly good at the task. If we had also included a syllable onset that is allowed in English but not in Polish, we would be able to address this possibility. However, we believe that the behavioral patterns are, in fact, due to language experience, because we also tested a number of English and Polish participants in piloting the experiment and uniformly observed the same pattern of behavior as reported in these 24 participants.

4.2. Acoustic feature processing and perception

The native-Polish listeners were successful in distinguishing the onset contrast /pt/ versus /pEt/ in a syllable identification task.
These Polish listeners reported that the speaker of our experimental stimuli, while having good Polish pronunciation, could be identified as a bilingual speaker of Polish and English. Even so, the Polish listeners correctly identified words having /pt/ and /pEt/ onsets as 2- and 3-syllable nonsense words, respectively. This behavioral performance indicated that the Polish listeners could make use of acoustic information in the word forms to make accurate judgments, whereas the English listeners could not.

Acoustic features of stop consonants (i.e., /p/, /t/, /k/) in word onset differ in Polish and English. Polish speakers release stop consonants after closure without aspiration, whereas English speakers release and aspirate these stop consonants after closure; thus, the /t/ in /pEt/ (e.g., petunia) and /sEt/ (e.g., sateen) is aspirated in English (Ladefoged, 2001, p. 43). The /t/ phoneme in /st/ is released and unaspirated in both Polish and English phonology (Katarzyna Dziubalska-Kolaczyk, personal communication, January 9, 2010).

An analysis of response errors by the Polish participants, reported in the supplementary content, revealed that the presence or absence of a perceptible vowel was a factor in perception of the pt contrast. Twenty 3-syllable pt words (e.g., pEtudA) did not have a perceptible vowel between the /p/ and /t/ and were identified incorrectly by some Polish participants as having 2 syllables. In contrast, the 3-syllable words correctly identified as having 3 syllables by all Polish listeners had a perceptible vowel. This finding suggests that the Polish group relied, at least partially, on the vowel to perceive the difference between the 2- and 3-syllable words, but that the English listeners did not benefit from this information.
4.3. Stages of neural speech processing

Behavioral speech perception of the native-language groups was reflected late in brain processing, around 500 ms; however, language group differences were also found during an earlier time window between 232 and 424 ms. Greater activity (GFP) for the Polish participants for the pt contrast relative to the English participants within this time window did not correspond independently to any spatial region; therefore, several underlying sources may have contributed to this difference. In particular, the P400 modulation at frontal sites and the TN modulation at lateral sites partially overlapped in this time frame.

LPC: The LPC that peaked at 500 ms corresponded with behavioral patterns of speech perception. The behavioral task in the current study consisted of the participants pressing a key to identify the second word in the pair as having 2 or 3 syllables. All participants made perceptual decisions; however, the English listeners showed no significant difference between the same and different pt targets within the LPC. The LPC modulation, therefore, was not related to the perceptual decision, but to whether the target word was preceded by the same word or a different word. Hence, the LPC modulation appears to be driven by the presence of a perceptual difference between stimuli at the phonological level, rather than by classification of the stimuli. Dehaene-Lambertz, Dupoux, and Gout (2000) found similar LPC modulation to reflect the behavior of French and Japanese participant groups. Only French participants accurately judged the final nonsense word in a series of five nonsense words as being the same or different (e.g., /igmo/, /igumo/), and only the French participants showed LPC modulation to the target word. The authors attributed the LPC modulation to a response decision for the infrequent deviant stimulus. In our study, each syllable type was equally frequent; thus, infrequency of the target could not be a factor.
Our findings, together with those of Dehaene-Lambertz et al., suggest that the LPC serves as an index of conscious discrimination and is comparable to a P3b-type component reflecting conscious processing of stimuli (Kok, 2001; Squires et al., 1973).
P400: A fronto-central positivity peaking at 400 ms to stimulus differences appeared to index acoustic discrimination, because the English group data reflected neural processing of both the st and pt contrasts. The P400 to the pt contrast was earlier for the Polish than the English listeners, thus showing some modulation by language experience. These results suggest that the P400 is influenced by both acoustic and linguistic factors, because it indexed acoustic differences for both groups but was facilitated for the native-language group. It is also possible that the P400 is an acoustic-orienting type response comparable to the P3a elicited in other studies (Linden, 2005).

TN: The negativity from temporal sites (TN) peaking around 350 ms appears to reflect acoustic discrimination of the st and pt contrasts for both listener groups and was not modulated by language experience. Of interest was the fact that this negativity was strongly lateralized to the left hemisphere, particularly for the st contrast. This left-hemisphere bias may indicate that acoustic feature processing had already engaged language-related processing areas. Acoustic features of the /st/ cluster, including duration, may lend greater salience to the /st/ cluster relative to the /pt/ cluster and allow for a greater difference at the temporal site (Ohala & Kawasaki-Fukumori, 1997). Activity recorded from these sites to speech is likely to have sources in secondary auditory cortex, specifically the lateral surface of the superior temporal gyrus. The finding of poor responses recorded from these sites to speech sounds and words in children with language impairment suggests an important role for the underlying sources in speech processing (Shafer et al., 2011; Tonnquist-Uhlen et al., 2003).
Furthermore, an ERP fragment priming study in which a presented German word was preceded by an identical fragment (e.g., trep-treppe), a similar fragment (e.g., krep-treppe), or a different fragment (e.g., dra-treppe) found that modulation of a left anterior frontal positivity between 300 and 400 ms corresponded to the degree of difference from the target lexical form (Friedrich et al., 2009). The researchers concluded that the component reflected mapping of the input onto lexical representations. Our finding of an increase in negativity to nonsense words, rather than the increased positivity that they observed to real words, suggests that the P350 may be related to lexical access, whereas the TN reflects acoustic and, perhaps, phonological processing. Analysis of raw voltage amplitudes for the TN component found an effect of language experience not evident in the CSD. Follow-up research currently underway in our lab using the phonological priming design will more definitively determine whether language exposure affects this stage of processing.

4.4. Absence of N400 modulation

We had hypothesized that we might observe modulation of an N400-like negativity at superior posterior sites because previous studies suggested this possibility (Praamstra & Stegeman, 1993; Praamstra et al., 1994; Shafer et al., 2002). The current results did not show an N400 response to native-language phonological contrasts. It is possible that the N400 was not found because it was masked by the large LPC component (Holcomb, Grainger, & O'Rourke, 2002). However, the most likely reason for the absence of N400 modulation is that the words in the current study were all nonsense-word forms and the syllable-judgment task did not require lexical access for performance (Bonte & Blomert, 2004; Friedrich et al., 2009; Holcomb & Grainger, 2009; Praamstra & Stegeman, 1993; Shafer et al., 2002).
Similarly, Praamstra and colleagues (Praamstra & Stegeman, 1993; Praamstra et al., 1994) did not observe N400 modulation to non-word targets that had illegal onset clusters, but they did observe N400 modulation to non-word targets that were identical to real words up to an illegal final cluster. They argued that the failure to see N400 in the former case was
because listeners could reject the illegal onset from the outset and thus not perform lexical access. This explanation is consistent with findings from the behavioral literature that reveal minimal effects of lexical properties (e.g., neighborhood density) on nonsense word processing (Vitevitch & Luce, 1998; Vitevitch & Luce, 1999). However, sub-lexical phonological effects such as phonotactic probability are more readily observed in studies using nonsense words. Thus, our results suggest that the N400 indexes lexical factors. This is an important finding because it clarifies that the N400 is not an index of phonological processing per se, but an index of lexical access, which can be influenced by phonological factors.

4.5. Frequency and distribution of phoneme sequences

We predicted both behavioral and late latency ERP responses to differ between our English and Polish participants in response to the pt contrast, but to be the same in response to the st contrast. The behavioral results were consistent with this prediction, whereas the ERP responses only partially confirmed it. Late latency ERP responses showed the expected pattern to the pt contrast: the English and Polish listeners performed differently on the behavioral task for the pt contrast, and this difference was evident in the LPC component of the ERP waveform. The behavioral response to the st contrast was the same for both language groups and showed near-perfect accuracy. The LPC reflected detection of the st contrast for both language groups. However, the language groups showed topographical differences in the LPC to the st contrast in that the response was greater for the English group at the right-posterior parietal sites. Furthermore, the late GFP peak corresponding to the LPC was larger for the English group than the Polish group for the 2-syllable conditions.
It is possible that a phoneme sequence such as word onset /st/ has high stimulus value in English because frequently used and early-learned words begin with that phoneme sequence (e.g., "stop"). LPC amplitude has been shown to increase with greater stimulus value, increased certainty, and discriminability (Kok, 2001; Squires et al., 1973), consistent with this view. To our knowledge, language group differences within the ERP response to a word onset st contrast have not been previously demonstrated. Whether this pattern reflects greater frequency of usage of /st/ and /sEt/ words in the English language, greater stimulus value, or ease of processing awaits further research. The unpredicted language group differences found for the st contrast in the current study suggest that ERP methodology may prove useful for investigating questions related to the frequency and distribution of the acoustic input.

4.6. ERP response to the two- and three-syllable word conditions

Our results show greater amplitude differences for the 3-syllable conditions (i.e., a 3-syllable target word preceded by a 2-syllable word) relative to the 2-syllable conditions (i.e., a 2-syllable target word preceded by a 3-syllable word). ERP responses may have been less robust in the 2-syllable conditions because the recovery from refractoriness that occurs in the 3-syllable conditions was not present in the 2-syllable conditions, resulting in a smaller signal.

5. Conclusion

The current results suggest that exposure to phoneme sequences in highly specific contexts may be necessary for accurate perception. A late positive component of the ERP waveform reflected the behavior of the native-language groups, and more subtle language group differences were found at earlier stages of speech
processing. Importantly, this research demonstrated that the brain registers acoustic contrasts of non-native sound sequences, even in participants who were unable to perceive the difference. Investigating the behavioral and electrophysiological responses to specific phoneme sequences in word onset has enabled us to begin to tease apart acoustic and linguistic factors underlying native-language speech perception.

Acknowledgments

This publication was made possible by Grant Number HD46193 from the National Institute of Child Health and Human Development (NICHD) at the National Institutes of Health (NIH) to Valerie Shafer. The contents of this work are solely the responsibility of the authors and do not necessarily represent the official views of NIH. We acknowledge Winifred Strange for her assistance with this project and linguists Katarzyna Dziubalska-Kolaczyk, Malgosia Pyrzanowski, and Zosia Stankiewicz for providing information about the phonetic and phonotactic rules of the Polish language. We thank the reviewers for their effort and excellent comments.

Appendix A. Supplementary data

Supplementary data associated with this article can be found, in the online version, at http://dx.doi.org/10.1016/j.bandl.2012.06.002.

References

Berent, I., Steriade, D., Lennertz, T., & Vaknin, V. (2007). What we know about what we have never heard: Evidence from perceptual illusions. Cognition, 104, 591–630.
Bledowski, C., Prvulovic, D., Goebel, R., Zanella, F., & Linden, D. (2004). Attentional systems in target and distractor processing: A combined ERP and fMRI study. NeuroImage, 22, 530–540.
Bonte, M., & Blomert, L. (2004). Developmental changes in ERP correlates of spoken word recognition during early school years: A phonological priming study. Clinical Neurophysiology, 115, 409–423.
Borden, G., Harris, K., & Raphael, L. (2003). Speech science primer: Physiology, acoustics, and perception of speech (4th ed.). Baltimore, MD: Lippincott Williams & Wilkins.
Burnham, D. K. (1986). Developmental loss of speech perception: Exposure to and experience with a first language. Applied Psycholinguistics, 7, 207–240.
Connine, C., & Pinnow, E. (2006). Phonological variation in spoken word recognition: Episodes and abstractions. The Linguistic Review, 23, 235–245.
Davidson, L., & Stone, M. (2004). Epenthesis versus gestural mistiming in consonant cluster production: An ultrasound study. In M. Tsujimura & G. Garding (Eds.), Proceedings of the West Coast Conference on Formal Linguistics (Vol. 22, pp. 165–178).
Dehaene-Lambertz, G., Dupoux, E., & Gout, A. (2000). Electrophysiological correlates of phonological processing: A cross-linguistic study. Journal of Cognitive Neuroscience, 12, 635–647.
Diehl, R., Lotto, A., & Holt, L. (2004). Speech perception. Annual Review of Psychology, 55, 149–179.
Dupoux, E., Hirose, Y., Kakehi, K., Pallier, C., & Mehler, J. (1999). Epenthetic vowels in Japanese: A perceptual illusion? Journal of Experimental Psychology: Human Perception and Performance, 25, 1568–1578.
Engineer, C., Perez, C., Chen, Y., Carraway, R., Reed, A., Shetake, J., et al. (2008). Cortical activity patterns predict speech discrimination ability. Nature Neuroscience, 11, 603–608.
Ferree, T. C., Luu, P., Russell, G. S., & Tucker, D. M. (2001). Scalp electrode impedance, infection risk, and EEG data quality. Clinical Neurophysiology, 112, 536–544.
Friedrich, C. K., Schild, U., & Roder, B. (2009). Electrophysiological indices of word fragment priming allow characterizing neural stages of speech recognition. Biological Psychology, 80, 105–113.
Halle, P., Segui, J., Frauenfelder, U., & Meunier, C. (1998). Processing of illegal consonant clusters: A case of perceptual assimilation? Journal of Experimental Psychology: Human Perception and Performance, 24, 592–608.
Holcomb, P., & Grainger, J. (2009). ERP effects of short interval masked associative and repetition priming. Journal of Neurolinguistics, 22, 301–312.
Holcomb, P., Grainger, J., & O'Rourke, T. (2002). An electrophysiological study of the effects of orthographic neighborhood size on printed word perception. Journal of Cognitive Neuroscience, 14, 938–950.
Holt, L. (2006). The mean matters: Effects of statistically defined nonspeech spectral distributions on speech categorization. Journal of the Acoustical Society of America, 120, 2801–2817.
Holt, L., & Lotto, A. (2008). Speech perception within an auditory cognitive science framework. Current Directions in Psychological Science, 17, 42–46.
Jusczyk, P. W. (2000). The discovery of spoken language. Cambridge, MA: MIT Press.
Jusczyk, P. W., Friederici, A. D., Wessels, J. M., Svenkerud, V. Y., & Jusczyk, A. M. (1993). Infants' sensitivity to the sound patterns of native language words. Journal of Memory and Language, 32, 402–420.
Jusczyk, P., Luce, P., & Charles-Luce, J. (1994). Infants' sensitivity to phonotactic patterns in the native language. Journal of Memory and Language, 33, 630–645.
Kharlamov, V., Campbell, K., & Kazanina, N. (2011). Behavioral and electrophysiological evidence for early and automatic detection of phonological equivalence in variable speech outputs. Journal of Cognitive Neuroscience, 23, 3331–3342.
Kok, A. (2001). On the utility of P3 amplitude as a measure of processing capacity. Psychophysiology, 38, 557–577.
Ladefoged, P. (2001). A course in phonetics (4th ed.). USA: Thomson Learning.
Lehmann, D., & Skrandies, W. (1980). Reference-free identification of components of checkerboard-evoked multichannel potential fields. Electroencephalography and Clinical Neurophysiology, 48, 609–621.
Leuthold, H., & Sommer, W. (1998). Postperceptual effects and P300. Psychophysiology, 35, 34–46.
Linden, D. (2005). The P300: Where in the brain is it produced and what does it tell us? Neuroscientist, 11, 563–576.
Luce, P., Goldinger, S., Auer, E., & Vitevitch, M. (2000). Phonetic priming, neighborhood activation and PARSYN. Perception & Psychophysics, 62, 615–625.
Macmillan, N., & Creelman, C. (2004). Detection theory: A user's guide. Mahwah, NJ: Lawrence Erlbaum Associates.
McCallum, W. C., Farmer, S. F., & Pocock, P. V. (1984). The effects of physical and semantic incongruities on auditory event-related potentials. Electroencephalography and Clinical Neurophysiology, 59, 477–488.
McQueen, J., Dahan, D., & Cutler, A. (2003). Continuity and gradedness in speech processing. In N. Schiller & A. Meyer (Eds.), Phonetics and phonology in language comprehension and production: Differences and similarities (pp. 39–78). Berlin: Mouton de Gruyter.
Mesgarani, N., David, S., Fritz, J., & Shamma, S. (2008). Phoneme representation and classification in primary auditory cortex. Journal of the Acoustical Society of America, 123, 899–909.
Michel, C., Murray, M., Lantz, G., Gonzalez, S., Spinelli, L., & Grave de Peralta, R. (2004). EEG source imaging. Clinical Neurophysiology, 115, 2195–2222.
Murray, M., Brunet, D., & Michel, C. (2008). Topographic ERP analysis: A step-by-step tutorial review. Brain Topography, 20, 249–264.
Nagy, M., & Rugg, M. (1989). Modulation of event-related potentials by word repetition: The effects of inter-item lag. Psychophysiology, 26, 431–436.
Nobre, A., & McCarthy, G. (1994). Language-related ERPs: Scalp distributions and modulation by word type and semantic priming. Journal of Cognitive Neuroscience, 6, 233–255.
Ohala, J., & Kawasaki-Fukumori, H. (1997). Alternatives to the sonority hierarchy for explaining segmental sequential constraints. In S. Eliasson & E. H. Jahr (Eds.), Language and its ecology: Essays in memory of Einar Haugen. Berlin: Mouton de Gruyter.
Pernier, J., Perrin, F., & Bertrand, O. (1989). Scalp current density fields: Concept and properties. Electroencephalography and Clinical Neurophysiology, 69, 385–389.
Picton, T., & Hillyard, S. (1974). Human auditory evoked potentials. II. Effects of attention. Electroencephalography and Clinical Neurophysiology, 36, 191–199.
Praamstra, P., Meyer, A., & Levelt, W. (1994). Neurophysiological manifestations of phonological processing: Latency variation of a negative ERP component timelocked to phonological mismatch. Journal of Cognitive Neuroscience, 6, 204–219.
Praamstra, P., & Stegeman, D. (1993). Phonological effects on the auditory N400 event-related brain potential. Cognitive Brain Research, 1, 73–86.
Rugg, M. (1987). Dissociation of semantic priming, word and non-word repetition effects by event-related potentials. The Quarterly Journal of Experimental Psychology, 39A, 123–148.
Saffran, J. R., Aslin, R., & Newport, E. L. (1996a). Statistical learning by 8-month-old infants. Science, 274, 1926–1928.
Saffran, J. R., Newport, E. L., & Aslin, R. (1996b). Word segmentation: The role of distributional cues. Journal of Memory and Language, 35, 606–621.
Shafer, V. L., Schwartz, R. G., & Kessler, K. (2002). ERP indices of phonological and lexical processing in children and adults. In BUCLD 27 Proceedings (pp. 762–774).
Shafer, V., Ponton, C., Datta, H., Morr, M., & Schwartz, R. (2007). Neurophysiological indices of attention to speech in children with specific language impairment. Clinical Neurophysiology, 118, 1230–1243.
Shafer, V. L., Schwartz, R. G., & Martin, B. (2011). Evidence of deficient central speech processing in children with specific language impairment: The T-complex. Clinical Neurophysiology, 122, 1137–1155.
Siegel, S., & Castellan, N. (1988). Nonparametric statistics for the behavioral sciences. New York: McGraw-Hill.
Skrandies, W. (1990). Global field power and topographic similarity. Brain Topography, 3, 137–141.
Squires, K. C., Hillyard, S. A., & Lindsay, P. H. (1973). Vertex potentials evoked during auditory signal detection: Relation to decision criteria. Perception & Psychophysics, 14, 265–272.
Steinschneider, M., & Dunn, M. (2002). Electrophysiology in developmental neuropsychology. In S. J. Segalowitz & I. Rapin (Eds.), Handbook of neuropsychology (2nd ed., Chapter 5). New York: Elsevier Science.
Steinschneider, M., Liegeois-Chauvel, C., & Brugge, J. (2011). Auditory evoked potentials and their utility in the assessment of complex sound processing. In J. A. Winer & C. Schreiner (Eds.), The auditory cortex (Chapter 25). New York: Springer.
Steinschneider, M., Kurtzberg, D., & Vaughan, H., Jr. (1992). Event-related potentials in developmental neuropsychology. In I. Rapin & S. J. Segalowitz (Eds.), Handbook of neuropsychology (pp. 239–300). New York: Elsevier.
Steinschneider, M., Volkov, I., Fishman, Y., Oya, H., Arezzo, J., & Howard, M., III (2005). Intracortical responses in human and monkey primary auditory cortex support a temporal processing mechanism for encoding of the voice onset time phonetic parameter. Cerebral Cortex, 15, 170–186.
Strange, W., & Shafer, V. L. (2008). Speech perception in second language learners: The re-education of selective perception. In J. G. Hansen Edwards & M. L. Zampini (Eds.), Phonology and second language acquisition. Philadelphia: John Benjamins.
Sussman, E., Steinschneider, M., Gumenyuk, V., Grushko, J., & Lawson, K. (2008). The maturation of human evoked brain potentials to sounds presented at different stimulus rates. Hearing Research, 236, 61–79.
Tonnquist-Uhlen, I., Ponton, C., Eggermont, J., Kwong, B., & Don, M. (2003). Maturation of human central auditory system activity: The T-complex. Clinical Neurophysiology, 114, 685–701.
41
Verleger, R., Jaskowski, P., & Wascher, E. (2005). Evidence for an integrative role of P3b in linking reaction to perception. Journal of Psychophysiology, 19, 165–181. Vitevitch, M. (2007). The spread of the phonological neighborhood influences spoken word recognition. Memory and Cognition, 35, 166–175. Vitevitch, M. S., & Luce, P. A. (1998). Probabilistic phonotactics and neighborhood activation in spoken word recognition. Journal of Memory and Language, 40, 374–408. Vitevitch, M. S., & Luce, P. A. (1999). When words compete: Levels of processing in perception of spoken words. Psychological Science, 9, 325–329. Vitevitch, M. S., & Luce, P. A. (2004). A web-based interface to calculate phonotactic probability for words and nonwords in English. Behavior Research Methods, Instruments, and Computers, 36, 481–487. Warren, P., & Marslen-Wilson, W. (1987). Continuous uptake of acoustic cues in spoken word recognition. Perception & Psychophysics, 41, 262–275. Woodman, A., Markman, E., & Fitzimmons, C. (1994). Rapid word learning in 13-and 18-month olds. Developmental Psychology, 30, 553–566.