Brain & Language 138 (2014) 61–70
Brain & Language journal homepage: www.elsevier.com/locate/b&l
Neural correlates of acoustic cues of English lexical stress in Cantonese-speaking children
Xiuhong Tong a, Catherine McBride a,⇑, Juan Zhang b, Kevin K.H. Chung c, Chia-Ying Lee d, Lan Shuai e, Xiuli Tong f
a Psychology Department, The Chinese University of Hong Kong, Shatin, Hong Kong
b Faculty of Education, University of Macau, Macao
c Department of Special Education and Counselling, The Hong Kong Institute of Education, Hong Kong
d The Institute of Linguistics, Academia Sinica, Taiwan
e Department of Electrical and Computer Engineering, Johns Hopkins University, United States
f Division of Speech and Hearing Sciences, The University of Hong Kong, Pokfulam, Hong Kong
Article info
Article history: Accepted 2 September 2014; available online xxxx
Keywords: English lexical stress processing; Prosody development; MMN; p-MMR
Abstract
The present study investigated the temporal course of neural discrimination of acoustic cues of English lexical stress (i.e., pitch, intensity, and duration) in Cantonese-speaking children. We used an event-related potential (ERP) measure with a multiple-deviant oddball paradigm to record auditory mismatch responses to four deviants of English lexical stress in familiar words, namely a change in pitch, intensity, or duration, or a change in all three acoustic dimensions. In the 170–270 ms time window, the pitch deviant elicited a significant positive mismatch response (p-MMR) and the duration deviant elicited a mismatch negativity (MMN) relative to the standard. In the 270–400 ms time window, the intensity deviant elicited a significant p-MMR, whereas both the duration deviant and the three-dimension-changed deviant elicited significant MMNs. These results suggest that Cantonese-speaking children are sensitive to both single and convergent acoustic cues of English words, and that the relative weighting of pitch, intensity, and duration in stress processing may correlate with different ERP components in different time windows in Cantonese second graders.
© 2014 Elsevier Inc. All rights reserved.
1. Introduction
Research on speech has typically focused on how phonetic segments such as vowels and consonants are encoded during speech perception (e.g., Mesgarani, Cheung, Johnson, & Chang, 2014). There has been little work on the discrimination of suprasegmental features of speech, such as lexical stress in English. Lexical stress refers to the relative emphasis or prominence of syllables within words or of words in sentences, as in PREsent¹ (/ˈpre-zənt/; gift) and preSENT (/pri-ˈzent/) (Fry, 1955, 1958; Selkirk, 1980). Behavioral research on native English-speaking adults' perception and production of English lexical stress has suggested that stress is acoustically related to pitch (i.e., fundamental frequency [F0]), duration, and intensity (e.g., Crystal, 1969; Kehoe, Stoel-Gammon, & Buder, 1995). However, the neural correlates of encoding pitch, intensity, and duration during English lexical stress processing in children remain poorly understood. In particular, no study has yet examined the neural discrimination of acoustic cues of English lexical stress in children whose first language is a tone language, such as Cantonese-speaking children learning English as a second language. In this study, we therefore used an event-related potential (ERP) measure to explore the neural discrimination of English lexical stress cues (i.e., pitch, intensity, and duration) in Cantonese-speaking children. We asked whether Cantonese-speaking second graders acquiring English as a second language can use these three acoustic cues in English stress perception, to what extent the weight of each cue changes as stress perception unfolds over time, and which neural markers are associated with each acoustic cue.
⇑ Corresponding author. E-mail address: [email protected] (C. McBride).
¹ The capitalized letters represent stressed syllables.
http://dx.doi.org/10.1016/j.bandl.2014.09.004 0093-934X/© 2014 Elsevier Inc. All rights reserved.
Within the last decades, researchers have become increasingly interested, both theoretically and empirically, in stress perception and production in native and non-native speakers. Empirical evidence on the perception of lexical stress by adult native speakers of English suggests that F0, duration, and intensity are the main acoustic correlates of English stress perception (e.g., Fry, 1958; Kehoe et al., 1995; Mol & Uhlenbeck, 1955; Morton & Jassem, 1965). For example, Fry (1958) found that, among the three cues, F0 is the most important for English stress perception, followed by duration and then intensity. Bolinger (1965) also argued that F0 is the strongest cue in English stress perception and that duration and intensity are only secondary. In addition, stressed syllables are characterized by higher F0, longer duration, and greater intensity than unstressed syllables (e.g., Klatt, 1976; Lieberman, 1960). There is increasing interest in the perception of non-native lexical stress contrasts by adult listeners (e.g., Frost, 2011; Peperkamp & Dupoux, 2002; Wang, 2008). For example, Peperkamp and Dupoux (2002) reported that adult French speakers showed stress “deafness” in English lexical stress discrimination, because French is a language with predictable stress, whereas English has an unpredictable stress pattern. In related work, Peperkamp, Vendelin, and Dupoux (2010) showed that adult speakers of Standard French, Southeastern French, Finnish, and Hungarian, all languages with fixed stress patterns, had difficulty perceiving stress contrasts. In contrast, no such “stress deafness” was found in adult Spanish speakers, whose native language has unpredictable stress. Frost (2011) argued that French and English native speakers may not process stress in the same way. These studies of “stress deafness” have compared adult speakers of languages with predictable stress, such as French, with adult speakers of languages with unpredictable stress, such as Spanish. Little is known about whether speakers of tone languages that lack lexical stress, such as Cantonese, are sensitive to English lexical stress, and in particular to its different acoustic dimensions of pitch, intensity, and duration.
Thus, we move one step further by investigating whether Cantonese-speaking children are sensitive to three acoustic correlates of English lexical stress (i.e., pitch, intensity, and duration), and whether the same order of relative importance (F0, then duration, then intensity) is observed in young Cantonese-speaking children. There have been only a few empirical studies of English lexical stress perception in Chinese² learners of English (e.g., Chan, 2007; Wang, 2008). Wang (2008) evaluated the effects of F0, duration, and intensity on English stress perception in Mandarin Chinese learners of English and in native English speakers. All three cues significantly influenced English stress perception for the native English speakers, but only F0 was important for the Chinese learners of English. Chan (2007) obtained similar findings in adult Cantonese learners of English: the Cantonese speakers used F0 as the primary cue in English stress perception, whereas the native English speakers relied most on spectral balance (i.e., the distribution of intensity over the frequency spectrum). The finding that Chinese learners of English rely more on F0 than on other acoustic cues suggests some transfer of reliance on F0 from the L1 tone language to L2 stress (e.g., Nguyen & Ingram, 2005; Pennington & Ellis, 2000). More specifically, perceptual studies of Chinese lexical tone indicate that F0 is the primary acoustic cue for Chinese tone perception (e.g., Khouw & Ciocca, 2007; Vance, 1976). Chinese speakers may therefore transfer the strategy they use to perceive lexical tone to English stress perception (e.g., Wang, 2008). This possibility of transfer is also supported by studies of English stress production by Chinese speakers, which suggest that Chinese speakers may adopt the strategies of native tone production when producing English stress (Zhang, Nissen, & Francis, 2008).
For example, Zhang et al. (2008) provided extensive acoustic analyses of English stress production by Mandarin Chinese speakers and English speakers, and demonstrated that the Mandarin Chinese speakers used F0, duration, and intensity in stress production in a similar manner to native English speakers. That is, both groups produced stressed syllables with higher F0, longer duration, and greater intensity than unstressed syllables. These findings suggest that F0, duration, and intensity are all implicated in English stress perception in L2 English learners. Moreover, the relative importance of these acoustic cues in L2 learners' stress perception may be influenced by their native tone language. Studies of child learners' perception of English lexical stress, particularly their more subtle perception of pitch, intensity, and duration, are needed to explore these possibilities further.
² In the present study, the word “Chinese” is used as a blanket term referring to the distinct languages of Mandarin and Cantonese.
Another important issue that has yet to be examined concerns the neural markers of the acoustic cues of stress in stress perception. In particular, we know of no research that has systematically manipulated the three acoustic correlates of English lexical stress (F0, intensity, and duration) and evaluated their effects on the perception of English lexical stress in Cantonese-speaking children who are learning English. Therefore, in this study, we adopted an ERP measure to explore the neural discrimination of English stress and to evaluate the relative importance of its three acoustic correlates (i.e., F0, duration, and intensity) in stress perception by Cantonese-speaking second graders acquiring English as a second language. ERPs have a very fine temporal resolution, on the order of milliseconds, and can index the brain's response to auditory input under both passive and active listening conditions.
In ERP studies of speech perception, the passive auditory oddball paradigm is often used, with either single or multiple deviants, to examine participants' ability to discriminate speech sounds (e.g., for reviews, see Cheour, Leppänen, & Kraus, 2000; Näätänen, 2001; Näätänen, Paavilainen, Alho, Reinikainen, & Sams, 1989; Näätänen, Pakarinen, Rinne, & Takegata, 2004). In the passive oddball paradigm, participants are usually presented with a stream of frequent stimuli (standards) interspersed with infrequent stimuli (deviants) that differ from the standard in some discriminable way (for reviews, see Cheour et al., 2000; Näätänen, Paavilainen, Rinne, & Alho, 2007). A specific ERP component, the mismatch negativity (MMN), is typically derived in this paradigm by subtracting the ERP to the frequent standard from the ERP to the infrequent deviant (e.g., Chandrasekaran, Gandour, & Krishnan, 2007; Cheour et al., 1997; Näätänen et al., 1989). In adults, the MMN has a fronto-central distribution, peaks between 150 and 250 ms after the onset of the change, and reflects automatic, pre-attentive cortical processing. The MMN is considered an indicator of the participant's ability to discriminate between the standard and the deviant: it becomes smaller or disappears as the difference between the standard and the deviant is reduced (for a review, see Näätänen et al., 2007). Because the MMN can be obtained irrespective of participants' attention or the behavioral task, it is a useful tool for examining auditory and speech perception in infants and children, whose attention and motivation are limited (e.g., Cheour et al., 2000; Kuhl, 1998; Lee et al., 2012; Morr, Shafer, Kreuzer, & Kurtzberg, 2002).
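The subtraction logic described above can be sketched in a few lines of pure Python. This is an illustration only. It assumes that both averaged waveforms already share a common time axis (here one sample per millisecond, as with a 1000 Hz sampling rate). The Gaussian "deviant" is synthetic and is not data from this study, and the function names are our own.

```python
import math

def difference_wave(deviant, standard):
    """Deviant-minus-standard difference wave, sample by sample (µV)."""
    if len(deviant) != len(standard):
        raise ValueError("waveforms must have the same number of samples")
    return [d - s for d, s in zip(deviant, standard)]

def window_indices(times_ms, start_ms, end_ms):
    """Indices of samples whose time falls inside [start_ms, end_ms]."""
    idx = [i for i, t in enumerate(times_ms) if start_ms <= t <= end_ms]
    if not idx:
        raise ValueError("empty analysis window")
    return idx

def mean_amplitude(wave, times_ms, start_ms, end_ms):
    """Mean amplitude of a waveform within a latency window."""
    idx = window_indices(times_ms, start_ms, end_ms)
    return sum(wave[i] for i in idx) / len(idx)

def most_negative_peak(wave, times_ms, start_ms, end_ms):
    """Latency (ms) and amplitude of the most negative sample in the window."""
    idx = window_indices(times_ms, start_ms, end_ms)
    i = min(idx, key=lambda k: wave[k])
    return times_ms[i], wave[i]

# Synthetic example (not study data): epoch from -100 to 599 ms, 1 sample/ms.
times = list(range(-100, 600))
standard = [0.0 for _ in times]
# The deviant carries an extra negative deflection peaking at 200 ms.
deviant = [-3.0 * math.exp(-((t - 200) / 30) ** 2) for t in times]

diff = difference_wave(deviant, standard)
latency, amp = most_negative_peak(diff, times, 150, 250)
print(latency, round(amp, 2))                      # 200 -3.0
print(mean_amplitude(diff, times, 150, 250) < 0)   # True: a negativity
```

The sign of the window mean separates a negative mismatch response (MMN) from a positive one, which is why the analyses below rely on mean amplitudes within fixed latency windows.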
However, previous research has predominantly focused on the segmental level of speech, such as vowels and consonants (for a review, see Näätänen et al., 2007). Less is therefore known about brain responses to suprasegmental features, such as English lexical stress, in Cantonese-speaking children who are learning English as a second language. A few ERP studies have examined the neural discrimination of German stress in German monolinguals (Weber, Hahne, Friedrich, & Friederici, 2004). For example, Weber et al. (2004) used an MMN paradigm to investigate the ERP responses of German-speaking adults and of 4- and 5-month-old infants to trochaic (stress on the first syllable) and iambic (stress on the second syllable) patterns in two-syllable German pseudowords. In the trochaic condition, the iambic CVCV item /baba:/ was presented frequently and was occasionally replaced by the trochaic deviant /ba:ba/. In the iambic condition, the trochaic /ba:ba/ served as the standard and the iambic item served as the deviant. The authors reported that adults showed a typical MMN to both the trochaic and the iambic deviants, whereas 4-month-old infants did not show reliable responses in either condition. In 5-month-old infants, however, the trochaic deviant elicited a significant positive mismatch response (p-MMR). The p-MMR is usually observed between 150 and 450 ms in oddball studies of infants' and children's speech perception, with a topographic distribution similar to that of the typical MMN (e.g., Cheng et al., 2013; Friederici, Friedrich, & Weber, 2002; Jing & Benasich, 2006; Lee et al., 2012; Maurer, Bucher, Brem, & Brandeis, 2003). However, what the p-MMR reflects, and when it is present or absent, is still debated (e.g., Cheng et al., 2013; Lee et al., 2012; Maurer et al., 2003; Shafer, Yan, & Datta, 2010). Several accounts of its underlying mechanism have been proposed. For example, the p-MMR may be analogous to the adult P3a, reflecting distractibility, an involuntary attention shift, or the automatic categorization of stimuli (e.g., Alho, Sainio, Sajaniemi, Reinikainen, & Näätänen, 1990; He, Hotson, & Trainor, 2009; Shestakova, Huotilainen, & Cheour, 2003). Other researchers propose that the p-MMR may reflect recovery from refractoriness, indexing the detection and encoding of a stimulus's acoustic properties in the primary auditory cortex (e.g., Escera, Alho, Winkler, & Näätänen, 1998).
In addition, some researchers suggest that the p-MMR found in children may serve the same function as the typical MMN found in adults, reflecting additional or increased neural activation to deviants relative to standards (e.g., Maurer et al., 2003). Several recent studies have demonstrated that the presence or absence of the p-MMR depends on features of the deviants, such as deviance size (e.g., Cheng et al., 2013; Lee et al., 2012; Maurer et al., 2003). Whatever their precise functional interpretation, both the MMN and the p-MMR may serve as indicators of speech perception at the segmental and suprasegmental levels. Taken together, the present study extends previous research to Cantonese-speaking second graders, who acquire two distinct suprasegmental features in parallel: Cantonese lexical tone and English lexical stress. We systematically examined the neural discrimination of changes in the acoustic features of English lexical stress using an ERP measure. Such an investigation may help to clarify how first-language experience shapes the neural mechanisms underlying English lexical stress processing. In terms of suprasegmental phonology, Cantonese and English represent two extremes among the world's languages (tone versus non-tone; non-stressed versus stressed). Cantonese is a tone language with six distinctive lexical tones (up to nine, depending on how they are counted): tone 1, high level (55); tone 2, high rising (25); tone 3, mid level (33); tone 4, low falling (21); tone 5, low rising (23); and tone 6, low level (22).³ Lexical tone alone can distinguish words. For example, the monosyllable /fu/ can represent six words: /fu55/膚 (skin), /fu25/虎 (tiger), /fu33/褲 (trousers), /fu21/符 (symbol), /fu23/婦 (woman), and /fu22/父 (father) (e.g., Bauer & Benedict, 1997; Tong, McBride, & Burnham, in press). Despite the differences between Cantonese and English, Cantonese lexical tones share certain acoustic and functional similarities with English lexical stress.
Acoustically, although the primary correlate of lexical tone is fundamental frequency (F0) (e.g., Bauer & Benedict, 1997; Tong et al., in press), duration and intensity are also related to Cantonese tone (e.g., Ng, Gilbert, & Lerman, 2000; Wu & Xu, 2010). Functionally, pitch variation across contiguous syllables (i.e., English stress) can change meaning, and pitch variation within a single syllable (i.e., Cantonese tone) can likewise distinguish word meanings. In L2 speech perception, L2 learners tend to rely on acoustic cues that are actively involved in both their L1 and their L2 (Nguyen & Ingram, 2005). Given this evidence, it is theoretically interesting to investigate how Cantonese-speaking children neurally encode different acoustic cues, including F0, duration, and intensity, during English lexical stress processing. In the present study, we therefore tested whether the MMN, which is usually elicited in the oddball paradigm in segmental-level speech and auditory research, would serve as a marker of neural discrimination in English lexical stress perception by Cantonese-speaking children. We also tested whether these children would show different brain responses to changes in F0, duration, and intensity, the different acoustic cues of English lexical stress. We expected that Cantonese-speaking second graders might show neural sensitivity to all three acoustic cues of English lexical stress, namely F0, intensity, and duration, because these cues are available in both their L1 and their L2 suprasegmental phonology. In addition, we examined the relative weights of the three acoustic cues in English lexical stress perception by Cantonese-speaking second graders, expecting that the importance of the three cues might vary as stress processing unfolds over time. Given that the MMN reflects automatic, pre-attentive cortical processing of auditory or speech signals, the present study might thus identify neural patterns that correspond to changes in the different acoustic cues (F0, intensity, and duration) of English lexical stress.
³ Chao (1947) first transcribed lexical tone in a numerical notation system that uses five levels (from lowest, 1, to highest, 5) to describe the relative height, shape, and duration of the pitch contour.
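Footnote 3's Chao tone letters encode each contour by its starting and ending pitch levels on a 1–5 scale, so the contour shape can be read directly from the digits. The minimal Python sketch below stores the six Cantonese tones and the /fu/ glosses listed above and derives a level/rising/falling label for each tone. It is for illustration only, and the names and data layout are our own. It compares only the first and last digits, which is enough for the two-digit contours used here. A fuller treatment of Chao strings, such as contours that change direction, would need more than this.

```python
# Six Cantonese tones in Chao (1947) tone letters: 1 = lowest pitch, 5 = highest.
CANTONESE_TONES = {
    1: ("55", "high level"),
    2: ("25", "high rising"),
    3: ("33", "mid level"),
    4: ("21", "low falling"),
    5: ("23", "low rising"),
    6: ("22", "low level"),
}

# The six /fu/ syllables from the text, which differ only in tone.
FU_WORDS = {1: "skin", 2: "tiger", 3: "trousers", 4: "symbol", 5: "woman", 6: "father"}

def contour_shape(chao: str) -> str:
    """Classify a Chao tone-letter string by comparing first and last pitch levels."""
    levels = [int(ch) for ch in chao]
    if not levels or not all(1 <= lv <= 5 for lv in levels):
        raise ValueError(f"Chao levels must be digits 1-5, got {chao!r}")
    start, end = levels[0], levels[-1]
    if end > start:
        return "rising"
    if end < start:
        return "falling"
    return "level"

for tone, (chao, label) in CANTONESE_TONES.items():
    shape = contour_shape(chao)
    # The derived shape should match the last word of the descriptive label.
    assert label.endswith(shape), (tone, chao, label, shape)
    print(f"tone {tone}: /fu{chao}/ {FU_WORDS[tone]:<8} {label:<12} -> {shape}")
```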
The present study therefore sought to show how different acoustic cues are encoded and used in English lexical stress processing, and so to identify the neural markers of stress perception in Cantonese-speaking children. This evidence would help clarify the neural correlates of English lexical stress perception in young L2 learners whose native language is a tone language. Such an investigation might also have practical value. The differences in suprasegmental phonology between English and Cantonese challenge second language (L2) learners. They also highlight the task Cantonese-speaking children face in shifting their linguistic attention to phonetic features of the second language that are not distinctive in their native language. These difficulties partly motivated our focus on identifying the neural markers that best correspond to the acoustic cues used in English lexical stress processing. With this knowledge, investigators might then consider how best to help Cantonese-speaking children acquire English lexical stress. The findings might also help clinicians identify English L2 learners who have difficulty perceiving English lexical stress: mismatch responses to acoustic cues of English stress might serve as neural indicators of stress perception difficulty in Cantonese learners (Tong, Tong, & McBride-Chang, 2013).
2. Methods
2.1. Participants
Participants were 18 Hong Kong second-grade children (9 girls and 9 boys) aged 7 years 4 months to 8 years 3 months (M = 7;10, SD = 3 months). According to parents' reports, all children were typically developing, with no history of neurological or psychiatric disorders, brain injury, hearing problems, or learning difficulties.
This age range was selected because there is clear evidence that children's ability to use pitch cues to make lexical distinctions between English words grows at this age (Quam & Swingley, 2014). Furthermore, the 7- to 8-year-olds in Hong Kong had more than 3 years' experience of learning English as a second language, which allowed us to explore how Cantonese-speaking children use different acoustic cues to perceive stress. Hong Kong Chinese children typically begin to learn English at the age of 3 years or even earlier (McBride-Chang et al., 2008). They are formally taught English in kindergarten, either by local English teachers whose native language is Chinese or by native English speakers, who are commonly recruited to teach in kindergartens (Leung, Lim, & Li, 2013). Moreover, some Hong Kong families employ Filipina domestic helpers to look after their children, and these helpers often speak English to the children. Hong Kong parents are also highly motivated to speak English with their children. Thus, Hong Kong Chinese children in this age range have typically had relatively extensive exposure to English.
2.2. Stimuli and design
A multiple-feature oddball paradigm was adopted in the present study. This paradigm has been widely used to assess auditory and speech processing in both children and adults (e.g., Cheng et al., 2013; Lee et al., 2012; Maurer et al., 2003; Näätänen et al., 2004), and it has proved stable for examining auditory and speech perception. The basic assumption of the multiple-feature paradigm is that deviants which differ from the standard in only one respect still strengthen the memory trace for the standard with regard to the attributes they share with it (e.g., Näätänen et al., 2004). In the present study, we manipulated the stress of a disyllabic English word to construct deviants that differed from the standard in only one dimension, namely (1) pitch (i.e., fundamental frequency [F0]), (2) intensity, or (3) duration, as well as a deviant that differed in all three dimensions. There were therefore four deviants: a change in pitch, in intensity, or in duration only, and a change in all three acoustic cues (see Fig. 1).
Experimental stimuli were generated with the word pair MOther and toDAY. This word pair was selected for two primary reasons. First, the two words differ in stress pattern. MOther is a disyllabic word with a trochaic stress pattern (a stressed syllable followed by an unstressed one), whereas toDAY is a disyllabic word with an iambic stress pattern (an unstressed syllable followed by a stressed one). This follows the rationale for designing stress stimuli for young children in a very recent study of English-speaking children's stress perception (Quam & Swingley, 2014). Second, both words are familiar to 7- to 8-year-old Hong Kong Cantonese-speaking children, and there is empirical evidence that young children encode the phonetic detail of familiar words for lexical distinction (Swingley & Aslin, 2000).
The words mother and today were produced by a female native English speaker in a soundproof room. The original recording of MOther was assigned as the standard (see Fig. 1a). The four deviants (pitch, intensity, duration, and three-dimension changes) were constructed from the original recording of MOther with reference to the acoustic parameters of toDAY. The prosodic information extracted from toDAY served as the template for synthesizing the deviant stimuli. All deviant stimuli were constructed using the software Praat (Boersma & Weenink, 2004). The pitch deviant was generated on the basis of the pitch contour of toDAY while the intensity of the whole word was held constant (75 dB). The original mean pitch values were 216.70 and 180.20 Hz for the first and second syllables, respectively. After adjustment, they were 165.61 and 2020.78 Hz (see Fig. 1b). Duration was also adjusted according to toDAY. The original durations of the first syllable /ˈmʌ/ and the second syllable /ðə/ were approximately 148 ms and 152 ms, respectively. To match the template word toDAY, they were adjusted to 98 ms and 228 ms (see Fig. 1c). For toDAY, the intensity was approximately 76 dB for the first syllable /tə/ and 80–81 dB for the second syllable /ˈdeɪ/. For MOther, the intensity was 76.78 dB SPL for the first syllable /ˈmʌ/ and 74.48 dB SPL for the second syllable /ðə/. We then averaged the intensity of each syllable, resulting in adjusted intensities of 69.35 and 78.38 dB SPL for the two syllables of the intensity deviant (see Fig. 1d). For the three-dimension-changed deviant, the intensities were set to 67.57 and 76.65 dB SPL for the first and second syllables, respectively (see Fig. 1e). All stimuli were lengthened to 300 ms and presented with a stimulus onset asynchrony (SOA) of 600 ms. The waveforms and spectrograms of the standard and deviant tokens are shown in Fig. 1.
[Fig. 1. Waveforms (upper row) and spectrograms (lower row) of the stress stimuli used in the present study: (a) Standard, (b) Pitch-changed, (c) Duration-changed, (d) Intensity-changed, (e) Three-dimension-changed. Waveform axes: amplitude versus time (0–300 ms); spectrogram axes: frequency (0–5000 Hz) versus time (0–300 ms). In the spectrograms, the yellow line represents intensity, the blue line represents the F0 features of stress, and the dark areas indicate the time and frequency points where the acoustic energy is highest. (For interpretation of the references to color in this figure legend, the reader is referred to the web version of this article.)]
2.3. Procedure
Participants were tested individually in an electrically shielded ERP laboratory by a trained Cantonese-speaking experimenter and by the first author, who has experience in ERP testing with children. Before testing, the caregiver was asked to complete a consent form and a parental questionnaire. The questionnaire collected information about the child's language experience and any history of language or learning difficulties. After these forms were completed, the EEG recording was prepared and participants were given instructions about the experimental procedure. Participants were seated in a comfortable chair 80 cm in front of a computer monitor. They were asked to watch a movie entitled “The Mole” in silence while listening to the experimental stimuli, which were delivered binaurally via headphones. The stimuli were presented in three blocks of 515 trials each, and each block began with 15 standard trials. Within each block, the standard and the four deviant types were mixed, with 60% standards and 40% deviants (10% for each deviant type). The deviant types were the single-dimension changes in pitch, intensity, or duration and the three-dimension change. Thus, there were 150 trials for each type of deviant in total.
The order of the trials within each block was pseudo-randomized. Each block lasted approximately 8 min, and participants were given a 2-min break between blocks. During the experiment, participants were asked to refrain from moving in order to minimize EEG artifacts. The whole experiment, including preparation and breaks, lasted approximately 1.5 h.
2.4. EEG recordings
Electroencephalographic (EEG) signals were recorded from 64 Ag/AgCl electrodes mounted on an elastic cap and arranged according to the international 10–20 system. The online reference electrode was located between Cz and CPz. Activity at the left and right mastoids was also recorded. The vertical electrooculogram (EOG) was recorded from electrodes below and above the left eye. The horizontal EOG was recorded from electrodes at the left and right lateral orbital rims. During recording, electrode impedance was kept below 15 kΩ. The EEG and EOG signals were amplified with a band-pass of 0.05–70 Hz and digitized online at a sampling rate of 1000 Hz.
2.5. Data analysis
The EEG data were analyzed offline using Scan 4.5 software. The data were re-referenced to the average of both mastoids, as is common in the MMN literature (e.g., Nenonen, Shestakova, Huotilainen, & Näätänen, 2003; Näätänen et al., 2004; Paavilainen, Simola, Jaramillo, Näätänen, & Winkler, 2001; Pakarinen, Huotilainen, & Näätänen, 2010). The continuous data were filtered with a 0.3–30 Hz band-pass. Epochs of 600 ms following stimulus onset were averaged separately for each condition, with the 100-ms pre-stimulus interval as baseline. The first 15 standard trials and epochs with artifacts exceeding ±100 µV were rejected automatically. In addition, standard trials that immediately followed a deviant were excluded from averaging. Each individual average included at least 97 accepted trials. Mean amplitudes were computed separately for each participant and each condition in two time windows (170–270 and 270–400 ms) at electrodes F3, Fz, F4, FC3, FCz, and FC4.
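The block structure in Section 2.3 and the trial-exclusion rules above can be sketched together in Python. This is a minimal illustration under stated assumptions. It is not the authors' presentation script or the Scan 4.5 pipeline. The paper says only that trial order was pseudo-randomized, so the constraint that no two deviants occur in a row is our assumption. The ±100 µV criterion is also assumed to apply to the absolute value of every baseline-corrected sample. The function names, label strings, and data layout are hypothetical.

```python
import random
from collections import Counter

DEVIANTS = ["pitch", "intensity", "duration", "three_dim"]  # hypothetical labels

def make_block(n_lead=15, n_standard=300, n_per_deviant=50, seed=None):
    """One pseudo-randomized block: 15 leading standards, then 500 mixed trials.

    Assumed constraint: deviants never occur back to back. Each deviant goes into
    a distinct gap around the 300 standards (301 gaps), which guarantees this.
    """
    rng = random.Random(seed)
    deviants = [d for d in DEVIANTS for _ in range(n_per_deviant)]
    rng.shuffle(deviants)
    n_gaps = n_standard + 1
    if len(deviants) > n_gaps:
        raise ValueError("too many deviants to keep them non-adjacent")
    chosen = set(rng.sample(range(n_gaps), len(deviants)))
    body, pending = [], iter(deviants)
    for gap in range(n_gaps):
        if gap in chosen:
            body.append(next(pending))
        if gap < n_standard:
            body.append("standard")
    return ["standard"] * n_lead + body

def baseline_correct(epoch, n_baseline):
    """Subtract the mean of the first n_baseline (pre-stimulus) samples."""
    base = sum(epoch[:n_baseline]) / n_baseline
    return [v - base for v in epoch]

def select_and_average(trials, n_skip=15, n_baseline=100, limit_uv=100.0):
    """trials: (label, samples_in_uV) pairs in presentation order.

    Drops the first n_skip trials, standards that directly follow a deviant,
    and epochs exceeding +/-limit_uv. Returns {label: (n_kept, average)}.
    """
    kept, prev = {}, None
    for i, (label, samples) in enumerate(trials):
        after_deviant = label == "standard" and prev is not None and prev != "standard"
        prev = label
        if i < n_skip or after_deviant:
            continue
        epoch = baseline_correct(samples, n_baseline)
        if max(abs(v) for v in epoch) > limit_uv:
            continue  # artifact rejection
        kept.setdefault(label, []).append(epoch)
    return {lab: (len(eps), [sum(col) / len(eps) for col in zip(*eps)])
            for lab, eps in kept.items()}

block = make_block(seed=1)
print(len(block), Counter(block)["standard"], Counter(block)["pitch"])  # 515 315 50

# Toy 3-sample epochs (1 baseline sample) to show the exclusions:
toy = [("standard", [0, 1, 1]), ("pitch", [0, 2, 2]), ("standard", [0, 5, 5]),
       ("standard", [1, 2, 2]), ("pitch", [0, 500, 0])]
result = select_and_average(toy, n_skip=0, n_baseline=1)
print(result)  # {'standard': (2, [0.0, 1.0, 1.0]), 'pitch': (1, [0.0, 2.0, 2.0])}
```

In the toy run, the third trial is dropped because it is a standard that follows a deviant. The last trial is dropped by the amplitude criterion. Across three generated blocks, each deviant type occurs 150 times, which matches the totals reported in Section 2.3.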
Repeated-measures ANOVAs were performed with experimental condition (standard and four deviants), site (frontal, frontocentral), and hemisphere (left, midline, right) as within-subject factors. Significant interactions were followed up with one-way ANOVAs. To examine whether each deviant elicited a significant mismatch response, planned comparisons were performed between the standard and each deviant (pitch-changed, intensity-changed, duration-changed, and three-dimension-changed). In addition, planned comparisons were performed on the difference waves obtained by subtracting the standard from each deviant. These comparisons examined whether the mismatch responses elicited by the four deviants differed in mean amplitude in each time window. For each ANOVA, the Greenhouse-Geisser adjustment to the degrees of freedom was used to correct for violations of sphericity.
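The Greenhouse-Geisser adjustment mentioned above multiplies both degrees of freedom of a within-subject effect by an estimated ε. For a factor with k levels, ε lies between 1/(k − 1) and 1. The pure-Python sketch below uses the usual estimator ε = (tr D)² / ((k − 1) tr(D²)), where D is the double-centred covariance matrix of the condition scores. It applies to one within-subject factor at a time, such as the five-level condition factor. The function names are our own, and the example matrices are toy data rather than data from this study.

```python
def covariance_matrix(data):
    """Sample covariance of the k repeated-measures columns (rows = participants)."""
    n, k = len(data), len(data[0])
    means = [sum(row[j] for row in data) / n for j in range(k)]
    return [[sum((row[a] - means[a]) * (row[b] - means[b]) for row in data) / (n - 1)
             for b in range(k)]
            for a in range(k)]

def greenhouse_geisser_epsilon(data):
    """Box / Greenhouse-Geisser epsilon from the double-centred covariance matrix."""
    s = covariance_matrix(data)
    k = len(s)
    row_means = [sum(r) / k for r in s]
    grand_mean = sum(row_means) / k
    # s is symmetric, so its column means equal its row means.
    d = [[s[i][j] - row_means[i] - row_means[j] + grand_mean for j in range(k)]
         for i in range(k)]
    trace = sum(d[i][i] for i in range(k))
    sum_sq = sum(v * v for row in d for v in row)  # equals tr(D @ D) for symmetric D
    return trace ** 2 / ((k - 1) * sum_sq)

def corrected_dfs(eps, df_effect, df_error):
    """Greenhouse-Geisser-corrected degrees of freedom."""
    return eps * df_effect, eps * df_error

# Toy data: equal, uncorrelated condition variances, so sphericity holds exactly.
spherical = [[1, 0, 0], [-1, 0, 0], [0, 1, 0], [0, -1, 0], [0, 0, 1], [0, 0, -1]]
# Toy data: all variability lies in a single contrast (the lower bound for k = 3).
one_contrast = [[1, -1, 0], [-1, 1, 0]]

print(round(greenhouse_geisser_epsilon(spherical), 6))  # 1.0
print(greenhouse_geisser_epsilon(one_contrast))          # 0.5
```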
3. Results
Fig. 2 shows the grand-average ERPs at electrode F3 for all types of stimuli. Fig. 3 shows the difference waveform of each deviant minus the standard at electrode Fz. Fig. 4 shows the topographic voltage maps of each deviant minus the standard, obtained by subtracting the standard from each of the four deviants.
Fig. 2. ERP waveforms to the standard and deviant stimuli for (a) F0-changed deviant, (b) duration-changed deviant, (c) intensity-changed deviant, and (d) three-dimension-changed deviant at the electrode of F3. Each panel shows the standard, deviant, and difference waveforms (amplitude in µV; time from −100 to 600 ms).
Fig. 3. ERP difference waveforms for the F0 deviant minus standard, intensity deviant minus standard, duration deviant minus standard, and three-dimension-changed deviant minus standard at the electrode of Fz (amplitude in µV; time from −100 to 600 ms).
Visual inspection of the grand-average ERPs suggested a prominent positive mismatch response (p-MMR) for the pitch-changed deviant relative to the standard in the 170–270 ms time window. The duration-changed deviant appeared to elicit a mismatch negativity (MMN) in both the 170–270 ms and 270–400 ms windows. A p-MMR appeared to occur for the intensity-changed deviant relative to the standard in the 270–400 ms window, and an MMN seemed prominent in the same window for the three-dimension-changed deviant. On the basis of this visual inspection and of the time windows typically reported for the p-MMR and MMN in children, statistical analyses were performed on the mean amplitudes in these two time windows.
3.1. Analyses of 170–270 ms time window
There was a significant main effect of experimental condition in the 170–270 ms time window, F(4, 68) = 10.52, p < .001, ηp² = .38. Planned comparisons revealed that the mean amplitude for the pitch-changed deviant (M = 9.59 µV) was more positive than that for the standard (M = 7.61 µV) (p < .01), and that the mean amplitude for the duration-changed deviant (M = 5.93 µV) was more negative than that for the standard (p < .01). There was also a significant effect of site in this time window, F(1, 17) = 14.74, p < .01, ηp² = .46. Planned comparisons showed that the mean amplitude at the frontocentral site (M = 8.17 µV) was more negative than at the frontal site (M = 7.55 µV) (p < .01). Moreover, there was a significant interaction between hemisphere and site, F(2, 34) = 14.93, p < .001, ηp² = .47. Simple main effect analyses revealed that at the frontal sites, the mean amplitude over the left hemisphere was more negative than over the right hemisphere (p < .05). At the frontocentral sites, the mean amplitude over the right hemisphere was more negative than over the left hemisphere and the midline (ps < .05). In addition, the analysis of the difference waves in the 170–270 ms time window showed that the difference for the pitch-changed deviant was significantly more positive than those for the other three deviants (ps < .05). The difference for the duration-changed deviant (M = −1.68 µV) was significantly more negative than those for the pitch-changed (M = 1.99 µV), intensity-changed (M = .58 µV), and three-dimension-changed (M = .38 µV) deviants (ps < .05).
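The partial eta squared values reported in this section follow directly from each F statistic and its degrees of freedom, because ηp² = SS_effect / (SS_effect + SS_error) = F·df_effect / (F·df_effect + df_error). The short check below recomputes them; the function names are hypothetical. It also shows that each error df equals (levels − 1)(n − 1) with n = 18, which is consistent with 18 participants entering the analysis.

```python
def partial_eta_squared(F, df_effect, df_error):
    """Partial eta squared implied by an F statistic and its degrees of freedom."""
    return F * df_effect / (F * df_effect + df_error)

def implied_n(df_error, df_effect):
    """Subjects implied by a within-subject error df of (levels - 1)(n - 1)."""
    return df_error / df_effect + 1

# (effect, F, df_effect, df_error, reported eta_p^2) for the 170-270 ms window
REPORTED_170_270 = [
    ("condition", 10.52, 4, 68, 0.38),
    ("site", 14.74, 1, 17, 0.46),
    ("hemisphere x site", 14.93, 2, 34, 0.47),
]

for effect, F, d1, d2, reported in REPORTED_170_270:
    print(f"{effect}: eta_p^2 = {partial_eta_squared(F, d1, d2):.2f} "
          f"(reported {reported}), implied n = {implied_n(d2, d1):.0f}")
```

All three recomputed values round to the reported ones, so the effect sizes are internally consistent with the F tests.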
3.2. Analyses of 270–400 ms time window
In this time window, the main effect of experimental condition was significant, F(4, 68) = 12.86, p < .001, ηp² = .43. Follow-up pairwise comparisons indicated that the mean amplitude for the intensity-changed deviant (M = 4.41 µV) was more positive than that for the standard (M = 2.27 µV) (p < .01), and that the mean amplitude for the duration-changed deviant (M = .36 µV) was more negative than that for the standard (p < .01). The mean amplitude for the three-dimension-changed deviant (M = 1.08 µV) was also more negative than that for the standard (p < .01). There was also a significant effect of site in this time window, F(1, 17) = 7.91, p < .05, ηp² = .32. Planned comparisons showed that the mean amplitude at the frontocentral site (M = 2.49 µV) was more negative than at the frontal site (M = 1.98 µV) (p < .05). No significant interactions were found in this time window (ps > .05). In addition, the difference for the intensity-changed deviant (M = 2.18 µV) was more positive than the differences for the other deviants, and the differences for the duration-changed (M = −1.87 µV) and three-dimension-changed (M = −1.14 µV) deviants were more negative than those for the pitch-changed (M = .86 µV) and intensity-changed deviants (ps < .05).
Fig. 4. Topographic distribution of the mean amplitude of the deviant-minus-standard difference in the two analysis time windows (170–400 ms) for (a) F0-changed deviant, (b) duration-changed deviant, (c) intensity-changed deviant, and (d) three-dimension-changed deviant. Maps are shown for successive intervals of 170–194, 195–219, 220–244, 245–269, 270–309, 310–349, and 350–389 ms; color scale from −3 to +3.
4. Discussion
In the present study, we used an ERP measure with a multiple-deviant oddball paradigm to investigate the neural correlates of Cantonese-speaking second graders' automatic detection of violations in English lexical stress. We manipulated three acoustic correlates of stress, namely F0, duration, and intensity, in a real English disyllabic word. We aimed to determine whether Cantonese-speaking second graders are able to use these three acoustic correlates in English lexical stress perception; if so, which neural markers are associated with each correlate; and whether these markers differ from one another as stress processing unfolds over time. Our ERP results showed that in the 170–270 ms time window, a violation in F0 elicited a significant positive mismatch response (p-MMR), and a violation in duration elicited a typical mismatch negativity (MMN). In the 270–400 ms time window, violations in intensity elicited a significant p-MMR, whereas violations in duration and in all three dimensions elicited significant MMNs. These results indicate that Cantonese-speaking second graders are sensitive to changes in the F0, duration, and intensity of English lexical stress. Moreover, changes in the different acoustic correlates may be associated with different ERP components, which may reflect the fact that the discriminability between the three acoustic cues and the standard
may vary from cue to cue and across the stages of English lexical stress processing. The present findings have important implications for understanding English lexical stress perception in Cantonese-speaking children. First, these results suggest that Cantonese-speaking children make use of all three acoustic cues, i.e., F0, duration, and intensity, in English lexical stress perception. This finding accords with previous results for adult native speakers of English, in whom a rise in F0, longer duration, higher intensity, and fuller vowel quality have been found to correlate with stressed syllables (e.g., Lieberman, 1967; Medress, Skinner, & Anderson, 1971). However, there are clear phonological differences between Cantonese and English at both the segmental and suprasegmental levels (e.g., Chan & Li, 2000; So & Dodd, 1995). For example, English is a stress-timed language in which the position of lexical stress varies from word to word, whereas Cantonese is a tonal language. Why, then, do Cantonese-speaking second graders show sensitivity to F0, duration, and intensity in a manner similar to that observed for native speakers' English stress perception in previous research? Prior studies of L2 acquisition have suggested that there is transfer from L1 to L2 (e.g., Cisero & Royer, 1995; Durgunoğlu, Nagy, & Hancin-Bhatt, 1993; Gottardo, Yan, Siegel, & Wade-Woolley, 2001). Such transfer can occur at all levels, including phonology, syntax, semantics, and pragmatics (e.g., Brenders, van Hell, & Dijkstra, 2011; Durgunoğlu et al., 1993; McBride-Chang, Bialystok, Chong, & Li, 2004; McBride-Chang, Cheung, Chow, Chow, & Choi, 2006; Meisel, 1997). In particular, transfer at the phonological level, both segmental and suprasegmental, has been described as more pronounced than transfer at other linguistic levels (e.g., Ellis, 1994). Thus, Cantonese-speaking children may transfer their ability in tone
perception to stress perception. That is, Cantonese-speaking children may perceive English stress with reference to the acoustic correlates of lexical tones that are familiar from their L1. Additionally, our results suggest that Hong Kong Cantonese-speaking children show different sensitivities to the three acoustic cues in English lexical stress perception, as reflected by different ERP components at different processing stages. In other words, the weights of the three acoustic cues may differ and may change as stress perception unfolds in Cantonese-speaking children. In the present study, in the 170–270 ms time window, we found more positive ERPs for the F0 deviant relative to the standard, and more negative ERPs for the duration-changed deviant relative to the standard. However, no significant ERP effects were found for the intensity-changed or three-dimension-changed deviants. In contrast, in the 270–400 ms interval, a robust p-MMR was elicited by the intensity-changed deviant, and the duration-changed and three-dimension-changed deviants elicited significant MMNs, but neither a p-MMR nor an MMN was observed for the F0-changed deviant. The p-MMR and MMN components have been observed in several previous auditory and speech perception studies (e.g., Cheng et al., 2013; Dehaene-Lambertz, 2000; Lee et al., 2012; Maurer et al., 2003), and they have been successfully used to examine a variety of acoustic differences, such as changes in the frequency, intensity, duration, location, or rhythm of a sound or speech signal; they have therefore been suggested to be valuable tools in speech perception research (e.g., Cheour et al., 2000; Maurer et al., 2003; for a review, see Näätänen et al., 2007). The MMN may reflect the outcome of a comparison between an incoming deviant stimulus and a memory trace formed by the standard stimulus in the auditory system.
As discrimination becomes more difficult, the MMN becomes smaller or disappears (for a review, see Näätänen et al., 2007). For example, when Gomes et al. (1999) examined children's and adults' brain responses to difficult, medium, and easy deviants (1050, 1200, and 1500 Hz deviants versus a 1000 Hz standard), they found that when children were ignoring the stimuli, an MMN was observed only for the medium and easy deviants, not for the difficult deviant. Although the function of the p-MMR is still debated, its presence or absence has been associated with stimulus features such as deviance size and the phonological salience of speech (e.g., Cheng et al., 2013; Lee et al., 2012; Maurer et al., 2003). For example, Lee et al. (2012) reported p-MMRs for the small vowel contrast between /di/ and /da/ in 5-year-old children, with a distribution over midline and right-hemisphere sites. Thus, researchers have proposed that the MMN may reflect enhanced and more mature discrimination ability, whereas the p-MMR is associated with more difficult discrimination in children. In the present study, a p-MMR was elicited by the F0 deviant in the 170–270 ms time window and by the intensity deviant in the 270–400 ms time window. An MMN was elicited by the duration deviant in both time windows and by the three-dimension-changed deviant in the 270–400 ms time window. Although our findings cannot establish which cue is the most prominent, they suggest that the F0 and intensity deviants were less distinct from the standard and thus more difficult to discriminate, whereas the duration deviant was relatively easy for Cantonese-speaking second graders to detect. In other words, Cantonese-speaking children show different sensitivities to the three acoustic cues in stress perception at automatic, preattentive stages of processing.
Our results differ somewhat from those of Wang (2008), who found that F0 was the most prominent cue for Chinese-speaking adults in English lexical stress perception. Two explanations may account for this discrepancy.
First, the difference in L1 background may be the most important reason. More specifically, the participants in Wang's study were native speakers of Mandarin, whereas the participants in our study were native speakers of Cantonese. Although Mandarin and Cantonese are both tonal languages, their phonological systems differ in several ways. For example, Mandarin has only four tones, whereas Cantonese has six (or up to nine, depending on how they are counted), which makes Cantonese tones more difficult than Mandarin tones to distinguish for non-native listeners, even for speakers of other tonal languages. Thus, in addition to F0, Cantonese-speaking children may also use other acoustic cues, such as duration and intensity, in tone perception, and they may transfer these skills to English stress perception. Second, the participants in Wang's (2008) study were adults, whereas the participants in our study were second graders with a mean age of about 7 years. Children may adopt different strategies from adults in perceiving English stress. That is, children may use different cues to perceive English stress because their representations of the acoustic features of both L1 and L2 speech are less stable than those of adults. Indeed, studies of stress in native English-speaking children suggest that even these children do not fully master the complexities of word stress until about 12 years of age (Kehoe, Stoel-Gammon, & Buder, 1995). Similarly, a developmental study of tone perception suggests that Cantonese children achieve adult-like performance in lexical tone perception by the age of 10 (Ciocca & Lui, 2003). Thus, our findings suggest that Cantonese-speaking second graders may not perceive stress in the same manner as adults. Our results also bear on the relative status of the three acoustic cues in stress perception in Cantonese-speaking children.
The influence of the three acoustic cues seems to vary as stress perception unfolds. For example, the duration cue appeared to affect stress perception from as early as 170 ms until 400 ms after stimulus onset, as indexed by a typical MMN, whereas the intensity cue appeared to affect stress perception in the 270–400 ms window, as indexed by a p-MMR. This suggests that Cantonese-speaking children may rely more on the duration cue at the early stage of stress perception. Interestingly, children seemed less dependent on the F0 cue at the early stage of stress perception (the 170–270 ms window in the present study), as reflected by the p-MMR, but they appeared to respond more strongly to the F0 cue at a later stage, after 400 ms. Although there was no main effect of experimental condition in the 400–600 ms time window, an independent t-test on the MMN obtained from the pitch-changed deviant and the standard indicated a significant MMN effect (p < .05). Nevertheless, although our study is among the first to investigate the neural markers of the acoustic cues of English stress in Cantonese second graders, it may not be conclusive regarding the status of acoustic cues in stress perception in young L2 learners. Other factors may also contribute to the different brain responses to the different acoustic cues in English stress perception. For example, the development of language skills in both L1 and L2 is likely to affect the discriminability of the acoustic cues of stress. The influence of language skills in both L1 and L2 should be tested in future research by directly comparing the neural discrimination of English stress across age groups and with native English-speaking groups.
In summary, our ERP results demonstrate that Cantonese-speaking second graders are sensitive to F0, duration, and intensity in stress perception, as reflected by the MMN or p-MMR. On the one hand, this finding shows that the MMN and p-MMR could be valuable tools for investigating the neural discrimination of
suprasegmental features of speech in children. On the other hand, this finding suggests that Cantonese second graders are able to detect acoustic violations of English stress in spoken words, which may indicate that they have long-term memory representations of the acoustic cues of English stress. In addition, our results show that the influence of the three acoustic cues on English stress perception may vary over the time course of stress processing, as reflected by either the MMN or the p-MMR. Practically, the MMN or p-MMR elicited by violations of the acoustic cues of English stress may serve as an indicator for diagnosing difficulties in L2 English learning among Hong Kong Chinese–English bilingual readers. However, in this study we manipulated stress in a real English disyllabic word, and we did not control for lexical properties of the word such as word frequency. It remains unclear whether lexical properties influence the time course of stress perception in Chinese-speaking children. Also, we used words with an iambic stress pattern, and brain activity may differ for other stress patterns. Future work might attempt to extend our findings to words with trochaic stress patterns. It may also be valuable to use pseudowords as stimuli to exclude the influence of lexical properties.
Acknowledgments
This research was supported by the General Research Fund of the Hong Kong Special Administrative Region Research Grants Council (CUHK: 451811) and Collaborative Research Fund of the Hong Kong Special Administrative Region Research Grants Council (CUHK: 2300035) to Catherine McBride. We thank the research assistants for help with data collection, and children and parents for their participation.
References
Alho, K., Sainio, K., Sajaniemi, N., Reinikainen, K., & Näätänen, R. (1990). Event-related brain potential of human newborns to pitch change of an acoustic stimulus. Electroencephalography and Clinical Neurophysiology/Evoked Potentials Section, 77(2), 151–155. http://dx.doi.org/10.1016/0168-5597(90)90031-8.
Bauer, R. S., & Benedict, P. K. (1997). Modern Cantonese phonology (Vol. 102). Walter de Gruyter.
Boersma, P., & Weenink, D. (2004). Praat: Doing phonetics by computer (Version 4.2) [Computer program]. Retrieved 04.03.04.
Bolinger, D. (1965). In D. Bolinger (Ed.), Pitch accent and sentence rhythm in forms of English: Accent, morpheme, order. Cambridge, MA: Harvard U.P.
Brenders, P., van Hell, J. G., & Dijkstra, T. (2011). Word recognition in child second language learners: Evidence from cognates and false friends. Journal of Experimental Child Psychology, 109(4), 383–396. http://dx.doi.org/10.1016/j.jecp.2011.03.012.
Chan, M. K. (2007). The perception and production of lexical stress by Cantonese speakers of English. Unpublished Master Thesis, University of Hong Kong.
Chan, A. Y., & Li, D. C. (2000). English and Cantonese phonology in contrast: Explaining Cantonese ESL learners' English pronunciation problems. Language Culture and Curriculum, 13(1), 67–85. http://dx.doi.org/10.1080/07908310008666590.
Chandrasekaran, B., Gandour, J. T., & Krishnan, A. (2007). Neuroplasticity in the processing of pitch dimensions: A multidimensional scaling analysis of the mismatch negativity. Restorative Neurology & Neuroscience, 25(3/4), 195–210.
Chao, Y. R. (1947). Cantonese Primer. Cambridge, Mass: Harvard University Press.
Cheng, Y. Y., Wu, H. C., Tzeng, Y. L., Yang, M. T., Zhao, L. L., & Lee, C. Y. (2013). The development of mismatch responses to Mandarin lexical tones in early infancy. Developmental Neuropsychology, 38(5), 281–300. http://dx.doi.org/10.1080/87565641.2013.799672.
Cheour, M., Alho, K., Sainio, K., Reinikainen, K., Renlund, M., Aaltonen, O., et al. (1997). The mismatch negativity to changes in speech sounds at the age of three months. Developmental Neuropsychology, 13(2), 167–174. http://dx.doi.org/10.1080/87565649709540676.
Cheour, M., Leppänen, P. H. T., & Kraus, N. (2000). Mismatch negativity (MMN) as a tool for investigating auditory discrimination and sensory memory in infants and children. Clinical Neurophysiology, 111(1), 4–16. http://dx.doi.org/10.1016/S1388-2457(99)00191-1.
Ciocca, V., & Lui, J. (2003). The development of the perception of Cantonese lexical tones. Journal of Multilingual Communication Disorders, 1(2), 141–147.
Cisero, C. A., & Royer, J. M. (1995). The development and cross-language transfer of phonological awareness. Contemporary Educational Psychology, 20(3), 275–303.
Crystal, D. (1969). Prosodic systems and intonation in English. Cambridge: Cambridge University Press.
Dehaene-Lambertz, G. (2000). Cerebral specialization for speech and non-speech stimuli in infants. Journal of Cognitive Neuroscience, 12(3), 449–460. http://dx.doi.org/10.1162/089892900562264.
Durgunoğlu, A. Y., Nagy, W. E., & Hancin-Bhatt, B. J. (1993). Cross-language transfer of phonological awareness. Journal of Educational Psychology, 85(3), 453. http://dx.doi.org/10.5353/th_b3688930.
Ellis, R. (1994). The study of second language acquisition. Oxford University Press.
Escera, C., Alho, K., Winkler, I., & Näätänen, R. (1998). Neural mechanisms of involuntary attention to acoustic novelty and change. Journal of Cognitive Neuroscience, 10(5), 590–604. http://dx.doi.org/10.1162/089892998562997.
Friederici, A. D., Friedrich, M., & Weber, C. (2002). Neural manifestation of cognitive and precognitive mismatch detection in early infancy. NeuroReport, 13(10), 1251–1254. http://dx.doi.org/10.1097/00001756-200207190-00006.
Frost, D. (2011). Stress-cues to relative prominence in English and French: A perceptual study. Journal of the International Phonetic Association, 41(1), 67–84. http://dx.doi.org/10.1017/s0025100310000253.
Fry, D. B. (1955). Duration and intensity as physical correlates of linguistic stress. The Journal of the Acoustical Society of America, 27(4), 765–768. http://dx.doi.org/10.1121/1.1917773.
Fry, D. B. (1958). Experiments in the perception of stress. Language & Speech, 1(2), 126–152. http://dx.doi.org/10.1177/002383095800100207.
Gomes, H., Sussman, E., Ritter, W., Kurtzberg, D., Cowan, N., & Vaughan, H. G., Jr. (1999). Electrophysiological evidence of developmental changes in the duration of auditory sensory memory. Developmental Psychology, 35(1), 294. http://dx.doi.org/10.1037/0012-1649.35.1.294.
Gottardo, A., Yan, B., Siegel, L. S., & Wade-Woolley, L. (2001). Factors related to English reading performance in children with Chinese as a first language: More evidence of cross-language transfer of phonological processing. Journal of Educational Psychology, 93(3), 530. http://dx.doi.org/10.1037/0022-0663.93.3.530.
He, C., Hotson, L., & Trainor, L. J. (2009). Maturation of cortical mismatch responses to occasional pitch change in early infancy: Effects of presentation rate and magnitude of change. Neuropsychologia, 47(1), 218–229. http://dx.doi.org/10.1016/j.neuropsychologia.2008.07.019.
Jing, H., & Benasich, A. A. (2006). Brain responses to tonal changes in the first two years of life. Brain and Development, 28(4), 247–256. http://dx.doi.org/10.1016/j.braindev.2005.09.002.
Kehoe, M., Stoel-Gammon, C., & Buder, E. H. (1995). Acoustic correlates of stress in young children's speech. Journal of Speech and Hearing Research, 38(2), 338–350.
Khouw, E., & Ciocca, V. (2007). Perceptual correlates of Cantonese tones. Journal of Phonetics, 35(1), 104–117. http://dx.doi.org/10.1016/j.wocn.2005.10.003.
Klatt, D. H. (1976). Linguistic uses of segmental duration in English: Acoustic and perceptual evidence. The Journal of the Acoustical Society of America, 59, 1208. http://dx.doi.org/10.1121/1.380986.
Kuhl, P. K. (1998). Effects of language experience on speech perception. The Journal of the Acoustical Society of America, 103(5), 2931. http://dx.doi.org/10.1121/1.422159.
Lee, C. Y., Yen, H. L., Yeh, P. W., Lin, W. H., Cheng, Y. Y., Tzeng, Y. L., et al. (2012). Mismatch responses to lexical tone, initial consonant, and vowel in Mandarin-speaking preschoolers. Neuropsychologia, 50(14), 3228–3239. http://dx.doi.org/10.1016/j.neuropsychologia.2012.08.025.
Leung, C. S. S., Lim, S. E. A., & Li, Y. L. (2013). Implementation of the Hong Kong language policy in pre-school settings. Early Child Development and Care, 183(10), 1381–1396.
Lieberman, P. (1960). Some acoustic correlates of word stress in American English. The Journal of the Acoustical Society of America, 32(4), 451–454. http://dx.doi.org/10.1121/1.1936148.
Lieberman, P. (1967). Intonation, Perception and Language (Research Monograph, 38). Cambridge, MA: M.I.T Press.
Maurer, U., Bucher, K., Brem, S., & Brandeis, D. (2003). Altered responses to tone and phoneme mismatch in kindergartners at familial dyslexia risk. NeuroReport, 14(17), 2245–2250. http://dx.doi.org/10.1097/00001756-200312020-00022.
McBride-Chang, C., Bialystok, E., Chong, K. K. Y., & Li, Y. (2004). Levels of phonological awareness in three cultures. Journal of Experimental Child Psychology, 89(2), 93–111. http://dx.doi.org/10.1016/j.jecp.2004.05.001.
McBride-Chang, C., Cheung, H., Chow, B. W.-Y., Chow, C. S.-L., & Choi, L. (2006). Metalinguistic skills and vocabulary knowledge in Chinese (L1) and English (L2). Reading and Writing, 19, 695–716. http://dx.doi.org/10.1007/s11145-005-5742-x.
McBride-Chang, C., Tong, X., Shu, H., Wong, A. M. Y., Leung, K. W., & Tardif, T. (2008). Syllable, phoneme, and tone: Psycholinguistic units in early Chinese and English word recognition. Scientific Studies of Reading, 12(2), 171–194.
Meisel, J. M. (1997). The acquisition of the syntax of negation in French and German: Contrasting first and second language development. Second Language Research, 13(3), 227–263. http://dx.doi.org/10.1191/026765897666180760.
Mesgarani, N., Cheung, C., Johnson, K., & Chang, E. F. (2014). Phonetic feature encoding in human superior temporal gyrus. Science, 343(6174), 1006–1010.
Medress, Skinner, T. E., & Anderson, D. E. (1971). Acoustic correlates of word stress. 82nd Meeting of Acoustical Society of America, paper k3, Denver Colorado, U.S.A.
Mol, H., & Uhlenbeck, E. M. (1955). The linguistic relevance of intensity in stress. Lingua, 5, 205–213. http://dx.doi.org/10.1016/0024-3841(55)90010-3.
Morr, M. L., Shafer, V. L., Kreuzer, J. A., & Kurtzberg, D. (2002). Maturation of mismatch negativity in typically developing infants and preschool children. Ear and Hearing, 23(2), 118–136. http://dx.doi.org/10.1097/00003446-200204000-00005.
Morton, J., & Jassem, W. (1965). Acoustic correlates of stress. Language & Speech, 8(3), 159–181.
Näätänen, R. (2001). The perception of speech sounds by the human brain as reflected by the mismatch negativity (MMN) and its magnetic equivalent (MMNm). Psychophysiology, 38(1), 1–21. http://dx.doi.org/10.1111/1469-8986.3810001.
Näätänen, R., Paavilainen, P., Alho, K., Reinikainen, K., & Sams, M. (1989). Do event-related potentials reveal the mechanism of the auditory sensory memory in the human brain? Neuroscience Letters, 98(2), 217–221. http://dx.doi.org/10.1016/0304-3940.
Näätänen, R., Paavilainen, P., Rinne, T., & Alho, K. (2007). The mismatch negativity (MMN) in basic research of central auditory processing: A review. Clinical Neurophysiology, 118(12), 2544–2590. http://dx.doi.org/10.1016/j.clinph.2007.04.026.
Näätänen, R., Pakarinen, S., Rinne, T., & Takegata, R. (2004). The mismatch negativity (MMN): Towards the optimal paradigm. Clinical Neurophysiology, 115(1), 140–144. http://dx.doi.org/10.1016/j.clinph.2003.04.001.
Nenonen, S., Shestakova, A., Huotilainen, M., & Näätänen, R. (2003). Linguistic relevance of duration within the native language determines the accuracy of speech-sound duration processing. Cognitive Brain Research, 16(3), 492–495. http://dx.doi.org/10.1016/s0926-6410(03)00055-7.
Ng, M. L., Gilbert, H. R., & Lerman, J. W. (2000). Fundamental frequency, intensity, and vowel duration characteristics related to perception of Cantonese alaryngeal speech. Folia Phoniatrica et Logopaedica, 53(1), 36–47. http://dx.doi.org/10.1159/000052652.
Nguyen, T. T. A., & Ingram, J. (2005). Vietnamese acquisition of English word stress. TESOL Quarterly, 39(2), 309–319. http://dx.doi.org/10.2307/3588314.
Paavilainen, P., Simola, J., Jaramillo, M., Näätänen, R., & Winkler, I. (2001). Preattentive extraction of abstract feature conjunctions from auditory stimulation as reflected by the mismatch negativity (MMN). Psychophysiology, 38(2), 359–365. http://dx.doi.org/10.1111/1469-8986.3820359.
Pakarinen, S., Huotilainen, M., & Näätänen, R. (2010). The mismatch negativity (MMN) with no standard stimulus. Clinical Neurophysiology, 121(7), 1043–1050. http://dx.doi.org/10.1016/j.clinph.2010.02.009.
Pennington, M. C., & Ellis, N. C. (2000). Cantonese speakers' memory for English sentences with prosodic cues. The Modern Language Journal, 84(3), 372–389. http://dx.doi.org/10.1111/0026-7902.00075.
Peperkamp, S., & Dupoux, E. (2002). A typological study of stress 'deafness'. In C. Gussenhoven & N. Warner (Eds.), Laboratory phonology (Vol. 7, pp. 203–240). Berlin: Mouton de Gruyter.
Peperkamp, S., Vendelin, I., & Dupoux, E. (2010). Perception of predictable stress: A cross-linguistic investigation. Journal of Phonetics, 38(3), 422–430.
Quam, C., & Swingley, D. (2014). Processing of lexical stress cues by young children. Journal of Experimental Child Psychology, 123, 73–89.
Selkirk, E. O. (1980). The role of prosodic categories in English word stress. Linguistic Inquiry, 11(3), 563–605. http://dx.doi.org/10.2307/4178179.
Shafer, V. L., Yan, H. Y., & Datta, H. (2010). Maturation of speech discrimination in 4- to 7-yr-old children as indexed by event-related potential mismatch responses. Ear and Hearing, 31(6), 735–745. http://dx.doi.org/10.1097/aud.0b013e3181e5d1a7.
Shestakova, A., Huotilainen, M., & Cheour, M. (2003). Event-related potentials associated with second language learning in children. Clinical Neurophysiology, 114(8), 1507–1512.
So, L. K., & Dodd, B. J. (1995). The acquisition of phonology by Cantonese-speaking children. Journal of Child Language, 22, 473–496. http://dx.doi.org/10.1111/j.1365-2788.1994.tb00439.x.
Swingley, D., & Aslin, R. N. (2000). Spoken word recognition and lexical representation in very young children. Cognition, 76(2), 147–166.
Tong, X., McBride, C., & Burnham, D. (in press). Cues for lexical tone perception in children: Acoustic correlates and phonetic context effects. Journal of Speech, Language, and Hearing Research. http://dx.doi.org/10.1044/2014_jslhr-s-130145.
Tong, X., Tong, X., & McBride-Chang, C. (2013). A tale of two writing systems: Double dissociation and metalinguistic transfer between Chinese and English word reading among Hong Kong children. Journal of Learning Disabilities. http://dx.doi.org/10.1177/0022219413492854.
Vance, T. J. (1976). An experimental investigation of tone and intonation in Cantonese. Phonetica, 33(5), 368–392. http://dx.doi.org/10.1159/000259793.
Wang, Q. (2008). Perception of English stress by Mandarin Chinese learners of English: An acoustic study. Unpublished Doctoral Dissertation. University of Victoria.
Weber, C., Hahne, A., Friedrich, M., & Friederici, A. D. (2004). Discrimination of word stress in early infant perception: Electrophysiological evidence. Cognitive Brain Research, 18(2), 149–161. http://dx.doi.org/10.1016/j.cogbrainres.2003.10.001.
Wu, W. L., & Xu, Y. (2010). Prosodic focus in Hong Kong Cantonese without post-focus compression. Speech Prosody, 2010. http://dx.doi.org/10.1515/tlr-20120006.
Zhang, Y., Nissen, S. L., & Francis, A. L. (2008). Acoustic characteristics of English lexical stress produced by native Mandarin speakers. The Journal of the Acoustical Society of America, 123(6), 4498–4513. http://dx.doi.org/10.1121/1.2902165.