Prosodic transfer in Vietnamese acquisition of English contrastive stress patterns

ARTICLE IN PRESS

Journal of Phonetics 36 (2008) 158–190 www.elsevier.com/locate/phonetics

Prosodic transfer in Vietnamese acquisition of English contrastive stress patterns
T. Anh-Thư Nguyễn, C.L. John Ingram, J. Rob Pensalfini
School of English, Media Studies, and Art History, University of Queensland, St Lucia, Qld. 4072, Australia
Received 15 June 2004; received in revised form 7 August 2007; accepted 7 September 2007

Abstract
This paper reports a study of prosodic transfer effects in the production and perception of three English stress patterns (broad-focus noun phrase, narrow-focus noun phrase and compound), at the level of word and phrase prosody, by Vietnamese learners of English. The experiments examined the acoustic features and perceptual strategies that native Australian English speakers and two groups of non-native speakers (Vietnamese beginning learners and advanced speakers of English) use to distinguish the three stress patterns. The results showed that, in realizing the three English stress patterns, native and non-native speakers rely on different acoustic patterns, each optimally suited to the phonology of the speaker's first language. Native speakers of English employed a combination of syntagmatic f0 (and correlated intensity) contrasts and duration to distinguish the three stress patterns. Vietnamese speakers had no problem manipulating contrastive levels of f0 and intensity on accent-bearing syllables, but they failed to realize the timing contrast between compound words and phrases and the syntagmatic contrast of accent in larger units such as polysyllabic words or phrases, as evidenced by their failure to deaccent the second element of the compound and narrow-focus patterns. Nevertheless, the advanced speakers' ability to compress the constituents of compounds and to deaccent the final nouns shows the effect of language learning and experience on prosodic acquisition. Possible mechanisms underlying the transfer effects observed for the three stress patterns are also discussed. Crown Copyright © 2007 Published by Elsevier Ltd. All rights reserved.

1. Introduction
A wealth of studies, based on loanword formation (Blair & Ingram, 2003; LaCharité & Paradis, 2005; Silverman, 1992, among others) and on phonetic and phonological accommodation in second language (L2) learning (Archibald, 1998; Best, 1995; Flege, 1995; Iverson et al., 2003; Kuhl, 1993, among others), have shown that native speakers perceive and produce the words and utterances of an L2 through a phonetic or phonological ‘filter’ of their native language (L1). Studies of segmental transfer effects dominate the literature, and experimental studies of prosodic accommodation to a second language remain scarce, although they have begun to appear in recent years (McGory, 1997; Nguyễn & Ingram, 2005; Ueyama, 2000; Ueyama & Jun, 1998).

0095-4470/$ - see front matter Crown Copyright © 2007 Published by Elsevier Ltd. All rights reserved.
doi:10.1016/j.wocn.2007.09.001

Current models of transfer effects (Best, 1995; Flege, 1995; Kuhl, 1993) have been formulated almost entirely on the basis of studies of segmental contrasts. All of these models acknowledge the importance of prior phonological learning, in the form of L1-imposed categorical boundaries on otherwise gradient phonetic dimensions. But because they limit themselves to segmental transfer effects (where sound categories at a single level of phonological contrast map onto those of another language), such models avoid the more awkward and interesting question of how phonetic similarities interact with structural differences of the kind that are inevitably encountered in even the simplest cases of prosodic contact. This paper reports a study of prosodic transfer effects in the production and perception of three contrastive English stress patterns by Vietnamese learners of English. These contrasts are framed as sets of ‘minimal triplets’, whose linguistic function is disambiguated by a preceding context sentence, as in:

      Context                                    Target               Type
(a)   This is a bottle which is colored blue.    It's a blue bottle.  Broad focus NP
(b)   This bottle isn't colored yellow.          It's a blue bottle.  Narrow focus NP
(c)   This kind of jelly-fish is common here.    It's a blue bottle.  Compound word

Our original motivation was simply to fill a gap in the literature by undertaking a study of prosodic transfer effects across two languages with sharply different prosodic systems, using a set of stimuli that would involve minimal confounding from segmental transfer effects, and two groups of second language learners (beginners and advanced) that might provide some purchase on the learnability of the phonetic features required to master the target phonological contrasts in perception and production. From a strictly phonetic perspective, the three English ‘stress’ patterns may be viewed as contrasting patterns of pitch or accentual prominence (right-edge prominence for the broad-focus noun phrase [B]; left-edge prominence for the narrow-focus noun phrase [N] and the compound [C]), plus a temporal factor that distinguishes the compound from the phrases. We demonstrate this in the first part of the data analysis by showing that two orthogonal linear discriminant functions, based on fundamental frequency (f0) difference measures taken on the vowel nuclei and on normalized word or phrase durations, successfully classify native-speaker tokens of spoken English into the three stress groupings. Thus, a linear phonetic feature detector, trained to recognize category boundaries on the relevant pitch and timing dimensions, may be all that is required to discriminate or identify the three target stress types. However, native speakers of Australian English performed significantly worse in the perceptual experiment than a simple linear discriminator supplied with the critical acoustic measurements of nuclear f0 peak differences and syllable duration. This finding gave pause for reflection: a two-parameter feature detector may be seriously flawed as a perceptual model of discrimination between the three stress patterns.
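The two-parameter classification idea can be sketched in Python. This is a minimal illustration under stated assumptions, not the authors' analysis: it fits a standard linear discriminant classifier (one pooled within-class covariance, equal priors) to two features, an f0 difference (assumed here to be the f0 of the first vowel nucleus minus that of the second, in Hz) and a normalized duration. The names `TOY_TOKENS`, `fit_lda` and `classify` are hypothetical, and every training value is an invented toy number chosen only to mimic the qualitative pattern described above (left-edge prominence for N and C, temporal compression for C); none of it is data from the study.

```python
from statistics import mean

# Hypothetical toy tokens: (f0_diff_hz, normalized_duration) per stress type.
# f0_diff_hz is assumed to be f0(first vowel nucleus) - f0(second vowel nucleus).
# These are invented illustration values, not measurements from the study.
TOY_TOKENS = {
    "B": [(-10, 1.00), (-5, 1.05), (0, 0.95), (-15, 1.02)],  # right-edge prominence
    "N": [(60, 1.00), (70, 1.04), (55, 0.97), (65, 1.01)],   # left-edge, phrase timing
    "C": [(40, 0.80), (50, 0.82), (45, 0.78), (35, 0.84)],   # left-edge, compressed
}


def fit_lda(tokens):
    """Fit a two-feature linear discriminant classifier with pooled covariance."""
    means = {k: (mean(x for x, _ in v), mean(y for _, y in v)) for k, v in tokens.items()}
    sxx = sxy = syy = 0.0
    n_total = 0
    for label, pts in tokens.items():
        mx, my = means[label]
        for x, y in pts:
            dx, dy = x - mx, y - my
            sxx += dx * dx
            sxy += dx * dy
            syy += dy * dy
        n_total += len(pts)
    dof = n_total - len(tokens)  # within-class degrees of freedom
    sxx, sxy, syy = sxx / dof, sxy / dof, syy / dof
    det = sxx * syy - sxy * sxy
    inv = ((syy / det, -sxy / det), (-sxy / det, sxx / det))  # inverse of the 2x2 covariance
    model = {}
    for label, (mx, my) in means.items():
        # Linear score w . x + b, with w = S^-1 mu and b = -0.5 * mu . S^-1 mu
        wx = inv[0][0] * mx + inv[0][1] * my
        wy = inv[1][0] * mx + inv[1][1] * my
        model[label] = (wx, wy, -0.5 * (wx * mx + wy * my))
    return model


def classify(model, f0_diff_hz, norm_dur):
    """Return the stress type whose linear discriminant score is highest."""
    return max(model, key=lambda k: model[k][0] * f0_diff_hz + model[k][1] * norm_dur + model[k][2])


model = fit_lda(TOY_TOKENS)
print(classify(model, -8, 1.00), classify(model, 62, 1.00), classify(model, 42, 0.80))
```

With these toy values the three test tokens are assigned to B, N and C respectively. Because a shared covariance and equal priors are assumed, the rule amounts to choosing the class centroid nearest in Mahalanobis distance; it illustrates what a "simple linear discriminator" does, not how listeners perceive the patterns.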
Further calling into question the appropriateness of the simple parametric model is the phonological consideration that the broad-focus, narrow-focus and compound contrasts span two distinct domains of contrastive feature assignment: the lexicon in the case of compounds, and the post-lexical domain of phrasal accent assignment in the case of the broad- and narrow-focus NPs. Hence the task of perceptually judging or producing acceptable tokens of the three stress patterns is likely to involve simultaneous access to two autonomous aspects of prosodic competence: phrase-level control of intonational focus marking on the one hand, and control over the lexical prosody that distinguishes compounds as words from their otherwise homophonous phrasal counterparts on the other. It will be argued that this simultaneous dual-level access to prosodic forms complicates the listener's task in accurately perceiving the three stress patterns. From the perspective of the second language learner, any phonological transfer effects from Vietnamese to English prosody will likely involve the lexical prosody of Vietnamese compounds and intonation-based accent assignment. Elaborating a model of the interaction of lexical stress with phrasal accent assignment is a central theme in the attempted integration of autosegmental theory (from phonology) with phonetic investigations of suprasegmental features in speech production (Beckman & Ayers, 1994; Beckman & Pierrehumbert, 1986; Fletcher & Harrington, 2001; among others). Section 2 of this paper reviews the phonetics and phonology of Vietnamese compounds and word prosody, with a view to predicting the prosodic interference effects that might be expected from the perspective of an autosegmental model of lexical stress and accent assignment. Notwithstanding its shortcomings, the two-factor phonetic model, when supplemented by a distinction between ‘active’ and ‘non-active’ prosody control parameters derived from the L1 (Ueyama, 2000), proves worthy of closer scrutiny for its ability to predict transfer effects in the discrimination and production of the three stress patterns by Vietnamese learners of English. The argument strategy of this paper is to push the predictions of an admittedly oversimple phonetic model of prosodic interference effects, to reveal how well it succeeds and where it fails to predict the perceptual responses and production characteristics of learners, as well as the perceptual responses of native listeners. Our analysis of the experimental findings emphasizes the linguistic information processing demands imposed by the perception and production tasks, and the strategies that native and non-native speaker–listeners probably adopted to meet these demands. The paper is organized as follows. Section 2 begins with a review of recent phonetic studies of prosodic transfer effects, focussing on contact situations between languages that differ in terms of the familiar typology of ‘tone’, ‘stress’, and ‘pitch accent’ languages. Relevant background information on Vietnamese word- and phrase-level prosody is then discussed: specifically, (a) the lexical tonal system of Vietnamese and tonal transfer effects in the production and perception of English word stress, and (b) a comparative phonetic analysis of the compound–phrasal stress contrast in English and Vietnamese. Section 3 describes the experiments that were conducted. Section 3.1 describes the acoustic properties of the stimulus materials that were subsequently used in the perceptual experiment and as training material for the production experiment.
We first show that the broad-focus (B), narrow-focus (N) and compound (C) categories may be discriminated using just two critical acoustic parameters (f0 and normalized duration) out of the range of acoustic correlates of stress that were tested. Next (Section 3.2), we investigate how well the critical f0 and timing parameters are preserved in the non-native speakers' productions of the target English stress patterns, elicited as appropriate continuations of the context sentence cue. Discriminant analysis and other statistical tests indicate that Vietnamese subjects rely primarily on pitch cues, but that some acquired sensitivity to the durational contrast between compound and phrasal stress is evident in the responses of the advanced learner group. Section 3.3 reports a perceptual discrimination experiment in which native Australian English listeners and Vietnamese learners were asked to identify the stress pattern appropriate to a given context (presented as auditory stimuli in the carrier phrase It's a _________.). Section 4 addresses apparent discrepancies between the results of the production and perception tests in terms of the different task demands imposed on native Australian English listeners and Vietnamese learners. We argue that beginning learners' responses are dominated by a tonal transfer strategy in which Vietnamese lexical tones are substituted for English accentual and boundary tones on the basis of similarity in tone ‘shape’, constrained by Vietnamese phonotactics on the distribution of lexical tones. In contrast, the advanced learner group shows some accommodation to the temporal cues that distinguish (compound) word from phrase phonology in English, and shows evidence of deaccenting the rightmost components of the narrow-focus phrases under contrastive focus at the level of phrasal prosody.
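The tonal substitution strategy hypothesized for beginning learners can be made concrete with a small Python sketch. It is an illustrative toy, not the authors' model: an English syllable's f0 contour, reduced to three points on a 1–5 scale, is matched to the nearest Vietnamese tone template, after the candidate set has been restricted by a Vietnamese phonotactic constraint (syllables closed by a stop carry only the sắc or nặng tone). The template values are rough, hypothetical approximations of Northern Vietnamese tone shapes, not measurements, and the names `TONE_TEMPLATES` and `substitute_tone` are invented for illustration.

```python
# Hypothetical three-point (start, middle, end) pitch templates on a 1-5 scale.
# Rough illustrative approximations of Northern Vietnamese tone shapes only.
TONE_TEMPLATES = {
    "ngang": (3, 3, 3),  # level
    "huyền": (3, 2, 1),  # falling
    "sắc": (3, 4, 5),    # rising
    "hỏi": (3, 1, 3),    # curve (dipping)
    "ngã": (3, 2, 5),    # broken (rising after a break)
    "nặng": (2, 1, 1),   # drop
}

# Vietnamese phonotactics: stop-final syllables take only sắc or nặng.
STOP_FINAL_TONES = {"sắc", "nặng"}


def substitute_tone(contour, stop_final):
    """Return the permitted tone whose template is closest in shape to the contour."""
    candidates = STOP_FINAL_TONES if stop_final else TONE_TEMPLATES.keys()

    def distance(tone):
        # Squared Euclidean distance between the contour and the tone template.
        return sum((a - b) ** 2 for a, b in zip(contour, TONE_TEMPLATES[tone]))

    return min(candidates, key=distance)


print(substitute_tone((2, 4, 5), stop_final=True))   # rising contour, obstruent coda
print(substitute_tone((3, 2, 1), stop_final=True))   # falling contour, obstruent coda
print(substitute_tone((3, 3, 3), stop_final=False))  # level contour, sonorant coda
```

In the second call the falling contour is mapped to nặng rather than to the closer-shaped huyền, because the phonotactic filter excludes huyền from stop-final syllables; this is the sense in which shape-based substitution is "constrained by Vietnamese phonotactics".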
The case for these distinct learner response strategies is strengthened by a qualitative ToBI-style analysis of pitch accent types and their distribution in the native English and learner tokens. Some tentative conclusions are offered in Section 5.
2. Prosodic transfer effects
A few studies that closely examine the phonetic properties of L2 prosody production by learners from various language backgrounds have shown how L1 phonology constrains the production and perception of L2 prosodic patterns. Willems (1982) investigated intonational deviations in the English produced by native speakers of Dutch. He found that their English productions deviated from the native British English norm mainly in the size and direction of pitch movements. These deviations could be clearly attributed to the transfer of Dutch intonational characteristics, as shown by an instrumental comparison of English utterances produced by monolingual English speakers and by Dutch learners of English with comparable Dutch utterances produced by functionally monolingual Dutch speakers.

A similar transfer effect from L1 intonation was found for Seoul Korean and Mandarin Chinese speakers of English. McGory (1997) investigated the production of American English word pairs differing in the location of stress (e.g., memorizes vs. memorial) in statements and questions, and in several focus conditions, by Seoul Korean and Mandarin Chinese speakers. Both groups of non-native speakers appeared to have difficulty producing native English prominence relations: whereas native English speakers produced pitch accents on prominent target words only, non-native speakers produced stressed syllables with higher f0 values in both prominent and less prominent words. In addition, the non-native speakers did not distinguish between statements and questions in their f0 patterns. The differences in intonation patterns between non-native and native speakers of English could largely be attributed to L1 influence, as shown by the different error patterns produced by the two L1 groups (Mandarin and Korean). In a similar study, Ueyama and Jun (1998) examined the realization of English post-focus deaccentuation by Tokyo Japanese and Seoul Korean speakers at different proficiency levels. They found that post-focus deaccentuation was easier for Japanese than for Korean speakers at the same level of proficiency, which they attributed to an L1 transfer effect. It appears that Japanese downstep (i.e., the pitch accent of an accented word following another accented word has a lower f0 peak than the preceding pitch accent) was positively transferred to L2 production, whereas the lack of downstep in Korean (in which phrases are marked by a phrase-final H tone) was negatively transferred. They also found that the degree of deaccentuation correlated with proficiency: the more fluent the speaker, the greater the degree of dephrasing.
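The definition of downstep given above can be checked mechanically on a sequence of accent peaks. The Python helper below is a hypothetical sketch, not a procedure from Ueyama and Jun (1998): it takes the f0 peak values (Hz) of successive pitch accents and reports, for each accent after the first, whether its peak is lower than the preceding one. A real analysis would also need a criterion for how large a drop counts as downstep; the `min_drop_hz` threshold is an assumption.

```python
def downstep_flags(peaks_hz, min_drop_hz=0.0):
    """For each accent after the first, return True if its f0 peak is lower than
    the preceding accent's peak by more than min_drop_hz (a hypothetical threshold)."""
    return [prev - cur > min_drop_hz for prev, cur in zip(peaks_hz, peaks_hz[1:])]


print(downstep_flags([220.0, 190.0, 170.0]))            # successive lowering
print(downstep_flags([220.0, 225.0], min_drop_hz=5.0))  # second peak is higher
```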
The above studies are restricted to deviations of f0 patterns (tonal shapes) in L2 intonation production. Studies that jointly examine transfer effects in the acoustic correlates of adaptation to L2 temporal and accentual structure are even rarer, but are now beginning to appear (see Mennen, 2004). In a study of accent peak alignment produced by Dutch non-native speakers of Greek, Mennen (2004) found bidirectional interference in the realization of an accent contrast common to both languages. The majority of the L2 learners in her study (four out of five) failed to produce native-like f0 peak alignment values in the L2. In statements with long vowels in the accented syllable of the test word, they produced the peak as early as the native Dutch control group did (i.e., within the accented vowel), and considerably earlier than the native Greek control group, who realized the peak in the following unaccented vowel. Ueyama (2000) and Nguyễn and Ingram (2005) are two recent studies that investigate the acoustic correlates of L2 word-level prosody. In a study of Japanese learners' production of English words, Ueyama (2000) found that the active role of f0 in the L1 Japanese pitch accent system transfers positively to English, as indicated by consistently higher f0 in accented than in unaccented syllables. By contrast, even though Japanese has a phonemic vowel length contrast, beginning Japanese speakers of English were less successful than advanced speakers in realizing duration contrasts between accented and unaccented syllables, because L1 Japanese word accent production is restricted to f0 contrasts and duration is not actively manipulated.
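Peak alignment of the kind Mennen (2004) measured can be sketched as locating the f0 maximum and expressing its time relative to the accented vowel's boundaries. This Python helper is a simplified, hypothetical illustration, not Mennen's measurement protocol: it assumes an already extracted f0 track given as (time in seconds, f0 in Hz) pairs with unvoiced frames omitted, and hand-labelled vowel onset and offset times. The function name `peak_alignment` and the example track values are invented.

```python
def peak_alignment(f0_track, vowel_onset, vowel_offset, search_end):
    """Locate the f0 peak between vowel_onset and search_end.

    f0_track: list of (time_s, f0_hz) pairs for voiced frames.
    Returns (peak_time_s, offset_from_vowel_end_s, within_vowel).
    A negative offset means the peak falls inside the accented vowel.
    """
    window = [(t, f) for t, f in f0_track if vowel_onset <= t <= search_end]
    if not window:
        raise ValueError("no voiced frames in the search window")
    peak_time, _ = max(window, key=lambda frame: frame[1])
    offset = peak_time - vowel_offset
    return peak_time, offset, peak_time <= vowel_offset


# Invented track: the peak lies inside an accented vowel spanning 0.10-0.30 s.
track = [(0.10, 180.0), (0.15, 200.0), (0.20, 215.0), (0.25, 208.0), (0.35, 190.0)]
print(peak_alignment(track, vowel_onset=0.10, vowel_offset=0.30, search_end=0.45))
```

Extending `search_end` into the following syllable is what allows a late, Greek-like peak (one realized in the following unaccented vowel) to be detected at all.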
In a companion study to the present paper, using the same subjects (Vietnamese speakers of English at two levels of proficiency) but a different L2 prosodic contrast, namely English word stress in segmentally homophonous noun/verb pairs (e.g., permit[n] vs. permit[v]), Nguyễn and Ingram (2005) found that Vietnamese learners can differentiate between stressed and unstressed syllables in English by means of f0 contrast, an acoustic correlate available in both languages. However, in the early stage of second language learning they fail to produce the syllable duration contrast that characterizes native productions, and they fail to reduce vowels in unstressed syllables, possibly because these two important phonetic features are not active in Vietnamese tonal contrasts. In brief, both studies that examined acoustic correlates of L2 prosody suggest that L2 learners have less difficulty realizing an acoustic correlate that is actively used for prosodic contrasts in both the native and the target language (e.g., f0 for pitch accent in Japanese and for lexical tone in Vietnamese vs. f0 for word stress and accent in English) than realizing correlates that are not active in the L1 (e.g., stress-induced duration contrasts and vowel reduction for speakers of Japanese and Vietnamese). Nevertheless, these two studies are restricted to the acoustic correlates of L2 word-level prosody. The present study takes a further step by investigating adaptation to the contrasting temporal and accentual structure of (compound) word vs. phrase prosody, using both quantitative acoustic measurement and qualitative tonal transcription (ToBI: Beckman & Ayers, 1994).
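The prediction summarized above can be restated as a simple set comparison. This Python snippet is a toy encoding, not a model from the cited studies: the correlate labels and the dictionary `ACTIVE_IN_L1` are hypothetical shorthand that merely restate which correlates the paragraph describes as active in each language's word-level prosody.

```python
# Correlates described above as actively used for word-level contrasts.
# Illustrative restatement only; the labels are hypothetical shorthand.
ACTIVE_IN_L1 = {
    "Vietnamese": {"f0"},  # lexical tone
    "Japanese": {"f0"},    # pitch accent
}
ENGLISH_STRESS = {"f0", "duration", "vowel_reduction"}


def predict_transfer(l1, target=ENGLISH_STRESS):
    """Split target correlates into predicted-easier (active in the L1) and
    predicted-harder (not active in the L1) sets."""
    active = ACTIVE_IN_L1[l1]
    return {"easier": target & active, "harder": target - active}


print(predict_transfer("Vietnamese"))
```

The encoding deliberately ignores, for example, Japanese phonemic vowel length: the point made in the text is that duration is not manipulated for accent, so it is not listed as an active accentual correlate.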

2.1. Comparative analysis of English word stress vs. Vietnamese tones
Vietnamese, a tone language, and English, a stress-accent language, have quite different systems of word prosody. English has a system of culminative word stress with predominantly short stressed word roots and reduced suffixes, so the majority of words have stressed first syllables (Dauer, 1983; Garde, 1965). Also, stressed syllables have more complex structures, whereas unstressed syllables often have reduced vowels. Vietnamese, on the other hand, has a system of lexically distinctive tones (Nguyễn, 1970, 1980) and is strongly syllabic in its phonological organization and morphology. Most syllables are independent morphemes, and every syllable in an utterance bears an independent lexical tone specification that is not neutralized (i.e., does not become toneless) in context. In addition, no system of culminative word stress has been found; nevertheless, it is widely accepted that there is stress in the sense of accentual prominence at the phrasal level (Nguyễn, 1970; Thompson, 1987). English and Vietnamese also differ in how they manipulate the acoustic correlates of word-level prosody. Studies of the acoustic correlates of English stress show that judgements of linguistically significant stress in English are contingent upon at least four acoustic parameters: fundamental frequency, duration, amplitude, and vowel quality (Beckman, 1986; Fry, 1955; among others). In Vietnamese, on the other hand, in addition to the direction of f0 movement (tone contour) and f0 height, the two primary dimensions of linguistic tone, voice quality, intensity and duration have also been found to distinguish tones (Nguyễn & Edmondson, 1997; Phạm, 2003; Vũ, 1981, 1982). Voice quality, particularly the laryngeal features of creakiness and breathiness, has been found to accompany particular tones across dialects.
Creakiness, in addition to occurring as a regular feature of the Broken (ngã) and Drop (nặng) tones of the Northern dialect and the Curve (hỏi) tone of the Central dialect, also occurs on some local variants of the Southern Drop tone (Vũ, 1981). Creakiness and breathiness have been found to accompany the Falling (huyền), Drop, Curve and Broken tones of the Hanoi dialect, and are claimed to be a distinctive register feature distinguishing low-register tones from high-register tones (Phạm, 2003). Intensity was found to correlate highly with f0 (Vũ, 1981) and can thus be regarded as supplementary to f0. Duration, and tonal length in particular, has been found not to be a distinctive feature in Vietnamese (Phạm, 2003; Vũ, 1981); it varies only with segmental context (e.g., tones in stop-final syllables are inherently shorter than tones in other environments). From a study of native speakers' perception of Vietnamese tones, Vũ (1981) concluded that the direction of f0 movement, f0 height and voice quality play a more important role than other tonal dimensions, such as duration and intensity, in the identification of tones. Intensity and duration contribute supportively to perception but play no independent role in tone recognition. These studies show that even though both languages employ f0 as a perceptual cue (to tones in Vietnamese, and to word stress and accent in English), the two languages differ in how they manipulate the acoustic cues. Evidence of the transfer of tonal pitch features into Vietnamese learners' English production and perception has been reported (Hồ, 1997; Nguyễn, 1970, 1980, 2003; Pittam & Ingram, 1991; Riney, 1988). For example, at the production level, Nguyễn (1970) noted that Vietnamese speakers of English tend to substitute the high rising (sắc) tone for primary stress, resulting in exaggerated pitch changes on stressed syllables.
Pittam and Ingram (1991) and Riney (1988) observed tonal effects on English syllables closed by an obstruent, which were produced with the checked quality of the Rising (sắc) tone and were easily identified by its abrupt high rise (Nguyễn & Ingram, 2004). In a recent study of Vietnamese perception of English polysyllabic words, Nguyễn (2003) found that an English syllable could be perceived as a particular Vietnamese tone depending on its syllable structure (closed by an obstruent or ending in a sonorant) and its stress level (stressed or unstressed), with stressed syllables associated with high level tones and unstressed syllables with low level tones. This suggests a perceptual tonal transfer that is constrained by relative pitch level and the segmental composition of the syllable. The results of these studies suggest that Vietnamese learners refer to tonal pitch in perceiving English stress; in other words, they seem to interpret the intonation patterns of English words and phrases in terms of their native tone categories. However, the fact that word stress in English is ‘culminative’, in the sense that every content word or larger domain has exactly one primary-stressed syllable and all remaining syllables are subordinate to it (Trubetskoy, 1939/1969), potentially has far-reaching implications for the gestural timing and rhythmic differences between the two languages, and for how such differences may be modelled phonologically. In English, stress contrasts are enhanced segmentally: stressed syllables are longer than unstressed syllables (i.e., duration is a distinctive and active correlate in word stress production), and unstressed vowels tend to be reduced. In Vietnamese, by contrast, which is generally considered a syllable-timed language (Nguyễn, 1970, 1980) and in which each syllable has a lexical tone specification, no systematic difference in duration or vowel quality among syllables has been found. In production, one of the five or six lexical tones, the nặng (dropping) tone, is much shorter than all the others (Brunelle, 2003; Phạm, 2003). However, tonal length was found to have no distinctive status in Vietnamese (Phạm, 2003; Vũ, 1981), suggesting that duration is not an active cue in tonal contrasts. From this comparative analysis, we predict that Vietnamese learners of English will be able to produce f0 contrasts, since f0 is an acoustic correlate available in both languages, but will have problems using duration to enhance prominence contrasts, because they make limited use of it in their L1, and will fail to reduce unstressed syllables.
2.2. Comparative analysis of compound–phrasal accent contrasts in English and Vietnamese
The three stress patterns of interest in this study, illustrated in examples (a)–(c) below, exemplify three types of prominence:
(a) blackberry = bláckberry (compound, meaning: a kind of fruit)
(b) black berry = bláck bérry (broad-focus noun phrase, meaning: a berry that is black)
(c) black berry = bláck berry (narrow-focus noun phrase, with an emphatic contrastive accent on black, as opposed to green berry)
The compound bláckberry, as a single three-syllable word, has primary word stress on black and secondary stress on berry.
The broad-focus noun phrase bláck bérry has two accented constituents: a phrasal stress (or default accent assignment) on ber-, the first syllable of berry, preceded by a pre-nuclear accent on black (Farnetani & Cosi, 1988; Hardcastle, 1968). In the narrow-focus noun phrase bláck berry, the syllable black receives emphatic or contrastive stress. There has been controversy over the pragmatic functions, the phonological structures and the phonetic cues associated with these three ‘stress’ patterns. Firstly, there is the question of the nature of the distinction between broad and narrow focus, which is commonly described in terms of scope: whether the listener's attention is drawn to new information that has scope over the whole phrase (broad focus) or only to the element within the phrase that contains new information (narrow scope). Narrow scope is often considered to convey the specialized communicative function of countermanding some erroneous assumption that the speaker believes the listener to hold; in this usage, narrow scope is identified as ‘contrastive stress’. There is ongoing debate as to whether contrastive stress is a distinct type of prosodic effect or should simply be treated as a case of accentual prominence. Some maintain that contrastive accents are formally different from other accents, either because the accent type differs in contrastive cases or because contrastive accents are more prominent. Couper-Kuhlen (1984) and Chafe (1974) mention a sudden drop in f0 after a contrastive accent, whereas a non-contrastive accent is more likely to be sustained. Pierrehumbert and Hirschberg (1990) suggested that contrastive accents have an L+H* pattern, while novelty accents have an H* form.
Ladd and Morton (1997) have shown, for standard southern British English, that the ‘emphatic’ peak accent type has a higher and later peak than the ‘normal’ peak accent type, which is consistent with the L+H* and H* contours of Pierrehumbert and Hirschberg (1990). Bartels and Kingston (1994) argue that what distinguishes narrow focus is enhanced prominence on the focused element. In English, broad focus and late narrow focus have identical accent patterns: pitch accents are aligned with the last accentable constituent within an intonation contour. Under early narrow focus, the pitch accent is claimed to shift to an earlier location, and the last accentable syllable is deaccented (Beckman & Pierrehumbert, 1986; Jackendoff, 1972; Ladd, 1980). Nevertheless, in a study of the production and perception of narrow-focus patterns (e.g., RED ball vs. red BALL) by 42 American English-speaking children (aged 3–10) and 6 adults, Jannedy (1997) found that children as well as adults accent the noun regardless of whether the adjective or the noun is contrasted. However, there appears to be a strong tendency to use non-prominent or less prominent accent types on the noun when the adjective is narrowly focussed. The perception results showed that adult listeners can reliably interpret the pragmatic information of an early narrow focus regardless of whether the following noun is deaccented. These results suggest that deaccenting the noun involves not removing the accent completely but using less prominent accent types in place of more prominent ones, and that children have to learn to make this substitution. Secondly, although the phonological status of the contrast between compound and phrasal stress is not in question for prototypical cases such as blackberry vs. black berry, there is some doubt as to whether the prosodic pattern of the compound is reliably distinguishable from its phrasal counterpart, both in production and in perception by native listeners. Atkinson-King (1973) and Vogel and Raimy (2002) investigated the acquisition of compound vs. phrasal stress (hót dog vs. hot dóg) in English by children aged 5, 7, 9 and 11. The children were shown pairs of pictures representing a compound word and the corresponding phrase, heard a prerecorded tape with the names of the items, and were asked to indicate which one they had heard. The results of both studies (Vogel and Raimy (2002) replicated Atkinson-King's (1973) study) showed that even though the youngest children produced the right pattern, they did not parse the pattern in perception until as late as 11 or 12 years of age. Despite the vast body of work on the phonology and phonetics of English stress, the interface between word and phrasal prosody, as exemplified by the acoustic correlates and perception of these three stress patterns, has not been adequately investigated. Hardcastle (1968) examined the f0 and intensity changes between the accented syllables of each stress pattern in Australian English and found that these changes were clearly greatest for the syllables carrying the emphatic stress pattern (narrow focus).
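The accent-placement contrasts reviewed in this section can be summarized as a toy rule over a two-word target such as blue bottle. The Python sketch below is a hypothetical simplification, not the authors' annotation scheme: under broad focus both words are accented, with the last accentable word carrying the nuclear accent; under early narrow focus the accent falls on the focused first word and the following word is deaccented (which, following Jannedy's (1997) findings, may in practice mean a less prominent accent rather than none); a compound is treated as a single word with primary stress on its first element. The function name `accent_pattern` and the prominence labels are invented.

```python
def accent_pattern(words, pattern):
    """Return (word, prominence) pairs for a two-word target.

    pattern: "B" (broad-focus NP), "N" (early narrow focus on the first word)
    or "C" (compound). Prominence labels are illustrative only.
    """
    first, second = words
    if pattern == "B":
        return [(first, "prenuclear accent"), (second, "nuclear accent")]
    if pattern == "N":
        return [(first, "nuclear accent"), (second, "deaccented")]
    if pattern == "C":
        return [(first + second, "primary stress on first element")]
    raise ValueError(f"unknown pattern: {pattern!r}")


for p in ("B", "N", "C"):
    print(p, accent_pattern(("blue", "bottle"), p))
```

Note that the rule makes N and C identical in accent location; only the word/phrase split, which in production shows up as timing, separates them. This is the same confusability that Hardcastle's listeners showed.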
He also found that the f0 changes associated with narrow focus closely resembled those associated with the compound stress pattern. A sharp upward f0 movement at the beginning of the second element was found in the broad-focus noun phrases but did not occur in items with the emphatic or compound stress pattern. Surprisingly perhaps, Hardcastle's perception experiment showed that a significant majority of listeners had difficulty reliably distinguishing the three stress patterns. Narrow-focus phrasal stress was often confused with compound stress, which he attributed to the similar f0 and intensity changes associated with these two stress patterns. Other studies have focussed on the acoustic and perceptual correlates of only two patterns: compounds and broad-focus noun phrases. Bolinger and Gerstman (1957) found that in three-constituent pairs like lighthouse keeper vs. light housekeeper, the temporal interval between the constituents was an efficient cue for distinguishing the members of each pair. But there has been no evidence that the temporal interval is a distinctive cue in simpler pair constructions, like lighthouse and light house. Faure, Hirst, and Chafcouloff (1980) investigated the two-constituent minimal pair blackbird and black bird and found that the two constructions differed significantly in duration: the total duration of phrases was 20% longer than that of compounds. Their data suggested that, while both duration and fundamental frequency are important features in the production of compounds and phrases, pitch, but not temporal structure, is crucial for their perception.
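The duration difference reported by Faure, Hirst, and Chafcouloff (1980) is a relative measure, and the same arithmetic applies to normalized durations. The Python helpers below are a hypothetical sketch: normalizing by the duration of the whole carrier utterance is an assumed speaking-rate correction, not necessarily the normalization used in any of the cited studies, and the millisecond values are invented.

```python
def percent_longer(phrase_ms, compound_ms):
    """Relative duration difference of a phrase over a compound, in percent."""
    return (phrase_ms - compound_ms) * 100 / compound_ms


def normalized_duration(target_ms, utterance_ms):
    """Target duration as a proportion of the whole utterance (assumed rate correction)."""
    return target_ms / utterance_ms


# Invented values: a 600 ms phrase vs. a 500 ms compound is a 20% difference.
print(percent_longer(600, 500))
print(normalized_duration(500, 1250))
```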
Farnetani and Cosi (1988) found that while duration is the major differentiating parameter in production (compounds are shorter than phrases), the perceptual distinction lies primarily in the different prominence pattern: an accented constituent followed by an unaccented one in compounds, and two accented constituents (the second heard as stronger than the first) in non-compounds.

In Vietnamese, it seems that compounds and noun phrases are syntactically and semantically contrastive but not phonologically contrastive under normal circumstances of production. In terms of syntactic structure, Vietnamese compounds and phrases have the reverse word order of English (Vietnamese: Noun+Adjective: hoa [flower] hồng [pink] vs. English: Adjective+Noun: black berry). A subtype of Vietnamese compounds (e.g., specializing compounds, as opposed to phrases) is claimed to have the reverse of the English compound–phrase prominence pattern; that is, weak–strong for compounds and strong–weak for noun phrases (Thompson, 1987). However, no conclusive acoustic evidence has been found to support this compound–phrasal prominence pattern (Nguyễn & Ingram, 2007). In addition, it is generally claimed that Vietnamese has no tonal neutralization due to 'sandhi' (except in a subclass of reduplication), a phenomenon that occurs in other tone languages such as Chinese and Thai (Chen, 2000; Gandour, 1974); that is, there are no toneless syllables as in Shanghainese and the other Wu dialects, and no systematic change of tone when words occur in combination as in Mandarin Chinese (Chen, 2000). No systematic prosodic difference between compounds and phrases has been found in Vietnamese. Instead, every syllable in both a compound word and a phrase has a full vowel and a lexical tone specification. In a recent experiment, Nguyễn and Ingram (2007) investigated the acoustic and perceptual correlates that distinguish compounds (hoa hồng: a rose) from phrases (hoa hồng: a pink flower) in Vietnamese, as produced by 45 native speakers of three dialects (Hanoi, Hue, and Saigon) under two experimental conditions: a picture-naming task (representing spontaneous natural speech) and a minimal-pair sentence task (the 'maximally contrastive' elicitation condition). It was found that even under conditions of maximal contrast, there was no conclusive acoustic evidence (in terms of f0 [Hz], intensity [dB], spectral tilt and duration) to support the claim of contrastive stress patterns between compounds and noun phrases in Vietnamese. When forced to realize a prosodic contrast under the 'maximal contrast' condition, Vietnamese speakers produced a juncture between the two constituents of the noun phrases, but only under this condition; no juncture was present between the components of compounds. Unlike in English, a stress language with stress- or foot-based timing, compound words as a whole were not temporally compressed in comparison to their phrasal counterparts. Listeners relied only on the juncture between the two components of noun phrases as a cue to distinguish phrases from compounds, and failed to distinguish noun phrases from compounds in stimuli elicited under the picture-naming task, where no juncture was produced between the two constituents of a phrase. Regarding phonetic cues of contrastive or corrective accentual focus in Vietnamese, some authors, such as Hoàng and Hoàng (1975) or Gsell (1980), consider full tonal realization of accented syllables to be one of the positive marks of prominence (accent) at phrasal level in Vietnamese.
In a recent study on the effect of emphasis on glottalized and non-glottalized Vietnamese tones (the Hanoi creaky falling tone, i.e. the nặng tone, in obstruent vs. sonorant final-consonant environments, respectively), Michaud and Vũ (2004) found that in Vietnamese emphasis, syllable lengthening appears as a speaker-dependent variable, whereas a stable correlate of emphasis is curve amplification, manifested as an increased slope of the f0 curve or as f0 register raising.

3. Three experiments

3.1. Preliminary experiment

The aims of the first experiment were: (a) to construct a set of native speaker exemplar stimuli that could be used in perception and production experiments to test the mastery, by Vietnamese learners of English, of the three-way pattern of prosodic contrasts between compound words and broad- and narrow-focus noun phrases; and (b) to establish the effectiveness of the accentual pitch and timing cues discussed earlier for discriminating among the three Australian English stress patterns.

3.1.1. Linguistic materials

Three sets of minimally contrastive triplets of compound words (C), broad-focus noun phrases (B), and narrow-focus noun phrases (N) were constructed, using three syllabic templates: monosyllabic first element plus disyllabic second element (e.g. black berry); disyllabic first element plus monosyllabic second element (e.g. butter fish); and disyllabic first and second elements (e.g. English teacher). There were four tokens for each syllabic template, yielding 12 sets of triplets, or 36 items for each speaker. Each item consisted of a short context sentence followed by a fixed carrier sentence (This/It/He is a …), which ensured that the target contrasts appeared as the final elements of each sentence and occupied approximately the same position in the sentence intonation contour (see the examples below, and Appendices A and B for a complete list of stimuli).

(a) Broad-focus NP: This is a bottle which is colored blue. It's a blue bottle.
(b) Narrow-focus NP: This bottle isn't colored yellow. It's a blue bottle.
(c) Compound word: This kind of jelly fish is quite common here. It's a blue bottle.
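The stimulus design crosses three syllabic templates, four tokens per template and three stress patterns. The Python sketch below only illustrates that bookkeeping; the template labels, the token indices and the seeded shuffle standing in for the "randomized list" are illustrative assumptions, not the authors' procedure (the actual items are listed in Appendices A and B).

```python
import random
from itertools import product

# Illustrative labels (assumption): one label per syllabic template,
# e.g. black berry, butter fish, English teacher.
TEMPLATES = ("mono+di", "di+mono", "di+di")
TOKENS_PER_TEMPLATE = 4
PATTERNS = ("Broad", "Narrow", "Compound")


def build_items():
    """List every (template, token, pattern) item read by one speaker."""
    return list(product(TEMPLATES, range(1, TOKENS_PER_TEMPLATE + 1), PATTERNS))


def randomized_order(items, seed):
    """Return a shuffled copy; a fixed seed makes the reading order reproducible."""
    order = list(items)
    random.Random(seed).shuffle(order)
    return order


items = build_items()
n_triplets = len(TEMPLATES) * TOKENS_PER_TEMPLATE
print(n_triplets, len(items))  # prints: 12 36
reading_list = randomized_order(items, seed=2004)
```

Crossing the three factors makes the 12 triplets and 36 items per speaker follow directly from the design rather than being counted by hand.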

3.1.2. The speakers

Four native speakers of Australian English, experienced in producing good-quality exemplars for phonetic experiments, were recorded: two adult males (J.I. and R.P.) and two females (E.C. and F.H.). J.I. is a phonetician and R.P. a linguist and actor with professional voice training (both are co-authors of this paper). E.C. and F.H. are speech pathologists with extensive clinical and teaching experience. They were presented with a randomized list of the test triplets with their preceding context sentences and instructed to read each item (context + target sentence) in a natural speaking manner.

3.1.3. Measurements

The sentences were digitized (at a 20 kHz sampling rate and 16-bit precision), and spectrographic measurements were made with a sound editing and analysis program, the Emu Speech Tools (Cassidy, 1999). First, the Emu Labeller was used to mark the edges of the target syllables and vowels, relying primarily on its spectrographic display. Then the Emu Query Tool was used to extract syllable durations (ms) and f0 (Hz) and intensity (dB) values at the vowel midpoint. The segmentation criteria were generally based on the major discontinuities in the energy distribution over frequency and time visible on the spectrograms. Taking butter fish as an example, the syllable bu- was measured from the onset of the closure for [b] to the cessation of the vowel formants; the syllable -ter from the onset of closure for [t] to the onset of fricative noise for [f]; and the syllable fish from the onset of fricative noise for [f] to the offset of high-frequency fricative noise for [ʃ]. Since all the stops in the test items occur utterance-medially, the onset of closure for a syllable-initial stop was taken as the offset of the preceding word or segment. Studies of the effects of stress and accent on duration in English have shown that not only the rhymes but also the initial consonants are lengthened relative to their counterparts in unstressed syllables (Ingrisano & Weismer, 1979; Umeda, 1977; among others). Therefore, in this experiment, the duration of the whole syllable, including the onset and the rhyme, was measured.
f0 and intensity measurements were taken at the center of the vowels of the stressed syllables; the vowel midpoint was extracted automatically with an EMU-R query command on the basis of the labelled vowel onset and offset.

3.1.4. Analysis

The acoustic analysis concerns the fundamental frequency (f0), duration and intensity of the constituents of the test items. The following acoustic parameters were investigated:
1. mid-vowel f0 value of the first and second stressed syllables (e.g., English teacher: V1F0, V2F0);
2. mid-vowel intensity value of the first and second stressed syllables (V1 and V2 intensity);
3. f0 change (V1F0 – V2F0);
4. intensity change (V1 intensity – V2 intensity);
5. duration of the constituent syllables (e.g., English teacher: S1 and S2 for the accent-bearing syllables Eng- and tea-, U1 and U2 for the unstressed syllables -lish and -cher);
6. duration of the whole compound word or noun phrase (e.g., blackberry, English teacher).

In order to control for segmental composition effects on duration, intrinsic and context-dependent f0 effects, and individual speaker differences, all measurements were analyzed as pair-wise comparisons within items (i = 1, …, 6) and speakers (j = 1, …, 4), and a mixed model ANOVA was used. For example, the syllables Eng.–lish–tea.–cher of item i (English teacher) produced by speaker j were compared across its three versions, Compound(i, j), Broad phrase(i, j) and Narrow phrase(i, j), which differ in which syllables carry an accent.

The mixed model two-way ANOVA, with stress pattern (Compound, Broad, and Narrow) and speaker group (native, advanced, beginner) as fixed effects and speakers and items as random effects, was conducted on each acoustic parameter. The restricted maximum likelihood (REML) method was used to estimate variance components. A Tukey post hoc test (criterion p-value set at .05) was then conducted to determine the significant differences among levels of the main fixed factors and their interactions (i.e., the pair-wise comparisons among the three stress patterns within each speaker group).
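The measurements themselves were made with the Emu Speech Tools; the Python sketch below does not reproduce Emu or EMU-R. It assumes, hypothetically, that label boundaries are available in milliseconds and that f0 and intensity are tracks of (time, value) frames, and shows how the parameters listed above follow from them. Sampling the frame nearest to the vowel midpoint is an assumption, and the example numbers are invented for illustration, not data from the study.

```python
from dataclasses import dataclass


@dataclass
class Syllable:
    """Labelled boundaries of one syllable and its vowel, in milliseconds."""
    start_ms: float
    end_ms: float
    vowel_start_ms: float
    vowel_end_ms: float

    @property
    def duration_ms(self):
        # Whole-syllable duration: onset consonant plus rhyme.
        return self.end_ms - self.start_ms

    @property
    def vowel_mid_ms(self):
        return (self.vowel_start_ms + self.vowel_end_ms) / 2


def sample_at(track, t_ms):
    """Value of a (time_ms, value) track at the frame nearest to t_ms."""
    return min(track, key=lambda frame: abs(frame[0] - t_ms))[1]


def stressed_syllable_measures(s1, s2, f0_track, db_track):
    """Mid-vowel f0/intensity of S1 and S2, their changes, and S1/S2 durations."""
    v1_f0, v2_f0 = sample_at(f0_track, s1.vowel_mid_ms), sample_at(f0_track, s2.vowel_mid_ms)
    v1_db, v2_db = sample_at(db_track, s1.vowel_mid_ms), sample_at(db_track, s2.vowel_mid_ms)
    return {
        "V1F0": v1_f0, "V2F0": v2_f0, "f0_change": v1_f0 - v2_f0,
        "V1_dB": v1_db, "V2_dB": v2_db, "intensity_change": v1_db - v2_db,
        "S1_ms": s1.duration_ms, "S2_ms": s2.duration_ms,
    }


def phrase_duration(syllables):
    """Whole word/phrase duration: first syllable onset to last syllable offset."""
    return syllables[-1].end_ms - syllables[0].start_ms


# Invented example for "English teacher" (S1, U1, S2, U2).
syllables = [
    Syllable(0, 180, 40, 140),     # S1 "Eng"
    Syllable(180, 300, 200, 260),  # U1 "lish"
    Syllable(300, 460, 350, 450),  # S2 "tea"
    Syllable(460, 600, 490, 560),  # U2 "cher"
]
f0_track = [(t, 230.0 if t < 300 else 160.0) for t in range(0, 610, 10)]
db_track = [(t, 74.0 if t < 300 else 68.0) for t in range(0, 610, 10)]
example = stressed_syllable_measures(syllables[0], syllables[2], f0_track, db_track)
example["phrase_ms"] = phrase_duration(syllables)
print(example["f0_change"], example["intensity_change"], example["phrase_ms"])  # prints: 70.0 6.0 600
```

Positive f0 and intensity changes indicate a fall from the first to the second accent-bearing syllable, which is the direction the analysis associates with left-headed (narrow-focus and compound) patterns.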


Guided by the ANOVA results, two of the acoustic measures (f0 change and the duration of the whole word or noun phrase) were selected as a basis for classifying the stimulus tokens produced by the four native speakers. A scatter plot of the test items on these two parameters, supported by a discriminant function analysis, provided an acoustic basis for distinguishing the three target stress patterns. It is also worth noting that the preliminary analysis of all nine acoustic parameters showed no systematic significant difference among the three syllabic templates (e.g. black berry, butter fish and English teacher); therefore, syllabic template was excluded as a factor from further analysis. However, in some four-syllable compounds (e.g., English teacher), the second accent-bearing syllable, despite its less prominent f0, was not totally deaccented; these cases are discussed separately in the qualitative analysis of the f0 contours.

3.1.5. Results: classification of native speaker productions

A detailed report of the individual parameters of the native English productions is presented along with the Vietnamese learners' productions in Section 3.2. Here we present only the results of the discriminant function analysis, which established that the three English contrastive stress patterns can be successfully discriminated on the basis of speaker-normalized peak f0 measurements and rate-normalized duration differences between compound words and phrases. In order to examine whether the stress patterns produced by native speakers are classifiable on the basis of f0 and duration cues, the f0 change measure (V1F0 – V2F0, defined previously) and a normalized word duration measure were entered into a linear discriminant analysis (S-Plus 2000) to partition the stimulus items into three non-overlapping groups in acoustic space. The results are shown in a scatter plot (Fig. 1). The normalized duration measure used was a modified Z-score, derived as follows.
Measurements of the duration of the whole compound word or noun phrase were expressed as a difference score for each item from the mean of its minimal set (thus normalizing for intrinsic phone values and speaker differences): Z-score = (duration value − mean duration value) / standard deviation. The Z-score was then converted to a t-score (t-score = Z-score × 10 + 50) in order to yield positive values. The scatter plot of the native speakers' items (Fig. 1) shows that all but 12 of the experimental stimuli (132/144, i.e. 92% of tokens) were correctly classified into their Broad, Narrow and Compound groupings on the basis of the f0 change measure and the normalized 'word' duration scores.

Fig. 1. The scatter plots of four native speakers' stress patterns: B, Broad; N, Narrow; C, Compound. [Scatter plot, native speakers: f0 change (Hz; axis 0–200) against normalised word/phrase duration (t-score; axis 48.5–51.0).]

3.1.6. Conclusions: classification of native speaker productions

The foregoing result demonstrates that, on a set of tokens carefully produced by four phonetically aware native speakers, a simple linear discriminator equipped with the ability to measure pitch changes across adjacent stressed syllables and the relative timing of these intervals can discriminate the three stress patterns rather well: better, apparently, than native listeners, as the subsequent perception experiment revealed. We take this seemingly clear result both as a validation of the role of temporal as well as accentual patterning in English compound (word)–phrasal contrasts and, as it turns out, as an indication that a simple feature detector is inadequate for modelling the perceptual discrimination of these prosodic patterns by native listeners.

3.2. Production experiment: Vietnamese learners of English

Following a perceptual experiment (reported in Section 3.3), which familiarized the Vietnamese learners with the stimuli and the three target stress patterns, a production experiment was conducted to gather the same set of acoustic measures described previously. Its aims were to assess prosodic transfer effects, to test the hypothesis that 'active' prosodic parameters in L1 feature prominently in the early stages of L2 acquisition, and to determine whether 'inactive' prosodic parameters can become 'activated' through prolonged exposure to L2.

3.2.1. Subjects

Two groups of learners participated in this experiment: Vietnamese beginning learners of English and Vietnamese advanced speakers of English. The beginner group consisted of 10 subjects (five males and five females) who were first-year university students majoring in English in Hanoi. They were paid for their participation in the experiment. They had all started learning English at the age of 12 (in secondary school) with the Grammar Translation method, which focuses mainly on vocabulary and grammar. However, they were exposed to communicative English teaching during their first year at university, and they participated in this experiment as soon as they had finished that year. The advanced group consisted of ten postgraduate students at the University of Queensland (five males and five females), aged 25–32, who had been in Australia for between 8 months and 10 years.
All of them could be classified as competent and good users of English, since they had an average band score of at least 6.5 on the IELTS test (International English Language Testing System, a nine-band proficiency test of English covering four skills: listening, speaking, writing and reading). They were all teachers of English who held a BA in EFL teaching, had 2–3 years' teaching experience and were undertaking an MA in TESOL. All of these subjects had started learning English at the age of 12 with the Grammar Translation method, which they studied through secondary and high school, and were exposed to the communicative language teaching method during 4 years of undergraduate study. They speak Vietnamese and English and have very limited knowledge of French, which they learned as a second foreign language at university, where the curriculum has a strong emphasis on grammar.

3.2.2. Procedure

Subjects had gained prior familiarity with the triplets in a perceptual discrimination experiment conducted in the previous hour. For the production experiment, subjects read the target sentences, preceded by their context sentences, from cards presented in quasi-random order. Before the recording, subjects were allowed sufficient time for familiarization and practice. They then read the text aloud three times in their normal speaking manner; only the third repetition was recorded and used for analysis. Recordings were made in a quiet room using sound recording and editing software (Speech Station) at a 20 kHz sampling rate and 16-bit precision. For the beginner group, a 5-min review of the three stress patterns and an explanation of the meaning of the triplets were given in both English and Vietnamese to make sure they understood the target stress contrasts.

3.2.3. Signal processing and measurements

Signal processing and measurements of fundamental frequency, duration and intensity were identical to those reported previously (Sections 3.1.3–3.1.4). In addition, for qualitative comparison between the accentual patterns produced by the Vietnamese learners and those of the native Australian English exemplars, a ToBI-style analysis of the intonation contours of the target sentences was conducted.


3.2.4. Prosodic transcription of f0 contours

Prosodic transcription of the f0 contours on the accent-bearing syllables of the stress patterns followed the guidelines for English ToBI labelling (Beckman & Ayers, 1994). The English ToBI tonal patterns observed in the utterances were H*, L+H*, L*, L*+H, !H*, L+!H*, and H+!H*. The utterances were transcribed by two transcribers familiar with ToBI, one acting as the primary and the other as the secondary transcriber, who made an independent categorization of a subset of tokens. The inter-transcriber differences were then reviewed, discussed with the input of a third rater, and resolved.

3.2.5. Results: Vietnamese learners' productions and native speaker exemplars

In the sections that follow, the beginners' and advanced learners' acoustic production parameters are compared with those obtained from the four native speakers' exemplars of the three stress patterns, whose key f0 and duration characteristics were reported previously. Statistical test results refer to the mixed model two-way ANOVA and post hoc Tukey tests described in Section 3.1.4.

3.2.5.1. f0 values: quantitative analysis

The two-way ANOVA results consistently showed a significant main effect of stress pattern for all three f0 parameters (V1F0: F(2,823) = 24.1, p < .0001; V2F0: F(2,823) = 34.78, p < .0001; f0 change: F(2,823) = 45.97, p < .0001) and of the interaction stress pattern × speaker group (V1F0: F(4,823) = 2.96, p < .02; V2F0: F(4,823) = 2.46, p < .03; f0 change: F(4,823) = 2.96, p < .02), while there was no significant effect of speaker group. Post hoc results (Table 1) indicated that, across the three speaker groups, the V1F0 of the narrow-focus pattern was significantly greater than the f0 value on the same syllable of the compound and broad-focus patterns, while there was no significant or only a marginally significant difference in V1F0 between the compound and broad patterns.
By contrast, the V2F0 of the broad-focus pattern was significantly greater than the f0 value on the second stressed syllable of the narrow and compound patterns across the three groups. There was no significant difference in V2F0 between the narrow and compound patterns produced by the advanced and beginner groups, while a marginal difference was found in items produced by the native speaker group (N < C, p < .04). The results on f0 change showed that the magnitude of f0 change from the first to the second stressed syllable of the narrow pattern was significantly greater than that of the broad and compound patterns across the three speaker groups, and that the f0 change of the compound pattern was greater than that of the broad pattern in all groups except the beginners (p = .07, n.s.). Fig. 2 illustrates the difference in the magnitude of f0 change among the three stress patterns across the three speaker groups, supported by the Tukey results in Table 1: f0 change was greatest in the narrow-focus pattern, smaller in the compound pattern and smallest in the broad-focus pattern. The f0 change differed significantly among all three stress patterns for the native and advanced groups (N > C > B), while for the beginner group only the f0 change of the narrow pattern was significantly greater than that of the broad and compound patterns (N > C = B). Fig. 2 also shows that the degree of f0 change was smaller for non-native speakers than for native speakers, and smallest for the beginner group. The small f0 change in the broad pattern indicates that the two accent-bearing syllables of this pattern have comparably high f0 levels, i.e., both the adjective and the noun are accented. The large f0 decline from the first to the second accent-bearing element shows that the narrow and compound patterns are left-headed, and, with its greater f0 change, the narrow pattern has a higher f0 peak than the compound. The non-significant difference in f0 change between the compound and broad patterns produced by the beginners, together with their small f0 change from the first to the second accent-bearing element across all three stress patterns compared to native and advanced speakers, indicates that the beginners failed to deaccent the accent-bearing nouns in the narrow and compound patterns.

Table 1
The post hoc Tukey results on f0 values (MD: mean difference, Hz; B = broad, N = narrow, C = compound; ">" and "<" give the direction of the difference, "vs." a comparison printed without direction)

Parameter | Native: comparison, MD, Sig. | Advanced: comparison, MD, Sig. | Beginner: comparison, MD, Sig.
V1F0 | B vs. C, 7.1, p = .3 n.s. | B vs. C, 7.7, p = .1 n.s. | B > C, 9.9, p < .04
V1F0 | N > C, 25.5, p < .001 | N > C, 13.0, p < .01 | N > C, 19.1, p < .0001
V1F0 | N > B, 32.6, p < .0001 | N > B, 20.8, p < .0001 | N > B, 9.1, p = .052 n.s.
V2F0 | B > C, 21.7, p < .004 | B > C, 20.3, p < .0001 | B > C, 22.3, p < .0001
V2F0 | N < C, 15.6, p < .04 | N vs. C, 0.3, p = .9 n.s. | N vs. C, 1.8, p = .7 n.s.
V2F0 | B > N, 37.3, p < .0001 | B > N, 19.9, p < .0001 | B > N, 20.5, p < .0001
f0 change | C > B, 28.8, p < .01 | C > B, 28.0, p < .0001 | C vs. B, 12.4, p = .07 n.s.
f0 change | N > C, 41.1, p < .001 | N > C, 16.7, p < .03 | N > C, 17.3, p < .02
f0 change | N > B, 70.0, p < .0001 | N > B, 40.7, p < .0001 | N > B, 29.7, p < .0001

3.2.5.2. f0 contours: qualitative analysis

Overall, the two transcribers agreed on the presence or absence of a pitch accent in 90% of the utterances produced by the three speaker groups. Disagreement concerned mainly the tone types produced by non-native speakers: agreement on tone types was much higher for utterances produced by the advanced speakers than for those of the beginners (advanced: 81%, beginner: 53%). It was sometimes difficult to assign a tone type to many of the beginners' utterances because their tonal shapes differed considerably from standard English ToBI tones, probably as a result of lexical tone transfer. The results are reported as the proportions (percentages) of tonal patterns observed on each of the three target stress patterns (broad, narrow, compound) for each speaker group.
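The agreement rates reported above (90% on accent presence; 81% and 53% on tone type) are proportions of tokens on which the two transcribers chose the same label. A minimal Python sketch, assuming each token's transcription is stored as a (Tone 1, Tone 2) tuple and that the reported figures are raw percent agreement rather than a chance-corrected statistic; the labels and the "-" marker for a deaccented word are hypothetical.

```python
def percent_agreement(labels_a, labels_b):
    """Percentage of tokens to which two transcribers gave the same label."""
    if len(labels_a) != len(labels_b) or not labels_a:
        raise ValueError("expected two non-empty label lists of equal length")
    matches = sum(a == b for a, b in zip(labels_a, labels_b))
    return 100.0 * matches / len(labels_a)


# Hypothetical (Tone 1, Tone 2) labels for four tokens; "-" marks a deaccented word.
primary = [("H*", "L+H*"), ("L+H*", "-"), ("H*", "H*"), ("H*", "!H*")]
secondary = [("H*", "L+H*"), ("L+H*", "-"), ("H*", "L+H*"), ("H*", "!H*")]
print(percent_agreement(primary, secondary))  # prints: 75.0
```

Agreement on accent presence alone can be computed with the same function by first mapping each tuple to a presence/absence label.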
The results in Table 2 show that the intonation pattern of broad-focus accent corresponds to a double accent across the three speaker groups, with three major f0 patterns: (1) a step-up pattern, with a sharp upward movement on the second word from a flat high tone on the first word, followed by a steep fall (H* L+H*; Fig. 3a1), or two successive step-up peaks (L+H* L+H*); (2) two comparative f0 peaks, either with the flat-hat pattern (H* H*; Fig. 3b1) or a comparative double-peak pattern (L+H* L+H*); and (3) a step-down pattern with the second tone lower than the first (L+H* L+!H*, H* !H*, or H* H+!H*; Fig. 3c1). As shown in Table 2, less than half of the test items in each speaker group (native: 30%, advanced: 47%, beginner: 36%) have the clear step-up pattern consistent with Hardcastle's findings, while a majority of the test items have comparative f0 peaks. The step-down f0 in some items may be due to utterance-final pitch declination.

The intonation contour of the narrow-focus pattern spoken by Australian English speakers contains a single peak with enhanced prominence (either L+H* or H*; Fig. 3a2, b2, c2, d2 and e2) on the contrastive adjective, followed by a sudden drop in f0 after the contrastive accent and a deaccented noun, consistent with findings in previous studies on contrastive focus in English (Beckman & Ayers, 1994; Ladd, 1980; Pierrehumbert & Hirschberg, 1990). As shown in Table 2, while the native speakers consistently deaccented the noun in the narrow-focus pattern (94%), Vietnamese beginning learners of English failed to deaccent the noun (only 5% of cases involved noun deaccentuation, whereas 46% of items carried comparative peaks and 49% involved step-down accent peaks; Fig. 3g2 and h2). Nevertheless, the advanced Vietnamese speakers deaccented the noun of the narrow-focus pattern in 70% of occurrences (compared to 11% with comparative peaks and 19% with step-down peak patterns).

The f0 contour of the compound patterns spoken by native Australian English speakers had a single f0 peak similar to that of the narrow pattern, but with a less prominent f0 on the adjective (a total of 83%, including 79% with H* and 4% with L+H*; Fig. 3a3, b3, c3 and e3). The remaining 17% of native speakers' compounds have a step-down accent pattern with a non-deaccented noun (H* H+!H*; Fig. 3d3). This pattern is found mainly in the four-syllable template (e.g., open classroom, plastic money, sleeping partner), possibly because these four-syllable compounds have not yet been completely lexicalized. While the advanced Vietnamese speakers deaccented the noun of the compound in 67% of tokens (compared to 11% with a comparative peak and 22% with step-down peak patterns), beginners mostly preserved the accent on the noun of the compound (only 4% with noun deaccentuation, up to 52% with comparative peaks and 45% with step-down peaks; e.g., Fig. 3g3 and h3).

Fig. 2. The comparison of f0 change between stress patterns among speaker groups. [Bar chart: f0 change (Hz; axis 0–100) for B, C and N within the Native, Advanced and Beginner groups.] Error bars: standard error of mean; **p < .01, *p < .05.

Table 2
Percentage of tonal pitch patterns per stress pattern per speaker group

Stress pattern | F0 pattern | Native (%) | Advanced (%) | Beginner (%)
Broad | F0 step up | 30 | 47 | 36
Broad | Comparative F0 peaks | 41 | 36 | 38
Broad | F0 step down | 29 | 17 | 26
Narrow | Deaccent | 94 | 70 | 5
Narrow | Comparative F0 peaks | – | 11 | 46
Narrow | F0 step down | 6 | 19 | 49
Compound | Deaccent | 83 | 67 | 4
Compound | Comparative F0 peaks | – | 11 | 52
Compound | F0 step down | 17 | 22 | 45

Fig. 3. Samples of intonation patterns. The three target patterns were segmented from their respective carrier phrases in the experimental data set and pasted on the same time axis for purposes of illustration only.


The qualitative examination of the f0 contours, consistent with the statistical results on f0 values, showed that the narrow-focus noun phrases and compounds had a single accent with a left-headed f0 pattern, while the broad pattern had a double accent. The contrastive accent in the narrow pattern had a higher f0 peak than the compound. The results in Table 2 show that non-native speakers differed markedly from native speakers in several ways. First, beginning speakers failed to deaccent the second constituent of the compound and narrow-focus patterns, whereas the advanced speakers appeared able to enhance the contrastive prominence on the adjective by deaccenting the noun to a greater extent. Second, the f0 patterns produced by non-native speakers were more varied than those of native speakers, suggesting a transfer effect of the various Vietnamese tonal contours on different syllable types. For example, compared with native speakers, Vietnamese learners seemed to use more rising patterns (L+H*) characterized by a sharp rise, particularly on accented syllables and stressed checked syllables (syllables ending in an obstruent) (Fig. 3f2, f3, h1 and h2). This can be explained by their association of English accented syllables with the high rising Vietnamese Sac tone, as observed by previous researchers (Nguyễn, 1970, 2003; Pittam & Ingram, 1991; Riney, 1988). In addition, Vietnamese speakers also used more step-down patterns from a sustained high f0 on the preceding unstressed syllable (H+!H*), due to a failure to deaccent unstressed syllables, consistent with findings from Nguyễn and Ingram (2004).

3.2.5.3. Intensity

The two-way ANOVA results consistently showed a significant main effect of stress pattern for all three intensity parameters (V1 intensity: F(2,823) = 15.47, p < .0001; V2 intensity: F(2,823) = 15.39, p < .0001; intensity change: F(2,823) = 26.51, p < .0001) and of the interaction stress pattern × speaker group (V1 intensity: F(4,823) = 2.66, p < .04; V2 intensity: F(4,823) = 3.03, p < .02; intensity change: F(4,823) = 2.82, p < .03), while a significant effect of speaker group was found only for the native and beginner groups (F(2,823) = 11.69, p < .0001 and F(2,823) = 8.27, p < .001, respectively).
The Tukey post hoc results (Table 3) indicated that, across the three speaker groups, the V1 intensity of the narrow-focus and compound patterns was significantly greater than the intensity on the same syllable of the broad-focus pattern. By contrast, the V2 intensity of the broad-focus pattern was significantly greater than the intensity on the second stressed syllable of the narrow and compound patterns, while only marginal or non-significant differences were found in V2 intensity between the narrow and compound patterns (native: p = .2, n.s.; beginner: p = .1, n.s.; advanced: C < N, p < .05). Similarly, the magnitude of intensity change from the first to the second stressed syllable of the narrow and compound patterns was significantly greater than that of the broad pattern across the three speaker groups, while the difference between the compound and narrow patterns was not significant (Fig. 4). Fig. 4 also shows that the degree of intensity change was greatest for the advanced group, smaller for the native speaker group and smallest for the beginner group. Generally, the intensity pattern seems to mirror the f0 pattern. Nevertheless, the correlation analysis showed a significant positive correlation (r = .47) between f0 change and intensity change only for the native speakers, while no correlation was found for any acoustic parameters of the two Vietnamese groups.

Table 3
The post hoc Tukey results on intensity values (MD: mean difference, dB; B = broad, N = narrow, C = compound; ">" and "<" give the direction of the difference, "vs." a comparison printed without direction)

Parameter | Native: comparison, MD, Sig. | Advanced: comparison, MD, Sig. | Beginner: comparison, MD, Sig.
V1 intensity | C > B, 3.6, p < .02 | C > B, 1.1, p < .04 | C > B, 1.1, p < .05
V1 intensity | N > C, 3.1, p < .03 | N > C, 1.3, p < .01 | N > C, 2.1, p < .0001
V1 intensity | N > B, 4.6, p < .01 | N > B, 2.4, p < .0001 | N > B, 1.2, p < .03
V2 intensity | B > C, 3.5, p < .02 | B > C, 2.9, p < .001 | B > C, 3.4, p < .0001
V2 intensity | C vs. N, 1.6, p = .2 n.s. | C < N, 1.9, p < .05 | C vs. N, 1.3, p = .1 n.s.
V2 intensity | B > N, 5.2, p < .001 | B vs. N, 0.0, p = .09 n.s. | B > N, 2.1, p < .02
Intensity change | C > B, 4.9, p < .001 | C > B, 4.0, p < .0001 | C > B, 2.4, p < .01
Intensity change | N vs. C, 2.4, p = .09 n.s. | N vs. C, 1.5, p = .09 n.s. | N vs. C, 0.7, p = .3 n.s.
Intensity change | N > B, 7.3, p < .0001 | N > B, 2.5, p < .01 | N > B, 3.2, p < .001


Fig. 4. Comparison of intensity change (dB) between stress patterns (B: broad, C: compound, N: narrow) among speaker groups (Native, Advanced, Beginner). Error bars: standard error of the mean; **p < .01.
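The intensity-change measure and its correlation with f0 change can be sketched as follows. This is a minimal illustration under stated assumptions, not the authors' analysis script: intensity change is assumed to be V1 intensity minus V2 intensity (dB), the Pearson coefficient is computed directly, and the token values are hypothetical.

```python
from math import sqrt

def intensity_change(v1_db: float, v2_db: float) -> float:
    """Intensity drop (dB) from the first to the second stressed vowel.
    Assumed sign convention: V1 - V2, so a deaccented noun gives a larger value."""
    return v1_db - v2_db

def pearson_r(xs, ys):
    """Pearson product-moment correlation of two equal-length samples."""
    n = len(xs)
    if n != len(ys) or n < 2:
        raise ValueError("need two equal-length samples with n >= 2")
    mx, my = sum(xs) / n, sum(ys) / n
    sxy = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    sxx = sum((x - mx) ** 2 for x in xs)
    syy = sum((y - my) ** 2 for y in ys)
    if sxx == 0 or syy == 0:
        raise ValueError("correlation undefined for a constant sample")
    return sxy / sqrt(sxx * syy)

# Hypothetical tokens: (f0 change in Hz, V1 dB, V2 dB); not data from the study.
tokens = [(45.0, 78.0, 70.0), (12.0, 75.0, 74.0), (60.0, 80.0, 69.0), (20.0, 76.0, 73.0)]
f0_changes = [f0 for f0, _, _ in tokens]
int_changes = [intensity_change(v1, v2) for _, v1, v2 in tokens]
print(int_changes, round(pearson_r(f0_changes, int_changes), 2))
```

With real data, the correlation would be computed separately for each speaker group, since the text reports a reliable correlation only for native speakers.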

3.2.5.4. Duration. The two-way ANOVA results consistently showed a significant main effect of stress pattern for phrase duration and for each of its syllable constituents (Phrase: F(2,823) = 93.94, p < .0001; S1: F(2,823) = 77.3, p < .0001; U1: F(2,539) = 18.77, p < .0001; S2: F(2,823) = 63.1, p < .0001; U2: F(2,539) = 5, p < .01) and a significant stress pattern × speaker group interaction (Phrase: F(4,823) = 40.17, p < .0001; S1: F(4,823) = 23.34, p < .0001; U1: F(4,539) = 5.06, p < .001; S2: F(4,823) = 16.02, p < .0001; U2: F(4,539) = 6.4, p < .0001), while no significant main effect of speaker group was found. As the post hoc Tukey results in Table 4 show, there were generally significant differences in duration among the stress patterns only for the native and advanced groups, whereas beginning Vietnamese speakers of English showed either no difference or only marginal differences in duration among the three stress patterns, particularly between compounds and their phrasal counterparts. For native speakers, every constituent of the compound (the first stressed syllable, S1, e.g., Eng- in English teacher; the unstressed syllable of the first word, U1, e.g., -lish; the second stressed syllable, S2, e.g., tea-; and the final syllable, U2, e.g., -cher) was significantly shorter than the corresponding syllable in the two phrasal constructions, except for a non-significant difference between the narrow-focus and compound patterns on U2 (p = 0.1 n.s.). Similarly, for advanced speakers every constituent of the compound was significantly shorter than its counterpart in the two phrasal constructions, except for S1 and U1 (the first word) between the broad-focus and compound patterns. By contrast, for the beginner group there was no significant or consistent difference in duration between compound constituents and their counterparts in the two phrasal constructions.
In fact, the constituents of the beginners' compounds were sometimes even longer than their phrasal counterparts (S1: B < C; U2: N, B < C), indicating that beginners failed to compress the constituents of the compound word. As shown in Table 4 and Fig. 5, the overall duration of compounds produced by both native and advanced speakers was significantly shorter than that of their phrasal counterparts. Together with the consistent shortening of compound constituents described above, this indicates that native speakers compressed the compound constituents to conform to a compound word timing template. The results also showed that advanced speakers can compress compounds to produce a duration contrast between compounds and phrases, albeit to a lesser extent than native speakers. By contrast, the absence of shortening in beginners' whole compounds relative to their phrasal counterparts, together with their failure to shorten compound constituents, shows that beginning speakers failed to produce a duration contrast between English compound words and phrases. This compound word vs. phrase duration contrast is discussed further below.

Table 4
The post hoc Tukey results on duration values (MD: mean difference, ms; pairs joined by "–" did not differ significantly)

           Native                       Advanced                     Beginner
Measure    Pair  MD     Sig.            Pair  MD     Sig.            Pair  MD    Sig.
Phrase     B>C   26.8   p<.0001         B>C   88.3   p<.0001         B<C   28.1  p<.04
           N>C   203.9  p<.0001         N>C   121.6  p<.0001         N–C   8.6   p=.5 n.s.
           B>N   63.9   p<.01           B<N   33.2   p<.02           B<N   36.7  p<.01
S1         B>C   77.5   p<.0001         B–C   4.3    p=.5 n.s.       B<C   14.4  p<.02
           N>C   88.6   p<.0001         N>C   57.6   p<.0001         N>C   14.7  p<.02
           B<N   11.1   p=.25 n.s.      B<N   62.0   p<.0001         B<N   29.1  p<.0001
U1         B>C   51.7   p<.0001         B–C   4.7    p=.5 n.s.       B–C   11.9  p=.1 n.s.
           N>C   41.7   p<.002          N>C   31.8   p<.0001         N>C   32.1  p<.0001
           B–N   9.9    p=.4 n.s.       B<N   27.0   p<.001          B<N   44.1  p<.0001
S2         B>C   96.6   p<.0001         B>C   73.3   p<.0001         B–C   5.3   p=.5 n.s.
           N>C   44.9   p<.0001         N>C   35.5   p<.0001         N–C   10.7  p=.1 n.s.
           B>N   51.6   p<.0001         B>N   37.7   p<.0001         B>N   16.1  p<.05
U2         B>C   37.0   p<.002          B>C   24.2   p<.002          B<C   16.5  p<.03
           N–C   18.2   p=.1 n.s.       N–C   13.8   p=.1 n.s.       N<C   25.0  p<.001
           B–N   18.7   p=.1 n.s.       B–N   13.4   p=.06           B–N   8.5   p=.2 n.s.

Fig. 5. Comparison of word/phrase duration (summed across constituent syllables: S1+U1+S2+U2) between stress patterns (B, N, C) among speaker groups (Native, Advanced, Beginner). Vertical axis: mean duration (ms). Legend: S1: first stressed syllable (e.g., Eng- in English teacher); U1: unstressed syllable of the first word (e.g., -lish); S2: second stressed syllable (e.g., tea-); U2: final syllable (e.g., -cher).

3.2.6. Discussion

3.2.6.1. f0 and intensity indices. As predicted, non-native speakers generally had no problem encoding different levels of f0 and intensity contrast on accent-bearing syllables (e.g., encoding greater f0 and intensity on the adjective of the narrow-focus pattern than on that of the compound). However, they failed to realize the syntagmatic contrasts of accent (i.e., more prominent elements alternating and contrasting syntagmatically with less prominent ones) and thus failed to deaccent the nouns of the narrow-focus and compound patterns. In addition, the f0 patterns produced by Vietnamese speakers were more varied than those of native speakers, suggesting transfer of the various tonal contours associated with different syllable types in Vietnamese, and supporting previous observations that Vietnamese learners refer to tonal features in perceiving and realizing English stress patterns (Hồ, 1997; Nguyễn, 1970, 1980, 2003; Pittam & Ingram, 1991; Riney, 1988). The failure of beginning L1 Vietnamese learners to deaccent English unstressed syllables is consistent with McGory's (1997) findings for L1 Mandarin and Seoul Korean speakers, showing that L2 learners of English from different L1 prosodic systems (Vietnamese, Mandarin Chinese, and Seoul Korean) all initially fail to produce contrastive f0 prominence between stressed and unstressed levels in an English-like way. This can be explained as a confluence of different types of L1 interference, resulting in f0 patterns that differ in shape and possibly in alignment but that native English listeners can only interpret as a failure to deaccent.

3.2.6.2. Discussion of duration. The duration analysis showed significant differences in temporal structure between compounds and their corresponding (broad- or narrow-focus) phrasal constructions, confirming findings on the compound word vs. phrase duration contrast by previous researchers (Farnetani & Cosi, 1988; Faure et al., 1980). Duration compression was evident in all syllable constituents of the compounds. This temporal compression effect may be explained by two different mechanisms: (1) accentual lengthening and word-edge lengthening in phrases, or (2) a word shortening effect in compounds, i.e., the syllables of the compound are compressed to conform to the temporal template of a word. Both mechanisms are considered in turn below. Under the first mechanism, the constituents of the phrases are longer than their compound counterparts because of two lengthening effects: accentual lengthening and word-edge lengthening.
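The two duration measures used in this section, the summed word/phrase duration (S1 + U1 + S2 + U2) and the shortening of each compound constituent relative to its phrasal counterpart, can be illustrated with a short sketch. The helper names and the millisecond values are hypothetical; a compound/phrase ratio below 1.0 is simply read as compression.

```python
CONSTITUENTS = ("S1", "U1", "S2", "U2")

def total_duration(syllables: dict) -> float:
    """Word/phrase duration as the sum S1 + U1 + S2 + U2 (ms)."""
    return sum(syllables[c] for c in CONSTITUENTS)

def compression_ratios(compound: dict, phrase: dict) -> dict:
    """Compound/phrase duration ratio per constituent and for the whole form.
    A ratio below 1.0 means the compound version is shorter (compressed)."""
    ratios = {c: compound[c] / phrase[c] for c in CONSTITUENTS}
    ratios["whole"] = total_duration(compound) / total_duration(phrase)
    return ratios

# Hypothetical durations (ms) for one speaker's "English teacher" tokens; not study data.
broad = {"S1": 210.0, "U1": 150.0, "S2": 240.0, "U2": 200.0}
compound = {"S1": 160.0, "U1": 110.0, "S2": 150.0, "U2": 170.0}

for name, ratio in compression_ratios(compound, broad).items():
    print(f"{name}: {ratio:.2f}")
```

Under the word shortening account, a native-like speaker would show ratios below 1.0 for every constituent, whereas the beginner pattern described above would show ratios near or above 1.0.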
In black berry, the syllable black in the phrases was longer, possibly owing to accentual and word-edge lengthening. The syllable ber- was also longer than its compound counterpart because of accentual lengthening, particularly in the broad-focus pattern, while in the narrow-focus phrase it remains an accent-bearing element, i.e., it can be deaccented but is not reduced. On the other hand, compounding may be seen phonologically as a process in which words (new lexical items) are created from phrases (compositional syntactic constructions) by altering the prosodic characteristics of the phrasal construction to conform to the prosodic template of the word; that is, the construction takes a single accentable syllable and becomes subject to the temporal compression effects associated with affixation in polysyllabic word forms. Conformity to such a “word accentual template” is supported by compound accent patterns in different languages. As in other Germanic languages, most English words have initial stress, so the fact that the accent-bearing element of the first word in the compound keeps its stress as the primary stress is part of this template. By contrast, in Italian, a stress-accent language, stress tends to fall on the stressable syllable of the rightmost morpheme carrying an underlying accent (Garde, 1965; Rossi, 1998), and in Italian compounds stress “is assigned to the last member of a compound” (e.g., lavapiátti ‘dish washer’) (Vogel & Raimy, 2002, p. 229). Further evidence comes from pitch accent languages. In Turkish, a pitch accent language (Levi, 2005), word-level accent consistently promotes the leftmost lexical accent (Barker, 1989; Inkelas, 1999). Compounds, like affixed words, also “promote the accent of the leftmost member of the compound” (e.g., ayák-kabi ‘shoe’ vs. ayák ‘foot’ and kabi ‘cover’) (Levi, 2002).
In Japanese compounds, on the other hand, it is often the tone pattern of the last element that is kept (Kubozono, 1993). It is therefore argued that the compression of all compound constituents relative to their phrasal counterparts in this study can also be explained as a “word shortening effect”: the first element of the compound takes on the accentual characteristics of primary word stress (e.g., black in blackberry), the other accent-bearing elements (e.g., ber- in blackberry and tea- in English teacher) are deaccented and/or reduced, and every constituent syllable is subject to the rhythm-induced word template. As a result, the compound as a whole takes on the rhythmic properties of a lexical word, serving as a domain for rhythm-induced temporal compensation effects. This analysis is supported by the consistent compression of all constituents, including unstressed syllables at a word edge, together with the deaccented and reduced vowel quality of the second stressed syllable observed in many highly lexicalized compounds produced by native speakers (e.g., /blæbəri/ compared with the full vowels of the phrasal counterparts, /blæk beri/). In particular, even though many four-syllable compounds (e.g., English teacher, plastic money, open classroom) still preserve a less prominent accent on the noun, they all have more compressed syllable constituents, and thus a shorter overall duration, than their phrasal counterparts (B > C: p < .001; N > C: p < .001; N–B: p = .05). A similar temporal compression effect was found in our preliminary acoustic analysis of Italian compounds compared with their phrasal counterparts (e.g., centopiedi ‘centipede’ vs. cento piedi ‘one hundred feet’). It would be interesting to extend this investigation to other languages.

3.2.6.3. Discriminant analysis. The discriminant analysis of non-native speakers' items (using the methodology described in Section 3.1.5) showed that, in contrast to the native speakers' items, which were well partitioned into three non-overlapping groups in acoustic space (Fig. 1), many items spoken by the advanced speakers were misclassified by the discriminant function, particularly narrow-focus items misclassified as broad along the f0 (x) axis (Fig. 6a). This is consistent with the quantitative and qualitative analysis of f0 contours (Section 3.2.5.2) showing that many advanced speakers failed to deaccent the noun of the narrow-focus pattern. Compounds were generally well separated from phrases along the duration (y) axis, although some compounds were misclassified as narrow. Nevertheless, the three stress patterns could still be grouped separately. The scatter plot in Fig. 6b shows that items spoken by beginning speakers could not be classified into three separate groups on either the f0 or the duration dimension, further confirming that these speakers failed to deaccent the nouns of the compound and narrow-focus patterns and to compress the constituents of the compounds.

Fig. 6. Scatter plots of stress patterns spoken by (a) advanced and (b) beginning Vietnamese speakers of English (x-axis: f0 change; y-axis: normalised word/phrase duration). B: Broad, N: Narrow, C: Compound.

3.3. Perception of three stress patterns

In the perceptual experiment, native Australian English listeners and non-native listeners (Vietnamese learners of English) heard a context sentence (e.g., This berry is black). They were then required to make a forced choice among three alternatives as an appropriate continuation of the preceding context: a carrier sentence ending with a compound word, a narrow-focus noun phrase, or a broad-focus noun phrase. The acoustic analysis of the production experiment suggests that an optimal perceptual strategy would combine (a) pitch change over the nuclear elements of the compound word or phrase, to distinguish left-headed constructions (narrow-focus noun phrases and compounds) from right-headed ones (broad-focus noun phrases), and (b) normalized word/phrase duration, to distinguish compounds from phrasal constructions. English listeners may be expected to use this strategy, particularly since the two stimulus dimensions map directly onto the separate phonological dimensions of accent and (word) rhythm. Vietnamese listeners may be expected to transfer L1 tonal contrasts onto English stress (accent) perception and would therefore also be predicted to use the pitch-change dimension. Duration is not an active, distinctive cue in Vietnamese tonal contrasts and so would be unlikely to be involved in tonal transfer effects.
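The two-cue strategy described above can be written as a simple sequential decision rule. This is a hedged sketch only: the sign convention for f0 change (first minus second accentable syllable) and both cut-off values are assumptions for illustration, not values reported in the study, and such a rule is simpler than the discriminant function used in the production analysis.

```python
from dataclasses import dataclass

@dataclass
class Token:
    f0_change: float      # assumed: f0 of first accentable syllable minus second (Hz)
    norm_duration: float  # normalised word/phrase duration (units as in Fig. 6)

# Hypothetical cut-offs for illustration only; the study reports no thresholds.
F0_THRESHOLD = 30.0
DURATION_THRESHOLD = 49.8

def classify(tok: Token) -> str:
    """Return 'B', 'N' or 'C' using two cues in sequence."""
    # Cue (a): a small pitch drop suggests a right-headed broad-focus phrase.
    if tok.f0_change < F0_THRESHOLD:
        return "B"
    # Cue (b): among left-headed forms, a compressed duration suggests a compound.
    if tok.norm_duration < DURATION_THRESHOLD:
        return "C"
    return "N"

for tok in (Token(60.0, 50.5), Token(60.0, 49.0), Token(5.0, 50.5)):
    print(tok, classify(tok))
```

The ordering mirrors the argument in the text: pitch change alone separates left-headed from right-headed forms, and only the duration cue can separate compounds from narrow-focus phrases.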
It may also be predicted that Vietnamese listeners would be insensitive to the duration contrast between compounds and phrases, because Vietnamese lacks the specific temporal compensation effects associated with culminative word stress and foot timing. However, this does not necessarily imply poorer performance since, as shown previously, the tokens in this experiment supported a three-way classification of the stimuli (N > C > B). Advanced Vietnamese learners of English would be expected to perform at a higher level than beginning learners. Our interest in the effects of L2 fluency, though, lay less in the two groups' levels of performance than in whether a comparison of their behavior on the task indicates any shift from a ‘Vietnamese’ towards an ‘English’ response strategy. This bears on the question of whether learners reorganize their perceptual schemas in adapting to L2 prosody, or simply use the cues predicted for L1 perceptual processing more efficiently.

3.3.1. Method: perception experiment

3.3.1.1. Subjects. Three groups of subjects participated in this experiment: Vietnamese beginning learners of English, advanced Vietnamese speakers of English, and a control group of native Australian speakers of English. The beginner group consisted of 80 first-year English-major undergraduates (20 Hanoi, 20 Hue, 20 Nghe An, and 20 Saigon speakers; half male and half female in each dialect group) with no known auditory deficiencies. The advanced group consisted of 20 postgraduate students at the University of Queensland (12 Southern, 3 Northern, and 5 Hue dialect speakers). Ten subjects in each non-native group also participated in the production experiment. The control group of native English listeners consisted of 29 subjects (5 males, 24 females) who received course credit for their voluntary participation. All were first-year linguistics students at the University of Queensland.
Their first language was Australian English.

3.3.1.2. Stimuli. As reported for the production experiment, four native speakers (two male, two female) recorded the sentences, but only the stimuli of the two male linguists were used in this perceptual experiment. In the listening identification test, listeners heard a context sentence (the target context) followed by three test sentences carrying the three stress patterns. The context sentence was read once, followed by the three test sentences spoken in sequence with a short pause (approximately 1 s) between them, and the whole sequence was then repeated. The subjects' task was to choose the test sentence whose stress pattern fitted the meaning of the context sentence by circling the corresponding letter on the answer sheet (see Appendix C for the answer sheet). An example test trial is as follows:

Context phrase (target): This berry isn’t green.

Test sentence              Category    Response type
(A) It’s a black berry     Broad       Error
(B) It’s a black berry     Narrow      Correct
(C) It’s a black berry     Compound    Error
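A trial list with the structure described here (one context sentence followed by all three test sentences from the same speaker, with the correct option in a random position) might be assembled as below. Function and field names are hypothetical; only the design of 12 items, three stress patterns, two speakers, and blocks of 12 is taken from the study.

```python
import random
from itertools import product

PATTERNS = ("Broad", "Narrow", "Compound")

def build_trials(item_ids, speakers, seed=None):
    """Build one trial per (item, target context, speaker).

    Each trial pairs the context for `target` with all three test sentences
    from the same speaker, shuffled so the correct option's letter is random.
    """
    rng = random.Random(seed)
    trials = []
    for item, target, speaker in product(item_ids, PATTERNS, speakers):
        order = list(PATTERNS)
        rng.shuffle(order)
        trials.append({
            "item": item,
            "speaker": speaker,
            "target": target,
            "options": dict(zip("ABC", order)),   # letter -> pattern heard
            "correct": "ABC"[order.index(target)],
        })
    return trials

trials = build_trials(range(1, 13), ["J.I.", "R.P."], seed=1)
blocks = [trials[i:i + 12] for i in range(0, len(trials), 12)]  # blocks of 12
print(len(trials), len(blocks))  # -> 72 6
```

Keeping the speaker fixed within a trial reflects the constraint that voices were never mixed within a triplet; the order of trials across blocks is left unshuffled because the text does not specify it.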

The response sheet contained no orthographic cues to the correct response. Although two speakers read the test items, speakers' voices were never mixed within a trial: the speaker who produced the context phrase always produced the three test sentences. The position of the correct sentence within each trial triplet was randomized. The stimuli were prepared with a sound-editing program (Speech Station), stored as .wav files, and presented in a web page format. There were 72 test items in total (12 items × 3 stress patterns × 2 speakers). The test items were grouped into blocks of 12, with a gap of about 6 s between items.

3.3.1.3. Procedures. The beginning learners took the perception test in a quiet classroom at a university in each location (Hanoi, Saigon, Hue, and Nghe An). The test was played from a Compaq laptop computer through good-quality loudspeakers. For the native English listeners and the advanced Vietnamese learners, the test was carried out in the Phonetic Laboratory at the University of Queensland and played from a desktop computer through good-quality loudspeakers. Before the test, a 5-min training session with six examples familiarized listeners with the test format. Before the results are analysed and discussed, it is important to note that in this listening task the subjects compared, on each trial, auditory images of the three stress patterns against a particular context sentence, presented both textually and auditorily, in isolation from the other two context sentences. This means that to respond “correctly”, a subject needed to interpret the context correctly in isolation, without being able to compare it with the other two contexts.
As a result, there was a concern that an incorrect response to a target context could be due either to the subject's misinterpretation of the context or to an inability to perceive the stress pattern correctly. This concern was addressed by explaining the meaning of the context sentences to non-native participants in the training session before the listening task. In addition, an alternative task, in which subjects heard only the target stress pattern and then identified the context (e.g., they heard “It’s a black berry” and then chose the most suitable of three contexts: (1) This berry is black. It’s a …, (2) This berry isn’t green. It’s a …, and (3) This is a kind of fruit. It’s a …), was tried in the pilot study but proved too hard for non-native listeners. The current task was therefore used, since it proved more suitable for non-native listeners: they hear three stress patterns in a row, compare them, and pick the one they think fits the context sentence. The current task might impose a large memory load; this was anticipated and addressed by presenting the audio input twice.

3.3.2. Results: perception experiment

3.3.2.1. Effects of L1 background and L2 proficiency levels. To examine the effects of language background and English proficiency level on subjects' perception of the three stress patterns, a three-way analysis of variance (ANOVA) was conducted. The experimental factors were listener group (three levels: native listeners, non-native beginners, and non-native advanced learners), target stress pattern (three levels: compound, narrow focus, and broad focus), and speaker (two levels: J.I. and R.P.).
The speaker factor is of interest because native and non-native listeners were expected to respond differently to speaker variation in the stimulus items. The dependent variable was the percentage correct score for each item, calculated within each level of the three independent factors. This analysis was adopted because the three listener groups had different sample sizes (beginners: 80, advanced learners: 20, native listeners: 29). The non-native listeners' dialect was not a significant factor in a preliminary analysis and was therefore excluded from the current analysis. The ANOVA results show a significant main effect for each of the three factors (stress pattern: F(2,216) = 41.65, p < .0001; listener group (i.e., proficiency level): F(2,216) = 16.12, p < .0001; speaker: F(1,216) = 6.22, p < .02) and two significant interactions (stress pattern × listener group: F(4,216) = 7.6, p < .0001; listener group × speaker: F(2,216) = 3.7, p < .03). The main effects and the interaction between stress pattern and listener group (proficiency level) are shown in Fig. 7. Pairwise Tukey comparisons among the three listener groups indicated that native listeners and advanced learners did not differ significantly in overall performance, but both differed significantly from the beginner group (native vs. beginners: p < .01; advanced vs. beginners: p < .01). However, this main effect of listener group needs to be evaluated in relation to its highly significant interaction with stress pattern (discussed below). Likewise, the highly significant main effect of stress pattern must be evaluated in relation to its interaction with listener group, and the marginally significant main effect of speaker (R.P. vs. J.I.) requires interpretation in relation to its interaction with listener group. To examine the interaction between stress pattern and listener group, post hoc pairwise multiple comparisons (Tukey test) were conducted. The results showed that native listeners performed significantly better on the narrow-focus and compound items than on the broad-focus items (N > B: p < .001; C > B: p < .001; N–C: n.s.).
In contrast, the two non-native groups showed a similar pattern of performance: both identified the narrow-focus pattern more successfully than the compound and broad-focus patterns (N > B: p < .001; N > C: p < .001; B–C: n.s.). Considered jointly, the main effect of listener group and its interaction with stress pattern indicate that, although the advanced learners performed as well as native listeners overall, they shared with beginners a pattern of poor performance in identifying compounds. An overall comparison of mean performance (Fig. 7) shows that the narrow-focus pattern was identified most accurately by all three groups. While native listeners tended to identify compounds better and the broad-focus pattern worst, both Vietnamese groups tended to identify the broad-focus pattern better and the compound worst. This difference can be attributed to the different strategies and acoustic cues used by listeners of the two languages. We argue that native listeners of English relied on both pitch and duration to distinguish the three stress patterns: they used greater pitch change to identify the narrow-focus pattern and shortened word duration to identify the compound pattern, leaving the broad-focus pattern the most difficult to identify because it is comparable in duration to the narrow-focus pattern but has a small pitch change. Vietnamese listeners, on the other hand, relied mainly on pitch and did not use the duration cue. As a result, they easily identified the narrow-focus pattern, had some difficulty with the broad-focus pattern, and could not reliably recognize the compound pattern. Post hoc tests on the interaction between listener group and speaker (Fig. 8) show no significant speaker effect on native or advanced Vietnamese listeners' performance (native: p = .6 n.s.; advanced: p = .1 n.s.); i.e., items spoken by speaker 1 (J.I.) were as identifiable as those spoken by speaker 2 (R.P.). In contrast, there was a significant speaker effect on the beginning listeners' performance (p < .001): items spoken by speaker 1 (J.I.) were better perceived than those spoken by speaker 2 (R.P.). We surmise that this speaker effect for the beginner group is due to the difference in pitch range (and correlated intensity) between the two speakers, combined with the Vietnamese listeners' greater reliance on pitch contrasts in the identification task. Examination of the speakers' average pitch and intensity ranges between stressed and unstressed syllables shows that speaker 1 (J.I.) has a higher pitch and intensity level and a greater pitch and intensity range than speaker 2 (R.P.). The average pitch and intensity ranges of speaker 1 (J.I.) for the test items are 42 Hz and 20 dB, respectively, while those of speaker 2 (R.P.) are 30 Hz and 17 dB. The average pitch and intensity levels of speaker 1 (J.I.) are 151 Hz and 86 dB on stressed syllables and 109 Hz and 82 dB on unstressed syllables, while those of speaker 2 (R.P.) are 115 Hz and 76 dB on stressed syllables and 85 Hz and 70 dB on unstressed syllables. Because pitch height is a cue to tone perception in their native language, Vietnamese listeners are sensitive to pitch contrasts. As a result, items spoken by a speaker with a greater pitch range (accompanied by enhanced intensity) are more easily identified than those spoken by a speaker with a more restricted pitch range.

Fig. 7. Mean percentage of correct perception scores by stress pattern (Broad, Narrow, Compound) and listener group (Native, Advanced, Beginner).

Fig. 8. Mean percentage of correct perception scores by listener group (Native, Advanced, Beginner) and speaker (J.I., R.P.).

3.3.3. Analysis of the perceptual error patterns

3.3.3.1. Comparisons among native listener and learner groups. Table 5 shows the percentage of responses for each stress pattern. The column labels indicate the stimulus stress pattern that subjects chose in response to the target contexts, which are indicated by the row labels.
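The cells of a response table of this kind are row percentages over (target context, chosen pattern) pairs, and the diagonal cells give percentage correct. A minimal sketch with hypothetical responses from a single listener (the function names are illustrative, not from the study):

```python
from collections import Counter, defaultdict

PATTERNS = ("Broad", "Narrow", "Compound")

def response_table(responses):
    """responses: iterable of (target_context, chosen_pattern) pairs.
    Returns {target: {chosen: percentage}}, each non-empty row summing to 100."""
    counts = defaultdict(Counter)
    for target, chosen in responses:
        counts[target][chosen] += 1
    table = {}
    for target in PATTERNS:
        row_total = sum(counts[target].values())
        table[target] = {
            chosen: (100.0 * counts[target][chosen] / row_total if row_total else 0.0)
            for chosen in PATTERNS
        }
    return table

def percent_correct(table):
    """Diagonal cells: choosing the stimulus pattern that matches the context."""
    return {p: table[p][p] for p in PATTERNS}

# Hypothetical responses from one listener; not the study's data.
demo = ([("Broad", "Broad")] * 6 + [("Broad", "Compound")] * 2
        + [("Narrow", "Narrow")] * 7 + [("Narrow", "Broad")]
        + [("Compound", "Compound")] * 4 + [("Compound", "Broad")] * 4)
tab = response_table(demo)
print(percent_correct(tab))  # -> {'Broad': 75.0, 'Narrow': 87.5, 'Compound': 50.0}
```

Off-diagonal cells such as tab["Compound"]["Broad"] correspond to the broad-for-compound confusions discussed for the non-native groups.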
As shown in Table 5, the error patterns differ between native and non-native listeners. Native listeners tended to choose the narrow-focus pattern (23.9%) for the compound context; that is, narrow-focus stimuli were sometimes accepted where a compound was required. However, the narrow-focus context was more likely to elicit an erroneous broad-focus choice than a compound choice (broad-focus patterns were chosen for 15% of narrow-focus contexts). Conversely, the broad-focus context was more likely to elicit an erroneous narrow-focus choice than a compound choice (narrow-focus patterns were chosen for 23.8% of broad-focus contexts). This indicates that native listeners tended to confuse the two phrasal patterns, arguably because of their comparable duration and/or a possible lack of pitch accent contrast between some narrow- and broad-focus tokens. By contrast, both groups of non-native listeners tended to confuse the broad-focus and compound patterns. They tended to choose compound patterns for the broad-focus context (beginners: 25.4%, advanced: 21.6%) and, conversely, broad-focus patterns for the compound context (beginners: 28.8%, advanced: 33.5%). We argue that, prosodically, this confusion may be due to the lack of pitch prominence on the second element of many broad-focus stimuli, resulting from the double accent and particularly from the pitch declination effect in utterance-final position, which made some broad-focus tokens sound similar to compound tokens. Note that the first element of the broad-focus pattern is also accented (a prenuclear accent), so the second accented element needs a very sharp rise to be more prominent, and in many tokens this rise can be cancelled perceptually by declination. There were certainly duration contrasts between the two patterns, but Vietnamese learners were not sensitive to duration cues. This error pattern might also stem from the different word order of compounds and noun phrases in the learners' native language: the order is adjective-noun in English but noun-adjective in Vietnamese. This contrast between the two languages may therefore have contributed to the Vietnamese listeners' confusion and misperception.

Table 5
The percentage of responses for each stress pattern across three listener groups

                    Stimulus stress pattern chosen
Target context      Broad (%)    Narrow (%)    Compound (%)
Beginning listeners (12 items × 2 speakers × 80 listeners = 1920 items)
Broad               51.4         23.1          25.4
Narrow              14.4         63.8          21.7
Compound            28.8         21.9          49
Advanced listeners (12 items × 2 speakers × 20 listeners = 480 items)
Broad               59.3         18.9          21.6
Narrow              8.5          84.7          6.6
Compound            33.5         15.8          50.4
Native English listeners (12 items × 2 speakers × 29 listeners = 696 items)
Broad               55.4         23.8          20.6
Narrow              15           77.7          7.1
Compound            8.6          23.9          67.3

The column labels indicate the subjects' choice of stimulus stress patterns in response to the target contexts, indicated by the row labels.

3.3.3.2. Poor perceptual discrimination of the native listener group. The relatively poor performance of the native listener group in assigning stress patterns to their appropriate elicitation contexts requires comment. We have previously noted the superior performance of a simple linear discriminator (which obviously makes no use of context) in correctly identifying the prosodic class membership of the stimulus tokens. It is also remarkable that non-native listeners with high-level L2 functioning should perform as well overall as native listeners on a phonological pattern recognition task. This result calls for critical analysis of the listening task.
As one careful reviewer noted, performance on the perceptual task depended not only on the listener’s ability to distinguish the three prosodic patterns but also on correctly interpreting the pragmatic or semantic significance of the context sentence. The importance of the latter was probably enhanced by the testing method, which offered a single context sentence per trial and asked the listener to choose the appropriate stress pattern (Compound, Narrow or Broad) from three candidates. A post hoc analysis of the felicity conditions for the C, N and B responses was therefore undertaken, with particular reference to the form and content of the triggering contexts in relation to the target sentences. It is important to consider specifically how the felicity conditions were instantiated lexically and syntactically in the stimulus materials, because such features may serve as cues for native or non-native response strategies. Non-native listeners may be able to form response strategies on the basis of lexical or formal cues present in the stimulus materials without fully appreciating the pragmatic conditions that render an N, B or C reading felicitous or, in some cases, without appreciating ambiguities that make more than one of these responses possible in context. The felicity condition on the selection of a C target is semantic in nature and critically dependent on lexical knowledge. A C target is unambiguously signalled when it has a lexical (non-compositional) meaning that is compatible with that of the context sentence and its phrasal counterpart has a compositional
reading that is incompatible with the context (e.g., Context: ‘… house … for growing plants.’ Target = ‘hothouse’; *‘hot house’). Post hoc analysis revealed that eight of the 12 C context–target pairs met this condition but four did not (e.g., Context: ‘… woman teaches English.’ Target = ‘English-teacher’, but ‘English teacher’ is not ruled out; see also C items 6, 7 and 8 in Appendix B). Removing these four items from the analysis did not, however, substantially change the pattern of results reported in the ANOVAs in Section 3.3.2.1. Nevertheless, lexical familiarity probably did affect the non-native listeners’ performance on C targets relative to that of the native listeners, and this, rather than insensitivity to the temporal cue for compound words in English, may be a factor in the Vietnamese listeners’ higher error rates on C targets (see Table 5). An N target is unambiguously signalled, at least in the experimental materials, when it has a compositional reading that countermands some attribution of the subject in the context sentence, expressed by negation together with a synonym or antonym of the adjective used in the target sentence (e.g., Context: ‘… house … not cold.’ Target = ‘hot house’; *‘hot-house’). The consistent use of the negative (‘not’ or verb+n’t) was probably a highly salient structural cue signalling an N target for the non-native listeners, even though they may not have fully appreciated the countermanding function of contrastive intonation. It should also be noted that the context sentence carried an intonational cue to the N target that was absent for the B and C targets, namely a contrastive pitch accent, which was also signalled orthographically in the written stimulus materials by italics (e.g., ‘This bottle isn’t colored yellow’).
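The post hoc analysis implies that a listener could score well on some conditions from the wording of the context sentence alone: negation (‘not’ or verb+n’t) points to an N target, and, for B targets, a context that repeats the words of the target phrase (the ‘mirror contexts’ discussed in connection with the broad-focus condition). The hypothetical Python responder sketched here makes that strategy explicit. It is an illustration, not the authors’ model: the word-overlap test for ‘mirroring’ and the rule that everything else defaults to C are assumptions, and the function names are invented for the example.

```python
import re

# Hypothetical cue-based responder (illustration only, not the authors' model):
# it ignores the prosody of the target sentences and decides from the wording
# of the context sentence alone.


def tokens(sentence: str) -> list[str]:
    """Lower-cased word tokens; typographic apostrophes are normalized to ASCII."""
    return re.findall(r"[a-z']+", sentence.lower().replace("\u2019", "'"))


def cue_based_response(context: str, target: str) -> str:
    """Return 'N', 'B' or 'C' from surface cues in the context sentence."""
    words = tokens(context)
    # Negation ('not' or verb + n't) is taken as the cue for a narrow-focus target.
    if "not" in words or any(w.endswith("n't") for w in words):
        return "N"
    # Assumed 'mirror' test: the context repeats every word of the target phrase.
    if set(tokens(target)) <= set(words):
        return "B"
    # Assumed default: anything else is treated as a compound reading.
    return "C"


# Context sentences for items 1 and 5 of Appendix B, with the intended response.
ITEMS = [
    ("This berry is black.", "black berry", "B"),
    ("This berry isn’t green.", "black berry", "N"),
    ("This is a kind of fruit.", "black berry", "C"),
    ("These houses are very hot.", "hot houses", "B"),
    ("These houses are not cold.", "hot houses", "N"),
    ("These houses are for growing plants.", "hot houses", "C"),
]

for context, target, intended in ITEMS:
    print(f"{context:40} -> {cue_based_response(context, target)} (intended {intended})")
```

For these six contexts the responder reproduces every intended response without ever hearing the targets. A literal word-overlap test is cruder than the mirroring described in the text, however: the broad context for item 3, ‘This teacher is from England.’, does not repeat ‘English’, so the sketch answers C there.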
The felicity condition for the B (broad-focus) phrase is difficult to state precisely, because the broad-focus reading is the default, which applies when the context demands no particular narrow focus. The broad-focus context sentences used in the experiment could not strictly rule out a compound reading of the target. (After all, it may truly be said of most blackberries that ‘This berry is black’.) Nor could the broad-focus triggers strictly be said to exclude a narrow-focus reading of the target, which can still produce a felicitous mini-discourse (This berry is black. It’s a black berry! (emphatic)). Non-native listeners may not have appreciated these ambiguities and may simply have followed a response strategy of choosing the B target whenever the context sentence mirrored the target (e.g., Context: ‘… berry is black.’ Target = ‘black berry’; the C and N targets never occur with such ‘mirror contexts’ in the stimulus set). Failure to appreciate the potential ambiguity of the broad-focus contexts may explain why the Vietnamese learners performed as well as or better than the native English listeners in this condition. In sum, an item analysis of the target sentences and their contexts revealed a complex set of moderating conditions, involving the pragmatics of focus assignment and lexical-semantic knowledge of English, that potentially affected listeners’ responses in addition to their ability to discriminate the target stress patterns phonetically. It also showed that non-native listeners may have formulated response strategies specific to the properties of the item set used in the experiment. Teasing out the operation of these factors would clearly require a series of carefully designed control experiments, including at least a proportion of distracter items to discourage listeners from developing stimulus-set-specific response strategies.

4. General discussion

While the results of the perceptual study are open to alternative interpretations, they are consistent with the findings of the production test in indicating that tonal transfer effects can explain beginner learners’ responses to the three contrastive stress patterns and that, as L2 competence in English increases, Vietnamese learners acquire sensitivity to the temporal cues that differentiate (compound) words from otherwise segmentally homophonous phrasal constructions in English. Indeed, it may be argued that the production task provided a clearer window on (interlanguage) perceptual representations of the three prosodic patterns than did the perceptual experiment, because the former was uncontaminated by the pragmatic and lexical-semantic factors that moderated listeners’ responses in the perceptual judgement task. Conversely, it might be argued that the production task failed to tap a more abstract level of perceptual representation, because the subjects’ task was simply to mimic short context+target sentence pairs, for which long-term phonological representations of the L2 items are not required. However, such reasoning has difficulty accounting for the improved pattern-matching in the temporal domain evident in the productions of the advanced learners,
compared with the beginner group. Further experimentation, with variable time delays or an intervening task between the presentation of the stimulus items and the elicited imitations, may provide relevant data on this question.

5. Conclusion and prospectus

In summary, the acoustic analysis of the production data provided clear support for what is probably the widely accepted view that a combination of f0 and timing cues serves to distinguish minimal prosodic triplets of compound words and their broad- and narrow-focus phrasal counterparts. Compounds as a whole were shorter than phrases, which we attribute to a word-shortening effect. The narrow-focus noun phrases were marked by a large f0 change (a mean of 80 Hz from the first to the second constituent). In the native speakers’ data, intensity cues played a supporting role to f0 perturbations (intensity change correlated with f0 change). The results also lend some support to the conventional phonological analysis that compounds and early narrow-focus phrases share the distinguishing property of being prosodically left-headed, in contrast to the default or broad-focus phrasal stress pattern. However, for the maximally contrastive but nevertheless natural stimulus set used in the present experiment, the f0 change measurements also indicated that a three-way contrast (N > C > B) may be phonetically supported. The differences in mean f0 change that separate the centroids of the three stress patterns produced by native speakers (Broad: 10 Hz; Compound: 39 Hz; Narrow: 80 Hz) clearly exceed both the f0 discrimination limen for complex tones and the f0 limen for categorical tone identification (Gandour, 1978). In particular, the acoustic analysis of the native English speakers’ data also supports the claim that compounding involves phrases conforming to the temporal and accentual word template of stress-accent languages.
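The claim that the three patterns are separable on f0 alone can be illustrated with a one-dimensional nearest-centroid rule. This Python sketch assumes that only the magnitude of the f0 change between the two constituents matters and uses the native-speaker means reported above (10, 39 and 80 Hz) as centroids; it ignores duration and intensity and is not the linear discriminator used in the study. No limen value is assumed: the hypothetical `all_exceed` helper takes whatever threshold the reader adopts from Gandour (1978).

```python
from itertools import combinations

# Native-speaker centroids of f0 change between the two constituents (Hz),
# as reported in the text. Only the size of the change is used here.
CENTROIDS_HZ = {"Broad": 10.0, "Compound": 39.0, "Narrow": 80.0}


def classify(f0_change_hz: float) -> str:
    """Assign a token to the stress pattern with the nearest f0-change centroid."""
    return min(CENTROIDS_HZ, key=lambda pattern: abs(f0_change_hz - CENTROIDS_HZ[pattern]))


def separations() -> dict[tuple[str, str], float]:
    """Absolute distance in Hz between every pair of centroids."""
    return {
        (a, b): abs(CENTROIDS_HZ[a] - CENTROIDS_HZ[b])
        for a, b in combinations(CENTROIDS_HZ, 2)
    }


def all_exceed(limen_hz: float) -> bool:
    """True if every pairwise separation exceeds a limen supplied by the caller."""
    return all(distance > limen_hz for distance in separations().values())


print(classify(30.0))  # Compound
print(separations())
```

With these centroids the decision boundaries fall at the midpoints, 24.5 Hz and 59.5 Hz, and the smallest separation (Broad vs. Compound) is 29 Hz; whether all three separations exceed a given limen depends only on the threshold passed to `all_exceed`.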
With respect to f0 cues, both groups of non-native speakers generally had no problem manipulating contrastive levels of f0 and intensity on accent-bearing syllables, as a result of positive transfer from lexical tonal pitch. With respect to timing, only the advanced speakers distinguished compounds from phrases by means of a duration contrast, whereas the beginners failed to use the duration cue because duration is not a distinctive tonal feature in Vietnamese. With respect to contrastive relative prominence, the beginners also failed to realize the syntagmatic contrasts of accent in larger units such as polysyllabic words or phrases, as evidenced by their failure to deaccent the second element of the compound and narrow-focus patterns; this failure is causally related to their failure to compress compounds. That is, their compounds were of comparable duration to phrases for several reasons. Compounds were unreduced as a result of (1) accentual lengthening, since the nouns were not deaccented, and (2) the failure to reduce unstressed syllables. A further contributing transfer effect is that L1 Vietnamese lacks a reliable prosodic difference between compounds and phrases, whether by temporal compression or by tonal neutralization. These results also suggest transfer of the paradigmatic tonal pattern, in which a lexical tone is preserved on each syllable, indicating prosodic transfer at both the phonological and the phonetic level; this is consistent with Ueyama and Jun’s (1998) findings of different interference effects attributable to the post-focus tonal patterns of L1 Japanese and L1 Korean. The results of this study not only confirm the transfer of acoustic prosodic cues (Nguyễn & Ingram, 2005; Ueyama, 2000) and of L1 phonetic f0 patterns per se (McGory, 1997; Ueyama & Jun, 1998) but also suggest the transfer of functional (phonological) prosodic patterns (i.e., paradigmatic vs. syntagmatic f0 contrast).
On the other hand, the advanced speakers’ ability to deaccent the noun in the narrow-focus and compound patterns and, to some extent, to compress compound words indicates the effect of language learning/experience on prosodic acquisition.

Acknowledgements

We would like to thank Prof. Mary Beckman and the anonymous reviewers for their constructive and helpful comments. We also thank our subjects for their participation and Jeffery Chapman for ToBI transcription of part of the data. The postdoctoral research fellowship granted to N.T.A.T. by the University of Queensland is gratefully acknowledged.

Appendix A. Three types of test words/phrases

1 syllable + 2 syllables: black berry, blue bottle, gray matter, hot houses
2 syllables + 1 syllable: moving van, butter fish, heavy weight, rubber plant
2 syllables + 2 syllables: English teacher, sleeping partner, open classroom, plastic money

Appendix B. List of minimal sets of sentences

1. black berry
   a. This berry is black. It is a black berry.
   b. This berry isn’t green. It’s a black berry.
   c. This is a kind of fruit. It is a black-berry.
2. blue bottle
   a. This is a bottle which is colored blue. It is a blue bottle.
   b. This bottle isn’t colored yellow. It’s a blue bottle.
   c. This kind of jelly fish is quite common here. It is a blue-bottle.
3. English teacher
   a. This teacher is from England. She is an English teacher.
   b. This teacher is not from France. She’s an English teacher.
   c. This woman teaches English. She is an English-teacher.
4. gray matter
   a. This person has a lot of gray stuff. He has a lot of gray matter.
   b. This person hasn’t a lot of green matter. He has a lot of gray matter.
   c. This person is very brainy. He has a lot of gray-matter.
5. hot houses
   a. These houses are very hot. They are hot houses.
   b. These houses are not cold. They’re hot houses.
   c. These houses are for growing plants. They are hot-houses.
6. moving van
   a. He is driving the van. It is a moving van.
   b. This van is not parked. It’s a moving van.
   c. This is a van for moving furniture. It is a moving-van.
7. sleeping partner
   a. This is her partner who is asleep. He is her sleeping partner.
   b. This is not her partner who is awake. He’s her sleeping partner.
   c. This is the person she sleeps with. He is her sleeping-partner.
8. heavy weight
   a. This man is heavy to carry. He’s a heavy weight.
   b. The man isn’t light to carry. He’s a heavy weight.
   c. He is the boxer in the heaviest weight group. He’s a heavy weight.
9. butter fish
   a. This fish is made from butter. It’s a butter fish.
   b. This fish is not made from flour. It’s a butter fish.
   c. This is a kind of tropical fish. It’s a butter fish.
10. plastic money
   a. This money is made from plastic. It’s plastic money.
   b. This is not paper money. It’s plastic money.
   c. This is a credit card. It’s plastic money.
11. rubber plant
   a. This plant is made from rubber. It’s a rubber plant.
   b. This is not a real plant. It’s a rubber plant.
   c. This is a kind of tree. It’s a rubber plant.
12. open classroom
   a. The classroom is open. It’s an open classroom.
   b. This classroom is not closed. It’s an open classroom.
   c. There are no rows of desks here. It’s an open classroom.

Appendix C. The answer sheet for the perception experiment

Instructions for the test

1. This stocking is blue. It is a blue STOCKing.
2. This stocking is not green. It is a BLUE stocking.
3. This woman is a feminist. She is a blue-stocking.

“blue stocking” in the above three sentences has a different meaning depending on its stress pattern.
- “blue STOCKing” in (1) is a broad-focus noun phrase. The stress is on “stock”. It means a stocking which is blue in color.
- “BLUE stocking” in (2) is a narrow-focus noun phrase in which the word “BLUE” is emphasized to show the contrast with “green”; it means that the stocking is blue, not green, and thus BLUE is stressed.
- “blue-stocking” in (3) is a compound noun with lexical stress on “blue”. It is a word, not a phrase. It means “a feminist”.

Notice the three different stress patterns: ′blue ″STOCKing; ″BLUE ′stocking; ′blue-stocking.

In the following test, you will hear a context sentence followed by three different target sentences. Your task is to choose the target sentence with the stress pattern that fits the meaning of the context sentence. The context sentence will be read once and the three target sentences will be read twice. Circle the letter of your choice.

1. This berry is black.   A   B   C
2. This bottle isn’t colored yellow.   A   B   C
3. This person is very brainy.   A   B   C
4. These houses are very hot.   A   B   C
5. This van is not parked.   A   B   C
6. This is a kind of tropical fish.   A   B   C
7. This man is heavy to carry.   A   B   C
8. This is not a real plant.   A   B   C
9. This woman teaches English.   A   B   C
10. This is her partner who is asleep.   A   B   C
11. This is not paper money.   A   B   C
12. There are no rows of desks here.   A   B   C
13. This berry is black.   A   B   C
14. This bottle isn’t colored yellow.   A   B   C
15. This person is very brainy.   A   B   C
16. These houses are very hot.   A   B   C
17. This van is not parked.   A   B   C
18. This is a kind of tropical fish.   A   B   C
19. This man is heavy to carry.   A   B   C
20. This is not a real plant.   A   B   C
21. This woman teaches English.   A   B   C
22. This is her partner who is asleep.   A   B   C
23. This is not paper money.   A   B   C
24. There are no rows of desks here.   A   B   C
25. This berry isn’t green.   A   B   C
26. This kind of jelly fish is common here.   A   B   C
27. This person has a lot of gray stuff.   A   B   C
28. These houses are not cold.   A   B   C
29. This van is for moving furniture.   A   B   C
30. This fish is made from butter.   A   B   C
31. The man isn’t light to carry.   A   B   C
32. This is a kind of tree.   A   B   C
33. This teacher is from England.   A   B   C
34. This is not her partner who is awake.   A   B   C
35. This is a credit card.   A   B   C
36. The classroom is open.   A   B   C
37. This is a kind of fruit.   A   B   C
38. This is a bottle which is colored blue.   A   B   C
39. This person hasn’t a lot of green matter.   A   B   C
40. These houses are for growing plants.   A   B   C
41. He is driving the van.   A   B   C
42. This fish is not made from flour.   A   B   C
43. He is a boxer in the heaviest weight group.   A   B   C
44. This plant is made from rubber.   A   B   C
45. This teacher is not from France.   A   B   C
46. This is the person she sleeps with.   A   B   C
47. This money is made from plastic.   A   B   C
48. This classroom is not closed.   A   B   C
49. This is a kind of fruit.   A   B   C
50. This is a bottle which is colored blue.   A   B   C
51. This person hasn’t a lot of green matter.   A   B   C
52. These houses are for growing plants.   A   B   C
53. He is driving the van.   A   B   C
54. This fish is not made from flour.   A   B   C
55. He is a boxer in the heaviest weight group.   A   B   C
56. This plant is made from rubber.   A   B   C
57. This teacher is not from France.   A   B   C
58. This is the person she sleeps with.   A   B   C
59. This money is made from plastic.   A   B   C
60. This classroom is not closed.   A   B   C
61. This berry isn’t green.   A   B   C
62. This kind of jelly fish is common here.   A   B   C
63. This person has a lot of gray stuff.   A   B   C
64. These houses are not cold.   A   B   C
65. This van is for moving furniture.   A   B   C
66. This fish is made from butter.   A   B   C
67. The man isn’t light to carry.   A   B   C
68. This is a kind of tree.   A   B   C
69. This teacher is from England.   A   B   C
70. This is not her partner who is awake.   A   B   C
71. This is a credit card.   A   B   C
72. The classroom is open.   A   B   C

References

Archibald, J. (1998). Second language phonetics, phonology, and typology. Studies in Second Language Acquisition, 20, 189–211.
Atkinson-King, K. (1973). Children’s acquisition of phonological stress contrasts. Unpublished doctoral dissertation, University of California, Los Angeles.
Barker, C. (1989). Extrametricality, the cycle, and Turkish word stress. In J. Runner (Ed.), Phonology at Santa Cruz. UC Santa Cruz: Syntax Research Center.
Bartels, C., & Kingston, J. (1994). Salient pitch cues in the perception of contrastive focus. In P. Bosch, & R. van der Sandt (Eds.), Focus and natural language processing: Proceedings of the Journal of Semantics conference on focus. IBM working papers, TR-80.94-006.
Beckman, M. E. (1986). Stress and non-stress accent. Dordrecht, The Netherlands: Foris Publications.
Beckman, M., & Ayers, G. (1994). Guidelines for ToBI labelling. Unpublished manuscript, Ohio State University. Version 3, March 1997. Downloadable manuscript: http://ling.ohio-state.edu/Phonetics/etobi_homepage.html. For information on obtaining by ftp, send e-mail to [email protected] and visit http://ling.ohio-state.edu/tobi/.
Beckman, M. E., & Pierrehumbert, J. B. (1986). Intonational structure in Japanese and English. Phonology Yearbook, 3, 255–309.
Best, C. T. (1995). A direct realist view of cross-language speech perception. In W. Strange (Ed.), Speech perception and linguistic experience: Issues in cross-language research. Baltimore: York Press.
Blair, A. D., & Ingram, J. (2003). Learning to predict the phonological structure of English loanwords in Japanese. Applied Intelligence, 19, 101–108.
Bolinger, D. L., & Gerstman, L. J. (1957). Disjuncture as a cue to constructs. Word, 13, 246–256.
Brunelle, M. (2003). Coarticulation effects in Northern Vietnamese tones. In Proceedings of the 15th international congress of phonetic sciences, Barcelona, 3–9 August.
Cassidy, S. (1999). Compiling multi-tiered speech databases into the relational model: Experiments with the Emu System. In Proceedings of Eurospeech ’99, Budapest, September 1999.
Chafe, W. (1974). Language and consciousness. Language, 50, 111–133.
Chen, M. Y. (2000). Tone sandhi: Patterns across Chinese dialects. Cambridge, UK: Cambridge University Press.
Couper-Kuhlen, E. (1984). A new look at contrastive intonation. In R. Watts, & U. Weidman (Eds.), Modes of interpretation: Essays presented to Ernst Leisi (pp. 137–158). Gunter Narr Verlag.
Dauer, R. M. (1983). Stress-timing and syllable-timing reanalyzed. Journal of Phonetics, 11, 51–62.
Farnetani, E., & Cosi, P. (1988). English compound versus non-compound noun phrases in discourse: An acoustic and perceptual study. Language and Speech, 31, 157–180.
Faure, G., Hirst, D. J., & Chafcouloff, M. (1980). Rhythm in English: Isochronism, pitch, and perceived stress. In L. R. Waugh, & C. H. van Schooneveld (Eds.), The melody of language (pp. 71–79). Baltimore: University Park Press.
Flege, J. E. (1995). Second language speech learning: Theory, findings and problems. In W. Strange (Ed.), Speech perception and linguistic experience: Issues in cross-language research. Baltimore: York Press.
Fletcher, J., & Harrington, J. (2001). High-rising terminals and fall–rise tunes in Australian English. Phonetica, 58(4), 215–229.
Fry, D. B. (1955). Duration and intensity as physical correlates of linguistic stress. Journal of the Acoustical Society of America, 27, 765–768.
Gandour, J. (1974). On the representation of tone in Siamese. UCLA Working Papers in Phonetics, 27, 118–146.
Gandour, J. T. (1978). The perception of tone. In V. A. Fromkin (Ed.), Tone: A linguistic survey (pp. 41–72). New York: Academic Press.
Garde, P. (1965). Accentuation et morphologie. La Linguistique, 2, 24–39.
Gsell, R. (1980). Remarques sur la structure de l’espace tonal en vietnamien du sud (parler de Saigon). Cahiers d’études vietnamiennes, 4, 1–26. Université Paris.
Hardcastle, W. J. (1968). Stress in Australian English. Unpublished M.A. thesis, University of Queensland.
Hồ, Đắc Túc (1997). Tonal facilitation of code-switching. Australian Review of Applied Linguistics, 20(2), 129–151.
Hoàng, Tuệ, & Hoàng, Minh (1975). Remarques sur la structure phonologique du vietnamien. Études Vietnamiennes, 40. Hanoi.
Ingrisano, D., & Weismer, G. (1979). s-Duration: Methodological and linguistic factors. Phonetica, 36, 32–43.
Inkelas, S. (1999). Exceptional stress-attracting suffixes in Turkish: Representations vs. the grammar. In W. Zonneveld (Ed.), The prosody–morphology interface (pp. 134–187). Cambridge: Cambridge University Press.
Iverson, P., Kuhl, P. K., Akahane-Yamada, R., Diesch, E., Tohkura, Y., Kettermann, A., et al. (2003). A perceptual interference account of acquisition difficulties for non-native phonemes. Cognition, 87, B47–B57.
Jackendoff, R. (1972). Semantic interpretation in generative grammar. Cambridge: MIT Press.
Jannedy, S. (1997). Acquisition of narrow focus prosody. In Proceedings of the GALA ’97 conference: Language acquisition, knowledge representation & processing.
Kubozono, H. (1993). The organization of Japanese prosody. Tokyo: Kurosio Publishers.
Kuhl, P. K. (1993). Innate predispositions and the effects of experience in speech perception: The native language magnet theory. In B. de Boysson-Bardies, et al. (Eds.), Developmental neurocognition: Speech and face processing in the first year of life. Dordrecht: Kluwer Academic Publishers.
LaCharité, D., & Paradis, C. (2005). Category preservation and proximity versus phonetic approximation in loanword adaptation. Linguistic Inquiry, 36(2), 223–258.
Ladd, D. R. (1980). The structure of intonational meaning. Bloomington: Indiana University Press.
Ladd, D. R., & Morton, R. (1997). The perception of intonational emphasis: Continuous or categorical? Journal of Phonetics, 25, 313–342.
Levi, S. V. (2002). Intonation of Turkish noun compounds and genitives. In Ninth international phonology meeting on structure and melody, 1–3 November, Vienna.
Levi, S. V. (2005). Acoustic correlates of lexical accent in Turkish. Journal of the International Phonetic Association, 35, 73–97.
McGory, J. T. (1997). Acquisition of intonational prominence in English by Seoul Korean and Mandarin Chinese speakers. Unpublished Ph.D. dissertation, Ohio State University.
Mennen, I. (2004). Bi-directional interference in the intonation of Dutch speakers of Greek. Journal of Phonetics, 32, 543–563.
Michaud, A., & Vu, Ngoc Tuân (2004). Glottalized and nonglottalized tones under emphasis: Open quotient curves remain stable, f0 curve is modified. In Bernard, B., & Isabelle, M. (Eds.), Speech prosody: International conference (pp. 745–748), Nara, Japan, March 23–26. ISCA Archive.
Nguyễn, Đăng Liêm (1970). A contrastive phonological analysis of English and Vietnamese (Pacific linguistics series, no. 8). Canberra: Australian National University.
Nguyễn, Đình Hoà (1980). Language in Vietnamese society. Chicago: University of Illinois Press.
Nguyễn, Thị Anh Thư (2003). Prosodic transfer: The tonal constraints on Vietnamese acquisition of English stress and rhythm. Ph.D. thesis, University of Queensland, Australia.
Nguyễn, T. A. T., & Ingram, J. (2004). A corpus-based analysis of transfer effects and connected speech processes in Vietnamese English. In Proceedings of the tenth Australian international conference on speech science & technology, Macquarie University, Sydney, 8–10 December.
Nguyễn, T. A. T., & Ingram, J. (2005). Vietnamese acquisition of English word stress. TESOL Quarterly, 39(2), 309–319.
Nguyễn, T. A. T., & Ingram, J. C. (2007). Acoustic and perceptual cues for compound–phrasal contrasts in Vietnamese. The Journal of the Acoustical Society of America, 122(3), 1746–1757.
Nguyễn, V. L., & Edmondson, J. (1997). Tones and voice quality in modern northern Vietnamese: Instrumental case studies. Mon-Khmer Studies, 28, 1–18.
Phạm, Hoa Andrea (2003). Vietnamese tone: A new analysis. New York: Routledge.
Pierrehumbert, J., & Hirschberg, J. (1990). The meaning of intonational contours in the interpretation of discourse. In P. Cohen, J. Morgan, & M. Pollack (Eds.), Intentions in communication. Cambridge, MA: MIT Press.
Pittam, J., & Ingram, J. (1991). Influence of Vietnamese tone and prosody on the acquisition of English stress patterns. In Proceedings of the second European conference on speech communication and technology.
Riney, T. J. (1988). The interlanguage phonology of Vietnamese English. Unpublished Ph.D. dissertation, Georgetown University.
Rossi, M. (1998). Intonation in Italian. In D. Hirst, & A. Di Cristo (Eds.), Intonation systems: A survey of twenty languages. Cambridge: Cambridge University Press.
Silverman, D. (1992). Multiple scansions in loanword phonology: Evidence from Cantonese. Phonology, 9, 289–328.
Thompson, L. (1987). A Vietnamese reference grammar. Honolulu: University of Hawaii Press.
Trubetskoy, N. S. (1939). Grundzüge der Phonologie (Travaux du Cercle linguistique de Prague No. 7). Prague: Cercle linguistique de Prague. [Translated 1969 by C. A. M. Baltaxe as Principles of phonology. University of California Press.]
Ueyama, M. (2000). Prosodic transfer: An acoustic study of L2 English vs. L2 Japanese. Unpublished Ph.D. thesis, University of California, Los Angeles.
Ueyama, M., & Jun, S.-A. (1998). Focus realization in Japanese English and Korean English intonation. Japanese and Korean Linguistics, 7, 629–645.
Umeda, N. (1977). Consonant duration in American English. Journal of the Acoustical Society of America, 60, 846–858.
Vogel, I., & Raimy, E. (2002). The acquisition of compound vs. phrasal stress: The role of prosodic constituents. Journal of Child Language, 29, 225–250.
Vũ, Thanh Phương (1981). The acoustic and perceptual nature of tone in Vietnamese. Unpublished Ph.D. thesis, Australian National University, Canberra.
Vũ, Thanh Phương (1982). Phonetic properties of Vietnamese tones across dialects. Papers in South-East Asian Linguistics, 8, 55–76.
Willems, N. (1982). English intonation from a Dutch point of view. Dordrecht: Foris Publications.