Stress, lexical focus, and segmental focus in English: patterns of variation in vowel duration


Journal of Phonetics 32 (2004) 493–516 www.elsevier.com/locate/phonetics

Kenneth de Jong
Department of Linguistics, 322 Memorial Hall, Indiana University, Bloomington, IN 47405, USA
Received 31 July 2003; received in revised form 22 April 2004; accepted 10 May 2004

Abstract

This paper reports a study of the effects of stress and two types of focus on vowel duration and quality in English. Our previous work on Arabic found that, while factors such as phonemic quantity and the voicing of a following consonant both affect vowel duration, these effects differ in how they interact with stress and focus: quantity effects are larger under stress and focus, while voicing effects remain about the same across stress and focus conditions. These results suggest that vowel duration differences due to vowel quantity indicate a linguistic contrast, whereas vowel duration differences due to consonant voicing do not. The current paper tests this interpretation by extending the previous studies of Arabic to a similar corpus in English. This paper finds durational differences between vowel categories and between vowels preceding voiced and voiceless stops. As in Arabic, stress increases differences due to vowel category. Unlike in Arabic, however, stress also increases vowel duration differences due to the voicing of the following consonant. Focus effects are mediated by stress, such that increases in durational differences are localized largely in syllables that are primary stressed and accented. The results show that both stress and focus can be used to distinguish contrastive from noncontrastive aspects of speech behavior.

© 2004 Elsevier Ltd. All rights reserved.

Keywords: Stress; Focus; Quantity; Voicing; Vowel duration

doi:10.1016/j.wocn.2004.05.002

1. Introduction

de Jong and Zawaydeh (2002) probed the nature of stress and two different kinds of linguistic focus by examining vowel duration variation in Arabic. Previous work on linguistic stress has found it useful to characterize stress as an increase in articulatory activity aimed at producing


phonologically distinct output (e.g., Tiffany, 1958; Ohman, 1967; Kent & Netsell, 1971; Harris, 1978; Engstrand, 1988; de Jong, 1995; as well as related precursors in descriptions of English as far back as Walker, 1787). de Jong (1995) specifically defined stress in terms of ‘hyperarticulation’, based on Lindblom (1990), who used this term to describe speech styles that appear to be more ‘output oriented’. In Lindblom’s description, speech production is shaped both by system-derived pressures toward economization and by the requirements of communicating linguistic information to a listener. Hyperarticulation is said to occur when the system-derived pressures become less effective because speakers attend more to expressing linguistic information. Stress languages, then, are languages in which speakers modulate attention from syllable to syllable, and hence listeners must also modulate attention from syllable to syllable (de Jong, 2000; see also perceptual data in Cole, Jakimik, & Cooper, 1978; Cole & Jakimik, 1980). Defining stress as localized hyperarticulation makes it very similar to ‘focus’, specifically ‘contrastive focus’ as discussed by Gundel (1999), and to ‘emphasis’, as explored, e.g., in experimental work by Erickson and colleagues (Erickson, Fujimura, & Pardo, 1998; Erickson, 1998, 2002). In studies that examine focus and emphasis, a discourse structure is invoked in which an uttered item is expected to bear some close relationship to a specific item in the discourse, often a relationship of minimal semantic or phonological contrast. This relationship attracts the speaker’s attention to that spoken item, inducing the speaker both to devote more attention to articulating it and to mark it as something the listener should attend to. Hence, the description of stress in de Jong (1995) suggested that linguistic focus and linguistic stress should be essentially the same phenomenon, although with some differences.
While stress is usually taken to be a property of a particular syllable, focus effects could be global, as in Lindblom’s original use of the term ‘hyperarticulation’, or they could apply at the level of the word or even of the individual segment (e.g., as examined by van Heuven, 1994). Besides the domain of the effect, stress and focus also differ in that stress is conventionalized, lexicalized attention modulation, while focus (and emphasis) is actively controlled by the ongoing discourse. We take stress to be a feature of the hierarchically organized prosodic conventions of some languages, whereby syllables that are heads of prosodic constituents are systematically hyperarticulated in addition to being marked with qualitative features such as pitch accents. Beckman (1986; see also work presented in Beckman & Pierrehumbert, 1986; Beckman & Edwards, 1994) describes the English system as having lexically specific differences between unstressed and stressed syllables, together with a lexical mark indicating the expected location of pitch accents. Whether an accent is actually present, though, depends on the larger prosodic structure in which the lexical item is found. This system thus indicates (at least) three levels of stress: unstressed, stressed but not accented (which we will call ‘secondary stress’), and stressed and accented (which we will call ‘primary stress’). Beyond these three, stress also differs between syllables that are heads of intonational phrases (‘nuclear accented’) and those that are not. When an item is focused (or emphasized), the phonological conventions indicate how to place the focused item in a prosodically strong position, usually the nuclear accented position. Hence, many of the effects of focusing may be general correlates of nuclear accenting.
However, the Arabic data in de Jong and Zawaydeh (2002) show that focusing has additional effects beyond those common to all prosodic heads; there were systematic differences between


contrastively focused nuclear accented items and nuclear accented items that are not contrastively focused. If stress, a correlate of being a prosodic head, is best characterized as hyperarticulation of particular syllables, we would expect these additional focus effects to be the same as the effects of stress.

Central to characterizing stress (and focus) as attention modulation is the notion of linguistic information. Some attributes of the signal must be due to systemic considerations in production, while others must be due to the encoding of linguistic information. Stress, like global hyperarticulation, should expand the latter while leaving the former unchanged or even reducing it. To examine stress and focus, de Jong and Zawaydeh (2002; see also earlier results reported in Zawaydeh & de Jong, 1999) measured the durations of vowels of different phonemic quantity and of vowels preceding voiced and voiceless stops. The term quantity refers to the systematic use within a language of vowel duration to indicate lexical contrasts. Quantity differences in duration are, by definition, effects of encoded linguistic information. The voicing effects on vowel duration, however, have been claimed not to be encoding effects, but rather to be due to some systemic effect in the process of encoding the voicing contrast in the consonant. Hence, although such differences are available in the signal to perceivers, speakers do not rely on them to express voicing contrasts. Mitleb (1984a, b) similarly claimed that vowel lengthening due to consonant voicing was not part of the phonological structure of Arabic. His claim was based on failures to find significant consonant voicing effects on Arabic vowel duration (Flege & Port, 1981; Mitleb, 1984a), and on findings of very consistent glottal pulsation in voiced consonant closures (Mitleb, 1984b).
This claim echoed similar claims by Zimmerman and Sapon (1958) and Chen (1970) that some languages use vowel duration effects for voicing contrasts and others do not. Arabic differs from English in such a way that, when Arabic speakers acquire English, they increasingly differentiate vowel durations before voiced and voiceless obstruents: as Arabic speakers become more fluent, their vowels before voiced consonants become longer relative to their vowels before voiceless consonants (Flege & Port, 1981; Munro, 1993). de Jong and Zawaydeh (2002) reasoned that stress and focus should expand the linguistically specified quantity differences but should either leave voicing differences unaffected or eliminate them, which is exactly what they found. Stress increased quantity differences (the difference between phonemically long and short vowels was larger in stressed syllables), whereas the voicing effects were the same in stressed and unstressed locations. These patterns differed from those found by de Jong (1991) for English speakers. That articulatory study examined stress and three other factors affecting vowel duration: postvocalic voicing, postvocalic continuancy, and compensatory shortening. Just as voiceless stops in Arabic shorten preceding vowels, English voiceless stops shortened preceding vowels. In addition, vowels before stops were shorter than vowels before fricatives, and vowels before consonant clusters were shorter than those before singleton codas. Like the Arabic voicing effect, the latter two durational effects were not affected by stress. Unlike the Arabic voicing effect, however, the English voicing effect was larger under stress. Taken together with the description of stress as local hyperarticulation, these results indicate that, as claimed by Chen (1970) and Mitleb (1984a, b), the durational effects are specified as part of the English voicing contrast but not as part of the Arabic voicing contrast. de Jong and Zawaydeh’s (2002) results for focus were mixed.
Focus was elicited by having the speaker produce forms in which an imagined listener had misheard a target item. Two sorts of focus were elicited in this manner: lexical focus, in which the speaker corrects a semantically


related item, and (following van Heuven, 1994) phonological focus, in which the speaker corrects a phonologically related item. In the latter task, speakers were asked sometimes to correct an error in vowel quantity and sometimes an error in consonant voicing. Speakers in the lexical focus condition did not modify vowel duration at all, but rather tended to increase the first formant, suggesting a general strategy of increasing vocal output rather than specific, targeted changes in vowel duration. Patterns for phonological focus, however, were similar to those found for stress: phonological focus on voicing did not increase the size of voicing effects, while phonological focus on quantity did increase the size of quantity effects. One difference between stress and phonological focus, however, was that the stress effects were much more stereotypical across speakers. The overall picture, then, is that stress is indeed like focus in that it involves a shift toward productions that increase the size of phonologically opposed differences in the signal. However, stress also differed from focus in two ways: (1) phonological oppositions were more sensitive to stress and phonological focus than to lexical focus, which affected segmental material alike regardless of its phonemic content; and (2) stress was much more consistent across speakers in its effects on duration and vowel quality, whereas speakers differed in how they implemented both kinds of focus.

There are several potential problems with these interpretations, however. The first is that the elicitation techniques and corpus used in de Jong (1991) were very different from those in de Jong and Zawaydeh (2002). Most problematically, stress location was moved by choosing different lexical items in de Jong and Zawaydeh (2002), but by manipulating contrastive focus in de Jong (1991).
Because the study of English conflated focus and stress, it is impossible to tell whether the durational effects found there are due to focus or to stress. In addition, the elicitation technique used in de Jong (1991) had speakers produce a corpus consisting entirely of minimally contrasting monosyllables in a semantically unrelated frame. This technique probably directed attention to the phonemic differences between the various types of monosyllables. Hence, a stress interaction may have been found for the English speakers and not for the Arabic speakers because the production task implicitly involved much more segmental focus in the English case. Second, while quantity, stress, and voicing all affect vowel duration, these factors very likely do so through different dynamic mechanisms. Summers (1987) and de Jong (1991) showed that the dynamics of stress-induced and voicing-induced vowel lengthening are quite different. Stress lengthening tends to extend the duration of the middle portion of the vowel and to expand the size of the formant excursions, while voicing lengthening tends to lengthen the latter portion of the vowel in a manner that suggests a slower consonant gesture for voiced consonants. Later articulatory studies (de Jong, 1991, 1995; Edwards, Beckman, & Fletcher, 1991; Harrington, Fletcher, & Roberts, 1995) suggested that this expansion of the middle portion of the stressed vowel is due to gestural rephasing, such that the closing gestures for the coda consonants are initiated later relative to the opening gesture for the vowel. A common side effect of this rephasing is to eliminate undershoot (or ‘truncation’; cf. Lindblom, 1963), an effect in which a gesture is reduced in magnitude because a subsequent contrary gesture arrests its movement before it approximates its target. In addition, it is possible that speakers target the opening gesture more extremely under stress.
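To make the notion of truncation more concrete, the following toy sketch models the vocalic opening gesture as a first-order exponential approach to its target that is arrested when the closing gesture begins. The first-order form, the function names, and the parameter values are illustrative assumptions, not the dynamical models used in the articulatory studies cited above; the sketch shows only that initiating the closing gesture later (rephasing) lets the opening gesture travel closer to its target, reducing undershoot.

```python
import math


def opening_peak(target_mm: float, tau_ms: float, closure_onset_ms: float) -> float:
    """Peak displacement of a toy first-order opening gesture (hypothetical model).

    The gesture follows x(t) = target * (1 - exp(-t / tau)) from t = 0 and is
    arrested (truncated) when the closing gesture starts at closure_onset_ms.
    """
    if tau_ms <= 0 or closure_onset_ms < 0:
        raise ValueError("tau_ms must be positive and closure_onset_ms non-negative")
    return target_mm * (1.0 - math.exp(-closure_onset_ms / tau_ms))


def undershoot(target_mm: float, tau_ms: float, closure_onset_ms: float) -> float:
    """Distance by which the truncated gesture falls short of its target."""
    return target_mm - opening_peak(target_mm, tau_ms, closure_onset_ms)


# Hypothetical values: a 10 mm opening target and a 40 ms time constant.
# Delaying closure onset from 60 ms to 100 ms (rephasing) reduces undershoot.
for onset_ms in (60.0, 100.0):
    print(onset_ms, round(opening_peak(10.0, 40.0, onset_ms), 2))
```

With these hypothetical values, the peak displacement rises from about 7.8 mm to about 9.2 mm when closure onset is delayed by 40 ms. The exponential form was chosen only for transparency; a critically damped second-order gesture would show the same qualitative trade-off.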
de Jong and Zawaydeh (2002) similarly found that the lengthening due to consonant voicing and the lengthening due to vowel quantity


had opposite subsidiary effects on F1: with the low vowels in the study,¹ both quantity and stress induced a higher F1 that correlated with vowel duration. For voicing, however, F1 values in the longer vowels preceding voiced stops were actually lower than in the shorter vowels preceding voiceless stops. If the mechanism of stress lies in the rephasing and retargeting of the vocalic opening and closing gestures, this effect might not interact with a voicing effect that is largely localized in the closing gesture. It is possible, then, that Arabic voicing and quantity interacted differently with stress because they involve different sorts of dynamic effects. Third, besides differing in whether stress and focus interacted with them, the voicing and quantity differences in Arabic were very different in size: quantity differences were much larger than voicing differences. Given this difference in scale, stress and focus may in fact have interacted with both quantity and voicing in the Arabic data, but the interactions with the very small voicing effects may have been swamped by other uncontrolled factors.

To examine these possibilities, the current study seeks to replicate de Jong and Zawaydeh (2002) more directly with English speakers. If the interaction effects found there were due to differences in elicitation or to general dynamical differences between voicing and quantity effects, one would expect the patterns found for Arabic speakers also to be found for English speakers subjected to the same experimental protocol. However, if the interactions with stress and focus are due to specificational differences between quantity and voicing with respect to vowel duration, we would expect the English speakers to implement stress and focus by expanding the difference in vowel duration due to voicing. A similar argument might be made concerning quantity, however, with the reverse prediction.
English and Arabic durational differences between vowels are quite different phonologically. While Arabic quantity is very regular across the vowel inventory and grammatically independent of vowel quality, English differences in duration are not systematically orthogonal to quality differences. The historical quantity differences that gave rise to the tense/lax contrast in English have broken down in several parts of the vowel system and have taken on fairly large quality differences. (See Bohn & Flege, 1990, for a cross-language comparison of English with High German in this regard.) In many American English dialects, /æ/ has moved up in the vowel space, so that its quality is similar to that of /ɛ/ (Hillenbrand, Getty, Clark, & Wheeler, 1995). Thus, the two nonhigh front lax vowels are very similar in quality, but /æ/ is typically much longer than its neighbor /ɛ/. In fact, /æ/, despite its phonological classification as a lax (historically short) vowel, is typically one of the longest vowels in the American English inventory (Peterson & Lehiste, 1960; Hillenbrand et al., 1995). If stress or focus effects operate only on systematic aspects of the phonological system, then one might not expect stress and focus to affect these durational differences between American English vowels. Finally, concerning the relationship between focus and stress, if stress is related to focus as a conventionalization of the same physical effect (localized hyperarticulation), the language should not matter. Hence, in the same elicitation situations, both Arabic and English speakers should exhibit the same patterns: where stress increases differences between long and short vowels, or between vowels before voiced and voiceless consonants, phonological focus should do the same.

¹ Both de Jong (1995) and Erickson (2002) find different effects of stress and emphasis for high vowels than for low vowels, whereby F1 is often decreased in more stressed or emphasized productions.


Based on the Arabic results, lexical focus should be implemented in a stress-independent fashion, affecting stressed and unstressed syllables alike. If English speakers do not behave as Arabic speakers do with respect to stress and focus, this would indicate that some aspect of stress or focus (or both) is not the same across the two languages, and would call into question the general notion that stress and focus are independently definable properties that can be generalized across all stress languages.

2. Methods

2.1. Subjects

Five native speakers of American English participated in the experiment. Three were female (F1–F3) and two were male (M1 and M2), all in their early 30s at the time of recording. All spoke some Midwestern variety: F1 was native to western Ohio, F2 to Chicago, F3 to Indianapolis, M1 to southern Indiana, and M2 to Wisconsin. All of the speakers had lived in various places around the country; impressionistically, none spoke with a marked regional (Northern or Southern) accent, and all had similar vowel qualities in /ɛ/ and /æ/. Speakers F3 and M2 were graduate students in linguistics who were unaware of the purposes of the experiment (as checked in a post-test interview); the other speakers had no training in linguistics.

2.2. Materials

The experiment involved reading a corpus of words presented in large-font lists. Procedures were modeled as closely as possible on those used in de Jong and Zawaydeh (2002). Target words are given in Table 1. Each word contained one of four target segment sequences composed of /b/, a nonhigh front lax vowel, and a voiced or voiceless coronal stop. /ɛ/ and /æ/ were chosen to be similar to the Arabic low vowels from de Jong and Zawaydeh (2002), while still providing two vowels with typically different intrinsic durations. These four target sequences either (1) comprised monosyllabic words, and hence were primary stressed and could be accented depending on the discourse, (2) were in the second part of a compound,² and hence were secondary stressed and systematically not accented, or (3) were unstressed. Since vowel identities in unstressed syllables are largely or entirely neutralized, the vowel identity contrast did not appear in the unstressed condition, leaving a total of 10 target words.

It should be noted that this English corpus therefore differs from the previous Arabic corpus in a number of respects. Arabic words are generally multisyllabic, while the English noun lexicon is largely monosyllabic. Hence, obtaining secondary stress (stressed but systematically not accented) in the English corpus while satisfying the segmental restrictions on the target syllable generally requires compounding, whereas no obviously analogous structures are available in Arabic. Also, because the Arabic stress rule places stress word-internally, the voiced and voiceless consonants in the Arabic corpus are word-internal and are likely syllabified with a following vowel. Using English consonants exactly parallel to those in the Arabic corpus would introduce the additional problem of factoring out the English flapping rule, which partially or completely neutralizes the voicing contrast. Hence, we chose to use monosyllabic target words (or, for the secondary stressed and unstressed items, words ending in the target syllable) preceding a consonant-initial word in the frame. As a result, the final target consonants are word-final and would be syllabified with the preceding vowel.

The elicitation sequence followed that used in de Jong and Zawaydeh (2002). In the first stage of the experiment, four repetitions of each target word were randomized in blocks and presented to the speakers in isolation. The speakers were then presented with the words embedded in sentence frames, first in a set of postnuclear frames and then in a set of Lexical Focus frames.

Table 1
Corpus design

Condition                   Target word     Semantically contrasting foil
Primary, voiced /ɛ/         bed             chair
Primary, voiceless /ɛ/      bet             invest
Primary, voiced /æ/         bad             good
Primary, voiceless /æ/      bat             hummingbird
Secondary, voiced /ɛ/       flower bed      window sill
Secondary, voiceless /ɛ/    sports bet      lottery ticket
Secondary, voiced /æ/       Islamabad       Karachi
Secondary, voiceless /æ/    baseball bat    catcher’s mitt
Unstressed, voiced          rabid           excited
Unstressed, voiceless       rabbit          squirrel
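As a consistency check on the design, the corpus in Table 1 and the phonological-focus pairings used in the second stage can be encoded as plain data and the block sizes recomputed (four repetitions per item; 40 sentences per block; 160 tokens per speaker). This is an illustrative sketch: the variable and function names are hypothetical, and the voicing and vowel-identity foils for bed, bet, and bad are inferred from the bat example (bat/bad, bat/bet) and the minimal-pair design rather than listed explicitly in the text.

```python
# Hypothetical encoding of Table 1; names are illustrative, not from the study.
REPETITIONS = 4

# (stress level, coda voicing, vowel, target word, semantically contrasting foil)
TABLE_1 = [
    ("primary", "voiced", "ɛ", "bed", "chair"),
    ("primary", "voiceless", "ɛ", "bet", "invest"),
    ("primary", "voiced", "æ", "bad", "good"),
    ("primary", "voiceless", "æ", "bat", "hummingbird"),
    ("secondary", "voiced", "ɛ", "flower bed", "window sill"),
    ("secondary", "voiceless", "ɛ", "sports bet", "lottery ticket"),
    ("secondary", "voiced", "æ", "Islamabad", "Karachi"),
    ("secondary", "voiceless", "æ", "baseball bat", "catcher's mitt"),
    ("unstressed", "voiced", None, "rabid", "excited"),
    ("unstressed", "voiceless", None, "rabbit", "squirrel"),
]

# Phonological-focus foils: (voicing foil, vowel-identity foil) for the primary
# stressed monosyllables; rabid and rabbit are paired only for voicing, and the
# secondary stressed compounds are omitted because they lack minimal pairs.
PHONOLOGICAL_FOILS = {
    "bed": ("bet", "bad"),
    "bet": ("bed", "bat"),
    "bad": ("bat", "bed"),
    "bat": ("bad", "bet"),
    "rabid": ("rabbit",),
    "rabbit": ("rabid",),
}


def block_sizes() -> dict:
    """Number of sentences in each elicitation block."""
    n_targets = len(TABLE_1)
    n_pairs = sum(len(foils) for foils in PHONOLOGICAL_FOILS.values())
    return {
        "isolation": n_targets * REPETITIONS,
        "postnuclear": n_targets * REPETITIONS,
        "lexical_focus": n_targets * REPETITIONS,
        "phonological_focus": n_pairs * REPETITIONS,
    }


sizes = block_sizes()
print(sizes, sum(sizes.values()))
```

Keeping the corpus as data rather than prose makes the token arithmetic checkable: 10 targets, and 4 × 2 + 2 = 10 target–foil combinations, each repeated four times.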
² Since the data gathering for this experiment preceded the domestic political attention to northern Pakistan, I had assumed (wrongly) that Islamabad would be pronounced with a front lax vowel in the last syllable. This was uniformly not the case, so these items systematically had a more back quality than the other items in the corpus. Vowel durations, however, were similar to those of the front vowel.

In the postnuclear frames, speakers focused on an item in the frame preceding the target, such as the word he in (1) (capitalization here indicates focus). In these cases, speakers consistently placed the nuclear accent on he, and hence none of the target syllables bore an accent:

(1) HE said bat, not her.

In the Lexical Focus frames, speakers focused on the target word, as in (2). Here, the speakers consistently put a rising (L+H) accent on the target syllable. Hence, in the Lexical Focus condition, primary stressed target syllables (such as bat) were accented, while secondary stressed targets (such as bat in baseball bat) were not:

(2) He said BAT, not hummingbird.

In each Lexical Focus case, the contrasting foil had no phonological similarity to the target. The contrasting foils are given in the right column of Table 1. The second stage involved reading the same corpus of words embedded in sentences in which the subjects corrected a listener who had misheard a contrasting segment in the target word. In these sentences, each target word was paired with a word minimally contrasting in voicing or in vowel identity, as in the sentences in (3):

(3) He said BAT, not bad. (Phonological Focus on Voicing)
    He said BAT, not bet. (Phonological Focus on Vowel Identity)

Since there were no minimally contrasting forms for the compounds, the secondary stress condition was omitted from this part of the experiment. Rabid and rabbit, however, were paired with each other for the voicing condition. The explicit phonological focus recordings were done last to avoid making the subjects unduly aware of the minimal pairs in the corpus before phonological focus was being elicited. It should again be noted that the Arabic corpus in de Jong and Zawaydeh (2002) did not


allow for the neat addition of the phonological focus block to the words used in the first stage. Since the Arabic corpus is largely multisyllabic, it was much more difficult to find minimally contrasting forms to use as foils in the phonological focus condition. Thus, the Arabic corpus required additional lexical items for the second stage of the experiment.

In all, there were 40 sentences in each block. In the first three blocks, there were 10 target words × 4 repetitions. In the last block, there were four primary stressed target words × two different contrasting items, plus two unstressed target words, for a total of 10 combinations × 4 repetitions. The total number of tokens for each subject, then, was 160. Of these, 18 tokens for subject F1 were unusable due to problems in the recording environment.

The subjects were recorded on a portable Marantz cassette recorder in quiet rooms, and the recordings were analyzed using WAVES+ running on a Sparc 5 in the Linguistic Speech Lab at Indiana University. Vowel durations of the target syllables were measured from waveform displays and aligned 300 Hz bandwidth spectrograms. Vowel durations were measured from the onset of the release burst to the onset of closure, defined as the cessation of higher-frequency energy, often accompanied by a sudden drop in F1. In addition, to examine possible differences in vowel quality, the first and second formant frequencies were estimated at the midpoint of the target syllable’s vowel by means of a 13th-order LPC analysis. Estimated values were overlaid on spectrograms and checked visually for formant-tracking errors.

2.3. Hypotheses and statistical analyses

Statistical and graphical techniques were designed to match those used in de Jong and Zawaydeh (2002) as closely as possible, although the corpora are different enough that merging the two databases for a unified statistical treatment was not feasible. Analyses of vowel duration were divided into two parts.
The first examined the effects of stress and lexical focus using data from the first stage of the experiment, and the second compared the phonological focus data with the lexical focus data. Analyses omitted words produced in isolation since, as was found for the Arabic data in de Jong and Zawaydeh (2002), the speakers varied considerably in how they rendered the isolated forms. Target words embedded in phrases were much more homogeneous across speakers. Since the aim of the current study is to determine which aspects of speaker behavior generalize to the group of speakers of the language, each analysis consists of a repeated measures ANOVA. A repeated measures analysis gives a very conservative estimate of significance, since any proportionate difference between subjects, such as one due to overall tempo or vocal tract size, counts against an effect. Fixed factor analyses were also run. The patterns revealed by these fixed factor analyses paralleled those of the repeated measures analyses; in general, effects significant at the 0.10 level in the repeated measures analyses were significant at the 0.001 level or lower in the fixed effects analyses, except where noted below. For simplicity, reports of the current results are restricted to the repeated measures analyses. There were two analyses for the first stage. The first examined the effects of lexical focus, stress, and voicing, and the second examined the effects of lexical focus, stress, and vowel identity. Based on previous studies, we expect significant main effects of all of these factors on vowel duration. Interactions between these variables are the point of inquiry. Assuming that stress expands phonemically specified differences, and that English differs from Arabic in systematically specifying


duration as part of voicing but not of vowel identity, one would expect interactions between stress and voicing, but not between stress and vowel identity. The Arabic results did not indicate an interaction between lexical focus and stress, the focus effects being distributed across stressed and unstressed material, and lexical focus did not interact with either voicing or quantity in the Arabic experiment. Assuming that stress and focus are both uniform across the two languages, one expects to replicate these results with the English speakers. There were also two analyses for the second stage. Here, stress can be included as a factor only in the analysis of voicing, since vowel identity was not varied for the unstressed vowels and minimal pairs did not exist for the secondary stressed vowels. Hence, the first analysis includes stress, focus type (lexical focus vs. phonological focus on voicing), and voicing. The second analysis removes the unstressed forms and includes focus type (phonological focus on voicing vs. phonological focus on vowel identity), voicing, and vowel identity. As with stress in the first analysis, we expect focus to interact with voicing, increasing the size of the voicing effect under phonological focus. We do not expect such an interaction in the vowel identity analysis. Nor do we expect stress and focus to interact. To examine how similar the various durational effects in the current subjects are to the effects for Arabic speakers, a parallel set of analyses was performed on F1 and F2 values. Based on the Arabic results, we expect stress and lexical focus to increase F1, and (if anything) voicing to decrease it. In general, durational increases due to stress and phonological focus should be correlated with increases in F1, while lexical focus should not increase duration if the same pattern of results is obtained as in de Jong and Zawaydeh (2002).
While stress and focus did not affect F2 in the previous Arabic study, one might expect F2 values to be higher with increased stress and focus, since the English target vowels contrast with back vowels, whereas Arabic has no phonemic backness contrast among low vowels. In addition, since /ɛ/ and /æ/ in the current experiment may differ in quality, we might expect stress and focus to expand the difference in quality between the two vowels.
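The duration and formant measurements described in Section 2.2 were made in WAVES+ from waveform and spectrogram displays. The sketch below computes the same two quantities programmatically, assuming the burst-onset and closure-onset landmarks have already been hand-labeled. It is an illustrative reimplementation, not the WAVES+ procedure: only the 13th-order autocorrelation LPC at the vowel midpoint follows the text, while the 25 ms Hamming-windowed frame, the absence of pre-emphasis, and the 90 Hz and 400 Hz candidate thresholds are our assumptions. The function names are hypothetical.

```python
import numpy as np


def lpc_coefficients(frame: np.ndarray, order: int) -> np.ndarray:
    """Autocorrelation-method LPC via the Levinson-Durbin recursion.

    Returns a = [1, a1, ..., ap] for the inverse filter
    A(z) = 1 + a1 z^-1 + ... + ap z^-p.
    """
    n = len(frame)
    r = np.array([np.dot(frame[: n - k], frame[k:]) for k in range(order + 1)])
    a = np.zeros(order + 1)
    a[0] = 1.0
    err = r[0]
    for i in range(1, order + 1):
        acc = r[i] + np.dot(a[1:i], r[i - 1:0:-1])
        k = -acc / err                          # reflection coefficient
        a_prev = a.copy()
        a[1:i] = a_prev[1:i] + k * a_prev[i - 1:0:-1]
        a[i] = k
        err *= 1.0 - k * k                      # updated prediction-error power
    return a


def formants_from_lpc(a, fs, min_hz=90.0, max_bw_hz=400.0):
    """Formant candidates (Hz), ascending, from the complex roots of A(z)."""
    roots = np.roots(a)
    roots = roots[np.imag(roots) > 0]           # one root per conjugate pair; drops real roots
    freqs = np.angle(roots) * fs / (2 * np.pi)
    bws = -np.log(np.abs(roots)) * fs / np.pi   # pole radius -> bandwidth
    keep = (freqs > min_hz) & (bws < max_bw_hz)
    return np.sort(freqs[keep])


def measure_vowel(signal, fs, burst_onset_s, closure_onset_s, order=13, frame_s=0.025):
    """Vowel duration in ms (burst onset to closure onset) and midpoint formant candidates."""
    duration_ms = (closure_onset_s - burst_onset_s) * 1000.0
    mid = int(round((burst_onset_s + closure_onset_s) / 2 * fs))
    half = int(round(frame_s * fs / 2))
    frame = np.asarray(signal[max(mid - half, 0): mid + half], dtype=float)
    frame = frame * np.hamming(len(frame))      # taper the analysis frame
    return duration_ms, formants_from_lpc(lpc_coefficients(frame, order), fs)
```

The first two returned candidates would be taken as F1 and F2, still subject to the visual check against the spectrogram that the text describes; an automatic pick like this one is exactly where tracking errors tend to arise.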

3. Results I: stress and lexical focus

An ANOVA with focus condition (postnuclear, focused), stress (primary, secondary, or unstressed), and voicing (voiced, voiceless) as independent factors indicates significant main effects of condition, stress, and voicing (condition, F(1,4) = 29.11, p < 0.01; stress, F(2,4) = 143.21, p < 0.001; voicing, F(1,4) = 8.04, p < 0.05). Post hoc Fisher’s PLSD t-tests reveal significant differences between all three stress levels. Postnuclear tokens were on average 25 ms (approximately 12%) shorter than the others. The stress effect consisted of a difference of approximately 40 ms (approximately 15%) between primary and secondary stressed vowels, and unstressed vowels were less than half the duration of stressed vowels. Compared with the previous Arabic results, the primary-to-secondary difference is larger in absolute milliseconds than for Arabic, although the Arabic vowels tended to be considerably shorter overall. Hence, to a broad approximation, the difference between primary and secondary stress corresponds more closely in magnitude to the stress differences found for Arabic than does the unstressed-to-stressed difference. The voicing effect was on average approximately 35 ms, somewhat larger than that found for Arabic.
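The F ratios above come from repeated-measures ANOVAs in which speakers are the random factor. To illustrate why such an analysis is conservative, the sketch below computes a one-factor repeated-measures F ratio whose error term is the subject-by-factor interaction: a speaker whose effect is proportionally larger or smaller than the others' inflates that error term and so counts against the effect. The function name is hypothetical, and the sketch handles only one within-subject factor, whereas the reported analyses crossed two or three factors.

```python
import numpy as np


def rm_anova_oneway(y):
    """One-factor repeated-measures ANOVA (illustrative sketch).

    y has shape (n_subjects, k_levels): one mean per subject per level.
    Returns (F, df_effect, df_error); the error term is subject x factor.
    """
    y = np.asarray(y, dtype=float)
    n, k = y.shape
    grand = y.mean()
    ss_effect = n * np.sum((y.mean(axis=0) - grand) ** 2)    # between levels
    ss_subject = k * np.sum((y.mean(axis=1) - grand) ** 2)   # between speakers
    ss_total = np.sum((y - grand) ** 2)
    ss_error = ss_total - ss_effect - ss_subject             # subject x factor
    df_effect, df_error = k - 1, (k - 1) * (n - 1)
    F = (ss_effect / df_effect) / (ss_error / df_error)
    return F, df_effect, df_error
```

For a two-level factor measured on five speakers, the degrees of freedom are (1, 4), matching those reported above for the condition and voicing main effects. For two levels, this F also equals the square of a paired t statistic computed on the per-speaker differences.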


Turning to the interactions involving voicing, the stress × voicing interaction was quite large (F(4,4)=25.66, p<0.01): stress increases the size of the voicing effect. There was also a condition × voicing interaction (F(3,4)=8.14, p<0.05). These interactions are plotted in Fig. 1. For stress (left panel), the interaction is apparent at each level: there is virtually no voicing effect in unstressed vowels, and primary stressed vowels show a larger difference than secondary stressed vowels. The same pattern is found for the focus condition interaction: as the right panel of Fig. 1 shows, lexical focus increases the voicing effect.

The analysis including vowel identity instead of voicing shows similar effects. Main effects of focus condition, stress, and vowel identity were all significant (focus condition, F(1,4)=27.39; stress, F(1,4)=56.81; vowel identity, F(1,4)=54.54; all p<0.01). The vowel identity difference, on the order of 75 ms (approximately 40%), was more than double the voicing effect. There was also a significant interaction between stress and vowel identity (F(1,4)=23.91, p<0.01), but the focus interaction was only marginally significant (F(1,4)=5.29, p<0.10). These effects are plotted in Fig. 2 for comparison with Fig. 1. The direction of the effects is the same, with larger differences under more stress and focus; however, the interactions are less extreme. While the vowel differences are very large under all stress and focus conditions, the voicing differences depend considerably on stress and, even at their largest, are smaller than the vowel identity differences. Contrary to the predictions from the previous Arabic results, there were also interactions between stress and focus condition in both analyses (voicing analysis: F(2,4)=7.72, p<0.05; vowel identity analysis: F(1,4)=6.84, p<0.10), as well as three-way interactions (voicing analysis: F(2,4)=7.42; vowel identity analysis: F(1,4)=11.77; both p<0.05).
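An interaction of this kind can be read off a table of cell means as a difference of differences: the voicing effect (voiced minus voiceless) is computed at each stress level, and an interaction means those per-level effects are unequal. The Python sketch below uses a hypothetical table of stress × voicing cell means. The round numbers are invented only to echo the qualitative pattern described above and are not the measured values. The function names are also hypothetical.

```python
from statistics import mean

# Hypothetical cell means in ms (stress level -> voicing -> mean); illustrative only.
cell_means = {
    "unstressed": {"voiced": 100.0, "voiceless": 98.0},
    "secondary": {"voiced": 190.0, "voiceless": 165.0},
    "primary": {"voiced": 240.0, "voiceless": 195.0},
}


def voicing_effect_by_level(cells):
    """Voiced minus voiceless duration at each stress level."""
    return {level: v["voiced"] - v["voiceless"] for level, v in cells.items()}


def voicing_main_effect(cells):
    """Marginal voicing effect: each voicing category averaged over stress levels."""
    voiced = mean(v["voiced"] for v in cells.values())
    voiceless = mean(v["voiceless"] for v in cells.values())
    return voiced - voiceless


effects = voicing_effect_by_level(cell_means)
print(effects)  # {'unstressed': 2.0, 'secondary': 25.0, 'primary': 45.0}
print(round(voicing_main_effect(cell_means), 2))  # 24.0
# One component of a stress x voicing interaction: primary minus secondary voicing effect.
print(effects["primary"] - effects["secondary"])  # 20.0
```

The main effect summarizes the average voicing difference. The per-level differences carry the interaction, which is what the left panel of Fig. 1 displays.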
Fig. 1. Average vowel durations for vowels before voiced and voiceless stops, divided by stress (left panel: unstressed, secondary, primary) and by focus condition (right panel: post-nuclear, focused). Vertical axis: vowel duration (ms). Error bars indicate standard errors.

Fig. 2. Average vowel durations for /æ/ and /ɛ/ in primary and secondary stressed syllables (left panel) and with and without lexical focus (right panel: post-nuclear, focused). Vertical axis: vowel duration (ms). Error bars (hidden behind the symbols) indicate standard errors.

Fig. 3. Average vowel durations for vowels before voiced and voiceless stops (left panel) and for /æ/ and /ɛ/ (right panel). Plotted here are interactions between focus condition (x-axes: post-nuclear, focused) and stress (symbol size: primary stressed, secondary stressed, unstressed). Vertical axis: vowel duration (ms). Error bars indicate standard errors.

The three-way interactions are illustrated in Fig. 3 with data taken from the voicing analysis (left) and from the vowel identity analysis (right). A large part of the voicing interaction is accounted for by the exceptionally large difference between vowels preceding voiced and voiceless stops in the primary stressed syllables of focused items (the largest circle and square in the right column of the left panel). The focus effect enhances the voicing distinction mostly in primary stressed (and hence accented) syllables. This observation suggests that the voicing interaction is due not to lexical focus as such, but to the presence of an accent on the target syllable in the lexical focus condition: the lexically focused items bear nuclear accent on the primary stressed syllable, while the postnuclear items do not.

However, this explanation is not entirely complete. First, one sees a stress effect on the voicing difference between secondary stressed and unstressed syllables in the postnuclear condition (small and medium-sized symbols on the left side of the left panel). Although secondary stressed syllables do not bear a pitch accent, they have larger voicing effects than unstressed syllables even in the absence of accent. This impression is confirmed by a two-way post hoc ANOVA of nonprimary postnuclear tokens, in which stress interacted with voicing (F(1,4)=28.59, p<0.01). Second, there is also a focus effect on the voicing difference in the absence of primary stress (and hence in the absence of accent). The voicing effect is nonexistent (actually 6 ms in the wrong direction) in the unstressed and unaccented (postnuclear) tokens, but appears (in the right direction) with focus. A two-way post hoc ANOVA of the unstressed vowels showed a marginally significant interaction between focus and voicing (F(1,4)=5.96, p<0.10; the effect is strongly significant in the fixed factor analysis).

The vowel identity interaction is similar in that the biggest difference between the vowels is in primary stressed and focused syllables (right panel, difference between the large symbols in the right column). However, vowel identity does seem to differ from voicing in that focus does not appreciably increase the vowel difference without primary stress. (The lines connecting the smaller symbols in the right panel are very nearly parallel.) This impression was also confirmed by a two-way post hoc ANOVA, which showed no interaction between focus and vowel identity (F(1,4)=0.006, n.s.; nor was this effect significant in the fixed factor analyses).
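When both factors of a 2 × 2 design vary within subjects, the interaction can be tested from one number per subject: that subject's interaction contrast, a difference of differences. The interaction F(1, n−1) equals the square of the one-sample t statistic testing whether the mean contrast differs from zero. The Python sketch below illustrates this identity with five hypothetical subjects. The data and function names are assumptions made for illustration and are not the study's analysis code.

```python
import math
from statistics import mean, stdev


def interaction_contrast(post_voiced, post_voiceless, foc_voiced, foc_voiceless):
    """One subject's focus x voicing contrast: voicing effect when focused minus when postnuclear."""
    return (foc_voiced - foc_voiceless) - (post_voiced - post_voiceless)


def interaction_f(subjects):
    """F(1, n-1) for a 2 x 2 within-subject interaction, computed as t squared.

    t is the one-sample t statistic testing whether the mean contrast differs from 0.
    """
    scores = [interaction_contrast(*cells) for cells in subjects]
    n = len(scores)
    t = mean(scores) / (stdev(scores) / math.sqrt(n))
    return t * t, (1, n - 1)


# Hypothetical per-subject cell means in ms:
# (postnuclear voiced, postnuclear voiceless, focused voiced, focused voiceless)
subjects = [
    (150, 148, 212, 200),
    (160, 155, 222, 205),
    (140, 140, 208, 200),
    (155, 150, 224, 205),
    (145, 144, 207, 200),
]

f_value, df = interaction_f(subjects)
print(f"F{df} = {f_value:.2f}")  # F(1, 4) = 50.00
```

In this scheme an error df of 4 corresponds to n = 5. Because each subject contributes a single contrast score, one subject who behaves differently (as Subject M1 does in Section 4.1) can noticeably reduce the resulting F.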
To summarize the current results: as predicted, stress affects the size of voicing effects. What was not predicted was a parallel effect on the size of vowel identity differences. Unlike in de Jong and Zawaydeh's (2002) results with Arabic speakers, lexical focus behaves very similarly to stress. Concerning focus and stress, the lexical focus effects here differ from those in de Jong and Zawaydeh (2002): they not only parallel the effects of stress but also interact with stress, such that the focus effects are found nearly exclusively in primary stressed, accented syllables.

4. Results II: phonological focus

In order to determine the effect of lexical vs. phonological focus on segmental contrasts, the phonologically focused tokens were extracted along with their lexically focused counterparts. The ideal analysis would include stress, phonological focus, voicing, and vowel identity in a single analysis. However, only the voicing contrast is available in the unstressed syllables. Thus, two analyses were run. The first analysis involves voicing only and includes both primary stressed and unstressed tokens. The second analysis eliminates the unstressed tokens and includes vowel identity as a factor.

4.1. Phonological vs. lexical focus, stress, and voicing

Like the results reported above, the first analysis finds main effects of voicing (F(1,4)=18.33, p<0.05) and stress (F(1,4)=96.12, p<0.01), as well as a robust interaction of stress and voicing in the expected direction (F(1,4)=41.23, p<0.01). Here, the main effect of focus condition (lexical vs. phonological focus on voicing) was almost nonexistent (F(1,4)=0.018). However, focus condition did interact with voicing (F(1,4)=7.94, p<0.05): on average, the voicing effect is larger with phonological focus on voicing. The stress × focus interaction and the three-way interaction were not significant (stress × focus, F(1,4)=1.37; three-way, F(1,4)=3.35). The three-way interaction is plotted in Fig. 4, which shows that the voicing × focus interaction is particularly large in the unstressed syllables and is not apparent in the primary stressed syllables. However, it is the two-way interaction between voicing and focus condition that is statistically significant.

What Fig. 4 does not show is that the subjects are remarkably similar in increasing the voicing effect with phonological focus. The superficially larger three-way interaction, however, was not as consistent across subjects. Fig. 5 illustrates the largest part of the subject differences, plotting the vowel durations for each of the unstressed tokens (rabid and rabbit) against the first formant for two of the subjects. The subject on the left shows very large effects of focus on vowel durations; the durations often double with focus, becoming similar to the typical values for stressed syllables. Along with this large increase in duration, there is a difference between voiced and voiceless tokens (circles and triangles), such that vowels before /d/ (filled circles) are especially long. The subject on the right, by contrast, shows very little effect of focus on durations at all. Impressionistically, the reason for the difference is that subject M1 often shifted accent onto an unstressed syllable when required to disambiguate the voiced and voiceless consonants. Hence, most of the filled tokens in the left panel of Fig. 5 are actually stressed and accented.

Fig. 4. Average vowel durations for vowels before voiced and voiceless stops in primary stressed and unstressed positions (indicated by symbol size), with lexical focus and with phonological focus on the voicing contrast (x-axis: focus condition). Vertical axis: vowel duration (ms). Error bars indicate standard errors.

Fig. 5. First formant estimates plotted against vowel duration for two subjects' tokens of unstressed vowels before voiced and voiceless stops (rabid, rabbit). Left panel: Subject M1; right panel: Subject F1. Filled symbols: phonological focus on voicing; hollow symbols: lexical focus. Axes: vowel duration (ms) and F1 (Hz).

4.2. Phonological focus on vowel identity and voicing

In order to compare most directly the effect of focus on the voicing contrast with that of focus on vowel identity, the lexical focus and unstressed tokens were removed, and a three-way ANOVA was conducted with focus (focus on voicing vs. focus on vowel identity), vowel identity, and voicing as factors. As with the results reported above, there are strong main effects of voicing (F(1,4)=55.08, p<0.01) and vowel identity (F(1,3)=44.65, p<0.01), again in the expected directions. The two effects did not interact (F(1,4)=4.28, p>0.10). Of more interest here are the effects of focus condition. There is no overall effect of focus condition on vowel duration (F(1,3)=0.07), so the focused vowels were of roughly the same duration regardless of whether the focus was on voicing or on vowel identity. There are, however, significant interactions between focus condition and voicing (F(1,4)=8.77, p<0.05) and between focus condition and vowel identity (F(1,4)=8.93, p<0.05). These interactions are plotted in Fig. 6. The focus effect on vowel identity differences (right panel) is as expected: when speakers focus on vowel identity, long vowels get longer and short vowels get shorter. The results for voicing (left panel), however, are exactly the opposite of what is expected: focus on voicing makes the difference between voiced and voiceless forms smaller. Both of these interactions are quite small, but they are very consistent across speakers. The three-way interaction is not significant (F(1,4)=0.97, p>0.10).

Fig. 6. Average vowel durations for vowels before voiced and voiceless stops (left panel) and for /æ/ and /ɛ/ (right panel) in conditions where the subject is disambiguating vowel identity or consonant voicing (x-axes: focus on voicing vs. focus on vowel identity). Vertical axis: vowel duration (ms). Error bars (hidden by the symbols) indicate standard errors.

To summarize the results for phonological focus, then, the pattern for vowel identity is roughly the same as that found in Arabic for quantity: when speakers phonologically focus on vowel identity, the long /æ/ gets longer and the short /ɛ/ gets shorter. The voicing effect, however, is more complicated. An analysis including unstressed tokens finds that speakers tend to increase the voicing effect when focusing on voicing contrasts. However, comparing what speakers do when they focus on voicing with what they do when they focus on vowel identity shows that the voicing difference is even larger when speakers focus on vowel identity.

5. Results III: vowel quality

Paralleling the analyses in Section 3, a three-way ANOVA of F1 and F2 in postnuclear and lexically focused tokens, with focus, stress, and voicing as factors, finds significant effects of stress and voicing on F1 (stress, F(2,4)=77.50, p<0.01; voicing, F(1,4)=9.28, p<0.05) and an effect of focus on F2 (F(1,4)=10.90, p<0.05). There were also trends (effects with p<0.10) of focus on F1 (F(1,4)=4.49) and of stress on F2 (F(2,4)=6.64). The direction of the F1 changes was the same as for the Arabic speakers in de Jong and Zawaydeh (2002): greater stress and lexical focus increased F1, while the longer vowels before voiced stops actually had lower F1. The results for F2 were more complicated. Focused items had F2 values about 80 Hz higher, and primary stressed vowels likewise had higher F2 than secondary stressed vowels (approximately 1880 vs. 1740 Hz on average). Complicating this picture, unstressed vowels had high F2 values similar to those of primary stressed vowels; the unstressed vowel had a quality akin to that of General American /i/. None of the interactions were significant. Of particular interest were the nonsignificant interactions of voicing with stress (F(2,4)=1.46, p>0.10) and with focus (F(1,4)=1.26, p>0.10). Even though there was a voicing effect on F1, it was not modulated by either stress or focus.

These results, along with the accompanying durational results, are summarized and illustrated in Fig. 7. The left panels plot average effects in the F1 × F2 vowel plane, oriented in the traditional fashion, and the right panels plot F1 against vowel duration (figures similar to those in Lindblom, 1963). The upper plots show average stress (solid arrows) and voicing (dashed arrows) effects. Considering first the stress effects, the solid arrows indicate a very large difference in quality between unstressed and secondary stressed tokens. Stressed vowels have a much higher F1 (i.e., they are more open), with duration and F1 increasing in a correlated fashion. Stressed vowels are also further back in the vowel space; the formant values indicate that unstressed vowels have roughly the quality of /i/. Turning to the difference between secondary and primary stressed items, we note that F1 and duration both increase, just as they did in the Arabic study. These English results, however, also differ from our previous Arabic results, since there is also a substantial increase in F2 for primary stressed vowels. This result is perhaps due to the target vowels in the English system contrasting phonemically with back vowels, while there is no minimal front–back contrast among Arabic low vowels. The voicing effects, indicated by dashed arrows in Fig. 7, are consistently accompanied by a lowering of F1, with a tendency for more stressed items to have additional duration (stressed tokens in the upper right panel also tend to show a rightward shift). The lower panels show focus effects (solid arrows) interacting with voicing effects (dashed arrows). Lexical focus lengthens mostly vowels before voiced stops. The lower left panel shows the general effect of focus in making the vowels more front, again possibly because the front vowels in the corpus contrast phonemically with back vowels.
The same analyses with vowel identity instead of voicing (and omitting the unstressed tokens) find, not surprisingly, a large effect of vowel identity and a smaller effect of stress on F1 (identity, F(1,4)=169.03, p<0.01; stress, F(1,4)=13.06, p<0.05), and effects of stress and of focus on F2 (stress, F(1,4)=20.49; focus, F(1,4)=15.39). Here, unlike the Michigan speakers in Hillenbrand et al. (1995), /æ/ has a higher F1 than /ɛ/. Apart from these vowel identity differences, these results match those of the voicing analysis above. Interactions were not significant. Again of interest are the nonsignificant interactions of vowel identity with both stress (F(1,4)=0.07, n.s.) and focus (F(1,4)<0.01, n.s.) on F1. Thus, even though there is a large difference in F1 between /æ/ and /ɛ/, this difference was not modulated by stress or focus. To summarize, then, while there are quality differences associated with vowel identity and with voicing, none of these are significantly affected by either lexical focus or stress. Rather, stress and lexical focus tend to raise F1 and F2, thereby making the low front vowels used in this study more extreme in the vowel space.

The analyses of phonological focus in Sections 4.1 and 4.2 for duration were repeated for the first two formants. These analyses also revealed the same main effects of stress, voicing, and vowel identity on F1 (focus on voicing/stress analysis: stress, F(1,4)=69.62; voicing, F(1,4)=55.18; focus on voicing and identity analysis: voicing, F(1,4)=17.27; vowel identity, F(1,4)=17.67; all p<0.01), and the main effect of focus on F2 (F(1,4)=12.98, p<0.05). None of the other effects approached significance, with one exception: an interaction between focus type and stress on F2 (F(1,4)=17.67, p<0.01). Here, the difference between unstressed and stressed vowels in the lexical focus condition is eliminated with a segmental focus on voicing.

Fig. 7. Average positions of vowels in the F1 × F2 vowel plane (left panels; axes F2 (Hz) and F1 (Hz)), and F1 plotted against vowel duration (right panels; axes V-duration (ms) and F1 (Hz)). Dashed arrows indicate the effects of voicing, going from voiceless to voiced. Solid arrows in the upper panels indicate the effects of stress, going first from unstressed to secondary stressed and then from secondary to primary stressed. Solid arrows in the lower panels indicate the effect of lexical focus, going from postnuclear unfocused to accented focused.

To summarize the effects on vowel quality, these analyses find an overall difference between /æ/ and /ɛ/ in F1; an effect of stress, which raises F1; a general effect of focus, which raises F2; and an effect of stop voicing, which lowers F1.

6. Summary of results

To compile all of the current results and compare them with results from the previous study of Arabic (de Jong & Zawaydeh, 2002), the various effects for the two studies are listed in Table 2. Analyses in the Arabic study differed somewhat in that the voicing and identity analyses reported in Section 3 (above) were combined in the Arabic study. Also, because of the lack of minimally contrasting foils in Arabic, the Arabic corpus was structured quite differently for eliciting phonological focus effects. Thus, there is no directly comparable statistical analysis of the Arabic data corresponding to the phonological focus analyses in the current study.

7. Discussion

This study finds an effect of stress and focus on the size of voicing-induced lengthening of vowels preceding stop consonants. This is expected, based on de Jong and Zawaydeh's (2002) interpretation of the behavior of Arabic speakers, who did not show such interactions. These results eliminate the alternative interpretation that the difference between the English speakers in de Jong (1991) and the Arabic speakers in de Jong and Zawaydeh (2002) was due to differences in elicitation techniques. Concerning stress, the main Arabic stress effect seems roughly equivalent in size to the difference between secondary and primary stress in English. When one compares secondary and primary stressed syllables elicited in much the same fashion from Arabic and English speakers, the English speakers systematically show larger voicing effects with stress, while the Arabic speakers do not.

The current results also show that the difference between how stress interacts with quantity and with voicing in Arabic is not due to the voicing effect being a different sort of dynamic effect. For both the English and Arabic speakers, the concomitant effects of voicing and stress on vowel quality are very similar: for the low vowels used in these studies, stress increases F1 and duration in a correlated fashion, while voicing actually reduces F1 as it lengthens the vowel. Despite this, the English speakers clearly exhibit the stress × voicing interaction, while the Arabic speakers did not. The current results also indirectly rule out attributing the difference between how stress interacts with quantity and with voicing in Arabic to the size of the voicing and quantity durational effects. In the English data, the vowel identity durational difference is larger than the voicing difference. However, the interactions with stress and lexical focus are systematically larger for the voicing effects than for the identity effects.
Hence, whether stress and focus will enhance a durational difference cannot be predicted from the size of the difference.

There is, however, an alternative interpretation of the difference between the Arabic and English voicing results, which cannot be entirely ruled out. Since the voiced and voiceless consonants were word medial and prevocalic in the Arabic corpus, they would, under the usual assumptions about syllabification, be in the onset of the syllable following the target vowel. The English consonants, however, are word final and precede a consonant-initial word; hence, they would be syllabified into the same syllable as the target vowel. Extending claims such as those made by Maddieson (1985) to the current results might explain the difference between the Arabic and English results.

Table 2. Repeated measures ANOVA results summary for English (current results; columns V-dur, F1, F2) and Arabic (previous results; columns V-dur, F1, F2). Row groups: Section 3 voicing analysis (focus condition, stress, voicing, stress × voicing, focus × voicing, focus × stress, three-way); Section 3 vowel identity analysis (focus condition, stress, identity, stress × identity, focus × identity, focus × stress, three-way); Section 4.1 focus type, stress, and voicing (focus condition, stress, voicing, stress × voicing, focus × voicing, focus × stress, three-way); Section 4.2 focus type (focus condition, voicing, vowel identity, focus × voicing, focus × identity, voicing × identity, three-way). No comparable Arabic analysis exists for the Section 4.1 and 4.2 analyses. Symbols mark significance at p<0.01, p<0.05, and p<0.10 (^).


Maddieson (1985) hypothesized that compensatory shortening of a vowel before multiple following consonants indicates that those consonants are cosyllabic with the shortened vowel. Generalizing to the present case, the large consonant voicing effects in English would indicate that the consonants and the affected vowels are tautosyllabic. Note, though, that the domain of the consonant voicing effects is not what is at issue here. Previous work on the voicing effect on vowel duration in English, especially Chen (1970) and studies of English flapping (Fox & Terbeek, 1977), shows that the voicing effect does operate across a syllable boundary, as did the Arabic results of de Jong and Zawaydeh (2002). What is at issue here is the domain of the stress effects. Because the consonant and the vowel in the Arabic corpus were not tautosyllabic, either the consonant or the preceding vowel is part of the syllable bearing primary stress, but not both at the same time. In the English corpus, because the consonant and vowel are cosyllabic, stressed consonants are accompanied by stressed vowels. While this interpretation cannot be entirely ruled out with the current corpus, it should be noted that our previous study of English words followed by vowel-initial words (de Jong, 1991) also shows large vowel duration differences before voiced and voiceless stops that interact with stress in the same way as in the current results. Treating the flapping effect as an extension of stress-governed lenition would place the flapped consonant in the following syllable, with the unstressed vowel. Nevertheless, stress on the vowel preceding the lenited consonants expands the voicing-related differences in the stressed vowel, even though the consonant carrying the voicing feature is in the following syllable.
While the voicing interactions found in the current study are entirely as predicted, the effects for vowel identity were not. The durational difference between /æ/ and /ɛ/ is systematically expanded by both stress and focus, despite the fact that this vowel duration difference is just an idiosyncratic difference between these two vowels. Hence, these stress and focus effects seem to be very acute. Stress does not just affect the very systematic contrastive dimensions that generalize across obstruents or across vowels; it also works to expand even idiosyncratic differences between specific vowels. Curiously, these stress and focus modulations of duration were not found for F1, even though the F1 of the two vowels was quite different. This lack of an interaction effect on F1 may simply be due to greater noise in the formant estimation procedure than in the duration measurements. However, it is also important to note that focus does have a very clear effect on vowel quality, shifting both /ɛ/ and /æ/ forward in the vowel space. It is possible that the difference between the formant and duration results is due to the structure of the American English vowel contrast space, the two vowels contrasting most directly in terms of duration. That is to say, the /ɛ/–/æ/ contrast in a variety of American dialects is essentially one of quantity. Without comparison with different vowel pairs, we cannot currently test this explanation.

Turning to the relationship between stress and focus, the current study replicates the general relationship between stress and focus found for Arabic. In de Jong and Zawaydeh (2002), the two effects were found to be largely the same, affecting the same contrastive differences in the same way, and the same general pattern is found here for English speakers. To this extent, then, stress and focus can be said to be parallel physical effects.
However, there are two clear points of difference between the Arabic and English speakers. First, English speakers expand durational differences not only for phonological focus but also for lexical focus, while the Arabic speakers did not. It is not clear whether this difference is due to some difference between the lexical structures of the two languages. It might be that the functional load of the voicing and vowel identity contrasts is higher in the more monosyllabic English noun lexicon, and hence lexical focus is more likely to be spelled out in the durational dimension that this study measured. Regardless of the reason, it is clear that the English and Arabic implementations of lexically corrective focus differ in their details.

Second, lexical focus interacts with stress differently in the Arabic and English speakers. Of course, one major difference between English and Arabic is the presence of unstressed and reduced vowels in English. These unstressed and reduced vowels are markedly different from their stressed counterparts in both quality and duration, and their presence no doubt contributes substantially to the rhythmic differences between English and Arabic found in previous studies (Tajima, Zawaydeh, & Kitahara, 1999; Zawaydeh, Tajima, & Kitahara, 2002). Their presence in the current corpus also contributes substantially to the stress effects found here. The most remarkable demonstration of the relationship between stress and focus is the tendency for some of the speakers to shift accent (and hence stress) onto the unstressed syllable in the phonological focus conditions to distinguish between the words rabid and rabbit. Clearly, these unstressed syllables are not locations in which it is easy to direct attention to phonemic contrasts. In order to focus attention on these target segments, speakers sometimes reorganize the prosodic structure, putting them in a stressed location. Interestingly, one of the subjects actually apologized after the elicitation session for ‘saying the word wrong’ in these cases. The English convention of having unstressed, low-attention material runs directly counter to the phonological focus task's demand of placing attention on that material.
However, the difference between the Arabic and English speakers is not restricted to the treatment of unstressed syllables. While the Arabic speakers in de Jong and Zawaydeh (2002) showed no interaction between stress and focus, there is a very clear interaction between stress type (secondary vs. primary) and focus in the English speakers. The primary–secondary stress differences in duration are roughly of the same magnitude as those found for Arabic. Nevertheless, a disproportionate amount of the focus effect in the English speakers is found in the primary stressed syllables. This difference suggests that, even though Arabic and English can both be classified as stress languages, the function of primary stressed, accented syllables as high-attention areas is clearer in English than in Arabic. This, in turn, suggests that, even among languages which can both be classified as stress languages, and even factoring out reduction processes, there are further graded distinctions between languages with respect to the function of stress in the phonological and lexical system.

8. Conclusion

There are, then, three general conclusions that can be drawn from the current study. First, not all physical differences between segmental categories have the same status with respect to those categories. There are a variety of durational differences, but these differences serve different purposes, only some of which mark the difference between segments. In English, duration bears a contrastive load for voicing and for distinguishing /ɛ/ and /æ/. In Arabic, it contrasts vowel quantity, but not voicing: a physically measurable durational difference in Arabic does not function to contrast consonantal voicing.


What this underscores is that the simple presence of an effect in speech does not indicate its function. To determine function, one must examine how the effect is instantiated in different functional conditions. Second, there is much to be learned about phonological structure, both by examining the conventional modulation of physical actions created by different languages' prosodic systems and by examining metalinguistic analogues to these conventions. Here, the effect of stress (conventional attention modulation) and focus (deliberate attention modulation) on segments is a tool for examining how the contrast structure between those segments is reflected in production strategies. While there is much yet to learn about how sensitive various physical properties of speech are to stress and focus, their use as a tool for uncovering phonological specificity is promising. Finally, there is much that is not known about the prosodic systems of the world's languages. Here, the particular system is stress, and the general conclusion of this study is that stress clearly exists in Arabic and English, and that stress is closely analogous to focus. However, despite their similarity, stress in English and stress in Arabic are clearly not identical.

Acknowledgements

I acknowledge helpful and substantive commentary on, and discussions of, these data with Anne Bradlow, Jennifer Cole, Stuart Davis, Donna Erickson, Susanne Fuchs, Janet Pierrehumbert, Dan Silverman, and Marija Tabain. This work was partially supported by the NIDCD under grant number R03 DC04095 and by the NSF under grant number BCS-9910701.

References

Beckman, M. E. (1986). Stress and non-stress accent: Netherlands Phonetic Archives 7. Dordrecht: Foris.
Beckman, M. E., & Edwards, J. R. (1994). Articulatory evidence for differentiating stress categories. In P. A. Keating (Ed.), Phonological structure and phonetic form: Papers in laboratory phonology, Vol. III (pp. 7–33). Cambridge: Cambridge University Press.
Beckman, M. E., & Pierrehumbert, J. B. (1986). Intonational structure in English and Japanese. Phonology, 3, 255–309.
Bohn, O., & Flege, J. E. (1990). Interlingual identification and the role of foreign language experience in L2 vowel perception. Applied Psycholinguistics, 11, 303–328.
Chen, M. (1970). Vowel length variation as a function of the voicing of the consonant environment. Phonetica, 22, 129–159.
Cole, R. A., & Jakimik, J. (1980). How are syllables used to recognize words? Journal of the Acoustical Society of America, 67, 965–970.
Cole, R. A., Jakimik, J., & Cooper, W. E. (1978). Perceptibility of phonetic features in fluent speech. Journal of the Acoustical Society of America, 64, 44–56.
de Jong, K. J. (1991). An articulatory study of vowel duration changes in English. Phonetica, 48, 1–18.
de Jong, K. J. (1995). The supraglottal articulation of prominence in English: Linguistic stress as localized hyperarticulation. Journal of the Acoustical Society of America, 97, 491–504.
de Jong, K. J. (2000). Attention modulation and the formal properties of stress systems. In J. Boyle, J.-H. Lee, & A. Okrent (Eds.), Chicago Linguistic Society 36, Vol. 1 (pp. 71–91). Chicago: Chicago Linguistic Society.
de Jong, K. J., & Zawaydeh, B. A. (2002). Comparing stress, lexical focus, and segmental focus: Patterns of variation in Arabic vowel duration. Journal of Phonetics, 30, 53–75.
Edwards, J. R., Beckman, M. E., & Fletcher, J. (1991). The articulatory kinematics of final lengthening. Journal of the Acoustical Society of America, 89, 369–382.
Engstrand, O. (1988). Articulatory correlates of stress and speaking rate in Swedish CV utterances. Journal of the Acoustical Society of America, 88, 1863–1875.
Erickson, D. (1998). Effects of contrastive emphasis on jaw opening. Phonetica, 55, 147–169.
Erickson, D. (2002). Articulation of extreme formant patterns of emphasized vowels. Phonetica, 59, 134–149.
Erickson, D., Fujimura, O., & Pardo, B. (1998). Articulatory correlates of prosodic control: Emotion and emphasis. Language and Speech, 41, 399–417.
Flege, J. E., & Port, R. F. (1981). Cross-language phonetic interference: Arabic to English. Language and Speech, 24, 125–146.
Fox, R. A., & Terbeek, D. (1977). Dental flaps, vowel duration, and rule ordering in American English. Journal of Phonetics, 5, 27–34.
Gundel, J. K. (1999). On different kinds of focus. In P. Bosch, & R. van der Sandt (Eds.), Focus: Linguistic, cognitive, and computational perspectives (pp. 293–305). Cambridge: Cambridge University Press.
Harrington, J., Fletcher, J., & Roberts, C. (1995). Coarticulation and the accented/unaccented distinction: Evidence from jaw movement revisited. Journal of Phonetics, 23, 305–322.
Harris, K. S. (1978). Vowel duration changes and its underlying physiological mechanisms. Language and Speech, 21, 354–361.
Hillenbrand, J., Getty, L. A., Clark, M. J., & Wheeler, K. (1995). Acoustic characteristics of American English vowels. Journal of the Acoustical Society of America, 97, 3099–3111.
Kent, R. D., & Netsell, R. (1971). Effects of stress contrasts on certain articulatory parameters. Phonetica, 24, 23–44.
Lindblom, B. E. F. (1963). Spectrographic study of vowel reduction. Journal of the Acoustical Society of America, 35, 1773–1781.
Lindblom, B. E. F. (1990). Explaining phonetic variation: A sketch of the H&H theory. In W. J. Hardcastle, & A. Marchal (Eds.), Speech production and speech modeling, NATO ASI Series D: Behavioural and social sciences, Vol. 55 (pp. 403–439). Dordrecht: Kluwer Academic Publishers.
Maddieson, I. (1985). Phonetic cues to syllabification. In V. Fromkin (Ed.), Phonetic linguistics: Essays in honor of Peter Ladefoged (pp. 203–221). New York: Academic Press.
Mitleb, F. (1984a). Voicing effect on vowel duration is not an absolute universal. Journal of Phonetics, 12, 23–27.
Mitleb, F. (1984b). Vowel length contrast in Arabic and English: A spectrographic test. Journal of Phonetics, 12, 229–235.
Munro, M. J. (1993). Productions of English vowels by native speakers of Arabic: Acoustic measurements and accentedness ratings. Language and Speech, 36, 39–66.
Öhman, S. E. G. (1967). Word and sentence intonation: A quantitative model. Speech Transmission Laboratory Quarterly Progress and Status Report, 1967(2–3), 20–54.
Peterson, G. E., & Lehiste, I. (1960). Duration of syllable nuclei in English. Journal of the Acoustical Society of America, 32, 693–703.
Summers, W. V. (1987). Effects of stress and final-consonant voicing on vowel production: Articulatory and acoustic analyses. Journal of the Acoustical Society of America, 82, 847–863.
Tajima, K., Zawaydeh, B. A., & Kitahara, M. (1999). A comparative study of speech rhythm in Arabic, English, and Japanese. In J. J. Ohala, Y. Hasegawa, M. Ohala, D. Granville, & A. C. Bailey (Eds.), Proceedings of the XIVth International Congress of Phonetic Sciences, Vol. 1 (pp. 285–288). Berkeley, CA: University of California.
Tiffany, W. R. (1958). Non-random sources of variation in vowel quality. Journal of Speech and Hearing Research, 2, 305–325.
Van Heuven, V. J. (1994). What is the smallest prosodic domain? In P. A. Keating (Ed.), Phonological structure and phonetic form: Papers in laboratory phonology, Vol. III (pp. 76–97). Cambridge: Cambridge University Press.
Walker, J. (1787). Elements of elocution. London: J. Walker.
Zawaydeh, B. A., & de Jong, K. J. (1999). Stress, phonological focus, quantity, and voicing effects on vowel duration in Ammani Arabic. In J. J. Ohala, Y. Hasegawa, M. Ohala, D. Granville, & A. C. Bailey (Eds.), Proceedings of the XIVth International Congress of Phonetic Sciences, Vol. 1 (pp. 451–454). Berkeley, CA: University of California.
Zawaydeh, B. A., Tajima, K., & Kitahara, M. (2002). Discovering Arabic rhythm through a speech cycling task. In Perspectives on Arabic linguistics, Vol. XIII–XIV (pp. 39–58). Amsterdam: John Benjamins Publishing Company.
Zimmerman, S. A., & Sapon, S. M. (1958). Note on vowel duration seen crosslinguistically. Journal of the Acoustical Society of America, 30, 152–153.
