Post-focus F0 compression—Now you see it, now you don’t


Journal of Phonetics 38 (2010) 517–525

Contents lists available at ScienceDirect

Journal of Phonetics journal homepage: www.elsevier.com/locate/phonetics

Letter to the Editor

Yiya Chen

Phonetics Laboratory, Leiden University Center for Linguistics (LUCL), Cleveringaplaats 1, P.O. Box 9515, 2300 RA Leiden, The Netherlands

Article info

Article history: Received 13 June 2009; received in revised form 25 June 2010; accepted 29 June 2010

Abstract

We report data on post-focus realization of lexical tones in Standard Chinese, comparing them to their counterparts in the pre-focus and on-focus conditions reported in Chen and Gussenhoven (2008). While post-focus lexical tones are often realized with a compressed F0 range, as observed in prior studies, in some tonal contexts their F0 range is expanded. In addition, some post-focus tones also show a long-lasting effect of the preceding tone, but with a pattern different from the usual tonal coarticulation effect. What consistently differentiates the post-focus from the on-focus condition is the degree of distinctiveness in the lexical tonal contours, in particular when the preceding focused tone is High. These data suggest that F0 range compression is neither the only nor the primary characteristic of post-focus tonal realization. Rather, the observed post-focus effects may be viewed as the multifaceted manifestations of weak implementation of post-focus tonal targets, as they are associated with prosodically non-prominent constituents.

© 2010 Elsevier Ltd. All rights reserved.

1. Introduction

By now it is well known that in speech communication, the same string of phonetic segments is often pronounced differently depending on the communicative context and the speaker’s intention. Take (1) as an example: if the speaker intends to emphasize that the place Mary went to last year was Tibet, not Hong Kong, Tibet is contrastively focused (indicated with capital letters in (1)) and pronounced with prominence. By contrast, Tibet in (2), as given information in a post-focus position, sounds much less prominent. Such a way of packaging an utterance to integrate it into the information flow of ongoing discourse is commonly referred to as the prosodic encoding of information status.¹

(1) Mary traveled in TIBET last year. (She did not travel in Hong Kong.)
(2) Mary TRAVELED in Tibet last year. (She did not work there.)

Much work has been done on how such information status-induced prominence difference is instantiated in different languages.² Impressionistically speaking, focus often raises or expands the F0 range, while post-focus materials, which are usually given, typically show lowered or compressed F0 contours (for Germanic languages, see, among others, Bartels & Kingston, 1994; Baumann, Grice, & Steindamm, 2006; Cooper, Eady, & Mueller, 1985; Cruttenden, 2006; Eady, Cooper, Klouda, Mueller, & Lotts, 1986; Féry & Kügler, 2008; Grønnum, 1992; Ishihara & Féry, 2006; for Sinitic languages, see, among others, Chen, in press; Gårding, 1987; Jin, 1996; Liao, 1994; Pierrehumbert, 1980; Shih, 1988; Xu, 1999; Yuan, 2004).

This general pattern of F0 expansion/raising and compression/lowering has been hypothesized in different languages as being due to different linguistic representations/organizations. For example, in West-Germanic languages, where F0 does not typically differentiate lexical meanings, a distinction is usually recognized between the effects of focus and post-focus on the phonological tonal representation of the intonational contour. Specifically, the presence of pitch accents on focused constituents and the absence of pitch accents on post-focal constituents have been considered structural devices in the expression of focus (Gussenhoven, 1983, 1984; Ladd, 1980, 1996; Selkirk, 1984, 1996). Over the last decades, behavioural studies of both speech production and comprehension have provided considerable evidence for a close correspondence between information status and (de-)accentuation in Germanic languages (e.g., Cutler, 1984; Nooteboom & Kruyt, 1987; Terken & Nooteboom, 1987).

Tel.: +31 71 527 1688; fax: +31 71 527 7569. E-mail address: [email protected]

¹ In this paper, we are concerned mainly with contrastive focus, defined here as focus elicited to make a correction. For more details on different notions of information structure, see Féry, Fanselow, and Krifka (2007). See also Gussenhoven (2007) and Selkirk (2006) for different types of focus.

² Languages differ in the obligatoriness of expressing information status prosodically. In some languages such as Northern Sotho (Zerbian, 2006), focus has been reported to show no prosodic variation. Other grammatical correlates such as word order and morphological marking may also be employed in different languages to encode information structure, depending on the language’s general linguistic properties.

0095-4470/$ - see front matter © 2010 Elsevier Ltd. All rights reserved. doi:10.1016/j.wocn.2010.06.004


Y. Chen / Journal of Phonetics 38 (2010) 517–525

More recently, an increasing amount of new research suggests that the effect of focus may not be purely structural. Féry and Kügler (2008) show, with data from German, that “focus raises tones while givenness lowers them in prenuclear position and cancels them out postnuclearly”, and propose that “[T]hese changes in the values of accents were explained by the influence information structure has on reference lines associated with prosodic domains”. This, to a certain extent, is in line with earlier studies which suggest a more gradient nature of focus-induced F0 modification (e.g., Cooper et al., 1985; Eady et al., 1986; Xu & Xu, 2005). This work also suggests that focus in West-Germanic languages may be viewed as being two-fold. On the one hand, focus requires the focused element to be aligned with the head of a prosodic domain (i.e., a structural re-alignment of the intonational tier with the segmental tier). On the other hand, focus introduces the modification of pitch range of prosodic domains, which, in turn, affects the F0 realization of pitch accents (i.e., a more gradient and phonetic effect of pitch range expansion). These two effects go hand-in-hand in marking focal prominence.

In languages such as Chinese, where F0 movements indicate lexical contrast, speakers do not have the option of accenting or de-accenting words according to their information status. There has been a long tradition of representing focus realization as manifested via pitch range manipulation (Gårding, 1987; Jin, 1996; Shih, 1988; Xu, 1999, 2005; Xu & Wang, 2001). Specifically, “the pitch range of the focused region is expanded; that of the post-focus region compressed; and that of the pre-focused region left largely neutral” (Xu, 2005: p. 235). With data on F0 range as well as on the changes of F0 contours of lexical tones as a function of the durational change of the tone-bearing syllables, Chen (2003) argues that focus does not just introduce pitch range manipulation.
Rather, the effect of focus is better accounted for by appealing to an abstract notion of prosodic prominence. Chen and Gussenhoven (2008) argue further that, just as in English, focused elements in Standard Chinese are associated with high-level prosodic prominence of the utterance. In English, this representation leads to the structural effect of focus, i.e., the association of nuclear pitch accent with, for example, the contrastively focused constituent (Selkirk, 2002). Focus-induced prosodic prominence also correlates with syllables that are accented and articulated with greater articulatory force so that phonemic contrasts among vowels and consonants are enhanced (Chen, 2008; Cho, 2005; de Jong, 1995; Erickson, 2002). In languages such as Mandarin Chinese, where the addition of a pitch accent is impossible, such structural prominence is manifested in the greater articulatory force that is applied to both the segments and tones of the syllables of the focus constituent.

In the post-focus condition, a number of studies show the effect of pitch range compression (e.g., Jin, 1996; Xu, 1999; Shih, 1988). Chen (2003) and Chen and Gussenhoven (2008), however, report data which show that post-focus pitch range compression may be neither the only nor the primary characteristic of post-focus tonal realization. In certain tonal contexts, post-focus lexical tones may be realized with an F0 range that is expanded, particularly when compared to the same tone uttered in the pre-focus position. This phenomenon, however, has been reported rather briefly and invites further and more detailed investigation.

The goal of this study is, therefore, to analyze and present a more complete set of data (compared to Chen & Gussenhoven, 2008), taken from a larger number of subjects (compared to Chen, 2003). The central question addressed here is exactly how post-focus lexical tones are realized in different tonal contexts.
Specifically, we aimed to answer the following two sub-questions:

(1) Is F0 range compression a consistent and primary cue for post-focus tonal realization?

(2) How does post-focus tonal realization differ from the pre-focus and on-focus lexical tonal realization?

Our data confirm that although post-focus lexical tones are sometimes realized with a compressed F0 range, they can also be realized with a more expanded F0 range than their pre-focus counterparts, raising questions about a purely F0 range-manipulation view of focus realization. Furthermore, we will show that the specific F0 contours of post-focus lexical tones are subject to the influence of (mainly) preceding as well as (sometimes) following tonal contexts. Despite the sometimes expanded F0 range, post-focus lexical tones are consistently realized with significantly less distinctive F0 contours than their on-focus counterparts, especially when the preceding on-focus tone is High. Here, we define distinctiveness in terms of acoustic differentiation between F0 contours of the four lexical tones, which in turn should affect the perceptual salience of the four distinct tonal categories. In other words, the acoustic distinctiveness of an F0 contour can be viewed as the degree of informativeness of this contour in distinguishing the tonal category it denotes from contrastive tonal categories. We will show that although pitch range compression is one likely manifestation of weak tonal articulation, the wide range of post-focus effects on tonal realization (F0 range compression/expansion, lowering/raising, and the lack of contour distinctiveness) cannot simply be modelled as post-focus pitch range compression.

2. Methods

Post-focus data were elicited together with another data set reported in Chen and Gussenhoven (2008), where the target syllables were produced in the pre-focus and on-focus discourse contexts. This makes it possible to compare post-focus tonal realizations with their counterparts in the on-focus and pre-focus conditions. Details of the experimental setup were introduced in Chen (2003) and Chen and Gussenhoven (2008). For the readers’ convenience, we recapitulate the basic methodology in the following sections.

2.1. Test materials

The target syllable in this production experiment is indicated as Y in the template sentence in (3), while the preceding and following syllables are identified as X and Z, respectively. For Y, all four lexical tones (i.e., High, Low, Rising, and Falling) were included, and all target syllables have a complex CGVG syllable structure (i.e., miao). The preceding syllable X varies between shuō (‘to say’) with a High tone (H) and xiě (‘to write’) with a Low tone (L). The following syllable Z varies between nán (‘difficult’) with a Rising tone (LH) and màn (‘slow’) with a Falling tone (HL).

The choice of these words was based on the following considerations: we needed to construct sentences that were readily interpretable with neutral semantic context and grammatical syntactic structure, and the words chosen also needed to have phonotactic structures that are easy to segment. Furthermore, we wanted to control the preceding and following tonal context. Including all four lexical tones of Standard Chinese in both the preceding and the following syllable would increase the number of tonal contexts to sixteen. It was difficult, if not impossible, to find stimuli that satisfied the above syntactic, semantic, and phonetic requirements. As a compromise, the target syllable Y was preceded by tones that end high or low and followed by tones that start high or low. In total, sixteen stimulus sentences were included.

(3) zhōu bīn shuō X Y Z hěn duō
    Zhōu Bīn said X Y Z very more
    ‘Zhōu Bīn (proper name) said it is much more Z (difficult/slow) to X (write/say) Y (target syllable with four different tones).’
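The stimulus design described above (two preceding syllables, four target tones, two following syllables) can be enumerated programmatically. The following is a minimal sketch, not the script used in the study. The tone-marked spellings of the target syllable (miāo, miáo, miǎo, miào) follow standard pinyin conventions and are an assumption here, as are all variable and function names.

```python
from itertools import product

# Carrier sentence from template (3); X, Y and Z are filled in below.
CARRIER = "zhōu bīn shuō {x} {y} {z} hěn duō"

# Preceding syllable X: ends high (H) or low (L).
PRECEDING = {"H": "shuō", "L": "xiě"}
# Following syllable Z: starts low (Rising, LH) or high (Falling, HL).
FOLLOWING = {"LH": "nán", "HL": "màn"}
# Target syllable Y with the four lexical tones.
# ASSUMPTION: tone-marked forms follow standard pinyin spelling.
TARGET = {"H": "miāo", "LH": "miáo", "L": "miǎo", "HL": "miào"}


def build_stimuli():
    """Return one record per combination of preceding, target and following tone."""
    stimuli = []
    for x_tone, y_tone, z_tone in product(PRECEDING, TARGET, FOLLOWING):
        sentence = CARRIER.format(
            x=PRECEDING[x_tone], y=TARGET[y_tone], z=FOLLOWING[z_tone]
        )
        stimuli.append(
            {
                "preceding_tone": x_tone,
                "target_tone": y_tone,
                "following_tone": z_tone,
                "sentence": sentence,
            }
        )
    return stimuli


if __name__ == "__main__":
    for item in build_stimuli():
        print(item["preceding_tone"], item["target_tone"],
              item["following_tone"], item["sentence"])
```

The 2 × 4 × 2 crossing yields the sixteen stimulus sentences mentioned in the text.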

2.2. Discourse context

The target syllables were elicited in the post-focus condition. In other words, their preceding syllable (i.e., X) was always elicited with focus. Subjects were first given a sentence in Chinese characters on a computer screen, labelled ‘correct information’. An example in pinyin (romanized Chinese orthography) is given in (4). Subsequently, they were given a new sentence in which one element was different, this time with the label ‘incorrect information’, as illustrated in (5), again in pinyin (the contrasting pair in (4) and (5) is shuō vs. xiě). On the same screen, they were given the instruction to provide a correction, as in (6). A typical answer from the speakers, with emphasis on the second shuō (the X syllable), is illustrated in (7).

(4) Correct information:
    zhōu bīn shuō shuō miāo nán hěn duō
    ‘Zhoubin said that it is more difficult to say miao.’
(5) Incorrect information:
    zhōu bīn shuō xiě miāo nán hěn duō
    ‘Zhoubin said that it is more difficult to write miao.’
(6) Context for eliciting focus on the preceding syllable:
    Suppose you gave the correct information in sentence (4), and the experimenter thought you said sentence (5), how would you correct the experimenter?
(7) Response with the target syllable in post-focus position:
    zhōu bīn shuō shuō miāo nán hěn duō
    ‘Zhoubin said that it is more difficult to say miao.’

2.3. Subjects and recording

Three male and two female speakers of Standard Chinese participated in the experiment. The recordings were made in a sound-treated booth, with three participants at Stony Brook University via a Sony Digital Mega Bass MZ-R55 mini recorder, and two participants at Radboud University Nijmegen via a DAT recorder. The test materials were randomized and presented to speakers who were unaware of the purpose of the experiment. Each participant completed three repetitions with different orders of stimuli. Recordings were made at a sampling rate of 44,100 Hz (and later downsampled to 16,000 Hz). During the recording, whenever the experimenter failed to perceive the intended pragmatic meaning, the experimenter would remind the participant to pay attention to the question on the screen and then ask the participant to try producing the answer again. The complete data set of all five participants was reprocessed and reanalyzed, to be compared with the analyses performed in Chen (2003) and Chen and Gussenhoven (2008).

2.4. Acoustic analysis

The start and end of each of the syllables designated X, Y and Z were manually labelled in Praat (Boersma & Weenink, 1996–2005) and syllable duration was derived. F0 contours were obtained by (1) taking 20 F0 points (in Hz) at proportionally equal time intervals between the acoustic onset and offset of the syllable and (2) averaging across three repetitions of the same sentence uttered by the same speaker for each discourse context separately. These F0 values were then transformed into semitones and averaged across speakers. The normalized syllable duration is the mean duration of the target syllable averaged across speakers, repetitions, and syllable structures.

In addition, different F0 maxima (max F0) and minima (min F0) were measured to gain a quantitative understanding of the compression of F0 range in the post-focus condition. These quantitative data were also subjected to statistical analyses. Specifically, we first took the max F0 and min F0 over the whole pitch contour of the tone-bearing syllable. The pitch range derived from these measurements (i.e., F0 range within the tone-bearing syllable) will be referred to as SyllableRange. We define

$\mathrm{SyllableRange\ (st)} = 12 \times \log_2(\mathrm{max\ F0} / \mathrm{min\ F0})$.

Note that the four lexical tones in Standard Chinese have distinct shapes. Tone-specific F0 measurements were therefore also taken to gain further understanding of pitch modifications of the post-focus lexical tones. The High and Low tones are known as static tones (Xu, 1999), and F0 at the end of the tone-bearing syllable is a good indicator of their realization (i.e., End F0 at the syllable offset). We define

$\mathrm{End\ F0\ (st)} = 12 \times \log_2(\mathrm{End\ F0\ (Hz)} / 100)$.

The Rising and Falling tones are known as dynamic tones (Xu, 1999); a better indicator of these tones is therefore the range of the F0 rise or fall, respectively (i.e., the ToneRange). So, the end of F0 rising (i.e., max′ F0) for the Rising tone and the end of F0 falling (i.e., min′ F0) for the Falling tone were taken, from which the ToneRange was derived, as illustrated in Fig. 1 (Fig. 1a for the Rising tone and Fig. 1b for the Falling tone).
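The contour extraction described in Section 2.4 (20 time-normalized points per syllable, averaging over repetitions, conversion to semitones, averaging over speakers) can be sketched as below. This is an illustration under stated assumptions rather than the processing script used in the study: linear interpolation between F0 samples, inclusion of both syllable edges among the 20 points, and a 100 Hz semitone reference for the contours are all assumptions, and every function name is hypothetical.

```python
import math
from bisect import bisect_left


def interpolate(times, values, t):
    """Linearly interpolate `values` at time t; clamp outside the track.

    ASSUMPTION: `times` is strictly increasing.
    """
    if t <= times[0]:
        return values[0]
    if t >= times[-1]:
        return values[-1]
    i = bisect_left(times, t)  # first index with times[i] >= t
    t0, t1 = times[i - 1], times[i]
    weight = (t - t0) / (t1 - t0)
    return values[i - 1] + weight * (values[i] - values[i - 1])


def sample_contour(times, f0_hz, onset, offset, n_points=20):
    """Take n_points F0 values at equal proportional steps from onset to offset."""
    step = (offset - onset) / (n_points - 1)
    return [interpolate(times, f0_hz, onset + k * step) for k in range(n_points)]


def hz_to_semitones(f0_hz, ref_hz=100.0):
    """Convert Hz to semitones relative to ref_hz (100 Hz is an assumption here)."""
    return 12.0 * math.log2(f0_hz / ref_hz)


def mean_contour(contours):
    """Point-by-point mean of equally long contours."""
    return [sum(points) / len(points) for points in zip(*contours)]


def speaker_mean_st(repetitions_hz, ref_hz=100.0):
    """Average a speaker's repetitions in Hz, then convert the mean to semitones."""
    return [hz_to_semitones(f, ref_hz) for f in mean_contour(repetitions_hz)]


def grand_mean_st(speaker_contours_st):
    """Average the per-speaker semitone contours across speakers."""
    return mean_contour(speaker_contours_st)
```

For example, `sample_contour(times, f0_hz, onset, offset)` applied to each repetition, followed by `speaker_mean_st` per speaker and `grand_mean_st` across speakers, would yield one mean contour per condition.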
An index of the range difference was calculated for the Rising tone as

$\mathrm{ToneRange\ (st)} = 12 \times \log_2(\mathrm{max'\ F0} / \mathrm{min\ F0})$

and for the Falling tone as

$\mathrm{ToneRange\ (st)} = 12 \times \log_2(\mathrm{max\ F0} / \mathrm{min'\ F0})$.

2.5. Statistical analysis

A linear mixed-effect model with Subject as a crossed random effect was run in R (Baayen, 2008). The independent fixed-effect predictors included (1) TONE (i.e., four lexical tones: H, LH, L, and HL), (2) PRECEDING TONE (i.e., High vs. Low tone), (3) FOLLOWING TONE (i.e., Falling vs. Rising tone), and (4) DISCOURSE (2 levels: Pre-focus vs. Post-focus). Note that the Pre-focus data were taken from Chen and Gussenhoven (2008). Our goal was to examine the consistency and robustness of speakers’ possible manipulations of F0 range as a signal or cue for post-focus tonal realization. We were therefore mainly interested in the effect of DISCOURSE and its possible interactions with other factors. Significant interactions that do not involve DISCOURSE may therefore not be covered (especially when such interactions were ordinal and their magnitudes were negligible).
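The range measures defined in Section 2.4 (SyllableRange, End F0, and ToneRange) can be made concrete with a short sketch. The text does not say how max′ F0 and min′ F0 were located, so the sketch assumes that max′ F0 is the highest point after the syllable minimum (end of the rise) and min′ F0 the lowest point after the syllable maximum (end of the fall). Function names are hypothetical.

```python
import math


def semitone_interval(f_high_hz, f_low_hz):
    """Interval in semitones between two frequencies: 12 * log2(f_high / f_low)."""
    return 12.0 * math.log2(f_high_hz / f_low_hz)


def syllable_range(contour_hz):
    """SyllableRange (st): max F0 relative to min F0 over the whole syllable."""
    return semitone_interval(max(contour_hz), min(contour_hz))


def end_f0(contour_hz, ref_hz=100.0):
    """End F0 (st): F0 at syllable offset relative to 100 Hz."""
    return semitone_interval(contour_hz[-1], ref_hz)


def tone_range(contour_hz, tone):
    """ToneRange (st) for the dynamic tones.

    ASSUMPTION: max' F0 is the maximum after the syllable minimum (Rising),
    and min' F0 is the minimum after the syllable maximum (Falling).
    """
    if tone == "rising":
        i_min = contour_hz.index(min(contour_hz))
        max_prime = max(contour_hz[i_min:])
        return semitone_interval(max_prime, contour_hz[i_min])
    if tone == "falling":
        i_max = contour_hz.index(max(contour_hz))
        min_prime = min(contour_hz[i_max:])
        return semitone_interval(contour_hz[i_max], min_prime)
    raise ValueError("tone must be 'rising' or 'falling'")
```

With a Rising-tone contour that starts high because of a preceding High tone, e.g. `[200.0, 100.0, 150.0]`, `syllable_range` spans the whole 200 to 100 Hz excursion, while `tone_range(..., "rising")` only measures the rise from 100 to 150 Hz. This is why the two measures can diverge in the post-focus data.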

3. Results

Fig. 2 plots the mean F0 contours of the four post-focus lexical tones, as compared to their counterparts in the Pre-focus condition (reported in Chen & Gussenhoven, 2008). These post-focus tones are preceded by a High (left column) or Low (right column) tone, and followed by a Rising tone. The black solid lines indicate the F0 contours in the post-focus condition (i.e., the preceding High or Low tone was focused), and the black dotted lines those in the pre-focus condition (i.e., the following Rising tone was focused³).

³ Here the semantic focus domain was the whole constituent man/nan duo le, but the most salient prosodic manifestation of focus was on the constituent duo.

Fig. 1. Spectrograms with superimposed F0 contours showing the max- and min-F0 points within the tone-bearing syllable as well as the start of fall or rise in the Rising (a) and Falling (b) tone, respectively. (The frequency range of the tonal contours is 60–180 Hz.) [Panels: Rising tone and Falling tone, each over the syllable miao; spectrogram frequency axis 0–5000 Hz; time scale bar 200 ms; marked points: min F0, max′ F0, and max F0 for the Rising tone, and max F0, min′ F0, and min F0 for the Falling tone; SyllableRange and ToneRange indicated.]

Fig. 2. F0 contours of the four lexical tones ([H]igh, [R]ising, [L]ow, and [F]alling) uttered in two discourse contexts (i.e., pre-focus and post-focus) when preceded by a High (left) or Low (right) tone and followed by a Rising [R] tone. (Pre-focus data adapted from Chen & Gussenhoven, 2008). [Axes: F0 (st), 0–16, against normalized time; legend: Post-focus, Pre-focus.]

Fig. 3. F0 range within the tone-bearing syllable (i.e., SyllableRange) as a function of the lexical tone ([H]igh, [R]ising, [L]ow, and [F]alling), preceding tone (High and Low), and discourse context (pre-focus and post-focus). [Axes: SyllableRange (st), 0–15, by lexical tone; panels: Preceding T: High and Preceding T: Low; legend: Pre-focus, Post-focus.]

It is important to note that although F0 compression was observed on the Rising tone of the last syllable (i.e., the Z syllable in the carrier sentence), it is clearly an inconsistent manifestation of post-focus tonal realization for the syllable (i.e., the target Y syllable) that immediately follows the on-focus syllable (i.e., the X syllable in the carrier sentence).

With regard to the F0 range within the tone-bearing syllable (i.e., SyllableRange), there was a significant DISCOURSE × TONE × PRECEDING TONE interaction (SyllableRange: [F(1, 464) = 3, p < .05]). This is further illustrated in Fig. 3. All four lexical tones clearly showed greater SyllableRange in the post-focus condition than in the pre-focus condition, although the magnitude of the difference varied for each lexical tone.

The four lexical tones in Standard Chinese differ in the role of pitch range in their canonical F0 contour realization. As mentioned earlier, a good indicator for the High and Low tones is the End F0. Results showed a significant DISCOURSE × TONE × PRECEDING TONE interaction on their End F0 [F(1, 224) = 37, p < .0001]. Fig. 4 shows that this interaction was partly due to the significant lowering of End F0 in the post-focus position when the preceding tone was High. When the preceding tone was Low, there was no significant difference between the pre- and post-focus conditions regardless of the target lexical tone.

For the Rising and Falling tones, a better indicator of their tonal realization is ToneRange. There were two significant three-way interactions for the ToneRange (DISCOURSE × TONE × PRECEDING TONE: [F(1, 224) = 58, p < .0001]; DISCOURSE × TONE × FOLLOWING TONE: [F(1, 224) = 6, p < .05]). As Fig. 5 shows, when the preceding tone was High, there was a post-focus F0 range compression in the Rising tone but an expansion in the Falling tone. When the preceding tone was Low, post-focus Rising and Falling tones both showed an effect of pitch range expansion, though with a much greater magnitude in the Rising tone. This general pattern holds true both when the following tone was Rising (Fig. 5a) and when it was Falling (Fig. 5b).
Fig. 4. End F0 of the High and Low tones as a function of the lexical tone (High and Low), preceding tone (High and Low), and discourse context (pre-focus and post-focus). [Axes: End F0 (st), 0–16, by lexical tone; panels: Preceding T: High and Preceding T: Low; legend: Pre-focus, Post-focus.]

Fig. 5. (a) ToneRange of the Rising and Falling tones (followed by a Rising tone) as a function of the preceding tone (High and Low) and discourse context (pre-focus and post-focus). (b) ToneRange of the Rising and Falling tones (followed by a Falling tone) as a function of the preceding tone (High and Low) and discourse context (pre-focus and post-focus). [Axes: ToneRange (st), 0–5, by lexical tone; panels: Preceding T: High and Preceding T: Low; legend: Pre-focus, Post-focus.]

The following tone in general did not have much influence on the target tone. The magnitude of post-focus F0 expansion/compression, however, varied slightly as a function of different following tones.

Another pattern worth noting is the significant effect of the Preceding tone on post-focus tonal realization. Fig. 6 plots the realization of the four target lexical tones when they were preceded by High (solid lines) or Low tones (dotted lines). In this graph, we compared data in the post-focus condition (left column) with those in the on-focus condition (right column) reported in Chen and Gussenhoven (2008). It is clear that the effect of the Preceding tone was much greater when the target tone was produced in the post-focus condition than in the on-focus condition, except for the Low tone, where, due to Low tone sandhi,⁴ the Preceding tone (i.e., the first Low tone) was realized with a rising F0 contour and the second, post-focus Low tone showed an F0 contour comparable to that after a focused High tone. There was also an interesting dissimilatory effect, particularly on the post-focal High and Rising tones, where the on-focus Low tone raised the End F0 of its following tone and the on-focus High tone lowered the End F0 of its following tone.

The preceding-tone effect lasted longer in the post-focus condition than in the on-focus condition, confirming a pattern discussed briefly in Chen and Gussenhoven (2008). As shown in Fig. 7, by the end of the tone-bearing syllable, the Preceding tone did not exhibit any significant effect in the on-focus condition. In contrast, in the post-focus condition, the Preceding tone showed a significant effect on the End F0 of the target tone in the High and Rising tones, as well as a clear trend in the Falling tone.

⁴ Low tone sandhi refers to the realization of a Low tone with a rising F0 contour before another Low tone when they are phrased into the same tone sandhi domain. For more details, readers are referred to Chen (2000), Duanmu (2000), and Chen and Yuan (2007), among others. It is clear that Low tone sandhi was applied by the speakers in this study despite the fact that the first Low tone word was under focus (see a similar finding in Chen, 2004).

By now it should be clear that pitch range compression is by no means a reliable cue for tonal realization in the post-focus condition immediately after an on-focus lexical tone. What quite consistently differentiated the post-focus from the on-focus condition was the degree of distinctiveness in the F0 contours of their tonal realization. The magnitude of the difference in acoustic distinctiveness is particularly striking when the preceding on-focus tone was High, as shown in Fig. 8, where part of the data in Fig. 6 were re-plotted to bring out the post- vs. on-focus contrast in the distinctiveness of tonal realization more vividly. This figure shows the realization of the four target lexical tones in the post-focus vs. on-focus conditions when the target tones were preceded by a High tone. The F0 trajectories of the four lexical tones in the on-focus condition differed substantially from the F0 trajectories of the post-focus lexical tones. More importantly, the four averaged F0 contours in the on-focus condition were very distinguishable, making the tonal contours a particularly informative cue for the identification of tonal contrast.
In contrast, in the post-focus condition, there was much less difference amongst the contours of the four lexical tones, making the F0 trajectories much less informative in signalling tonal contrasts.
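The notion of contour distinctiveness used here is described qualitatively, as acoustic differentiation between the F0 contours of the four lexical tones. One way to put a number on it, offered purely as an illustrative assumption and not as a measure used in this study, is the mean pairwise root-mean-square (RMS) distance between the time-normalized semitone contours within a condition. The function names and the toy values in the usage note are hypothetical.

```python
import math
from itertools import combinations


def rms_distance(contour_a, contour_b):
    """RMS difference (st) between two equally long semitone contours."""
    if len(contour_a) != len(contour_b):
        raise ValueError("contours must have the same number of points")
    squared = [(a - b) ** 2 for a, b in zip(contour_a, contour_b)]
    return math.sqrt(sum(squared) / len(squared))


def contour_distinctiveness(contours_by_tone):
    """Mean pairwise RMS distance across all tone pairs in one condition.

    ASSUMPTION: this index is an illustrative stand-in for the qualitative
    notion of distinctiveness; larger values mean better separated contours.
    """
    pairs = list(combinations(sorted(contours_by_tone), 2))
    total = sum(
        rms_distance(contours_by_tone[p], contours_by_tone[q]) for p, q in pairs
    )
    return total / len(pairs)
```

With made-up two-point contours such as `{"H": [12, 12], "R": [6, 12], "L": [4, 4], "F": [12, 6]}` for an on-focus condition and `{"H": [8, 8], "R": [7, 8], "L": [7, 7], "F": [8, 7]}` for a post-focus condition, the index is larger for the first set, which mirrors the qualitative contrast described above.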

4. Discussion and conclusion Results of the experiment show that F0 compression cannot be the only and primary cue for post-focus tonal realization, as illustrated in Fig. 2. When the total F0 range over the tone-bearing syllable was considered (Fig. 3), lexical tones immediately following an on-focus syllable were realized with an F0 range that was more expanded than their pre-focus counterparts. When only the tone-intrinsic F0 range/End F0 was considered (Fig. 4 for the two static tones—High and Low; Fig. 5 for the two contour tones—Rising and Falling), post-focus lexical tones were sometimes realized with a compressed range or lowered F0 value and sometimes with an expanded range or raised F0, depending on the tonal context. Specifically, a High tone showed a significantly lowered End F0 in the post-focus condition only when the preceding tone was High; a Low tone, however, showed no difference in its End F0 between the post-focus and pre-focus conditions (Fig. 4). As for the two contour tones, a post-focus Rising tone was realized with a compressed F0 range after a High tone but with an expanded F0 range after a Low tone. A post-focus Falling tone, however, was realized with an expanded F0 range after a High tone but had neither compression nor expansion after a focused Low tone (Fig. 5). In short, while there is some evidence to confirm observations in prior studies on post-focus pitch range compression and F0 lowering (e.g., Jin, 1996; Xu, 1999), questions arise with regard to the modelling of cases where post-focus tones showed pitch range expansion or raising. One may argue that such post-focus pitch range expansion can be explained as being due to the preceding on-focus tone. One possibility is that the pitch range expansion of the preceding

522

Y. Chen / Journal of Phonetics 38 (2010) 517–525

On-Focus

16

16

12

12 F0 (st)

F0 (st)

Post-Focus

H

8

H

8

4

4

0

0 0

0.5

0 16

12

12 F0 (st)

F0 (st)

16

R

8

R

8 4

4

0

0 0

0

0.5 Normalized Time 16

12

12 F0 (st)

F0 (st)

0.5 Normalized Time

16

L

8

L

8

4

4

0

0 0

0.5

0

Normalized Time

0.5 Normalized Time

16

16

12

12 F

F0 (st)

F0 (st)

0.5 Normalized Time

Normalized Time

8

F

8 4

4

0

0 0

0.5 Normalized Time

0

0.5 Normalized Time

Fig. 6. F0 contours of the four lexical tones ([H]igh, [R]ising, [L]ow, and [F]alling) uttered in two discourse contexts (on-focus and post-focus) when preceded by a High (solid line) or Low (dotted line) tone (and followed by a Rising tone not shown here).

on-focus tone resulted in delayed tonal realization into the following post-focus syllable, which leads to post-focus F0 range expansion. Fig. 2, together with impressionistic observations of the corpus, suggests that this cannot be the case as the F0 peak and valley of the preceding on-focus High and Low tones respectively were typically realized within the preceding tonebearing syllable. For example, the mean F0 contours in Fig. 2 show that the preceding on-focus High tone exhibited a subtle but clear tendency of F0 lowering within the tone-bearing syllable. Concerning the other tones, there was neither plateau nor continued F0 rising within the post-focus syllable, suggesting that the F0 peak of the on-focus High tone was realized within its own tone-bearing syllable. Similarly, there was no evidence that the on-focus Low tone was realized in its following post-focus syllable. Another possibility is that the expanded F0 range resulted from the carryover effect of the preceding tone and the transition

from the preceding on-focus tone to the target post-focus tone. This does explain some data such as the expanded SyllableRange for the Rising tone after an on-focus High tone. As for the postfocus Falling tone, a falling contour was the intended tonal gesture, making it difficult to tease apart possible carryover effects of the preceding on-focus tone from the effect of postfocus target tone implementation. Note further that although the Preceding tone did have significant influence on post-focus tonal realization (Fig. 6), both the direction and magnitude of the Preceding tone effect were different from the commonly observed tonal co-articulation effect (Xu, 1997). For High and Rising tones, we observed an interesting dissimilatory effect, where the preceding on-focus Low tone raised their End F0 but the preceding on-focus High tone lowered their End F0. This is reminiscent of the report in Xu (1995) which describes that an emphasized Low tone exerts an additional


dissimilation effect on the following tone, raising (part of) the F0 contour of the following tone. The typical carryover effect of high-ending and low-ending tones should raise and lower their following tonal contours, respectively (Xu, 1997). This general pattern was indeed manifested in the realization of on-focus tones (Fig. 6). This makes it clear that the observed preceding tone effect on post-focus F0 realization cannot simply be attributed to the coarticulatory influence of the preceding tone in a commonly acknowledged sense. Furthermore, F0 range expansion and compression should be treated differently from F0 raising and lowering. Why do focused High and Low tones show such different effects on their post-focus lexical tonal realization? We know that high and low F0 values are produced with different laryngeal muscle configurations (Zemlin, 1988). It is possible that after a focused High or Low tone, the laryngeal muscles employed in implementing the tone may exert opposite forces in the implementation of the following post-focus tone, leading to F0 lowering after a focused High tone but raising after a focused Low tone. In the latter case, such a force also seemed to create an opportunity for relatively more distinct realization of post-focus tones, compared to those after an on-focus High tone. The data here seem to be somewhat similar to the post-L bounce of the neutral tones (which are associated with reduced syllables like English schwa) after a Low tone (Chen & Xu, 2006). Unlike the neutral tone, the lexical tones after a focused Low tone are often realized with F0 trajectories reminiscent of their characteristic tonal contours despite their much reduced distinctiveness, suggesting that post-focus syllables in Standard Chinese are not as reduced as neutral tone syllables. To further evaluate this

[Figure 7 plot: y-axis Mean EndF0 (st), 0 to 16; x-axis target tones H, R, L, F grouped by Preceding H and Preceding L; series On-focus and Post-focus.]

Fig. 7. The effect of preceding tone (High and Low) on the End F0 of the four lexical tones ([H]igh, [R]ising, [L]ow, and [F]alling) as a function of the discourse context (on-focus and post-focus).

possibility, the exact nature and laryngeal mechanism of such post-focus bounces need to be investigated in the future. If we are indeed on the right track, we should also expect more evidence for weak implementation of post-focus tonal targets. Fig. 7 illustrates one such aspect, concerning the effects of Preceding tone on the on-focus and post-focus lexical tonal realization. By the end of the tone-bearing syllable, on-focus lexical tones showed no significant effect of their Preceding tone while post-focus lexical tones showed quite a significant effect of the Preceding tone, especially in the High and Rising tones. This suggests weakened effectiveness of the post-focus tonal targets in overcoming the influence of the preceding tone, as compared to their on-focus counterparts. Further evidence of weak implementation comes from the lack of distinctiveness in post-focus tonal realization. As shown in Fig. 8, when preceded by a High tone, all four lexical tones showed declining F0 contours over the tone-bearing syllable in the post-focus position (Fig. 8a), which presents a sharp contrast to the on-focus tonal realization (Fig. 8b) where the F0 contours were much more distinctive and therefore more informative in signalling the tonal contrasts. Although the extent of reduction in terms of tonal distinctiveness varied as a function of the Preceding tone (i.e., the reduction was much more severe after an on-focus High tone than after an on-focus Low tone), the degree of distinctiveness in the lexical tonal contours quite consistently differentiated the post-focus from the on-focus tonal realization. We propose that the weak implementation of post-focus lexical tones is due to the fact that they are associated with prosodically non-prominent constituents and are therefore hypoarticulated.
Such hypoarticulation makes it possible for the preceding on-focus lexical tones to exert a strong influence on the post-focus tones. This, together with the requirement for focal elements to have a downstream F0 effect, hinders effective implementation of the post-focus lexical tonal targets, which are consequently realized with reduced degrees of distinctiveness compared to their on-focus counterparts. In other words, the long-lasting effect of the preceding tone, the sometimes observed post-focus pitch range expansion, and the consistently observed lack or reduced degree of distinctiveness in post-focus tonal contours may all be due to the weak implementation of post-focus tonal targets. Thus, the data on post-focus tonal realization provide additional evidence that the phonological reflex of contrastive focus does not have to differ between Standard Chinese, a tonal language, and West-Germanic languages such as English and German. Focus in both types of languages introduces prosodic prominence, as argued by Chen and Gussenhoven (2008). While both types of languages show the effect of pitch range compression, the variation between the two typologically different groups

Fig. 8. F0 contours of the four lexical tones ([H]igh, [R]ising, [L]ow, and [F]alling) uttered in two discourse contexts (post-focus in Fig. 8a; on-focus in Fig. 8b). The tones were elicited in a carrier sentence (not shown here) with a High tone as the preceding tone.


of languages lies in their different instantiation of the prosodic prominence in terms of their available phonological organization and phonetic cues. In West-Germanic languages, post-focus leads to deaccenting while in Standard Chinese, post-focus leads to reduction in tonal realization. The degree of tonal reduction is determined, in part, by the tonal contexts. The post-focus data reported here thus contribute further to the growing body of evidence that the prosodic encoding of information status is achieved by different grammatical correlates in different languages, depending on the general properties of the language and the available phonological and phonetic cues for prominence marking in particular linguistic contexts (see, e.g., Arvaniti, Ladd, & Mennen, 2006 for focus marking in Greek). Three questions, however, remain urgent for future investigation. One concerns the specific definition of prosodic prominence as introduced by information status variation and how it relates to the prosodic structure of the language in general. A second one concerns the way prosodic prominence is instantiated over focused constituents larger than the one-syllable domains investigated in this project.5 Lastly, it is also important to examine whether contrastive focus is distinguishable from other types of focus such as WH-elicited focus, which are typically based upon semantic alternatives. Methodologically speaking, a fourth question arises concerning approaches that allow objective and statistical comparisons in terms of the distinctiveness of continuous tonal contours. Functional data analysis (FDA) (Ramsay & Silverman, 2005) might be an alternative to consider as it allows the deformation or warping of entire, continuous F0 trajectories over time to be characterized and compared within and across subjects in evaluating linguistic variables of interest.
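To make the idea of comparing the distinctiveness of continuous tonal contours concrete, the following is a minimal sketch of a much simpler stand-in for FDA, not FDA itself: each contour is time-normalized by linear interpolation, and distinctiveness is taken as the mean pairwise root-mean-square distance among the four tone contours. The function names, the choice of ten sample points, and the contour values are all assumptions made for illustration; the numbers are invented and are not the data reported here.

```python
from itertools import combinations
from math import sqrt


def resample(contour, n_points=10):
    """Linearly interpolate a contour (F0 values in st) to n_points samples."""
    if len(contour) < 2 or n_points < 2:
        raise ValueError("need at least two F0 samples and two output points")
    last = len(contour) - 1
    out = []
    for i in range(n_points):
        pos = i * last / (n_points - 1)  # fractional index into the original contour
        lo = int(pos)
        hi = min(lo + 1, last)
        frac = pos - lo
        out.append(contour[lo] * (1 - frac) + contour[hi] * frac)
    return out


def rms_distance(a, b):
    """Root-mean-square difference between two equal-length contours."""
    return sqrt(sum((x - y) ** 2 for x, y in zip(a, b)) / len(a))


def distinctiveness(contours, n_points=10):
    """Mean pairwise RMS distance among time-normalized contours.

    contours maps a tone label to a list of F0 samples in semitones.
    """
    if len(contours) < 2:
        raise ValueError("need at least two contours to compare")
    norm = {tone: resample(c, n_points) for tone, c in contours.items()}
    pairs = list(combinations(sorted(norm), 2))
    return sum(rms_distance(norm[a], norm[b]) for a, b in pairs) / len(pairs)


# Invented contours in semitones, for illustration only (hypothetical values).
on_focus = {"H": [12, 12, 12], "R": [6, 7, 11], "L": [5, 2, 3], "F": [13, 9, 4]}
post_focus = {"H": [9, 8, 7], "R": [8, 7, 7], "L": [8, 6, 5], "F": [9, 7, 5]}

print(distinctiveness(on_focus), distinctiveness(post_focus))
```

A measure of this kind ignores temporal warping, which is precisely what FDA is designed to handle, so it should be read only as a rough first approximation.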
Lee, Byrd, and Krivokapic (2006) have successfully applied FDA to examine articulatory speech production data, where, e.g., speakers modulate the spatiotemporal organization of articulatory gestures as a function of their phrasal position, comparable to continuous modulation of F0 trajectories in research on tone and intonation realization. By definition, lack of acoustic distinctiveness reduces the perceptual salience of the tonal contours, which predicts difficulty in tonal category identification. This prediction was indeed borne out in a pilot study. The target syllables Y in (3) were excised from the template sentence and presented to listeners for identification. The rate of correct identification for on-focus tones was well above 90% while post-focus tones showed a much lower rate (with an average of about 65%). The perceptual difference between the distinct tonal contrasts in the on-focus position and reduced distinctiveness in the post-focus position follows from the general functionalist hypothesis of speech communication (Lindblom, 1990). Focused elements, which are important and highlighted, require robust acoustic cues so that they can be more readily differentiated from contrastive categories, and such distinctiveness facilitates fast and accurate perception by the listeners to ensure efficient speech communication. By contrast, post-focus elements are background information, which permits speakers to be more egocentric and consequently attenuate the speech signal, as listeners do not usually require robust cues for comprehension. In conclusion, while post-focus F0 range compression is often observed, it cannot be the only and primary characteristic of post-focus tonal realization, which contradicts a direct relation

5 The effect of focus in Chen (2003) and Chen and Gussenhoven (2008) has been limited to mono-syllabic constituents. It is not clear how the modification of segments and tones is achieved over larger focus domains. Chen (2006), with durational data on four-syllable monomorphemic words, suggests that focus-induced lengthening is sensitive to the metrical structure of words. Further studies are certainly needed to examine the effect of focus on tonal and segmental realization of large focused domains with diverse morpho-syntactic structures.

between pitch range manipulation and focus realization. Furthermore, we argue that the long-lasting effect of the preceding tone, the sometimes observed post-focus pitch range expansion, and the consistently observed lack or reduced degree of distinctiveness in post-focus tonal contours can be viewed as the multifaceted manifestations of the weak implementation of post-focus tonal targets, as they are associated with prosodically non-prominent constituents.
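The contrast between F0 range compression and expansion can be illustrated with a small sketch. It assumes that F0 is converted to semitones as 12·log2(f/ref) and that a syllable's range is its highest minus its lowest F0 sample; the 50 Hz reference, the function names, and the max-minus-min definition are assumptions for illustration and are not taken from the procedure used in this letter.

```python
from math import log2


def hz_to_st(f0_hz, ref_hz=50.0):
    """Convert F0 in Hz to semitones relative to ref_hz (reference is assumed)."""
    if f0_hz <= 0 or ref_hz <= 0:
        raise ValueError("frequencies must be positive")
    return 12 * log2(f0_hz / ref_hz)


def syllable_range_st(f0_track_hz, ref_hz=50.0):
    """F0 range of one syllable in semitones: highest minus lowest sample."""
    st = [hz_to_st(f, ref_hz) for f in f0_track_hz]
    return max(st) - min(st)


def range_change(on_focus_hz, post_focus_hz):
    """Post-focus range minus on-focus range, in semitones.

    Negative values indicate compression, positive values expansion.
    """
    return syllable_range_st(post_focus_hz) - syllable_range_st(on_focus_hz)
```

Because the range is a difference of two semitone values, it does not depend on the reference frequency; only absolute values such as End F0 do.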

Acknowledgements

Questions and comments from Ellen Broselow, Carlos Gussenhoven, Marie Huffman, Chilin Shih, and Yi Xu on the topic in earlier write-ups are gratefully acknowledged. I would also like to thank Vincent van Heuven, the two anonymous reviewers, and the editor Kenneth de Jong for their comments on an earlier version of this letter. The speakers not only made this experiment possible but also fun, for which I am very thankful. Support from the Netherlands Organization for Scientific Research (NWO-VIDI 016084338) and the European Research Council (ERC-Starting Grant 206198) is gratefully acknowledged.

References

Arvaniti, A., Ladd, D. R., & Mennen, I. (2006). Phonetic effects of focus and “tonal crowding” in intonation: Evidence from Greek polar questions. Speech Communication, 48, 667–696.
Baayen, R. H. (2008). Analyzing linguistic data: A practical introduction to statistics using R. Cambridge: Cambridge University Press.
Bartels, C., & Kingston, J. (1994). Salient pitch cues in the perception of contrastive focus. In P. Bosch, & R. van der Sandt (Eds.), Focus and natural language processing, Vol. 1 (intonation and syntax) (pp. 1–10). Heidelberg.
Baumann, S., Grice, M., & Steindamm, S. (2006). Prosodic marking of focus domains—categorical or gradient? In Proceedings of speech prosody 2006 (pp. 301–304). Dresden, Germany.
Boersma, P., & Weenink, D. (1996–2005). Praat: Doing phonetics by computer. Available at http://www.praat.org/.
Chen, M. Y. (2000). Tone sandhi: Patterns across Chinese dialects. New York: Cambridge University Press.
Chen, Y. (2003). The phonetics and phonology of contrastive focus in Standard Chinese. Ph.D. dissertation, State University of New York, Stony Brook.
Chen, Y. (2004). Focus and intonational phrase boundary in Standard Chinese. In Proceedings of the IEEE International Symposium on Chinese Spoken Language Processing (ISCSLP 2004) (pp. 41–44). Hong Kong.
Chen, Y. (2006). Durational adjustment under corrective focus in Standard Chinese. Journal of Phonetics, 34, 176–201.
Chen, Y. (2008). The acoustic realization of Shanghai vowels. Journal of Phonetics, 36, 629–648.
Chen, Y. (in press). Prosody and information structure mapping: Evidence from Shanghai Chinese. Chinese Journal of Phonetics.
Chen, Y., & Gussenhoven, C. (2008). Emphasis and tonal implementation in Standard Chinese. Journal of Phonetics, 36, 724–746.
Chen, Y., & Xu, Y. (2006). Production of weak elements in speech: Evidence from F0 patterns of neutral tone in Standard Chinese. Phonetica, 63, 47–75.
Chen, Y., & Yuan, J. (2007). A corpus study of the 3rd tone sandhi in Standard Chinese. In Proceedings of the interspeech 2007 (pp. 2749–2752). Antwerp, Belgium.
Cho, T. (2005). Prosodic strengthening and featural enhancement: Evidence from acoustic and articulatory realizations of /ɑ, i/ in English. Journal of the Acoustical Society of America, 117, 3867–3878.
Cooper, W. E., Eady, S. J., & Mueller, P. R. (1985). Acoustical aspects of contrastive stress in question–answer contexts. Journal of the Acoustical Society of America, 77, 2142–2156.
Cruttenden, A. (2006). The deaccenting of given information: A cognitive universal. In G. Bernini, & M. L. Schwarz (Eds.), The pragmatic organization of discourse. Berlin: Mouton de Gruyter.
Cutler, A. (1984). Stress and accent in language production and understanding. In D. Gibbon, & H. Richter (Eds.), Intonation, accent, and rhythm: Studies in discourse phonology (pp. 77–90). New York: de Gruyter.
de Jong, K. J. (1995). The supraglottal articulation of prominence in English: Linguistic stress as localized hyperarticulation. Journal of the Acoustical Society of America, 97, 491–504.
Duanmu, S. (2000). The phonology of Standard Chinese. Oxford: Oxford University Press.


Eady, S. J., Cooper, W. E., Klouda, G., Mueller, P., & Lotts, D. (1986). Acoustical characteristics of sentential focus: Narrow vs. broad and single vs. dual focus environments. Language and Speech, 29, 233–251.
Erickson, D. (2002). Articulation of extreme formant patterns for emphasized vowels. Phonetica, 59, 134–149.
Féry, C., Fanselow, G., & Krifka, M. (Eds.). (2007). The notions of information structure: Interdisciplinary studies of information structure, Vol. 6. Potsdam: Univ.-Verl.
Féry, C., & Kügler, F. (2008). Pitch accent scaling on given, new and focused constituents in German. Journal of Phonetics, 36, 680–703.
Garding, E. (1987). Speech act and tonal pattern in Standard Chinese. Phonetica, 44, 13–29.
Grønnum, N. (1992). The groundworks of Danish intonation: An introduction. Copenhagen: Museum Tusculanum Press.
Gussenhoven, C. (1983). Testing the reality of focus domains. Language and Speech, 26, 61–80.
Gussenhoven, C. (1984). On the grammar and semantics of sentence accents. Dordrecht: Foris.
Gussenhoven, C. (2007). Types of focus in English. In C. Lee, M. Gordon, & D. Büring (Eds.), Topic and focus: Crosslinguistic perspectives on meaning and intonation (pp. 83–100). Heidelberg, New York, London: Springer.
Ishihara, I., & Féry, C. (2006). Phonetic correlates of second occurrence focus. In Proceedings of the 36th North-Eastern Linguistics Society (NELS 36) (pp. 371–384). GLSA, University of Massachusetts, Amherst.
Jin, S. (1996). An acoustic study of sentence stress in Mandarin Chinese. Ph.D. dissertation. Ohio State University, Columbus.
Ladd, D. R. (1980). The structure of intonational meaning: Evidence from English. Bloomington: Indiana University Linguistic Club.
Ladd, D. R. (1996). Intonational phonology. Cambridge: Cambridge University Press.
Lee, S., Byrd, D., & Krivokapic, J. (2006). Functional data analysis of prosodic effects on articulatory timing. Journal of the Acoustical Society of America, 119, 1666–1671.
Liao, R. (1994). Pitch contour formation in Mandarin Chinese: A study of tone and intonation. Ph.D. dissertation. Ohio State University, Columbus.
Lindblom, B. (1990). Explaining phonetic variation: A sketch of the H&H theory. In M. Hardcastle (Ed.), Speech production and modeling (pp. 403–439). Dordrecht: Kluwer Academic Publishers.
Nooteboom, S. G., & Kruyt, J. G. (1987). Accents, focus distribution, and the perceived distribution of given and new information: An experiment. Journal of the Acoustical Society of America, 82, 1512–1524.


Pierrehumbert, J. (1980). The phonology and phonetics of English intonation. Ph.D. dissertation, MIT, Cambridge.
Ramsay, J. O., & Silverman, B. W. (2005). Functional data analysis (2nd ed.). New York: Springer-Verlag.
Selkirk, E. O. (1984). Phonology and syntax: The relation between sound and structure. Cambridge: MIT Press.
Selkirk, E. (1996). Sentence prosody: Intonation, stress, and phrasing. In J. Goldsmith (Ed.), The handbook of phonological theory (pp. 550–569). Oxford: Blackwell.
Selkirk, E. (2002). Contrastive FOCUS vs. presentational focus: Prosodic evidence from right node raising in English. In Proceedings of speech prosody 2002 (pp. 643–646). Aix-en-Provence, France.
Selkirk, E. (2006). Contrastive focus, givenness and unmarked status of “discourse-new”. In C. Féry, G. Fanselow, & M. Krifka (Eds.), The notions of information structure: Interdisciplinary studies of information structure, Vol. 6 (pp. 124–146).
Shih, C.-L. (1988). Tone and intonation in Mandarin. Working Papers of the Cornell Phonetics Laboratory, 3, 83–109.
Terken, J., & Nooteboom, S. G. (1987). Opposite effects of accentuation and deaccentuation on verification latencies for ‘given’ and ‘new’ information. Language and Cognitive Processes, 2, 145–163.
Xu, Y. (1995). The effect of emphatic accent on contextual tonal variation. In Proceedings of the XIII International Congress of Phonetic Sciences (pp. 668–671).
Xu, Y. (1997). Contextual tonal variations in Mandarin. Journal of Phonetics, 25, 61–83.
Xu, Y. (1999). Effects of tone and focus on the formation and alignment of F0 contours. Journal of Phonetics, 27, 55–105.
Xu, Y. (2005). Speech melody as articulatorily implemented communicative functions. Speech Communication, 46, 220–251.
Xu, Y., & Wang, Q. E. (2001). Pitch targets and their realization: Evidence from Mandarin Chinese. Speech Communication, 33, 319–337.
Xu, Y., & Xu, C. X. (2005). Phonetic realization of focus in English declarative intonation. Journal of Phonetics, 33, 159–197.
Yuan, J. (2004). Intonation in Mandarin Chinese: Acoustics, perception, and computational modeling. Ph.D. dissertation, Cornell University, Ithaca.
Zemlin, W. R. (1988). Speech and hearing science—Anatomy and physiology. Englewood Cliffs, New Jersey: Prentice-Hall.
Zerbian, S. (2006). Expression of information structure in the Bantu language Northern Sotho. Ph.D. dissertation, Humboldt-Universität zu Berlin, Berlin.