Journal of Phonetics (1994) 22, 357-388
Stress shift and early pitch accent placement in lexical items in American English
S. Shattuck-Hufnagel
Massachusetts Institute of Technology, 77 Massachusetts Ave., 36-511, Cambridge, MA 02139
M. Ostendorf and K. Ross
Boston University, 44 Cummington St., Boston, MA 02215
Received 20th November 1992, and in revised form 28th October 1993
The perception of early prominence in late-main-stress words like Mississippi has been described in both metrical and intonational terms. Metrical theory views early prominence as the result of rhythmic stress shift under conditions of stress clash in the metrical grid, while intonation theory attributes early prominence to a tendency for speakers to place the first pitch accent of a phrase as early as possible. We describe an integrated theory of structural and rhythmic aspects of pitch accent placement that combines parts of both approaches, and present evidence to support the theory from perceptual and acoustic analyses of a speech corpus produced in the FM radio news style. We find that early accent placement occurs in contexts which would otherwise result in pitch accent clash, and that the initial accent in an intermediate intonational phrase tends to be located early in its word. Double accents are more common on words that carry all the accents in a phrase (as predicted by phrase onset marking), particularly for words with alternating rather than adjacent lexical stress. Acoustic evidence supports the claim that perceived early prominences typically coincide with pitch accents, and replicates previous results showing no increase in duration for these accented syllables.
1. Introduction
A notable fact about some words of English is that their major perceptual prominence can occur on a syllable to the left of the main-stress syllable, under certain conditions. Words that can undergo this apparent "stress shift" have an unreduced vowel preceding the main-stress syllable; this unreduced vowel can carry the shifted prominence, and is often described as having secondary stress. Examples commonly cited in the literature include Massachusetts (e.g. as often produced in the phrase the Massachusetts miracle), thirteen (e.g. the thirteen men) and Japanese (e.g. Japanese food). Phoneticians have occasionally described apparent shift; for example, Jones (1939) refers to it as rhythmical variation in the pronunciation of individual words. But the first systematic treatments within the framework of a
© 1994 Academic Press Limited
S. Shattuck-Hufnagel et al.
grammar did not appear until the emergence of more comprehensive phonological theories of phrase-level prosody, beginning in the 1960s. Three major accounts of early prominence in the word can be identified in those theories: one based on rhythmic stress, one on intonational prominence, and a third account which combines rhythmic and intonational factors. These accounts share the view that the apparent shift of prominence to an early syllable in a word like Massachusetts is perceived when the main-stress syllable is produced with less prominence than its earlier strong syllable (or perhaps with equal prominence). But the three approaches make very different claims about the sentential contexts that will induce apparent shift, about the dominant acoustic correlates associated with shift, and about the kinds of prosodic representations that must be included in a model of the speech production planning process in order to account for this behavior by speakers. In the studies described below, we tested some of the predictions made by these theories, using data from perceptual and acoustic analyses of continuous speech. In Section 2 we provide the background for our studies, and lay out the questions we set out to explore empirically. In Section 3 we describe our analysis methods, and in Section 4 we summarize the experimental results for each question explored. Finally, in Section 5 we discuss the implications of our findings for various aspects of speech research, and point out some remaining questions not answered by our data.
2. Background
In this section we review the two major phonological accounts of apparent stress shift, one based on rhythmic stress regularities related to the metrical grid (Liberman and Prince, 1977), and the other based on constraints on the placement of F0 markers known as pitch accents (Bolinger, 1958).
We then describe a third class of theories that integrates aspects of both rhythm and pitch accent placement, and summarize the results of previous experimental tests of these theories. Finally, we present the experimental questions we set out to investigate.
2.1. Phonological approaches
2.1.1. The rhythm-based view
Liberman (1975) gave the phenomenon of apparent stress shift an important role as evidence for his tree-based theory of metrical phonology, and since then a number of theoretical proposals have suggested that stress shift provides persuasive arguments in support of the metrical grid. The metrical grid is a representational device which indicates the degree of rhythmic stress on a syllable by marking more or fewer contiguous cells in a vertical column of cells erected above each syllable. The rows and columns of cells above the syllables of an utterance form a grid-like matrix, as illustrated in Fig. 1, where each column of X's corresponds to a syllable nucleus and each row to a level or degree of rhythmic prominence (Liberman and Prince, 1977; Prince, 1983; Hayes, 1984; Selkirk, 1984; Nespor and Vogel, 1986, 1989). The description given here combines the work of a number of investigators into a composite picture, despite the significant differences among them (e.g. Hayes'
Pitch accent placement within words
Figure 1. Metrical grid for the utterance Massachusetts miracle. Stress clash (defined by the adjacent X's enclosed in the box) provokes a shift of a stress marker leftward to an earlier strong syllable in the first word.
1984a, rules of grid euphony vs. Selkirk's, 1984, euphony; Nespor and Vogel's, 1986, hierarchy of prosodic constituents vs. Selkirk's, 1984, hierarchy), and focuses on the aspects of the metrical approach that bear directly on the issue of early prominence within the word. In these metrical models, lexical stress corresponds to the marking of a certain number of cells in the column. When words are concatenated into a phrase, and phrase-level stress is assigned, the resulting pattern of markings in the grid may create a stress clash. That is, two rhythmically strong syllables in the same phrase may be adjacent or nearly-adjacent at several levels in the grid, so that a tendency toward placement of heavy stresses at more equal intervals requires that one of the prominences be moved to a syllable that is farther away. Observation has shown that the left-hand member of a pair of clashing stresses, which (according to the nuclear stress rule of English) is normally the weaker one, appears to shift further left, away from the stronger clashing stress on its right. An example that is often cited to illustrate the claims of the metrical approach compares the phrase the Mississippi Legislator with the Mississippi Legislation. The Mississippi Legislator, with the main stress of Legislator on its first syllable, has the potential for a rhythmic stress clash between the main-stress syllables of the two words; the extrametrical syllable -pi does not prevent the clash. As a result, Mississippi is predicted to undergo rhythmic stress shift, with the effect that an early prominence is heard on this word. In the Mississippi Legislation, the main stress of Legislation occurs on its third syllable, so that rhythmic stress clash between the main-stress syllables of the two words does not arise, shift is not predicted, and no early prominence should be heard.
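The clash configuration described above can be sketched in code. The following Python fragment is a minimal illustration under stated assumptions, not an implementation of any one of the cited theories: the grid heights for each syllable are hypothetical, the clash criterion (two columns reaching a level clash when no intervening column reaches the level below) is one common formulation, and the function names `find_clashes` and `shift_left` are invented for the example. Shift is modelled simply as swapping column heights with the nearest earlier strong syllable of the same word.

```python
# Sketch only: all grid heights are illustrative assumptions, not values from the paper.

def find_clashes(heights, level):
    """Return index pairs of columns that clash at `level`.

    Assumed criterion: two columns that reach `level` clash when no
    column between them reaches `level - 1`.
    """
    marked = [i for i, h in enumerate(heights) if h >= level]
    return [
        (i, j)
        for i, j in zip(marked, marked[1:])
        if all(h < level - 1 for h in heights[i + 1:j])
    ]


def shift_left(heights, clash_index, word_start):
    """Swap the clashing column with the nearest earlier strong syllable
    (height >= 2) of the same word; return an unchanged copy if none exists."""
    for k in range(clash_index - 1, word_start - 1, -1):
        if heights[k] >= 2:
            shifted = list(heights)
            shifted[k], shifted[clash_index] = heights[clash_index], heights[k]
            return shifted
    return list(heights)


# Mis-sis-SIP-pi LEG-is-la-tor (hypothetical heights)
legislator = [2, 1, 3, 1, 4, 1, 2, 1]
# Mis-sis-SIP-pi le-gis-LA-tion (hypothetical heights)
legislation = [2, 1, 3, 1, 2, 1, 4, 1]

clash_a = find_clashes(legislator, level=3)    # -sip- and Leg- clash
clash_b = find_clashes(legislation, level=3)   # le- intervenes, so no clash
shifted = shift_left(legislator, clash_index=2, word_start=0)
```

With these assumed heights, only the Legislator phrase yields a clash, and swapping the prominence onto Mis- removes it, which mirrors the prediction stated in the text.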
In these metrical theories, rhythmic stress is viewed as a unitary dimension, whether it occurs at the lexical or at the phrasal level. All X-marks in the cells of the grid are of the same variety, and differences in degree of stress are reflected simply in the number of such marks in the vertical columns. Phrase-level stress occurs on the main-stress syllables of words, except when this results in an irregular pattern of rhythmic stress, in which case phrase-level stress can move to an earlier syllable in its word, as described above. The acoustic correlates of the moved prominence, as for other rhythmic stress prominences, are not specified, although the discussion
often implicates timing. In particular, prominences at different levels in the hierarchy of prosodic constituents (e.g. lexical vs. phrasal) are not distinguished by differences in their acoustic correlates. An exception to this general treatment is found in Selkirk (1984, p. 273), where a distinction is made between rhythmic stress shift and concomitant pitch accent re-association (see below).
2.1.2. The intonation view
Well before the emergence of tree- and grid-based theories of stress that accounted for apparent stress shift in terms of rhythmic stress regularity, several models of the phenomenon of early prominence in the word had been proposed based on intonation. Bolinger (1965) made use of the concept of a pitch accent, or prominence-lending F0 marker. He proposed two main intonational markers for phrasal prominence: one pitch accent early in the phrase (themic accent) and another late in the phrase (rhemic accent). He noted that speakers prefer to place the first accent of an intonational phrase as far as possible from the last accent. As a result, a target word like Mississippi, that has a full-vowel syllable to the left of its main lexical stress, might receive an early accent on that first full-vowel syllable if this accent is the first one in a new phrase. Developments in a separate line of research at IPO in the Netherlands also suggested the possibility that an early F0 marker in the phrase is a common occurrence. Although they did not discuss early prominence directly, 't Hart and Collier (1975) described a theory of the possible F0 contours in Dutch which included a prominence-lending "onset rise" as a frequently-occurring aspect of simple declarative utterances. This work was later extended to British English (Willems, 1982; de Pijper, 1983) and American English (Maeda, 1974), and was recently summarized in 't Hart, Collier and Cohen (1990).
Shattuck-Hufnagel (1988, 1991) discussed early prominence in terms of pitch accent placement, noting that speakers have more than one option for the location of pre-nuclear pitch accents. Adopting Bolinger's view that lexical stress is not cued by F0 markers, but rather defines the potential sites for pitch accent location, she postulated that words with one or more strong syllables preceding the main-stress syllable can receive early-accent only, main-stress accent only, or double accent, depending on a number of factors. One of these factors, as in Bolinger (1965), is the speaker's propensity to mark the onset of a new intonational phrase by placing the first pitch accent as early as possible, which sometimes leads to early pitch accent placement within the word. Beckman and colleagues (Beckman, De Jong, and Edwards, 1987; Beckman and Edwards, 1990) also suggest that pitch accent placement might be responsible for the perception of early prominence in the word, noting (Beckman and Edwards, 1994) that speakers may tend to place a pitch accent on the earliest accentable syllable in an intonational phrase. These intonation-based theories of early prominence share the view that phrase-level prominence is marked by F0, and that intonation contours often place a prominence-lending F0 marker early in a new intonation phrase. Bolinger's theory suggests that a phrase can contain only two pitch accents, so it makes no provision for multiple pre-nuclear accents in the phrase, and the IPO theory does not systematically address the question of which syllables of a word or phrase the prominence-lending F0 markers will occur on. In their early work, neither
Shattuck-Hufnagel nor Beckman and colleagues take the influence of rhythm on pitch accent location into account. These difficulties have been addressed by the evolution of integrated theories that combine rhythmic and intonational factors to account for early prominence in the word.
2.1.3. An integrated view
A number of investigators have described theories in which the regularities of prominence placement reflect aspects of both the rhythm-based and intonation-based views. This integrated view involves two main arguments. The first is that different types of prosodic prominence (e.g. phrase-level vs. lexical level) are associated with different dominant acoustic cues, in contrast to the prevailing view that "stress" is a single dimension with varying degrees of strength. The second is that rhythmic considerations influence the location of phrase-level prominence-lending F0 markers, called pitch accents in some (but not all) of these theories. That is, just as the unreduced (i.e. lexically-stressed) and reduced syllables of a phrase form a rhythmic pattern, so do the pitch-accented and non-pitch-accented syllables; Bolinger (1981) describes these as "two kinds of rhythm". To some extent, speakers tend to regularize these rhythms in an utterance. Vanderslice and Ladefoged (1972) published a landmark paper in which they distinguished four levels of prosodic prominence, characterized by different acoustic correlates (see also Ladefoged, 1975). In current terminology, they distinguished reduced syllables (characterized by reduced timing), stressed syllables (full articulation), accented syllables (characterized as receiving extra prominence from increased respiratory energy and laryngeal adjustment causing a pitch obtrusion, called "accented" but not "intonational"), and syllables with the nuclear pitch accent of the phrase (considered "intonational").
Although they did not explicitly describe the contexts in which early prominence in the word occurs, Vanderslice and Ladefoged postulated that early prominence was associated with the occurrence of an F0 accent on the early syllable, "leaving the determination of which accents are ultimately to be realized phonetically to the care of a late rhythm rule ..." (p. 829). Thus, Vanderslice and Ladefoged linked accent patterns to rhythmic considerations, but did not specify how the rhythm rule would work or in what contexts it might result in early accent placement in the word. Subsequent discussions (Bolinger, 1981; Gussenhoven, 1987, 1991; Shattuck-Hufnagel, 1992, 1994) have expanded on this combined intonational/rhythmic view, suggesting that early prominence in the word is associated with a pitch accent on the early syllable, either as the result of early placement for the first accent in the intonational phrase, or as the result of removal of the main-stress accent to avoid pitch accent clash with an accent on the following word. In this view, rhythmic prominence is expressed in terms of pitch accents rather than in terms of stress, and rhythmic clash is defined as a context in which two pitch accents can occur in close proximity. Gussenhoven's integrated theory postulates that lexical accents, which occur on all potentially accentable syllables, are deleted alternately moving leftward from the final or nuclear accent of the phrase, subject to further constraints of constituent structure. An important aspect of this theory is that the accent on the initial accentable syllable of the phrase is not deleted; thus, when the first accentable
syllable in a phrase is a secondary-stress syllable in a early prominence candidate word like Mississippi, a pitch accent will occur on that syllable. This aspect of Gussenhoven's model leaves room for double accenting on a phrase initial word, e.g. on the word Massachusetts in the phrase The Massachusetts celebration. This is because accenting all potential accent sites and then deleting accents leftward from the nuclear accent on -bra- results in deletion of the accent on eel-, no deletion of the accent on -chu-, and also no deletion of the accent on Mass- since it is the initial accent site in the phrase. Thus the model predicts early accent placement in the word when the word contains the first accent site in its phrase. Guessenhoven takes the important further step of hypothesizing that different acoustic correlates are cues to different aspects of prosodic structure, stating that "The three parameters associated with stress (duration, degree of qualitative reduction and pitch) are thus separately encoded, which makes the claim that the correct generalizations concerning stress manipulate only one of these aspects at a time" (p. 2). Beckman and Edwards (1990) move even further in the direction of hypothesizing different dominant acoustic correlates for different kinds of prominence. They postulate that the constituents in the prosodic hierarchy are defined by the type of prominence that heads them (e.g. strong vowel syllables for stress feet, nuclear pitch accents for intermediate intonational phrases), and that the dominant acoustic correlates for prominence are different for the heads of different constituents (e.g. duration and vowel quality· for the heads of stress feet, Fa markers for the heads of phrases). 
In Beckman and Edwards (1994), these investigators draw out the implications of their theory for early prominence within the word, noting that "there might be an impetus to associate [the initial accent in a phrase] with the first readily accentable syllable". The metrical, intonational and integrated approaches to early prominence within the word have a good deal in common, particularly the view that the perception of early prominence can be associated with the loss of prominence from the late main-stress syllable of a target word. They differ sharply, however, with respect to a) the contexts that will induce early prominence, e.g. metrical theory's rhythmic stress clash vs. Bolinger's pitch accent clash, and intonation-based phrase onset marking vs. no onset marking; b) the explicitness of their claims about the acoustic correlates that accompany early prominence, e.g. explicit for intonation and integrated theories, less explicit for metrical stress theories; and c) the provision they make for the possibility of double accenting within a word. We will briefly review the experimental literature that bears on these conflicting claims.
2.2. Previous experimental tests
Initial work on early prominence relied on investigator intuitions as primary data, i.e. on judgments about the well-formedness of various prominence patterns on a word when it occurs in different contexts. Such intuitions are invaluable for determining well-formedness; in fact, they are the only means of determining the facts about grammaticality. Once the grammatical theory exists, however, it is possible to carry out empirical tests of the predictions it generates about the location of perceived prominences within the word in clash and non-clash contexts, and about the acoustic correlates of these prominences. In particular, it is possible to test
the notion that the well-formed strings revealed most readily by intuitions exhaust the range of possible prosodic treatments for a given text. Metrical theories of rhythmic stress clash and shift were first tested with perceptual and acoustic measurements by Cooper and Eady (1986). They found little evidence for increased duration or F0 in the early syllable for target words produced in rhythmic stress clash and non-clash contexts, for example in pairs of single citation-form phrases such as Pennsylvania and Pennsylvania relatives, and Tennessee and Tennessee connections. These results might be expected if, as the integrated theory of early prominence permits, the early syllable of the target word carried a pre-nuclear pitch accent in both the clash and non-clash contexts, because it is the first accentable syllable of the phrase. Beckman et al. (1987) reported a series of experiments in which speakers produced target words like Chinese and fifteen both in rhythmic clash contexts (Chinese dresser, fifteen rabbits) and non-clash contexts (Chinese antique, fifteen antiques) in a tableau description task. They found that speakers either placed a pre-nuclear pitch accent on the first syllable of the target word and a nuclear accent on the first syllable of the following word, or (if the entire phrase was post-nuclear) no pitch accent on either word. They report that these intonations were perceived as giving equal or greater prominence to the first syllable of Chinese and fifteen, compared to the second syllable, and found that this perceived prominence pattern occurred in the majority of tokens, even for non-clash contexts. The type of phrase in question here is presumably the intermediate intonational phrase, defined by Beckman and Pierrehumbert (1986) as a prosodic constituent that is the domain of a coherent intonational contour.
This constituent is characterized by the presence of one or more pitch accents, the last of which is termed the nuclear accent, and by the occurrence of a phrase tone which controls the F0 contour between the nuclear accent and the end of the phrase. One or more intermediate intonational phrases form a full intonational phrase, characterized by the additional phenomenon of a high or low boundary tone, which is realized on the final syllable of the phrase. Most theories agree in predicting that the nuclear pitch accent in a phrase must occur on the main-stress syllable of its word; Beckman and Pierrehumbert (1986) suggest that the accent that will exhibit this behavior is defined as the last one in an intermediate intonational phrase. Beckman, Swora, Rauschenberg, and DeJong (1991) also asked listeners to label stress, and experienced prosodic labellers transcribed the pitch accents and intonational phrase boundaries for utterances of the same target words and phrases. Results again showed that listeners perceived early prominence in the target word in both clash and non-clash contexts, and that stress and accent markings agreed in 74% of cases. Beckman et al. conclude that "relative stress in words such as Chinese, where both syllables are strong, is a matter of accent placement." These results are compatible with the view that early prominence in the word results from early placement of a pitch accent. Shattuck-Hufnagel (1988, 1991, 1992) examined the location of perceived prominence within the word in rhythmic clash and non-clash contexts like Mississippi legislator and Mississippi legislation. Like Beckman and colleagues, she found evidence that speakers have multiple options for the placement of prominence within pre-nuclear words in a given phrase. In addition, she reported that where prominence was perceived on the early syllable of a target word, there was usually
evidence for a pitch accent in the form of a substantial prominence-lending F0 excursion on that syllable. Horne (1990) tested the prediction (from Gussenhoven's, 1987, theory of alternate pitch accent deletion) that the F0 excursion on the main-stress syllable of a target word should be smaller in cases where this syllable is the middle of three accent sites in a phrase, since the pitch accent on that syllable is deleted. She used four adjacent-stress target words, produced in three contexts: in isolation; in the phrases Dundee tartan, canteen cook, postpone meetings and maintain roads; and in more complex sentential environments. For two of the four target words, the prediction that the main-stress syllable has a smaller F0 excursion in stress clash contexts was confirmed, suggesting that pitch accents are deleted from the main-stress syllable in at least some of the contexts predicted by Gussenhoven's accent-deletion theory. Taken together, the weight of the existing empirical evidence favors the hypothesis that at least some instances of perceived early prominence within words are accompanied by a pitch accent on the early syllable and no pitch accent on the main-stress syllable of the target word. This point of view is compatible with the integrated theory of pitch accent placement within words. But previous analyses have not combined perceptual analysis of prominence location with acoustic analysis in a substantial corpus of continuous speech that has been extensively labelled with prosodic constituent boundaries as well. We have taken advantage of the existence of such a corpus to carry out investigations of the questions presented in the next section.
2.3. Experimental questions
The prosodically-labelled BU FM radio news speech corpus described below provides the opportunity to address the following questions about prominence placement within words in this style of American English speech:
1) Do listeners hear early prominence in this sample of continuous speech? Although investigators (Beckman et al., 1987, 1991; Shattuck-Hufnagel, 1989) have shown that listeners unfamiliar with theories of prosody can perceive early prominence on target words in both citation form and spontaneous laboratory speech, many native speakers of English do not share the intuition that this phenomenon occurs. Therefore it is important to establish that speakers who are engaged in the communicative task of announcing radio news actually produce utterances that contain early prominences on the words that are candidates for this phenomenon. It is particularly important to demonstrate early prominence in communicative running speech, where most examples are not likely to be produced with contrastive stress, as might be the case for contrastive pairs of isolated sentences produced in the laboratory. Like Beckman et al.'s (1987) tableau description task, the BU FM radio news corpus provides a sample of task-driven speech addressed to a listener with whom the speaker is trying to communicate, and in this sense can be expected to reflect much of the prosodic behavior found in everyday speaking.
2) Is this early prominence associated with the occurrence of a pitch accent on the early syllable? The results of Horne (1990) suggest that pitch accent clash contexts
are associated with a lack of pitch accent on the main-stress syllable of target words, and those of Beckman et al. (1987, 1991) and Shattuck-Hufnagel (1991) suggest the presence of a pitch accent on the early syllable, but these observations have not been confirmed in a large corpus of continuous speech that is labelled for prosodic constituent boundaries, boundary tonal phenomena and syllables with pitch accents. The BU radio news corpus permits evaluation of the claim that the perception of early prominence is associated with the occurrence of a pitch accent.
3) Does early prominence occur in contexts where it prevents pitch accent clash? Bolinger (1965) suggests that speakers avoid pitch accent clash, and Gussenhoven's (1987, 1991) model of alternate accent deletion ensures a similar result in many cases. Is this pattern observed in our corpus? Do speakers place pitch accents early in the word in contexts where a main-stress accent would create a pitch accent clash, i.e. when the following word in the same intermediate intonational phrase has a pitch accent on its initial syllable?
4) When early prominence cannot be accounted for by an accent clash, can it be accounted for as phrase onset marking? Since the boundaries of intonational phrases are marked in this corpus, it is possible to compare the within-word location of the first prominence in a phrase with the location of prominences that are phrase-medial or phrase-final. If the phrase onset marking aspect of the integrated theory is correct, then phrase-initial accents should systematically occur early in target words, while phrase-medial accents need not.
5) Is rhythmic stress clash required for early prominence placement? Metrical theorists (Liberman, 1975; Liberman and Prince, 1977; Prince, 1983; Selkirk, 1984; Hayes, 1984; Nespor and Vogel, 1986) suggest that early prominence occurs in the context of rhythmic stress clash, and does not occur in non-clash contexts.
Is the presence of rhythmic stress clash required for early prominence perception in the target words in this corpus? We will test this hypothesis by analyzing tokens of target words that occur without rhythmic clash, to see if early prominence placement occurs under these conditions. For example, is early prominence perceived in the target word Massachusetts in a phrase like The Massachusetts Association, where no rhythmic stress clash occurs?
6) In what contexts does double prominence within a word occur? Not all of the theories described above explicitly account for the occurrence of multiple pitch accents on a single word, although several investigators observe that this can occur (e.g. Bolinger, 1981; Selkirk, 1984; Beckman et al., 1987; 1990; Gussenhoven, 1991). An integrated theory can allow for multiple accents, because all pre-nuclear full-vowel syllables are potential pitch accent docking sites; however, factors like pitch accent clash avoidance may work to reduce the likelihood of multiple accenting. The interesting question then becomes, under what conditions are potential multiple accents actually realized? Since the FM radio news style uses pitch accents very freely, it provides a good test of the limits on multiple accenting within the word. The prediction made by almost every theory, that phrase-final (i.e. nuclear) accents occur on the main-stress syllables of their words, can also be evaluated empirically.
In Section 3 we describe the corpus, the speakers, the method of collection and the labelling and analysis procedures, before presenting the results for each of these questions in Section 4.
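The categories that these questions rely on (early accent only, main-stress accent only, double accent, and the pitch accent clash context of question 3) can be made concrete with a small sketch. The Python fragment below is an illustration under stated assumptions rather than the labelling procedure used in the study: the per-syllable boolean representation, the function names `accent_pattern` and `in_pitch_accent_clash_context`, and the use of a break index of 2 or lower to mean "same intonational phrase" are choices made for the example.

```python
def accent_pattern(accented, main_stress):
    """Classify the accent pattern of one target word.

    `accented` holds one boolean per syllable (True = labelled prominent);
    `main_stress` is the index of the main-stress syllable.
    """
    early = any(accented[:main_stress])
    main = accented[main_stress]
    if early and main:
        return "double"
    if early:
        return "early"
    if main:
        return "main"
    return "none"


def in_pitch_accent_clash_context(break_after, next_word_accented):
    """True when the following word is in the same intonational phrase
    (assumed here: break index of 2 or lower after the target word) and
    carries a pitch accent on its initial syllable."""
    return break_after <= 2 and bool(next_word_accented[0])


# Mass-a-chu-setts, main stress on the third syllable (index 2)
early_only = accent_pattern([True, False, False, False], main_stress=2)
double = accent_pattern([True, False, True, False], main_stress=2)
clash = in_pitch_accent_clash_context(1, [True, False, False])
no_clash = in_pitch_accent_clash_context(4, [True, False, False])
```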
3. Methods
Empirical studies that will contribute to an adequate understanding of early prominence perception in the word require a) careful specification of the target words that are candidates for early prominence, and inclusion of a wide variety of words; b) perceptual analysis of the utterances containing the target words, with syllable-by-syllable labelling for prominence in order to ensure that early prominence has actually occurred and is heard by listeners; c) inclusion of a range of contexts for the target words, to test various hypotheses about the situations in which early prominence will occur, and d) acoustic analysis of the presumed correlates of early prominence. The BU FM radio news corpus, with phrase-level prominence labelled for each syllable, with the boundaries of intonational phrases marked, and containing several hundred words that are candidates for early prominence in a wide variety of contexts, provides an opportunity to extend earlier studies in these ways. In this section we present details on the corpus and experimental methods used in analyses aimed at answering the questions posed in Section 2.
3.1. Radio News Corpus
The corpus used for this study was a set of recorded FM public radio news broadcasts read by two female radio announcers. The stories were studio recordings of actual radio broadcasts, which were transcribed orthographically by a listener who did not have access to the original scripts. A total of forty-four radio news stories were available for this study, which together contained 13,728 words. The corpus represents a subset of the more than six hours of speech in the Boston University Radio News Corpus, available from the Linguistic Data Consortium at the University of Pennsylvania. The stories were hand-labelled for prosodic patterns, including presence vs. absence of phrase-level prominence on each syllable and seven levels of prosodic boundaries. The labellers relied primarily on auditory perceptual judgments, although in some cases they had access to F0 displays. In many cases, stories were labelled by multiple listeners who worked as a group and discussed any disagreements before assigning the labels. The prosodic boundaries or phrase breaks were labelled using a system of integer break indices, one index marked between each pair of words (Price, Ostendorf, Shattuck-Hufnagel, and Fong, 1991). The breaks ranged from "0" for a cliticized word boundary to "6" for a sentence boundary, and form a superset of those used in the ToBI system (Silverman et al., 1992). Fig. 2 shows an utterance from the corpus labelled with prominences and break indices, to illustrate the labelling system. The analysis presented here does not separate all 7 constituent boundary levels, but instead contrasts only the presence vs. absence of an intonational phrase boundary (whether intermediate or full); that is, we grouped together constituents marked with boundaries 0-2 as lacking an intonational phrase boundary, and constituents marked 3-6 as having such a boundary.
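The two-way grouping of break indices described above amounts to a simple segmentation rule, sketched below in Python. This is an illustration only: the function name `split_into_phrases`, the list-based input format and the example word sequence with its break indices are assumptions, not the corpus file format or data from the corpus.

```python
def split_into_phrases(words, breaks):
    """Group words into intonational phrases.

    `breaks[i]` is the break index (0-6) marked after `words[i]`.
    Indices 0-2 are treated as phrase-internal; indices 3-6 close an
    intonational phrase (intermediate or full).
    """
    if len(words) != len(breaks):
        raise ValueError("expected one break index per word")
    phrases, current = [], []
    for word, brk in zip(words, breaks):
        current.append(word)
        if brk >= 3:
            phrases.append(current)
            current = []
    if current:  # trailing words with no closing boundary marked
        phrases.append(current)
    return phrases


# Hypothetical labelled utterance (break indices invented for illustration)
words = ["the", "Massachusetts", "miracle", "was", "announced", "today"]
breaks = [0, 1, 4, 1, 1, 6]
phrases = split_into_phrases(words, breaks)
```

With the invented indices shown, the break of 4 after miracle closes the first phrase, so Massachusetts is the first accentable word of its phrase.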
In a separate study on radio-style speech, we have found that phrase-level prominence on syllables and prosodic breaks between words can be labelled with good agreement between labellers: over 90% agreement between labellers on presence vs. absence of prominence on syllables, and agreement within 1 break level for over 98% of the breaks.
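The collapsing of the seven break levels into a binary boundary/no-boundary contrast can be sketched as follows. This is a minimal illustration, not the authors' tooling: the function names and the convention that `breaks[i]` is the index marked after `words[i]` are assumptions made for the example.

```python
from typing import List

# Assumption: break indices are integers 0-6, one per word juncture.
# Indices 0-2 count as "no intonational phrase boundary", 3-6 as a boundary
# (intermediate or full), following the grouping described in the text.
PHRASE_BOUNDARY_MIN = 3


def has_phrase_boundary(break_index: int) -> bool:
    """Return True if the break index marks an intonational phrase boundary."""
    if not 0 <= break_index <= 6:
        raise ValueError(f"break index out of range: {break_index}")
    return break_index >= PHRASE_BOUNDARY_MIN


def split_into_phrases(words: List[str], breaks: List[int]) -> List[List[str]]:
    """Group words into phrases; breaks[i] is the index marked after words[i]."""
    if len(words) != len(breaks):
        raise ValueError("expected one break index per word")
    phrases: List[List[str]] = []
    current: List[str] = []
    for word, brk in zip(words, breaks):
        current.append(word)
        if has_phrase_boundary(brk):
            phrases.append(current)
            current = []
    if current:  # trailing words with no closing boundary
        phrases.append(current)
    return phrases
```

Under this convention, a word sequence whose junctures are labelled 1, 4, 0, 6 is split into two phrases, because only the 4 and the 6 reach the boundary threshold.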
Figure 2. Example of a labelled utterance in the radio news corpus, showing the orthographic transcription, accent labels (P for a syllable with prominence and s for no prominence) and break indices between words (0-6).
A subset of the corpus is also phonetically labelled using a speech recognition system with the known word sequences as a grammatical constraint; these alignments provide the durations for our acoustic analyses. The specific subset comprised all of the segmented data available at the time, a total of 9,854 words, which included utterances from only one of the two speakers discussed in this paper and excluded any utterances with background sound effects edited into the speech signal by the radio news producer. The BU recognition system was used with speaker-dependent, context-dependent models (Kimball, Ostendorf, and Bechwati, 1992). The model uses multiple pronunciation networks and cross-word phonological rules in order to obtain more accurate phone labels and segmentations. A preliminary analysis on a subset of stories (approximately 30k segments) showed that the RMS error between automatically marked and hand-corrected segment boundaries was less than one analysis frame (10 ms), with 82% of the boundaries within 15 ms and only 4% of the errors greater than 30 ms. It should be noted that the FM radio news speaking style is different from both non-professional read speech and spontaneous speech. In a pilot study, we found that the radio style tends to have more frequent and stronger prosodic markers than non-professional read speech (Price, Ostendorf, Shattuck-Hufnagel, and Veilleux, 1988), and in the data reported here, almost half of the words are marked with phrase-level prominence. However, during our interviews with the announcers, they reported that they strive for a natural style, and in our opinion the professional FM radio news speech is produced in a more natural and fluent style than the non-professional read speech used in many studies.
It is prosodically well-formed according to the rules of English, so an adequate theory must be able to account for its patterns, and it is simpler to analyze than spontaneous speech because of the relatively low rate of disfluencies in the radio news. We believe that many of the same factors influence pitch accent location in the different styles, albeit with an increased frequency of pitch accents for radio news and perhaps some differences in the relative strength of the factors that determine speaker choices among potential pitch accent sites.
3.2. Phonological analysis
To test hypotheses about the factors that influence early prominence placement within a word, we analyzed all words that were considered candidates for "stress shift", i.e. for location of a phrase-level prominence on a syllable earlier than the main-stress syllable. Candidates were defined as the multi-syllable words in our sample that had a pre-main-stress syllable marked for secondary stress in Webster's Ninth Collegiate Dictionary (1984), e.g. Massachusetts, illegal, education, representative, university, and ecological. Candidates were eliminated if they were listed as having an alternative pronunciation with primary stress on an early syllable. A list of the word candidates and the number of times each one appeared in the radio news corpus is given in an Appendix. A total of 244 different candidate words appeared on our list, corresponding to 483 tokens in the corpus. Separately from the single-word tokens, we also analyzed late-stress hyphenated words (e.g. seventy-five and vice-president) and strings of letter names (e.g. VA W, USA). Hyphenated words and letter-name strings were grouped together on the assumption that both may have more than one main-stress syllable underlyingly; in addition, the scarcity of tokens made it difficult to analyze them separately. The set included 99 different items, corresponding to 141 tokens in the corpus, of which more than one-third were hyphenated numbers (e.g. eighty-six, nineteen-eighteen). Other subclasses included two free morphemes (e.g. hard-hit, inner-city), and strong prefix plus free morpheme (e.g. anti-tax, co-founder). For some analyses, the single unhyphenated lexical items were subcategorized further, separating words with alternating stress patterns (e.g. Massachusetts, University, Association) from words with adjacent stress (statewide, shortchange, predates, illicit).
This was done because the adjacent stress words were sometimes judged to be more resistant to early prominence than the alternating stress words, and it was sometimes difficult to determine whether the first of two adjacent stress syllables would be produced with a reduced or full vowel, as in minority and diverse. In addition, the dictionaries we consulted often disagreed about the presence or absence of secondary stress on a full-vowel syllable (and even occasionally about the location of main lexical stress) for adjacent stress words but never for words with an alternating stress pattern. These informal observations raised the possibility that adjacent stress words might behave less regularly with respect to early prominence, so some analyses were carried out separately for words with the two different kinds of lexical stress patterns. After the utterances had been perceptually labelled for prominences and boundaries, the prosodic labeling of the candidate word tokens was verified. This more careful analysis, carried out with access to visual displays of F0 contours in all cases, resulted in some changes to the initial perceptual labeling. Fewer than 10% of the tokens were changed, and almost all changes involved adding an accent to an already-accented word, increasing the number of double-accented words. In the process of verifying the prominence labels, we also observed some cases where multisyllabic words seemed to have an additional small subordinated prominence. For the purpose of this study, these prominences were not labelled as accents because they were perceived as a very different level of prominence. However, it could be argued that these syllables are indeed accented, which would result in some additional increase in the number of double-accented words in our results. In order to answer some of the questions posed in the previous section, i.e.
about the role of clash and of phrase onset marking, we tabulated the data according to the location of the labelled prominence in the intermediate intonational phrase, and other aspects of its prosodic context. Specifically, we separated word tokens according to
the position of the accent in its phrase, the pitch accent pattern of the following word (if that word was in the same phrase), and the lexical stress pattern of the following word (if it was in the same phrase). We also separated early-accent-only, main-stress-accent-only and double accent tokens. As noted above, single orthographic words are analyzed separately from hyphenated words and letter name strings.
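The tabulation scheme just described can be sketched in code. The following is only an illustration under stated assumptions: the `Token` fields and the row letters A-H (which follow the layout of Tables I and II) are hypothetical names introduced here, not a description of how the original tabulation was carried out.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass
class Token:
    """One candidate word token (all field names are illustrative assumptions)."""
    early_accent: bool             # prominence on the pre-main-stress syllable
    main_accent: bool              # prominence on the main-stress syllable
    accents_before: int            # accents on earlier words in the same intermediate phrase
    accents_after: int             # accents on later words in the same intermediate phrase
    next_word_accent: Optional[str] = None  # "first", "non-first", or None (no accent)


def accent_location(tok: Token) -> Optional[str]:
    """Classify the accent pattern within the word: early, double, or main."""
    if tok.early_accent and tok.main_accent:
        return "double"
    if tok.early_accent:
        return "early"
    if tok.main_accent:
        return "main"
    return None  # unaccented tokens are not tabulated


def table_row(tok: Token) -> Optional[str]:
    """Return the row letter (A-H) for an accented token, or None if unaccented."""
    if accent_location(tok) is None:
        return None
    has_before = tok.accents_before > 0
    has_after = tok.accents_after > 0
    if not has_before and not has_after:
        return "H"  # word carries the only accent(s) in the phrase
    if has_before and not has_after:
        return "G"  # word carries the final (but not first) accent
    # First or medial position: subdivide by the accent status of the next word.
    offset = {"first": 0, "non-first": 1, None: 2}[tok.next_word_accent]
    return ("DEF" if has_before else "ABC")[offset]
```

For example, a token with an early accent only, no preceding accents in its phrase, and a following word accented on its first syllable falls in row A with location "early".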
3.3. Acoustic Analysis
Acoustic analyses were conducted in addition to the phonological analyses, to determine whether there was evidence that the marked prominences are indeed pitch accents, as well as to determine whether there is an increase in duration for syllables with early prominence. F0 contours were estimated by an algorithm in Waves+ (software from Entropic) which is similar to that described by Secrest and Doddington (1983). Then, a subset of the corpus was labelled using the ToBI accent labeling system (Silverman et al., 1992), which provides a vocabulary of different pitch accent types rather than the simple P marker for prominence. A phonological labeling in terms of pitch accent types is important to F0 analysis and these labels were not available for the whole corpus. Specifically, we labelled the 57 instances of the word Massachusetts, in part because confining these analyses to tokens of a single word provided some degree of control over segment-related variation in F0. It was found that all but one of the pitch accents fell into the set of high accents, which includes H* (high peak), L + H* (peak rising from a valley) and !H* (downstepped peak, i.e. a peak which is significantly lower than the peak of the previous accent). The one exception to this generalization, and a second token that was difficult to analyze for F0 because of "creaky voice", were omitted from the study, leaving 54 tokens. F0 measurements were taken by hand, including the F0 change in the Ma- syllable, the peak F0 values for both the Ma- and the -chu- syllables, and the peak F0 of the intermediate phrase containing the token. The duration analyses were based on the 282 single orthographic early accent word candidates for which phonetic alignments were available, which included 192 words marked with a single prominence and 56 words marked with two prominences.
Duration statistics were computed to address the question of whether increased duration was associated with early prominence placement within the word, and to better understand the phenomenon of double accenting. The duration analyses involved finding the normalized duration for all secondary- and main-stress vowels in non-phrase-final position, and then separating cases that are unaccented, accented in clash context and accented in non-clash context. (In the duration studies, "clash context" is defined as the case where the target token is followed by a word with an accent in its first syllable, since the phonological study suggests that this category is most clearly a clash context.) We specifically chose to measure vowel duration, because the results of Crystal and House (1990) suggest that syllable duration varies significantly as a function of syllable structure, and that vowel duration is more affected by "stress" than consonant duration (though their definition of "stress" was closer to lexical stress than the phrase-level prominence of interest here). The vowel duration is normalized by subtracting out a phoneme-dependent mean and dividing by a phoneme-dependent standard deviation. The use of normalized duration, or the number of standard deviations from the adapted phoneme mean, enables us to factor out variability due to phoneme identity, and averaging over a large corpus helps factor out the many other sources affecting phone duration. The phoneme-dependent means and variances are estimated from the full set of phonetically aligned data (9,854 words).
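The normalization described above amounts to a per-phoneme z-score. A minimal sketch follows; the function names, the phone labels, and the choice of the population standard deviation are assumptions made for illustration, since the text does not specify the estimator used.

```python
from collections import defaultdict
from statistics import mean, pstdev
from typing import Dict, Iterable, Tuple


def phoneme_stats(observations: Iterable[Tuple[str, float]]) -> Dict[str, Tuple[float, float]]:
    """Estimate (mean, standard deviation) of duration for each phoneme.

    observations: (phoneme_label, duration_ms) pairs from the aligned corpus.
    Assumption: population standard deviation (pstdev) is used here.
    """
    by_phone = defaultdict(list)
    for phone, duration in observations:
        by_phone[phone].append(duration)
    return {phone: (mean(durs), pstdev(durs)) for phone, durs in by_phone.items()}


def normalized_duration(phone: str, duration: float,
                        stats: Dict[str, Tuple[float, float]]) -> float:
    """Number of standard deviations from the phoneme-dependent mean."""
    mu, sigma = stats[phone]
    if sigma == 0:
        raise ValueError(f"no duration variability observed for {phone!r}")
    return (duration - mu) / sigma
```

A negative value then indicates a vowel token shorter than is typical for that phoneme in the aligned data, which is how the negative averages reported in Section 4 should be read.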
4. Experimental results
The empirical results that summarize the perceptual labeling for pitch accent placement in candidate word tokens in different prosodic contexts are given in Table I for the single orthographic words and in Table II for the hyphenated words and letter names. The first column shows the position of the target word accent in its intermediate intonational phrase, the second the accent status of the following word, and the following three columns (labelled 1, 2 and 3) the location of the accent within the candidate word. The phrase-level prominence rate is relatively high in this corpus: 405 of the 483 single orthographic word candidates and 127 of 141 candidate hyphenated words or letter names, 84% and 90% respectively, were labelled with at least one prominence. These tables will be referred to in the sections addressing the different questions posed in these studies; only the tokens that received at least one accent are shown.

TABLE I. Accent location within word for different categories for the following word's pitch accents in words containing the first (but not last), medial, and final (but not first) pitch accents in the intermediate phrase, and for words that contained the only accent in the phrase. These data represent only tokens considered to be single orthographic words

Accent location   Location of accent        Accent location in word        Total
in phrase         in next word              1. Early  2. Double  3. Main   observed
First accent      A. First syllable             48        10        13        71
                  B. Non-first syllable          6         5         9        19
                  C. No accent                  22        12        13        47
Medial accent     D. First syllable             15         1         6        22
                  E. Non-first syllable          5         3         6        14
                  F. No accent                   4         1        11        16
Final accent      G. N/A                         5        13        96       114
Only accent       H. N/A                         9        45        47       101
Total                                          114        90       201       405

TABLE II. In this case, the "words" include only hyphenated words and strings of letter names

Accent location   Location of accent        Accent location in word        Total
in phrase         in next word              1. Early  2. Double  3. Main   observed
First accent      A. First syllable             19         7         0        26
                  B. Non-first syllable          3         7         0        10
                  C. No P                       12         2         1        15
Medial accent     D. First syllable              6         3         0         9
                  E. Non-first syllable          0         1         0         1
                  F. No P                        4         0         1         5
Final accent      G. N/A                         0         7        21        28
Only accent       H. N/A                         7        23         3        33
Total                                           51        50        26       127

1) Does early prominence within words occur noticeably often in continuous speech?
Both early prominence and double prominence labeling are relatively frequent in the words in this corpus, comprising 28% and 22% of the tokens analyzed, respectively, in the single orthographic words. The rates are even higher for the set of hyphenated words and letter names, 40% for early and 39% for double prominence. The fact that, for the hyphenated words and letter names, early and double prominence were each more frequent than prominence only on the main-stress syllable may reflect the possibility that the first morpheme in each of these words contains an inherently main-stress syllable of its own, and thus might be more likely to receive a prominence. The surprisingly high rate of double-prominence words observed here may lend support to the notion of phrase onset marking discussed under question 5 below. That is, since the prosodic constituents tend to be smaller in radio news style than for nonprofessional read speech, there are more potential sites for an onset prominence early in the word. Also, if a large number of intonational phrases are so short that all prominences occur on a single word, this would also increase the rate of double prominence.
2) Is early prominence associated with a pitch accent on the early syllable?
There are several pieces of evidence to support the hypothesis that the hand-labelled prominences on the early syllables of words, and in fact all the labelled prominences, are pitch accents. First, in a subset of the corpus consisting of the tokens of the word Massachusetts, labellers were able to determine the specific tonal quality of the pitch accent for each of the syllables marked as prominent (i.e. to mark it as H*, L + H*, or some other type) according to the requirements for pitch accent labeling in the ToBI convention for prosodic transcription (Silverman et al., 1992). Second, the F0 measurements for this same subset of tokens show that there is a substantial pitch rise on the Ma- syllable when it is labelled with a prominence, contrasted with either a small rise or a fall when the syllable is not marked as prominent. (This difference is illustrated in Fig. 3, which shows the two distributions of F0 rise measurements, including plots with and without normalization by the peak F0 in the phrase to reduce variability due to differences in pitch range.) There was one exception: in one token the early syllable was perceived as prominent by several listeners (and marked with a pitch accent), though it had only a 4 Hz F0 rise. In this case, the perceived prominence may have been due to an energy increase; duration was close to average and therefore probably not the relevant cue. For tokens of Massachusetts that had a prominence labelled only on the first syllable, there was typically a 20 Hz fall on -chu-, as illustrated in Fig. 4(a), but this was always within or below the range of the F0 rise on the initial syllable. In the "double accented" tokens, the fall on -chu- was larger and in some cases in a higher range, consistent with the claim that the prominence label on the main-stress syllable also corresponds to a pitch accent. Although a substantial F0 excursion is by no means required in order to perceive a prominence, the co-occurrence of a perceived phrase-level prominence with a pitch accent label in the H* category and a noticeable F0 marker suggests strongly that these labelled prominences are indeed pitch accents.

Figure 3. Distribution of measured values for F0 rise in the syllable Ma- in Massachusetts for cases marked with prominence ("Accented") vs. cases not marked for prominence ("Unaccented"). (Left) the raw F0 rise values in Hz, and (right) the measurements normalized by dividing by peak F0 in the phrase (F0 rise/F0 peak).

A third piece of evidence supporting the claim that prominences marked on early syllables correspond to pitch accents is the fact that the average normalized durations of syllables marked as prominent are not significantly different from those not marked as prominent; in fact the average for a prominence-marked syllable is slightly shorter, -0.30 vs. -0.20. This difference is not significant: p < 0.20, t = 0.88. (Note that these normalized durations are below average, i.e. have negative values, because averages are computed from all observations. As a result, word-final and phrase-final tokens tend to raise the average, even though they are not included in the accented and unaccented syllables compared here.) Thus, it is unlikely that the perceived early prominences are the result of duration-related rather than F0-related distinctions. For the main-stress syllables, tokens marked as prominent are significantly longer on average than tokens not marked as prominent, but only when there is no prominence marked in the following word. Average normalized durations are -0.43 for non-prominent syllables and for prominent syllables in a target word that is followed by a word WITH a prominence-marked syllable, and -0.05 for prominence-marked syllables NOT in an accent clash context; p < 10^-6, t = 5.10. These differences might suggest that there is duration shortening in a clash context, or they may be the result of lengthening associated with nuclear accent, since most of the main-stress "accented" syllables that were not in a clash context occurred in nuclear position.
Finally, another small unpublished study we have done supports the hypothesis that, in general, the hand-marked prominence labels correspond to accented syllables labelled by intonation experts. At a workshop on prosodic transcription, several researchers transcribed 25 utterances according to their chosen method of prosodic labeling. Comparing the locations of prominence marked by our labellers using only auditory perception with the locations of accents marked by intonation experts who had access to an F0 contour, the agreement across the two systems was similar to the agreement between labellers within each system: over 90% for the radio news-style subset of these utterances. In sum, our results are consistent with previous analyses suggesting that when prominence is perceived on a pre-main-stress syllable in the word, this syllable is very likely to have a pre-nuclear pitch accent. We will therefore refer to pitch accents and the labelled prominences interchangeably.

Figure 4. F0 contour for examples of the word Massachusetts labelled with (a) an H* accent only on the first syllable, (b) an H* accent only on the main-stress syllable, (c) two H* accents, and (d) no accents.

3) Does early accent occur in contexts where it prevents a pitch accent clash with the following word?
Integrated theories of pitch accent occurrence claim that speakers will tend not to place accents too close together; in this sense, these theories invoke a rhythmic constraint on phrase-level prominence patterns. In contrast, intonation-only theories describe a mechanism for early location of the first accent in a phrase, and late location of the nuclear accent, but make no provision for additional rhythmic considerations. Metrical theories do not generally describe clash in terms of pitch accent rhythm, although Selkirk (1984) provides a mechanism for early pitch accent location as the consequence of rhythmic stress clash, stress shift, and consequent reassociation of pitch accent. The relevant data in Table I suggest that pitch accent clash does play a role in early accenting, which weakens the case for the intonation-only theory. Ignoring accent position in the phrase, and excluding the double accented tokens since they would not be predicted by any clash theory, 77% of the candidate words in clearly clash environments (i.e. where the following word has an accent on its first syllable, rows A and D in the table) exhibited early pitch accent, compared with 52% of those in a clearly non-clash environment (where the following word is unaccented, rows C and F). This difference in relative frequency is significant: p < 0.002, t = 3.08.
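The clash vs. non-clash percentages quoted above can be recomputed directly from the early-only and main-only counts in rows A, C, D and F of Table I. The sketch below does this and also computes a pooled two-proportion z statistic as one plausible significance check; the choice of test is an assumption made here, since the exact procedure behind the reported t value is not described, so the resulting statistic need not match it.

```python
from math import sqrt
from typing import Iterable, Tuple

# (early-only, main-only) token counts copied from Table I; double-accented
# tokens are excluded, as in the comparison described in the text.
TABLE_I_SINGLE_ACCENT = {
    "A": (48, 13),  # first accent in phrase, next word accented on first syllable
    "C": (22, 13),  # first accent in phrase, next word unaccented
    "D": (15, 6),   # medial accent, next word accented on first syllable
    "F": (4, 11),   # medial accent, next word unaccented
}


def early_counts(rows: Iterable[str]) -> Tuple[int, int]:
    """Return (early-accented tokens, all single-accent tokens) over the rows."""
    early = sum(TABLE_I_SINGLE_ACCENT[r][0] for r in rows)
    main = sum(TABLE_I_SINGLE_ACCENT[r][1] for r in rows)
    return early, early + main


def two_proportion_z(k1: int, n1: int, k2: int, n2: int) -> float:
    """Pooled two-proportion z statistic (an assumed test, for illustration)."""
    pooled = (k1 + k2) / (n1 + n2)
    se = sqrt(pooled * (1 - pooled) * (1 / n1 + 1 / n2))
    return (k1 / n1 - k2 / n2) / se


clash_early, clash_total = early_counts(("A", "D"))   # clearly clash contexts
plain_early, plain_total = early_counts(("C", "F"))   # clearly non-clash contexts
clash_pct = round(100 * clash_early / clash_total)    # 77
plain_pct = round(100 * plain_early / plain_total)    # 52
z = two_proportion_z(clash_early, clash_total, plain_early, plain_total)
```

With these counts the early-accent rates come out as 63 of 82 tokens in clash contexts and 26 of 50 in non-clash contexts, which reproduces the 77% and 52% figures.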
We conclude that early accent is more likely to occur in contexts where it prevents pitch accent clash than in contexts where it does not. The data in Table II do not show a significant difference for clash and non-clash conditions for hyphenated words and letter name strings, possibly because the likelihood that these tokens will have an early accent is so high that it obscures any possible difference between clash and non-clash contexts. One might also ask whether a lesser degree of pitch accent clash can influence early accent placement within a word, i.e., whether an accent on a non-initial syllable of the following word might be considered as a clash environment. The data in Tables I and II provided no support for this hypothesis, since the early accent rate was actually lower if the following word had a non-initial pitch accent than if the following word had no accents at all. However, re-examining the data in terms of the number of post-main-stress syllables at the end of the target word, we find that the distance to the following accent in syllables may be a factor after all. The data in Table III show that early accent placement tends to be more likely when the main-stress syllable of the target is closer to an initial accent in the following word. Further study of this question is necessary because the data are too few to determine to what extent extra-metricality (Hayes, 1984) is a critical factor.

TABLE III. Accent location in target word for 0, 1, and 2 syllables (shown as x's) between the main-stress syllable (S) of the target word and the following word boundary (indicated by "|"). Target word tokens include only single orthographic words that contain exactly one accent and contain either the first or medial pitch accents in the intermediate phrase. Only targets that occur in contexts with an accent on the first syllable of the subsequent word (indicated by "P") are included

Separation of main       Accent placement
stress and accent        Early    Main-stress
S | P                      27          1
Sx | P                     34         14
Sxx | P                     2          4

4) When early prominence occurs in NON-clash contexts, can it be accounted for as phrase onset marking?
Although pitch accent clash appears to play a role in early accenting, there are many instances in our data which it cannot account for. In addition, the relatively high incidence of double accented words observed here is not consistent with clash theories which predict that speakers will avoid placing prominences on adjacent or nearly adjacent syllables. We hypothesized that, as suggested by the integrated theory, these cases could be accounted for by a tendency to locate the first accent of a new phrase as early as possible, and several trends in the data support this hypothesis. First, we note that target words carrying the phrase-initial prominence (rows A, B and C in Table I) receive early or double prominence more frequently than words carrying phrase-medial accents (rows D, E and F in the Table): 75% vs. 56%, respectively; p < 0.004, t = 2.65. The difference is more significant for words that are in clearly non-clash contexts (rows C and F in Table I), where 72% of the target words carrying the first accent in the phrase are early or double accented, vs. 31% of the phrase-medial targets; p < 0.001, t = 3.14. The phrase onset marker hypothesis is also supported by the fact that 86% of the 51 unaccented tokens were not phrase-initial. Finally, target words that carry all the pitch accents in the phrase (row H, Table I) are more often double accented than other target words: 45% vs. 14%; p < 10^-8, t = 6.61. In addition, the 30 WBUR tokens, which were left out of the original analysis because almost all tokens occurred in the same sentence and prosodic context (For WBUR, I'm ...), were all double accented tokens containing the only accents in the phrase. (Two additional tokens of WBUR's were unaccented.) This pattern of results would be predicted by the phrase onset marking theory, because target words that contain all the accents of the phrase might tend to carry both early accent (to mark phrase onset) and nuclear accent (on the main-stress syllable). The evidence for early prominence without clash is less clear for the hyphenated words and letter names (Table II), because of the high frequency of early and double accenting.
Even for these words, however, there is significantly more double accenting in words that carry all of the accents in the phrase: 70% vs. 29%; p < 10^-5, t = 4.46. We conclude that, as many investigators have suggested, speakers tend to place the initial accent of a new intermediate intonational phrase on an early syllable when this is possible, and that this can result in an early accent within words which are early-accentable, even in the absence of a pitch accent clash. The data discussed above, supporting the claim that the first accent in an intermediate intonational phrase tends to be located early in an early accent candidate word, suggest that speakers may mark the onset of a new phrase in this way. In contrast, the nuclear accent of an intermediate phrase occurs reliably on the main-stress syllable of its word. The fact that 98% of the single-word tokens of candidate words with nuclear prominence in the intermediate intonational phrase have an accent on the main-stress syllable is perhaps not surprising, given the degree of theoretical agreement on this point. However, it does provide support for the intermediate intonational phrase as the prosodic constituent in which nuclear prominence, like onset prominence, is determined.
5) Is rhythmic stress clash required for early prominence placement?
The evidence presented above shows that most instances of perceived early prominence in a word (i.e. apparent stress shift) are cases of early pitch accent in this corpus of FM radio news speech. This finding is compatible with theories of pitch accent placement in the literature which do not rely on the shift of rhythmic stress in response to rhythmic stress clash (Bolinger 1981, Gussenhoven 1991, Beckman and Edwards in press), but rather postulate rules for the placement of accent among stressed syllables based on position of the accent in its prosodic constituent and rhythmic patterns among the accented syllables. However, these results do not directly address the question of whether or not rhythmic stress clash/shift is also an important factor in determining the rhythmic structure of these spoken utterances. There are two ways in which rhythmic stress might play a role. First, rhythmic stress clash may provoke rhythmic stress shift to an earlier syllable in the word, followed by re-association of the pitch accent from the former main-stress syllable to the earlier syllable which is now more prominent, as Selkirk (1984) suggests.
That is, the Early Accent Placement observed in the FM radio news corpus might depend on the prior occurrence of rhythmic stress shift in response to rhythmic stress clash. This hypothesis could be tested in several ways, including i) by looking for evidence that syllables with early accent show additional acoustic correlates of the shifted stress, beyond those associated with the pitch accent, and ii) by determining whether Early Accent Placement occurs only or preferentially in rhythmic stress clash contexts. A second way in which rhythmic stress shift might contribute to the rhythmic patterns of utterances is by eliminating rhythmic stress clash in non-pitch-accented stretches of speech. To test this hypothesis, it is necessary to determine whether prominence has shifted in non-pitch-accented target words. Some experimental approaches to this latter possibility have been developed (Huss 1978, van Heuven 1987, Horne 1993); although results are suggestive, the issue has not yet been resolved. In the current study, we were able to test only the first hypothesis, that Early Accent Placement reflects the prior occurrence of rhythmic stress shift in response to rhythmic stress clash. We carried out two separate analyses: comparing the duration of early accented vs. non-accented syllables, and determining whether Early Accent Placement is possible in the absence of rhythmic stress clash.

Duration data for Early Accent words. Earlier empirical studies of duration and F0 as possible acoustic correlates of rhythmic stress shift in clash contexts did not find evidence for shift (Cooper and Eady 1986). Our studies also found that vowels in early accented syllables were not lengthened relative to vowels in the corresponding unaccented syllables. However, it is not entirely clear which acoustic correlates are appropriate to measure, since the correlates of rhythmic stress are complex and embodied in the relative strength of the syllables of an utterance rather than in the absolute values for a single syllable. We attempted to overcome this problem by comparing the normalized duration of vowels in both early and main-stress syllables for accented vs. unaccented cases. As in previous measurements, we use normalized duration to help factor out variability due to segment identity. We found that main-stress syllables were shortened significantly relative to the secondary-stress syllables when the accented target word was followed by a word with an initial pitch accent, both for the case where the early syllable only was accented (p < 10^-4, t40 = 4.58) and when the main-stress syllable only was accented (p < 10^-4, tn = 5.13). In contrast, where the following word did not have a pitch accent on any syllable, the normalized durations of the early and main-stress syllables were similar (N.S., t13 = 0.42) for early accented target words. Thus, we observe that duration is reduced in the main-stress syllable of a target word in accent clash context whether or not the accent shifts left. This finding provides some evidence for rhythmic reorganization apart from but not necessarily independent of pitch accent placement. In order to test whether it is rhythmic stress clash and not pitch accent clash that is influencing the duration changes, we would need to separate unaccented tokens according to rhythmic stress, which results in data too sparse to draw conclusions from.

Early pitch accent without rhythmic stress clash. Earlier work (e.g. Beckman et al. 1987, 1990) showed that apparent stress shift (early prominence in the word) could be perceived even in the absence of a rhythmic stress clash context, in phrases like "Chinese antique".
To evaluate the possibility that rhythmic stress clash (and therefore, possibly, shift) was associated with the Early Accent Placements we observed in our corpus, we isolated the set of target words which were produced in a context where the following syllable was not pitch-accented, and then compared the effect of the presence vs. absence of a rhythmic stress clash. Such contexts are relatively rare in our database, because FM radio news speakers accent so pervasively, and so many words of English begin with an accentable main-stress initial syllable. Nevertheless, the corpus contained 76 target words that a) were produced with a single pitch accent (leaving aside phrase-final pitch accents, which must occur on the main-stress syllable, double accents, and deaccented productions, which would not bear on the question directly), and b) were followed by a word that was not pitch accented on its first syllable. These examples were classified as either "With Rhythmic Stress Clash", if the word following the target word began with a main-stress syllable (e.g. seventeen hundred, Massachusetts public), or "Without Rhythmic Stress Clash", if the following word began with a reduced syllable (e.g. Massachusetts Association, psychological effects, ecological solution) or was a reduced function word (e.g. University of, overstate the, volunteered to). Some discussions of stress shift employ more complex definitions of clash vs. non-clash contexts, requiring the right-hand member of a clashing pair of syllables to be rhythmically stronger than the left-hand member, or suggesting that the likelihood of rhythmic stress shift does not disappear when the following syllable is unstressed, but merely lessens. Because our available sample is small, we did not take these complexities into account.
Instead, we defined a reduced following syllable as a non-clash context, and an unreduced, full-vowel following syllable as a clash context; as it happened, all of these clash context syllables were main-stress syllables. In addition, we treated separately the set of target words with alternating stressed and reduced syllables (e.g. Massachusetts or University) and the set of words with two adjacent stressed syllables (e.g. campaign), because we had some evidence that their pitch accent patterns differ. The 41 alternating-stress tokens that met our criteria are shown in Table IV. If rhythmic stress clash and shift were a prerequisite to Early Accent, we would expect more Early Accent tokens to occur in rhythmic stress clash contexts and more Main-stress Accent tokens to occur in contexts without rhythmic stress clash. In this case, the entries on the diagonal in each table would be the largest. Instead, however, the two rates of occurrence were very similar: very few Early Accents occur in the +Stress Clash contexts for either phrase-initial-accented or phrase-medial-accented words. To put it another way, 12 of the 14 Early Accent tokens in this set did not occur in rhythmic stress clash contexts. Although these numbers are small, they reinforce the view that Early Accent can occur in the absence of rhythmic stress clash and associated rhythmic stress shift.
TABLE IV. Distribution of early accents and main-stress accents for stress-clash and non-stress-clash contexts without pitch accent clash
                 Phrase initial accent                Phrase medial accent
Stress clash?    Early accent   Main-stress accent    Early accent   Main-stress accent
Yes                   2                 2                  0                 1
No                    8                12                  4                12
Total                10                14                  4                13
The subset of 35 adjacent stress tokens includes a substantial proportion of the -teen number names, especially nineteen, which occurs in the names of years such as nineteen eighty-five. These phrases, when used attributively (as in the 1985 alumnae), normally carry an early pitch accent on nine-; the number that follows usually has no pitch accent but provides a rhythmic stress clash context because it has initial stress. As a result, a speech sample that contains a high proportion of such year names will also contain a high proportion of early accent tokens of nineteen in rhythmic stress clash contexts.
However, it is difficult to be sure whether this fact results from the high frequency of year names, or represents a more general pattern of rhythmic-clash-induced accenting for adjacent-stress words, because there are not enough non-rhythmic-clash tokens to compare with. Thus, the adjacent-stress target word tokens in our corpus do not provide much insight into the question of whether rhythmic stress clash and its resulting rhythmic stress shift provoke early accent in the word, in the absence of pitch accent clash. These results, like those in the literature, are not definitive on the question of whether rhythmic stress shift, occurring in the context of rhythmic stress clash, is a prerequisite for Early Accent Placement. This is in part because the acoustic correlates of rhythmic stress have proved hard to pin down. To fully evaluate the possibility that rhythmic stress clash alone, without pitch accent clash, can provoke either a shift in rhythmic stress or early pitch accent, it would be necessary to determine whether rhythmic stress clash can provoke rhythmic stress shift in stretches of non-pitch-accented speech, and if so, whether such rhythmic stress clashes and
shifts are a sine qua non for Early Accent Placement in accented stretches. As noted above, some investigators have made a beginning in analyzing stress-shiftable words in non-pitch-accented contexts, both perceptually and acoustically. In our corpus, we failed to find durational evidence for rhythmic stress shift in early accent tokens vs. main-stress tokens, or distributional evidence for an effect of rhythmic stress clash on early accent. This pattern of results is consistent with our general finding that the occurrence of Early Accent can be largely accounted for by a combination of phrase onset marking and avoidance of pitch accent clash. More complete resolution of questions about the need for the full metrical apparatus which describes rhythmic stress shift will require considerable further investigation. To date, however, our results do not appear to require this full representation.
6) Under what conditions does double accenting occur? The word tokens produced with two accents form an interesting set to study, because they are not predicted by the pitch accent clash or rhythmic stress clash mechanisms. Yet the rate of occurrence of double accents is substantial, so an adequate theory must account for them. When they occur at the beginnings of phrases, then the integrated theory of pitch accent placement can account for them as phrase onset marking (Bolinger, 1965; Beckman and Edwards, 1994; Gussenhoven, 1991; Shattuck-Hufnagel, 1994). We therefore examined the double accented words in our corpus to see where in the phrase they occurred, and found that in fact a high percentage of the double accented tokens contain the first accent in the phrase: 80% for the single-word tokens, and 78% for tokens in the set of hyphenated words and letter names. The picture is a bit more complicated than this, in that alternating stress target words seem to be more likely to receive two accents than adjacent stress words.
For example, while 55% of alternating stress target words had double accents when they carried all the accents in an intermediate intonational phrase, only 17% of adjacent stress target words did so (p < 0.07, t = 1.51); it seems that speakers were more reluctant to place phrasal prominences on adjacent syllables within the same word than on alternating syllables. Further support for this hypothesis comes from the observation that, of the 90 single-word tokens receiving two accents (shown in the column labelled 2 in Table I), more than 82% were alternating stress tokens, whereas the set of target tokens as a whole contains only 68% alternating stress items. Of the 18 double-accented single-word tokens that did not carry the first accent in the phrase, 13 carried the nuclear accent. In listening to some of these examples, it seemed to us that for non-phrase-initial accents, double accenting may be used to convey additional prominence, either for nuclear accenting or special emphasis. Our acoustic analyses support this conjecture. Average normalized duration in the early accented syllable of double accented tokens is greater than in single accented tokens: -0.09 vs. -0.30, respectively; p < 0.08, t = 1.46. In the main-stress syllables, the average normalized duration for a double accented word is the same as was observed for the single accented word in non-clash (typically nuclear) context. F0 measurements showed that there were also larger pitch movements for double accented tokens. The average F0 rise on an accented Ma- of Massachusetts is 62 Hz for the double accented words and 46 Hz for the words with only an early accent, or 0.22 vs. 0.18 for F0 rise normalized by peak F0 in the phrase. The F0 range, as estimated by the peak F0 in the phrase, is also higher for double accented tokens.
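The two normalizations used in these comparisons can be illustrated with a short sketch. It assumes that duration is normalized as a z-score against the mean and standard deviation of the same phone's durations (one common way to factor out segment identity, and not necessarily the exact procedure behind the values reported here), and that F0 rise is normalized by dividing the rise in Hz by the peak F0 of the phrase. The function names and all numeric values are hypothetical illustrations, not corpus data.

```python
from statistics import mean, stdev


def normalized_duration(duration_ms, phone_durations_ms):
    """Z-score of one token's duration relative to tokens of the same phone.

    Assumption: z-score normalization; the study's own procedure may differ.
    """
    mu = mean(phone_durations_ms)
    sigma = stdev(phone_durations_ms)  # sample standard deviation
    return (duration_ms - mu) / sigma


def normalized_f0_rise(rise_hz, phrase_peak_hz):
    """F0 rise expressed as a fraction of the peak F0 in the phrase."""
    return rise_hz / phrase_peak_hz


# Hypothetical durations (ms) of one vowel type across a corpus.
vowel_durations = [80.0, 90.0, 100.0, 110.0, 120.0]

# A 95 ms token is shorter than the 100 ms mean, so its z-score is negative.
z = normalized_duration(95.0, vowel_durations)

# Hypothetical token: a 60 Hz rise in a phrase whose peak F0 is 240 Hz.
ratio = normalized_f0_rise(60.0, 240.0)  # 0.25
```

Under the z-score assumption, a negative normalized duration simply marks a vowel that is shorter than the average for its phone, so a move from -0.30 toward -0.09 would correspond to a relatively longer vowel.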
The average ranges ± one standard deviation for the three cases are: double accented words (8) 268 ± 52 Hz, early accented words (13) 250 ± 57 Hz, and words carrying accent on the main-stress syllable (22) 241 ± 51 Hz. While these differences do not reach significance, they are consistent with our hypothesis that double accented words may be associated with emphasis.
Accent patterns not predicted by the integrated theory. Although the integrated theory of pitch accent placement, by including both phrase onset marking and pitch accent clash factors, accounts for most of the within-word accent patterns in our data, we observed some accent patterns that are not predicted by the theory. While discourse structure and information status play an important role in accent placement and may explain these discrepancies, it is interesting to examine the exceptions to see if any consistent patterns arise. Here, we limit the analysis to the single orthographic words, since the hyphenated words and letter names seem to have somewhat different behavior that requires further study. One class of exceptions not predicted by the integrated theory is the set of tokens with double accent where the first accent is not the initial accent in a phrase. We have already noted that double accent in this position may provide additional emphasis or contrast to the word. Other classes of accents not explained by the theory include tokens with early accent that are neither in phrase-initial position nor in a clash context (F1 in Table I), and tokens that do not receive a prominence on the main-stress syllable when that word receives nuclear prominence (G1 and H1 in Table I). (Virtually all theories of accent placement predict that nuclear prominence, the last prominence in the phrase, should be located on the main-stress syllable of its word.) Of the nineteen exceptions in classes F1, G1 and H1, sixteen are words with adjacent stress patterns.
Clearly, the adjacent stress word class requires further study. However, we conjecture that this trend may be a result of the conflict between the dispreference that speakers may have for accenting adjacent syllables and a need to double accent the word either for phrase onset marking plus nuclear accent, or for emphasis. The three alternating stress words that fell into the class of "exceptions" included two words that were labelled as early accented and were cases where the main-stress syllable carried one of the smaller prominences described earlier, proposition and University; the third example, legislation, had a nuclear accent on its early syllable and sounded like a prosodic error. In sum, it appears that the number of tokens that the integrated theory cannot account for is extremely small, particularly for words with alternating lexical stress. Avoidance of pitch accent clash and phrase onset marking together predict the accent patterns on most of the early-accentable tokens in our corpus. Our results also highlight the fact that the lexical stress pattern of a word, i.e. alternating vs. adjacent stress, significantly affects the likelihood that its early full-vowel syllable will be pitch accented. That is, words with adjacent stressed syllables are more resistant to early accent than words in which stressed syllables are separated by reduced syllables, and this fact may account for some apparent exceptions. For example, there were 13 tokens that resisted early accent even though they contained the initial accent in the phrase, and occurred in a pitch accent clash context (A3 in Table I); of these 13, 10 were adjacent stress words, e.g. campaigns, undocumented, routinely. Thus, even in contexts that elicit early accent very reliably, adjacent stress words seem to resist it. These words also resisted double
accenting, as noted above, and this may provide a clue to their irregular behavior. The general pattern of pitch accent clash avoidance indicates that speakers prefer not to produce accents on adjacent syllables across word boundaries, and a similar dispreference seems to hold within words. This presents a special problem in cases where an adjacent stress word carries the nuclear pitch accent of its phrase; in these cases, an additional early accent would result in two adjacent accents. Thus, speakers may resist early accenting for these words as a general principle, to prevent the problem of adjacent accents from arising. Perhaps the extra difficulties of pitch accent assignment in adjacent-stress words exert some pressure against them as the language evolves, leading to a lower frequency for this lexical stress pattern.
5. Discussion
The view of apparent stress shift that motivated this study draws on both metrical phonology and intonational phonology to develop a pitch-accent-based model of early accent placement in the word. The model of within-word accent placement is part of a more general account of pitch accent placement in the intermediate intonational phrase, an account that posits at least three factors exercising an influence in the speaker's choice among the grammatically well-formed options for pitch accent placement: prosodic constituent structure, rhythmic regularity and discourse factors such as semantic focus (Shattuck-Hufnagel, 1994). In this paper we have dealt only with the first two factors, which can be considered aspects of prosodic structure. The relation between these factors and the effects of syntactic, semantic and discourse factors in determining the pattern of accent placement remains to be studied.
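As an illustration only, the two structural factors treated in this paper (phrase onset marking and avoidance of pitch accent clash) can be written as a small decision rule. The sketch below is a deliberately simplified reading of the integrated theory: it treats tendencies as categorical, covers only words that receive at least one accent, ignores the resistance of adjacent-stress words to early accent, and uses hypothetical names (AccentPattern, predict_pattern) that do not come from the study.

```python
from enum import Enum


class AccentPattern(Enum):
    MAIN_STRESS = "accent on main-stress syllable only"
    EARLY = "accent on early syllable only"
    DOUBLE = "accent on both strong syllables"


def predict_pattern(first_accent_in_phrase: bool,
                    nuclear_accent: bool,
                    next_syllable_accented: bool) -> AccentPattern:
    """Favored accent pattern for an accented early-accentable word.

    first_accent_in_phrase: the word carries the first pitch accent of its
        intermediate intonational phrase (phrase onset marking applies).
    nuclear_accent: the word carries the last (nuclear) accent of the phrase,
        which is expected on the main-stress syllable.
    next_syllable_accented: the first syllable of the following word is pitch
        accented, so a main-stress accent would create pitch accent clash.
    """
    if nuclear_accent:
        # The nuclear accent stays on the main-stress syllable; an
        # onset-marking early accent is added only if the word also
        # carries the first accent of the phrase.
        if first_accent_in_phrase:
            return AccentPattern.DOUBLE
        return AccentPattern.MAIN_STRESS
    if first_accent_in_phrase or next_syllable_accented:
        # Phrase onset marking or clash avoidance favors the early syllable.
        return AccentPattern.EARLY
    return AccentPattern.MAIN_STRESS
```

For instance, predict_pattern(True, False, True) returns AccentPattern.EARLY, the pattern this reading would favor for a phrase-initial Massachusetts followed by an accented syllable. Tokens that depart from such a rule correspond to the exception classes discussed above.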
To understand more clearly how our view of early accent in the word reflects our more general view of pitch accent placement, it is useful to consider the pitch accent patterns that are available for a word with more than one strong syllable preceding the main-stress syllable. A typical candidate word for early accent, like Massachusetts, has two stressed pitch-accentable syllables and four possibilities for pitch accent placement: 1) pitch accent on the main-stress syllable only (Main-Stress Accent), 2) pitch accent on the early syllable only (Early Accent), 3) pitch accent on both strong syllables (Double Accent), and 4) pitch accent on neither strong syllable (Deaccentuation, in Bolinger's terms). An F0 contour for an example of each of the possibilities for the word Massachusetts is shown in Fig. 4. Speakers must select among these options, and our analyses of pitch accent distribution patterns in a large corpus of continuous speech provide evidence to support two claims about how they do so. Both were made first by Bolinger and find parallels in the work of other investigators. One factor influencing accent placement is the tendency toward Early Pitch Accent Placement in the Phrase. Bolinger (1965) expressed this tendency in terms of an early rhemic accent vs. a later themic accent for each intonational phrase. He posited that speakers seek to place these two accents in locations that are as widely separated as possible, and that this sometimes leads to the occurrence of the phrase-initial accent early in its word. The widespread observation of an early pitch marker in phrases is also reflected in the "onset rise" in the commonly-observed "hat pattern" F0 contour described for simple declarative utterances in the IPO model ('t Hart and Collier, 1975; 't Hart, Collier and Cohen, 1990), as well as in the special non-deletable quality of initial accents in Gussenhoven's (1987, 1991)
phonological model, and in Monaghan's (1990) synthesis model. Beckman and Edwards (1990, 1994) also suggest that if a speaker chooses to place just one accent in the first word of a phrase, there may be an impetus to give a tonal target as early as possible in the word. The data in our corpus support the hypothesis of early accent placement in the phrase, since there is a significantly higher rate of early and double prominence within words that carry the first or only accents in the phrase, relative to words with phrase-medial or final accents, and there is both acoustic and perceptual evidence to suggest that these prominences correspond to pitch accents and not to duration lengthening. We emphasize that these insights, taken together with earlier experimental evidence, support a speech production planning model in which speakers seek to actively indicate that a new intermediate intonational phrase has begun by placing a pitch accent on the first accentable syllable (leaving aside for now the precise definition of "accentable syllable"). Since most words of English have main stress on the first syllable, this will not normally involve an apparent stress shift. But words like the target set studied here provide a special instance in which the effect of a tendency toward an early pitch marker in the phrase can be observed as early accent within the word, when other constraints permit it. A second important factor in pitch accent placement is the Avoidance of Pitch Accent Clash. Again Bolinger expressed this idea early (1981), suggesting that speakers have a tendency to avoid placing accents on adjacent or nearly-adjacent syllables in adjoining words.
Vanderslice and Ladefoged (1972) leave the question of which accents are realized to "the care of a late rhythm rule," a view which finds echoes in Gussenhoven's model of underlying accents deleted alternately, leftward from the nuclear accent, and in Monaghan's alternate accent deletion algorithm for synthesis. Selkirk (1984) embodies a related idea in the metrical grid, suggesting reassociation of the main stress pitch accent to an earlier syllable after rhythmic stress shift. All of these approaches suggest that speakers avoid placing pitch accents too close together; some theories provide more precise definitions of what is meant by "too close" than others do. Our data suggest that proximity in terms of number of syllables is a useful measure, though further research is needed on this question. For purposes of this study, we have adopted a stringent definition of "too close", i.e. a pitch accent on the first syllable of the following word, and have argued that apparent stress shift occurring in pitch accent clash contexts is a matter of early pitch accent, and may not require a prior shift of rhythmic stress. This claim is supported by the fact that we found no increase in vowel duration in the syllable associated with early prominence perception. Shattuck-Hufnagel (1991) reported similar results for critical pairs of utterances read in the laboratory, and Cooper and Eady (1986) also found no duration increase (although they did not determine which cases were actually perceived with early prominence and compute durations for these cases separately). The integrated model supported by our data suggests a possible reinterpretation of some puzzling aspects of earlier experimental results. We have already discussed the fact that few phonetic differences might be expected between the early syllable of a target word in clash and non-clash contexts, if speakers choose to place a pre-nuclear accent on that syllable in both cases, as the integrated theory permits.
This could explain why earlier investigators failed to find any phonetic evidence of "stress shift" (although, as we have noted, such evidence may require complex
analysis of possible reorganization of the rhythm of the phrase as a whole). Furthermore, there is evidence (e.g. Huss 1978) that some cases of perceived rhythmic stress shift may be perceptual rather than acoustic in nature. The integrated model also suggests an account of Cooper and Eady's (1986) finding that words like Pennsylvania and Tennessee were judged to have equal prominence on their two full-vowel syllables more often when they appeared at the beginning of a sentence than in other positions. Sentence-initial words, of course, occur at the beginning of an intermediate intonational phrase, and are likely to receive an onset-marking early pitch accent. If the main-stress syllable also received an accent in these utterances, listeners might easily judge the two syllables to be of equal stress. This revised and integrated view of the phenomenon of early prominence in words, which draws on the insights and observations of a number of investigators, has implications for our understanding of the concept of "stress", for phonological theories of prosodic constituents and prominences, for text-to-speech synthesis, and for speech recognition and understanding. The implications for our understanding of the phonetics of stress may help to clarify the somewhat confusing older experimental studies on the acoustic correlates of stress, most of which were based on the idea that stress is a unitary dimension. When Fry (1955), Lieberman (1960) and others (see Lehiste, 1970, for a summary) measured the acoustic correlates of "stress", they used stimulus configurations that placed the major phrasal prominence of the utterance on the main-lexical-stress syllable that they wished to study. Thus their measures confounded the correlates of phrase-level nuclear pitch accent with those of main lexical stress, and compared them with the correlates of other prosodic configurations whose status is difficult to determine from the text.
As Beckman and Edwards (1994) point out, "many previous studies have compared the intensities, durations and F0 excursions in 'stressed' vs. 'unstressed' syllables without controlling systematically for the levels of the stress hierarchy involved.... Therefore, it comes as no surprise when such different corpora yield conflicting results." It is to be expected that, with the theoretical groundwork laid for distinguishing among several kinds of prosodic prominence, and with the emergence of transcription systems for labeling stimulus utterances to reveal the choices that were made by the speaker among the possibilities for prosodic structuring and prominence, rapid progress can be made toward understanding the acoustic correlates of both lexical and phrasal prominence, and the differences among possible categories within these two major levels. The implications of these results for phonological theory and for cognitive models of the production planning process lie in the support they provide for several proposed concepts and structures. For example, they suggest that the intermediate intonational phrase proposed by Beckman and Pierrehumbert (1986) is represented during the planning process, and plays an important role in the determination of accent placement. Similarly, our results provide support for the claim that final (nuclear) pitch accents in the intermediate phrase behave differently from accents in other positions, and therefore must be separable in the processing representations. Our findings appear to raise some doubts about one of the major supports for the phonological concept of a phrase-level metrical grid, i.e. the argument that phrase-level grids are necessary to define the context in which stress shift occurs (i.e. rhythmic stress clash) and to provide the mechanism by which prominence occurs on an earlier syllable (i.e. rhythmic stress shift). However, we emphasize that our
results do not eliminate the possibility that grid-like representations are necessary at the phrasal level, for several reasons. First, the number of word tokens followed by a non-pitch-accented syllable was small, so a larger corpus will be necessary to determine the extent to which early accent in the word occurs in contexts without pitch accent clash, comparing the presence vs. absence of rhythmic stress clash. Second, we do not yet know whether there is rhythmic stress shift in non-pitch-accented stretches of speech, which would support the need for a phrasal grid. Several investigators have begun to explore this question (see Huss 1978) but the answer is not yet clear. Finally, we have not investigated the claims about syllable lengthening and shortening (Beckman and Edwards 1990, among others) which provide a second line of argument in support of the phrasal grid. This work has significant implications for accent prediction in text-to-speech synthesis applications, where it has been claimed that good pitch accent placement is more important than the specific F0 generation model (Akers and Lennig, 1985). As demonstrated by our results, proper pitch accent location is not simply a matter of choosing the appropriate words in an utterance to accent, but also involves accent placement within words. However, algorithms that have been developed to automatically predict pitch accent location for text-to-speech synthesis applications typically avoid the problem of within-word accent placement by assuming that a pitch accent always occurs on the syllable with primary lexical stress.
Two exceptions are Monaghan's (1990) algorithm, which involves right-to-left deletion of accents on alternate accentable syllables (including secondary stress syllables) excepting the initial accent in the phrase, and the decision tree prediction model described in Ross, Ostendorf, and Shattuck-Hufnagel (1992), which incorporates accent clash and phrase onset factors via questions allowed in classifier design. These results also have implications for research in speech recognition and understanding. Systems which use automatic stress detection prior to recognition (e.g. Hieronymus, McKelvie, and McInnes, 1992) or stress-dependent models in recognition (e.g. Bishop, 1992) should incorporate the notion of early accent placement into the lexical representation of words so as not to rule out valid hypotheses. Specifically, pitch accents (and possibly energy or spectral differences) should be allowed on candidate syllables for early accent placement, but phoneme duration models should incorporate accent status only for main-stress syllables. In speech understanding applications, where pitch accent marking of semantic focus or emphasis provides information for language processing, the specific choice of accent patterns within the word may be important for distinguishing words accented to mark structure (e.g. early accent) vs. to indicate information status (e.g. double accent). Our results leave unanswered a number of relevant questions. First, does rhythmic stress shift occur in addition to early pitch accent placement? Although we have shown that early pitch accent placement plays an important role in the perception of early prominence in the word, and that early accent can occur without rhythmic stress clash (and thus presumably without rhythmic stress shift), the possibilities remain that rhythmic stress clash increases the likelihood of early accent placement or that rhythmic stress shift occurs in stretches of speech without pitch accents.
Second, precisely what kind of separation between potential accent sites lessens the probability of early accent placement in the word? We have shown that the
frequency of early accent placement lessens as the number of syllables between the two sites increases, as some metrical theories predict, but more data are needed to assess how much the word affiliation of these syllables matters. Third, are there degrees of prominence among pitch accents? Our experience with labelling prominences suggests that there may be, but it is not easy to quantify. The metrical grid may be useful for representing these differences. Another question concerns exactly what is meant by the "first accentable" and "last accentable" syllables in a phrase. These are clearly not simply the first and last unreduced-vowel syllables, as evidenced by accent patterns in phrases like Do you have to do it? (where the /u/ of do can be unreduced and unaccented) and It's an automobile factory (where the /æ/ of factory can be unreduced and unaccented). For the last accentable syllable in a phrase, something like the most prominent syllable in the last constituent seems appropriate, but it is unclear precisely what kind of constituent is involved. Yet another question concerns possible prosodic distinctions among unreduced vowels. Theories of rhythmic stress distinguish several levels of stress within the word, whereas Bolinger (1981) argues for a simple two-way distinction, reduced vowel vs. unreduced vowel, at least on the left of the main stress syllable of the word. Gussenhoven (1991), on the other hand, points out that there are words with early full vowels, like obese, that are nevertheless highly resistant to early prominence, and dictionaries often list words with early syllables that have full vowel quality but no stress. Further study will be required to determine whether unreduced-vowel syllables that nevertheless resist pitch accenting are largely or entirely found directly adjacent to an accentable syllable in the same word, and thus can be accounted for by a rule, or whether they form a different class of vowel.
In general, the unexpected finding of a difference in accent behavior between alternating and adjacent stress words requires further study. An additional issue is the accent behavior of words which carry neither the initial nor the final accent in a phrase. It would be of interest, for example, to determine whether the accent patterns of these phrase-medial words can be predicted by a combination of clash avoidance and semantic factors; this approach may provide some insight into the phenomenon of deaccentuation, which we have not addressed. Finally, how do the factors demonstrated here interact with other factors that can influence speaker choices among possible patterns of prominence placement in phrases, like semantic and syntactic structure, morphological form class, emphasis and discourse structure? The results reported in this paper lead to many further questions, but we believe that the integrated theory of pitch accent placement which is supported by our corpus analyses offers a number of broadly useful insights and implications that go well beyond the simple finding that many cases of apparent stress shift in words are instances of early pitch accent placement.
This research was funded by NSF under grant number IRI-8805680, with additional support coming from NIH, award number NIH 8-R01-DC0-0075, and the Department of Education under the Graduate Assistance Applied to National Needs Program, award number P200A90080. The radio news corpus was provided by WBUR, a Boston public radio station. The support of the MIT Undergraduate Research Opportunity Program is gratefully acknowledged, as are the contributions of Laura Dilley, Nanette Veilleux and Mary Hendrix.
References
Akers, G. & Lennig, M. (1985) Intonation in text-to-speech synthesis: evaluation of algorithms, Journal of the Acoustical Society of America, 77(6), 2157-2165.
Beckman, M., De Jong, K. & Edwards, J. (1987) The surface phonology of stress clash in English, presented at the Annual Meeting of the Linguistics Society of America.
Beckman, M. & Edwards, J. (1990) Lengthenings and shortenings and the nature of prosodic constituency. In Papers in Laboratory Phonology I: Between the Grammar and Physics of Speech (J. Kingston & M. E. Beckman, editors), 152-178. Cambridge University Press.
Beckman, M. & Edwards, J. (1994) Articulatory evidence for differentiating stress categories. In Phonological Structure and Phonetic Form: Papers in Laboratory Phonology III (P. Keating, editor). Cambridge University Press.
Beckman, M. & Pierrehumbert, J. (1986) Intonational structure in Japanese and English, Phonology Yearbook, 3, 255-309.
Beckman, M., Swora, M. G., Rauschenberg, J. & De Jong, K. (1990) Stress shift, stress clash and polysyllabic shortening in a prosodically annotated discourse, Proceedings of the 1991 International Conference on Spoken Language Processing, 1, 5-8.
Bishop, K. (1992) Modeling sentential stress in the context of a large vocabulary continuous speech recognizer, Proceedings of the International Conference on Spoken Language Processing, 437-440.
Bolinger, D. (1958) A theory of pitch accents in English, Word, 14, 109-149.
Bolinger, D. (1965) Pitch accent and sentence rhythm. In Forms of English: Accent, Morpheme, Order (I. Abe & T. Kanekiyo, editors), Hokuou, Tokyo.
Bolinger, D. (1981) Two kinds of vowels, two kinds of rhythm, distributed by the Indiana University Linguistics Club, Bloomington.
Cooper, W. & Eady, J. (1986) Metrical phonology in speech production, Journal of Memory and Language, 25, 369-384.
Crystal, T. & House, A. (1990) Articulation rate and the duration of syllables and stress groups in connected speech, Journal of the Acoustical Society of America, 88(1), 101-112.
Fong, C. F. (1993) Duration modeling for speech synthesis and recognition, M.S. Thesis, Boston University.
Fry, D. B. (1955) Duration and intensity as physical correlates of linguistic stress, Journal of the Acoustical Society of America, 27, 765-768.
Gussenhoven, C. (1987) Review of Selkirk 1984, Journal of Linguistics, 22, 455-474.
Gussenhoven, C. (1991) The English rhythm rule as an accent deletion rule, Phonology, 8, 1-35.
't Hart, J. & Collier, R. (1975) Integrating different levels of intonation analysis, Journal of Phonetics, 3, 235-255.
't Hart, J., Collier, R. & Cohen, A. (1990) A perceptual study of intonation, Cambridge: Cambridge University Press.
Hayes, B. (1984) The phonology of rhythm in English, Linguistic Inquiry, 15, 33-74.
Hieronymous, J., McKelvie, D. & McInnes, F. (1992) Use of acoustic sentence level and lexical stress in HSMM speech recognition, Proceedings of the International Conference on Acoustics, Speech and Signal Processing, I, 225-227.
Horne, M. (1990) Empirical evidence for a deletion formulation of the rhythm rule in English, Linguistics, 28, 959-981.
Horne, M. (1993) The phonetics and phonology of the rhythm rule in post-focal position: Data from English, Reports from the Department of Phonetics, University of Umeå (Sweden), 2, 69-77.
Huss, V. (1978) English word stress in the postnuclear position, Phonetica, 35, 86-105.
Jones, D. (1939) An Outline of English Phonetics. 6th ed. New York: E. P. Dutton and Co. Inc.
Kimball, O., Ostendorf, M. & Bechwati, I. (1992) Context modeling with the stochastic segment model, IEEE Transactions: Signal Processing, 1584-1587.
Ladefoged, P. (1975) A Course in Phonetics. New York: Harcourt Brace Jovanovich.
Lehiste, I. (1970) Suprasegmentals. Cambridge, MA: MIT Press.
Liberman, M. (1975) The Intonational System of English. Ph.D. Thesis, Massachusetts Institute of Technology.
Liberman, M. & Prince, A. (1977) On stress and linguistic rhythm, Linguistic Inquiry, 8, 249-336.
Lieberman, P. (1960) Some acoustic correlates of word stress in American English, Journal of the Acoustical Society of America, 32, 451-454.
Maeda, S. (1974) A characterization of fundamental frequency contours of speech, Quart. Prog. Rep. MIT RLE, 114, 193-211.
Monaghan, A. (1990) Rhythm and stress-shift in speech synthesis, Computer Speech and Language, 4, 71-78.
Nespor, M. & Vogel, I. (1986) Prosodic Phonology. Dordrecht: Foris.
Nespor, M. & Vogel, I. (1989) On clashes and lapses, Phonology, 6, 69-116.
dePijper, J. R. (1983) Modelling British intonation, Dordrecht: Foris.
Price, P., Ostendorf, M., Shattuck-Hufnagel, S. & Veilleux, N. (1988) A methodology for analyzing prosody, Journal of the Acoustical Society of America, 84, Suppl. 1, S99.
Price, P., Ostendorf, M., Shattuck-Hufnagel, S. & Fong, C. (1991) The use of prosody in syntactic disambiguation, Journal of the Acoustical Society of America, 90, 2956-2970.
Prince, A. (1983) Relating to the grid, Linguistic Inquiry, 14, 19-100.
Ross, K., Ostendorf, M. & Shattuck-Hufnagel, S. (1992) Factors affecting pitch accent placement, Proceedings of the International Conference on Spoken Language Processing, 365-368.
Secrest, B. G. & Doddington, G. R. (1983) An Integrated Pitch Tracking Algorithm for Speech Systems, Proceedings of the International Conference on Acoustics, Speech and Signal Processing, 3, 1352-1355.
Selkirk, E. (1984) Phonology and Syntax: The Relation Between Sound and Structure, Cambridge: MIT Press.
Shattuck-Hufnagel, S. (1988) Acoustic-phonetic correlates of stress shift, Journal of the Acoustical Society of America, 84, Suppl. 1, S98.
Shattuck-Hufnagel, S. (1989) Stress shift as the placement of phrase-level pitch markers, Journal of the Acoustical Society of America, Suppl. 1, S493.
Shattuck-Hufnagel, S. (1991) Acoustic correlates of stress shift, Proceedings of the XII Congress of Phonetic Sciences, 4, 266-269.
Shattuck-Hufnagel, S. (1992) Stress shift as pitch accent placement, Proceedings of the International Conference on Spoken Language Processing, 747-750.
Shattuck-Hufnagel, S. (1994) Stress shift as early pitch accent placement: a comment on Beckman and Edwards. In Phonological Structure and Phonetic Form: Papers in Laboratory Phonology III (P. Keating, editor), Cambridge University Press.
Silverman, K., et al. (1992) TOBI: A standard for labeling prosody, Proceedings of the International Conference on Spoken Language Processing, 867-870.
Vanderslice, R. & Ladefoged, P. (1972) Binary suprasegmental features and transformational word-accentuation rules, Language, 48, 819-836.
van Heuven, V. J. (1987) Stress patterns in Dutch (compound) adjectives: acoustic measurements and perception data, Phonetica, 44, 1-12.
Willems, N. (1982) English Intonation from a Dutch Point of View, Dordrecht: Foris.
Webster's Ninth New Collegiate Dictionary, Merriam-Webster Inc., 1984.
Appendix: Early accent word candidates
The following is a list of words that were considered candidates for early accent, followed in parentheses by the number of times the word occurred in the Boston University radio news corpus if there was more than one occurrence.

Single orthographic words:
academically, academics, acquisition, administration (2), affidavit, afternoon (2), allegations, already (5), applications, Arizona, association (10), automatically, beneficiary (2), blockades, bucktoothed, campaign (5), campaigns, carbohydrates, Carolina, cartel, coalition, cocaine (2), communications, compensation, conservation (2), conservationists, consideration, constitutional (3), contraceptives (3), contributions (2), conversation, conversations, credibility, decontaminate, dehydration, deliberations, Democratic (5), demonstration (2), demonstrations, disabilities, disability, disagree, disagrees, disappointed, disbelief, disproportionate, distribution (3), distrusts, diverse, downtown (3), ecological, economics, education (14), educational (2), eighteen (2), elimination, engineering, enthusiastic, environmental (17), environmentalist, environmentalists (3), environmentally, epidemic (3), everyday, experimental, explanation, favorability, fifteen (11), fifteenth (2), financially, fourteen, governmental, guaranteeing, guarantees, gubernatorial, hesitation, homophobia, homophobic (2), homosexual, homosexuality, ideological, illegal (9), illicit, immigration, immunization, immunizations, implications, impossibility, inconvenient, independence (2), independent, independents (2), indication, individual, individuals, indoor, inequitable, informal, information (3), insensitive, insiders, insolvent, institution, institution's (2), institutionalize, integration, internal, international (2), intervene, investigation (2), Irene (5), irresponsible, isolation, legislation (2), liability, longtime (3), manufacturing (2), marijuana, masquerade, Massachusetts (56), Medellin, medication, methamphetamine, Michelangelo's, misconduct, misplaced, misspent, nationwide, negotiations, nevertheless, nineteen (20), obsolete, occupational, ongoing, operation (2), opportunity, opposition, organization (3), outnumbered, outside (6), overall, overcrowding, overhauled, overnight, overseas (2), overseeing, overstate, overt, overwhelmed, paranoia, politician, politicians, population, predates, premeditated, prenatal, preoccupied (2), primarily, prohibition (3), proposition (2), propositions, prosecution (2), protest, protesting, psychological, ratification, reconsider, reconsideration, referendum, regulations (3), rehash, reimburses, reinforcing, reorganization, reprehensible, representative (8), representatives, represented, reputation (2), reservation (3), reshuffling, residential, responsibility, restructuring, routinely, sanitarian, sanitarian's, sanitarians (2), seventeen (2), shortchanges, situation, sixteen, speculation (2), statewide (2), stillborn, superintendent (3), supplemental (2), taxation, thirteen (5), thirteenth, transferred, translates, transmission, trustee (2), trustees (3), unable, unaware, unborn, unclear, unconstitutional, uncontrollable, undergo, underlie, understands, undocumented (2), unemployable, unfortunate, united, universal (3), universities (3), university (6), university's (2), unlike (2), unprecedented, unpretentious, unqualified, unseat, unsolicited, unspent, unsuccessful, untimately, unusual, unveiled, unwittingly, upbeat, upcoming (2), upscale (2), violation (2), violations (4), voluntarily, volunteered (2), widespread, worldwide.

Hyphenated words and letter names:
A.F.D.C., anti-C.L.T., anti-Seabrook, anti-Sullivan, anti-crack, anti-defamation, anti-tax, anti-tracking, A.Z.T., B.C.H. (2), beefed-up, blue-collar, C.L.T. (2), co-founder (2), D.C.U., eighty-eight (4), eighty-nine (2), eighty-six, eighty-two (2), empty-handed, fifty-nine, fifty-four, fifty-one, fifty-seven, forty-two, forty-eight, forty-five, forty-two, forty-year, full-time, full-blown, full-power, full-scale, hands-off, hands-on, hard-hit, hard-core, hard-pressed, H.I.V., inner-city (3), IRS, I-V, L.A., low-power, Mattapan-Roxbury, Merrill-Lynch's (2), Merrill-Lynch, N-Double-A-C-P's, N.E.A. (4), nineteen-eighteen, ninety-three, non-intoxicating, non-toxic, one-fifth, one-third (2), part-time, pro-METCO, pro-environment (2), pro-life, public-private, re-arrange, re-election, re-enrolled, re-evaluate, rubber-stamp, second-class, self-described, self-interests, self-righteous, seventy-eight, seventy-five (3), seventy-four, seventy-six, seventy-two (3), short-circuit, sixty-eight, sixty-two, so-called (3), ten-year, thirty-five (4), thirty-one (2), thirty-seven, three-year, twenty-seven, twenty-five (6), twenty-six, twenty-three (2), twenty-two (3), U-Mass, U.A.W. (2), U.N., U.S. (7), U.S.A., vice-president (2), well-founded, well-deserved, well-integrated, year-round.