Journal of Phonetics 40 (2012) 329–349
The theme/rheme distinction: Accent type or relative prominence?
Sasha Calhoun*
School of Linguistics and Applied Language Studies, Victoria University of Wellington, PO Box 600, Wellington 6140, New Zealand
Article info
Article history: Received 14 May 2010; received in revised form 5 December 2011; accepted 8 December 2011; available online 31 December 2011
Abstract
In English, theme/rheme status (or topic/comment) is claimed to be marked by pitch accent type, i.e. L+H* (LH%) versus H* (LL%). Calhoun (2010a) claims that, rather, themes are relatively less prominent than rhemes. The phonetic realisation of themes and rhemes was looked at in a semi-spontaneous game task, e.g. (following Will the banana land on some money?) No, the lollipop (rheme) will land on some money (theme), the banana (theme) will land on a monster (rheme). There were some phonetic differences consistent with an accent type difference: the preceding L pitch elbow, and the H peak, of (L+)H* accents were later and lower on themes than rhemes. Themes also had a high boundary (LH%, HH% or H-) more often. However, these differences were small and not consistent. On the other hand, there were large and consistent differences in the relative prominence of paired themes and rhemes (e.g. lollipop and money). In theme–rheme order, the rheme f0 peak was slightly higher, whereas in rheme–theme order, it was substantially higher. The f0 peaks of paired themes and rhemes were also highly correlated. There were smaller differences in mean intensity and duration. This is clear support for Calhoun’s claim that relative prominence marks the theme/rheme distinction, and for the importance of metrical prominence in signalling information structure. © 2011 Elsevier Ltd. All rights reserved.
1. Introduction
One of the main functions of prosody in English is to signal information structure, i.e. marking the status of elements in an utterance in relation to the discourse model, e.g. focused or background, theme/topic or rheme/comment, contrastive or non-contrastive (Calhoun, 2010a; Halliday, 1968; Jackendoff, 1972; Ladd, 2008; Rochemont, 1986; Rooth, 1992; Selkirk, 1995; Steedman, 2000; Vallduví & Vilkuna, 1998). This paper looks at the last two properties, theme/rheme status and contrast. It is widely claimed that both are marked by pitch accent type (Bolinger, 1965; Brazil, 1985; Büring, 2003; Gussenhoven, 1984; Jackendoff, 1972; Pierrehumbert & Hirschberg, 1990; Steedman, 2000), often framed as the distinction between ToBI accent types L+H* and H* (Beckman & Hirschberg, 1999). However, there are phonetic difficulties with this accent distinction (see Calhoun, 2006; Ladd, 2008; Ladd & Schepman, 2003). My recent work suggests that, rather, both properties are marked by prominence within metrical structure (Calhoun, 2004, 2006). This experiment looks at the prosodic realisation of contrastive theme and rheme referents, to directly compare these two theories. It extends my previous experiments using more speakers and improved methodology, adapting a semi-spontaneous game task by Speer, Warren, and Schafer (2011). The results are relevant not only to resolving how theme/rheme status is marked, but also more broadly:
* Tel.: +64 4 463 9537; fax: +64 4 463 5604. E-mail address: [email protected]
0095-4470/$ - see front matter © 2011 Elsevier Ltd. All rights reserved. doi:10.1016/j.wocn.2011.12.001
to evaluate the relative importance, at least for signalling information structure, of the two major components of intonational phonology, metrical structure and the intonational tune (Calhoun, 2006, 2010a; Ladd, 2008).
1.1. Theme/rheme structure and contrast
The literature on the properties of information structure relevant to explaining prosodic realisation is vast, with much contradictory use of terminology to describe similar underlying phenomena (e.g. see Kruijff-Korbayová & Steedman, 2003). For concreteness’ sake, in this study, I follow the view of information structure set out in Steedman (2000) and Calhoun (2010a), drawing from Halliday (1968) (see also Vallduví & Vilkuna, 1998). In this view, information structure is defined on two dimensions: the division between focus and background operates within the division into theme and rheme. For example (CAPS show accented words and round brackets prosodic phrases) (from Steedman, 2000, p. 654):
1. Q: I know who proved SOUNDNESS. But who proved COMPLETENESS?
   A: (MARCEL  proved)      (COMPLETENESS)
       focus   background    focus
      (      rheme     )    (   theme    )
Each accented word is a focus. Focus is defined according to the influential theory of Rooth (1992): if a word is focused then it introduces a presupposition of alternatives to the focus, which
S. Calhoun / Journal of Phonetics 40 (2012) 329–349
should be compatible with the context (or able to be accommodated) (see also Büring, 1997; Steedman, 2000; Vallduví & Vilkuna, 1998). Here both foci are distinguished from salient alternatives given in the context: Marcel contrasts with the person the questioner knows proved soundness, and completeness contrasts with soundness. However, these foci have different functions in the sentence. Completeness is a thematic focus. The theme, or topic, is “that part of an utterance which connects it to the rest of the discourse” (Steedman, 2000, p. 655). In a response to a wh-question, this is unambiguously defined as the part contained in the question (Steedman, 2000, p. 655), as is the case here. The rheme, or comment, is the part of the utterance that advances the theme, or which is predicated. In response to a wh-question, this is the answer to the question; in this case Marcel. Since at least Bolinger (1965, pp. 57–66), it has been claimed that thematic foci are marked by a different pitch accent type to rhematic foci. Jackendoff (1972) claimed B-accents, i.e. ‘rise-fall-rise’, mark topics (equivalent to themes), while A-accents, i.e. ‘fall’, mark foci (equivalent to rhemes). Steedman (2000) identifies this as ToBI L+H* (LH%) versus H* (LL%) (note he also claims themes can be marked by L*+H and rhemes by L*). Büring (2003) puts forward a similar claim, based on a somewhat different conceptualization of themes (his ‘contrastive topics’). Others, however, have claimed that it is not thematicity, but contrast which is marked by L+H*, as opposed to non-contrastive H* (e.g. Bartels & Kingston, 1994; Ito & Speer, 2008; Pierrehumbert & Hirschberg, 1990; Selkirk, 2002; Watson, Gunlogson, & Tanenhaus, 2008). For example, Pierrehumbert and Hirschberg (1990, p. 296) claim that the L+H* accent conveys that “the accented item – and not some alternative related item – should be mutually believed”, or that L+H* marks correction or contrast.
Later experimental work has claimed L+H* is a marker of contrastive reference (where the contrasting item is immediately available given the experimental set-up) (e.g. Ito & Speer, 2008; Watson et al., 2008). As can be seen, this is essentially the same as the definition of focus above. Rooth’s definition of focus is not universally agreed. Others claim “contrastive” focus, which evokes alternatives (either explicitly or implicitly), is distinct from “information” or “presentational” focus, which does not (Féry & Samek-Lodovici, 2006; Kiss, 1998; Lambrecht, 1994; Selkirk, 2002; Umbach, 2004). That is, Marcel above would be an information focus: it simply marks the “new” information about who proved completeness, with no implicit contrast. Theme/rheme focus is related to contrast/non-contrast. Themes form the presupposition, and are therefore likely to be given and not prominent (Steedman’s unmarked themes); so, if they do contain a focus, it is likely to be salient and contrastive (see Gundel & Fretheim, 2004). However, examples such as the one above show why it is important to separate focus from rhematicity, as contrasts occur within the rheme too. The need for information structure to be defined on these two orthogonal dimensions has been recognised by a large number of authors (e.g. Brazil, 1985; Halliday, 1968; Kruijff-Korbayová & Steedman, 2003; Vallduví & Vilkuna, 1998). Most experimental work on contrastive versus non-contrastive reference does not address theme/rheme status. However, it is difficult to see how L+H* could mark both thematicity and contrast, and H* rhematicity and non-contrast: for what would the intonational distinction between themes and rhemes be in contexts where both are clearly contrastive? In Rooth’s (1992) view, the distinction between contrastive and non-contrastive focus is not categorical, in that all foci imply alternatives.
In other work, I have argued that while this is true, the salience of the contrast can vary depending on the context and speaker’s intentions. The salience of the contrast is not related to a particular accent type, but is
rather cued by gradient increase in the prosodic prominence of the focal accent (Calhoun, 2009, 2010a). The design of the current experiment is intended to allow the comparison of theme and rheme accents while keeping contrast constant. Non-contrastive themes and rhemes cannot be compared, as non-contrastive themes are generally not accented; therefore, contrastive themes and rhemes are compared. To avoid the issue of whether there are contrastive foci separate from other foci, I operationalise contrast in this experiment as a referent having salient alternatives in the immediate discourse context, while recognising that a speaker will always have some degree of latitude as to whether they actually mark any given foci as being contrastive (assuming this can be done).
1.2. L+H* and H*
One major difficulty for both of the claims outlined above is that the L+H*/H* distinction is one of the most contentious in the ToBI framework. In major studies of inter-annotator agreement, the L+H*/H* distinction was usually so difficult to make that the categories are collapsed and agreement not reported (Pitrelli, Beckman, & Hirschberg, 1994; Silverman et al., 1992; Syrdal & McGory, 2000). The ToBI guidelines state L+H* is a “rising peak accent: a high peak target on the accented syllable which is immediately preceded by relatively sharp rise from a valley in the lowest part of the speaker’s pitch range”, while H* is a “peak accent: an apparent tone target on the accented syllable which is in the upper part of the speaker’s pitch range for the phrase … includes tones in the middle of the pitch range” (Beckman & Hirschberg, 1999). Phonologically, L+H* should have an L tonal target preceding it, while H* does not (Pierrehumbert, 1980). The trouble is that if there are sufficient preceding unaccented syllables, fundamental frequency (f0) tends to fall before the accentual rise, be it L+H* or H* (Ladd, 2008; Ladd & Schepman, 2003). This apparent low can be hard to distinguish from an L target, e.g. see Fig. 1. In her original analysis, Pierrehumbert (1980) analysed this as a “sagging transition” between two H* accents, varying by the number of intervening unstressed syllables. Ladd and Schepman (2003) showed experimentally that this low before H* accents is in fact a fixed L target: the start of the rise is reliably anchored to the start of the stressed syllable, regardless of intervening syllables. Dilley (2005) further showed in an imitation experiment that there is no categorical boundary in the f0 scaling of the L target, making it difficult for this to be the basis of the distinction. In practice, the distinction is often made using peak height, with L+H* accents being higher (as follows from the ToBI guidelines).
Evidence from experiments looking at whether L+H* accents mark contrastive referents, and H* non-contrastive, seems to be consistent with a peak height difference (see further in Calhoun, 2006, chap. 4). Bartels and Kingston (1994) and Krahmer and Swerts (2001) (in Dutch) looked at a variety of phonetic cues to pitch accent type, and found that peak height was the only robust cue to contrastive reference. Both Watson et al. (2008) and Ito and Speer (2008) found L+H* accents bias contrastive reference in eye-tracking experiments, but state that their L+H* accents were higher than their H* accents. Welby (2003), on the other hand, failed to find L+H* accents were more acceptable than H* on contrastive foci when choosing stimuli to hold peak height constant. Note that all of these studies, except Krahmer and Swerts, used Mainstream American English speakers. In a series of experiments, I tested whether L+H* and H* mark theme and rheme status respectively in contrastive contexts
Fig. 1. Example of the similarity in the “sagging transition” low before an H* accent, and the L target before an L+H* accent. From the ToBI annotation training materials: the first tier shows the tone annotation, the second the word transcription, the third the break tier, and the fourth alternate tone annotations given uncertainty in the main tone annotation (Brugos, Shattuck-Hufnagel, & Veilleux, 2006).
(Calhoun, 2004, 2006, chap. 4). I looked at the realisation of accents on thematic and rhematic referents such as:
2. Q:  That guy’s Henry Lombard, I think?
   A1: That isn’t Henry LOMBARD, it’s Henry LAMBERT.
                        theme               rheme
   A2: That’s Henry LAMBERT, not Henry LOMBARD.
                    rheme              theme
A single naïve speaker was asked the question by the experimenter, and in separate trials answered either A1 (theme–rheme order) or A2 (rheme–theme order) from a printed sheet. It was found firstly that most thematic referents, especially in rheme–theme order, were not realised with clear (L+)H* accents (7 of 32 themes, compared to 29 of 32 rhemes), i.e. accents where the locations of the preceding L and the peak H could be readily identified. Of the pairs which had full (L+)H* accents, there were small differences: the preceding L elbow in theme accents was slightly lower and later, relative to the stressed syllable, than in rheme accents, and the H peak was lower; rheme accents were followed by a greater dip in f0 after the accent peak, and almost always had a low boundary (L- or LL%), while themes were equally likely to have a low or a high boundary (H- or LH%). In a perception experiment, however, the only phonetic cue subjects were sensitive to in judging the acceptability of an accent on the theme was peak height, i.e. subjects preferred peaks on themes to be lower. This is clearly inconsistent with the findings above: if L+H* marks themes, then L+H* accents would be lower than H* accents, whereas the findings above suggest the main phonetic marker of L+H* accents is that they are higher.
1.3. Theme/rheme status marked by relative prominence
This finding led me to claim that theme/rheme status is not marked by tonal accent type at all, but by relative prominence (Calhoun, 2006, 2010a). I claimed that paired theme and rheme foci are in a metrical weak–strong relationship. Metrical prominence depends on position: the last of roughly equally phonetically prominent accents in a phrase will be perceived as more structurally prominent, as it is nuclear (Calhoun, 2006, 2010b; Grabe & Warren, 1995; Ladd, 2008; Liberman, 1979; Selkirk, 1984). Therefore, in theme–rheme order, theme and rheme accents can be of roughly equal phonetic prominence, but in rheme–theme order, the theme accent must be substantially less phonetically prominent, see Fig. 2. I claimed that this relationship can hold over contiguous prosodic phrases (Ladd, 1988; Truckenbrodt, 2002), so the theme and rheme can be in separate phrases. I suggested that contrastiveness, defined as the pragmatic interpretation of alternative sets, is marked by increasing prominence in general, i.e. the more phonetically prominent a focal accent, the more likely a contrastive interpretation. The marking of theme status and contrastiveness can therefore have competing influences on prominence, potentially leading to ambiguity (see Calhoun, 2010a).
Fig. 2. Prominence relationship between theme and rheme foci by position in phrasal structure.
My theory nicely explains the results of my first production experiment: the theme was usually not accented, or only weakly accented, and therefore less prominent than the rheme; but when accented, it was lower (and therefore probably less prominent, although that experiment only measured f0). However, the materials did not allow this to be tested directly: the paired rheme of Lombard in (2) is isn’t or not; while the paired theme of Lambert is that’s or it’s. These words were often not accented and were hard to analyse acoustically. Therefore, I tested another set of materials, again with read speech, this time using a speaker with extensive phonetic training:
3. Q: You’re going to see Amanda tomorrow, right?
   A: No, I’m seeing AMANDA on MONDAY,
                     theme     rheme
      I’ll see NORMA TOMORROW.
               rheme theme
The results were consistent with my hypothesis. In this experiment, most themes and rhemes were accented. The peak of the first accent (e.g. on Amanda) was higher than that of the second (e.g. on Monday) in each phrase. However, the drop was much larger in rheme–theme order than in theme–rheme order. Further, there was a strong correlation between peak heights for paired referents. There was no significant effect of location for either the L or H target, though the dip following rhemes was again greater than following themes. These findings are very like those in Liberman and Pierrehumbert (1984), using materials similar to (3). They were interested in universal properties of intonation contours, not discourse semantics; and saw these contexts merely as useful to elicit accents of systematically different heights, which they called “background” (equivalent to theme) and “answer” (rheme). They analysed both as H*. They found that accents on answers were on average higher than accents on backgrounds. However, this difference was much greater in “answer-background” than “background-answer” order. Liberman and Pierrehumbert analysed this as the combined effect of the relative prominence of the two accents and “final lowering”, a rule which lowers the final accent in an intonational phrase (note these data were also discussed in Pierrehumbert (1980), where the difference was attributed to declination). I claim this is rather the effect of position in metrical prominence (as outlined above). They also found a greater dip in f0 following answer accents than background accents. Finally, although all of these studies only looked at f0, my claim is that theme/rheme status is marked by prominence. Prominence can also be signalled by other phonetic cues, including increased duration and intensity, vowel quality and spectral tilt (e.g. see Terken & Hermes, 2000). Therefore, I expect themes to be less prominent than rhemes by a combination of all these measures.
1.4. Experimental investigation of prosody
Much of the experimental work reported above analysed read speech, with the context (usually a cueing question) read by the experimenter. There are long-noted concerns in prosodic research about whether read speech actually reflects natural production. For example, Speer et al. (2011) note that the pragmatic intentions are different: speakers want to achieve a communicative goal; whereas readers can tend to focus on speaking clearly and “correctly”, assuming the meaning of their utterances is already known to the hearer (the experimenter). The prosodic characteristics of read and spontaneous speech are different (Blaauw, 1994; Howell & Kadi-Hanifi, 1991; Yuan, Brenier, & Jurafsky, 2005), and these differences can change experimental results (see Ito & Speer, 2006). Read speech is particularly problematic for the intonational/prosodic functions of interest here. In read sentences, the information structure is either completely clear and explicit (because of the cueing context), or underspecified (because there is no context). Therefore, there is little communicative pressure to mark it expressively. In pilot studies for Calhoun (2004), I had difficulty getting naïve subjects to produce clear intonational contrasts at all with these materials. This was the reason only one speaker was used in the first experiment (who even then only clearly accented a small number of the target words), and a phonetically trained speaker in the second production experiment (Calhoun, 2006, chap. 4). However, the generalisability of the results is evidently questionable. Other experimenters with similar materials have “coached” speakers to produce the required intonation (e.g. Arvaniti & Garding, 2007; Liberman & Pierrehumbert, 1984). I wished to avoid this because of inherent problems with experimenter bias, and since I was comparing two theories of theme/rheme marking, “coaching” would directly interfere with this.
On the other hand, the advantage of read speech, e.g. over spontaneous speech corpora, is that the information structure and
segmental content can be carefully controlled. A semi-spontaneous scripted game task, developed by Speer et al. (2011) to look at the prosodic marking of syntactic ambiguity, seemed like the ideal compromise. Subjects participated in pairs in a co-operative board game, in which they had to move game pieces around the boards into their goals, while negotiating ‘‘hazards’’ and ‘‘rewards’’. They could only communicate using a set of sentence frames, e.g. I want to change the position of the square with the [cylinder/rectangle/triangle] (substituting the appropriate game piece name). This allowed the segmental material to be carefully controlled, while speakers still communicated in a reasonably natural, spontaneous way, with clear pragmatic goals. Speer, Warren, and Schafer have demonstrated the effectiveness of this design over a number of studies (Schafer, Speer, & Warren, 2000, 2005; Speer, Warren, & Schafer, 2003).
2. Experiment
This experiment was closely based on the co-operative board game task developed by Speer et al. (2011), with some modifications to meet the requirements of this experiment. The aim was to investigate how the theme/rheme dimension of information structure is marked prosodically, i.e. the status of referents as theme or rheme in contrastive contexts. The term ‘information status’ is used to refer to the status of referents as theme or rheme. The game was therefore designed to elicit productions of the following two utterances (i.e. the reply in each). The speakers had different roles in each, Driver (‘D’) and Slider (‘S’) (to be explained below). There were slightly different versions of these utterances, coded ‘v1’ and ‘v2’.
D-v1. Slider: OK, I think we should move the lollipop with the monkey.
      Driver: No, that’s no use, I want to move the LOLLIPOP with the RHINO,
                                                    theme             rheme
              and I want to move the BANANA with the MONKEY.
                                     rheme           theme
S-v1. Driver: If we do this, will the banana land on some money?
      Slider: No, the LOLLIPOP will land on some MONEY,
                      rheme                      theme
              the BANANA will land on a MONSTER.
                  theme                 rheme
The object names in capitals were the target words. Information status was defined with reference to the preceding context: referents were defined as themes if they were mentioned in the preceding utterance, rhemes if not. An object was defined as contrastive if it contrasted with an equivalent object in the same utterance (e.g. lollipop and banana, and rhino and monkey). (Note that objects were moved in pairs, so there were always two objects being moved (e.g. lollipop and banana), and two moving objects (e.g. rhino and monkey) to be negotiated in each turn, leading to a salient contrast.) In order to balance for any effect of position, the order of the referents in the first phrase (theme–rheme versus rheme–theme) was swapped between the two utterances. Further, to balance for the effect of position by speaker role, there were two versions of the sentence frames: version 1 (v1) above, and version 2 (v2) where the order of the phrases was swapped, as follows (context same as above).
D-v2. Driver: No, that’s no use, I want to move the BANANA with the MONKEY,
                                                    rheme           theme
              and I want to move the LOLLIPOP with the RHINO.
                                     theme             rheme
S-v2. Slider: No, the BANANA will land on a MONSTER,
                      theme                 rheme
              but the LOLLIPOP will land on some MONEY.
                      rheme                      theme
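The operational definition of information status given above can be illustrated with a short sketch. It is not part of the experimental procedure: the function name `information_status` is hypothetical, and it assumes that a referent counts as mentioned when its name occurs as a whole word in the preceding utterance.

```python
import re


def information_status(referents, preceding_utterance):
    """Label each referent 'theme' if it was mentioned in the preceding
    utterance, and 'rheme' if it was not."""
    # Assumption: a mention is a case-insensitive whole-word match.
    mentioned = set(re.findall(r"[a-z]+", preceding_utterance.lower()))
    return {
        referent: ("theme" if referent.lower() in mentioned else "rheme")
        for referent in referents
    }


# D-v1: the Slider's suggestion is the context for the Driver's reply.
context = "OK, I think we should move the lollipop with the monkey."
print(information_status(["lollipop", "rhino", "banana", "monkey"], context))
```

Applied to the D-v1 context, this sketch labels lollipop and monkey as themes and rhino and banana as rhemes, matching the annotation under the example.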
2.1. Method
2.1.1. Participants
Ten pairs of native speakers of New Zealand English, naïve to the purposes of the experiment, participated. There were four female–male pairs, four female–female, and two male–male. All were students at Victoria University of Wellington, New Zealand, or part of the University community. They received book or music vouchers in recognition of their participation.
2.1.2. Design and materials
Experimental sessions involved playing rounds of the cooperative board game. There were two kinds of game pieces, Helpers (monkey, rhino, wallaby and walrus) and Groceries (mango, lollipop, walnut and banana). Game piece names were the target words to be analysed. They were all at least two syllables long with penultimate or antepenultimate primary stress, in order to separate the accent from any following boundary tone. The stressed vowel was preceded and followed by a sonorant consonant, providing a continuous f0 contour and avoiding microprosodic effects in the onsets and offsets of obstruents (see Ladd, 2008, online appendix). Of English words conforming to these requirements, the most plausible game pieces were “animals” and “fruit/sweets”, resulting in the Helpers and Groceries above. There were also “board features” (money, monsters), which also conformed to these requirements. Finally, one of the target utterances referred to an empty square (see utterance S3 in Appendix B), which fitted the correct profile if the stress was on empty and there was no break between an and empty. Each game piece was a round colour-coded piece of plasticine (green for Helpers, red for Groceries) with a picture of the object stuck on. Participants were informed of the correct term for each piece using a printed chart. The design of the game boards and the rules and objectives of the game largely followed Speer et al.’s (2011) original design.
The objective was to get all the Groceries from their starting points to their goals marked on the board, while trying to get them to land on Money and avoid Monsters (marked on the board) along the way. Players were awarded points (as a team) for each Grocery that reached its goal and for landing on Money. If a Grocery landed on a Monster, it could be ‘‘paid’’ with Money; however, if the pair did not have any Money, points were deducted. Participants took part in pairs, taking turns to be the Driver or the Slider. Only the Driver’s board showed the Grocery goals, and only the Slider’s board the Money and Monsters. The game pieces and locations were the same on both boards. Three pairs of game boards were used, plus a pair of practice boards and a demonstration board (see Fig. 3 and Appendix A).
The Driver decided which pieces to move so that the Groceries would end up in their goals. Each Grocery could only be moved one space at a time, when “pushed” by a directly adjacent Helper piece (e.g. in Fig. 3, the lollipop can be pushed one space left by the rhino or one space up by the monkey). In order to set up the double contrasts in the target utterances, two Groceries had to be moved at a time. The Slider would suggest which Helper should move one of the Groceries that the Driver had proposed moving. The Driver would then either accept this or prescribe another move. He or she also had to check with the Slider whether the move would result in a Grocery landing on Money; and the Slider then had to tell the Driver whether each Grocery would land on Money, Monsters or an empty square (the example utterances above show one such possible exchange given the board configuration in Fig. 3). Neither player could see the other’s board. The layout of the boards was designed to maximise use of the target utterances.
2.1.3. Procedure
Participants sat at either end of a long table, separated by a divider in a quiet room. They wore head-mounted microphones. They were recorded onto mini-disk at a sampling rate of 44 kHz. They were accompanied by an experimenter who explained the game using the demonstration board and supervised, offering advice where necessary. She handed out cards for the Money and Monsters landed on, and awarded or deducted points. She also operated the recording equipment. The players’ conversation was restricted to a set of sentence frames and game piece names (see Appendix B). Points were deducted for using expressions not on the scripted list. Players were free to choose which sequence of moves to follow. They were therefore generally free to choose which sentences to use when. However, there were certain restrictions to stop them from missing out steps in the negotiation of each move, since this would make them less likely to use the target utterances. Speakers were encouraged to use the sentence frames naturally without reading from the printed list, and generally became familiar and fluent with them during the course of the experiment. In order not to bias the participants’ productions, the experimenter was careful not to use the sentence frames or piece names herself. If needed, each sentence frame was printed with a number which the experimenter could refer to. The two versions of the sentence frames (see above) were used in alternate experiment sessions, so both participants in each experiment used the same version. To ensure participants produced the Grocery names in the order intended, piece names were followed by an index (1 or 2), signifying whether that piece had been referred to in the previous utterance (1) or not (2).
Fig. 3. Driver (left) and Slider (right) game boards for game 1. The starting positions of the Helpers (background lines) and the Groceries (background dots) are shown by the large pictures (note these were green and red on the actual board). Goals for the Groceries are shown by the small pictures in the top right corner of the square on the Driver’s board. The board ‘‘features’’, money and monsters, were printed on the Slider’s board only.
For example, if the Driver asked “Will the mango land on some money?”, and the Slider replied using the following frame:
S3. No, the [mango_2 / lollipop_2 / walnut_2 / banana_2] will land on some money,
    the [mango_1 / lollipop_1 / walnut_1 / banana_1] will land on [a monster / an empty square].
then the mango had to be in the second phrase, as it was mango_1. Participants firstly played using the practice board, once in each role (Driver and Slider). Then they played each of the other three game boards, alternating Driver and Slider roles; before repeating the three game boards with opposite roles. The entire session took around two and a half hours.
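The effect of the index on word order in frame S3 can be sketched as follows. This is only an illustration of the constraint described above: the function name `s3_reply` and its arguments are hypothetical, and participants of course filled the frame from a printed list rather than by any such procedure.

```python
def s3_reply(asked_about, other, outcome):
    """Fill sentence frame S3.

    The piece the Driver has just asked about carries index 1, so it goes
    in the second phrase; the other piece (index 2) goes in the first.
    """
    # Only the two outcomes offered by frame S3 are allowed here.
    if outcome not in ("a monster", "an empty square"):
        raise ValueError("outcome must be one of the options in frame S3")
    return (
        f"No, the {other} will land on some money, "
        f"the {asked_about} will land on {outcome}."
    )


# The Driver asked "Will the mango land on some money?"
print(s3_reply("mango", "lollipop", "a monster"))
```

Because the asked-about piece is fixed in the second phrase, the first phrase of S3 is always in rheme–theme order and the second in theme–rheme order, as in S-v1.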
2.1.4. Phonetic analysis
The author manually annotated key points at the segmental level and in the f0 contour for each target utterance using the Praat annotation tool (Boersma & Weenink, 2008). This annotation was used to automatically extract acoustic measures relevant to testing whether theme/rheme status is marked by accent type or relative prominence.
2.1.5. Annotation
A sample annotation is shown in Fig. 4. The annotation criteria are described below (from the bottom annotation tier up).
Word Tier: orthographic transcription of the target word, along with its information status: theme (T) or rheme (R) (according to the experiment design).
Phone Tier: phones in the primary stressed syllable of the target word by type.
C0: the consonant before the stressed vowel
V*: the stressed vowel
C1: consonant following the stressed vowel
C0 and C1 were either nasal consonants or approximants. Nasals were relatively straightforward to segment on the basis of intensity and formant patterns in the spectrogram. Approximants, while good for f0 tracking, are harder to segment (Turk, Nakai, & Sugahara, 2006). The whole region of formant movement was included in the consonant, ending where full voicing began, as shown in Fig. 4. Where the stressed vowel was followed by a consonant cluster, e.g. mango, the second consonant was not included in C1. Where there was only one consonant, e.g. banana, this was all in C1, even if it seemed ambisyllabic or the onset of the following syllable. All target syllables therefore had the same number of segments.
Turns Tier: the turning points in any (L+)H* accent on the target word and any following boundary. In some cases the turning points were not clearly identifiable (because the rise was very small or gradual, or because the f0 track was missing), or there was no f0 rise; ‘F’ or ‘?’ markers were used in these cases:
f0pre: the intensity peak in the waveform in the last syllable in the word before the target one; if this word was accented, marked at a low, stable point in the f0 contour at the end of this word, where possible.
L: the f0 low point at the start of the accent; if there was a long low portion, this was marked at the point where the f0 track began to rise sharply. Marked L? at the closest stable low point if the f0 track was missing or unreliable (because of creaky voice), or if the rise was so gradual there was no clear turning point (where the total rise was less than 10 Hz).
H: the f0 peak of the accent, marked at the absolute peak. If followed by an HH% boundary, this was marked at the turning point to the boundary rise (there were 5 such cases, in all of them the boundary rise was much steeper than the accentual rise). If the peak was a flat ‘hat’, it was marked H? at the end of the high portion. Marked H? at the closest stable high point if the f0 track was missing, or if the rise was so gradual there was no clear turning point (where the total rise was less than 10 Hz).
F: flat, no f0 rise (or less than 5 Hz), marked in the middle of V*.
f0post: f0 after the f0 contour had levelled off after the postaccentual fall. Only marked if this level was reached within two words, and before any following accent; not marked if there was a following HH%, or if there was no f0 track.
B: marked if there was a following break (ToBI 3 or 4), according to usual criteria, i.e. disjuncture given pause, lengthening and f0 reset cues.

Fig. 4. Sample annotation of a target word. Annotation was done based on the waveform, spectrogram and f0 track shown. The first annotation tier shows the f0 turning points, the second the phone level annotation and the third the word level (see text for more details).
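The Hz thresholds in the Turns Tier criteria can be summarised as a small decision rule. The annotation itself was done manually, so the following is only a sketch of the thresholds stated in the criteria; the function name, the use of `None` for a missing or unreliable f0 track, and the label `"clear"` are assumptions made for illustration, and the special cases (flat ‘hat’ peaks, HH% boundaries) are not covered.

```python
from typing import Optional

FLAT_MAX_HZ = 5.0      # rises below this are treated as flat ('F')
UNCLEAR_MAX_HZ = 10.0  # rises below this have no clear turning point ('?')


def turn_label(rise_hz: Optional[float]) -> str:
    """Return 'F', '?' or 'clear' for a candidate (L+)H* rise.

    rise_hz is the total f0 rise in Hz, or None when the f0 track is
    missing or unreliable (a hypothetical encoding for this sketch).
    """
    if rise_hz is None:
        return "?"      # L? / H? placed at the closest stable point
    if rise_hz < FLAT_MAX_HZ:
        return "F"      # no f0 rise, or a rise of less than 5 Hz
    if rise_hz < UNCLEAR_MAX_HZ:
        return "?"      # rise too gradual for a clear turning point
    return "clear"      # plain L and H markers
```

Under this reading, only tokens labelled `"clear"` would correspond to the clearly (L+)H* accented words.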
2.1.6. Accent type measures As discussed in Section 1, it is disputed whether L+H* and H* accents, claimed to mark theme/rheme status, are in fact phonologically distinct, and if so, what their phonetic correlates are. Therefore, it was decided a phonological analysis of accent type was not worthwhile. Rather, differences in the realisation of accents on themes and rhemes were looked at, which correspond to proposed phonetic differences between the two accent types. The measures, extracted automatically, were:
2.1.8. Mixed effects modelling Linear mixed effects (LME) models were used to analyse the data (Baayen, 2008; Baayen, Davidson, & Bates, 2008), using the lmer function in the R statistical package (Bates, 2005; R Development Core Team, 2010). LME allows the effects of the experimental conditions (‘fixed effects’), and participants and items (‘random effects’) to be tested separately, rather than assuming that the random effects exist and averaging over them, as in ANOVAs. This is particularly useful in the Relative Prominence analysis, where there are two item effects, the first and second target word. LME is also more robust to unbalanced designs, such as this game task produced. To derive the models reported below, first all possible four-way interactions were included, and each was removed if it was not significant at p < 0.05; the same was then done for every three-way interaction, etc. (note lmer standardly includes lower order effects if there is a higher order interaction in the model, and it is usual to keep them in, Baayen, 2008, chap. 7). The final models therefore only include significant interactions and simple effects, except for the simple effect of information status (InfoStat), which was always included as that was of primary interest. Significance was tested using Markov chain Monte Carlo (MCMC) sampling implemented in R’s pvals.fnc (Baayen, 2008, chap. 7). ANOVAs were used to test whether the final model (i.e. significant fixed effects plus InfoStat) was significantly different with and without each random effect (see Baayen, 2008, chap. 7). Random effects that were not significant at p < 0.05 were excluded. Finally, extreme values with standardised residuals more than 2.5 sd from 0 were removed, and each model refitted, as these could have had a distorting effect (see Baayen, 2008, chap. 7); this excluded between 2 and 17 tokens (1.2–5.7%) from each model. 2.2. Predictions
L_f0 and H_f0: f0 in Hz at the points marked L and H.
LH_rise: H_f0 − L_f0, the rise in f0.
L_T and H_T: the time of L and H relative to the stressed syllable, calculated as follows (the same formula was used for H_T, with H instead of L):

L_T = (L_Tabs − startC0_T) / (endC1_T − startC0_T)

where L_Tabs is the time of L in the sound file, startC0 is the start of the C0 segment, and endC1 is the end of the C1 segment (all in absolute time in the sound file).
f0_pre_post: the change in f0 level between f0pre and f0post.
Boundary: the type of boundary (for tokens marked B), L% (L- or LL%) or H% (H-, LH% or HH%). This was judged manually.
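As an illustration of the L_T and H_T normalisation, the following minimal sketch expresses a turning point as a proportion of the stressed syllable. The function name, argument names and the example times are hypothetical; only the formula comes from the definition above.

```python
def relative_time(t_abs: float, start_c0: float, end_c1: float) -> float:
    """Time of a turning point as a proportion of the stressed syllable.

    0 is the start of C0 and 1 is the end of C1; values above 1 fall
    in the following syllable. All arguments are absolute times in
    seconds in the sound file.
    """
    if end_c1 <= start_c0:
        raise ValueError("end of C1 must come after the start of C0")
    return (t_abs - start_c0) / (end_c1 - start_c0)


# Hypothetical syllable running from 1.00 s to 1.40 s.
l_t = relative_time(1.10, 1.00, 1.40)  # about 0.25: a quarter of the way in
h_t = relative_time(1.42, 1.00, 1.40)  # about 1.05: just after the syllable
```

Because the measure is a proportion, syllables of different durations are directly comparable.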
2.1.7. Relative prominence measures To test whether theme/rheme status is marked by the relative prominence, measures were extracted which would show a change in prominence between the first and second target word in each phrase of the target utterances (e.g. between lollipop and rhino in (D) above). In each case, the measure was extracted over the stressed syllable of the target word (i.e. from the start of C0 to the end of C1):
If information status (i.e. whether a referent is theme or rheme) is marked by pitch accent type, there should be significant differences in the alignment and depth of L and H by information status (InfoStat). L and H should both be aligned later relative to the stressed syllable, and have lower f0, in theme than rheme accents. There should also be a larger dip in f0_pre_post in rhemes than themes, and more themes should have H% boundaries. If information status is marked by relative prominence, themes should be realised with lower prominence than rhemes, over the four prominence measures. The difference in prominence should be much greater in rheme–theme than theme–rheme order. Other factors were also tested which could affect the phonetic realisation of the target words. These were Position (start v. end, e.g. lollipop versus rhino in D-v1 above) in the Accent Type analysis, and Phrase (first versus second, e.g. lollipop/rhino versus banana/monkey in D-v1 above), Role (Driver versus Slider), and speaker Sex (female versus male) in both analyses. Although these may affect the phonetic measures, e.g. Sex on f0 level, neither theory would predict them to significantly interact with InfoStat. 2.3. Results
f0mean_diff: the difference in mean f0 of the stressed syllable between the first and second target words.
f0max_diff: the difference in f0 between the points marked H in the first and second target words.
Int_diff: the difference in the mean of the intensity values in the stressed syllable of the first and second target words.
Dur_diff: the difference in the duration of the stressed syllable between the first and second target words.
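The four difference measures can be sketched as a single function over per-syllable summaries. The class and field names are hypothetical, and the sign convention (second word minus first, so that a positive value means the second word is more prominent) is an assumption made for this illustration; the definitions above do not state which way the difference was taken.

```python
from dataclasses import dataclass
from typing import Dict


@dataclass
class SyllableMeasures:
    """Summary of one stressed syllable (start of C0 to end of C1)."""
    f0_mean_hz: float    # mean f0 over the syllable
    f0_peak_hz: float    # f0 at the point marked H
    intensity_db: float  # mean intensity over the syllable
    duration_ms: float   # syllable duration


def prominence_diffs(first: SyllableMeasures,
                     second: SyllableMeasures) -> Dict[str, float]:
    """Second-minus-first differences (assumed sign convention)."""
    return {
        "f0mean_diff": second.f0_mean_hz - first.f0_mean_hz,
        "f0max_diff": second.f0_peak_hz - first.f0_peak_hz,
        "Int_diff": second.intensity_db - first.intensity_db,
        "Dur_diff": second.duration_ms - first.duration_ms,
    }
```

With this convention, a rheme–theme pair in which the second peak is lower than the first yields a negative f0max_diff.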
Five pairs of participants played using each version of the sentence frames. There were 68 D-v1 utterances, 68 S-v1, 76 D-v2 and 70 S-v2 (with four target words per utterance). Each speaker produced an average of 7.2 Driver utterances (ranging from 5 to 9) and 6.9 Slider utterances (ranging from 3 to 12). Speakers only used the utterances if appropriate in the game context, so equal numbers per speaker were not guaranteed.
2.3.1. Exclusions 38 target words were excluded from analysis because of disfluency, i.e. mispronunciations, word-internal pausing, restarts and non-speech events (e.g. sniffing, laughing) during the target word. Utterances with disfluencies elsewhere, including just before the target words, were retained, as these were reasonably frequent. 21 tokens of empty (half of the total) were excluded as they were produced with phrasal stress on square, i.e. an empty SQUARE, not an EMPTY square; this represents 7.6% of the target words in that position for the Slider utterances. A sizeable proportion of target words were not produced with clear (L+)H* accents, i.e. those annotated F, or L? and H?. To test whether themes and rhemes are marked by distinct accent types, the comparison must be between accents where the phonetic cues to accent type can be reliably measured, so these were excluded from the Accent Type data set. In total, 159 (28.9%) theme tokens and 314 (58.1%) rheme tokens were included. In theme–rheme order this was 126 themes (46.3%) and 114 (41.6%) rhemes, versus 33 themes (11.9%) and 200 (75.2%) rhemes in rheme–theme order. The implications of these exclusions will be discussed in Section 3. All target words were included in the Relative Prominence data set. According to this theory, themes should be less prominent than rhemes regardless of accent status. However, accented word pairs were also analysed separately for comparison (87 pairs, 16.4%). For the boundary type analysis, accented and unaccented target words were included. However, only target words at the end of their phrase were used (e.g. rhino and monkey in D-v1), as most target words at the start of the phrase did not have a boundary (83.3%). Target words at the end of the phrase without a boundary were also omitted (5.1%). f0 features were extracted automatically using Praat.
Not all f0 measures could be extracted because of problems with the automatic f0 extraction, or because they could not be reliably annotated. For the Accent Type data set, the L_f0 measure was missing for 23 target words, 1 for the H_f0 measure, 24 for LH_rise, and 83 for f0_pre_post. Values of L_f0, H_f0 and f0_pre_post which were statistical outliers were manually checked and either corrected or excluded where necessary (7 exclusions; note that Praat calculates f0 slightly differently in the annotation display and the ‘‘pitch objects’’ used to extract f0 values automatically, leading to these errors). For the L_T and H_T measures, points marked L? or H? were also excluded, as this indicated the annotator was not sure of their timing (69 tokens of L_T, 21 tokens of H_T; there were more problems with L_T as voicing was often weak or missing at the beginning of the target words). For the Relative Prominence data set, 2 f0mean_diff tokens and 13 f0max_diff tokens could not be extracted successfully (out of 510 possible tokens).
2.3.2. Accent type analysis Fig. 5 shows the estimated means for each measure of Accent Type by InfoStat (theme/rheme) derived from their respective linear mixed effects models, and also any significant interactions between InfoStat and the other factors. The error bars show standard errors, also derived from the LME models (note that these are not used directly in the MCMC sampling significance testing, so they may not always match the significance indicated below the bars. Standard errors were used rather than confidence intervals from the MCMC sampling, as the latter are not stable). Appendix C gives full summaries of these LME models, along with their log-likelihoods, a measure of the overall goodness-of-fit of the model. Rather than testing whether the means of main effects are significantly different, as in ANOVA, LME tests whether the estimated regression coefficient of each simple effect, i.e. the
“non-default” value of a factor, is significantly different from a baseline condition, or intercept, where all factors have their “default” value. For these models, the baseline was InfoStat = rheme, Position = end, Phrase = first, Role = Driver and Sex = female. As Fig. 5 shows, there is no simple effect of InfoStat for the L_f0 measure (t = 0.75, p > 0.3). This means that the estimated mean for themes is not significantly different from the baseline condition (all other factors at “default” value). The coefficient for InfoStat (i.e. the difference from the intercept) holds over the “non-default” values of all the other factors with which it has no interaction (in this case Position, Role and Sex). For example, there is a large effect of Sex on L_f0 (t = 12.59, p < 0.001) (see Appendix C): the estimated mean for male speakers is 85 Hz lower than female speakers. However, as there is no interaction between InfoStat and Sex, the estimated (non-)difference between themes and rhemes still holds for male speakers, though the estimated means will both be lower. Interactions show the difference in the coefficient from what would be expected given the additive effects of the simple fixed effects making up that interaction (the MCMC sampling tests the significance of this difference). There is an interaction between InfoStat and Phrase (t = 2.53, p = 0.013). For target words in the second phrase, L_f0 in themes is estimated to be on average 4.3 Hz lower than in rhemes, i.e. the predicted direction.
For reference, the raw means for these data (including residual outliers) by InfoStat are given in Table 1 (by Sex as the ranges for male and female speakers are so different), and for interactions found in the LME models in Table 2. Note, though, that the analysis here is based on the estimated means in the LME models and MCMC significance testing. The estimated means take into account speaker-dependent variation, so they may not always match the raw means.
Fig. 5 also shows a simple effect of InfoStat for both H_f0 (t = 5.65, p < 0.001) and LH_rise (t = 4.77, p < 0.001), with theme H_f0 peaks estimated to be 12 Hz less than rheme peaks, and the f0 rise estimated to be 9 Hz less in themes than rhemes; in line with Accent Type predictions. However, this is not inconsistent with the Relative Prominence hypothesis: if themes are relatively lower than rhemes, it is not unexpected they will be absolutely lower as well. There is no effect of InfoStat on f0_pre_post (t = 1.16, p = 0.249). f0 dipped after the accent for both themes and rhemes.
As Fig. 5 shows, there is a simple effect of InfoStat on L_T (t = 3.30, p = 0.001), in the predicted direction. In the baseline, rheme, case L is aligned at 0.26, i.e. approximately a quarter of the way into the stressed syllable of the target word; whereas for themes the estimated mean is 0.38, i.e. more than a third into the stressed syllable. However, this is not consistent. There is an interaction between InfoStat and Role (t = 3.02, p = 0.001). In Fig. 5, there looks to be no effect of InfoStat for the Slider utterances. There is also an interaction with Sex (t = 2.36, p = 0.019). L is aligned earlier for male speakers, closer to the start of the stressed syllable (t = 2.21, p = 0.019); and as Fig. 5 shows there is a much smaller difference by InfoStat, estimated to be 0.02. There is no simple effect of InfoStat on H_T (t = 1.19, p = 0.281), though there is an interaction with Phrase (t = 2.15, p = 0.034). Peaks are earlier in the second phrase (t = 3.91, p < 0.001), at 1.02 for rhemes, about the end of the stressed syllable, versus 1.06 for themes, slightly further into the next syllable.
Finally, a Generalised Mixed Effects Model, also implemented in lmer (see Baayen, 2008, chap. 7), was used to test the effect of each factor on the log odds of an H% boundary (as opposed to L%). Recall this analysis only included target words at the ends of phrases with a boundary.
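To make the log-odds scale concrete, the sketch below computes raw log odds of an H% boundary from the counts reported in Table 3 for the first phrase of the Slider utterances. This is plain arithmetic on raw counts, not the generalised mixed model itself (it ignores the speaker and item random effects), and the function and variable names are hypothetical.

```python
import math


def log_odds(n_high: int, n_low: int) -> float:
    """Log odds of an H% boundary given counts of H% and L% boundaries."""
    return math.log(n_high / n_low)


# Counts from Table 3: Slider utterances, first phrase.
theme_log_odds = log_odds(n_high=55, n_low=7)   # themes: mostly H%
rheme_log_odds = log_odds(n_high=8, n_low=43)   # rhemes: mostly L%

# Difference in log odds, i.e. the log odds ratio for theme versus rheme.
log_odds_ratio = theme_log_odds - rheme_log_odds
```

A positive log odds means H% is the more frequent boundary type in that cell, a negative one that L% is more frequent; the model tests whether the factors shift this quantity.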
The simple effect of InfoStat on boundary type was not significant (t = 0.70, p > 0.3). However, there were interactions between InfoStat and Phrase (t = 2.63, p = 0.009), and InfoStat and Role (t = 5.45, p < 0.001). Table 3 shows the counts of boundary type by InfoStat, Phrase and Role (although note this three-way interaction did not reach significance). In the first phrase of the Slider utterances rhemes usually had L% boundaries and themes H% boundaries, as predicted; in the second phrase, most themes had L% boundaries also, though they were still more likely to have H% boundaries than rhemes. In the Driver utterances, there does not seem to be any InfoStat effect: most themes and rhemes had L% boundaries, though more of both had H% boundaries in the first phrase than the second.

Fig. 5. Estimated means in mixed effects models of phonetic measures of Accent Type, for simple effects and interactions involving InfoStat. Fixed effects are relative to the Intercept, which is the ‘default’ level for all fixed effects in the model (see text). Error bars show standard errors. Significance is indicated above the bar label (ns = non-significant, *p < 0.05, **p < 0.01, ***p < 0.001).
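The estimated means in this section are built from the intercept plus the coefficients of whichever factors are at their “non-default” level. The sketch below illustrates this for L_T. The intercept (0.26), the theme estimate for the baseline (0.38) and the InfoStat difference for male speakers (0.02) are taken from the text; the Sex coefficient is a purely hypothetical placeholder, the InfoStat by Role interaction is left out for brevity, and the term-naming scheme is an assumption of this illustration.

```python
from typing import Dict, Set

INTERCEPT = 0.26  # L_T for rheme, female (baseline), from the text

COEFS: Dict[str, float] = {
    "InfoStat=theme": 0.12,           # 0.38 - 0.26, from the text
    "Sex=male": -0.09,                # HYPOTHETICAL value, for illustration
    "InfoStat=theme:Sex=male": -0.10  # leaves a 0.02 theme-rheme gap for men
}


def estimated_mean(levels: Set[str]) -> float:
    """Intercept plus every term whose components are all non-default."""
    total = INTERCEPT
    for term, value in COEFS.items():
        if all(part in levels for part in term.split(":")):
            total += value
    return total


female_theme = estimated_mean({"InfoStat=theme"})
male_rheme = estimated_mean({"Sex=male"})
male_theme = estimated_mean({"InfoStat=theme", "Sex=male"})
```

The interaction term applies only when both of its components are active, which is why the theme/rheme difference shrinks from 0.12 for female speakers to 0.02 for male speakers in this sketch.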
Table 1 Descriptive statistics of phonetic measures of Accent Type, separated by speaker Sex. f0 measures are in Hz, Time (T) measures are proportions of the stressed syllable of the target word.

Sex     InfoStat        L_f0    H_f0    LH_rise  f0_pre_post  L_T     H_T
Female  Rheme    mean   193.3   242.4   48.4     11.3         0.262   1.058
                 sd     22.0    45.7    35.3     24.0         0.227   0.271
                 N      153     161     153      138          134     157
        Theme    mean   189.3   227.5   37.0     10.6         0.315   1.184
                 sd     21.0    35.9    28.1     16.2         0.262   0.289
                 N      91      94      91       67           82      90
Male    Rheme    mean   104.4   129.0   25.3     4.1          0.168   0.962
                 sd     15.0    24.5    16.7     11.8         0.161   0.188
                 N      145     152     144      132          131     143
        Theme    mean   107.1   125.0   18.3     0.6          0.137   0.984
                 sd     10.2    16.0    11.4     7.3          0.147   0.212
                 N      61      65      61       53           57      62

Table 2 Descriptive statistics of the phonetic measures of Accent Type for which there was an interaction with InfoStat in the LME analysis (apart from with Sex, listed above).

InfoStat         L_f0                  L_T               H_T
                 First ph  Second ph   Driver   Slider   First ph  Second ph
Rheme    mean    153.4     144.6       0.218    0.213    1.053     0.948
         sd      47.4      49.5        0.206    0.199    0.246     0.215
         N       184       114         139      126      184       116
Theme    mean    156.3     156.3       0.308    0.199    1.127     1.076
         sd      45.6      42.6        0.267    0.207    0.289     0.263
         N       80        72          55       84       80        72

Table 3 Counts of following boundary type (L% or H%) by Role, Phrase and InfoStat for target words in the end Position.

Role    Phrase  InfoStat  Boundary
                          L%    H%
Driver  First   Rheme     37    26
                Theme     34    28
        Second  Rheme     62    11
                Theme     64    2
        Total   Rheme     99    37
                Theme     98    30
Slider  First   Rheme     43    8
                Theme     7     55
        Second  Rheme     56    2
                Theme     45    24
        Total   Rheme     99    10
                Theme     52    79
Total   First   Rheme     80    34
                Theme     41    83
        Second  Rheme     118   13
                Theme     109   26
        Total   Rheme     198   47
                Theme     150   109

2.3.3. Relative prominence analysis Fig. 6 shows the estimated means for each measure of Relative Prominence by InfoStat (theme–rheme/rheme–theme) derived from the LME models, and also any significant interactions between InfoStat and the other factors. For these models, the baseline was InfoStat = theme–rheme, Phrase = first, Role = Driver and Sex = female. Appendix C gives full summaries of these LME models, along with their log-likelihoods. As above, for reference, the raw means for these data (including residual outliers) by InfoStat are given in Table 4 (f0 measures are again by Sex), and for interactions found in the LME models in Table 5; though these are not used in this analysis. As can be seen in Fig. 6, there is a simple effect of InfoStat on f0mean_diff in the predicted direction (t = 11.54, p < 0.001). The estimated means are most easily interpreted as distances from “equality”, i.e. both words having equal f0 (indicated by the dashed lines in Fig. 6). In theme–rheme order, the second word is estimated to be 8 Hz higher than the first, whereas in rheme–theme order, the second word is 30 Hz lower. There is a significant interaction between InfoStat and Sex (t = 4.56, p < 0.001): for male speakers the effect of InfoStat is less extreme: in theme–rheme order, the second word is 3 Hz higher, in rheme–theme order 10 Hz lower. There is a simple effect of InfoStat on f0max_diff (t = 17.51, p < 0.001). In theme–rheme order the second peak is estimated to be 26 Hz higher than the first, in rheme–theme order the second peak is 43 Hz lower. There is an interaction with Sex (t = 6.27, p < 0.001). Once more the effect for male speakers is less extreme, with the difference between InfoStat values being 38 Hz. There is also an interaction between InfoStat and Phrase (t = 5.78, p < 0.001). In the second phrase, the second peak is estimated to be 10 Hz lower than the first in theme–rheme order, but 50 Hz lower in rheme–theme order, i.e. still in the predicted direction. There is a simple effect of InfoStat on Int_diff in the predicted direction (t = 6.97, p < 0.001). In theme–rheme order, the stressed syllable of the second word is estimated to be nearly the same intensity as the first (0.6 dB lower), whereas in rheme–theme order there is a 3.6 dB drop. There were no significant interactions. There is also a simple effect of InfoStat on Dur_diff in the predicted direction (t = 2.79, p = 0.004). In theme–rheme order, the stressed syllable of the second word is estimated to be 20 ms longer than the first, whereas in rheme–theme order the second word is 4 ms longer. There were no significant interactions.
The Relative Prominence hypothesis predicts paired theme/rheme peaks should be correlated. Therefore, the correlation between H_f0 in the first and second target words in each phrase was tested, using the data set in the LME analysis (with residual outliers removed). Male and female speakers were analysed separately, as otherwise the consistency in f0 values by speaker sex artificially inflated correlations enormously. For female speakers r = 0.355 (d.f. = 273, p < 0.001), for male speakers r = 0.499 (d.f. = 205, p < 0.001). Breaking this down by InfoStat: for female speakers, in theme–rheme order r = 0.541 (d.f. = 131, p < 0.001), in rheme–theme order r = 0.422 (d.f. = 140, p < 0.001); for male speakers, in theme–rheme order r = 0.706 (d.f. = 104, p < 0.001), in rheme–theme order r = 0.625 (d.f. = 99, p < 0.001). Overall, correlations between f0 values were reasonably strong, although weaker for female speakers.
There was a stronger correlation within each InfoStat than overall, suggesting each InfoStat is more internally consistent than all H_f0 pairings together, consistent with predictions. The scatterplots in Fig. 7 show H_f0 values for the first word against the second. The dotted lines in each graph show equality, i.e. points where the first and second peak are of equal height. As can be seen, while there is some overlap, the ‘R’ tokens (rheme–theme) tend to cluster to the top-left, where the second peak is lower than the first, and there are virtually no ‘R’ tokens below the equality line. The ‘T’ tokens (theme–rheme) tend to cluster near the equality line, and are more concentrated bottom-right, where the second peak is higher. In theme–rheme order, there seems to be a pattern that at lower f0 levels, the second peak is slightly lower than the first, whereas at higher f0 levels, the second peak is higher. The correlations for f0 mean were very similar to the H_f0 results, so these are not reported. For mean intensity, there was a reasonably strong correlation between the first and second target words, r = 0.560 (d.f. = 502, p < 0.001). This did not vary substantially by InfoStat or Sex. There was a weaker correlation with syllable duration, r = 0.320 (d.f. = 496, p < 0.001). Again, the correlations by InfoStat and Sex were similar.

Fig. 6. Estimated means over all words for mixed effects models of phonetic measures of Relative Prominence, for simple effects and interactions involving InfoStat. Fixed effects are relative to the Intercept. Error bars show standard errors. Significance is indicated above the bar label (ns = non-significant, *p < 0.05, **p < 0.01, ***p < 0.001). The dashed lines show equality, i.e. no difference between the target words.

Table 4 Descriptive statistics of phonetic measures of Relative Prominence over all words. f0 measures separated by speaker Sex. f0 measures are in Hz, intensity (int) in dB, and duration (dur) in ms.

Sex     InfoStat        f0mean_diff  f0max_diff
Female  Th-Rh    mean   9.6          7.6
                 sd     31.4         41.5
                 N      138          139
        Rh-Th    mean   42.7         58.9
                 sd     43.1         53.0
                 N      152          148
Male    Th-Rh    mean   1.1          0.3
                 sd     26.8         17.7
                 N      109          106
        Rh-Th    mean   0.9          20.4
                 sd     70.6         50.7
                 N      109          104

InfoStat         Int_diff  Dur_diff
Th-Rh    mean    3.43      7.5
         sd      4.78      72.1
         N       247       247
Rh-Th    mean    5.82      22.2
         sd      4.58      61.5
         N       263       263

Table 5 Descriptive statistics of phonetic measure of Relative Prominence over all words for which there was a significant interaction with Type in the LME analysis (apart from with Sex, listed above).

InfoStat         f0max_diff
                 First ph  Second ph
Th-Rh    mean    1.2       7.3
         sd      40.6      25.5
         N       115       130
Rh-Th    mean    49.8      35.1
         sd      47.2      62.9
         N       136       116
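The reason for correlating paired peaks separately by speaker sex can be illustrated with a small Pearson correlation sketch. The peak values below are hypothetical toy numbers, not data from the experiment; they are chosen only so that a moderate within-group correlation becomes a very high one when the two f0 ranges are pooled.

```python
import math
from typing import List, Sequence, Tuple


def pearson_r(xs: Sequence[float], ys: Sequence[float]) -> float:
    """Pearson correlation coefficient of two equal-length sequences."""
    n = len(xs)
    if n != len(ys) or n < 2:
        raise ValueError("need two sequences of equal length >= 2")
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    sxx = sum((x - mean_x) ** 2 for x in xs)
    syy = sum((y - mean_y) ** 2 for y in ys)
    return sxy / math.sqrt(sxx * syy)


def paired_peak_r(pairs: List[Tuple[float, float]]) -> float:
    """Correlation between first-word and second-word H_f0 values."""
    firsts, seconds = zip(*pairs)
    return pearson_r(firsts, seconds)


# HYPOTHETICAL (first peak, second peak) values in Hz.
female_pairs = [(200.0, 200.0), (220.0, 240.0), (240.0, 210.0), (260.0, 230.0)]
male_pairs = [(100.0, 100.0), (110.0, 120.0), (120.0, 105.0), (130.0, 115.0)]

r_female = paired_peak_r(female_pairs)
r_male = paired_peak_r(male_pairs)
r_pooled = paired_peak_r(female_pairs + male_pairs)
```

In this toy data the within-group correlations are moderate, while the pooled correlation is far higher simply because female and male peaks occupy different f0 ranges; this is the inflation that analysing the sexes separately avoids.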
Fig. 7. Scatterplots showing the f0 level at H for the first against the second target word for all word pairs. Points marked ‘T’ are theme–rheme order, ‘R’ rheme–theme order. The dashed lines show linear regression lines for rheme–theme order (top) and theme–rheme order (bottom). The dotted line shows the values at which the first and second peaks would have equal height. Female and male speakers are plotted separately.

As discussed above, the Relative Prominence data set was much more inclusive than the Accent Type data set, because the Accent Type measures were only valid for (L+)H* accented words. The Relative Prominence measures were therefore analysed again, only including pairs where both words had clear (L+)H* accents, to see if the results still held. Fig. 8 shows the estimated means derived from the LME models by InfoStat, and significant interactions with InfoStat (see Appendix C for full summaries). The raw means for these data (including residual outliers) by InfoStat are given for reference in Table 6, and interactions in Table 7; though these are not used in this analysis. As can be seen, there is a simple effect of InfoStat on f0mean_diff in the predicted direction (t = 4.20, p < 0.001). In theme–rheme order, mean f0 is estimated to be 2 Hz higher in the second word than the first, whereas in rheme–theme order the second word is 13 Hz lower. There were no interactions with InfoStat. There is a simple effect of InfoStat on f0max_diff as predicted (t = 4.11, p < 0.001). In theme–rheme order, the second peak is 18 Hz higher, whereas in rheme–theme order, it is 8.5 Hz lower. There were no interactions with InfoStat. There was a simple effect of InfoStat on Int_diff in the predicted direction (t = 4.64, p < 0.001). The difference was estimated to be 6 dB greater in rheme–theme order than theme–rheme order. There was an interaction between InfoStat and Phrase (t = 2.43, p = 0.020). The InfoStat difference was much smaller for word pairs in the second phrase, at 2.3 dB. There was also an interaction with Sex (t = 3.76, p < 0.001). As Fig. 8 shows, there was no effect of InfoStat on intensity for male speakers. Finally, there was no significant effect of InfoStat on Dur_diff (t = 0.61, p > 0.3).
The correlations between H_f0 for the first and second target word were also tested on this restricted data set. For female speakers r = 0.335 (d.f. = 33, p = 0.049), for male speakers r = 0.562 (d.f. = 45, p < 0.001). Breaking this down by InfoStat: for female speakers, in theme–rheme order the correlation was not significant (p > 0.3), in rheme–theme order r = 0.800 (d.f. = 11, p = 0.001); for male speakers, in theme–rheme order r = 0.634 (d.f. = 36, p < 0.001), in rheme–theme order r = 0.842 (d.f. = 7, p = 0.004). Overall, the correlations seem to be stronger than over all word pairs; this is not surprising, as (L+)H* accented pairs might be expected to be more homogeneous. Apart from the anomalous result for theme–rheme order pairs for female speakers, the correlations are again stronger within InfoStat than overall. Fig. 9 shows scatterplots of H_f0 for the first against the second word, the dotted line again showing equality. As can be seen, once again, ‘R’ tokens are almost all above the equality line, tending to cluster to the top-left, where the first peak is higher than the second. ‘T’ tokens tend to be close to the equality line, or to the bottom-right, where the second peak is higher. However, for the female speakers, there is a small group of anomalous ‘T’ tokens at the top of the graph, explaining the lack of a correlation in that condition.

3. Discussion
According to the Accent Type theory, thematic contrasts are marked by LþHn (LH%), while rhematic contrasts are marked by ¨ Hn (LL%) (Buring, 2003; Jackendoff, 1972; Steedman, 2000). Although there were some results consistent with this hypothesis, it is very difficult to see how Accent Type could be a reliable marker of theme/rheme status based on these results. As in my previous studies, the L tended to be slightly lower and later in themes than rhemes (Calhoun, 2004, 2006). However, the effect on the f0 level of L was very small (4 Hz) and was only found in the second phrase. It is doubtful this is salient and consistent enough to be a useful perceptual cue. The difference in L alignment for female speakers in Driver utterances was larger (0.12 of the stressed syllable). However, this is smaller than most of the phonologically significant differences in alignment reported in the literature (e.g. Ladd, Schepman, White, Quarmby, & Stackhouse, 2009; Pierrehumbert & Steele, 1989). This difference also disappeared in the Slider utterances, and was much smaller for male speakers (0.02), so it is again unlikely to be useful perceptually. Likewise, the H was aligned slightly later in themes (0.04), but only in the second phrase, so this is unlikely to be perceptually useful. Unlike in my previous studies, there was no effect of theme/rheme status on the f0 dip following the accent. As in my previous studies, the most robust effect was for the f0 level of H, with themes estimated to be 12 Hz lower, and the f0
S. Calhoun / Journal of Phonetics 40 (2012) 329–349
341
Fig. 8. Estimated means over word pairs with full accents for mixed effects models of phonetic measures of Relative Prominence, for simple effects and interactions involving InfoStat. Fixed effects are relative to the Intercept. Error bars show standard errors. Significance is indicated above the bar label (ns ¼non-significant, *po 0.05, **p o 0.01, ***p o 0.001). The dashed lines show equality, i.e. no difference between the target words.
Table 6 Descriptive statistics of phonetic measures of Relative Prominence for word pairs with (L þ)Hn accents. f0 and intensity (Int) measures separated by speaker Sex. f0 measures are in Hz, intensity in dB, and duration (dur) in ms. Sex
InfoStat
Female
Th-Rh mean sd N Rh-Th mean sd N
Male
Th-Rh mean sd N Rh-Th mean sd N
f0mean_diff
f0max_diff
Int_diff
2.9 38.6 25
2.9 56.4 25
4.33 4.75 25
22.7 20.1 14
27.4 30.6 14
4.79 4.80 14
4.8 10.4 39
0.8 18.9 38
3.46 4.09 39
16.0 10.6 9
18.8 11.3 9
0.92 2.82 9
InfoStat Th-Rh mean sd N Rh-Th mean sd N
Dur_diff
10.5 69.8 64 34.2 52.1 23
rise, with theme accents estimated to have a 9 Hz smaller rise. However, this is also consistent with a relative height difference. Overall, therefore, the phonetic differences between theme and
Table 7
Descriptive statistics of phonetic measure of Relative Prominence over (L+)H* accented word pairs for which there was a significant interaction with Type in the LME analysis (apart from with Sex, listed above).

Int_diff
InfoStat  Statistic  First ph  Second ph
Th-Rh     mean       3.10      4.89
Th-Rh     sd         4.67      3.61
Th-Rh     N          39        25
Rh-Th     mean       3.02      3.60
Rh-Th     sd         4.46      4.76
Rh-Th     N          13        10
rheme accents were small and inconsistent. These subtle effects could be by-products of pitch manipulations meant to signal relative prominence, or due to other pragmatic features of the utterances, e.g. differences between the Driver and Slider utterances caused by their different pragmatic roles. Further, though both themes and rhemes were contrastive in the context, only 29% of themes had (L+)H* accents, compared to 58% of rhemes (similar to my previous studies). It therefore seems unlikely that
Fig. 9. Scatterplots showing the f0 level at H for the first against the second target word for word pairs with full accents. Points marked ‘T’ are theme–rheme order, ‘R’ rheme–theme order. The dashed lines show linear regression lines for rheme–theme order (top) and theme–rheme order (bottom). The dotted line shows the values at which the first and second peaks would have equal height. Female and male speakers are plotted separately.
accent type is the only, or even the main, cue to theme/rheme status. As for whether a H% (or LH%) boundary is part of a theme “tune” (Büring, 2003), while there was a tendency for themes to have H% boundaries more than rhemes, this was not consistent. In the first phrase of Slider utterances, this tendency was very strong; however, it was less so in the second phrase, where most themes had L% boundaries. In the Driver utterances, this did not follow at all: most H% boundaries occurred at the end of the first phrase, in both themes and rhemes, typical for a continuation rise. It seems that themehood is not sufficient to explain the pragmatic use of H% boundaries, as factors such as the utterance type (Driver
versus Slider) and position in the utterance (first or second phrase) significantly affect their use. On the other hand, these results provide clear support for the Relative Prominence hypothesis: themes are relatively less prominent than rhemes in metrical structure (given that the last of consecutive accents of equal phonetic prominence in a phrasal structure is the most structurally prominent) (Calhoun, 2006, 2010a). There was a predicted effect of information status on all four measures of Relative Prominence. The effects on f0 were very large, the difference between theme–rheme and rheme–theme order being 38 Hz for mean f0 and 69 Hz for max f0. This effect was less extreme for male speakers, but still large and in the right direction. Unlike in my previous study, the rheme peak was usually higher than the theme in both orders (apart from in second phrases for max f0); however, there was a much larger difference in rheme–theme order, as before (Calhoun, 2004, 2006). There were also strong correlations between first and second peaks, by information status order. The effects on intensity and duration were smaller, but in the predicted direction. The second target word was longer than the first in both conditions, presumably because of the effects of final lengthening (e.g. see Turk & Shattuck-Hufnagel, 2007), though the difference was greater in theme–rheme order. Interestingly, there seems to be the opposite tendency for intensity: in theme–rheme order the two target words were on average of nearly equal intensity, whereas in rheme–theme order there was a clear drop. This could be consistent with Gussenhoven’s Production Code (2002), i.e. all else being equal, speakers put in less effort towards the end of a phrase, as breath runs out. In summary, for these utterances at least, prominence seems to be primarily conveyed by peak height, with intensity and duration providing weaker cues. 
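The relative prominence comparison described above can be sketched in a few lines of Python. This is only an illustrative sketch, not the analysis used in the study: the peak values are hypothetical, the direction of the difference (second word minus first word) is an assumption, and the function names `peak_differences` and `pearson_r` are invented for the example.

```python
from statistics import mean


def peak_differences(pairs):
    """Return second-word peak minus first-word peak (Hz) for each word pair."""
    return [second - first for first, second in pairs]


def pearson_r(xs, ys):
    """Pearson correlation between two equal-length sequences."""
    mx, my = mean(xs), mean(ys)
    sxy = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    sxx = sum((x - mx) ** 2 for x in xs)
    syy = sum((y - my) ** 2 for y in ys)
    return sxy / (sxx * syy) ** 0.5


# Hypothetical f0 peaks in Hz as (first target word, second target word);
# these are illustrative values, not data from the study.
THEME_RHEME = [(210.0, 215.0), (190.0, 188.0), (230.0, 240.0), (175.0, 180.0)]
RHEME_THEME = [(240.0, 180.0), (220.0, 170.0), (260.0, 200.0), (200.0, 150.0)]

for label, pairs in (("theme-rheme", THEME_RHEME), ("rheme-theme", RHEME_THEME)):
    firsts = [first for first, _ in pairs]
    seconds = [second for _, second in pairs]
    # Mean peak difference per order, plus the first/second peak correlation.
    print(label, mean(peak_differences(pairs)), round(pearson_r(firsts, seconds), 2))
```

With values like these, the mean difference is small in theme–rheme order and strongly negative in rheme–theme order, which is the kind of asymmetry the Relative Prominence hypothesis predicts.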
The findings were similar when these measures were tested on word pairs with (L+)H* accents, showing the Relative Prominence findings were not down to the difference in the sizes of the Accent Type and Relative Prominence data sets. A similar effect on f0 mean and maximum was found, and there was no interaction with speaker sex or phrase. Unsurprisingly, since both words had full accents, the f0 differences were smaller, but they were still large (15 Hz for f0 mean, 27 Hz for f0 max). The effects on intensity and duration were not as robust. The effect for intensity only held for female speakers, and there was no significant effect on duration. It may be that when both the theme and rheme have full (L+)H* accents, relative prominence is primarily conveyed by peak height, rather than other phonetic cues. In rheme–theme order, the second peak is always lower than the first, usually by a large amount. However, in theme–rheme order, the second peak varies somewhat, from being higher than the first, to slightly lower. This raises the question of how listeners interpret the information status of two successive accents where the second is slightly lower. This pattern is probably ambiguous. However, the scatterplots in Fig. 7 suggest that at f0 levels lower in a speaker’s range this pattern is more likely to be theme–rheme, while at higher f0 levels it is more likely to be rheme–theme. This would be worth investigating further. As in my previous studies, most themes were not produced with clear (L+)H* accents, despite being in contrastive contexts. In rheme–theme order only 12% of themes had clear (L+)H* accents. This is expected given the relative prominence hypothesis, but not the accent type one. As discussed, a ToBI accent type analysis was not done, but the tokens excluded from the Accent Type data set would probably have been annotated as H*, !H* or ?H*, or for the ‘F’ tokens, L*, !H* or unaccented.
Could it be that themes are in fact marked by !H* or L* accents? !H* is usually seen as a variant of H*, which is supposed to mark rhemes. !H* also cannot mark themes in theme–rheme order, as the first accent in a phrase cannot be downstepped. L* could be used in that position, but this does not fit with the findings: many more themes in first position were marked with (L+)H* accents than in second position. In any case, a substantial number of the rheme tokens in this study were not marked with (L+)H* accents either, so !H* (and maybe L*) can mark rhemes too. Whichever way you look at it, an accent type account leads to considerable ambiguity and overlap in theme/rheme marking. The relative prominence hypothesis is more straightforward, and better accounts for these findings. This study did not directly address the marking of any contrast/non-contrast distinction, as all target words were in contrastive contexts. An anonymous reviewer objected that only the rheme foci were truly contrastive in this experimental set-up: e.g. in D-v1 above, rhino directly contrasts with the Slider’s suggestion of monkey, and banana contrasts with the Slider’s suggestion of lollipop; while the contrasts on lollipop and monkey are more “optional”, as they are mentioned in the Slider’s utterance, and arise only if the speaker chooses to mark the inherent contrasts in the juxtaposition of the first and second phrases of the Driver’s reply. I would argue that these are quintessential examples of marked themes (or contrastive topics) in the sense of Steedman (2000), Halliday (1968) and Jackendoff (1972): a referent is unambiguously defined as the theme or topic if it is mentioned in the cueing question or utterance; and these are contrastive in that the experimental set-up, and the form of the Driver’s reply, automatically sets up an alternate set of both Groceries and Helpers relevant to each clause, as there are always at least two of each to be negotiated in each move.
However, even if these results were to be taken to show the prosodic distinction between contrastive and non-contrastive reference (with themes being non-contrastive and rhemes contrastive), these results are still not consistent with contrast being marked by L+H*. If this were so, we should find a clear difference in the shape and alignment of theme and rheme accents, which we do not. The only clear difference was in the height of the accents. However, a large proportion of rhemes were not realised with accents at all, or at least with high accents. On the other hand, the results seem consistent with my claim that all foci are semantically contrastive (after Rooth, 1992), but that contrastiveness varies pragmatically depending on the context and speaker intent, so that contrasts can be marked with different levels of prominence (Calhoun, 2009, 2010a). During the course of the game, the salience of these contrasts varied (e.g. as the participants grew familiar with the game), so that at different points speakers chose to mark the contrasts more or less strongly, leading to the variation in realisation found in both themes and rhemes, from full (L+)H* accented to unaccented – the consistent effect being the relative prominence difference between themes and rhemes. These results support Ladd and Schepman’s (2003) and Dilley’s (2005) call for L+H* and H* accents to be merged in the ToBI annotation scheme. As discussed in Section 1.2, this distinction is dubious on phonetic grounds. The question is then whether it marks a semantic distinction. This study shows L+H*/H* does not mark theme/rheme status. The empirical evidence reviewed in Section 1.2 suggests that the claim that L+H* marks contrast reduces to “contrastive” accents being higher. As argued in Calhoun (2006), I claim that L+H* is used to mark accents that are not really of a different type, but rather at the extreme of a gradient scale of prominence (see also Ladd, 2008).
It would be better to recognise this explicitly. The subjects were native speakers of New Zealand English, so this experiment is only direct evidence of theme/rheme marking in that variety. However, there is no reason to think these results would not generalise to other varieties of English. The results are very similar to my earlier, smaller, studies (Calhoun, 2004, 2006, chap. 4), and indeed consolidate the findings there on a much larger data set. The first study (the Lombard/Lambert sentences) used an Edinburgh Scottish English speaker, and British and American English listeners; the second study (the Amanda/Norma sentences) used a Mainstream American English speaker. These results are also very similar to Liberman and Pierrehumbert (1984), who used American English speakers. One difference was that, in my first study, rhemes were almost never followed by H% boundaries, whereas in this study they were 19.2% of the time. This is not unexpected, given the greater variety of pragmatic uses of high boundaries in New Zealand English (e.g. see Warren & Britain, 2000). The naturalistic methodology used here seems to have been successful in getting speakers to produce themes and rhemes in systematically different ways, while still using carefully controlled utterances. As detailed in Section 1.4, in previous studies I had difficulty getting naïve subjects to produce such utterances with any intonational variation using read speech; and this approach is clearly better than coaching or using trained speakers. As argued by Speer et al. (2011), certain aspects of prosody, including, it seems, signalling information structure, depend upon the speaker having a clear communicative intention. Finally, this study adds further weight to my claim that metrical prominence is the most important prosodic component in signalling information structure (Calhoun, 2010a) (see also Ladd, 2008; Truckenbrodt, 1995). In part because of the success of the ToBI annotation framework, there has been much emphasis in research on information structure in the past 20 years or so on the location and type of pitch accents. In my earlier work, I showed that metrical prominence is more relevant than the location and type of accents in signalling the location and scope of focus, and contrastiveness (Calhoun, 2006, 2009, 2010a).
This study shows its importance in signalling theme/rheme status as well. Rather than being primary, accenting is a consequence of underlying prominence relationships. And while tonal accent type is no doubt important to signalling some aspects of discourse meaning, it does not seem to convey theme/rheme status or contrastiveness. It is hoped this study helps to promote due weight to relative prominence in prosodic research.
Acknowledgements
This work was supported by an Ian Gordon Fellowship at Victoria University of Wellington, and a British Academy Postdoctoral Fellowship at the University of Edinburgh. Many thanks to Paul Warren for his generous help in designing, setting up and carrying out the experiment, as well as valuable feedback on an earlier draft of this paper; and to him, Shari Speer and Amy Schafer for their kindness in helping me to adapt the SPOT experimental design for my purposes. Thanks also to Bob Ladd and Mark Steedman for many useful discussions during my thesis work which helped to develop this experiment, and for their comments on the experimental design. Finally, thanks to Manon Jones and Mateo Obregón for help with the statistics.
Appendix A
Driver (left) and Slider (right) game boards for game 2 (top) and game 3 (bottom). The starting positions of the Helpers (lines) and the Groceries (dots) are shown by the large pictures (these were green and red on the actual board). Goals for the Groceries are shown by the small pictures in the top right corner on the Driver’s board. The board ‘features’, money and monsters, were printed on the Slider’s board only. On both Slider’s boards, the starting position of one of the Walruses was on top of a feature; this is shown by a smaller picture of the walrus top left, and a smaller picture of the money or monster bottom right.
Appendix B
Full list of sentence frames which could be used by participants to communicate during each game, along with hints as to when each should be used. Restrictions on use of some of the utterances are indicated in brackets before the sentence frame. Indexing, to indicate the order in which different Groceries and Helpers should be referred to, is indicated by a small 1 or 2 after the object name, written here as _1 or _2 (see main text for more details). Sets of alternatives are shown in braces, separated by slashes. All sentences in Version 1 are listed first, and then the differences in Version 2 of the sentence frames for Drivers and Sliders, respectively.

B.1. Driver Sentence Frames

B.1.1. Version 1

When you want to tell the Slider which Groceries you want to move:
D1. I want to move the {mango / lollipop / walnut / banana} and the {mango / lollipop / walnut / banana}.

To specify which Helpers you want to move the Groceries:
D2. No, that’s no use, I want to move the {mango_1 / lollipop_1 / walnut_1 / banana_1} with [a/the] {wallaby_2 / walrus_2 / rhino_2 / monkey_2}, (and) I want to move the {mango_2 / lollipop_2 / walnut_2 / banana_2} with [a/the] {wallaby / walrus / rhino / monkey}.
D3. Good idea, let’s do that, and move the {mango_2 / lollipop_2 / walnut_2 / banana_2} with [a/the] {wallaby_2 / walrus_2 / rhino_2 / monkey_2}.

(Only after D3 or S2) To see if, with your move, the Groceries will land on any money:
D4. {If we do this / Well, if we do what I want}, will the {mango / lollipop / walnut / banana} land on some money?

(Only after the reply to D4) To get the Slider to move the first Grocery in the pair:
D5. OK, move one of the pair please.

To tell them they moved correctly:
D6. Well done.
D7. I am able to confirm the move was the final one. The {mango_1 / lollipop_1 / walnut_1 / banana_1} has now reached its goal.
D8. Congratulations, we have reached the end of the round.

(Only after D6 or D7) To get the Slider to move the second Grocery in the pair:
D9. Now, please move the {mango_2 / lollipop_2 / walnut_2 / banana_2} with the {wallaby_2 / walrus_2 / rhino_2 / monkey_2}.

If you change your mind about which Groceries and/or Helpers you want to move:
D10. OK, forget that. I’ll do something else.
D11. Whoops, go back, there’s a different pair I want you to move.

B.1.2. Version 2

Same as above, but with the following version of D2:
D2. No, that’s no use, I want to move the {mango_2 / lollipop_2 / walnut_2 / banana_2} with [a/the] {wallaby / walrus / rhino / monkey}, (and) I want to move the {mango_1 / lollipop_1 / walnut_1 / banana_1} with [a/the] {wallaby_2 / walrus_2 / rhino_2 / monkey_2}.

B.2. Slider’s sentence frames

B.2.1. Version 1

To ask which Helper you should use to move one of the Groceries:
S1. OK, I think we should move the {mango / lollipop / walnut / banana} with [a/the] {wallaby / walrus / rhino / monkey}.

(Only as a reply to D2 or D3) To let the Driver know what money they would have landed on:
S2. OK (but the {mango / lollipop / walnut / banana} would have landed on some money).

(Only as a reply to D4) To let the Driver know what features the Groceries will land on:
S3. No, the {mango_2 / lollipop_2 / walnut_2 / banana_2} will land on some money, the {mango_1 / lollipop_1 / walnut_1 / banana_1} will land on {a monster / an empty square}. (I can use the money to pay the monster.)
S4. No, the {mango_1 / lollipop_1 / walnut_1 / banana_1} will land on {a monster / an empty square}, and the {mango_2 / lollipop_2 / walnut_2 / banana_2} {will land on an empty square / will land on a monster / will too}. (I have some money to pay the monster.)
S5. Yes, the {mango_1 / lollipop_1 / walnut_1 / banana_1} will land on some money, [and/but] the {mango_2 / lollipop_2 / walnut_2 / banana_2} {will too / will land on a monster / will land on an empty square}. (I can use the money to pay the monster.)

(Only as a reply to D5) To let the Driver know you have moved a Grocery with a Helper, and where it has moved:
S6. OK, I am able to confirm the move of the {mango / lollipop / walnut / banana} with the {wallaby / walrus / rhino / monkey}. It has moved one square [up/down/right/left].

After the Driver tells you they want to move a different pair of Groceries:
S7. OK, they’re back where they were before.

If you can’t do the move the Driver wanted:
S8. I am unable to complete that move.

B.2.2. Version 2

Same as above, but with the following version of S3, and with S4 omitted (S5–S8 in Version 1 were then renumbered S4–S7 in Version 2):
S3. No, the {mango_1 / lollipop_1 / walnut_1 / banana_1} will land on {a monster / an empty square}, [and/but] the {mango_2 / lollipop_2 / walnut_2 / banana_2} {will too / will land on an empty square / will land on a monster / will land on some money}. (I {can use the money / have some money} to pay the monster.)
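To illustrate how a frame with fixed wording and restricted word choices constrains what a participant can say, here is a minimal Python sketch of Driver frame D2. It is not part of the experimental materials: the slot names, the `fill_d2` helper, and the decision to always include the optional "and" are assumptions made for the example; only the word lists and the frame wording come from the appendix.

```python
GROCERIES = ("mango", "lollipop", "walnut", "banana")
HELPERS = ("wallaby", "walrus", "rhino", "monkey")

# Driver frame D2 with named slots in place of the bracketed word choices.
# The optional "(and)" of the frame is always included here (an assumption).
D2_FRAME = (
    "No, that's no use, I want to move the {first_grocery} with "
    "{first_det} {first_helper}, and I want to move the {second_grocery} "
    "with {second_det} {second_helper}."
)


def fill_d2(first_grocery, first_helper, second_grocery, second_helper,
            first_det="the", second_det="the"):
    """Fill frame D2, rejecting words that are not on the game's lists."""
    for grocery in (first_grocery, second_grocery):
        if grocery not in GROCERIES:
            raise ValueError(f"unknown Grocery: {grocery}")
    for helper in (first_helper, second_helper):
        if helper not in HELPERS:
            raise ValueError(f"unknown Helper: {helper}")
    for det in (first_det, second_det):
        if det not in ("a", "the"):
            raise ValueError(f"determiner must be 'a' or 'the': {det}")
    return D2_FRAME.format(
        first_grocery=first_grocery, first_det=first_det,
        first_helper=first_helper, second_grocery=second_grocery,
        second_det=second_det, second_helper=second_helper,
    )


print(fill_d2("lollipop", "rhino", "banana", "monkey"))
```

Because the wording is fixed and only the slot fillers vary, utterances produced from the same frame differ only in the target words, which is what makes the prosody of those words comparable across tokens.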
Appendix C
For mixed effects models for phonetic measures of Accent Type and Relative Prominence, see Table C1.
Table C1
Summaries of mixed effects models for phonetic measures of Accent Type and Relative Prominence.

Summary of mixed effects models for Accent Type

L_f0 (N = 434, log-likelihood = 1639)
Fixed effects                       Coefficient  Std. error  t      p
(Intercept)                         185.36       4.39        42.18  < 0.001
InfoStat = theme                    1.06         1.41        0.75   > 0.3
Position = start                    8.31         1.09        7.60   < 0.001
Phrase = second                     7.44         1.24        6.02   < 0.001
Sex = male                          85.33        6.78        12.59  < 0.001
Interaction: theme & second         5.35         2.11        2.53   0.013
Random effects                      Variance
Speaker                             215.51

H_f0 (N = 455, log-likelihood = 2011)
Fixed effects                       Coefficient  Std. error  t      p
(Intercept)                         226.20       6.75        33.49  < 0.001
InfoStat = theme                    11.68        2.07        5.65   < 0.001
Position = start                    3.15         2.80        1.12   > 0.3
Phrase = second                     14.80        1.93        7.67   < 0.001
Role = S                            8.92         4.02        2.22   0.029
Sex = male                          97.99        10.20       9.61   < 0.001
Interaction: start & S              14.94        4.19        3.56   < 0.001
Interaction: S & male               14.56        3.79        3.84
Random effects                      Variance
Speaker                             463.02

LH_rise (N = 436, log-likelihood = 1871)
Fixed effects                       Coefficient  Std. error  t      p
(Intercept)                         37.11        3.50        10.60  < 0.001
InfoStat = theme                    8.73         1.83        4.77   < 0.001
Phrase = second                     8.13         1.74        4.67   < 0.001
Role = S                            13.46        2.37        5.68   < 0.001
Sex = male                          11.91        5.38        2.21   0.027
Interaction: S & male               8.58         3.43        2.50   0.013
Random effects                      Variance
Speaker                             108.63

f0_pre_post (N = 377, log-likelihood = 1466)
Fixed effects                       Coefficient  Std. error  t      p
(Intercept)                         11.44        1.61        7.11   < 0.001
InfoStat = theme                    1.60         1.38        1.16   0.249
Position = start                    4.11         1.66        2.48   0.038
Role = S                            4.10         1.28        3.20   0.002
Sex = male                          6.97         1.25        5.57   < 0.001
Random effects                      Variance
Word                                1.29

L_T (N = 397, log-likelihood = 100.6)
Fixed effects                       Coefficient  Std. error  t      p
(Intercept)                         0.2643       0.0312      8.46   < 0.001
InfoStat = theme                    0.1128       0.0342      3.30   0.001
Role = S                            0.0160       0.0220      0.73   > 0.3
Sex = male                          0.1016       0.0460      2.21   0.019
Interaction: theme & S              0.1161       0.0385      3.02   0.001
Interaction: theme & male           0.0897       0.0380      2.36   0.019
Random effects                      Variance
Speaker                             0.0077

H_T (N = 440, log-likelihood = 114.1)
Fixed effects                       Coefficient  Std. error  t      p
(Intercept)                         1.0981       0.0832      13.20  < 0.001
InfoStat = theme                    0.0292       0.0246      1.19   0.281
Position = start                    0.0340       0.1118      0.30   > 0.3
Phrase = second                     0.0816       0.0209      3.91   < 0.001
Role = S                            0.0913       0.1248      0.73   > 0.3
Sex = male                          0.2107       0.0522      4.04   < 0.001
Interaction: theme & second         0.0776       0.0361      2.15   0.034
Interaction: start & S              0.1668       0.1279      1.30   0.111
Interaction: start & male           0.0998       0.0506      1.97   0.053
Interaction: S & male               0.1521       0.0610      2.49   0.013
Interaction: start & S & male       0.2322       0.0738      3.15   0.002
Random effects                      Variance
Speaker                             0.0051
Word                                0.0227

Boundary (N = 504, log-likelihood = 198.6)
Fixed effects                       Coefficient  Std. error  z      p
(Intercept)                         0.586        0.491       1.19   0.233
InfoStat = theme                    0.314        0.450       0.70   > 0.3
Phrase = second                     1.284        0.505       2.54   0.011
Role = S                            1.398        0.490       2.86   0.004
Sex = male                          0.147        0.648       0.23   > 0.3
Interaction: theme & second         1.632        0.622       2.63   0.009
Interaction: theme & S              3.982        0.731       5.45   < 0.001
Interaction: second & male          1.576        0.738       2.13   0.033
Random effects                      Variance
Speaker                             1.384

Summary of mixed effects models for Relative Prominence over all word pairs

f0mean_diff (N = 497, log-likelihood = 2322)
Fixed effects                       Coefficient  Std. error  t      p
(Intercept)                         7.78         4.38        1.77   0.076
InfoStat = rheme–theme              37.70        3.27        11.54  < 0.001
Phrase = second                     13.71        4.25        3.23   0.002
Role = S                            39.35        4.58        8.60   < 0.001
Sex = male                          3.12         6.77        0.46   > 0.3
Interaction: rheme–theme & male     22.90        5.02        4.56   < 0.001
Interaction: second & S             40.59        6.58        6.16   < 0.001
Interaction: second & male          11.66        6.86        1.70   0.096
Interaction: S & male               25.63        7.02        3.65   < 0.001
Interaction: second & S & male      35.13        10.04       3.50   < 0.001
Random effects                      Variance
Speaker                             79.52

f0max_diff (N = 482, log-likelihood = 2224)
Fixed effects                       Coefficient  Std. error  t      p
(Intercept)                         25.85        4.79        5.40   < 0.001
InfoStat = rheme–theme              68.40        3.91        17.51  < 0.001
Phrase = second                     35.51        4.70        7.56   < 0.001
Role = S                            67.99        4.41        15.40  < 0.001
Sex = male                          12.61        7.04        1.79   0.068
Interaction: rheme–theme & second   28.00        4.85        5.78   < 0.001
Interaction: rheme–theme & male     30.26        4.83        6.27   < 0.001
Interaction: second & S             70.43        6.41        10.99  < 0.001
Interaction: second & male          17.70        6.51        2.72   0.008
Interaction: S & male               56.95        6.76        8.42   < 0.001
Interaction: second & S & male      64.28        9.66        6.65   < 0.001
Random effects                      Variance
Speaker                             110.82

Int_diff (N = 504, log-likelihood = 1346)
Fixed effects                       Coefficient  Std. error  t      p
(Intercept)                         0.626        1.160       0.54   > 0.3
InfoStat = rheme–theme              3.009        0.432       6.97   < 0.001
Phrase = second                     1.064        0.298       3.57   < 0.001
Role = S                            4.793        1.235       3.88   0.009
Random effects                      Variance
Speaker                             1.805
Word1                               2.204
Word2                               2.390

Dur_diff (in secs) (N = 498, log-likelihood = 807)
Fixed effects                       Coefficient  Std. error  t      p
(Intercept)                         0.0201       0.0204      0.99   > 0.3
InfoStat = rheme–theme              0.0161       0.0058      2.79   0.004
Phrase = second                     0.0214       0.0039      5.44   < 0.001
Sex = male                          0.0193       0.0088      2.18   0.027
Random effects                      Variance
Speaker                             0.0003
Word1                               0.0009
Word2                               0.0010

Summary of mixed effects models for Relative Prominence for word pairs with full accents

f0mean_diff (N = 82, log-likelihood = 301.8)
Fixed effects                       Coefficient  Std. error  t      p
(Intercept)                         1.82         4.92        0.37   > 0.3
InfoStat = rheme–theme              15.27        3.64        4.20   < 0.001
Phrase = second                     10.16        6.25        1.63   0.197
Role = S                            40.40        6.60        6.12   < 0.001
Sex = male                          0.27         6.55        0.04   > 0.3
Interaction: second & S             44.35        9.66        4.59   < 0.001
Interaction: second & male          12.13        9.49        1.28   0.230
Interaction: S & male               31.92        8.28        3.86   < 0.001
Interaction: second & S & male      45.51        13.71       3.32   < 0.001
Random effects                      Variance
Speaker                             55.9

f0max_diff (N = 82, log-likelihood = 344.3)
Fixed effects                       Coefficient  Std. error  t      p
(Intercept)                         18.24        8.67        2.11   0.021
InfoStat = rheme–theme              26.65        6.48        4.11   < 0.001
Phrase = second                     24.58        10.92       2.25   0.039
Role = S                            66.00        11.64       5.67   < 0.001
Sex = male                          7.22         11.74       0.62   > 0.3
Interaction: second & S             72.46        17.14       4.23   < 0.001
Interaction: second & male          19.08        16.74       1.14   0.241
Interaction: S & male               57.07        14.78       3.86   < 0.001
Interaction: second & S & male      72.75        24.48       2.97   0.003
Random effects                      Variance
Speaker                             196.87

Int_diff (N = 84, log-likelihood = 203.8)
Fixed effects                       Coefficient  Std. error  t      p
(Intercept)                         0.138        1.203       0.11   > 0.3
InfoStat = rheme–theme              6.396        1.380       4.64   < 0.001
Phrase = second                     3.624        1.301       2.79   0.007
Role = S                            5.408        0.893       6.06   < 0.001
Sex = male                          0.316        0.803       0.39   > 0.3
Interaction: rheme–theme & second   4.120        1.696       2.43   0.020
Interaction: rheme–theme & male     5.953        1.584       3.76   < 0.001
Interaction: second & S             3.591        1.490       2.41   0.020
Random effects                      Variance
Word1                               2.883

Dur_diff (in seconds) (N = 85, log-likelihood = 120.3)
Fixed effects                       Coefficient  Std. error  t      p
(Intercept)                         0.0043       0.0232      0.19   > 0.3
InfoStat = rheme–theme              0.0102       0.0166      0.61   > 0.3
Phrase = second                     0.0382       0.0114      3.34   0.001
Random effects                      Variance
Speaker                             0.0014
Word1                               0.0007
Word2                               0.0013
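In each model summary of Table C1, the t statistic of a fixed effect is its coefficient divided by its standard error, which gives a quick way to check that a row's three numbers belong together. The sketch below applies that check to the L_f0 model; it is an illustrative aid rather than part of the original analysis. The helper name `t_mismatches` and the tolerance are assumptions, the rows are paired with labels in the order the table lists them, and only magnitudes are compared.

```python
# (label, coefficient, std. error, reported t) for the L_f0 model in Table C1.
L_F0_ROWS = [
    ("(Intercept)", 185.36, 4.39, 42.18),
    ("InfoStat = theme", 1.06, 1.41, 0.75),
    ("Position = start", 8.31, 1.09, 7.60),
    ("Phrase = second", 7.44, 1.24, 6.02),
    ("Sex = male", 85.33, 6.78, 12.59),
    ("Interaction: theme & second", 5.35, 2.11, 2.53),
]


def t_mismatches(rows, tolerance=0.1):
    """Return labels whose |coefficient / std. error| differs from |t|.

    The tolerance allows for the rounding of the printed values.
    """
    mismatched = []
    for label, coefficient, std_error, reported_t in rows:
        recomputed = abs(coefficient) / std_error
        if abs(recomputed - abs(reported_t)) > tolerance:
            mismatched.append(label)
    return mismatched


print(t_mismatches(L_F0_ROWS))
```

An empty list means every recomputed ratio agrees with the printed t value to within rounding; the same check can be run on any other model in the table by swapping in its rows.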
References Arvaniti, A., & Garding, G. (2007). Dialectal variation in the rising accents of American English. In: J. Cole, & J. I. Hualde (Eds.), Papers in laboratory phonology 9 (pp. 547–576). Berlin, New York: Mouton de Gruyter. Baayen, R. H. (2008). Analyzing linguistic data: A practical introduction to statistics. Cambridge, UK: Cambridge University Press. Baayen, R. H., Davidson, D. J., & Bates, D. M. (2008). Mixed-effects modeling with crossed random effects for subjects and items. Journal of Memory and Language, 59, 390–412. Bartels, C., & Kingston, J. (1994). Salient pitch cues in the perception of contrastive focus. In: R. van der Sandt (Ed.), Focus and natural language processing, Vol. 1 (pp. 1–10). Heidelberg: IBM Working Papers. Bates, D. M. (2005). Fixed linear mixed models in R. R News, 5, 27–30. Beckman, M., & Hirschberg, J. (1999). The ToBI Annotation Conventions. From /http://www.ling.ohio-state.edu/ tobi/ame_tobi/annotation_conventions.htmlS. Blaauw, E. (1994). The contribution of prosodic boundary markers to the difference between read and spontaneous speech. Speech Communication, 14, 359–367. Boersma, P., & Weenink, D. (2008). Praat: Doing phonetics by computer (Version 5.0.35) [Computer program].
Bolinger, D. (1965). Forms of English: Accent, morpheme and order. Cambridge, MA: Harvard University Press. Brazil, D. (1985). The communicative value of intonation in English. Birmingham, UK: University of Birmingham. Brugos, A., Shattuck-Hufnagel, S., & Veilleux, N. (2006). Transcribing prosodic structure of spoken utterances with ToBI. MIT Open Courseware. Retrieved 26 November 2008, from /http://ocw.mit.edu/OcwWeb/Electrical-Engineerin g-and-Computer-Science/6-911January–IAP–2006/CourseHome/index.htmS. ¨ Buring, D. (1997). The meaning of topic and focus: The 59th street bridge accent. London: Routledge. ¨ Buring, D. (2003). On D-trees, beans and B-accents. Linguistics and Philosophy, 26(5), 511–545. Calhoun, S. (2004). Phonetic dimensions of intonational categories—the case of L þ Hn and Hn. In B. Bel & I. Marlien (Eds.), Proceedings of speech prosody (pp. 103–106). Nara, Japan. Calhoun, S. (2006). Information structure and the prosodic structure of English: A probabilistic relationship. Unpublished doctoral dissertation, University of Edinburgh. Calhoun, S. (2009). What makes a word contrastive: prosodic, semantic and pragmatic perspectives. In: D. Barth-Weingarten, N. Dehe´, & A. Wichmann (Eds.), Where prosody meets pragmatics: research at the interface (pp. 53–78). Bingley: Emerald. Calhoun, S. (2010a). The centrality of metrical structure in signaling information structure: a probabilistic perspective. Language, 86(1), 1–42. Calhoun, S. (2010b). How does informativeness affect prosodic prominence?. Language and Cognitive Processes, 25(7–9), 1099–1140. Dilley, L. (2005). The phonetics and phonology of tonal systems. Unpublished doctoral dissertation, MIT. Fe´ry, C., & Samek-Lodovici (2006). Focus projection and prosodic prominence in nested foci. Language, 82(1), 131–150. Grabe, E., & Warren, P. (1995). Stress shift: Do speakers do it or do listeners hear it?. In: B. Connell, & A. Arvaniti (Eds.), Papers in laboratory phonology IV (pp. 95–110). 
Cambridge, UK: Cambridge University Press. Gundel, J., & Fretheim, T. (2004). Topic and focus. In: L. Horn, & G. Ward (Eds.), The handbook of pragmatics (pp. 175–196). Oxford, UK: Blackwell. Gussenhoven, C. (1984). On the grammar and semantics of sentence accents. Dordrecht: Foris Publications. Gussenhoven, C. (2002). Intonation and interpretation: Phonetics and phonology Proceedings of Speech Prosody (pp. 47–57). Aix-en-Provence, France. Halliday, M. A. K. (1968). Notes on transitivity and theme in English: Part 3. Journal of Linguistics, 4, 179–215. Howell, P., & Kadi-Hanifi, K. (1991). Comparison of prosodic properties between read and spontaneous speech. Speech Communication, 10(2), 163–169. Ito, K., & Speer, S. (2006). Using interactive tasks to elicit natural dialogue production. In: S. Sudhoff, D. Lenertova´, R. Meyer, S. Pappert, P. Augurzky, I. Mleinek, N. Richter, & J. Schließer (Eds.), Methods in empirical prosody research (pp. 229–258). USA: Mouton de Gruyter. Ito, K., & Speer, S. (2008). Anticipatory effects of intonation: Eye movements during instructed visual search. Journal of Memory and Language, 58(2), 541–573. Jackendoff, R. S. (1972). Semantic interpretation in generative grammar. Cambridge, MA: MIT Press. Kiss, K. E. (1998). Identificational focus versus information focus. Language, 74(2), 245–273. Krahmer, E., & Swerts, M. (2001). On the alleged existence of contrastive accents. Speech Communication, 34(4), 391–405. Kruijff-Korbayova´, I., & Steedman, M. (2003). Discourse and information structure. Journal of Logic, Language and Information, 12, 249–259. Ladd, D. R. (1988). Declination ‘reset’ and the hierarchical organization of utterances. Journal of the Acoustical Society of America, 84(2), 530–544. Ladd, D. R. (2008). Intonational phonology (second ed.). Cambridge, UK: Cambridge University Press. Ladd, D. R., & Schepman, A. (2003). ‘‘Sagging transitions’’ between high pitch accents in English: Experimental evidence. 
Journal of Phonetics, 31(1), 81–112. Ladd, D. R., Schepman, A., White, L., Quarmby, L. M., & Stackhouse, R. (2009). Structural and dialectal effects on pitch peak alignment in two varieties of British English. Journal of Phonetics, 37, 145–161. Lambrecht, K. (1994). Information structure and sentence form. Cambridge, UK: Cambridge University Press. Liberman, M. (1979). The intonational system of English. New York: Garland Pub. Liberman, M., & Pierrehumbert, J. (1984). Intonational invariance under changes in pitch range and length. In M Aronoff & R Oehrle (Eds.), Language sound structure (pp. 157–233). Cambridge, MA: MIT Press. Pierrehumbert, J. (1980). The phonology and phonetics of English intonation. Unpublished doctoral dissertation. Cambridge, MA: MIT Press. Pierrehumbert, J., & Hirschberg, J. (1990). The meaning of intonational contours in the interpretation of discourse. In: P. Cohen, J. Morgan, & M. Pollack (Eds.), Intentions in communication (pp. 271–311). Cambridge, MA: MIT Press. Pierrehumbert, J., & Steele, S. (1989). Categories of tonal alignment in English. Phonetica, 46, 181–196. Pitrelli, J., Beckman, M., & Hirschberg, J. (1994). Evaluation of prosodic transcription labelling reliability in the ToBI framework. In Proceedings of the international conference on spoken language processing (Vol. 2, pp. 123–126). Yokohama, Japan. R Development Core Team (2010). R: A language and environment for statistical computing. Vienna, Austria: R Foundation for Statistical Computing.
Rochemont, M. (1986). Focus in generative grammar. Philadelphia: John Benjamins.
Rooth, M. (1992). A theory of focus interpretation. Natural Language Semantics, 1, 75–116.
Schafer, A., Speer, S., & Warren, P. (2000). Intonational disambiguation in sentence production and comprehension. Journal of Psycholinguistic Research, 29(2), 169–182.
Schafer, A., Speer, S., & Warren, P. (2005). Prosodic influences on the production and comprehension of syntactic ambiguity in a game-based conversation task. In: M. Tanenhaus, & J. Trueswell (Eds.), Approaches to studying world situated language use. Cambridge, MA: MIT Press.
Selkirk, E. (1984). Phonology and syntax. Cambridge, MA: MIT Press.
Selkirk, E. (1995). Sentence prosody: Intonation, stress and phrasing. In: J. Goldsmith (Ed.), The handbook of phonological theory (pp. 550–569). Cambridge, MA, Oxford: Blackwell.
Selkirk, E. (2002). Contrastive FOCUS vs. presentational focus: Prosodic evidence from right node raising in English. In Proceedings of speech prosody 2002 (pp. 643–646). Aix-en-Provence, France.
Silverman, K., Beckman, M. B., Ostendorf, M., Wightman, C., Price, P., Pierrehumbert, J., et al. (1992). A standard for labeling English prosody. In Proceedings of the international conference on spoken language processing (Vol. 2, pp. 867–870). Banff, Canada.
Speer, S., Warren, P., & Schafer, A. (2003). Intonation and sentence processing. In Proceedings of the fifteenth international congress of phonetic sciences (pp. 95–105). Barcelona, Spain.
Speer, S., Warren, P., & Schafer, A. (2011). Situationally independent prosodic phrasing. Laboratory Phonology, 2(1), 35–98.
Steedman, M. (2000). Information structure and the syntax–phonology interface. Linguistic Inquiry, 31(4), 649–689.
Syrdal, A., & McGory, J. (2000). Inter-transcriber reliability of ToBI prosodic labeling. In Proceedings of the international conference on spoken language processing (Vol. 3, pp. 235–238). Beijing, China.
Terken, J., & Hermes, D. (2000). The perception of prosodic prominence. In: M. Horne (Ed.), Prosody: Theory and experiment. Studies presented to Gösta Bruce (pp. 89–127). Dordrecht: Kluwer.
Truckenbrodt, H. (1995). Phonological phrases: Their relation to syntax, focus and prominence. Unpublished doctoral dissertation.
Truckenbrodt, H. (2002). Upstep and embedded register levels. Phonology, 19, 77–120.
Turk, A., Nakai, S., & Sugahara, M. (2006). Acoustic segment durations in prosodic research: a practical guide. In: S. Sudhoff, D. Lenertová, R. Meyer, S. Pappert, P. Augurzky, I. Mleinek, N. Richter, & J. Schließer (Eds.), Methods in empirical prosody research, Vol. 3 (pp. 1–28). Berlin, New York: De Gruyter.
Turk, A., & Shattuck-Hufnagel, S. (2007). Multiple targets of phrase-final lengthening in American English words. Journal of Phonetics, 35(4), 445–472.
Umbach, C. (2004). On the notion of contrast in information structure and discourse structure. Journal of Semantics, 21, 155–175.
Vallduví, E., & Vilkuna, M. (1998). On rheme and kontrast. Syntax and Semantics, 29, 79–108.
Warren, P., & Britain, D. (2000). Intonation and prosody in New Zealand English. In: A. Bell, & K. Kuiper (Eds.), New Zealand English (pp. 146–172). Wellington, New Zealand: Victoria University Press.
Watson, D., Gunlogson, C., & Tanenhaus, M. K. (2008). Interpreting pitch accents in on-line comprehension: H* vs L+H*. Cognitive Science, 32(7), 1232–1244.
Welby, P. (2003). Effects of pitch accent position, type and status on focus projection. Language and Speech, 46(1), 53–81.
Yuan, J., Brenier, J., & Jurafsky, D. (2005). Pitch accent prediction: effects of genre and speaker. In Proceedings of interspeech (pp. 1409–1412). Lisboa, Portugal.