Journal of Phonetics 44 (2014) 47–61
Journal of Phonetics journal homepage: www.elsevier.com/locate/phonetics
The effect of focus marking on supralaryngeal articulation – Is it mediated by accentuation?
Doris Mücke*, Martine Grice
IfL Phonetics, University of Cologne, Herbert-Lewin-Strasse 6, 50931 Cologne, Germany
ARTICLE INFO
Article history: Received 25 March 2013; Received in revised form 20 December 2013; Accepted 2 February 2014; Available online 11 March 2014
ABSTRACT
In this study we explore the effects of focus-background structure on accentuation (i.e. whether a word bears a pitch accent or not) and supralaryngeal articulation, measured in terms of acoustic durations (syllable and foot durations) and lip kinematics (parameters relating to the opening gesture: duration, displacement, peak velocity and stiffness). Although words in focus were accented and those out of focus were not, there were few supralaryngeal differences between accented target words produced in the broad focus context and unaccented target words (out of focus). Thus, accentuation per se did not appear to lead to supralaryngeal modifications. However, there was a clear distinction between the supralaryngeal articulation of words in broad focus and those in contrastive focus. We conclude that supralaryngeal articulation, in terms of acoustic duration and lip kinematics, is related directly to the expression of focus structure and contrastivity and is not, contrary to conclusions drawn in previous studies, mediated by the presence or absence of accent. © 2014 Elsevier Ltd. All rights reserved.
1. Introduction

This paper deals with the effect of focus-background structure on supralaryngeal articulation in German. Specifically, we investigate whether supralaryngeal modifications to articulation are due to the fact that certain words in focus are accented (i.e. bear a pitch accent) or whether the effect is directly modulated by focus structure, in which case supralaryngeal modifications and accentuation could be seen as to some extent independent. Focus denotes the part of an utterance that the speaker presents as being important and/or that the speaker assumes to be most informative for the listener (Lambrecht, 1994). The counterpart to focus is background, denoting the uninformative part of an utterance. We are concerned with how the important and informative part of an utterance is differentiated from the uninformative part. To ascertain which parts of an utterance are focussed and which are in the background, contextualising questions can be used. Such questions elicit answers with the appropriate information structure. The matching of questions to appropriate answers is also used as a diagnostic tool, referred to as Question–Answer Congruence (Wagner, 2012; Krifka, 2007; Culicover & Rochemont, 1983; Büring, 2003), as in (1). The accented syllable is given in capital letters.

Q: Who did you want to meet?
A: [I wanted to meet]_background [MAry]_focus (1)
In the answer in (1), ‘I wanted to meet’ forms the background and ‘Mary’ the focus. Cases in which the focus is on one word only are referred to as narrow focus, a term introduced by Ladd (1980). A special type of focus involves an implicit contrast and a correction of what has previously been said (see Wagner, 2012 for discussion), as in (2).

Q: Did you want to meet John?
A: No, [I wanted to meet]_background [MAry]_focus (2)
In both (1) and (2), the word ‘MAry’ receives a pitch accent. Pitch accents are generally associated with the lexically stressed syllable of the word. However, there are cases in which the focus domain is larger than one word. These are referred to as broad focus structures (Ladd, 1980). Here, a pitch accent on one word is used to mark a larger focus domain, a phenomenon called focus projection (Büring, 2003; Welby, 2003), as shown in (3).

Q: What’s up?
A: [I wanted to meet MAry]_focus (3)

* Corresponding author. Tel.: +49 221 4704253; fax: +49 221 4705938. E-mail address: [email protected] (D. Mücke).
0095-4470/$ - see front matter © 2014 Elsevier Ltd. All rights reserved. http://dx.doi.org/10.1016/j.wocn.2014.02.003
D. Mücke, M. Grice / Journal of Phonetics 44 (2014) 47–61
Fig. 1. Scheme of comparisons under investigation.
In (3), the complete answer is focussed. Thus, the word bearing the pitch accent, the focus exponent, is not co-extensive with the focus domain, unlike in (1) and (2) (Féry, 1993; Selkirk, 1984, 1995; Uhmann, 1991). So far, we have provided examples in English. The German translation of the answer in (3), given in (4), has the accent on ‘Mary’, even though its position in the phrase differs from that in the English example. This is because the focus exponent in both German and English is generally on an argument of the verb (Ladd, 2008), in this case ‘Mary’, rather than being strictly positionally defined (the last lexical item in the phrase), as has been claimed for English (Crystal, 1969; Cruttenden, 1997; Halliday, 1970).

Q: Was gibt’s?
A: [Ich wollte MAry treffen]_focus (4)
In autosegmental–metrical phonology, accentuation is conceived of in discrete terms: a word is either accented or not. For a word to bear a pitch accent, it has to have a particular structural prominence within the prosodic hierarchy. By definition, the nuclear pitch accent is both the structurally most prominent accent and the last in the phrase. Furthermore, this nuclear accent (also referred to variously as sentence accent, focal accent and tonic) is obligatory, whereas prenuclear accents tend to be regarded as optional. The pitch accent is borne by the syllable with the main word stress. What does it mean for a syllable to bear a pitch accent? For West Germanic languages, all classed as having stress accent (Beckman, 1986), accentuation involves not only a pitch target or movement, but also an expansion of the syllable in a number of dimensions: accented syllables are longer, louder and have less vowel quality reduction than their unaccented counterparts (Beckman, 1986; Crystal, 1969; Lehiste, 1996). A number of kinematic studies have compared the supralaryngeal articulation of accented and unaccented syllables, finding that accented syllables show more distinctive articulation (overshoot) than unaccented ones (undershoot), leading to an increase in kinematic parameters under accent (for English: Beckman, Edwards & Fletcher, 1992; Cho, 2005, 2006; de Jong, 1995; de Jong, Beckman & Edwards, 1993; Harrington, Fletcher & Beckman, 2000; Harrington, Fletcher & Roberts, 1995; for Italian: Avesani, Vayra & Zmarich, 2007). These studies conclude that accentuation, as a structural property assigned by the prosodic hierarchy to a word or its lexically stressed syllable, leads to an increase in supralaryngeal parameters (longer, larger and faster movements), which goes hand in hand with a modification at the laryngeal level affecting the pitch contour. That is, these studies confirm that accentuation (in the languages investigated) leads to both supralaryngeal and laryngeal modifications.
However, the kinematic studies mentioned above have been mainly restricted to the investigation of focus structures with extremely different degrees of prominence, i.e. nuclear contrastively focussed constituents vs. those out of focus and consequently unaccented. Thus, the results obtained may simply reflect the effect of contrast, rather than accent, on supralaryngeal kinematics. Such an interpretation is supported by results from studies on French and Korean (Dohen & Loevenbruck, 2005; Dohen, Loevenbruck & Hill, 2006; Cho, Lee & Kim, 2011), languages that do not have lexical stress or pitch accents comparable to those of English, German and Italian. These studies have shown that contrastive focus in itself can involve articulatory modifications. In a pilot study with three speakers, Hermes, Becker, Mücke, Baumann and Grice (2008) compared lip kinematics in different focus structures in German. They found systematic differences in a number of articulatory parameters when comparing tokens in broad vs. contrastive focus, but not between broad and narrow focus. However, they did not compare the kinematics of unaccented target words (out of focus) with those of accented target words in broad focus (referred to as Across-accentuation below). Therefore, it is still unclear whether accentuation per se involves changes in the supralaryngeal system. In this study we are concerned with nuclear accents and with the question as to whether the differences found in supralaryngeal articulation are directly related to the phonological specification for accentuation, or whether they express focus structure, or contrast, directly. That is, we ask whether reported kinematic modifications in the supralaryngeal system are simply concomitants of accentuation or are related directly to the expression of focus structure. To address this question, we conducted a study in which we compared four different focus conditions.
These entail differing degrees of prominence. To our knowledge, the present study is the first to systematically investigate the articulation of more subtle differences in focus structure (rather than simply background vs. focus, we looked at four levels: background, broad focus, narrow focus and contrastive focus). The design of this study enables us to compare the articulatory characteristics of the (unaccented but lexically stressed) syllable in the background condition not only with accented syllables in a contrastive focus condition, but also with accented syllables in less prominent conditions (narrow and broad focus).¹ We refer to the comparison of background with the three in-focus conditions as Across-accentuation (see Fig. 1(a)). We also compare the articulation of accented syllables in different focus structures (broad, narrow and contrastive focus). This is the Within-accentuation comparison (see Fig. 1(b)). The type of pitch accent, i.e. its tonal makeup (H*, L+H* and so on), is a paradigmatic choice for prenuclear and nuclear accents alike. It has been argued that there are considerable prominence differences across accent types. Differences have largely been drawn in relation to the height of the F0 target corresponding to the starred tone. Thus H*, with a ‘high’ target, is considered more prominent than L*, with a ‘low’ target; see Pierrehumbert and Hirschberg (1990) and Breen, Fedorenko, Wagner and Gibson (2010) for English; Kohler (1991), Grice, Baumann, and Benzmüller (2005), Baumann and Grice (2006) and Baumann (2006) for German. Since our main interest in this paper lies in supralaryngeal articulation, we shall restrict our discussion of pitch accent type to a summary across speakers and conditions, leaving phonetic details within each accent type for a separate paper.

¹ We take narrow and broad focus to be less prominent, since they both involve an alternative set that is unspecified, whereas the contrastive focus condition involves an explicit alternative that is negated.

In the present study, we investigate prosodically conditioned variations in lip aperture kinematics in German for the target syllables /baː/, /biː/ and /boː/ in the four different focus conditions mentioned above.

1.1. Prosodic strengthening: sonority expansion and hyperarticulation

Articulation is more distinct in prominent positions, resulting in temporal and/or spatial expansion of articulatory movements. This involves strategies such as sonority expansion and/or hyperarticulation (Beckman et al., 1992; Cho, 2006; de Jong, 1995; de Jong et al., 1993; Harrington et al., 2000), and is referred to as prosodic strengthening (Cho, 2006). The Sonority Expansion Hypothesis claims that the intrinsic sonority of a vowel is enhanced to strengthen syntagmatic contrasts, i.e. the contrast between vowels and consonants. Under accent, the speaker intends to produce a louder vowel by opening the mouth wider over a longer time. A more open oral cavity allows for greater radiation from the lips, leading to an increase of the overall acoustic energy. However, enhancing the sonority of a high vowel, which requires a narrow constriction in the oral cavity, is more restricted than enhancing that of a low vowel (see comparisons of target syllables containing /a/ and /i/ in Cho, 2006; Harrington et al., 2000). Therefore, de Jong et al. (1993) suggest that, besides spatial modifications, the temporal dimension (in terms of change of vowel duration) plays an important role in giving the impression of enhanced sonority. The speaker intends to produce a louder vowel by opening the oral cavity over a longer time: “Given that intensity is perceptually integrated over time at short durations of speech, sonority should be integrated over time.
Maintaining an open oral cavity over a longer duration should augment the percept of increased radiation power at the lips.” (de Jong et al., 1993:205). The strategy of hyperarticulation, also referred to as localised hyperarticulation (de Jong, 1995, based on the H & H model, Lindblom, 1990), involves the enhancement of contrastive features as well as of sonority. A low vowel is produced with a lower tongue position, a front vowel with a more fronted tongue position (Cho, 2005; Harrington et al., 2000) and a back vowel with a more retracted tongue position (de Jong, 1995; de Jong et al., 1993). However, speakers differ in the choice of the feature they enhance to strengthen the contrast. Harrington et al. (2000) investigated the hyperarticulation of the front vowel /i/ in contrastively focussed tokens vs. those produced out of focus. They found two different manoeuvres for producing the same sharp timbre for /i/ in auditory space: under accent, one speaker produced fronter constrictions of the tongue dorsum, while the other produced narrower constrictions. In vowel production, hyperarticulation mainly involves the lingual system, producing the vowel with more extreme tongue positions, whereas sonority expansion primarily involves the mandibular and labial systems (Cho, 2005: 3868): opening the mouth wider increases the radiation of overall energy. Since the present study involves lip kinematics, it mainly deals with sonority expansion. However, sonority expansion can also be seen as a variant of localised hyperarticulation (Cho, 2005; de Jong et al., 1993; Harrington et al., 2000) in the sense that sonority expansion enhances properties of the vowel's sonority feature. That is, sonority expansion is phonologically driven to enhance linguistic contrasts and is therefore related to feature enhancement (Cho, 2005, 2006; Fougeron & Keating, 1997). Fig. 2 provides a schematic representation of the two strategies and how they can be characterised.

1.2. Articulatory control parameters

Speakers differ in the way they modify their articulation under accent (or contrast). This has been shown for lip kinematics by de Jong (1995), Dohen and Loevenbruck (2005), Dohen et al. (2006), Cho (2005, 2006), Avesani et al. (2007) and Hermes et al. (2008); for mandibular kinematics by Beckman et al. (1992), de Jong (1995), Harrington et al. (1995), Harrington et al. (2000) and Cho (2005); and for lingual kinematics by Harrington et al. (2000) and Cho (2005). Speakers differ in their use of a control strategy to enhance phonetic properties (Browman & Goldstein, 1986, 1992). In an investigation of Italian comparing lip aperture on words in contrastive focus and background, Avesani et al. (2007) found increased durations and larger movements into the accented syllable for both speakers in the study, while peak velocity was modified by only one speaker. Both speakers thus increased sonority (the lips were opened wide to enhance the radiation of overall intensity from the lips), but they did not achieve this in the same way.
Fig. 2. Localised hyperarticulation and sonority expansion.
Fig. 3. (a–d) Relations between duration, displacement and peak velocities as a consequence of different articulatory control parameters after Beckman et al. (1992: 71) and Cho (2002: 17).
Cho (2006: 522) points out that all kinematic variations in duration, peak velocity and displacement can be understood as being the direct consequence of abstract parameter settings, namely (a) target modification, (b) stiffness modification, (c) rescaling of movements and (d) coarticulatory overlap between two movements, see Fig. 3(a–d), described in detail below. (a) Target modification (Fig. 3a) involves a more extreme target for the movement of the articulators. de Jong (1995) claims for English vowels that changes of the underlying target in the lingual system are probably the most common way to enhance the strength of articulation. When changing the target, the peak velocity increases in proportion to the target value while the duration of the movements remains unchanged. A larger displacement is reached without extra time (larger and faster, but not longer movements). (b) Stiffness (Fig. 3b) is a very abstract control parameter. Reducing a gesture's underlying stiffness leads to slower movements: The distance the articulators travel remains the same, but it takes extra time to reach the target (not larger, but slower and longer movements). Different measures are proposed in the literature to calculate a gesture's stiffness from the physiological signal, although stiffness as a parameter has been recently criticised (Fuchs, Perrier & Hartinger, 2011). In the Munhall, Ostry, and Parush (1985) measure, stiffness is calculated as the ratio of peak velocity to the maximum displacement, a spatio-temporal measure (Beckman et al., 1992; Hawkins, 1992; Munhall et al., 1985; Roon, Gafos, Hoole & Zeroual, 2007). Byrd and Saltzman (1998) and Cho (2002, 2006) propose a purely temporal measure: the time-to-peak velocity involving the time from the start of the movement to its maximum velocity (the rise-time for gestural activation). As a result of decreasing the gesture's stiffness, the rise-time of the movement from the onset of activation to its peak velocity is longer. 
This latter measure will be used in the present study. (c) Rescaling of a movement (Fig. 3c) involves a proportional change of target and stiffness. In rescaled gestures, the articulators travel a larger distance with extra time, while the peak velocity remains the same (longer, larger but not faster movements). Displacement (target) increases and stiffness (peak velocity to maximum displacement ratio) decreases. Rescaling has been proposed by Harrington et al. (1995) as a control parameter frequently used by speakers for prosodic strengthening. (d) Coarticulatory overlap (Fig. 3d) concerns the intergestural timing between two adjacent movements, e.g. an opening and a closing one. When the closing gesture is timed later with respect to the opening movement, the two movements overlap less. As a result, the opening movement is larger and longer (but not faster). The duration is shorter than predicted by stiffness in the Munhall et al. (1985) sense (peak velocity to maximum displacement), but the same in Cho's (2002, 2006) time-to-peak velocity measure of stiffness. This strategy of coarticulatory overlap is proposed by Beckman et al. (1992) for jaw kinematics when comparing English accented and unaccented syllables, and it is also referred to in the literature as the degree of coarticulatory resistance when investigating the stabilisation or destabilisation of paradigmatic contrasts between two sounds (Iskarous & Kavitskaya, 2010). There is some debate in the literature as to whether target modification (Fig. 3a), rescaling (Fig. 3c) or gestural overlap (Fig. 3d) is the dominant strategy in producing kinematic changes under accent (Beckman et al., 1992; Cho, 2005, 2006; de Jong et al., 1993; Harrington et al., 1995; Harrington et al., 2000), while so far pure stiffness variations have not been reported for accentuation (pure stiffness variation implies no change in displacement). It is probably the case, as pointed out by Cho (2006), that modelling the influence of prosodic structure on the kinematic systems involves not one single dynamical control parameter but rather combinations of parameter settings. Furthermore, although the relationship between duration, displacement and peak velocity of movements is crucial for understanding the underlying parameters, the validation of these measures poses some problems. One such problem is discussed by Harrington et al. (1995) for rescaling and coarticulatory overlap, both of which lead to the same kinematic variations in the signal, i.e. larger and longer, but not faster movements. However, when stiffness is calculated in terms of time-to-peak velocity, it changes when the gesture is rescaled (c), while it remains constant when the gesture is truncated (d).
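The qualitative patterns in (a)–(c) can be reproduced with a very simple dynamical model. The Python sketch below is an illustration only: it assumes a single critically damped mass-spring gesture starting at rest (the paper does not commit to a specific model), defines "duration" as the time needed to cover 95% of the displacement (an arbitrary criterion), and uses hypothetical parameter values. In this model the peak velocity is A·ω/e and the time-to-peak velocity is 1/ω, so changing the target A scales peak velocity without affecting timing, whereas changing stiffness (ω) alters timing without affecting displacement. Coarticulatory overlap (d) involves the timing of two gestures and is not modelled here.

```python
import math

def _tau_at_fraction(fraction: float = 0.95) -> float:
    """Dimensionless time tau = omega * t at which a critically damped gesture,
    starting at rest, has covered `fraction` of its displacement.
    Solves (1 + tau) * exp(-tau) = 1 - fraction by bisection."""
    lo, hi = 0.0, 50.0
    for _ in range(200):
        mid = (lo + hi) / 2
        if (1 + mid) * math.exp(-mid) > 1 - fraction:
            lo = mid  # criterion not yet reached: it lies later
        else:
            hi = mid
    return (lo + hi) / 2

TAU_95 = _tau_at_fraction(0.95)  # about 4.74

def gesture_kinematics(displacement_mm: float, omega: float) -> dict:
    """Kinematics of x(t) = A * (1 - (1 + omega*t) * exp(-omega*t)).

    A is the distance to the target (mm); omega = sqrt(stiffness / mass) in rad/s.
    The velocity A * omega**2 * t * exp(-omega*t) peaks at t = 1/omega."""
    return {
        "displacement_mm": displacement_mm,
        "peak_velocity_mm_s": displacement_mm * omega / math.e,
        "time_to_peak_ms": 1000.0 / omega,
        "duration_ms": 1000.0 * TAU_95 / omega,  # time to 95% of the displacement
    }

# Hypothetical baseline and the three single-parameter manipulations (a)-(c)
baseline = gesture_kinematics(10.0, 40.0)
target_mod = gesture_kinematics(15.0, 40.0)              # (a) more extreme target
low_stiffness = gesture_kinematics(10.0, 30.0)           # (b) reduced stiffness
rescaled = gesture_kinematics(15.0, 40.0 * 10.0 / 15.0)  # (c) larger target, lower stiffness
```

In this single-gesture model the Munhall et al. (1985) ratio of peak velocity to displacement equals ω/e, so it moves in lockstep with the time-to-peak measure; the two stiffness measures can only diverge when something outside a single gesture shapes the movement, for instance truncation by overlap (d).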
2. Method

2.1. Speakers and recordings

We recorded five native speakers of Standard German from north of the Benrather isogloss (F1, F3 and M2 were from the Franconian area, and F2 and M1 from the Western Low German area). Three of the speakers were female (F1, F2, F3) and two were male (M1, M2). They were aged between 22 and 37 years. The recordings took place at the IfL-Phonetics laboratory in Cologne. The kinematic and acoustic data were recorded simultaneously. The kinematic data were recorded using a 2-D Electromagnetic Articulograph (Carstens AG 100). Sensors were placed on the vermilion border of the upper and lower lips (mid-point of the lips). Two additional sensors on the bridge of the nose and the upper gums served as references for correcting dynamic head movements. A bite plate measurement was used to rotate the data to the occlusal plane. All kinematic data were recorded at 500 Hz, downsampled to 200 Hz and smoothed with a 40 Hz low-pass filter. They were subsequently converted to SSFF format using custom software² for display in the EMU speech database system (Cassidy & Harrington, 2001). The acoustic data were recorded with a DAT recorder (TASCAM DA-P1) using a condenser microphone (AKG C420 headset), sampled at 44.1 kHz, 16 bit.

2.2. Speech material

The current study manipulates the focus structure of sentences by means of contextualising questions, forming question–answer pairs similar to those given in examples (1)–(4) above. The target word is the final argument of the sentence, corresponding to the default position for the nuclear pitch accent in German (see above). Four different focus structures were elicited: the target word occurred either as part of the background or in broad, narrow or contrastive focus. The target words, i.e. the fictitious names after the title ‘Doktor’, were always disyllabic, with the stressed syllable containing one of the four long target vowels /iː/ (/biːbɐ/), /aː/ (/baːbɐ/), /uː/ (/buːbɐ/) and /oː/ (/boːbɐ/). An example of a set of question–answer pairs for the target word /baːbɐ/ is given in Fig. 4. Subjects listened to the questions, which were presented both visually and auditorily. They were instructed to read out the answers to these questions in a contextually appropriate manner at a speaking rate they considered to be normal. At the beginning of each recording session, a test block of five question–answer pairs was recorded that did not go into the analysis. Question–answer pairs were randomized to avoid repetitions in sequences. In total, 560 tokens were recorded (4 target words × 4 focus structures × 7 repetitions × 5 speakers). The current study is restricted to the analysis of target words containing /iː/, /oː/ and /aː/ in the stressed syllable. Target words containing /uː/ did not go into the analysis, since we were unable to reliably identify the turning points in lip aperture because the degree of maximum lip aperture was small.

2.3. Labels and measurements

Tonal labels were placed by two independent transcribers using the acoustic waveform and F0 contours in PRAAT (Boersma & Weenink, 2010). The transcribers had no access to the articulatory trajectories during intonation labelling. Accented target words were labelled with one of three different GToBI accent types³ (Grice, Baumann, & Benzmüller, 2005): H+!H*, H* and L+H*, as presented schematically in Fig. 5a–c. In all cases the accent was followed by a low boundary tone sequence, labelled L-L%. When the target word was unaccented, it was marked with ‘Ø’. Inter-transcriber agreement was 84%. Where the two transcribers' annotations differed, a consensus transcription was reached after listening to the utterance together. Acoustic and kinematic data were labelled by hand using the EMU speech database system (Cassidy & Harrington, 2001).
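The inter-transcriber agreement reported above is a single raw percentage. A minimal Python sketch of how such a figure, and the list of tokens to be resolved by joint listening, could be computed is given below; it assumes simple percent agreement over token-aligned label lists (whether the 84% was computed exactly this way is an assumption, and chance-corrected measures such as Cohen's kappa are not reported here). The function names and labels are hypothetical.

```python
def percent_agreement(labels_a, labels_b):
    """Raw inter-transcriber agreement: % of tokens with identical labels."""
    if len(labels_a) != len(labels_b) or not labels_a:
        raise ValueError("expected two non-empty label lists of equal length")
    same = sum(a == b for a, b in zip(labels_a, labels_b))
    return 100.0 * same / len(labels_a)

def tokens_for_consensus(labels_a, labels_b):
    """Indices of tokens whose labels differ, to be resolved by joint listening."""
    return [i for i, (a, b) in enumerate(zip(labels_a, labels_b)) if a != b]

# Hypothetical GToBI labels for five target words ("Ø" = unaccented)
transcriber_1 = ["H*", "L+H*", "Ø", "H+!H*", "H*"]
transcriber_2 = ["H*", "H*", "Ø", "H+!H*", "H*"]
agreement = percent_agreement(transcriber_1, transcriber_2)
to_resolve = tokens_for_consensus(transcriber_1, transcriber_2)
```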
For the acoustic analysis, we used the waveform and spectrogram to annotate the target word and the stressed syllable (segmentation) for each token. Recall that in the current data the foot corresponds to the word.
(a) Target foot/word: the acoustic duration from the onset of the initial consonant (decrease in the amplitude of higher formants of the preceding vowel) to the offset of the vowel in the following unstressed syllable (cessation of periodicity in the waveform).
(b) Target syllable: the acoustic duration from the onset of the consonant (as in (a)) to the offset of the vowel (decrease in amplitude of higher formants).
In the kinematic analysis, the lip aperture index (LA; Byrd, 2000:6) was calculated as the Euclidean distance between the two sensors on the upper and lower lips, capturing movements in both the horizontal and vertical dimensions, see (5).

$$\text{lip aperture} = \sqrt{(\text{lip aperture}_x)^2 + (\text{lip aperture}_y)^2} \tag{5}$$
$$\text{lip aperture}_x = \text{upper lip}_x - \text{lower lip}_x, \qquad \text{lip aperture}_y = \text{upper lip}_y - \text{lower lip}_y$$

² EMA2SSFF (URL: http://phonetik.phil-fak.uni-koeln.de/50.html).
³ Note that the difference between H* and L+H* is the extent of the initial rise, which is larger in the latter.
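Equation (5) translates directly into code. The Python sketch below is illustrative; the function names and the rigid-transform helper are hypothetical and do not reproduce the authors' EMA2SSFF processing. It also illustrates a useful property: because lip aperture is a distance between two sensors, it is unaffected by rigid rotations and translations of the coordinate system, such as an occlusal-plane rotation or a head-movement correction (assumed here to be rigid), whereas its x and y components individually do change.

```python
import math

def lip_aperture(upper_lip: tuple[float, float], lower_lip: tuple[float, float]) -> float:
    """Lip aperture (mm) as in (5): Euclidean distance between the upper- and
    lower-lip sensors in the midsagittal (x, y) plane."""
    la_x = upper_lip[0] - lower_lip[0]
    la_y = upper_lip[1] - lower_lip[1]
    return math.hypot(la_x, la_y)

def rigid_transform(point, angle_rad, shift):
    """Hypothetical correction step: rotate a point by angle_rad, then translate it."""
    x, y = point
    c, s = math.cos(angle_rad), math.sin(angle_rad)
    return (c * x - s * y + shift[0], s * x + c * y + shift[1])

# Hypothetical sensor positions (mm)
upper, lower = (2.0, 10.0), (-1.0, 6.0)
la_raw = lip_aperture(upper, lower)
la_corrected = lip_aperture(rigid_transform(upper, 0.3, (5.0, -2.0)),
                            rigid_transform(lower, 0.3, (5.0, -2.0)))
```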
Fig. 4. Speech material example, target word (/baːbɐ/).
Fig. 5. (a–c) Schematic representation of the three different pitch accent types as presented in the GToBI online guidelines http://www.gtobi.uni-koeln.de. All accents were followed by a low boundary tone sequence, L-L%.
Fig. 6. Example label scheme for the lip opening gesture.
The main task was to identify kinematic labels corresponding to the lip opening gesture in the stressed syllable, i.e. the movement from maximum lip closure in the consonant to maximum lip opening in the vowel. The points of minimum (min) and maximum (max) lip aperture of the opening gesture were identified at zero-crossings in the velocity trace. Additionally, we labelled the point of maximum velocity (peak velocity, pVel) at the zero-crossing in the acceleration trace. The labels and measures are illustrated in Fig. 6. Based on the time and magnitude values of the labels of the lip opening movement (min, pVel, max), the following measures were computed:
(a) The duration of the opening movement from min to max, in ms.
(b) The maximum displacement of the opening movement from min to max, in mm.
(c) The peak velocity (pVel) of the opening movement, in mm/s.
(d) The stiffness of the opening movement, computed as the interval from the start of the movement to its peak velocity (time-to-peak velocity; time2peak) (Cho, 2002, 2006; compare also the discussion in Section 1.2).
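The labelling steps and measures (a)–(d) can be approximated automatically. The stdlib-only Python sketch below rests on several assumptions: the input is a lip aperture trace in mm, already sampled at 200 Hz and smoothed; velocity is estimated by central differences; min and max are taken at the velocity zero-crossings bracketing the first opening movement; and pVel is taken as the fastest sample between them, which for a single-peaked velocity profile coincides with the acceleration zero-crossing used here. The function names and the synthetic raised-cosine trajectory are hypothetical. Hand labelling, as in this study, remains the reference; an automatic version would need extra checks for noisy or multi-peaked velocity traces.

```python
import math

FS_HZ = 200.0  # kinematic sampling rate after downsampling (Section 2.1)

def derivative(signal, fs=FS_HZ):
    """First derivative per second: central differences, one-sided at the edges."""
    n = len(signal)
    out = []
    for i in range(n):
        lo, hi = max(i - 1, 0), min(i + 1, n - 1)
        out.append((signal[hi] - signal[lo]) * fs / (hi - lo))
    return out

def _closer_to_zero(trace, i):
    """Of the two samples bracketing a zero-crossing, keep the one nearer zero."""
    return i - 1 if abs(trace[i - 1]) < abs(trace[i]) else i

def opening_gesture_measures(la_mm, fs=FS_HZ):
    """Measures (a)-(d) for the first opening movement in a lip aperture trace (mm).

    Raises StopIteration if no complete opening movement is found."""
    vel = derivative(la_mm, fs)
    # min: velocity turns from closing (negative) to opening (positive)
    i_min = _closer_to_zero(
        vel, next(i for i in range(1, len(vel)) if vel[i - 1] < 0 <= vel[i]))
    # max: the next turn from opening (positive) back to closing (negative)
    i_max = _closer_to_zero(
        vel, next(i for i in range(i_min + 1, len(vel)) if vel[i - 1] > 0 >= vel[i]))
    # pVel: fastest opening sample between min and max
    i_pvel = max(range(i_min, i_max + 1), key=lambda i: vel[i])
    ms_per_sample = 1000.0 / fs
    return {
        "duration_ms": (i_max - i_min) * ms_per_sample,       # (a)
        "displacement_mm": la_mm[i_max] - la_mm[i_min],       # (b)
        "peak_velocity_mm_s": vel[i_pvel],                    # (c)
        "time_to_peak_ms": (i_pvel - i_min) * ms_per_sample,  # (d) stiffness
    }

# Hypothetical test trajectory: raised cosine, min at sample 20, max at sample 70
la = [8.0 - 5.0 * math.cos(4 * math.pi * ((k - 20) / FS_HZ)) for k in range(101)]
measures = opening_gesture_measures(la)
# duration 250 ms, time-to-peak 125 ms, displacement ~10 mm, peak velocity ~62.8 mm/s
```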
3. Results

3.1. Intonation

Our results confirm that the target words in the three focus conditions were accented (i.e. produced with a nuclear pitch accent), whereas those in the background condition were not, see Fig. 7. The figure also shows that there were clear preferences for particular accents in different focus types.
Fig. 7. Distribution of pitch accents on the target words as percentage according to focus conditions.
Table 1
Distribution of pitch accents on the target words as percentage according to focus condition, separately for each speaker (values in %; ‘–’ is 0%).

Speaker  Focus        No accent  H+!H*  H*   L+H*
F1       Background   100        –      –    –
F1       Broad        –          95     5    –
F1       Narrow       –          14     24   62
F1       Contrastive  –          –      5    95
F2       Background   100        –      –    –
F2       Broad        –          100    –    –
F2       Narrow       –          50     20   30
F2       Contrastive  –          –      –    100
F3       Background   100        –      –    –
F3       Broad        –          10     90   –
F3       Narrow       –          –      100  –
F3       Contrastive  –          –      100  –
M1       Background   100        –      –    –
M1       Broad        –          95     5    –
M1       Narrow       –          –      24   76
M1       Contrastive  –          –      19   81
M2       Background   100        –      –    –
M2       Broad        –          –      100  –
M2       Narrow       –          –      61   39
M2       Contrastive  –          –      11   89
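Table 1 and Fig. 7 report relative frequencies of accent labels within speaker × focus cells. A minimal Python sketch of that tabulation is shown below, assuming the labels are available as (speaker, focus, accent) triples; this format and the example labels are hypothetical and do not represent the EMU database structure or the study's data. Pooling across speakers, as in Fig. 7, amounts to passing the same speaker value for every token.

```python
from collections import Counter, defaultdict

ACCENT_TYPES = ("Ø", "H+!H*", "H*", "L+H*")  # "Ø" = no accent, as in Section 2.3

def accent_distribution(tokens):
    """Percentage of each accent type per (speaker, focus) cell.

    tokens: iterable of (speaker, focus, accent) label triples.
    Labels outside ACCENT_TYPES still count towards the cell total.
    Percentages are rounded to whole numbers, so a row may not sum to exactly 100."""
    counts = defaultdict(Counter)
    for speaker, focus, accent in tokens:
        counts[(speaker, focus)][accent] += 1
    table = {}
    for cell, counter in counts.items():
        total = sum(counter.values())
        table[cell] = {acc: round(100 * counter[acc] / total) for acc in ACCENT_TYPES}
    return table

# Hypothetical labels for one speaker ("S1"), not the study's data
tokens = ([("S1", "Narrow", "H*")] * 3 + [("S1", "Narrow", "L+H*")]
          + [("S1", "Background", "Ø")] * 2)
table = accent_distribution(tokens)
```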
For instance, L+H* was most common for contrastive focus, whereas H+!H* was most common for broad focus. The pragmatically more neutral H* is used in all three conditions, whereas H+!H* is not used in contrastive focus at all. However, our results also clearly show that there is no one-to-one relationship between accent type and focus type. Moreover, it is important to point out that these results are pooled across all speakers. Looking at the individual results for each speaker, as shown in Table 1, we can see that speakers use different accenting strategies. Whereas F1 and F2 (and M1 to a slightly lesser extent) differentiate clearly between broad and contrastive focus (H+!H* and L+H*, respectively), F3 and M2 make more use of H* across the board. In fact, F3 makes almost exclusive use of H* and does not produce L+H* (not even in the contrastive focus condition). M2 conforms to the majority pattern in his marking of contrastive focus (predominance of L+H*), but does not produce H+!H* at all.

3.2. Supralaryngeal articulation: overall analysis

All analyses were performed using the statistical software package R. A series of repeated measures ANOVAs was conducted to test whether the acoustic and kinematic measures are affected by Focus Structure and Vowel Quality, followed by Tukey HSD post-hoc tests. Focus Structure (background, broad focus, narrow focus, contrastive focus) and Vowel Quality (/aː/, /iː/, /oː/) were included as independent variables and Speaker (F1, F2, F3, M1, M2) as a random factor. A total of 420 utterances went into the analysis (5 speakers × 3 vowels × 4 focus structures × 7 repetitions).

3.2.1. Overall acoustic analysis

Fig. 8 provides means and standard errors for the acoustic measures foot duration (which is also the word, as words are disyllabic trochees) and syllable duration. The analysis reveals a main effect of Focus Structure on foot duration [F(3, 12) = 20.44, p < 0.001] and on syllable duration [F(3, 12) = 19.05, p < 0.001].
Across-accentuation (i.e. comparing background to the other focus conditions, broad, narrow and contrastive focus), there is a clear difference between the maximally diverging focus structures, background and contrastive focus, with an average increase of 47 ms in foot duration and 31 ms in syllable duration. Furthermore, there is a difference between background and narrow focus, with an average increase of 27 ms in foot duration and 17 ms in syllable duration. However, when comparing less divergent focus structures, a different picture arises: Focus Structure failed to reach significance on the foot and syllable duration measures when comparing background and broad focus (post-hoc: background = broad focus […] B/iː/ber, p < 0.05), while for the syllable duration measure all three vowels had a significantly different duration (post-hoc: /baː/ > /boː/ > /biː/, p < 0.05).

Fig. 8. Means and standard errors for the acoustic measures (a) foot and (b) syllable duration for each focus condition (background, broad focus, narrow focus, contrastive focus) and vowel condition B/aː/ber, B/iː/ber and B/oː/ber, across all speakers.

3.2.2. Overall kinematic analysis

Fig. 9(a–d) displays means and standard errors for the kinematic measures across all speakers. We found a main effect of Focus Structure on all opening gesture measures: duration [F(3, 12) = 16.49, p < 0.001], displacement [F(3, 12) = 9.445, p < 0.01], peak velocity [F(3, 12) = 5.441, p < 0.05] and stiffness (time-to-peak velocity) [F(3, 12) = 5.756, p < 0.05]. Across-accentuation, we observed longer, larger and faster movements when comparing the maximally diverging focus structures background and contrastive focus, i.e. the opening gesture increased on average by 27 ms in duration, 4 mm in displacement and 52 mm/s in peak velocity, while stiffness (time-to-peak velocity) remained the same. Furthermore, when comparing background and narrow focus, we found longer and larger movements of the opening gesture (on average 11 ms longer and 2 mm larger), while stiffness (time-to-peak velocity) and peak velocity were unchanged. However, when comparing background and broad focus (also across accentuation, but a less divergent focus structure), none of the kinematic measures reached significance (p > 0.05, n.s.).
Within-accentuation, we also found different effects of Focus Structure on the kinematic measures. Duration and displacement of the opening gesture increased in all cases from broad through narrow to contrastive focus (p < 0.05; post-hoc: broad focus […] 0.05). Moreover, stiffness (time-to-peak velocity) systematically decreased when comparing broad vs. contrastive focus (p < 0.05), but not when comparing the other focus conditions (narrow vs. broad focus; narrow vs. contrastive focus, p > 0.05). The main factor Vowel Quality affected the kinematic measures duration ([F(2, 8) = 27.16, p < 0.01], post-hoc: /baː/ > /boː/ > /biː/), displacement ([F(2, 8) = 95.23, p < 0.001], post-hoc: /baː/ > /biː/ > /boː/) and peak velocity ([F(2, 8) = 63.81, p < 0.001], post-hoc: /baː/ > /biː/ > /boː/). It did not affect stiffness, measured as the time from the minimum to the point of peak velocity ([F(2, 8) = 1.66, p = 0.249]). Furthermore, the analysis revealed an interaction between the main factors Focus Structure and Vowel Quality in all opening gesture measures except stiffness (duration: [F(6, 24) = 5.925, p < 0.001]; displacement: [F(6, 24) = 95.23, p < 0.001]; peak velocity: [F(6, 24) = 4.044, p < 0.01]; stiffness/time-to-peak velocity: [F(6, 24) = 1.134, p > 0.05]). When comparing maximally diverging focus structures, the strongest effects of prosodic strengthening were found for target words containing /aː/, with increases of 6 mm in displacement and 95 mm/s in peak velocity from background to contrastive focus. The strongest durational modifications were found for target words containing the rounded vowel /oː/, with an increase of 28 ms from background to contrastive focus.
3.3. Supralaryngeal articulation: speaker specific strategies
As is customary in articulatory studies, each speaker was also examined individually: a series of one-way ANOVAs was performed to determine the speakers' individual contributions to the overall results.
For multiple testing, the level of significance was corrected by dividing α by the
Fig. 9. Means and standard errors for the kinematic measures of the opening gesture: (a) duration, (b) maximum displacement, (c) peak velocity and (d) stiffness (time-to-peak velocity), separately for each focus condition (background, broad focus, narrow focus, contrastive focus) and vowel condition B/aː/ber, B/iː/ber and B/oː/ber, across all speakers.
Table 2 Acoustic foot and syllable duration measures for test words (B/aː/ber, B/iː/ber, B/oː/ber), for each speaker separately; ‘n’ = significant effects at p < 0.05; Ø = background, B = broad focus, N = narrow focus, C = contrastive focus; all = /aː, iː, oː/. Speaker
Across-accentuation Ø vs. C
F1 F2 F3 M1 M2
Within-accentuation Ø vs. N
Ø vs. B
B vs. C
B vs. N
N vs. C
Foot
Syll.
Foot
Syll.
Foot
Syll.
Foot
Syll.
Foot
Syll.
Foot
Syll.
n
n
n
n
n
n
n
n
n
n
– – n /iː/ n /iː/ –
n
n
– – – n /iː/ –
n
n
all all n all n all n /aː,oː/
all all n /iː,oː/ n all n all
all /iː/ – n /aː,iː/ n /aː,oː/ n
all – – n /iː/ n /aː,oː/
all – – – n /aː/
/aː,iː/ – – – –
all all n all n all n /oː/
all all n all n all n /iː,oː/
/iː,oː/ all n /iː/ n /oː/ – n
/iː,oː/ – n /oː/ n /oː/ –
number of comparisons (Bonferroni correction); equivalently, the p-values presented here are corrected values, tested against a significance level of p < 0.05. The speakers' individual realisations were analysed in terms of (i) Across-accentuation comparisons and (ii) Within-accentuation comparisons. The statistical results are displayed in Tables 2, 3 and 4, with asterisks marking significant effects (p < 0.05). An overview of all means is given in Tables A1 and A2.
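The Bonferroni correction can be stated in two equivalent ways. One is to compare each raw p-value with α/m, where m is the number of comparisons. The other is to multiply each p-value by m (capping the result at 1) and compare it with α. The second form matches reporting corrected p-values against p < 0.05. The minimal Python sketch below illustrates this. The function names are hypothetical, and the p-values in the example are invented, not values from this study.

```python
def bonferroni_adjust(p_values):
    """Return Bonferroni-corrected p-values: p * m, capped at 1.0 (m = number of tests)."""
    m = len(p_values)
    return [min(1.0, p * m) for p in p_values]


def significant(p_values, alpha=0.05):
    """Flag tests whose corrected p-value falls below alpha."""
    return [p_adj < alpha for p_adj in bonferroni_adjust(p_values)]


# Hypothetical raw p-values for three pairwise comparisons (m = 3).
raw = [0.010, 0.020, 0.300]
print(bonferroni_adjust(raw))
print(significant(raw))
# Equivalent decision rule: compare raw p-values with alpha / m.
print([p < 0.05 / len(raw) for p in raw])
```

With m = 3, only the first comparison survives. Its raw p-value of 0.010 is below 0.05/3 ≈ 0.0167, or equivalently its corrected value 0.03 is below 0.05.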
Table 3 Across-accentuation results of kinematic measures (duration, displacement, peak velocity, stiffness) for target words (B/aː/ber, B/iː/ber, B/oː/ber), for each speaker separately; ‘⁎’ = significant effects at p < 0.05, ‘–’ = results not reaching significance; Ø = background, B = broad focus, N = narrow focus, C = contrastive focus; all = /aː, iː, oː/. Speaker
Across-accentuation Ø vs. C
F1 F2 F3 M1 M2
Ø vs. N
Ø vs. B
Longer
Larger
Faster
Less stiff
Longer
Larger
Faster
Less stiff
Longer
Larger
Faster
Less stiff
⁎ ⁎all all ⁎– ⁎all all
⁎ ⁎all ⁎all ⁎/aː/ all –
⁎ ⁎all /aː, oː/ ⁎– ⁎all x1 /oː/
⁎ ⁎/iː/ ⁎/iː/ /oː/ –⁎ /oː/
⁎ all – –⁎ /aː,iː/ –
⁎
⁎
⁎
⁎
⁎ /aː,iː/ – ⁎– /iː/ –
⁎ /aː,iː/ – ⁎– /iː/ –
⁎ /iː/ ⁎– /aː/ – –
– ⁎– –
/aː,iː/
all
– ⁎– –
/aː,iː/
/aː,iː/
– – – –
/iː/
– – – –
/iː/
x1, slower instead of faster movement.
Fig. 10. Averaged trajectories for the target words B/aː/ber, B/iː/ber and B/oː/ber, separately for each speaker (F1, F2, F3, M1, M2) with different focus structures. All trajectories are aligned with the acoustic beginning of the target word.
3.3.1. Speaker specific strategies: acoustic analysis
(i) Across-accentuation comparisons: When comparing background vs. contrastive focus (maximally diverging focus structures), there is a consistent increase in foot and syllable durations for all speakers and all vowel conditions, with only two exceptions, each involving one speaker and one vowel only. This is not the case when comparing background and broad focus (unaccented vs. a non-contrastive, less prominent accent), where only one speaker (F1) produced systematic differences. When comparing background with narrow focus, we found that strategies were highly speaker dependent: while speaker F1 produced systematic differences in all conditions, speaker F3 produced none, and the other speakers showed modifications in some target words but not in others. (ii) Within-accentuation comparisons: We found a consistent difference between broad and contrastive focus for all speakers and conditions, with very few exceptions. However, the picture is different when comparing neighbouring focus structures. When comparing broad vs. narrow focus, there were only sporadic differences; when comparing narrow focus and contrastive focus, there were more differences, but not across the board (they were speaker-specific and restricted to isolated vowels).
3.3.2. Speaker specific strategies: kinematic analysis
Averaged trajectories plotted separately for each speaker and target word (Fig. 10) illustrate how all speakers modify the supralaryngeal system when producing target words in different focus structures. They also highlight considerable differences across speakers: F1 and M1 generally show stronger modifications in the spatial domain (displacement) and greater differences between the conditions than speakers
Table 4 Within-accentuation results for kinematic measures (duration, displacement, peak velocity, stiffness) for target words (B/aː/ber, B/iː/ber, B/oː/ber), for each speaker separately; ‘⁎’ = significant effects at p < 0.05, ‘–’ = results not reaching significance; B = broad focus, N = narrow focus, C = contrastive focus; all = /aː, iː, oː/. Opening gesture
Within-accentuation B vs. C
F1 F2 F3 M1 M2
B vs. N
Longer
Larger
Faster
⁎ ⁎all ⁎/aː,iː/ ⁎all ⁎all /iː/
⁎ ⁎all ⁎/aː,iː/ ⁎/oː/ ⁎all /aː/
⁎ ⁎/aː,iː/ /aː/ ⁎– /aː,oː/ –
N vs. C
Less stiff
Longer
Larger
Faster
– –⁎ /iː,oː/ – –
⁎ /oː/ – – – –
⁎ /aː,iː/ – – – –
⁎ /aː,iː/ – – – –
Less stiff – – ⁎– /aː/ –
Longer ⁎– – – –
/iː/
Larger
Faster
Less stiff
⁎ ⁎/aː,iː/ /aː/ ⁎– /aː,oː/ –
⁎ ⁎/aː,oː/ /aː/ ⁎– /aː,oː/ –
– ⁎– /oː/ – –
F2, F3 and M2. Furthermore, while the spatial modifications are greatest in […] and smallest in […], all target words show systematic durational modifications. (i) Across-accentuation comparisons: In the individual analysis, most modifications were found when comparing the most divergent focus structures, background vs. contrastive focus. Speakers F1, F2 and M1 produced longer, larger and faster, but not less stiff, movements in most conditions. The modifications of the other two speakers were less consistent across conditions, with the exception that speaker M2 produced longer, slower and less stiff movements for target words containing /oː/. When comparing the less divergent focus structures, background vs. broad focus, there are very few modifications of the kinematic patterns, despite the difference in accentuation (the former being unaccented). Speakers F2, F3 and M2 show no modifications at all, whereas the other two speakers (F1, M1) show modifications in some but not all vowels and parameters. The same picture arose when comparing background vs. narrow focus, where speakers F2, F3 and M2 also showed practically no kinematic modifications. Speakers F1 and M1 did produce more differences between background and narrow focus than between background and broad focus. However, these differences were less clear-cut (especially in terms of the individual vowels affected) than when comparing background and contrastive focus. (ii) Within-accentuation comparisons: Most modifications to the kinematic parameters (longer, larger and partially faster movements) were found when comparing broad and contrastive focus. By contrast, results were unsystematic and rarely reached significance when comparing narrow focus with the other two focus structures.
Comparing narrow focus to broad focus, only one speaker (F1) produced significant modifications. When comparing narrow and contrastive focus, four speakers showed modifications, albeit rather sporadic ones (except for /aː/, where speakers F1, F2 and M1 produced larger and faster movements).
4. Discussion
In terms of accentuation, all speakers placed pitch accents in the broad, narrow and contrastive focus conditions, while they placed no pitch accent when the target word was in the background condition, i.e. out of focus. This means there was a clear intonational distinction between the in-focus and out-of-focus conditions. Across the different in-focus conditions, there were differences in the distribution of accent types, with H+!H* predominantly used for broad focus and L+H* prevailing for contrastive focus. However, there were speaker specific strategies: some speakers did not make systematic use of both of these pitch accent types, and made frequent use of a more neutral H* accent. Two speakers made little or no use of H+!H*. In terms of supralaryngeal articulation, there was no clear split between the in-focus and out-of-focus conditions. Across-accentuation comparisons looked for distinctions between unaccented and accented target words. For all acoustic measures (foot and syllable duration) and for three of the four kinematic measures (duration, displacement and peak velocity of the opening gesture), these comparisons showed clear distinctions between background (out of focus) and contrastive focus, i.e. between the most divergent focus structures. This is in line with the literature on a number of languages (Avesani et al., 2007; Baumann, Grice & Steindamm, 2006; Beckman et al., 1992; Cho, 2005, 2006; de Jong, 1995; de Jong et al., 1993; Dohen & Loevenbruck, 2005; Dohen et al., 2006; Harrington et al., 1995; Harrington et al., 2000; Hermes et al., 2008; Kügler, 2008). The comparison between background (out of focus) and narrow focus led to similar results, in that distinctions were found for all acoustic measures and for the articulatory measures duration and displacement, although stiffness and peak velocity were unaffected.
These results alone could lead to the impression that background target words are distinct from those in focus by virtue of their being unaccented. Our results show, however, that this is not the case: Broad focus entails a pitch accent on the target word, but no systematic difference in any of the supralaryngeal parameters when compared to the unaccented background condition. This is true for both the acoustics and kinematics. Thus, the supralaryngeal system is not necessarily modified when a word is accented. If a word is the focus exponent of a larger constituent as is the case for broad focus, the intonational realisation is mediated by focus projection, which attenuates its prominence. That is, the word is less prominent than if it were the only word in focus. It is thus conceivable that the supralaryngeal modifications are a reflection of the prominence – also referred to as focal prominence (Beckman & Venditti, 2010) – of the target word rather than its accentuation. It is important to point out that the background condition does not just involve the absence of accent. The information structure overrides the default accent placement rules that would place the nuclear accent on the target word as the last argument of the verb (in this case the direct object). Instead the nuclear accent is placed on a focussed word earlier in the sentence (the subject), and the accent on the target word is suppressed, a process referred to as deaccentuation. Furthermore, there is a degree of prominence on the target word in the background condition, in that it bears a “phrase accent”, also known as a “phrase tone”. A phrase accent is not a fully-fledged pitch accent, but an edge tone that is stress-seeking (Grice, Baumann, & Benzmüller, 2005; Grice, Ladd, & Arvaniti, 2000). Nonetheless, in current models of German intonation, there is a consensus that postnuclear prominences are not to be treated as pitch accents. 
In a number of studies comparing accented and unaccented words, it is also the case that the words are referred to as deaccented rather than totally lacking in accent.
Within-accentuation comparisons looked for distinctions between target words that are in focus and thus all accented. A comparison between two accented conditions, broad and contrastive focus, showed a systematic increase in acoustic syllable and foot durations and in all four kinematic parameters (duration of the opening gesture, displacement, peak velocity, stiffness/time-to-peak velocity). However, results were rather mixed when comparing narrow focus to either of the other in-focus conditions. The overall analysis showed significant durational modifications in the acoustics, and kinematic modifications for two parameters (duration of the opening gesture and displacement). Nevertheless, inspection of the individual results revealed considerable variation across speakers for all of the parameters. This speaker specific behaviour is reflected in the mixed results reported in the literature. For instance, comparing narrow focus with broad or contrastive focus conditions, Eady, Cooper, Klouda, Mueller, and Lotts (1986) failed to find acoustic differences in English. Breen et al. (2010) did find such differences (longer word duration and greater energy), as did Féry and Kügler (2008), Kügler (2008) and Baumann, Grice, and Steindamm (2006) for German (longer word duration). Furthermore, narrow focus can be seen as implicitly contrasting an item with a set of alternatives (Krifka, 2007), which makes it difficult to draw a clear distinction between narrow focus and contrastive focus. The divergent results may also be related to the contexts used to elicit the data in each of the studies. We can model the differences in supralaryngeal articulation across the different focus structures using the following strategies in a mass-spring model. Comparisons to narrow focus: When comparing narrow focus to background, there are duration and displacement differences, while peak velocity and stiffness remain the same. This points to variation in gestural overlap.
In the background condition the opening gesture is truncated by the earlier activation of the closing gesture, in line with Harrington et al. (1995), who discuss truncation as a source of temporal and spatial changes when target words are unaccented. When comparing narrow focus to the other accented conditions, broad and contrastive focus, it appears to be intermediate in terms of the supralaryngeal parameters. Narrow focus shows less coarticulatory overlap than broad focus and more overlap than contrastive focus. Comparisons to contrastive focus: When comparing contrastive focus to either background or broad focus, we found changes in duration and displacement, as in the comparisons with narrow focus, but we also found changes in peak velocities. This indicates that multiple strategies are employed to express contrastive focus. Variation in gestural overlap cannot explain the increase in peak velocity. It is probable that there is additional target modification. These results are in line with de Jong (1995), who claims for English vowels that changes of the underlying target are probably the most common way to enhance the strength of articulation (involving increases in peak velocity and displacement). Comparisons between background and broad focus: We found no systematic differences either in the acoustics or in the kinematics between these two conditions, although target words in the background are unaccented, whilst those in broad focus are accented. The differences in supralaryngeal parameters cannot therefore be due to accentuation per se. When looking at the speakers' individual strategies, three out of five speakers' kinematic modifications of target words in different focus structures cannot be mapped onto a single parameter: Speakers F1, F2 and M1 often used a combination of longer, larger and faster movements indicating the use of truncation as well as of target modifications as discussed above. 
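The distinction drawn above between truncation (variation in gestural overlap) and target modification can be illustrated with a critically damped mass-spring system of the kind used in task dynamics (Hawkins, 1992; Browman & Goldstein, 1992). The Python sketch below is a minimal illustration under stated assumptions, not the authors' model. A gesture starts at rest and moves towards a target T with stiffness k = ω², so its velocity peaks at t = 1/ω with value Tω/e. Time-to-peak velocity, the stiffness measure used here, therefore depends only on stiffness. Truncation is modelled, hypothetically, as the following closing gesture cutting the movement off after peak velocity. All parameter values are invented for illustration and are not fitted to the data.

```python
import math


def position(t, target, omega):
    """Closed-form position (mm) of x'' = -omega**2 * (x - target) - 2 * omega * x',
    starting at rest at x = 0 (critical damping)."""
    return target * (1.0 - (1.0 + omega * t) * math.exp(-omega * t))


def velocity(t, target, omega):
    """Velocity (mm/s) of the same gesture: target * omega**2 * t * exp(-omega * t)."""
    return target * omega ** 2 * t * math.exp(-omega * t)


def opening_gesture(target, omega, t_end):
    """Measures for a gesture whose movement ends at t_end (s), e.g. when cut off
    by the onset of the next gesture (hypothetical truncation model)."""
    # Velocity rises until 1/omega, so the peak lies at 1/omega or at the cut-off.
    t_peak = min(1.0 / omega, t_end)
    return {
        "duration_ms": 1000.0 * t_end,
        "displacement_mm": position(t_end, target, omega),
        "peak_velocity_mm_s": velocity(t_peak, target, omega),
        "time_to_peak_ms": 1000.0 * t_peak,
    }


OMEGA = 1.0 / 0.060  # hypothetical stiffness: velocity peaks 60 ms into the gesture

scenarios = {
    # Hypothetical reference gesture towards a 12 mm target, lasting 120 ms.
    "reference": opening_gesture(target=12.0, omega=OMEGA, t_end=0.120),
    # Truncation: same target and stiffness, movement cut off 20 ms earlier.
    "truncated": opening_gesture(target=12.0, omega=OMEGA, t_end=0.100),
    # Target modification: larger target, same stiffness and duration.
    "larger target": opening_gesture(target=18.0, omega=OMEGA, t_end=0.120),
}

for name, m in scenarios.items():
    print(f"{name:14s} " + "  ".join(f"{k}={v:.1f}" for k, v in m.items()))
```

In this sketch, truncation shortens the gesture and reduces its displacement while leaving peak velocity and time-to-peak velocity unchanged, which is the pattern reported for background vs. narrow focus. A larger target raises displacement and peak velocity together at constant time-to-peak velocity. This is one way of obtaining the additional increase in peak velocity found for contrastive focus. Changing ω instead would shift time-to-peak velocity, i.e. the stiffness measure.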
There are only a few cases in which a speaker uses a single dynamical control parameter. This is in line with Cho (2006), who claims that a combination of different parameters is involved in the supralaryngeal prosodic strengthening of articulation. Most of the speakers use more than a single strategy across the different focus conditions and vowels, which indicates that we are probably dealing with redundancy in the different focus marking strategies. When comparing broad and contrastive focus, it is interesting to relate the supralaryngeal modifications of target words to individual speakers' intonation patterns. Take for example speaker F3. This speaker barely differentiated between broad and contrastive focus in terms of accent type, using the pragmatically neutral H* (90% of the time in contrastive focus and 100% of the time in broad focus). Nevertheless, this speaker showed significant durational modifications for the syllable and foot, as well as for two of the kinematic parameters, for all three vowels. This means that the failure to use a particular pitch accent type does not preclude other modifications to express focal prominence. Moreover, three of the five speakers showed a fairly consistent differentiation between contrastive focus and broad focus in terms of pitch accent type: F1, F2 and M1 predominantly used L+H* for contrastive focus and H+!H* for broad focus. F2 was the speaker who differentiated contrastive focus from broad focus most clearly (L+H* and H+!H* 100% of the time in the respective conditions). Even so, her supralaryngeal articulation did not make the clearest distinctions between these two conditions; here, F1 and M1 showed the strongest modifications in the spatial domain. This could be taken as an indication that there is a degree of compensation in one dimension for a lack of, or reduced, differentiation in another.
These results indicate that although there is clearly a relation between supralaryngeal modifications and accent type (as well as accentuation in general), there is also a degree of separation of these two levels. To some extent, we have to conclude that supralaryngeal modifications can be made regardless of whether or not a particular paradigmatic choice of pitch accent is made.
5. Conclusion
The strategies we have identified rely on data from lip kinematics. We can thus assume that the differences we found relate to sonority expansion. We cannot make any clear predictions concerning hyperarticulation, since we did not examine the lingual system. The Across-accentuation comparisons showed clear and systematic distinctions between the production of words in the maximally diverging focus structures, background (out of focus) and contrastive focus. There were few systematic differences between background and narrow focus, and even fewer between background and broad focus. This means that accented (broad focus) words were not systematically articulated differently from unaccented (background) words. In fact, there were only sporadic differences for isolated vowels, mostly for only one speaker. These results are summarised in Fig. 11(a), where a solid line, a dashed line or no line reflects the degree of differentiation found across the conditions. The Within-accentuation comparisons show a distinction between broad, narrow and contrastive focus. However, the distinction between narrow focus and either broad or contrastive focus was less systematic across individual speakers and vowels than the distinction between broad and contrastive focus. Fig. 11(b) sums up these results. It thus appears that the highlighting function of information structure, in terms of focus type and focus domain, can drive both laryngeal and supralaryngeal articulation. A less prominent but accented word (in our case broad focus, where the word is not the sole constituent
Fig. 11. Schematic summary (solid lines indicate differentiation of all acoustic parameters and at least two in the overall and individual analyses; dashed lines indicate differentiation in the overall analyses but mixed results in the individual analyses; no line indicates no differentiation in the overall or individual analyses, as is the case between background and broad focus).
under focus) involves modifications at the level of the larynx, in terms of a pitch peak, but not necessarily of the oral articulators such as the lips or the jaw. It follows, then, that prosodic strengthening is not simply a concomitant of accentuation, but is a means of highlighting a word in its own right. Furthermore, the clear distinction between broad focus and contrastive focus might be interpreted as reflecting a high degree of emphasis on the latter, commensurate with increased articulatory effort. The mixed results for narrow focus may reflect differing degrees of emphasis, depending on how the context was interpreted. In sum, we have shown that different focus structures can affect the presence or absence of accent and, as a tendency, the type of accent. We have also shown that supralaryngeal articulation, at least in terms of lip kinematics,⁴ may be related directly to the expression of focal prominence, and is not mediated by accentuation. German is a stress-accent language, in that it can modify laryngeal and supralaryngeal parameters together when accenting words (Beckman, 1986), but this is not a necessary condition. A more distinct articulation, in terms of longer, larger and faster movements, cannot automatically be expected when a word is accented. Rather, it depends on focus structure and contrastivity, which in turn involve a degree of emphasis. Thus, laryngeal and supralaryngeal modifications can be seen as being controlled in different dimensions.
Acknowledgements
The research reported here was funded by the Deutsche Forschungsgemeinschaft (DFG) under the project “TAMIS – Tonal and Articulatory Marking of Information Structure: Kinematic and Acoustic Correlates of Accentuation” (SPP1234: Phonological and phonetic competence: between grammar, signal processing, and neural activity). We are especially grateful to Henrik Niemann for his help with the statistical analysis of the articulatory data.
Appendix A
See Tables A1 and A2.

Table A1 Means and standard deviations (in parentheses) for the acoustic foot and syllable duration measures, in ms (ø = background, B = broad focus, N = narrow focus, C = contrastive focus). Standard deviations for the grand mean are computed on the basis of all utterances of the speakers, including repetitions.

Foot
Vowel  Focus  F1        F2        F3        M1        M2        Grand mean
/a/    ø      312 (15)  329 (8)   309 (5)   356 (21)  344 (11)  330 (22)
/a/    B      351 (15)  331 (11)  303 (10)  371 (6)   378 (24)  347 (31)
/a/    N      362 (20)  338 (7)   317 (24)  391 (20)  381 (16)  358 (32)
/a/    C      388 (15)  362 (14)  335 (13)  413 (12)  384 (9)   376 (29)
/i/    ø      294 (7)   309 (7)   292 (8)   325 (16)  333 (18)  311 (20)
/i/    B      330 (12)  315 (5)   285 (8)   332 (14)  334 (11)  319 (21)
/i/    N      346 (11)  326 (10)  299 (16)  375 (12)  352 (19)  340 (29)
/i/    C      375 (17)  342 (15)  321 (12)  380 (15)  353 (16)  354 (26)
/o/    ø      319 (16)  320 (17)  299 (11)  345 (10)  351 (17)  327 (24)
/o/    B      349 (15)  316 (10)  291 (13)  353 (19)  372 (29)  336 (34)
/o/    N      353 (8)   334 (11)  308 (16)  373 (22)  386 (21)  351 (32)
/o/    C      398 (11)  352 (11)  326 (11)  408 (26)  407 (22)  378 (37)

Syllable
Vowel  Focus  F1        F2        F3        M1        M2        Grand mean
/a/    ø      205 (11)  214 (10)  191 (3)   222 (16)  217 (13)  210 (16)
/a/    B      229 (13)  210 (9)   184 (14)  232 (9)   231 (14)  218 (22)
/a/    N      240 (17)  215 (6)   196 (16)  245 (25)  239 (14)  227 (25)
/a/    C      259 (9)   228 (12)  206 (10)  266 (13)  241 (8)   240 (24)
/i/    ø      162 (9)   178 (8)   159 (5)   179 (15)  176 (11)  171 (13)
/i/    B      184 (12)  179 (8)   149 (9)   182 (14)  177 (10)  174 (17)
/i/    N      198 (10)  184 (5)   162 (10)  210 (14)  192 (14)  189 (19)
/i/    C      220 (16)  196 (11)  174 (10)  214 (17)  197 (16)  200 (21)
/o/    ø      189 (14)  189 (12)  164 (5)   198 (13)  201 (12)  188 (17)
/o/    B      204 (12)  188 (11)  160 (9)   201 (13)  210 (22)  193 (23)
/o/    N      209 (9)   195 (6)   172 (13)  220 (17)  228 (16)  205 (23)
/o/    C      237 (8)   205 (6)   186 (9)   244 (18)  242 (14)  223 (26)
4 It remains to be investigated whether other aspects of supralaryngeal articulation (such as tongue and jaw movements) are related in such a direct way to the expression of focal prominence.
Table A2 Means and standard deviations in parentheses for the kinematic measures duration, displacement, peak velocity, and stiffness of the opening gesture (ø ¼background, B ¼broad focus, N¼narrow focus, C¼contrastive focus). Standard deviations for grand mean are computed on the basis of all utterances of the speakers including repetitions. Opening gesture
/a/
ø
B
Duration (ms) F1 F2 F3 M1 M2 grand mean
105 124 103 116 117 113
(5) (6) (3) (5) (6) (9)
117 121 100 119 121 116
(8) (7) (4) (3) (7) (10)
123 125 103 127 126 121
(12) (4) (9) (7) (8) (12)
132 131 111 129 127 126
(7) (7) (5) (3) (5) (9)
80 100 84 93 90 89
(6) (6) (1) (4) (5) (8)
93 100 79 98 90 92
(7) (5) (6) (6) (5) (9)
99 104 84 109 94 98
(4) (5) (4) (7) (6) (10)
10 11 12 11 16 12
(1) (1) (1) (2) (2) (3)
13 11 12 13 15 13
(1) (1) (2) (1) (2) (2)
16 12 13 17 17 15
(2) (1) (2) (3) (2) (3)
20 16 15 21 18 18
(3) (2) (1) (2) (1) (3)
6 5 8 6 8 6
(1) (1) (1) (1) (2) (2)
8 6 8 7 8 7
(1) (1) (2) (0) (1) (1)
10 6 8 9 9 8
(1) (1) (1) (1) (2) (2)
156 157 200 198 152 192
(15) (20) (17) (46) (26) (45)
199 153 204 218 232 201
(26) (18) (23) (18) (32) (35)
245 167 215 268 254 230
(28) (13) (26) (40) (27) (45)
293 209 223 348 265 268
(38) (25) (18) (48) (21) (59)
122 91 154 101 143 122
(11) (10) (25) (12) (36) (32)
156 95 159 138 149 139
(17) (20) (29) (17) (19) (31)
184 95 156 156 155 149
(18) (11) (20) (31) (31) (37)
59 78 60 60 66 64
(4) (7) (2) (6) (7) (9)
59 74 55 55 66 62
(4) (5) (3) (3) (3) (8)
62 78 57 62 67 65
(4) (6) (2) (3) (5) (9)
62 80 58 59 68 66
(2) (6) (2) (2) (8) (9)
52 66 51 55 53 55
(5) (4) (1) (10) (4) (8)
61 68 48 58 53 58
(5) (6) (5) (4) (4) (8)
60 70 52 65 53 60
(4) (5) (4) (9) (5) (9)
Displacement (mm) F1 F2 F3 M1 M2 grand mean Peak velocity (mm/s) F1 F2 F3 M1 M2 grand mean Time-to-peak velocity (ms) F1 F2 F3 M1 M2 Grand mean
/i/ N
ø
C
/o/
B
N
ø
C
B
108 112 89 111 100 104 11 7 9 9 9 9
(12) (6) (5) (10) (7) (12)
N
82 107 99 115 105 102
(9) (16) (4) (12) (9) (15)
C
93 110 89 113 122 105
(14) (3) (18) (8) (19) (18)
114 115 104 131 123 117 3 3 3 8 6 5
(15) (5) (12) (12) (17) (15)
127 119 114 145 142 129
(11) (5) (6) (17) (13) (16)
(1) (1) (1) (1) (1) (2)
4 4 4 12 6 6
(2) (1) (1) (1) (1) (3)
(1) (1) (1) (1) (2) (2)
2 2 2 5 6 4
(0) (1) (0) (2) (1) (2)
3 4 2 5 6 4
(0) (1) (2) (1) (1) (2)
199 103 170 154 158 157
(10) (9) (15) (13) (20) (35)
44 36 40 74 97 58
(10) (12) (9) (29) (14) (28)
51 51 29 81 77 58
(10) (14) (31) (11) (8) (25)
48 46 46 110 78 65
(16) (15) (16) (26) (19) (31)
69 54 47 181 67 84
(20) (10) (10) (31) (15) (53)
65 74 55 61 58 62
(4) (2) (4) (9) (3) (8)
44 57 58 62 52 55
(6) (14) (11) (22) (8) (14)
44 62 47 52 63 54
(10) (14) (16) (7) (23) (16)
52 68 54 67 65 61
(14) (14) (22) (8) (20) (17)
48 61 84 64 67 65
(15) (16) (17) (10) (25) (20)
References
Avesani, C., Vayra, M., & Zmarich, C. (2007). On the articulatory basis of prominence in Italian. In Proceedings of the 16th international congress of phonetic sciences (pp. 981–984). Saarbrücken, Germany.
Baumann, S. (2006). The intonation of givenness – evidence from German. Linguistische Arbeiten 508. Tübingen: Niemeyer.
Baumann, S., & Grice, M. (2006). The intonation of accessibility. Journal of Pragmatics, 38(10), 1636–1657.
Baumann, S., Grice, M., & Steindamm, S. (2006). Prosodic marking of focus domains – Categorical or gradient? In Proceedings of speech prosody 2006 (pp. 301–304). Dresden, Germany.
Beckman, M. E. (1986). Stress and non-stress accent. Dordrecht: Foris.
Beckman, M. E., Edwards, J., & Fletcher, J. (1992). Prosodic structure and tempo in a sonority model of articulatory dynamics. In G. J. Docherty, & D. R. Ladd (Eds.), Papers in laboratory phonology II: Gesture, segment, prosody (pp. 68–86). Cambridge: Cambridge University Press.
Beckman, M. E., & Venditti, J. J. (2010). Tone and intonation. In W. J. Hardcastle, J. Laver, & F. E. Gibbon (Eds.), The handbook of phonetic sciences (2nd ed.). Oxford: Wiley-Blackwell.
Boersma, P., & Weenink, D. (2010). Praat: Doing phonetics by computer (Version 5.1.30) [Computer program]. 〈http://www.praat.org/〉 Retrieved 29.05.10.
Breen, M., Fedorenko, E., Wagner, M., & Gibson, E. (2010). Acoustic correlates of information structure. Language and Cognitive Processes, 25(7), 1044–1098.
Browman, C. P., & Goldstein, L. (1986). Towards an articulatory phonology. In C. Ewen, & J. Anderson (Eds.), Phonology yearbook 3 (pp. 219–252). Cambridge: Cambridge University Press.
Browman, C. P., & Goldstein, L. (1992). Articulatory phonology: An overview. Phonetica, 49(3–4), 155–180.
Byrd, D. (2000). Articulatory vowel lengthening and coordination at phrasal junctures. Phonetica, 57, 3–16.
Byrd, D., & Saltzman, E. (1998). Intragestural dynamics of multiple phrasal boundaries. Journal of Phonetics, 26, 173–199.
Büring, D. (2003). On D-trees, beans, and B-accents. Linguistics and Philosophy, 26, 511–545.
Cassidy, S., & Harrington, J. (2001). Multi-level annotation in the EMU speech database management system. Speech Communication, 33, 61–77.
Cho, T. (2002). The effects of prosody on articulation in English. New York: Routledge.
Cho, T. (2005). Prosodic strengthening and featural enhancement: Evidence from acoustic and articulatory realizations of /a,i/ in English. Journal of the Acoustical Society of America, 117(6), 3867–3878.
Cho, T. (2006). Manifestation of prosodic structure in articulation: Evidence from lip kinematics in English. In Laboratory phonology 8 (pp. 519–548). Berlin, New York: Mouton de Gruyter.
Cho, T., Lee, Y., & Kim, S. (2011). Communicatively driven versus prosodically driven hyper-articulation in Korean. Journal of Phonetics, 39(3), 344–361.
Cruttenden, A. (1997). Intonation (2nd ed.). Cambridge: Cambridge University Press.
Crystal, D. (1969). Prosodic systems and intonation in English. London: Cambridge University Press.
Culicover, P., & Rochemont, M. (1983). Stress and focus in English. Language, 59, 123–165.
de Jong, K. (1995). The supraglottal articulation of prominence in English: Linguistic stress as localized hyperarticulation. Journal of the Acoustical Society of America, 97, 491–504.
de Jong, K., Beckman, M. E., & Edwards, J. (1993). The interplay between prosodic structure and coarticulation. Language and Speech, 36(2–3), 197–212.
Dohen, M., & Loevenbruck, H. (2005). Audiovisual production and perception of contrastive focus in French: A multispeaker study. In Proceedings of Interspeech 2005 (pp. 2413–2416).
Dohen, M., Loevenbruck, H., & Hill, H. (2006). Visual correlates of prosodic contrastive focus in French: Description and inter-speaker variabilities. In Proceedings of speech prosody 2006 (pp. 221–224). Dresden, Germany.
Eady, S. J., Cooper, W. E., Klouda, G. V., Mueller, P. R., & Lotts, D. W. (1986). Acoustical characteristics of sentential focus: Narrow vs. broad and single vs. dual focus environments. Language and Speech, 29, 233–251.
Féry, C. (1993). German intonational patterns. Linguistische Arbeiten 285. Tübingen: Niemeyer.
Féry, C., & Kügler, F. (2008). Pitch accent scaling on given, new and focused constituents in German. Journal of Phonetics, 36, 680–703.
Fougeron, C., & Keating, P. A. (1997). Articulatory strengthening at edges of prosodic domains. Journal of the Acoustical Society of America, 101, 3728–3740.
Fuchs, S., Perrier, P., & Hartinger, M. (2011). A critical evaluation of gestural stiffness estimations in speech production based on a linear second-order model. Journal of Speech, Language and Hearing Research, 54, 1067–1076.
Grice, M., Baumann, S., & Benzmüller, R. (2005). German intonation in autosegmental–metrical phonology. In: S.-A. Jun (Ed.), Prosodic typology: The phonology of intonation and phrasing (pp. 55–83). Oxford: Oxford University Press.
Grice, M., Ladd, D. R., & Arvaniti, A. (2000). On the place of "phrase accents" in intonational phonology. Phonology, 17, 143–185.
Halliday, M. A. K. (1970). A course in spoken English: Intonation. London: Oxford University Press.
Harrington, J., Fletcher, J., & Beckman, M. E. (2000). Manner and place conflicts in the articulation of accent in Australian English. In: M. Broe (Ed.), Papers in laboratory phonology 5 (pp. 40–55). Cambridge: Cambridge University Press.
Harrington, J., Fletcher, J., & Roberts, C. (1995). Coarticulation and the accented/unaccented distinction: Evidence from jaw movement data. Journal of Phonetics, 23, 305–322.
Hawkins, S. (1992). An introduction to task dynamics. In: G. J. Docherty, & D. R. Ladd (Eds.), Papers in laboratory phonology II: Gesture, segment, prosody (pp. 9–25). Cambridge: Cambridge University Press.
Hermes, A., Becker, J., Mücke, D., Baumann, S., & Grice, M. (2008). Articulatory gestures and focus marking in German. In Proceedings of the 4th conference on speech prosody 2008 (pp. 457–460). Campinas, Brazil.
Iskarous, K., & Kavitskaya, D. (2010). The interaction between contrast, prosody, and coarticulation in structuring phonetic variability. Journal of Phonetics, 38(4), 625–639.
Kohler, K. (1991). Terminal intonation patterns in single accent utterances of German: Phonetics, phonology and semantics. AIPUK, 25, 115–185.
Krifka, M. (2007). Basic notions of information structure. In: C. Féry, & M. Krifka (Eds.), Interdisciplinary studies of information structure 6. Potsdam.
Kügler, F. (2008). The role of duration as a phonetic correlate of focus. In Proceedings of speech prosody 2008 (6–9 May 2008). Campinas, Brazil.
Ladd, D. R. (1980). The structure of intonational meaning: Evidence from English. Bloomington: Indiana University Press.
Ladd, D. R. (2008). Intonational phonology (2nd ed.). Cambridge: Cambridge University Press.
Lambrecht, K. (1994). Information structure and sentence form: Topic, focus, and the mental representations of discourse referents. Cambridge: Cambridge University Press.
Lehiste, I. (1996). Suprasegmental features of speech. In: N. Lass (Ed.), Principles of experimental phonetics (pp. 226–244). St. Louis: Mosby.
Lindblom, B. (1990). Explaining phonetic variation: A sketch of the H&H theory. In: W. J. Hardcastle, & A. Marchal (Eds.), Speech production and speech modeling. Dordrecht: Kluwer Academic Publishers.
Munhall, K. G., Ostry, D. J., & Parush, A. (1985). Characteristics of velocity profiles of speech movements. Journal of Experimental Psychology: Human Perception and Performance, 11, 457–474.
Pierrehumbert, J. B., & Hirschberg, J. (1990). The meaning of intonational contours in the interpretation of discourse. In: P. Cohen, J. Morgan, & M. Pollack (Eds.), Intentions in communication (pp. 271–311). Cambridge, MA: MIT Press.
Roon, K., Gafos, A., Hoole, P., & Zeroual, C. (2007). Influence of articulator and manner on stiffness. In: J. Trouvain, & W. Barry (Eds.), Proceedings of the 16th international congress of phonetic sciences, August 2007 (pp. 409–412). Saarbrücken, Germany.
Selkirk, E. (1984). Phonology and syntax: The relation between sound and structure. Cambridge, MA: MIT Press.
Selkirk, E. (1995). Sentence prosody: Intonation, stress, and phrasing. In: J. A. Goldsmith (Ed.), The handbook of phonological theory (pp. 550–569). Cambridge, MA/Oxford, UK: Blackwell.
Uhmann, S. (1991). Fokusphonologie: Eine Analyse deutscher Intonationskonturen im Rahmen der nicht-linearen Phonologie. Linguistische Arbeiten 252. Tübingen: Niemeyer.
Wagner, M. (2012). Focus and givenness: A unified approach. In: I. Kučerová, & A. Neeleman (Eds.), Contrasts and positions in information structure (pp. 102–148). Cambridge: Cambridge University Press.
Welby, P. (2003). Effects of pitch accent position, type, and status on focus projection. Language and Speech, 46(1), 3–81.