Prosodic Cues of Discourse Units
E Couper-Kuhlen, University of Potsdam, Potsdam, Germany
© 2006 Elsevier Ltd. All rights reserved.
Discourse Units in Scripted vs. Unscripted Speech
Before addressing the question of which prosodic features cue which units in spoken discourse, it is necessary to determine whether we are talking about scripted or unscripted discourse. Scripted, or preformed, discourse is prototypically monologic, one-way communication, whereas unscripted discourse is more likely to be produced spontaneously, often in interactive communication. Depending on the circumstances in which discourse is produced and processed, the nature of speakers’ units varies dramatically. In scripted discourse, of which the lecture, public speech, and news broadcast are prototypical, the speaker is animating a text that has been drafted in advance. Although an occasional off-the-cuff remark may be inserted, this kind of discourse is typically cast in a written-language mode intended to be read aloud.
Consequently, its textual units are units of written grammar (e.g., the sentence and the paragraph). When read aloud, these grammatical units are often homomorphic with prosodic units such as the ‘spoken sentence’ or the ‘spoken paragraph’ – units created by characteristic patterns of pitch, loudness, and timing (including pause). In performance, skilled readers may depart from a one-to-one correspondence between printed and spoken units for rhetorical or aesthetic reasons. With unskilled readers, however, such departures are more likely to result from improperly learned ‘reading intonation.’ Children must be taught how to intone texts meaningfully. The skill does not come naturally, nor are all readers equally adept at rendering a written text orally (Esser, 1988). Unscripted discourse, in contrast (if we take everyday conversation as its prototype), is produced as part of a two-way communicative event in which several parties (two or more) take turns speaking. The units of unscripted discourse revolve around the turn, flexibly constructed so as to be adaptable to the contingencies of interaction (Selting, 2000). They involve speech produced and processed online in a spoken-language mode, and prosodic features including
pitch, loudness, and timing (rhythm and speech rate) are an integral part of their formatting. The prosodic features used in everyday conversation are not consciously taught but are instead acquired along with communicative competence (Couper-Kuhlen and Selting, 1996a). In contrast to ‘reading intonation,’ for which there is generally a single acknowledged standard, prosodic formatting in unscripted discourse displays variety-specific features.
Scripted Speech or Read-Aloud Texts
The Spoken Sentence
The spoken sentence, despite its name, is not in the first instance a syntactic unit but rather a prosodic one. Its beginning is signaled by high pitch, or a high tone, on the first accented syllable of an intonation phrase and its end by a falling tone starting on or from the last accented syllable of an intonation phrase and reaching a low point in the speaker’s voice range (Wichmann, 2000). Canonically, the first and last accents in the spoken sentence, together with other intervening accents, form a pitch line that gradually descends, or declines, throughout the unit (Couper-Kuhlen, 1986). A similar line of declination has been postulated for amplitude (Laver, 1994). Spoken sentences may consist of one or more intonation phrases or tone groups; if there are several, the groups must be linked by a single declination line for pitch and amplitude. Spoken sentences do not always correspond to the syntactic/orthographic sentence. The same prosodic patterns are used to render titles, headings, and often, in news broadcasts, the first noun phrase of a news item. Moreover, final adverbials and other nonfocal information in the sentence may be articulated with utterance-final rises, and falls-to-low may occur within a sentence (e.g., at the end of a main clause or at other points where the sentence could be possibly complete but is not). When spoken sentences consist of more than one intonation phrase or tone group, the groups are typically made to cohere by suppressing the height of the first accented syllable and any preceding unaccented syllables at the beginning of subsequent groups (‘onset depression’) (Wichmann, 2000). This tonal sandhi between the end of one group and the beginning of the next serves as an iconic cue to the cohesion that holds between the parts of the spoken sentence.
The Spoken Paragraph
Just as the written sentences of a text are, as a rule, grouped into written paragraphs, so spoken sentences tend to group into spoken paragraphs or paratones
(Yule, 1980). However, whereas written paragraphs are conceptual or topical units, paratones are first and foremost prosodic in nature. Like spoken sentences, they have high pitch at the outset (‘high pitch reset’), but the pitch on their first accented syllable is extra high. Any unaccented syllables before the first accent also tend to be high. The peak on the first pitch accent at the beginning of a paratone regularly comes late with respect to the core of the syllable. In extreme cases, it does not show up until the following syllable (Wichmann, 2000). The end of a paratone is cued by extra-low pitch, close to the speaker’s baseline, and frequently by a noticeable pause (Couper-Kuhlen, 1986). In addition to these boundary cues to the spoken paragraph, there is some indication that its constituent parts, the spoken sentences, are linked internally via an overriding declination line. This so-called ‘supradeclination’ appears to define a global envelope within which more local pitch variation – in conjunction with information structure and rhetorical text relations – occurs (Wichmann, 2000).
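The notions of declination and supradeclination lend themselves to a small numerical illustration. The following is a minimal sketch, not a method taken from the works cited: it assumes that declination can be approximated by a straight least-squares line through accent peaks measured in Hz, and the function name `fit_declination` and all peak values are hypothetical.

```python
from statistics import mean

def fit_declination(times, peaks_hz):
    """Least-squares line through accent peaks: returns (slope, intercept)."""
    t_bar, f_bar = mean(times), mean(peaks_hz)
    sxx = sum((t - t_bar) ** 2 for t in times)
    sxy = sum((t - t_bar) * (f - f_bar) for t, f in zip(times, peaks_hz))
    slope = sxy / sxx
    return slope, f_bar - slope * t_bar

# Hypothetical accent peaks (seconds, Hz) for three spoken sentences
# belonging to one spoken paragraph (paratone). Invented for illustration.
paratone = [
    ([0.2, 0.9, 1.6], [240.0, 215.0, 190.0]),
    ([2.4, 3.0, 3.7], [220.0, 200.0, 175.0]),
    ([4.5, 5.2, 5.9], [205.0, 185.0, 160.0]),
]

# Local declination: one line per spoken sentence.
local_slopes = [fit_declination(t, f)[0] for t, f in paratone]

# Supradeclination: one line through the first accent peak of each sentence.
onset_times = [t[0] for t, _ in paratone]
onset_peaks = [f[0] for _, f in paratone]
supra_slope, _ = fit_declination(onset_times, onset_peaks)
```

With these invented values, every local slope and the overall slope are negative, which is the pattern the text describes: pitch declines within each spoken sentence, and the sentence onsets themselves decline across the paragraph. A linear fit in Hz is only one possible operationalization; a perceptually motivated scale might be preferred in practice.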
Unscripted Speech or Talk-in-Interaction
The Turn-Constructional Unit
Regarding the discourse units created spontaneously in talk-in-interaction, conversation analysts have argued that the smallest of these is the turn-constructional unit (TCU) (Sacks et al., 1974). TCUs are grammatical constituents of varying size – word, phrase, clause, or sentence – that are used to implement communicative actions. Whether a given word is a lexical TCU or is part of a phrasal, clausal, or sentential TCU is cued by its prosodic formatting. If it forms a prosodic contour of its own – an intonation phrase or tone group – it is likely to be a TCU; if it does not, it is more likely to be a fragment of one. Occasionally, sentential constructions are produced with more than one intonation phrase; in this case, the single intonation phrase marks only a fragment of a TCU. TCUs thus require three kinds of completion: syntactic, intonational, and pragmatic. Within their context of occurrence, they form a possibly complete syntactic structure, a possibly complete intonational structure, and at the same time a possibly complete pragmatic action (Ford and Thompson, 1996). Once produced, TCUs can still be extended – grammatically, intonationally, and pragmatically – should circumstances require it (e.g., if no response from an interlocutor is forthcoming; Ford et al., 2002). This phenomenon has been called ‘incrementing’ (Schegloff, 1996, 2001). Increments are bits of speech tacked on the end of a prior TCU that are
grammatically dependent on it. They in essence recomplete the prior unit and provide another occasion for possible turn transition. Often, increments are also prosodically dependent on the prior unit: They re-do or reshape its phonetic contour, they are fitted to it in terms of pitch range and placement, and they have loudness, tempo, and articulatory characteristics analogous to it. The prosodic shape of increments thus cues their syntagmatic relationship to the TCU that they are continuing (Walker, 2004).
The Turn
Turns-at-talk may consist of one or more than one TCU. In the latter case, they may be planned as such from the outset or they may become multiunit over time (e.g., when interlocutors decline to take over the floor and the current speaker continues to talk; Schegloff, 1982). Whether a turn is transition-ready or not – that is, whether or not the talk has reached a transition-relevance place (TRP), a place in talk where the floor may legitimately shift to another speaker – is often cued prosodically. Research has shown, however, that the prosody of turn delimitation is specific to the variety of English being spoken (Local et al., 1985, 1986; Wells and Peppé, 1996). Whereas, for instance, both Tyneside and Ulster varieties appear to deploy rallentando and decrescendo over the last foot or two of a unit, with a swell of loudness and lengthening on the final accented (and lexically stressed) syllable, in London Jamaican there is no rallentando or decrescendo and only the very last (often lexically unstressed) syllable of the unit has a narrow fall with creaky voice. For English varieties with so-called stress timing, it appears to be the shape of the accent that projects an upcoming TRP. In the West Midlands, for instance, one type of TRP-projecting accent regularly falls on a syllable with centering vowel quality and is lower, louder, and longer than surrounding syllables; accents that lack these features do not appear to project that the turn is about to end (Wells and Macfarlane, 1998). Some multiunit turns arise because speakers suppress possible floor transition at the end of a TCU and rush into the next one (‘rush through’; Schegloff, 1982, 1996). In these cases, prosodic features cue what the speaker is doing. The end of the first unit tends to lack the rallentando and final lengthening associated with prosodic completion. The first syllables of the new unit are likely to come earlier than expected based on the speaker’s rate of speech so far.
There is also often a sudden break in pitch and volume levels between the end of the first unit and the beginning of the next. Yet in a kind of phonetic sandhi, selected articulatory features from the first unit
may encroach onto the new one (Local and Walker, 2004).
The Sequence
Sequences arise through the pursuit of courses of action in talk-in-interaction. A sequence consists minimally of a single adjacency pair – an initiatory turn and a corresponding response turn – such as greeting–return of greeting, question–answer, and assessment–second assessment. Sequences can, however, be expanded through presequences, insert sequences, and postsequences (Schegloff, 1995). Although sequences are structured through action, there are various prosodic cues to their emerging structure. In standard English, for instance, second pair-parts are normatively integrated with first pair-parts rhythmically. This means that the timing of the first accented syllable of the second pair-part is expected to be synchronized with the timing of the accented syllables at the end of the first pair-part. In other words, the regular delivery of the final accented syllables at the end of a first turn establishes a metric according to which the onset of the second turn is timed (Couper-Kuhlen, 1991, 1993). Departures from the local rhythmic metric (e.g., a next speaker coming in too early or too late with respect to the beat established in prior talk) cue various sorts of contextually interpretable problems in talk (Auer et al., 1999). An ongoing rhythmic beat established in a first turn and continued and/or modified in subsequent turns is thus a hallmark of a coherent sequence in talk: series of turns engaged in a single course of action ‘hang together’ rhythmically in time. Series of turns within a sequence are also fitted to one another in terms of pitch and loudness. Second pair-parts are, as a rule, never louder than first pair-parts (Goldberg, 1978). Their overall pitch range, typically reflected in the height of the first accented syllable, tends to be no higher than that of the corresponding first pair-part.
That is, in terms of both pitch and loudness, series of turns within a sequence create an overriding declination line (supradeclination), with first turns in a larger sequence typically beginning high and loud and subsequent turns gradually lowering the pitch and volume settings (Schuetze-Coburn et al., 1991). This type of declination is all the more remarkable in that it is jointly constructed and coordinated across speakers. Turns that begin a new sequence are cued as such by having high pitch from the outset (i.e., on both the first accented and any preceding unaccented syllables) (Couper-Kuhlen, 2001, 2003). Turns that extend a sequence under way begin in a fashion that reflects their continuing status. Their initial pitch and volume levels are fitted to those
of prior talk: these turns are thus cued as continuing on from something already begun rather than beginning something new (Couper-Kuhlen, 2004).
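The rhythmic integration of second pair-parts described earlier in this section can be illustrated with a short sketch. It is an illustrative simplification rather than the procedure of the studies cited: it assumes the beat is the mean interval between the final accented syllables of the first turn, it considers only the next projected beat, and the function name `classify_onset` and the 20% tolerance are hypothetical choices.

```python
def classify_onset(accent_times, onset_time, tolerance=0.2):
    """Compare a second speaker's first accent with the beat projected by the first turn.

    accent_times: times (s) of the final accented syllables of the first pair-part.
    onset_time:   time (s) of the first accented syllable of the second pair-part.
    tolerance:    allowed deviation as a fraction of the beat interval (assumed value).
    """
    if len(accent_times) < 2:
        raise ValueError("need at least two accents to establish a beat")
    # Mean inter-accent interval serves as the local rhythmic metric.
    intervals = [b - a for a, b in zip(accent_times, accent_times[1:])]
    beat = sum(intervals) / len(intervals)
    # The next beat is projected one interval after the last accent.
    expected = accent_times[-1] + beat
    deviation = (onset_time - expected) / beat
    if deviation < -tolerance:
        return "early"
    if deviation > tolerance:
        return "late"
    return "on beat"
```

For accents at 1.0, 1.5, and 2.0 seconds, the projected beat falls at 2.5 seconds. Under the assumed tolerance, a response whose first accent falls near that point counts as rhythmically integrated, whereas one at 2.2 or 3.1 seconds would be classed as early or late, the kind of departure that the text describes as cueing interpretable problems in talk.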
Conclusion
One of the advantages of prosodic cues to units in both scripted and unscripted spoken discourse is that they are produced and processed as speech or talk emerges. Of course, information concerning the structure of discourse can be conveyed in so many words (i.e., metalinguistically): “I’m beginning a new sentence” or “This is a new paragraph.” However, it is most efficiently conveyed when produced simultaneously with the speech to which it refers. The cueing of structural units prosodically has numerous value-added effects. In discourse that is scripted and read aloud primarily for aesthetic or entertainment purposes (e.g., poetry reading and talking books), it allows for a layer of interpretative meaning above and beyond that suggested through lines, stanzas, and/or sentences and paragraphs. In unscripted interactive discourse, cueing with nonverbal means provides for a measure of indeterminacy, which in turn facilitates in situ negotiation.
See also: Conversation Analysis; Prosodic Aspects of Speech and Language; Speech; Spoken Discourse: Types.
Bibliography
Auer P, Couper-Kuhlen E & Mueller F (1999). Language in time: the rhythm and tempo of spoken interaction. New York: Oxford University Press.
Couper-Kuhlen E (1986). An introduction to English prosody. Forschung & Studium Anglistik 1. Tübingen/London: Niemeyer/Arnold.
Couper-Kuhlen E (1991). ‘A rhythm-based metric for turn-taking.’ In Proceedings of the 12th International Congress of Phonetic Sciences, Aix-en-Provence. Service des publications, Université de Provence. 275–278.
Couper-Kuhlen E (1993). English speech rhythm: form and function in everyday verbal interaction. Amsterdam: Benjamins.
Couper-Kuhlen E (2001). ‘Interactional prosody: high onsets in reason-for-the-call turns.’ Language in Society 30, 29–53.
Couper-Kuhlen E (2003). ‘On initial boundary tones in English conversation.’ In Solé M J, Recasens D & Romero J (eds.) Proceedings of the 15th International Congress of Phonetic Sciences. Barcelona: Universitat Autònoma de Barcelona. 119–122.
Couper-Kuhlen E (2004). ‘Prosody and sequence organization: the case of new beginnings.’ In Couper-Kuhlen E & Ford C E (eds.). 335–376.
Couper-Kuhlen E & Ford C E (eds.) (2004). Sound patterns in interaction. Cross-linguistic studies from conversation. Amsterdam: Benjamins.
Couper-Kuhlen E & Selting M (1996a). ‘Towards an interactional perspective on prosody and a prosodic perspective on interaction.’ In Couper-Kuhlen E & Selting M (eds.). 11–56.
Couper-Kuhlen E & Selting M (eds.) (1996b). Prosody in conversation: interactional studies. Cambridge, UK: Cambridge University Press.
Esser J (1988). Comparing reading and speaking intonation. Amsterdam: Rodopi.
Ford C, Fox B & Thompson S A (2002). ‘Constituency and the grammar of turn increments.’ In Ford C, Fox B & Thompson S A (eds.) The language of turn and sequence. Oxford: Oxford University Press. 14–38.
Ford C E & Thompson S A (1996). ‘Interactional units in conversation: syntactic, intonational, and pragmatic resources for the management of turns.’ In Ochs E et al. (eds.). 134–184.
Goldberg J (1978). ‘Amplitude shift. A mechanism for the affiliation of utterances in conversational interaction.’ In Schenkein J (ed.) Studies in the organization of conversational interaction. New York: Academic Press. 199–218.
Laver J (1994). Principles of phonetics. Cambridge, UK: Cambridge University Press.
Local J & Walker G (2004). ‘Abrupt joins as a resource for the production of multi-unit, multi-action turns.’ Journal of Pragmatics 36, 1375–1402.
Local J, Wells W & Sebba M (1985). ‘Phonology for conversation: phonetic aspects of turn delimitation in London Jamaican.’ Journal of Pragmatics 9, 309–330.
Local J K, Kelly J & Wells W (1986). ‘Towards a phonology of conversation: turn-taking in Tyneside English.’ Journal of Linguistics 22, 411–437.
Ochs E, Schegloff E A & Thompson S A (eds.) (1996). Interaction and grammar. Cambridge, UK: Cambridge University Press.
Sacks H, Schegloff E & Jefferson G (1974). ‘A simplest systematics for the organization of turn-taking for conversation.’ Language 50, 696–735.
Schegloff E (1982). ‘Discourse as an interactional achievement: some uses of “uh huh” and other things that come between sentences.’ In Tannen D (ed.) Analyzing discourse: text and talk. Washington, DC: Georgetown University Press. 71–93.
Schegloff E (1995). A primer in conversation analysis: sequence organization. Los Angeles: University of California at Los Angeles, Department of Sociology.
Schegloff E A (1996). ‘Turn organization: one intersection of grammar and interaction.’ In Ochs E et al. (eds.). 52–133.
Schegloff E A (2001). Conversation analysis: a project in process – “Increments.” Forum lecture, Linguistics Institute, University of California at Santa Barbara.
Schuetze-Coburn S, Shapley M & Weber E G (1991). ‘Units of intonation in discourse: a comparison of acoustic and auditory analyses.’ Language and Speech 34(3), 207–234.
Selting M (2000). ‘The construction of units in conversational talk.’ Language in Society 29, 477–517.
Walker G (2004). ‘On some interactional and phonetic properties of increments to turns in talk-in-interaction.’ In Couper-Kuhlen E & Ford C E (eds.). 147–169.
Wells B & Macfarlane S (1998). ‘Prosody as an interactional resource: turn-projection and overlap.’ Language and Speech 41, 265–298.
Wells B & Peppé S (1996). ‘Ending up in Ulster: prosody and turn-taking in English dialects.’ In Couper-Kuhlen E & Selting M (eds.). 101–130.
Wichmann A (2000). Intonation in text and discourse. Beginnings, middles and ends. Harlow, UK: Longman/Pearson Education.
Yule G (1980). ‘Speakers’ topics and major paratones.’ Lingua 52, 33–47.
Prosodic Morphology
J J McCarthy, University of Massachusetts, Amherst, MA, USA
© 2006 Elsevier Ltd. All rights reserved.
The phrase ‘prosodic morphology’ (hereafter, PM) refers to a theory and to a class of phenomena to which that theory is applied. The theory of PM deals broadly with the relationship between phonological and morphological structure, particularly restrictions on the size and shape of words, stems, and morphemes. This article presents an overview of the theory of PM and its applications.
The Theory of Prosodic Morphology
PM is a theory of how morphological and phonological determinants of linguistic form interact. The theory focuses on understanding how prosodic structure impinges on certain kinds of morphology, such as reduplication and infixation. In McCarthy and Prince (1986/1996, 1990a), three essential claims are advanced about PM:
(1) Principles of PM
a. Prosodic morphology hypothesis
   Templates are defined in terms of the authentic units of prosody: mora (μ), syllable (σ), foot (Ft), phonological word (PWd).
b. Template satisfaction condition
   Satisfaction of templatic constraints is obligatory and is determined by the principles of prosody, both universal and language-specific.
c. Prosodic circumscription
   The domain to which morphological operations apply may be circumscribed by prosodic criteria as well as by the more familiar morphological ones.
A template is a restriction on the size and shape of an affix, stem, or word. According to the prosodic morphology hypothesis (1a), all such restrictions must be stated in prosodic terms. Since templates are defined
prosodically, satisfaction of templates (1b) is determined by the same principles that govern prosodic structure generally, such as foot binarity (see (4) below). Circumscription (1c) is a mechanism for delimiting a prosodically defined portion of a word to which a morphological process is then applied; it is important in the analysis of infixation and other phenomena. In short, templates and circumscription must be stated in terms of the vocabulary of prosody and must respect the well-formedness requirements of prosody.
These principles are intended to address a fundamental explanatory goal: to reduce or eliminate the descriptive apparatus that is specific to particular empirical domains such as reduplication and instead derive the properties of those domains from general and independently motivated principles. Claims (1a, b, c) assert that prosodic theory is where these independent principles are to be found. Indeed, the goal of independent explanation is more important than the principles, and subsequent work has pursued the same explanatory goal while departing from these principles in certain respects. Some of this later work is reviewed below.
On the morphological side, the theory of PM incorporates few assumptions. The morphological constituents root, stem, and affix form a labeled bracketing, essentially along the lines of Selkirk’s (1982) word-syntax. Most work in PM adopts a view of morphology that is morpheme-based, under the broad rubric of item-and-arrangement models, though the implementation of prosodic circumscription in McCarthy and Prince (1990a) is process-based.
On the phonological side, prosodic morphology presupposes the prosodic hierarchy in (2) (cf. Selkirk, 1980).
(2) Prosodic hierarchy
Phonological word   PWd
Foot                Ft
Syllable            σ
Mora                μ
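The prosodic hierarchy in (2) and the prosodic morphology hypothesis in (1a) can be given a small illustrative encoding. The sketch below is an assumption made for exposition, not part of the theory's formal apparatus: the names `ProsodicUnit`, `dominates`, and `is_prosodic_template` are hypothetical, and a template is represented simply as a non-empty list of prosodic categories.

```python
from enum import IntEnum

class ProsodicUnit(IntEnum):
    """Categories of the prosodic hierarchy in (2), ordered from lowest to highest."""
    MORA = 1
    SYLLABLE = 2
    FOOT = 3
    PWD = 4  # phonological word

def dominates(higher, lower):
    """True if `higher` sits above `lower` in the hierarchy."""
    return higher > lower

def is_prosodic_template(shape):
    """Check the spirit of (1a): a template mentions prosodic units only.

    `shape` is a hypothetical representation: a list whose members should
    all be ProsodicUnit values (e.g. [ProsodicUnit.FOOT] for a foot-sized template).
    """
    return len(shape) > 0 and all(isinstance(u, ProsodicUnit) for u in shape)
```

Under this encoding, a template such as `[ProsodicUnit.FOOT]` is admissible, whereas a shape stated in segmental terms, such as `["CVC"]`, is not, which mirrors the claim that size and shape restrictions must be expressed in the vocabulary of prosody.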