Journal of Phonetics (1984) 12, 327-344
The role of intonation as a cue to topic management in conversation
Deborah Schaffer
Eastern Montana College, English Department, 1500 North 30th Street, Billings, Montana 59101-0298, U.S.A.
Received 7th May 1984
Abstract:
Only recently have researchers begun to examine how intonation is used to signal topic structure in conversation, and most of the emphasis has been on F0 characteristics in production rather than perception. The present study was designed to test whether certain F0 cues found to signal paragraph boundaries could also signal topic boundaries in conversation. Eight listening tests were constructed using excerpts isolated from one face-to-face (FF) and one non-face-to-face (NFF) conversation. Half of the tests were also filtered so as to make them unintelligible, but with F0 patterns preserved. Subjects then made decisions about preceding or following topics for each isolated test utterance, based solely on the information present in the utterance. The results show that syntactic and semantic information provides much stronger topic cues in both FF and NFF conversations than does intonation; rising F0 (as a question marker) is the strongest intonation cue, signaling topic continuation. Overall, the results suggest that the total context in which utterances are placed and the inherent optionality of conversational interaction may be the major factors contributing to the way topics are managed in natural conversation.
0095-4470/84/040327+18 $03.00/0 © 1984 Academic Press Inc. (London) Ltd.

Introduction

The concept of "topic" has been approached from a number of different viewpoints. Kantor (1977), for example, points out three different lines of research, one dealing with syntactic topic (as in Li & Thompson, 1976), one with semantic topic (Kuno, 1976), and one with discourse topic (Keenan & Schieffelin, 1976). However, these studies all investigate various aspects of the structure or content of topics in speech or writing, and even those which deal with natural conversation still neglect the possibility that topic structure could be indicated by phonetic, i.e. prosodic, means.¹ If topics in conversation are organized prosodically as well as syntactically and semantically, learning more about this organization will contribute to a better understanding not only of the relationship of prosody with the more abstract levels of linguistic organization, but also of the general areas of discourse analysis, conversational analysis and child language acquisition.

¹ A similar neglect of prosody, or at least intonation, could also be observed until recently in studies of turn-taking (see Schaffer, 1982).

Some researchers have, in fact, accepted this premise, and have studied the prosody of topic boundaries and/or related units (e.g. interruptions: see French & Local, 1982). For instance, Cutler, Pearson & Beattie (1982), Schaffer (1982, 1983) and others have looked at the function of intonation in signaling turn boundaries; and similarly, Lehiste (1975, 1979), for example, and Kreiman (1982) have studied prosodic cues to sentence and paragraph boundaries in both production and perception. These investigators have found a number of phenomena which may occur in varying combinations at such boundaries, and in addition may signal those boundaries to listeners. However, it should be noted that neither Lehiste nor Cutler et al. used natural two-party conversations, as the others mentioned here did; Lehiste studied prompted monologues, and Cutler et al. used scripted dialogues.

More to the point, Coulthard & Brazil (1982) have impressionistically observed some correlations between the pitch ranges of consecutive speakers in classroom interactions, and have made some suggestions as to the function these pitch relationships serve (e.g. high pitch signals a contrastive utterance). Menn & Boyce (1982), on the other hand, have used instrumental techniques in analyzing correlations between fundamental frequency (F0) values at clause peaks and certain discourse categories (e.g. "topic change") in the production of same-speaker and cross-speaker clause pairs in natural adult-child interactions. Also, Brown, Currie & Kenworthy (1980) report a number of intonation phenomena which appear to be linked to topic structure in Edinburgh Scottish English (ESE) interviews, conversations between strangers, and read texts. Common characteristics noted for at least some of these units include increases in F0 and amplitude at the starts of paragraphs and topics, and low falling F0 at unit ends. Brown et al., in particular, include additional intonational features found at topic beginnings (TOBs), ends (TOEs) and continuations.
Unfortunately, however, neither these characteristics nor those found by Menn & Boyce have been verified as perceptual cues for topics, and while the Lehiste and Kreiman results are based on listening tests, their subjects were judging sentences and paragraphs, not topics per se. Of course, there may well be a close association between paragraph and topic organization in discourse, but this has yet to be shown, at least from the conversationalist's point of view. Moreover, even if such a relationship can be established for one type of discourse (e.g. prompted monologues), there is no guarantee that it will remain the same in other types. It is also possible that prosodic cues for topic structure are different in face-to-face (FF) interactions (specifically, conversations) as opposed to non-face-to-face (NFF) interactions, since the latter lack visual information which might itself contribute to determining topic status (see Butterworth, Hine & Brady, 1977; Beattie, 1981).

The purpose of the experiments described here, then, is twofold. First, I wish to investigate whether prosodic characteristics can actually be used by listeners as cues to topic development in natural (American English) conversation: specifically, whether the intonation characteristics reported by Brown et al. for the production of topics, and by Lehiste and Kreiman for the perception of paragraphs, are used in this way. Second, if such cues do exist, I wish to compare their nature and function in FF vs NFF conversations, to see if any compensation occurs when conversationalists cannot see one another. The result should be a better understanding of the role intonation plays in the primarily semantic/discourse task of topic management, and of its relationship to other possible perceptual cues used in this task.
Procedure

A number of listening tests were constructed in order to test listener reactions to intonation characteristics at topic boundaries. First, two conversations between the same pair of female speakers were recorded in an anechoic chamber using a TEAC 40-4 Tascam Series four-track tape recorder. In one conversation the speakers were seated face-to-face, and in the other they were seated back-to-back (thus non-face-to-face). These two speaker-orientation conditions are presented in Table I.
Table I. Test and judgment categories

                                                  Filtering condition
Speaker orientation      Task                     Unfiltered   Filtered   Judgment type
Non-face-to-face (NFF)   Topic end (TOE)          NFFTOEU      NFFTOEF    "New topic follows" (NTF) judgments / "Same topic follows" (STF) judgments
Non-face-to-face (NFF)   Topic beginning (TOB)    NFFTOBU      NFFTOBF    "Different topic precedes" (DTP) judgments / "Same topic precedes" (STP) judgments
Face-to-face (FF)        Topic end (TOE)          FFTOEU       FFTOEF     NTF judgments / STF judgments
Face-to-face (FF)        Topic beginning (TOB)    FFTOBU       FFTOBF     DTP judgments / STP judgments
Table II. Listening test sample size and number of significant agreements

Listening test      Sample size   Number of subject agreements   Percent equivalent
                                  reaching significance
NFFTOBF, FFTOEU     27            21                             77.8
NFFTOBU             26            21                             80.8
FFTOEF, FFTOBF      23            19                             82.6
FFTOBU              22            18                             81.8
NFFTOEU             19            16                             84.2
NFFTOEF             18            15                             83.3
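The agreement thresholds in Table II can be checked against the procedure described in the Results section (a chi-square approximation to the binomial distribution, with Yates' correction, at the 0.01 level). What follows is only a minimal sketch under stated assumptions: chance is taken to be a 50/50 split between the two possible judgments, the critical value 6.635 is the standard tabled 1% point for one degree of freedom, and the function names are illustrative rather than taken from the source.

```python
from math import ceil

# Upper 1% point of the chi-square distribution with 1 degree of freedom
# (standard tabled value; assumed here, not quoted from the article).
CHI2_CRIT_01_DF1 = 6.635


def yates_chi_square(agreements: int, n_subjects: int) -> float:
    """Yates-corrected chi-square for an agree/disagree split tested against 50/50."""
    expected = n_subjects / 2
    # Continuity correction: shrink the deviation by 0.5, never below zero.
    deviation = max(abs(agreements - expected) - 0.5, 0.0)
    # Both cells share the same expected count and the same corrected deviation.
    return 2 * deviation ** 2 / expected


def min_significant_agreements(n_subjects: int, critical: float = CHI2_CRIT_01_DF1):
    """Smallest majority size whose corrected chi-square exceeds the critical value."""
    for agreements in range(ceil(n_subjects / 2), n_subjects + 1):
        if yates_chi_square(agreements, n_subjects) > critical:
            return agreements
    return None  # no split of a group this small can reach significance


# Sample sizes taken from Table II.
for n in (27, 26, 23, 22, 19, 18):
    k = min_significant_agreements(n)
    print(n, k, round(100 * k / n, 1))
```

Under these assumptions the loop yields the same thresholds and percent equivalents as Table II (for example, 21 of 27 and 15 of 18), which suggests the sketch matches the calculation that was used, although the article does not spell out the formula.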
Fifty utterances (sentences, phrases and single words) were isolated from each conversation and placed in random order on two separate test tapes (one using only FF utterances and one using only NFF utterances). Two other tapes used the same sets of utterances after they had undergone band-pass filtering; the low-pass cut-off frequency was 310 Hz, while the high-pass cut-off value was 260 Hz; the decay function (for two filters linked in tandem) was 12 dB/octave. The resulting stimuli were unintelligible, since portions of the speech signal had been removed, but they were recognizably speech-like and still preserved their original intonation patterns, along with most other prosodic information. Kreiman (1982) and Schaffer (1982) present further discussion of the advantages and disadvantages of using filtered speech for perception tests.

Before any of the listening tests were actually conducted, a separate group of 19 subjects was asked to make judgments about the topic relationships holding between the utterances included in a written transcript of a natural conversation; this was done to ensure that subjects could, in fact, tell when two utterances were about the same topic as opposed to different topics. Since they were highly successful at this task (see Schaffer (1982) for details of the results), other subjects were subsequently presented with the main listening tests. Groups of subjects (ranging in number from 18 to 27; see Table II) listened to a particular tape, and for each of the 50 test items on that tape were required to make a particular type of judgment; there were two types.
One task was to decide whether the utterance which followed the current test item in the original conversation had been about the same topic as the current item (STF) or whether a new topic had been introduced (NTF); the second task was to decide whether the utterance preceding the current test item had been about the same topic as that item (STP) or whether a different topic had preceded it and the current utterance therefore introduced a new topic (DTP). The total number of listening tests, then, was eight (four tapes, two tasks per tape). The subjects heard the test tapes in a classroom setting; the tape recorder used was either a Tandberg Series 15 two-track or a Uher 4000 Report-L recorder. Subjects were instructed to base their judgments on whatever information was available to them in each isolated test item: in the four unfiltered (U) tests this would include syntactic and semantic as well as prosodic information, while for the remaining four filtered (F) tests only prosodic information would be present. If the subjects found no strong cues to the topic status of a particular test item, they were asked simply to make a guess. At the beginning of each tape were four sample utterances, which were used for practice during an initial pre-test period of instruction, questions and clarifications.

The purpose of these tests was simply to identify those utterances about which a large number of subjects could agree. Such utterances should present strong cues to their topic status (whether ending a topic, beginning one or continuing one), which in turn should be
identifiable through an examination of the utterances themselves as well as through a comparison of characteristics held in common by utterances receiving similar patterns of subject agreements. A comparison of the unfiltered and filtered test results should then further isolate the role of intonation in leading listeners to make judgments about topic status, both when it interacts with verbal information and when only other prosodic information is present. For a fuller discussion of the design and purpose of these experiments, see Schaffer (1982, 1983).

Results

To facilitate the analysis of the listening test results, the utterances used in the tapes were divided into several categories based on the syntactic, lexical and prosodic characteristics which they displayed, and which were judged to be potential cues to topic status. These cue categories, with brief explanations of their meaning, are listed in Table III; as can be seen, they are divided into syntactic/lexical cue types, intonation contour cue types, and F0 range cue types. Sample utterances are provided for the first section as additional illustrations; note also that the first three categories in this section are mutually exclusive and complementary, while the others are not necessarily so (meaning that test items may belong to more than one of the last five categories). For the two sets of intonation cues, narrow- and wide-band spectrograms were made for each test utterance, and from these were taken F0 values of every measurable sonorant. The F0 values of the last measurable point in each utterance and for the last stressed syllable preceding the endpoint were used in calculating the degree of change in F0 at utterance ends, leading to the classifications of intonation contour types presented in Section B of Table III. Laryngealization is included here as a separate cue type because it has been associated with unit ends in other research (cf. Lehiste, 1975; Kreiman, 1982).
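The contour classification just described can be sketched in a few lines. The thresholds (1.06 and 0.94, roughly one semitone either way) come from Section B of Table III; everything else here, including the function names and the sample frequencies, is hypothetical and only meant to illustrate the ratio test. Laryngealization is not covered, since it is a separate category that cannot be derived from the ratio alone.

```python
from math import log2

FALL_THRESHOLD = 1.06  # ratio at or above this counts as falling F0 (Table III, B.1)
RISE_THRESHOLD = 0.94  # ratio at or below this counts as rising F0 (Table III, B.2)


def f0_ratio(last_stressed_hz: float, final_hz: float) -> float:
    """Ratio of the F0 of the last stressed syllable to the last measured F0."""
    if last_stressed_hz <= 0 or final_hz <= 0:
        raise ValueError("F0 values must be positive")
    return last_stressed_hz / final_hz


def classify_contour(last_stressed_hz: float, final_hz: float) -> str:
    """Label an utterance-final contour as falling, rising or level."""
    ratio = f0_ratio(last_stressed_hz, final_hz)
    if ratio >= FALL_THRESHOLD:
        return "falling"
    if ratio <= RISE_THRESHOLD:
        return "rising"
    return "level"


def ratio_in_semitones(ratio: float) -> float:
    """Express a frequency ratio as a (signed) interval in semitones."""
    return 12 * log2(ratio)


# Hypothetical measurements in Hz, for illustration only.
print(classify_contour(220.0, 180.0))  # final F0 well below the last stressed syllable
print(classify_contour(180.0, 220.0))  # final F0 well above the last stressed syllable
print(classify_contour(200.0, 205.0))  # change of less than a semitone
```

Note that with this definition a ratio above 1 means the pitch dropped toward the end of the utterance, so larger fall ratios correspond to larger falls and smaller rise ratios to larger rises.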
For the last section, the average F0 values of the stressed vowels and of the unstressed vowels for each speaker in each conversation were also calculated and used to divide each speaker's F0 range for each conversation into high, mid and low regions (for details cf. Brown et al., 1980; Schaffer, 1982). Each measured F0 value of an utterance could then be placed somewhere in that range. Since, as mentioned above, some researchers have found increases in F0 to mark paragraph and even topic beginnings in various forms of speech production, the possibility that high F0 starts (which might be perceived as part of such an increase) could act as a perceptual topic-beginning cue was considered a very strong one, as was the likelihood that others of the findings of Brown et al. for F0 range values would be replicated for the perception of conversation.

The first step in analysing the subjects' judgments about the listening tests was to determine when they agreed among themselves to a statistically significant degree. For this purpose, a chi-square approximation to the binomial distribution (with Yates' correction) was used to calculate how many listener agreements on a particular judgment (STP, NTF, etc.) made about a particular test item would be significantly different from chance at the 0.01 level (this was done separately for each test, since the total number of subjects differed from test to test). These numbers of listener agreements ranged from 15 out of 18 subjects (for the NFFTOEF test) to 21 out of 27 (for the NFFTOBF and FFTOEU tests); see Table II for a complete list. The distribution of these significant agreements over the items of each test was then studied to see which tasks led to the greatest numbers of significant items, and what cues were present in those items. Of the 400 items on the eight tests, 83 of them (21%) received significant numbers of agreements: 27 out of 50 items on the NFFTOEU test were significant, the most of any test,
Table III. Cue types

A. Syntactic/lexical cues
  1. Sentence
     Explanation: Having complete constituent structure, including a subject, verb, and possible objects
     Example: "Even bright people aren't getting jobs"
  2. Phrase
     Explanation: Having complete constituent structure, but lacking some unit necessary to a complete sentence (e.g. a noun phrase)
     Example: "More parts of the tape"
  3. Fragment
     Explanation: Ending before a constituent boundary has been reached
     Example: "Who else is invited to the"
  4. Pragmatically incomplete phrases
     Explanation: Syntactically or lexically giving the impression that the speaker had more to say
     Example: "'Cause I tried to get"
  5. Abrupt syntactic starts
     Explanation: Syntactically or lexically suggesting that the utterance (phrase or fragment) started in the middle of a sentence
     Example: "Although that would sound like more fun"
  6. Lexically marked responses
     Explanation: Beginning with lexical items indicating responses to questions or statements
     Example: "Yeah that's pretty hard"
  7. Lexically marked topic introductions
     Explanation: Beginning with exclamations which appeared to occur often at the starts of new topics
     Example: "Oh I just heard"
  8. Syntactically/intonationally marked questions
     Explanation: Exhibiting subject-verb inversion and possible Wh-words / exhibiting rising final F0 ("question intonation")
     Example: "How's six-oh-one?" / "For the hotel?"

B. Intonation contour cues
  1. Falling F0
     Explanation: Having a ratio of the F0 value of the last stressed syllable to that of the last measured syllable greater than or equal to 1.06 (one semitone or more)
  2. Rising F0
     Explanation: Having the ratio described in Section B.1 equal to or less than 0.94 (one semitone or more)
  3. Level F0
     Explanation: Having the ratio described in Section B.1 between 0.94 and 1.06
  4. Laryngealized
     Explanation: Having falling or level F0 contours, plus final laryngealization

C. F0 range cues*
  1. High beginning F0 range / high ending F0 range
     Explanation: Having the F0 value measured at the beginning/end of a test item greater than the average F0 found for stressed syllables produced by the speaker of that item
  2. Mid beginning F0 range / mid ending F0 range
     Explanation: Having the F0 value measured at the beginning/end of a test item between the two average F0 values found for stressed vs unstressed syllables produced by the speaker of that item
  3. Low beginning F0 range / low ending F0 range
     Explanation: Having the F0 value measured at the beginning/end of a test item below the average F0 found for unstressed syllables produced by the speaker of that item

*See Brown et al. (1980) for details on the procedures used to categorize these cues.
Table IV. Distribution of significant items over listening tests*

Listening test   Number of significant items:            Total
                 same topic / new or different topic
NFFTOEU          20/7                                    27
NFFTOEF          5/0                                     5
NFFTOBU          6/2                                     8
NFFTOBF          5/2                                     7
FFTOEU           8/2                                     10
FFTOEF           7/2                                     9
FFTOBU           10/3                                    13
FFTOBF           3/1                                     4
Total            64/19                                   83

*Fifty items per test; 400 total.
while only four out of 50 items on the FFTOBF test were significant, the fewest for any test; most tests had between five and ten significant items (Table IV). Of these 83 significant items, 47 were NFF items and 36 were FF items; 58 were unfiltered and 25 were filtered; 64 items received judgments for the same topic following or preceding the current one, while 19 items received judgments for a new or different topic following or preceding; and 49 of the items were on topic beginning tests, while 34 were on topic end tests. Moreover, even given the low number of items which received significant numbers of listener agreements, the number of items receiving unanimous judgments was still much lower: only three, all NFF and all unfiltered.

An initial analysis of the contribution of the syntactic/lexical and intonation cues to the observed patterns of listener judgments was undertaken through the use of several series of chi-square tests (these results are simply summarized here in brief, since other statistical tests performed were both more powerful and more informative; see Schaffer (1982) for more details about the chi-square values). For intonation, both the contour categories (falling, level, rising and laryngealized) and the beginning and ending F0 range categories (high, mid and low) were investigated to see if the distribution of judgments for each listening test differed significantly across these categories. For the syntactic/lexical categories, however, only the first three (sentence, phrase and fragment) were used. Of the 32 chi-square tests thus calculated (four per listening test), 13 yielded significant results: five occurred for tests involving intonation contour type, six for tests involving F0 range and two for tests involving syntactic cues.
Also, nine of the significant chi-square values were for FF tests, with all but two of these occurring for the topic end test rather than the topic beginning test; three of the four significant NFF chi-square values, on the other hand, were for the topic beginning test. Finally, the significant chi-square values were virtually evenly divided between the filtered and unfiltered listening tests. From this examination, then, there is little evidence that particular cue types or listening tasks led to greater differences in listener judgments, except for the FF topic end tests, which did have more than half of the significant results. Nevertheless, some possible trends are discernible when the relationship between specific cue categories and listener agreements is considered, including greater numbers of STF judgments than predicted by the chi-square calculations for items with rising F0, high ending F0 and fragments (the last for unfiltered tests only), more DTP judgments for rising F0 and
Table V. Pearson's r correlations of F0 ratio vs NTF judgments

Listening test   F0 fall ratios (1.06+)*   F0 rise ratios (0.94-)†
NFFTOEU          -0.543‡                    0.234
NFFTOEF          -0.988‡                    0.485
NFFTOBU          -0.630‡                   -0.176
NFFTOBF          -0.269                    -0.604§
FFTOEU            0.502‡                    0.300
FFTOEF            0.302                    -0.278
FFTOBU           -0.285                    -0.472
FFTOBF           -0.168                     0.225

*For the NFF tests, df = 26; for the FF tests, df = 25.
†For the NFF tests, df = 11; for the FF tests, df = 13.
‡p < 0.005. §p < 0.025.
more NTF judgments than predicted for low ending F0 and sentences (again, the latter applies only to the unfiltered tests). However, other characteristics (notably high starting F0, falling F0, level F0 and laryngealization) do not lead to the large numbers of the types of judgments (DTP, NTF, STF and NTF, respectively) that other research suggests they should (cf. Brown et al., 1980; Kreiman, 1982), and other characteristics simply result in irregular distributions of listener judgments in the chi-square tests.

A separate set of chi-square values was calculated to compare the effect of the same categories of cues on listener judgments which were made for tests differing only in speaker-orientation condition (thus, the NFFTOEU and FFTOEU test judgments were compared, and so on). The results were all highly significant except those for the filtered topic beginning tests compared for ending F0 range; evidently, then, the FF vs NFF orientation of the speakers had a strong effect on how listener judgments were distributed over the four sets of cues described above. Yet based on the chi-square distributions, certain of these cues individually were judged similarly in both NFF and FF conditions, among them fragments and mid ending F0 for all tests, and rising F0 and low starting F0 for the topic end tests. This suggests that speakers and/or listeners were using some characteristics in the same way in both FF and NFF conditions, while most of the others varied considerably in the two conversations. These possibilities will be taken up again below.

Next, to test the strength of the relationships holding between the various sets of test data, a number of Pearson's r correlations were also calculated.
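For readers who want to see what such a correlation involves, here is a minimal sketch of Pearson's r and of r squared (the proportion of variance two sets of scores share). It uses only the textbook formula; the function names and the sample data are hypothetical and are not the measurements from this study.

```python
from math import sqrt


def pearson_r(xs, ys):
    """Pearson product-moment correlation between two equal-length sequences."""
    if len(xs) != len(ys) or len(xs) < 3:
        raise ValueError("need two sequences of equal length, at least 3 items")
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    sxx = sum((x - mean_x) ** 2 for x in xs)
    syy = sum((y - mean_y) ** 2 for y in ys)
    if sxx == 0 or syy == 0:
        raise ValueError("correlation is undefined when one variable is constant")
    return sxy / sqrt(sxx * syy)


def variance_explained(r: float) -> float:
    """Proportion of variance shared by the two variables (r squared)."""
    return r * r


# Hypothetical items: F0 fall ratio of each item and the number of
# "new topic" judgments it received (illustrative values only).
fall_ratios = [1.07, 1.10, 1.15, 1.22, 1.30]
ntf_counts = [9, 11, 12, 15, 16]

r = pearson_r(fall_ratios, ntf_counts)
print(round(r, 3), round(variance_explained(r), 3))
```

The degrees of freedom for testing such a coefficient are the number of items minus two, which is why the df values attached to these correlations depend on how many items fall into each ratio category.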
First, the hypothesis that greater F0 rises would lead to greater numbers of STF judgments than smaller F0 rises, and that larger F0 falls would be stronger topic end cues than lesser falls, was tested by correlating the numbers of "new topic" judgments made for items on a particular test with the rise or fall ratios for those items. Since smaller rise ratios would mean larger rises, while larger fall ratios correspond to larger falls, the correlations between F0 rise ratios and NTF judgments were expected to be negative, while those between F0 fall ratios and NTF judgments should be positive. Of the 16 resulting correlations, only five were significant even at the 0.025 level (see Table V), all but one of these being for NFF listening tests, and only one resulting for rising intonation (negative, as expected). Moreover, two of the correlations (for the NFFTOEU and NFFTOEF tests) run counter to expectations, since they show that greater F0 falls actually received fewer topic end judgments than did smaller falls. The overall pattern of
correlations, with so few significant values, thus supports the conclusion that degree of F0 change does not in general have a gradient effect on the strength of the intonation contour as a topic end or beginning cue.

A second set of Pearson's r correlations tested the strength of the relationship holding between the listening test judgments themselves, compared across all tests. The results are presented in Table VI. Of the 28 non-repetitive correlations possible between the eight tests, 11 were significant at least at the 0.05 level: these include all four of those correlations matching tests which differ only in speaker-orientation condition (e.g. NFFTOEU vs FFTOEU; since the NFF and FF tests used different sets of utterances, these last were matched as closely as possible for the syntactic, lexical and intonation characteristics of interest here). These four correlations might be taken to indicate that subjects were making judgments in a parallel, if not identical, fashion in the two conditions. In addition, all of the four negative correlations were between topic end and beginning test pairs, as would be expected if utterances could not both start and end topics at the same time. Two of the four test pairs which differed only in filtering condition (FFTOEU and FFTOEF, etc.) had significant r values, and more significant values were found between FF tests than between NFF tests. However, it should be noted that no correlation accounted for more than about one-quarter of the variance in listener judgments between two tests, as shown by r² (the highest value being 0.2532 for the FFTOEU/FFTOBU correlation). This indicates that no strong relationship actually exists between any pair of topic-judging tasks, and that each listening test was treated differently, for the most part, by each group of subjects.²

The third set of statistical tests involved computer-generated stepwise multiple linear regression analyses (Nie et al., 1975), which determined how well eight selected independent variables accounted for the listener judgments (the dependent variables) made for each test. Three of the variables were categorical in nature and so were assigned numerical values from one to three for each category; they were syntactic structure (labeled SYNTYPE and divided into sentences, phrases and fragments), beginning F0 range (BEGRNGE, with high, mid and low divisions) and ending F0 range (ENDRNGE, also with high, mid and low divisions). The remaining five variables, beginning F0 values (BEGFREQ), ending F0 values (ENDFREQ), F0 ratio values (RATIOB), length of item in words (LENWORD), and length of item in milliseconds (LENMSEC), were all assigned values as measured from spectrograms of each test item. These variables were entered one at a time into the regression equations of each listening test so as to establish their statistical significance relative to one another. The results displayed in Table VII include only those variables which reached significance at the 0.05 level, with the exception of the NFFTOEU and FFTOBU variables; for those tests no variable contributed significantly to the regression analysis, so the highest ranking variable for each test is included here simply for comparison. The table also includes other statistics useful for determining the role of each variable in the regression analyses.

Scrutiny of Table VII shows a number of surprising results. No regression had more than two variables entering into it significantly, and two, as just mentioned, actually had no significant variables at all. Furthermore, neither of the two beginning intonation variables appears in any of the regressions, and some of those variables which did reach significance are still difficult to explain.
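The F values and degrees of freedom reported in Table VII appear to be consistent with the usual overall F test for a multiple regression, computed from the squared multiple correlation, the number of items per test (50) and the number of variables entered. The article does not state the formula, so the following is only a sketch under that assumption; the function name is illustrative.

```python
def overall_f(r_squared: float, n_items: int, n_predictors: int):
    """Overall regression F statistic and its degrees of freedom.

    Assumes the standard formula
        F = (R^2 / k) / ((1 - R^2) / (n - k - 1))
    with k predictors and n items; this is not quoted from the article.
    """
    df_num = n_predictors
    df_den = n_items - n_predictors - 1
    if not 0 <= r_squared < 1 or df_num <= 0 or df_den <= 0:
        raise ValueError("invalid R^2 or degrees of freedom")
    f_value = (r_squared / df_num) / ((1 - r_squared) / df_den)
    return f_value, (df_num, df_den)


# Squared multiple correlations as printed in Table VII (rounded to 3 places).
for test_name, r2, k in (("FFTOEF", 0.425, 2), ("NFFTOBU", 0.261, 1)):
    f_value, df = overall_f(r2, 50, k)
    print(test_name, round(f_value, 2), df)
```

Because the printed squared correlations are rounded, the recomputed F values agree with the tabled ones (17.377 and 16.939 for these two tests) only to within a few hundredths.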
On the one hand, the ENDRNGE, ENDFREQ and RATIOB

² Every effort was made to keep the subject pool as homogeneous as possible across listening tests, of course, and the responses made by the subjects show enough similarities even within the general variation to support the view that the test differences were due to the nature of the tasks or the utterance characteristics, rather than the subjects.
Table VI. Pearson's r correlations between listening test scores

Row and column labels: NFFTOEU, NFFTOEF, NFFTOBU, NFFTOBF, FFTOEU, FFTOEF, FFTOBU, FFTOBF.
[Cell layout lost in the source scan; the coefficients that survive are listed in the order in which they appear.]

 0.3013*   -0.3885†    0.2565*
 0.3384†    0.3962†   -0.4289‡   -0.5032‡
 0.3266†   -0.3410†   -0.4366‡    0.3479†

*p < 0.05; †p < 0.01; ‡p < 0.001.
Table VII. Results of regression analysis

Listening test   Variable name   r_m     r²_m    r² change   Beta     F         df
NFFTOEU          (LENWORD)       0.249   0.062   0.062       -0.249   3.172     1, 48
NFFTOEF          ENDFREQ*        0.489   0.239   0.158       -0.554   7.389*    2, 47
                 RATIOB†                         0.081       -0.324
NFFTOBU          ENDRNGE*        0.511   0.261   0.261       -0.511   16.939*   1, 48
NFFTOBF          ENDFREQ*        0.523   0.274   0.152        0.464   8.849*    2, 47
                 LENMSEC*                        0.121        0.356
FFTOEU           RATIOB*         0.440   0.194   0.194        0.440   11.545*   1, 48
FFTOEF           ENDFREQ*        0.652   0.425   0.370       -0.642   17.377*   2, 47
                 SYNTYPE†                        0.055       -0.237
FFTOBU           (RATIOB)        0.250   0.062   0.062       -0.250   3.188     1, 48
FFTOBF           ENDFREQ*        0.429   0.184   0.090        0.375   5.292*    2, 47
                 LENMSEC†                        0.094        0.315

*p < 0.01. †p < 0.05.
variables appearing in three of the topic beginning tests can probably be explained by the contribution of high rising F0 to questions used as a topic-introduction technique (as well as for continuing topics in the topic end tests); and the significance of the LENMSEC variable in the two filtered topic beginning tests no doubt relates to certain listener comments that they heard shorter utterances as being topic starts. But there is no satisfactory account for the appearance of SYNTYPE in the FFTOEF results, especially since none of the prosodic variables with which syntactic structure might correlate also appeared in the regression analysis. There is also considerable disparity between the variables of the NFF vs FF tests: while the two filtered topic beginning tests have the same set of variables, the only other even partial match-up is for the filtered topic end tests, with one identical variable. Likewise, the two tests which failed to produce any significant variables are matched neither in speaker-orientation condition nor type of task involved. Given these findings, then, plus the fact that the greatest amount of variance in listener judgments accounted for in any of the tests was still less than 50% (for the FFTOEF test, r²_m = 0.425), it seems likely that factors other than those studied in the regression analysis were contributing to the judgments made by subjects for all listening tests, NFF or FF. Some suggestions in regard to these other factors will be offered in the last section.

Apart from these statistical tests, something may also be learned about how listeners responded to the selected utterance characteristics by comparing the distributions of judgments made for sets of items which share specific cues. Table VIII shows this distribution of judgments, in percentages, for each listening test and each cue type considered.
Few of the percentages listed for a particular combination of listening test and cue type strongly favor one judgment over its complement (e.g. NTF over STF), but a number of them at least show a majority for one judgment category over the other. Thus, fragments received a majority of STF and STP judgments in all of the unfiltered tests, though not in all of the filtered ones, and pragmatically incomplete phrases received this same distribution of judgments in all tests. Abrupt syntactic starts likewise received a clear majority of STP judgments in both unfiltered tests, though the other test results are very close to random. For lexically marked responses, all of the topic beginning tests show majorities of STP judgments, but for lexically marked topic introductions, only the FFTOBU test received more DTP judgments than STP, suggesting that these lexical markers are not particularly strong cues for new topics. Syntactically and intonationally marked questions, however, are strong topic continuation cues, even in the filtered tests (showing the contribution of rising F0); they also frequently seem to signal new topics, at least in the unfiltered condition. Overall, what trends do appear seem to be stronger in the NFF tests (with one or two notable exceptions) than in the FF tests.

For items sharing particular intonation contour types, even fewer trends are apparent. The only clear majorities are STF and STP judgments occurring for scattered contour categories; this makes sense for level and rising intonation with regard to STF judgments, but not for falling F0 and laryngealization. However, only the trend for rising F0 seems to be truly general, extending over both filtering conditions in both speaker-orientation conditions, and the probable reason for this pattern is the use of rising intonation to signal questions (see above).
Therefore, from this analysis it seems likely that in the other cases where high percentages result, factors other than contour type were responsible for the way subjects reacted to the test items. This statement also applies to the judgment patterns found for beginning and ending F 0 range, with the probable exception of the high ending F 0 range results. Since this category often occurs with rising intonation in questions, it is not surprising that a majority of STF
Table VIII
Distribution of listening test judgments (%) by cue categories
Columns: No. of items* (NFF/FF); NFFTOEU (NTF, STF); NFFTOEF (NTF, STF); NFFTOBU (DTP, STP); NFFTOBF (DTP, STP); FFTOEU (NTF, STF); FFTOEF (NTF, STF); FFTOBU (DTP, STP); FFTOBF (DTP, STP)
Rows, by cue type: Syntax† (Fragments; Prag. Inc. Phrases; Ab. Synt. Starts; Lex. Mkd Responses; Lex. Mkd Topic Introductions; Questions); Intonation contour (Laryngeal.; Level F 0; Falling F 0; Rising F 0); Beginning F 0 range (Low; Mid; High); Ending F 0 range (Low; Mid; High)
*The number of items listed for each speaker-orientation condition indicates how many items out of 50 shared a particular characteristic in the conversation recorded in that speaker-orientation condition.
†The syntactic categories listed here are not exhaustive, and some items may be classified as belonging to more than one category.
judgments occurs for items ending with high F 0 in each topic end test. Note, however, that for high beginning F 0 all of the NFF tests had large majorities of STF or STP judgments, the latter seemingly running counter to the findings about F 0 increases at boundaries by Brown et al. (1980), Kreiman (1982) and others. The FF tests show similar results, though the percentages are not as great as for the NFF tests. Possible reasons for this discrepancy between the present findings and the results of other researchers will be discussed in the concluding section.
Conclusions
It is quite clear from the observable relationships between test items and listener judgments described above that few characteristics actually serve as topic boundary cues in either perception or production. Even those yielding the strongest results, notably rising F 0, other question markers, fragments and other syntactically incomplete utterances, are not totally dependable topic cues: they do not occur consistently where expected in production, nor do they signal the expected boundary to listeners every time they do occur. This is even truer of the intonation characteristics studied than of the syntactic and lexical features (as may be seen by comparing the results for the unfiltered tests with those for the filtered tests). In fact, the low number of items on the filtered tests which received significant numbers of agreements (12% of all filtered items) strongly suggests that prosodic information by itself is for the most part incapable of indicating topic status to conversationalists. This finding is in contrast to the role of intonation in signaling turn boundaries, since such information by itself can at least sometimes lead listeners to agree on turn status even if their judgment changes radically when syntactic and lexical information is also present (see Schaffer, 1982, 1983).
No doubt this disparity reflects differences in the place of turn and topic management in the overall organization of conversation, as well as in how each one is signalled (or perceived); this will be taken up again below.
Likewise, while some differences in the results appeared in the NFF vs the FF tests, they were neither consistent nor significant enough to permit generalizations about the effect of speaker-orientation condition on listener judgments of topic status. That is to say, any patterns which extended over listening tests differing only in speaker-orientation condition (e.g. the matching regression variables for the NFFTOBF and FFTOBF tests) are outweighed by the lack of consistent patterns in other results. On the other hand, all of the Pearson's r correlations between such matching tests (NFFTOEU-FFTOEU, etc.) were significant at least at the 0.05 level, so that subjects must have been using at least some of the same general criteria in making their judgments. Overall, though, no basis is provided here for claiming that NFF conversationalists provide more auditory cues to topic management than do FF conversationalists.³ There is similarly no proof that the differences in the results for the NFF and FF tests were due specifically to differences in speaker-orientation condition; it is possible that such differences would arise between every pair of conversations, simply because each conversation is unique and subject to a great deal of variation (see below).
Perhaps the most surprising result, given earlier research, was the failure of high beginning F 0 to mark the start of new topics. Not only did a majority of the test items in both NFF and FF conditions actually begin with high F 0 (see Table VIII), itself an unexpected finding, but these items also received a majority of STP judgments in all tests, directly counter to predictions (this is especially true of the NFF tests).
The only support for any kind of correlation between high starting F 0 and topic beginnings that can be found here at all comes from the fact that all major topic switches in the original conversations did begin with high F 0; however, since most other utterances also did so, as just mentioned, this F 0 characteristic can hardly be claimed to "mark" new topics either in production or, as seen from the listening test results, in perception.
³ Similar conclusions were drawn by Schaffer (1982, 1983), and Cook and Lalljee (1972), for turn-taking.
This conclusion leaves two things to be explained: first, why there were so many utterances in these conversations which started with high F 0, and second, why this intonation characteristic did not function here as do F 0 increases as reported in other research for both topic and paragraph organization in production (Lehiste, 1979; Goudie, 1979; Brown et al., 1980; Kreiman, 1982; Menn & Boyce, 1982). I believe that the answers to both questions depend on the nature of the speech materials used by these researchers and by me. My data were taken from two-party conversations which were quite lively and involved many short turns. Lehiste, on the other hand, used a prompted monologue as a means of obtaining speech with paragraph structure; Goudie studied newscasts and religious prophecies, both very different from natural conversation; and while Kreiman did employ a two-party conversation, this was by necessity so structured that intra-speaker paragraphs would also appear (since listeners were judging sentence and paragraph boundaries, these had to be present within a speaker's turn, or subjects would confound turn boundaries with the other two types). Brown et al. used a variety of materials, including read texts and two-party interviews, and presumably these would also involve longer turns at talk; and Menn & Boyce used adult-child interactions, which may or may not be similar in structure to adult-adult conversations.
I would like to suggest that it is this difference in paragraph structure which at least partly explains the frequency of high F 0 at utterance beginnings in my conversations, and its weakness as a topic beginning cue. That is, in stretches of speech produced by one speaker which are long enough to have paragraph structure, it is likely that paragraph and topic boundaries would coincide and be marked by some of the same cues, among which might be increased F 0 for the start of new units. In my own recorded conversations, however, turns were so short that paragraph boundaries were virtually non-existent, and thus topic boundaries occurred both at turn boundaries and within turns but not at paragraph boundaries. If high or increased F 0 is a marker for new paragraphs, as both Lehiste and Kreiman found, then it makes sense that it would also mark new topics when the latter coincide with new paragraphs. But where paragraphs do not occur, then high F 0 might not necessarily mark topic boundaries, especially if it occurs frequently at other types of boundaries, namely turn boundaries. This hypothesis about the different rates of occurrence of high F 0 at turn, topic and paragraph boundaries should be easily testable through instrumental observation. Moreover, in all of the previous studies the speech materials analysed were continuous, so that cues both before and after boundaries could have interacted with one another. Since my listening tests employed isolated utterances, they restricted the number and location of cues available to subjects, and thus the results may give a better idea of the strength of individual speech characteristics as cues. That is, perhaps high or increased F 0 is only one cue to topic starts, and a relatively minor one by itself.
Its primary function, furthermore, may be to signal paragraphs rather than topics, and the paragraph may simply be less common in more spontaneous and unstructured types of speech than in speech which is close to writing in structure. Of course, it is also possible that high starting F 0 would simply not be perceived as part of an F 0 increase unless the F 0 value at the end of the preceding utterance were also heard, allowing the actual F 0 change to be evaluated. My reasoning in comparing high starting F 0 with F 0 increases was that in isolated utterances, high starting F 0 was more likely to suggest increased pitch to listeners than was mid or low starting F 0, but perhaps only the actual F 0 increase can serve as a strong cue to new topics. This, too, should be empirically testable.
But in general, at least in two-party conversations where neither speaker dominates, perhaps topic management makes less use of intonation than turn taking does, the former depending most of all on the content of the conversation as conveyed through lexical and syntactic information. Turn taking, on the other hand, is more independent of the conversational content; its primary function is to build the external structure of the conversation, that is, to control the smooth apportionment of the floor. However, the evidence for the importance of intonation in regulating turn taking is not much greater than for topic management (see Schaffer, 1982, 1983), so it is probable that intonation is most useful in delimiting the syntactic units of speech rather than discourse units, and even then more in formal speech than in casual, interactive conversation. Overall, though, the lack of correspondence between intonation cues and syntactic/lexical cues in signaling turn vs topic boundaries as well as sentence vs paragraph boundaries (Lehiste, 1979; Kreiman, 1982) reinforces the independence of each of these systems from the others.
Three explanations seem to be possible for the low number of listening tests receiving significant numbers of agreements from subjects about topic boundaries. The most obvious is that those cues which were present in the isolated utterances, including intonation characteristics, simply were not strong enough to signal topic changes or continuations for most of the utterances and most of the subjects.
No doubt this was true at least part of the time, and for some of the reasons already discussed, such as the importance of lexical and syntactic information in determining topic status. This problem relates to a larger issue, namely that of the importance of analysing speech within the total context in which it is uttered. Researchers such as Gunter (1972), Schegloff (1982) and a number of others have stressed the necessity of studying conversational phenomena within their actual contexts in order to have available all information which could possibly be affecting those conversationalists producing the phenomena. Isolating an object of study from its context distorts the analysis, since some contributing factors may be excluded from consideration in this way. One obvious candidate for such a neglected cue in the current tests is inter-utterance pause, which has been stressed as an important organizing factor in conversation (Erickson, 1982; Scollon, 1982). The only instance where this isolation from context might be acceptable is when the goal of the researcher is to determine the nature of some phenomenon in isolation. Since my aim in this study was to focus on the ability of intonation characteristics to function as topic boundary cues for listeners just when these cues are located at the beginnings or ends of utterances, I feel that this limitation is acceptable here. But such a study can only be a first step in discovering how intonation is used by speakers in an ongoing conversation, where other cues may interact with or overpower the ones under investigation, and where cues on both sides of the boundaries may be required to signal the boundaries successfully. In fact, as I have said elsewhere (Schaffer, 1983), the information present throughout a natural conversation may very well render many organizational cues redundant, regardless of whether the cues are verbal, vocal or visual, and regardless of whether the interaction is FF or NFF.
However, even beyond the contribution of contextual information, it seems likely that two other factors were also partially responsible for the test results. One is the presence in some test utterances of other, sometimes conflicting cues which were not controlled for in analysing subject responses; these could include changes in rhythm or speech rate (especially pre-boundary lengthening; see Lehiste, 1975), and changes in amplitude (Kreiman, 1982; Meltzer, Morris & Hayes, 1971; Goldberg, 1978). In fact, more than one of the subjects of my listening tests remarked that utterances with slow or slowing speech rate sounded as if they were "winding down", i.e. coming to an end of the current topic.
I am inclined to believe, however, that an even more basic cause of the current results is the overriding optionality which exists at every level of conversational organization. Speakers and listeners have a choice at every moment in the conversation as to what behaviors to produce, as well as how to interpret the behaviors of others participating in the interaction. This has been shown to be true for a variety of conversational units, such as paragraphs and turns (see, for example, Sacks, Schegloff & Jefferson, 1974; Duncan & Fiske, 1977; Kreiman, 1982; Oreström, 1982; Schaffer, 1983), and even for topics, as far as speech production is concerned (Brown et al., 1980). The present study confirms this optionality for both the production and perception of topic management cues: such cues do occur, but not always; nor will they always be interpreted in the same way, especially when all other information in the surrounding conversational context is taken into account. This may be especially true of prosodic cues, since for topic management it is undoubtedly the content which gives the most information about the direction and organization of topics, and since prosodic phenomena may perform several different functions at various times, thus leaving open the possibility of misinterpretation and/or confusion. It seems clear that in natural conversation the context will resolve an overwhelming majority of such potential problems of interpretation, but probably not all, since occasional difficulties in handling topic transitions do occur.
Therefore, in regard to this experiment, we have some indication of the relatively minor role intonation cues play in topic management, but we need to test these results in the total context of conversations, and for conversations where the variables involved can be more rigorously controlled (e.g. by using synthetic speech), before we can fully understand how intonation functions in ongoing natural conversation. This applies not only to topic management, but to all other aspects of conversational organization as well.
A shorter version of this paper was presented at the annual winter meeting of the Linguistic Society of America, Minneapolis, 27-30 December 1983. I would like to thank Professors Ilse Lehiste, Robert Fox and Arnold Zwicky, my dissertation committee, for their help with the research presented here; Dr Rachel Schaffer, Dr Nancy Levin, Jean Godby and other friends and colleagues for their many different kinds of support; and the numerous students and instructors of The Ohio State University's Introduction to Language course who acted as subjects for the experiments discussed here. These last are ultimately responsible for the results herein reported, but naturally I am responsible for any inaccuracies in their interpretation.
References
Beattie, G. (1981). The regulation of speaker turns in face-to-face conversation: some implications for conversation in sound-
Duncan, S. & Fiske, D. (1977). Face-to-Face Interaction: Research, Methods, and Theory. Hillsdale, New Jersey: Lawrence Erlbaum Associates.
Erickson, F. (1982). Money tree, lasagna bush, salt and pepper: social construction of topical cohesion in a conversation among Italian-Americans. In: Analyzing Discourse: Text and Talk. Georgetown University Roundtable 1981 (Tannen, D., ed.), pp. 43-70. Washington, D.C.: Georgetown University Press.
French, P. & Local, J. (1982). The prosodic structure of interruptions in English conversation. Paper presented at the British Association for Applied Linguistics Seminar on Intonation and Discourse, University of Aston at Birmingham, 5-7 April 1982.
Goldberg, J. (1978). Amplitude shift: A mechanism for the affiliation of utterances in conversational interaction. In: Studies in the Organization of Conversational Interaction (Schenkein, J., ed.), pp. 199-218. New York: Academic Press.
Goudie, K. (1979). A Study of Intonation and Pause in a Group of Expository Monologues. Ph.D. Dissertation, University of Michigan, Ann Arbor, Michigan.
Gunter, R. (1972). Intonation and relevance. In Intonation: Selected Readings (Bolinger, D., ed.), pp. 194-215. Harmondsworth: Penguin.
Kantor, R. (1977). The Management and Comprehension of Discourse Connection by Pronouns in English. Ph.D. Dissertation, Ohio State University, Columbus, Ohio.
Keenan, E. & Schieffelin, B. (1976). Topic as a discourse notion: A study of topic in the conversations of children and adults. In Subject and Topic (Li, C., ed.), pp. 335-384. New York: Academic Press.
Kreiman, J. (1982). Perception of sentence and paragraph boundaries in natural conversation. Journal of Phonetics, 10, 163-175.
Kuno, S. (1976). Subject, theme, and speaker's empathy - a reexamination of relativization phenomena. In Subject and Topic (Li, C., ed.), pp. 417-444. New York: Academic Press.
Lehiste, I. (1975). The phonetic structure of paragraphs. In Structure and Process in Speech Perception: Proceedings of the Symposium on Dynamic Aspects of Speech Production (Cohen, A. & Nooteboom, S., eds), pp. 195-206. New York: Springer-Verlag.
Lehiste, I. (1979). Sentence boundaries and paragraph boundaries - perceptual evidence. In The Elements: A Parasession on Linguistic Units and Levels, pp. 99-109. Chicago: Chicago Linguistic Society.
Li, C. & Thompson, S. (1976). Subject and topic: A new typology of language. In Subject and Topic (Li, C., ed.), pp. 457-490. New York: Academic Press.
Meltzer, L., Morris, W. & Hayes, D. (1971). Interruption outcomes and vocal amplitude: Explorations in social psychophysics. Journal of Personality and Social Psychology, 18, 392-402.
Menn, L. & Boyce, S. (1982). Fundamental frequency and discourse structure. Language and Speech, 25, 341-383.
Nie, N., Hull, C., Jenkins, J., Steinbrenner, K. & Bent, D. (1975). Statistical Package for the Social Sciences. New York: McGraw-Hill.
Oreström, B. (1982). When is it my turn to speak? In Impromptu Speech: A Symposium (Enkvist, N., ed.), pp. 267-276. Turku (Åbo), Finland: Publication of the Research Institute of the Åbo Akademi Foundation, No. 78.
Sacks, H., Schegloff, E. & Jefferson, G. (1974). A simplest systematics for the organization of turn taking for conversation. Language, 50, 696-735. [Reprinted in Studies in the Organization of Conversational Interaction (Schenkein, J., ed.), pp. 7-55. New York: Academic Press.]
Schaffer, D. (1982). Intonation Cues to Management in Natural Conversation. Ph.D. Dissertation, Ohio State University, Columbus, Ohio.
Schaffer, D. (1983). The role of intonation as a cue to turn taking in conversation. Journal of Phonetics, 11, 243-257.
Schegloff, E. (1982). Discourse as an interactional achievement: Some uses of "uh huh" and other things that come between sentences. In: Analyzing Discourse: Text and Talk. Georgetown University Roundtable 1981 (Tannen, D., ed.), pp. 71-93. Washington, D.C.: Georgetown University Press.
Scollon, R. (1982). The rhythmic integration of ordinary talk. In: Analyzing Discourse: Text and Talk. Georgetown University Roundtable 1981 (Tannen, D., ed.), pp. 335-349. Washington, D.C.: Georgetown University Press.