The role of intonation as a cue to turn taking in conversation

Journal of Phonetics (1983) 11, 243-257

Deborah Schaffer
Department of Linguistics, The Ohio State University, Columbus, Ohio 43210, U.S.A.

Received 9th March 1983

Abstract:

The sociolinguistic literature concerning turn taking in conversation is extensive, but also limited in certain respects. Most studies have focused on kinesic behaviors and semantic/pragmatic devices for maintaining the flow of conversations, while few have investigated the role of prosody, especially in non-face-to-face turn taking. Moreover, most researchers describe the production of conversational behaviors rather than their perception by conversationalists. The present study attempts to fill these gaps through a series of listening tests incorporating both face-to-face (FF) and non-face-to-face (NFF) conversational excerpts, in order to discover how intonation is used as a perceptual cue for turn taking. Utterances isolated from these conversations were used to construct two test tapes; each set was also filtered so that the utterances were unintelligible but retained some prosodic information, notably intonation. Subjects then made turn beginning and turn end judgments for each item on the four resulting tapes. The findings show a great amount of variability in listener use of intonation as a cue to both FF and NFF speaker status, with rising fundamental frequency the strongest cue (to turn ends) in both conditions. The results also illustrate the highly interactive nature of prosody and other types of cues (e.g. syntactic and contextual information).

0095-4470/83/030243+15 $03.00/0 © 1983 Academic Press Inc. (London) Ltd.

Introduction

The phenomenon of turn taking in conversation has an extensive history of research behind it. Most of this research is sociological (see Kendon, 1967; Sacks, Schegloff & Jefferson, 1974; Duncan & Fiske, 1977; and many others), and most of it concentrates on verbal (syntactic and lexical) or visual (kinesic) cues in the production of conversations. In fact, even the majority of studies which discuss prosodic cues (including Allen & Guy, 1974; Duncan & Fiske, 1977) still do so from the point of view of what speaker behaviors can be observed, without verifying that the "cues" so identified are actually used by listeners in the way predicted. A major exception to this is the work of Anne Cutler and her colleagues (cf. Cutler, Pearson & Beattie, 1982; Cutler & Pearson, 1983); they combine analysis of speech production with judgments gained from listening tests in order to achieve a fuller picture of what prosodic behaviors consistently function as cues to turn taking. However, even these researchers have not dealt with another very important issue, that of the relationship between visual and auditory turn taking cues in face-to-face (FF) as opposed to non-face-to-face (NFF) conversations. NFF interactions lack visual cues present in FF conversations, but are still carried on without noticeable difficulty. Almost all of the turn taking and/or conversation studies of which I am aware use FF conversations, and the exceptions which deal with NFF conversations at least part of the time (Kasl & Mahl, 1965; Jaffe & Feldstein, 1970; Cook & Lalljee, 1972; Allen & Guy, 1974; Butterworth, Hine & Brady, 1977; Beattie, 1981; Kreiman, 1982) either do not make an explicit comparison between the NFF behaviors and what happens in FF conversations, or do not do so from the listener's point of view.

The aim of this study is to make a start towards establishing what prosodic cues listeners make use of in determining the ends and beginnings of turns, and how these may differ in FF vs NFF conversations. Specifically, I would like to determine the role of intonation as a cue to turn taking in these two situations; while intonation has been shown to play a definite role in signaling sentence and paragraph boundaries, among other syntactic and discourse units (cf. Lehiste, 1979; Kreiman, 1982), its importance to turn taking in American English, at least, has not been investigated in any detail. Duncan & Fiske (1977) and others mention only very general intonation characteristics, e.g. "non-level intonation contours", as turn end cues, but it is possible that there are finer distinctions in how different aspects of intonation function in signaling not only turn ends, but also turn beginnings and turn continuations. It would therefore increase our understanding of how conversations are managed (especially when comparing FF and NFF conversations, as Butterworth et al., 1977, themselves suggest) to study more closely the role which intonation takes in the perception of turn boundaries, and eventually to integrate its role with that of other auditory and visual cues.

Procedure

In order to discover how listeners make use of intonation in judging speaking turn boundaries, four tapes were constructed and presented to different sets of subjects, who acted as potential conversationalists. First, two conversations between the same pair of female speakers were recorded: one was face-to-face and the other was non-face-to-face, with the conversationalists sitting back-to-back. These are the two speaker-orientation conditions listed in Table I. Fifty excerpts (sentences, phrases, or single words) from each conversation were isolated and randomly ordered on two separate test tapes (one using the NFF utterances and one using the FF utterances). Two additional tapes were constructed using the same sets of utterances after they had been band-pass filtered so as to render them unintelligible by removing part of the speech signal, but with most aspects of prosody (including intonation patterns) preserved.

Table I. Test and judgment categories

                                               Filtering condition
Speaker orientation     Task                   Unfiltered  Filtered  Judgment type
Non-face-to-face (NFF)  Turn end (TUE)         NFFTUEU     NFFTUEF   "New speaker follows" (NSF) judgments / "Same speaker follows" (SSF) judgments
Non-face-to-face (NFF)  Turn beginning (TUB)   NFFTUBU     NFFTUBF   "Different speaker precedes" (DSP) judgments / "Same speaker precedes" (SSP) judgments
Face-to-face (FF)       Turn end (TUE)         FFTUEU      FFTUEF    NSF judgments / SSF judgments
Face-to-face (FF)       Turn beginning (TUB)   FFTUBU      FFTUBF    DSP judgments / SSP judgments
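The test labels in Table I are compositional: a speaker-orientation code, a task code and a filtering code are concatenated. The following is a minimal sketch of that naming scheme; the function and variable names are illustrative only and do not come from the study.

```python
from itertools import product

# Building blocks of the test labels, as listed in Table I.
ORIENTATIONS = ["NFF", "FF"]   # non-face-to-face, face-to-face
TASKS = ["TUE", "TUB"]         # turn end, turn beginning
FILTERS = ["U", "F"]           # unfiltered, filtered

# The two response options offered to listeners for each task.
JUDGMENTS = {
    "TUE": ("NSF", "SSF"),  # new / same speaker follows
    "TUB": ("DSP", "SSP"),  # different / same speaker precedes
}


def listening_tests():
    """Map each listening-test label to its pair of judgment types."""
    return {
        orientation + task + filt: JUDGMENTS[task]
        for orientation, task, filt in product(ORIENTATIONS, TASKS, FILTERS)
    }


if __name__ == "__main__":
    for label, options in listening_tests().items():
        print(label, options)
```

Two orientations, two tasks and two filtering conditions give the eight listening tests used throughout the paper.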

The original conversations were recorded on a TEAC 40-4 Tascam Series four-track tape recorder, with the speakers sitting in an anechoic chamber. The finished test tapes were presented to subjects in a classroom setting, using either a Tandberg Series 15 two-track tape recorder or a Uher 4000 Report-L tape recorder. For the filtered tapes, the low-pass and high-pass cut-off points were 310 Hz and 260 Hz, respectively; since two filters were employed in tandem, the overall decay function was twelve dB per octave. The resulting quality of sound was definitely speech-like, although unintelligible. For some objections to the use of filtered speech, and responses to these objections, see Kreiman (1982) and Schaffer (1982).

Subjects hearing a particular tape were asked to make one of two types of judgments for each of the fifty utterances on the tape. One type of judgment was to decide whether the speaker of the utterance which followed the current test item in the original conversation had been the same speaker as the one who produced the current test item, or whether a new speaker had produced the following utterance; the other type was to make the same sort of judgment, but for the speakers of the utterances preceding each test item in the original conversation. There were thus eight listening tests in all. Table I shows the test categories, plus the types of judgments to be made for each test, and the labels used here for both test and judgment categories.

The listening tests were presented to groups of subjects ranging in number from 20 to 31 (see Table II). The subjects were told to make their judgments as best they could by finding whatever information was available in each isolated test item; if they had no strong intuitions (especially for the filtered test tapes), they were to guess. Each tape included four sample utterances for practice, and a period of questions and clarifications was set aside before each listening test began.

Table II. Listening test sample size and number of significant agreements

Listening test     Sample size   Number of agreements     Percent equivalent
                                 reaching significance
FFTUEU, FFTUEF     31            24                       77.4
NFFTUBF            29            22                       75.9
NFFTUEU            25            20                       80.0
FFTUBU             24            19                       79.2
NFFTUBU, FFTUBF    22            18                       81.8
NFFTUEF            20            17                       85.0

It was expected that those utterances with strong turn end, beginning, or continuation cues would receive a large number of the same type of judgment from subjects hearing them, and that these cues would be identifiable through inspection of the utterances themselves, along with comparisons made between utterances with similar characteristics and patterns of subject responses. For the unfiltered test utterances, verbal as well as prosodic information should influence the judgments of the listeners, but for the filtered tests only prosodic cues would be available. The judgments made for the test items could then also be compared between the unfiltered and filtered versions of the utterances, and the usefulness of intonation as a predictor of listener judgments when interacting with verbal cues could be contrasted with its effectiveness when only other prosodic information was present. For more discussion of the design and purpose of these experiments, see Schaffer (1982).

Cues

In analyzing the results of the listening tests, the utterances were categorized according to the characteristics they possessed which were considered to be possible turn taking cues. These were of two types: syntactic/lexical characteristics and intonation characteristics. First, each unfiltered test utterance was classed either as (a) a sentence, having complete constituent structure, including a subject, verb and possible objects (for instance, "even bright people aren't getting jobs"); (b) a phrase, consisting of a complete constituent which lacks some unit necessary for a complete sentence (for example, a noun phrase such as "more parts of the tape"); or (c) a fragment, ending within a constituent, e.g. after a definite article (as in "who else is invited to the"). Some of the phrases either syntactically or lexically gave the impression that the speaker had more to say (e.g. "oh I just heard"); these were called pragmatically incomplete phrases. Likewise some of the phrases or fragments began in what appeared to be the middles of sentences, given their lexical and syntactic structure (for example, "although that would sound like more fun"), and these were labeled abrupt syntactic starts. A different category of utterance, lexically marked responses, began with lexical items indicating responses to questions or statements (primarily "yeah"), while other test items were themselves syntactically or intonationally marked questions.

For intonation, a number of different phenomena were analyzed. Narrow- and wide-band spectrograms were made for each test utterance, and from these fundamental frequency (F0) values were taken of every measurable sonorant. The F0 values of the last measurable point in each utterance and for the last stressed syllable preceding the endpoint were used in calculating the degree of change in F0 at utterance ends, leading to classifications of intonation contour types.
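The contour labels introduced here, and the F0 range labels defined in the paragraph that follows, amount to two small threshold rules. The sketch below restates them in code under stated assumptions: the function names are hypothetical, the example frequencies are invented for illustration, and the range rule assumes that a speaker's mean stressed-vowel F0 lies above the mean unstressed-vowel F0.

```python
import math

FALL_THRESHOLD = 1.06  # ratio at or above this: falling contour
RISE_THRESHOLD = 0.94  # ratio at or below this: rising contour


def f0_ratio(last_stressed_hz, final_hz):
    """F0 of the last stressed syllable divided by F0 of the last measured point."""
    return last_stressed_hz / final_hz


def semitones(ratio):
    """Express a frequency ratio in semitones (12 per octave)."""
    return 12 * math.log2(ratio)


def contour(last_stressed_hz, final_hz, laryngealized=False):
    """Classify an utterance-final contour as rising, falling, level or laryngealized."""
    ratio = f0_ratio(last_stressed_hz, final_hz)
    if ratio <= RISE_THRESHOLD:
        return "rising"
    shape = "falling" if ratio >= FALL_THRESHOLD else "level"
    # Falling or level items ending in laryngealization form their own category.
    return "laryngealized" if laryngealized else shape


def f0_range(value_hz, mean_stressed_hz, mean_unstressed_hz):
    """Place one F0 value in the speaker's high, mid or low region."""
    if value_hz > mean_stressed_hz:
        return "high"
    if value_hz < mean_unstressed_hz:
        return "low"
    return "mid"


if __name__ == "__main__":
    # Invented values, for illustration only.
    print(contour(220.0, 200.0))          # ratio 1.10
    print(contour(200.0, 220.0))          # ratio of about 0.91
    print(f0_range(250.0, 220.0, 180.0))  # above the stressed-vowel mean
    print(round(semitones(1.06), 2))      # size of the fall threshold in semitones
```

A ratio of 1.06 corresponds to almost exactly one semitone, which is why the same threshold can be read either as a ratio or as a one-semitone step.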
Thus, items had falling contours if the ratio of the F0 value of the last stressed syllable to that of the last measured syllable was 1.06 or greater (one semitone or more), rising contours if the ratio was 0.94 or less (again, a one-semitone difference), and level contours if the ratio was between 0.94 and 1.06; if falling or level items also had laryngealization at their ends, they were placed in a separate category, since laryngealization has been associated with unit ends in other research (Lehiste, 1975; Kreiman, 1982; etc.). In addition, the average F0 of the stressed vowels and unstressed vowels for each speaker in each conversation was also calculated and used to divide each speaker's F0 range for each conversation into high, mid and low regions (Brown, Currie & Kenworthy, 1980; Schaffer, 1982). Each measured F0 value of an utterance could then be placed somewhere in that range. Since other researchers have found increases in F0 to mark paragraph and topic starts in speech (Lehiste, 1975; Brown et al., 1980), the possibility of higher F0 starts for turn beginnings was considered worth investigating, along with the possibility of other uses of speaker's F0 range as turn cues. Therefore, the beginning and ending F0 measurements of each test item were classed either as high (greater than the average F0 found for stressed syllables), low (below the average F0 found for unstressed syllables), or mid (between the two average F0 values).

Results

The number of listener agreements for a particular judgment about each item on each test was determined; then the number of these agreements which would be significantly different from chance at the 0.01 level was calculated for each test through the χ² approximation to the binomial distribution, with Yates' correction. The minimum numbers of agreements for each item which reached significance, as shown in Table II, range from 17 out of 20 subjects (for the NFFTUEF test) to 24 out of 31 (for both FF turn end tests).
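The thresholds in Table II can be recovered with a short calculation. The sketch below is one reading of the procedure, not a record of the original computation: it assumes a chance split of 50/50 between the two response options, one degree of freedom, and the standard critical value of 6.635 for the 0.01 level. The function names are hypothetical.

```python
import math

CHI2_CRITICAL_01 = 6.635  # chi-square critical value, df = 1, alpha = 0.01


def yates_chi2(agreements, n):
    """Yates-corrected chi-square for `agreements` out of n listeners vs a 50/50 split."""
    expected = n / 2
    deviation = max(abs(agreements - expected) - 0.5, 0.0)
    # Both response cells deviate from n/2 by the same amount, hence the factor 2.
    return 2 * deviation ** 2 / expected


def min_significant_agreements(n, critical=CHI2_CRITICAL_01):
    """Smallest majority size whose corrected chi-square exceeds the critical value."""
    for k in range(math.ceil(n / 2), n + 1):
        if yates_chi2(k, n) > critical:
            return k
    return None  # no majority of this sample size reaches significance


if __name__ == "__main__":
    for n in (31, 29, 25, 24, 22, 20):
        k = min_significant_agreements(n)
        print(n, k, round(100 * k / n, 1))
```

Under these assumptions the six sample sizes in Table II yield minimum agreement counts of 24, 22, 20, 19, 18 and 17, matching the table.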

The results for the 50 items on each listening test were then examined to see which ones received significant numbers of listener agreements. A total of 122 out of the 400 items on all eight tests, or 31%, received significant numbers of judgments: Table III shows that the greatest number of items reaching significant levels was 26 out of 50, for the FFTUEU test, while the smallest number was two out of 50 for the FFTUBF test. Of the 122 significant items, 65 were FF items and 57 were NFF items; 85 were unfiltered items, while the remaining 37 were filtered; 69 received judgments for a new or different speaker following or preceding the item, while 53 received judgments for the same speaker following or preceding; and 73 of the items were on turn end tests, with the other 49 on the turn beginning tests.

Table III. Distribution of significant items over listening tests (a)

Listening test   Number of significant items:             Total
                 same speaker / new or different speaker
NFFTUEU          10/10                                     20
NFFTUEF          5/2                                       7
NFFTUBU          4/18                                      22
NFFTUBF          3/5                                       8
FFTUEU           13/13                                     26
FFTUEF           11/9                                      20
FFTUBU           6/11                                      17
FFTUBF           1/1                                       2
Total            53/69                                     122

a Fifty items per test; 400 total.
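The breakdowns quoted in the text (FF vs NFF, unfiltered vs filtered, turn end vs turn beginning) are marginal totals of Table III. A small sketch that recomputes them from the table follows; the counts are transcribed from Table III, while the helper name is illustrative only.

```python
# (same-speaker, new-or-different-speaker) counts of significant items, from Table III.
SIGNIFICANT_ITEMS = {
    "NFFTUEU": (10, 10), "NFFTUEF": (5, 2), "NFFTUBU": (4, 18), "NFFTUBF": (3, 5),
    "FFTUEU": (13, 13), "FFTUEF": (11, 9), "FFTUBU": (6, 11), "FFTUBF": (1, 1),
}


def total_where(predicate):
    """Sum the significant items over all tests whose label satisfies `predicate`."""
    return sum(
        same + new
        for label, (same, new) in SIGNIFICANT_ITEMS.items()
        if predicate(label)
    )


total = total_where(lambda label: True)
face_to_face = total_where(lambda label: label.startswith("FF"))
unfiltered = total_where(lambda label: label.endswith("U"))
turn_end = total_where(lambda label: "TUE" in label)
same_speaker = sum(same for same, _ in SIGNIFICANT_ITEMS.values())

if __name__ == "__main__":
    print("total:", total, "of 400 =", 100 * total / 400, "percent")
    print("FF / NFF:", face_to_face, total - face_to_face)
    print("unfiltered / filtered:", unfiltered, total - unfiltered)
    print("turn end / turn beginning:", turn_end, total - turn_end)
    print("same / new or different speaker:", same_speaker, total - same_speaker)
```

The exact proportion is 30.5%, which the text reports as 31%.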

In addition, only ten of the 122 items received unanimous listener judgments - nine NFF utterances and one FF utterance, all unfiltered.

To determine the significant patterns in these listener judgments, a number of statistical tests were performed, including χ² tests, Pearson's r correlations, and stepwise multiple linear regressions. One set of χ² tests¹ investigated the distribution of listener judgments in each listening test over the intonation contour categories (falling, rising, level, and laryngealized), the beginning and ending F0 range categories (high, mid, low), and certain of the syntactic structure categories (sentence, phrase, fragment). These cue types yielded significant χ² values for almost all of the turn end listening tests in both NFF and FF conditions. For the turn beginning listening tests, however, fewer cue categories resulted in significant χ² values, with more for the NFF tests than the FF (in fact, syntactic structure yielded no significant values in the latter set). Rising F0 and high ending F0 led to greater numbers of NSF judgments than expected for both filtered and unfiltered turn end tests, while level F0 and fragments led to fewer NSF judgments than expected for most of these tests. The other cues yielded more irregular results, especially the remaining F0 range categories.

Other χ² tests compared how these cue categories affected judgments for listening tests differing only in the speaker-orientation condition (i.e. the NFFTUEU and FFTUEU test results were compared for each set of cues, and so on for the other test pairs). Most of the results were extremely significant (for all cue types, in fact, except ending F0 range for the comparison of all NFF and FF filtered tests), showing that the subjects of the NFF tests reacted quite differently from those of the FF test counterparts even when dealing with the same sorts of cues. However, the specific categories of fragments, rising F0, low beginning F0 range and mid ending F0 range appear to function similarly in both NFF and FF conditions, based on the χ² results for these cues.

¹ Because of the large numbers of χ² tests calculated for each listening test, and their relative lack of statistical power, the χ² results will only be summarized here in general terms; please refer to Schaffer (1982) for a complete discussion.

Pearson's r correlations between listener judgments for the test items and the F0 ratio values of these items were calculated to determine whether the degree of F0 rise or fall might have some effect on how strong a turn end or beginning cue the F0 change would be. That is, if a significant positive correlation between ratios of 1.06 or greater and numbers of new/different speaker judgments was found, this would mean that larger F0 falls resulted in more of these judgments than did smaller falls; the same would be true for a significant negative correlation between ratios of 0.94 or less (for intonation rises) and the numbers of such listener judgments. Only three of these sixteen tests yielded significant r values (Table IV), two of them being negative correlations for NSF judgments and rising F0 ratios, as predicted, and one a positive correlation of F0 fall ratios and DSP judgments. Of the first two significant r values, one was for the NFFTUEF test and one for the FFTUEU test, so no claim can be made that either filtering condition or speaker-orientation condition affects this use of rising intonation. In general, too, it does not seem that degree of F0 change has a differential effect on the strength of that change as a turn end or turn beginning cue.

Table IV. Pearson's r correlations of F0 ratio vs NSF judgments

Listening test   F0 fall ratios (1.06+) a   F0 rise ratios (0.94-) b
NFFTUEU           0.078                     -0.398
NFFTUEF           0.233                     -0.753 e
NFFTUBU          -0.306                     -0.192
NFFTUBF          -0.060                     -0.247
FFTUEU            0.079                     -0.561 c
FFTUEF            0.029                     -0.120
FFTUBU            0.467 d                   -0.044
FFTUBF            0.122                      0.248

a For the NFF tests, df = 26; for the FF tests, df = 25.
b For the NFF tests, df = 11; for the FF tests, df = 13.
c p < 0.025. d p < 0.01. e p < 0.005.
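Each entry in Table IV is a Pearson correlation between an item's F0 ratio and the number of new/different speaker judgments it received. A minimal sketch of that computation follows; the function name is illustrative and the example data are invented, not taken from the study.

```python
import math


def pearson_r(xs, ys):
    """Pearson product-moment correlation between two equal-length sequences."""
    n = len(xs)
    if n != len(ys) or n < 2:
        raise ValueError("need two sequences of equal length, at least 2 items")
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    sxx = sum((x - mean_x) ** 2 for x in xs)
    syy = sum((y - mean_y) ** 2 for y in ys)
    if sxx == 0 or syy == 0:
        raise ValueError("correlation is undefined for a constant sequence")
    return sxy / math.sqrt(sxx * syy)


if __name__ == "__main__":
    # Invented rise ratios (0.94 or less) and NSF counts, for illustration only.
    rise_ratios = [0.94, 0.90, 0.85, 0.80, 0.75]
    nsf_counts = [12, 14, 15, 19, 20]
    r = pearson_r(rise_ratios, nsf_counts)
    # Significance is assessed with n - 2 degrees of freedom.
    print(round(r, 3), "df =", len(rise_ratios) - 2)
```

In this invented example the smaller ratios (larger rises) go with more NSF judgments, so r is negative, which is the direction predicted for the rise ratios in the text.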

The judgments made for each listening test were also correlated with one another, across tests, to see if relationships existed between the various sorts of units and conditions being judged; the results are listed in Table V. Of the four test pairs which were alike except for speaker orientation (e.g. NFFTUEU and FFTUEU, etc.), the turn end test pairs (filtered and unfiltered) correlated significantly at the 0.01 level, the unfiltered turn beginning test pair correlated only at the 0.025 level, and the filtered turn beginning test pair failed to correlate significantly. It appears from this that subjects were judging turn ends in a more similar fashion in both FF and NFF conversations than they were turn beginnings in the two conditions. Two of the four test pairs differing only in filtering condition, the FF turn beginning and NFF turn end test pairs, also correlated significantly. Overall, the highest correlation was only r = 0.649 (r² = 0.421), for the NFFTUEU and FFTUEU tests, so it is evident that most of the variance in these listening test judgments is not accounted for by the nature of the tasks themselves.

Table V. Pearson's r correlations between listening test scores

[Correlation matrix over the eight listening tests (FFTUEU, FFTUEF, FFTUBU, FFTUBF, NFFTUEU, NFFTUEF, NFFTUBU, NFFTUBF); the cell layout is not recoverable from this copy. Surviving significant coefficients: 0.2966a, 0.3295b, 0.6488c, 0.2851a, 0.2695a, 0.4018b, 0.2616a, 0.2505a, 0.4125c, 0.2536a.]

a p < 0.05. b p < 0.01. c p < 0.001.

Nor is much of the variance in test results accounted for by regressing intonation and other variables against listener judgments. Eight variables were so regressed, using a computer program for stepwise multiple linear regression analysis (Nie et al., 1975). The three categorial variables, syntactic structure (SYNTYPE), beginning F0 range (BEGRNGE) and ending F0 range (ENDRNGE), were assigned values from one to three for each test item, while the beginning F0 values (BEGFREQ), ending F0 values (ENDFREQ), F0 ratio values (RATIOB), length of item in words (LENWORD) and length of item in milliseconds (LENMSEC) were all taken as measured for each test item. These variables were entered one at a time into the regression equations of each listening test, and a hierarchy of significance was established. Table VI lists only those variables which achieved significance at the 0.05 level, plus relevant statistics for the regression analysis. As can be seen, no more than three variables significantly contributed to the regression of any listening test, and three of the tests had only one significant variable apiece at p < 0.05. The NFF and FF turn end regression variables are quite similar, although the NFF tests have an extra variable in each filtering condition (LENWORD). In the unfiltered tests SYNTYPE is the most significant variable, with ENDRNGE second; in the filtered tests ENDFREQ and BEGFREQ play the greatest roles. In contrast, the NFF turn beginning tests rely more on beginning F0 characteristics, while the FF counterparts have LENMSEC as a major variable. Since ending F0 characteristics enter into the regressions of both turn end and turn beginning tests, the ending of an utterance might well be the most informative about turn status, although more evidence than this is needed to make such a claim.
In any case, for only one listening test, the NFFTUEU test, was even half of the variance in listener judgments accounted for by the variables (r = 0.754, r² = 0.569), so that other factors not considered in the regression analysis must be influencing the test results. These will be discussed later.

Table VI. Results of regression analysis

Listening test   Variable name   rm      rm²     r² change   Beta     F          df
NFFTUEU          SYNTYPE c       0.754   0.569   0.361        0.538   20.257 c   3,46
                 ENDRNGE c                       0.148       -0.408
                 LENWORD b                       0.060        0.248
NFFTUEF          ENDFREQ c       0.658   0.433   0.240        0.463   11.687 c   3,46
                 LENWORD c                       0.140        0.436
                 BEGFREQ b                       0.053        0.266
NFFTUBU          BEGRNGE b       0.322   0.104   0.104        0.322    5.547 b   1,48
NFFTUBF          BEGFREQ c       0.592   0.350   0.229        0.495   12.660 c   2,47
                 ENDRNGE c                       0.121        0.349
FFTUEU           SYNTYPE c       0.628   0.394   0.306        0.529   15.277 c   2,47
                 ENDRNGE c                       0.088       -0.298
FFTUEF           ENDFREQ c       0.608   0.369   0.316        0.477   13.749 c   2,47
                 (BEGFREQ) a                     0.053        0.246
FFTUBU           LENMSEC c       0.515   0.265   0.169       -0.486    8.471 c   2,47
                 ENDFREQ b                       0.096       -0.319
FFTUBF           LENMSEC c       0.456   0.208   0.208       -0.456   12.603 c   1,48

a Approaches p = 0.05. b p < 0.05. c p < 0.01.
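In a stepwise regression each entered variable contributes an increment of explained variance, so the r² change values for a test should add up to its overall r², and the multiple r should be the square root of that total. The sketch below checks the figures reported in Table VI against both relations; the values are transcribed from the table, while the tolerance and the function name are assumptions made here to allow for three-decimal rounding.

```python
# Per test: (multiple r, overall r squared, r squared change per entered variable),
# transcribed from Table VI.
TABLE_VI = {
    "NFFTUEU": (0.754, 0.569, [0.361, 0.148, 0.060]),
    "NFFTUEF": (0.658, 0.433, [0.240, 0.140, 0.053]),
    "NFFTUBU": (0.322, 0.104, [0.104]),
    "NFFTUBF": (0.592, 0.350, [0.229, 0.121]),
    "FFTUEU": (0.628, 0.394, [0.306, 0.088]),
    "FFTUEF": (0.608, 0.369, [0.316, 0.053]),
    "FFTUBU": (0.515, 0.265, [0.169, 0.096]),
    "FFTUBF": (0.456, 0.208, [0.208]),
}

TOLERANCE = 0.0015  # slack for values rounded to three decimals


def row_is_consistent(multiple_r, r_squared, changes):
    """True if r squared matches both the squared multiple r and the summed increments."""
    squares_match = abs(multiple_r ** 2 - r_squared) <= TOLERANCE
    increments_match = abs(sum(changes) - r_squared) <= TOLERANCE
    return squares_match and increments_match


if __name__ == "__main__":
    for label, row in TABLE_VI.items():
        print(label, row_is_consistent(*row))
```

Read this way, the r² change column also shows how much each added variable buys: in the NFFTUEU test, for example, SYNTYPE alone accounts for 0.361 of the 0.569 total.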

A final form of analysis of listener judgments involved the inspection of those judgments made for individual items and groups of items which shared syntactic, lexical or intonation characteristics. Table VII lists the categories considered, with the percent of each type of judgment received by items in each category, test by test. The number of items within each category is listed in the second column: the number before each slash represents the NFF tests, and the number after each slash represents the FF tests.

Judging from the distribution of listener judgments, fragments and pragmatically incomplete phrases were strong turn continuation cues in both NFF and FF conditions (a little more so in the NFF situation, however), and even for filtered, unintelligible utterances; intonation characteristics, e.g. level F0 contours, may have helped here. Abrupt syntactic starts received a smaller majority of SSP judgments, even unfiltered, and the FF utterances actually received more NSF judgments than SSP judgments (again, probably because other factors were involved). On the other hand, lexically marked responses did receive clear majorities of DSP judgments in both unfiltered turn beginning tests,² and unfiltered questions likewise were overwhelmingly judged to have new speakers following (this pattern holds to a lesser degree for utterances only marked prosodically, too - even in the filtered tests, unlike the syntactically marked questions). Many of the individual items displaying one or more of these characteristics (when unfiltered) also received a significant majority of the expected judgments: for instance, eight of the nine fragments received a significant number of SSF judgments. Moreover, all of the unanimously judged items displayed at least one of these syntactic and lexical characteristics.

All of these trends for the syntactic and lexical features are stronger than those discernible for items sharing intonation characteristics. Laryngealization appears to have minimal effect on any of the turn judgments made, especially with regard to turn ends, and so is not as decisive a cue for turn ends as it is for other unit ends (e.g. sentences or paragraphs). The results for falling F0 are even farther from those expected, since for the most part those tests with a clear majority of one type of judgment over the other actually have more judgments for same speaker than for new/different speaker. Certainly, then, falling F0 with or without laryngealization is not an unambiguous indication of turn ends. Level intonation distinguishes only the NFFTUEF and FFTUEU tests, signaling turn continuation, as would be expected. But this function fails to carry over to the other turn end tests, so level F0 may not be a particularly strong cue by itself. Even the rising F0 results are somewhat surprising: in the NFF tests - filtered and unfiltered - F0 rise does lead to more NSF judgments than SSF (to be expected, if listeners were interpreting rising intonation as signaling questions), but in the FF tests this pattern is much less apparent. It is true that rising intonation may signal incompleteness as well as questions, and perhaps this was the interpretation given to the five FF and two NFF test items with rising F0 which received a majority of SSF judgments. Nevertheless, a majority of the items with rising and level intonation still individually received significant numbers of NSF and SSF judgments, respectively, in both speaker-orientation conditions.

The results for beginning F0 include a clear majority of DSP judgments over SSP judgments in the NFFTUBU test, for both low and mid ranges, while high F0 range has a slight majority of SSP judgments for the NFFTUBF test.
This is surprising, given the findings of other researchers that higher starting F0 marks the beginnings of paragraphs and topics in discourse (see above). However, even these trends are not duplicated in the FF tests, and are probably due to syntactic and lexical factors in the intelligible utterances. A difference between the speech materials used by Brown et al. (1980) and Kreiman (1982), on the one hand, and my own conversations, on the other, which might account for the failure of high starting F0 to function as a turn beginning cue is the lack of paragraph structure in the latter; see Schaffer (1982) for more discussion.

² This may be because some of the items were identifiable as back channel utterances (Yngve, 1970). Back channels were not studied separately here, however, due to limitations of time and test materials.

Table VII. Distribution of listening test judgments (in percent) by cue categories

Cue type               No. of items   NFFTUEU   NFFTUEF   NFFTUBU   NFFTUBF   FFTUEU    FFTUEF    FFTUBU    FFTUBF
                       (NFF/FF)       NSF/SSF   NSF/SSF   DSP/SSP   DSP/SSP   NSF/SSF   NSF/SSF   DSP/SSP   DSP/SSP
Syntax (a)
  Fragments            5/4            5/95      29/71     74/26     48/52     24/76     30/70     49/51     50/50
  Prag. inc. phrases   4/9            11/89     30/70     51/49     47/53     22/78     39/61     50/50     51/49
  Ab. synt. starts     5/5            40/60     39/61     40/60     49/51     19/81     33/67     33/67     42/58
  Lex. mkd. responses  13/8           33/67     42/58     86/14     54/46     54/46     41/59     76/24     57/43
  Questions            11/13          87/13     52/48     65/35     40/60     84/16     57/43     49/51     48/52
Intonation contour
  Laryngeal            7/14           47/53     43/57     58/42     49/51     53/47     37/63     60/40     49/51
  Level F0             8/6            49/51     26/74     43/57     50/50     30/70     42/58     49/51     43/57
  Falling F0           22/15          35/65     40/60     71/29     53/47     45/55     39/61     47/53     49/51
  Rising F0            13/15          68/32     60/40     59/41     46/54     58/42     55/45     53/47     45/55
Beginning F0 range
  Low                  10/9           44/56     49/51     80/20     59/41     48/52     54/46     42/58     46/54
  Mid                  11/5           56/44     45/55     60/40     50/50     46/54     37/63     52/48     48/52
  High                 29/34          46/54     44/56     50/50     39/61     50/50     45/55     55/45     48/52
Ending F0 range
  Low                  31/34          38/62     39/61     61/39     54/46     42/58     42/58     55/45     49/51
  Mid                  9/7            53/47     42/58     66/34     44/56     68/32     41/59     52/48     38/62
  High                 10/9           73/27     56/44     59/41     47/53     64/36     62/38     44/56     49/51

a The syntactic categories listed here are not exhaustive, and some items may be classified as belonging to more than one category.

The low ending F0 range results in a majority of SSF judgments for the two NFF turn end tests, and a majority of DSP judgments in the NFFTUBU test, but these are not duplicated in the FF tests. Mid ending F0 leads to more DSP judgments and NSF judgments only in the NFFTUBU test and FFTUEU test, respectively, meaning that other syntactic and lexical factors could be influencing the subjects. Finally, high ending F0 led to a majority of NSF judgments in both NFF and FF unfiltered turn end tests, and even in the filtered FF test; this is probably due to the association of a number of these test items (including most receiving significant numbers of NSF judgments) with rising F0, and thus with questions. In fact, I would hypothesize that this link with rising intonation and question is the main factor responsible for ending F0 range (ENDRNGE) appearing as a significant variable in the regression analyses of the NFFTUEU, FFTUEU and FFTUEF listening test results.

Conclusions

In general, it appears that there are few consistent relationships holding between the intonation characteristics present in an item and what kinds of judgments it receives. Even in cases where a trend is apparent, none of the actual intonation characteristics exclusively marks a particular type of turn boundary in either one or both speaker-orientation conditions. Syntactic and lexical characteristics appear to be used much more consistently as cues to turn status.
This is shown by the greater number of significant agreements resulting for items when unfiltered as opposed to filtered, and by the other statistical tests. It is clear that syntactic and lexical/semantic information is more useful to subjects in judging turn boundaries than is prosodic information alone, although the sometimes large reversals in agreements about following or preceding speaker which occur for particular items when they are unfiltered as opposed to filtered - e.g. an item may receive a majority of NSF judgments when unfiltered, but a majority of SSF judgments when filtered - show that prosodic and verbal cues do interact with one another. Syntactic and lexical information also allows subjects to make more accurate judgments about turn status as compared to prosodic information alone, although in both cases the significant agreements for items are usually over 60% correct.

The statistical tests show considerable differences in the way test items were judged in the two conditions, but these differences do not include either intonation characteristics or patterns of listener judgments made in reaction to these characteristics which consistently occur for the utterances produced in one condition as opposed to the other. Such variation may be a product of the two different orientations of the speakers giving rise to differences in the cues which they present, but there is little systematic evidence here of how the cues differ. Nor is it yet possible to say that such differences would repeat themselves when other NFF and FF conversations are compared (even given that the subject population and the nature of the test utterances are matched as closely as possible across all tests in both NFF and FF conditions, as they were here).
Without more such comparisons the possibility cannot be ruled out that every conversation would lead to a different use of those prosodic characteristics in judging turn boundaries, simply because there are too many other factors
(and interactions between factors) contributing to the organization of conversation. That such variability can exist without disrupting communication is indicative both of the redundancy and the creativity built into language and conversational organization.

In spite of this variability, however, there are some parallel similarities and differences in the statistical results for some of the listening tests in the FF and NFF conditions: the matching significant regression variables in the NFFTUEU-FFTUEU and NFFTUEF-FFTUEF tests; the similar role of rising F0 and fragments in the χ² comparison of the NFF and FF results; the appearance of the LENWORD variable in the NFFTUEU-NFFTUEF regression analyses but not in the FF counterparts; and others. Moreover, a number of significant correlations between the NFF and FF test results did occur, suggesting that subjects were making some parallel uses of cues in their judgments, even if the actual values of their responses were different enough to lead to significant differences in other tests. But in general, what the results indicate is that similarities in listener reactions to available cues are outweighed by a great deal of variability within and across NFF and FF conversational utterances. Therefore, in spite of some advantages for the NFF results over the FF - e.g. the higher proportion of variance accounted for in the regression analyses for most of the NFF tests than for the corresponding FF tests, as shown by r² - there is still no conclusive evidence that listeners have more auditory cues to work with in NFF conversations than in FF conversations, whether due to compensation on the part of the conversationalists for lack of visual cues or to some other reason. This is consistent with Cook & Lalljee's (1972) report of few significant differences in the verbal behaviors studied in their FF and NFF conversations; however, Butterworth et al. (1977) did find some differences in pause behavior between the two different kinds of NFF conditions they studied (a telephone condition and one where the participants were separated by a screen). This suggests that other differences might also exist in the nonverbal behaviors - including intonation patterns - presented in the two conditions. It should be possible to test this hypothesis experimentally, both with regard to the production of such behaviors and their utilization by listeners as cues to conversational management.

A different sort of finding is that the number of items receiving significant agreements was actually greater for the turn end tests than for the turn beginning tests, although for the NFF tests the numbers were comparable for each pair of tests matched for filtering condition (i.e. NFFTUEU-NFFTUBU, NFFTUEF-NFFTUBF), suggesting that some kind of compensation might take place in the NFF conversation with regard to turn beginnings. Also, the regression analyses for the turn end tests resulted in higher proportions of variance accounted for than did those for the turn beginning tests, with an extra variable involved significantly, and there were more significant χ² values for tests on turn ends than on turn beginnings. So even though intonation cues appear to be relatively ineffective for both types of turn boundaries, we must still ask why turn ends are judged better - even part of the time - than are turn beginnings, especially given that equivalent types of cues are available at both ends of an utterance. I would hypothesize that the answer lies not in actual differences in the production of turn end vs. turn beginning cues (since I have found no evidence of such differences here), but in the listener's attention to and interpretation of these cues.
That is, it may be more important for the successful, smooth progression of a two-party conversation that turn ends be signaled to the listener - or that he or she be sensitive to such signals - than that current speakers be forewarned about new turn beginnings (the likely exception to this is the case of interruptions; cf. Meltzer, Morris & Hayes, 1971; French & Local, 1982, for evidence that interruptions are especially marked). Listeners may be more aware of cues to turn ends
than turn beginnings because as polite conversationalists they will attach more importance to the end of the current speaker's turn, which will allow them to take over the floor without violating any rules of conversation, than to the start of their own, which they are in control of in any event. The situation should be different in multi-party conversations, however, since listeners must keep track of other potential turn takers among current listeners in addition to themselves.

The fact that subjects were able to reach significant levels of agreement about the turn boundaries of utterances in only 31% of the test items, turn beginnings and turn ends, very strongly suggests that other factors than the cues studied here were influencing their decisions. No doubt some of their judgments were the result of pure chance, but the statistics do support the claim that a good many judgments were not. The most likely candidates for other cues which could have been present in the utterances and used by the subjects, judging from their own comments and other sources, may well be rhythm or speech rate (including pre-boundary lengthening; see Lehiste, 1975), and changes in amplitude (Kreiman, 1982; Meltzer et al., 1971; French & Local, 1982). Moreover, it is still possible that intonation does play a more consistent and widespread part in turn taking than has been suggested here, but in a way too subtle to be isolated in the highly variable and interactive setting of natural conversational utterances. At the least, these results show that rising F0 and falling F0 do not function in the same way as turn end cues, counter to what Duncan & Fiske's (1977) "intonation-marked clause" cue suggests; rising F0 is actually a much stronger cue (cf. also Goodwin, 1979). With respect to this issue, speech synthesis should provide more controlled conditions for studying the role of slight modifications of intonation in turn taking.
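As an illustration of what a "significant level of agreement" on a single test item can mean, the following is a minimal sketch in Python of a one-sided exact binomial test. It should not be read as the procedure used in this study: the two-alternative chance level of 0.5, the 0.05 criterion, the listener counts, and the function names are all assumptions introduced here for illustration only.

```python
from math import comb


def binomial_upper_tail(k: int, n: int, p: float = 0.5) -> float:
    """Probability of k or more matching responses out of n if every
    listener responds at chance with probability p (assumed value)."""
    return sum(comb(n, i) * p**i * (1 - p) ** (n - i) for i in range(k, n + 1))


def significant_agreement(k: int, n: int, p: float = 0.5, alpha: float = 0.05) -> bool:
    """True if k of n listeners choosing the same judgment is unlikely
    under chance responding (one-sided test; alpha is an assumed criterion)."""
    return binomial_upper_tail(k, n, p) < alpha


if __name__ == "__main__":
    # Hypothetical item: 20 listeners, 15 of whom give the same judgment.
    n_listeners, n_same = 20, 15
    print(binomial_upper_tail(n_same, n_listeners))
    print(significant_agreement(n_same, n_listeners))
```

Under these assumptions, an item on which 15 of 20 hypothetical listeners agree would count as a significant agreement, while a 13-to-7 split would not. With three or more response categories, the chance level p would have to be lowered accordingly.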
Besides the issues just mentioned, the isolation of test items from their surrounding conversational context must be considered as a factor contributing to the results found here. A number of researchers (including Gunter, 1972; Schegloff, 1982; etc.) have stressed the importance of studying conversational phenomena in context in order to have available all information which could possibly be affecting those phenomena. Removing an object from its context distorts the analysis, because not all the contributing factors are being taken into account. In the case of my own experiments, this limitation is an acceptable one, since my goal was only to clarify how intonation functions at certain types of boundaries. But because the judgments of my subjects were based on isolated utterances, we cannot know what cues were present in the intact conversations which could have interacted with intonation characteristics and/or provided much stronger cues to the turn status of each test utterance. In fact, the contextual information of normal conversations might often obviate the need for many organizational cues, or at least render them redundant, and perhaps this is why FF and NFF conversations can show few differences in nonverbal (and verbal) auditory cues and still progress with equal ease, in spite of the lack of visual information in the latter condition. Moreover, in the intact conversation subjects have access to inter-utterance pauses, which not only may serve as turn taking cues, but must surely also contribute to the overall rhythmic structure of the conversation, a very important aspect of its organization (Erickson, 1982; Scollon, 1982). Devising experiments which can discover the intuitions of subjects about conversational phenomena heard within the total context of the conversation must thus remain an important goal for future research.

Even beyond the matter of context, however, is that of the optionality inherent in conversational behavior.
Kreiman (1982), Brown et al. (1980) and others have found considerable variation in the use of and reaction to cues for other types of units, such as paragraphs
and topics. Likewise, many of the behaviors studied here can act as cues to turn boundaries, but do not automatically do so each time they occur, no doubt often because of what other information is also present (or absent), but also perhaps because of individual speaker/listener variation. The complications are increased when one considers that some prosodic phenomena may serve more than one function, and so could be mistaken as a cue for something not intended at all. Clearly, the overall context in which something is said can, and usually does, eliminate potential ambiguity of this sort, but never all of the time, as simultaneous starts and overlaps show. And there are always willful violations of rules of turn taking, as in intentional interruptions. It is up to the individual conversationalist at each point in the conversation to decide what cues to display and what subsequent actions to take, and the same applies to the listener. Thus, the variability in these test results may well be the normal variability of participants negotiating a conversation. Further experimentation may teach us more about the nature of such variation in conversational organization.

A shorter version of this paper was presented at the annual winter meeting of the Linguistic Society of America, San Diego, December 27-30, 1982. Special thanks go to Dr Ilse Lehiste for all her help in the design and analysis of these experiments; to Dr Robert Fox for sharing his statistical and computer expertise; to Dr Rachel Schaffer for her advice and support; and to the many students and teachers of the Introduction to Language course taught at The Ohio State University who participated in the listening tests.

References

Allen, D. & Guy, R. (1974). Conversation Analysis: The Sociology of Talk. The Hague: Mouton.
Beattie, G. (1981). The regulation of speaker turns in face-to-face conversation: Some implications for conversation in sound-only channels. Semiotica, 34, 55-70.
Brown, G., Currie, K. & Kenworthy, J. (1980). Questions of Intonation. Baltimore: University Park Press.
Butterworth, B., Hine, R. & Brady, K. (1977). Speech and interaction in sound-only communication channels. Semiotica, 20, 81-99.
Cook, M. & Lalljee, M. (1972). Verbal substitutes for visual signs in interaction. Semiotica, 6, 212-221.
Cutler, A. & Pearson, M. (1983). On the analysis of prosodic turn-taking cues. In: Studies in Intonation and Discourse (Johns-Lewis, C., ed.). London: Croom Helm.
Cutler, A., Pearson, M. & Beattie, G. (1982). Prosodic cues to turn-taking in conversation. Paper presented at the British Association for Applied Linguistics Seminar on Intonation and Discourse. University of Aston in Birmingham, April 5-7, 1982.
Duncan, S. (1972). Some signals and rules for taking speaking turns in conversation. Journal of Personality and Social Psychology, 23, 283-292.
Duncan, S. & Fiske, D. (1977). Face-to-Face Interaction: Research, Methods, and Theory. Hillsdale, N.J.: Lawrence Erlbaum Associates.
Duncan, S. & Niederehe, G. (1972). On signalling that it's your turn to speak. Journal of Experimental Social Psychology, 10, 231-247.
Erickson, F. (1982). Money tree, lasagna bush, salt and pepper: Social construction of topical cohesion in a conversation among Italian-Americans. In: Analyzing Discourse: Text and Talk. Georgetown University Roundtable 1981. (Tannen, D., ed.), pp. 43-70. Washington, D.C.: Georgetown University Press.
French, P. & Local, J. (1982). The prosodic structure of interruptions in English conversation. Paper presented at the British Association for Applied Linguistics Seminar on Intonation and Discourse. University of Aston in Birmingham, April 5-7, 1982.
Goldberg, J. (1978). Amplitude shift: A mechanism for the affiliation of utterances in conversational interaction. In: Studies in the Organization of Conversational Interaction (Schenkein, J., ed.), pp. 199-218. New York: Academic Press.
Goodwin, C. (1979).
Review of Face-to-Face Interaction: Research, Methods, and Theory, by Starkey Duncan and Donald Fiske. Language in Society, 8, 439-444.
Goudie, K. (1979). A Study of Intonation and Pause in a Group of Expository Monologues. Ph.D. Dissertation. Ann Arbor, Michigan: University of Michigan.
Gunter, R. (1972). Intonation and relevance. In: Intonation: Selected Readings (Bolinger, D., ed.), pp. 194-215. Harmondsworth: Penguin Books.
Harrigan, J. (1980). Methods of turn-taking in group interaction. In: Papers from the Sixteenth Regional Meeting of the Chicago Linguistic Society. pp. 102-111. Chicago: Chicago Linguistic Society.
Jaffe, J. & Feldstein, S. (1970). Rhythms of Dialogue. New York: Academic Press.
Kasl, S. & Mahl, G. (1965). The relationship of disturbance and hesitations in spontaneous speech to anxiety. Journal of Personality and Social Psychology, 1, 425-433.
Kendon, A. (1967). Some functions of gaze direction in social interaction. Acta Psychologica, 26, 1-47.
Kreiman, J. (1982). Perception of sentence and paragraph boundaries in natural conversation. Journal of Phonetics, 10, 163-175.
Lehiste, I. (1975). The phonetic structure of paragraphs. In: Structure and Process in Speech Perception: Proceedings of the Symposium on Dynamic Aspects of Speech Perception (Cohen, A. & Nooteboom, S., eds), pp. 195-206. New York: Springer-Verlag.
Lehiste, I. (1979). Sentence boundaries and paragraph boundaries - perceptual evidence. In: The Elements: A Parasession on Linguistic Units and Levels. pp. 99-109. Chicago: Chicago Linguistic Society.
Meltzer, L., Morris, W. & Hayes, D. (1971). Interruption outcomes and vocal amplitude: Explorations in social psychophysics. Journal of Personality and Social Psychology, 18, 392-402.
Nie, N., Hull, C., Jenkins, J., Steinbrenner, K. & Bent, D. (1975). Statistical Package for the Social Sciences. New York: McGraw-Hill.
Sacks, H., Schegloff, E. & Jefferson, G. (1974). A simplest systematics for the organization of turn taking for conversation. Language, 50, 696-735. Reprinted in: Studies in the Organization of Conversational Interaction (Schenkein, J., ed.), pp. 7-55. New York: Academic Press.
Schaffer, D. (1982). Intonation Cues to Management in Natural Conversation. Ph.D. Dissertation. Columbus, Ohio: Ohio State University.
Schegloff, E. (1982). Discourse as an interactional achievement: Some uses of "uh huh" and other things that come between sentences. In: Analyzing Discourse: Text and Talk. Georgetown University Roundtable 1981. (Tannen, D., ed.), pp. 71-93. Washington, D.C.: Georgetown University Press.
Scollon, R. (1982). The rhythmic integration of ordinary talk. In: Analyzing Discourse: Text and Talk. Georgetown University Roundtable 1981. (Tannen, D., ed.), pp. 335-349. Washington, D.C.: Georgetown University Press.
Tannen, D. (1979). Processes and Consequences of Conversational Style. Ph.D. Dissertation. Berkeley, California: University of California.
Wiemann, J. & Knapp, M. (1975). Turn-taking in conversation. Journal of Communication, 25, 75-92.
Yngve, V. (1970). On getting a word in edgewise. In: Papers from the Sixth Regional Meeting of the Chicago Linguistic Society. pp. 567-578. Chicago: Chicago Linguistic Society.