Journal of Phonetics (1973) 1, 309-327
Intonation by rule: a perceptual quest
J. 't Hart and A. Cohen
Instituut voor Perceptie Onderzoek, Insulindelaan 2, Eindhoven, Holland
Received 23rd August 1973
Abstract:
An approach towards the study of Dutch intonation which has been called "perceptual analysis" has revealed the regular occurrence of a rather simple configuration, resembling a hat-shape. This paper presents a number of rules according to which pitch contours for arbitrary utterances can be derived from this pattern. When tested on several kinds of new material, these rules turn out to be too rigid. On the same basis, other equally acceptable contours may be formed. In the material used, these alternative shapes do not give rise to changes in interpretation. On these grounds, the rules have been adjusted and extended with a number of "optional alternative rules".
Introduction
This paper deals with some aspects of Dutch intonation, as revealed partly by a specially designed method, which is called perceptual analysis, and partly by some experiments with naive speakers and listeners. The results obtained may have relevance not only for Dutch, but also for the study of intonation in other languages.
Although it cannot be said that the study of intonation has been neglected in recent years [see e.g. the survey of intonation studies over the past ten years given by Leon (1972)], it must be admitted that the results are somewhat disappointing. This is perhaps best exemplified by the failure of most present-day systems for speech synthesis-by-rule to generate acceptable intonation patterns for connected speech, because the basic regularities involved are not known. We feel that in the study of Dutch intonation we have come to a point where the basic regularities begin to become apparent.
A first question which may be asked in the study of intonation is whether it is possible to formulate rules from which correct intonation patterns for a given sentence can be predicted, and what syntactic and/or semantic information is needed to do so. Even if we could not give a complete description of all possible correct intonation patterns of the language, we could at least try to describe a subclass of these.
A first answer to this question is given in the section called "Perceptual Analysis of Dutch Intonation". It is partly based on findings described in an earlier publication (Cohen & 't Hart, 1967). In that section, the research method, the terminology and the transcription system used will be explained. It ends with the formulation of a preliminary set of rules.
In the next section, under the heading of "Three Experimental Tests of the Rules", the validity of these rules is checked against various samples of speech material. Not too surprisingly, the preliminary rules appear to be too strict. The section concludes with the necessary reformulation of the rules.
A final section surveys the main results of this paper. At the same time, it shows the most serious shortcomings, some of which may, however, be coped with in future research following the same methods used here.
Perceptual Analysis of Dutch Intonation
Approach
We have restricted our investigations of Dutch intonation to pitch phenomena (although we are aware of the fact that in prosody other parameters may play their part as well). One of the main difficulties in the study of pitch phenomena as related to intonation is that pitch registrations show much more capricious details than are relevant to the perception of intonation. This difficulty can be resolved by using a simulating technique which makes it possible to replace the natural pitch of a real utterance by an artificially generated and controlled voice-like periodic signal, while keeping the spectral and temporal information of the utterance intact. This can be done with a Vocoder-like apparatus, called the Intonator (Borst & Cooper, 1957; Cooper, 1962; Willems, 1966; Willems & Loonen, 1967).
The working of this instrument is as follows. A set of filters, each followed by a rectifier, measures the spectral composition of the input speech in terms of so many d.c.-voltages as functions of time. Meanwhile, a voiced-unvoiced detector serves to determine the choice of either a voice or a noise source. The signals from these sources are led to a second bank of, in principle, identical filters, whose outputs are regulated in amplitude by modulators, controlled by the d.c.-voltages obtained in analysis. The voice source is controlled by a function generator, by which the repetition rate is varied as logarithmic functions of time.
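The source control described above varies the repetition rate along a logarithmic scale. As a minimal sketch of what that implies numerically, assuming (this is not stated in the text) that a controlled pitch movement is a straight line on a log-frequency axis, the helpers below convert between hertz and semitones and sample such a glide. The function names, the 100 Hz reference and the 10 ms step are illustrative choices, not Intonator specifications.

```python
import math


def hz_to_semitones(f_hz, ref_hz=100.0):
    """Distance of f_hz from ref_hz in semitones (12 per octave)."""
    return 12.0 * math.log2(f_hz / ref_hz)


def semitones_to_hz(st, ref_hz=100.0):
    """Inverse of hz_to_semitones."""
    return ref_hz * 2.0 ** (st / 12.0)


def log_glide(f_start_hz, f_end_hz, duration_ms, step_ms=10):
    """Sample a glide that is linear in log frequency (assumed shape).

    Returns F0 values in Hz at multiples of step_ms, end points included.
    """
    n = max(1, int(duration_ms // step_ms))
    ratio = f_end_hz / f_start_hz
    return [f_start_hz * ratio ** (i / n) for i in range(n + 1)]


if __name__ == "__main__":
    # A rise of four semitones in 100 ms, starting from a hypothetical 100 Hz.
    target = semitones_to_hz(4.0, ref_hz=100.0)
    for f0 in log_glide(100.0, target, 100):
        print(round(f0, 2))
```

On such a scale equal steps in time correspond to equal frequency ratios, so the midpoint of a glide is the geometric mean of its end points rather than their arithmetic mean.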
With the help of the Intonator it can be shown that, without changing the perception of intonation, the greater part of the natural pitch movements in an utterance can be replaced by a straight line of slowly falling pitch, which we have called the declination line. A limited number of other movements, however, may not be smoothed out with impunity, but these can, without audible consequences, be stylized into rather steep, simple, standardized rises and falls [though not as steep and as simple as Isacenko & Schadlich's (1970) tone switches]. These movements are to be called henceforth the perceptually relevant pitch movements. Involuntary, merely physiologically determined pitch movements are not relevant to the perception of intonation. Their physical absence or presence should cause no mentionable perceptual difference in intonation (naturally this does not mean that they have no perceptual effect on the segmental level, e.g. as a cue to the voiced/voiceless distinction) (Haggard, Ambler & Callow, 1970).
With respect to perceptually relevant pitch movements, work with the Intonator has shown that the intonation of an utterance may change clearly and abruptly with minor changes in F0 settings, whereas sometimes substantial F0 changes do not give rise to overall changes in pitch perception. In other words, the extent of F0 changes as such is no reliable measure for their perceptual relevance. These effects suggest that there is no one-to-one relationship between voice periodicity and the perception of speech pitch; rather, the listener interprets what he hears in terms of a limited set of recognizable patterns, which therefore may be assumed to possess a perceptual identity, and may even be considered perceptual units in some sense.
We may interpret this experience by introducing a distinction between two ways of listening to speech pitch, viz. broad and analytic listening.
In broad listening, a listener receives an overall impression which he may use as a criterion on which to base a classification of intonational wholes. This capability of listeners has been demonstrated in a series of experiments in which their task was to sort or to match disparate verbal material merely on account of its melodic resemblance as a whole (Collier & 't Hart, 1972).
In analytic listening, one tries to concentrate on the detailed melodic aspect of intonation while making abstraction of any relation to the verbal information. Analytic listening can be used as a genuine analysis technique if the ear is aided by instrumental means, viz. an electronic gate ('t Hart & Cohen, 1964) with which the attention can be focused on every detail, and the Intonator for controlled comparison of entire contours. In our approach, analytic listening is exploited in order to find the perceptually relevant pitch movements and to verify that these are indeed indispensable elements in a perceptually based representation of the speech pitch. Ultimately, the aim would be to try to find a link between the two ways of listening, in other words, to try to show which analytic elements can be held responsible for the perceptual identity of a given pattern.
Some definitions
We define a pitch contour as the stylized perceptual equivalent of the natural course of the fundamental frequency (or the inverse of the vocal cords' periodicity) in an utterance. Figure 1 gives an illustration of such a stylized contour.¹
Figure 1. Example of a stylized pitch contour as seen against actual measurements, obtained in analytic listening with an electronic gate, of the pitch in the utterance "door schade en schande wordt men wijs". (Horizontal axis: time in ms; vertical axis: pitch, with ticks from 80 to 150.)
The pitch contour is established by having the artificial voice of the Intonator perform the smallest number of such movements as are needed to achieve, between input and output signal, a resemblance which is satisfactory according to a judgment obtained in analytic listening. The movements of which the contour is composed are called the perceptually relevant pitch movements.
¹ The lack of interruptions in the pitch contours drawn in the illustrations that follow seems to suggest that there are no voiceless stretches in normally spoken Dutch. The reason for this drawing convention is that it is not contrary to perceptual experience to regard as equivalent, e.g. an upward jump of pitch in case the voicing is interrupted by a voiceless fricative, and the upward pitch glide which can actually be measured if the voicing is continued. To avoid misunderstanding, the stylized pitch contour should be considered to be the graphic representation of the control settings for the fundamental frequency of the artificial voice of the Intonator. The proper alternation between voiced and voiceless stretches, which is taken care of by the voiced-unvoiced detector in the system, will automatically cancel the audibility of pitch movements in voiceless stretches, but preserves it if, in a comparable utterance, those stretches happen to be voiced.
From the description of the procedure for finding the pitch contour it is clear that it is an essential property of these elementary units that the deletion or the undue insertion of any perceptually relevant pitch movement causes a clearly audible change in the perception of the pitch contour as a whole. Below, we will describe the characteristics of the most important perceptually relevant pitch movements.
We define a basic intonation pattern as an abstract, mental category of intonation, underlying an actual pitch contour. On the strength of this postulated notion, one and the same utterance may have different intonation patterns, whereas different utterances may have pitch contours that are derived from the same basic intonation pattern. A pitch contour may be the realization of one basic intonation pattern, or a concatenation of such patterns, in their elementary shape or in more complicated forms which can be derived from them.
Basic regularities
A good example of a basic intonation pattern is the hat-shaped pattern. Examples of this pattern and of some of its realizations are given in Figs 2 and 3. There exist a number of basic patterns, this shape being the most frequently occurring of these. In this paper, we will restrict ourselves to realizations of the hat-shaped pattern.
Figure 2. Elementary shape of the hat-shaped pattern. (Horizontal axis: time in s; vertical axis: pitch, with ticks from 0 to 12.)
Figure 3. Various pitch contours derived from the hat-shaped pattern. Panels: (a) "pointed" hat, 1 pitch accent; (b) 3 pitch accents; (c); (d) two hat patterns linked by continuation; (e) hat pattern with final rise.
The most typical pitch contour corresponding to the pattern in its elementary shape is given in Fig. 2. On a low line of declination are superimposed, successively, a rise and a fall; in between them, we find an upward shifted stretch of declination line.
A characteristic property of the hat pattern is that both the rise and the fall lend perceptual prominence² to the syllable in which they are realized. Both are pitch accents. Thus, a sentence spoken with the pitch contour of Fig. 2 has two prominent syllables. This immediately brings us to the intimate relation between the syntactically and semantically dominated requirements of accentuation and the pitch contour. This relation was first fully recognized by Bolinger (1958).
The accentuation depends on what we call the dominance of the words. One or more of the words in a sentence may be dominant because of their semantic role in the communication situation. At present, the dominance of words cannot be predicted from explicit rules. Once it is known which words are dominant, and which syllables of these words are lexically stressed, a correct pitch contour can be derived from the underlying hat pattern, each lexically stressed syllable of a dominant word receiving a pitch accent. The first one is always realized by a rise, the last one by a fall. How this works out in cases with fewer or more than two dominant words will be explained in dealing with a number of examples of pitch contours derived from the pattern as presented in Fig. 3.
Figure 3(a) shows the situation for a contour with only one pitch accent: rise and fall coincide on the lexically stressed syllable of the only dominant word. Usually, we call this contour a "pointed hat". However, the contour sounds more natural if between rise and fall several tens of ms of high declination line are inserted.
Figures 3(b) and 3(c) show possible solutions for a contour with three pitch accents: the first and the second one are realized by means of rises, the last one by a final fall or a rise-fall. Before a prominence-giving rise, the pitch has to be low, so that between the two rises it must be "reset" by some kind of fall without the introduction of an extra prominence (non-final fall); if the third pitch accent is realized by a rise-fall, the non-final fall occurs twice. A similar regularity can be found in cases with more than three pitch accents.
A different situation is one in which the sentence is broken up by an obvious syntactic boundary, or in other words, when the sentence is composed of two rather independent parts that have to be linked. Figure 3(d) gives an illustration of the contour which can very often be found in such a situation. In both halves, we find the normal pattern; their being linked up is intonationally marked by a rapid upward movement "before the comma", followed by low resumption of pitch at the start of the second part. It seems convenient to follow Delattre, Poenack & Olsen (1965) by using the term "continuation" for this phenomenon.³
Finally, in Fig. 3(e), we see the hat pattern of Fig. 2, but the contour ends with a rise instead of mere declination. It is important to note that in Intonator-simulated pitch contours, the continuation rise [of Fig. 3(d)] and the final rise [of Fig. 3(e)] have the same specifications. This means that possible objective differences between the two are not perceptually relevant.
Both with the rises and the falls, we find a distinction between pitch movements which lend prominence and others which do not. Below, we will discuss some of the conditions which make such a division possible.
² The term "prominence" is taken in the sense of Jones' definition (Jones, 1962) and in line with Rigault's (1962) "proeminence".
³ This term therefore replaces the previously used one "caesura". We have not found clear evidence for the necessity to distinguish between the so-called major and minor continuation as advocated by Delattre, Poenack & Olsen (1965).
The examples of Fig. 3 are stylized representations of the course of the pitch in real speech. When implemented on utterances by means of the Intonator, they are perceptually equivalent to the original intonation. For that reason, they may be considered adequate descriptions of relevant characteristics of pitch in spoken Dutch. The perceptually relevant pitch movements with which the contours are built can from now on be used to construct new contours, which in their turn can be tested for perceptual adequacy. In doing so, we were able to show that these elementary movements, with nothing more than their standard values, can be applied to assemble a suitable contour for an arbitrary, different test sentence. Since they are thus reproducibly usable elements in constructing satisfactory pitch contours for great numbers of utterances, it seems legitimate to make an inventory of them.
The inventory of the perceptually relevant pitch movements and their transcription
For a first description of pitch contours in Dutch we will define here six elementary pitch movements, viz.: declination line, low and high; two types of rise, the one with prominence, the other without; two types of fall, the one with prominence, the other without. Later in the paper this basic inventory will be extended with two movements that occur in the experiments to be dealt with in the next section.
Declination. Symbols "0" and "Ø". Perceptually equivalent to the vacillating course of the pitch during those parts of the contour where no important changes take place, is a slightly downward tilted line, the declination line. In between a rise and a subsequent fall it manifests itself at a high level. For that reason, it seemed feasible to distinguish low and high declination line by using the symbols "0" and "Ø" respectively. For short utterances an F0 decrease of 3 % every 100 ms is a useful standard value, which fits fairly well with the measurements. However, when the utterance becomes rather long, this fixed decrease results in too low a pitch towards the end. Therefore, the circuitry of the Intonator has been modified so as to have a slope of the declination line which is variable as a function of time: at the start, it is still about 3 %, or half a semitone, per 100 ms, but this value is gradually decreased to a fixed value of 0·5 % per 100 ms after 5 s.
Prominence lending rise. Symbol "1". The upward movement of the pitch which is found in the majority of the prominent syllables can be simulated by a movement which raises the pitch about four semitones in 100 ms. It is subject to well-defined positional constraints with respect to the voiced part of the syllable to be accentuated (Van Katwijk & Govaert, 1967; Van Katwijk, 1969; Collier, 1970). From these experiments it appears that the optimal position is rather early in the syllable.⁴
Non-prominence lending rise. Symbol "2". The pitch rise that can be found in case of a continuation and at the end of the very last syllable of an utterance, is well distinguishable from rise "1" by the fact that it lends no prominence. This is mainly caused by the typical position of this rise, viz. as late in the syllable as possible, and thus just opposite to the requirements for accentuation. Apart from this difference in position, in the Intonator-simulated version of this movement, the settings for duration and slope may be taken the same as for rise "1". In order to facilitate the distinction between rises "1" and "2", in our drawings of pitch contours rise "2" is sketched as if it were concave, but this distinction is not perceptually relevant.
Final fall, prominence lending. Symbol "A". In the kind of utterances to be discussed in this study, the last prominent syllable of the utterance, or the last prominent syllable before a continuation, has a falling pitch. This movement is the final fall "A". Its prominence lending capacity has been studied in Van Katwijk & Govaert (1967) and Van Katwijk (1972). As with rise "1", the prominence lending effect is dependent on its position in the syllable: the final fall, however, should occur rather late in that syllable. "A" lowers the pitch about five semitones in 75 ms.
Non-final fall, non-prominence lending. Symbol "B". "B" is the non-final fall which has to occur between two successive prominence lending rises to provide the necessary "reset" of pitch after the first rise. Since "B" should not lend prominence, the pitch has to fall rather early in an inconspicuous syllable, or even so early that it can be considered to fall in between (the voiced parts of) two syllables. Intonator settings for duration and slope may be taken the same as for "A", but in actual speech, "B" tends to fall less deep than does "A".
We transcribed the contours of Fig. 3 in terms of the symbols mentioned above. Each successive syllable is assigned at least one symbol. When necessary, two or more symbols are transcribed for one syllable, e.g. "1&A&2", with "&" as a link between them to show the difference with the sequence "1A2", where the same succession of pitch movements occurs over three syllables. Since the movements mentioned may have a shorter duration than that of an entire syllable, the remaining part of it will have declination. Nevertheless, transcriptions like "1&Ø", "Ø&A" or "0&2" are not used, since these would be redundant.
⁴ The location in the syllable of a pitch movement is defined and measured with respect to the vowel onset. Due to the perceptual inaccuracy which may amount to several tens of ms, we are not able to say if, in the case of a prevocalic consonant cluster of e.g. a voiceless stop and a liquid, such as /pr-/, the reference point should be the onset of voicing, in this case of /r/, rather than the vowel onset.
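The standard values quoted in this inventory can be collected in a small table, together with the time-varying slope of the declination line. The sketch below is only an illustration under stated assumptions: the class and function names are hypothetical, and the taper of the slope from 3 % to 0·5 % per 100 ms is taken to be linear over the first 5 s, which the text does not specify (it only says the value is "gradually decreased").

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Movement:
    symbol: str
    semitones: float    # signed excursion; positive is a rise
    duration_ms: float
    prominence: bool    # does the movement lend prominence?
    timing: str         # position relative to the syllable


# Intonator standard values as quoted in the text. "B" is given the same
# settings as "A", although in actual speech it tends to fall less deep.
INVENTORY = {
    "1": Movement("1", +4.0, 100.0, True, "early"),
    "2": Movement("2", +4.0, 100.0, False, "late"),
    "A": Movement("A", -5.0, 75.0, True, "late"),
    "B": Movement("B", -5.0, 75.0, False, "early"),
}

STEP_S = 0.1  # the slope is specified per 100 ms


def declination_rate(t_s, start=0.03, floor=0.005, settle_s=5.0):
    """Fractional F0 decrease per 100 ms at time t_s.

    ASSUMPTION: linear taper from `start` to `floor` over `settle_s`.
    """
    if t_s >= settle_s:
        return floor
    return start + (floor - start) * (t_s / settle_s)


def declination_line(f0_start_hz, duration_s):
    """F0 values (Hz) of the low declination line, one per 100 ms."""
    n = int(round(duration_s / STEP_S))
    f0 = f0_start_hz
    line = [f0]
    for i in range(n):
        f0 *= 1.0 - declination_rate(i * STEP_S)
        line.append(f0)
    return line


if __name__ == "__main__":
    # Hypothetical starting pitch of 120 Hz, 2 s of declination.
    for i, f0 in enumerate(declination_line(120.0, 2.0)):
        print(f"{i * 100:5d} ms  {f0:7.2f} Hz")
```

Keeping the movements in a lookup table mirrors the way the text treats them: as a closed set of standardized elements whose only free parameter in a contour is where they are placed.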
The convention of reserving digits for rises, letters for falls, as used above, was maintained when, later on, the original inventory was extended.
The distinction between rises "1" and "2", falls "A" and "B" is dependent on whether or not these movements give prominence. As was said above, the occurrence of prominence is related to the position of the pitch movement with respect to the vowel onset. But, in spite of this seemingly clear relationship, it is by no means easy to establish whether or not a given rise or fall gives prominence from objective registrations of F0. Rather, it is thanks to the experimental facilities given by the Intonator that we have been able to reveal this systematic relationship. To illustrate this point, two examples will be given in which only slight differences in timing give rise to clear perceptual effects, switching the listeners' judgments from prominence to non-prominence, and vice versa.
Example 1. If in the sentence "En daarom ging ik toen maar niet naar huis" (see Fig. 4) the fall on "ging" comes early (non-final fall), the interpretation can be something like: "And for that reason I decided not to go home". But if that fall comes about 100 ms later, it becomes a final fall, thus offering the possibility of an interpretation in terms of two sentences: "And for that reason I went. But I didn't go home."
Figure 4. Example of a situation in which a shift over several tens of ms of a fall causes it to change from non-final fall into final fall, thus giving rise to a different interpretation. (Contour drawn over the sentence "En daarom ging ik toen maar niet naar huis".)
The other example illustrates the effect of the location on the distinction between the two rises "1" and "2".
Example 2. The test sentence was "De donder zie je niet de bliksem". In (1), the rise on "niet" was located late, so that it became a rise "2", being part of a continuation, which made "niet" belong to "donder", and the interpretation was that you do not see the thunder, but rather the lightning. In (2), a shift of that rise from "late in 'niet'" to "early in 'niet'" (which amounted to less than 50 ms in the experiment, in which there was no possibility to alter the temporal build-up of the utterance) made "niet" prominent, and the interpretation changed into one in which "niet" belongs to "bliksem": "this time you see the thunder, and not the lightning".
Figure 5. Example of a situation in which a shift over several tens of ms of a rise causes it to change from non-prominence giving rise into prominence giving rise, resulting in a different interpretation. (Contours (1) and (2) drawn over the sentence "De donder zie je niet de bliksem".)
Preliminary rules
Some rules for generating acceptable⁵ pitch contours for Dutch can now be formulated in terms of Intonator control settings, which stand for perceptually relevant pitch movements. Given an arbitrary utterance, first decide whether it is (a) one utterance as a whole, or (b) one which is split into pieces by one or more major syntactic breaks. Regardless of this decision, start to make low declination "0", according to the specification given above.
In case (a): Let certain words be given as dominant.
One dominant word. Put a rise "1" and a fall "A" on the lexically stressed syllable of the dominant word. Proceed with low declination line "0" [cf. Fig. 3(a)].
Two dominant words. Put a rise "1" on the lexically stressed syllable of the first dominant word and a fall "A" on that of the second one. Make high declination "Ø" between "1" and "A" (cf. Fig. 2).
Three dominant words. Put a rise "1" on the lexically stressed syllable of the first dominant word. Keep the remaining syllables of that word after the rise high ("Ø"). Make a non-final fall "B" immediately after that word. Put rise "1" on the lexically stressed syllable of the second dominant word, and a fall "A" on that of the last dominant word. Make high declination line between the last "1" and "A" [cf. Fig. 3(b)].
For more than three dominant words, the rule is essentially the same as for three dominant words, so that every pitch accent is realized by a rise, except for the last one. Between every two rises a non-final fall is inserted; between the last rise and the final fall there is high declination line.
⁵ Hitherto, adequate pitch contours were defined as those concatenations of perceptually relevant pitch movements as could be judged perceptual equivalents to the intonation of the input sentences of which they were Intonator stylizations. Pitch contours to be generated according to the rules to be given here are not stylizations of the course of the pitch in actual utterances used as input material for the Intonator. Therefore, they should be matched against an internal criterion about the possible existence, in the language at issue, of the course of the pitch in the given utterance, to which the contour should be perceptually equivalent. A contour which meets this requirement is called an acceptable contour.
In case (b): In case of a major syntactic break, a continuation should be made. On either side of the break, the same procedure is followed as mentioned for case (a). The continuation is made by a rise "2", very late in the last syllable before the break, immediately followed by an inaudible fall to the level of the extended low declination line [cf. Fig. 3(d)].
The rules presented here constitute a necessary minimum according to which, within the frame of the hat-shaped pattern and its derived contours, acceptable pitch contours can be generated. But even within this limited frame, they are not sufficient to account for a number of phenomena which we have discussed already. For one thing, in Fig. 3(c), we have shown an exception to the rule which says that the penultimate pitch accent is given by a rise, the last one by a final fall, with high declination line between them: Fig. 3(c) shows a variant with a rise-fall for the last pitch accent and hence a non-final fall after the preceding one. This variant can often be observed in cases where "... 1Ø ... ØA ..." would have been an equally acceptable solution.
For another, the rules cannot predict whether or not the contour will contain an utterance final rise "2" [cf. Fig. 3(e)]. The reason is that contours with this rise can be observed to occur in a variety of situations of a diverging nature: in tag words (in Dutch "hè" and "hoor"), in several kinds of question, in personal address (vocatives), and in a number of cases which cannot be characterized with one term (cf. Bolinger, 1958, note 29, p. 125; Bolinger only mentions the downward obtrusions in his examples, but they all contain the final rise as well). Thus, we are not only incapable of making a list of situations in which the final rise is bound to occur; another difficulty is that in most of the situations given above this type of rise is not necessarily present.
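The preliminary rules for cases (a) and (b) can be sketched as a small procedure that assigns one transcription string to every syllable. This is an illustrative reading, not the Intonator's control logic: the `Word` type and function names are hypothetical, high declination is written "Ø", an utterance without dominant words is simply left on low declination, and when a non-final fall would land on the very syllable that carries the next rise the two are joined as "B&1". The rules above do not cover those last two situations, so both are assumptions of this sketch.

```python
from typing import List, NamedTuple

LOW, HIGH = "0", "Ø"  # low and high declination line


class Word(NamedTuple):
    syllables: int   # number of syllables in the word
    stress: int      # index of the lexically stressed syllable
    dominant: bool   # dominance is given as input, not predicted


def contour_for_part(words: List[Word]) -> List[str]:
    """Case (a): one symbol string per syllable for an unbroken stretch."""
    starts, total = [], 0
    for w in words:
        starts.append(total)
        total += w.syllables
    symbols = [LOW] * total
    # (position of stressed syllable, position just after the word)
    accents = [(starts[i] + w.stress, starts[i] + w.syllables)
               for i, w in enumerate(words) if w.dominant]
    if not accents:
        return symbols                      # assumption: stay low
    if len(accents) == 1:
        symbols[accents[0][0]] = "1&A"      # "pointed hat"
        return symbols
    final_fall = accents[-1][0]
    for k, (pos, word_end) in enumerate(accents[:-1]):
        # Every accent but the last is a rise; join with a pending "B".
        symbols[pos] = "1" if symbols[pos] == LOW else symbols[pos] + "&1"
        last_rise = (k == len(accents) - 2)
        high_end = final_fall if last_rise else word_end
        for j in range(pos + 1, high_end):
            symbols[j] = HIGH
        if not last_rise:
            symbols[word_end] = "B"         # reset right after the word
    symbols[final_fall] = "A"
    return symbols


def contour(parts: List[List[Word]]) -> List[str]:
    """Case (b): parts separated by major syntactic breaks."""
    out: List[str] = []
    for i, part in enumerate(parts):
        syms = contour_for_part(part)
        if syms and i < len(parts) - 1:     # continuation rise before break
            syms[-1] = "2" if syms[-1] == LOW else syms[-1] + "&2"
        out.extend(syms)
    return out


if __name__ == "__main__":
    # Hypothetical five-word stretch with two dominant disyllabic words.
    part = [Word(1, 0, False), Word(2, 0, True), Word(1, 0, False),
            Word(2, 0, True), Word(1, 0, False)]
    print(" ".join(contour([part])))
```

Because dominance is an input here, the sketch inherits the limitation stated in the text: it says nothing about which words should be dominant, nor about the optional final rise.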
Three Experimental Tests of the Rules
The results presented so far have been obtained in the analysis of isolated sentences, which have been taken from spoken news bulletins, radio comments and interviews, as well as from spontaneous conversation. These sentences were taken rather arbitrarily, i.e. without any explicit selection criterion. At this point however, the material to be chosen should reflect the passing on from the explorative stage to a purposeful examination. A proper verification of the rules would deal with the following questions: How far can the rules be confirmed, and can there be found systematic deviations from the rules?
To answer these questions, we aimed at obtaining several realizations, by various speakers, of the same sentences. From the speech material we required that it would not allow of seriously diverging interpretations. This implied the condition of having subjects read aloud written, prescribed texts. Confirmation of the rules might well be favoured by this condition, but if we were to find deviations, it would mean that these have to be expected in spontaneous speech a fortiori.
Two different corpuses were taken as material. The first one, a fictitious business letter, would introduce the aspect of the sentences being connected into an entire text; the second one, a set of proverbs, was chosen mainly for their supposedly conventionalized intonation.
Unlike the procedure thus far, according to which every sentence to be analyzed was processed on the Intonator to find its decomposition into perceptually relevant pitch movements, the analysis of more extended material required the introduction of a somewhat less laborious, though still reliable way of obtaining a transcription. Such a transcription is actually carried out by means of listening to the repeatedly played back recordings in the first place. But this remains a risky procedure. Although the descriptive units in terms of which the transcription is made have been shown to be perceptually relevant units, with rather coarse and easily recognizable specifications (up or down, prominence lending or not, otherwise declination low or high), and although the listeners may be regarded as highly qualified through long training with Intonator-aided analysis, it is imprudent to rely on this procedure alone. Therefore, in order to be more certain about our results, we checked them by a procedure of simultaneous listening and visual inspection of the output of a pitch meter (Willems & De Vries, 1970) on a large screen oscilloscope. In cases of doubt, there were still two other means, viz. analytic listening with the aid of an electronic gate and, of course, Intonator simulating.
First corpus: connected text The text consisted of 125 words, or 230 syllables, forming seven rather long sentences. It was read aloud by four subjects twice. The eight recordings obtained were subjected to an analysis as described above. Rather than giving the full transcription of the material, we will discuss clear examples of a number of systematic deviations for the rules. Of the total of 56(= 8 x 7) sentences, only 4 appeared to have been realized in the way the rules would have predicted (each subject made one entirely "correct" realization). More detailed, there were 174 instances in which the rules were violated. Of these, 45 had to do with the rules about the continuation ; in 82 cases, the non-final fail was involved; in an additional26 cases, where the non-final fall was involved , the departures from the rules might have been influenced by other irregul arities of the contour, such as the use of a different basic pattern ; the remaining group of 21 contained violations of a diverging nature . We shall concentrate on the most frequently occurring deviations, viz . those concerning the continuation and the non-final fall. In the text, there were I 2 situations where on syntactic grounds continuations were likely to occur, and in 87 of the 96 corresponding realizations, the speakers actually gave the audible impression of performing an appropriate intonational gesture. In 25 of these cases, the pitch contour was in full agreement with the rules, viz. rise "1 " for the penultimate pitch accent, high declination line "0"; fall " A" for the last pitch accent, low declination line " 0", rise "2", etc. [Fig. 6(a)]. In 17 cases, we observed the variation which has been discussed above, viz. rise " I", a short stretch of high declination line "0", non-final fall "B", low declination line "0", rise + fall " 1&A", low declination line " 0", rise " 2", etc [Fig. 6(b)]. 
(This variation was also observed in quite a number of comparable utterance-final parts of the contours.) In 35 cases, there was no final fall at all [Fig. 6(c)]. The last two pitch accents were both realized by a rise "1", with a non-final fall in between them, and the continuation was formed by maintaining the high pitch level after the last rise, and a simple resumption of low pitch after the comma. Of the remaining 10 continuations, 7 seemed to contain an unknown feature, viz. a movement which might be characterized as a "half-fall" [Fig. 6(d)]. The indispensable check on the Intonator revealed that it can indeed be stylized by means of a fall of the same slope as falls "A" and "B", but with a duration of 25 to 50 ms. This movement has been given the symbol "E". The three remaining cases could not be considered as belonging to one class. Figure 7 shows the two clearest examples in which all the shapes are present that have been discussed above. (For the sake of simplicity, in all following figures the declination will be omitted.)
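The counts reported for the first corpus can be checked against one another with a few lines of bookkeeping. The following is only a minimal sketch: the figures are those given in the text, while the dictionary keys and the helper `share` are labels introduced here for illustration.

```python
# Counts as reported in the text for the first corpus.
SENTENCES = 7
RECORDINGS = 4 * 2           # four subjects, each reading the text twice
CONTINUATION_SITES = 12      # syntactically expected continuations per reading

# Rule violations by type (key names are illustrative labels).
violations = {
    "continuation": 45,
    "non-final fall": 82,
    "non-final fall with other irregularities": 26,
    "diverging nature": 21,
}

# Shapes of the continuations that were actually realized.
continuation_shapes = {
    "as ruled, Fig. 6(a)": 25,
    "with non-final fall, Fig. 6(b)": 17,
    "no final fall, Fig. 6(c)": 35,
    "half-fall, Fig. 6(d)": 7,
    "not classifiable": 3,
}

total_sentences = SENTENCES * RECORDINGS
possible_continuations = CONTINUATION_SITES * RECORDINGS
realized_continuations = sum(continuation_shapes.values())


def share(count, total):
    """Return count as a percentage of total, rounded to one decimal."""
    return round(100 * count / total, 1)


for shape, count in continuation_shapes.items():
    print(f"{shape}: {count} ({share(count, realized_continuations)}%)")
```

The totals reproduce the 56 sentences, 174 violations, 96 possible and 87 realized continuations mentioned above.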
[Figure 6. Different shapes of pitch contours observed in continuation constructions in the first corpus. Panels: (a) 25 cases; (b) 17 cases; (c) 35 cases; (d) half-fall; other, indistinct forms. Total: 87.]
[Figure 7. Two fragments (I and II) from the first corpus in which all the forms of Fig. 6 have been used.]
Of the 170 cases where the speakers apparently intended to make a non-final fall, only 62 were in precise conformity with the rules. If we discard the 26 complicated situations mentioned above, we are left with 82 deviations in which merely the non-final fall was involved. There were 25 instances where the slope, and 57 where the position did not obey the rules. Among the latter, 45 falls came too early, and 12 too late. In the case of a deviating slope, it was as if the fall was spread over all the syllables between the two successive rises, rather than having one distinct position. This movement has been given the symbol "D". Figure 8 shows an example in which, among the eight realizations, the various deviations discussed above can be observed. A non-final fall which comes too early may even occur in the syllable that bears the preceding rise ("1&B").
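The breakdown of the non-final falls given above can be written out as a short chain of sums. This is merely a restatement of the reported counts, assuming the amsmath package for the layout:

```latex
\begin{align*}
170 &= \underbrace{62}_{\text{conforming}}
     + \underbrace{26}_{\text{complicated}}
     + \underbrace{82}_{\text{non-final fall only}},\\
 82 &= \underbrace{25}_{\text{slope}}
     + \underbrace{57}_{\text{position}},\\
 57 &= \underbrace{45}_{\text{too early}}
     + \underbrace{12}_{\text{too late}}.
\end{align*}
```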
[Figure 8. Fragment from the first corpus ("In antwoord op Uw brief") in which the non-final fall shows various deviations from the rules. Panels: according to rules; gradual slope; too early; too late.]
Second corpus: proverbs
It seems obvious that a revision of the rules has already become necessary. But it could be unwise to carry out a merely ad hoc adaptation without first considering the following. The transcriptions which yielded the variations of the continuations and the non-final falls were achieved in a number of operations of which analytic listening is one. In view of our conceptions about the difference between analytic and broad listening, we must now ask what is the effect in broad listening of the variations which are observed in analytic listening. Now, if there are good reasons for a distinction between the two ways of listening, one of them must be that not every alteration on the analytic level will result in an effect which is audible in broad listening. In order to check this expectation in more detail, we try to examine the following aspects of the relation between analytic variations and their effect in broad listening:
(1) To what extent do they affect the interpretation?
(2) Do they produce differences in acceptability?
Moreover, it seems interesting to ask:
(3) Do they show a fixed correlation with syntactic structure?
(4) Are they systematically related to a particular speaker?
The conditions of the experiment discussed next allow an evaluation of these questions. We decided to take proverbs as material for a new experiment. Thus, again, the experimental conditions were such that we did not expect many
deviations from the rules, since proverbs are not context-bound, have only one interpretation, and can be expected to have conventionalized intonation. The entire set contained 28 proverbs of simple syntactic construction, all of them being well known to all speakers and listeners. They were typewritten on separate cards and presented for citation to six speakers. The speakers had the opportunity to prepare themselves for the citation of each proverb, since they themselves operated the remote control switch of the tape recorder. As opposed to the previous experiment, extra care was given to the aspect of acceptability. It was judged by a panel of ten listeners, five of whom judged the material twice. Use was made of a five-point scale, so that the theoretical maximum total score per citation was 75. In about half of the discarded cases of the previous experiment the use of a different basic pattern was involved. With the proverbs, all citations in which patterns different from the hat were used typically got low acceptability scores. On account of the shortness of proverbs, this material does not lend itself to further analysis of the non-final falls. In 14 proverbs, however, the occurrence of a continuation was expected, and in nine of them at least four out of the six speakers actually made it. We concentrated on the manifestations of the continuation in these nine proverbs (see footnote 6). We considered only those citations that were judged sufficiently acceptable (score at least 50). The results are collected in Table I.

[Table I. Survey of the use of the three variants of the continuation in nine proverbs, by six speakers. Columns: variants "1A2", "1E" and "10", each subdivided by speaker (FL, HB, IS, JH, LB, WK). Rows: proverbs nos. 1 to 9. An X marks each sufficiently acceptable citation realized with the variant concerned.]
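The acceptability criterion used for the proverbs can be made concrete as follows. This is a sketch under stated assumptions: the text gives the panel size, the five-point scale, the maximum of 75 and the threshold of 50, but not the lowest scale value, which is assumed here to be 1; the function name is hypothetical.

```python
LISTENERS = 10               # panel size, from the text
REPEATING_LISTENERS = 5      # listeners who judged the material twice
SCALE_MIN, SCALE_MAX = 1, 5  # assumption: the five-point scale runs from 1 to 5
THRESHOLD = 50               # "sufficiently acceptable" total score, from the text

JUDGMENTS_PER_CITATION = LISTENERS + REPEATING_LISTENERS
MAX_TOTAL_SCORE = JUDGMENTS_PER_CITATION * SCALE_MAX


def is_sufficiently_acceptable(ratings):
    """Return True if the summed ratings of one citation reach the threshold."""
    if len(ratings) != JUDGMENTS_PER_CITATION:
        raise ValueError("expected one rating per judgment")
    if any(not SCALE_MIN <= r <= SCALE_MAX for r in ratings):
        raise ValueError("rating outside the assumed scale")
    return sum(ratings) >= THRESHOLD
```

Under these assumptions a citation rated 4 by every judgment totals 60 and is retained, whereas one rated 3 throughout totals 45 and is discarded.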
Footnote 6. The nine proverbs were:
(1) Als het kalf verdronken is, dempt men de put. (When the steed is stolen, the stable-door is locked.)
(2) Hij heeft de klok horen luiden, maar weet niet waar de klepel hangt. (He has heard a bell ring, but he does not know what is what.)
(3) Wie een kuil graaft voor een ander, valt er zelf in. (Harm watch, harm catch.)
(4) Een man een man, een woord een woord. (A bargain is a bargain.)
(5) De mens wikt, maar God beschikt. (Man proposes, God disposes.)
(6) Een vos verliest wel zijn haren, maar niet zijn streken. (What is bred in the bone will come out in the flesh.)
(7) Als de vos de passie preekt, boer pas op je ganzen. (Beware of the geese, when the fox preaches.)
(8) Spreken is zilver, zwijgen is goud. (Speech is silver, silence is gold.)
(9) Wie eens steelt, is altijd een dief. (Once a thief, always a thief.)

Table I shows that, e.g., proverb no. 5 was said with continuation type "1A2" by HB and WK, with type "10" by FL and IS, and with type "1E" by JH. The citation of this proverb by LB was not judged sufficiently acceptable. In only one proverb (no. 9) the three accepted speakers
used the same type of continuation. In all others at least two types have been used. This outcome excludes a possible relation to syntactic structure within the frame of this material. Two speakers seem to have a preference for one type of continuation, viz. HB for "1A2", IS for "10". In general, however, the speakers used more than one type, and HB and IS formed no exception. Therefore there is no reason to suspect a systematic relation to a particular speaker. This outcome suggests that we have to extend the rules with a number of "optional alternatives" in such a way that the application of either the original rules or the alternatives, irrespective of the speaker and of the syntactic structure, will result in the generation of contours that do not lead to differences in interpretation and that are all equally acceptable (see footnote 7).

Test of optional alternative rules
If we are entitled to extend the rules with the optional alternatives, we should be able to predict that, in a large enough number of realizations of an arbitrary sentence, which need only be complicated enough to permit the application of optional alternative rules at all, the extended rules will indeed account for all the corresponding variations, across different speakers, without differences in interpretation and without systematic differences in acceptability. In other words, every variation which can be generated by the rules will occur, and every observable variation will be covered by the rules. To take up this challenge, a sentence was chosen from an architectural description, such that, on the basis of the various possible rules, twelve different contours could be predicted ('t Hart, 1968; 't Hart, 1972). The sentence was:

"De dwarsbeuk en het koor stammen uit de dertiende eeuw, de lengtebeuk en de torens uit de veertiende eeuw."
(The transept and the choir date from the thirteenth century, the nave and the towers from the fourteenth century.)

The syllables "dwars-", "koor", "der-", "leng-", "to-" and "veer-" (cf. Fig. 9) were expected to be accentuated. For the continuation, we expected the forms "1A2" and "10", but not "1E", since the test sentence had three syllables following the last accentuated one (see the detailed description of "E" below). The non-final falls after "dwars-" and "leng-", and in the case of continuation type "10" also after "koor", would be either early (on the same syllable as the one with the rise) or late (according to the original rule). The variant with "D" was expected not to be distinguishable from that with "B", because of the small number of syllables available. Thus six shapes were predicted for the first part of the utterance, two for the second, making twelve in all (see Fig. 9). Nos. 1, 2, 3 and 4 have continuation type "10", 5 and 6 have "1A2"; 2 and 6 have an early non-final fall on "dwars-", 3 on "koor", 4 on both "dwars-" and "koor". In the second part, version b has an early non-final fall on "leng-". Forty-four readings, partly prepared, partly unprepared, were elicited from eleven speakers. Owing to various deviations from the predicted, "ideal" contours, it was not possible to identify each of the 44 realizations as one of the twelve in its full shape, but if we considered only the locations of the non-final falls and the type of caesura, we were able to classify them all fairly easily. Figure 10 shows the response distribution. It suggests
Footnote 7. In line with the traditional terminology, we could consider the phenomena to be described in the optional alternative rules to be free variants of the shapes generated by the original rules.
[Figure 9. Twelve possible contours for the sentence "De dwarsbeuk en het koor stammen uit de 13e eeuw, de lengtebeuk en de torens uit de 14e eeuw". Contour shapes nos. 1 to 6 are aligned with the syllables "dwars-", "koor", "der-", "leng-", "to-" and "veer-".]
that in a large enough sample any of the twelve contours would have been used (the absence of no. 4a in the present sample was considered accidental). This outcome, showing that for a sentence which so clearly allows no more than one interpretation people are capable of generating so many acceptable variants in the pitch contour, may be considered valuable evidence in favour of the introduction of the optional alternative rules. We will indeed try to formulate these below. With respect to this experiment, two phenomena call for further comment. The distribution of Fig. 10 shows three pronounced peaks, viz. for contours 1a, 5a and 6b. Of the eleven first readings, in which the subjects were forced to read immediately after the text had been revealed, eight were realized with contour 1a. We are now able to understand why this contour is so much preferred, particularly in an unprepared situation: this contour is generated by the simplest rules for acceptable intonation, according to which every syllable that has to be accentuated gets a rise, except the last one, which has the final fall, and according to which a non-final fall has to be inserted between every two rises. The alternative shape of the continuation "010" fits in easily with this formula thanks to the resemblance of the resumption of low pitch to the non-final fall: the only difference is that the location of the former is fixed, to the effect that in this test sentence the syllable "eeuw" (of the first part) should be kept high. We have called this type of contour the "emergency contour", because of its frequent occurrence in conditions of high speech rate, or in other cases where the speaker is deprived of the possibility to co-ordinate the various parts of an utterance (such as in the enumeration of a list). Although its name could suggest otherwise, the application of the emergency contour ensures an acceptable intonational result for almost every situation.

[Figure 10. Frequency of occurrence of the twelve contours of Fig. 9 in 44 realizations. Horizontal axis: contour type (1a, 1b, 2a, 2b, 3a, 3b, 4a, 4b, 5a, 5b, 6a, 6b); vertical axis: number of occurrences.]
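The "simplest rules" behind the emergency contour can be sketched as a small procedure. It follows the wording above literally (a rise "1" on every accentuated syllable except the last, a final fall "A" on the last, a non-final fall "B" between every two rises). The function name and the tuple representation are assumptions made for this illustration, and neither declination nor the resumption of low pitch after a comma is modelled.

```python
def emergency_contour(accented_syllables):
    """Return a list of (syllable, movement) pairs for the emergency contour.

    Non-final falls are listed with syllable None, because their exact
    location between two rises is left open in this sketch.
    """
    if len(accented_syllables) < 2:
        raise ValueError("the rule as stated needs at least two accented syllables")
    *rising, last = accented_syllables
    contour = []
    for index, syllable in enumerate(rising):
        contour.append((syllable, "1"))      # prominence-lending rise
        if index < len(rising) - 1:
            contour.append((None, "B"))      # non-final fall between two rises
    contour.append((last, "A"))              # final fall on the last accent
    return contour
```

With two accented syllables the procedure reduces to the plain hat pattern, a rise followed by the final fall.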
The peaks in the distribution of Fig. 10 for nos. 5a and 6b could be shown to reflect a certain preference for "intonational rhyme", i.e. the structural parallelism between the two halves of the utterance is intonationally expressed by the use of the same variants. The capability of the subjects to do so indicates that, at some programming level, representations of intonation contours of the size of at least half of a compound sentence like the test utterance are present.
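The twelve predicted contours of the test sentence (nos. 1a to 6b) can be enumerated mechanically from the description given for Fig. 9. The sketch below does only that; the dictionary keys and function names are hypothetical, and the continuation labels are copied as printed in the text.

```python
def first_part_shapes():
    """Six shapes for the first half, numbered as in the description of Fig. 9."""
    shapes = {}
    number = 1
    # Continuation type "10": non-final falls after "dwars-" and after "koor".
    for koor in ("late", "early"):
        for dwars in ("late", "early"):
            shapes[number] = {
                "continuation": "10",
                "fall after dwars-": dwars,
                "fall after koor": koor,
            }
            number += 1
    # Continuation type "1A2": only the fall after "dwars-" can vary.
    for dwars in ("late", "early"):
        shapes[number] = {"continuation": "1A2", "fall after dwars-": dwars}
        number += 1
    return shapes


def second_part_shapes():
    """Two shapes for the second half: version a (late) and version b (early)."""
    return {
        "a": {"fall after leng-": "late"},
        "b": {"fall after leng-": "early"},
    }


def predicted_contours():
    """Combine both halves into the twelve labelled contours."""
    return {
        f"{number}{version}": {**first, **second}
        for number, first in first_part_shapes().items()
        for version, second in second_part_shapes().items()
    }


contours = predicted_contours()
# According to the text, only contour 4a was absent from the 44 realizations.
observed = set(contours) - {"4a"}
```

The enumeration reproduces the count of six shapes for the first part times two for the second.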
Extension of the inventory: adaptation of the rules
The systematic occurrence of two new perceptually relevant pitch movements, as has been demonstrated above, leads to the following extension of the inventory of these elementary movements.
Gradual fall. Symbol "D". The non-final fall "B" can be replaced by a gradual fall which extends over all syllables between two prominence-giving rises. Its slope and duration depend directly on the number of these syllables (and their durations). The difference between "B" and "D" is only audible in very attentive listening, although it is fairly well noticeable in objective measurements.
Half-fall. Symbol "E". This movement is called a half-fall since after its occurrence the pitch gives the impression of being neither "high" (as in "0"-stretches), nor "low" (as e.g. after "A") but somewhere half-way. We have verified this impression by measurements and by Intonator simulation. It is clearly distinguishable from other falls by its curious side-effect of suggesting a jump of pitch over a musical interval, viz. a minor third (which cannot be confirmed in objective measurements). As far as our experience goes, no other pitch movement is capable of consistently giving a comparable impression. Since the introduction of a third level of declination seemed an unnecessary ad hoc measure, we had the choice, for the stretch of pitch after "E", between the symbols for the low and for the high declination line. We chose "0", since this gives the advantage of making it possible to indicate where the resumption of low pitch takes place. It should be noted that we have never observed situations in which there were more than two syllables between the one with "E" and the comma. Furthermore, our observations lead to the conclusion that "E" is not necessarily connected with prominence.
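The restriction on the half-fall can be expressed as a simple applicability check on the three continuation variants ("1A2", "10" and "1E"). This is a sketch only: the limit of two syllables is the observation reported above, turned here into a hard threshold by assumption, and the function name is hypothetical.

```python
MAX_SYLLABLES_AFTER_E = 2  # observed upper bound, used here as an assumed limit


def applicable_continuations(syllables_before_comma):
    """List the continuation variants that the rules would allow.

    syllables_before_comma is the number of syllables between the last
    accentuated syllable and the comma.
    """
    if syllables_before_comma < 0:
        raise ValueError("syllable count cannot be negative")
    variants = ["1A2", "10"]
    if syllables_before_comma <= MAX_SYLLABLES_AFTER_E:
        variants.append("1E")
    return variants
```

With three syllables after the last accentuated one, as in the test sentence, only "1A2" and "10" remain, which agrees with the expectation stated for that sentence.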
In the kind of experiments dealt with in this section, it cannot be proved whether or not the inventory of elementary movements of hat-shaped patterns and their derivatives is complete. However, this defect need not prevent us from summarizing the observed phenomena and the regularity of their occurrence, checked, as far as possible, in the last experiment. To this end, we reformulate the preliminary rules in the following way:
Case (a): one utterance as a whole. In utterances with more than one dominant word:
either: put rise "1" on the penultimate stressed syllable and fall "A" on the last one, with high declination line between them [Fig. 3(b)].
or: put rise "1" on the penultimate stressed syllable, make a non-final fall (see adjusted rule for non-final falls) after it and put a rise-fall "1&A" on the last stressed syllable [Fig. 3(c)].
Adjusted rule for non-final falls:
either: make fall "B" anywhere between two successive rises "1" (Fig. 8). This may interfere with the requirement that the non-final fall should not lend prominence; thus, e.g., locating it on the lexically stressed syllable of a non-dominant word should be avoided.
or: make a gradual fall "D" during all the syllables between two successive rises "1" (by way of standard recipe, its slope should be adjusted to such a value as to arrive at the extended low declination line just before the next rise) (Fig. 8).
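The standard recipe for the slope of the gradual fall "D" amounts to dividing the pitch drop by the time available between the two rises. The sketch below assumes, purely for illustration, that pitch is expressed in semitones and durations in seconds; the function and parameter names are hypothetical, and the two pitch values are to be read off the high and the extended low declination line at the start and the end of the fall.

```python
def gradual_fall_slope(high_at_start, low_at_end, syllable_durations):
    """Slope of a gradual fall "D" spread over the intervening syllables.

    high_at_start: pitch at the onset of the fall (assumed unit: semitones).
    low_at_end: pitch of the extended low declination line just before the
        next rise (same unit).
    syllable_durations: durations of the syllables between the two rises
        (assumed unit: seconds).
    Returns the slope in semitones per second, negative for a fall.
    """
    total_duration = sum(syllable_durations)
    if total_duration <= 0:
        raise ValueError("need a positive total duration")
    return (low_at_end - high_at_start) / total_duration
```

The more syllables lie between the two rises, the longer the total duration and the shallower the resulting slope, in line with the description of "D".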
Case (b): continuations.
either: follow one of the procedures for the last part of contours without continuation, put rise "2" on the last syllable before the comma and proceed at the level of the extended low declination line [Fig. 6(a, b)].
or: make a non-final fall (see adjusted rule for non-final falls) after rise "1" on the penultimate stressed syllable (before the comma), put a rise "1" on the last one, keep the remaining syllables high and drop the pitch to the level of the extended low declination line after the last syllable before the comma [Fig. 6(c)].
or: put rise "1" on the penultimate stressed syllable, half-fall "E" on the last one (only applicable if the number of syllables between the last stressed syllable and the comma is low), make declination on the thus attained "half-way" level until the pitch is reset to the low level after the comma [Fig. 6(d)].
If the last stressed syllable is considerably more than just a few syllables ahead of the comma, another possible continuation construction with "E" is one in which that syllable is given a rise "1", while "E" is put on the lexically stressed syllable of the last word, or on the last word anyway if it is monosyllabic (that word is not unmistakably dominant, and "E" does not clearly lend prominence, cf. Fig. 7, II).

Discussion and Conclusions
The rules, with which pitch contours can be constructed in which standardized pitch movements act as building elements, constitute a workable frame in that, for one thing, they generate contours which can be shown, in a reproducible way, to be judged as specimens of acceptable intonation. For another, the generated standard contours might well, in a future, more refined description, serve as a reference with respect to which possibly systematic occurrences of more subtle phenomena can be described.
It seems as if a serious restriction to the predictive power of the rules is constituted by the fact that their application depends on the knowledge of which words are to be considered dominant. And, as was mentioned before, we have as yet no explicit rules from which the dominance can be predicted. However, without wanting to underestimate the importance of this problem, we feel free to say that it is not a problem which should be solved in the analysis of intonation. Moreover, our rules operate in such a way that, whatever the place and the number of the dominant words may be, they invariably generate a correct pitch contour, i.e. a change in place or number of the dominant words does not call for the application of another set of rules.
In conclusion, we may briefly survey our findings in terms of two claims of different nature, the first one about the influence on the interpretation of an utterance of the application of optional alternative rules, the second one about the fruitfulness of our method of perceptual analysis.
First, in Section 2, we have explained that we can find perceptually relevant pitch movements by looking, in the Intonator-simulated contour, for those movements that may not be deleted or inserted or otherwise considerably changed without causing a major change in the perceptual identity of the contour as a whole.
But in Section 3 we have shown that it is nevertheless possible to introduce changes which are audible in analytic listening, but which
do not lead to such a major change in the perceptual identity. This has been demonstrated to be the case with clearly measurable alterations in the location or the slope, or even in the direction of a perceptually relevant pitch movement. This may be considered one of the implications of the difference between analytic and broad listening: indeed, one would expect more "sameness" on the level with the higher degree of abstraction than on the analytic level. It should be added here that, although we have found the expected "more sameness", we are not able to claim that in all cases where the contours differ from one another in the way described, there will be no difference in interpretation. Rather, we have been able to show that alterations of the contour do not inevitably and consistently result in changes of interpretation, contrary to what seems to be the generally adopted view. Thus, e.g., in his introduction to Part Three of his volume on intonation, Bolinger (1972, p. 155) describes "... two ways of separating the clauses of a sentence ...", which are almost exactly equal to our types "1A2" [Fig. 6(a)] and "10" [Fig. 6(c)] for continuation. According to Bolinger, the former type is used, in English, "... when the speaker intends the first clause to be viewed as a new idea ...", the latter, "... if it only repeats what has gone before ...". Notably, the latter intonation "... is common on other expressions that are 'not new', such as folk sayings: Easy come, easy go; One for the money, two for the show; ...". Unless it is completely impossible to compare two different languages with respect to these phenomena, our findings clearly contradict Bolinger's position.
Second, thanks to our method of perceptual analysis we seem to have been able to lay bare the major part of the most important features of one basic pattern of Dutch intonation and of the various contours that can be derived from it.
Provided that the contour will be one that is based on the hat pattern, and if the places where pitch accents have to be realized are given, we are able, by means of the application of a set of rules, to construct stylized pitch contours that are claimed to be perceptually equivalent to the intonation of actually spoken realizations of the utterance under consideration. Surely, this leaves at least two important questions unanswered, viz. can we find rules for the stress assignment, and can we unravel the characteristics of other basic patterns and their derivatives? As for the former question, our method of perceptual analysis may not be able to make a substantial contribution to the solution of that problem. But we have every reason to believe that the application of this method in trying to solve the latter problem will prove fruitful. In the same spirit, we think that this method may equally successfully be applied to the study of intonation in other languages.

Of the numerous people who have helped us to shape and reshape the form of this article and whose assistance is hereby gratefully acknowledged, we want to mention by name Dr S. Nooteboom for his essential contributions.

References
Bolinger, D. L. (1958). A theory of pitch accent in English. Word 14, 109-149.
Bolinger, D. L. (Ed.) (1972). Intonation. Harmondsworth: Penguin.
Borst, J. M. & Cooper, F. (1957). Speech research devices based on a Channel Vocoder. Journal of the Acoustical Society of America 29, 777.
Cohen, A. & 't Hart, J. (1967). On the anatomy of intonation. Lingua 19, 177-192.
Collier, R. (1970). The optimum position of prominence lending pitch rises. IPO Annual Progress Report 5, 82-85.
Collier, R. & 't Hart, J. (1972). Perceptual experiments on Dutch intonation. Proceedings of the VIIth International Congress of Phonetic Sciences, Montreal 1971. The Hague, Paris: Mouton. pp. 880-884.
Cooper, F. S. (1962). Speech synthesizers. Proceedings of the IVth International Congress of Phonetic Sciences, Helsinki 1961. The Hague: Mouton. pp. 3-13.
Delattre, P., Poenack, E. & Olsen, C. (1965). Some characteristics of German intonation for the expression of continuation and finality. Phonetica 13, 134-161.
Haggard, M., Ambler, S. & Callow, M. (1970). Pitch as a voicing cue. Journal of the Acoustical Society of America 47 (2) (part 2), 613-617.
't Hart, J. & Cohen, A. (1964). Gating techniques as an aid in speech analysis. Language and Speech 7, 22-39.
't Hart, J. (1968). Intonational rhyme. IPO Annual Progress Report 3, 40-44.
't Hart, J. (1972). Intonational rhyme, paper presented at the Symposium on Intonology, Prague 1970. Acta Universitatis Carolinae, Philologica 1, Phonetica III, 105-109.
Isačenko, A. & Schädlich, H. J. (1970). A Model of Standard German Intonation. The Hague, Paris: Mouton.
Jones, D. (1962). An Outline of English Phonetics, ninth ed. Cambridge: Heffer & Sons Ltd.
Van Katwijk, A. F. V. (1972). On the perception of stress, paper presented at the Symposium on Intonology, Prague 1970. Acta Universitatis Carolinae, Philologica 1, Phonetica III, 127-135.
Van Katwijk, A. F. V. & Govaert, G. A. (1967). Prominence as a function of the location of pitch movement. IPO Annual Progress Report 2, 115-117.
Leon, P. (1972). Où en sont les études sur l'intonation? Proceedings of the VIIth International Congress of Phonetic Sciences, Montreal 1971. The Hague, Paris: Mouton. pp. 113-156.
Rigault, A. (1962). Rôle de la durée, de l'intensité et de la hauteur dans la perception de l'accent en français. Proceedings of the IVth International Congress of Phonetic Sciences, Helsinki 1961. The Hague: Mouton. pp. 735-748.
Willems, L. F. (1966). The intonator. IPO Annual Progress Report 1, 123-125.
Willems, L. F. & Loonen, A. R. M. (1967). Intonation contour generator. IPO Annual Progress Report 2, 197-200.
Willems, L. F. & de Vries, H. (1970). The phonetograph. IPO Annual Progress Report 5, 181-185.