Journal of Phonetics (1979) 7, 119-145
Linguistic features in fundamental frequency patterns Douglas O'Shaughnessy* Research Laboratory of Electronics, Massachusetts Institute of Technology, Cambridge, Massachusetts U.S.A. 02139 Received 13 October 1977
Abstract:
Patterns of fundamental frequency (F0) extracted from natural speech of male speakers reading isolated sentences were examined. The F0 contours as functions of time exhibited systematic behavior with regard to sentence type, syntactic construction, emphasis, word type, and phonetics. Examples are given illustrating typical effects in F0, and possible F0 "features" are suggested. The "stress" F0 feature consists of a rise or sharp fall in F0; the "boundary" feature utilizes a fall-rise F0 pattern, as well as a "continuation rise" shape. Vocatives can be distinguished from appositives via F0. If confirmed by perceptual analyses, these F0 features could be used in automatic speech synthesis or recognition algorithms.
Introduction
Understanding how human speakers vary their pitch in communicating aspects of an utterance-message to listeners would be useful in designing speech synthesis and recognition schemes. The perception of pitch and the related features, intonation and stress, correlate in the acoustic speech waveform with fundamental frequency (F0), the rate of vibration of the vocal cords in voiced speech, as well as with duration and amplitude. This paper primarily deals with the analysis of F0 patterns in natural speech, with the objective of hypothesizing F0 features modeled upon those used by speakers to impart information to a listener. In particular, F0 patterns are analyzed at four hierarchical levels: phoneme, word, phrase, and sentence, corresponding to the multiple functions of F0 in speech to convey information as to consonant voicing, emphasis, syntax, and type of sentence.
Analysis of natural F0 contours
The following F0 data were obtained from natural speech: isolated sentences recorded from four adult male speakers in a quiet room. The read sentences were designed to illustrate how F0 might be used contrastively to convey linguistic meaning. The results reported here were part of a larger F0-analysis project (O'Shaughnessy, 1976). The plots in ensuing figures consist of F0 values every 10 ms (with pitch period resolution of 0·08 ms) (see Gold & Rabiner, 1969 for a description of the algorithm). The F0 points in the figures are connected in the high-amplitude portions of each syllable to aid in interpreting the plots.
*Currently with: INRS-Telecommunications, Université du Québec, Verdun, Quebec, Canada
0095-4470/79/020119+27 $02.00/0
© 1979 Academic Press Inc. (London) Ltd.
Limits of F0 variation
To establish ranges of intra-speaker F0 variation and to identify differences in F0 patterns between stressed and unstressed syllables, speaker KS recorded in one session 229 sentences of the form "Say X instead", where X represented 229 different monosyllabic and bisyllabic words (O'Shaughnessy, 1974). F0 at selected points in the fixed words (at the end of "Say" and at the F0 peak of "instead") exhibited an average range of 18 Hz, with a standard deviation of 4·0 Hz, compared to a wider variation at the peak of X (32 Hz range, 5·5 Hz S.D.) (Fig. 1). The stressable (i.e. lexically-stressed) syllable of "instead" was marked with a +17·3 Hz jump (4·1 Hz S.D.) during the unvoiced obstruents /st/.
Figure 1. Plot of F0 vs time for speaker KNS: (a) "Say enhance instead"; (b) "Say export instead". (Note the F0 points out of line; they represent errors of the pitch extraction device.) The F0 peak values are noted in Hz, and the dimensions are noted in Hz and seconds (all figures use the same scaling).
While X was similarly marked with a +16·1 Hz obtrusion (5·1 Hz S.D.) when its second syllable was stressable [Fig. 1(a)], the rise was much smaller when the first syllable was stressable [Fig. 1(b)]. The "descent" (F0 fall from the peak value on a syllable to F0 at the start of the ensuing syllabic nucleus) was more consistent on X's stressable syllable, averaging at least -20 Hz, with an S.D. of about 5·6 Hz. The non-stressable syllables had smaller F0 movements. The S.D.s were large enough to discount possible quantum levels of stress, both in absolute F0 values and in relative F0 changes, but were small enough to allow the pattern of larger F0 changes on the stressable syllables to be observed. As a result of this and other experiments, F0's contribution to stress in an utterance was viewed as a large rise or fall, termed "F0 accent".
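As a hedged illustration of the measures used above, the following sketch computes the inter-nucleus "jump" and the "descent" from per-syllable F0 values and flags syllables with large movements. The `Syllable` record, the 15 Hz threshold, and the sample values are assumptions made for illustration only; they are not data or procedures from this study.

```python
from dataclasses import dataclass

@dataclass
class Syllable:
    label: str
    start_f0: float  # F0 (Hz) at the start of the syllabic nucleus
    peak_f0: float   # highest F0 (Hz) in the nucleus
    end_f0: float    # F0 (Hz) at the end of the nucleus

def jump(syls, i):
    """F0 change between the end of nucleus i-1 and the start of nucleus i."""
    return syls[i].start_f0 - syls[i - 1].end_f0

def descent(syls, i):
    """F0 change from the peak of syllable i to the start of nucleus i+1."""
    return syls[i + 1].start_f0 - syls[i].peak_f0

def accent_candidates(syls, threshold=15.0):
    """Labels of syllables with a large upward jump or a large descent.

    The threshold is a hypothetical value, not one proposed in the text.
    """
    found = []
    for i, s in enumerate(syls):
        up = jump(syls, i) if i > 0 else 0.0
        down = descent(syls, i) if i < len(syls) - 1 else 0.0
        if up >= threshold or down <= -threshold:
            found.append(s.label)
    return found

# Hypothetical values, loosely shaped like "Say enhance instead".
utterance = [
    Syllable("Say", 120.0, 122.0, 118.0),
    Syllable("en", 112.0, 113.0, 108.0),
    Syllable("hance", 124.0, 130.0, 115.0),
    Syllable("in", 100.0, 102.0, 98.0),
    Syllable("stead", 115.0, 118.0, 90.0),
]
print(accent_candidates(utterance))
```

With these invented values, only the two lexically-stressed syllables exceed the threshold, which mirrors the qualitative pattern described above without reproducing the measured averages.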
F0 at the phonemic level: consonant voicing
One phonemic F0 effect that has been observed in simple citation utterances is that F0 tends to fall at the voicing onset of a stressed syllable if the initial consonant is unvoiced, and to rise if it is voiced (Lehiste & Peterson, 1961; Lea, 1973). However, rise vs fall F0 patterns at the starts of syllables are not consistent cues to phonemics. The variability of the initial F0 pattern on a stressable syllable is illustrated in "farm-" in Fig. 2 ("This was the farmer who was eating the carrot"): despite its initial unvoiced consonant, F0 at the start of the syllable was either rising (when F0 jumped up between syllabic nuclei) [Fig. 2(a)-(c)], or falling (when F0 dropped between nuclei) [Fig. 2(d)]. The amount of initial rise was apparently related to F0 on the prior syllable: if F0 jumped sharply upward, F0 continued to rise on "farm-", but otherwise "farm-" had little F0 rise. However, "car-" (in "carrot") showed no initial F0 rise even though F0 jumped up prior to the voicing onset in "car-"; it is likely that unvoiced stops are less inclined to F0 rises than unvoiced fricatives.
Figure 2. F0 plot of "This was the farmer who was eating the carrot" for speaker JA: (a) normal version; (b) with stress on "This"; (c) with stress on "the"; (d) with stress on "farm-".
The voicing factor in F0 may be viewed as a tendency, rather than a simple binary correlation. The F0 inclination at syllable start to rise rather than fall likely lies on a scale: sonorant (most likely), voiced obstruent, vowel, unvoiced fricative, unvoiced stop (least likely). This tendency is influenced by the direction of F0 change: if F0 is undergoing a large upward movement from the previous to the current syllable, F0 tends to continue rising at voicing onset; whereas if F0 is going down, F0 tends to continue to fall. For speech recognition purposes, whether a syllable with an upward F0 obtrusion has an initial unvoiced consonant could be judged on the basis of what proportion of the F0 rise occurs after voicing onset: if most of the rise is a jump prior to voicing, the start is likely unvoiced; if not, voiced. (However, the F0-voicing boundary line is difficult to establish [e.g. "farm-" vs "eat-" in Fig. 2(b)-(c)].) If a syllable in a generally-falling F0 pattern starts with a small rise, it likely has a voiced start, but otherwise voicing cannot be determined in falling F0 trends.
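The recognition heuristic just described (judging initial-consonant voicing from the share of an upward F0 obtrusion that occurs after voicing onset) could be sketched as follows. This is a minimal illustration, not a tested recognizer: the function names, the 0.5 cutoff standing in for "mostly", and the sample values are all assumptions.

```python
def rise_fraction_after_onset(f0_prev_end, f0_onset, f0_peak):
    """Share of the total F0 rise that happens after voicing onset.

    f0_prev_end: F0 (Hz) at the end of the previous syllabic nucleus
    f0_onset:    F0 (Hz) at voicing onset of the current syllable
    f0_peak:     F0 (Hz) at the peak of the current syllable
    Returns None when there is no upward obtrusion to analyse.
    """
    total_rise = f0_peak - f0_prev_end
    if total_rise <= 0:
        return None
    return (f0_peak - f0_onset) / total_rise

def guess_initial_voicing(f0_prev_end, f0_onset, f0_peak, cutoff=0.5):
    """Guess 'voiced' or 'unvoiced' for the syllable-initial consonant.

    cutoff is a hypothetical threshold; the text only says "mostly".
    """
    fraction = rise_fraction_after_onset(f0_prev_end, f0_onset, f0_peak)
    if fraction is None:
        return "undetermined"  # falling trends give no reliable cue
    return "voiced" if fraction >= cutoff else "unvoiced"

# Invented examples: most of the rise is a jump before voicing onset,
# then most of the rise occurs after onset, then a falling trend.
print(guess_initial_voicing(100.0, 118.0, 120.0))
print(guess_initial_voicing(100.0, 104.0, 120.0))
print(guess_initial_voicing(120.0, 110.0, 105.0))
```

Returning "undetermined" for falling trends reflects the caveat above that voicing generally cannot be read from F0 when the contour is falling; the small-rise exception in falling contours is not modeled here.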
Figure 3. F0 plot for JA of: (a) "That fish is tasty"; (b) "That fish is tasty is true".
F0 at the word level
(a) Cuing word usage
Different uses of words can lead to different F0 patterns. In "That fish is tasty" [Fig. 3(a)], "That" functioned as an adjective and received an F0 accent rise. In "That fish is tasty is true" [Fig. 3(b)], "That" was a conjunction and got little accent, with F0 remaining low. Similarly, in "After fainting, Bob came to yesterday" [Fig. 4(a)], "to" (as a particle) had a large jump + fall F0 obtrusion, signaling stress, whereas in "... came to consciousness" [Fig. 4(b)], "to" (as a preposition) formed part of the typical F0 fall-off pattern of nonstressable syllables, with little F0 change. [Duration and amplitude also played a role in stress-marking here: "to" had more amplitude and was 160 ms longer in Fig. 4(a) than in Fig. 4(b), and "That" behaved similarly for two speakers, but showed little difference for speaker JA (he apparently only used F0 to cue the accent here).] Thus, words in similar sentence locations may use F0 to cue one of two possible uses, following the tendency for "stressable" words (e.g., adjectives, particles) to get accent, while most function words (e.g., conjunctions, prepositions) get little. However, since any word may receive contrastive accent, accent on a word is not a definite cue to the word's usage (e.g. an accented "that" or "to" could still be a conjunction or preposition), but lack of accent on a word is a fairly reliable indicator that it is a function word or a repeated word.
Figure 4. F0 plot for JA of: (a) "After fainting, Bob came to yesterday"; (b) "After fainting, Bob came to consciousness".
(b) Emphasis
"Emphatic stress" on a word or syllable in a natural utterance is correlated with increased F0 (Lofqvist, 1975) and larger F0 excursions. Emphasis involves enhanced relative accent on selected syllables; they receive larger F0 obtrusions, while the other accented syllables get decreased accent patterns (especially those after the emphasized syllables). If there is any F0 feature that could be called "emphasis", it would likely comprise an F0 contour with one or two large rise + fall obtrusions, each of about a syllable's duration, with other (especially ensuing) F0 relegated to a low level.
Figure 2 shows several readings by JA of "This was the farmer who was eating the carrot". Viewing Fig. 2(a) as the basic version, with moderate accents each on "farm-", "eat-", and "car-", Fig. 2(b)-(d) had increased (perhaps "emphatic") accents on "This", "the", and "farm-", respectively. In each case, the F0 pattern on the specified syllable was raised, while that on ensuing syllables was lowered. The specific F0 shapes were essentially retained, with only levels and amounts of change varied, which suggests that the syntax of the sentence and the phonemics of each word specify F0 shape, while the amount of accent (and corresponding perceived stress) is related in F0 to relative peak levels and amounts of change.
Figure 5. F0 plot for JA of "She slapped him in the face, and then she hit the bastard": (a) with coreference; (b) without coreference.
Figure 6. F0 plot for JA of "The farmer was eating the carrot": (a) without context; (b) preceded by "What was the farmer doing?"; (c) preceded by "Who was eating the carrot?"; (d) preceded by "What was the farmer doing with the carrot?"; (e) preceded by "What was the farmer eating?".
In "She slapped him in the face, and then she hit the bastard" (Fig. 5), the absence/presence of emphasis on "bastard" cues whether or not "bastard" = "him". When the two were coreferent [Fig. 5(a)], each of three speakers used F0 accent on "hit" (emphasizing "hit" vs "slapped"), and little on "bastard" (where F0 stayed low); but when not coreferent [Fig. 5(b)], "bastard" got a large F0 obtrusion [at the expense of "hit", which was treated as coreferent with "slapped" in Fig. 5(b)]. Thus whether the last word had a stress feature cued a meaning difference here, via the rule that "new" non-coreferent words get accent, while "old" coreferent words do not. As above, duration and amplitude also helped cue the difference.
(c) Context
To investigate the F0 effect of context, "The farmer was eating the carrot" (Fig. 6) was recorded (1) without context and (2) preceded by four different questions. Words repeated in the answer received decreased accent, while words not repeated got increased accent (e.g. in "Who was eating the carrot? The farmer was eating the carrot", "farm-" got a bigger F0 obtrusion than in the citation case, while "eat-" and "car-" had smaller F0 movements). The F0 differences can be succinctly described as a combination of obtrusion and recession: the stressable syllables of the remaining "new" words had their F0 patterns raised (to cue those words as the surviving important words, which the listener is least likely to anticipate), while the F0 patterns on the "old" words were lowered. Taking the peak F0 value on each stressable syllable as indicative of its F0 level, "old" words before the lone "new" word were recessed in F0 by -17 Hz, while those after it were lowered by -28 Hz. The remaining new word had its F0 raised +27 Hz if it was not the initial stressable one in the sentence, but only +3 Hz if it was.
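For a synthesis-oriented reading of these averages, the sketch below applies the reported mean shifts (-17, -28, +27, and +3 Hz) as fixed offsets to the citation-form peak F0 of each stressable syllable. Treating averages as fixed offsets, the single-new-word restriction, the function name, and the citation peak values are all simplifying assumptions for illustration.

```python
# Mean peak-F0 shifts (Hz) reported above for a sentence with one "new" word.
OLD_BEFORE_NEW = -17   # "old" stressable syllables preceding the new word
OLD_AFTER_NEW = -28    # "old" stressable syllables following the new word
NEW_NON_INITIAL = 27   # new word that is not the first stressable syllable
NEW_INITIAL = 3        # new word that is the first stressable syllable

def apply_context_shift(citation_peaks, new_index):
    """Shift citation-form peak F0 values for a single 'new' word.

    citation_peaks: peak F0 (Hz) of each stressable syllable, in order
    new_index:      position of the lone new word among those syllables
    """
    shifted = []
    for i, peak in enumerate(citation_peaks):
        if i == new_index:
            delta = NEW_INITIAL if i == 0 else NEW_NON_INITIAL
        elif i < new_index:
            delta = OLD_BEFORE_NEW
        else:
            delta = OLD_AFTER_NEW
        shifted.append(peak + delta)
    return shifted

# Hypothetical citation peaks for "farm-", "eat-", "car-".
citation = [150, 140, 130]
print(apply_context_shift(citation, 0))  # "Who was eating the carrot?"
print(apply_context_shift(citation, 2))  # "What was the farmer eating?"
```

The offsets are speaker- and sentence-specific averages, so a practical system would presumably scale them to the speaker's range rather than add them verbatim.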
These shifts indicate a greater tendency for F0 lowering after the new word than before, and a tendency for F0 raising only if an old word precedes the new one. Coinciding with these F0 changes, the word durations also varied slightly: relative to the citation utterance, the three stressable syllables increased an average of 19 ms in the new words, and decreased 25 ms in the old ones. These represented changes of less than 15%, compared to radical F0 alterations, suggesting F0's greater role here.
Not all stressable syllables in "new" words are marked with large F0 movements. Such syllables, when surrounded by others with large rises and falls, often lack F0 accent features. Such "adjacency" effects are especially notable in words interior to a syntactic unit, rather than at the start or end. Some phrase-medial words utilize level F0 (in an otherwise falling F0 pattern) to signal their stressable syllables. One view might be that the speaker economically uses his larger F0 movements on words where they serve multiple purposes (e.g., signal both stress and a syntactic boundary with one F0 change). Also, not all stressable syllables have the same amount of F0 movement: more important words get bigger amounts.
F0 at the phrase level: syntactic structure
Syntactic structure has been related to F0 patterns (e.g. Takefuta et al., 1972; Atkinson, 1973; Olive, 1975), but few studies have made explicit claims as to how F0 cues syntax, other than stating that syntactic phrases frequently start with rising F0 and end with falling F0 (e.g. Cowan, 1936; Maeda, 1974). Lea & Kloker (1975) claimed that F0 fall + rise "valleys" mark internal syntactic boundaries; however, these F0 "breaks" often coincide only loosely with boundaries; syllables with low F0 often intervene between a large fall and ensuing large rise [e.g. Fig. 4(b)]. F0 rises usually correlate well with stressable syllables, but often at the start of a syntactic phrase, an F0 rise starts on such a syllable, but "overshoots" into an ensuing nonstressable one (e.g. "Robert" in Fig. 7). Stressable syllables can be related to the initiation of F0 rises and falls, with such F0 movements continuing into other syllables.
Figure 7. F0 plot for JA of "Robert hasn't bought the car yet, but he will later today".
Figure 8. F0 plot for JA of: (a) "The good flies quickly passed"; (b) "The good flies quickly past".
(a) Boundary cues
In sentences of the form NP1-verb group-NP2-preposition-NP3 (where NP = noun phrase), speakers tended to mark a "juncture" after NP2 when the ensuing prepositional phrase was adverbial (modifying the verb group) and not when adjectival (modifying NP2), by the use of "a relatively lively intonation or a relatively low intonation" on NP2, as opposed to a pattern "relatively high or lacking in prominence" (Hartvigson, 1965: 230). Relatively high and level F0 on NP2, plus a lack of change in F0 direction, signaled no juncture after NP2. A rise + fall pattern signaled a juncture only when the peak F0 was relatively high; if NP3 had a "more lively" F0 than NP2, no juncture was heard after NP2. This suggests that F0 falls and/or low F0 are cues to syntactic boundaries, and that boundaries occur after large F0 obtrusions.
In the following, pairs of sentences with similar segmentals are examined which cue different syntax via prosody. Scholes (1971) found that phonetically-identical sentences can be disambiguated via F0 (e.g. "The good flies quickly passed/past"). In JA's reading (Fig. 8), the subject in Fig. 8(a) ("the good flies") differed from that in Fig. 8(b) ("the good") mainly in that F0 fell -59 Hz in "good" in Fig. 8(b) [vs only -12 Hz in Fig. 8(a)] and F0 fell -46 Hz in "flies" in Fig. 8(a) [vs only -27 Hz in Fig. 8(b)]. Thus the end of the subject had a rapid F0 fall; further, a small "continuation rise" (an F0 rise on the last syllable before a syntactic boundary) occurred at the end of the fall [+18 Hz in Fig. 8(b); +3 Hz in Fig. 8(a)]. The disambiguation via F0 occurred entirely on "good flies"; the rest of the F0 pattern was essentially the same in Fig. 8(a)-(b). However, duration and amplitude may also have played a part: the syllabic nuclei of "good" and "flies" were 110 ms longer and 110 ms shorter, respectively, in Fig. 8(b) than in Fig. 8(a), and the longer nuclei had higher amplitudes as well.
In "She gave the bean plants to charity" [Fig. 9(a)], "the bean plants" is the direct object, but in "She gave the boy plants to water" [Fig. 9(b)], "the boy" is an indirect object and "plants" is the direct object. Thus, there is a syntactic boundary after the fourth word in Fig. 9(b) but not in Fig. 9(a). JA used a 22 Hz larger "descent" (defined earlier) on the phrase-final "boy" than on the phrase-initial "bean", while the other two speakers used more accent on "bean" than on "boy". All three placed greater F0 accent on "plants" in Fig. 9(b) (phrase-initial) than in Fig. 9(a) (phrase-final). Thus the boundary was marked by larger accent on the word after the boundary (by all three speakers), and by a larger F0 fall on the word before the boundary (by one speaker).
Figure 9. F0 plot for JA of: (a) "She gave the bean plants to charity"; (b) "She gave the boy plants to water".
Syntactic phrasing of scope affects F0 patterns. In "Steve or Sam and Joe will be coming" (Fig. 10), the F0 pattern on the first word and last three words varied little in the two readings, but F0 differed considerably on the other words: when either Steve or Sam was to accompany Joe [Fig. 10(a)], F0 remained high on "or" [+48 Hz higher than in Fig. 10(b)], then fell -77 Hz on "Sam", with an ensuing +33 Hz continuation rise; F0 was then low on "and", and executed a large rise + fall on "Joe". In contrast, when either Steve or the other two were to go [Fig. 10(b)], F0 was low on "or", rose on "Sam", remained high on "and", and fell on "Joe". Thus, the three words forming a syntactic "unit" had a rise + level + fall F0 pattern; the adjacent conjunction had low F0, while the interior one had high F0. Again, duration may also play a role in identifying the meaning: in Fig. 10(a), the duration between the syllabic nuclei of "Sam and" was 270 ms longer than in Fig. 10(b); while in Fig. 10(b), the durations between "Steve or" and between "Joe will" were 180 and 80 ms longer, respectively, than in Fig. 10(a); i.e. the last word of the large "unit" in the subject, as well as the word preceding the unit, had an ensuing "pause".
Figure 10. F0 plot for JA of "Steve or Sam and Joe will be coming": (a) with the first three words forming a syntactic constituent; (b) with "Sam and Joe" forming a syntactic constituent.
The scope of certain adverbs can also be cued by F0 patterns. In "John ADVERB cooked the fish" (Fig. 11), where ADVERB was either "also", "only", or "even", it could quantify the subject [Fig. 11(a)] or the verb phrase [Fig. 11(b)]. The three speakers noted the syntactic difference in F0 by using a large (-43 Hz) fall on the adverb when it formed a syntactic unit with "John", but a smaller (-12 Hz) fall when it formed a unit with "cooked the fish". Again, the large fall signaled the end of a major syntactic unit ["John" + ADVERB, in Fig. 11(a)], but was not used when the subject contained only one word. As above, the large fall in Fig. 11(a) was accompanied by an increase in duration [the adverb and any ensuing "pause" was 52 ms longer than in Fig. 11(b)], but the increase was only 12%, suggesting that the F0 difference was the major factor in disambiguation here.
Figure 11. F0 plot for JA of "John only cooked the fish": (a) with "only" modifying "John"; (b) with "only" modifying "cooked the fish".
Thus F0 can be used to cue syntactic boundaries in ambiguous sentences. The apparent syntactic features involve a fall on the last stressable syllable prior to the boundary and a rise on the first one after it. An additional cue is often provided by a third feature, the continuation rise, which precisely locates the boundary by immediately preceding it. Not
all boundaries are cued by these F0 features, but they occur at many major boundaries, especially when the utterance would otherwise be ambiguous. Another possible F0 feature is high vs low F0 in non-stressable syllables: syntactic "cohesiveness" can be cued by relatively high and flat F0 in these syllables, which normally have falling or low F0.
(b) Continuation rises (CRs)
Useful in cuing syntax, the continuation rise (CR) has been related to the pre-pausal lengthening that often occurs at syntactic boundaries. Several authors have noted this "continuative" use of F0 (e.g. Delattre, 1965; Isacenko & Schadlich, 1970; Lieberman, 1967; 't Hart & Cohen, 1973), but few have defined or described it in detail. The CR is an F0 rise on the final syllable (stressable or not) of a syntactic unit; it often occupies only the latter part of the syllable, especially in a stressable syllable. CRs are used at syntactic boundaries involving conjunction (often at punctuation marks), when the ensuing syntactic unit is at the same or a higher syntactic "level", e.g. in conjoining words, phrases, and clauses of a similar type. They are frequently not used when a dependent clause follows an independent one (Fig. 12), or when an appositive in a list or a vocative ensues (see below). Most often the CR starts from a low F0 level, after an F0 fall, but in certain "list" contexts, it starts from a high level. For example, in "Joe has won the race and left the city" [Fig. 13(a)], "race" (at the end of the first verb group) had a fall + rise pattern, but when "both" preceded "won" [Fig. 13(b)], "race" lacked an F0 fall before the CR, with F0 staying +24 Hz higher than in Fig. 13(a). Similarly, in "He drinks coffee with meals and on the run" [Fig. 14(a)], "meals" had a rise + fall + rise pattern, but when "both" preceded "with" [Fig. 14(b)], "meals" had a flat + rise pattern, with F0 remaining +42 Hz higher than in Fig. 14(a).
Figure 12. F0 plot for JA of "It's true that fish is tasty".
Figure 13. F0 plot for JA of: (a) "Joe has won the race and left the city"; (b) "Joe has both won the race and left the city".
Figure 14. F0 plot for JA of: (a) "He drinks coffee with meals and on the run"; (b) "He drinks coffee both with meals and on the run".
Apparently, the fall part of the normal F0 pattern at the end of these syntactic units was deleted in Figs. 13(b)-14(b), when the parallel nature of the conjoined phrases was emphasized by the inclusion of the word "both". This distinction of a high vs a low CR is probably not contrastive, since it apparently does not cue a difference in meaning. However, a CR without a prior fall must be distinguished from the simple F0 rise that signals stress: a stress F0 rise starts early in the syllable (and often includes an F0 jump prior to the syllabic nucleus), whereas a CR occurs later in the syllable, after falling or relatively level F0. Furthermore, a CR occurs only on the word-final syllable, and a stress rise only on a stressable syllable (except with contrastive stress, where any syllable may get a stress rise; but there the syllable is cued by a rise and fall, whereas any fall after a CR in the same syllable is very small). That the CR only occurs on the last syllable is illustrated in Fig. 15 ("He bought a Cadillac mirror and a Cadillac tailpipe"): "mirror" had relatively level F0 until the second, non-stressable syllable.
The distinction between a stress rise and a CR is cloudy in lists of single words [e.g. "Gulls, foxes, lemmings, and bears ..." (Fig. 16)]: stressable syllables in words non-final in a syntactic phrase usually get F0 stress rises, which can "overshoot" into an ensuing non-stressable syllable; thus the rises in "-es" and "-mings" may be CRs or mere "overshoots". Indeed an overshoot may be a possible form of CR (Fig. 7). An overshoot invariably occurs at the start of a syntactic unit, with falling or level F0 immediately ensuing from the high level, whereas F0 usually "resets" at a lower level after a CR, as if "starting anew". Thus when F0 following an overshoot starts low and rises, it can be taken as functioning as a CR, rather than just indicating the start of a syntactic unit (in more emphatic speech, such a single-word syntactic unit would have a rise + fall + rise, with the first two movements opening and closing the syntactic unit; but the fall is deleted in less emphatic speech, as here).
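The timing criteria that separate a stress rise from a continuation rise (a stress rise starts early in a stressable syllable; a CR starts late, after falling or level F0, and only on a word-final syllable) can be sketched as a simple classifier over the F0 samples of one syllable. The 10 Hz minimum rise, the 0.5 position cutoff, and the sample contours are hypothetical; the text gives no numeric thresholds, and the overshoot cases discussed above are deliberately left as "ambiguous".

```python
def largest_rise(f0):
    """Return (size_hz, start_index) of the largest low-to-later-high rise."""
    best, best_start, low = 0.0, 0, 0
    for i in range(1, len(f0)):
        if f0[i] < f0[low]:
            low = i  # new running minimum
        if f0[i] - f0[low] > best:
            best, best_start = f0[i] - f0[low], low
    return best, best_start

def classify_rise(f0, word_final, stressable, min_rise=10.0, late=0.5):
    """Label the main F0 rise in one syllable.

    f0: F0 samples (Hz) across the syllable, e.g. one every 10 ms
    min_rise, late: assumed thresholds, not values from the text
    """
    if len(f0) < 2:
        return "none"
    size, start = largest_rise(f0)
    if size < min_rise:
        return "none"
    position = start / (len(f0) - 1)  # 0 = syllable start, 1 = syllable end
    if position >= late and word_final:
        return "continuation rise"
    if position < late and stressable:
        return "stress rise"
    return "ambiguous"

# Invented contours: an early rise + fall, and a fall followed by a late rise.
print(classify_rise([120, 135, 150, 148, 130, 118], word_final=False, stressable=True))
print(classify_rise([130, 122, 115, 112, 120, 131], word_final=True, stressable=False))
```

A fuller version might also check that little or no fall follows the rise within the syllable, since that is the stated difference between a CR and a contrastive stress rise.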
Figure 15. F0 plot for JA of "He bought a Cadillac mirror and a Cadillac tailpipe".
Figure 16. F0 plot for JA of "Gulls, foxes, lemmings, and bears live in the Arctic".
Absence of a fall before the CR appears related to the amount of redundancy in the conjoined phrases (Bolinger, 1972): the fall was lacking when "both" preceded the phrases (Figs 13-14), and when there was a common entity in the phrases (e.g. "Cadillac" in Fig. 15, or "car" in Fig. 17, but not "red" in Fig. 18). However, the presence of a fall seems the more typical situation, even with redundancy, since in "He bought a red car, a blue car, and a blue coat" (Fig. 19), both instances of "car" had falls before their CRs. The tendency for larger CRs in lists of three or more phrases is also illustrated in Fig. 19. The fall + rise case may also be the more prevalent due to less "effort" needed in raising F0 from a lower level. Apparent throughout these examples is that how much F0 falls and how low a value F0 attains are not distinctive: large falls and lower levels likely are stronger syntactic and stress cues, but such variations do not appear contrastive. The differences likely function at a secondary level: rather than cue distinct differences in meaning, such variations in amounts of F0 change likely serve to assist the listener along a continuum, with a larger F0 change signaling greater importance, and perhaps smaller amounts of fall before a CR signaling more redundancy, with "importance" and "redundancy" forming continuous scales of values.
Figure 17. F0 plot for JA of "He bought a red car and a blue car".
Figure 18. F0 plot for JA of "He bought a red car and a red blouse".
Figure 19. F0 plot for JA of "He bought a red car, a blue car, and a blue coat".
(c) Commas
Commas often signal syntactic boundaries in text; and in spoken versions, F0 frequently has a large fall on the last stressable syllable, and a CR on the last syllable, before the comma. When a comma separates a phrase from the end of the main portion of the sentence, the CR is often deleted. In "He speaks English naturally" (Fig. 20), a comma after "English" [Fig. 20(b)] led to a large F0 fall on that word, whereas with no comma [Fig. 20(a)], F0 on "English" remained high. While the comma led to greater disjuncture [250 ms more in Fig. 20(b) than in Fig. 20(a)], "English" had the same duration in Fig. 20(a)-(b), with or without a large fall; thus larger F0 changes do not necessarily correlate with longer durations. The comma in Fig. 20(b) can be viewed as inserting a fall + rise valley into the F0 contour of Fig. 20(a).
In "There are many books I know that are worth reading" [Fig. 21(a)], "many books I know" formed a syntactic unit, with a high F0 on "books" and a large fall on "know". When commas separated "I know" from the rest of the sentence [Fig. 21(b)], F0 fell sharply on "books", and stayed low on "I know". The difference was partially cued in duration: the disjuncture before "I" was 50 ms longer in Fig. 21(b), and "I" itself was 50 ms shorter, than in Fig. 21(a). However, the other durations remained the same: the large fall on "books" had no effect on its duration, and the second comma was not marked by any durational change. Thus the biggest difference here occurred in F0.
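The typical pre-comma pattern described in this section (a large fall on the last stressable syllable, then a CR on the last syllable before the boundary) suggests a simple sequential search. The sketch below is only an illustration under stated assumptions: the `Syl` record, both thresholds, and the per-syllable values are hypothetical, and, as noted above, the CR is often deleted before a sentence-final phrase, so such boundaries would be missed.

```python
from dataclasses import dataclass

@dataclass
class Syl:
    text: str
    stressable: bool
    fall: float        # size (Hz, >= 0) of the F0 fall within the syllable
    final_rise: float  # size (Hz, >= 0) of any F0 rise ending the syllable

def boundary_candidates(syls, fall_min=20.0, rise_min=10.0):
    """Indices of syllables after which a comma-like boundary is plausible.

    A candidate needs a final rise of at least rise_min on the syllable and
    a fall of at least fall_min on the most recent stressable syllable.
    Both thresholds are assumptions; the text treats magnitudes as gradient.
    """
    found = []
    last_stressed_fall = 0.0
    for i, s in enumerate(syls):
        if s.stressable:
            last_stressed_fall = s.fall
        if s.final_rise >= rise_min and last_stressed_fall >= fall_min:
            found.append(i)
            last_stressed_fall = 0.0  # start afresh after the boundary
    return found

# Invented values loosely modeled on "Joe has won the race and left the city".
sentence = [
    Syl("Joe", True, 5.0, 0.0), Syl("has", False, 0.0, 0.0),
    Syl("won", True, 8.0, 0.0), Syl("the", False, 0.0, 0.0),
    Syl("race", True, 35.0, 15.0), Syl("and", False, 0.0, 0.0),
    Syl("left", True, 10.0, 0.0), Syl("the", False, 0.0, 0.0),
    Syl("ci", True, 30.0, 0.0), Syl("ty", False, 5.0, 0.0),
]
print(boundary_candidates(sentence))
```

Because the fall and the CR may sit on different syllables of the same word, the search carries the most recent stressable-syllable fall forward until a final rise is found.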
Figure 20. F0 plot for JA of: (a) "He speaks English naturally"; (b) "He speaks English, naturally".
In "Joe grew cotton to make money" (Fig. 22), "cotton" marks the end of a phrase, whether a comma follows [Fig. 22(b)] or not [Fig. 22(a)]. Thus the prosodic effect of a comma here was less than above. Both versions of "cotton" had large F0 drops during the /t/, but F0 fell to a lower value and the ensuing F0 was lower in Fig. 22(b), while F0 had a CR at the end of "cotton" in Fig. 22(a). Thus the comma had the effect of marking "to make money" as an "afterthought" (i.e. following the low F0 typical of utterance-end).

(d) Appositives and vocatives

F0 can signal the difference between a member of a list and an appositive. In "The three/two people in the house are Joe, my son, and his wife" (Fig. 23), "my son" is either a third person [Fig. 23(a)] or the same person as "Joe" [Fig. 23(b)]. In Fig. 23(a), both "Joe" and "son" had CRs, typical of non-final members of a list, whereas in Fig. 23(b), "Joe" lacked a CR. Furthermore, the CRs were larger in Fig. 23(a) than in Fig. 23(b) (+45 vs +23); larger CRs are often used in simple lists of items (probably to emphasize the list structure); thus in Fig. 23(b), where the list was interrupted by an appositive, the CR was smaller. Possibly related to the larger CRs in Fig. 23(a) was that the syllabic nuclei of "Joe" and "son" averaged 40 ms longer than in Fig. 23(b). In Fig. 23(b), there was a pause after "son" [260 ms more than in Fig. 23(a)] to help cue the appositive.
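The CR comparison above can be illustrated with a minimal sketch, assuming F0 is available as a list of values in Hz sampled at regular intervals over one syllabic nucleus. The function name, the sample contours, and the choice to measure the rise from the lowest F0 value to the final value are all assumptions made for illustration; they are not the measurement procedure used in this study.

```python
def continuation_rise_hz(f0_hz):
    """Size of a candidate continuation rise (CR) in one syllabic nucleus.

    f0_hz: F0 samples in Hz, in time order, for a single nucleus.
    Returns the rise from the lowest F0 value to the final F0 value.
    The result is 0 when F0 ends at its lowest point (no rise at the end).
    """
    if len(f0_hz) == 0:
        raise ValueError("empty F0 contour")
    return f0_hz[-1] - min(f0_hz)


# Hypothetical contours (Hz), invented for illustration only.
list_item = [150, 142, 131, 128, 150, 173]   # falls, then rises strongly
appositive = [150, 141, 130, 127, 138, 150]  # falls, then rises less
no_cr = [150, 140, 131, 125]                 # falls to the end

rises = {
    "list_item": continuation_rise_hz(list_item),
    "appositive": continuation_rise_hz(appositive),
    "no_cr": continuation_rise_hz(no_cr),
}
print(rises)
```

A measure of this kind only ranks rises by size; deciding whether a given rise counts as a CR would still require a threshold and a check that the rise is not a stress rise.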
Figure 21. F0 plot for JA of: (a) "There are many books I know that are worth reading"; (b) "There are many books, I know, that are worth reading".
The distinction between appositive and vocative can also be cued in F0. In "Jane has a brother, Robert" (Fig. 24), Robert is either the brother [Fig. 24(a)] or the person addressed [Fig. 24(b)]. In Fig. 24(a) (appositive), "brother" had a typical pre-comma large F0 fall plus CR, and "Robert" had a typical utterance-final rise + fall. But in Fig. 24(b) (vocative), "brother" lacked a CR, and "Robert" had low F0, with an F0 rise in the last, non-stressable syllable. This utterance-final, small F0 rise may be termed a "vocative" rise, as opposed to a terminal statement fall, or question rise (which rises to a higher level, and starts on a stressable syllable). When a vocative occurs at the end of a yes/no question, the F0 pattern is affected similarly. In "Did you hear a horse whinny?" [Fig. 25(a)], "horse" had slightly-falling F0, and "whinny" had the final question rise. In "Did you hear a horse, Winnie?" [Fig. 25(b)], "horse" exhibited a question rise, while "Winnie" also retained an F0 rise. Thus the
Figure 22. F0 plot for JA of: (a) "Joe grew cotton to make money"; (b) "Joe grew cotton, to make money".
presence of a final vocative here moved the question rise to the stressable word before the vocative. Duration varied little here. The difference between a question rise and a vocative rise is shown in Fig. 26. In "What's for dinner, Stan?" [Fig. 26(a)], F0 on "Stan" followed the vocative pattern of low, relatively flat F0 until the end of the syllable, where F0 rose slightly. Replacing "Stan" with "steak" [Fig. 26(b)] led to a question rise on "steak": a much larger F0 rise, spread throughout the syllable. A secondary F0 cue occurred at the end of "dinner": in Fig. 26(b), the "afterthought" nature of the question "... steak?" caused F0 to fall low, compared to F0 prior to "Stan". Thus a vocative in a statement seems to be cued by a number of F0 features: a sharp fall on the stressable syllable prior to it, no prior CR, and the "vocative rise" itself, which consists of relatively low and flat F0 until the end of the vocative, where F0 rises. At the end of a question, the vocative is cued by a question rise before it, and the "vocative rise" takes place at a relatively high level. In both cases, the vocative starts with an F0 lower than the peak on the previous accented syllable. Appositives had fewer distinguishing F0 cues: when in a list, one was cued by the lack of a preceding CR; when in a possible vocative context, it was cued by the lack of vocative cues (but could have a prior CR).
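The contrast drawn above between a vocative rise (flat F0 with a small rise at the very end) and a question rise (a larger rise spread through the syllable) can be sketched as a simple timing test. This is only a hedged illustration: the function name, the 5 Hz change threshold, the 70% split point, and the sample contours are hypothetical and were not derived from the data reported here. The sketch also uses only the timing of the rise, not the higher level that a question rise typically reaches.

```python
def classify_final_syllable(f0_hz, min_change_hz=5.0, late_fraction=0.7):
    """Label the F0 movement over an utterance-final syllable.

    f0_hz: F0 samples in Hz, in time order, for the final syllable.
    min_change_hz: hypothetical threshold below which F0 counts as level.
    late_fraction: hypothetical point in the syllable after which a rise
        is considered "late".
    """
    n = len(f0_hz)
    if n < 4:
        raise ValueError("need at least 4 F0 samples")
    total_change = f0_hz[-1] - f0_hz[0]
    if total_change <= -min_change_hz:
        return "terminal fall"
    if total_change < min_change_hz:
        return "level"
    # How much of the rise has already happened before the late portion?
    split = int(n * late_fraction)
    early_rise = f0_hz[split] - f0_hz[0]
    if early_rise >= 0.5 * total_change:
        return "question rise"   # rise is spread through the syllable
    return "vocative rise"       # flat first, rise only at the end


# Hypothetical contours (Hz), invented for illustration only.
vocative_like = [110, 110, 109, 110, 110, 110, 111, 112, 118, 126]
question_like = [110, 120, 130, 140, 150, 160, 170, 180, 190, 200]
statement_like = [150, 140, 130, 120]

print(classify_final_syllable(vocative_like))
print(classify_final_syllable(question_like))
print(classify_final_syllable(statement_like))
```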
F0 at the sentence level: sentence type

Although several authors have analyzed F0 contours of sentential utterances as to indications of sentence type, few solid conclusions have been drawn (e.g. Peck, 1969) other
Figure 23. F0 plot for JA of: (a) "The three people in the house are Joe, my son, and his wife"; (b) "The two people in the house are Joe, my son, and his wife".
than that F0 falls in statements (Ss) or rises in yes-no questions (Qs) substantially at the end of a natural utterance (Chatman, 1966; Lieberman, 1967), and that the primary stress involves a fall in S and a rise in Q (Atkinson, 1973).

(a) A question description

Previous perceptual experiments (Studdert-Kennedy & Hadding, 1973; Majewski & Blasdell, 1969; Hadding & Studdert-Kennedy, 1974) have shown the possible uses of F0 at the sentence level to be complex. However, the synthetic stimuli used may have varied F0 differently from the way that humans do. In comparing actual F0 contours in Ss and Qs, there appear to be three critical areas in a Q contour: the first and last accented syllables, and a medial accented syllable which could be termed the "break" syllable (one syllable could serve two or even all three roles). After the initial accented syllable (where
Figure 24. F0 plot for JA of "John has a brother, Robert": (a) with apposition; (b) with vocative.
F0 rises), ensuing F0 patterns are "flattened" somewhat, with F0 falling more slowly in non-stressable syllables than in a statement and generally staying at a higher F0 level. Rather than fall immediately after an accented syllable, F0 often "hesitates" a syllable before falling. F0 does, however, often fall low again, especially just before another accented syllable. The accented words form low-to-high patterns, with F0 starting low on the stressable syllable, and then rising (sometimes only after that syllable, if a non-stressable syllable follows in the word). After the "break" syllable, F0 remains at a high level, falling only slightly in ensuing non-stressable syllables; the ensuing F0 patterns appear compressed in that the movements at the higher level are small (except at an accented syllable, where F0 has a relatively larger fall and then a rise on the immediately prior syllable and the accented syllable, respectively). After the last accented syllable, F0 continues to rise, ending the utterance at a high level; this last rise may be termed the "question rise", and is apparently the main factor that distinguishes Q from S. The speaker apparently has considerable freedom to choose which stressable syllable is the "break" syllable: speaker JA often chose the first one, while ML often chose the last. The final F0 value was very high or only relatively high, depending on how early in the utterance the break syllable occurred: the earlier, the higher.
Figure 25. F0 plot for JA of: (a) "Did you hear a horse whinny?"; (b) "Did you hear a horse, Winnie?".
(b) Examples

In comparing Ss of the type "Joe has been studying his books" with corresponding Qs ("Has Joe been studying his books?") (Fig. 27), after the initial F0 rise and before the terminal rise/fall, JA averaged 99 Hz for a low value in the Ss, but 146 Hz in the Qs, while KS averaged 93 Hz in the Ss, but 103 Hz in the Qs. Thus F0 was generally higher in the Qs, even before the terminal rise, but the degree of difference was speaker-dependent. A "Q rise" occurs at the end of the main clause of the Q and again at the end of the Q, if a dependent clause is utterance-final. In "Do you understand me when I speak German?" [Fig. 28(a)], Q rises occurred on "understand" and "German", with the last stressable syllable in each clause initiating the Q rises. The final one was the larger, going to a higher F0 value. When the dependent clause came first ["When I speak German, do you understand me?", Fig. 28(b)], there was no Q rise on "German"; instead, it had a typical clause-final rise + fall + rise, and F0 started low in the second clause, compared to the high values there in Fig. 28(a). Thus the Q rise F0 feature apparently cues the clause containing the main Q, as well as the end of the Q. In clauses ending with a Q rise, F0 falls were not used to mark accent; instead, each stressable syllable initiated an F0 rise, with the amount of rise likely proportional to the perceived stress. The syllable with the largest rise (usually the "break" syllable) likely cues its word as the "focus" of the Q, i.e. the word about which a yes/no answer is desired. Comparing perceptual S/Q tests with these actual F0 contours, the correlation of S/Q judgments with terminal rises vs falls is easily seen. However, any effect of a high turning
Figure 26. F0 plot for JA of: (a) "What's for dinner, Stan?"; (b) "What's for dinner, steak?".
point is not so easily explained; it may be that a high F0 value at that crucial point serves as the same F0 cue as generally-high F0 in natural Qs. The perceptual experiments either used single-word utterances or only varied F0 on the last word; thus the F0 differences that occur throughout most natural Qs and which distinguish them from Ss were not examined. Combining the first and last accented syllables, as well as the break syllable, all in one final word limits the F0 cues. However, since one-word Qs do exist, S/Q cues can occur in compact form, as well as spread out on several words in longer Qs. At the end of rapid movements, F0 often levels out for the last few pitch periods [e.g. "to" in Fig. 4(a)], which suggests a certain amount of freedom in executing F0 maneuvers: in rapid F0 changes (e.g. those cuing stress), the objective is likely simply to attain a lower or higher F0 value (within a certain range) within the syllabic nucleus, after which F0 can stay at the achieved level until the syllabic nucleus ends. This may partially explain why sharp F0 rises and falls on stressable syllables often form broad "S"-shaped patterns [e.g. "good" in Fig. 8(b)]: the speaker holds back the F0 rise or fall until within the nucleus, with F0 relatively level in the first few periods, then the change commences, then F0 levels off in the last few periods. Accelerator and decelerator muscles may be involved (Sundberg, 1973). While a major syntactic boundary is often marked in F0 by a sharp fall and ensuing sharp rise, lesser boundaries frequently go unmarked in F0. In "Robert hasn't bought the
Figure 27. F0 plot for JA of: (a) "Joe has been studying his books"; (b) "Has Joe been studying his books?".
car yet, but he will later today" (Fig. 7), F0 rose sharply on "Robert" and fell sharply on "car", but the subject-verb group boundary was essentially uncued. F0 remained high throughout "hasn't bought the"; the slight rise on "bought" signaled some stress on the verb, but there were no major dips in F0. This type of F0 shape has been referred to as a "hat pattern", due to its broad rise + level + fall shape (Cohen & 't Hart, 1967). This F0 pattern represents a non-contrastive option by the speaker not to mark a boundary within the clause, and may be related to the lack of a contextually-required boundary cue, or to the desire by the speaker to expend less effort (less F0 movement, especially rises, corresponding to less "effort").

Discussion: F0 features

From the observations above, certain F0 patterns may be advanced as features, consistent with the correlation of F0 movements to the syntactic, semantic, and phonetic content of utterances. Ignoring fine variations, the basic F0 movements of rise and fall can be considered the fundamental linguistic parameters which constitute F0 feature patterns. Rises and falls appear to be controllable by the speaker and perceivable by the listener (Klatt, 1973; Ohala, 1970), and relate closely to perceptions of phonemes, of stress, of syntactic boundaries and certain phrases, and of sentence type. Including "level F0" (relatively flat or slightly-falling F0) with rises and falls in F0, the three basic features should be sufficient to describe the linguistically-relevant F0 patterns. Specifically, all semantic, syntactic, and phonetic information present in natural F0 should be reducible to sequences of these features, with each syllable containing 1-5 such patterns. Non-stressable syllables usually
Figure 28. F0 plot for JA of: (a) "Do you understand me when I speak German?"; (b) "When I speak German, do you understand me?".
only have 1-2 such features, whereas stressable ones have patterns as complex as rise + level + fall + level + rise. The F0 "features" which actually cue linguistic content to a listener are composed of these three basic primitives, with specific timing constraints. Consonant voicing may be partially cued by which primitive begins a syllabic nucleus, together with which one precedes it (i.e. jump, drop, or level F0 between nuclei); such F0 changes between nuclei are simply forms of the basic F0 features timed to occur during sections of no voicing. Stress on a syllable is likely cued by one of two possible F0 sequences: a rise (either just before the nucleus or at its start) after falling or level F0, or a fall (either during or just after the nucleus) preceded by a section of rising or level F0 during that syllable or by level F0 in the previous syllable. Thus there appear to be two F0 "stress features". Syntactic boundaries apparently utilize two F0 features: the CR, and the broad fall + rise pattern. The CR is a non-stress F0 rise, i.e. one occurring at the end of a nucleus, after a section of falling or level F0 in the syllable. It is a direct boundary cue, because the boundary immediately follows the CR, whereas the fall + rise only indicates that a boundary occurs between the fall and the rise. This second syntactic F0 feature is a sequence of two simpler features: the stress fall, which signals an ensuing syntactic boundary, and the stress rise, which cues a prior boundary. The combination of fall + rise is simply a stronger boundary cue than either one by itself. Vocatives and other interjected expressions (e.g. "I know", Fig. 21) use a "vocative" F0 feature consisting of: the lack of a stress F0 feature, level F0 until the latter part of the last syllable, and then an F0 rise.
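The reduction of a contour to rise, level, and fall primitives described above can be sketched as follows. This is a minimal illustration under stated assumptions, not the analysis procedure of this study: the function name, the symmetric 2 Hz band used to decide what counts as "level", and the sample contour are all hypothetical. In particular, the text treats slightly-falling F0 as level, which a real implementation might model with an asymmetric band.

```python
from itertools import groupby


def label_primitives(f0_hz, level_band_hz=2.0):
    """Collapse a sampled F0 contour into a sequence of primitives.

    f0_hz: F0 samples in Hz, in time order (e.g. one value per 10 ms).
    level_band_hz: hypothetical threshold; frame-to-frame changes no
        larger than this in either direction are labeled "level".
    Returns a list such as ["rise", "level", "fall"], with consecutive
    repeats of the same label merged into one primitive.
    """
    frame_labels = []
    for previous, current in zip(f0_hz, f0_hz[1:]):
        delta = current - previous
        if delta > level_band_hz:
            frame_labels.append("rise")
        elif delta < -level_band_hz:
            frame_labels.append("fall")
        else:
            frame_labels.append("level")
    # Merge runs of identical labels into single primitives.
    return [label for label, _run in groupby(frame_labels)]


# Hypothetical contour (Hz) for one stressable syllable, invented here.
contour = [100, 110, 120, 121, 120, 110, 100, 101, 110, 120]
print(label_primitives(contour))
```

With the invented contour above, the labels come out as rise, level, fall, level, rise, which matches the most complex stressable-syllable pattern mentioned in the text. The timing constraints that turn primitives into stress or boundary features are not modeled here.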
If there is one feature that cues a statement, it is a terminal F0 fall: falling F0 from the last F0 stress feature to the utterance-end, inclusive. Similarly, the primary yes-no question feature is a terminal rise (the "Q rise"): rising F0 starting with the last F0 stress feature. Secondary F0 features cuing Q are: lack of stress falls and other falling F0 in clauses ending with a Q rise.

I wish to thank Professor Jonathan Allen for his considerable assistance in my research. This research was sponsored by a National Science Foundation Fellowship and Grant, and all work was done at the Research Laboratory of Electronics at MIT.

References

Atkinson, J. (1973). Aspects of Intonation in Speech: Implications from an Experimental Study of Fundamental Frequency. Ph.D. Thesis. University of Connecticut.
Bolinger, D. (1972). Intonation. Harmondsworth, England: Penguin.
Chatman, S. (1966). Some intonational crosscurrents: English and Danish. Linguistics 21, 24-44.
Cohen, A. & 't Hart, J. (1967). On the anatomy of intonation. Lingua 19, 177-192.
Cowan, M. (1936). Pitch and intensity characteristics of stage speech. Archives of Speech Supplement. University of Iowa.
Delattre, P. (1965). Comparing the Phonetic Features of English, German, Spanish, and French. Heidelberg: Julius Verlag.
Gold, B. & Rabiner, L. (1969). Parallel Processing Techniques for Estimating Pitch Periods of Speech in the Time Domain. Journal of the Acoustical Society of America 46, 442-448.
Hadding, K. & Studdert-Kennedy, M. (1974). Are you asking me, telling me, or talking to yourself? Journal of Phonetics 2, 7-14.
't Hart, J. & Cohen, A. (1973). Intonation by rule: a perceptual quest. Journal of Phonetics 1, 309-327.
Hartvigson, H. (1965). A Specific Case of Terminal Juncture and Syntactic Cohesion. Phonetica 13, 227-251.
Isacenko, A. & Schadlich, H.-J. (1970). A Model of Standard German Intonation, translated by J. Phelby. Mouton: The Hague.
Klatt, D. (1973). Discrimination of fundamental frequency contours in synthetic speech: implications for models of pitch perception. Journal of the Acoustical Society of America 53, 8-16.
Lea, W. (1973). Segmental and Suprasegmental Influences on Fundamental Frequency Contours. In Consonant Types and Tone (Hyman, L., Ed.). University of Southern California, 15-70.
Lea, W. & Kloker, D. (1975). Prosodic Aids to Speech Recognition: Timing Cues to Linguistic Structure and Improved Computer Programs for Prosodic Analysis. Sperry-Univac Technical Report PX 11239.
Lehiste, I. & Peterson, G. (1961). Some Basic Considerations in the Analysis of Intonation. Journal of the Acoustical Society of America 33, 419-425.
Lieberman, P. (1967). Intonation, Perception, and Language. Cambridge, Massachusetts: M.I.T. Press.
Lofqvist, A. (1975). Intrinsic and Extrinsic F0 Variations in Swedish Tonal Accents. Phonetica 31, 228-247.
Maeda, S. (1974). A characterization of fundamental frequency contours of speech. MIT Research Lab of Electronics QPR 114, 193-211.
Majewski, W. & Blasdell, R. (1969). Influence of Fundamental Frequency Cues on the Perception of Some Synthetic Intonation Contours. Journal of the Acoustical Society of America 45, 450-457.
Ohala, J. (1970). Aspects of the Control and Production of Speech. UCLA Working Papers in Phonetics 15.
Olive, J. (1975). Fundamental frequency rules for the synthesis of simple declarative sentences. Journal of the Acoustical Society of America 57, 476-482.
O'Shaughnessy, D. (1974). Consonant Durations in Clusters. IEEE Transactions of ASSP 22, 282-295.
O'Shaughnessy, D. (1976). Modelling Fundamental Frequency and its Relationship to Syntax, Semantics, and Phonetics. Ph.D. Thesis, MIT.
Peck, C. (1969). An Acoustic Investigation of the Intonation of American English, U. of Michigan, Natural Language Studies 1.
Scholes, R. (1971). On the spoken disambiguation of superficially ambiguous sentences. Language and Speech 14, 1-11.
Studdert-Kennedy, M. & Hadding, K. (1973). Auditory and linguistic processes in the perception of intonation contours. Language and Speech 16, 293-313.
Sundberg, J. (1973). Data on maximum speed of pitch changes. Speech Transmission Lab QPSR 4, 39-47.
Takefuta, Y., Jancosek, E. & Brunt, M. (1972). A statistical analysis of melody curves in the intonation of American English, 6th International Congress on Phonetic Sciences 1035-1039.