Journal of Phonetics (1991) 19, 281-292
X-ray data on the temporal coordination of speech gestures
Sidney A. J. Wood
Department of Linguistics, The University, S-90187 Umeå, Sweden and Department of Linguistics, Helgonabacken 12, S-22362 Lund, Sweden
Received 26th September 1990, and in revised form 8th January 1991
This paper presents some preliminary examples of core gesture analysis from a crosslinguistic study of coarticulation and temporal organization. The study is based on frame-by-frame analysis of speech gestures in X-ray motion films of speech, and on evaluation of the functions of the individual component gestures. Data are reported for a Southern Swedish speaker. Lip, tongue, jaw and larynx maneuvers are tracked, and evidence of anticipation, perseveration, assimilation, interarticulator coordination and compensation is reported that supports the Kozhevnikov & Chistovich model of coarticulation and challenges models based on competition between gestures.
1. Introduction
This paper reports preliminary results from an ongoing study of speech gestures with respect to various issues raised in the literature on coarticulation. Gestures are analyzed from an X-ray motion film of a Southern Swedish speaker, one of a set of six films of speakers of five languages. The paper describes how the initiation of new activity for a phoneme was timed and coordinated with other ongoing activity, and then assesses how adequately existing models can account for the observed patterns of behavior. Coarticulation has been an object of study since Menzerath & Lacerda's (1933) investigation of lip movement and nasal airflow, but the phenomenon was noticed much earlier. Sweet (1877, pp. 56, 60-63) saw speech sounds as momentary points "in a stream of incessant change", the stream consisting of the inevitable transitional onglides and offglides between adjacent points. This model was supported by Menzerath & Lacerda's results, and was compatible with the traditional distinction between coarticulation and assimilation (e.g. Jespersen, 1897, p. 500): assimilated gestures are started earlier or held longer than the transitions, implying mental revision of the motor program. It remained the accepted view until Joos (1948, pp. 104-108) reported different spectra for a vowel phoneme in different consonant environments, which he interpreted as evidence of coarticulation extending beyond the transition. The hitherto clear-cut distinction between coarticulation and assimilation suddenly became cloudy. Subsequent research on coarticulation has addressed such issues as how far ahead a phoneme may be initiated, how long it may be kept going, what and where its boundaries are, and in what sense simultaneous phonemes are serially ordered.
0095-4470/91/030281 + 12 $03.00/0
© 1991
Academic Press Limited
The study of coarticulation is clouded by incompatible legacies of different scientific traditions: is coarticulation the response to conflict between competing instructions and articulator mechanics, or is it the planned and implemented intention of the speaker? The extent of anticipatory coarticulation is a crucial issue for models of speech motor control, since different conclusions entail different requirements for prior knowledge of future events: when does the production system refer to the component gestures of upcoming phonemes, and how is such knowledge supplied and utilized? For example, the Kozhevnikov & Chistovich syllable model (1965, Chapt. 4) offered an explanation for the simultaneous expression of serially ordered phonemes by allowing the gestures of the phonemes of a syllable to be initiated simultaneously, provided there is no articulatory conflict between them. However, subsequent reports (e.g. Öhman, 1966) contradicted this, since the domain of coarticulation appeared to extend into neighboring syllables. The different definitions and models of coarticulation shape how we see the task facing speech motor control, and coarticulation remains a central problem in phonetics (see overviews and debate in Daniloff & Hammarberg, 1973; Fowler, 1980, 1983; Hammarberg, 1976, 1982; Kent, 1983; Kent & Minifie, 1977; Lindblom, 1986; Whalen, 1991).
2. Materials and procedures
Each film consists of one 35 mm reel of some 3000 picture frames, running for about 40 s per subject at a camera speed of 75 frames/s (13.3 ms/frame). Sound and picture are synchronized by means of a sync pulse (taken from the camera shutter) that appears on every tenth picture frame and was recorded on a separate tape channel alongside the microphone signal. The films were made at the Lund University Hospital with the assistance of the Röntgen Technology Unit. The general conditions under which the six films were made and processed are described in Lindblad (1980, pp. 108-109), Wood (1979) and Wood (1982, pp. 27-31).
The gestures discussed here are taken from one utterance in the film of a Southern Swedish speaker. The utterance is a nonsense sentence, Ebbe Sjise tjasar i Sodasjo (a person, Ebbe Sjise, does something, tjasar, at a place, i Sodasjo), phonemically /'ebe 'ɧi:se ''ɕa:sar i 'su:da.ɧu:/. The /ɧ/ is a voiceless fricative, rounded or bilabial, and prepalatal or palatovelar, usually transcribed [ɧ]; this particular subject's variant is palatovelar, and bilabiodental rather than just bilabial (Lindblad, 1980). The phoneme /r/ is uvular [ʁ], and /a:/ is dark and slightly rounded [ɒ]. The remainder of the film contains systematic permutations of the stressed vowels in the same sentence frame.
Midsagittal profiles are traced frame by frame throughout selected film sequences, and individual gestures are then tracked through each sequence. Strictly speaking, all parts of the vocal tract float with respect to each other, and the nearest one can get to a fixed reference structure is the hard palate, which is used here as the reference for mandible, upper lip and larynx movement; lower lip and lingual maneuvers are then defined in relation to the mandible (see Fig. 1). This captures the primary (active) movement attributed to the musculature of the articulator itself, and excludes secondary (passive) movement due to some other articulator moving. The gestures tracked are palatal, palatovelar, pharyngovelar, and low pharyngeal tongue body maneuvers (see Wood, 1979), apical and laminal elevation, mandibular depression, lip protrusion and/or approximation, and larynx depression.
Figure 1. Examples of seven types of gesture analyzed from midsagittal profiles traced from the test utterance. Starting and ending frames for (a) lower lip approximation, (b) apex elevation (example starting from a palatovelar configuration), (c) upper lip protrusion, (d) mandible depression, and (e) larynx depression. Frame-by-frame tracings for (f) lower lip protrusion and (g) apical elevation. (The three separate frame sequences in (f) show frame-by-frame movement too fine to be reproduced at this scale.)
Protrusion and approximation of the lips are lumped together in this report to simplify presentation. There was approximation alone in the one example of [b] (Fig. 1(a), (c)); the remaining examples were of rounding (all instances of [ɧ, ɒ, u]) with protrusion and approximation together (Fig. 1(f)). The raw measurements (and consequently derived measures such as articulator velocities) are not reported in this paper; instead they are reduced to the presence or absence of movement by a particular articulator since the previous film frame, that is, a discernible change of articulator posture when successive midsagittal profiles are superimposed and compared (Fig. 1(g)). A positive movement is in the direction of the named gesture (e.g. apex elevation in Fig. 1(b)), whereas a negative movement is withdrawal of the named gesture (e.g. negative apex elevation is depression). No movement means that there was no discernible change of articulator posture between successive profiles (Fig. 1(f)). Whenever an articulator was inactive over a sequence of several frames, the current profile was compared with all previous profiles in that sequence in order to check for very gradual creeping movement that would otherwise go undetected. The frame-by-frame resolution of movement is about 1 mm. The articulation rate of this sentence was 4.75 syllables/s, i.e. an ordinary, average rate, neither fast nor slow.
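The reduction of traced profiles to presence or absence of movement can be sketched as follows. This is a minimal illustration under stated assumptions, not the author's procedure: each gesture is assumed to be summarized per frame by one signed scalar displacement in mm (positive in the direction of the named gesture), the stated resolution of about 1 mm is used as the detection threshold, and the name `code_movement` is hypothetical. The creeping-movement check compares the current frame with every earlier frame of the current stationary run.

```python
from typing import List

RESOLUTION_MM = 1.0  # approximate frame-by-frame resolution reported in the text


def code_movement(positions: List[float], resolution: float = RESOLUTION_MM) -> List[str]:
    """Reduce one gesture's per-frame displacements to '+', '-' or '0'.

    positions[i] is a hypothetical scalar measure (mm) of the gesture in
    frame i, signed so that larger values mean more of the named gesture.
    One code is returned per frame after the first, describing movement
    since the previous frame.
    """
    codes: List[str] = []
    run_start = 0  # frame whose posture began the current stationary run
    for i in range(1, len(positions)):
        step = positions[i] - positions[i - 1]
        if abs(step) >= resolution:
            # Discernible change since the previous frame.
            codes.append('+' if step > 0 else '-')
            run_start = i
            continue
        # No discernible change from the previous frame: compare with every
        # earlier profile in the current stationary run to catch slow creep.
        drift = max((positions[i] - positions[j] for j in range(run_start, i)), key=abs)
        if abs(drift) >= resolution:
            codes.append('+' if drift > 0 else '-')
            run_start = i
        else:
            codes.append('0')
    return codes
```

Applied to the invented series `[0.0, 0.0, 1.5, 1.6, 2.0, 2.4, 2.8]`, the sketch returns `['0', '+', '0', '0', '0', '+']`. The final `'+'` comes only from the cumulative comparison, since no single frame-to-frame step reaches 1 mm; without that check the slow drift would be coded as a sustained posture.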
3. Results and discussion
3.1. The data
Individual maneuvers are shown in Fig. 2(a)-(c). The gestures are listed down the left, the film frames are numbered along the top (commencing at the beginning of the utterance), and the sequence is transcribed phonetically below each panel and segmented vertically into traditional phonetic segments (vocoids V, occlusion O, hiss H, liquid L). The vocoid, hiss and liquid segments were identified from their spectral structure and waveform in Fig. 2(d). The occlusion segments were identified directly from the film (and turned out to be slightly briefer than would have been expected from the waveform). Segment durations are also given below the transcription.
Figure 2. Frame-by-frame gesture analysis of the test sentence. Symbols +, - and 0 in the rows in the main body of the figure indicate positive movement since the previous film frame, negative movement, or no movement (i.e. sustained posture) for each type of gesture listed down the left. A phonetic symbol below a +, - or 0 indicates example assignments of gestures to particular phonemes. Panels (b) and (c) continue the analysis from panels (a) and (b), respectively, with the first set of frames repeating the last segment from the preceding panel. Panel (d) shows the waveform and spectrum for the utterance.
Figure 2. (Continued)
3.2. A stream of incessant change
It is often said that the articulators are in constant motion. Figure 2(a)-(c) shows when each articulator was activated (+, -) and when it was left in a particular posture (0). Examples of frame-by-frame movement and inactivity are given in Fig. 1(g) and (f), respectively. The typical pattern for any articulator was brief periods of movement separated by longer periods of inactivity. There were occasional moments when the whole vocal tract was stationary (e.g. frame sequences 1-4, 21-22, 29-30 and 41-42, representing 52 ms, 26 ms, 26 ms and 26 ms, respectively). The overall impression is one of articulators being marshalled momentarily as needed and left alone when not needed.
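Whole-tract steady states like those just listed are frames in which every gesture row of Fig. 2 shows 0. The sketch below, a hypothetical helper rather than anything from the paper, finds such runs in rows of +/-/0 codes and converts them to milliseconds at the camera speed of 75 frames/s. The example rows are invented for illustration; they are not transcribed from Fig. 2.

```python
from typing import Dict, List, Optional, Tuple

FRAME_MS = 1000 / 75  # 75 frames/s camera speed: about 13.3 ms per frame


def steady_states(rows: Dict[str, str], first_frame: int = 1) -> List[Tuple[int, int, float]]:
    """Find runs of frames in which every gesture row shows '0' (no movement).

    rows maps a gesture name to a string of '+', '-', '0' codes, one per frame;
    all strings are assumed to have the same length. Returns (first frame,
    last frame, duration in ms) for each run of whole-tract stationarity.
    """
    n_frames = len(next(iter(rows.values())))
    runs: List[Tuple[int, int]] = []
    start: Optional[int] = None
    for k in range(n_frames):
        still = all(codes[k] == '0' for codes in rows.values())
        if still and start is None:
            start = k
        elif not still and start is not None:
            runs.append((start, k - 1))
            start = None
    if start is not None:
        runs.append((start, n_frames - 1))
    return [(a + first_frame, b + first_frame, (b - a + 1) * FRAME_MS) for a, b in runs]


# Invented example rows (not data from the film).
rows = {
    "tongue palatal": "0000++00",
    "mandible":       "0000-000",
    "upper lip":      "00000+00",
}
print(steady_states(rows))  # runs at frames 1-4 (about 53.3 ms) and 7-8 (about 26.7 ms)
```

At the nominal 13.3 ms per frame, four- and two-frame runs last about 53.3 ms and 26.7 ms; the figures of 52 ms and 26 ms quoted in the text correspond to rounding the frame interval to 13 ms.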
3.3. Overlapping phonemes
The essence of coarticulation is that attributes of more than one phoneme are produced simultaneously, but these attributes obviously have to be identified and quantified before any discussion of parallel phoneme production is possible. The individual instances of gestures reported in Fig. 2(a)-(c) were therefore assigned to their respective phonemes. Some of these assignments are indicated explicitly in the figure. Three of them are presented below to explain the procedure, while the others serve as examples for the subsequent discussion. The first example is /b/, with first movements occurring in frames 5-8 for mandibular elevation, upper and lower lip approximation and larynx depression. The upper lip was withdrawn in frames 12-14, the lower lip in frames 11-16, and the depressed larynx in frames 13-17. The phoneme /b/ thus "lasted" from frame 5 until frame 17, during which period there was always some vocal tract perturbation associated with it, although the occlusion itself was limited to frames 9 and 10. The functions of component gestures are not always obvious. The hallmark of /b/ is the bilabial occlusion provided by the labial approximation just noted. The slight mandibular elevation has an auxiliary function, to assist in that closure. Larynx depression in labial vowels and consonants has frequently been reported, and its function is presumably to stabilize resonance conditions, in the sense that it diminishes undesirable changes in spectral sensitivities arising from the lip activity (the conclusion drawn from model experiments on larynx depression in rounded palatal vowels in Wood, 1986). The second example of gesture assignment is the vowel /e/, which immediately follows the /b/ in this utterance, although it was initiated in frame 5, simultaneously with the /b/.
This is how Kozhevnikov & Chistovich would have expected it to be produced, since its palatal tongue gesture is neither involved in nor antagonistic to the bilabial occlusion (although the tongue movement would have had some strange spectral consequences if the larynx had not been simultaneously depressed for the /b/). As a third example, the phoneme /s/ was initiated in frame 31 with tongue blade elevation that continued to frame 34, where the dental hiss source was established. At the same time, a pharyngeal movement of the tongue continued right until frame 40, almost at the end of the hiss segment (an obscure gesture that is discussed further in Section 3.4), and the lower lip was protruded in frames 33-34 (this protrusion is discussed further in Section 3.7). The tongue blade elevation and pharyngeal tongue movement were withdrawn in frames 43-50, so that this instance of /s/ affected the vocal tract in frames 31-50, of which frames 35-42 were the hissing [s] segment and frames 41-42 a steady state of the whole vocal tract.
All gestures throughout the entire utterance were identified in this way and assigned to their respective phonemes. These assignments are not illustrated here beyond the few examples in Fig. 2, but Fig. 3 shows the time course of the aggregate of gestures for each phoneme, and thus illustrates the period of activity associated with each phoneme and, consequently, the extent to which phonemes overlapped in time. Gestures have been grouped into onglides («, comprising all gestures directed towards each phoneme) and offglides (», comprising gesture withdrawal from each phoneme). The first « in an onglide marks the first appearance of the earliest gesture for that phoneme; the final » marks the very end of its last decaying gesture. Absence of gesture movement is also marked (=). Figure 3 can be compared with the similar assignment of spectral features to phonemes in the well-known "Santa Claus" example of Fant (1961).
Figure 3 demonstrates how phonemes were produced in parallel and reveals the number of phonemes active at any one moment in the sequence illustrated. There were at most three phonemes simultaneously (frame sequences 5-8, 15-17, and 47-49). Usually there were two. Occasionally there were components of only one phoneme (/ɛ/ in frames 1-4, /ɧ/ in frames 21-22, /i:/ in frames 29-30, /s/ in frames 41-42). These all happened to be steady states of the entire vocal tract and are discussed in the next section. As a demonstration of the parallel production of phonemes, Figs 2 and 3 simply confirm what has been reported many times before. What is more interesting is to see how the initiation of new activity is timed and coordinated with other ongoing activity, and how adequately existing models of coarticulation can account for this behavior.
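The aggregation behind Fig. 3 can be sketched as interval arithmetic: a phoneme's span runs from the first frame of its earliest gesture to the last frame of its latest decaying gesture, and the number of phonemes active in a frame is the number of spans covering it. This is a minimal sketch with hypothetical function names; the gesture frames for /b/ and /s/ are those given above, while the pharyngeal onset for /s/ is omitted because its start frame is not stated.

```python
from collections import defaultdict
from typing import Dict, List, Tuple

# One gesture token: (phoneme, description, first frame, last frame), inclusive.
GestureToken = Tuple[str, str, int, int]


def phoneme_spans(tokens: List[GestureToken]) -> Dict[str, Tuple[int, int]]:
    """Span of each phoneme, from the first frame of its earliest gesture to
    the last frame of its latest (decaying) gesture."""
    spans: Dict[str, Tuple[int, int]] = {}
    for phoneme, _description, first, last in tokens:
        lo, hi = spans.get(phoneme, (first, last))
        spans[phoneme] = (min(lo, first), max(hi, last))
    return spans


def phonemes_active(spans: Dict[str, Tuple[int, int]]) -> Dict[int, int]:
    """Number of phonemes whose span covers each frame (holds included)."""
    counts: Dict[int, int] = defaultdict(int)
    for first, last in spans.values():
        for frame in range(first, last + 1):
            counts[frame] += 1
    return dict(counts)


# Gesture frames for /b/ and /s/ as described in Section 3.3.
tokens = [
    ("b", "mandible elevation", 5, 8),
    ("b", "upper lip approximation", 5, 8),
    ("b", "lower lip approximation", 5, 8),
    ("b", "larynx depression", 5, 8),
    ("b", "upper lip withdrawal", 12, 14),
    ("b", "lower lip withdrawal", 11, 16),
    ("b", "larynx withdrawal", 13, 17),
    ("s", "tongue blade elevation", 31, 34),
    ("s", "lower lip protrusion", 33, 34),
    ("s", "blade and pharyngeal withdrawal", 43, 50),
]
spans = phoneme_spans(tokens)  # {'b': (5, 17), 's': (31, 50)}
```

The recovered spans match the frames 5-17 and 31-50 stated in the text. With every phoneme's gestures entered, `phonemes_active` would give the per-frame counts summarized above (at most three, usually two).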
Figure 3. The time course of the first eight phonemes in the test sentence shown in Fig. 2. The assignment of gestures to phoneme segments is listed down the left. Frames for the assignments are labeled as containing onglide gestures («), stationary hold configurations (=) or offglide gestures (»).
3.4. Steady states
An old subject of contention is whether there are steady states in continuous speech. Moll (1960) reported that "it appears that the articulatory structures seldom, if ever, assume static positions, even during the productions of sustained vowel sounds". On the other hand, Fig. 2 shows several steady states in speech produced at the very ordinary articulation rate of 4.75 syllables/s. The apparent contradiction must be due to the difference in temporal resolution: Moll's facility was working at the then customary 24 frames/s (roughly 40 ms/frame), a camera speed that would fail to disclose many of the steady states observed here. The stressed vowels had brief steady configurations (e.g. frame sequences 1-4 and 29-30 in Figs 2(a) and 3) before the withdrawal of current gestures and the initiation of new gestures for the next phoneme. Two instances of fricative consonants had static portions (frame sequences 21-22 and 41-42), both of which occur at the end of their hiss segments, not in the middle. There was hiss for /ɧ/ throughout frames 19-22, but the palatovelar tongue gesture was still moving in frames 19-20. In the case of /s/, there was hiss throughout frames 35-42, but an internal depression of the tongue body into the pharynx went on until frame 40, almost at the end of the hiss segment. These fricative configurations are illustrated in Fig. 4. The pharyngeal gesture appears to be an inherent component of this speaker's apical [s]. (A similar maneuver was previously observed by Wood (1975) in the apical occlusives of another speaker.) It may explain some of the movement seen on dynamic EPG records of alveolar consonants reported by other contributors to this special issue. Its purpose is not clear, but one consequence would be to uncouple some high-frequency back cavity resonances.
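The temporal-resolution argument made at the start of this section can be checked with simple arithmetic. The sketch assumes that a hold is recognizable as "no movement since the previous frame" only if at least two successive frames fall inside it, and that the shutter timing is unrelated to when the hold begins; the function names are hypothetical. The 26.7 ms duration is that of the two-frame holds reported here (2 × 13.3 ms).

```python
import math
from typing import Tuple


def frame_interval_ms(fps: float) -> float:
    """Time between successive film frames, in ms."""
    return 1000.0 / fps


def frames_inside(duration_ms: float, fps: float) -> Tuple[int, int]:
    """(fewest, most) film frames that can fall inside a static posture lasting
    duration_ms, depending on when the shutter fires relative to the posture."""
    ratio = duration_ms / frame_interval_ms(fps)
    return math.floor(ratio), math.floor(ratio) + 1


# A hold is only recognizable as "no movement since the previous frame" if at
# least two successive frames fall inside it.
for fps in (75, 24):
    fewest, most = frames_inside(26.7, fps)
    print(f"{fps} frames/s: {frame_interval_ms(fps):.1f} ms/frame, "
          f"{fewest}-{most} frames inside a 26.7 ms hold")
```

At 75 frames/s a 26.7 ms hold always contains at least two frames, so it is always detectable. At 24 frames/s (about 41.7 ms per frame) it can contain at most one frame, so no pair of successive profiles could ever show it as stationary.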
This pharyngeal maneuver in apical [s] is clearly not a coarticulatory or assimilatory anticipation of a pharyngeal vowel, since it occurs in every instance with nonpharyngeal vowels ([i:se] in Fig. 2(a), frames 23-49, and [i:səu] in Fig. 2(c), frames 84-108), and not when adjacent to a pharyngeal vowel ([ɒ:sa] in Fig. 2(b), frames 52-81). It seems that this speaker executes the pharyngeal movement whenever an adjacent pharyngeal environment is not already providing it, indicating that it is an inherent component of his /s/; its execution occupies the greater part of the hissing segment, so that the articulatory configuration cannot be static until towards the end.
Figure 4. Midsagittal profiles illustrating tongue body movement during the hiss segments of two instances of voiceless fricatives. Left: conclusion of the palatovelar gesture for [ɧ] (/ɧ/) during frames 19 (solid line) and 20 (dashed line). Right: the pharyngeal gesture for [s] during frames 35 (solid line) to 40 (dashed line).
Turning now to the occlusives, we see a brief steady state for /b/ in frame 10 (although the held palatal configuration belongs to the oncoming /e/). There was no steady state for /d/ (frames 109-110). There was no sustained configuration in the unstressed vowels, where initiation of the oncoming postvocalic consonants overlapped with the decaying prevocalic consonants (e.g. the onglide of postvocalic /ɧ/ overlapped with the offglide of prevocalic /b/ in frames 15-17, during the vocoid segment of weak /e/). Gesture timing was thus affected by stress. In stressed vowels, a brief steady state is obtained by not initiating postvocalic consonants until the prevocalic consonant gestures are concluded. In weak vowels, there was no such delay and consequently no steady state.
3.5. Syllable timing
The build-up and decay phases seen in Fig. 3 are reminiscent of the onglides and offglides of the Sweet model. But there are also some aspects of phoneme timing and coordination that Sweet did not foresee. For example, the oncoming vowel /e/ was initiated in frame 5 simultaneously with the current consonant /b/, and not sequentially later as Sweet would have expected. A better explanation for the observed timing pattern is provided by the Kozhevnikov & Chistovich model. In gestural terms, this speaker started each syllable by initiating a consonant in the latter half of the preceding vocoid segment (at frames 5, 15, 31, 47, etc.), and sometimes even earlier if the preceding vowel was unstressed (e.g. frame 84). When there was no antagonism between gestures, the vowel of the new syllable started early too (e.g. frame 5 for /e/, where the tongue was not engaged in /b/). When the tongue was constrained differently for the consonant and the vowel, the subject delayed the vowel.
An example can be seen in the next syllable, /ɧi:/, where initiation of the palatal tongue gesture for the vowel /i:/ was deferred until frame 24, because the tongue was still busy with the palatovelar gesture for the consonant /ɧ/ initiated in frame 15. Details like this speak against Öhman's (1966) proposal that consonant gestures are superimposed on an underlying continuous diphthongal vowel sequence. They also speak against the notion of conflicting gestures. In any case, it is not clear why a sophisticated system like speech motor control should make simultaneous calls for competing gestures. Surely it knows where the articulators are and what tasks they are currently engaged upon. Or does it immediately ignore, or forget, what it has just set going? The sequence reported in Fig. 2 does not exhibit any example of competing gestures, and the Kozhevnikov & Chistovich model expressly precludes simultaneous antagonistic maneuvers.
Öhman (1966) also reported spectral effects from one vowel to the next, which are usually taken to contradict the Kozhevnikov & Chistovich model. However, the syllable programming seen in Fig. 2 may explain Öhman's findings. In cases like frame sequence 5-8, where both the consonant and the vowel of the oncoming syllable are initiated during a current ongoing vocoid, the latter will contain spectral information for both the current vowel phoneme and the oncoming vowel phoneme. The opposite situation occurs when components of a decaying vowel are still active simultaneously with the vocoid of the next syllable, which will then contain spectral information for both the decaying vowel phoneme and the current vowel phoneme. There is an example in frame sequence 111-116 (Fig. 2(c)), where the offglide of the vowel /u:/ (the decaying liprounding and larynx depression) overlapped the onglide of the next vowel /a/ (mandible depression and pharyngeal tongue movement). This part of the [a]-like vocoid will consequently contain spectral information from both the previous vowel and the current vowel. Thus anticipatory or perseveratory spectral effects from one vocoid to another can be quite consistent with the Kozhevnikov & Chistovich model, and do not necessarily contradict it.
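The syllable timing described in this section can be restated as a simple scheduling rule. The sketch below is a hypothetical illustration of the Kozhevnikov & Chistovich principle as summarized here, not a model taken from the paper: all gestures of a syllable start at the syllable's onset frame unless they need an articulator that is still engaged, in which case they wait until it is free. The onset frames 5, 15 and 24 come from the text; the articulator labels and gesture durations are invented, and gestures listed earlier in a plan are assumed to take priority.

```python
from typing import Dict, List, Tuple

# One planned gesture: (label, articulator it needs, duration in frames).
PlannedGesture = Tuple[str, str, int]


def schedule_syllable(start: int, plan: List[PlannedGesture],
                      busy_until: Dict[str, int]) -> Dict[str, Tuple[int, int]]:
    """Initiate every gesture of a syllable at `start`, except that a gesture
    whose articulator is still engaged waits until that articulator is free.

    busy_until maps an articulator to the last frame of its current gesture;
    it is updated in place so that later gestures and syllables respect it.
    Returns {label: (onset frame, offset frame)}.
    """
    timing: Dict[str, Tuple[int, int]] = {}
    for label, articulator, n_frames in plan:
        onset = max(start, busy_until.get(articulator, start - 1) + 1)
        offset = onset + n_frames - 1
        timing[label] = (onset, offset)
        busy_until[articulator] = offset
    return timing


busy: Dict[str, int] = {}
# Lips for /b/ and tongue for /e/ do not conflict, so both start at frame 5.
print(schedule_syllable(5, [("b lips", "lips", 4), ("e tongue", "tongue", 6)], busy))
# /sj/ (the /ɧ/) and /i:/ both need the tongue body, so the vowel is deferred.
print(schedule_syllable(15, [("sj tongue", "tongue", 9), ("i: tongue", "tongue", 5)], busy))
```

With these invented durations the vowel /i:/ starts at frame 24, as observed. By construction the scheduler never issues two simultaneous commands to one articulator, which is the property the text attributes to the Kozhevnikov & Chistovich model and denies to competition-based models.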
3.6. Anticipation and perseveration
There are no obvious cases of anticipation (i.e. of gestures initiated earlier than expected) in the data presented in Fig. 2, apart from the initiation of /e/ in frame 5 that was discussed in the previous section. As already noted, gestures were initiated in accordance with the Kozhevnikov & Chistovich model. In particular, gestures associated with a new syllable are regularly initiated during the preceding vocoid, which is thus the normal starting point and is not anticipatory. There are several instances of perseveration in Fig. 2 (i.e. of gestures being retained longer than needed for their original purpose). All seem to be examples of the previously noted tendency for individual articulators to be left idle until required again. For example, the liprounding of /u:/ was held until the end of the vocoid segment in frames 101-108, was withdrawn slightly during the [d] occlusion in frames 109-110, and the lips were again left idle for a moment before withdrawal of rounding was completed in frames 113-116 for spread-lip /a/.
3.7. Compensation
There were two examples of articulatory compensation in this utterance. One is an instance of the frequently reported "bite-block effect" and is presumably universal; the other is possibly related to the subject's overbite and is consequently speaker dependent. In Fig. 2(a), frames 5-8, the mandible was raised during the latter part of the vocoid [e] for the oncoming [b] occlusion. A side-effect of this would have been internal modification of the vocal tract towards an [i]-like configuration, by lifting the palatal tongue posture closer to the hard palate. However, the ongoing palatal gesture was diminished slightly while the mandible was being raised, which cancelled out most of the mandibular contribution to tongue raising. The actual profiles can be seen in Fig. 5. This compensatory lingual configuration was then retained for the weak /e/ of the next syllable (for which the mandible was not lowered again after the [b], an omission by the subject that contributed to the reduction of this unstressed vowel). The other example of compensation concerns the subject's /s/ and his overbite (see Fig. 4). The subject regularly protruded the lower lip slightly in [s] (e.g. in frame sequences 33-34 and 86-88), possibly to help control the turbulence source. For the [s] in frames 68-74, the lower lip was already protruded during the preceding rounded [ɒ], and there was no further adjustment before the [s] (although the upper lip was withdrawn towards the end of the [ɒ] in frame 65).
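The first compensation follows from the reference frames set up in Section 2: the mandible is measured against the hard palate, but the tongue is measured against the mandible, so the tongue's height relative to the palate is the jaw's movement plus the tongue's own movement. The sketch below illustrates that bookkeeping with invented values in mm (not measurements from the film); the function names are hypothetical.

```python
def palatal_gap_mm(rest_gap: float, jaw_raise: float, tongue_raise_on_jaw: float) -> float:
    """Tongue-dorsum-to-palate gap, with the hard palate as the fixed reference.

    The tongue is measured relative to the mandible, so its displacement relative
    to the palate is the jaw's displacement plus the tongue's own displacement.
    """
    return rest_gap - (jaw_raise + tongue_raise_on_jaw)


def fraction_compensated(jaw_raise: float, tongue_raise_on_jaw: float) -> float:
    """Share of the jaw-induced tongue raising cancelled by lowering the tongue
    on the jaw (1.0 = full compensation)."""
    return -tongue_raise_on_jaw / jaw_raise


# Hypothetical values in mm, not measurements from the film.
print(palatal_gap_mm(8.0, 3.0, 0.0))    # no compensation: 5.0
print(palatal_gap_mm(8.0, 3.0, -2.5))   # tongue lowered on the jaw: 7.5
print(round(fraction_compensated(3.0, -2.5), 2))  # 0.83
```

Without compensation the 3 mm jaw raise would narrow the palatal gap from 8 mm to 5 mm, pushing the configuration towards [i]; lowering the tongue by 2.5 mm on the jaw cancels most of that, which is the qualitative pattern described for frames 5-8.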
Figure 5. Midsagittal profiles illustrating lingual compensation in frame sequence 5-8. Left: the mandible is raised for the oncoming [b]. Right: the palatal tongue posture is simultaneously depressed to keep the [ɛ]-like lingual configuration.
The timing of both compensations suggests that they are preplanned activity rather than online responses to an error signal from a feedback loop. Such preplanned activity implies utilization of knowledge of other gestures that are to be implemented at a later point in the utterance.
3.8. Larynx depression
Larynx depression was in most cases correlated with lip activity: frame sequences 5-9 for [b], 52-56 for [ɒ:], 89-102 for [əu] and 125ff for [əu]. The exception was the absence of depression for bilabiodental [ɧ] (frames 19-22, 121-122). There was also larynx depression for [ʁ] in frames 78-82, without any apparent labial activity.
4. Conclusions
The data presented were subjected to the simplest level of measurement, a frame-by-frame judgement of the presence or absence of articulator movement. This is sufficient to evaluate issues of temporal coordination and of segmental simultaneity and sequencing. A distinction is made between gesture types (such as mandibular depression) and their token instantiation. This is fundamental to an analytical approach to problems like gestural antagonism, superposition, and contextual variation. It is striking how often several gestures are initiated simultaneously, especially when a new syllable is being initiated. Examples are the initiation of the tongue, mandible, lips, and larynx for /b/ and /e/ in frame 5, the tongue and mandible for /ɧ/ in frame 15, and so on. The brief episode of speech reported here confirms order and planning, and is adequately described by the Kozhevnikov & Chistovich model. The examples of gestures being delayed while ongoing antagonistic activity is being implemented speak against models based on competition between gestures, such as Öhman's additive model, in which antagonistic consonant maneuvers are superimposed on a continuous underlying vowel motion. The apparent challenge to some current conceptions of coarticulation justifies continuing this work in order to quantify the various results by analyzing more data from the same speaker, from other speakers and from other languages.
References
Daniloff, R. & Hammarberg, R. (1973) On defining coarticulation, Journal of Phonetics, 1, 239-248.
Fant, C. G. M. (1961) Sound spectrography. In Proceedings of the fourth international congress of phonetic sciences. The Hague: Mouton.
Fowler, C. (1980) Coarticulation and theories of extrinsic timing, Journal of Phonetics, 8, 113-133.
Fowler, C. (1983) Realism and unrealism: a reply, Journal of Phonetics, 11, 303-322.
Hammarberg, R. (1976) The metaphysics of coarticulation, Journal of Phonetics, 4, 353-363.
Hammarberg, R. (1982) On redefining coarticulation, Journal of Phonetics, 10, 123-137.
Jespersen, O. (1897-99) Fonetik. Copenhagen: Schubotheske forlag.
Joos, M. (1948) Acoustic phonetics. Language Monograph 23. (Supplement to Language, 24.)
Kent, R. D. (1983) The segmental organization of speech. In The production of speech (P. F. MacNeilage, editor), Chapt. 4. New York: Springer.
Kent, R. D. & Minifie, F. D. (1977) Coarticulation in recent speech production models, Journal of Phonetics, 5, 115-133.
Kozhevnikov, V. A. & Chistovich, L. A. (1965) Speech, articulation and perception. Washington: Joint Publications Research Service.
Lindblad, P. (1980) Svenskans sje- och tje-ljud i ett allmänfonetiskt perspektiv. Travaux de l'institut de phonétique de Lund 16. Lund: Gleerup.
Lindblom, B. E. F. (Ed.) (1986) Speech processes in the light of event perception and action theory. (Special issue of Journal of Phonetics, 14.)
Menzerath, P. & Lacerda, A. de (1933) Koartikulation, Steuerung und Lautabgrenzung. Phonetische Studien 1. Berlin: Dümmler.
Moll, K. L. (1960) Cinefluorographic techniques in speech research, Journal of Speech and Hearing Research, 3, 227-241.
Öhman, S. (1966) Coarticulation in VCV utterances: spectrographic measurements, Journal of the Acoustical Society of America, 39, 151-168.
Sweet, H. (1877) Handbook of phonetics. Oxford: Clarendon.
Whalen, D. H. (1991) Coarticulation is largely planned, Journal of Phonetics, 18, 3-35.
Wood, S. A. J. (1975) What is the difference between English and Swedish dental stops? Working Papers, 10, 174-193 (Dept. of Linguistics, University of Lund).
Wood, S. A. J. (1979) A radiographic analysis of constriction locations for vowels, Journal of Phonetics, 7, 25-43.
Wood, S. A. J. (1982) X-ray and model studies of vowel articulation. Working Papers, 23, 1-49 (Dept. of Linguistics, University of Lund).
Wood, S. A. J. (1986) The acoustical significance of tongue, lip and larynx maneuvers in rounded palatal vowels, Journal of the Acoustical Society of America, 80, 391-401.