Journal of Phonetics (1991) 19, 343-350
Proportional timing in speech motor control
Anders Löfqvist
Department of Logopedics and Phoniatrics, University Hospital, S-221 85 Lund, Sweden
Received 10th July, and in revised form 17th December 1990
This paper explores the hypothesis that speech motor control is characterized by proportional timing. Proportional timing implies that ratios between articulatory intervals should remain constant as stress and speaking rate are varied. According to the received view, proportional timing is found both in speech and in many other areas of motor control. Experimental analysis of movement intervals within and across linguistic segments does not show any evidence of proportional timing, however. Thus, proportional timing does not appear to be an adequate description of speech movement sequencing.
1. Introduction
From the point of view of motor control it is attractive to assume that variations of a given movement pattern result from scaling between the individual parts making up the complex. That is, there is a generalized motor program that can be reparameterized (cf. Schmidt, 1975). For example, if a person writes a word on a piece of paper with a pencil or on a blackboard with a piece of chalk, different parts of the body are used: in the former case the hand and movements around the wrist, in the latter case the arm and movements around the shoulder joint. Since the written pattern on the blackboard can be regarded as a scaled up version of the pattern on the paper, it has generally been argued that there is but one underlying representation of the movement pattern that is instantiated by different parts of the body using a scaling relation. The alternative view that each pattern is somehow stored as a separate entity is at least intuitively implausible and inefficient.
The generalized motor program predicts that when variations in speed and amplitude of a movement complex occur, the relationship between the individual movements should remain virtually unchanged. Many studies of different areas of motor control have purportedly shown this to be the case. For example, the model has been applied to locomotion (Grillner, 1975; Shik & Orlovsky, 1976; Shapiro, Zernicke, Gregor & Diestel, 1981), handwriting (Viviani & Terzuolo, 1980), and typing (Terzuolo & Viviani, 1979).
Studies of speech motor control using the same experimental paradigm (i.e., varying stress and speaking rate) have argued that the model also applies to speech. At the intersegmental level, Tuller, Kelso & Harris (1982) and Tuller & Kelso (1984) showed that the relationship between successive jaw lowerings for the vowels in a VCV sequence and lower lip raising for the medial (labial) consonant was stable across changes in stress and speaking rate.
0095-4470/91/030343 + 08 $03.00/0 © 1991 Academic Press Limited
Similarly, at the intrasegmental level,
Löfqvist & Yoshioka (1984) argued that the same was true for temporal relationships between oral and laryngeal gestures in voiceless consonant production.
The speech timing experiments not only extended the proportional timing model to speech but could possibly also rationalize some results on perceptual constancy in speech perception. In particular, studies by Port (1979), Fitch (1981), Miller & Grosjean (1981), and Summerfield (1981) have shown how variations in speaking rate affect the perceptual boundaries between segments (see also Miller, 1981, for a review). For example, it is well known that voice onset time (VOT) is an important acoustic cue to the voiced-voiceless distinction in stop consonants (cf. Lisker, 1986, for a catalogue of other cues). Since the actual duration of VOT will change with speaking rate (e.g., Löfqvist & Yoshioka, 1984), the perceptual voiced-voiceless boundary should also shift, and this is indeed the case. While the production and perception studies would thus seem to converge, it is important to note that none of the perceptual studies have explored whether there is a constant proportionality between, say, the duration of a preceding vowel and the closure duration of a following stop that could account for the perceptual constancy. At most, the perceptual experiments have revealed boundary shifts in the expected directions.
The evidence in favor of the proportional duration model (i.e., the generalized motor program) has recently been re-examined by Gentner (1987). He argues that most of the studies showing support for the model have used improper analysis for assessing the model. When the proper analyses are applied, the model is not supported; at least the reparameterization is not linear. Some of the methodological problems with the existing studies presenting evidence in favor of the proportional duration model are illustrated by the studies of speech production referred to above.
One particular problem is the use of correlations to assess invariance and scaling (cf. Benoit, 1986). All experiments on speech have made part-whole correlations. For example, Löfqvist & Yoshioka (1984) analyzed the relation between closure/constriction duration and the interval from onset of closure/constriction to the onset of glottal adduction in voiceless stops and fricatives. A high correlation was found between the two intervals, but the interval from onset of closure/constriction to the onset of glottal adduction is part of the closure/constriction, and correlating the whole with one of its parts would in itself yield a correlation of about 0.7. In order to avoid this statistical artifact, Munhall (1985) and Löfqvist (1986) calculated the expected part-whole correlations and then examined whether the obtained results significantly deviated from the expected ones. Munhall (1985) found that most of the effects could be accounted for by part-whole artifacts, whereas Löfqvist (1986) argued that the effects were real. (However, the results published by Löfqvist (1986) only pertain to the row and column means. When the individual cells are analyzed, as they should more properly be, the results are less supportive of proportional timing.)
If regression analysis is used instead of correlations, the results are clearer. The intercepts are generally non-zero, indicating that a constant ratio between the measured intervals does not hold (cf. Löfqvist & Yoshioka, 1984; Löfqvist, 1986). In spite of this, claims have been made about constant ratios even though the published regression analyses do not support such a strong claim.
In order to avoid the statistical pitfalls of such correlations, Gentner (1987) proposed a constant proportion test, which examines whether the ratio between one movement interval and the duration of the whole movement sequence is unrelated to the duration of the whole movement sequence (cf. Fig. 1). The proportional duration model predicts that this should be the case, since the durations of all the components of a movement sequence should maintain a constant proportion of the overall duration. A regression analysis can be used to examine whether the slope is different from zero or not. The model assumes that the slope should not differ from zero.
Figure 1. The expected regression line under the assumption of proportional timing. [Plot not reproduced; it shows the predicted regression line against the duration of the movement sequence.]
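The value of about 0.7 quoted above for a part-whole correlation can be illustrated with a short simulation. The sketch below is not taken from any of the cited studies; it assumes, purely for illustration, that the whole interval is the sum of two independent, normally distributed components with equal variance. Under that assumption the expected correlation between one component and the whole is 1/sqrt(2), roughly 0.71, even though the two components are unrelated. The means, standard deviations, sample size, and function names are all hypothetical.

```python
import math
import random
import statistics


def pearson_r(xs, ys):
    """Pearson correlation coefficient of two equal-length sequences."""
    mean_x, mean_y = statistics.fmean(xs), statistics.fmean(ys)
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    sxx = sum((x - mean_x) ** 2 for x in xs)
    syy = sum((y - mean_y) ** 2 for y in ys)
    return sxy / math.sqrt(sxx * syy)


def simulate_part_whole(n=20000, sd_part=10.0, sd_rest=10.0, seed=1):
    """Correlate a part with a whole built from two independent components."""
    rng = random.Random(seed)
    part = [rng.gauss(50.0, sd_part) for _ in range(n)]   # hypothetical durations (ms)
    rest = [rng.gauss(50.0, sd_rest) for _ in range(n)]   # independent of `part`
    whole = [p + q for p, q in zip(part, rest)]
    return pearson_r(part, whole)


def expected_part_whole_r(sd_part, sd_rest):
    """Expected part-whole correlation for independent components."""
    return sd_part / math.hypot(sd_part, sd_rest)


r_sim = simulate_part_whole()
r_theory = expected_part_whole_r(10.0, 10.0)
print(f"simulated r = {r_sim:.3f}, expected r = {r_theory:.3f}")
```

If the two components have unequal variances, the expected value moves away from 0.71, which is presumably why the expected part-whole correlation has to be calculated for each data set before the obtained correlation is evaluated.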
2. Procedure
As a further examination of proportional timing of speech gestures, the constant proportion test was applied to the timing of articulatory gestures at the intrasegmental and intersegmental levels, respectively. The intrasegmental material has been presented in Löfqvist & Yoshioka (1984). Briefly, it consists of dental voiceless stops and fricatives under different stress conditions spoken at two different rates by two native speakers of American English. The stress manipulation used stressed and unstressed syllables, whereas the rate manipulation used two self-selected speaking rates. The temporal intervals of interest are the duration of the oral closure/constriction, and the interval from onset of oral closure/constriction to onset of glottal adduction (cf. Fig. 2, top).
The intersegmental material was taken from Löfqvist (1986). It consists of VCV sequences where the vowels are /a/ and the middle consonant is drawn from a set of labial consonants. Stress was placed on the first or second vowel, and the VC sequence contained either a long vowel and a short consonant, or a short vowel and a long consonant; the material was produced at two different rates by a native speaker of Swedish. The measures of interest are the interval between the onsets of jaw lowering for the vowels, and the interval from onset of jaw lowering for the first vowel to onset of lower lip raising for the medial consonant (cf. Fig. 2, bottom).
The proportionality should be manifested at the token level and occur irrespective of the source of the change of overall movement sequence duration. The voiceless consonants from the Löfqvist & Yoshioka (1984) study have thus been analyzed across variations in stress, position in a word, and speaking rate. Since voiceless stops in stressed and unstressed syllables differ with respect to aspiration, the stressed and unstressed stops have been analyzed separately. Following Gentner (1987), a significance level of 0.05 was chosen.
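The constant proportion test applied here can be sketched in a few lines. The code below is a minimal illustration, not the analysis software used for this study: it expresses the part as a percentage of the whole for each token, regresses that ratio on the whole, and tests the slope against zero. The function name, the example durations, and the choice to fall back on a normal approximation when no critical t value is supplied are all assumptions of the sketch. The second data set has a non-zero intercept in the part-whole regression (part = 30 + 0.2 × whole), which is the situation described above in which a constant ratio cannot hold.

```python
import math
from statistics import NormalDist, fmean


def constant_proportion_test(part, whole, alpha=0.05, t_crit=None):
    """Regress the ratio part/whole (in percent) on the whole duration.

    Proportional timing predicts a slope of zero. If t_crit is not given,
    a normal approximation to the two-tailed critical value is used, which
    is only reasonable for large samples (an assumption of this sketch).
    """
    n = len(whole)
    ratio = [100.0 * p / w for p, w in zip(part, whole)]
    mean_x, mean_y = fmean(whole), fmean(ratio)
    sxx = sum((x - mean_x) ** 2 for x in whole)
    syy = sum((y - mean_y) ** 2 for y in ratio)
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(whole, ratio))
    slope = sxy / sxx
    intercept = mean_y - slope * mean_x
    df = n - 2
    residual_ss = max(syy - slope * sxy, 0.0)      # residual sum of squares
    se_slope = math.sqrt(residual_ss / df / sxx)   # standard error of the slope
    t = slope / se_slope
    if t_crit is None:
        t_crit = NormalDist().inv_cdf(1.0 - alpha / 2.0)
    return {"slope": slope, "intercept": intercept, "t": t, "df": df,
            "reject": abs(t) > t_crit}


# Hypothetical whole durations (ms) for eight tokens.
whole = [100.0, 150.0, 200.0, 250.0, 300.0, 350.0, 400.0, 450.0]

# Case 1: the part is 40% of the whole, plus a +/-1% jitter chosen so that
# it is uncorrelated with the whole duration.
jitter = [1.0, -1.0, -1.0, 1.0, 1.0, -1.0, -1.0, 1.0]
part_proportional = [w * (40.0 + e) / 100.0 for w, e in zip(whole, jitter)]

# Case 2: the part is an affine function of the whole with a non-zero intercept.
part_affine = [30.0 + 0.2 * w for w in whole]

T_CRIT_DF6 = 2.447  # two-tailed 5% critical value of t for 6 degrees of freedom

prop = constant_proportion_test(part_proportional, whole, t_crit=T_CRIT_DF6)
affine = constant_proportion_test(part_affine, whole, t_crit=T_CRIT_DF6)
print("proportional: slope = %.4f, reject = %s" % (prop["slope"], prop["reject"]))
print("affine:       slope = %.4f, reject = %s" % (affine["slope"], affine["reject"]))
```

In the second case the ratio equals 20 + 3000/whole, so it falls as the whole grows and the slope is negative. This is the pattern that a part-whole correlation alone would not reveal.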
Figure 2. The articulatory intervals used for the analysis of (top) intrasegmental and (bottom) intersegmental relative timing. [Schematic not reproduced. Top panel labels: oral closure or constriction; time to peak opening; glottal opening. Bottom panel labels: lower lip and jaw traces (up/down); interval from onset of jaw lowering for V1 to onset of lower lip raising for the medial consonant; interval from onset of jaw lowering for V1 to onset of jaw lowering for V2.]
3. Results
Let us first look at the results for intrasegmental timing. Figure 3 shows plots for fricatives, stressed stops, and unstressed stops, respectively. Generally, the results show low correlations between the ratio of the part to the whole and the whole. For the fricatives, the slope was significantly different from zero for subject TB (t(331) = 5.463) but not for subject FBB (t(345) = 0.108). A similar pattern holds for the stressed stops, where subject TB has a slope different from zero (t(137) = 2.598) but FBB does not (t(146) = 1.57). For the unstressed stops, the slope did not differ from zero for either subject, with t(134) = 1.176 and t(91) = 1.43 for FBB and TB, respectively.
Figure 4 shows representative plots of the results for intersegmental timing. Here, all slopes differed from zero, except for the sequence /bavvavv/ shown in Fig. 4(b). Furthermore, correlations were uniformly high.
4. Discussion
The outcome of the constant proportion test does not show any evidence of constancy at the intersegmental level, where the hypothesis was rejected in 90% of the cases examined. At the intrasegmental level, the idea of constant proportionality was rejected in only 33% of the cases. This could possibly indicate a difference in gestural cohesion within and across segments. That is, the gestures forming a segment may show a greater degree of internal stability in the form of coherence of patterns of muscular activity and/or movement than those associated with different segments (cf. Lee, 1984, and Löfqvist, 1990, for discussions of gestural coherence).
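As a consistency check on the intrasegmental results, the t statistics reported above can be converted to correlation magnitudes with the standard identity r = t / sqrt(t² + df) for a simple linear regression and compared with the r values printed in Fig. 3. The sketch below does this and also counts how many of the six slopes exceed a critical value. The threshold of 1.99 is an assumption of the sketch; it was chosen because it lies slightly above the two-tailed 5% critical value of t for all the degrees of freedom involved here (91 and above).

```python
import math

# (condition and subject, t statistic, degrees of freedom, r printed in Fig. 3)
REPORTED = [
    ("fricatives, TB", 5.463, 331, 0.29),
    ("fricatives, FBB", 0.108, 345, 0.01),
    ("stressed stops, TB", 2.598, 137, 0.22),
    ("stressed stops, FBB", 1.57, 146, 0.13),
    ("unstressed stops, FBB", 1.176, 134, 0.10),
    ("unstressed stops, TB", 1.43, 91, 0.15),
]

T_CRIT = 1.99  # assumed conservative two-tailed 5% threshold for df >= 91


def r_from_t(t, df):
    """Correlation magnitude implied by a slope t statistic and its df."""
    return t / math.sqrt(t * t + df)


rejected = 0
for label, t, df, r_printed in REPORTED:
    r = r_from_t(t, df)
    significant = abs(t) > T_CRIT
    rejected += significant
    print(f"{label:22s} r from t = {r:.2f}, printed r = {r_printed:.2f}, "
          f"slope differs from zero: {significant}")

share = 100.0 * rejected / len(REPORTED)
print(f"constant proportionality rejected in {rejected} of {len(REPORTED)} "
      f"cases ({share:.0f}%)")
```

The two slopes that exceed the threshold are those for subject TB (fricatives and stressed stops), which gives the 33% rejection rate quoted for the intrasegmental level.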
Figure 3. Results for (a) fricatives; (b) stressed stops; (c) unstressed stops. ▲: normal stressed; △: fast stressed; •: normal unstressed; □: fast unstressed. [Plots not reproduced. x-axis: constriction duration (ms) in (a), closure duration (ms) in (b) and (c). Correlations printed in the panels: (a) TB r = 0.29, FBB r = 0.01; (b) TB r = 0.22, FBB r = 0.13; (c) TB r = 0.15, FBB r = 0.1. Regression lines printed in the panels: (a) y = 0.001x + 44.21 and y = -0.07x + 52.45; (b) y = -0.17x + 132.02 and y = -0.07x + 104.12; (c) FBB y = 0.15x + 59.52, TB y = -0.16x + 109.47.]
Figure 4. Results for (a) /babab/ and /bapap/; (b) /bavvavv/ and /baffaff/; (c) /bavav/ and /bafaf/; (d) /bamam/ and /bammamm/. ▲: normal stressed; △: fast stressed; •: normal unstressed; □: fast unstressed. [Plots not reproduced. x-axis: interval from jaw lowering for V1 to jaw lowering for V2 (ms). Values legible in the panels: /babab/ and /bapap/, r = 0.90 for both, with regression lines y = 0.09x + 45.87 and y = 0.08x + 41.95; /baffaff/, r = 0.47, y = 0.02x + 42.66; /bavvavv/, r = 0.23, y = -0.01x + 56.84; /bavav/, r = 0.67; /bafaf/, r = 0.55, y = 0.06x + 40.95; /bammamm/, r = 0.62, y = 0.04x + 49.66. The remaining printed values (r = 0.86 with y = 0.08x + 48.25, and a line printed as y=0.05x=49.51) cannot be assigned to a sequence from the extracted text.]
There is, however, considerable variability in the data. For example, the data for the fricatives in Fig. 3(a) do not show any consistent ratio between time to peak opening and constriction duration. Instead, this ratio varies greatly for a given constriction duration. Thus, this reanalysis does not show any support for the constant proportionality model in speech production. Similar results have been presented by Sock, Ollila, Delattre, Zilliox & Zohair (1988) based on acoustic measurements.
A suggestion that the proper metric is phase rather than temporal intervals was made by Kelso, Saltzman & Tuller (1986), who also presented some evidence in support of that notion. Further investigations have, however, obtained conflicting results (Nittrouer, Munhall, Kelso, Tuller & Harris, 1988).
The speech timing data reported here do not provide any evidence for proportional timing in speech motor control. Thus, proportional timing does not appear to be a characteristic feature of speech, nor of any other area of motor control that has been investigated so far (cf. Wann & Nimmo-Smith, 1990, for a recent study of handwriting).
We should add the necessary caveat that the peripheral events measured in this and other studies (i.e., onsets and offsets of movements) may not be the ones that are relevant from the point of view of motor control; in addition, they may be difficult to locate in kinematic records. Furthermore, Wing (1980) and Heuer (1988) argue that the proportional duration model may still be valid for the central planning of timing while not necessarily for the timing of the observed peripheral movements. The reason given is that different neuromuscular delays obscure the underlying proportionality. While possibly saving one version of proportional timing, this proposal makes testing the model considerably more difficult.
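The argument about neuromuscular delays can be made concrete with a small calculation. The sketch below is an illustration only and is not the model proposed by Wing (1980) or Heuer (1988): it assumes that the central plan places an intermediate event at a fixed proportion k of the planned whole, and that each of the three observed events (start, intermediate, end) is shifted by its own constant delay. The delay values, the proportion k = 0.4, and the function name are hypothetical.

```python
def observed_ratio(central_whole, k, delay_start, delay_mid, delay_end):
    """Observed part/whole ratio when each event has its own constant delay.

    The central plan is proportional: central_part = k * central_whole.
    All durations and delays are in ms.
    """
    central_part = k * central_whole
    observed_part = central_part + (delay_mid - delay_start)
    observed_whole = central_whole + (delay_end - delay_start)
    return observed_part / observed_whole


# Hypothetical centrally planned whole durations (ms), e.g. fast to slow speech.
central_wholes = [150.0, 200.0, 300.0, 400.0]

# Equal delays for all three events: the observed ratio stays at k.
equal = [observed_ratio(w, 0.4, 20, 20, 20) for w in central_wholes]

# Unequal delays: the observed ratio drifts with the whole duration.
unequal = [observed_ratio(w, 0.4, 10, 30, 15) for w in central_wholes]

for w, r_eq, r_un in zip(central_wholes, equal, unequal):
    print(f"central whole {w:5.0f} ms: equal delays {r_eq:.3f}, "
          f"unequal delays {r_un:.3f}")
```

With unequal delays the observed ratio decreases toward k as the sequence lengthens, so a constant proportion test on the peripheral events would reject proportionality even though the central plan in this sketch is strictly proportional. Without independent estimates of the delays, the two situations cannot be told apart, which is the testing difficulty noted above.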
While proportional timing is not an adequate description of speech movement sequencing, it is obvious that the temporal control of speech movements exhibits some form of stability and scaling. For example, Gracco (1988) and Gracco & Abbs (1989) found a constant patterning of peak velocities for the articulators participating in the formation of the closure for a labial voiceless stop. That is, the peak velocity of the upper lip preceded that of the lower lip, which in its turn preceded that of the jaw. This pattern was maintained even when a mechanical perturbation was applied to the lower lip before the closure (Gracco & Abbs, 1989). On the other hand, at the release of the stop, no such consistent pattern was found (Gracco, 1988). Future studies of speech motor control will have to clarify the rules governing such scaling relationships.
References
Benoit, C. (1986) Note on the use of correlation in speech timing, Journal of the Acoustical Society of America, 80, 1846-1849.
Fitch, H. (1981) Distinguishing temporal information for speaking rate from temporal information for intervocalic stop consonant voicing, Haskins Laboratories Status Report on Speech Research, SR-65, 1-32.
Gentner, D. (1987) Timing of skilled movements: Test of the proportional duration model, Psychological Review, 94, 255-276.
Gracco, V. (1988) Timing factors in the coordination of speech movements, Journal of Neuroscience, 8, 4628-4639.
Gracco, V. & Abbs, J. (1989) Sensorimotor characteristics of speech motor sequences, Experimental Brain Research, 75, 586-598.
Grillner, S. (1975) Locomotion in vertebrates: Central mechanisms and reflex interaction, Physiological Reviews, 55, 247-304.
Heuer, H. (1988) Testing the invariance of relative timing: Comment on Gentner (1987), Psychological Review, 95, 552-557.
Kelso, J. A. S., Saltzman, E. & Tuller, B. (1986) The dynamical perspective on speech production: Data and theory, Journal of Phonetics, 14, 29-59.
Lee, W. (1984) Neuromotor synergies as a basis for coordinated intentional action, Journal of Motor Behavior, 16, 135-170.
Lisker, L. (1986) "Voicing" in English: A catalogue of acoustic features signalling /b/ versus /p/ in trochees, Language and Speech, 29, 3-11.
Löfqvist, A. (1986) Stability and change, Journal of Phonetics, 14, 139-144.
Löfqvist, A. (1990) Speech as audible gestures. In Speech production and speech modelling (W. Hardcastle & A. Marchal, editors), pp. 289-322. Dordrecht: Kluwer.
Löfqvist, A. & Yoshioka, H. (1984) Intrasegmental timing: Laryngeal-oral coordination in voiceless consonant production, Speech Communication, 3, 279-289.
Miller, J. (1981) Effects of speaking rate on segmental distinctions. In Perspectives on the study of speech (P. Eimas & J. Miller, editors), pp. 39-74. Hillsdale, NJ: Lawrence Erlbaum.
Miller, J. & Grosjean, F. (1981) How the components of speaking rate influence perception of phonetic segments, Journal of Experimental Psychology: Human Perception and Performance, 7, 208-215.
Munhall, K. (1985) An examination of intra-articulator relative timing, Journal of the Acoustical Society of America, 78, 1548-1553.
Nittrouer, S., Munhall, K., Kelso, J. A. S., Tuller, B. & Harris, K. S. (1988) Patterns of interarticulator phasing and their relation to linguistic structure, Journal of the Acoustical Society of America, 84, 1653-1661.
Port, R. (1979) The influence of tempo on stop closure duration as a cue for voicing and place, Journal of Phonetics, 7, 45-56.
Schmidt, R. (1975) A schema theory for discrete motor skill learning, Psychological Review, 82, 225-260.
Shapiro, D., Zernicke, R., Gregor, R. & Diestel, J. (1981) Evidence for generalized motor programs using gait pattern analysis, Journal of Motor Behavior, 13, 33-47.
Shik, M. & Orlovsky, G. (1976) Neurophysiology of locomotor automatism, Physiological Reviews, 56, 465-501.
Sock, R., Ollila, L., Delattre, C., Zilliox, C. & Zohair, L. (1988) Patrons de phases dans le cycle acoustique de détente en français, Journal Acoustique, 1, 339-345.
Summerfield, Q. (1981) Articulatory rate and perceptual constancy in phonetic perception, Journal of Experimental Psychology: Human Perception and Performance, 7, 1074-1095.
Terzuolo, C. & Viviani, P. (1979) The central representation of learned motor patterns. In Posture and movement (R. Talbot & D. R. Humphrey, editors), pp. 113-121. New York: Raven Press.
Tuller, B. & Kelso, J. A. S. (1984) The timing of articulatory gestures: Evidence for relational invariants, Journal of the Acoustical Society of America, 76, 1030-1036.
Tuller, B., Kelso, J. A. S. & Harris, K. S. (1982) Interarticulator phasing as an index of temporal regularity in speech, Journal of Experimental Psychology: Human Perception and Performance, 8, 460-472.
Viviani, P. & Terzuolo, C. (1980) Space-time invariance in learned motor skills. In Tutorials in motor behavior (G. Stelmach & J. Requin, editors), pp. 525-533. Amsterdam: North-Holland.
Wann, J. & Nimmo-Smith, I. (1990) Evidence against the relative invariance of timing in handwriting, Quarterly Journal of Experimental Psychology, 42A, 105-119.
Wing, A. (1980) The long and short of timing in response sequences. In Tutorials in motor behavior (G. Stelmach & J. Requin, editors), pp. 469-486. Amsterdam: North-Holland.