Journal of Phonetics (1980), 8, 427-438
Control of the voicing distinction for intervocalic stops and fricatives: some data and theoretical considerations
Gary Weismer*
Department of Speech and Hearing Sciences, Indiana University, Bloomington, Indiana 47405, U.S.A.
Received 29th March 1979
Abstract:
The present paper represents an attempt to integrate some new data with previously published information on control of the voicing distinction for obstruents. The experiment reported herein deals with the duration of the voiceless interval associated with voiceless stops and fricatives in the intervocalic, prestressed position. The voiceless interval is defined as that time-segment during which the vocal folds are not vibrating. For stops, this interval will generally include the duration of the closure interval plus the voice-onset time (VOT). In the case of fricatives, the voiceless interval includes the duration of both the supraglottal constriction and any aspiration which precedes a following vowel. The results of this experiment demonstrated that the duration of the voiceless interval is independent of obstruent manner- and place-of-articulation. From these findings and consideration of previously reported fiberscopic and electromyographic data, it is concluded that the devoicing gesture is the same for stops and fricatives, and that the timing of this gesture is executed in a preprogrammed, ballistic fashion. Arguments are subsequently developed to the effect that (1) speakers do not, and need not, control VOT in a continuous fashion, and (2) speakers may sometimes structure supraglottal time programs in terms of laryngeal timing demands, as evidenced by control of the voicing distinction for fricatives.
*Present address: Speech Motor Control Laboratories, Waisman Center, University of Wisconsin, Madison, Wisconsin 53706, U.S.A.
0095-4470/80/040427 + 12 $02.00/0 © 1980 Academic Press Inc. (London) Ltd.
Introduction
One of the more complex aspects of constructing a model of speech production concerns the specification of timing relationships which characterize the coordination of glottal and supraglottal adjustments. Of special interest are the various glottal-supraglottal timing patterns associated with obstruents; these physiological patterns may correspond closely and more or less discretely to the potential (phonological) voicing characteristics of a given obstruent. The seminal acoustic work of Lisker & Abramson (1964, 1967) explored one means by which glottal-supraglottal timing could be described for stop consonants; much of the more recent research has attempted to obtain a more direct physiological description of these events, either through the use of electromyography (EMG) (reviewed in Hirose, 1977a, b) or by fiberoptic studies of laryngeal adjustment (Sawashima, 1970, 1977; Lindqvist, 1972; Benguerel, Hirose, Sawashima & Ushijima, 1978).
When the literature on glottal-supraglottal timing for obstruents is examined, it is perhaps not surprising that the overwhelming majority of work has focussed on stop consonants. Part of this concentration results, no doubt, from the development of the measure voice-onset time (Lisker & Abramson, 1964), which was shown to be useful in distinguishing voicing categories of homorganic stops in a variety of languages. More specifically, the study of stops may be appealing in a cross-linguistic sense since, for this class of obstruents, different languages make various and sometimes unique uses of the voicing dimension at a given place of articulation. Lisker & Abramson (1967, p. 2) have noted that an "... apparent affinity between stops and distinctive voicing ..." is good reason for studying the acoustic correlates of voicing in stops. These observations should not imply, however, that the study of glottal-supraglottal timing in fricatives is somehow uninteresting. Indeed, it would seem to be of great interest to determine whether the coordination of glottal and supraglottal events for fricatives is similar to or different from the corresponding coordination for stops. One problem which might arise when attempting to make such a comparison is the selection of some measure of glottal-supraglottal timing which is not manner-specific. As pointed out by at least one author (Lieberman, 1967, p. 108), the notion of voice-onset time is relevant to the production of stops only; the measure cannot be applied unambiguously to fricatives, since the discrete kind of release observed for stops is not typical of fricatives.
Some investigators (Eilers & Minifie, 1975; Massaro & Cohen, 1976) have, in fact, used the term "voice-onset time" to refer to glottal-supraglottal timing for fricatives, but in a way that is not comparable to the conventional application of the measure to stops.¹
¹ Both Eilers & Minifie (1975) and Massaro & Cohen (1976), in preparing synthetic CV stimuli, define voice-onset time for fricatives as the interval between the onset of frication and the onset of glottal pulsing. Massaro and Cohen go so far as to say that they chose this measure of VOT for fricatives "... because it is more comparable ..." (than other measures) "... to the use of VOT for stop consonants" (p. 716). In what way the two measures are comparable is certainly not apparent on physiological grounds, since the fricative measure uses the onset of a supraglottal constriction as a reference point, whereas the stop measure is referenced to the release of the constriction.
A few studies have been reported which examine glottal-supraglottal timing for fricatives. Sawashima (1970) reported data based on fiberoptic observation of the glottis during connected speech (English), and concluded that the timing of glottal and supraglottal gestures for voiceless fricatives was similar to that for unaspirated stops. When glottal aperture was examined, however, the voiceless fricatives resembled more closely the aspirated stops. It is difficult to evaluate Sawashima's conclusions regarding the comparable glottal-supraglottal timing for fricatives and unaspirated stops, since no time-scale is provided with his data displays. Lindqvist (1972) described some fiberoptic data in which voiceless fricatives appeared to have greater glottal aperture as compared to voiceless stops. In addition, the duration of the laryngeal abduction gesture for voiceless stops was said by Lindqvist to be slightly shorter than for voiceless fricatives, although actual measurements were not reported. Klee, Weismer & Ingrisano (1976) attempted to compare the glottal-supraglottal timing for voiceless stops and fricatives by constructing ratios of the duration of the unvoiced segment to the duration of an entire CV syllable. The results suggested that the fricative and stop ratios were similar under certain conditions, but dissimilar under others. This normalization of glottal-supraglottal timing across obstruent manner-of-articulation can be criticized, however, because it depends on the assumption that a CV syllable-form is basic to the organization of speech production. Obviously, it would be more desirable to compare across manner without depending on the much-debated notion of the CV as the "basic" syllable unit.²
The purpose of the exploratory study described below was to obtain measures of the duration of the voiceless interval for both voiceless stops and fricatives in intervocalic, word-initial position. The voiceless interval for voiceless stops will generally include the duration of the closure plus the voice-onset time. The voiceless interval for voiceless fricatives will generally include the duration of frication noise, which should correspond to the duration of the supraglottal constriction, plus the duration of any aspiration which may follow the cessation of frication noise.³ This measure simply provides an index of the period during which the vocal folds are not vibrating, and is technically independent of the manner-of-articulation. The measure also does not depend on any assumptions concerning basic units in speech production. Following presentation of data, we consider in some detail theoretical aspects of voicing control for obstruents.
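The definition of the voiceless interval given above amounts to simple interval arithmetic. The following Python sketch is illustrative only: the function names and the example durations are hypothetical and are not taken from the study, in which the interval was measured directly from spectrograms rather than summed from its parts.

```python
def stop_voiceless_interval(closure_ms: float, vot_ms: float) -> float:
    """Voiceless interval for a voiceless stop: closure duration plus VOT."""
    return closure_ms + vot_ms


def fricative_voiceless_interval(frication_ms: float, aspiration_ms: float = 0.0) -> float:
    """Voiceless interval for a voiceless fricative: frication plus any aspiration."""
    return frication_ms + aspiration_ms


# Hypothetical durations in ms, chosen only to illustrate the arithmetic.
stop_interval = stop_voiceless_interval(closure_ms=120.0, vot_ms=60.0)
fricative_interval = fricative_voiceless_interval(frication_ms=170.0, aspiration_ms=12.0)
print(stop_interval, fricative_interval)
```

Because both functions return the same kind of quantity (time without vocal fold vibration), their results can be compared across manner of articulation without reference to a stop release.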
² Kent & Minifie (1977, p. 131) provide an interesting discussion of the various types of syllable unit which could be considered as having organizational utility for speech production.
³ Klatt (1975) reports that the production of [s] often includes a brief period (approximately 12 ms) of aspiration before a following vowel.
Method
Subjects
Nine adult males, ranging in age from 20-53 years, served as subjects in the present investigation. The subjects' speech reflected a wide range of dialects, including those heard in the midwest, east and south-west of America. None of the subjects was familiar with the purposes of the investigation.
Speech samples, procedures and data analysis
Twenty-four C1VC2 words were constructed such that C1 was either a voiceless stop or fricative, V was the vowel /I/, /U/ or /æ/, and C2 was a voiced or voiceless stop. More specifically, 12 of the words began with fricatives (/f/, /s/ and /ʃ/, four words each) and 12 with stops (/p/, /t/ and /k/, four words each). The different vowels and final consonants were not manipulated systematically, but were adjusted so that English words would result in all cases. Each of the CVC words was printed on a separate card to yield a deck of 24 cards. This deck was presented to each subject, who was seated in an IAC audiometric booth with an Altec 628B microphone located approximately 12 in from his lips. All recordings of speech samples were made at a recording speed of 7½ i.p.s. on a Crown series 700 tape deck. Subjects were instructed to produce each CVC word in the carrier phrase "Say ___ instead", at a speaking rate they considered to be representative of their conversational speech. When they had gone through the entire 24-card deck, subjects were instructed to shuffle the deck and repeat the protocol described above. In total, the entire deck was repeated five times by each subject.
Broad-band spectrograms were prepared for each of the 1080 utterances collected in the present investigation (24 words × 5 repetitions × 9 subjects = 1080 utterances). The criteria for measurement of voiceless intervals for both stops and fricatives were selected in accordance with certain previously reported strategies which appear to yield reliable data. The interval to be measured was always located between the termination of /eI/ in "Say" and the onset of glottal pulsing for the vowel in the target word. Specifically, both the offset and onset of voicing (that is, the boundaries of the voiceless interval) were operationally defined as the final (offset) and initial (onset) vertical striations (glottal pulses) observed in the second and higher formants of the spectrographic display. This criterion was suggested by Klatt (1975) for measurement of voice-onset time (VOT) and was shown by Weismer (1979) to yield relatively small mean measurement errors for this same interval. For the present measurements, the extension of this criterion to voicing offset before voiceless stops and fricatives, and to voicing onset following voiceless fricatives, appeared to result in mean measurement errors which are comparable (i.e. 2-4 ms) to those reported for VOT by Weismer (1979).
Results
The results obtained in the present investigation are summarized in Table I, which reports mean durations for the voiceless intervals of both stops and fricatives. Data for individual subjects as well as group means are presented; standard deviations are included for the group means only. When the group data are examined, it is obvious that the voiceless interval maintains a relatively constant value, regardless of place or manner of articulation. The range of voiceless interval durations across place and manner of production is 9 ms; statistical evaluation of pairwise comparisons⁴ between means revealed no significant differences. It is also of interest to note that, within either of the manners of articulation, the voiceless interval does not show any obvious dependence on place of articulation as has been observed for VOT (Lisker & Abramson, 1964, 1967; Klatt, 1975; Weismer, 1979). This would seem to be consistent with observations made by Lindqvist (1972, p. 18), who noted that the timing of the laryngeal abduction gesture relative to supraglottal constriction was the same for /p/, /t/ and /k/.
The individual subject data are, for the most part, consistent with the group data described above. That is, most subjects produce voiceless intervals which are relatively constant in duration regardless of place or manner of articulation. For example, when the data for seven of the nine subjects (S1, S2, S3, S4, S5, S6 and S9) are considered, the greatest range of voiceless interval durations across place and manner of articulation is 34 ms (S1 and S3). Two subjects (S7 and S8) produced wide ranges of voiceless interval durations (63 and 73 ms). The data of S7 are unusual in an additional sense, in that certain of the variances associated with his means are unusually large, whereas certain others are unusually small. It is interesting that if one "outlier" mean is discarded for both of these subjects (X̄p for S7 and X̄t for S8), the newly calculated ranges for voiceless interval durations (39 ms for S7 and 35 ms for S8) are much more consistent with the ranges demonstrated by other subjects.
⁴ In the present experiment, all statistical comparisons employed the error rate per experiment as the conceptual unit of statistical significance (Ryan, 1959). More specifically, the large number of potential comparisons in the present data implied that if each pairwise comparison were tested at a preselected alpha-level (say, 0.05), the overall error rate for the entire experiment would be kα, where k = the number of comparisons and α = the preselected level. Since there was a total of 150 potential pairwise comparisons, use of 0.05 for each comparison would produce an overall error rate that was intolerable. One approach to maintaining a reasonable error rate per experiment is to select an α-level (i.e. for the entire experiment) and test each comparison within the experiment at α/k (Dunn, 1961). For the present experiment, then, we selected an error rate per experiment of 0.10. Thus, each comparison was tested by t-tests at 0.10/150 = 0.00066 (see Kirk, 1968, p. 84 for details on estimating critical t-values for non-tabled probability points). Because this approach sets the actual error level for an entire experiment (i.e. a "family" of comparisons: see Ryan, 1959) at 0.10, we should expect 15 of the 150 comparisons to be significant by chance alone.
Table I Mean durations for voiceless intervals of stops and fricatives
Initial obstruent    Group   Group
of target word       X̄       s.d.    S1    S2    S3    S4    S5    S6    S7    S8    S9
/p/                  187     50      146   127   159   144   238   235   254   169   208
/t/                  185     53      112   123   148   151   225   227   218   242   219
/k/                  183     42      122   144   155   162   219   222   191   199   228
/f/                  180     44      135   126   139   155   208   237   230   183   205
/s/                  189     40      145   142   173   162   224   236   201   204   215
/ʃ/                  182     36      141   132   153   159   217   216   227   181   209
Standard deviations are presented for the group means only. All values are reported in ms.
Table II Statistical summary of pairwise comparisons for individual subject data
Subject    Statistically significant comparisons
1          /p/ vs /t/; /t/ vs /s/
2          None
3          /f/ vs /s/; /t/ vs /s/
4          None
5          /p/ vs /f/
6          None
7          /p/ vs /k/
8          /p/ vs /t/; /t/ vs /k/; /p/ vs /k/; /p/ vs /s/; /t/ vs /f/; /t/ vs /s/; /t/ vs /ʃ/
9          /k/ vs /f/
All comparisons are based on t-tests (two-tailed); statistical significance is set at P < 0.00066 (see footnote 4).
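The ranges quoted in the Results can be re-derived from the means in Table I. The Python sketch below is a convenience check rather than part of the original analysis; the variable and function names are hypothetical, and the assignment of table rows to /p/, /t/, /k/, /f/, /s/, /ʃ/ in that order is an assumption about the table layout (the text confirms only that the first two rows are /p/ and /t/).

```python
# Mean voiceless-interval durations in ms, transcribed from Table I.
# Values in each list follow the row order /p/, /t/, /k/, /f/, /s/, /ʃ/,
# which is an assumption about the table layout.
CONSONANTS = ["p", "t", "k", "f", "s", "sh"]

GROUP_MEANS = [187, 185, 183, 180, 189, 182]

SUBJECT_MEANS = {
    "S1": [146, 112, 122, 135, 145, 141],
    "S2": [127, 123, 144, 126, 142, 132],
    "S3": [159, 148, 155, 139, 173, 153],
    "S4": [144, 151, 162, 155, 162, 159],
    "S5": [238, 225, 219, 208, 224, 217],
    "S6": [235, 227, 222, 237, 236, 216],
    "S7": [254, 218, 191, 230, 201, 227],
    "S8": [169, 242, 199, 183, 204, 181],
    "S9": [208, 219, 228, 205, 215, 209],
}


def value_range(values):
    """Difference between the largest and smallest value."""
    return max(values) - min(values)


def range_without(subject, consonant):
    """Range of a subject's means after discarding one consonant's mean."""
    index = CONSONANTS.index(consonant)
    kept = [v for i, v in enumerate(SUBJECT_MEANS[subject]) if i != index]
    return value_range(kept)


print("group range:", value_range(GROUP_MEANS))
for subject, means in SUBJECT_MEANS.items():
    print(subject, value_range(means))
print("S7 without /p/:", range_without("S7", "p"))
print("S8 without /t/:", range_without("S8", "t"))
```

With the values as transcribed, the group range is 9 ms, the ranges for S7 and S8 are 63 and 73 ms, and discarding the /p/ mean for S7 and the /t/ mean for S8 reduces those ranges to 39 and 35 ms, in agreement with the figures given in the text.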
A summary of the statistical analyses performed on the individual subject data is provided in Table II. Three subjects failed to provide a statistically significant comparison, and five others provided either one or two significant comparisons. The remaining subject, S8, produced data in which seven significant comparisons were identified. A close examination of the significant comparisons identified in Table II fails to reveal any kind of systematic trend. That is, the comparisons appear to be equally divided between within- and across-manner differences. Moreover, although one across-manner comparison (/t/ vs /s/) is statistically significant in the data of three subjects (S1, S3 and S8), it does not necessarily follow that the direction of the difference is the same for all three subjects (compare in Table I the X̄t − X̄s difference for S8 to that for S1 and S3).
In summary, the individual subject data suggest that most subjects produce a voiceless interval duration that is independent of place and manner of production. Whereas a substantial range of voiceless interval durations exists across subjects (e.g. compare S2 to S5), the duration of the interval within subjects appears to be relatively consistent. The range across subjects is attributable, no doubt, to individual differences in speaking rate.
Discussion
Based on the results of the present investigation, we conclude that the time course of the voiceless interval is essentially the same for both stops and fricatives, and that the duration of the voiceless interval is not a function of place of articulation. Our conclusions to this effect are based on a global analysis of the statistical results. Specifically, pairwise comparisons within the conversational rate data revealed only 14 statistically significant differences, whereas 15 significant differences should be expected by chance alone. It is not possible, however, to determine which comparisons within a set of statistically significant differences are "real" effects, as opposed to statistical artifacts resulting from chance variation in the data. One obvious way to resolve this dilemma is to repeat an experiment and see if particular effects continue to emerge throughout replications. That is, chance variation across experiments should "smooth out" statistical artifacts but affect "real" differences in only a small number of cases. If the data from each of our nine subjects are considered as a replication of the current experiment, there is no evidence that any "real" effects exist. Thus, the basic conclusion of a voiceless interval duration which is independent of obstruent manner- and place-of-articulation is supported both by the relatively small number of statistically significant comparisons and the non-systematic distribution of those few significant differences.
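The bookkeeping behind these counts can be sketched in Python. The sketch rests on two assumptions that should be read as such: the decomposition of the 150 comparisons into 15 consonant pairs for each of ten data sets (nine subjects plus the group means) is inferred here and is not stated in the paper, and the list of significant pairs is a transcription of Table II. All names are hypothetical.

```python
from itertools import combinations

CONSONANTS = ["p", "t", "k", "f", "s", "sh"]
STOPS = {"p", "t", "k"}

# All unordered pairs of the six obstruents.
pairs = list(combinations(CONSONANTS, 2))

# Assumption: 150 = 15 pairs x (9 subjects + 1 set of group means).
n_comparisons = len(pairs) * (9 + 1)

# Dunn's procedure: test each comparison at alpha / k.
experimentwise_alpha = 0.10
per_comparison_alpha = experimentwise_alpha / n_comparisons

# Significant comparisons per subject, transcribed from Table II.
SIGNIFICANT = {
    "S1": [("p", "t"), ("t", "s")],
    "S2": [],
    "S3": [("f", "s"), ("t", "s")],
    "S4": [],
    "S5": [("p", "f")],
    "S6": [],
    "S7": [("p", "k")],
    "S8": [("p", "t"), ("t", "k"), ("p", "k"), ("p", "s"),
           ("t", "f"), ("t", "s"), ("t", "sh")],
    "S9": [("k", "f")],
}


def is_within_manner(a, b):
    """True when both consonants are stops or both are fricatives."""
    return (a in STOPS) == (b in STOPS)


total = sum(len(found) for found in SIGNIFICANT.values())
within = sum(is_within_manner(a, b) for found in SIGNIFICANT.values() for a, b in found)
across = total - within

print(len(pairs), n_comparisons, per_comparison_alpha)
print(total, within, across)
```

The per-comparison level computed this way is 0.000666..., which the paper reports truncated as 0.00066. As transcribed here, Table II contains 14 significant comparisons, 6 within manner and 8 across manner.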
We interpret the independence of the voiceless interval duration from obstruent place- and manner-of-articulation as indicating, at least for obstruents in intervocalic prestressed position, a laryngeal gesture for devoicing that is essentially independent of other obstruent features (i.e. place and manner). This interpretation is based on a comparison of published fiberscopic traces of laryngeal activity for voiceless fricatives (Lindqvist, 1972, Fig. 1-B-5) and voiceless stops (Benguerel et al., 1978, p. 182; Lindqvist, 1972, Fig. 1-B-6; Sawashima, 1977, p. 43), as well as general observations made by Sawashima (1970) and Benguerel et al. (1978). More specifically, the fiberoptic traces show an abduction-adduction gesture for both stops and fricatives that is characterized by smooth increases and decreases in glottal area, and little or no plateau at the point of maximum glottal width. In other words, the abduction-adduction gesture appears to be an integrated one, probably controlled by a programmed reciprocity between the posterior cricoarytenoid and interarytenoid muscles (Hirose, 1977a, b). With regard to the timing of this integrated gesture relative to supraglottal articulation, it appears that separation of the arytenoids is coincident with the supraglottal closure for both voiceless (aspirated) stops and voiceless fricatives (Sawashima, 1970; Lindqvist, 1972; Benguerel et al., 1978); moreover, there is evidence (Sawashima, 1970) that the maximum glottal width attained during the abduction-adduction gesture is approximately the same for voiceless (aspirated) stops and voiceless fricatives in prestressed position (cf. Lindqvist, 1972, p. 17). These observations plus the present data support the notion suggested above of a laryngeal devoicing gesture that is independent of place and manner features of obstruents. A similar conclusion has been stated by Benguerel et al. (1978).
Control of the voicing distinction for stops
The "programmed reciprocity" between the posterior cricoarytenoid and interarytenoid muscles would seem to be of such nature that the timing of the abduction-adduction sequence is preprogrammed, at least for voiceless obstruents in the intervocalic, prestressed position. In other words, the lack of a plateau in the glottal area function, or the continuous nature of the laryngeal devoicing gesture, does not support the notion of independently controlled abduction and adduction gestures during the production of voiceless obstruents.⁵ This view, if valid, appears to present some difficulties for accounts of speech production in which speakers are alleged to exercise some control over VOT⁶ (for example, see Kewley-Port & Preston, 1974; Cooper, 1977; Port & Rotunno, 1979). Cooper (1977) represents this position most explicitly when reviewing the account for the developmental data reported by Kewley-Port & Preston, who found long-lag and prevoiced VOTs to appear later than short-lag VOTs. Specifically, Cooper states:
"For short lag VOTs, a speaker must simply adduct the vocal folds at some time prior to the actual release of consonant closure and keep the folds relatively slack. As the consonant closure is released, a sufficient drop in pressure across the larynx occurs ... enabling vocal fold vibration to begin shortly after release. The point ... is that a relatively broad temporal interval exists during which the speaker can issue a motor command to adduct the vocal folds in order to produce a short-lag VOT value ... For long lag VOTs, the timing of vocal fold adduction must be initiated at a relatively precise moment in relation to the consonant release ... for a long-lag VOT, the vocal folds must be in an abducted or spread position ... The command to adduct the vocal folds must be initiated so that the folds will be approximated typically within 80 msec after the release" (1977: pp. 362-363, emphasis added here).
It is not clear whether Cooper means to restrict this explanation to stops in utterance-initial position (i.e.
the position of the stops in the Kewley-Port & Preston experiment), but the present interpretation of the facts would seem to apply to this position also, even though our focus is on stops in the intervocalic position. For voiced (short-lag) stops in utterance-initial position, we would agree with Cooper that a speaker can bring the vocal folds together at a variety of times during the closure interval; for these same stops in the intervocalic position, most investigators have observed vocal-fold vibration to continue throughout the closure interval (Sawashima, 1970; Lindqvist, 1972; Benguerel et al., 1978), so the timing issue is essentially unimportant here. In the case of voiceless stops, however, we would argue that the notion of precise timing relationships between the laryngeal and supraglottal gestures is probably overstated. Data obtained by Lindqvist (1972, p. 20) seem to suggest that the devoicing gesture used for intervocalic, voiceless stops is also often present for utterance-initial voiceless stops. That is, in utterance-initial position the glottal gesture for voiceless stops is not one in which the vocal folds simply move from the respiratory position to the adducted position; rather, an abduction-adduction sequence is observed during the utterance-initial closure interval. The occurrence of this sequence, in either the utterance-initial or the intervocalic position, should produce the delay in voicing following release of the stop which we associate with "voicelessness". There is no need to time the adduction of the folds relative to the stop release, because the presence of the laryngeal devoicing gesture, whose onset is synchronized with the onset of the supraglottal constriction and whose time course is preprogrammed (see above), ensures the required delay of voicing when the closure interval duration is similar to that which would be observed for prestressed voiced stops.⁷ Thus, VOT as such need not be controlled, at least in any continuous fashion; instead, control of the voicing distinction for stops (as well as other obstruents) is a simpler matter of whether or not the laryngeal devoicing gesture occurs. The resulting VOT differences are, in a sense, "byproducts" of the dichotomous laryngeal behavior.
⁵ This does not mean to imply that speakers are incapable of controlling the laryngeal devoicing gesture, only that in most cases the abduction-adduction sequence can be "run off" as a unified, integrated gesture. Evidence that this gesture is, under certain circumstances, controllable comes from French, in which the devoicing gesture for voiceless geminates is associated with a plateau in the glottal area function (Benguerel et al., 1978).
⁶ The most extreme example of this position is provided by Port & Rotunno (1979), wherein temporal implementation rules are invoked to explain context-determined variation in VOT (cf. Weismer, 1979). It should be pointed out here that the notion of VOT as an interval that needs to be "controlled" by a speaker is a relatively recent one. In the present view, Lisker and Abramson (1964, 1967) did not invoke the interval they labeled voice-onset time to discuss control of the speech mechanism, but rather as a measure which might (and did) distinguish between homorganic categories. In fact our conclusions are not dissimilar to a statement made by Lisker and Abramson to the effect that "... a relatively complicated acoustic output is dependent upon the rather simple matter of varying the area of the glottis." (1964, p. 415).
There are additional reasons for believing that speakers need not control VOT as they may control other phonetic intervals (e.g. vowels). Pisoni (1977) has presented data which demonstrate that when a temporal interval between two separate acoustic events is varied continuously, listeners are not sensitive to the continuous changes. Rather, listeners appear to respond to the temporal relations between two acoustic events in a more nearly binary fashion, their sensitivity allowing decisions of either simultaneous (i.e.
when the separation is less than 20-25 ms) or non-simultaneous (i.e. when the separation is greater than 20-25 ms) onset of the two events. Thus, according to Pisoni (1977, p. 1360), "... the only perceptual change to which the listener is sensitive appears to be the presence or absence of a discrete attribute rather than the magnitude of difference between events." In the perception of VOT in human (as opposed to machine) speech, the two events which define the VOT interval would be the stop burst and the onset of glottal pulsing for the following vowel. If Pisoni's findings can be generalized to human speech signals, then separations between the burst and glottal pulse of less than approximately 20-25 ms should sound qualitatively different from separations greater than 20-25 ms. We know from perceptual experiments (Abramson & Lisker, 1965, 1970), of course, that this qualitative difference will correspond to the voiced/voiceless distinction for a homorganic stop pair. But these constraints on our perceptual apparatus suggest that a speaker gains nothing by the ability to "control" VOT as other phonetic intervals may be controlled. All that is required of a speaker is the ability to produce VOTs greater or less than approximately 20 ms. As stated above, the laryngeal devoicing gesture seems to ensure the former, whereas the lack of this gesture generates the latter.
⁷ Closure durations for prestressed voiced and voiceless stops have been shown to be essentially the same (Lisker, 1972).
One might argue that the systematic relationships observed between VOT and stop place-of-articulation (Lisker & Abramson, 1964, 1967; Klatt, 1975; Weismer, 1979) suggest a potential perceptual cue to identification of place, which in turn may require speakers to exercise some control over VOT. This possibility can be rejected, however, since the place-conditioned, average differences in VOT are relatively small (~20 ms in sentential material: see Lisker & Abramson, 1967, p. 19), and the associated ranges show a good deal of overlap. The place effect can probably be explained on aerodynamic grounds, since the pressure drop across the glottis necessary for the initiation of vocal fold vibration may develop more slowly following the release of smaller volumes of air (i.e. with more back places of articulation).
The systematic changes in VOT which accompany such factors as stress (Lisker & Abramson, 1967) and phonetic context (Weismer, 1979; Port & Rotunno, 1979) can be attributed to changes in a particular characteristic of the laryngeal devoicing gesture, rather than a conscious control of the interval between the stop burst and first glottal pulse. Specifically, there is evidence which suggests that the maximum amplitude of the laryngeal devoicing gesture (i.e. the maximum glottal width) is greater in stressed as compared to unstressed stops⁸ (Benguerel et al., 1978). If the velocity of the abduction and adduction gestures is not affected greatly by stress (Benguerel et al.), this could explain the greater VOTs observed when stops are in the prestressed position.⁹ In addition, we have suggested previously (Weismer, 1979) that the longer VOTs observed before tense (as compared to lax) vowels or final voiced (as compared to voiceless) consonants may be due to small but systematic differences in stress which are tied to the duration of the vowel. Conceivably, these stress differences could be reflected in increased amplitude of the devoicing gesture, which would result in greater VOTs (see footnote 9). Finally, it is possible that stress-related changes in VOT could result from changes in the relative timing between the onsets of the devoicing gesture and closure interval. Lindqvist (1972) reports that the onset of the devoicing gesture for poststressed voiceless stops occurs up to 20 ms earlier relative to the closure interval when compared to voiceless stops in prestressed position. If the devoicing gesture is assumed to be otherwise constant (i.e.
with respect to maximum glottal width and abduction/adduction velocities), this timing difference would account for greater VOTs for stops in prestressed position. Lindqvist's findings on this matter, however, were not replicated by Benguerel et al. (1978), and additional research is needed to clarify the issue.
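The argument that VOT is a byproduct of a preprogrammed devoicing gesture can be made concrete with a deliberately simplified sketch. The Python code below is not a model proposed in the paper: it assumes that voicing resumes exactly when a fixed-duration gesture ends, that the gesture begins at closure onset unless a lead is specified, and that a stop produced without the gesture has a VOT of zero. The function names, the 120 ms closure, and the use of 185 ms (roughly the group means in Table I) as the gesture duration are illustrative choices only.

```python
from typing import Optional

# Approximate perceptual boundary, taken from the 20-25 ms figure cited in the text.
SHORT_LAG_BOUNDARY_MS = 20.0


def vot_from_gesture(gesture_ms: Optional[float],
                     closure_ms: float,
                     gesture_lead_ms: float = 0.0) -> float:
    """VOT implied by a fixed-duration devoicing gesture (simplified).

    gesture_ms: duration of the devoicing gesture, or None if no gesture occurs.
    closure_ms: duration of the supraglottal closure.
    gesture_lead_ms: how much earlier than closure onset the gesture begins.
    """
    if gesture_ms is None:
        # Simplifying assumption: without the gesture, voicing is not delayed.
        return 0.0
    return max(0.0, gesture_ms - gesture_lead_ms - closure_ms)


def lag_category(vot_ms: float) -> str:
    """Binary classification of VOT around the approximate 20 ms boundary."""
    return "long-lag" if vot_ms > SHORT_LAG_BOUNDARY_MS else "short-lag"


# Hypothetical closure of 120 ms with a 185 ms gesture.
prestressed = vot_from_gesture(185.0, 120.0)
# Same gesture starting 20 ms earlier relative to closure.
poststressed = vot_from_gesture(185.0, 120.0, gesture_lead_ms=20.0)
voiced = vot_from_gesture(None, 120.0)

for label, vot in [("prestressed", prestressed),
                   ("poststressed", poststressed),
                   ("voiced", voiced)]:
    print(label, vot, lag_category(vot))
```

Under these assumptions the only decision that affects the category is whether the gesture occurs at all; shifting its onset by 20 ms changes the VOT value by the same amount without moving it across the boundary.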
Control of the voicing distinction for fricatives As stated in the introduction, the voicing distinction for fricatives has not received much attention in the experimental literature. The relatively recent papers by Cole & Cooper (1975) and Haggard (1978), however, suggest some interesting aspects of fricative perception and production which bear on this issue. Cole & Cooper (197 5) were able to demonstrate that perception of the voicing distinction for fricatives in syllable-initial position could be • Lindqvist (1972) also seems to suggest this in her report (p. 17), but later (p. 20) apparently reverses this position. 9 This is certainly a more complicated issue than is suggested here, since stress differences should also result in systematic differences in stop-closure duration (Lisker, 1972; Umeda, 1977). It is reasonable, then, to expect stress-conditioned increases in glottal width to be accompanied by increases in closure duration (see Benguerel et al., p. 184 ). Thus the potential lengthening effect on VOT due to increased glottal width could be offset by the increased closure duration. What seems to be needed is an experiment that would explore the relationship between maximum glottal width, closure duration, and VOT in stressed and unstressed stops. It could be that stress-related differences in glottal width would produce greater temporal effects (i.e. on the overall duration of the devoicing gesture) than would differences in closure duration. The data of Coker & Umeda (1974) showing a positive relationship between closure duration and VOT are not necessarily relevant here, since that relationship includes boundary and position effects not considered in this paper.
cued by differences in fricative length alone; longer fricative durations were perceived as voiceless, shorter durations as voiced. Haggard's (1978) production data seem to support the notion that fricative duration might be a sufficient, and in some cases necessary, cue to voicing, since many of the phonologically voiced fricatives spoken by his subjects were phonetically devoiced. This devoicing does not imply that phonologically voiced fricatives are produced with the laryngeal devoicing gesture discussed in detail above, but rather with the vocal folds in an adducted or slightly abducted position (see Sawashima, Abramson, Cooper & Lisker, 1970; but cf. Lindqvist, 1972, p. 14). There are at least some data which suggest that speakers produce voiced and voiceless fricatives with substantially different durations, so that if phonologically voiced fricatives are phonetically devoiced, listeners will still have a reliable cue to fricative voicing. The single speaker employed by Umeda (1977) produced voiced fricative durations in the intervocalic, prestressed position that were approximately 40 ms shorter than voiceless fricatives in the same position. If this result could be demonstrated for a number of different speakers, it may be reasonable to state with certainty that speakers control the voicing distinction in fricatives by adjusting the relative duration of the supraglottal constriction.
It is interesting to speculate as to why voiced and voiceless fricatives might have durations which are so dissimilar. After all, since voiced fricatives are not always devoiced, the voicing opposition could be maintained in principle by the presence or absence of vocal fold vibration with equivalent durations of the supraglottal constriction.
If we assume, however, that these equivalent durations would be in the range of values reported by Umeda for voiced fricatives¹⁰ (68-72 ms), then production of voiceless fricatives with the laryngeal devoicing gesture described above should yield a substantial period of aspiration. This would follow from the present data, in which the voiceless interval (which, as measured here, is probably somewhat longer than the duration of an abduction-adduction sequence derived from fiberscopic data) is always in excess of 100 ms. If voiceless fricatives were of relatively short duration and were heavily aspirated, they might easily be mistaken in perception for voiceless stops. The greater length of voiceless fricatives may therefore be motivated by a need to "fit" the supraglottal constriction to the time course of the devoicing gesture. In this regard, we agree with Lisker (1974, p. 241) that it may not be appropriate to assume "... that the articulatory program for a consonant involving oral occlusion is independent of whether the consonant is voiceless, oral and voiced, or nasalized and voiced." The laryngeal time program may in fact dictate certain aspects of supraglottal timing, rather than the commonly assumed opposite case (Lisker, 1974).
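The arithmetic behind this argument can be sketched as follows. This is only an illustration, not part of the reported analysis: it assumes that the voiceless interval begins at the onset of the supraglottal constriction, so that any part of the interval outlasting the constriction appears as aspiration. The function name and the use of 100 ms as a lower bound are illustrative choices; the 68-72 ms range is the one attributed to Umeda above.

```python
# Illustrative arithmetic only; the function name is hypothetical and the
# alignment of devoicing onset with constriction onset is an assumption.

def implied_aspiration_ms(voiceless_interval_ms: float, constriction_ms: float) -> float:
    """Return the part of the voiceless interval that outlasts the constriction."""
    return max(0.0, voiceless_interval_ms - constriction_ms)

# Values taken from the text: the voiceless interval always exceeded 100 ms,
# and the voiced fricatives reported by Umeda lasted 68-72 ms.
VOICELESS_INTERVAL_FLOOR_MS = 100.0

for constriction_ms in (68.0, 72.0):
    aspiration = implied_aspiration_ms(VOICELESS_INTERVAL_FLOOR_MS, constriction_ms)
    print(f"constriction {constriction_ms:.0f} ms: aspiration of at least {aspiration:.0f} ms")
```

Under these assumptions a voiceless fricative with the constriction duration of a voiced fricative would carry roughly 30 ms or more of aspiration, which is the "substantial period of aspiration" referred to above.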
10 We chose to assume that all fricatives might have the durations of voiced fricatives, and not voiceless fricatives, since in Umeda's data the duration of the voiced members is very similar to the duration of another class of obstruents, that is, voiced and voiceless stops.
Conclusion
We have presented data in this paper which indicate that the time course of the devoicing gesture for obstruents is independent of manner and place of articulation. Further consideration of fiberscopic and electromyographic data suggests that the actual devoicing gesture is similar for stops and fricatives, and that the overall time course of the abduction and adduction sequence is probably preprogrammed. Based on those considerations we would describe the gesture as "ballistic" in the sense used by Rothenberg (1968, pp. 47-48). These considerations are subsequently brought to bear on theoretical problems concerning control of obstruent voicing. In the case of stops it is argued that VOT need not be controlled in the
sense suggested by certain authors. Rather, the presence or absence of the devoicing gesture during the closure interval will ensure post-release aspiration or the lack thereof (see Kim, 1970). Control of the voicing distinction for fricatives seems to be demonstrated by different durations of the supraglottal constriction. This difference in the duration of voiced and voiceless fricatives seems to be motivated by certain demands of the laryngeal time program. Thus, although the voicing distinction for English stops is dependent on the presence or absence of a substantial period of aspiration, whereas that for fricatives is tied to the relative length of the supraglottal constriction, both phenomena are ultimately the result of the same laryngeal devoicing gesture.
References
Abramson, A. S. & Lisker, L. (1965). Voice onset time in stop consonants: Acoustic analysis and synthesis. In Proceedings of the Fifth International Congress on Acoustics. (Commins, D. E., Ed.) Liege: Imp. G. Thone, A51.
Abramson, A. S. & Lisker, L. (1970). Discriminability along the voicing continuum: Cross-language tests. Proceedings of the Sixth International Congress of Phonetic Sciences, Prague, 1967. Pp. 569-573. Prague: Academia.
Benguerel, A.-P., Hirose, H., Sawashima, M. & Ushijima, T. (1978). Laryngeal control in French stop production: A fiberscopic, acoustic and electromyographic study. Folia Phoniatrica, 30, 175-198.
Cole, R. A. & Cooper, W. E. (1975). Perception of voicing in English affricates and fricatives. Journal of the Acoustical Society of America, 58, 1280-1287.
Cooper, W. E. (1977). The development of speech timing. In Language Development and Neurological Theory. (Segalowitz, S. J. & Gruber, F. A., Eds). Pp. 357-373. New York: Academic Press.
Dunn, O. J. (1961). Multiple comparisons among means. Journal of the American Statistical Association, 56, 52-64.
Eilers, R. E. & Minifie, F. D. (1975). Fricative discrimination in early infancy. Journal of Speech and Hearing Research, 18, 158-167.
Haggard, M. (1978). The devoicing of voiced fricatives. Journal of Phonetics, 6, 95-102.
Hirose, H. (1977a). Laryngeal adjustments in consonant production. Phonetica, 34, 289-294.
Hirose, H. (1977b). Electromyography of the larynx and other speech organs. In Dynamic Aspects of Speech Production. (Sawashima, M. & Cooper, F. S., Eds). Tokyo: University of Tokyo Press.
Kent, R. D. & Minifie, F. D. (1977). Coarticulation in recent speech production models. Journal of Phonetics, 5, 115-133.
Kewley-Port, D. & Preston, M. S. (1974). Early apical stop production: A voice onset time analysis. Journal of Phonetics, 2, 195-210.
Kim, C.-W. (1970). A theory of aspiration. Phonetica, 21, 107-116.
Kirk, R. E. (1968). Experimental Design: Procedures for the Behavioral Sciences. Belmont, California: Brooks/Cole Co.
Klatt, D. H. (1975). Voice-onset time, frication, and aspiration in word-initial consonant clusters. Journal of Speech and Hearing Research, 18, 686-706.
Klee, T. M., Weismer, G. & Ingrisano, D. R. (1976). Laryngeal timing constraints in plosive-vowel and fricative-vowel syllables, or: is VOT really the best measure of glottal-supraglottal timing? Paper presented at the 92nd meeting of the Acoustical Society of America, San Diego, CA.
Lieberman, P. (1976). Phonetic features and physiology. Journal of Phonetics, 4, 91-112.
Lindqvist, J. (1972). Laryngeal articulations studied on Swedish subjects. Quarterly Progress and Status Report, Speech Transmission Laboratory, Research Institute of Technology, Stockholm 2-3, 10-27.
Lisker, L. (1972). Stop duration and voicing in English. In Papers in Linguistics and Phonetics to the Memory of Pierre Delattre. (Valdman, A., Ed.). The Hague: Mouton.
Lisker, L. (1974). On "explaining" vowel duration variation. Glossa, 8, 233-246.
Lisker, L. & Abramson, A. S. (1964). A cross-language study of voicing in initial stops: acoustical measurement. Word, 20, 384-422.
Lisker, L. & Abramson, A. S. (1967). Some effects of context on voice onset time in English stops. Language and Speech, 10, 1-28.
Massaro, D. W. & Cohen, M. M. (1976). The contribution of fundamental frequency and voice onset time to the /zi/-/si/ distinction. Journal of the Acoustical Society of America, 60, 704-718.
Pisoni, D. B. (1977). Identification and discrimination of the relative onset time of two component tones: Implications for voicing perception in stops. Journal of the Acoustical Society of America, 61, 1352-1361.
Port, R. F. & Rotunno, R. (1979). Relations between voice-onset-time and vowel duration. Journal of the Acoustical Society of America, 66, 654-662.
Rothenberg, M. (1968). The Breath-stream Dynamics of Simple-released-plosive Production. Bibliotheca Phonetica, 6.
Ryan, T. A. (1959). Multiple comparisons in psychological research. Psychological Bulletin, 56, 24-47.
Sawashima, M. (1970). Glottal adjustments for English obstruents. Status Report Speech Research: Haskins Laboratories, SR-21/22, 187-200.
Sawashima, M. (1977). Fiberoptic observation of the larynx and other speech organs. In Dynamic Aspects of Speech Production. (Sawashima, M. & Cooper, F. S., Eds). Tokyo: University of Tokyo Press.
Sawashima, M., Abramson, A. S., Cooper, F. S. & Lisker, L. (1970). Observing laryngeal adjustments during running speech by use of a fiberoptics system. Phonetica, 22, 193-201.
Umeda, N. (1977). Consonant duration in American English. Journal of the Acoustical Society of America, 61, 846-858.
Umeda, N. & Coker, C. H. (1974). Allophonic variation in American English. Journal of Phonetics, 2, 1-5.
Weismer, G. (1979). Sensitivity of voice-onset time (VOT) measures to certain segmental features in speech production. Journal of Phonetics, 7, 197-204.