Vowel length and vowel transition: cues to [±voice] in post-vocalic stops

Journal of Phonetics (1983) 11, 407-412

Thomas Walsh and Frank Parker
Louisiana State University, Interdepartmental Linguistics Program, Allen Hall, Baton Rouge, Louisiana 70803, U.S.A.

Received 20th June 1983

Abstract

Raphael (1972) demonstrates that for his synthetic tokens, listeners utilize vowel length as a cue to [±voice] in a following stop, in a continuous manner. That is, there is a range of vowel durations above which listeners will predict a following [+voice] stop and below which they will predict a following [-voice] stop. However, within this range listeners' predictions will vary depending on the spectral configuration of the vowel: a falling F1 cues a [+voice] stop; a level F1 cues a [-voice] stop. The present study corroborates the findings of Raphael (1972) using real speech tokens, although the critical durations are shifted downward 70-80 ms. It is concluded that since the vowel transition serves as the operative cue during the intermediate range of vowel durations, then the only possible function of vowel length as a cue is either to reinforce or to override the transition cue at extremely long and short vowel durations. In other words, vowel length is (at best) a redundant cue or (at worst) a misleading one.

Raphael (1972) presents compelling evidence that vowel length can function as a cue to phonological voicing in a following stop in English. That is, the longer the vowel, the more likely a speaker is to identify a following stop as [+voice].¹ However, half of Raphael's synthetic stimuli also contained a vocalic F1 transition, which appeared to function as the primary voicing cue in vowels of intermediate duration (i.e. approximately 200-280 ms in Raphael's experiment). For example, at a vowel duration of approximately 245 ms, 80% of the stimuli containing the F1 transition were identified as ending in a [+voice] stop and 80% of those without the F1 transition were identified as ending in a [-voice] stop. Since the two sets of stimuli were identical except for the presence or absence of the F1 transition, it is clear that in these cases the speakers could be responding only to the vowel transition itself.

Data from other studies using real speech as stimuli have supported the position that, for normal adult speakers, the primary cue to [±voice] in a post-vocalic stop is indeed the spectral characteristics of the vowel transition. (See, for example, Revoile, Pickett, Holden & Talkin, 1982.²) These studies, however, were based on artificial modification of vowel length. For example, O'Kane (1978) varied the length of the vowel by incrementally truncating the vowel from right to left. That is, he cut away the spectral properties of the vowel transition at the same time he reduced vowel length. If he had shortened the vowel from the left rather than from the right, this would have had the effect of shortening the vowel without removing the transition into the post-vocalic stop. Thus, given the limitations of synthetic and artificially modified speech, it is still not certain what acoustic characteristics of real speech serve to cue [±voice] in post-vocalic stops in the absence of release and closure.

¹ The literature, we feel, evidences a great deal of confusion regarding the distinction between phonological and physiological voicing, in particular, and between psychological and physical phenomena, in general. For discussion, see Parker (1977), Repp (1981), and Walsh & Parker (1981b).
² In addition, the data presented in O'Kane (1978), Raphael (1981), and Krause (1982) can be interpreted as supporting this hypothesis, although none of the authors argue specifically that the vowel transition contains the primary cue to [±voice] in post-vocalic stops. For a reinterpretation of O'Kane's data, see Walsh & Parker (1981a) and Revoile et al. (1982). For a more general discussion of these studies, see Walsh and Parker (in preparation).

0095-4470/83/040407+06 $03.00/0
© 1983 Academic Press Inc. (London) Ltd.

In order to investigate the relationship between vowel length and vowel transition as cues to [±voice] in real speech, we designed a perceptual experiment in which there was considerable overlap of vowel durations before the final stops /p/ and /b/. We recorded a single speaker reading the nonsense syllables /gɛp/ and /gɛb/ in two different frames. First, each syllable was read five times in groups of three, e.g. /gɛp/-/gɛp/-/gɛp/. Then each syllable was read five times in the frame "Take a ___ a day". This procedure yielded 20 tokens of each syllable. Since the vowels in the group of three repeated syllables were, on the average, longer than those in the same syllable in the sentence frame, it was possible to obtain a considerable range of overlapping vowel durations before /p/ and /b/. Spectrograms were made of each token, and by careful selection we were able to construct a perception tape containing ten vowels before /p/ ranging from 90 to 160 ms and ten vowels before /b/ ranging from 120 to 265 ms, without any artificial adjustment to vowel length. This resulted in an overlap of 40 ms with three pairs of vowels having identical durations. (See Table I for a complete list.) By means of the gate on the spectrograph we then isolated the /gɛ/ portion of each token.
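The overlap just described can be recomputed from the durations listed in Table I. The short Python sketch below is an illustration only: the helper names overlap_range and shared_durations are hypothetical and are not part of the study's procedure.

```python
# Vowel durations (ms) of the 20 test stimuli, as listed in Table I.
BEFORE_P = [90, 95, 100, 100, 105, 125, 135, 140, 150, 160]
BEFORE_B = [120, 125, 130, 135, 140, 195, 220, 235, 255, 265]


def overlap_range(voiceless, voiced):
    """Return (low, high) of the duration range shared by both series, or None."""
    low = max(min(voiceless), min(voiced))
    high = min(max(voiceless), max(voiced))
    return (low, high) if low <= high else None


def shared_durations(voiceless, voiced):
    """Return the sorted durations that occur in both series."""
    return sorted(set(voiceless) & set(voiced))


low, high = overlap_range(BEFORE_P, BEFORE_B)
print(f"overlap: {low}-{high} ms (width {high - low} ms)")
print("identical durations:", shared_durations(BEFORE_P, BEFORE_B))
```

With the Table I values this gives an overlap of 120-160 ms (40 ms wide) and the three shared durations 125, 135 and 140 ms, in agreement with the description above.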
The closure and release (if any) of the final stop (/p/-/b/) were removed from each token, so that the stimuli consisted solely of 20 tokens of /gɛ/. With this set of stimuli we would be able to compare previous findings with those obtained from using unmodified vowels drawn from real speech. The tape was played to 57 undergraduates who were forced to choose between /gɛp/ and /gɛb/ upon presentation of each stimulus. (Since the subjects were linguistically naive, the possible responses were represented on the answer sheets as gepp and gebb, respectively.)

The results of the perception experiment are given in Table II. In general, our results confirm those of Raphael, although the crossover points (i.e. the points at which a stimulus is judged [+voice] x% of the time) appear to be shifted downward in real speech. In other words, the percentage of [+voice] responses tends to vary directly with vowel length; however, the vowel length required to elicit a given percentage of [+voice] responses appears to be 70-80 ms shorter in real speech than in Raphael's synthetic tokens.

First, there was almost 100% correct identification of the stimuli in the [-voice] series. This was not surprising, since Raphael obtained an almost flat response between 90 and 100% for all of his [-voice] (i.e. no F1 transition) labial tokens of less than approximately 225 ms. Likewise, there was almost 100% correct identification of the stimuli in the [+voice] series at or above a vowel duration of 195 ms (i.e. above the range of overlap).

The most interesting results, however, as was the case with Raphael's experiment, are those obtained for the stimuli within the range of overlap (i.e. stimuli with vowel durations between 120 and 160 ms). At 135 and 140 ms, the tokens originally containing a [-voice] stop received 93 and 100% correct responses, respectively.
At the same durations, however, the tokens originally containing a [+voice] stop received only 53 and 54% correct responses, respectively, which are tantamount to guesses. It is clear, as was the case with Raphael's synthetic speech experiment, that vowel duration cannot explain these different responses to the [+voice] and [-voice] stimuli, which contain vowels of identical duration. Vowel durations of 135-140 ms are well below the average duration of vowels before [+voice] consonants, according to previous studies. For example, House & Fairbanks (1953, p. 107) found the average duration of vowels before /b/ to be 237 ms. Thus, if vowel duration alone serves as a cue to [±voice] in a following stop, vowels of 135-140 ms should have unambiguously cued a following /p/ in our study. The fact that the two tokens from the [+voice] series received responses no better than a guess indicates that vowel duration is not the only operative cue in these cases.

Table I. Duration of vowels in the test stimuli

Duration of vowels before /p/ (ms): 90, 95, 100 (2 tokens), 105, 125, 135, 140, 150, 160
Duration of vowels before /b/ (ms): 120, 125, 130, 135, 140, 195, 220, 235, 255, 265

The explanation for the different responses to the two sets of stimuli with identical vowel durations appears to be that there is a conflict of cues in the [+voice] tokens. Examination of the spectrograms of both the [+voice] and [-voice] tokens at 135 and 140 ms clearly shows a falling F1 transition before /b/ but no such transition before /p/. (The first formants of all four tokens are represented schematically in Fig. 1. These schematics were obtained by tracking the midpoint of F1 throughout each of the four tokens.) In the latter case, the extremely short vowel and the absence of a falling F1 transition constitute mutually compatible cues; that is, they both signal [-voice] in a following stop. Thus, the subjects, predictably, agree on /p/ almost 100% of the time. In the former case, however, the short vowel and the falling F1 constitute contradictory cues. The vowel length signals [-voice] in the following stop, but the falling F1 signals [+voice]. Thus, the subjects, again predictably, are forced to guess.

Table II. Percentage of correct responses. (A dash indicates no token at that vowel duration.)

Duration of vowels (ms)   % correct responses to /gɛ(p)/   % correct responses to /gɛ(b)/
90                        84                               -
95                        95                               -
100/100 (2 tokens)        72/100                           -
105                       98                               -
120                       -                                11
125                       96                               23
130                       -                                18
135                       93                               53
140                       100                              54
150                       81                               -
160                       81                               -
195                       -                                96
220                       -                                98
235                       -                                100
255                       -                                98
265                       -                                98

Figure 1. Schematic representation of F1 in [±voice] tokens at 135 and 140 ms. [ɛ] in /gɛ(p)/ and [ɛ] in /gɛ(b)/ are distinguished by line style.

As vowel length drops below this level, however, it appears to become the primary cue. Note, for example, that the shortest [+voice] token, 120 ms, received only 11% correct responses even though it contains a falling F1 transition (Table II). This fact suggests that at extreme durations vowel length is a sufficient cue to [±voice] in a following stop. However, during some mid-range of durations, which is apparently lower in the case of real speech than for Raphael's synthetic stimuli, the vowel transition is the decisive cue. Thus, the relationship between vowel length and vowel transition as cues to [±voice] in a following stop seems to be complementary: vowel length is the primary cue in tokens containing extremely long and short vowels, whereas the vowel transition is the primary cue in tokens containing vowels of intermediate length. This relationship is presented schematically in Fig. 2, for both Raphael's data and ours.

If, in fact, this is an accurate representation of the relationship between these two cues, an interesting consequence follows, namely, that vowel length as a cue in real speech is either redundant or misleading. That is, if it is the case that, ideally, all vowels preceding a [+voice] stop contain a falling F1 transition and all vowels preceding a [-voice] stop do not contain a falling F1, then a listener will be able to identify correctly the voicing characteristics of a post-vocalic stop on the basis of the vowel transition alone. However, if vowel length can override the transition cue in tokens containing vowels of extreme duration, then it is either redundant (when it signals the same information as the transition) or misleading (when it signals information contradictory to the transition). Consider, for example, a token containing an extremely short vowel and no F1 transition. In this case, vowel length is redundant (at least in theory) since it merely reinforces the transition cue. On the other hand, consider a token containing an extremely short vowel and a falling F1. In this case, the vowel length cue will override the transition cue and cause the listener to misperceive the signal.
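The matched-duration comparison underlying this argument can be expressed as simple arithmetic. The Python sketch below assumes, as the forced-choice design implies, that every response was either gepp or gebb, so that percent correct converts directly into percent [+voice] responses. The function names are illustrative, and the scores are those quoted in the text for the tokens at 135 and 140 ms.

```python
# Percent-correct scores reported in the text for tokens of identical vowel
# duration (ms). "p" = token originally ending in /p/, "b" = originally /b/.
CORRECT = {
    135: {"p": 93, "b": 53},
    140: {"p": 100, "b": 54},
}


def percent_voiced(percent_correct, original_stop):
    """Convert percent correct into percent [+voice] (gebb) responses.

    In a two-way forced choice, a correct answer to a /b/ token is a
    [+voice] response, whereas a correct answer to a /p/ token is not.
    """
    if original_stop == "b":
        return percent_correct
    return 100 - percent_correct


def transition_effect(scores):
    """Difference in percent [+voice] responses between the /b/ and /p/ token."""
    return percent_voiced(scores["b"], "b") - percent_voiced(scores["p"], "p")


for duration, scores in sorted(CORRECT.items()):
    print(duration, "ms:", transition_effect(scores), "points")
```

At 135 and 140 ms the token with the falling F1 transition drew 46 and 54 percentage points more [+voice] responses, respectively, than the token of identical duration without it. Since duration is held constant within each pair, this difference cannot be attributed to vowel length.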

Figure 2. Relationship between vowel length and vowel transition as cues to [±voice] in post-vocalic stops. Vowel length and vowel transition are distinguished by line style. Duration scale for synthetic stimuli: 200 ms, 240 ms, 280 ms; for natural stimuli: 120 ms, 160 ms, 200 ms. Synthetic stimuli from Raphael (1972), natural stimuli from Walsh & Parker (present study). The 200 ms value for the natural stimuli only represents an inference since none of our [-voice] tokens were long enough to test the upper end of the scale.
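The complementary relationship in Fig. 2 can be idealized as a three-way decision rule. The Python sketch below is a categorical simplification and not a model proposed in this study. The boundary values are read off Fig. 2 (the natural-speech upper bound being an inference, as the caption notes). The names CueRange, predict_voiced and role_of_length are hypothetical, and the label "inoperative" for the intermediate range is introduced here purely for convenience. The rule does not reproduce the graded responses of Table II, such as the near-chance scores at 135-140 ms.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class CueRange:
    """Intermediate duration range (ms) in which the transition decides."""
    low: float
    high: float


# Assumed boundaries taken from the scale values in Fig. 2.
SYNTHETIC = CueRange(low=200, high=280)
NATURAL = CueRange(low=120, high=200)  # upper bound is an inference


def predict_voiced(duration_ms, falling_f1, cue_range):
    """Return (predicted_voiced, deciding_cue) for a post-vocalic stop."""
    if duration_ms <= cue_range.low:
        return False, "length"       # very short vowel: heard as [-voice]
    if duration_ms >= cue_range.high:
        return True, "length"        # very long vowel: heard as [+voice]
    return falling_f1, "transition"  # intermediate: falling F1 cues [+voice]


def role_of_length(duration_ms, falling_f1, cue_range):
    """Classify vowel length as 'redundant', 'misleading' or 'inoperative'."""
    predicted, cue = predict_voiced(duration_ms, falling_f1, cue_range)
    if cue == "transition":
        return "inoperative"
    # Length decided the outcome: it either agrees with the transition cue
    # (redundant) or contradicts it (misleading).
    return "redundant" if predicted == falling_f1 else "misleading"


print(role_of_length(100, False, NATURAL))  # short vowel, no falling F1
print(role_of_length(120, True, NATURAL))   # short vowel, falling F1
print(role_of_length(160, True, NATURAL))   # intermediate vowel
```

Under these assumed boundaries, the three calls classify vowel length as redundant, misleading and inoperative, respectively. The second case corresponds to the 120 ms [+voice] token, which listeners overwhelmingly heard as ending in /p/ despite its falling F1 transition.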


We can summarize our conclusions as follows. First, as Raphael's study demonstrates, vowel length is perceived as a cue to [±voice] in a following stop in a continuous manner so that, for a given spectral configuration, there is a range of vowel duration above which listeners will predict a following [+voice] stop and below which they will predict a following [-voice] stop. Second, within this range listeners' predictions will vary depending on the spectral configuration of the vowel. In particular, within this intermediate range, listeners will predict a following [+voice] stop if the vowel terminates in a falling F1 transition and a [-voice] stop if the vowel does not terminate in a falling F1. Third, given this state of affairs, we must conclude that vowel length as a cue to [±voice] in post-vocalic stops in English is (at best) redundant and (at worst) misleading. Since the vowel transition serves as the defining cue during some intermediate range of vowel length, then the only possible function of vowel length as a cue is either to reinforce or to override the transition cue at extremely long and short durations. There appears to be no range during which vowel length is a necessary and sufficient accurate cue to [±voice] in a following stop.

References

House, A. S. & Fairbanks, G. (1953). The influence of consonant environment upon the secondary acoustical characteristics of vowels. Journal of the Acoustical Society of America, 25, 105-113.
Krause, S. E. (1982). Vowel duration as a perceptual cue to postvocalic consonant voicing in young children and adults. Journal of the Acoustical Society of America, 71, 990-995.
O'Kane, D. (1978). Manner of vowel termination as a perceptual cue to the voicing status of postvocalic stop consonants. Journal of Phonetics, 6, 311-318.
Parker, F. (1977). Distinctive features and acoustic cues. Journal of the Acoustical Society of America, 62, 1051-1054.
Raphael, L. J. (1972). Preceding vowel duration as a cue to the perception of voicing in word-final consonants in American English. Journal of the Acoustical Society of America, 51, 1296-1303.
Raphael, L. J. (1981). Durations and contexts as cues to word-final cognate opposition in English. Phonetica, 38, 126-147.
Repp, B. H. (1981). On levels of description in speech research. Journal of the Acoustical Society of America, 69, 1462-1464.
Revoile, S., Pickett, J. M., Holden, L. D. & Talkin, D. (1982). Acoustic cues to final stop voicing for impaired- and normal-hearing listeners. Journal of the Acoustical Society of America, 72, 1145-1154.
Walsh, T. & Parker, F. (1981a). Vowel termination as a cue to voicing in post-vocalic stops. Journal of Phonetics, 9, 105-108.
Walsh, T. & Parker, F. (1981b). Vowel length and 'voicing' in a following consonant. Journal of Phonetics, 9, 305-308.