Journal of Phonetics (1980) 8, 469-474
Labeling, discrimination and repetition of stimuli with level and changing fundamental frequency
Ilse Lehiste and Linda Shockey
Department of Linguistics, Ohio State University, Columbus, Ohio 43210, U.S.A.
Received 25th May 1979
Abstract:
Two experiments were conducted to investigate further certain relationships between fundamental frequency and the perception of duration. The first sought to determine whether categorical labeling functions of stimuli with level and changing F0 are accompanied by categorical discrimination functions; the second tested whether repetition of stimuli is categorical or continuous. Results of listening tests indicate that the perception of linguistically significant suprasegmental continua is generally characterized by categorical perception when the subjects are presented with a labeling task, but by non-categorical perception in a discrimination task. Results of production tasks showed that while listeners assign labels to a suprasegmental continuum in a categorical fashion, they are nevertheless capable of producing phonetically different durations within the same durational category. It is suggested that this may constitute one of the characteristics of suprasegmental features in general.
0095-4470/80/040469 + 06 $02.00/0 © 1980 Academic Press Inc. (London) Ltd.
Introduction
The present paper represents another step in a series of investigations into the relationship between fundamental frequency and the perception of duration (Lehiste, 1975; Lehiste, 1976; Pisoni, 1976; Rosen, 1976; Wang, Lehiste, Chang & Darnovsky, 1976; Lehiste, 1977; Rosen, 1977a, b; Derr & Massaro, 1978; Shockey & Lehiste, 1978). The experiments reported in Lehiste (1975, 1976) showed that of two stimuli of equal duration, the stimulus with fundamental frequency change is perceived as longer than the stimulus produced on a monotone. Pisoni (1976) replicated and extended these findings. Similar results were obtained by Wang et al. (1976) in a further series of experiments. On the other hand, Rosen (1976, 1977a, 1977b) found the effects of fundamental frequency change on the perception of duration to be inconsistent. He suggested further that these effects were probably too small to be of much importance in speech.
Lehiste (1977) tested whether the effect of lengthening produced by changing F0 is carried over into the perception of voicing in the final consonants of English monosyllabic words. It is generally known that lengthening of the syllable nucleus is a sufficient cue for signalling that the consonant following the syllable nucleus is voiced (Denes, 1954). Lehiste hypothesized that since changing F0 is perceived as increased length, syllable nuclei of a certain intermediate duration may be perceived as being followed by a voiced consonant when the syllable nucleus carries a changing F0 contour, and as being followed by a voiceless final consonant when the syllable nucleus is produced on a monotone.
Test stimuli used in the experiment reported in Lehiste (1977), synthesized by Shockey on the OVE synthesizer at Haskins Laboratories, consisted of durational continua whose extreme values represented the word pairs bad-bat and bead-beat. There were ten durations for each set, ranging in 24-ms steps from 396 to 180 ms. The initial and final consonants were synthesized by formant transitions alone; the final consonant was not released. Lengthening and shortening took place during the steady state of the vowel. All stimuli in both sets were synthesized twice: on a monotone at 80 Hz and with an F0 falling from 80 to 60 Hz. A randomized test tape was presented to 25 subjects, whose task was to identify the stimulus as either bad or bat in one case or as bead or beat in the second case.
Figure 1
Identifications of synthesized stimuli as bad or bat depending on the duration of the stimulus and its fundamental frequency contour. Solid line, level; dashed line, falling. Horizontal axis: stimulus duration (ms).
Figure 1 summarizes the results obtained in the bad/bat set. Similar results were obtained with the bead/beat set. Labeling functions had sharp crossovers in all cases, and there was a systematic (and statistically significant) difference in the crossover points depending on the F0 contour. As may be seen from the figure, the listeners' perceptions shifted from bad to bat between stimuli 5 and 6 in the monotone case, and between stimuli 8 and 9 with falling F0. The difference between the crossover points amounts to 72 ms. In the bead/beat pair, the crossover point for monotone stimuli was likewise between stimuli 5 and 6; it was between stimuli 7 and 8 in the set synthesized with falling fundamental frequency.
Derr & Massaro (1978) described a series of experiments designed to test two models accounting for the integration of vowel duration, fricative duration, and F0 contour as cues to the voicing of final fricatives. While Derr and Massaro were primarily concerned with the question of whether the effect of F0 contour on perception is direct or indirect, their results included the observation that a changing F0 contour is associated with significantly more identifications of the final consonant as voiced, thus replicating the findings of Lehiste (1977).
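The 72 ms figure for the bad/bat set can be checked by laying out the continuum explicitly. The following is a minimal sketch, not the authors' procedure. It assumes that the stimuli were numbered from the longest (396 ms) to the shortest (180 ms), which is consistent with a shift from bad to bat as the stimulus number increases, and it takes the midpoint between two adjacent stimuli as the boundary estimate. The names `durations` and `boundary_midpoint` are illustrative only.

```python
# Durational continuum described for Lehiste (1977):
# ten stimuli in 24-ms steps from 396 ms down to 180 ms.
STEP_MS = 24
# Assumption: stimulus no. 1 is the longest, no. 10 the shortest.
durations = [396 - STEP_MS * i for i in range(10)]


def boundary_midpoint(n_before, n_after):
    """Midpoint (ms) between two adjacent stimuli, numbered from 1.

    Using the midpoint as the boundary estimate is an assumption
    made for illustration.
    """
    return (durations[n_before - 1] + durations[n_after - 1]) / 2


level_boundary = boundary_midpoint(5, 6)    # monotone: between stimuli 5 and 6
falling_boundary = boundary_midpoint(8, 9)  # falling F0: between stimuli 8 and 9

print(durations)
print(level_boundary, falling_boundary, level_boundary - falling_boundary)
```

Under these assumptions the boundaries fall at 288 ms (level) and 216 ms (falling), that is, three 24-ms steps apart, which gives the 72 ms quoted above.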
Purpose and procedure
The present paper extends the study reported in Lehiste (1977) in two directions. The first experiment to be reported here was designed to test whether categorical labeling functions are accompanied by categorical discrimination functions, and whether the F0 contour plays a part in discrimination as it does in labeling. The purpose of the second experiment was to find out whether the kind of trade-off between F0 and duration that had been earlier found in labeling is also part of speakers' production strategies; whether production is as categorical as labeling; and whether the average durations of the two members of a word pair produced by listeners in response to synthesized stimuli differ depending on the F0 contour applied to the stimulus.
A set of 11 synthesized words spanning the bead-beat continuum was used in the experiments. The stimuli, synthesized by Shockey at Haskins Laboratories on the OVE synthesizer, consisted entirely of formant patterns. Three formants were spaced so as to simulate the vowel [i], and made to sound like beat or bead by supplying them with a bilabial-type transition at the beginning and an alveolar-type transition at the end. No voiced closures or releases were synthesized. The durations of transitions remained constant throughout, while the steady-state vowel length was changed in 20-ms steps in such a way that the total duration of the stimuli ranged in ten steps from 150 to 350 ms. The stimuli were synthesized twice, once with a level F0 pattern at 100 Hz, and once with an F0 contour falling from 100 to 80 Hz. There were thus 22 stimuli included in the tests.
The 16 subjects, volunteers who were enrolled in a beginning linguistics course at Ohio State University, were first presented with each of the 22 stimuli in randomized order and asked to judge whether they heard beat or bead. This was a replication of one of the experiments reported above.
As before, the labeling was found to be categorical; the crossover from beat to bead occurred at an approximately 45 ms shorter duration with falling F0. The shift from the perception of beat to the perception of bead occurred between the third and fourth stimuli (190 and 210 ms respectively) for the stimuli with falling F0, and between the sixth and seventh (250 and 270 ms) for the stimuli with level F0. Figure 2 displays these results in graphic form.
Figure 2
Identification of synthesized stimuli as bead or beat depending on the duration of the stimulus and its fundamental frequency contour. Solid line, level; dashed line, falling.
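One common way to locate a labeling crossover is to interpolate linearly between the two stimuli that bracket the 50% point. The sketch below illustrates that idea only; the source does not say how its crossover values were computed. The response proportions are hypothetical numbers chosen so that the crossovers fall in the intervals named in the text, and `crossover_ms` is an illustrative helper, not part of any library.

```python
def crossover_ms(durations, prop_bead):
    """Duration (ms) at which the proportion of 'bead' responses
    first reaches 0.5, found by linear interpolation between the
    two bracketing stimuli. Returns None if 0.5 is never crossed."""
    points = list(zip(durations, prop_bead))
    for (d0, p0), (d1, p1) in zip(points, points[1:]):
        if p0 < 0.5 <= p1:
            return d0 + (0.5 - p0) * (d1 - d0) / (p1 - p0)
    return None


# The 11-step bead-beat continuum: 150 to 350 ms in 20-ms steps.
durations = list(range(150, 351, 20))

# HYPOTHETICAL proportions of 'bead' responses (not data from the paper).
level = [0.0, 0.0, 0.05, 0.1, 0.2, 0.4, 0.7, 0.9, 1.0, 1.0, 1.0]
falling = [0.0, 0.1, 0.3, 0.6, 0.8, 0.9, 1.0, 1.0, 1.0, 1.0, 1.0]

level_x = crossover_ms(durations, level)
falling_x = crossover_ms(durations, falling)
print(level_x, falling_x, level_x - falling_x)
```

An interpolated crossover can fall anywhere inside the bracketing interval, so the difference between two such crossovers need not equal the 60 ms that separates the interval midpoints (200 and 260 ms).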
There was a great deal of individual variation in crossovers: individual crossover points for level F0 ranged from just above 210 ms to just above 290 ms (from between stimuli 4 and 5 to between stimuli 8 and 9). For falling F0 they ranged from just above 190 ms to just above 250 ms (from between stimuli 3 and 4 to between stimuli 6 and 7). There was also much individual variation in the difference between level and falling crossovers. One subject behaved contrary to the average and had a 20-ms earlier crossover point for level than for falling F0. Two subjects showed no difference in crossover point. The others showed differences ranging from 20 to 100 ms. The standard deviation of the difference in crossover points was 33.5 ms.
The listeners were then presented with the stimuli in pairs and asked to judge whether the members of the pair were the same or different. Both members of the pair had the same F0 pattern. The differences in duration were either 20, 40, or 60 ms. The order of presentation within the pairs was counterbalanced, and the pairs were randomized. Figure 3 presents the results as separate graphs for one-step, two-step and three-step comparisons.
Figure 3
One-step, two-step and three-step discrimination functions for pairs of synthesized stimuli. (a) 20-ms difference; (b) 40-ms difference; (c) 60-ms difference. Solid line, level; dashed line, falling. Horizontal axis labels: stimulus pairs (ms), from 150/210 to 290/350.
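The pair construction described for the discrimination test can be sketched as follows. This is an illustration under stated assumptions, not a reconstruction of the actual test tape: it assumes that every possible one-, two- and three-step pair was used, that each pair appeared in both orders, and that no identical pairs were included. The source does not report the number of pairs, and `build_pairs` and its `seed` parameter are hypothetical names.

```python
import random
from itertools import product

# The 11-step continuum: 150 to 350 ms in 20-ms steps.
durations = list(range(150, 351, 20))
contours = ["level", "falling"]


def build_pairs(seed=0):
    """Return a shuffled list of (contour, first_ms, second_ms) trials.

    Assumptions (not stated in the source): all one-, two- and
    three-step pairs are used, each in both presentation orders.
    """
    pairs = []
    for contour, steps in product(contours, (1, 2, 3)):
        for i in range(len(durations) - steps):
            short, long_ = durations[i], durations[i + steps]
            pairs.append((contour, short, long_))  # shorter member first
            pairs.append((contour, long_, short))  # longer member first
    random.Random(seed).shuffle(pairs)  # randomize presentation order
    return pairs


pairs = build_pairs()
print(len(pairs))
print(pairs[:3])
```

Both members of every pair share one F0 pattern, as in the experiment, and the duration difference is always 20, 40 or 60 ms. Under the assumptions above there are eight three-step pairs per contour, running from 150/210 ms to 290/350 ms.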
As is apparent from the figure, the subjects had a strong tendency to hear both members of the stimulus pairs as the same. Where peaks occur in the discrimination functions, they usually reflect a move from a "definitely same" to a "not sure" judgment. There are few "different" judgments at better than chance level. Furthermore, these peaks do not occur in regions in which there is a crossover in the labeling functions. The data appear quite noisy, and we hesitate to draw far-reaching generalizations from them; we admit, however, to a great curiosity as to why in several cases the longest stimulus pairs, both level and falling, were judged more different than others. At this time we have no reasonable hypothesis to account for this result. Nevertheless we feel reasonably certain that the results of the discrimination tests provide no evidence for categorical perception.
Repetition task
The subjects were then asked to listen to each of the 22 stimuli, presented in random order, and to repeat them. Tape recordings were made of their responses; these recordings were analyzed acoustically, and the duration of the vocalic parts of the responses was measured from mingograms. The results are presented in graphic form in Fig. 4.
Figure 4
Duration of repetition responses to synthetic stimuli. Solid line, level; dashed line, falling. Horizontal axis: duration of stimulus (ms).
The average duration of the responses to falling stimuli was 232 ms, while the responses to level stimuli averaged 215 ms. The responses are definitely not categorical in the way the labeling responses were. In general, the subjects seem to have done quite well in matching the duration of the responses to the duration of the stimulus. However, in the range from 210 to 250 ms, responses to falling stimuli are consistently longer than responses to level stimuli. Since the average beat-bead crossover point is slightly before 210 ms for falling stimuli, as shown by the labeling task, we conjecture that most subjects were probably hearing and saying bead for the falling stimuli and beat for the level stimuli in this interval. There is no way to verify this directly; presence of voicing during the final plosive consonant would provide corroborative evidence, but we found very few instances of word-final voicing in these data.
The subjects exhibited a considerable amount of individual variability in the repetition task. For most of the subjects, there was a strong correlation between the perceptual (labeling) crossover point and a rather extensive increase in produced vowel durations. This might be interpreted as representing a change from the production of beat to the production of bead. There were also some subjects who appear to have changed their criterion in the course of the experiment. For example, one subject had a crossover between beat and bead at 190 ms in the labeling task, but at 230 ms in the repetition task; for another subject, these values were 210 and 250 ms respectively. Such individual differences make the averaged results, shown in Fig. 4, less clear than they would be for individual cases.
The production responses are generally continuous in nature. There were nevertheless some subjects who showed a systematic difference between falling and monotone stimuli in the inferred crossover in the repetition task. One very consistent subject, for example, identified the monotone stimulus of 250 ms as beat and in repetition gave it a duration of 150 ms; she identified the falling 250 ms stimulus as bead, and in the repetition gave it a duration of 360 ms.
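The contrast between categorical and continuous repetition can be made concrete by asking how much the response duration changes with stimulus duration within a single label category. The sketch below compares two idealized, entirely hypothetical speakers; it is not an analysis of the reported data. The boundary value, the category durations of 180 and 300 ms, and the helper names are all assumptions introduced for illustration.

```python
def slope(xs, ys):
    """Least-squares slope of ys regressed on xs."""
    n = len(xs)
    mx, my = sum(xs) / n, sum(ys) / n
    sxx = sum((x - mx) ** 2 for x in xs)
    sxy = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    return sxy / sxx


stimuli = list(range(150, 351, 20))
BOUNDARY_MS = 260  # assumed level-F0 boundary (between 250 and 270 ms)

# Two idealized, HYPOTHETICAL speakers (not data from the paper):
# one produces a fixed duration per category, one tracks the stimulus.
categorical = [180 if s < BOUNDARY_MS else 300 for s in stimuli]
continuous = [float(s) for s in stimuli]


def within_category_slopes(responses):
    """Slopes of response on stimulus duration below and above the boundary."""
    short = [(s, r) for s, r in zip(stimuli, responses) if s < BOUNDARY_MS]
    long_ = [(s, r) for s, r in zip(stimuli, responses) if s >= BOUNDARY_MS]
    return tuple(slope(*zip(*group)) for group in (short, long_))


print(within_category_slopes(categorical))
print(within_category_slopes(continuous))
```

A strictly categorical speaker yields within-category slopes of zero, while a speaker who matches the stimulus yields slopes of one. Responses of the kind described above, which follow stimulus duration within a category, would lie closer to the second pattern.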
Discussion and conclusions
The results of these experiments suggest that while listeners assign labels to a suprasegmental continuum in a categorical fashion, they are nevertheless capable of producing phonetically different durations within the same durational category. In perception, they show no effects of boundaries between categories that would correlate with the boundaries established on the basis of labeling tests. A similar observation was made by Abramson (1977) with regard to the perception of tones in Thai. Abramson found a high level of discrimination across the continuum with no effects of boundaries between tonal categories that had been established earlier on the basis of identification tests. It may well be that the perception of linguistically significant suprasegmental continua is generally characterized by categorical perception when the subjects are presented with a labeling task, but by non-categorical perception in a discrimination task. This may constitute one of the characteristics of suprasegmental features in general.
References
Abramson, A. S. (1977). The noncategorical perception of tone categories in Thai. Haskins Laboratories Status Report on Speech Research, SR 51-52, 91-100.
Denes, P. B. (1954). Effect of duration on the perception of voicing. Journal of the Acoustical Society of America, 27, 761-764.
Derr, M. A. & Massaro, D. W. (1978). The contribution of vowel duration, F0 contour, and frication duration as cues to the /juz/-/jus/ distinction. Wisconsin Human Information Processing Program, No. 8. Department of Psychology, University of Wisconsin, Madison, Wisconsin.
Lehiste, I. (1975). Influence of fundamental frequency pattern on the perception of duration. Journal of the Acoustical Society of America, 58, 591 (A).
Lehiste, I. (1976). Influence of fundamental frequency pattern on the perception of duration. Journal of Phonetics, 4, 113-117.
Lehiste, I. (1977). Contribution of pitch to the perception of segmental quality. Proceedings of the 9th International Congress on Acoustics, Madrid, July, p. 522.
Pisoni, D. B. (1976). Fundamental frequency and perceived vowel duration. Research on Speech Perception, Progress Report No. 3, pp. 145-154. Department of Psychology, Indiana University, Bloomington.
Rosen, S. M. (1976). Linear FM pitch sweeps and perceptual duration in speech and nonspeech. STL-QPSR, 4, 1-12.
Rosen, S. M. (1977a). The effect of fundamental frequency patterns on perceived duration. STL-QPSR, 1, 17-30.
Rosen, S. M. (1977b). Fundamental frequency patterns and the long-short vowel distinction in Swedish. STL-QPSR, 1, 31-37.
Shockey, L. & Lehiste, I. (1978). Labelling and discrimination of monotone and changing F0 stimuli. Journal of the Acoustical Society of America, 64, S21 (A).
Wang, W. S-Y., Lehiste, I., Chang, C. K. & Darnovsky, M. (1976). Perception of vowel duration. Paper presented at the 92nd meeting of the Acoustical Society of America, San Diego, Nov. 18, 1976. Journal of the Acoustical Society of America, 60, S92 (A).