Journal of Phonetics (1983) 11, 383-393
Perceptual development for labeling words varying in voice onset time and fundamental frequency
Lynne E. Bernstein
Division of Hearing and Speech, J.F. Kennedy Institute, Baltimore, Maryland 21205, U.S.A.
Received 8th June 1983
Abstract:
The speech waveform is highly redundant, providing the listener with ensembles of potentially informative acoustic characteristics. Although adults can be shown to respond differentially to independently varied acoustic-phonetic cues, it is likely that perceptual learning (Gibson, 1969) during childhood is required to achieve this level of speech processing. The voicing distinction in English (e.g. /g/ versus /k/) provides an excellent context for studying developmental changes in the perception of acoustic-phonetic information. It was hypothesized that, among the acoustic cues to the prevocalic voicing distinction, voice onset time (VOT) would take priority over fundamental frequency (F0) as an effective cue. In a two-alternative forced-choice labeling procedure, four- and six-year-old children and adults labeled words that differed phonemically in their initial consonant (i.e. gate vs Kate). The stimuli were synthesized and varied factorially in VOT and F0. The results show that, in contrast with adults, F0 is not a factor in children's judgments of the voicing of the prevocalic stops /g/ and /k/, even at age six years. These results add to growing evidence that speech perception undergoes significant change during childhood.
© 1983 Academic Press Inc. (London) Ltd.

Introduction

Research in acoustic-phonetics has characterized the speech waveform as providing the listener with ensembles of potentially informative acoustic cues to each of the phonemes in a given language (Parker, 1977). That children use language quite effectively by age 3-4 years indicates that they are able to process enough acoustic-phonetic information to recover messages from the speech waveform. Nevertheless, it is likely that perceptual learning (Gibson, 1969) is required during childhood before the adult-like ability to abstract and make use of some of the available acoustic-phonetic cues is achieved. Voicing contrasts in English (e.g. /b, d, g/ versus /p, t, k/) provide opportunities to examine changes across age in the processing of acoustic-phonetic cues. The present study tested whether the relative perceptual effects of two acoustic cues to prevocalic voicing distinctions, voice onset time (VOT) and fundamental frequency (F0), differ between young children and adult listeners.

There is support in the literature for the proposal that the relative effects of acoustic-phonetic cues to voicing distinctions vary across age. For example, Simon & Fourcin (1978) investigated whether VOT and first formant (F1) characteristics are effective cues to prevocalic voicing distinctions for British English-speaking children between 2 and 14 years of age. VOT is the temporal relation between the release of oral articulatory closure and the onset of vocal fold vibration in prevocalic stop consonants (Lisker & Abramson, 1967). When voicing precedes release, VOT is expressed as a negative value; voicing that is simultaneous with release is expressed as 0 ms VOT, and post-release voicing is expressed as a positive value. In English, prevocalic voiced stop consonants (i.e. /b, d, g/) and voiceless stop consonants (i.e. /p, t, k/) are distinguished by short positive VOTs and long positive VOTs, respectively (Lisker & Abramson, 1964). F1 transition rate and extent have also been shown to influence adults' judgments of voicedness when stimuli are varied along a VOT continuum (Lisker, 1975; Stevens & Klatt, 1974; Summerfield & Haggard, 1977): adults perceive stimuli with short F1 transitions or flat F1s as voiceless. Simon & Fourcin reported that British children between 2 and 7 years of age did not label synthetic stimuli with level first formants as voiceless. For children between 8 and 14 years of age, however, responses corresponded with those of adults, i.e., stimuli with a level F1 onset and short VOT were no longer judged as acceptable versions of voiced sounds. Throughout the range from 2 to 14 years, VOT is used to distinguish between phonemes from different voicing categories (if only between the end points of the continua in the case of the two-year-old subjects).

A study by Greenlee (1980) shows how the absence of a redundant acoustic cue can impair children's ability to make voicing distinctions. Greenlee tested children and adults on voicing distinctions (/t/ versus /d/, and /k/ versus /g/) that occur postvocalically in consonant-vowel-consonant stimuli.
Stimuli were derived from digitized natural syllables. One of the stimulus dimensions was vowel duration, which for adults can function as a sufficient cue to postvocalic stop consonant voicing (Raphael, 1972): long vowel durations signal voiced final consonants, while short vowel durations signal voiceless final consonants. Greenlee created two types of stimulus continua, one by iterating vocalic pitch periods in voiceless syllables (thus artificially lengthening the vowel) and one by removing pitch periods from voiced syllables (thus artificially shortening the vowel). Voicing during the postvocalic closure was also removed from the test stimuli; the duration of this voicing varies with the voicedness of the postvocalic consonant, and it functions as a redundant perceptual cue for adults (Hogan & Rozsypal, 1980). Unaltered control stimuli were also tested. The results showed that three-year-olds, in contrast to six-year-olds and adults, were unable to label the test stimuli differentially on the basis of vowel duration alone, although they were able to label the unaltered stimuli. Krause (1982) reports similar results for three- and six-year-old children who also listened to postvocalic voicing contrasts signalled by vowel duration only. The above research points to developmental change during childhood in the ability to judge voicing distinctions on the basis of independently varied acoustic-phonetic characteristics. The research reported here made use of a naturally occurring relationship between VOT and F0 to further investigate perceptual development for processing cues associated with voicing.

Voice onset time and fundamental frequency as cues to voicing

A number of researchers have reported systematic relationships between the voicing of prevocalic consonants and F0 characteristics.
Measurements of natural prevocalic stop consonants reveal that F0 at the onset of voicing following closure release is higher in voiceless than in voiced consonants (House & Fairbanks, 1952; Lehiste & Peterson, 1961; Ohde, 1982; Umeda, 1981). Peak F0 is also generally higher following voiceless than following voiced stops (Lehiste & Peterson, 1961; Ohde, 1982; Umeda, 1981), as is mean F0 (House & Fairbanks, 1952). House & Fairbanks (1952) and Lehiste & Peterson (1961) report that F0 falls from its initial level following voiceless consonants but rises gradually following voiced consonants. Some controversy has centered on whether the relative level of F0 or, alternatively, the direction of the F0 contour is the more reliable cue to phoneme identity (Lea, 1973; Ohde, 1982; Umeda, 1981). Umeda (1981) reports measurements from 20-minute readings of essays in which the direction of F0 change was found to be "not as reliable a voicing indicator of the preceding consonant as it is in isolated utterances". Her data indicate, however, that initial and peak F0s are systematically distributed as a function of voicing. Ohde (1982) also reports that the rise-fall dichotomy is less reliable than differences between initial or peak F0 values. Massaro & Cohen (1976) examined the perceptual effect of F0 contour versus F0 level at voicing onset and found the latter to be the more effective cue to voicing along /zi/-/si/ continua. However, there is also evidence that the direction of F0 change can affect phoneme categorization. Haggard, Ambler & Callow (1970) found that a stimulus synthesized to be ambiguous between /bi/ and /pi/ was judged to be /pi/ when the F0 contour fell and /bi/ when it rose. Perceptual experiments using stimuli similar to those employed in the experiment reported here (Bernstein, 1980) show that when VOT is in the vicinity of a category boundary, adult subjects shift their labeling judgments toward the voiceless alternative as F0 increases.
Thus, it appears that both the relative level of F0 at voicing onset and the F0 contour can affect voicing judgments, but the contour cue may be the less reliably produced of the two. The experiment reported here tested whether children make use of the relative level of F0 at voicing onset in labeling words that differ in VOT. The experimental questions were: Would children label the stimulus continua in an adult-like manner as a function of VOT, i.e., would they separate the stimuli into two sharply divided categories? Would F0 have a differential effect on labeling judgments, i.e., would voiceless judgments increase as F0 was raised? Would F0 become a more effective cue as a function of age?

Experiment
Experimental method

Subjects. The children were four and six years of age. The six four-year-olds ranged in age from 4 years 0 months to 4 years 10 months (mean 4 years 4 months). The six six-year-olds ranged in age from 6 years 2 months to 6 years 8 months (mean 6 years 5 months). The Peabody Picture Vocabulary Test (Dunn, 1959) and the Goldman-Fristoe Test of Articulation (Goldman & Fristoe, 1969) were administered to each child, and pure-tone air-conduction auditory sensitivity was tested at 500 and 2000 Hz. These tests showed that all of the children had normal hearing and articulation and had developed vocabularies of normal extent. According to their parents, none of the children had a history of chronic middle-ear pathology. No child had to be excluded on the basis of these screening measures. The six normal adults tested were either undergraduate or graduate students at Northwestern University. All subjects were paid.

Stimuli. To make the experimental procedure as simple as possible for the children, words that could be represented in pictures, i.e., gate and Kate, were used as stimuli. The stimulus formant frequencies for gate and Kate and their time course were derived from measurements of spectrograms of recordings of two male talkers pronouncing tokens of gate and Kate. The formant frequency pattern derived from this analysis was fixed across all synthesized stimuli and is shown in Fig. 1. Twenty-four test stimuli were synthesized by factorially varying F0 and VOT. Stimuli were synthesized on a DEC PDP 11/40 computer in the Auditory Research Laboratory at Northwestern University using the Klatt cascade/parallel formant synthesizer (Klatt, 1980). The synthesizer is controlled by 23 parameters that remain fixed throughout a stimulus and by 20 variable parameters that are updated every 5 ms. The sampling rate was 10 000 Hz. A continuum of VOT values was generated by progressively replacing the periodic voicing source with a noise source. The F1 bandwidth associated with the noise source was 400 Hz and was narrowed to 50 Hz during voicing. The wide F1 bandwidth produces considerable F1 attenuation, which mimics the natural outcome of voicelessness. VOT was measured from stimulus onset to the first voicing pulse. All stimuli were synthesized with an initial 5 ms of frication followed by 10 to 75 ms of aspiration. The eight VOT values were 15, 20, 30, 40, 50, 60, 70 and 80 ms. Three VOT continua were created by orthogonally varying the F0 parameter; the three F0 values employed were 165, 135 and 105 Hz. F0 was held constant during the initial 95 ms of the stimuli so that F0 and VOT would not interact. F0 then fell linearly to 82% of its original value by the final period. Syllables were low-pass filtered at 4000 Hz through a Unigon (LP-120) filter with a roll-off of 120 dB/octave and presented binaurally over matched headphones (TDH 49) at approximately 80 dB SPL.

Figure 1. Schematic spectrogram of the formants in all gate-Kate stimuli (frequency in Hz against time in ms). Solid formants represent periodic excitation, whereas dotted formants represent noise excitation.

Procedures. All subjects were tested in a sound-treated room. Children participated in a training procedure to establish reliable labeling of endpoint stimuli.
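The factorial stimulus design described under Stimuli above can be sketched in a few lines of Python. This is a minimal illustration under stated assumptions, not the original PDP-11 synthesis software, and it does not call any synthesizer: the function names are hypothetical, and the total stimulus duration (`duration_ms`) is an assumed parameter because the text does not state it. The sketch crosses the 8 VOT values with the 3 F0 values (8 × 3 = 24 stimuli), derives the aspiration duration from the 5 ms frication burst (VOT = 5 ms + aspiration), and samples the F0 contour at the 5 ms parameter-update interval: flat for the first 95 ms, then falling linearly to 82% of its starting value by the final frame.

```python
from itertools import product

VOT_MS = [15, 20, 30, 40, 50, 60, 70, 80]  # the eight VOT values
F0_HZ = [165, 135, 105]                     # the three F0 onset values
FRICATION_MS = 5                            # initial frication before aspiration
FRAME_MS = 5                                # parameter-update interval
FLAT_F0_MS = 95                             # F0 held constant for this long
FINAL_F0_RATIO = 0.82                       # F0 falls to 82% by the last frame


def stimulus_grid():
    """Return the 24 (VOT, F0, aspiration) combinations, crossing VOT with F0."""
    return [
        {"vot_ms": vot, "f0_hz": f0, "aspiration_ms": vot - FRICATION_MS}
        for f0, vot in product(F0_HZ, VOT_MS)
    ]


def f0_contour(f0_onset_hz, duration_ms):
    """F0 value for each 5 ms frame: flat for 95 ms, then a linear fall.

    duration_ms is a hypothetical total stimulus duration (not given in
    the text); it must exceed FLAT_F0_MS and be a multiple of FRAME_MS.
    """
    if duration_ms <= FLAT_F0_MS or duration_ms % FRAME_MS:
        raise ValueError("duration_ms must exceed 95 ms and be a multiple of 5")
    final_hz = f0_onset_hz * FINAL_F0_RATIO
    contour = []
    for t in range(0, duration_ms + 1, FRAME_MS):
        if t <= FLAT_F0_MS:
            contour.append(f0_onset_hz)
        else:
            # Fraction of the falling portion that has elapsed at time t
            frac = (t - FLAT_F0_MS) / (duration_ms - FLAT_F0_MS)
            contour.append(f0_onset_hz + frac * (final_hz - f0_onset_hz))
    return contour


grid = stimulus_grid()
print(len(grid))                           # 24
print(grid[0])                             # {'vot_ms': 15, 'f0_hz': 165, 'aspiration_ms': 10}
print(round(f0_contour(165, 295)[-1], 1))  # 135.3
```

Crossing the two factors with `itertools.product` keeps F0 and VOT orthogonal, which is what allows an F0 effect to be looked for separately at each VOT value.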
First, they were shown pictures corresponding to the stimulus words. The experimenter asked the children to point to the appropriate picture as she spoke the target words in random order. Each child was then shown that a smiling face would appear on an oscilloscope after a correct response (Elliott, Longinotti, Meyer, Raz & Zucker, 1981). During the next phase of training, the children listened over headphones to blocks of endpoint stimuli and were again asked to point to the correct pictures. Each training block consisted of the endpoints from one continuum (i.e., 15 and 80 ms VOT) with F0 held constant.
Each endpoint was presented 10 times, for a total of 20 trials presented in random order by the computer. The experimenter stood behind the child and recorded each pointing response by touching the appropriate response button. The experimenter did not monitor stimulus presentation and could not differentiate between stimuli. The response was sent to the computer, and the program logic determined whether it was correct; if so, visual feedback was given. The endpoint pairs were presented in the order 105, 165, then 135 Hz F0. To progress to testing with the 24 different stimuli (8 VOTs × 3 F0s), children were required to score 75% correct on two of the three pairs of endpoint stimuli. At this level of accuracy, the probability of meeting criterion on a set of endpoints by chance is approximately 0.02 according to the binomial distribution. Accurate performance was required on only two of the three sets of endpoints because, on the basis of previous work (Bernstein, 1979), F0 was not expected to have any influence on endpoint labeling. During testing, stimuli were presented randomly in blocks of 24 (8 VOTs × 3 F0s). The smiling face was presented contingent on a correct response for endpoint stimuli and unconditionally for all intermediate stimuli. In this way children were reinforced for responding but were not given feedback that could influence their judgments of intermediate VOT values (Kuhl & Miller, 1975). Each session lasted approximately 45 minutes. After the first session, each subsequent session began with presentation of the 135 Hz F0 endpoints to re-establish the criterion level of response. Two to four test sessions were required to obtain ten blocks of test responses. Adult subjects labeled one block of all 24 stimuli for familiarization and then ten blocks of 24 for testing.
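The chance level of the training criterion can be checked with an exact binomial tail. The sketch below assumes that "chance" means guessing with p = 0.5 on each of the 20 trials of a training block, that 75% correct means at least 15 of 20, and that blocks are independent; `binomial_tail` is a hypothetical helper written with the Python standard library only. Under these assumptions the tail probability for a single block is about 0.021, roughly the 0.02 level mentioned above, and the chance of passing two of the three blocks by guessing is far smaller.

```python
from math import comb


def binomial_tail(n, k, p=0.5):
    """P(X >= k) for X ~ Binomial(n, p): the chance of k or more correct."""
    return sum(comb(n, i) * p**i * (1 - p) ** (n - i) for i in range(k, n + 1))


TRIALS = 20     # 10 presentations of each of the two endpoints
CRITERION = 15  # 75% of 20 trials

p_block = binomial_tail(TRIALS, CRITERION)
# Chance of meeting criterion on at least two of three independent blocks
p_two_of_three = 3 * p_block**2 * (1 - p_block) + p_block**3

print(f"{p_block:.4f}")         # 0.0207
print(f"{p_two_of_three:.5f}")  # 0.00127
```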
Adults responded by touching response buttons labeled with the words gate and Kate, and completed ten blocks of stimuli in approximately 20 minutes.

Results and discussion

All of the children were able to reach criterion in the training procedure. Five of the six-year-olds easily achieved criterion on each of their first two blocks of endpoints (at 105 and 165 Hz F0). Four of the four-year-olds were also readily successful. One four-year-old, MB, failed to reach criterion when tested first on the low-F0 endpoints but surpassed criterion when the 165 and 135 Hz F0 endpoints were presented subsequently during the same test session. A six-year-old, IR, performed similarly, except that he also failed to meet criterion on the 165 Hz F0 stimulus pair. Finally, SR, a four-year-old, was administered three blocks of 105 Hz F0 stimuli and two blocks of 165 Hz F0 stimuli before reaching criterion. The children who initially had more difficulty were not dropped from the sample, because there was no a priori reason to expect that their performance after achieving criterion would differ significantly from that of the other children.

The results for individual subjects are presented in Figs 2-4 for the four-year-olds, the six-year-olds and the adults, respectively. Responses are plotted as percent Kate at each VOT, with a separate curve for each F0 value (105, 135 and 165 Hz). Each plotted value is based on responses to ten stimulus presentations.

Figure 2. Individual response curves for the six four-year-olds ((a) AH, (b) SR, (c) DH, (d) KM, (e) AM, (f) MB). Data are plotted as percent Kate responses against voice onset time. A separate curve is shown for each VOT continuum at each of the three values of F0: solid line, F0 = 165 Hz; dash-dot line, F0 = 135 Hz; dashed line, F0 = 105 Hz.

Inspection of Figs 2-4 reveals that most subjects divided the continua into two distinct response categories (gate versus Kate) as a function of VOT. However, one four-year-old, SR, and one six-year-old, IR, were inconsistent in labeling almost all of the stimuli. These were two of the three children who were not able to reach criterion as quickly as the other children in the initial training blocks. Another six-year-old, EM, also failed to achieve a high level of consistent categorization in comparison with the remaining
children, but was not as inconsistent as IR or SR. EM had required no additional training. It should be noted that MB, who required additional training, was quite consistent in labeling.¹

¹ It is possible for a child to reach criterion in training without making phonetic distinctions. Instead, she or he can learn to associate the two stimuli with the respective correct pictures on the basis of auditory differences alone. Alternatively, a strategy can be adopted in which a correct response is repeated unless the stimulus "sounds different", in which case the other picture is selected. Such strategies might lead to inconsistent responses when the number of stimuli is increased and the stimuli are varied along two dimensions, as is the case during testing.

Figure 3. Individual response curves for the six six-year-olds ((a) MM, (b) VS, (c) AM, (d) GV, (e) IR, (f) EM). Data are plotted as percent Kate responses (0-100) against voice onset time (15-80 ms). A separate curve is shown for each VOT continuum at each of the three values of F0: solid line, F0 = 165 Hz; dash-dot line, F0 = 135 Hz; dashed line, F0 = 105 Hz.

Figure 4. Individual response curves for the six adults ((a) MR, (b) EM, (c) SS, (d) SK, (e) DS, (f) TK). Data are plotted as percent Kate responses against voice onset time. A separate curve is shown for each VOT continuum at each of the three values of F0: solid line, F0 = 165 Hz; dash-dot line, F0 = 135 Hz; dashed line, F0 = 105 Hz.

It was predicted that as F0 increased, adults' voicing judgments would increasingly favor the voiceless alternative, Kate, but children's judgments would not. One method for assessing the effect of F0 would be to examine the 50% crossover points and slopes of the labeling functions at each level of F0, since the effect of F0 is expected to occur in the midrange of VOT values; the endpoints are not expected to be influenced by F0. This method requires relatively smooth functions without large reversals in direction. The children's individual labeling functions were not, however, smooth enough to warrant treatment in terms of slope and crossover estimates. Further, since each subject contributed only ten responses per stimulus, it was deemed inappropriate to use curve-fitting procedures to estimate parameters. It was decided, therefore, to examine the effect of F0 in terms of the total number of Kate responses at each F0. This total reflects the overall F0 effect, although information about the slope and crossover position of the labeling functions is lost.

The initial statistical analysis was a repeated-measures analysis of variance of Kate responses with Age (three groups) as a between-subjects factor and F0 (three levels) as a within-subjects factor. For this and all subsequent analyses, data from all subjects were included, since inconsistent labeling in terms of VOT might nevertheless obscure consistent use of F0. The analysis yielded significant effects of Age, F(2, 15) = 7.98, p < 0.004, F0, F(2, 30) = 23.20, p < 0.0005, and F0 × Age, F(4, 30) = 13.52, p < 0.0005. To determine the source of the F0 × Age interaction, separate repeated-measures analyses of variance were performed for each age group. F0 was not significant for the four-year-old children, F(2, 10) = 1.01, p = 0.398, or for the six-year-old children, F(2, 10) = 0.46, p = 0.63, but was significant for adults, F(2, 10) = 37.55, p < 0.0005. Thus, the interaction may be ascribed to the children's lack of response to F0 in both age groups, in contrast with its significant effect for adults. This finding was further substantiated in separate analyses comparing adults first with four-year-old and then with six-year-old children: a significant F0 × Age interaction (p < 0.005) arose in both analyses. Neither Age nor F0 was significant in an analysis comparing six-year-old with four-year-old children.

Figure 4 shows progressive displacements in the position of the adults' category boundaries in the predicted direction, i.e., toward shorter VOT values as F0 increases. The pattern of displacement is the same for all adult subjects, although the curves for individual subjects are not in every case parallel throughout the boundary region. The mean number of times adults labeled the 165 Hz F0 stimuli Kate was 46; the means for the 135 Hz and 105 Hz F0 stimuli were 44 and 38, respectively.
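The per-age analyses described above treat F0 as a within-subjects factor with six subjects and three F0 levels, which yields the F(2, 10) degrees of freedom reported. A minimal pure-Python sketch of such a one-way repeated-measures ANOVA follows; the function name and the example counts are hypothetical (the individual subjects' totals are not given in the text, and the numbers below are invented), and p-values are omitted to avoid depending on a statistics library.

```python
def rm_anova_oneway(scores):
    """One-way repeated-measures ANOVA.

    scores[s][c] is subject s's total Kate responses at F0 level c.
    Returns (F, df_effect, df_error).
    """
    n = len(scores)     # subjects
    k = len(scores[0])  # conditions (F0 levels)
    grand = sum(map(sum, scores)) / (n * k)
    cond_means = [sum(row[c] for row in scores) / n for c in range(k)]
    subj_means = [sum(row) / k for row in scores]

    ss_cond = n * sum((m - grand) ** 2 for m in cond_means)
    ss_subj = k * sum((m - grand) ** 2 for m in subj_means)
    ss_total = sum((x - grand) ** 2 for row in scores for x in row)
    # Residual = condition x subject interaction, the error term for F
    ss_error = ss_total - ss_cond - ss_subj

    df_cond, df_error = k - 1, (k - 1) * (n - 1)
    f = (ss_cond / df_cond) / (ss_error / df_error)
    return f, df_cond, df_error


# Invented totals (out of 80) at F0 = 165, 135, 105 Hz for six listeners
example = [
    [40, 36, 30],
    [42, 39, 35],
    [38, 37, 31],
    [45, 40, 36],
    [41, 38, 33],
    [44, 39, 34],
]
f, df1, df2 = rm_anova_oneway(example)
print(f"F({df1}, {df2}) = {f:.2f}")
```

Removing the between-subject variance (`ss_subj`) before forming the error term is what distinguishes this design from an ordinary one-way ANOVA and is why six listeners give 10 rather than 15 error degrees of freedom.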
Repeated-measures one-tailed t-tests were performed on the adults' data to determine whether the differences in the frequency of Kate responses between continua differing by 30 Hz in F0 were significant in the predicted direction. They were: adults labeled stimuli as Kate significantly more often when F0 was 165 Hz than when it was 135 Hz, t(5) = 6.35, p < 0.0005, and when F0 was 135 Hz than when it was 105 Hz, t(5) = 3.18, p < 0.01.

Possible systematic reversals of direction in the labeling functions can be seen in Figs 2-4. Fifty percent of the subjects identified fewer 165 Hz F0 stimuli as Kate at 40 ms VOT than at 30 ms VOT, and three subjects plateaued between these two VOT values. Because the stimuli had been synthesized, stored digitally and subsequently deleted from computer files, it was not possible to determine directly whether these reversals were due to experimental sampling error or to synthesis error. Examination of hard copies of the synthesis input parameters failed to reveal any synthesis errors. Therefore, another set of statistical analyses was performed in which responses to the 165 Hz, 40 ms VOT stimulus were estimated in favor of the hypothesis that F0 was effective for all subjects. This was done by drawing a straight line between the points corresponding to responses at 30 ms and 50 ms VOT for the stimuli with 165 Hz F0; the number of responses at 40 ms VOT on this estimated line was substituted for the value originally obtained. This correction operationalizes the assumption that the stimulus at 30 ms VOT was correct and the stimulus at 40 ms VOT was in error. The alternative assumption, that the 30 ms VOT stimulus "prematurely" signalled voicelessness, was not tested because previous research (Bernstein, 1980) had shown the predicted F0 effects among adults. Also, the estimated data favored the null hypothesis for development, i.e., no difference across age.
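The correction for the 165 Hz, 40 ms VOT stimulus described above amounts to linear interpolation between the 30 and 50 ms points; because 40 ms lies midway, the estimate is simply the mean of the two neighboring counts. The sketch below shows that step for one listener's labeling function, together with the paired t statistic used for the one-tailed comparisons. The function names and the example counts are hypothetical, and the p-value lookup is left out.

```python
from math import sqrt
from statistics import mean, stdev


def interpolate_point(curve, left, target, right):
    """Replace curve[target] by the value on the straight line from left to right.

    curve maps VOT (ms) to the number of Kate responses (0-10).
    Returns a corrected copy; the original curve is left unchanged.
    """
    slope = (curve[right] - curve[left]) / (right - left)
    corrected = dict(curve)
    corrected[target] = curve[left] + slope * (target - left)
    return corrected


def paired_t(x, y):
    """Paired t statistic for the differences x - y, with df = n - 1.

    For a one-tailed test of x > y, compare t with the upper critical value.
    """
    diffs = [a - b for a, b in zip(x, y)]
    return mean(diffs) / (stdev(diffs) / sqrt(len(diffs))), len(diffs) - 1


# Hypothetical 165 Hz labeling function with a reversal at 40 ms VOT
curve_165 = {15: 0, 20: 1, 30: 6, 40: 4, 50: 9, 60: 10, 70: 10, 80: 10}
fixed = interpolate_point(curve_165, 30, 40, 50)
print(fixed[40])  # 7.5
```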
For all but two subjects (one four-year-old and one six-year-old), this procedure inflated or left unchanged the total number of Kate responses at 40 ms VOT, 165 Hz F0. An analysis of variance with Age (three groups) as a between-subjects factor and F0 (three levels) as a within-subjects factor was then performed again, this time incorporating the estimated data. Again, Age, F0 and the Age × F0 interaction were significant, p < 0.0001. When the children's data were examined by analysis of variance with Age (two groups) as a between-subjects factor and F0 as a within-subjects factor, F0 was significant, F(2, 20) = 5.55, p < 0.012. Following the procedure used earlier to examine the F0 effect among adults, repeated-measures t-tests were performed separately on the scores at each of the two ages. None of the comparisons between adjacent F0 values was significant. Only the comparison between responses to the most extreme values, i.e., 165 Hz versus 105 Hz F0, by the four-year-olds was significant, t(5) = 2.78, p < 0.02. These analyses were judged to provide extremely weak evidence for an F0 effect among the children. In particular, were there a genuine effect for the four-year-old children, a similar effect would be expected among the older children. Instead, the individual data plots for the six-year-olds do not provide convincing evidence that F0 systematically affected labeling: the boundary functions for each child are not ordered systematically in relation to F0.

In summary, the major finding of the present study was that young children, in contrast with adults, do not make use of the F0 cue in identifying words that also vary in VOT. At the same time, VOT was shown to be an adequate cue to voicing for most subjects at each age. These results add F0 to the growing list of acoustic-phonetic cues to voicing whose effects vary across age (e.g., Greenlee, 1980; Krause, 1982; Simon & Fourcin, 1978; Zlatin & Koenigsknecht).

General discussion

The VOT and F0 cues are, for adults, in what may be called a "trading relationship". Repp, Liberman, Eccardt & Pesetsky (1978) say of such relationships: "Within limits, one cue can be exchanged for another without any change in the phonetic percept; in that sense, the cues are perceptually equivalent, though they may differ greatly in acoustic (and presumably auditory) terms ..." (p. 622). The trading relationship studied here between VOT and F0 is such that a decrease in VOT can be compensated by an increase in F0 to yield a /k/ as opposed to a /g/ identification. The VOT range in which trading was observed was approximately 40 ms (i.e., between 20 and 60 ms), orthogonal to an F0 range of 60 Hz (i.e., between 105 and 165 Hz). Although the study did not explore the limits of this trading relationship, it is highly unlikely that more extreme values of F0 could trade for endpoint values of VOT. At appropriate VOT values, F0 can signal a voicing distinction, but it cannot override the information conveyed by VOT at its extreme values. In this respect, F0 is a secondary cue in relation to VOT. Because of this inherent difference in information value, it seems reasonable that VOT should have developmental priority over F0 for signalling voicing distinctions.

Stevens (1973; Stevens & Blumstein, 1978), in discussions of the cues to consonantal place of articulation, has speculated that infant speech perception could depend on a subset of acoustic-phonetic cues that may be related to innate sensitivities. Stevens suggests that a few simple property detectors can provide a basis for classifying initial consonants according to gross place of articulation:

This classification in terms of context-independent attributes provides a basis for the acquisition of a set of secondary context-dependent cues for features, and these cues are available for use when the primary cues are weak or absent. (1973, p. 167)

Whether property detectors or some other mechanisms correctly characterize infant speech perception is not at issue here. What is of interest is the suggestion that speech perception may develop in relation to acoustic characteristics that have inherently different informational values.
Investigations of infants' and children's perception of voicing provide some support for a general developmental trend such as that described by Stevens. VOT has been shown to be an adequate cue to induce differential responding in prelingual infants (Aslin, Pisoni, Hennessy & Perey, 1981), and sensitivity to VOT differences has been posited to be a basic property of the auditory system (Pisoni, 1977). Young children may decode speech signals for voicing distinctions in terms of VOT only; cues such as F0 and F1 transition characteristics may subsequently have to be learned during middle childhood (Simon & Fourcin, 1978). If evidence continues to accrue in support of developmental changes related to inherent differences among acoustic-phonetic cues, an important question is how the secondary cues, and the relationships among them and the primary cues, are learned. This question has not yet been explored. Information about the relative effects of acoustic-phonetic cues across age begins to map out the dimensions of what is learned and the time course of learning. That perceptual learning continues for a prolonged period beyond initial language learning may be an important indication that speech perception needs to be understood in terms of dynamic processes of developmental change, as well as in terms of the usual models in which the information available to the perceptual system is considered to be a fixed set of characteristics and relationships. An adequate model of speech perception between birth and adulthood would need to encompass parameters of innate auditory sensitivity as well as processes and sequences of perceptual change.
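The contrast drawn in this discussion between a primary cue (VOT) and a secondary cue (F0), including the trading relation between them, can be illustrated with a toy two-cue logistic labeling model. This is not an analysis used in the study: the model form and all coefficients below are invented for illustration and are not fitted to these data. Setting the logistic argument to zero gives the 50% boundary, which moves to shorter VOTs as F0 rises, while at the 15 and 80 ms endpoints the VOT term dominates and F0 barely changes the predicted response.

```python
from math import exp

# Illustrative coefficients only (not estimated from the study's data)
B_VOT = 0.25   # change in log-odds of "Kate" per ms of VOT
V0 = 42.0      # boundary VOT (ms) at the reference F0
B_F0 = 0.05    # change in log-odds of "Kate" per Hz of onset F0
F_REF = 135.0  # reference F0 (Hz)


def p_kate(vot_ms, f0_hz):
    """Probability of a 'Kate' (voiceless) response under the toy model."""
    z = B_VOT * (vot_ms - V0) + B_F0 * (f0_hz - F_REF)
    return 1.0 / (1.0 + exp(-z))


def boundary_vot(f0_hz):
    """VOT at which P(Kate) = 0.5, i.e. where the log-odds z equals zero."""
    return V0 - (B_F0 / B_VOT) * (f0_hz - F_REF)


for f0 in (105, 135, 165):
    print(f0, round(boundary_vot(f0), 1), round(p_kate(15, f0), 3), round(p_kate(80, f0), 3))
```

With these made-up values the boundary moves from 48 ms VOT at 105 Hz to 36 ms at 165 Hz, a 12 ms shift across 60 Hz, while the predicted responses at the 15 and 80 ms endpoints stay near 0 and 1 at every F0: a small secondary term can shift the boundary region but cannot override the primary cue at its extremes.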
At this time it seems likely that, in terms of a developmental model, characteristics of the speech waveform can be divided into those with primary phonetic value for the initial decoding of the signal and those with secondary phonetic value acquired in the course of language use.

Thanks are due to Dr Rachel E. Stark and Dr John Heinz at the Division of Hearing and Speech at the J.F. Kennedy Institute for their insightful suggestions on the analysis and interpretation of the study. Work on this study was supported by a grant from NIH (NS 07108) to Dr Lois L. Elliott, a grant from NIH (NS 12045) to Dr Frederic Wightman at Northwestern University, and a grant from MCH (917) to the J.F. Kennedy Institute, Baltimore, Maryland 21205.

References

Aslin, R. N., Pisoni, D. B., Hennessy, B. L. & Perey, A. J. (1981). Discrimination of voice onset time by human infants: New findings and implications for the effects of early experience. Child Development, 52, 1135-1145.
Bernstein, L. E. (1979). Developmental differences in labeling VOT continua with varied fundamental frequency. Journal of the Acoustical Society of America, 65, Supplement 1, S58 (Abstract).
Bernstein, L. E. (1980). Labeling of VOT with fixed versus varied fundamental frequency. Journal of the Acoustical Society of America, 67, Supplement 1, S51 (Abstract).
Dunn, L. (1959). Peabody Picture Vocabulary Test. Minneapolis: Circle Press, American Guidance Service.
Elliott, L. L., Longinotti, C., Meyer, D., Raz, I. & Zucker, K. (1981). Developmental differences in identifying and discriminating CV syllables. Journal of the Acoustical Society of America, 70, 669-677.
Gibson, E. J. (1969). Principles of Perceptual Learning and Development. New York: Appleton.
Goldman, R. & Fristoe, M. (1969). Goldman-Fristoe Test of Articulation. Minneapolis: Circle Press, American Guidance Service.
Greenlee, M. (1980). Learning the phonetic cues to the voiced-voiceless distinction: A comparison of child and adult speech perception. Journal of Child Language, 7, 459-468.
Haggard, M., Ambler, S. & Callow, M. (1970). Pitch as a voicing cue. Journal of the Acoustical Society of America, 47, 613-617.
Hogan, J. T. & Rozsypal, A. J. (1980). Evaluation of vowel duration as a cue for the voicing distinction in the following word-final consonant. Journal of the Acoustical Society of America, 67, 1764-1771.
House, A. S. & Fairbanks, G. (1952). The influence of consonant environment upon the secondary acoustic characteristics of vowels. Journal of the Acoustical Society of America, 25, 105-113.
Klatt, D. (1980). Software for a cascade/parallel formant synthesizer. Journal of the Acoustical Society of America, 67, 971-995.
Krause, S. E. (1982). Vowel duration as a perceptual cue to postvocalic consonant voicing in young children and adults. Journal of the Acoustical Society of America, 71, 990-995.
Kuhl, P. K. & Miller, J. D. (1975). Speech perception by the chinchilla: Voiced-voiceless distinction in alveolar plosive consonants. Science, 190, 69-72.
Lea, W. (1973). Segment and suprasegmental influences of fundamental frequency contour. In: Consonant Types and Tone, Southern California Occasional Papers in Linguistics (L. M. Hyman, ed.), 1, 17-69.
Lehiste, I. (1970). Suprasegmentals. Cambridge, Massachusetts: MIT Press.
Lehiste, I. & Peterson, G. E. (1961). Some basic considerations in the analysis of intonation. Journal of the Acoustical Society of America, 33, 419-425.
Liberman, A. M., Delattre, P. C. & Cooper, F. S. (1958). Some cues for the distinction between voiced and voiceless stops in initial position. Language and Speech, 1, 153-167.
Lisker, L. (1975). Is it VOT or a first-formant detector? Journal of the Acoustical Society of America, 57, 1547-1551.
Lisker, L. & Abramson, A. S. (1964). A cross-language study of voicing in initial stops: Acoustical measurements. Word, 20, 384-422.
Lisker, L. & Abramson, A. S. (1967). The voicing dimension: Some experiments in comparative phonetics. Proceedings of the Sixth International Congress of Phonetic Sciences, Prague.
Massaro, D. W. & Cohen, M. P. (1976). The contribution of fundamental frequency and voice onset time to the /zi/-/si/ distinction. Journal of the Acoustical Society of America, 60, 704-717.
Ohde, R. N. (1982). The effects of linguistic context on temporal and F0 properties of speech. Journal of the Acoustical Society of America, 72, Supplement 1, LL5 (Abstract).
Parker, F. (1977). Distinctive features and acoustic cues. Journal of the Acoustical Society of America, 62, 1051-1054.
Pisoni, D. B. (1977). Identification and discrimination of the relative onset time of two-component tones: Implications for voicing perception in stops. Journal of the Acoustical Society of America, 61, 1352-1361.
Raphael, L. J. (1972). Preceding vowel duration as a cue to the perception of the voicing characteristics of word-final consonants in American English. Journal of the Acoustical Society of America, 51, 1296-1303.
Repp, B. H., Liberman, A. M., Eccardt, T. & Pesetsky, D. (1978). Perceptual integration of acoustic cues for stop, fricative, and affricate manner. Journal of Experimental Psychology: Human Perception and Performance, 4, 621-637.
Simon, C. & Fourcin, A. J. (1978). Cross-language study of speech-pattern learning. Journal of the Acoustical Society of America, 63, 925-935.
Stevens, K. N. (1973). Potential role of property detectors in the perception of consonants. Quarterly Progress Report No. 110, pp. 155-167. Cambridge, Massachusetts: Research Laboratory of Electronics, MIT.
Stevens, K. N. & Blumstein, S. E. (1978). Invariant cues for place of articulation in stop consonants. Journal of the Acoustical Society of America, 64, 1358-1368.
Stevens, K. N. & Klatt, D. H. (1974). Role of formant transitions in the voiced-voiceless distinction for stops. Journal of the Acoustical Society of America, 55, 653-659.
Summerfield, Q. & Haggard, M. (1977). On the dissociation of spectral and temporal cues to the voicing distinction in initial stop consonants. Journal of the Acoustical Society of America, 62, 436-448.
Umeda, N. (1981). Influence of segmental factors on fundamental frequency in fluent speech. Journal of the Acoustical Society of America, 70, 350-355.
Zlatin, M. A. & Koenigsknecht, R. (1975). Development of the voicing contrast: Perception of stop consonants. Journal of Speech and Hearing Research, 18, 541-553.