INFANT BEHAVIOR AND DEVELOPMENT 5, 143-159 (1982)
The Categorization of Male and Female Voices in Infancy*

CYNTHIA L. MILLER
Department of Psychology, University of Western Ontario, London, Canada N6A 5C2

BARBARA A. YOUNGER
Department of Psychology, University of Texas at Austin

AND

PHILIP A. MORSE
Department of Psychology, University of Wisconsin-Madison, Madison, WI 53706

The present set of studies explored the nature of the 7-month-old infant's perception of human voices. In Experiment I, infants learned to respond discriminatively to groups of male vs. female voices. That this was evidence of male/female categorization was supported in Experiment II, in which it was shown that infants did not learn to respond discriminatively to the same voices when they were randomly organized into "categories" containing both male and female voices. The extent to which fundamental frequency may have contributed to this male/female classification was investigated in Experiment III. The combined results of these three studies suggested that, although pitch is possibly one cue to which infants are attending when classifying these voices, it could not account fully for this ability. It remains for future research to identify other cues which may contribute to male/female categorization, as well as to investigate the developmental course of speaker recognition and classification in general.
INTRODUCTION

In the approximately 12 months preceding the infant's first utterance of meaningful words, important developments occur in the infant's perception and reception of her to-be-acquired language. Investigators of infant speech perception have devoted the

* Some of this research is based upon a Ph.D. dissertation submitted to the Department of Psychology, University of Wisconsin-Madison, by the first author in partial fulfillment of the requirements for the Ph.D., and was supported by NICHD grants HD03352 and HD082240. The authors wish to thank John Shimshak, Mary Holmes, Pauline Meyer, Jan Goodsitt, and Sabrina Franklin for their assistance in data collection, and Umesh Singh, Dhruba Des, Bruce Anderson, Mike Epp, Clifford Gillman, Jochen Dnermann, Bruce Orchard, and David Wilson for their technical and computer assistance. We are particularly grateful to Lewis Leavitt, Alexander Wilkinson, Raymond Kent, Stephen Suomi, and Arthur Glanburg, whose comments greatly assisted in the design of Experiments II and III. Finally, we owe a special thanks to the infants and their parents who participated in these studies. Portions of this paper were presented to the International Conference on Infant Studies, New Haven, CN, April 1980. Correspondence and requests for reprints should be addressed to: Cynthia L. Miller, Department of Psychology, University of Western Ontario, London, Canada N6A 5C2.
MILLER, YOUNGER, AND MORSE
last decade or so to describing the nature and development of this "prelinguistic" knowledge. Over the course of this pursuit, essentially two types of questions have been explored. The first concerns the young infant's ability to discriminate certain of the acoustic cues that characterize and differentiate speech sound contrasts. By and large, this research has revealed that even very young infants (i.e., as young as one month of age) can discriminate most of the speech contrasts thus far presented, from vowels (Swoboda, Kass, Morse, & Leavitt, 1978; Swoboda, Morse, & Leavitt, 1976; Trehub, 1973) to consonants varying along place-of-articulation (Eimas, 1974; Miller & Morse, 1976) and voicing (Eimas, Siqueland, Jusczyk, & Vigorito, 1971; Miller, 1974) continua.

A second and possibly more mature ability, however, is seen in the infant's developing appreciation for phonetic categories. One approach to the assessment of this ability, employed primarily by Kuhl and her colleagues (cf. Kuhl, 1980, for a review of these studies), examines the infant's recognition of the identity of a particular phonetic segment (e.g., /i/) across its many acoustic variants, e.g., synthetic vs. natural-speech tokens, spoken by both males and females, and presented in different intonational contexts (Kuhl, 1979). Essentially, her studies have indicated that, at least by 7 months of age, infants seem to exhibit this perceptual constancy for many speech sounds.

The process of speech perception, however, involves much more than the discrimination and categorization of phonetic segments. One source of information, providing important cues to the speech perceiver, is the acoustic variation contributed by different speakers. Thus, accuracy in speech perception, for both the adult and the infant, often depends on the listener's discrimination and recognition of different speakers and groups of speakers (e.g., male vs. female).
Just as the study of phonetic perception in infancy has been explored at different levels, i.e., discrimination vs. categorization, a similar approach may be employed in exploring the nature of voice perception. Thus far, studies of voice perception in infancy are relatively limited and have been restricted to questions of discrimination. The few studies available in this area have demonstrated that infants can discriminate between two different voices (Kaplan, 1969; Turnure, 1971) as well as between different samples of the same voice (Culp & Boyd, 1975; Culp & Gallas, 1975). In addition, we know that infants can discriminate some of the specific acoustic cues that tend to contribute to speaker variation, e.g., relative formant frequencies (Swoboda et al., 1978; Swoboda et al., 1976; Trehub, 1973) and changes in fundamental frequency (Kaplan, 1969; Kuhl & Hillenbrand, 1979; Morse, 1972; Sullivan, 1980).

In addition to discrimination of voice information, the ability to categorize voices with respect to the type of speaker (e.g., male vs. female) is also an important component of speech perception. Male and female speakers, for example, not only may produce different spectral patterns for the same phoneme, but different phonemes may be characterized by similar sets of formant patterns, depending upon the sex of the speaker (Rand, 1971). We know that the adult listener uses her knowledge of these relationships in decoding the phonetic units of a speech stream,
INFANT VOICE PERCEPTION
and thus, the extent to which infants have access to the same knowledge becomes an important question. Since one of the primary sources of variation among voices is the sex of the speaker, one of the first questions one might ask of the infant is whether she can classify male and female voices.

It is perhaps important at this point to clarify the use of this term "categorization" in the present context. The term categorization has been used to describe a whole range of cognitive/perceptual activities, from very abstract conceptual knowledge to somewhat more concrete perceptually-based categories. Given the ambiguity of this term in common usage, we shall refer to categorization as the ability to treat as equivalent a group of stimuli that share similar qualities, yet may differ on a variety of irrelevant dimensions. This behavior is to be distinguished from the response to a set of stimuli that may be described by a single invariant and constant feature.¹

Recent studies of phonetic categorization have employed an operant head-turning paradigm developed by Wilson, Moore, and Thompson (1976) for infant audiologic assessment, and modified by Eilers, Wilson, and Moore (1977) for studies of infant speech perception. In this paradigm, the infant is presented with repeating tokens of a "background" stimulus (e.g., [ba]) that is periodically interrupted by a brief (e.g., 5 sec) presentation of a change stimulus (e.g., [ga]). The infant is reinforced (by viewing an animated toy animal) each time she produces a head turn in the direction of the loudspeaker when this change is detected. If the infant learns to respond to the change stimulus and to inhibit responses on control (no change) trials, then discrimination of the stimulus contrast is inferred. Kuhl's (1980) studies of perceptual constancy for phonetic segments have employed a modification of this paradigm.
In these studies, variability is imposed along several irrelevant dimensions (sex of speaker, intonation) while the relevant feature remains constant (e.g., the vowel). The ability of the infant to continue responding to the constant and relevant feature(s) of the change stimuli, despite variations in the irrelevant dimensions, is an index of her ability to categorize this information.

In the present studies, a variation of this approach was employed to investigate the ability of 7-month-old infants to categorize male and female voices, using a version of the head-turning paradigm similar to that employed by Kuhl. In Experiment I, a set of male voices and a set of female voices were selected to include a wide range of acoustic variability while maintaining their integrity as clear exemplars of the appropriate gender category. The use of stimuli that included this type of variability was designed to encourage responding to a complex of features that characterizes these voice categories rather than to some irrelevant acoustic cue that

¹ Certainly, perceptual categories can be (and often are) based upon the detection of a single, higher order invariant cue. The possibility that such invariants may exist for phonetic and/or voice categories is still an important question for investigators of infant (and adult) speech perception. Until further research can delineate the status of such invariants, however, we can for the present assume that categorical responding to these stimuli is based upon the integration of a complex of spectral qualities.
might have systematically covaried with one of the voice categories. However, rather than introducing this variability into the stimuli in successive stages of the experiment, as Kuhl does, in the present study this variability was present from the beginning of training and was included in all of the test stimuli. Thus, throughout the testing session, infants responded to the entire set of stimuli.

EXPERIMENT I

Method
Subjects. Twelve infants, approximately 7 months of age (range: 29-35 weeks, mean: 31.1 weeks), participated in Experiment I. These infants were solicited by contacting parents when the infant was approximately 6 months of age by a letter describing the research. Appointments were scheduled in follow-up phone calls when the infant was approximately 7 months. A total of 18 infants was tested to achieve the final sample of 12, which consisted of 6 females and 6 males. The remaining 6 infants were eliminated from the study for reasons of equipment failure/experimenter error (2), failure to train (3), or change of state (1).

Stimuli. The stimuli employed in Experiment I were the voices of 6 mothers and 6 fathers saying "hi" to their 7-month-old infants. The parents from whom voice tokens were solicited did not include the parents of any of the infants who participated in the experiment. Ten sets of parents (20 voices) agreed to have their voices recorded. All voices were originally recorded on a Uher 4400 stereo tape recorder (microphone M517). Recording continued until each speaker had uttered several tokens of the word "hi." Following transcription of these recordings, all intelligible tokens were digitized through a 5 kHz low-pass filter, matched for intensity, and stored on a Harris computer using the VOCAL program (Gillman & Wilson, 1979) of the Waisman Center Computing Facility of the University of Wisconsin. From the several tokens produced by these 20 voices, one token from each of 12 voices (6 female and 6 male) was selected for use in the study. Each of the stimuli was subjected to detailed acoustic analyses using the VOCAL program, which provided a display in 25.6 msec increments of the frequency, bandwidth, and relative intensity of each of the first four formants. In addition, the fundamental frequency (F0) and relative intensity at each 25.6 msec period were also estimated with this program.
Table 1 summarizes the peak and the mean F0 values for each of the voices employed in Experiment I. The variability in F0 which is evident both within and between the two voice categories served to verify the initial perception of these stimuli as possessing diversity in the subjective dimension of pitch. The results of acoustic analyses on the first and second formants are summarized in Table 2. These analyses are consistent with other observations of significant differences in formant frequencies between male and female voices (Fant, 1973; Peterson & Barney, 1952). All of these stimuli were presented to adult listeners for identification, and all elicited very clear judgments of category membership from these listeners, thus validating their use in the present study (cf. Miller, 1979, Figure 1, for these adult
TABLE 1
Peak Fundamental Frequency (F0) and Mean Fundamental Frequency (Computed Across Entire Stimulus) in Hertz for All Stimuli Employed in Experiments I-III.

               Experiment I        Experiment II       Experiment III
               F        M          R1       R2         HI       LO

Peak F0        226      186        226      355        337      226
               355      165        337      285        331      117
               285      331        330      333        260      133
               333      260        186      331        329      267
               337      222        165      260        348      335
               330      117        222      117        336      242

Mean (X)       311.0    213.5      244.3    280.2      323.5    220.0
               t = 2.69*           t = .78             t = 2.88*

Mean F0        180.7    123.4      180.7    256.2      246.5    180.7
               256.2    141.7      246.5    230.3      256.0    109.2
               230.3    256.0      262.2    221.9      217.1    124.5
               221.9    217.1      123.4    256.0      208.6    233.9
               246.5    163.2      141.7    217.1      258.4    246.2
               262.2    109.2      163.2    109.2      230.1    191.6

Mean (X)       233.0    168.4      186.3    215.1      236.1    181.0
               t = 2.47*           t = .91             t = 2.28*

*p < .05
data). The stimulus tape, sequenced and recorded using the VOCAL program, consisted of approximately one-half hour of randomly-ordered presentations of the six female voices on one channel synchronized with the onset of the six male voices on the other channel. An interstimulus interval (ISI) of 500 msec separated the stimuli on both channels.
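The category comparison reported for peak F0 in Experiment I can be checked with a short calculation. The following is a minimal sketch, assuming that the independent t-test was the conventional pooled-variance (Student) form; the source does not state which form was used, and the function name `pooled_t` is purely illustrative. The twelve values are the peak F0 figures listed for the female and male voices in Table 1.

```python
from math import sqrt
from statistics import mean, variance

# Peak F0 values (Hz) for the Experiment I voices, as listed in Table 1.
female_peak_f0 = [226, 355, 285, 333, 337, 330]
male_peak_f0 = [186, 165, 331, 260, 222, 117]


def pooled_t(a, b):
    """Student's independent-samples t with a pooled variance estimate.

    Assumption: this is the form of t-test behind Table 1; the paper
    only says "independent t-test".
    """
    na, nb = len(a), len(b)
    pooled_var = ((na - 1) * variance(a) + (nb - 1) * variance(b)) / (na + nb - 2)
    standard_error = sqrt(pooled_var * (1 / na + 1 / nb))
    return (mean(a) - mean(b)) / standard_error


peak_difference = mean(female_peak_f0) - mean(male_peak_f0)
t_peak = pooled_t(female_peak_f0, male_peak_f0)

print(f"peak F0 difference = {peak_difference:.1f} Hz")
print(f"t({len(female_peak_f0) + len(male_peak_f0) - 2}) = {t_peak:.2f}")
```

Under these assumptions the computed statistic falls within about 0.02 of the tabled value of 2.69, which suggests that the tabled comparisons are ordinary two-sample tests on the six voices per category.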
Apparatus. Infants were tested in an Audio-Suttle sound-attenuated chamber. The stimuli were presented to the infant from a Sony TC-756 stereo tape deck

TABLE 2
Results of Acoustic Analyses of the First Two Formants for Stimuli Employed in Experiments I-III. Values Represent Category Means and the Results of Independent t-test Comparisons Between Categories for Each Experiment.

                  Experiment I                 Experiment II                Experiment III
                  F         M         t        R1        R2        t        HI        LO        t

F1  Mean Freq     742.49    598.96    2.77*    676.87    664.58    .04      710.19    704.76    .11
    Mean BW¹      175.13    102.26    2.50*    125.08    152.31    -.75     166.59    143.43    .55

F2  Mean Freq     1944.33   1717.86   3.01*    1816.10   1846.08   -.29     1829.31   1941.76   -.92
    Mean BW       220.01    232.32    -.34     198.66    253.67    -1.70    222.70    186.28    1.13

*p < .05
¹BW = Bandwidth
through a Crown D60 amplifier and an Acoustics Research 2ax speaker located approximately 1.5 m from the infant. The background stimuli were presented at 50±2 dB (A) SPL. The channel of the tape containing the change stimuli was initially presented at 62±2 dB (A) SPL. A Hewlett-Packard attenuator, connected to this channel of the tape recorder, allowed the change stimuli to be presented at four different intensity levels (levels 1-4). At attenuation level 1, the change stimuli were approximately 12 dB higher than the background stimuli. The intensities of the change and background stimuli were equated when the change channel was presented at attenuation level 4. Intensity levels were measured by a General Radio Sound Level Meter (1551-C, microphone 1560-P5).

Procedure. Upon arrival at the laboratory, the purposes and procedures of the study were explained to the infant's parent(s), at which time they were requested to sign a consent form for the infant's participation. The infant and one of her parents were then escorted into the testing chamber. The design of the testing chamber is illustrated in Figure 1. The infant was seated on a parent's lap directly in front of a table that contained a variety of toys. A second experimenter (E2) was seated across the table facing the infant and slightly to the left. Also in front and to the right of the infant (approximately 45°) was a plexiglass cube (40 cm × 40 cm × 40 cm) covered with smoked acetate. Activation of an animated toy monkey inside the cube constituted the reinforcement for the infant. During activation of the monkey, the cube was illuminated. At all other times, the cube remained dark, and the monkey invisible to the infant. The stimulus speaker was located on top of the cube. The infant's behavior was monitored by an experimenter outside the chamber (E1) and E2 via a video camera that was connected to closed-circuit TV monitors. The basic format of the session consisted of a background of continuously
Figure 1. Schematized illustration of the testing chamber for Experiments I-III.
repeating tokens of one voice category (e.g., the six female voices) which was periodically interrupted by changing the channel of the tape recorder to present a series of tokens of the other voice category (e.g., male). The infant's task was to turn her head toward the loudspeaker when this change in voice category occurred. Throughout the session, E2 attempted to maintain the infant's attention at midline by entertaining her with toys. Half of the infants were trained to turn to female voices against a background of male voices (Group M/F) and the remainder were trained to turn to male voices against a background of female voices (Group F/M). Each session consisted of both a Training and a Test phase.
Training Phase. The Training phase began when the infant was judged to be in a quiet state and actively interested in viewing the toys. A trial (i.e., a change from background stimulation) was initiated by E1 when the infant was facing midline and engaged in viewing the toys. This consisted of changing the channel of the tape recorder to present the stimuli in the other voice category for a period of 5 seconds. If this channel change elicited a head turn from the infant, the cube was illuminated and the toy monkey activated. At the beginning of the Training phase of each session, the change stimuli were presented at an intensity 12 dB higher than the background stimuli. This intensity difference was decreased by 4 dB each time the infant produced two consecutive correct responses at a given level. Thus, the training session consisted of four training levels (level 1 = +12 dB, level 4 = 0 dB). At level 4, when the intensities of the change and background stimuli were matched, the infant was required to produce three consecutive correct responses before proceeding to the Test phase. If training did not progress as the intensity differences were reduced, the following criteria were employed. First, if two consecutive incorrect responses (i.e., no head turn) were produced at intensity level 1 (+12 dB), a shaping trial (automatic activation of reinforcer) was introduced. Shaping was employed only at level 1. Three consecutive failures to turn at any other level (i.e., levels 2-4) resulted in a return to the previous training level (e.g., three consecutive incorrect responses at level 3 would produce a return to level 2), after which the same criteria for proceeding were employed. Training was discontinued and the session terminated if the infant displayed a severe change of state or repeated failure to attain the training criteria at levels 1-3.
If, however, an infant consistently failed at only level 4 (i.e., equal intensities), that infant was included in the study and classified as a non-discriminator.

Test Phase. The Test phase ensued immediately upon attainment of the training criterion at level 4. The infant's parent and E2 had been fitted with headphones at the beginning of the Training portion of the session, and during the Test phase, these headphones delivered music at a comfortable listening level in order to mask the stimulus changes that the infant heard (E2 was alerted to the initiation of a trial by the lighting of a small LED, out of the infant's sight). As in the Training phase, E2 continued to entertain the infant with toys at midline. During the Test phase, a
gating device randomly presented both experimental (i.e., change in voice category) trials and control (i.e., no category change) trials. This randomization was constrained in two ways: (1) within every block of six trials, the probability of experimental and control trials was .5, and (2) a given trial type (i.e., experimental or control) occurred no more than three times in succession. Both E1 and E2 depressed a vote button to signal a head turn if the infant turned during a trial. During this Test phase, a response was recorded (for both experimental and control trials) only if both vote buttons were activated.² Delivery of reinforcement on experimental trials was also contingent upon the activation of both vote buttons, but was inhibited on all control trials, even if both experimenters judged a response to have occurred. Test trials continued until one of the following conditions was met: (1) the infant failed to respond to three consecutive experimental (change) trials, (2) the infant displayed an obvious and persistent change of state (i.e., sleepy or fussy), or (3) the infant had received approximately 40 test trials. Infants who completed the Training phase successfully were required to respond on at least six test trials (three experimental, three control) to be retained in the study.
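The two constraints on trial randomization can be illustrated with a small generator. This is a sketch under stated assumptions rather than a description of the gating device itself: the first constraint is read as requiring exactly three experimental and three control trials in each block of six, the run limit is applied across block boundaries, and the names `max_run_length` and `make_trial_sequence` are hypothetical.

```python
import random


def max_run_length(labels):
    """Length of the longest run of identical consecutive labels."""
    longest = run = 0
    previous = None
    for label in labels:
        run = run + 1 if label == previous else 1
        previous = label
        longest = max(longest, run)
    return longest


def make_trial_sequence(n_blocks, rng):
    """Build a list of 'E' (experimental) and 'C' (control) trial labels.

    Assumed reading of the constraints: each block of six holds three E
    and three C trials, and no trial type occurs more than three times
    in succession, including across block boundaries.
    """
    sequence = []
    for _ in range(n_blocks):
        while True:
            block = ["E"] * 3 + ["C"] * 3
            rng.shuffle(block)
            # Reshuffle the block until the run limit holds for the whole sequence.
            if max_run_length(sequence + block) <= 3:
                sequence.extend(block)
                break
    return sequence


# Seven blocks give 42 labels; the first 40 approximate one test session.
trials = make_trial_sequence(7, random.Random(1982))[:40]
print("".join(trials))
```

Rejection sampling keeps the sketch short; a block that satisfies the run limit always exists (for example, one that alternates and begins with the type opposite to the preceding trial), so the inner loop terminates.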
Results and Discussion
Acquisition. The training data from Experiment I may be seen in Table 3. As this table demonstrates, all of the subjects in this first experiment were able to complete the initial training phase, i.e., none was classified as a non-discriminator

TABLE 3
Mean Number of Trials Required to Complete Training and Number of Trials and Percent-Correct Responding at Each Training Level in Experiments I-III.

                              No. Trials at Training Level
               Mean Trials    1        2        3        4

Expt. I        15.75          6.58     3.17     2.50     3.50

Expt. II       29.83          4.42     5.10     7.92     12.50
               19.50*         4.67     4.34     4.00     6.50

Expt. III      18.08          8.35     3.42     2.42     3.83

*Training data from six subjects in Experiment II who completed training.
based on the criterion of three successive failures at level 4 (matched intensities). Table 3 summarizes the mean number of trials required to complete training as well as the number of trials required to reach criterion at each training level. In general, these data suggest that this discrimination was not a particularly difficult one since, on the average, infants required only 15.75 trials to complete training, only 6.75 trials greater than the minimum in which training could be completed (minimum number of trials = 9; i.e., 2 trials at levels 1-3 and 3 at level 4).

² Providing E2 with masking music over headphones effectively prevented her from experimenter bias in voting decisions. Since E1 was not similarly protected, the requirement that both experimenters agree on the outcome of a test trial precluded the possibility of bias influencing these decisions during the Test phase.
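The training criteria described in the Procedure, and the minimum of nine trials noted above, can be sketched as a simple state machine. This is an illustrative reading, not the laboratory's control logic: it assumes that three consecutive failures at level 4 classify the infant as a non-discriminator (whereas at levels 2 and 3 they return the infant to the previous level), and that a shaping trial is not itself counted as a response. The names `intensity_offset_db` and `run_training` are hypothetical.

```python
def intensity_offset_db(level):
    """Change-minus-background intensity at a training level (1 = +12 dB, 4 = 0 dB)."""
    return 12 - 4 * (level - 1)


def run_training(responses):
    """Apply the training criteria to a sequence of trial outcomes.

    `responses` holds True for a correct head turn and False for no turn.
    Returns (outcome, trials_used), where outcome is 'test',
    'non-discriminator', or 'incomplete' if the responses run out first.
    """
    level, correct_run, miss_run = 1, 0, 0
    for trial_number, correct in enumerate(responses, start=1):
        if correct:
            correct_run, miss_run = correct_run + 1, 0
        else:
            correct_run, miss_run = 0, miss_run + 1

        needed = 3 if level == 4 else 2  # three in a row only at matched intensity
        if correct_run == needed:
            if level == 4:
                return "test", trial_number
            level, correct_run, miss_run = level + 1, 0, 0
        elif level == 1 and miss_run == 2:
            miss_run = 0  # a shaping trial would be delivered here (assumed uncounted)
        elif level in (2, 3) and miss_run == 3:
            level, correct_run, miss_run = level - 1, 0, 0
        elif level == 4 and miss_run == 3:
            return "non-discriminator", trial_number
    return "incomplete", len(responses)


# An infant who never misses reaches the Test phase in the minimum number of trials.
print(run_training([True] * 9))
print([intensity_offset_db(level) for level in (1, 2, 3, 4)])
```

With errorless responding the sketch reaches the Test phase after 2 + 2 + 2 + 3 = 9 trials, which matches the minimum stated in the text.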
Test. The test data from Experiment I may be seen in Table 4. For analyses of these test data, a percent-head turn (%HT = #HT/#trials) measure was computed separately for experimental (E) and control (C) trials. A two-factor analysis of variance with a within-subject factor of Trial Type (E vs. C) and a between-subjects factor of Background Group (F/M vs. M/F) was performed on these data. Since these data were percentage data, analyses were performed on the raw data as well as on their arc sine transformations (Myers, 1972). Each analysis yielded a significant effect for Trial Type (raw data: F(1,10) = 279.8, p < .001; transformed data: F(1,10) = 116.13, p < .001). As can be seen in Table 4, this difference was characterized by a significantly greater rate of head turning to E trials than to C trials. Neither the main effect for Background Group nor its interaction with Trial Type was significant in either analysis, suggesting that discriminating males from females is no more or less difficult than discriminating females from males.

Thus the results of this study have demonstrated that 7-month-old infants can discriminate between the male and female voice stimuli employed in this study. The generality of this conclusion, i.e., the extent to which this represents the more general ability to categorize these voices, however, is limited in two ways. First, because the present study employed only six members of each category, it is possible that the subjects simply memorized all of the six change stimuli, rather than responding to the categorical quality of these voice stimuli. Since the task did not test for generalization to novel male and female voices, the present results cannot speak to this issue. Secondly, the results of this study do not make clear the role of specific acoustic cues in facilitating this discrimination. It is possible in the present study, for example, that infants were responding solely to a high-pitch vs.
low-pitch distinction, rather than to some more abstract qualities of these male and female

TABLE 4
Test Data for Individual Subjects in Experiments I-III. Percent-HT on Experimental (E) and Control (C) Trials and Total Percent-Correct Responding.

               Experiment I                  Experiment III
               Trial Type                    Trial Type
Subject        E       C       %-Cor.        E       C       %-Cor.

1              .88     .20     .84           .77     .38     .69
2              .82     .08     .86           .75     .38     .69
3              1.00    .00     1.00          .57     .50     .53
4              1.00    .00     1.00          .50     .43     .54
5              1.00    .30     .85           .60     .40     .60
6              1.00    .32     .84           .60     .31     .64
7              .80     .15     .82           .73     .75     .48
8              .95     .24     .85           .50     .36     .56
9              .91     .14     .89           .33     .67     .33
10             1.00    .14     .93           .90     .25     .89
11             1.00    .10     .95           .50     .26     .62
12             .57     .11     .75           .54     .50     .52

Mean           .91     .15     .88           .60     .40     .60

               Experiment II (six subjects who completed training)
               Trial Type
               E       C       %-Cor.

               .67     .57     .54
               .77     .86     .47
               .67     .50     .58
               .50     .62     .44
               .57     .71     .43
               .53     .50     .51

Mean           .63     .58     .50
voices. Although the stimuli in both categories included a wide range of fundamental frequencies, the overlap between the two categories was not large and the difference between the mean F0 in each category was quite large (peak difference = 97.5 Hz, mean difference = 64.54 Hz). Since pitch is the primary cue that serves to identify male and female voices, it would not be surprising to find it to be an important component in the infant's discrimination. However, since F0 is not the only acoustic variable that differentiates these two classes of voices, it would be less interesting to discover that it was the only cue to which infants were responding.

These two limitations to generality were addressed in Experiments II and III. In Experiment II, the possibility that infants were memorizing the six change stimuli was tested by assessing the extent to which infants can memorize a random assortment of these voices. This consisted of presenting infants with the same 12 voice stimuli employed in Experiment I (six background, six change voices), except that the stimuli were regrouped to form "random categories" of three male and three female voices each. The same head-turning task was employed. If infants were successful in learning this task, it would challenge our notion that infants in Experiment I were responding to category information in these voices. If, in contrast, infants could not learn the task, it would suggest that the ability to respond to groupings of these voices is contingent upon some specific relationship among the stimuli, i.e., similar category membership. Thus, in the present study, if infants are not successful, it would support the notion that infants' responses in Experiment I were integrative in nature.

EXPERIMENT II
Method
Subjects. Twelve additional infants (age range: 27-32 weeks, mean: 29.5 weeks) were employed as subjects in the present study. There were five females and seven males. Fifteen infants were tested to achieve the final sample of 12. Three infants were eliminated because of an inability to learn the head-turning response during the beginning stages of training.

Stimuli and Apparatus. The stimuli were the same 12 voices employed in Experiment I. These 12 stimuli were randomly reassigned to one of two "random categories" (R1 or R2). The one constraint on randomization was that each "category" contain three female and three male voices. Acoustic descriptions of these stimuli may be seen in Tables 1 and 2, which reveal no significant differences between the two groups in any of the parameters measured. The stimuli within each category were randomly ordered and recorded on separate channels of the stimulus tape (i.e., Category R1 was recorded on the left channel and Category R2 was recorded on the right) using the VOCAL program (Gillman & Wilson, 1979). The apparatus was identical to that employed in Experiment I.

Procedure. Two groups of six infants each were tested with the same head-turning procedure employed in Experiment I. Half of the infants (Group R1/R2)
were trained to turn their heads to a change in one group of six voices (Category R2), and the remaining six infants (Group R2/R1) were tested on the reverse discrimination (i.e., background = Category R2, change = Category R1). The general procedures of the Training and Test phases were identical to those of Experiment I.

Results and Discussion
Acquisition. Table 3 summarizes the training data for subjects in Experiment II, together with the comparison data from Experiment I. As this table illustrates, of the 12 subjects, only six were able to complete the Training phase and continue to the Test phase. The remaining six subjects failed the Training phase by exhibiting three successive failures at training level 4. A chi-square computed on the difference between the number of subjects in the two experiments who passed the Training phase was significant, χ² = 5.54, df = 1, p < .02, indicating that the manner in which these voices were organized (i.e., Experiment I vs. Experiment II) had a significant effect on subjects' ability to learn this task. This difference in training performance was not evident in finer analyses of the training data (i.e., the number of trials to reach criterion), however. Although Table 3 demonstrates the number of training trials in Experiment II to be almost double that of Experiment I (i.e., 29.83 vs. 15.75 trials), it must be remembered that this figure includes the data from the six subjects who failed to complete training. Since the criterion for defining a failure to train was somewhat arbitrary, statistical comparisons included only the data from the subjects who were successful during the Training phase. An independent t-test comparing these six subjects to the subjects in Experiment I demonstrated no difference between the 19.50 trials in Experiment II and the 15.75 trials in Experiment I, t < 1. Thus, in terms of this measure, the acquisition of these two discriminations appears not to differ.

Test. The test data from the six subjects who completed training in Experiment II are presented in Table 4, together with the test data of Experiment I. The results of the test data in Experiment II are quite clear: none of the six infants who completed the Training phase performed successfully during the Test phase.
As can be seen in the total percent-correct scores, all subjects appear to be responding at chance level (i.e., approximately 50%). Support for this observation was revealed by an analysis of variance on the rate of head-turning measures (for E and C trials) with factors of Trial Type (E vs. C) and Background Group (R1/R2 vs. R2/R1), which revealed no significant main effects or interactions. The absence of a main effect for Trial Type in this analysis confirms the observation in Table 4 that infants were producing head turns equally often to both E and C trials. In addition, an independent t-test, comparing the percent-correct scores of Experiments I and II, confirmed that the level of performance in Experiment II was significantly poorer than that in Experiment I, t(16) = 10.56, p < .01. Thus, both of these measures (%-correct and %HT on E and C trials) indicate that, despite apparent success during training, these infants were clearly not responding discriminatively to these stimuli, and thus had not learned the task.
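The chi-square reported for the Acquisition data can be approximated from the pass/fail counts given in the text (12 of 12 infants completing training in Experiment I, 6 of 12 in Experiment II). The following is a minimal sketch that assumes a 2 × 2 test with Yates' continuity correction; the source does not say whether a correction was applied, and both function names are illustrative.

```python
def chi_square_2x2(a, b, c, d):
    """Uncorrected Pearson chi-square for the 2x2 table [[a, b], [c, d]]."""
    n = a + b + c + d
    return n * (a * d - b * c) ** 2 / ((a + b) * (c + d) * (a + c) * (b + d))


def yates_chi_square_2x2(a, b, c, d):
    """Chi-square with Yates' continuity correction for [[a, b], [c, d]]."""
    n = a + b + c + d
    corrected = max(abs(a * d - b * c) - n / 2, 0)
    return n * corrected ** 2 / ((a + b) * (c + d) * (a + c) * (b + d))


# Rows: Experiment I, Experiment II. Columns: completed training, failed training.
chi2_uncorrected = chi_square_2x2(12, 0, 6, 6)
chi2_corrected = yates_chi_square_2x2(12, 0, 6, 6)

print(f"uncorrected chi-square = {chi2_uncorrected:.2f}")
print(f"Yates-corrected chi-square = {chi2_corrected:.2f}")
```

The corrected statistic comes out within about 0.02 of the reported 5.54, whereas the uncorrected value is considerably larger, so a continuity-corrected test appears to be the more likely source of the published figure; the small remaining gap may reflect rounding in the original computation.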
The results of Experiment II are very clear: infants did not learn to respond to a random assortment of 12 voices. None of the infants in the present study were able to learn to turn their heads consistently to the appropriate change stimuli or to inhibit head turns on control trials. Thus, the possibility that the infants in Experiment I were responding to a memorized set of stimuli, rather than to category information, appears to be relatively unlikely. Rather, these results, together with the results of Experiment I, suggest that infants can learn to respond to a set of natural voices only when those voices are united by some common spectral feature(s), e.g., male vs. female characteristics. When these stimuli possess no obvious acoustic or perceptual similarities, infants do not learn the task. Thus, the suggestion in Experiment I that infants were responding to these male and female voices as a group rather than individually is supported by the present study.³ Whether the infants may have been responding to a configuration of male vs. female spectral cues or only to high-pitch vs. low-pitch characteristics remains to be explored.
In Experiment III, the extent to which pitch may have been used as a cue for male/female categorization was explored by assigning 12 voice stimuli to one of two pitch "categories." These categories were similar to the random categories in Experiment II in that each category contained three male and three female voices. However, in Experiment III, the voices within each category were selected such that the difference in the mean F0 between the two categories approximated the male/female F0 difference in Experiment I. Thus, Experiment III was designed to determine if infants can learn to ignore random variations in other cues that may signal sex of speaker and respond to groups of voices solely on the basis of their pitch characteristics.
EXPERIMENT III
Method
Subjects. Twelve 7-month-old infants, 8 girls and 4 boys (age range: 29-32 weeks, mean: 30.3 weeks), participated in Experiment III. Six additional subjects were tested: one was eliminated due to experimenter error, and five were not able to reach criterion on training level 1.
Stimuli and Apparatus. The stimuli employed in Experiment III were 12 natural voice tokens (six male and six female) consisting of five of the voices previously employed in Experiments I and II and seven new tokens. The 12 voices employed in this study were selected from among 24 similar stimuli employed by Miller (1979) in a previous study of infant voice perception. These 24 stimuli represented sufficient variability in F0 that it was possible to select three "high-pitched" and three "low-pitched" voices from each gender category such that the mean difference in fundamental frequency between the two "pitch" categories (Hi vs. Lo) was similar to the mean F0 difference between the male and female categories employed in Experiment I. This relationship may be seen in Table 1. The stimuli within each category were ordered randomly and recorded on separate channels of audio tape using the VOCAL program (Gillman & Wilson, 1979). The apparatus was similar to that employed in Experiments I and II.
³ Acknowledging the difficulty of interpreting negative discrimination results with respect to the null hypothesis, it is important to note that a similar version of this "random experiment" has been employed by others with essentially identical results (cf. Kuhl, 1980).
Procedure. The procedures of the Training and Test phases of each session were identical to those of Experiments I and II. The 12 infants were divided into two groups of six each. One group was trained to turn to the set of low-pitched voices against a background of high-pitched voices (Group Hi/Lo), and the other group was trained on the reverse discrimination (Group Lo/Hi).
Results and Discussion
Acquisition. The training data of Experiment III appear in Table 3, along with the training data from Experiments I and II. As in Experiment I, and in contrast to Experiment II, all subjects in the present experiment were able to complete the Training phase. A simple analysis of variance performed on the training data from the three experiments indicated no significant difference in the number of trials required to reach the training criterion, F < 1.
Test. The test data from individual subjects may be seen in Table 4. In contrast to the training data, the overall level of performance in these test data is quite low, much lower than that observed in Experiment I. Indeed, most of the subjects are responding at or around chance level, and only one subject is responding at greater than 75% correct (i.e., the lowest level of performance seen in Experiment I). Independent t-test comparisons among the three experiments confirmed the observation of a generally lower level of test performance in Experiment III relative to Experiment I (raw data: t(22) = 6.44, p < .001; transformed data: t(22) = 5.81, p < .001), and revealed no difference between test performance in Experiments II and III. Thus, although there appears to be little difference in the ease of acquisition among Experiments I, II, and III, the test data show very clearly that discriminative responding to pitch "categories" is inferior to male/female responding and is in fact comparable to infants' responses to random collections of stimuli. These data may not indicate a total failure of discrimination, however. Examination of Table 4 reveals that at least one subject in the present study demonstrated good discrimination during the test phase (S10 = 89% correct). The test performance of two additional infants (S1 and S2) was also relatively high (69% correct), but fell just short of significance (z = 1.94 and 1.52, respectively).
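One common way to evaluate an individual infant's percent-correct score against chance is the normal approximation to the binomial, sketched below. This section does not state which test produced the reported z values or how many test trials each infant received, so the function name, the trial counts, and the choice of test are all illustrative assumptions and not the authors' procedure.

```python
import math


def binomial_z(correct, total, chance=0.5):
    """z score for `correct` successes in `total` trials against a chance rate."""
    expected = total * chance
    sd = math.sqrt(total * chance * (1 - chance))
    return (correct - expected) / sd


# Hypothetical counts for illustration only (not taken from Table 4).
z = binomial_z(correct=15, total=20)
# Conventional two-tailed .05 criterion for a z score.
significant = abs(z) > 1.96
print(round(z, 2), significant)
```

On this reading, a z of 1.94 falls just below the conventional two-tailed .05 cutoff of 1.96, which is consistent with the description of S1 and S2 as falling just short of significance.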
In addition, an analysis of variance similar to those performed in Experiments I and II indicated that, as a group, head-turning to E trials was significantly greater than that exhibited to C trials (raw data: F(1,5) = 6.87, p < .05; transformed data: F(1,5) = 7.42, p < .05). This significant difference probably can be attributed primarily to the performance of S10, however, since when her data are removed from the analysis, the difference between E and C responding is no longer significant. Thus, although there is evidence that at least one subject did learn this pitch categorization task, the majority of subjects in the present study did not respond discriminatively during the test phase, thereby indicating a lack of discrimination.
GENERAL DISCUSSION
Interest in, and research on, the nature of the infant's perception of voices have not been extensive. The few studies that exist have suggested that infants can discriminate between two different voices (Kaplan, 1969; Turnure, 1971) and between different tokens of the same voice (Culp & Boyd, 1975; Culp & Gallas, 1975). The present set of studies has extended this literature by investigating infants' response to voice categories. In Experiment I, it was suggested that infants can categorize male and female voice tokens, and this conclusion was supported in Experiment II. Experiment III was an attempt to begin to identify some of the cues that might contribute to this ability. With respect to the ability of fundamental frequency to serve as a cue for categorization, the results were somewhat ambiguous. Discriminative responding in this situation was significantly poorer than that in Experiment I (male vs. female) and apparently no better than the chance responding to the randomly organized "categories" in Experiment II. However, the finding in Experiment III that head-turning to experimental trials significantly exceeded head-turning to control trials does provide somewhat tentative evidence of discrimination in this situation. These data suggest that, although F0 may contribute to infant discrimination of male and female voice categories, infants apparently did not group voices based solely upon this cue.
Although fundamental frequency may be the primary and/or most salient cue that differentiates male and female voices, there are many other cues that contribute to this difference as well. Formant frequencies, for example, tend to be on the order of 20% higher in a female voice than in a male voice (Fant, 1973; Peterson & Barney, 1952).
The acoustic analyses displayed in Table 2 confirm the existence of significant male/female differences in both F1 and F2, differences that do not exist between either the R1/R2 categories or the Hi/Lo categories. Thus, these acoustic analyses support the argument that infants in Experiment I were basing their responses on a complex of cues, rather than only on F0. In accordance with our original definition of categorization, then, we can say that infants were responding in a categorical way to these male and female voices. The generality of this conclusion is, however, limited in that we cannot state that infants have demonstrated abstract knowledge of "maleness" and "femaleness" in these voices. It is not clear how this more abstract knowledge might be demonstrated in young, preverbal infants; however, experiments in which generalization to novel male and female voices is measured would certainly add to the generality of the present results.
In a recent study of the development of voice categorization, Miller (in preparation) has addressed the question of generalization more directly. In this study, following habituation to one group of, e.g., female voices, infants were presented with either novel female voices or with male voices. Although 2-month-old infants demonstrated discrimination (dishabituation) of both stimulus changes, 6-month-olds dishabituated only to the change in category.
Dishabituation was not observed when novel voices of the same gender category were introduced. These results suggested that 6-month-olds had habituated to some feature(s) of these voice categories beyond the specific spectral qualities of the habituation stimuli. Thus, these results are consistent with the notion that the 7-month-olds in the present study were responding to male vs. female voice categories.
The demonstration that infants can classify male and female voices has a number of implications for the infant's development. As indicated earlier, sensitivity to speaker information is an important component of speech perception in adults. For adults, accuracy in language reception depends on both the ability to ignore speaker information and the ability to attend to speaker information when it is task-relevant. The present results, together with those of Kuhl (1980), suggest that by 7 months of age, infants are developing both of these skills, and are becoming more competent language receivers. In addition, these results have suggested that by 7 months of age, infants have learned to organize complex acoustic information into a system of classification, thus adding to a rapidly growing catalog of similar skills that infants are beginning to demonstrate (e.g., Cohen, 1977; Kuhl, 1980).
Remaining for future research are questions concerning the specific psychoacoustic underpinnings of voice classification as well as studies of the development of this ability. Although the recent study by Miller (in preparation) has demonstrated that developmental changes in voice perception do occur between 2 and 6 months of age, we do not know how this development happens. One hypothesis with respect to this development is that the infant's perception of her parents' voices as category prototypes combines with extensive experience with other male and female voices in the formation of these categories.⁴
A final word of methodological interest merits some comment.
It is becoming increasingly apparent from the present studies and others (e.g., Miller, 1979, Experiment II; Kuhl, 1980; Eilers, Morse, & Gavin, personal communication) that this head-turning paradigm may be particularly valuable for investigating differential discriminability among various stimulus contrasts. Interestingly, it appears that in general, these differences will be reflected in measures of how well the discrimination was learned (i.e., test performance) rather than in how difficult it was to learn (i.e., length of training). The finding in the present studies that test data are more sensitive to differential discriminability than training data has now been replicated several times by different investigators (Eilers, Morse, & Gavin, personal communication; Kuhl, 1980; Miller, 1979). Although it is not clear why this difference in sensitivity might exist, more extensive investigation into its nature might yield valuable information about the infant's information-processing strategies.
To summarize, the present set of studies has shown that 7-month-old infants can categorize male and female voices and has suggested that one of the cues that might be facilitating this ability is fundamental frequency. Since F0 is the primary cue that governs adult perception of these voices (Coleman, 1973), this is not particularly surprising. However, discrimination of F0 did not appear to account fully for the ability to classify male and female voices in the present study. Thus, it is probable that the response to these male and female voice categories was based upon the integration of a variety of cues.
⁴ The interested reader is referred to Miller (1979, Experiment II) for one attempt to address the question of the categorization of parents' voices.
REFERENCES
Cohen, L. B. Concept acquisition in the human infant. Paper presented at the Biennial meetings of the Society for Research in Child Development, New Orleans, 1977.
Coleman, R. O. A comparison of the contributions of two vocal characteristics to the perception of maleness and femaleness in the voice. Speech Transmission Laboratory, Quarterly Progress Status Report, 2-3, Royal Institute of Technology, Stockholm, Sweden, 1973.
Culp, R. E., & Boyd, E. F. Visual fixation and the effect of voice quality and content differences in 2-month-old infants. In F. D. Horowitz (Ed.), Visual attention, auditory stimulation, and language discrimination in young infants, Monographs of the Society for Research in Child Development, 1975, 39, 78-91.
Culp, R. E., & Gallas, H. G. Discrimination of male voice quality by 8- and 9-week-old infants. Paper presented at the Biennial meetings of the Society for Research in Child Development, Denver, 1975.
Eilers, R. W., Wilson, W. R., & Moore, J. M. Developmental changes in speech discrimination in three-, six-, and twelve-month-old infants. Journal of Speech and Hearing Research, 1977, 20, 766-780.
Eimas, P. D. Auditory and linguistic processing of cues for place of articulation by infants. Perception and Psychophysics, 1974, 16, 513-521.
Eimas, P. D., Siqueland, E. R., Jusczyk, P., & Vigorito, J. Speech perception in infants. Science, 1971, 171, 303-306.
Fant, G. Speech sounds and features. Cambridge, MA: MIT Press, 1973.
Gillman, C., & Wilson, D. Unified system for the synthesis, presentation, and analysis of speech. Paper presented to the National Conference on the Use of On-Line Computers in Psychology, November 1979.
Kaplan, E. L. The role of intonation in the acquisition of language. Unpublished Ph.D. dissertation, Cornell University, Ithaca, NY, 1969.
Kuhl, P. K. Perceptual constancy for speech-sound categories. In G. Yeni-Komshian, J. Kavanagh, & C. Ferguson (Eds.), Child phonology: Perception and production. New York: Academic Press, 1980.
Kuhl, P. K. Speech perception in early infancy: Perceptual constancy for spectrally dissimilar vowel categories. Journal of the Acoustical Society of America, 1979, 66, 1668-1679.
Kuhl, P. K., & Hillenbrand, J. Perceptual constancy for categories based on pitch contour. Paper presented at the Biennial meetings of the Society for Research in Child Development, San Francisco, 1979.
Miller, C. L. Voice recognition and categorization in infants. Research Status Report, Infant Development Laboratory, University of Wisconsin-Madison, 1979, 3, 95-178.
Miller, C. L., & Morse, P. A. The "heart" of categorical speech discrimination in young infants. Journal of Speech and Hearing Research, 1976, 19, 578-589.
Miller, J. Phonetic determination of infant speech perception. Unpublished Ph.D. dissertation, University of Minnesota, Minneapolis, MN, 1974.
Morse, P. A. The discrimination of speech and non-speech stimuli in early infancy. Journal of Experimental Child Psychology, 1972, 14, 477-492.
Myers, J. L. Fundamentals of experimental design. Boston, MA: Allyn and Bacon, 1972.
Peterson, G. E., & Barney, H. L. Control methods used in a study of vowels. Journal of the Acoustical Society of America, 1952, 24, 175-184.
Rand, T. C. Vocal tract size normalization in the perception of stop consonants. Paper presented at the meetings of the Acoustical Society of America, Washington, DC, 1971.
Sullivan, J. W. The effects of intonation on infant attention. Unpublished Ph.D. dissertation, University of Kansas, Lawrence, KS, 1980.
Swoboda, P., Kass, J., Morse, P., & Leavitt, L. Memory factors in infant vowel discrimination of normal and at-risk infants. Child Development, 1978, 49, 332-339.
Swoboda, P., Morse, P., & Leavitt, L. Continuous vowel discrimination in normal and at-risk infants. Child Development, 1976, 47, 459-465.
Trehub, S. Infants' sensitivity to vowel and tonal contrasts. Developmental Psychology, 1973, 9, 81-96.
Turnure, C. Response to voice of mother and stranger by babies in the first year. Developmental Psychology, 1971, 4, 182-190.
Wilson, W., Moore, J., & Thompson, G. Sound-field auditory thresholds of infants utilizing Visual Reinforcement Audiometry (VRA). Paper presented at the Annual Convention of the American Speech and Hearing Association, Houston, 1976.