Gender Differences in Long-Term Average Spectra of Children’s Singing Voices

Desmond C. Sergeant and Graham Frederick Welch
London, United Kingdom

Summary: This paper forms part of a larger study into the nature of singing development in children and examines gender differences in long-term average spectra (LTAS). Three hundred and twenty children in age groups 4–11 years learned a song and were then recorded singing alone. LTAS curves were calculated for each voice. Age of each singer was estimated and gender attributed by a panel of independent listeners. Rate of gender identification (71%) was consonant with that reported for children’s speech. Progressive statistically significant shifts of spectral energy as a function of increasing age (reported in a previous study) were found to be present in the data for both genders, but the developmental timetable over which the changes took place was earlier for girls than for boys. A theoretical basis for the developmental changes is proposed.

Key Words: LTAS–Age–Singing–Children–Gender–Developmental changes.

INTRODUCTION

In a recent paper, the authors presented data of long-term average spectra (LTAS) curves computed from the vocal products of a large cohort of children in the age range 4–11 years who each sang a song.1 The data showed statistically significant shifts in spectral energies as a function of increasing age: energy levels at frequencies above 5.75 kHz decreased between the ages of 4 and 11 years, while those at frequencies below 5.75 kHz increased. The shifts were evident as a trend in the data across all of the age range sampled and were particularly operative in children between the ages of 6 and 11 years. The age-related differences were attributed to increasing maturity and competence of the vocal system, including changes in patterns of glottal closure.
In view of the unlikelihood that energies at very high frequencies would possess harmonicity,96,97 those above 5.75 kHz were assumed predominantly to comprise noise.

The present paper investigates three issues by further analysis of the same data bank: (1) to discover whether the energy shifts previously reported would be found to be present equally in the spectra for both genders; (2) should any between-gender differences in the shifts be observed, to examine possible relationships between such differences and patterns of gender identification judgments made by listeners; and (3) to attempt a fuller understanding of the auditory criteria against which listeners determine the gender of a speaker or singer and, in particular, to distinguish between factors that might be regarded as acoustical/spectral from those of sociophonological/behavioral origin, that is, between those features described by Laver2 as intrinsic (derived from invariant physical foundations of the speaker’s vocal apparatus) and those that are extrinsic (aspects of vocal activity that are under volitional control of the speaker for purposes of communication, whether consciously or not).

Accepted for publication October 11, 2007.
From the School of Arts and Humanities, Institute of Education, London, United Kingdom.
Address correspondence and reprint requests to Graham Frederick Welch, Department of Arts and Humanities, Institute of Education, 20 Bedford Way, London WC1H 0AL, UK. E-mail: [email protected]
Journal of Voice, Vol. 23, No. 3, pp. 319-336
0892-1997/$36.00
© 2009 The Voice Foundation
doi:10.1016/j.jvoice.2007.10.010
Fundamental frequencies

The presence of gender cues in the speech products of children, sufficient to enable listeners to reliably attribute gender, has been demonstrated in a number of studies. Rates of recognition reported for adults have been as high as 100,3,4 95,5 and 81%6; in this situation, the characteristic octave pitch difference is a salient and probably determining cue.7–9 Rates for children have generally been lower, although they have been at better-than-chance levels. Sachs et al6 report 81% accuracy; Weinberg and Bennett10 found 78% for boys and 71% for girls; Meditch11 74% for children as young as 3 years and 74% for children of 5–6 years of age; and Karlsson12 66% for 3–8-year-old children from three different European language contexts. Inconsistencies in the reported rates are likely to be attributable to differences in age groups sampled, differing sample activities, for example, connected speech or vowel derivatives,13 and differences in experimental situation.14

A number of studies have proposed the possibility of differences in speaker fundamental frequency (SFF) as an obvious principal criterion against which listeners attribute speaker gender. Although intuition might support such a conclusion, alongside evidence from at least one study,15 the collective evidence in respect of prepubertal children is overwhelmingly negative to the proposition.16–24 Where studies have reported intergender differences, these have usually proved to be small in magnitude, and conflicting in direction, with boys sometimes yielding lower SFFs than girls,24,25 sometimes higher.6 In those rare cases where data of variance among speakers have been provided, standard deviations for perceived gender groups have often considerably exceeded the reported frequency differences between their respective means. Generally, fundamental frequency (F0) has been reported as a poor predictor of male/female voice identity.
Studies of vocal output among transsexuals have demonstrated that voice pitch alone cannot be a determining criterion for gender perception, and that achievement of high average F0 does not guarantee perception of femininity.26–28 Perhaps most damaging to the proposition of a systematic relationship between F0 and accurate gender perception is the evidence of Lass et al29 that listeners are able to attribute gender accurately even when F0 is filtered out from the signal.

Intonation

Gender-specific patterns of intonation have also received attention from researchers as a possible criterion cue for listeners. There is good evidence that males and females do use intonation differentially, and that listeners will allocate voice samples to categories of femininity/masculinity with reliable consistency.15,30–37 Differences have been found between genders in percentages of rises and falls of F0, with female speech showing more numerous changes of inflection,35 and higher rates of change38 than for males. By contrast, male speech has been found to show a greater number of periods of uninflected pitch, and these have been of longer duration than for girls. Male and female speakers show clear differences in preferences for avoidance of some intonational patterns.39 Ferrand and Bloom,37 for example, report that at age of 7–8 years males begin to restrict intonational ranges in speech, their participants showing significant boy/girl differences (P = 0.003) in the number and extension of flat periods. Similarly, Key36 noted from children’s reading of a colorful story that boys showed reduced use of expressive intonation compared with girls.

There are several important factors, however, that differentiate the acoustic phenomena of running speech or reading and those of singing, especially in respect of intonation. In song, the average F0 of the vocal product is determined normally by the rise and fall of the melodic line; it is not open to autonomous control by the singers, and, given relative accuracy of vocal pitch, is therefore not influenced by personal habitual behaviors of their speech intonation. The degree of freedom to vary pitch is limited to that moment before the start of the performance, when the unaccompanied singer determines the pitch level at which to perform the song.
Once this is decided, if the performance is competent, the pitches of subsequent sounds of the product will follow in appropriate relation to the starting note. There is thus no measure that can be applied to output of singing that is an equivalent of the SFF of speech; a measure of F0s in song will yield only values for the relative presence or absence of individual degrees of the scale in the melody.

These differences between speech and song encompass further important factors. The role of intonation in human communication is to empower the speaker to give emphasis to essential elements of the message. In speech, loudness and pitch are typically interdependent: a rise in pitch is usually associated with an increase of emphasis and, therefore, is also associated with an increase of loudness.40,41 In singing, these are two separate phonatory parameters42 under independent control as, for example, when a diminuendo occurs on a rising passage. In singing, emphasis is created by temporal features of the song, that is, from the relative durations of sounds in the rhythmic structure of the music and also from its metrical framework through which, by placement of stressed syllables at accented points of the music’s metrical frame, the composer contrives to make the words meaningful through the carrier medium of the music. As
Nessel90 observed, if vowels are prolonged or sustained, a systematic increase of energy will occur. As a consequence of the above factors, there are substantial differences in the demands of subglottal pressure control in speech and singing.42,43 As Wendler et al44 comment, the physiological correlates of singing are quite possibly different phenomena than those of speaking, though the degree of difference may be related to the style of singing. A corollary of this is that LTAS calculated for song cannot be expected to be directly comparable to those derived from speech, even if the two are products of the same individual.

A further factor that qualitatively separates the spectra of song from those of speech arises from the word patterns in song. In free speech, the speaker chooses words only on criteria of their appropriateness to the meaning to be communicated; in the case of the reading of a written passage, they are determined a priori by the writer on similar criteria. In song, word usage is subject to greater constraints. For example, because of the characteristic rhyming of final words of the lines, or other regular points of the lyrics of a song, some vowels will recur in a way that would be uncharacteristic of normal speech. Also, a skilled composer of vocal music may avoid using “difficult” vowels, especially “back” vowels on higher pitches, in recognition of the difficulties these can create for the singer for consistent communication of word meaning. The range of vowels and consonants in a sample of song will, therefore, typically be narrower than for speech.

Formant frequencies

A third candidate proposed in several studies as the determinant criterion for gender attribution is the possibility of gender-related differences in formant frequencies (Ffrs).
Here, the evidence is more positive, with several studies reporting higher values for females than for males.15,19,45,46 This is unsurprising in adults, where physical dimorphisms include vocal tract (VT) lengths and area ratios; these tend to be greater for males and thus resonances can be expected to be lower.47 In children, reported data of VT dimensions have been contradictory and reliant on circumstantial evidence, such as inferences drawn from facial dimensions, rather than on direct measurement. Where differences have been reported, they appear to be small.48– 52 However, there is a substantial body of evidence indicating that body size is not a relevant factor, especially in children.53–59 Merow and Broadbent60 report that males and females remain comparable in size and shape until 9–12 years, with boys having only slightly larger oral cavities up until that time. Perry et al20 measured children at ages 4, 8, 12, and 16 years for height, weight, sitting height, and neck circumference, finding a correlation with neck circumference and Ffr values at age 12 only, but for no other factors at any other ages. Difficulties of physical measurement that have resulted in approximations rather than absolute measurements have been addressed by the use of magnetic resonance imaging (MRI). Using this method, Yang and Kasuya61 measured VT dimension and found that data for VT length were continuously distributed across genders and that their scaling did not reflect the evidence of articulatory phonetics. An earlier study by
Tecumseh Fitch and Giedd62 reached similar conclusions, that is, though there are clear differences in VT morphology between adult men and women, these sex differences are not evident in children, but arise after the onset of the period of puberty.

A further difficulty with a proposition that differences in Ffrs are a direct outcome of differences in VT lengths and shapes is that the inverse of the proposition, that is, differences in Ffr values necessarily imply corresponding differences in VT length, is only true in part. This may lead to a conclusion that differences of VT length may also have behavioral and physical features, originating in gestural differences32,59,63–66 in the way that some speakers may manipulate speech output to portray certain images.67 Bennett and Weinberg,17 Bennett,16 and Sundberg68 point out that we can change the frequency of the lower two formants by two octaves or more through changing the position of the articulators. Sachs et al,6 for example, have posited an alternative possibility that boys may habitually use smaller jaw opening, more rounding of lips, lower larynx positions, or head elevations, resulting in relatively small extensions of the VT. Other writers also argue that formants can be greatly modified by changing the size and shape of the laryngeal tube, and that this may explain weak relationships observed between body size and Ffrs.55,56,59,69 For a proposition that Ffrs stem from gender-specific behavioral differences to be viable, however, the relevant behaviors responsible for these physical adjustments would have to be near universal, extending across many cultural and language-speaker groups.
This is not impossible, since, although there are differences between languages and cultures, crosslinguistic research suggests that there are also some universals of speech behaviors: for example, adult speech addressed to infants has been found to show higher overall pitches, wider pitch excursions, more distinctive pitch contours, slower tempi, and longer pauses. These are evidenced across a number of European languages, as well as Japanese and Mandarin Chinese.70,163,164 There is some agreement that, while there is limited evidence of physical gender differences,54,61,70,71 the male/female differences of vocal phenomena are greater than could be explained by differences in VT size; as Trauenmuller72 comments, ‘‘It is well understood that all Ffr of vowels vary between speakers in rough proportion with the inverse of VT length—however, the male/female differences observed in Ffr of the same vowels cannot be accounted for by uniform scaling factors.’’ There appears to be general acceptance that anatomical differences between males and females only explain part of the differences in Ffrs. Nevertheless, there appears to be a convergence of evidence in the literature that (1) Ffrs decrease with increasing age during childhood, converging with adult values at around 16 years of age and (2) during the period of preadult developmental change, girls consistently evidence higher Ffrs values than do boys, though this is highly dependent on the formant measured, speech-sample content, vowel type, and age group.19,47,73–76
Few reports provide the reader with data that can be evaluated fully, but the comprehensive tables of Ffr values provided by Huber et al77 are worthy of examination. In their study, a substantial and well-balanced cohort of subjects was sampled, with 10 male and 10 female speakers from age groups at 4, 6, 8, 10, 12, 14, 16, 18 years and adults. Vowel samples were collected at three vocal intensity levels: low, comfortable, and high. Data provided by the authors for the “comfortable effort” condition are graphed logarithmically over age groups in Figure 1. These confirm the evidence typically reported in other studies (cited above), that is, that frequencies for the first three formants decrease with age, and that there is a tendency for girls to yield higher values than boys of comparable age. However, as the authors note, the between-gender differences of mean Ffrs are small. For example, the authors report a mean gender difference of 1 Hz for the first formant (F1) at age 8, and 3 Hz for age 10. These intergender pitch distances are greatly overlapped by the standard deviations quoted for their respective distributions: for F1 (boys) = 82 Hz, F1 (girls) = 97 Hz and for F2 (boys) = 94 Hz, F2 (girls) = 131 Hz. Comparisons of mean differences at different frequency levels along a logarithmic scale of frequency are, in any case, highly problematic and their standard deviations are inherently skewed. The proposition that differences in Ffrs are the determinant criterion for listener differentiation of gender must therefore be regarded for the present as unproven.

In summary, although the nature of the cues used by listeners to make successful identifications of speaker/singer gender has been the subject of extensive discussion and research, the mental processes and the precise auditory information used by listeners are not yet fully understood.
A possible reason for this may be a priori assumptions that listeners will use a single feature as the principal criterion parameter, whether physical or behavioral in origin, whereas it seems more probable that they monitor simultaneously along several auditory parameters, selecting information from among these to determine perceived gender.35,78 It is also possible that some of these parameters may be in the nature of interaction traits, comprising both physical and behavioral elements that impinge on habitual oral cavity and VT configurations.30,79,80 It is also possible that the information used by listeners may not be consistent from one speaker/singer to another, and listeners themselves may differ. The best we can do at present is to examine the freedoms and limitations of several possible parameters; to attempt to identify a single criterion factor is possibly fruitless.
METHOD

Approximately 700 children (age range 4–11 years) in 13 schools in and around London learned two songs. Creation of two songs provided children and their teachers with some element of choice, while ensuring common pitch data in the vocal output. The songs were constructed so that they contained identical numbers of notes in their melodies, shared common pitch ranges (A3 = 220 Hz to B4 = 493 Hz) and tonalities (C major), were suited to the age ranges of the children,39 used gender-neutral words, and comprised similar melodic and rhythmic patterns.
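As a worked illustration of the pitch range quoted for the two songs, the interval between the stated frequencies can be converted to equal-tempered semitones. This is a minimal sketch, not part of the original study; the function name `interval_in_semitones` is hypothetical, and the only inputs are the two frequencies given in the text.

```python
import math


def interval_in_semitones(f_low_hz: float, f_high_hz: float) -> float:
    """Equal-tempered distance between two frequencies, in semitones."""
    # Twelve semitones per octave; an octave is a doubling of frequency.
    return 12.0 * math.log2(f_high_hz / f_low_hz)


# A3 = 220 Hz and B4 = 493 Hz are the values quoted in the text.
song_range = interval_in_semitones(220.0, 493.0)
print(f"Song range: {song_range:.2f} semitones")
```

On these figures the songs span just under 14 semitones, that is, an octave plus a major second.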
[Figure 1 (plot not reproduced): mean frequency values (kHz) for F0, F1, F2, and F3 plotted against age group (4, 6, 8, 10, 12, 14, 16, 18 years and adult) for girls and boys; data for phonation in the “comfortable” condition.]
FIGURE 1. Mean frequencies for F0 and F1, F2, and F3 by age. Data from Huber et al.77
The songs were taught to the children by the teachers who were normally responsible for music education in their schools, keeping to a previously agreed pattern of teaching which sought to control the number and duration of teaching sessions, the number of repetitions of songs during these sessions, the method of teaching, and the pitch at which the songs were presented. As far as was practical, all children had a common exposure to the songs. In the week after the final teaching session, one of the research team(a) visited the school and met the children, and measured them for weight and height. In a relaxed and informal situation, each child individually sang one of the songs (child’s choice). No starting pitch was provided before the performances began, each singer spontaneously selecting his/her own comfortable pitch level. Similarly, no instruction was given concerning the loudness level at which the children should sing, as it was assumed that they would naturally select a comfortable level for themselves.

Class teachers responsible for the children were asked about the children’s general health and to identify any who were suffering from any declared medical condition or relevant disorder. Only a small number were so reported. The visiting researcher had a trained background in speech science and pathology, and was competent to identify any singers likely to yield abnormal data. As the object of the research was to gain demographic normative data from a large cohort of children, only cases of voice disturbance that would have brought untypical data were excluded from the sample.

Each child’s performance was recorded digitally at a sampling rate of 44.1 kHz, with a 16-bit resolution, using a good quality tie microphone (600 Ω series 250–485 model, RS Components, Northants, UK) positioned approximately 125 mm below the mouth. Recording levels were monitored to ensure equality of dynamic input.

After collection of the vocal samples, 320 examples were selected from among them. Selection was managed by means of singer code numbers to prevent investigator bias and determined only on the criteria that (1) samples where the performance was incomplete, or where the song was not readily recognizable were excluded and (2) the data bank for analysis should be so structured that there would be 40 children (20 of each sex) in eight age groups, aged 4–11 years.(b) The mean age of the children whose vocal products formed the 320 samples was 7 years 11 months; means for both genders were similarly 7 years 11 months.

The samples were then copied to tape in randomized order. Four independent experienced listeners, not otherwise engaged in the research, and all professional musicians with not less than 20 years’ experience of working with young voices, rated each sample for both age and gender using a simple response form.

Singer no: ******
Estimated sex of singer: I’m sure it’s a boy / I’m sure it’s a girl / Could be either
Estimated age (years) of singer: 4 5 6 7 8 9 10 11

(a) The research team were Graham Welch, Desmond Sergeant, and Peta White (now Peta Sjölander).
After completion of their rating sessions, listeners were
asked to list any factors that they considered had informed their judgments.

LTAS curves were calculated for each vocal sample, using the Praat software system (Praat version 4.3.03, Boersma and Weenink,81 selected for its comprehensive functions and accessibility), with measurements taken at 500 Hz intervals through a spectral range of 0–20 kHz, at the center of each 500-Hz bandwidth. Any extraneous noise (inevitable when recording in a free-field situation) was isolated by adjustment of filters so as to exclude all but genuine voice data.

There has been extended discussion in the literature as to whether LTAS is more validly applied to an entire speech sample, or whether it should be restricted to those segments generated by the voice source. Löfqvist and Mandersson103 took the view that inclusion of unvoiced elements in an analysis would constitute corruption of data, but Wendler et al44 compared LTAS curves for samples containing unvoiced elements with others from which these had been removed, and found no statistical differences between the two sample types. Clearly, a decision on this issue in respect of a particular research will depend on the data that are sought. In the present case, since the causes of the high frequency noise observed in our previous study1 were uncertain, no elements were removed from samples before calculation of LTAS values.

Group means were compared across genders and age groups. Energy levels for each measurement point in the spectra were compared by means of regression analysis and t tests using gender as the grouping variable.

Although, as argued above, the interaction of intonation and emphasis that characterizes speech is not applicable to song, it was felt that the use of dynamic variation for expressive purposes would not be impossible for these singers. Accordingly, two measures were constructed as appropriate measures of dynamic movement. The mean energy of the two lowest points of measurement in the LTAS was compared with the mean of the highest two points of measurement, and the difference computed as a measure of overall distance from maxima to minima of spectral energy for each singer. As a second measure, the standard deviation of energies at all measurement points across the LTAS spectrum was also computed for each singer.

Ffr values for the lowest four formants for three vowels that were common to both songs (/e/, /o/, and /i:/) were extracted for analysis from the voice samples of 80 randomly selected singers, five of each gender from each year group, using the formant extraction tool of Praat software with settings recommended for children’s voices. Because of the extended duration of the vowels on elongated notes of the musical rhythm of the song, it proved relatively easy to isolate a section of approximately 100 milliseconds that was free from influence of preceding and terminating consonant transitions. Average Ffrs for these sections were extracted.

(b) In the event, by the time of the recordings, one of the selected children was found to be a little short of a fourth birthday and four had recently passed their twelfth birthdays.

RESULTS

Gender attribution

Listener ratings along the seven-point confidence scale were scored using a “proximity to truth” procedure by counting correct gender recognitions with higher-than-median ratings of 5, 6, or 7 as positive identifications and ratings of only 1, 2, or 3 or an ambivalent median score of 4 (“could be either”) as negative. Of the total 1280 ratings, 71.4% proved to be positive, with an above-chance probability significant at P < 0.001. This fits well with the general findings for gender identification from speech. Recognition rates for boys and girls for the whole sample were closely similar: 71.56% for boys and 70.16% for girls. This difference is small and nonsignificant, but there is other evidence82 of a slightly higher identification rate for boys than for girls, though the level of significance for that finding is not indicated. Gender differences in positive identifications are plotted over age in Figure 2. The trajectory for correct identifications by age is steeper for boys than for girls, indicating that there was a significant and separate age effect for gender perception, with the level of gender-specific information contained in the voice samples increasing over age for boys more so than was the case for girls.
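The “proximity to truth” scoring described under Gender attribution, and the comparison of the overall rate with chance, can be sketched as follows. This is an illustrative sketch under stated assumptions rather than the authors' procedure: it assumes each rating has been recoded so that 7 means a confident judgment in the direction of the singer's true gender, it assumes a chance level of 0.5, and it uses a normal approximation that treats the 1280 ratings as independent (the paper does not state which test was applied, and ratings of the same singer by four listeners are unlikely to be fully independent). The function names and the sample ratings are hypothetical.

```python
import math


def is_positive(rating_toward_truth: int) -> bool:
    """Ratings of 5-7 count as positive; 1-4 (including the ambivalent 4) as negative."""
    if not 1 <= rating_toward_truth <= 7:
        raise ValueError("rating must lie on the seven-point scale")
    return rating_toward_truth >= 5


def z_above_chance(n_positive: int, n_total: int, chance: float = 0.5) -> float:
    """Normal-approximation z score for an observed count against a chance level."""
    expected = n_total * chance
    sd = math.sqrt(n_total * chance * (1.0 - chance))
    return (n_positive - expected) / sd


# Hypothetical ratings for illustration only (already recoded toward the truth).
ratings = [7, 5, 4, 2, 6, 3]
positives = sum(is_positive(r) for r in ratings)

# Figures quoted in the text: 4 listeners x 320 singers, 71.4% positive.
n_total = 4 * 320
n_positive = round(0.714 * n_total)
z = z_above_chance(n_positive, n_total)
print(positives, n_positive, round(z, 1))
```

A z score of this size lies far beyond the conventional threshold for P < 0.001, which is consistent with the significance level reported, subject to the independence caveat above.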
[Figure 2 (plot not reproduced): mean number of positive identifications for each singer (y-axis, 2.00–3.50) by age group (4–5, 6–8, and 9–11 years), plotted separately for girls and boys.]
FIGURE 2. Positive gender identifications by singer age group.

Age estimates

Estimates of singer age were also commendably accurate with a mean error of only 1.05 years for the 4 × 320 = 1280 estimates given, and a mean standard deviation of 1.2 years. Estimates for both genders increased in accuracy with increasing singer age, though rates of improvement were not uniform between genders. Figure 3 shows near-perfect linearity of trajectories for age estimates for both genders (boys F linear = 533.3, P = 0.001; girls F = 567.93, P = 0.001). In general, errors made for both genders were underestimations of singer age, particularly around the ages of 8 and 9 years (see standard deviations, Figure 4).

[Figure 3 (plot not reproduced): mean estimated age (y-axis, 4.00–10.00) against actual ages 4–11 (x-axis) for girls and boys; annotation: linear trends, Boys F = 533.3 p = .000, Girls F = 567.93 p = .000.]
FIGURE 3. Estimated ages versus actual ages.

[Figure 4 (plot not reproduced): standard deviation of errors (y-axis, 0.5–1.1) against age group in years (x-axis) for girls and boys.]
FIGURE 4. Ambiguity of perceived age: standard deviations of age estimates.

Vocal pitch ranges

Overall pitch ranges at which songs were recovered from the children covered a total range of 24 semitones, extending from a low of 64 cents below Eb3 (a girl) to a high of 35 cents above D5 (a boy). Differences of means for genders are plotted by age groups in Figure 5. Save for a small region at the upper extremity of the pitch ranges reached by 9–11-year-olds, where girls achieved a slightly higher mean pitch range than did the boys of the same age (slightly in excess of 1 semitone, significant at P = 0.034), no significant gender differences in pitch ranges were observed. This is consonant with the evidence of other studies cited above that, although there may be small differences in the anatomical length of the vocal folds (VFs) between boys and girls of the same size, this is not reflected in habitual child voice use before the age of puberty (Ref. 162, p. 180). Regression analysis showed a highly significant linear association between chronological age and extension of vocal range, both upwards and downwards, but not until age 10 did most children achieve the full pitch range for their song, though girls tended to do so earlier than boys, in some cases as much as 2 years ahead. Overall standard deviations for median points of the vocal ranges for the two genders were close: boys = 1.92 semitones and girls = 2.19 semitones.
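The total range quoted for the recovered pitches can be checked by expressing each extreme as a note number plus a deviation in cents. The sketch below is illustrative only: it uses MIDI-style numbering (C4 = 60), which is an assumption made here for convenience and differs from the “Middle C = 13” scale used in Figure 5, and the helper `pitch_in_semitones` is hypothetical.

```python
# Semitone offsets of the natural notes above C within one octave.
NOTE_OFFSETS = {"C": 0, "D": 2, "E": 4, "F": 5, "G": 7, "A": 9, "B": 11}


def pitch_in_semitones(letter: str, accidental: int, octave: int, cents: float = 0.0) -> float:
    """MIDI-style note number (C4 = 60) plus a deviation expressed in cents."""
    # accidental: -1 for a flat, 0 for a natural, +1 for a sharp.
    return 12 * (octave + 1) + NOTE_OFFSETS[letter] + accidental + cents / 100.0


lowest = pitch_in_semitones("E", -1, 3, cents=-64)  # 64 cents below Eb3
highest = pitch_in_semitones("D", 0, 5, cents=35)   # 35 cents above D5
total_range = highest - lowest
print(f"Total range: {total_range:.2f} semitones")
```

Eb3 to D5 is 23 semitones, and the two cent deviations add a further 0.99 of a semitone, which agrees with the rounded figure of 24 semitones given in the text.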
Signal intensity

The mean overall signal intensity for the 320 samples was 72.28 dB, with a standard deviation of 2.95 dB. Comparison of values over chronological age indicated systematic gains from age 4 through 11 years, with a highly significant adherence to linearity over chronological age (F = 54.08, df = 1, 318, P < 0.001), but no significant between-gender differences were observed for any age group (Figure 6). Any gender differences in energy shifts over age that might be evident in the data could, therefore, not be attributable to intergender inequalities of overall sound pressure level (SPL).

Although increases in SPL represented a clear developmental trend, inspection of the data by means of scatter plots and box plots showed that its progress was not uniform. Data appeared to fall naturally into three stages: 4–5, 6–8, and 9–11 years. Differences in mean intensities between the younger two of these groups fell slightly short of significance (P = 0.086), but between the 6–8 and the 9–11 age groups, a highly significant value for t (P < 0.0001) was found. Given the critical dependency of LTAS data on overall signal intensity,77,162 comparisons of the LTAS curves were made observing these self-defining groups (see below).

When between-gender t test comparisons of the two measures of dynamic range were made, there was some evidence of gender differences in the variance among the 6–8-year-olds, suggesting that girls may have made greater use of intensity variation as an expressive device than did boys, but this was not apparent for either the younger or older age groups. However, it is acknowledged that this measure was a somewhat approximate, “rule-of-thumb” device, and may be taken only as an indication of presence or absence of an effect.
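The two dynamic-range measures described in the Method can be sketched for a single singer's LTAS as follows. This is one possible reading, not the authors' implementation: it assumes that the “lowest” and “highest” points of measurement refer to the band levels with the smallest and largest energies (rather than the lowest and highest frequencies), and it uses the sample standard deviation, which the paper does not specify. The function names and the example levels are hypothetical.

```python
import statistics


def max_min_distance(ltas_db: list[float]) -> float:
    """Mean of the two highest band levels minus the mean of the two lowest (dB)."""
    if len(ltas_db) < 4:
        raise ValueError("need at least four measurement points")
    ordered = sorted(ltas_db)
    return statistics.mean(ordered[-2:]) - statistics.mean(ordered[:2])


def spectral_sd(ltas_db: list[float]) -> float:
    """Sample standard deviation of band levels across the spectrum (dB)."""
    return statistics.stdev(ltas_db)


# Hypothetical band levels (dB) for one singer, for illustration only.
example = [62.0, 58.5, 51.0, 44.0, 38.5, 30.0, 24.5, 21.0]
print(max_min_distance(example), round(spectral_sd(example), 2))
```

Both measures summarize the spread of energy within one averaged spectrum, which is why the text treats them as approximate indicators rather than direct measures of expressive dynamics.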
23
74.00 girls
Mean pitch range (Middle C = 13)
20
boys
girls
boys
boys
19 18 17 16 15 14 13 12 11
pitch range at which songs were presented
21
girls boys
girls 73.50
mean overall intensity (dB)
22
73.00
72.50
72.00
71.50
10
gender differences not significant
9 8 7
boys' median pitch girls' median pitch 4-5 years
71.00 6-8 years
4-5 years
9-11 years
age-group
6-8 years
9-11 years
age-group
FIGURE 5. Mean pitch ranges at which songs were recovered from
FIGURE 6. Increases in mean overall signal intensity (dB) by age
singers.
groups.
LTAS curves
The mean LTAS curves for the entire sample of 320 vocal products are shown across the three self-defining age groups (4–5, 6–8, and 9–11 years; combined genders) in Figure 7. Energy values at frequencies above 15 kHz were small and are, therefore, omitted from the plots. Plots were then examined for trends across ages and between genders. The shifts of energies from frequencies above 5.75 kHz to frequencies below 5.75 kHz as a function of increasing age reported in the previous study1 were found to be present in data for both genders: as singer age increased, energies at all frequencies above 5.75 kHz decreased, while those at all frequencies below this point increased. The changes over age were highly systematic, with high F values (P < 0.001) for linear trends in the case of both genders. No significant intergender differences were found; data plots for increases/decreases in energies are, therefore, combined across genders in Figure 8. However, the shifts were not uniform across the spectrum and at peak points of difference between curves (Figures 10 and 11) magnitudes in excess of 8 dB were found.
Mean LTAS curves within each age group were then compared and intergender differences were assessed by means of t tests applied to each point of measurement through the spectrum. Comparisons for the youngest children, aged 4–5 years (Figure 9), showed no significant intergender differences at any point of the spectrum. Mean differences for this age group were small, ranging between 0.26 and 1.41 dB. Between-gender comparisons for the age group 6–8 years showed two narrow bands of significant differences, centered between 2 and 3 kHz and between 3.5 and 5 kHz, respectively, with a broader area between 5.5 and 9 kHz, the latter with a peak point of difference of 3.45 dB at 7.75 kHz (Figure 10). For children in the 9–11 years age group, a single band of significant intergender differences was observed between 4.5 and 6.5 kHz, with a peak differential of 2.97 dB at 5.75 kHz (Figure 11).
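The point-by-point comparison described above can be sketched as follows. The text does not say which form of t test was applied, so Welch's unequal-variance statistic is used here as an assumption; the function names, data layout, and toy curves are hypothetical. Only the t statistic and mean difference are computed; converting t to a P value would need a statistics library.

```python
from math import sqrt
from statistics import mean, variance


def welch_t(a, b):
    """Welch's t statistic for two independent samples (assumed test form)."""
    return (mean(a) - mean(b)) / sqrt(variance(a) / len(a) + variance(b) / len(b))


def pointwise_comparison(freqs_khz, group_a, group_b):
    """Return one (frequency, mean difference in dB, t) row per frequency bin.

    group_a and group_b are lists of per-singer LTAS curves; each curve is a
    list of dB levels aligned with freqs_khz.
    """
    rows = []
    for i, freq in enumerate(freqs_khz):
        a = [curve[i] for curve in group_a]
        b = [curve[i] for curve in group_b]
        rows.append((freq, mean(a) - mean(b), welch_t(a, b)))
    return rows


def peak_difference(rows):
    """Row with the largest absolute mean difference between the two groups."""
    return max(rows, key=lambda row: abs(row[1]))


# Toy curves (hypothetical): three singers per group, three frequency bins.
freqs = [5.25, 5.75, 6.25]
boys = [[40, 44, 38], [42, 45, 39], [41, 46, 40]]
girls = [[40, 41, 38], [41, 42, 40], [42, 43, 39]]
freq, diff_db, t_value = peak_difference(pointwise_comparison(freqs, boys, girls))
print(f"peak difference {diff_db:.2f} dB at {freq} kHz (t = {t_value:.2f})")
```

Because one test is run per frequency bin, some correction for multiple comparisons would normally be considered; the text does not state whether one was applied.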
FIGURE 7. Mean LTAS curves for the three age groups (4–5, 6–8, and 9–11 years). [Series: 4-5 years, 6-8 years, 9-11 years. Vertical axis: mean energy (10 dB intervals); horizontal axis: frequency (kHz), 0 to 15.]
These results show that the previously reported energy shifts1 between successive age groups were, therefore, taking place for both genders, but not uniformly, and local areas of the spectrum were evident where developmental patterns for boys and girls differed. The extent of changes over age becomes more dramatically apparent if data for the youngest age group are plotted against those of the oldest (Figure 12). The magnitudes of the shifts over increasing age can be assessed from the peak differences observed between their respective curves (Table 1). It is evident from Figure 12 that, although there are significant differences between the curves for genders at certain points of the spectrum, these differences are relatively small in magnitude compared with the overall shifts that are attributable to increasing age. There is a clear developmental effect that is common to children of both genders in these age groups, though there are also qualitative differences in the ways that they manifest for boys and girls, respectively. DISCUSSION Pitch range differences Accurate gender attribution improved as a function of increasing age for both genders, but boys showed a steeper trajectory for positive identifications (Figure 2). Since no equivalent to SFF exists in the context of song, any propensity on the part of singers to adopt higher or lower pitch levels has to be inferred from the highest, lowest, and median pitches achieved in a performance. The respective mean values are plotted in Figure 5. Pitch ranges achieved by the singers expanded with increasing age, but not until age 10 did most children achieve the full pitch range that their song required, with girls somewhat ahead of boys in this respect. However, the difference between mean median pitches for the two genders was no more than 49 cents— just less than half of 1 semitone. 
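The 49-cent figure can be put in terms of frequency using the standard definition of the cent (1200 cents to the octave, 100 to the equal-tempered semitone). The sketch below is illustrative only; the function names are hypothetical, and the reference frequency of about 261.63 Hz for Middle C assumes equal temperament with A4 = 440 Hz, which the text does not specify.

```python
from math import log2


def cents_between(f1_hz, f2_hz):
    """Interval from f1 to f2 in cents (positive when f2 is higher)."""
    return 1200.0 * log2(f2_hz / f1_hz)


def ratio_from_cents(cents):
    """Frequency ratio corresponding to an interval given in cents."""
    return 2.0 ** (cents / 1200.0)


MIDDLE_C_HZ = 261.63  # assumed reference pitch, equal temperament with A4 = 440 Hz

ratio = ratio_from_cents(49)
print(f"49 cents is a frequency ratio of {ratio:.4f}")
print(f"at Middle C this is a difference of {MIDDLE_C_HZ * (ratio - 1):.1f} Hz")
```

On this scale a 49-cent separation is a difference of under 3% in frequency, which is consistent with the conclusion that median pitch offers listeners little basis for separating the genders.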
The overlap of variance between the respective data for the two genders was such that, for each age group, differences fell well short of statistical significance. In respect of the pitch levels adopted by these children in their performances, the data here clearly indicate that there were no differences between genders before the age of 11–12 years. It is, therefore, concluded that the pitch of the children’s performances did not provide sufficiently reliable information to account for the high rates of accurate gender attribution achieved by the listeners.
There appeared to be two periods in the developmental timetable when the ages of the singers were more ambiguous for the listeners (Figure 3). In the case of girls, this was between 6 and 7 years, when a small deviation from linearity indicated a tendency for listeners to underestimate singer age; for boys the corresponding period was a little later, between 8 and 9 years, when a reverse tendency was apparent, perhaps suggesting incipient onset of early puberty for some male children. These tendencies to nonlinearity are also reflected in the standard deviations of listeners’ age-estimate errors (Figure 4), which show wider excursions for boys than for girls. These are indicative of periods when listeners experienced least and greatest uncertainty in their decisions of singer ages and are in approximate agreement with the timing of changes in speaking voice reported by Hacki and Heitmüller.83
Journal of Voice, Vol. 23, No. 3, 2009
FIGURE 8. Shifts of spectral energy above and below 5.75 kHz. [Two panels: mean energies for frequencies below 5.75 kHz and mean energies for frequencies above 5.75 kHz, each with a line of best fit for mean values. Vertical axis: mean energy (10 dB intervals); horizontal axis: actual ages, 4 to 11.]
Overall intensity, energy shifts, and gender
Overall mean signal intensity of vocal output increased significantly with age equally for both genders, though this development was not uniform over chronological age, tending rather to fall within three phases or age bands. On this evidence, listeners’ success in gender attribution cannot be accounted for by any gender separation in this factor. The shifts of spectral energy over age from relatively higher toward lower frequencies are, therefore, taken to represent a consistent developmental
feature of child voice for both genders. Data plotted in Figure 12 show differences of spectral energy in excess of 8 dB at peak frequency points between the LTAS curves of the youngest and oldest children.
FIGURE 9. LTAS curves for 4–5-year-old singers by gender. [Series: girls 4-5 yrs, boys 4-5 yrs; no significant differences at any frequencies. Vertical axis: mean energy (10 dB intervals); horizontal axis: frequency (kHz), 0 to 15.]
FIGURE 10. LTAS curves for 6–8-year-old singers by gender. [Series: girls 6-8 yrs, boys 6-8 yrs. Vertical axis: mean energy (10 dB intervals); horizontal axis: frequency (kHz), 0 to 15.]
FIGURE 11. LTAS curves for 9–11-year-old singers by gender. [Series: girls 9-11 yrs, boys 9-11 yrs. Vertical axis: mean energy (10 dB intervals); horizontal axis: frequency (kHz), 0 to 15.]
Formant frequencies
It is not the purpose of this paper to present comprehensive data of Ffr values. To attempt to do so would be invalid for the following reasons: The songs used in the study were created against criteria of melodic suitability, verbal appropriateness, and ease
of teaching and learning, not with an intention of analysis of vowel formant structures. The children determined their own loudness levels for their singing, with no attempt being made by the research team to standardize this across singers. Vowels were affected by the singing abilities of these vocally naïve children. Vowel productions were subject to the social and ethnic diversity of dialects within Inner London and were not validated by listeners as valid examples of received pronunciation. Formant values for only three vowels were extracted.
TABLE 1. Peak Points of Difference (dB) Between LTAS Curves for Youngest (4–5 years) and Oldest (9–11 years) Age Groups
Frequency region (kHz)    Difference at peak point (dB)
0.25                      4.36
3.35                      8.27
7.75                      8.81
13.25                     4.94
Despite these important limitations, some general observations can be made from the data obtained from the formant analysis of three vowels /o/, /e/, and /i:/.
1. The tendency for a decrease in Ffr values over increasing age was confirmed in the case of F1 for all three vowels. Values showed a small, but significant, linear trend for F1 to move downwards across the 4–11 years age range
(F = 5.33, df = 1, 78, P = 0.024). No equivalent tendency was present for any of the three higher formants.
2. There was considerable variation in Ffrs between singers at all ages.
3. Some words occurred repeatedly through the songs; this enabled comparison of up to three examples of vowels with identical consonant environments as a measure of singer consistency of production. The results revealed considerable variability between successive examples of the same vowel by the same singer. Ffrs were found to shift by considerable percentages, commonly up to 30% of the frequency of an immediately preceding or succeeding sample. Factors that appeared to influence changes in Ffrs were the location of the vowel example in a musical phrase of the song (that is, whether occurring at mid-phrase or at phrase-ending); the pitch at which the vowel occurred (a difference of a musical interval of a fifth could cause Ffrs to change by up to 30%); and the pitch direction, ascending or descending, of the passage in which the vowel occurred. It seems probable that this phenomenon is a consequence of the relative vocal naivety of the singers in negotiating movement to higher or lower vocal pitches; the children were likely to be adjusting their laryngeal position, thereby inducing changes in VT length and shape.68,84–86 Such a tendency would be likely to affect singing more than speech owing to more abrupt and extensive changes of pitch, but it nevertheless suggests that it may be unwise, even in the context of speech, to extrapolate generalized Ffr values from isolated samples of vowels extracted from speech without account being taken of intonational context.
4. The gender separation widely reported in the literature of speech, where lower Ffr values have been reported for boys than for girls,77 was not evident here; no systematic or significant intergender differences were found, either for the 80 children sampled in the formant analysis or within any age group.
5. It has been reported that Ffrs for singing are generally lower than those for speech, though this finding has yet to receive corroboration from comparable studies. As no systematic samples of speech were obtained from these children, direct comparisons in this respect were not possible, but the Ffrs obtained here were lower than those generally reported in the literature for speech.
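FIGURE 12. LTAS curves for youngest (4–5 years) and oldest (9–11 years) age groups. [Series: boys 4-5 yrs, girls 4-5 yrs, boys 9-11 yrs, girls 9-11 yrs. Vertical axis: mean energy (10 dB intervals); horizontal axis: frequency (kHz), 0 to 15.]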
There are well-known difficulties in obtaining valid measurements of Ffrs with children owing to the high frequencies at which they occur, which results in their greater frequency dispersion. This has led some researchers to use 10–11-year-olds as subjects, but children of these ages are already bordering on the onset of puberty and may not yield data comparable with those for children from wider age bands. Lee et al75 report that differentiation in Ffr patterns begins at around 11 years. Reduction in within-subject variability has been reported as a major trend in the 10–12-year period, converging to adult levels at age 12. Other studies confirm that this age is an important one of change in Ffrs.20,23,60,61,75,101
An important factor not yet widely discussed in the literature on formants is that static formant patterns do not convey the information typically carried in the dynamic process of vowel production. It is possible that important elements of age and gender information are contained in the transitions that occur. Assman and Katz87 report that cues provided by Ffrs are important for speech intelligibility, citing Rosner and Pickering:88 “perception of vowel quality is determined by the formant pattern and its changes over time.” It is possible that some of the differences between the values obtained for Ffrs here and those reported elsewhere in the literature may be attributable to the fact that our data were averaged over samples of approximately 100-millisecond duration, and not obtained from a single spectral slice.
Although Ffrs play a part in perceptual processes in relation to voice quality, there remain many uncertainties, especially when relatively small differences in spacing are involved. As Klatt and Klatt79 comment, “The remarkable ability of human perceptual system to deal with problems revealed in measurement of peaks/formants calls into question the degree to which formants are actually perceptual dimensions.”
Previous reports of spectral noise at high frequencies
Spectral noise at high frequencies has been reported in several other studies.
De Jonkere89 found abnormally large amplitudes of what he describes as white noise at high frequencies in voices classified as ‘‘hoarse.’’ This was attributed to turbulence in the airflow. Similar observations have been made for conditions of ‘‘hoarseness,’’ ‘‘breathy voice,’’ and ‘‘dysphonia,’’44,92 although the particular frequencies at which they have been reported have been somewhat inconsistent, ranging between 3 and 5 kHz,92,93 a concentration above 5 kHz,94 ‘‘above 6 kHz,’’95 5–8 kHz,102 6–10 kHz, and 10–16 kHz.96 Differences among findings have been dependent on the thrust of the investigation, the a priori assumptions that have been made, and the frequency range which has been considered relevant for study, with many studies not examining frequencies above those at which formants might be expected to be present. Evidence from spectrograms A necessary first strategy in seeking an explanation of the phenomena observed in this study was extended scrutiny of spectrograms generated from the voice samples. From this process, three factors became apparent. Some elements of high frequency noise were attributable to consonants, especially stop-releases, fricatives, and
plosives /f/ /x/ /b/ /w/ etc. These typically generated sound elements extending up to 15 kHz. However, they were of short duration, as would be expected in the context of song, where vowels’ durations are extended above those of speech by the attachment of the rhythmic characteristics of the melody. The consonants’ contribution to the overall time-aggregated values of the LTAS curves would therefore have been relatively small. Other high frequency elements were found to be synchronous with voiced segments of the samples. They covered a frequency range extending from 7–8 kHz up to 15 kHz, that is, at frequencies higher than could be explained by harmonicity, not least because harmonicity has been found to be suppressed in the presence of high frequency noise.96,97 From extended examination of large numbers of the spectrograms across age groups, a clear trend became evident in which high frequency noise elements were particularly prominent in the traces of younger children, but became less evident with increasing age of singers, although the level of variance and individuality among singers in this respect was considerable. This trend was seen as complementary to the evidence of the energy shifts observed over age from statistical comparisons of the LTAS curves reported above.
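One simple way to express the balance between high and low frequency energy in a single LTAS curve is the difference in level between the band above 5.75 kHz and the band below it. The sketch below is offered as an illustration only: the text does not describe how band energies were aggregated, so the power summation, the treatment of a bin lying exactly at the cutoff, and all names and toy values are assumptions. The 5.75 kHz division and the 15 kHz upper limit are taken from the text.

```python
from math import log10


def band_level_db(freqs_khz, levels_db, lo_khz, hi_khz):
    """Power-sum the bins with lo_khz <= f < hi_khz and return the total in dB."""
    powers = [10 ** (level / 10) for f, level in zip(freqs_khz, levels_db)
              if lo_khz <= f < hi_khz]
    if not powers:
        raise ValueError("no frequency bins fall inside the requested band")
    return 10 * log10(sum(powers))


def high_low_balance_db(freqs_khz, levels_db, cutoff_khz=5.75, top_khz=15.0):
    """Level above the cutoff minus level below it (a bin at the cutoff counts as high)."""
    high = band_level_db(freqs_khz, levels_db, cutoff_khz, top_khz)
    low = band_level_db(freqs_khz, levels_db, 0.0, cutoff_khz)
    return high - low


# Toy LTAS curves (hypothetical levels in dB at 1 kHz steps).
freqs = [float(k) for k in range(1, 15)]
younger = [62, 60, 57, 55, 52, 50, 49, 48, 46, 44, 42, 40, 38, 36]
older = [66, 64, 60, 57, 53, 47, 45, 43, 41, 39, 37, 35, 33, 31]
print(f"younger: {high_low_balance_db(freqs, younger):.1f} dB")
print(f"older:   {high_low_balance_db(freqs, older):.1f} dB")
```

Under the trend described above, this balance would be expected to become more negative with increasing age as energy moves from the upper to the lower band.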
COMPETENCY OF VF BEHAVIOR—OPTIMAL VOICE VERSUS GLOTTAL LEAKAGE In optimal modal voice, the arytenoid cartilages bring the VFs together into clean contact and the edges of the folds meet simultaneously along their length. Tonal tension is appropriately adjusted to anticipated transglottal airflow. The pattern of excitation of the folds in response to subglottal pressure is, therefore, regular, the open and closed phases of the glottal cycle are clearly defined and the airflow is chopped cleanly into regular pulses. Higher harmonics are clearly specified at the instant of closure of the folds, creating a harmonically rich spectrum. Laver98 lists the conditions for modal voice as (1) only the true VF must be in vibration; (2) vibration must be regularly periodic without audible roughness arising from disperiodicity; (3) vibration must be efficient in air usage without audible friction; (4) degree of muscle tension in all phonatory muscle systems must be moderate. He states that any variation of these leads to voice qualities that are ‘‘harsh,’’ ‘‘creaky,’’ ‘‘whispery,’’ or ‘‘breathy.’’ Classification of voice conditions into categories such as those listed by Laver, above, is notoriously problematic owing to a lack of universally accepted units of scaling or of terminology for their description.98,99 Gelfer100 lists 67 terms that have been used for voice quality and it is clear that even the more commonly used and more objective descriptors do not define precise perceptual categories possessing clearly demarked boundaries.
Unsurprisingly therefore, studies have repeatedly reported poor agreement among listeners in ratings of voice qualities, even where clinicians and voice experts have acted as listeners. Recent researches have attempted to address these problems and to arrive at improved descriptions and quantifications. McAllister et al,101 for example, had seven expert listeners rate 205 samples of voice products of 10-year-old children for five voice conditions. Ratings were then ranked and 50 examples, evenly distributed along the scale of rated means, were analyzed for perturbation of F0 (jitter), perturbation of amplitude (shimmer), and two measures of signal-to-noise ratio. Highly significant correlations were found for measured levels of jitter and ratings for hoarseness, breathiness, roughness, and grating conditions, and also for the first three of these with coefficients for signal-to-noise ratio within the frequency range 0–5 kHz. Their study demonstrates that considerable overlap exists between these voice conditions and that they share common characteristics and causes. However, Kreiman and Gerratt104 present a discouraging catalogue of widely varying values for correlations between perceived conditions and their relative acoustic characteristics that have been reported in the literature. This may support a notion that what have been regarded in the research literature as separate categories and conditions may to a significant extent merely be differing presentations of common causal physiological events.104,105 There is certainly plentiful evidence that several physiological parameters may combine to contribute to a single voice condition108 and that the category described as ‘‘normal voice’’ is not a narrowly defined condition,92,106,107,155,157 but will itself embrace considerable variance within its boundaries. Kreiman and Gerratt104 provide an appendix of accounts of physical characteristics of voice qualities drawn from a number of research studies. 
These may be summarized as follows: Breathiness: audible escape of air; weak phonation; inability to firmly adduct VF; VF vibrating but somewhat abducted; turbulence at glottis; degree of breathiness inversely related to length of closed glottal phase; more energy at fundamental but significant component of noise. Harshness: high pitch; irregular VF vibrations; rough, unpleasant, grating; sometimes associated with hard glottal attack. Hoarseness: wet, liquid sounding; rough; contains noise components; noisy; harsh; breathy; source noise plus friction; excessive escape of air; combination of rough and breathy voice: irregular VF vibrations with additive noise. Roughness: impression of irregular VF vibrations; fluctuation of F0 and amplitude of glottal sound; low frequency aperiodic noise; rough and unpleasant; crackling of glottal fry; irregular pulses; wide F0 range; high frequency roughness; uneven; modes of vibration lacking synchrony. These various accounts provide evidence that voice conditions are not discrete perceptual or causal categories, and may comprise several causes/symptoms in one condition.
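The perturbation measures mentioned above, jitter for F0 and shimmer for amplitude, are often computed in a "local" form: the mean absolute difference between consecutive glottal cycles divided by the mean value. The sketch below uses that common definition as an assumption; the text does not state which variants McAllister et al used, and the function names and toy cycle values are hypothetical.

```python
def relative_perturbation(values):
    """Mean absolute difference between consecutive values, divided by the mean value."""
    if len(values) < 2:
        raise ValueError("need at least two cycles")
    diffs = [abs(b - a) for a, b in zip(values, values[1:])]
    return (sum(diffs) / len(diffs)) / (sum(values) / len(values))


def local_jitter(periods_s):
    """Cycle-to-cycle perturbation of the glottal period (a proxy for F0 perturbation)."""
    return relative_perturbation(periods_s)


def local_shimmer(peak_amplitudes):
    """Cycle-to-cycle perturbation of peak amplitude."""
    return relative_perturbation(peak_amplitudes)


# Toy cycle measurements (hypothetical): periods near 4 ms, i.e. about 250 Hz.
periods = [0.00400, 0.00404, 0.00398, 0.00403, 0.00399]
amplitudes = [0.80, 0.78, 0.82, 0.79, 0.81]
print(f"jitter  = {100 * local_jitter(periods):.2f}%")
print(f"shimmer = {100 * local_shimmer(amplitudes):.2f}%")
```

A perfectly regular voice gives zero on both measures; irregular VF vibration of the kind listed under roughness and hoarseness raises them.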
The occurrence of high frequency noise synchronous with vowels is an indication that, in addition to regularly pulsating phonatory airflow of a modal voice, additional air, not regularly activated by the VFs—and therefore characterized by turbulence rather than harmonicity—is passing into the pharyngeal tube, that is, glottal leakage is occurring. The resulting total spectrum, therefore, comprises a reduced harmonic component, with random elements caused by friction of the additional air traversing the wall of the pharyngeal tube and there is thus a direct relationship between transglottal airflow, VF contact area, and the degree of noise present in the generated spectrum.109 Voices described as ‘‘breathy’’ have also been reported as having high frequency noise elements and there is good agreement in attributing this to incomplete closure of the glottis during the vibratory cycle.91,95,109–112 In one typical breathy voice condition, the arytenoids move toward their due meeting point, but remain separated at their posterior. Although the glottis is never fully closed in this condition,85 the VFs are sufficiently approximated to allow them to vibrate in response to subglottal pressure from the lungs.79 Even though the edges of the folds are excited, generating a waveform, a space is left open between them posteriorly through which a flow of air escapes into the pharyngeal tube throughout the entire glottal cycle. Contact of this irregularly turbulent component of the airflow with the surface of the tube results in audible aspiration noise.113 Speciale and Cimino114 state that incomplete closure may extend up to 50–60% of the total length of the folds, or even along their total length. 
The resultant phonation thus comprises aspirative noise together with the regularly periodic voicing component generated by the VFs, creating a total spectrum combining both harmonics and random noise.115–118 Recent studies, however, have pointed to the possibility of several differing modes of leakage,119 including a condition in which complex turbulence is caused through the airflow being in two directions ‘‘enhancing excitation of the VT.’’120 Unlike the relatively clean simultaneous contact at closure of the folds in modal voice, in breathy voice the excitation at the edges of the folds is more irregular,79 creating jitter, and such harmonics as are generated are less positively specified at the moment of maximum closure, giving irregularity of waveform.121 The acoustic waveform of ‘‘breathy’’ voice is, therefore, characteristically more sinusoidal than for modal voice, stronger at its fundamental, and amplitudes of higher harmonics are attenuated. The perceptual effects of breathiness are thus an increase in the amplitude of the fundamental, and masking of higher harmonics by aspiration noise. Noise components of the spectrum will not maintain constant values, however, where the glottal airflow is increased to provide for stressed syllables, air leakage will be greater and the level of aspirative noise will be increased accordingly. The frequency range and spectral effect of such frictatious noise in the airways of the singers in the present study could be assessed approximately from intakes of breath taken by the singers as represented in the spectrograms. Children typically required two or three inhalations during the course of their song. These were often conspicuously noisy—sometimes spectacularly so. Measurements made from spectrograms showed
the associated noise elements from these passages of air to extend up to around 12 kHz. A necessary caveat here is that the spectral characteristics of inhalation and exhalation are probably not identical owing to reversal of the airflow and its passing first through the oral cavity and only subsequently the pharyngeal tube.
Several studies have suggested that breathiness and hoarseness are sufficiently common characteristics of children’s voices to be regarded as a norm.114,122–124,155–157 Reasons put forward to support this view include that (1) incomplete closure is a normal configuration for voices at higher frequencies125; (2) there is a greater tendency in children to have lax tension of the VFs, permitting a greater “throughput” of redundant air,13,126 and (3) owing to their VF structure, children necessarily use higher levels of subglottal pressure in phonation.114,122,124,126
Vocal efficiency and signal intensity
Stathopoulos and Sapienza127 have provided a detailed account of the development of the ventilation system through childhood. Reviewing available anatomical data, they report that, although lung length and capacity increase in a fairly regular pattern, it is not until the ages of 13–14 (females) and 14–16 years (males) that adult proportions are achieved. The volume of air available for phonation is, therefore, limited by children’s smaller lung capacities158,159,161 and, to achieve the necessary glottal airflow, up to 50–60% greater lung pressure is required than for adults. Children, therefore, have to work harder than adults to achieve a given level of vocal output.128 For many children among our singers, such high demands for subglottal pressure needed for modal phonation may have proved hard to achieve, leading to underfunding of air at the glottis. This could lead to lax tension at the VF, a prime condition for glottal leakage and breathy voice.
McAllister et al122 report from a study of 60 10-year-olds representing both sexes that children in general seem to have somewhat compressed vocal range profiles, indicating restricted dynamic vocal capabilities. The open quotient of the glottal cycle is reported to be strongly related to vocal intensity,130 and so increases in air support consequent upon lung growth through the prepubertal period of childhood, allowing capability for increased intensity, will be accompanied by tendency to a shorter open quotient, thereby reducing the opportunity for glottal leakage.129 Laryngeal development From a study of VT growth using MRI, Vorperian et al132 report that VT length increases from 6–8 cm in infants to 15–18 cm in adult women and men; these values are close to those given by Menard et al133 (14.3 cm for adult females and 16 cm for males). According to Gollin,134 the rate and pattern of VT growth is not uniform, but is subject to growth spurts.57 A period of rapid development from birth to infancy is followed by an interval of slower growth during early and middle childhood, with a final period of rapid growth during puberty to achievement of adult proportions. Not only is rate of growth irregular, the pattern of development of vertical and horizontal segments of the tract is also
nonuniform. In the infant child, the horizontal oral cavity is longer than the vertical pharyngeal tube, but by puberty these proportions are reversed. VT growth, therefore, is biased toward the vertical dimension, with less change in the horizontal dimension of the oral cavity. A period of rapid growth of the lower face occurs between 7 and 9 years, but these changes affect the shape of the palate more than its size until the age of 12.132,135 At no time during the growth period can the child’s VT be regarded as a uniformly scaled-down version of that of the adult.136,160–161 With progressive elongation of the VT through childhood, a dependent expectation would be that the frequency regions of aspirative noise from abrasion of the airflow against the pharyngeal wall will decrease proportionally. Sapienza and Hoffman137 see a positive relationship between glottal air leakage, presence of high frequency noise, and listener perception of breathiness. Consequently, higher frequency elements arising from this cause, observable in LTAS for younger children, will migrate to lower regions as pharyngeal length increases with age. Smith and Patterson165 consider that VT length appears to decide judgments of sex and age, reporting that differences of VT length lead to shifts on Ffrs, for which they claim that differences as small as 6% are readily discriminated. A developmental account, based on a wide-ranging review of relevant literature, is therefore proposed in explanation of the phenomena observed in this study, under which random high frequency noise present in the spectra of the youngest children is caused by supraglottal turbulence of air entering the VT via incomplete glottal closure, giving rise to frictional sound as it passes through the pharyngeal tube. 
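The inverse relationship between tract length and resonance frequency that underlies this argument can be illustrated with the textbook idealization of the vocal tract as a uniform tube closed at the glottis and open at the lips, whose resonances fall at odd multiples of c/4L. This is a simplification that the authors do not themselves use, and the text above cautions that the child's tract is not a scaled-down adult tract. The speed of sound and the function name are assumptions; the tract lengths are those quoted in the text.

```python
SPEED_OF_SOUND_CM_PER_S = 35000.0  # assumed round value for warm, moist air


def tube_resonances_hz(length_cm, count=3, c=SPEED_OF_SOUND_CM_PER_S):
    """Resonances of an idealized uniform tube closed at one end: (2k - 1) * c / (4 * L)."""
    if length_cm <= 0:
        raise ValueError("length must be positive")
    return [(2 * k - 1) * c / (4.0 * length_cm) for k in range(1, count + 1)]


# Tract lengths quoted in the text (cm); the labels are for illustration only.
for label, length in [("infant, 8 cm", 8.0),
                      ("adult female, 14.3 cm", 14.3),
                      ("adult male, 16 cm", 16.0)]:
    print(label, [round(f) for f in tube_resonances_hz(length)])

# In this idealized tube, a tract 6% shorter raises every resonance by the same factor.
shift = tube_resonances_hz(0.94 * 16.0)[0] / tube_resonances_hz(16.0)[0]
print(f"6% shorter tract: resonances {100 * (shift - 1):.1f}% higher")
```

The same scaling would apply to any noise band shaped by the tube, which is the sense in which lengthening of the pharyngeal tube is expected to move such energy downward in frequency.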
With progressive growth and development of the laryngeal tissue and musculature with increasing age, the glottal gap progressively represents a reducing proportion of the VF contact area: glottal functioning thereby gradually acquires a more ‘‘modal’’ pattern. Vocal efficiency is thus improved and the redundant frictional sound from turbulent air becomes less prominent in the vocal output. At the same time, as growth progresses, the pharyngeal tube increases in length, thus lowering the frequencies at which any residual frictional noise will occur. The high frequency spectral energies that previously affected vocal products of the youngest children now become shifted to lower frequency regions and at the same time approach more ‘‘modal’’ voice characteristics. The fulcrum point about which these shifts take place is, on the evidence of the LTAS curves reported above, approximately 5.75 kHz. Gender differences The above hypothesis fits comfortably with our data, but it also is necessary to offer an account for the gender differences found between the LTAS curves for boys and girls. The central question here is whether there are differences of phonatory function between boys and girls. As Corbin-Lewis and Johnson138 comment, the question is not simply an academic one and is far from resolved. LTAS curves for boys and girls are shown separately across age groups in Figure 13 (boys) and Figure 14 (girls). Our interpretation of these is that the shifts of spectral energies follow
a similar pattern for both genders, but with differences of timing. The statistical evaluations of t tests have shown that no intergender differences were evident among the 4–5-year-olds (Figure 9). In the middle years of childhood, among the 6–8-year-olds (Figure 10), the shifts of energy begin to be apparent, but appear to commence somewhat earlier for girls, with energies in the lower frequency regions 2–3 and 3.5–5 kHz exceeding those of boys (by a mean of 1.52 dB, P = 0.009). However, boys show higher energy levels between 5.5 and 9 kHz (by a mean of 2.5 dB, with P ranging from <0.01 to <0.009) and also retain greater levels of high frequency noise between 12 and 15 kHz (P = 0.047). These gender differentials in ages of change match the evidence of Hacki and Heitmüller,83 as discussed above, though their data also reflect a lowering of pitch range of habitual speech (for girls at 7–8 years and for boys 8–9 years) that was not evident here, since our data for song showed expansion of the vocal range both upwards and downwards in pitch. They also receive validation from the greater standard deviations of age and gender estimates made by our listeners, which showed greater uncertainty in judgments of boys and girls in these age groups (cf. Figures 2–4 above).
FIGURE 13. LTAS curves for boys in the three age groups (4–5, 6–8, and 9–11 years). [Series: 4-5 years, 6-8 years, 9-11 years. Vertical axis: mean energy (10 dB intervals); horizontal axis: frequency (kHz), 0 to 15.]
[Figure 14 (girls). Series: 4-5 years, 6-8 years, 9-11 years. Vertical axis: mean energy (10 dB intervals).]
Some qualitative gender differences are evident in the data for singers in the 9–11 years age group. For these children, the relative positions evident for the 6–8-year-olds are reversed: boys now show higher levels of energy in the frequency region 4.5–6.5 kHz, peaking at 5.75 kHz with a mean difference of 3 dB, P = 0.002. A possible explanation of this is that these children are approaching puberty, and their vocal output is beginning to conform to the characteristics of adults. There is common agreement among a considerable number of studies that adult women show greater breathiness of voice than men,9,18 that a causal condition is less complete female glottal closure,131,138–141 and that, for a given subglottal pressure value, closed quotient values will be lower for women than for men.142–144 The reasons for such intergender differences have yet to be reliably established, but in general two alternative views prevail: (1) that there are male/female differences of laryngeal structure and/or functioning, as yet unidentified, and (2) that differences are of sociophonological origin, that is, societal norms which govern the speech habits and style (differential paralanguages) of men and women.144 These might have their origin in personality biases between genders, or differences in the speech styles to which the genders aspire, that is, separate modes of speech that have been described as “genderlects.”11,145–149 Sachs et al6 suggest that it is possible that men/women may modify their articulation of the same phonetic elements to produce acoustic signals that correspond to male/female archetypes.c Puts et al147 also find evidence that phonological features of speech are adapted to match social roles and intentions. As mentioned above, after completion of their rating sessions, listeners were asked to list any cues that they felt they had used in their judgments.
A few responses appeared to reflect attention to spectral qualities of voice (“girls seemed lighter/clearer/more focused in tone”), but most pointed to behavioral aspects, including “boys were stronger/more positive in delivery; boys were more matter-of-fact/girls more expressive; gender differences in consonants, girls being softer/slower/fuzzier in delivery, and their sibilants more protracted; seeming maturity/confidence of delivery, boys more confident.” These comments find some support in the literature: several studies have reported males as being more assertive/aggressive/positive in vocal output.147–149 Puts et al147 claim that their data show that phonological aspects of speech become adapted to match social role and intentions. McConnell-Ginet63 believes that femininity is conveyed by qualities of pronunciation and clarity of speech with longer pauses and longer segment durations. Several recent studies have identified clear gender differences in voice onset time (VOT), though this appears to be highly conditional on the phonetic structure of the sample examined and could be argued to support either of the above hypotheses. One theory is that sociophonic factors, such as speaking style, may contribute to gender differences in speaking behavior. Karlsson et al150 hypothesize that the gender differences they observed in 3–9-year-olds could have been due to differences in intensity and variability of airflow, but Whiteside and Marshall152 say that gender differences in VOT do not emerge until late preadolescence, and that they appear to be socially based, a view shared by others.153,154 Certainly there appears to be considerable variability between speakers in respect of VOT.151

FIGURE 14. LTAS curves for girls in the three age groups (4–5, 6–8, and 9–11 years). Axes: frequency (kHz, 0–15) against mean energy (10 dB intervals).

c “Her voice was ever soft, gentle, and low, an excellent thing in woman” laments Shakespeare’s King Lear (Act V, sc. III) of his dead daughter Cordelia.

Gender attribution and age cues

Two issues remain unanswered: what were the auditory data against which listeners achieved their high levels of accuracy in judgments of age and of gender? The accuracy of estimates of singer age gives conclusive evidence of the presence in the voice samples of age-related information. It is possible that the declining levels of high frequency noise in the signals, or the shifts of energy from higher to lower frequencies, played a role here. However, the song performances were also rich in linguistic, behavioral, and musical cues. These included, for example, rhythmic continuity of the performance, competence in management of the directional changes of the melody, and pronunciation (a tendency to pronounce “little” as “lickel” was a clear indication of a younger child, as was excessive rhythmic accentuation reminiscent of the nursery rhyme). A strong musical clue would have been the presence or absence of a strong concept of tonality. Younger children would quite frequently deliver a phrase of their song with a clear tonality, that is, with good feeling for tonal center or key, then proceed to sing the succeeding phrase also with a strong feeling for tonality, but shifting the tonal center to a new pitch. Some cases were noted where no two phrases of the song were sung in the same key, though each phrase was within itself perfectly tuneful.
Such characteristics would be unlikely to have escaped the notice of our musically trained listeners. In the case of gender attributions, it seems less likely that these developmental/musical cues were relevant: there is insufficient evidence in the literature of well-defined gender differences in these parameters to support such a claim. We are, therefore, unable to give a clear explanation of the admirable accuracy of our listeners, and this is clearly an important area for future research.

AREAS FOR FURTHER RESEARCH

This study raises several issues that require resolution through further research. The developmental hypothesis presented here in explanation of the energy shifts observed in these studies requires validation from longitudinal studies in which periodic observation of glottal behavior, and especially of glottal closure, can be made with simultaneous assessment of levels of high frequency noise.

The limited data on formants reported here indicate an intrinsic relationship between formant height and intonation in speech and melodic pitch in song. Although the factor of pitch height is generally understood, there is need for more precise data from controlled experimental situations with younger voices. The apparent influence of syllable position in the overall phrase (that is, near the start, middle, or end of a speech product or musical phrase) on vowel Ffr also needs further investigation. Whether this is a characteristic of childhood vocal output, or extends to adults, also needs to be clarified.

The shifts of spectral energy that have been reported here were observed in the context of singing products. Whether they will be found to be equally present in speech has yet to be determined. Our subjects were naïve singers. There are as yet only a few studies that have compared spectral energies in products of both trained and naïve singers or speakers.
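As a minimal sketch of the comparison that such a replication in speech would need, the Python below summarizes an LTAS curve by its mean level below and above the approximately 5.75 kHz fulcrum, and reports how each summary changes between two age groups. The function names, the 0.5 kHz bin spacing, and the toy dB values are illustrative assumptions, not data or procedures from this study.

```python
from statistics import mean

FULCRUM_KHZ = 5.75  # approximate fulcrum reported in the text


def band_means(freqs_khz, levels_db, fulcrum_khz=FULCRUM_KHZ):
    """Return (mean level below fulcrum, mean level at/above fulcrum) in dB.

    Levels are averaged directly in dB, which is a simplification.
    """
    if len(freqs_khz) != len(levels_db):
        raise ValueError("freqs_khz and levels_db must have equal length")
    low = [lv for f, lv in zip(freqs_khz, levels_db) if f < fulcrum_khz]
    high = [lv for f, lv in zip(freqs_khz, levels_db) if f >= fulcrum_khz]
    if not low or not high:
        raise ValueError("curve must have bins on both sides of the fulcrum")
    return mean(low), mean(high)


def tilt_shift(younger_db, older_db, freqs_khz, fulcrum_khz=FULCRUM_KHZ):
    """Change (older minus younger) in the low-band and high-band means."""
    y_low, y_high = band_means(freqs_khz, younger_db, fulcrum_khz)
    o_low, o_high = band_means(freqs_khz, older_db, fulcrum_khz)
    return o_low - y_low, o_high - y_high


# Hypothetical LTAS curves (dB) on a 0.5 kHz grid from 0.5 to 15 kHz.
freqs = [0.5 * k for k in range(1, 31)]
younger = [-20.0 - 2.0 * f for f in freqs]
# Toy "older" curve: +3 dB below the fulcrum, -3 dB at or above it.
older = [lv + 3.0 if f < FULCRUM_KHZ else lv - 3.0
         for f, lv in zip(freqs, younger)]

low_change, high_change = tilt_shift(younger, older, freqs)
print(f"low band: {low_change:+.1f} dB, high band: {high_change:+.1f} dB")
```

A positive low-band change together with a negative high-band change corresponds to the direction of shift described for the singing data.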
CONCLUSIONS

The 320 children in the age range 4–11 years learned a song and each was digitally recorded singing it alone. Recordings were randomized onto tape and were audited by four experienced independent judges, who made estimates of ages of the singers and judgments of their gender. Rates of correct gender identification (71%) were in line with those reported previously for children’s speech. Age estimates were also commendably accurate, with a mean error of 1.05 years. The voice samples clearly carried reliable age and gender information.

Pitch range lower borders at which the songs were recovered from the children ranged from a low of 64 cents below Eb3 (a girl) to a high of 35 cents above D5 (a boy). Save for a small region at the upper extremity of the pitch ranges reached by 9–11-year-olds, where girls achieved a slightly higher mean pitch (slightly in excess of 1 semitone) than boys of the same age, no significant gender differences in mean pitch ranges were observed. There was a highly significant linear association between chronological age and extension of the vocal range, both upwards and downwards, though not until age 10 did most children achieve the full range needed for their song, with girls tending to be ahead of boys in this respect, by as much as 2 years in some cases.

SPL of vocal output increased across the age range, but the relationship with age was not uniform: differences between the 4–5-year-olds and 6–8-year-olds fell slightly short of significance, but a highly significant increase was found between 6–8-year-olds and 9–11-year-olds. No significant between-gender differences were observed for any age group.

LTAS curves for each of the 320 singers were calculated. In view of the dependency of LTAS data on overall signal intensity, comparisons between genders and age groups were made observing the above self-determining age bands.
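One way to picture the within-band comparison is sketched below in Python: for a single age band, the LTAS levels of boys and girls are compared bin by bin with a two-sample t statistic. The text says only that t tests were used; the choice of Welch's unequal-variance form, the function names, and the toy dB values are assumptions made for illustration, and no P values are computed.

```python
from math import sqrt
from statistics import mean, variance


def welch_t(sample_a, sample_b):
    """Welch's t statistic and Welch-Satterthwaite degrees of freedom."""
    n_a, n_b = len(sample_a), len(sample_b)
    if n_a < 2 or n_b < 2:
        raise ValueError("each sample needs at least two observations")
    # Squared standard error contributed by each group (sample variance / n).
    se2_a = variance(sample_a) / n_a
    se2_b = variance(sample_b) / n_b
    if se2_a + se2_b == 0:
        raise ValueError("both samples have zero variance")
    t = (mean(sample_a) - mean(sample_b)) / sqrt(se2_a + se2_b)
    df = (se2_a + se2_b) ** 2 / (
        se2_a ** 2 / (n_a - 1) + se2_b ** 2 / (n_b - 1)
    )
    return t, df


def compare_by_bin(boys_curves, girls_curves):
    """Apply welch_t at each frequency bin across the singers' LTAS curves.

    Each argument is a list of curves (one per singer); every curve is a
    list of dB levels on the same frequency grid.
    """
    boys_by_bin = zip(*boys_curves)    # transpose: singers -> bins
    girls_by_bin = zip(*girls_curves)
    return [welch_t(list(b), list(g))
            for b, g in zip(boys_by_bin, girls_by_bin)]


# Hypothetical levels (dB) for three boys and three girls at two bins.
boys = [[-30.0, -52.0], [-32.0, -50.0], [-31.0, -51.0]]
girls = [[-28.0, -55.0], [-29.0, -54.0], [-27.0, -56.0]]

for bin_index, (t, df) in enumerate(compare_by_bin(boys, girls)):
    print(f"bin {bin_index}: t = {t:.2f}, df = {df:.1f}")
```

A negative t at a bin means the girls' mean level exceeds the boys' there; converting t and df to a P value would need a t distribution, for example from a statistics library.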
For the youngest age group (4–5 years), no significant between-gender differences in LTAS curves were found at any frequency in the spectrum examined (0–15 kHz). With the two older age groups, areas of significant gender differences in curves were observed. Shifts of spectral energy from higher frequency regions to lower frequency regions, reported in a previous study,1 were found to be present in data for both genders, and the fulcrum for these changes of spectral tilt was approximately 5.75 kHz, that is, energies at frequencies above 5.75 kHz decreased over age, and energies below 5.75 kHz increased correspondingly. However, the shifts were not uniform between genders. Our data are interpreted as indicating that the shifts began earlier for girls than for boys.

Extended inspection of spectrograms showed the presence of considerable high frequency noise in the voice samples, though this was seen to reduce over age, as indicated in the reduction of energy at higher frequencies in the LTAS curves. From a wide-ranging review of the literature of child voice, a developmental theory is proposed in explanation of the phenomena observed in this study. Reasons are offered in partial explanation of the accuracy of age estimation by listeners. The cues used by listeners in making gender attributions, however, remain uncertain.142,160

REFERENCES

1. Sergeant DC, Welch GF. Age related changes in the long-term average spectra of children’s voices. J Voice. In press. 2. Laver J. The semiotic nature of phonetic data. York Pap Linguist. 1976;6:55-62. 3. Wu K, Childers DG. Gender recognition from speech: Part 1: Coarse analysis. J Acoust Soc Am. 1991;90:1820-1840. 4. Schwartz MF, Rine H. Identification of speaker sex from isolated, whispered vowels. J Acoust Soc Am. 1968;44:1736-1737. 5. Caruso AJ, Mueller PB, Xue A. Relative contributions of voice and articulation to listener judgements of age and gender: preliminary data and implications. Voice. 1994;3:1-9. 6. Sachs J, Lieberman P, Erikson D. Anatomical and cultural determinants of male and female speech. In: Shuy RW, Fasold RW, eds. Language Attitudes: Current Trends and Prospects. Washington DC: Georgetown University Press; 1973. 7. Bladon A.
Acoustic phonetics, auditory phonetics, speaker sex and speech recognition: a thread. In: Fallside F, Woods A, eds. Computer Speech Processing. Englewood Cliffs, NJ: Prentice Hall; 1983. 8. Monsen RB, Engebretson AM. Study of variations in male and female glottal wave. J Acoust Soc Am. 1977;62:981-993. 9. Mendoza E, Valencia N, Munoz J, Trujillo H. Differences in voice quality between men and women: use of the long-term-average spectrum (LTAS). J Voice. 1996;10:59-66. 10. Weinberg B, Bennet S. Speaker sex recognition of 5- and 6-year-old children’s voices. J Acoust Soc Am. 1971;50:1210-1213. 11. Meditch A. The development of sex-specific speech patterns in young children. Anthropol Linguist. 1975;17:421-465. 12. Karlsson I. Sex differentiation cues in voices of young children of different language. J Acoust Soc Am. 1987;81(Suppl 1):S68. 13. Sederholm E. Perception of gender in ten-year-old children’s voices. Logoped Phoniatr Vocol. 1998;23:65-68. 14. Payri BG. Perception of speaker characteristics with long and short samples. J Acoust Soc Am. 2000;108(Pt 2):3532. 15. Avery JD, Liss JM. Acoustic characteristics of less masculine-sounding male speech. J Acoust Soc Am. 1996;99:3738-3748. 16. Bennett S. A 3-year longitudinal study of school-aged children’s fundamental frequencies. J Speech Hear Res. 1983;26:137-142. 17. Bennett S, Weinberg B. Sexual characteristics of pre-adolescent children’s voices. J Acoust Soc Am. 1979;65:179-189. 18. Ingrisano D, Weismer G, Schucker GH. Sex identification of preschool children’s voices. Folia Phoniatr. 1980;32:61-69. 19. Busby PA, Plant GL. Formant frequency values of vowels produced by preadolescent boys and girls. J Acoust Soc Am. 1995;97:2603-2606.
20. Perry TL, Ohde RN, Ashmead DH. The acoustic basis for gender identification from children’s voices. J Acoust Soc Am. 2001;109:2988-2998. 21. Coleman RO. A comparison of contribution of two vocal characteristics to the perception of maleness and femaleness in the voice. J Speech Hear Res. 1976;19:168-180. 22. Weinberg B, Zlatin M. Speaking fundamental frequency characteristics of 5-6 year old children with mongolism. J Speech Hear Res. 1970;13: 418-425. 23. Marshall G. Sex typing of speech of prepubertal children [PhD thesis]. Louisiana State University; 1972. 24. Wilson DS. A study of the child voice from six to twelve [PhD thesis]. University of Oregon; 1970, p. 134. 25. Sorenson DN. A fundamental frequency investigation of children ages 6-10 years old. J Commun Disord. 1989;22:115-123. 26. Mount KH, Salmon S. Changing the vocal characteristics of a post-operative transsexual patient: a longitudinal study. J Commun Disord. 1988;21: 229-238. 27. Gu¨nzburger D. Voice adaptation by transsexuals. Clin Linguist Phon. 1989;3:163-172. 28. Gu¨nzburger D. An acoustic analysis and some perceptual data concerning voice change in male-female transsexuals. Eur J Disord Commun. 1993;28:13-21. 29. Lass NJ, Almerino CA, Jordan LT, Walsh JM. The effect of filtered speech on speaker race and sex identifications. J Phon. 1980;8:101-112. 30. Crosby P, Nyquist L. The female register: an empirical study of Lakoff’s hypothesis. Lang Soc. 1977;6:313-321. 31. Edelsky C. Question intonation and sex roles. Lang Soc. 1979;8:15-32. 32. Eble CC. How speech of some is more equal than others. Presented at: Southeastern Conference on Linguistuistics; 1972. 33. Crystal D. Prosodic and paralinguistic correlates of social categories. In: Ardener E, ed. Social Anthropology and Language. London, NY: Tavistock; 1971:185-206. 34. Woods N, College L. It’s not what she says, it’s the way that she says it: the influence of speaker sex on pitch and intonational patterns. Research Speech Report. 
Indiana University. 1992; 18 (progress report): 84–95. 35. Wolfe VI, Ratusnik D, Smith F, North G. Intonation and fundamental frequency of male-to-female transsexuals. J. Sp. Hearing Disord. 1990;55: 43-50. 36. Key MR. Linguistic behaviour of male and female. Linguistics. 1972;88: 15-31. 37. Ferrand CT, Bloom RL. Gender differences in children’s intonational patterns. J Voice. 1996;10:281-291. 38. Terango L. Pitch and duration characteristics of the oral reading of males on a masculinity-femininty dimension. J Speech Hear Res. 1966;9: 590-595. 39. Brend R. Male-female intonation patterns in American English. In: Proceedings of the 7th International Congress of Phonetic Sciences. Mouton, The Hague; 1971; 866–869. 40. Gramming P, Sundberg J, Ternstrom S, Leanderson R, Perkins WH. Relationships between changes in voice pitch and loudness. J Voice. 1988;2: 118-126. 41. Bohme G, Stucklich G. Voice profiles and standard voice profiles of untrained children. J Voice. 1995;9:304-307. 42. Sundberg J. What’s so special about singers? J Voice. 1990;4:107-119. 43. Sundberg J, Elliot N, Gramming P, Nord L. Short-term variation of subglottal pressure for expressive purposes in singing and stage speech: a preliminary investigation. J Voice. 1993;7:227-234. 44. Wendler J, Doherty ET, Hollien H. Voice classification by means of long-term speech spectra. Folia Phoniatr. 1980;32:51-60. 45. White PJ. Voice source and formant frequencies in 11-year-old girls and boys. In: White PJ, ed. Child Voice. Stockholm, Sweden: KTH Voice Research Centre; 2000. 13–26. 46. Bennet S. Vowel formant frequency characteristics of preadolescent males and females. J Acoust Soc Am 1981;(1):231-238. 47. Hillenbrand J, Getty LA, Wheeler K, Clark MJ. Acoustic characteristics of American English. J Acoust Soc Am. 1995;97(Pt 1):3099-3111. 48. Hunter WS, Garn S. Disproportionate sexual dimorphism in the human face. Am J Phys Anthropol. 1972;36:133-138.
49. Walker GF, Kowalski CJ. On the growth of the mandible. Am J Phys Anthropol. 1972;36:111-118. 50. Walker J. Craniofacial changes. In: Pinkham JR, et al, eds. Pediatric Dentistry: Infancy Through Adolescence. Philadelphia, PA: Sanders; 1994. 51. Boersma H, Van den Linden FPGM, Prahl-Andersen B. Cranio-facial development. In: Prahl-Andersen B, ed. New York, NY: Academic Press; 1979. 52. Bulygina E, Mitteroecker P, Aiello L. Ontogeny of facial dimorphism and patterns of individual development within one human population. Am J Phys Anthropol. 2005;131:432-443. 53. Künzel HJ. How well does average fundamental frequency correlate with speaker height and weight? Phonetica. 1989;46:117-125. 54. Nordstrom PE. Female and infant vocal tracts from male area functions. J Phon. 1977;5:81-92. 55. Van Dommelen WA, Moxness BH. Acoustic parameters in speaker height and weight identification: sex-specific behaviour. Lang Speech. 1995;38:267-287. 56. Collins S. Men’s voices and women’s choices. Anim Behav. 2000;60:773-780. 57. Hollien H, Green R, Massey K. Longitudinal research on adolescent voice change in males. J Acoust Soc Am. 1994;96:2646-2653. 58. Lass NJ, Brown WS. Correlational study of speaker’s heights, weights, body surface areas, and speaking fundamental frequencies. J Acoust Soc Am. 1978;63:1218-1220. 59. Gonzales J. Research in acoustics of human speech sounds: correlates and perception of speaker body size. In: Recent Research Developments in Applied Physics, Vol. 9; 2006:1–15. 60. Merow WW, Broadbent BH. Cephalometrics. In: Enlow DH, ed. Facial Growth. Philadelphia, PA: Sanders; 1990. 61. Yang C-S, Kasuya H. Speaker individualities of vocal tract shapes of Japanese vowels measured by magnetic resonance images. Presented at: The 4th International Conference on Spoken Language Process; October 3–6, 1996; Philadelphia, PA. Available at: http://www.isca-speech.org/archive. 62. Tecumseh Fitch W, Giedd J.
Morphology and development of the human vocal tract: a study using magnetic resonance imaging. J Acoust Soc Am. 1999;106:1511-1522. 63. McConnell-Ginet S. Intonation in a man’s world. In: Thorns B, Karmorae C, Henley N, eds. Language, Gender and Society. Browley, MA: Newbury House; 1983:69-88. 64. Oates J, Decakis G. Speech pathology considerations in the management of transexualism–a review. Br J Commun Dis. 1983;18:139-151. 65. O’Kane M. Recognition of speech and recognition of speaker sex. J Acoust Soc Am. 1987;82(Suppl 1):S84. 66. Fasold RW. A sociolinguistic study of the pronunciation of 3 vowels in Detroit speech. Detroit, MI: Centre for Applied Linguistics, University of Detroit; 1968. 67. Nolan F. The Phonetic Basis of Speaker Recognition. Cambridge University Press; 1983. 68. Sundberg J. Perception of singing. In: Deutsch D, ed. The Psychology of Music. London: Academic Press; 1982:59-98. 69. Rendall D, Vokey JR, Nemeth C, Ney C. Reliable but weak voice-formant cues to body size in men but not women. J Acoust Soc Am. 2005;117:2372. 70. Masataka N. Pitch characteristics of Japanese maternal speech to infants. J Child Lang. 1992;19:213-223. 71. Mattingly IM. Speaker variation and vocal tract size. J Acoust Soc Am. 1966;39:S1219. 72. Trauenmuller H. Size and physiological effort in the production of signed and spoken utterances. Working Papers 49, 164–167, Department of Linguistics, Lund University. 2001. 73. Yildirim S, Narayanan S, Byrd D, Khurana S. Acoustic analysis of preschool children’s speech. In: Proceedings of the 15th International Congress of Phonetic Society. Barcelona; 2003. 74. Peterson GE, Barney HL. Control methods used in a study of vowels. J Acoust Soc Am. 1952;24:175-184. 75. Lee S, Potamianos A, Narayanan S. Acoustics of children’s speech: developmental changes of temporal and spectral parameters. J Acoust Soc Am. 1999;105:1455-1468. 76. Eguchi S, Hirsh I. Development of speech sounds in children. Acta Otolaryngol. 1969;257(Suppl):1-151.
77. Huber JE, Stathopoulos ET, Curione GM, Ash TA, Johnson K. Formants of children, women and men: the effect of vocal intensity variation. J Acoust Soc Am. 1999;106:1532-1542. 78. Wolf JD. Efficient acoustic parameters for speaker recognition. J Acoust Soc Am. 1972;51:2044-2056. 79. Klatt DH, Klatt LC. Analysis, synthesis and perception of voice quality variations among female and male talkers. J Acoust Soc Am. 1990;87:820-857. 80. Glenn JW, Kleiner N. Speaker identification based on nasal phonation. J Acoust Soc Am. 1968;43:368-372. 81. Boersma P, Weenink D. Praat v.4.3.03. Available at: http://www.fon.hum. ava.nl/praat/ 2005; 2005. 82. Coleman RO. Male-female voice quality and its relationship to vowel formant frequencies. J Speech Hear Res. 1971;14:565-577. 83. Hacki T, Heitmüller S. Development of the child’s voice: permutation, mutation. Int J Pediatr Otorhinolaryngol. 1999;49(Suppl 1):141-144. 84. Sundberg J, Nordstrom PE. Raised and lowered larynx: the effect on vowel formant frequencies. J Res Sing. 1983;6:7-15. 85. Cleveland TF. Singing voice production: 25 yrs of progress. J Voice. 1994;8:18-23. 86. Howard DM. Larynx closed quotient in a capella SATB quartet singing. In: Proceedings of the Stockholm Music Acoustics Conference, SMAC-03. Stockholm; Aug 6–9, 2003; 467–470. 87. Assman PF, Katz WF. Synthesis fidelity–vocal identification. J Acoust Soc Am 2001; [142nd ASA meeting]. 88. Rosner BS, Pickering JB. Vowel Perception and Production. Oxford: Oxford University Press; 1994. 89. de Jonkere PH. Recognition of hoarseness by means of LTAS. Int J Rehab. 1983;6:343-345. 90. Nessel. Uber das Tonfrequenzspectrum der pathologie veranderten stimme. Acta Otolaryngol Suppl. 1960;197. 91. Klatt DH. Acoustic correlates of breathiness: first harmonic amplitude, turbulence, noise and tracheal coupling. J Acoust Soc Am. 1987;82(Suppl 1):S91. 92. Wayland R, Gargash S, Longman A.
Acoustic and perceptual investigation of breathy voice. J Acoust Soc Am. 1995;97(5):3364. 93. Gordon M, Ladefoged P. Phonation types: a cross-linguistic overview. Available at: http://www.linguistics.ucsb.edu/faculty/gordon/phonatory/ pdf 2007; 2007. 94. Master S, De Blaise N, Pedrosa V, Chiari BM. The long-term average spectrum in research in the clinical practice of speech therapists. Pro Fono. 2006;18:111-120. 95. Shoji K, Regenbogen E, Yu JD, Blaugrund SM. High frequency power ratio of breathy voice. Laryngoscope. 1992;102:267-271. 96. Valencia NN, Mendoza LE, Mateo RI, Carballo GG. High frequency components of normal and dysphonic voices. J Voice. 1994;8:157-162. 97. Yumoto E, Gould WJ, Boer T. Harmonics to noise as an index of loudness. J Acoust Soc Am. 1982;71:1544-1550. 98. Laver J. Principles of phonetics, . Cambridge Textbooks in Linguistics. London: Cambridge University Press; 1994. 99. Sonninen A, Hurme P. On the terminology of voice research. J Voice. 1992;6:188-193. 100. Gelfer MP. Perceptual attributes of voice: development and use of rating scales. J Voice. 1988;2:320-326. 101. McAllister A, Sundberg J, Hibi SR, Acoustic measurements and perceptual evaluation of hoarseness in children’s voices. 1996, TMH-QPRS 4. 102. Kent RD. Anatomical and neuromuscular maturation of the speech mechanism: evidence from acoustic studies. J Speech Hear Res. 1976;19: 421-447. 103. Lofqvist A, Mandersson B. Long-time average spectrum of speech and voice analysis. Folia Phoniatr. 1987;39:221-229. 104. Kreiman J, Gerratt B. Measuring voice quality. In: Kent RD, Ball MJ, eds. Voice Quality Measurement. San Diego, CA: Singular Publishing Group; 2000:73-102. 105. Sodersten M, Hertegard S, Hammarberg B. Glottal closure, transglottal airflow and voice quality in healthy middle-aged women. J Voice. 1995;9:182-197. 106. Fairbanks. Voice and Articulation Drill Book. New York, NY: Harper; 1940.
107. Sundberg J, Hibi SR. Acoustic measurements and perceptual evaluation of hoarseness in children’s voices. TMH-QPSR 4/1996: 15–26. 108. Mathieson L. Normal-disordered continuum. In: Kent RD, Ball M, eds. Voice Quality Measurement. San Diego, CA: Singular Publishing Group; 2000:1-3. 109. Rothenberg M. Some relations between glottal airflow and vocal fold contact area. Available at: http://www.rothenberg.org/vfca/vfca.htm 2007; 2007. 110. University of Stuttgart. 5.1 Types of phonation. Available at: http://www. ims.uni-stuttgart.de/phonetik/EGG/page 10.htm; 2007. 111. Fritzell B, Hammarberg J, Gauffin J, Karlsson I, Sundberg J. Breathiness and insufficient vocal fold closure. J Phon. 1986;14:549-553. 112. Wendler J, Rauhut A, Kruger H. Classification of voice qualities. J Phon. 1986;14:483-488. 113. Shirastav R, Sapienza C. Difference limen for aspiration noise in perception of breathy voice. J Acoust Soc Am. 2002;112:2444. 114. Speciale R, Cimino G. Types of voice disorders in children and laryngoscopic approach. In: White P, ed. Child Voice. Stockholm, Sweden: KTH Voice Research Centre; 2000:129-142. 115. De Krom G. Some spectral correlates of pathological breathy and rough voice quality for different types of vowel fragments. J Speech Hear Res. 1995;38:794-811. 116. Hammarberg B, Gauffin J. Perceptual and acoustical characteristics of quality differences in pathological voices as related to physiological aspects. In: Fujimara O, Hirano M, eds. Vocal Fold Physiology, Voice Quality Control. San Diego, CA: Singular Publishing Group; 1994:282–303. 117. Sodersten M, Lindestadt P-A. Glottal closure and perceived breathiness during phonation in normally speaking subjects. J Speech Hear Res. 1990;33:601–611. 118. Hillenbrand J, Houde RA. The acoustic correlates of breathy vocal quality: dysphonic voices and continuous speech. J Speech Hear Res. 1996;39:311-321. 119. Birkholz J. Model of the vocal apparatus.
Available at: http://wwwicg. informatik.uni-rostock.de/piet/tract_model.html 2007; 2007. 120. Cranen B, Schroeter J. Modeling a leaky glottis. J Phon. 1995;23:165-177. 121. Cranen B, de Jong F. Laryngostroscopy. In: Kent RD, Ball M, eds. Voice Quality Measurement. San Diego, CA: Singular Publishing Group; 2000: 265. 122. McAllister A, Sederholm E, Sundberg J, Gramming P. Relations between voice range profiles and physiological and perceptual voice characteristics in ten-year-old children. J Voice. 1994;8:230-239. 123. Bolfan-Stosic N, Yhilerva A, Welch GF. Vocal identity—differences and similarities between children from Croatia and Finland. In: Proceedings of the 3rd International Workshop on Models and Analysis of Vocal Emissions for Biomedical Applications (MAVEBA). Florence: Firenze University Press; 2003:47-50. 124. Bjork M, Wahlgren H. Is there a mutational triangle? A study on the mutational voice in boys attending a music school [PhD thesis]. Stockholm: Department of Logopedics and Phoniatrics, Karolinska Institute. 125. Murry T, Xu JJ, Woodson GE. Glottal configuration associated with fundamental frequency and vocal register. J Voice. 1998;12:44-49. 126. Stathopoulos ET. A review of the development of the child voice: an anatomical and functional perspective. In: White P, ed. Child Voice. Stockholm, Sweden: KTH Voice Research Centre; 2000. 127. Stathopoulos E, Sapienza CM. Developmental changes in laryngeal and respiratory function with variations in sound pressure level. J Speech Learn Hear Res. 1997;40:595-614. 128. Titze I. Critical periods of vocal change: early childhood. The NATS Journal. 1992;Nov/Dec:16-17 [Puberty: 1993 Jan/Feb 24]. 129. Tang J, Stathopoulos ET. Vocal efficiency as a function of vocal intensity: a study of children, women and men. J Acoust Soc Am. 1995;97: 1885-1892. 130. Henrich N, d’Alessandro C, Doval B, Castellengo M. 
Glottal open quotient in singing: measurements and correlation with laryngeal mechanisms, vocal intensity, and fundamental frequency. J Acoust Soc Am. 2005;117:1417-1430.
131. Schneider B, Bigenzahn W. Influence of glottal closure configuration on vocal efficacy in young normal-speaking women. J Voice. 2003;17: 468-480. 132. Vorperian HK, Kent RD, Lindstrom MJ, Kalina CM, Gentry LR, Yardell BS. Development of vocal tract length during early childhood: a magnetic resonance imaging study. J Acoust Soc Am. 2005;17: 338-350. 133. Menard L, Schwartz J-L, Boe¨ L-J. Role of vocal tract morphology in speech development. J Speech Hear Lang Res. 2004;47:1059-1080. 134. Gollin E. Devlopmental plasticity. In: Gollin E, ed. New York, NY: Academic Press; 1981:231-252. 135. Kent RD, Vorperian HK. Development of the cranio-facial-oral-laryngeal anatomy: a review. J Med Speech Lang Pathol. 1995;3:145-190. 136. Boe L-J, Menard L, and Maeda S. Adaptation control strategies during the vocal tract growth inferred from simulation studies with articulatory model. Presented at: The 5th Seminar on Speech Production. 2000. 137. Sapienza CM, Hoffman B. Documentation of clinical features. In: White P, ed. Child Voice. Stockholm, Sweden: KTH Voice Research Centre; 2000:104-128. 138. Andrews ML, Schmidt CP. Gender presentation: perceptual and acoustical analysis of voice. J Voice. 1997;11:307-313. 139. Sundberg J, Fahlstedt E, Morell A. Effects on the glottal voice source of vocal loudness variation in untrained female and male voices. J Acoust Soc Am. 2005;117:879-885. 140. Sulter AM, Abers FW. The effects of frequency and intensity level on glottal closure in normal subjects. Clin Otolaryngol. 1998;23:97-98. 141. Bless DM, Biever D, Shaikh A. Comparisons of vibratory characteristics of young adult males and females. In: Proceedings of the International Conference on Voice. Japan; 1986; Vol. 2: 46–54. 142. Hodge FS, Colton RH, Kelley RT. Vocal intensity characteristics in normal and elderly speakers. J Voice. 2001;15:503-511. 143. Titze IR, Sundberg J. Vocal intensity in speakers and singers. J Acoust Soc Am. 1992;91:2936-2946. 144. Austin W. 
Some social causes of paralanguage. Can J Linguist. 1965;11: 31-39. 145. Tannen D. You Just Don’t Understand: Women and Men in Conversation. New York, NY: William Morrow; 1990. 146. Henton C, Bladon A. Breathiness in normal female speech: inefficiency versus desirability. Lang Commun. 1985;5:221-227. 147. Puts DA, Gaulin SJC, Verdolini K. Dominance and the evolution of sexual dimorphism in human voice pitch. Evol Hum Behav. 2006;27:283-296. 148. Cook AS, Fritz JJ, McCormiack BL, Visperas C. Early gender differences in the functional use of language. Sex Roles. 1985;12:909-915. 149. Sause EF. Computer content analysis of sex differences in the language of children. J Psycholinguist Res. 1976;21:127-146. 150. Karlsson F, Zetterholm E, Sullivan KPH. Development of a gender difference in voice onset time. In: Proceedings of the 10th Australian International Conference on Speech Science and Technology. December 8–10, 2004. 151. Allen JS, Miller JL, DeSteno D. Individual talker differences in voice onset time. J Acoust Soc Am. 2003;113:544-552. 152. Whiteside SP, Marshall J. Developmental trends in voice onset time: some evidence for sex differences. Phonetica. 2001;58:196-210. 153. Whiteside SP, Irving CJ. Speakers’ sex differences in voice onset time. Some preliminary findings. Percept Mot Skills. 1997;85:459-463. 154. Robb M, Gilbert H, Lerman J. Influence of gender and environmental setting on voice onset time. Folia Phoniatr Logop. 2005;57:125-133. 155. Sederholm E, McAllister A, Dalkvist J, Sundberg J. Aetiological factors associated with hoarseness in ten-year-old children. Folia Phoniatr Logop. 1995;5:262-278. 156. Williams J, Welch GF, Howard DM. An exploratory baseline study of boy chorister vocal behaviour and development in an intensive professional context. Logoped Phoniatr Vocol. 2005;30:158-162. 157. Welch GF, Howard DM. Gendered voice in the cathedral choir. Psych Music. 2002;30:102-120. 158. Polgar G, Weng TR. 
The functional development of the respiratory system. Am Rev Respir Dis. 1979;120:625-695.
159. Russell NK, Stathopoulos ET. Lung volume changes in children and adults during speech production. J Speech Hear Res. 1988;31:146-155. 160. Kahane JC. Growth of the human prepubertal and pubertal larynx. J Speech Hear Res. 1982;25:446-455. 161. Titze IR. Principles of Voice Production. 2nd ed. Iowa: NCVS Library; 2005. 162. Nordenberg M, Sundberg J. Effect on LTAS of vocal loudness variation. Logoped Phoniatr Vocol. 2004;29:183-189.
163. Liasko EE. Some characteristics of maternal speech to three and six-month-old infants. Psychol J (Russ Assoc Sci). 2002;23:48-55. 164. Kuhl PK, Andruski JE, Chistovich IA, et al. Cross-language analysis of phonetic units in language addressed to infants. Science. 1997;277:684-686. 165. Smith DRR, Patterson RD. The interaction of glottal-pulse rate and vocal-tract length in judgements of speaker size, sex and age. J Acoust Soc Am. 2005;118:3177-3186.