Journal of Voice
Vol. 8, No. 2, pp. 145-156 © 1994 Raven Press, Ltd., New York
Articulatory, Developmental, and Gender Effects on Measures of Fundamental Frequency and Jitter
Joan E. Sussman and Christine Sapienza
Department of Communicative Disorders and Sciences, State University of New York at Buffalo, Buffalo, New York, U.S.A.
Summary: Fundamental frequency (F0) and jitter were measured in digitized live-voice productions of sustained vowels [a], [i], and [u] from women, men, and 6- through 9-year-old children. Results showed (a) significant developmental differences for mean F0 and for the pattern of jitter by vowel type, (b) significant gender differences in F0 and jitter only for adults, (c) significant differences in F0 and jitter according to vowel type for all subjects, and (d) similar amounts of mean absolute jitter for children and women for all vowels, with nonsignificantly different values of jitter for boys and men on [i] and [u] productions. Results are related to Honda's theory of intrinsic F0 for vowels and to Titze's neurologic model of jitter.
Key Words: Acoustic--Fundamental frequency--Jitter--Children--Adults--Gender.
The development of speech production ability has been studied with acoustic analysis of children's voices for many years (e.g., 1,2). Acoustic analysis offers a noninvasive way to investigate differences in speech production that can then be used to establish normative patterns for subsequent comparisons to pathological productions (e.g., 3). One parameter that changes as children develop is fundamental frequency (F0), i.e., the rate of vocal-fold opening during voiced segments that is associated with vocal pitch. Fundamental frequency and variability of F0 are acoustic parameters that would be expected to be affected by anatomical/physiological development of the larynx and supralaryngeal cavities, which occurs in children between 4 and 13 years of age (1,4-9). Growth and differentiation of the vocal-fold structures and their neurological control have been speculated to affect children's voices over time (5-9).
FUNDAMENTAL FREQUENCY
Adults
In the voices of adults, regular differences in F0 have been shown to occur with the vowel type produced when F0 was not controlled to a particular frequency (6,10-14). Honda (15, also 16,17) has presented an articulatory-based theory to explain intrinsic differences in vowel F0. Based on previous studies of vowel F0 and a relationship noted between geniohyoid activity and F0 (18-20), Honda speculated that a high tongue position during production of vowels such as [i] or [u] contributed to a higher rate of glottal pulses, that is, a higher F0. Because the tongue and larynx are connected by muscles and ligaments, when the tongue root is raised primarily by posterior fibers of the genioglossus, the hyoid bone is moved anteriorly. These extrinsic attachments to the laryngeal structure are thought to create an increase in external tension on the larynx, perhaps combined with a forward thyroid cartilage tilt, to increase F0. In contrast, when the tongue is low in the mouth as with an [a] vowel production, the hyoid bone position is more posterior, resulting in decreased laryngeal tension and lower F0.
Children
Previous investigations of F0 in children's voices have shown that F0 gradually changes between 6 and 12 years of age, with little apparent gender difference until 12-13 years (1,5). Although it is known that children's F0s differ from those of adults before puberty (5,21), less is known about the relationship between F0 and type of phonetic segment in children's voices. Robb and Simmons (21), using electroglottographic measures, reported a trend for higher F0s in [i] and [u] vowels compared with [a] productions of 5-year-old children's vowels, but did not find significant differences. Ohde (6) found that in contrast to adults, F0s of 8-9-year-old children differed on the basis of tongue position, but not when the vowels were preceded by labial stop consonants. Glaze et al. (9) found significant differences in the F0 of vowels [a] versus [i] for children ages 5-12 years at medium and loud levels, with [i] having a significantly higher F0. Thus, results from studies of children's voices have not shown as definite a pattern for higher F0s with high-tongue production vowels as is found with adults. This may be due to a relative lack of multivowel studies, combined with procedural differences across investigations.
Accepted November 9, 1992. Address correspondence and reprint requests to Dr. J. E. Sussman at Department of Communicative Disorders and Sciences, State University of New York at Buffalo, 122 Park Hall, Buffalo, NY 14260, U.S.A. Portions of the current project were presented at the Spring 1990 meeting of the Acoustical Society of America, State College, Pennsylvania State University.
JITTER (F0 VARIABILITY)
Another measure related to F0 is jitter, the period-to-period differences in F0, which has also been speculated to show intrinsic differences with vowel type. Variation in jitter across vowel types has been attributed to the differences in F0 among vowels (related to tongue position) and/or to an interaction between the laryngeal source and supralaryngeal filter functions (22-24). Titze (25) modeled neurologic sources of jitter in vocal-fold vibration, finding that jitter varied most according to the firing rate of motor units in the thyroarytenoid muscle. Higher motor-unit firing rates were associated with higher F0s and lower amounts of jitter. Therefore, it might be expected that factors responsible for producing a higher F0 within a subject, because of differences in vowel type or syllable stress, might be responsible for lower amounts of absolute jitter.
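The two jitter measures that recur in the studies cited in this paper, absolute jitter in milliseconds and percent jitter, can be illustrated with a short sketch. It assumes that absolute jitter is the mean of the absolute differences between consecutive periods and that percent jitter divides that value by the mean period; the function names and the period values are hypothetical and chosen only for illustration.

```python
def mean_absolute_jitter_ms(periods_ms):
    """Mean absolute difference between consecutive glottal periods (ms)."""
    if len(periods_ms) < 2:
        raise ValueError("need at least two periods")
    diffs = [abs(b - a) for a, b in zip(periods_ms, periods_ms[1:])]
    return sum(diffs) / len(diffs)


def percent_jitter(periods_ms):
    """Absolute jitter expressed as a percentage of the mean period."""
    mean_period = sum(periods_ms) / len(periods_ms)
    return 100.0 * mean_absolute_jitter_ms(periods_ms) / mean_period


# Hypothetical period sequences (ms), not data from this study:
# a higher-F0 voice (about 221 Hz) and a lower-F0 voice (about 120 Hz)
# that share the same cycle-to-cycle perturbation of 0.04 ms.
higher_f0_periods = [4.50, 4.54, 4.50, 4.54, 4.50]
lower_f0_periods = [8.30, 8.34, 8.30, 8.34, 8.30]

for label, periods in (("higher F0", higher_f0_periods),
                       ("lower F0", lower_f0_periods)):
    print(label,
          round(mean_absolute_jitter_ms(periods), 4), "ms",
          round(percent_jitter(periods), 3), "%")
```

With identical absolute jitter, the shorter periods of the higher-F0 voice yield the larger percent jitter, which is one reason the two measures can rank vowels or speaker groups differently.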
Adult studies of jitter by vowel type
Some investigations of jitter by vowel type in adults (13,22-24) have shown that higher-F0 vowels [i] and [u] had less absolute jitter than did the lower-F0 vowel [a], supporting predictions of Titze's model for a within-subject relationship between F0 and jitter. Wilcox and Horii (14, for young and elderly men) and Milenkovic (26, for young men and women) reported lower amounts of jitter in the tongue-high vowel [u] compared with other vowels. Linville and Korabic (27) reported greatest percent jitter for [a] productions in a group of 18 elderly women. Deem et al. (28) found higher percent jitter values for [a] than for [i] and [u] for young men and women. However, other studies of jitter measured in adult cross-vowel productions have not supported the pattern of less jitter with tongue-high vowel productions. Horii (29) reported similar F0s and jitter in 20 men across vowels when measured by microphone and accelerometer. Sorensen and Horii (30) reported higher percent jitter values for [i] than for [a] and [u] for 20 women studied, with a similar pattern in men reported by Horii (31). Linville (32) also found higher amounts of percent jitter for [i] productions in 22 women studied (see footnote 1).
Adult studies of jitter by gender
Across-subject differences in F0 associated with gender are explained primarily by relative length differences of the vocal folds, according to Titze (33). Higher F0s of women are associated with their shorter vocal-fold lengths compared with men. Titze (33) also theorizes that mode of vocal-fold vibration may be different in men and women due to degree of vocalis contraction. Men are thought to have a larger degree of vocalis muscle contraction with a resulting medial bulge along the vocal-fold surface, whereas women are believed to have a "more triangularly shaped [and uniform] vocal fold" (33, p. 1705).
Thus, because of both length and vibratory mode factors, it might be predicted that women would have smaller amounts of jitter than men during sustained vowel productions. Some investigators have found lower amounts of absolute jitter in the voices of women compared with measures in men's voices (26,34-36). Orlikoff and Baken (37) reported lower mean absolute jitter for women's voices compared with men's, although the differences were not significant. They believed that the amount of jitter present in a voiced production was primarily determined by the individual's relative F0 compared with the individual's F0 range rather than by the absolute F0 of the production. In contrast to the studies reported above, however, Deem et al. (28) and Sorensen and Horii (30) have reported higher amounts of jitter in the voices of women compared with men. Thus, the relationship among jitter, F0, and gender for adult voices remains equivocal.
Footnote 1: Differences in the way jitter has been reported may be partly responsible for apparently different patterns of jitter according to vowel type. Percent jitter has been used to "correct" for the relationship between jitter and F0 and may overcorrect in voices with high F0s (22,37).
Children's studies
Results from studies of children's voices have also been equivocal concerning the total amount of F0 variability in their sustained vowel productions compared with adult productions and also for the pattern of jitter obtained by vowel type. Kent (5) reported that F0 variability, specifically the SD of F0, was greater for children under the age of 10 years than for adults. Similarly, Ohde (6) found significantly greater amounts of F0 variability (coefficient of variation of F0) for 8-9-year-old children compared with adults. He suggested immature neuromuscular development in children as an explanation. Steinsapir et al. (38,39) reported higher amounts of jitter in children's sustained [a], [i], and [u] vowels compared with measures from adults, with the highest amounts of jitter reported for [a] productions in children. In contrast to the above findings, Glaze et al. (8) reported mean absolute jitter values for children between 5 and 11 years of age in the vowel [a] in the production [ha] that were similar to those of adults (0.036 and 0.032 ms for girls and boys, respectively). However, similar to results of Kent (5) and Ohde (6), Glaze et al. observed greater variability among the children's productions compared with those of adults. In a follow-up study, Glaze et al.
(9) found again that jitter in children's medium and loud productions was similar to that found in adults (i.e., <0.04 ms). Furthermore, jitter did not differ according to vowel type ([a] versus [i]) but did differ due to loudness level. Jitter decreased significantly with increased intensity level from medium to loud levels. However, F0 also increased significantly with intensity, to a much larger degree than F0 changed across vowel type. Thus, jitter decreased as F0 significantly increased with manipulation of vocal intensity. These results suggest that a critical factor for determining the amount of jitter may be the degree of F0 change, regardless of how the F0 change is created. Finally, as Glaze et al. (9) noted, the signal-to-noise ratios (SNRs) of children's productions at 15.4 dB were lower than reported for adult voices by Milenkovic (26) at 20 dB. The children's SNRs might have been partially related to the use of an initial [h] consonant. Noise from the [h] may have been assimilated into the sustained portion of the vowel, which was analyzed at the midpoint of a 2-3 s production. Because jitter, along with shimmer, has been associated with "breathiness" (40), it is possible that jitter differences due to vowel type might have been masked by the presence of noise from the associated [h] sound in comfortable-level productions.
Overall, considerable uncertainty exists about the amount of F0 variability in the voices of children compared with adults. Although suspected neuromuscular immaturity predicts that children's voices should demonstrate higher amounts of jitter compared with adult productions, their inherently higher F0s suggest that jitter should be lower in children's voices than in adults'. In addition, although there is considerable agreement that F0 varies with vowel type in adults, less is known about these articulatory influences in the vowel productions of children.
THE PRESENT STUDY
The current investigation measured F0 and jitter in sustained productions of vowels [a], [i], and [u] in voices of male and female children and adults to observe the effects of development, gender, and articulatory (i.e., vowel-type) factors. The focus of the study was twofold. First, the study measured F0 and jitter in children versus adults to determine any developmental differences that might occur across vowel type.
Specifically, it was asked whether children and adults have higher F0s and less jitter in high vowels [i] and [u] compared with the low vowel [a] and whether the children's pattern differs from the adult pattern. Second, the importance of development versus F0 on the resulting amount of jitter was investigated by comparing jitter (a) across low- and high-F0 vowels, (b) across female (high-F0) and male (low-F0) groups, and (c) across younger and mature groups. If absolute jitter decreases with increased F0, then there should be smaller amounts of jitter found in high-tongue vowel productions ([i] and [u]) compared with low-tongue vowel productions ([a]). In addition, smaller amounts of absolute jitter should be found in the voices of women and children compared with men, with smallest values found for children. However, if developmental immaturity is important, children's jitter values may be expected to be higher than those of women, who have fundamental frequencies more similar to those of the children.
METHOD
Subjects
Seventeen boys and 14 girls ages 6.11-9.2 years (mean age: 7.6 years; SD 8 months), 10 women ages 22-34 years (mean age: 25.7 years; SD 3.9 years), and 10 men ages 19-28 years (mean age: 24.0 years; SD 2.7 years) participated in the current investigation. The children were screened by two ASHA-certified speech-language pathologists for normal voice production in their public school classrooms and were confirmed not to have had prior voice problems by the school speech-language pathologist. The prospective adult subjects completed a questionnaire regarding general health and voice status. The adults included in the investigation were nonsmokers who had no professional voice training. All adult subjects demonstrated good health and perceptually normal voice quality. Vital capacity, height, and weight were measured in all subjects to monitor physical fitness (41). Because all voice measures and perceptual judgment suggested normal voice functioning and subjects appeared to be in good health, the fitness factors were not analyzed further.
Apparatus
A miniature accelerometer (Vibrometer, model 501M601, approximate weight of 1.8 g, and P16 preamplifier) was used to collect the voice signal from all subjects. It was used because it detects acoustic pulses through skin vibrations and has been shown to produce excellent SNRs because it does not transduce noise (29,42). In addition, accelerometer measures are less influenced by intrinsic vowel intensity differences associated with differing resonance characteristics of vowels (43,44) than is a microphone because of the accelerometer's position and the type of signal transduced (29,42). As Stevens et al. (42, p. 595) noted, the accelerometer transduces a glottal waveform with "relatively little harmonic content and fairly uniform amplitude." Askenfelt et al. (45), using a larger accelerometer (20 g), collected accurate measures of F0 that were not reliably different from those of the electroglottograph. They stated that accelerometers were capable of accurately measuring F0 in a wide range of subject voices including conditions of breathy phonation.
To determine whether accelerometer signals would be analyzed in a manner similar to microphone signals by the computerized waveform analysis package (CSpeech, 46), data were collected from the female subjects simultaneously, with the accelerometer and a microphone (Shure dynamic, model SM511) placed a fixed 15 cm in front and slightly above the lips with a headset. The mean jitter and F0 values with SDs from the accelerometer and microphone signals are shown in Fig. 1. Average values are similar for all measures, although the pattern of results for jitter is slightly different for microphone and accelerometer signals. In particular, the amount of jitter measured for [i] productions was highest for the microphone transducer, whereas measured jitter was highest for [a] productions with the accelerometer. Two separate repeated-measures analyses of variance were completed for jitter and F0 measures, respectively (47). Each showed that the accelerometer and microphone measures were not reliably different from each other [for jitter: F(1,18) = 3.34, p = 0.08; for F0: F(1,18) = 0.01, p = 0.92].
The apparatus used to collect the data included a Standard 386 AT computer, a 12-bit resolution digital signal processing board (Digital Translation, model 2821), a White Instruments antialiasing lowpass filter (model 4658) with a cutoff at 8.4 kHz, and the CSpeech software program (version 3.0, 46). An additional amplifier (Shure, model M267) was necessary to provide sufficient gain for accelerometer signal digitization because accelerometers provide inherently small outputs. CSpeech optional parameters were set at a sampling rate of 20.0 kHz with a two-channel display and 3-s time interval.
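At the 20.0-kHz sampling rate given above, adjacent samples are 0.05 ms apart, which is coarse relative to cycle-to-cycle period differences of a few hundredths of a millisecond. One general way to resolve a period more finely than the sample spacing is to interpolate around the peak of a short-term autocorrelation function. The sketch below illustrates that idea with parabolic interpolation only; it is not CSpeech's actual algorithm, and the function names, the F0 search range, and the synthetic test tone are assumptions made for the example.

```python
import math


def autocorr(signal, lag, window):
    """Sum of products of the signal with itself shifted by `lag` samples."""
    return sum(signal[i] * signal[i + lag] for i in range(window))


def estimate_period_ms(signal, fs_hz, f0_min_hz, f0_max_hz):
    """Estimate one period (ms) from the autocorrelation peak,
    refined to sub-sample precision by parabolic interpolation."""
    lag_min = max(2, int(fs_hz / f0_max_hz))
    lag_max = int(fs_hz / f0_min_hz)
    # Use the same number of products at every lag (including lag_max + 1).
    window = len(signal) - lag_max - 1
    if window <= 0:
        raise ValueError("signal too short for the requested F0 range")

    scores = {lag: autocorr(signal, lag, window)
              for lag in range(lag_min, lag_max + 1)}
    best = max(scores, key=scores.get)

    # Fit a parabola through the peak and its two neighbours.
    y0 = autocorr(signal, best - 1, window)
    y1 = scores[best]
    y2 = autocorr(signal, best + 1, window)
    denom = y0 - 2.0 * y1 + y2
    offset = 0.0 if denom == 0 else 0.5 * (y0 - y2) / denom
    return 1000.0 * (best + offset) / fs_hz


# Synthetic 200-ms tone at 221 Hz (hypothetical), whose true period of
# about 90.5 samples falls between two integer lags at 20 kHz.
fs_hz = 20000.0
tone = [math.sin(2.0 * math.pi * 221.0 * i / fs_hz) for i in range(4000)]
period_ms = estimate_period_ms(tone, fs_hz, f0_min_hz=150.0, f0_max_hz=500.0)
print(period_ms, 1000.0 / 221.0)
```

The search range in the example is deliberately narrow so that only one period of the tone fits inside it; a wider range would need an extra rule to avoid picking a multiple of the true period.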
CSpeech yields a mean absolute jitter value (in milliseconds) that is an average of the cycle-to-cycle differences in period. Milenkovic's CSpeech (46) program uses an interpolated short-term autocorrelation function to determine F0 and jitter, a method shown to improve signal measurement with lower sampling rates (8,28).
Procedures
The accelerometer was attached to the skin superior and lateral to the thyroid prominence on the right lamina with double-sided adhesive tape. All subjects first practiced three trials of the vowel production task to give them experience sustaining the vowel for the necessary 3-s interval and to rehearse comfortable-level production. The investigators modeled "comfortable-level" productions for the subjects, directing them to "produce the vowel [a] in your regular talking voice like this [a:::] until you see it on the computer screen." After training, subjects then produced two trials each of the vowels [a], [i], and [u]. Subjects' productions were monitored oscilloscopically, first during the training trials to adjust the amplifiers for best digitization levels, and subsequently to assure a constant vocal intensity across productions. Once the amplifiers were adjusted during practice trials, the levels remained fixed for all experimental productions. Subjects were not instructed to maintain a target-level intensity because the investigators wanted to prevent any unnatural laryngeal variations that may have affected subjects' typical F0 control (48). Excessively loud or soft phonations were not accepted and subjects were carefully instructed to phonate at a "comfortable loudness level." Similar to investigations by Robb and Simmons (21) and others (27,32,49), including a methodological study of jitter, the current focus was to observe typical voice production. Thus, instructions for comfortable level production were appropriate. As Orlikoff and Kahane (50) suggest, intensity needs to be controlled and/or measured in studies of voice measures. Orlikoff and Kahane showed that intensity changes that were noticeable (i.e., varying by 8-18 dB) could affect jitter measures. Therefore, the current investigation chose to control for intensity in voice productions by perceptual monitoring to assure that productions did not vary significantly. Subjects in the current investigation, like those in the Titze et al. (49) investigation (but with fewer repetitions), maintained a level that did not differ by as much as a doubling of sound pressure (6 dB sound pressure level) and were most likely within ±2 dB across trials. Thus, intensity differences across trials were believed to be negligible. Post-hoc observation of consistent F0 and jitter measures both within and across trials lends credence to the constant intensity and F0 levels maintained across trials.
FIG. 1. Mean fundamental frequency and jitter values (with SDs) for sustained vowel productions of women from simultaneous accelerometer and microphone measures.
Analysis
Live voice productions were digitized and saved on diskettes for later measurement. All measures of F0 and jitter were made from two 200-ms intervals per production. One measure was made from 1,000 to 1,200 ms (referred to as "midpoint"), and the other measure was made from 1,800 to 2,000 ms (referred to as "endpoint" even though the actual measure was ~1 s before the end of the vowel production). A total of 12 measures were made from each subject (three vowels × two trials × two sample measurement locations). Each 200-ms interval enabled at least 24 cycles per interval (for men) to be measured, a number considered sufficient for accurate jitter analysis (49). Measures from girls, boys, and women enabled an average of at least 54, 52, and 45 cycles, respectively, per 200-ms interval. All results reported are the combined averages of the two measurement points (midpoint-endpoint) across two trials of sustained vowel productions, an analysis procedure that increased measurement validity. Separate t tests across midpoint-endpoint measures and across trial number revealed no significant differences for F0 and jitter measures as shown in Table 1 (p > 0.05). Thus, measures made within one sustained vowel production were not reliably different, nor were measures made across trials for both F0 and jitter. Subsequently, the four measures per vowel type were combined into average F0 and jitter values for subjects' [a], [i], and [u] productions. Combined measurements reported were calculated from an average minimum of 98 cycles for men to an average maximum of 216 cycles for girls.
RESULTS
Average SNRs of subjects' vowel samples were computed for all measures. The mean SNRs of children (30.6 dB), women (29.3 dB), and men (28.8 dB) were similar and showed that subjects' voice productions had excellent ratios of harmonic to aperiodic components.
Fundamental frequency
Age and gender effects
Table 2 lists mean F0s for subjects by vowel category. Average F0s of 272 and 262 Hz were found for the girls and boys, respectively, across vowels. Their F0s ranged from 213 to 413 Hz (averaged across trials). Average F0s for the women and men were 224 and 122 Hz, respectively, across vowels. Women's average F0s (across trials) ranged from 197 to 271 Hz whereas men's F0s ranged from 102 to 173 Hz. Average F0s for boys were between 35 and 40 Hz higher (for [a] to [u], respectively) than those of the women. Average F0s for girls were 47-49 Hz higher than those of women. Men's average F0s were >100 Hz lower than those of all other subjects. Finally, as Table 2 shows, the SDs of children's average F0s (30.2 Hz) for sustained vowel productions were higher than those of the adults (17.3 Hz), particularly for [a] and [u] productions. Thus, children's F0s showed more variation as a group than did adults'.

TABLE 1. t-test comparisons of midpoint-endpoint and trial 1 versus trial 2 measures of F0 and jitter across subject group

                         F0       p       Jitter    p
Midpoint-endpoint        0.99     0.33    1.~2      0.27
Trial 1 vs. trial 2     -0.94     0.35    1.38      0.18

TABLE 2. Mean fundamental frequency and SD (Hz) for each subject group by vowel type

Group    Statistic    [a]       [i]       [u]
Girls    Mean         263.98    272.79    276.95
         SD            31.06     28.10     39.06
Boys     Mean         253.01    263.24    271.36
         SD            23.84     29.60     28.85
Women    Mean         214.94    227.86    230.88
         SD            14.93     19.99     16.52
Men      Mean         115.46    122.37    127.88
         SD            15.52     17.85     17.93

A three-factor analysis of variance for F0 (age × gender × vowel) with repeated measures on the last factor revealed significant differences in F0 due to age [F(1,46) = 177.91, p < 0.0001], gender [F(1,46) = 63.37, p < 0.0001], and gender by age [F(1,46) = 43.12, p < 0.0001] (47). The age by gender effect was further analyzed with a Tukey HSD post-hoc test (51). Significant differences in vowel F0s were found between values for men and all other groups (at the 0.01 level), with at minimum an ~100-Hz difference shown for each vowel type. F0s for vowel productions of boys and girls were not significantly different, with an ~10-Hz difference for each vowel production. In addition, mean F0s for women and boys were not significantly different from each other.
Vowel effects
Differences in F0 due to vowel type are shown in Fig. 2. The F0s of the vowels [a], [i], and [u] all differed from each other, with [a] having the lowest F0 (mean of 221 Hz) and [u] having the highest F0 (237 Hz). The average difference in F0 between [a] and [u] was 16 Hz, between [a] and [i] was 10.6 Hz, and between [i] and [u] was 5.5 Hz. The above-listed three-factor (age × gender × vowel) analysis of variance also revealed a significant difference in F0 due to vowel type [F(2,92) = 24.61, p < 0.0001]. Post-hoc analysis using the Tukey HSD test revealed that all vowel F0s were significantly different from each other, with differences between [a] and [i] and between [a] and [u] significant at the 0.01 level, whereas that between [i] and [u] was significant at the 0.05 level.
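As a rough cross-check, the overall vowel means quoted above can be approximated by pooling the group means in Table 2, weighted by group size. The sketch below assumes the group sizes given in the Subjects section (14 girls, 17 boys, 10 women, 10 men) and the Table 2 means as read from the printed table; the variable and function names are hypothetical. The degrees of freedom reported for the analyses of variance suggest that the analyzed sample may differ slightly from these counts, so the result is an approximation only.

```python
# Group sizes from the Subjects section (assumed to apply to Table 2).
GROUP_SIZES = {"girls": 14, "boys": 17, "women": 10, "men": 10}

# Mean F0 (Hz) per group and vowel, as read from Table 2.
MEAN_F0_HZ = {
    "girls": {"a": 263.98, "i": 272.79, "u": 276.95},
    "boys":  {"a": 253.01, "i": 263.24, "u": 271.36},
    "women": {"a": 214.94, "i": 227.86, "u": 230.88},
    "men":   {"a": 115.46, "i": 122.37, "u": 127.88},
}


def pooled_mean_f0(vowel):
    """Size-weighted mean F0 (Hz) across the four subject groups."""
    total_n = sum(GROUP_SIZES.values())
    weighted_sum = sum(GROUP_SIZES[group] * MEAN_F0_HZ[group][vowel]
                       for group in GROUP_SIZES)
    return weighted_sum / total_n


pooled = {vowel: pooled_mean_f0(vowel) for vowel in ("a", "i", "u")}
for vowel, value in pooled.items():
    print(f"[{vowel}] pooled mean F0: {value:.1f} Hz")
print(f"[u] - [a]: {pooled['u'] - pooled['a']:.1f} Hz")
print(f"[i] - [a]: {pooled['i'] - pooled['a']:.1f} Hz")
print(f"[u] - [i]: {pooled['u'] - pooled['i']:.1f} Hz")
```

Under these assumptions the pooled means come out near 221.6 Hz for [a] and 236.8 Hz for [u], in line with the reported 221 and 237 Hz. The pooled differences are close to, but not identical with, the reported 16, 10.6, and 5.5 Hz, presumably because the reported values were computed from individual subject data.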
FIG. 2. Mean fundamental frequency values (and SDs) of vowels [a], [i], and [u].
Jitter
Age and gender effects
Average jitter values for each vowel type are shown for children and adults in Fig. 3. Overall, children and women demonstrated similar amounts of absolute jitter during sustained vowel productions. Men's productions showed substantially greater amounts of jitter in all vowel productions, particularly for sustained [a]. In addition, a somewhat different pattern of jitter was observed for children compared with adults. Children showed the greatest amount of jitter for [a] and the least amount of jitter for [i] productions. Adults also showed the greatest amount of jitter for [a] productions, but the smallest for [u] productions. Both patterns of jitter by vowel type were consistent across gender. A three-factor analysis of variance for jitter (age × gender × vowel) with repeated measures on the vowel factor was completed. As with F0, significant differences in jitter were found due to age [F(1,46) = 8.98, p < 0.004], gender [F(1,46) = 20.97, p < 0.0001], and age by gender [F(1,46) = 15.01, p < 0.0003]. A Tukey HSD post-hoc analysis for the age by gender interaction revealed that men's productions had significantly greater amounts of jitter compared with women's for vowels [a] (at the 0.01 level) and [i] (at the 0.05 level). Children's jitter values were not significantly different by gender, although larger amounts of jitter were found for boys, as shown in Fig. 3. The jitter values from women were not significantly different from those of the children. Finally, men's jitter values were significantly different from those of the children for [a] productions (at the 0.01 level) and were significantly different from the girls' productions for the [i] vowel (at the 0.05 level) but not from the boys' mean [i] values.
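For readers who want to relate the reported F ratios to their p values, an F ratio with one numerator degree of freedom equals the square of a t statistic with the same error degrees of freedom, so its p value is the two-tailed t probability. The sketch below is an illustrative check rather than the authors' procedure: it integrates the Student t density numerically with Simpson's rule, and the function names and step count are assumptions.

```python
import math


def t_density(x, df):
    """Probability density of Student's t distribution with df degrees of freedom."""
    log_norm = (math.lgamma((df + 1) / 2.0) - math.lgamma(df / 2.0)
                - 0.5 * math.log(df * math.pi))
    return math.exp(log_norm - (df + 1) / 2.0 * math.log1p(x * x / df))


def p_value_for_f_1df(f_ratio, df_error, steps=2000):
    """Two-tailed p value for F(1, df_error), using t = sqrt(F).

    The area under the t density from 0 to t is found with Simpson's rule
    (steps must be even); the p value is one minus twice that area.
    """
    t = math.sqrt(f_ratio)
    h = t / steps
    total = t_density(0.0, df_error) + t_density(t, df_error)
    for k in range(1, steps):
        total += (4.0 if k % 2 else 2.0) * t_density(k * h, df_error)
    central_area = total * h / 3.0
    return 1.0 - 2.0 * central_area


# F ratios quoted in this paper (transducer comparison and jitter ANOVA).
for label, f_ratio, df_error in (
        ("transducer, jitter", 3.34, 18),
        ("transducer, F0", 0.01, 18),
        ("jitter, age", 8.98, 46),
        ("jitter, gender", 20.97, 46),
        ("jitter, age by gender", 15.01, 46)):
    print(label, p_value_for_f_1df(f_ratio, df_error))
```

The values obtained this way fall close to the p values and thresholds printed in the text, for example roughly 0.08 and 0.92 for the accelerometer versus microphone comparison.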
Vowel effects
The analysis of variance for jitter revealed two significant effects involving vowel type. First, there was a significant difference in the amount of jitter due to vowel type for both children and adults, as can be observed in Fig. 3. Second, a significant interaction of vowel by age was found for jitter, also analyzed with a Tukey HSD comparison. Adult jitter values were significantly different for [a] versus [i] and [a] versus [u] comparisons (at the 0.01 level). Children's jitter values differed significantly only between [a] and [i] vowels (at the 0.05 level).
FIG. 3. Mean absolute jitter (and SDs) in milliseconds for sustained productions of [a], [i], and [u] by girls, boys, women, and men.
DISCUSSION
Current results support hypotheses by Honda (15) and others concerning the effects of tongue position on vowel F0, with resulting implications for jitter. In addition, results of the current investigation can help to clarify the relative importance of developmental and F0 variables for the amount of jitter observed in voices during sustained vowel productions. Results will be discussed according to vowel type (the articulatory factor for within-subject changes) and by developmental and gender effects (between-subject factors).
Vowel type
F0
First, in the current investigation, significant differences were found for F0 due to vowel type, a within-person variable with regard to F0 control. The greatest differences occurred between the tongue-low vowel [a] and the tongue-high vowels [i] and [u]. Smaller, but still significant differences in F0 occurred between the two tongue-high vowels. The pattern of highest F0 for [u] and lowest F0 for [a] was consistent across age and gender groups, with women showing the greatest changes in F0 with vowel differences. The results for F0 are similar to those reported previously for women's and men's F0 by Peterson and Barney (10) and Nittrouer et al. (34) and for men's F0 by House and Fairbanks (12), Lehiste and Peterson (11), and Wilcox and Horii (14). Additionally, the current F0 results from 6- to 9-year-old children were also similar to the trend in F0 reported by Robb and Simmons (21) for 5-year-old children.
Jitter
Second, in a pattern similar to that of F0, jitter values for the sustained vowel productions were also significantly different due to vowel type. For adults, differences were significant for tongue-low versus tongue-high comparisons, whereas for children only the [a] versus [i] comparison showed significant differences, with the [a] versus [u] comparison nearly significant. Apparently, the position of the tongue and larynx for high-tongue vowels created a critical degree of tension in the vocal folds to reduce cycle-to-cycle differences in the F0 within individuals. For productions of [a], the lower tongue position appeared to facilitate vocal-fold aperiodicity, perhaps because of the relative laxness of the vocal folds through lack of external tension combined with a slower rate of vocal-fold opening and closing. However, although mean F0 was significantly different between [i] and [u] productions for all subjects, jitter was not significantly different across tongue-high vowel productions. This latter result suggests that the height of the tongue and its influence on F0 had produced a critical mode of vocal-fold vibration for individual subjects during sustained [i] productions that did not decrease jitter further with the somewhat higher frequency vibration associated with [u] productions.
The results for jitter are similar to those found previously by Deem et al. (28), Linville and Korabic (27), and Milenkovic (26), who reported greatest amounts of jitter for [a] productions. In contrast, current jitter results differ from those of Glaze et al. (9, for children's [a] and [i] productions), Horii (29,31), Linville (32), Nittrouer et al. (34, for men's productions), and Sorensen and Horii (30). The latter studies differed in a number of important ways from the current procedures. Glaze et al. (9), Horii (31), Linville (32), Nittrouer et al. (34), and Sorensen and Horii (30) analyzed tape-recorded signals collected from microphone transducers. Generally, when microphones were used (e.g., 30,32), higher amounts of jitter were found for [i] productions.
Although not a significant finding in the current investigation, a similar pattern was noted for women's productions when comparing accelerometer with microphone measures. Even in the Wilcox and Horii (14) study, microphone-transduced [i] productions had a slightly higher amount of mean percent jitter than [a] productions did. Tape-recording effects might also add to transducer effects, explaining differences between current results and those of Horii (29), who also analyzed accelerometer measures, but from tape recordings (49,52). In addition, subject and task factors such as length of sustained vowel or surrounding phonetic context of vowels also might have contributed to observed differences in relative jitter due to vowel
type. The Nittrouer et al. (34) investigation obtained jitter values from vowels produced in a variety of consonantal contexts. For the four men studied by Nittrouer et al. (34), consonant context was shown to significantly affect jitter values, resulting in an unusual pattern of jitter by vowel type. On the other hand, for the four women studied by Nittrouer et al. (34), consonantal context did not significantly affect obtained jitter values, and the resulting pattern of jitter was similar to that found in the current investigation, i.e., higher amounts of jitter for the [a] vowel. Thus, gender effects may interact with phonetic context in the determination of jitter in vowel productions. Overall, jitter results support predictions made from Titze's model of neurologic jitter (25), that jitter should be closely related to the mean firing rate of motor units, a variable that may be associated with within-subject F0 change. As Titze (25, p. 469) stated with some caution due to lack of information on recruitment of fibers: "One would expect considerable perturbation at low fundamental frequencies in phonation, where mean firing rates are around 10-20 Hz." Thus, when motor-unit firing rates were increased within the same person because of phonetic segment type, jitter decreased. However, as Titze (25) also discussed, a number of other factors may contribute to the total amount of jitter in voice in addition to rate of vocal-fold vibration, specifically: the total number of motor units active and their consistency in size and firing period, the presence of mucus on the vocal folds, and whether glottal air flow is turbulent. Therefore, a number of variables are needed to account for jitter, including factors related to heart rate (53).
Age and gender effects
F0
As expected, children's F0s were found to be higher than those of the adults. However, significant group differences were found only between the F0s of men and other subjects (an age and gender difference) and between women and girls (an age difference). In addition, the SD of F0 across children was larger than that found for the adults, suggesting that 7- to 9-year-old children use a wider range of F0s in sustained vowel productions than adults do. The group variability in F0 found for children in the current investigation corresponds with that reported by Glaze et al. (9), although current F0s are higher than in the Glaze et al. study, probably because the children in the present study were younger. Current variability results are not comparable to those reported by Kent (5) and Ohde (6), who tabulated intrasubject variability rather than intersubject variability.
Jitter
Developmental differences for jitter in vowel productions roughly corresponded to the significant differences found in average F0 values. Thus, the significant differences in jitter values found between men and other subject groups were speculated to be related to the significant differences in F0 observed. Such differences have been attributed most strongly to between-subject differences in the length of the vocal folds and/or mode of vocal-fold vibration (33). Similarly, there were no significant differences in mean jitter values between children's and women's productions across all vowels, perhaps reflecting the more similar F0 values found between the children and women. Current findings expand on those of Glaze et al. (9), who reported similar levels of jitter in children's and adults' productions (i.e., <0.04 ms). Although the present study's jitter values were also below the 0.04-ms level referred to by Glaze et al. (9), current findings showed significantly smaller mean values for women and children (<0.02 ms) than for men (<0.035 ms). Three methodological differences (sample size, recording methodology, and speech sample context) are likely to explain the observed differences with results of Glaze et al. First, the current study used a larger sample of adults as a comparison group (20 compared with 6), perhaps permitting the gender and age differences of men to become more apparent relative to women and children. Second, an accelerometer was used to transduce the voice signal rather than a microphone as in the Glaze et al. study. The live-voice, accelerometer measures yielded better SNRs (an average of 29.6 dB) than those reported by Glaze et al. (15.4 dB), contributing to increased measurement accuracy.
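To put the two reported SNR values on a common scale, the following minimal Python sketch converts their difference in decibels into linear ratios. It assumes the standard decibel definitions (10 log10 for power, 20 log10 for amplitude); the function and variable names are hypothetical and are not part of either study's analysis.

```python
# Sketch only: converts the reported SNR difference (dB) into linear ratios.
# Function and variable names are hypothetical illustrations.

def db_to_power_ratio(db):
    """Power ratio corresponding to a level difference in dB."""
    return 10.0 ** (db / 10.0)

def db_to_amplitude_ratio(db):
    """Amplitude ratio corresponding to a level difference in dB."""
    return 10.0 ** (db / 20.0)

snr_current_db = 29.6  # average SNR reported for the accelerometer measures
snr_glaze_db = 15.4    # SNR reported by Glaze et al. (9)

snr_difference_db = snr_current_db - snr_glaze_db

print(f"SNR difference: {snr_difference_db:.1f} dB")
print(f"Power ratio: {db_to_power_ratio(snr_difference_db):.1f}")
print(f"Amplitude ratio: {db_to_amplitude_ratio(snr_difference_db):.1f}")
```

Under these definitions, the 14.2-dB difference corresponds to a signal-to-noise power ratio roughly 26 times larger, or an amplitude ratio roughly 5 times larger, in the current recordings.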
In addition, even though intensity was not measured directly in the current investigation, it appears that any small intensity differences in production were not responsible for the jitter and F0 values obtained. Typically, [a] vowels are produced with greater intensity (by ~4-5 dB) compared with [i] and [u] vowels as a result of the degree of mouth opening and greater transmission of energy through the open oral cavity (43,44). Furthermore, as reported by Orlikoff and Baken (37), Orlikoff and Kahane (50), and Glaze et al. (9), jitter values typically decrease with increased intensity. Therefore, based on intrinsic resonated vowel intensity, the least amount of jitter should have been obtained for [a], the most intense vowel, if intensity factors were important. In the current study, however, greater, not lesser, amounts of jitter were associated with [a], suggesting that possible differences in resonated intensity were not contributing factors to current results. Furthermore, as discussed earlier, because the accelerometer measures acoustic vibrations at the level of the vocal folds, the influence of vowel-intrinsic intensity differences on measures was believed to be negligible. Finally, it is speculated that jitter results in the Glaze et al. study (9) differed from the present investigation because they used an [h] context for productions. The current study used only sustained vowels measured at least 1 s after onset and 1 s before offset. There were four age and gender findings for jitter measures that were not predictable from F0 alone. First, although the F0s of women and girls were significantly different, their jitter values were not. It is speculated that the F0s of women were sufficiently high so that low amounts of jitter resulted. Perhaps a critical rate or mode of vibration might be used by young women and girls during sustained vowel phonation, resulting in small amounts of jitter (33,54). The increased F0 characteristic of younger girls may not reduce the already small amounts of jitter further. Second, there were no developmental differences in jitter for [u] productions, although as discussed below, children's mean jitter values for [u] were higher than those for [i].
For [u] vowels, it is believed that all subjects were phonating at a relatively high rate for normal production, again encouraging a faster mean firing rate compared with other vowel productions, resulting in less jitter (25). The latter result agrees with the proposal by Orlikoff and Baken (37) that relative F0 in an individual's phonational frequency range is an important consideration for determining jitter. Third, the jitter values for boys in [i] and [u] productions were not significantly different from those of men. Although the jitter values for boys were also not significantly different from those of women or girls, the nonsignificant difference with men's values suggests the beginning of a gender difference observable in voices of 7- to 8-year-old boys. This
gender difference for jitter was not observed in F0 overall, because the F0s of men were significantly different from those of all other groups. Perhaps the higher F0 vowel productions for [i] and [u] created a vocal-fold vibration mode for men that was more similar to that of younger male children, maybe by reducing the hypothetical medial vocal-fold bulge to a size more similar to that in boys' voices (33). Finally, a different pattern of jitter was obtained for children compared with adults. Children's jitter values were greater for [u] productions than for [i] productions even though children's average F0 was higher for [u]. This result was consistent for both girls and boys and suggests that some developmental difference in laryngeal structure or articulatory influence of tongue position on laryngeal structure might be responsible for the children's differing pattern of jitter by vowel. It seems critical that the difference between children's and adults' patterns involved the two highest tongue position vowels, [i] and [u]. Interestingly, women's productions had the smallest amount of jitter in [i] and [u] of any group, suggesting that both gender (higher F0s for women) and developmental maturity explain the jitter differences with children. For vowels produced with low tongue height and less vocal-fold tension, such as [a], it is possible that an individual's lower F0 contributes more heavily to the total amount of jitter than developmental factors do. On the other hand, when the vowels are produced with more laryngeal tension and perhaps articulatory precision, as for [i] and [u], it is possible that developmental factors may be more important. Gay et al. (55) concluded from a recent modeling study based on adults that precision of tongue constriction degree and placement appears not to be as critical for [u] production/perception as for other vowels. Lip constriction size appears to affect perception of [u] most critically.
Thus, it may be possible that children maximize [u] production by more closely regulating the critical factor of lip constriction while using less precise tongue articulation to reduce the muscular effort needed by their developing mechanisms to achieve an adequate speech output. Such a reduction of tongue constriction by children might result in higher amounts of jitter compared with adult productions for [u]. Further study of developmental factors on jitter, along with physiological measures across vowel productions, would be helpful for further delineating the contributing factors to jitter in vowel productions.
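For readers less familiar with the perturbation measures compared throughout this discussion, the following Python sketch illustrates one common way of computing mean absolute jitter (in ms) and percent jitter from a sequence of glottal periods. It is an illustration under stated assumptions, not the analysis software used in this study: the function names and the two period sequences are hypothetical, and the definition used (mean absolute difference between consecutive periods) may differ in detail from the algorithm of any particular analysis program.

```python
from statistics import mean

# Sketch only: a common textbook definition of jitter, assumed for illustration.

def mean_absolute_jitter_ms(periods_ms):
    """Mean absolute difference between consecutive glottal periods (ms)."""
    if len(periods_ms) < 2:
        raise ValueError("at least two periods are required")
    differences = [abs(b - a) for a, b in zip(periods_ms, periods_ms[1:])]
    return mean(differences)

def percent_jitter(periods_ms):
    """Mean absolute jitter expressed as a percentage of the mean period."""
    return 100.0 * mean_absolute_jitter_ms(periods_ms) / mean(periods_ms)

def mean_f0_hz(periods_ms):
    """Mean F0 (Hz) estimated from the mean period (ms)."""
    return 1000.0 / mean(periods_ms)

# Hypothetical, made-up period sequences (ms); not data from this study.
low_f0_periods = [8.00, 8.03, 7.98, 8.02, 7.99]   # mean F0 near 125 Hz
high_f0_periods = [4.00, 4.01, 3.99, 4.01, 4.00]  # mean F0 near 250 Hz

for label, periods in (("low F0", low_f0_periods), ("high F0", high_f0_periods)):
    print(
        f"{label}: F0 = {mean_f0_hz(periods):.1f} Hz, "
        f"absolute jitter = {mean_absolute_jitter_ms(periods):.4f} ms, "
        f"percent jitter = {percent_jitter(periods):.3f}%"
    )
```

With these made-up sequences, the lower-F0 example yields the larger absolute jitter, which is the direction of the relationship between F0 and jitter discussed above; the numbers themselves carry no empirical weight.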
Acknowledgment: We thank Mair Olmsted and Tracy Schiavone, who collected and partially analyzed data from subjects. We also thank E. Thomas Doherty, who provided useful comments and suggestions on an earlier version of this article. Partial support for the current research was provided by a NYS/UUP Faculty Development grant to the first author.
REFERENCES
1. Eguchi S, Hirsh I. Development of speech sounds in children. Acta Otolaryngol 1969;(suppl 257):1-51.
2. Zlatin M, Koenigsknecht R. Development of the voicing contrast: a comparison of voice onset time in stop perception and production. J Speech Hear Res 1976;19:95-111.
3. Ramig LO, Scherer RC, Klasner ER, Titze IR, Horii Y. Acoustic analysis of voice in amyotrophic lateral sclerosis: a longitudinal case study. J Speech Hear Disord 1990;55:2-14.
4. Hirano M, Kakita Y. Cover-body theory of vocal fold vibration. In: Daniloff RG, ed. Speech science. San Diego: College-Hill, 1985:1-45.
5. Kent R. Anatomical and neuromuscular maturation of the speech mechanism: evidence from acoustic studies. J Speech Hear Res 1976;19:421-47.
6. Ohde R. Fundamental frequency correlates of stop consonant voicing and vowel quality in the speech of preadolescent children. J Acoust Soc Am 1985;78:1554-61.
7. Tingley B, Allen GD. Development of speech timing control in children. Child Dev 1975;46:186-94.
8. Glaze L, Bless D, Milenkovic P, Susser R. Characteristics of children's voice. J Voice 1988;2:312-9.
9. Glaze L, Bless D, Susser R. Acoustic analysis of vowel and loudness differences in children's voice. J Voice 1990;4:37-44.
10. Peterson GE, Barney H. Control methods used in a study of the vowels. J Acoust Soc Am 1952;24:175-84.
11. Lehiste I, Peterson GE. Some basic considerations in the analysis of intonation. J Acoust Soc Am 1961;33:419-25.
12. House A, Fairbanks G. The influence of consonant environment upon the secondary characteristics of vowels. J Acoust Soc Am 1953;25:105-13.
13. Ternstrom S, Sundberg J, Collden A. Articulatory F0 perturbations and auditory feedback. J Speech Hear Res 1988;31:187-92.
14. Wilcox K, Horii Y. Age and changes in vocal jitter. J Gerontol 1980;35:194-8.
15. Honda K. Relationship between pitch control and vowel articulation. In: Bless DM, Abbs JH, eds. Vocal fold physiology: contemporary research and clinical issues. San Diego: College-Hill, 1983:286-97.
16. Ladefoged P. A phonetic study of West African languages: an auditory-instrumental survey. London: Cambridge University Press, 1964.
17. Lehiste I. Suprasegmentals. Cambridge, Massachusetts: MIT Press, 1970.
18. Erickson D, Liberman M, Niimi S. The geniohyoid and the role of the strap muscles. Haskins Lab Status Rep Speech Res 1977;SR-49:103-10.
19. Sapir S, Campbell C, Larson C. Effect of geniohyoid, cricothyroid and sternothyroid muscle stimulation on voice fundamental frequency of electrically elicited phonation in rhesus macaque. Laryngoscope 1981;91:457-68.
20. Colton RH, Shearer W. Hyoid position as a function of fundamental frequency in the modal and falsetto registers (Department of Otorhinolaryngology Laboratories and Clinics, Experimental Phonetics Laboratory, State University of New York, Upstate Medical Center, Syracuse, New York). Technical Rep 1971;9:24.
21. Robb M, Simmons J. Gender comparisons of children's vocal fold contact behavior. J Acoust Soc Am 1990;88:1318-22.
22. Horii Y. Fundamental frequency perturbation observed in sustained phonation. J Speech Hear Res 1979;22:5-19.
23. Zemlin WR. A comparison of the periodic function of vocal fold vibration of multiple sclerosis and a normal population. [Dissertation]. University of Minnesota, Minneapolis, 1962.
24. Johnson KW, Michel JF. The effect of selected vowels on laryngeal jitter. ASHA 1969;11:96.
25. Titze IR. A model for neurologic sources of aperiodicity in vocal fold vibration. J Speech Hear Res 1991;34:460-72.
26. Milenkovic P. Least mean square measures of voice perturbation. J Speech Hear Res 1987;30:529-38.
27. Linville SE, Korabic E. Fundamental frequency stability characteristics of elderly women's voices. J Acoust Soc Am 1987;81:1196-9.
28. Deem JF, Manning WH, Knack J, Matesich J. The automatic extraction of pitch perturbation using microcomputers: some methodological considerations. J Speech Hear Res 1989;32:689-97.
29. Horii Y. Jitter and shimmer differences among sustained vowel phonations. J Speech Hear Res 1982;25:12-4.
30. Sorensen D, Horii Y. Frequency and amplitude perturbation in the voices of female speakers. J Commun Disord 1983;16:57-61.
31. Horii Y. Vocal shimmer in sustained phonation. J Speech Hear Res 1980;23:202-9.
32. Linville SE. Intraspeaker variability in fundamental frequency stability: an age-related phenomenon? J Acoust Soc Am 1988;83:741-5.
33. Titze IR. Physiologic and acoustic differences between male and female voices. J Acoust Soc Am 1989;85:1699-1707.
34. Nittrouer S, McGowan R, Milenkovic P, Beehler D. Acoustic measurements of men's and women's voices: a study of context effects and covariation. J Speech Hear Res 1990;33:761-75.
35. Ludlow C, Bassich C, Connor N, Coulter D, Lee Y. The validity of using phonatory jitter and shimmer to detect laryngeal pathology. In: Baer T, Sasaki C, Harris K, eds. Laryngeal function in phonation and respiration. San Diego: College-Hill, 1987:492-508.
36. Higgins MB, Saxman JH. A comparison of intrasubject variation across sessions of three vocal frequency perturbation indices. J Acoust Soc Am 1989;86:911-6.
37. Orlikoff R, Baken R. Consideration of the relationship between the fundamental frequency of phonation and vocal jitter. Folia Phoniatr 1990;42:31-40.
38. Steinsapir CD, Forner LL, Stemple JC. Comparison of data from the Visipitch and the PM 300 pitch analyzer. Presented at the annual convention of the American Speech-Language-Hearing Association, November 1986, Detroit, Michigan.
39. Steinsapir CD, Forner LL, Stemple JC. Voice characteristics among black and white children: do differences exist? Presented at the annual convention of the American Speech-Language-Hearing Association, November 1986, Detroit, Michigan.
40. Eskenazi L, Childers DG, Hicks DM. Acoustic correlates of vocal quality. J Speech Hear Res 1990;33:298-306.
41. Ramig L, Ringel R. Effects of physiological aging on selected acoustic characteristics of voice. J Speech Hear Res 1983;26:22-30.
42. Stevens KN, Kalikow DN, Willemain TR. A miniature accelerometer for detecting glottal waveforms and nasalization. J Speech Hear Res 1975;18:594-9.
43. Fairbanks G. A physiological correlative of vowel intensity. Speech Monogr 1950;17:390-5.
44. Stevens K, House A. An acoustical theory of vowel production and some of its implications. J Speech Hear Res 1961;4:303-20.
45. Askenfelt A, Gauffin J, Sundberg J, Kitzing P. A comparison of contact microphone and electroglottograph for the measurement of vocal fundamental frequency. J Speech Hear Res 1980;23:258-73.
46. Milenkovic P. Computer speech waveform acquisition and editing analysis. Madison, Wisconsin: University of Wisconsin-Madison, Dept. of Electrical Engineering, 1989.
47. SAS Institute. SAS user's guide: statistics, version 5 edition. Cary, North Carolina: SAS Institute, 1985.
48. Hanson DG, Gerratt BR, Berke GS. Frequency, intensity and target-matching effects on photoglottographic measures of open quotient and speed quotient. J Speech Hear Res 1990;33:45-50.
49. Titze I, Horii Y, Scherer R. Some technical considerations in voice perturbation measurements. J Speech Hear Res 1987;30:252-60.
50. Orlikoff RF, Kahane JC. Influence of mean sound pressure level in jitter and shimmer measures. J Voice 1991;5:113-9.
51. Winer BJ. Statistical principles in experimental design. New York: McGraw-Hill, 1971.
52. Doherty ET, Shipp T. Tape recorder effects on jitter and shimmer extraction. J Speech Hear Res 1988;31:485-90.
53. Orlikoff RF. Vocal jitter at different fundamental frequencies: a cardiovascular-neuromuscular explanation. J Voice 1989;3:104-12.
54. Sapienza CM, Stathopoulos ET. Comparison of maximum flow declination rate: children versus adults. J Voice (in press).
55. Gay T, Boe L-J, Perrier P. Acoustic and perceptual effects of changes in vocal tract constrictions for vowels. J Acoust Soc Am 1992;92:1301-9.