Journal of Phonetics (1977) 5, 81-92
Female and infant vocal tracts simulated from male area functions
Per-Erik Nordstrom
Department of Phonetics, University of Stockholm, Sweden
Received 1st July 1976
Abstract:
Male area functions have been subjected to perturbations so as to simulate the vocal tracts of women and children. Known anatomical differences have been incorporated, e.g. the relatively smaller pharyngeal cavities of women/children compared with men, differences which have been assumed to be responsible for universal differences in formant frequencies (Fant, 1966). Neither length reductions only nor volume reductions in male area functions reveal formant patterns in complete agreement with observations on different speaker categories. Earlier explanations of such differences must therefore be reconsidered. To account for the cross-lingual F-pattern variations associated with different speaker categories it will be necessary to consider their perceptual function as well.
Introduction
The set of experiments to be reported here is an extended replication of earlier experiments (Nordstrom & Lindblom, 1975; Nordstrom, 1975). It was found desirable to repeat the previous simulations with a more exact computer program (Liljencrants & Fant, 1975). The program used before (Pauli, 1974, based on Heinz, 1962) had some shortcomings for our purposes, several of which the new program eliminates. Both programs calculate the formant frequencies pertaining to a certain vocal tract from an input of cross-sectional areas and, in the case of the new program, section lengths. The new program works on the principle of a multiple-tube representation of the vocal tract, more refined than before, and permits the length of each tube to be varied at will. Consequently, the length can also be different in different parts of the vocal tract, a most useful feature in the kind of perturbations we have performed. This was not the case before, which led to some unwanted approximations and subsequent errors in the calculations. Under equal conditions (step size 0·5 cm) the two programs differ by 2·5% or less in three formants for the set of Russian vowels (Fant, 1960). In the following presentation, the first set of data is not utilized, but the underlying reasoning is presented and applied to the new experiments.

Problems
A difficult problem for anyone wanting to compare acoustic vowel data is how to compensate for the appreciable variability between (groups of) speakers. In a study of Dutch with 50 male speakers (Pols, Tromp & Plomp, 1973) the total range of variation (±3 S.D.) was 26-41% in F1, 23-33% in F2, and 19-31% in F3. In other words, one does not need to go beyond comparisons within the same speaker category (male speakers of the same dialect) to find immense variability.
It has been suggested that such variations in formant frequencies are due to anatomical differences in the vocal tracts of the speakers involved (Fant, 1959, 1962, 1966, 1968, and others). In an article from 1966, Fant concentrates on the differences between speaker categories. He makes calculations based on acoustic theory and finds that the universal trends in formant differences between men and women "... conform with anatomical constraints of the average female vocal tract". Our long-term goal being cross-lingual comparisons of acoustic vowel data, we decided to derive a normalization procedure based on Fant's findings. If all average anatomical differences between male, female and infant speakers are considered in a simulation of different vowel articulations, what are the resulting formant frequency correlates of such vocal tract variations? Do they explain the data as Fant stated in 1966? We shall first review some data on vocal tract dimensions, primarily from men and women; then we shall describe how these data were utilized in the experiments. The concluding discussion will compare our data with observations from the literature.

Vocal tract dimensions
Extensive reviews of vocal tract dimensions can be found in Fant (1966) and in Nordstrom (1975). Therefore only a brief presentation will be given here. An important point made by Fant (1966) is that the main difference between male and female vocal tracts is in the pharynx. The pharynx is shorter relative to the mouth in females. These observations are based on X-ray data from Chiba & Kajiyama (1941) and from a Swedish radiologist (Paul Edholm, unpublished data; see also Fant, 1975a). In Nordstrom (1975) more data are presented (collected from the literature) which support this point, as well as some longitudinal data on pharyngeal growth in both sexes (adapted from King, 1952). Another point made by Fant (1966) is that the vocal tracts of children are like those of women in their relative dimensions. The major conclusion in Fant's paper is that the observed formant differences between men, women and children are the result of these anatomical differences. He obviously does not mean the gross differences only, i.e. an overall scale factor of 18% between men and women, but also the vowel-specific deviations from the average scale factor. Thus Fant explains, e.g., the great range of variation of the F1 difference between men and women (from -1% in Swedish [9] to +30% in Swedish and American English [~]). He states that "... the main physiological determinants of the specific deviations from the average rule are that the ratio of pharynx length to mouth cavity length is greater for males than for females and that the laryngeal cavities are more developed in males". Further, that "... the scaling of children's data from female data comes closer to a simple factor independent of vowel class". With this background, we decided to derive F-pattern nomograms for normalization purposes, departing from Fant's findings on vocal tract dimensions.

Considerations in connection with the first experiment
According to published anatomical data on vocal tract size variations, the major difference between speaker categories is in terms of vocal tract (VT) length. Therefore it was decided that the first experiment should deal with length scalings only. The length dimension has also been given major theoretical importance in determining male/female differences (Fant, 1966; Wakita, 1975). Since Chiba and Kajiyama it has been customary to discuss VT dimensions in terms of
scale factors. A male VT usually serves as a reference (with a scale factor of 1·0) in terms of which other speaker categories can be described. To make certain that the ranges of observed and predicted effects would correspond, a wide range of scale factors was chosen (see Table I). As can be seen, the relative sizes of mouth and pharynx have been varied, in addition to the overall length. Three of the vocal tracts have actually been observed: no. 1 is a male speaker from Fant (1960), no. 4 is reported by Fant (1966), and no. 8 is from Chiba & Kajiyama (1941). The others have been constructed (nos 2, 6, 7) or constitute manipulations of no. 8 (nos 3, 5). Scalings with different scale factors for mouth and pharynx are what Fant calls non-uniform.

Table I Vocal tract scaling factors

Scaling no.   kmouth   kpharynx   Intended as
1             1·00     1·00       Male speaker with area functions from Fant (1960)
2             0·95     0·85       Arbitrary scaling for 90% total length
3             0·95     0·80       Uniform enlargement of no. 8 to 87% total length (corresponding to a female vocal tract; Chiba & Kajiyama, 1941)
4             0·85     0·77       Female speaker from Edholm [cited by Fant (1966)]
5             0·88     0·73       Uniform enlargement of no. 8 to 80% total length (corresponding to a boy of nine; Chiba & Kajiyama, 1941)
6             0·90     0·70       Arbitrary scaling for 80% total length
7             0·79     0·61       Uniform scaling of no. 6 to 70% total length
8             0·77     0·64       Girl of eight from Chiba & Kajiyama (1941)
9             0·70     0·70       Uniform scaling of no. 1 to the same total length as no. 8
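The relation between the two scale factors of each scaling and its quoted total length can be checked with a few lines of arithmetic. The following is a minimal sketch, not part of the original computations: the function names are hypothetical, and it assumes that mouth and pharynx each make up half of the male VT length, which holds only approximately for the actual area functions.

```python
# Scale factors (kmouth, kpharynx) copied from Table I, keyed by scaling no.
TABLE_I = {
    1: (1.00, 1.00), 2: (0.95, 0.85), 3: (0.95, 0.80),
    4: (0.85, 0.77), 5: (0.88, 0.73), 6: (0.90, 0.70),
    7: (0.79, 0.61), 8: (0.77, 0.64), 9: (0.70, 0.70),
}


def total_length_factor(k_mouth, k_pharynx, mouth_share=0.5):
    """Relative total VT length after scaling.

    mouth_share is the fraction of the male VT length lying in front of
    the division point; the value 0.5 is an assumption of this sketch.
    """
    return mouth_share * k_mouth + (1.0 - mouth_share) * k_pharynx


def mouth_pharynx_ratio(k_mouth, k_pharynx):
    """Mouth scale factor relative to pharynx scale factor."""
    return k_mouth / k_pharynx


for no, (k_mouth, k_pharynx) in sorted(TABLE_I.items()):
    length_pct = 100 * total_length_factor(k_mouth, k_pharynx)
    ratio = mouth_pharynx_ratio(k_mouth, k_pharynx)
    print(f"no. {no}: total length {length_pct:.1f}%, mouth/pharynx ratio {ratio:.2f}")
```

Under the half-and-half assumption the computed total lengths fall close to the percentages quoted in Table I, and scalings 3, 5 and 8 share a mouth/pharynx ratio of roughly 1·2, while nos 6 and 7 share one of roughly 1·3.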
In addition to the non-uniform scalings, no. 9 was included to represent a uniform scaling of the male VT. The total length of no. 9 is the same as no. 8, and differences between them were expected to show the importance of non-uniform VT scaling. Some of the non-uniform scalings are uniform in relation to each other, i.e. have the same relative mouth and pharynx sizes. This holds for nos 3, 5 and 8 and for nos 6 and 7 (see Table I). It was decided that the area functions of the Russian vowels from Fant (1960) would be subjected to the scalings indicated in Table I. These area functions are generally considered to be dependable, and the Russian vowel system contains the unique and interesting vowel [ɨ]. There is, however, a gap in the front vowel region between [e] and [a], and since this is where the male-female differences tend to be most pronounced, two additional vowels were created by simple interpolation between the area functions of [e] and [a]. The division point between mouth and pharynx was placed at the small dip in the area functions in the velar region. It can be seen fairly clearly for all the vowels. The placement is arbitrary, but it was considered more desirable than a constant pharynx cavity length,
which would tend to give too much prominence to this cavity. The height variations of the larynx would then be minimized.

Results of the first experiment
In spite of the range of variation in scale factors (cf. Table I), the effect of non-uniform scaling was very small. Figures 1 and 2 show formant diagrams with F2/F1 and F3/F1 respectively. The scalings have been connected by straight lines in the order of Table I (nos 1-8) with the exception of the uniform scaling (no. 9), which has been indicated by filled circles.
Figure 1. F2/F1-diagram of the length scaling experiment (axes in kHz). Uniform VT scaling is represented by the filled circles and by the dashed lines.

Figure 2. F3/F1-diagram of the length scaling experiment (axes in kHz). Uniform VT scaling is represented by the filled circles.
Figure 3. Uniform vs non-uniform scaling: first formant (F1 in kHz; uniform VT scaling on the horizontal axis, non-uniform VT scaling on the vertical axis).

Figure 4. Uniform vs non-uniform scaling: second formant (F2 in kHz; uniform VT scaling on the horizontal axis, non-uniform VT scaling on the vertical axis).

Figure 5. Uniform vs non-uniform scaling: third formant (F3 in kHz; uniform VT scaling on the horizontal axis, non-uniform VT scaling on the vertical axis).
Table II Formant frequencies of uniform (no. 9) and non-uniform (no. 8) length and volume scalings of male area functions (no. 1). No correction has been incorporated for losses due to the wall impedance of the vocal tract

Vowel   Scaling no.   F1 (Hz)   F2 (Hz)   F3 (Hz)   VT length (cm)
[a]     1             640       1082      2464      17·00
        8 length      895       1512      3461      11·90
        8 volume      970       1498      3630      11·90
        9             892       1501      3480      11·90
[o]     1             504       866       2388      18·50
        8 length      700       1197      3224      13·10
        8 volume      759       1191      3275      13·10
        9             700       1201      3393      12·95
        (9 adj*)      692       1187      3354      13·10
[u]     1             237       600       2383      19·50
        8 length      323       854       3343      13·80
        8 volume      352       847       3381      13·80
        9             333       853       3403      13·65
        (9 adj*)      329       844       3366      13·80
[ɨ]     1             289       1518      2412      18·50
        8 length      412       1927      3597      13·05
        8 volume      482       1975      3507      13·05
        9             410       2069      3437      12·95
        (9 adj*)      407       2053      3411      13·05
[i]     1             227       2275      3070      16·50
        8 length      318       3338      4007      11·70
        8 volume      369       3202      4109†     11·70
        9             322       3241      4347†     11·55
        (9 adj*)      318       3199      4291      11·70
[e]     1             419       1967      2790      16·50
        8 length      582       2761      3905      11·60
        8 volume      667       2702      4010      11·60
        9             589       2774      3889      11·55
        (9 adj*)      586       2762      3872      11·60
[ɛ]     1             515       1743      2636      16·50
        8 length      710       2420      3757      11·60
        8 volume      806       2395      3873      11·60
        9             719       2443      3703      11·55
        (9 adj*)      716       2432      3687      11·60
[æ]     1             588       1477      2458      17·00
        8 length      807       2070      3532      11·90
        8 volume      903       2051      3635      11·90
        9             817       2066      3470      11·90

* To compensate for the slight (less than 1·5 mm) variations in total lengths between scaling nos 8 and 9, the formants of no. 9 have in some cases been adjusted uniformly to the same total length as no. 8 by division with the term (VT length of no. 8)/(VT length of no. 9). The adjustments were made for the purpose of the comparisons in Figs 3, 4 and 5. The length variations are due to the fact that the division point between mouth and pharynx does not always occur exactly half-way between the glottis and the lips.
† In the form used for the present experiment, the computer program only calculates formant frequencies up to around 4000 Hz. Two third formant values for [i] (scaling no. 8) have therefore been calculated. The values have been arrived at by multiplying the third formant frequencies of scalings with the same mouth/pharynx balance with the average of the scale factors found in F1 and F2. The F3 of no. 9 is therefore interpolated from no. 1 and of no. 8 from no. 5, both of which are low enough to appear.
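The length adjustment described in the first footnote can be reproduced directly from the tabulated values. The sketch below uses the [o] row of Table II; the function name is hypothetical and the rounding to whole Hz is an assumption about how the published values were obtained.

```python
def adjust_to_length(formant_hz, length_no8_cm, length_no9_cm):
    """Divide a formant of scaling no. 9 by (VT length of no. 8)/(VT length of no. 9)."""
    return formant_hz / (length_no8_cm / length_no9_cm)


# Vowel [o]: F1, F2, F3 of scaling no. 9 and the two VT lengths from Table II.
formants_no9 = (700, 1201, 3393)
length_no8_cm = 13.10
length_no9_cm = 12.95

adjusted = [round(adjust_to_length(f, length_no8_cm, length_no9_cm))
            for f in formants_no9]
print(adjusted)  # [692, 1187, 3354], the "(9 adj*)" row for [o]
```

Because no. 9 is slightly shorter than no. 8 for this vowel, the adjustment lowers all three formants by a little over 1%.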
It is remarkable how little these connected points deviate from lines from the origin and out through the male vowels (dashed in Fig. 1). Such lines represent uniformly scaled VTs according to the theory and they consequently continue out through scaling no. 9. A more illustrative way of describing this phenomenon is used in Figs 3, 4 and 5. Here, the two scalings with the same total lengths are plotted against each other for each of the first three formants. Marked deviations from the line of identity can only be seen in Fig. 5 (F3) and for the vowels [ɨ] and [i] in Fig. 4 (F2). The formant values of scalings 1, 8 and 9 plus some explanations are given in Table II.

Considerations in connection with the second experiment
Since our data from the length scalings did not conform with the theory it was felt that another aspect could be added. The hypothesis had been that the length factors could account for the female/male formant differences as stated by Fant (1966) (e.g. see the quotations in the section on VT dimensions). Granted the validity of the formant frequency observations [stronger support is given in Fant (1975a)], our hypothesis was not substantiated. In experiment I the scaling was consistently restricted to the length dimension, but it would be more realistic to scale the cross-sectional areas as well.

Figure 6. Models of VT perturbation: (a) male VT, with the division between mouth and pharynx marked; (b) length scaling; (c) volume scaling. In the step from (a) to (b), only the lengths of the two cavities have been reduced. In (c) also the cross-sectional areas have been reduced, and then by the square of the length scaling factor.
Our first experiment thus utilized the type of scaling illustrated by going from (a) to (b) in Fig. 6. In the second experiment, which might be called volume scaling, the same scale factors were used as before (Table I), but when used on cross-sectional areas, the scale factors had to be squared. The total effect of volume scaling is consequently that the cavity volumes of the male VT have been multiplied by the cube of the scaling factor (cf. (c) in Fig. 6). It can be argued that this method of volume scaling introduces an unwanted step in the area functions in the velar region. However, this is not considered significant under the present circumstances. Most area functions are composed of successive tubes of varying diameters, and our approach only accentuates one of the steps in each area function. Attempts to overcome the steps were abandoned because they meant that assumptions had to be made about where to start and where to finish a continuous connection between the two cavities. No ground was found for such assumptions, and because of the quantization necessary for the implementations, the area functions would still be step-wise.
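The two kinds of perturbation can be illustrated with a short sketch. It is not the program used for the experiments: the function names are hypothetical, and the four-tube area function is invented for illustration rather than taken from Fant (1960). Each tube is a (length, area) pair ordered from glottis to lips, and the division index marks the first tube of the mouth cavity.

```python
def scale_area_function(sections, division, k_mouth, k_pharynx, scale_areas):
    """Return a scaled copy of an area function.

    sections: list of (length_cm, area_cm2) tubes, glottis to lips.
    division: index of the first tube belonging to the mouth cavity.
    scale_areas: False gives length scaling, True gives volume scaling
    (areas multiplied by the square of the length scale factor).
    """
    scaled = []
    for index, (length, area) in enumerate(sections):
        k = k_pharynx if index < division else k_mouth
        new_area = area * k ** 2 if scale_areas else area
        scaled.append((length * k, new_area))
    return scaled


def cavity_volume(sections):
    """Sum of tube volumes (length times area)."""
    return sum(length * area for length, area in sections)


# Hypothetical four-tube area function, NOT data from Fant (1960).
male = [(4.0, 2.0), (4.5, 1.0), (4.5, 3.0), (4.0, 5.0)]
division = 2
k_mouth, k_pharynx = 0.77, 0.64  # scaling no. 8 in Table I

length_scaled = scale_area_function(male, division, k_mouth, k_pharynx, False)
volume_scaled = scale_area_function(male, division, k_mouth, k_pharynx, True)

pharynx_ratio = cavity_volume(volume_scaled[:division]) / cavity_volume(male[:division])
mouth_ratio = cavity_volume(volume_scaled[division:]) / cavity_volume(male[division:])
print(round(pharynx_ratio, 3), round(k_pharynx ** 3, 3))
print(round(mouth_ratio, 3), round(k_mouth ** 3, 3))
```

The printed pairs agree, showing that each cavity volume is multiplied by the cube of its scale factor. Since the two cavities receive different factors, the areas on either side of the division point change by different amounts, which is the step in the velar region discussed above.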
Results of the second experiment
From Figs 7 and 8 it is evident that volume scaling does produce a different formant pattern. The most pronounced differences compared with length scaling are to be found for the central vowel [ɨ] and for the front vowels. The back vowels are less sensitive ([a] and [o]) or hardly sensitive at all ([u]). The first formant shows the greatest change compared with length scaling. The values are always higher, but the increase is also directly related to the balance between mouth and pharynx. With increasing balance (smaller pharynx relative to the mouth) the F1 increase gets greater, too. This can be seen for the front vowels in Fig. 7. Scalings 4, 5 and 6 (cf. Table I) have about equal total lengths and are close together when only length scalings are performed (Fig. 1). When volumes are reduced (Fig. 7) they remain close together in the second formant (with the exception of [i]), whereas there is a marked spread in F1. The mouth/pharynx balance is thus responsible for part of the shift in the first formant. The average increase is 6·5% for scaling no. 4 (lowest balance), and 15·3% for no. 6 (highest balance).

Figure 7. F2/F1-diagram of the volume scaling experiment (axes in kHz).

Figure 8. F3/F1-diagram of the volume scaling experiment (axes in kHz).

Figures 9, 10 and 11 compare length and volume scalings of no. 8. The average increase in F1 is 11·8% (Fig. 9), as indicated by the dashed line. The second formant shifts are much smaller. Only the vowels [i] and [ɨ] display a clear difference between length and volume scaling, and then only by -4·1% and +2·5% respectively (Fig. 10). The third formant is generally displaced in a positive direction, the exception being the vowel [ɨ], which shows a negative shift of 2·5%.
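The percentage shifts quoted for the two i-like vowels can be recomputed from Table II. The sketch below takes the "8 length" and "8 volume" rows of the central vowel [ɨ] and the front vowel [i] (the fourth and fifth vowels of the table); the function and variable names are hypothetical, and the F3 of [i] under volume scaling is the estimated value marked with a dagger in the table.

```python
# (F1, F2, F3) in Hz for scaling no. 8, copied from Table II.
LENGTH_SCALED = {"ɨ": (412, 1927, 3597), "i": (318, 3338, 4007)}
VOLUME_SCALED = {"ɨ": (482, 1975, 3507), "i": (369, 3202, 4109)}


def percent_shift(volume_hz, length_hz):
    """Shift of the volume-scaled formant relative to the length-scaled one, in %."""
    return 100.0 * (volume_hz - length_hz) / length_hz


shifts = {
    vowel: tuple(round(percent_shift(v, l), 1)
                 for v, l in zip(VOLUME_SCALED[vowel], LENGTH_SCALED[vowel]))
    for vowel in LENGTH_SCALED
}
for vowel, (d1, d2, d3) in shifts.items():
    print(f"[{vowel}]: F1 {d1:+.1f}%, F2 {d2:+.1f}%, F3 {d3:+.1f}%")
```

The F2 shifts come out as +2·5% for [ɨ] and -4·1% for [i], and the F3 shift of [ɨ] as -2·5%, in agreement with the figures given above, while the F1 shifts of the two vowels are of similar size.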
Figure 9. Length scaling vs volume scaling: first formant (F1 in kHz; length scaling on the horizontal axis, volume scaling on the vertical axis). The dashed line indicates the average difference.

Figure 10. Length scaling vs volume scaling: second formant.

Figure 11. Length scaling vs volume scaling: third formant (F3 in kHz; length scaling on the horizontal axis, volume scaling on the vertical axis).
The i-like vowels are interesting from an acoustic point of view, since [i] and [ɨ] have opposite formant/cavity affiliations as regards F2 and F3 (Fant, 1960, and others; Fant & Pauli, 1975). In [i], F2 is mainly a back cavity and F3 mainly a front cavity resonance. In [ɨ], conditions are reversed, so that F3 is primarily affiliated to the back cavity and F2 to the front one. The first formant shifts of these two vowels are roughly the same, but in the formant diagrams of Figs 7 and 8, it can be seen that the F2/F1-pattern of [ɨ] resembles the F3/F1-pattern of [i], and vice versa.

Discussion
Figure 12 compares the results of our simulations with observed formant differences between men and women from six languages (Fant, 1975a). In this article Fant gives a procedure for normalizing formant data, and we have obtained F1, F2 and F3 differences in per cent for "real" Russian vowels by interpolating in Fant's diagram in the manner he suggests. The interpolated values should represent typical differences between Russian men and women with VT sizes as our scalings nos 1 and 3 respectively. Also shown are the differences resulting from length and volume scalings of no. 3 compared with our male reference (no. 1). It is clear that the interpolated patterns are only in part reflected by the simulations. Although the average differences are similar in the "real" data and the length scalings (F1: 13·3%, 13·4%; F2: 13·9%, 13·5%, respectively), except perhaps in F3 (17·6%, 14·0%), individual vowels are widely different. Volume scaling contrives to raise k1 (and k3) and to lower k2, but the similarity of k1 and k2 is not improved. There is no sign of the very high k1 of open front vowels, nor of the low k1 of close vowels. In k2 there is some resemblance for the vowels [i], [e], [ɛ] and [æ], both in length and volume scalings, but not for the other vowels.

In k3 the similarity is much greater, and perhaps somewhat better in length than in volume scaling. It has been pointed out by Fant (1975a, and personal communication) that in most of the cases with good correspondence between simulations and observations the particular resonance is of a standing-wave type. These resonances are mostly dependent on length relationships and less on volumes. The similarity in these instances can be taken as evidence for the reliability of our simulations, and the other results therefore have implications for the acoustic theory.

Figure 12. Female-to-male differences (%) for three formants. The interpolated curves are based on Fant (1975a), the other two on scaling no. 3 compared with no. 1. Interpolated: filled circles, solid line; length scaling: open circles, dashed line; volume scaling: open circles, dotted line.

The most significant result of our simulations is that we have ascertained that, as far as is known, anatomical differences between men and women/children explain only part of the formant differences. Since our preliminary experiments were reported (Nordstrom & Lindblom, 1975; Nordstrom, 1975), Fant and his research group have also studied the effects of non-uniform length perturbations (Fant, 1975b). It is significant that the theoretical bases for such perturbations needed to be studied more closely. Fant shows that a more detailed examination of the energy density along the vocal tract (as presented in Fant & Pauli, 1975) can help in explaining the effects of VT perturbations of the kinds discussed here.

A couple of reservations must be made, however, although their significance is probably rather small. The formant values obtained in the simulations are not corrected for the energy losses due to the impedance of the walls of the vocal tract. The contribution of this factor is presently being studied by Fant and his colleagues. Indications are that the wall impedance correction will not significantly improve the similarity between our simulations and the observations (Fant, Pauli; personal communication). The other factor is a small systematic error in all the length scalings because the lip inductance has not been scaled together with the lengths. It is a factor to remember although it does not significantly influence our results (Fant, 1975b).

In other words we find it probable that the vocal tract form varies between men and women/children. Apart from establishing this experimentally, it is fundamental for our understanding of speech communication to find out the underlying reasons. An open-minded approach is essential, and it may be necessary to discard the formant-based
explanation. The all-important issue is how we perceive and how the auditory system performs the normalization between speakers. Even if we can describe the formant differences between men and women fairly well (Fant, 1975a), we are far from knowing the perceptual processing which presumably necessitates these differences. The interplay between spectral envelope, the distribution of partials and the processing in our ears (and brains) is what must be mapped.

Without the initiative, support and restraint of Bjorn Lindblom, these experiments and this report would not have come about. We are grateful for the fruitful discussions about the results which we have had with Gunnar Fant. We are grateful that the computer facilities at the Department of Speech Communication (Royal Institute of Technology, Stockholm) were put at our disposal. The research was supported in part by the National Institutes of Health under a research grant.

References
Chiba, T. & Kajiyama, M. (1941). The Vowel: Its Nature and Structure. Tokyo: Tokyo-Kaiseikan Publ. Co.
Fant, G. (1959). Acoustic analysis and synthesis of speech with applications to Swedish. Ericsson Technics, No. 1.
Fant, G. (1960). Acoustic Theory of Speech Production. 's-Gravenhage: Mouton & Co. (2nd edition, 1970).
Fant, G. (1962). Descriptive analysis of the acoustic aspects of speech. Logos 5, 3-17.
Fant, G. (1966). A note on vocal tract size factors and non-uniform F-pattern scalings. STL-QPSR 4/1966, 22-30.
Fant, G. (1968). Analysis and synthesis of speech processes. In Manual of Phonetics (Malmberg, B., Ed.). Pp. 173-277. Amsterdam: North-Holland Publ. Co.
Fant, G. (1975a). Non-uniform vowel normalization. STL-QPSR 2-3/1975, 1-19.
Fant, G. (1975b). Vocal tract area and length perturbations. STL-QPSR 4/1975, 1-14.
Fant, G. & Pauli, S. (1975). Spatial characteristics of vocal tract resonance modes. In Speech Communication, Vol. 2 (Fant, G., Ed.). Pp. 121-132. Stockholm: Almqvist & Wiksell International.
Heinz, J. M. (1962). Reductions of Speech Spectra to Descriptions in Terms of Vocal-Tract Area Functions. Dissertation, Mass. Institute of Technology.
King, E. W. (1952). A Roentgenographic study of pharyngeal growth. Angle Orthodontist 22, 23-37.
Liljencrants, J. & Fant, G. (1975). Computer program for VT-resonance frequency calculations. STL-QPSR 4/1975, 15-20.
Nordstrom, P-E. (1975). Attempts to simulate female and infant vocal tracts from male area functions. STL-QPSR 2-3/1975, 20-33.
Nordstrom, P-E. & Lindblom, B. (1975). A normalization procedure for vowel formant data. Paper 212, VIII International Congress of Phonetic Sciences in Leeds 1975 (To be published in Proceedings . . .).
Pauli, S. (1974). Computer program for calculating formants from the vocal tract area function. APRILUS (Annual Progress Report of the Institute of Linguistics, University of Stockholm), 1, Speech Physiology, 22-23.
Pols, L. C. W., Tromp, H. R. C. & Plomp, R. (1973). Frequency analysis of Dutch vowels from 50 male speakers. Journal of the Acoustical Society of America, 53, 1093-1101.
Wakita, H. (1975). An approach to vowel normalization. Paper given at the 89th meeting of the Acoustical Society of America.