Applied Acoustics 66 (2005) 481–499 www.elsevier.com/locate/apacoust
An improved model for the acoustic radiation impedance of the mouth based on an equivalent electrical network
Milan Vojnović, Miomir Mijić *
Faculty of Electrical Engineering, Bulevar kralja Aleksandra 73, 11000 Belgrade, Serbia and Montenegro Received 22 July 2004; received in revised form 26 September 2004; accepted 27 September 2004
Abstract
This paper proposes a model for the acoustic radiation impedance of the mouth in the form of an equivalent electrical network. Five known models of radiation impedance are compared: radiation of a circular piston set in a spherical baffle, radiation of a circular piston set in an infinite baffle, the Flanagan model, the Wakita and Fant model, and the Stevens, Kasowski and Fant model. The proposed model most accurately approximates the radiation impedance of a circular piston set in a spherical baffle. Differences between the acoustic resistance and reactance calculated by the proposed model and by the piston set in a spherical baffle of 9 cm radius are relatively small in the region kr < 2. The deviations in the calculated values of the acoustic resistance and reactance are within ±0.023·ρc/Am and ±0.008·ρc/Am, respectively, where Am denotes the area of the mouth aperture. The accuracy of the proposed model is demonstrated by vowel formant frequency calculations. Differences between formant frequencies calculated with the proposed model and with the piston set in a spherical baffle are less than 0.3%. © 2004 Elsevier Ltd. All rights reserved.
Keywords: Mouth radiation impedance; Formant; Vocal tract
* Corresponding author. Tel.: +381 11 3248 681; fax: +381 263 928. E-mail address: [email protected] (M. Mijić).
0003-682X/$ - see front matter 2004 Elsevier Ltd. All rights reserved. doi:10.1016/j.apacoust.2004.09.002
M. Vojnovic´, M. Mijic´ / Applied Acoustics 66 (2005) 481–499
1. Introduction

The simplest acoustical model of the vocal tract is a tube with a non-uniform cross-section perpendicular to the tube centerline. The transverse dimensions of the vocal tract are smaller than the wavelengths of sound in the speech frequency range (typically below 5000 Hz). Consequently, sound propagates through the vocal tract as a plane wave, and this propagation can be described mathematically by the one-dimensional wave equation. Analysis of the wave propagation is not simple, since the tube has a non-uniform cross-section. This problem is commonly solved by segmenting the vocal tract: its shape is approximated by a number of short uniform circular tubes of equal length and various diameters. The analysis is thereby reduced to propagation through a uniform cylindrical tube. For Russian vowels, Fant [1] also discretised the tube cross-sectional area, which could take one of 16 predefined values in the range 0.16–16 cm², with the permitted values following a logarithmic distribution.

An analogy can be made between sound propagation through a uniform cylindrical tube and wave motion along a uniform electrical transmission line. The currents and voltages on a uniform transmission line satisfy the same differential equations as the volume velocity and the pressure in the uniform cylindrical tube. The analogy enables a short uniform cylindrical tube to be modelled by an equivalent electrical network. One such network, a symmetrical four-pole T-network, is presented in Fig. 1. The parameters in the expressions for the per-unit-length inductance (L′), capacitance (C′), resistance (R′) and conductance (G′) are: c the sound velocity, ρ the air density, μ the viscosity coefficient, λ the coefficient of heat conduction, cp the specific heat of air at constant pressure, and η the adiabatic constant. As shown in Fig. 1, the electrical model of the uniform cylindrical tube is entirely determined once its dimensions, i.e. the length and the cross-sectional area, are known.

The equivalent electrical model of the vocal tract is created by a serial connection of symmetrical T-networks, each representing a tube segment of length Δl. The transfer characteristic can be calculated on the basis of this electrical model, and the resonant frequencies of the model correspond to the formant frequencies. Given the geometrical shape of the vocal tract, it is therefore possible, by means of analogies and electrical circuit theory, to determine the formant frequencies of the vowel being pronounced. More accurate models of the vocal tract should involve some other relevant parameters [2]:
radiation impedance of the mouth aperture, viscosity and heat conduction losses in the vocal tract, vocal tract wall impedance, glottis impedance, impedance of the subglottal system, nasal cavity, etc.
Fig. 1. Equivalent electrical network of a uniform cylindrical tube of length Δl and cross-sectional area A.
Of these parameters, the one that affects the model accuracy most is the radiation impedance of the mouth aperture. Omitting the radiation impedance from the equivalent electrical model (i.e., setting it to zero) causes errors exceeding 10% in the calculated formant frequencies. Because the radiation impedance of the mouth aperture influences the formant frequencies so strongly, a precise and mathematically simple model of it is needed. The radiation impedance of the mouth aperture is important both for synthesis and for analysis of speech. An accurate radiation impedance model is especially important when analysing the influence of different parameters on speech, i.e., on the formant frequencies. For instance, what are the effects of tongue movements on formant frequencies? In such cases, estimating the effect of the examined parameter on the formant frequencies requires precise modeling of the rest of the vocal tract (glottis impedance, wall impedance, radiation impedance, etc.).
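The size of the effect described above can be illustrated on a much simpler case than the vowel configurations used in this paper. The sketch below is only an illustration under stated assumptions: a lossless uniform tube closed at the glottis, terminated either by zero impedance or by the low-frequency reactance of a piston in an infinite baffle (8kr/(3π) in normalized form). The tube length of 17.5 cm, the area of 8 cm² and the function name `first_resonance` are hypothetical choices, not values from the paper.

```python
import math

C_SOUND = 35300.0  # sound velocity in cm/s (the value used later in the paper)

def first_resonance(length_cm, radius_cm, with_radiation=True):
    """First resonance (Hz) of a lossless uniform tube closed at the glottis.

    Assumption: the open end is loaded only by the normalized reactance
    X/Z0 = 8*k*r/(3*pi); radiation resistance and all losses are ignored.
    Resonances then satisfy cos(k*l) - (X/Z0)*sin(k*l) = 0.
    """
    def g(freq_hz):
        k = 2.0 * math.pi * freq_hz / C_SOUND
        x_norm = 8.0 * k * radius_cm / (3.0 * math.pi) if with_radiation else 0.0
        return math.cos(k * length_cm) - x_norm * math.sin(k * length_cm)

    # g is positive near 0 Hz and negative just above the quarter-wave
    # frequency c/(4*l), so the first root is bracketed; refine by bisection.
    lo, hi = 1.0, 1.01 * C_SOUND / (4.0 * length_cm)
    for _ in range(100):
        mid = 0.5 * (lo + hi)
        if g(mid) > 0.0:
            lo = mid
        else:
            hi = mid
    return 0.5 * (lo + hi)

length = 17.5                       # hypothetical tube length, cm
radius = math.sqrt(8.0 / math.pi)   # radius of a hypothetical 8 cm^2 tube, cm
f_null = first_resonance(length, radius, with_radiation=False)
f_rad = first_resonance(length, radius)
print(f"zero load: {f_null:.1f} Hz, with radiation reactance: {f_rad:.1f} Hz")
```

Even for this idealised tube the first resonance drops by several percent once the radiation reactance is included, which is the kind of shift the paragraph above refers to.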
2. Methods for modeling the acoustic radiation impedance of the mouth

The acoustic radiation impedance of the mouth aperture is defined as the ratio of pressure to volume velocity:

\[ Z = \frac{p}{q}. \tag{1a} \]

In practice, the normalized acoustic radiation impedance of the mouth is often used instead:

\[ Z_N = Z\,\frac{A_m}{\rho c}, \tag{1b} \]
where Am is the area of the mouth aperture, ρ is the air density, and c is the sound velocity.

Sound propagates as a plane wave through the vocal tract, including its end: the mouth aperture and/or the nostrils. The air between the lips oscillates in the same way as a piston: the entire surface oscillates in phase. This is why radiation from a rigid circular piston can be taken as a basic model of the radiation of the mouth aperture. A model of the radiation impedance of the mouth aperture needs to be relatively simple to calculate. In addition, it should give a physical interpretation of the modelled process. In simplified electrical models of the vocal tract, the radiation impedance of the mouth is represented by a serial connection of acoustic resistance and reactance. The radiation resistance represents the energy radiated away from the mouth, while the reactance represents the effective mass of the vibrating air between the lips. Formant frequencies are more affected by the radiation reactance than by the radiation resistance [2]. This is why only the imaginary part of the radiation impedance is commonly used in formant analysis.

The best way to simulate a speaker's mouth is to use a circular piston set in a spherical baffle (Fig. 2). The sphere models the speaker's head, while the piston vibration represents the vibration of the air between the lips. This approach corresponds well to the physical phenomenon being modelled. The average radius of an adult man's head is around 9 cm. The piston is circular, with an area equal to the area of the mouth aperture, Am = π(rS sin φ)². This is the most acceptable analytical model. Hence, it is taken as the basic or reference model, and the resulting radiation impedance is assumed to be the most accurate.
Fig. 2. Circular piston of radius r set in a spherical baffle of radius rS.
Mathematical relations for this model were given by Morse [3] in the form

\[ Z_{PIS} = \frac{\rho c}{\pi (r_S \sin\phi)^2}\,(\theta + j\chi), \tag{2} \]

\[ \theta = \frac{1}{4}\sum_{m=0}^{\infty} \frac{\left[P_{m-1}(\cos\phi) - P_{m+1}(\cos\phi)\right]^2}{(k r_S)^2\,(2m+1)\,B_m^2\,\sin^2(\phi/2)}, \tag{3} \]

\[ \chi = \frac{1}{4}\sum_{m=0}^{\infty} \left\{ \frac{\left[P_{m-1}(\cos\phi) - P_{m+1}(\cos\phi)\right]^2}{(2m+1)\,B_m\,\sin^2(\phi/2)} \left[ j_m(k r_S)\sin(\delta_m) - y_m(k r_S)\cos(\delta_m) \right] \right\}, \tag{4} \]

\[ B_m = \frac{1}{2m+1}\sqrt{\left[m\,y_{m-1}(k r_S) - (m+1)\,y_{m+1}(k r_S)\right]^2 + \left[(m+1)\,j_{m+1}(k r_S) - m\,j_{m-1}(k r_S)\right]^2}, \tag{5} \]

\[ \delta_m = \arctan\frac{(m+1)\,j_{m+1}(k r_S) - m\,j_{m-1}(k r_S)}{m\,y_{m-1}(k r_S) - (m+1)\,y_{m+1}(k r_S)}, \tag{6} \]
where Pn(x) is the Legendre function of order n, jn(x) the spherical Bessel function of the first kind of order n, yn(x) the spherical Bessel function of the second kind of order n, rS the radius of the sphere, Bm the radiation impedance magnitude for scattering by a sphere, δm the radiation impedance phase angle for scattering by a sphere, c the sound velocity, ρ the air density, k the wavelength constant (k = ω/c = 2πf/c), f the frequency, and ω the circular frequency.

This model is the most difficult to compute of all the models considered. The spherical Bessel functions of the first and the second kind and the Legendre function cannot be represented in a closed form but only in terms of infinite series. The real and the imaginary parts of the radiation impedance (Eqs. (3) and (4)) are also given in terms of infinite series. Particular problems occur in the calculation of Bm and δm because the large numbers that arise produce arithmetic overflow. The functions Bm(x) and δm(x) approach infinity for large m and small x. Besides, the summation for χ (Eq. (4)) converges slowly, so a large number of terms is needed for an accurate result, which again easily leads to errors caused by arithmetic overflow. Because of these difficulties, all other known models attempt to approximate the impedance in a mathematically simpler fashion.

The acoustic radiation impedance of a piston set in an infinite, rigid plane baffle is a simplification of the basic model (piston in a spherical baffle). It is given by the following expression [4]:

\[ Z_{PIB} = \frac{\rho c}{\pi r^2}\left[1 - \frac{J_1(2kr)}{kr} + j\,\frac{S_1(2kr)}{kr}\right], \tag{7} \]
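where r is the piston radius, J1(x) is the first-order Bessel function, and S1(x) is the first-order Struve function. The Bessel and the Struve functions are given by the following infinite series:

\[ J_1(x) = \frac{x}{2} - \frac{x^3}{2^3\,1!\,2!} + \frac{x^5}{2^5\,2!\,3!} - \cdots = \sum_{n=0}^{\infty} (-1)^n \frac{x^{2n+1}}{2^{2n+1}\,n!\,(n+1)!}, \tag{8} \]

\[ S_1(x) = \frac{2}{\pi}\left[\frac{x^2}{3} - \frac{x^4}{3^2\cdot 5} + \frac{x^6}{3^2\cdot 5^2\cdot 7} - \cdots\right] = \frac{2}{\pi}\sum_{n=0}^{\infty} (-1)^n \frac{x^{2(n+1)}}{(1\cdot 3)(3\cdot 5)(5\cdot 7)\cdots[(2n+1)(2n+3)]}. \tag{9} \]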
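The series for the Bessel and Struve functions can be summed directly to evaluate the infinite-baffle impedance of Eq. (7). The following is a minimal sketch rather than a production routine: the function names and the fixed number of terms are illustrative choices, and plain truncated power series lose accuracy for large arguments, so it is intended for the moderate kr range discussed in this paper.

```python
import math

def bessel_j1(x, terms=40):
    """First-order Bessel function summed from its power series (Eq. (8))."""
    total = 0.0
    for n in range(terms):
        total += (-1) ** n * x ** (2 * n + 1) / (
            2 ** (2 * n + 1) * math.factorial(n) * math.factorial(n + 1))
    return total

def struve_s1(x, terms=40):
    """First-order Struve function summed from its power series (Eq. (9))."""
    total = 0.0
    denom = 1.0
    for n in range(terms):
        # Running product (1*3)(3*5)...[(2n+1)(2n+3)] in the denominator.
        denom *= (2 * n + 1) * (2 * n + 3)
        total += (-1) ** n * x ** (2 * (n + 1)) / denom
    return 2.0 / math.pi * total

def z_pib_normalized(kr):
    """Z_PIB multiplied by pi*r^2/(rho*c): normalized resistance + j*reactance."""
    return complex(1.0 - bessel_j1(2.0 * kr) / kr, struve_s1(2.0 * kr) / kr)

for kr in (0.1, 0.5, 1.0, 2.0):
    z = z_pib_normalized(kr)
    print(f"kr = {kr:3.1f}: resistance = {z.real:.4f}, reactance = {z.imag:.4f}")
```

For small kr the printed values approach (kr)²/2 and 8kr/(3π), the leading terms of the two series.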
Modeling the radiation of the mouth aperture by a piston set in an infinite plane baffle is common in practice. Compared with the radiation impedance of the piston in a spherical baffle, this model requires a somewhat simpler mathematical apparatus. The model is especially applicable at higher frequencies and for small mouth apertures. In these cases, the radiation impedance of the piston is practically the same whether it is set in an infinite or in a spherical baffle.

The Flanagan model [5] is, in essence, a simplified model of a piston set in an infinite baffle. For small values of the product of the wavelength constant and the piston radius (kr), the Bessel function J1(x) can be approximated by the first two terms of its infinite series, whereas the Struve function S1(x) is approximated by the first term only. With these simplifications, the radiation impedance takes the form

\[ Z_F = \frac{\rho c}{\pi r^2}\left[\frac{(kr)^2}{2} + j\,\frac{8kr}{3\pi}\right]. \tag{10} \]

The basic advantage of this model is its simplicity: there are no infinite series. In the electrical domain, the radiation impedance can be represented by a serial connection of resistance and inductance. Therefore, this model is also suitable for modeling in the electrical domain. Although the equivalent electrical resistance is frequency dependent (proportional to f²), this does not reduce the value of the model in electrical simulations.

Starting from the Flanagan model, the radiation impedance of a piston set in an infinite baffle can be written in the form

\[ Z_{PIB} = \frac{\rho c}{\pi r^2}\left[\frac{(kr)^2}{2}\,K(kr) + j\,\frac{8kr}{3\pi}\,S(kr)\right]. \tag{11} \]

K(kr) and S(kr) are correction functions showing the deviations of the Flanagan model from the case of the piston set in an infinite baffle. In other words, the radiation impedance is represented by a serial connection of resistance and reactance, and the influence of the infinite baffle on these two components is given in the form of correction functions. If Eq. (7) is taken into account, the functions K(kr) and S(kr) can be written in the following form:

\[ K(kr) = \frac{2}{(kr)^2}\left[1 - \frac{J_1(2kr)}{kr}\right] = 1 - 2\,\frac{(kr)^2}{2!\,3!} + 2\,\frac{(kr)^4}{3!\,4!} - 2\,\frac{(kr)^6}{4!\,5!} + \cdots, \tag{12} \]

\[ S(kr) = \frac{3\pi}{8(kr)^2}\,S_1(2kr) = 1 - \frac{(2kr)^2}{3\cdot 5} + \frac{(2kr)^4}{3\cdot 5^2\cdot 7} - \frac{(2kr)^6}{3\cdot 5^2\cdot 7^2\cdot 9} + \cdots \tag{13} \]
Thus, the Flanagan model is obtained from the model of a piston set in an infinite baffle (Eq. (11)) when the correction functions K(kr) and S(kr) are approximated by only the first term of their infinite series (Eqs. (12) and (13)). The Flanagan model can be improved by taking one more term of each series, thus increasing the accuracy of the correction functions [6]. In that case, the radiation impedance of the mouth aperture is given by the expression

\[ Z_{CM} = \frac{\rho c}{\pi r^2}\left\{\frac{(kr)^2}{2}\left[1 - \frac{(kr)^2}{6}\right] + j\,\frac{8kr}{3\pi}\left[1 - \frac{(2kr)^2}{15}\right]\right\}. \tag{14} \]

Wakita and Fant [7] proposed the following expression for the radiation impedance of the mouth aperture:

\[ Z_{WF} = \frac{\rho c}{\pi r^2}\left[\frac{(kr)^2}{4}\,K_S(\omega) + j\,0.8\,kr\right]. \tag{15} \]

The correction function KS(ω) is defined by the following expression:

\[ K_S(\omega) = \begin{cases} \dfrac{0.6\,\omega}{2\pi\cdot 1600} + 1, & 0 \le \omega < 2\pi\cdot 1600, \\[2mm] 1.6, & \omega \ge 2\pi\cdot 1600. \end{cases} \tag{16} \]

The Wakita and Fant model is obtained from the model of the piston set in an infinite baffle (Eq. (11)) if the following values are taken for the correction functions:

\[ K(kr) = \frac{K_S(\omega)}{2}, \qquad S(kr) = 0.8\cdot\frac{3\pi}{8}. \tag{17} \]
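The closed-form models of Eqs. (10), (14), (15) and (16) are simple enough to evaluate in a few lines. The sketch below returns each impedance in normalized form, i.e. multiplied by πr²/(ρc); the function names are illustrative and are not taken from the cited works.

```python
import math

def z_flanagan(kr):
    """Normalized Flanagan impedance, Eq. (10)."""
    return complex(kr ** 2 / 2.0, 8.0 * kr / (3.0 * math.pi))

def z_corrected(kr):
    """Normalized impedance with one extra series term per part, Eq. (14)."""
    resistance = kr ** 2 / 2.0 * (1.0 - kr ** 2 / 6.0)
    reactance = 8.0 * kr / (3.0 * math.pi) * (1.0 - (2.0 * kr) ** 2 / 15.0)
    return complex(resistance, reactance)

def k_s(omega):
    """Wakita and Fant correction function, Eq. (16); omega in rad/s."""
    omega_0 = 2.0 * math.pi * 1600.0
    return 0.6 * omega / omega_0 + 1.0 if omega < omega_0 else 1.6

def z_wakita_fant(kr, omega):
    """Normalized Wakita and Fant impedance, Eq. (15)."""
    return complex(kr ** 2 / 4.0 * k_s(omega), 0.8 * kr)

# Example: compare the three models at kr = 0.5 and f = 1000 Hz.
kr_example = 0.5
omega_example = 2.0 * math.pi * 1000.0
for name, z in (("F", z_flanagan(kr_example)),
                ("CM", z_corrected(kr_example)),
                ("WF", z_wakita_fant(kr_example, omega_example))):
    print(f"{name}: resistance = {z.real:.4f}, reactance = {z.imag:.4f}")
```

Note that the Wakita and Fant resistance depends on frequency through KS(ω) as well as on kr, so it needs both arguments.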
The Flanagan model can be derived from the Wakita and Fant model if KS(ω) = 2 and S(kr) = 10/(3π) ≈ 1. Introduction of the correction function KS(ω) allows the radiation resistance to be modelled more accurately than in the Flanagan model. The coefficient 0.8 in the Wakita and Fant model of radiation reactance results from the simplification 8/(3π) ≈ 0.8. This rough mathematical simplification in fact renders a more accurate model of the radiation reactance.

The models discussed thus far were presented in analytical form. The use of analogy implies transformation of the acoustic model of the vocal tract into an equivalent electrical model. The electrical model is the initial step in the calculation of the transfer characteristic of the vocal tract. Therefore, it is appropriate for the radiation impedance of the mouth to be expressed in the form of an equivalent electrical circuit. This explains the wide application of the Flanagan model, in which the impedance is given in both analytical (Eq. (10)) and electrical forms (serial connection of resistance and inductance).

Fig. 3 shows the equivalent electrical network of the radiation impedance of the mouth aperture proposed by Stevens et al. [8]. The parameter Am is the area of the mouth aperture given in cm². The radiation impedance is given by the following expression:

\[ Z_{SKF} = R_{SKF} + jX_{SKF}, \tag{18} \]

where

\[ R_{SKF} = \frac{\omega^2 L_1^2 R}{\omega^2 L_1^2 + R^2\left(1 - \omega^2 L_1 C\right)^2}, \tag{19} \]

\[ X_{SKF} = \frac{\omega L_1 R^2\left(1 - \omega^2 L_1 C\right)}{\omega^2 L_1^2 + R^2\left(1 - \omega^2 L_1 C\right)^2} + \omega L_2. \tag{20} \]
The basic advantage of the Stevens, Kasowski and Fant model is its applicability in the electrical (frequency) and the analytical (time) domains. It is an accurate model, which will be demonstrated later in this paper.
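Eqs. (19) and (20) can be cross-checked against a direct complex-arithmetic evaluation of a network. The sketch below assumes the topology that these two expressions imply, namely L2 in series with the parallel combination of R, L1 and C; that reading is an inference from the formulas, and the element values used in the example are arbitrary test numbers, not those of Fig. 3.

```python
import math

def z_skf_closed_form(omega, r, l1, l2, c):
    """Z_SKF = R_SKF + j*X_SKF evaluated from Eqs. (19) and (20)."""
    den = (omega * l1) ** 2 + r ** 2 * (1.0 - omega ** 2 * l1 * c) ** 2
    r_skf = omega ** 2 * l1 ** 2 * r / den
    x_skf = omega * l1 * r ** 2 * (1.0 - omega ** 2 * l1 * c) / den + omega * l2
    return complex(r_skf, x_skf)

def z_skf_network(omega, r, l1, l2, c):
    """Same impedance from the assumed topology: L2 in series with (R || L1 || C)."""
    admittance = 1.0 / r + 1.0 / (1j * omega * l1) + 1j * omega * c
    return 1.0 / admittance + 1j * omega * l2

# Arbitrary illustrative element values (hypothetical, not from Fig. 3).
R_TEST, L1_TEST, L2_TEST, C_TEST = 10.0, 1.0e-4, 2.0e-5, 1.0e-8

for freq in (100.0, 1000.0, 5000.0):
    w = 2.0 * math.pi * freq
    z_a = z_skf_closed_form(w, R_TEST, L1_TEST, L2_TEST, C_TEST)
    z_b = z_skf_network(w, R_TEST, L1_TEST, L2_TEST, C_TEST)
    print(f"{freq:6.0f} Hz: closed form {z_a:.6f}, network {z_b:.6f}")
```

Working with complex numbers in this way is often less error-prone than coding the separated real and imaginary parts, and it makes the high-frequency behaviour of the network (resistance tending to zero, reactance growing as ωL2) easy to verify numerically.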
[Fig. 3: circuit diagram. Element values (Am in cm²): R = 45.9/Am, L1 = 4.23·10⁻⁴/√Am, L2 = 7.11·10⁻⁵/√Am, C = 1.033·10⁻⁷·Am·√Am.]
Fig. 3. Equivalent electrical network proposed by Stevens, Kasowski and Fant for modeling the radiation impedance of the mouth.

3. The proposed model

A good deal of software for electrical network simulation is available on the market, such as OrCAD, PSPICE, etc. By means of the analogy, such software can be used to analyse the equivalent electrical network of the vocal tract. This enables fast and precise analysis of how the vocal tract transfer characteristic (i.e., the speech) changes with different parameters. To use the software, all elements of the vocal tract model must be in the form of an equivalent electrical network. Models of the mouth radiation impedance in the form of an equivalent electrical network are useful because they can be applied in both the time and the frequency domain. With a suitably chosen circuit topology, the radiation impedance can be modelled accurately. That is why an equivalent electrical network was selected to define the radiation impedance model presented in this paper.

The circuit representing this model of the radiation impedance of the mouth aperture is presented in Fig. 4. Generally, it can be treated as a parallel connection of an acoustic resistance and an inductance, with the modification that the acoustic resistance is divided into two parts (R1 and R2). An acoustic capacitance is connected in parallel with one of them (R1). At low frequencies the circuit is equivalent to a parallel connection of the inductance L and the resistance R1 + R2. At high frequencies the circuit becomes a parallel connection of the inductance L and the resistance R2. For comparison, the radiation impedance of a pulsating sphere [5] is modelled by a parallel combination of a unit resistance and an inductance r/c (where r is the radius of the sphere and c is the sound velocity).

In the Stevens, Kasowski and Fant model (Fig. 3), when the frequency tends to infinity the input resistance of the electrical circuit tends to zero and the reactance to infinity. This is not consistent with the behaviour of the radiation resistance and reactance of the piston set in a spherical baffle: when the frequency tends to infinity, its radiation resistance tends to a finite value (ρc/Am) and its reactance to zero. According to Fig. 4, the radiation impedance is given by the following expression:

\[ Z_P = R_P + jX_P, \tag{21} \]

where

\[ R_P = \frac{\omega^2 L^2\left(R_1 + R_2 + \omega^2 R_1^2 R_2 C^2\right)}{\left(R_1 + R_2 - \omega^2 R_1 L C\right)^2 + \omega^2\left(L + R_1 R_2 C\right)^2}, \tag{22} \]

\[ X_P = \frac{\omega L\left[(R_1 + R_2)^2 - \omega^2 R_1^2 C\left(L - R_2^2 C\right)\right]}{\left(R_1 + R_2 - \omega^2 R_1 L C\right)^2 + \omega^2\left(L + R_1 R_2 C\right)^2}. \tag{23} \]
[Fig. 4: circuit diagram. Element values (Am in cm²): R1 = 30.09/Am^1.08, R2 = 26.4/Am^0.89, L = 5.25·10⁻⁴/Am^0.53, C = 1.56·10⁻⁷·Am^1.74.]
Fig. 4. Proposed equivalent electrical network for modeling the radiation impedance of the mouth.
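The proposed network can be evaluated either from Eqs. (22) and (23) or directly from its topology (L in parallel with the series connection of R2 and R1 ∥ C). The sketch below does both and compares them. The element values are taken from the labels of Fig. 4 as read here; in particular, placing Am in the denominator for R1, R2 and L and in the numerator for C is an assumption, as are all function names.

```python
import math

RHO = 1.14e-3      # air density, g/cm^3 (value used in the paper)
C_SOUND = 35300.0  # sound velocity, cm/s (value used in the paper)

def proposed_elements(area_cm2):
    """R1, R2, L, C for a mouth area in cm^2 (assumed reading of Fig. 4)."""
    r1 = 30.09 / area_cm2 ** 1.08
    r2 = 26.4 / area_cm2 ** 0.89
    ind = 5.25e-4 / area_cm2 ** 0.53
    cap = 1.56e-7 * area_cm2 ** 1.74
    return r1, r2, ind, cap

def z_closed_form(omega, r1, r2, ind, cap):
    """Z_P = R_P + j*X_P from Eqs. (22) and (23)."""
    den = ((r1 + r2 - omega ** 2 * r1 * ind * cap) ** 2
           + omega ** 2 * (ind + r1 * r2 * cap) ** 2)
    r_p = omega ** 2 * ind ** 2 * (r1 + r2 + omega ** 2 * r1 ** 2 * r2 * cap ** 2) / den
    x_p = omega * ind * ((r1 + r2) ** 2
                         - omega ** 2 * r1 ** 2 * cap * (ind - r2 ** 2 * cap)) / den
    return complex(r_p, x_p)

def z_network(omega, r1, r2, ind, cap):
    """Same impedance from the topology: L in parallel with R2 + (R1 || C)."""
    branch = r2 + r1 / (1.0 + 1j * omega * r1 * cap)
    z_l = 1j * omega * ind
    return z_l * branch / (z_l + branch)

def normalized(z, area_cm2):
    """Normalize an acoustic impedance by rho*c/Am, as in Eq. (1b)."""
    return z * area_cm2 / (RHO * C_SOUND)

area = 4.0                       # example mouth area, cm^2
elems = proposed_elements(area)
omega = 2.0 * math.pi * 2000.0   # example frequency of 2000 Hz
z_a = z_closed_form(omega, *elems)
z_b = z_network(omega, *elems)
print("closed form:", z_a, "network:", z_b)
print("normalized :", normalized(z_a, area))
```

The two evaluations should agree to rounding error, and the limits described in the text can be checked the same way: for very large ω the result approaches R2, and for very small ω the reactance approaches ωL.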
The values of the circuit elements in Fig. 4 were calculated by optimization. The circuit elements are expressed in the parametric form

\[ R_1 = p_1 A_m^{q_1}, \qquad R_2 = p_2 A_m^{q_2}, \qquad L = p_3 A_m^{q_3}, \qquad C = p_4 A_m^{q_4}. \]

These expressions were substituted into Eqs. (22) and (23). The optimization varies the parameters p1, p2, p3, p4, q1, q2, q3 and q4 so as to minimize the mean square deviation of the calculated resistance and reactance from the radiation resistance and reactance of the piston set in a sphere of 9 cm radius. In other words, the parameters of Eq. (21) are fitted to achieve the best approximation of the radiation impedance of the piston set in a sphere given by Eq. (2). This is done by a small curve-fitting program. The procedure is repeated for various values of the mouth aperture area ranging from 0.65 to 8 cm². Weighted optimization was applied, giving priority to larger mouth apertures.
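The fitting procedure is described only in outline, so the following is a hypothetical sketch of what such an objective function could look like, not the authors' program. The weighting by area, the use of unnormalized impedance, the chosen grids and all names are assumptions; `reference` stands for any callable that returns the piston-in-sphere impedance and is replaced here by a stand-in so that the block runs on its own.

```python
import math

def elements(area, p, q):
    """R1, R2, L, C from the parametric form p_i * Am**q_i."""
    return tuple(p_i * area ** q_i for p_i, q_i in zip(p, q))

def z_model(omega, r1, r2, ind, cap):
    """Impedance of the proposed network: L in parallel with R2 + (R1 || C)."""
    branch = r2 + r1 / (1.0 + 1j * omega * r1 * cap)
    z_l = 1j * omega * ind
    return z_l * branch / (z_l + branch)

def weighted_mse(p, q, areas, omegas, reference, weight=lambda area: area):
    """Weighted mean square deviation of resistance and reactance.

    Assumption: larger apertures get larger weights (here simply the area).
    """
    num = 0.0
    den = 0.0
    for area in areas:
        elems = elements(area, p, q)
        for omega in omegas:
            err = z_model(omega, *elems) - reference(area, omega)
            num += weight(area) * (err.real ** 2 + err.imag ** 2)
            den += weight(area)
    return num / den

# Hypothetical parameter set and grids, for illustration only.
p_ref = (30.09, 26.4, 5.25e-4, 1.56e-7)
q_ref = (-1.08, -0.89, -0.53, 1.74)
areas = [0.65, 4.0, 8.0]
omegas = [2.0 * math.pi * f for f in (500.0, 1500.0, 3000.0)]

# Stand-in reference: the model itself evaluated at p_ref, q_ref.
def reference(area, omega):
    return z_model(omega, *elements(area, p_ref, q_ref))

mse_exact = weighted_mse(p_ref, q_ref, areas, omegas, reference)
p_shifted = (33.0, 26.4, 5.25e-4, 1.56e-7)
mse_shifted = weighted_mse(p_shifted, q_ref, areas, omegas, reference)
print(mse_exact, mse_shifted)
```

In an actual fit, the stand-in would be replaced by values computed from Eqs. (2) to (6), and the objective would be passed to a numerical minimizer over the eight parameters.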
4. Comparison of the models

To establish the accuracy of the radiation impedance models, the radiation of a piston set in a sphere of 9 cm radius was used as the reference. This section analyses the errors in the calculated normalized radiation resistance and reactance, and the errors in the calculated formant frequencies of Russian vowels.

Fig. 5 shows the normalized resistance and reactance of the radiation impedance as a function of the product kr for six different models. Normalization was achieved by dividing the acoustic radiation impedance by ρc/Am, Eq. (1b), where Am is the radiating area (the area of the mouth aperture). Diagrams are drawn for three mouth apertures: 0.65, 4 and 8 cm².

[Fig. 5: six panels showing normalized resistance (left column) and normalized reactance (right column) versus kr from 0 to 10, for Am = 0.65, 4 and 8 cm²; curves PIS, PIB, F, WF, SKF and P.]
Fig. 5. Normalized resistance (left) and reactance (right) of the radiation of the mouth aperture for different models: PIS, piston set in a spherical baffle (radius of the sphere is 9 cm); PIB, piston set in an infinite baffle; F, Flanagan model; WF, Wakita and Fant model; SKF, Stevens, Kasowski and Fant model; P, proposed model. Am is the area of the mouth aperture.

It is apparent from Fig. 5 that modeling the radiation impedance by a piston set in an infinite baffle (the PIB model) is superior for small values of the mouth aperture and high frequencies (large values of kr). For kr > 4, the radiation impedance of the mouth aperture (i.e., the radiation impedance of a piston set in a spherical baffle, the PIS model) is most accurately modelled by the PIB model (the PIB curves in Fig. 5). Both the Flanagan and the Wakita and Fant radiation impedance models (curves F and WF in Fig. 5) can be applied only for lower values of kr, and they will be discussed later. The Wakita and Fant model is more accurate than the Flanagan model in modeling both the acoustic resistance and the acoustic reactance.

For higher frequencies, the normalized radiation resistance calculated by the Stevens, Kasowski and Fant model (the SKF model) tends to zero (left diagrams in Fig. 5), which is not in accordance with the normalized radiation resistance of the mouth aperture. The normalized resistance (the radiation resistance of the PIS model) tends to one for large values of kr. The model proposed in this paper (Fig. 4) has a topology that corrects this deficiency. The SKF model is applicable for modeling the radiation resistance up to approximately kr < 3 (diagrams on the left side of Fig. 5). Acoustic reactance modeling by the SKF model (diagrams on the right-hand side of Fig. 5) is applicable only in the region kr < 2. Above this value of kr the error in the modelled acoustic reactance increases significantly, and in the range 3 < kr < 5 the reactance calculated by this model is negative.

The accuracy of the radiation resistance and reactance calculated by the model proposed in this paper is better for larger mouth apertures (the P curves in Fig. 5). This is a consequence of the optimization procedure for the electrical network elements (Fig. 4), in which priority is given to larger mouth apertures. The increase in accuracy of the proposed model with increasing mouth aperture is evident in the radiation reactance diagram for Am = 8 cm² (the lower right diagram in Fig. 5). In this case the proposed model gives the best approximation of the radiation reactance of the PIS model for approximately kr < 4. The results are better even than those obtained by the PIB model. If the mouth aperture is small, for example Am = 0.65 cm², the reactance calculated by the proposed model becomes negative for kr > 3.5 (the upper right diagram). For larger mouth apertures (the remaining two diagrams on the right side of Fig. 5) the results are more favorable. From these results it may be concluded that the proposed model should not be used for modeling the acoustic radiation reactance of the mouth aperture above kr ≈ 3. If the radius of the mouth aperture is r = 1.6 cm (Am ≈ 8 cm²), the frequency 5000 Hz and the sound velocity 353 m/s, the value of kr is 1.4. For speech analyses, the accuracy of the radiation impedance calculation in the region kr < 2 is therefore of paramount importance.

For all models, Figs. 6 and 7 present the error of the calculated normalized impedance (resistance and reactance) with reference to the radiation impedance of the piston set in a spherical baffle. The error of the normalized resistance (Fig. 6) is calculated according to the expression

\[ \varepsilon_{MODEL} = \frac{A_m}{\rho c}\left(R_{PIS} - R_{MODEL}\right), \tag{24} \]
where ε_MODEL is the error of the calculated normalized radiation resistance for one of the five models (PIB, F, WF, SKF or P), R_MODEL is the radiation resistance of the mouth aperture for the same model, and R_PIS is the radiation resistance of the circular piston set in a spherical baffle (the radius of the sphere is 9 cm). The errors of the normalized radiation reactance are calculated by a similar expression and presented in Fig. 7.

The error in modeling the normalized resistance and reactance by the proposed model is very consistent, without large variations. This is particularly true for the normalized reactance (Fig. 7). In the region kr < 2, the errors of the calculated normalized resistance and reactance for the different models are within the following boundaries:

Model                                   Resistance   Reactance
Model proposed in this paper            ±0.023       ±0.008
Piston set in an infinite baffle        ±0.095       ±0.036
Stevens, Kasowski and Fant model        ±0.123       ±0.045
Wakita and Fant model                   ±0.662       ±1.063
Flanagan model                          ±1.062       ±1.160
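The kr values and frequency limits quoted in this section follow directly from k = 2πf/c and Am = πr². The short sketch below reproduces that arithmetic; the helper names are illustrative, and the sound velocity of 35,300 cm/s is the value used elsewhere in the paper.

```python
import math

C_SOUND = 35300.0  # sound velocity, cm/s

def kr_value(freq_hz, radius_cm, c_cm_s=C_SOUND):
    """Product of wavelength constant and piston radius, kr = 2*pi*f*r/c."""
    return 2.0 * math.pi * freq_hz * radius_cm / c_cm_s

def freq_for_kr(kr, area_cm2, c_cm_s=C_SOUND):
    """Frequency (Hz) at which a circular aperture of the given area reaches kr."""
    radius_cm = math.sqrt(area_cm2 / math.pi)
    return kr * c_cm_s / (2.0 * math.pi * radius_cm)

kr_top = kr_value(5000.0, 1.6)   # r = 1.6 cm at 5000 Hz
f_8 = freq_for_kr(1.0, 8.0)      # frequency where kr = 1 for Am = 8 cm^2
f_3 = freq_for_kr(1.0, 3.0)      # frequency where kr = 1 for Am = 3 cm^2
print(f"kr = {kr_top:.2f}, f(kr=1, 8 cm^2) = {f_8:.0f} Hz, f(kr=1, 3 cm^2) = {f_3:.0f} Hz")
```

This confirms that, for mouth apertures up to about 8 cm², the whole speech band below 5000 Hz lies in the region kr < 2.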
[Fig. 6: six panels showing the error of the normalized resistance versus kr from 0 to 2, for Am = 0.65, 3.2, 4, 5, 6.5 and 8 cm²; vertical axis from −0.08 to 0.08; curves PIB, F, WF, SKF and P.]
Fig. 6. Difference between the normalized radiation resistance calculated with the PIS model and calculated with: PIB, piston set in an infinite baffle; F, Flanagan model; WF, Wakita and Fant model; SKF, Stevens, Kasowski and Fant model; P, proposed model. Am is the area of the mouth aperture.
Tolerances of the calculated resistance and reactance (not normalized values) can be obtained by multiplying the values presented above by ρc/Am.

Modeling the acoustic radiation resistance of the mouth by the radiation resistance of a piston set in an infinite baffle (the PIB curves in Fig. 6) can be used only if the area of the mouth aperture is small, as in the upper left diagram, where Am = 0.65 cm². For larger mouth apertures this approach is acceptable only for kr > 1.7. In the region kr < 1.7 the radiation resistance is simulated more precisely by the other models, except for the Flanagan model. In the region kr < 1.2 the radiation resistance of the mouth aperture is simulated more accurately by the WF model, the SKF model and the model proposed in this paper.

[Fig. 7: six panels showing the error of the normalized reactance versus kr from 0 to 2, for Am = 0.65, 3.2, 4, 5, 6.5 and 8 cm²; vertical axis from −0.06 to 0.06; curves PIB, F, WF, SKF and P.]
Fig. 7. Difference between the normalized reactance calculated with the PIS model and calculated with: PIB, piston set in an infinite baffle; F, Flanagan model; WF, Wakita and Fant model; SKF, Stevens, Kasowski and Fant model; P, proposed model. Am is the area of the mouth aperture.

The greatest errors in modeling the radiation resistance are obtained with the Flanagan model (the F curves in Fig. 6). According to these curves, practical use of the Flanagan model is limited to the region kr < 0.75. Introducing the function KS(ω) in the WF model, Eq. (16), yields proper modeling of the acoustic radiation resistance in the region kr < 1 (the WF curves in Fig. 6). Compared with the radiation resistance approximated by the PIB model, the WF model is more accurate in the region kr < 1.2 if the area of the mouth aperture is not smaller than 1 cm².

The SKF model (the SKF curves in Fig. 6) shows even better results in modeling the radiation resistance. This model, like the previous one, is somewhat less precise for small mouth apertures (the upper left diagram in Fig. 6). As the mouth aperture area increases, the accuracy of the model increases, but only in the region kr < 1. Above this value the error of the modelled resistance increases, and better results are obtained with the PIB model (for example, for kr > 1.7).

The error in modeling the acoustic radiation resistance of the mouth by the model defined in this paper (the P curves in Fig. 6) is quite balanced, irrespective of the mouth aperture area. Compared with the PIB model, the acoustic radiation resistance is modelled more accurately for all mouth aperture areas in the region kr < 2. Of all the models presented, only the SKF and WF models are comparable to the proposed one, and only in the region kr < 1. Above this region the proposed model provides the best results.

With the PIB model, the accuracy of the modelled radiation reactance decreases as the area of the mouth aperture increases (the PIB curves in Fig. 7). This model has its largest deviations around kr ≈ 0.5. This value of kr corresponds to a frequency of about 2300 Hz for mouth aperture areas between 3 and 8 cm². In other words, the largest error in modeling the acoustic reactance falls in the frequency region most essential for speech.
The F model is even less precise in modeling the acoustic radiation reactance (the F curves in Fig. 7). According to Fig. 7, the application of the F model is limited to the region of about kr < 0.5. The expressions for the radiation reactance in the F and WF models differ only in the mathematical simplification 8/(3π) ≈ 0.8 (Eqs. (10) and (15)). This small difference makes the radiation reactance of the WF model considerably more accurate (the WF curves in Fig. 7). In the region kr < 0.5 the accuracy of the WF model is better even than that of the PIB model.

The radiation reactance of the mouth aperture according to the SKF model (the SKF curves in Fig. 7) is exceptionally accurate in the region kr < 1, provided that the area of the mouth aperture is not smaller than about 1 cm². The region kr < 1 corresponds to the frequency region f < 3500 Hz if Am = 8 cm², and to the region f < 5750 Hz if Am = 3 cm². This means that the reactance calculated by this model is most accurate in the frequency region essential for speech. That is why this model, as will be shown later, yields accurate results in the examples of formant frequency calculation for different vowels.

The best results in modeling the radiation reactance of the mouth aperture are obtained by the model presented in this paper (the P curves in Fig. 7). The reactance modeling (the P curves in Fig. 7) is even better than the modeling of the radiation resistance (the P curves in Fig. 6). This is of particular importance because the accuracy of the calculated formant frequencies is affected much more by the reactance than by the resistance of the radiation impedance. The advantage of the proposed model is most pronounced for large mouth apertures and high frequencies (large kr). For practically all values of the mouth aperture and for 0.5 < kr < 2, the proposed model approximates the radiation reactance of the mouth, i.e., the radiation reactance of the piston set in a spherical baffle, most accurately.

Another way to compare the models is to calculate the formant frequencies of Russian vowels. The vocal tract configurations for the production of Russian vowels are taken from Fant [1]. The vocal tract model is somewhat simplified, in that:
the effect of the subglottal system is neglected; the impedance of the glottis is taken to be infinite; the impedance of the vocal tract walls is taken to be infinite; and the effect of the nasal cavity is neglected.
The vocal tract configuration for each Russian vowel is approximated by uniform cylindrical segments of length 0.5 cm and different cross-sectional areas. In the electrical domain each cylindrical segment is represented by the T-network of Fig. 1. Vocal tract excitation in the acoustical domain is represented by a source of constant volume velocity, i.e., by a constant current source in the electrical model. The vocal tract is terminated by the radiation impedance defined by one of the six models considered. The equivalent electrical model of the vocal tract formed in this way is the basis for calculating the transfer characteristic, i.e., the formant frequencies of the vowels. The calculation was carried out with an algorithm for recursive calculation of the transfer characteristic [9]. Frequencies of the first five formants were calculated with an accuracy of 0.1 Hz. The values of the constants used in the program were: c = 35,300 cm/s, ρ = 1.14 × 10⁻³ g/cm³, μ = 1.84 × 10⁻⁴ g/(cm s), λ = 5.5 × 10⁻⁵ cal/(cm s °C), cp = 0.24 cal/(g °C) and η = 1.4.
Formant frequencies calculated under the above conditions are shown in Table 1. The column "NULL" represents the case of zero radiation impedance. As can be seen, a non-zero radiation impedance lowers all formant frequencies. As in the calculation of normalized resistance and reactance, the model presented in this paper yields the most accurate results, because it gives formant frequencies closest to those of the model of a piston set in a spherical baffle. The maximal absolute percentage errors in calculating the formant frequencies of Russian vowels for the considered models are:
max(|Ei,P|) = 0.29% for F1 [u],
max(|Ei,SKF|) = 0.63% for F4 [e],
max(|Ei,PIB|) = 1.08% for F2 [ɨ],
max(|Ei,WF|) = 3.17% for F4 [e],
max(|Ei,F|) = 4.46% for F4 [e].
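The calculation procedure described above (a cascade of short uniform segments driven by a constant volume-velocity source and terminated by a radiation impedance) can be illustrated with the following Python sketch. It is a simplified stand-in and not the recursive algorithm of [9]: losses are omitted, each segment is written as a lossless chain (ABCD) matrix rather than the T-network of Fig. 1, the uniform 17 cm tube of area 5 cm² is a hypothetical area function rather than one of Fant's vowels, and the load is the familiar low-frequency piston approximation rather than the network proposed in this paper. All function names are illustrative.

```python
import math

C_SOUND = 35300.0  # sound velocity in cm/s (value quoted in the text)
RHO = 1.14e-3      # air density in g/cm^3 (value quoted in the text)


def segment_matrix(freq_hz, area_cm2, length_cm):
    """Chain (ABCD) matrix of one lossless uniform tube segment."""
    k = 2.0 * math.pi * freq_hz / C_SOUND      # wave number, rad/cm
    z0 = RHO * C_SOUND / area_cm2              # characteristic acoustic impedance
    cos_kl, sin_kl = math.cos(k * length_cm), math.sin(k * length_cm)
    return ((cos_kl, 1j * z0 * sin_kl), (1j * sin_kl / z0, cos_kl))


def matmul2(m, n):
    """Product of two 2x2 matrices stored as nested tuples."""
    return (
        (m[0][0] * n[0][0] + m[0][1] * n[1][0], m[0][0] * n[0][1] + m[0][1] * n[1][1]),
        (m[1][0] * n[0][0] + m[1][1] * n[1][0], m[1][0] * n[0][1] + m[1][1] * n[1][1]),
    )


def velocity_gain(freq_hz, areas_cm2, seg_len_cm, z_rad):
    """|U_lips / U_glottis| for a constant volume-velocity source."""
    total = ((1.0, 0.0), (0.0, 1.0))
    for area in areas_cm2:                     # ordered from glottis to lips
        total = matmul2(total, segment_matrix(freq_hz, area, seg_len_cm))
    c_elem, d_elem = total[1]
    # With P_lips = z_rad * U_lips: U_glottis = (C * z_rad + D) * U_lips.
    return 1.0 / abs(c_elem * z_rad + d_elem)


def low_freq_piston_load(freq_hz, mouth_area_cm2):
    """Assumed stand-in load: low-frequency piston approximation."""
    radius = math.sqrt(mouth_area_cm2 / math.pi)
    kr = 2.0 * math.pi * freq_hz / C_SOUND * radius
    normalized = kr ** 2 / 2.0 + 1j * 8.0 * kr / (3.0 * math.pi)
    return RHO * C_SOUND / mouth_area_cm2 * normalized


def find_formants(areas_cm2, seg_len_cm, load, f_max_hz=5000):
    """Frequencies on a 1 Hz grid where the gain has a local maximum."""
    freqs = list(range(1, f_max_hz + 1))
    gains = [velocity_gain(f, areas_cm2, seg_len_cm, load(f)) for f in freqs]
    return [freqs[i] for i in range(1, len(freqs) - 1)
            if gains[i - 1] < gains[i] > gains[i + 1]]


areas = [5.0] * 34                             # hypothetical uniform tube, 34 x 0.5 cm
formants_null = find_formants(areas, 0.5, lambda f: 0.0)
formants_load = find_formants(areas, 0.5, lambda f: low_freq_piston_load(f, 5.0))
print("zero load:  ", formants_null)
print("piston load:", formants_load)
```

With these assumptions the unloaded tube resonates near odd multiples of c/(4L) ≈ 519 Hz, and terminating it with the load lowers the resonances, in line with the behaviour of the NULL column in Table 1. A 1 Hz grid is far coarser than the 0.1 Hz accuracy quoted above; a finer search around each peak would be needed to match it.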
Table 1 Formant frequencies (Hz) of Russian vowels calculated for different models of radiation impedance: PIS, piston set in a spherical baffle (radius of the sphere is 9 cm); PIB, piston set in an infinite baffle; F, Flanagan model; WF, Wakita and Fant model; SKF, Stevens, Kasowski and Fant model; P, proposed model; NULL, zero radiation impedance
Vowel  Formant     PIS     PIB       F      WF     SKF       P    NULL
[a]    F1        642.2   638.1   637.9   640.1   641.7   641.3   676.0
[a]    F2       1085.1  1076.8  1075.5  1080.8  1084.6  1083.8  1187.1
[a]    F3       2468.8  2463.7  2457.6  2462.6  2468.8  2468.9  2554.0
[a]    F4       3620.6  3612.3  3583.2  3594.5  3617.3  3620.7  3791.6
[a]    F5       4133.7  4131.7  4125.5  4128.5  4133.1  4133.6  4185.6
[e]    F1        420.0   417.9   417.8   418.8   419.4   419.4   435.0
[e]    F2       1973.5  1969.6  1964.8  1967.9  1972.5  1973.4  2016.9
[e]    F3       2819.6  2813.4  2787.4  2794.8  2815.0  2819.1  2912.7
[e]    F4       3650.2  3630.7  3487.3  3534.3  3627.3  3646.5  3856.8
[e]    F5       4212.7  4181.9  4151.3  4177.5  4200.4  4207.6  4632.0
[i]    F1        227.0   226.6   226.6   226.8   227.0   226.9   230.2
[i]    F2       2276.0  2275.4  2274.7  2275.3  2276.1  2276.1  2284.5
[i]    F3       3107.1  3096.9  3066.7  3079.7  3105.7  3109.4  3290.3
[i]    F4       3728.6  3724.6  3713.0  3718.8  3728.1  3728.8  3800.5
[i]    F5       4770.3  4762.4  4572.1  4629.8  4745.0  4773.7  4970.6
[o]    F1        505.1   501.8   501.8   503.7   505.0   504.2   535.0
[o]    F2        868.1   861.7   861.3   865.5   868.3   866.7   958.4
[o]    F3       2389.9  2387.6  2385.8  2388.1  2390.5  2390.0  2437.0
[o]    F4       3457.7  3457.1  3455.9  3456.7  3457.8  3457.7  3470.9
[o]    F5       4019.8  4018.4  4014.5  4016.6  4019.8  4019.9  4049.9
[u]    F1        237.5   236.5   236.5   237.2   237.6   236.8   248.7
[u]    F2        600.2   599.6   599.6   600.0   600.3   599.8   608.1
[u]    F3       2383.0  2383.0  2383.0  2383.0  2383.1  2383.0  2384.2
[u]    F4       3710.2  3710.1  3710.1  3710.2  3710.2  3710.2  3711.3
[u]    F5       4055.8  4055.6  4055.6  4055.8  4055.9  4055.7  4059.0
[ɨ]    F1        289.6   288.8   288.8   289.2   289.4   289.4   295.0
[ɨ]    F2       1529.0  1512.5  1505.1  1516.3  1527.5  1527.9  1729.0
[ɨ]    F3       2413.4  2412.1  2410.5  2411.7  2413.2  2413.4  2436.6
[ɨ]    F4       3470.9  3468.8  3461.0  3463.8  3469.8  3470.8  3509.8
[ɨ]    F5       4198.8  4197.3  4188.1  4190.7  4197.5  4198.6  4227.0
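The percentage error of each formant frequency is defined as
$$E_{i,\mathrm{MODEL}}(\%) = 100 \cdot \frac{F_{i,\mathrm{MODEL}} - F_{i,\mathrm{PIS}}}{F_{i,\mathrm{PIS}}}, \tag{25}$$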
where Ei,MODEL represents the percentage deviation of the ith formant frequency calculated for one of the five models (PIB, F, WF, SKF or P) with respect to the PIS model. The index i takes values from 1 to 30 (the first five formants of six vowels).
As can be seen from the results, the formant frequency calculation for the vowel [e] is the most critical. This is particularly evident for the SKF model, which was found to be very accurate except in the case of the vowel [e]. The mouth aperture for this vowel is the largest: Am = 8 cm². The other models, except for the proposed one, show a similar tendency: the percentage error in the calculated formant frequencies increases with increasing mouth aperture. For small mouth apertures (for example, the vowel [u] in Table 1) the calculated formant frequencies are practically the same, irrespective of the radiation impedance model. This means that for small mouth apertures larger errors in modeling the radiation impedance can be tolerated. This is the basic reason why priority is given to large mouth apertures in the optimization procedure of the circuit in Fig. 4.
Perhaps a better indicator for comparing the models is the mean absolute percentage error, defined as
$$E_{\mathrm{MODEL}}(\%) = \frac{1}{5 \cdot 6} \sum_{i=1}^{30} \left| E_{i,\mathrm{MODEL}} \right| = 100 \cdot \frac{1}{5 \cdot 6} \sum_{i=1}^{30} \left| \frac{F_{i,\mathrm{MODEL}} - F_{i,\mathrm{PIS}}}{F_{i,\mathrm{PIS}}} \right|, \tag{26}$$
where 5 is the number of formants (the first five) and 6 is the total number of vowels. The mean percentage errors for the individual models are: EP = 0.057%, ESKF = 0.082%, EPIB = 0.282%, EWF = 0.462%, EF = 0.740%.
If the error tolerances in the calculation of normalized reactance are compared, one finds that the PIB model (±0.036) is more accurate than the SKF model (±0.045). However, in the calculation of formant frequencies of Russian vowels the situation is different: the SKF model was found to have considerably higher accuracy. The reason is that what matters for the formant calculation is the accuracy of the radiation reactance in the frequency region essential for speech: below kr ≈ 1 (Fig. 7). In this region of kr the SKF model calculates the radiation reactance more accurately; in this sense the SKF model is "optimized" for the frequency region of speech.
The model of the mouth aperture's radiation impedance proposed in this paper gives the greatest accuracy in calculating formant frequencies. This holds both for the maximal percentage error (less than 0.3%) and for the average error (less than 0.06%).
In formant analyses where an error of up to 0.3% in the calculated formant frequencies can be tolerated, the radiation impedance of the mouth aperture can be modelled by the proposed network instead of by the piston set in a spherical baffle.
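The error measures of Eqs. (25) and (26) are straightforward to recompute from the tabulated formant frequencies. The following Python sketch does so; the data are transcribed from Table 1 (the NULL column is omitted), while the function and variable names are illustrative and not part of the original program.

```python
# Table 1 values in Hz, one row per vowel ([a], [e], [i], [o], [u] and the
# sixth vowel of Table 1), five formants (F1..F5) per row.
PIS = [
    642.2, 1085.1, 2468.8, 3620.6, 4133.7,
    420.0, 1973.5, 2819.6, 3650.2, 4212.7,
    227.0, 2276.0, 3107.1, 3728.6, 4770.3,
    505.1, 868.1, 2389.9, 3457.7, 4019.8,
    237.5, 600.2, 2383.0, 3710.2, 4055.8,
    289.6, 1529.0, 2413.4, 3470.9, 4198.8,
]
MODELS = {
    "PIB": [
        638.1, 1076.8, 2463.7, 3612.3, 4131.7,
        417.9, 1969.6, 2813.4, 3630.7, 4181.9,
        226.6, 2275.4, 3096.9, 3724.6, 4762.4,
        501.8, 861.7, 2387.6, 3457.1, 4018.4,
        236.5, 599.6, 2383.0, 3710.1, 4055.6,
        288.8, 1512.5, 2412.1, 3468.8, 4197.3,
    ],
    "F": [
        637.9, 1075.5, 2457.6, 3583.2, 4125.5,
        417.8, 1964.8, 2787.4, 3487.3, 4151.3,
        226.6, 2274.7, 3066.7, 3713.0, 4572.1,
        501.8, 861.3, 2385.8, 3455.9, 4014.5,
        236.5, 599.6, 2383.0, 3710.1, 4055.6,
        288.8, 1505.1, 2410.5, 3461.0, 4188.1,
    ],
    "WF": [
        640.1, 1080.8, 2462.6, 3594.5, 4128.5,
        418.8, 1967.9, 2794.8, 3534.3, 4177.5,
        226.8, 2275.3, 3079.7, 3718.8, 4629.8,
        503.7, 865.5, 2388.1, 3456.7, 4016.6,
        237.2, 600.0, 2383.0, 3710.2, 4055.8,
        289.2, 1516.3, 2411.7, 3463.8, 4190.7,
    ],
    "SKF": [
        641.7, 1084.6, 2468.8, 3617.3, 4133.1,
        419.4, 1972.5, 2815.0, 3627.3, 4200.4,
        227.0, 2276.1, 3105.7, 3728.1, 4745.0,
        505.0, 868.3, 2390.5, 3457.8, 4019.8,
        237.6, 600.3, 2383.1, 3710.2, 4055.9,
        289.4, 1527.5, 2413.2, 3469.8, 4197.5,
    ],
    "P": [
        641.3, 1083.8, 2468.9, 3620.7, 4133.6,
        419.4, 1973.4, 2819.1, 3646.5, 4207.6,
        226.9, 2276.1, 3109.4, 3728.8, 4773.7,
        504.2, 866.7, 2390.0, 3457.7, 4019.9,
        236.8, 599.8, 2383.0, 3710.2, 4055.7,
        289.4, 1527.9, 2413.4, 3470.8, 4198.6,
    ],
}


def percentage_errors(model_hz, reference_hz):
    """Eq. (25): signed percentage deviation of each formant frequency."""
    return [100.0 * (m - r) / r for m, r in zip(model_hz, reference_hz)]


def max_abs_error(errors):
    """Largest absolute percentage error over all formants."""
    return max(abs(e) for e in errors)


def mean_abs_error(errors):
    """Eq. (26): mean absolute percentage error over all formants."""
    return sum(abs(e) for e in errors) / len(errors)


summary = {}
for name, values in MODELS.items():
    errors = percentage_errors(values, PIS)
    summary[name] = (max_abs_error(errors), mean_abs_error(errors))
    print(f"{name:>3}: max |E| = {summary[name][0]:.2f} %, "
          f"mean |E| = {summary[name][1]:.3f} %")
```

Run as written, the sketch should reproduce the maximal and mean errors quoted above to within the rounding of the tabulated values.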
5. Conclusions
The proposed electrical network for modeling the mouth aperture's radiation impedance (Fig. 4) is a good approximation to the radiation of the piston set in a spherical baffle of radius 9 cm. For small mouth apertures (for example Am = 0.65 cm²) the use of the model proposed in this paper is limited to the region kr < 2 (the upper diagrams in Fig. 5). For larger mouth apertures the proposed model can be used in a wider region: kr < 3 (the middle and bottom diagrams in Fig. 5). If these ranges are transformed into frequency ranges, it follows that the proposed model is a good approximation to the radiation of the piston set in a spherical baffle in the frequency region below 10 kHz. This means that the proposed model is fully applicable in speech analysis. For analysis at frequencies above 10 kHz, modeling of the mouth's radiation impedance by the radiation of a piston set in an infinite baffle is recommended. Differences between the radiation impedance of the piston set in a spherical baffle and the piston set in an infinite baffle are minimal at high frequencies, but the latter is considerably simpler to calculate.
Mathematically, the model proposed in this paper is simple because it is given in terms of an equivalent electrical network. This means it is applicable in the time domain as well as in the frequency domain. The accuracy of the model has been demonstrated by example calculations of the formant frequencies of Russian vowels. In comparison with the other models, the greatest accuracy was achieved with the proposed model. The maximum errors in calculating the formant frequencies of Russian vowels were less than 0.3%.
References
[1] Fant G. Acoustic theory of speech production. The Hague: Mouton; 1970.
[2] Badin P, Fant G. Notes on vocal tract computation. Speech Transmission Laboratory, Royal Institute of Technology, Stockholm, STL-QPSR 2–3/1984; 1984. p. 53–108.
[3] Morse PM. Vibration and sound. New York: McGraw-Hill; 1948.
[4] Morse PM, Ingard KU. Theoretical acoustics. New York: McGraw-Hill; 1968.
[5] Flanagan JL. Speech analysis, synthesis and perception. New York: Springer; 1972.
[6] Chalker DA, Mackerras D. Models for representing the acoustic radiation impedance of the mouth. IEEE Trans Acoust Speech Signal Process 1985;ASSP-33:1606–9.
[7] Wakita H, Fant G. Toward a better vocal tract model. Speech Transmission Laboratory, Royal Institute of Technology, Stockholm, STL-QPSR 1/1978; 1978. p. 9–29.
[8] Stevens KN, Kasowski S, Fant G. An electrical analog of the vocal tract. J Acoust Soc Am 1953;25:734–42.
[9] Fant G. The vocal tract in your pocket calculator. Speech Transmission Laboratory, Royal Institute of Technology, Stockholm, STL-QPSR 2–3/1985; 1985. p. 1–19.