Journal of Phonetics (1996) 24, 3 – 22
Properties of the tongue help to define vowel categories: hypotheses based on physiologically-oriented modeling
Joseph S. Perkell*
Speech Communication Group, Research Laboratory of Electronics, Massachusetts Institute of Technology, Room 36-591, 50 Vassar Street, Cambridge, MA 02139, U.S.A.
Received 22nd August 1994, and in revised form 14th February 1995
Results from physiological modeling of the behavior of the tongue are used to motivate hypotheses about articulatory contributions to the definition of targets for the point vowels. Quantal (non-linear) relations between articulation and vowel formant frequencies predict acoustic stability over some range of variation in constriction location for / A /, / i /, and / u / (Stevens, 1972; 1989; Badin, Perrier, Boë & Abry, 1990), but not for constriction degree. In agreement with this prediction, data are reviewed that show more precise positioning of the tongue body in a dorsal-ventral direction than in a direction parallel to the vocal-tract midline. It is hypothesized that for the vowels / i / and / A /, which are each produced with one narrow constriction, overall acoustic stability is achieved by virtue of a quantal acoustically-based relation between constriction location and vowel formants, in combination with a physiologically-based, quantal relation between motor commands and the degree of constriction (a mechanical “saturation effect”; Fujimura & Kakita, 1979). For / u /, with two narrow constrictions, there is a similar quantal acoustic relation between the tongue-body constriction location and the vowel formants. The tongue-body constriction produced by contraction of the styloglossus muscle is in the acoustically-stable constriction location. There may not be a non-linear mechanism to help stabilize degree of constriction for / u /, but that lack might be compensated for by a trading relation between the acoustic effects of lip rounding and tongue-body raising. To exploit this trading relation, speakers would learn a low-level “motor equivalence control strategy”, which they employ to help constrain acoustic variability.
© 1996 Academic Press Limited
* Also of the Department of Brain and Cognitive Sciences, M.I.T.
0095-4470/96/010003+20 $12.00/0
1. Introduction and background
Analyses of articulatory-to-acoustic relations have shown that the formants of the point vowels / A /, / i /, and / u / are relatively stable with respect to some variation in constriction location (Fant, 1960; Stevens, 1972; 1989). This stability should help to define acoustic and articulatory targets for the vowels and account for their high frequency of occurrence among the world’s languages (Maddieson, 1984; Badin, Perrier, Boë & Abry, 1990). In general, sound patterns of languages are likely to be determined by a combination of acoustic, articulatory, and other factors (cf. Lindblom, 1983; Lindblom & Engstrand, 1989; Boë, Schwartz & Vallée, 1994; Perkell, Matthies, Svirsky & Jordan, in press). The purpose of this paper is to show how certain aspects of the anatomy of the tongue musculature and physiology may also contribute to defining acoustically-stable targets for the point vowels.
Figure 1. A schematic illustration of articulatory data for multiple repetitions of the vowels / A / (tongue surface illustrated by — – —), / i / (— —) and / u / (- - - -), showing elliptical distributions of positioning of points on the tongue surface, with the long axes of the ellipses oriented parallel to the vocal-tract midline.
We have published examples of articulatory data for multiple repetitions of the vowels / A /, / i /, and / u /, as illustrated schematically in Fig. 1, showing elliptical distributions of positioning of points on the tongue surface, with the long axes of the ellipses oriented parallel to the vocal-tract midline (cf. Perkell & Nelson, 1985; Perkell & Cohen, 1989; see also Beckman, Jung, Lee, deJong, Krishnamurthy, Ahalt, Cohen & Collins, 1995). The shapes and orientations of the articulatory distributions may be due in part to articulatory-to-acoustic relations. With vowel constrictions in the anterior palatal, velo-palatal, and lower pharyngeal regions, two formants converge. As a result, within these regions the formant patterns are relatively insensitive to some variation in the constriction location (cf. Stevens, 1972; 1989; Badin et al., 1990), and the patterns have characteristics that may serve as bases for some distinctive features (cf. Jakobson, Fant & Halle, 1963). Since the formant values are increasingly sensitive to change in constriction location outside of the regions, the regions help to define articulatory targets with relatively stable acoustic outputs. On the other hand, within the regions, some formants are very sensitive to variation in the degree of constriction, as measured by dorsal-ventral displacements of the tongue surface. This is because formant values are sensitive to percentage change in the constriction area (cf. Fant, 1960), and within the narrow constrictions that characterize these three vowels, small dorsal-ventral tongue displacements cause large changes in the percentage area.
Consequently, the speaker has to constrain variation in the dorsal-ventral direction, particularly in languages with relatively crowded vowel spaces. This observation implies that the components of articulatory commands for dorsal-ventral tongue positioning may have to be more precise than the components for locating the tongue constriction
along the vocal-tract midline. Alternatively, certain physiologically-based mechanisms may be available to help constrain dorsal-ventral variation, so no difference in the precision of control is required for articulatory displacements in the two directions, at least for / A / and / i /.
2. Production of point vowels
2.1. Three hypotheses about production of the point vowels
Some behavior of a 2-dimensional, physiologically-based model of the tongue (Perkell, 1974) will be used, along with results from a 3-dimensional tongue model (Fujimura & Kakita, 1979) to motivate hypotheses about how the anatomy and neuro-muscular behavior of the tongue may contribute to the definition of the targets for the vowels / i /, / u /, and / A /, by making them relatively easy to produce and by helping to constrain dorsal-ventral variation for / i / and / A /.
2.1.1. Hypothesis 1
The particular anatomical arrangement of the tongue musculature predisposes speakers to produce vocal-tract constrictions at the locations that result in stable, distinctive formant patterns, mainly with the contraction of two or three muscles at the most (see also Wood, 1979).
2.1.2. Hypothesis 2
At least for the vowels / i / and / A /, there may be physiologically-based mechanisms that help to constrain dorsal-ventral variation, which would mitigate the sensitivity of vowel formants to variation in the degree of constriction. Fig. 2 is a schematic illustration of how the transformation between motor commands to the tongue and vowel formants [Fig. 2(e)] results from a sequence of two transformations, between motor commands and vocal-tract configuration and between configuration and vowel formants, for constriction location [Fig. 2(a) & (b)] and constriction degree [Fig. 2(c) & (d)]. In Fig. 2(e) there is a region of acoustic stability, enclosed by the ellipse, over which some variation in the magnitude of motor commands produces little variation in the acoustics. Fig. 2(b) shows that the kind of quantal relation illustrated in Fig. 2(e) is the result of a configurational-to-acoustic quantal relation between constriction location and formants (cf. Stevens, 1989). Fig. 2(c) shows that the kind of relation illustrated in Fig. 2(e) is the result of a quantal relation between motor commands and constriction degree (Fujimura & Kakita, 1979). Although the quantal relations arise in different stages of the overall transformation between motor commands and acoustics, the net result is that for both constriction location and constriction degree, there is an acoustically stable region for which the underlying motor commands do not have to be precise.
2.1.3. Hypothesis 3
A non-linear relation involving constriction degree may not be as necessary for the vowel / u /, because unlike / i / and / A /, / u / is produced with two independently-controllable constrictions, which makes it possible to constrain acoustic variation in an additional way.
[Figure 2: panels (a) to (e); axes are motor commands, constriction location, constriction degree, and formants.]
Figure 2. Schematic illustrations of relationships between pairs of parameters. (a) A non-quantal relation between motor commands and constriction location; (b) a quantal relation between constriction location and vowel formants, with an acoustically-stable region enclosed by an ellipse; (c) a quantal relation between motor commands and constriction degree, with a region of constriction degree stability enclosed by an ellipse; (d) a non-quantal relation between constriction degree and formants; and (e) a quantal relation between motor commands and formants.
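The composition described in Fig. 2 can be illustrated numerically. The following is a minimal sketch, not the model used in this paper: it assumes a hypothetical saturating (tanh) map from motor command to constriction degree, standing in for panel (c), followed by a hypothetical linear map from constriction degree to a formant, standing in for panel (d). The function names, gains, and frequencies are all illustrative assumptions. Even though the second stage is non-quantal, the overall command-to-formant relation flattens wherever the first stage saturates, as in panel (e).

```python
import math


def constriction_degree(command, ceiling=1.0):
    """Hypothetical saturating map from motor command to constriction degree.

    Roughly linear for small commands, flattening toward `ceiling`
    (an assumed stand-in for a mechanical saturation effect).
    """
    return ceiling * math.tanh(command / ceiling)


def formant_hz(degree, base=500.0, slope=-250.0):
    """Hypothetical linear (non-quantal) map from constriction degree to a formant."""
    return base + slope * degree


def command_to_formant(command):
    """Overall transformation: motor command -> constriction degree -> formant."""
    return formant_hz(constriction_degree(command))


def sensitivity(func, x, step=1e-4):
    """Magnitude of the central-difference slope of func at x."""
    return abs(func(x + step) - func(x - step)) / (2.0 * step)


# Slope of the overall relation outside and inside the saturated region.
steep = sensitivity(command_to_formant, 0.2)
flat = sensitivity(command_to_formant, 3.0)
print(f"Hz per unit command at 0.2: {steep:.1f}")
print(f"Hz per unit command at 3.0: {flat:.1f}")
```

In this sketch the formant is far less sensitive to the command in the saturated region than in the near-linear region, so the command there would not need to be precise.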
In considering these hypotheses, it is important to keep in mind the following points. (1) The targets for the sounds are influenced by articulatory-to-acoustic relations that result in acoustic stability; thus the proposed articulatory mechanisms may contribute to overall acoustic stability, but they do not determine it by themselves. (2) The vowel acoustics are determined by the overall vocal-tract shape and not solely at the regions of narrow constriction; however, formants are most sensitive to small dorsal-ventral articulatory changes within narrow constrictions. (3) There is a large amount of variability inherent in speech in general and in the characteristics of individual speakers. For all these reasons, the proposed mechanisms may not be expressed in a robust manner, particularly within any individual language or speaker. However, they might have enough of an effect across speakers and languages to have influenced sound patterns and speech motor control mechanisms (cf. Perkell et al., 1995).
2.2. Articulatory data on production of the point vowels
In this section, several examples of articulatory data are presented that are consistent with the prediction that formants for the vowels / i /, / A /, and / u / are relatively insensitive to some variation in location of the maximum vocal-tract constriction (formed by the tongue body) and relatively sensitive to some variation in constriction degree (as reflected in dorsal-ventral displacement of the tongue body).
Figure 3. Mid-vowel x-ray microbeam data for three points on the tongue dorsum: blade, mid, and rear. The top and bottom halves show data from many repetitions of the vowel / i / spoken by two speakers. A two-sigma ellipse is drawn with its major axis passing through each group mean and aligned with the principal component of variance for the group. The hard palate is shown above; anterior is to the right.
Fig. 3 shows mid-vowel x-ray microbeam data for three points on the tongue dorsum: blade, mid, and rear (from Perkell & Nelson, 1985). The data in the top and bottom halves of the figure are from many repetitions of the vowel / i / spoken in a variety of environments by two speakers. A two-sigma ellipse is drawn with its major axis passing through each group mean and aligned with the principal component of variance for the group. The hard palate is shown above, and anterior is to the right. For both speakers, the two distributions in the region of maximal vocal-tract constriction, blade and mid, are fairly well aligned with the schematized tongue
dorsum (dashed line), which is approximately parallel to the hard palate and the vocal-tract midline. On the other hand, the rear distributions, which are removed from the place of maximum constriction, are nearly perpendicular to the midline. At this location, variations in dorsal-ventral displacement (perpendicular to the vocal-tract midline) cause small percentage changes in the relatively large vocal-tract cross-sectional area (cf. Perkell & Nelson, 1985; Gay, Boë & Perrier, 1992). Thus, while the vowel formants are determined by the entire area function, they are disproportionately sensitive to dorsal-ventral tongue displacements in the region of maximum constriction. (Such data do not actually show the location and degree of maximum constriction; however, they should provide reasonable relative approximations.)
Fig. 4 shows distributions of a point on the tongue dorsum of another subject, located about halfway between the rear and mid positions, for multiple repetitions of the vowels / i /, / u /, and / A / in stressed [Fig. 4(a)] and unstressed [Fig. 4(b)] environments (from Perkell & Cohen, 1989), as transduced by an electro-magnetic midsagittal articulometer (EMMA) system (Perkell, Cohen, Svirsky, Matthies, Garabieta & Jackson, 1992). The outline of the posterior portion of the hard palate is shown at the top and anterior is to the right. This vocal-tract location is closest to the place of maximum constriction for / u /, and the / u / distribution appears to be parallel to the vocal-tract midline, which is beginning to turn downward at this point. The orientations for the / A / distributions are also approximately parallel to the vocal-tract midline near its place of maximum constriction, which is further down in the pharynx. The distributions for / i / are more circular than for the other two vowels, which is consistent with the facts that: a) this location is removed somewhat from the / i / constriction location (see above) and b) the / i / constriction is formed more by the partly-independent tongue blade than are the constrictions for the other two vowels.
Fig. 5 shows the same kind of data from another subject for points on the tongue blade and body with multiple repetitions of the vowel / u / in two conditions, with and without a bite block. These distributions are similar to a number of others we have shown that have the same orientation, approximately parallel to the vocal-tract midline near the place of maximum vocal-tract constriction for the vowel (Perkell, Matthies, Svirsky & Jordan, 1993).
The plots in Fig. 6 show formant values that result from manipulating the location and degree of constriction of an articulatory synthesizer at Haskins Laboratories for the vowel / i / on the left, and / A / on the right (see Perkell & Nelson, 1985). The traces show the formant values that correspond to moving the tongue body (in millimeter increments) parallel to the vocal-tract midline (with filled circles) or normal to the midline (with tick marks). For / i /, F1 is extremely sensitive to constriction degree but not location, and for / A / both F1 and F2 are sensitive to constriction degree but not constriction location. This result is consistent with the hypothesis (explained above) that quantal, or non-linear relations between constriction location and formant values help to define acoustically-stable articulatory configurations, which may provide part of the basis of phonological features for the vowels (Stevens, 1989). However, unless mechanisms exist to mitigate the sensitivity of formants to constriction degree, the argument for quantal bases for these vowel categories is weakened.
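The percentage-change argument behind this sensitivity to constriction degree can be checked with simple arithmetic. The sketch below assumes, purely for illustration, a rectangular cross-section of fixed width, so that area is proportional to the dorsal-ventral diameter; the function name, width, and diameters are hypothetical values, not measurements from the studies cited.

```python
def percent_area_change(diameter_mm, displacement_mm, width_mm=20.0):
    """Percentage change in cross-sectional area for a dorsal-ventral displacement.

    Assumes area = width * diameter (a hypothetical rectangular cross-section).
    A positive displacement narrows the constriction.
    """
    area_before = width_mm * diameter_mm
    area_after = width_mm * (diameter_mm - displacement_mm)
    return 100.0 * (area_after - area_before) / area_before


# The same 1 mm displacement at a narrow constriction and at a wide opening.
narrow = percent_area_change(diameter_mm=3.0, displacement_mm=1.0)
wide = percent_area_change(diameter_mm=15.0, displacement_mm=1.0)
print(f"narrow (3 mm): {narrow:.1f}%")
print(f"wide (15 mm): {wide:.1f}%")
```

Under these assumptions the same 1 mm displacement changes the area by about a third at a 3 mm constriction but by under 7% at a 15 mm opening.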
[Figure 4: panels (a) and (b); horizontal axis X coordinate (mm), vertical axis Y coordinate (mm); labels: palate, / i /, / u /, / A /.]
Figure 4. Distributions of a point on the tongue dorsum of one speaker for multiple repetitions of the vowels / i /, / u /, and / A / in stressed (a) and unstressed (b) environments. The outline of the posterior portion of the hard palate is shown at the top; anterior is to the right.
[Figure 5: two panels, No bite block and Bite block; labels: Palate, Lips, Incisors, Blade, Body; scale bar 2 cm.]
Figure 5. Data like those in Figs. 3 and 4, from another subject, for points on the tongue blade and body with multiple repetitions of the vowel / u / in two conditions, with and without a bite block.
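Two-sigma ellipses of the kind drawn in these figures can be computed from the covariance of the point positions. The sketch below is one common construction and is not necessarily the procedure used in the cited studies: it assumes that each semi-axis is two standard deviations along a principal component, and the function name and sample points are hypothetical.

```python
import math


def two_sigma_ellipse(points):
    """Return center, semi-axes, and major-axis angle of a two-sigma ellipse.

    points: sequence of (x, y) positions, e.g. in mm.
    The axes follow the principal components of the sample covariance.
    """
    n = len(points)
    mean_x = sum(x for x, _ in points) / n
    mean_y = sum(y for _, y in points) / n
    sxx = sum((x - mean_x) ** 2 for x, _ in points) / (n - 1)
    syy = sum((y - mean_y) ** 2 for _, y in points) / (n - 1)
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in points) / (n - 1)

    # Closed-form eigenvalues of the 2x2 covariance matrix.
    half_trace = (sxx + syy) / 2.0
    root = math.hypot((sxx - syy) / 2.0, sxy)
    var_major = half_trace + root
    var_minor = max(half_trace - root, 0.0)

    # Orientation of the major axis (principal component of variance).
    angle = 0.5 * math.atan2(2.0 * sxy, sxx - syy)
    return {
        "center": (mean_x, mean_y),
        "semi_major": 2.0 * math.sqrt(var_major),
        "semi_minor": 2.0 * math.sqrt(var_minor),
        "angle_deg": math.degrees(angle),
    }


# Hypothetical pellet positions spread mainly along a 45-degree line.
sample = [(-2, -2), (-1, -1), (1, 1), (2, 2), (-0.5, 0.5), (0.5, -0.5)]
ellipse = two_sigma_ellipse(sample)
print(ellipse)
```

Comparing the major-axis angle with the local direction of the vocal-tract midline is one way to quantify how closely a distribution is aligned with it.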
3. Articulatory constraints on point vowel articulations: modeling results
In this section, some results from physiologically-based modeling of the tongue are used to motivate hypotheses 1 and 2 (above), that is, to suggest that the three constriction locations may be “preferred” on anatomical as well as articulatory-to-acoustic bases, and that there may not have to be precise control of dorsal-ventral tongue displacements, at least for / i / and / A /.
3.1. A physiologically-oriented model of the tongue
Fig. 7 illustrates a two-dimensional physiologically-oriented model of the tongue (Perkell, 1974). This model was designed to incorporate approximate representations of several basic physiological and biomechanical properties of the tongue: some of its muscular anatomy, conservation of volume and tissue incompressibility, the impenetrability of the vocal-tract walls, and very roughly, the contractile property of muscle tissue and passive elasticity and viscous properties of connective tissue. The model changed shape only in the midsagittal plane; the third dimension was fixed, so areas in this plane corresponded to volumes. The mass of the tongue is concentrated in sixteen movable “fleshpoints”, represented by large filled circles. Ten of these fleshpoints lie on the tongue surface, and six are internal. The fleshpoints are connected to one another and to hard structures by tension-generating elements, represented by the heavy solid and dashed lines. The points of attachment to hard structures are represented by small filled circles. There are two types of tension-generating elements, active and passive, and each heavy solid or dashed line represents one passive element and usually at least one active element in parallel. The active and passive tension-generating elements are roughly analogous to muscle and connective tissue, respectively. The active elements were arranged anatomically and grouped to comprise ten muscle pairs or components of muscles.
[Figure 6: two panels, / i / (left) and / A / (right); traces for F1, F2, and F3; labels: Front, Back, High, Low, Least constricted, Most constricted.]
Figure 6. Formant values that result from manipulating the location and degree of constriction of an articulatory synthesizer for the vowel / i / on the left, and / A / on the right. The filled circles represent formant values at 1 mm increments of front-to-back tongue-body movement for / i / and low-to-high tongue-body movement for / A /, and the tick marks show formant values at 1 mm increments of tongue-body movement from the least constricted to the most constricted configuration for each vowel.
In Fig. 7, the heavy solid lines and segments of the part of the mandible to which some of the lines are connected define fourteen quadrilaterals, or volumes. Fig. 8 shows an example quadrilateral in detail, with a schematic diagram of each force-generating mechanism. The active tension generator consists of a passive stiffness element, KAp, in series with an active component, which consists in turn of another stiffness element, KAa, connected in parallel with a dashpot, BA. The model is activated by varying KAa. The value of BAa is linearly related to and changes with KAa. The inset in Fig. 8 shows the form of the length-tension relationship for the stiffness elements, and how excitation changes the slope of the length-tension line for KAa. Passive tension is generated by a simple spring, KP, and dashpot, BP, in parallel with one another. In order to provide for the transmission of compression forces and conservation of volume, the area of each quadrilateral has limited “compressibility” and a neutral or rest value. Changes in quadrilateral area cause changes in the “pressure” exerted on the quadrilateral walls. This pressure is resolved into forces acting on the
[Figure 7 labels: Styloid process, Hard palate, Soft palate, Upper incisors, Lower incisors, Dorsal tongue surface, Tongue blade, Tip, Tongue body, Base, Posterior pharyngeal wall, Superior mental spine, Mandible, Hyoid bone, Epiglottis; orientation markers: Dorsal, Ventral.]
Figure 7. Diagram of a two-dimensional physiologically-oriented model of the tongue. The large filled circles represent mass-bearing fleshpoints. The heavy solid (—) and dashed lines (- - -) represent tension-generating elements. See text for details.
fleshpoints. Thus, for this purpose, the walls behave as if they were massless and unbendable. Mechanisms are also included to account for impenetrability and sliding friction. The wide shaded line in Fig. 8 represents the dorsal wall of the vocal tract (and in Fig. 7, the mandible, the hyoid bone, and the styloid process of the temporal bone as well). The dorsal wall and part of the mandible surface form barriers through which the fleshpoints cannot penetrate. When a fleshpoint comes closer than some minimum distance to one of these barriers, an impenetrability force is generated. This force resists the fleshpoint’s movement toward the barrier along an axis perpendicular to the barrier. In addition, sliding friction forces act on the fleshpoints to simulate shear forces generated by the movements of the tongue within the lateral confines of the vocal-tract walls. The arrangement and grouping of active tension-generating elements into ten “muscles” was based on anatomical references and some original dissection work (Perkell, 1974). Most of the muscles run primarily in parasagittal planes (as illustrated below). Geometrical data for determining the model’s overall shape and behavior were taken from cineradiographic tracings of a male speaker (Perkell, 1969). Some of the lines in Fig. 7 represent more than one set of tension-generating
[Figure 8: schematic diagram of one quadrilateral; inset annotation: change in KAa due to changing excitation.]
Figure 8. A quadrilateral formed by four tension-generating elements showing schematic diagrams of the five force-generating mechanisms incorporated into the model (active tension, passive tension, volume-conservation, impenetrability, and sliding friction). The inset shows the length-tension relationship for the active tension-generating mechanism.
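Two of the five mechanisms listed in the caption lend themselves to a compact numerical sketch. The code below is an illustration under stated assumptions, not the 1974 implementation: it treats the passive spring KP as linear about a rest length, adds the dashpot BP term in parallel, computes a quadrilateral's area with the shoelace formula, and models the volume-conservation “pressure” as proportional to the deviation of that area from its rest value. All function names, parameter values, and the linear forms are hypothetical.

```python
def passive_tension(length, velocity, rest_length, kp, bp):
    """Tension of a spring (kp) and dashpot (bp) in parallel: the two terms add.

    Assumes a linear spring about rest_length; velocity is the lengthening rate.
    """
    return kp * (length - rest_length) + bp * velocity


def quadrilateral_area(corners):
    """Area of a simple polygon from its corner coordinates (shoelace formula)."""
    total = 0.0
    count = len(corners)
    for i in range(count):
        x0, y0 = corners[i]
        x1, y1 = corners[(i + 1) % count]
        total += x0 * y1 - x1 * y0
    return abs(total) / 2.0


def area_pressure(area, rest_area, stiffness):
    """Hypothetical linear 'pressure': positive when the area is below its rest value."""
    return stiffness * (rest_area - area)


# A stretched, lengthening element and a slightly compressed quadrilateral.
tension = passive_tension(length=12.0, velocity=0.5, rest_length=10.0, kp=2.0, bp=4.0)
area = quadrilateral_area([(0.0, 0.0), (2.0, 0.0), (2.0, 0.9), (0.0, 0.9)])
pressure = area_pressure(area, rest_area=2.0, stiffness=5.0)
print(tension, area, pressure)
```

In a full simulation the pressure would be resolved into forces on the fleshpoints at the quadrilateral's corners; that step is omitted here because the source does not specify how it is done.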
elements in parallel, corresponding to parts of more than one muscle. Inputs to the model were in the form of time-varying “percentage of maximal physiological excitation” of the ten “muscles”. Excitation of the muscles causes contraction of the active elements which produces movements of the fleshpoints. The main outputs are the time-varying spatial coordinates of the locations of the sixteen fleshpoints. The implementation of the model was in the form of a lumped parameter simulation that was described with a set of state equations. The state variables were the horizontal and vertical components of the position and velocity of each of the sixteen fleshpoints, as well as a variable specifying the state of each active tension-generating element. To help minimize the complexity of the overall model, the model for each of the force-generating mechanisms was relatively simple, and was only expected to reflect the actual mechanism in an approximate manner.
3.2. Production of the vowels / A /, / i /, and / u /
3.2.1. The vowel / A /
The four panels in Fig. 9 show: (a), the configuration of tension-generating elements for the hyoglossus muscle; (b), the change from a neutral (dashed) tongue contour
Figure 9. Production of the vowel / A /. (a) The configuration of tension-generating elements for the hyoglossus muscle; (b) the change from a neutral (- - - -) tongue contour to (—) one caused by contraction of the hyoglossus (—); (c) the tongue shape that results from the addition of contraction of the pharyngeal constrictors; (d) a cineradiographic tracing of the speaker’s production of the vowel / A /.
to (the solid) one caused by contraction of the hyoglossus (shown in heavy solid lines); (c), the tongue shape that results from the addition of contraction of the pharyngeal constrictors (that do not run in a parasagittal plane); and (d), a cineradiographic tracing of a speaker’s production of the vowel / A / . (The speaker in the cineradiographic tracings in Figs. 9, 11, and 13 is the same one whose tracings were used to help specify the geometry of the model—cf. Perkell, 1969; 1974). Normally, the mandible would be more open for / A / , but it is in the same relatively closed position as in the other examples. For the most part, the displacements are what would be expected from the anatomy. Contraction of the hyoglossus pulls the oral part of the tongue dorsum downward and backward and causes the pharyngeal part of the dorsum to bulge backward toward the pharyngeal wall, narrowing the pharyngeal region and producing a tongue configuration somewhat like the vowel / A / [Fig. 9(b)]. The rear-most vertical hyoglossus element in the model seems to shorten the most and the forward-most element the least, possibly because of the greater amount of incompressible tissue that has to be displaced in the center of the tongue body. Since the hyoglossus muscle runs mainly in the vertical direction, only a modest amount of
pharyngeal narrowing is accomplished by an extensive amount of muscle contraction. In other words, a moderate amount of hyoglossus contraction may be insufficient to produce enough pharyngeal constriction for / A /. The addition of the net effect of the pharyngeal constrictors [Fig. 9(c)] provides the needed extra amount of pharyngeal narrowing to achieve a more / A /-like pharyngeal configuration [Fig. 9(d)]. It is important to note that while the effect of the pharyngeal constrictors is represented in the model by two tension-generating elements in the midsagittal plane, they actually form something like a half-tube. The top part of Fig. 10 is taken from an MRI study (Baer, Gore, Gracco & Nye,
Figure 10. Top part: four tracings of MRI cross-sectional images of the pharynx, in the region of maximum constriction for / A / (Baer et al. , 1991). Anterior is upward. The left and right pairs of images are from two subjects; within each pair, the left one is during the production of / A / and the right one is during the production of / i / . Bottom part: plots of vocal-tract cross-sectional area as a function of dorsal-ventral vocal-tract diameter at the places of maximum constriction for / i / , / u / , and / A / .
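The half-tube picture of the pharyngeal constrictors suggests a simple geometric check. The sketch below idealizes the constricted pharynx as a semicircle whose curved border has the length of the shortening muscle fibres, so that the area is C²/(2π) for arc length C. This idealization, the function names, and the lengths used are assumptions for illustration and are not taken from Baer et al. (1991).

```python
import math


def half_tube_area(arc_length):
    """Area of a semicircle whose curved border has the given arc length.

    The radius is arc_length / pi, so the area is arc_length**2 / (2 * pi).
    """
    radius = arc_length / math.pi
    return math.pi * radius ** 2 / 2.0


def area_drop(arc_length, shortening=1.0):
    """Area removed when the curved border shortens by `shortening`."""
    return half_tube_area(arc_length) - half_tube_area(arc_length - shortening)


# The same 1 mm of fibre shortening at a wide and at a narrow pharynx.
drop_wide = area_drop(40.0)
drop_narrow = area_drop(10.0)
print(f"area removed at 40 mm arc: {drop_wide:.2f} mm^2")
print(f"area removed at 10 mm arc: {drop_narrow:.2f} mm^2")
```

Under this idealization a 1 mm shortening removes less area when the arc is short than when it is long, in line with the idea that further contraction becomes less effective as the pharyngeal constriction narrows.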
1991). It shows four tracings of MRI cross-sectional images of the pharynx, in the region of maximum constriction for / A /. Anterior is upward. The left and right pairs of images are from two subjects; within each pair, the left one is during the production of / A / and the right one is during the production of / i /. In effect, the pharyngeal constrictors, forming the bottom and lateral borders in the figures, are represented roughly as a semi-circle. The direction of shortening of the pharyngeal constrictor fibres is, of course, along the circumference of the semi-circle. In combination with the relatively inefficient effect of the hyoglossus in narrowing the pharynx, this mechanism leads to the speculation that the relation between muscle shortening and pharyngeal area constriction becomes increasingly inefficient as the area becomes very small. The plot on the bottom shows vocal-tract cross-sectional area as a function of dorsal-ventral vocal-tract diameter at three regions in the vocal tract: the place of maximum constriction for / i /, / u /, and / A /. These plots are constructed from functions derived by Baer et al. (1991). The curve for the / A / region is shallowest, possibly because of the above-proposed mechanism. Thus, anatomical considerations, some of which are incorporated into the model, lead to the speculation that as the pharyngeal constriction becomes narrow, additional muscle contraction becomes less and less effective in narrowing the area further.
3.2.2. The vowel / i /
The four panels in Fig. 11 show: (a), the configuration of tension-generating elements for the posterior portion of the genioglossus muscle (which is known to have parts that contract independently; Miyawaki, 1974); (b), the change from a neutral (dashed) tongue contour to (the solid) one caused by contraction of the posterior genioglossus (shown in heavy solid lines); (c), the tongue shape that results from the addition of contraction of the mylohyoid muscle (that does not actually run in a parasagittal plane); and (d), a cineradiographic tracing of the speaker’s production of the vowel / i /. Contraction of the posterior portion of the genioglossus pulls the base of the tongue forward toward the genial tubercle of the mandible, producing a somewhat / i /-like tongue shape. The additional action of the mylohyoid produces a shape that is very much like the / i / shape. Whether or not the mylohyoid is actually involved, the kind of mechanism proposed for / A / is not possible for / i /, because the cross-sectional area in the region of maximal constriction in the oral cavity is probably a steeper, more linear function of the dorsal-ventral distance between the tongue surface and hard palate, which may be more linearly related to muscle lengths. This reasoning implies that somewhat less effort is required to completely constrict the vocal tract in the oral region than in the pharynx. Constricting the pharynx involves narrowing the diameter of a half-tube by decreasing its circumference, which is not the case in the oral cavity. This idea is consistent with the relative rarity of pharyngeal consonants among languages of the world (cf. Henton, Ladefoged & Maddieson, 1992).
Fig. 12 is a plot from Fujimura & Kakita (1979), who used a three-dimensional finite-element model of the tongue to generate vocal-tract area functions, from which formant frequencies were calculated.
Their model produced an / i / with contraction of the posterior genioglossus, as shown above with the two-dimensional model. The figure shows the vowel formant frequencies as a function of variation in contraction of the posterior genioglossus (where 100% is the minimum force
Figure 11. Production of the vowel / i /. (a) The configuration of tension-generating elements for the posterior portion of the genioglossus muscle; (b) the change from a neutral (- - - -) tongue contour to (—) one caused by contraction of the posterior genioglossus (—); (c) the tongue shape that results from the addition of contraction of the mylohyoid muscle; (d) a cineradiographic tracing of a speaker’s production of the vowel / i /.
magnitude that produces an / i /-like configuration). The open circles are from a tongue with a relaxed blade, and the filled circles show results with a stiffened tongue blade. As increasing genioglossus activity pushes the stiffened tongue blade harder against the palate, the lateral edges of the blade brace against the sides of the concave palate, so the cross-sectional area does not decrease past a certain point and the formant values remain stable. With a lax tongue blade, the cross-sectional area continues to decrease and the formants continue to change. Thus stiffening the tongue blade creates the possibility for a non-linear relation between genioglossus contraction and area in the region of maximal constriction (called a mechanical “saturation effect” by Fujimura & Kakita); this mechanism contributes to stability of the vowel formants. In Fig. 3, which shows microbeam data for the vowel / i /, the orientation of the rear tongue pellet distribution in the direction perpendicular to
Figure 12. A plot of vowel formant frequencies (in kHz) as a function of variation in contraction of the posterior genioglossus (in %) in a three-dimensional finite-element model of the tongue that was used to generate vocal-tract area functions (where 100% is the minimum force magnitude that causes the model to produce an / i / -like configuration; Fujimura & Kakita, 1979). The open circles (○) are from a tongue with a relaxed blade, and the filled circles (●) show results with a stiffened tongue blade.
the tongue surface is compatible with such a variable amount of contraction of the posterior and middle portions of the genioglossus.¹

¹ In his review, Maeda observes that if the proposed non-linear effect operates in the production of / i / , the distribution of blade points along the dorsal-ventral direction in Fig. 3 should be asymmetric and skewed, presumably with more points concentrated closer to the palate. Instead, the points for both subjects appear from visual inspection to be distributed in a symmetric, Gaussian-like fashion, centered on the major axes of the ellipses. This criticism could be valid; it encourages a statistical analysis of the distributions and further exploration of the saturation effect hypothesis in experiments that combine EMG, force of tongue-to-palate contact, and tongue displacement.

3.2.3. The vowel / u /

The four panels in Fig. 13 show: (a), the configuration of tension-generating elements for the styloglossus muscle; (b), the change from a neutral (dashed) tongue contour to the one (solid) caused by contraction of the styloglossus (shown in
Figure 13. Production of the vowel / u / . (a) The configuration of tension-generating elements for the styloglossus muscle; (b) the change from a neutral (dashed) tongue contour to the one (solid) caused by contraction of the styloglossus (heavy solid lines); (c) the tongue shape that results from the addition of contraction of the posterior genioglossus; (d) a cineradiographic tracing of a speaker’s production of the vowel / u / .
heavy solid lines); (c), the tongue shape that results from the addition of contraction of the posterior genioglossus; and (d), a cineradiographic tracing of the speaker’s production of the vowel / u / . Because of the way the fibres of the styloglossus run forward into the ventral aspect of the tongue blade as they insert into the tongue, contraction of the styloglossus produces an upward bulging of the tongue dorsum in the velo-palatal region as well as a posterior movement of the tongue body. This configuration looks somewhat like the tongue shape for the vowel / u / . Contraction of the posterior genioglossus had to be added to that of the styloglossus in order to pull the tongue root forward and achieve a / u / -like shape. It is compatible with Stevens’ (1972; 1989) constriction-location argument that the constriction produced by the tongue body is in the velo-palatal region, where the formants are relatively insensitive to some variation in constriction location. Thus, a configuration that may be relatively easy to produce contributes to acoustic stability. But what about constriction degree? Presumably, as in the case of / i / , it is
relatively easy to produce a narrow constriction in this area (that is, without excess muscular effort). Stiffening the lateral edges of the blade for bracing against the sides of the palate to facilitate a saturation effect is conceivable, but somewhat less plausible because of the greater width of the posterior part of the palate. On the other hand, / u / is unlike the other two point vowels, in that its acoustics are influenced strongly by two narrow constrictions instead of one. With these two degrees of freedom in the area function, the control mechanism has the option of using adjustments in one constriction to help compensate for imprecise adjustments of the other. In other words, a trading relation between the acoustic effects of lip rounding and tongue-body raising could be used to help constrain acoustic variability in production of the vowel / u / . Similar formants can be produced with slightly increased tongue-body raising and slightly decreased lip rounding, or vice versa. We have found some preliminary evidence for such a trading relation in multiple repetitions of the vowel / u / in five of six speakers tested (Perkell et al., 1993; 1994; 1995).

4. Discussion and conclusions

Wood (1979) used cineradiographic tracings from 38 speakers, analyses of the sensitivity of vowel formants to area perturbations, and considerations of the muscular anatomy to suggest that there are four preferred constriction locations, rather than three. His results showed two pharyngeal locations within about 2 cm of one another; one of those locations is preferred on acoustic bases for rounded vowels and the other is preferred for unrounded vowels. Stevens (1989) points out that rounding affects the acoustically-stable constriction location somewhat, which is consistent with Wood’s result. The above-described tongue model does not provide any anatomical basis for two different preferred pharyngeal locations.
Without a great deal of further work, it is impossible to know how realistic the modeling results are; however, they agree reasonably well with EMG studies (cf. Smith, 1971; Baer, Alfonso & Honda, 1988), especially considering the simplicity of the models and the difficulty and variability of EMG recordings.

The modeling results provide motivation for hypotheses about production of the point vowels. For the two vowels that are each characterized by one narrow constriction, / i / and / A / , acoustic stability is achieved by virtue of a non-linear relation between constriction location and vowel formants, in combination with a physiologically-based, non-linear relation between motor commands and the degree of constriction. For / u / , with two narrow constrictions, there is a similar non-linearity in the relation between the tongue-body constriction location and the vowel acoustics. The tongue-body constriction produced by contraction of the styloglossus muscle is in the region of acoustic stability. If there is no non-linear mechanism to help stabilize the degree of constriction for / u / , that lack might be compensated for by the availability of a trading relation between the acoustic effects of lip rounding and tongue-body raising. To exploit this trading relation, speakers may learn a “motor equivalence control strategy”, which is employed to help constrain acoustic variability (cf. Hughes & Abbs, 1976; Folkins & Brown, 1987; Perkell et al., 1993). This strategy would become incorporated into an automatically-functioning low-level control module that is part of a hierarchical system of speech motor control (Perkell et al., 1995; Wilhelms-Tricarico, this volume).
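The logic of the two hypothesized stabilizing mechanisms can be illustrated with a toy Monte Carlo sketch. It is not derived from the tongue model or from the data cited here: the linear maps, the floor area, the noise levels, and all function names are assumptions chosen only to show the reasoning. In the first part, a constriction area that cannot fall below a floor (the stiffened blade bracing against the palate) varies less than one that keeps narrowing. In the second, an acoustic measure that depends on two gestures varies less when the gestures covary negatively than when they vary independently.

```python
import random
import statistics


def constriction_area(command, blade_stiff):
    """Toy map from genioglossus command (1.0 = the '100%' level) to area in cm^2."""
    area = 2.0 - 1.5 * command           # assumed linear narrowing with command
    floor = 0.5 if blade_stiff else 0.0  # assumed floor set by bracing on the palate
    return max(area, floor)


def acoustic_proxy(raising, rounding):
    """Toy acoustic measure for /u/ that falls with either gesture (assumed weights)."""
    return 1000.0 - 150.0 * raising - 150.0 * rounding


def simulate(n_tokens=2000, seed=1):
    """Return standard deviations of area and of the acoustic proxy over noisy tokens."""
    rng = random.Random(seed)

    # Saturation effect: the same noisy commands, lax versus stiffened blade.
    commands = [rng.gauss(1.15, 0.05) for _ in range(n_tokens)]
    lax = [constriction_area(c, blade_stiff=False) for c in commands]
    stiff = [constriction_area(c, blade_stiff=True) for c in commands]

    # Trading relation: rounding either varies independently of raising,
    # or is lowered when raising is high (motor equivalence) plus small noise.
    raising = [rng.gauss(1.0, 0.1) for _ in range(n_tokens)]
    independent = [acoustic_proxy(r, rng.gauss(1.0, 0.1)) for r in raising]
    traded = [acoustic_proxy(r, 2.0 - r + rng.gauss(0.0, 0.02)) for r in raising]

    return {
        "area_sd_lax": statistics.pstdev(lax),
        "area_sd_stiff": statistics.pstdev(stiff),
        "acoustic_sd_independent": statistics.pstdev(independent),
        "acoustic_sd_traded": statistics.pstdev(traded),
    }


if __name__ == "__main__":
    for name, value in simulate().items():
        print(f"{name}: {value:.3f}")
```

In both comparisons the stabilized condition shows the smaller standard deviation, which is the qualitative pattern the hypotheses predict. The sketch says nothing about whether real speakers behave this way; that question is addressed by the articulatory data.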
The hypotheses are based on a few results with models that are very simple in comparison to the actual production mechanism. However, by attempting to characterize some properties of the production mechanism, the models have helped to generate ideas about the constraining effects of those properties on sound patterns and speech motor control strategies. Such ideas may be explored further with additional experimentation and more sophisticated models (cf. Wilhelms-Tricarico, 1995; this issue).

I am grateful for helpful comments from Melanie Matthies, Daniel Recasens, Mario Svirsky, and Reiner Wilhelms-Tricarico and constructive reviews by Pierre Badin and Shinji Maeda. This work was supported by Grants No. DC00075 and DC01925 from the National Institute on Deafness and other Communication Disorders, National Institutes of Health.
References

Badin, P., Perrier, P., Boë, L.-J. & Abry, C. (1990) Acoustic and articulatory considerations upon formant convergences, Journal of the Acoustical Society of America, 87, 1290–1300.
Baer, T., Alfonso, P. J. & Honda, K. (1988) Electromyography of the tongue muscles during vowels in /əpVp/ environment, Annual Bulletin 22, Research Institute of Logopedics and Phoniatrics, University of Tokyo, 7–19.
Baer, T., Gore, J. C., Gracco, L. C. & Nye, P. W. (1991) Analysis of vocal tract shape and dimensions using magnetic resonance imaging: Vowels, Journal of the Acoustical Society of America, 90, 799–828.
Beckman, M. E., Jung, T.-P., Lee, S., deJong, K., Krishnamurthy, A. K., Ahalt, S. C., Cohen, K. B. & Collins, M. J. (1995) Variability in the production of quantal vowels revisited, Journal of the Acoustical Society of America, 97, 471–490.
Boë, L.-J., Schwartz, J.-L. & Vallée, N. (1994) The prediction of vowel systems: perceptual contrast and stability. In Fundamentals of Speech Synthesis and Speech Recognition (E. Keller ed.), 185–213, New York: John Wiley & Sons.
Fant, G. (1960) Acoustic Theory of Speech Production, The Hague: Mouton.
Folkins, J. W. & Brown, C. K. (1987) Upper lip, lower lip and jaw interactions during speech: comments on evidence from repetition-to-repetition variability, Journal of the Acoustical Society of America, 82, 1919–1924.
Fujimura, O. & Kakita, Y. (1979) Remarks on quantitative description of the lingual articulation. In Frontiers of Speech Communication Research (B. Lindblom & S. Öhman eds.), 17–24. London: Academic Press.
Gay, T., Boë, L.-J. & Perrier, P. (1992) Acoustic and perceptual effects of changing vocal-tract constrictions for vowels, Journal of the Acoustical Society of America, 92, 1301–1309.
Henton, C., Ladefoged, P. & Maddieson, I. (1992) Stops in the world’s languages, Phonetica, 49, 65–101.
Hughes, O. M. & Abbs, J. H. (1976) Labial-mandibular coordination in the production of speech: implications for the operation of motor equivalence, Phonetica, 33, 199–221.
Jakobson, R., Fant, G. & Halle, M. (1963) Preliminaries to speech analysis. Cambridge, MA: M.I.T. Press.
Lindblom, B. (1983) Economy of speech gestures. In The Production of Speech (P. MacNeilage ed.), New York: Springer-Verlag.
Lindblom, B. & Engstrand, O. (1989) In what sense is speech quantal?, Journal of Phonetics, 17, 107–121.
Maddieson, I. (1984) Patterns of Sounds, Cambridge Studies in Speech Science and Communication, Cambridge: Cambridge University Press.
Miyawaki, K. (1974) A study of the musculature of the human tongue, Annual Bulletin of the Research Institute of Logopedics and Phoniatrics, 8, 23–50, University of Tokyo.
Perkell, J. S. (1969) Physiology of speech production: results and implications of a quantitative cineradiographic study, Cambridge, MA: M.I.T. Press.
Perkell, J. S. (1974) A physiologically-oriented model of tongue activity during speech production, Ph.D. Thesis, Massachusetts Institute of Technology.
Perkell, J. S. & Cohen, M. H. (1989) An indirect test of the quantal nature of speech in the production of the vowels / i / , / a / , and / u / , Journal of Phonetics, 17, 123–133.
Perkell, J., Cohen, M., Svirsky, M., Matthies, M., Garabieta, I. & Jackson, M. (1992) Electro-magnetic midsagittal articulometer (EMMA) systems for transducing speech articulatory movements, Journal of the Acoustical Society of America, 92, 3078–3096.
Perkell, J. S., Matthies, M. L. & Svirsky, M. A. (1994) Articulatory evidence for acoustic goals for consonants, Journal of the Acoustical Society of America, 96, No. 5, Pt. 2, 3325(A).
Perkell, J. S., Matthies, M. L., Svirsky, M. A. & Jordan, M. I. (1993) Trading relations between tongue-body raising and lip rounding in production of the vowel / u / : a pilot motor equivalence study, Journal of the Acoustical Society of America, 93, 2948–2961.
Perkell, J. S., Matthies, M. L., Svirsky, M. A. & Jordan, M. I. (1995) Goal-based speech motor control: a theoretical framework and some preliminary data, Journal of Phonetics, 23, 23–35.
Perkell, J. S. & Nelson, W. L. (1985) Variability in production of the vowels / i / and / a / , Journal of the Acoustical Society of America, 77, 1889–1895.
Smith, T. (1971) A phonetic study of the extrinsic tongue muscles, Working Papers in Phonetics, 18, U.C.L.A.
Stevens, K. N. (1972) The quantal nature of speech: evidence from articulatory-acoustic data. In Human communication: a unified view (P. B. Denes & E. E. David eds.), 51–56, New York: McGraw-Hill.
Stevens, K. N. (1989) On the quantal nature of speech, Journal of Phonetics, 17, 3–45.
Wilhelms-Tricarico, R. (1995) Physiological modeling of speech production: methods for modeling soft-tissue articulators, Journal of the Acoustical Society of America, 97, 3085–3098.
Wilhelms-Tricarico, R. (1996) Biomechanical and physiologically based speech production modeling, Journal of Phonetics, 24, 23–28.
Wood, S. (1979) A radiographic analysis of constriction locations for vowels, Journal of Phonetics, 7, 25–43.