Journal of Phonetics (1985) 13, 189-203
Distinguishability of tongue shape during vowel production
Kathleen A. Morrish*, Maureen Stone†, Thomas H. Shawker* and Barbara C. Sonies†
Department of *Diagnostic Radiology and †Rehabilitation Medicine, National Institutes of Health, Bethesda, MD 20205, U.S.A.
Received 7th December 1984
Abstract
Five repetitions of each of the vowels /i/, /u/, /o/, /æ/ and /a/ were produced by a normal volunteer using the carrier phrase "I can say /dVd/ again". An ultrasound sector scanner was used to obtain midsagittal pictures of the posterior section of the tongue from vertical to about 85° posteriorly. The pictures were digitized, and the resulting points were used to generate average tongue surface shapes and average polynomial fits for each vowel. The mean variation about the average tongue position was about 1 mm, and variation appeared to be smaller than this at positions where the importance of tongue position is thought to be greatest. The position of the tongue surface was significantly different for each of the vowels. The coefficients of polynomials of as low as second order distinguished between the five vowels considered. However, even seventh order fits did not reflect certain details of tongue configuration which were visible in the average tongue shapes.
Introduction
In speech research, the tongue surface is typically studied either in relation to the vocal tract as a whole (Kent & Moll, 1982; Wood, 1979; Harshman, Ladefoged & Goldstein, 1977; Shirai & Honda, 1976; Mermelstein, 1973a; Perkell, 1969) or in relation to specific articulators such as the jaw and lips (e.g. Alfonso & Baer, 1982; Lubker, McAllister & Lindblom, 1977; Kuehn & Moll, 1976; Lindblom & Sundberg, 1971). Mathematical models have been proposed to represent the vocal tract or tongue surface economically (cf. Hashimoto & Tanimoto, 1978). Lateral X-ray images have been the basis of these studies because, until recently, X-ray fluoroscopy was the only way to visualize the entire tongue surface. While many researchers have analysed tongue shapes (e.g. Hashimoto & Sasaki, 1982; Mermelstein, 1973b), no studies have explicitly shown the existence of reliable parameters which characterize them. This is in large part because multiple measurements are needed to examine intrasubject variability, and X-ray fluoroscopy cannot be used for multiple studies of a single subject because of the hazard of ionizing radiation.
The data used in the present study are real-time ultrasonic images of the tongue taken in midsagittal position. Because ultrasound has no known harmful side effects, there is no quantity or time restriction in data collection.
*Portions of this work were presented at the 107th meeting of the Acoustical Society of America in May, 1984.
0095-4470/85/020189+15 $03.00/0 © 1985 Academic Press Inc. (London) Limited
In a previous study (Stone, Sonies, Shawker, Weiss & Nadel, 1983) the reliability of the measurement techniques outlined in the next section was confirmed. The two judges who performed all of the measurements in this study collected tongue shape data from a single image five times each on two different days. The standard deviation about each point for each subject was about 0.5 mm. This experiment has been repeated on other images with similar results.
A second study (Morrish, Stone, Sonies, Kurtz & Shawker, 1984) demonstrated the reliability of several numerical parameters of quadratic and cubic fits to data obtained by the above techniques. In that study, a single image of each of the five vowels used in this study was digitized five times by two judges and the resulting sets of data were fit with quadratic and cubic functions. The standard deviations of the constant, x and x² coefficients were less than 2% of an appropriate reference value in each case. These parameters were examined with the hope that a few of them could be found which describe tongue shape in an objective manner. In the present research, further progress is achieved by showing that the previously described curve fit parameters can distinguish between vowels in a single speaker.
It would be pointless to consider whether such parameters could distinguish between vowels if the shape of the posterior midsagittal tongue surface did not vary significantly between vowels. Several earlier studies have focused on related questions; for instance, Shirai & Honda (1976) indicate that the shape of the tongue from X-ray tracings for the five Japanese vowels /i/, /u/, /o/, /a/ and /e/ can be distinguished by certain parameters of a linear transformation. However, such tracings yield information on the highest surface of the tongue, not on the midsagittal section.
Ultrasound studies of coronal tongue sections indicate a midsagittal groove in many speakers (Shawker, Sonies & Stone, 1984; Shawker & Sonies, 1984). In particular, the subject who participated in the present study produced a midsagittal groove for every vowel considered here. Attempts to measure the position of the midsagittal tongue include those in which lead pellets are fixed to the tongue surface and tracked by an X-ray microbeam. The data used by Perkell & Nelson (1982) are an example of this. The present study differs from X-ray microbeam studies in several respects. First, with ultrasound, only posterior tongue position is examined. Second, the points measured do not represent fixed positions on the tongue surface. Finally, no foreign objects are introduced into the oral cavity.
Because of such differences between our work and that of other researchers, it was felt that distinguishability of vowels for the current measurement system should be verified. To this end, a mathematically sound method was formulated for computing an average tongue shape from several sets of data points which represent different productions of the same vowel by the same subject. The resulting average curves were compared for differences in tongue position to determine whether they distinguished between vowels. In addition, both the curve fits and average curves were analysed for their ability to model the shape of the tongue.
Method
Data acquisition
The vowels chosen for study were selected for their roles relative to several models of speech production. /i/, /u/ and /a/ are the extremes in the traditional "point" theory of vowel production (cf. Wood, 1975). This theory was based upon oral inspection and acoustic patterns. More recent models, based on X-ray data (e.g. Wood, 1979), redefine vowel positions. /o/ and /æ/ are expected to behave differently under each of these theories. Therefore, they were included to determine whether the current methods support one theory over the other, or suggest a different model altogether.
Images of the midsagittal section of the tongue during production of the five vowels in the phrase "I can say /dVd/ again" were obtained by use of a mechanical sectoring real-time ultrasonic scanner (Advanced Technology Laboratory). The five phrases were read aloud in order and repeated five times at a single session. The word "I" was stressed to prevent undue emphasis on the target word. The subject rehearsed the phrase as many times as needed to produce the correct stress pattern before measurements were taken.
A 3.0 MHz transducer was used to scan an 85° sector of the tongue at a rate of 37 frames per second. The focal length of the transducer was 5 cm (3-8 cm focal zone). Axial resolution at the focal depth was 1.0 mm; lateral resolution was 1.9 mm. Nine gray scale levels were used. The scanner was placed submentally, 2 cm behind the mental symphysis at midline, and the beam was directed back into the oral cavity (Fig. 1). The subject swallowed and sustained several isolated vowels to ensure that the hyoid bone was visible at the edge of the sector. Once placed, the transducer was held manually by a radiologist who made a concerted effort to maintain it at a constant position and pressure during recording. Thus, the transducer rode along with the jaw, and effects due to jaw movement were removed.
The resulting midsagittal 85° scan provided an ultrasound image of the posterior tongue surface (Fig. 2). These images were stored on videotape, and for each vowel production a photograph was made of the single frame at which the tongue reached its point of maximum displacement. The front of the tongue was always located at the left of the picture. A radial grid, superimposed on the photograph, was used to locate specific points on the surface of the tongue.
Figure 1. The scanning sector is directed back into the resting throat until the hyoid bone is visible. The transducer is represented by the small circle below the chin.
The origin of the coordinate system was located at the mid-point of the transducer line. This line was 2 cm above the hub of the piezoelectric crystals which generated the 85° sector. Thus, the data subtended an angle greater than 85° about the chosen coordinate system. The lateral-most radii of the grid were 180° apart, and 37 radii divided the image into 5° sectors. Both the picture and the grid were placed on a Summagraphics graphic tablet digitizer. Data points of interest were entered into a computer using a digitizer pen or a cross-hair cursor. The resulting sets of data contained 26-30 points each, depending upon the shape of the tongue and the clarity of the picture. Measurements obtained by digitization were sent to a DECsystem-10 computer for storage and processing, along with alphanumeric labeling and coding data (see Stone et al. (1983) for a detailed explanation of this measurement system). Figure 3 illustrates the orientation of the tongue relative to the coordinate system used.
Figure 2. Ultrasound image of the midsagittal tongue surface.
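The radial grid described above can be related to the x-y coordinates used in the later figures with a short sketch. This is only an illustration under stated assumptions: the function name `grid_point_to_xy`, the choice to measure angles from the posterior (positive x) side, and the idea that a point is recorded as a radius index plus a distance are hypothetical and are not taken from the measurement system itself.

```python
import math

# 37 radii spanning 180 degrees in 5 degree steps, as described in the text.
GRID_STEP_DEG = 5
GRID_ANGLES_DEG = list(range(0, 181, GRID_STEP_DEG))

def grid_point_to_xy(radius_index, r_mm):
    """Convert a point on one grid radius to Cartesian coordinates (mm).

    Assumption: radius 0 points along +x (posterior, right of the picture),
    radius 18 points straight up (+y), and radius 36 points along -x
    (anterior, left of the picture). The origin is the mid-point of the
    transducer line.
    """
    theta = math.radians(GRID_ANGLES_DEG[radius_index])
    return r_mm * math.cos(theta), r_mm * math.sin(theta)

# Example: a point 40 mm from the origin on the vertical radius.
x_mm, y_mm = grid_point_to_xy(18, 40.0)
```

With this convention the front of the tongue has negative x-values, which matches the statement that the front of the tongue lies at the left of each picture.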
Data analysis
There is no a priori reason to believe that every region of the tongue will differ between productions of two different vowels. Factors which affect formation of speech sounds outside of the region imaged, such as the shape of the lips, may actually provide the definitive character of certain vowels. Therefore, whenever tongue position is measured and analysed for more than one vowel, the researcher must demonstrate that the tongue shapes for each vowel differ enough from one another that the results obtained are not attributable to the normal variations within the vowels tested. One of the questions that this experiment was designed to answer was whether or not the shape of the posterior midsagittal tongue surface varies significantly between the five vowels tested for this subject.
If the answer to this question is affirmative, then it might be possible to obtain parameters which distinguish between the vowels tested. Such parameters would allow the tongue shapes of speakers to be measured in an objective, quantitative manner. Once again, there is no a priori reason to believe that every parameter chosen will distinguish between two different vowels. First, a parameter must be reliable. Several quantities satisfy this criterion for the grid-digitized ultrasound data used in this study. Among these quantities are the coefficients of the x² and x terms, the constant term and the location of the local maximum value of quadratic and cubic fits to the data. The positions and finite difference slopes have also been shown to be reliable (Morrish et al., 1984). Second, the parameter chosen must differ significantly between vowels. Therefore, this experiment was designed to determine whether any of the parameters previously determined to be reliable satisfy this second criterion.
Figure 3. Orientation of the tongue surface relative to the coordinate system.
Average tongue shapes
An average tongue shape during production of a given vowel was obtained by combining the information contained in five repetitions of the same sound. A number of strategies for performing this task were considered. First, an interpolation procedure was needed because not all data sets comprised the same x-values. For simplicity, linear interpolation was chosen. This choice also had the virtue of consistency; all interpolations were performed in the same fashion regardless of their position along the curves. Not all interpolation schemes have this property (see, for example, Boyce & DiPrima, 1970).
Once the five tongue shapes had been represented by five piecewise linear curves, an average tongue shape was sought. It was defined to be that curve which, at a given x-value, minimized the root mean squared (RMS) error between its y-value and those of the component curves. Thus, the y-value sought was the average of the y-values of the component curves. The average curve is also piecewise linear. Figure 4 illustrates this process in the case of two component data sets.
Curve fits
The same 25 sets of data used in the average tongue shape analysis were employed for this study. Each set of data was fit with polynomials of order 0-7 using an LU decomposition technique on a microcomputer. Figure 5 is an example of a cubic fit to a production of /i/. Due to roundoff errors, higher order fits could not be reliably performed. The accuracy of the algorithm for fits of order 0-7 was checked against a commercially available curve fit routine on a microcomputer (Knott & Shapiro, 1981) and found to agree to within 1% in all coefficients.
Again it was desired to characterize the vowels by combining the information contained in several repetitions of the same sound. In this case, all repetitions of a given vowel were considered of equal importance in determining an overall representation.
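The curve-fitting step, together with the coefficient averaging that the text goes on to justify in the Appendix, can be sketched as follows. This is a minimal illustration rather than the routine used in the study: it forms the normal equations and solves them by Gaussian elimination with partial pivoting, which stands in for the LU decomposition mentioned above, and the names `fit_polynomial` and `average_polynomial` as well as the sample numbers are hypothetical.

```python
def fit_polynomial(xs, ys, order):
    """Least-squares polynomial fit.

    Returns coefficients [c0, c1, ..., c_order], lowest power first.
    Assumes at least order + 1 distinct x-values.
    """
    n = order + 1
    # Normal equations A c = b, with A[j][k] = sum(x^(j+k)), b[j] = sum(y x^j).
    A = [[sum(x ** (j + k) for x in xs) for k in range(n)] for j in range(n)]
    b = [sum(y * x ** j for x, y in zip(xs, ys)) for j in range(n)]
    # Forward elimination with partial pivoting.
    for col in range(n):
        pivot = max(range(col, n), key=lambda r: abs(A[r][col]))
        A[col], A[pivot] = A[pivot], A[col]
        b[col], b[pivot] = b[pivot], b[col]
        for row in range(col + 1, n):
            factor = A[row][col] / A[col][col]
            for k in range(col, n):
                A[row][k] -= factor * A[col][k]
            b[row] -= factor * b[col]
    # Back substitution.
    c = [0.0] * n
    for row in range(n - 1, -1, -1):
        s = sum(A[row][k] * c[k] for k in range(row + 1, n))
        c[row] = (b[row] - s) / A[row][row]
    return c

def average_polynomial(coefficient_sets):
    """Average fits of equal order by averaging their coefficients term by term."""
    return [sum(col) / len(coefficient_sets) for col in zip(*coefficient_sets)]

# Hypothetical example: two noiseless "productions" of similar quadratic shape.
xs = list(range(-40, 35, 5))
fit_a = fit_polynomial(xs, [30 - 0.2 * x - 0.015 * x * x for x in xs], 2)
fit_b = fit_polynomial(xs, [32 - 0.1 * x - 0.017 * x * x for x in xs], 2)
mean_fit = average_polynomial([fit_a, fit_b])
```

Normal equations become poorly conditioned as the order grows, so a sketch of this kind is best limited to low orders; the text reports a similar roundoff limit for its own routine beyond order 7.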
It is shown in the Appendix that, for a fit of given order, the proper average polynomial is that function whose coefficients are the averages of the coefficients of the component polynomials.
In a previous paper (Morrish et al., 1984) the constant terms and the coefficients of the x and x² terms of quadratic and cubic polynomials were shown to be reliable parameters of tongue shape. To determine whether they also vary between vowels in a consistent manner, the coefficients were compared pair-wise using the Wilcoxon statistic (Bickel & Doksum, 1977). This statistic is nonparametric, so no assumptions about the data are made beyond the existence of continuous distribution functions. If the lower order fits proved inadequate to distinguish between the vowels examined, the higher order fits could be analyzed. However, in the interest of efficiency, the lowest order polynomials which distinguished between the vowels were of the most interest.
Figure 4. An example of the averaging technique employed in this study. (a) The two component data sets with points denoted by ○ and □. (b) The coordinate pairs connected by line segments. (c) The piecewise linear curve with corners marked by solid circles represents the average curve. Axes: X (mm), Y (mm).
Figure 5. The best cubic fit to a single production of the phoneme /i/. The front of the tongue is to the left, the center baseline at the origin. Axes: X (mm), Y (mm).
Results
Curve averaging
Figure 6 shows the average tongue shapes plotted along with the five sets of data points from which they were calculated. The apparent smoothness of these piecewise-linear curves derives from the fact that about 140 points are used to generate each of them. Note also that the points graphed do not always lie exactly on a radius of the grid. This is due to measurement error.
Table I lists an estimate of overall variability in position for the five vowels. These values represent overall variability, which is the aggregate of instrumental variation (arising from such factors as transducer positioning), measurement error (in locating points on the photographs of the ultrasound image) and physiological variability in tongue position within a vowel. Measurement errors have been computed previously, and average about 0.5 mm (cf. Stone et al., 1983). The impact of instrumental variability is unknown.
However, if the three types of errors are assumed to be normally distributed with zero mean, it is possible to obtain an approximate upper bound on biological and instrumental variability, in the sense of RMS error, of about 1 mm (Bickel & Doksum, 1977). Note that this is an estimate of overall variability. Table I indicates that variation may be greater for some vowels than for others, but it is impossible to determine this from the available data. Figure 6 illustrates that variation may not be constant along the midsagittal tongue surface within a single vowel, but an a posteriori analysis of the variation is not advisable. It is almost certain that there will be a region over which variation is significantly larger or smaller than average by random chance.

Table I. Root mean squared error of tongue position within vowels

Phoneme   Total variation (mm)   Percentage of average range of y-values
/i/       1.36                   4
/u/       1.22                   3
/o/       1.30                   4
/æ/       0.95                   3
/a/       1.32                   4

Figure 6. Average tongue shapes plotted along with the five sets of data points from which they were calculated. The front of the tongue is to the left, the center baseline at the origin. Observe the changes in variation along the tongue surface, and the consistent dips associated with muscle constrictions. The vowels represented, in order, are /i/, /u/, /o/, /æ/ and /a/. Axes: X (mm), Y (mm).

Figure 7 exhibits the five averaged tongue shapes. To determine whether they were significantly different, the closeness of pairs of average curves was measured, using the RMS error between y-values at x-values which corresponded to original data points for each vowel. Table II exhibits the results. Note that the average difference in position of the tongue during production of two different vowels may be less than 3 mm (as with /i/ and /u/) or nearly 14 mm (as with /i/ and /o/).
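The averaging and comparison steps can be sketched together. The sketch below is an illustration under stated assumptions, not the study's program: it averages piecewise linear curves at the union of their x-values, restricted to the x-range that all curves share (the handling of curve ends is an assumption), and it measures the RMS difference between two curves at a caller-supplied list of x-values. The function names and the toy data are hypothetical.

```python
def interpolate(xs, ys, x):
    """Evaluate the piecewise linear curve through (xs, ys) at x.

    xs must be strictly increasing and x must lie within [xs[0], xs[-1]].
    """
    for k in range(len(xs) - 1):
        if xs[k] <= x <= xs[k + 1]:
            t = (x - xs[k]) / (xs[k + 1] - xs[k])
            return ys[k] + t * (ys[k + 1] - ys[k])
    raise ValueError("x lies outside the sampled range")

def average_curve(curves):
    """Average several (xs, ys) curves; the result is again piecewise linear.

    The corners of the average fall at the union of the component x-values,
    here limited to the range covered by every curve (an assumption).
    """
    lo = max(xs[0] for xs, _ in curves)
    hi = min(xs[-1] for xs, _ in curves)
    knots = sorted({x for xs, _ in curves for x in xs if lo <= x <= hi})
    mean_y = [sum(interpolate(xs, ys, x) for xs, ys in curves) / len(curves)
              for x in knots]
    return knots, mean_y

def rms_between(curve_a, curve_b, xs_eval):
    """RMS difference in y between two curves at the x-values in xs_eval."""
    squares = [(interpolate(*curve_a, x) - interpolate(*curve_b, x)) ** 2
               for x in xs_eval]
    return (sum(squares) / len(squares)) ** 0.5

# Toy data: two short curves sampled at different x-values.
curve_1 = ([0, 2, 4], [0, 2, 0])
curve_2 = ([0, 1, 4], [1, 1, 1])
knots, mean_y = average_curve([curve_1, curve_2])
```

Because every component x-value becomes a corner of the average, five productions of 26-30 points each give an average curve with roughly 140 corners, in line with the figure quoted for Fig. 6.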
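The values in Table I can also be used to check two estimates quoted in the text: the bound of about 1 mm on combined biological and instrumental variability, and the confidence bands given in the Discussion for averaged curves. The sketch below rests on assumptions that the text does not state. It treats the error sources as independent so that their variances add, and it uses the 0.95 quantile of Student's t distribution with n - 1 degrees of freedom (2.132 and 1.833, taken from standard tables) as the band multiplier, because that choice reproduces the quoted half-widths.

```python
import math

# Within-vowel RMS error from Table I (mm).
TABLE_I_RMS_MM = {"i": 1.36, "u": 1.22, "o": 1.30, "ae": 0.95, "a": 1.32}
# Measurement error reported from the earlier reliability study (mm).
MEASUREMENT_RMS_MM = 0.5
# 0.95 quantiles of Student's t for 4 and 9 degrees of freedom (standard tables).
T_095 = {4: 2.132, 9: 1.833}

def mean_total_rms():
    """Average of the five Table I values."""
    return sum(TABLE_I_RMS_MM.values()) / len(TABLE_I_RMS_MM)

def residual_rms(total_rms, measurement_rms):
    """RMS left after removing measurement error, assuming variances add."""
    return math.sqrt(total_rms ** 2 - measurement_rms ** 2)

def band_half_width(sd_mm, n_productions):
    """Half-width of the band around an average of n_productions curves."""
    return T_095[n_productions - 1] * sd_mm / math.sqrt(n_productions)

sd = round(mean_total_rms(), 2)                       # 1.23 mm
non_measurement = residual_rms(sd, MEASUREMENT_RMS_MM)
band_5 = band_half_width(sd, 5)
band_10 = band_half_width(sd, 10)
```

Under these assumptions the residual comes to roughly 1.1 mm, and the bands for five and ten productions round to 1.17 mm and 0.71 mm per side.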
Using the values in Tables I and II and a normal approximation, all but one of the tongue-shape pairs were significantly different at the p = 0.05 level.

Table II. Root mean squared error between vowels (mm)
/i/ /a/ /o/ /u/ 2.47*
10.35† 8.73† 3.24‡
/u/ /o/ /æ/
13.99† 12.47† 7.54† 4.36§
*p = 0.18. †p < 0.01. ‡p = 0.04. §p = 0.01.

Figure 7. All five averaged vowels: the front of the tongue is to the left, the center baseline at the origin. Notice the rightward progressions of both the points of maximum displacement and the posterior muscle constrictions (see also Figs 3, 5 and 6a and b) in the order /i/, /u/, /o/, /æ/ and /a/. Axes: X (mm), Y (mm).

Curve fits
Figure 8 is a representative example of the average quadratic and cubic fits plotted along with the five sets of data points from which they were calculated. Table III exhibits the RMS errors between the average curves and the data. This estimate of error takes into account the order of the polynomial by dividing the sum of squares of differences by the number of differences minus the number of fitted coefficients (the order of the polynomial plus one). Hence, the RMS error may be larger for a higher order fit, as can be seen in the fifth and sixth order fits to /o/.

Figure 8. Average quadratic and cubic fits to /u/ plotted along with the five sets of data points from which they were calculated. The front of the tongue is to the left, the center baseline at the origin. Notice that the quadratic fit does not represent the shape of the tongue surface well. The cubic fit, which matches the general configuration of the tongue, fails to reflect certain small but consistent features, such as the posterior muscle constrictions. Axes: X (mm), Y (mm).

Table III. Root mean squared error between average polynomial fits and data (mm)

          Order of polynomial
Vowel     0       1      2      3      4      5      6      7
/i/       11.76   8.82   4.78   1.78   1.78   1.78   1.76   1.77
/u/       12.39   9.07   3.73   1.97   1.92   1.64   1.60   1.61
/o/       10.91   9.46   2.43   1.96   1.61   1.49   1.56   1.59
/æ/       8.93    8.15   1.71   1.32   1.13   1.10   1.10   1.11
/a/       8.84    8.59   1.92   1.66   1.37   1.36   1.35   1.36

Discussion
Table II holds the answer to the question asked in the introduction to this paper: does tongue shape differ significantly between vowels? In general, the answer is affirmative. Only one pair, /i/ and /u/, fails to differ at the p = 0.05 level. There are several reasons for this. First, this subject's speech is typical of the Maryland dialect in that he produces an "anterior" /u/. Subjects with different backgrounds may be expected to exhibit more distinct tongue locations for these two vowels. Second, the alveolar context of the carrier phrase encouraged production of a more anterior /u/. Finally, it is well known that the addition of lip rounding in /u/ lowers the first and second formants, providing a more /u/-like quality to the acoustic output (cf. Pickett, 1980). This increases the acoustic difference between /i/ and /u/, offsetting the effects of tongue shape similarity.
Nevertheless, this subject's /i/ and /u/ can be distinguished when only a selected portion of the tongue is considered. It is known that the front of the tongue rises higher in the mouth during production of /i/ than it does while /u/ is being produced. To use this phenomenon to help distinguish between /i/ and /u/, a portion of the tongue toward the front was selected. The first two points taken from the front of the scan were discarded because air in the mouth under the tongue tip makes the ultrasound scan less clear anteriorly. A section 20 mm long in the x-direction was chosen so that at least five points from each set of data could be included. The interval used was [-50 mm, -30 mm]. Over this region, a one-tailed test can be performed to provide additional power of discrimination.
Note that it is essential that the criteria, such as the interval over which the curves are to be compared, be set prior to obtaining data. Noise and physiological variation in vowel production may occasionally make random portions of the tongue appear significantly different between vowel productions when in fact these portions are similar. Trying to show that two shapes are different by first locating a region in which the data differ and then testing that region almost guarantees that any two shapes will test as significantly different.
When calculations are performed on the restricted region discussed above, /i/ and /u/ turn out to be different at the p = 0.05 level (one-tailed). Therefore, even with only five productions of each vowel, it may be said that all of the vowels tested involve significantly different tongue shapes.
Although curve averaging was undertaken to determine whether or not specific vowels could be distinguished from one another, it has yielded other interesting results as well.
The fact that the RMS error in tongue position within vowels may be as large as 1 mm, and that /i/ and /u/ are often within a few millimeters of one another, makes it inadvisable to use a single vowel production to represent the behaviour of the tongue during general production of the vowel. Figure 9 illustrates how a researcher could draw misleading conclusions by observing two single vowel productions. Although tongue position during production of /i/ is higher in the front of the scan than during production of /u/ on average (Fig. 7), Fig. 9 shows that a single production of /u/ may exhibit a position that closely coincides with a single production of /i/. However, averaging several productions of each vowel reduces the uncertainty associated with average tongue shape by a factor proportional to the inverse of the square root of the number of productions averaged, assuming a normal distribution of y-values at a fixed x-value. Thus, the more images used, the better the resulting estimate. For a standard deviation of 1.23 mm (the average of the values in Table I) and five productions, the 95% confidence interval around the averaged vowel is a band 1.17 mm on a side. For 10 productions, the band drops to 0.71 mm per side. Although a patient cannot be scanned with X-ray cineradiography 10 times per vowel, ultrasound makes multiple repetitions feasible.
Figure 9. Single productions of the phonemes /i/ and /u/. The front of the tongue is to the left, the center baseline at the origin. Notice that, for this pair, /i/ and /u/ appear not to differ in the front of the scan. We have seen that, on average, /i/ is significantly higher than /u/ in this region. Thus, erroneous conclusions about the tongue configuration may be drawn if the number of phoneme productions examined is small. Axes: X (mm), Y (mm).
Figure 6 suggests that variation along the tongue surface is not constant. We conjecture that the position of the tongue may be more important or restricted at points of small variation than at points with large variation. For instance, a region of small variation in each figure moves rightward (toward the posterior tongue) in the order /i/, /u/, /o/, /æ/ and /a/. This roughly corresponds to the traditional order in which the maximum displacement of the tongue moves from front to back (Stevens, 1972). The locations of these small variations in tongue positioning, therefore, may represent constrictions in the vocal tract which are important for vowel production. On the other hand, the large variation at the left side of Fig. 6c may reflect the relative unimportance of the position of the front of the tongue as compared to the posterior tongue and lips in /o/. A similar large variation in /u/ can be explained in this manner. The large variation at the back of the throat in /i/ suggests that this vowel gets its character from tongue features farther forward in the oral cavity. Verification of these observations awaits further study.
About a quarter of the way from the right side of the scans in Fig. 6 can be seen a small dip in the tongue surface. This feature moves rightward (posteriorly) in the order /i/, /u/, /o/, /
low-order polynomial fit will reflect only the general shape of the tongue surface. Because the regions at the back of the tongue which are concave up represent a small perturbation in tongue position relative to its overall behavior, the fits which were performed failed to model them. By choosing a polynomial of higher order, this feature could be taken into account. However, higher order polynomials are more apt to reflect measurement error, so that the coefficients of powers of x decrease in reliability as the power of x increases.
This problem with higher order polynomials is reflected in Table III. As the order increased from zero, the RMS error decreased. At some point, however, RMS error increased with increasing order for all vowels. This occurred because the additional term added to the polynomial did not change the shape of the function enough to overcome the loss of a degree of freedom to the fit. Little or no advantage is gained by increasing the order of fit beyond 3 for /i/, 4 for /æ/ and /a/, or 5 for /u/ and /o/.
It is difficult to compare these results with those of other models, due to differences in technique and experimental goals. For example, Hashimoto & Tanimoto (1978) report a standard error of 1.5 mm in conic section fits to the shape of the highest surface of the tongue over most of its length. This can be expected to compare unfavourably with the present results (divide the numbers in Table III by √5) because a greater percentage of the tongue was considered. The PARAFAC procedure used by Harshman et al. (1977) was applied to single tokens from multiple subjects rather than multiple tokens from a single subject. In addition, the vector components of a factor analysis vary from study to study.
Thus, comparing the results of such an analysis to a polynomial fit, in which the "basis functions" (the powers of the independent variable x) remain constant between studies, is as much a philosophical task as it is a mathematical one.
Although the low-order polynomial fits do not reflect details of tongue shape, some of them can still be used to distinguish between vowels. The coefficient of the x² term in the cubic equations differs between pairs of vowels at the p = 0.02 level (Wilcoxon). This parameter represents physically the change in slope of the tongue surface at an angle of about 45° to the vertical back into the throat. To achieve similar results using quadratic equations, both the x² coefficient and one of the other two parameters must be considered (Fig. 10). However, both types of polynomial fit produce parameters which vary significantly between vowels. Therefore, these parameters can be used to distinguish between vowels, and to characterize the associated tongue shape. Fits of order 0 and 1 did not distinguish between the vowels tested. Higher order fits were not analysed for distinguishability, because the lower order fits were adequate.
Conclusion
From the analysis of average tongue shape, it can be seen that the shape and position of the posterior midsagittal tongue during production of the vowels /i/, /u/, /o/, /æ/ and /a/ varied significantly between vowels for the speaker studied. The average technical and physiological variability within a vowel was about a millimeter for this speaker. In addition, total variability seemed to decrease over the length of the tongue scanned as the importance of tongue position in the vocal tract increased.
In the study of vowel production, it would be of inestimable value to find a normal range of production, or standard, against which disordered speakers can be compared. It has been shown here that such a standard can be found within a subject, by exhibiting several reliable parameters with which to measure differences in tongue shape. Whether a standard can be formulated which takes into account variation between normal speakers remains to be seen. Even if this is not possible, the researcher may still be able to use relative measurements in scoring a single speaker's utterances, such as the ratios of the coefficient of the x² term between pairs of vowels.
Figure 10. Plot of the coefficients of the x and x² terms of quadratic polynomials for the five productions of each of the five vowels. Horizontal axis: coefficient of x term, unitless.
The authors wish to thank Diana Thompson for data analysis, and Phoebe A. Kent and Anne Boyden for their editorial assistance.
References Alfonso , P. & Baer, T. ( 1982) Dynamics of vowel articulation, Language and Speech, 25, 151 - 173. Bickel, P. & Doksum , K. ( 1977) Mathematical statistics: basic ideas and selected topics, San Francisco, CA: Holden-Day. Boyce, W. E. & DiPrima, R. C. (1970) lmroduction to differential equations. New York : John Wiley and Sons. Harshman, R., Ladefoged, P. & Goldstein, L. ( 1977) Factor analysis of tongue shapes, Journal of the Acoustical Society of America, 62, 693- 707. Hashimoto, K. & Sasaki , K. ( 1982) On the relatio nsh ip between the shape and position of the tongue for vowels, Journal of Phonetics, 10, 29 1- 299. Hashimoto, K. & Tanimoto, M. ( 1978) Approximation of the tongue profile by a quadratic curve, Journal of Acoustical Society of Japan , 34, 140- 148. Kent , R. & Moll , K. (1982) Tongue body articulation during vowel and dipthong gestures, Folia Phoniatrica, 24, 278- 300. Knott, G. & Shapiro, M. ( 198 1) MLAB Reference Manual, lOth Edition. Bethesda, MD: DCRT-NIH. Kuehn , D. & Moll, K. ( 1976) A cinerad iographic study of VC and CV articulatory veloci ties, Journal of Phonetics, 4, 303- 320. Lindblom, B. & Sundberg, J. ( 197 1) Acoustical consequences of lip, tongue, jaw and larynx movement, Journal of the Acoustical Society of America, 50, 1166-- 1179. Lubker, J. , McAllister, R. & Lindblom, B. ( 1977) On the notion of interarticulator programming, Journal of Phonetics, 5, 213- 266. Mermelstein , P. ( 1973a) Articulatory model of speech production, Journal of the Acoustical Society of America, 53, 1070- 1082. Mermelstein , P. (1973b) Some articulatory manifestations of vowel stress, Journal of the Acoustical Society of America, 54, 538- 540. Morrish , K., Stone, M., Sonies, B. , Kurtz, D. & Shawker, T. ( 1984) C haracterizatio n of tongue shape, Ultrasonic Imaging, 6, 37-47 .
Perkell, J. (1969) Physiology of speech production: results and implications of a quantitative cineradiographic study, Research monograph 53. MIT Press.
Perkell, J. S. & Nelson, W. L. (1982) Articulatory targets and speech motor control: a study of vowel production. In The motor control of speech (S. Grillner, A. Persson, B. Lindblom & J. Lubker, editors), pp. 187–204. New York: Pergamon Press.
Pickett, J. (1980) The sounds of speech communication, p. 52. Baltimore: University Park Press.
Shawker, T., Sonies, B. & Stone, M. (1984) Soft tissue anatomy of the tongue and floor of the mouth: an ultrasound demonstration, Brain and Language, 21, 335–350.
Shawker, T. & Sonies, B. (1984) Tongue movement during speech: a real-time ultrasound evaluation, Journal of Clinical Ultrasound, 12, 125–133.
Shirai, K. & Honda, M. (1976) Estimation of articulatory motion. In Dynamic Aspects of Speech Production (M. Sawashima & F. S. Cooper, editors), pp. 279–304. Tokyo: University of Tokyo Press.
Stevens, K. (1972) The quantal nature of speech: Evidence from articulatory-acoustic data. In Human communication: a unified view (P. Denes & E. David, editors), pp. 51–61. New York: McGraw-Hill Book Co.
Stone, M., Sonies, B., Shawker, T., Weiss, G. & Nadel, L. (1983) Analysis of real-time ultrasound images of tongue configuration using grid-digitizing system, Journal of Phonetics, 11, 207–218.
Wood, S. (1975) The weakness of the tongue-arching model of vowel articulation, Working Papers 11, pp. 55–107. Lund, Sweden: Lund University.
Wood, S. (1979) A radiographic analysis of constriction location for vowels, Journal of Phonetics, 7, 25–43.
Appendix

Given $m$ functions of $x$, $f_i(x)$, $i = 1, 2, \ldots, m$, defined on an interval $[L, R]$, the average of the $f_i(x)$ is taken to be that function $f(x)$ which minimizes $F(f)$:

$$F(f) = \sum_{i=1}^{m} \int_{L}^{R} \bigl(f(x) - f_i(x)\bigr)^2 \, dx. \tag{1}$$

Suppose that each $f_i(x)$ is a polynomial of degree $n$, of the form

$$f_i(x) = \sum_{j=0}^{n} a_{i,j}\, x^j, \tag{2}$$

where the $a_{i,j}$ are constants. Further, assume that $f(x)$ is an analytic function; it is therefore given by a power series

$$f(x) = \sum_{j=0}^{\infty} a_j x^j. \tag{3}$$

Substituting these functional forms into Equation (1), taking derivatives of $F$ with respect to each $a_k$ and setting them to zero, we obtain

$$\sum_{i=1}^{m} \left[ \sum_{j=0}^{n} \bigl(a_j - a_{i,j}\bigr) \frac{R^{j+k+1} - L^{j+k+1}}{j+k+1} + \sum_{j=n+1}^{\infty} a_j \, \frac{R^{j+k+1} - L^{j+k+1}}{j+k+1} \right] = 0, \qquad k = 0, 1, 2, \ldots$$

An obvious solution to this set of equations is

$$a_j = \frac{1}{m} \sum_{i=1}^{m} a_{i,j} \quad \text{for } j \le n,$$

and $a_j = 0$ for $j > n$. Because these are linear, independent equations, this must be the unique solution. Thus, the function $f(x)$ which minimizes Equation (1) is the $n$-degree polynomial with coefficients equal to the average of the coefficients of the $f_i(x)$.
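The result above can be checked numerically. The following is a minimal sketch, not the authors' analysis code: the function names, the interval [0, 1] and the three second-order coefficient sets are hypothetical values chosen purely for illustration. It evaluates Equation (1) exactly for polynomial candidates and confirms that perturbing any averaged coefficient increases F. It only tests candidates of the same degree n, so it illustrates the conclusion without covering the power-series part of the argument.

```python
from fractions import Fraction


def average_coefficients(polys):
    """Coefficient-wise mean of equal-length coefficient lists (constant term first)."""
    m = len(polys)
    return [sum(column) / m for column in zip(*polys)]


def integrated_squared_deviation(coeffs, polys, left, right):
    """Evaluate F(f) = sum_i integral_L^R (f(x) - f_i(x))^2 dx exactly."""
    total = Fraction(0)
    for p in polys:
        d = [a - b for a, b in zip(coeffs, p)]  # coefficients of f - f_i
        # (sum_j d_j x^j)^2 integrates term by term to
        # sum_{j,k} d_j d_k (R^(j+k+1) - L^(j+k+1)) / (j+k+1)
        for j, dj in enumerate(d):
            for k, dk in enumerate(d):
                power = j + k + 1
                total += dj * dk * (right**power - left**power) / power
    return total


# Hypothetical second-order fits (illustrative values, not data from the study).
example_fits = [
    [Fraction(1), Fraction(2), Fraction(-1)],
    [Fraction(3), Fraction(0), Fraction(1)],
    [Fraction(2), Fraction(1), Fraction(3)],
]
LEFT, RIGHT = Fraction(0), Fraction(1)  # assumed interval [L, R]

mean_fit = average_coefficients(example_fits)
base = integrated_squared_deviation(mean_fit, example_fits, LEFT, RIGHT)

# Perturb each averaged coefficient in turn; F should never decrease.
for j in range(len(mean_fit)):
    for delta in (Fraction(1, 10), Fraction(-1, 10)):
        trial = list(mean_fit)
        trial[j] += delta
        worse = integrated_squared_deviation(trial, example_fits, LEFT, RIGHT)
        print(j, delta, worse > base)
```

Exact rational arithmetic is used so that the comparison with the unperturbed value of F is not affected by rounding error.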