Journal of Voice
Vol. 2, No. 2, pp. 111-117 © 1988 Raven Press, Ltd., New York
Vocal Tract Parameters Associated with Voice Quality and Preference
Thomas Murry
Veterans Administration Medical Center and University of California at San Diego, San Diego, California, U.S.A.
Summary: The purpose of this study was to obtain vowel formant measures and to determine their relationship to previously obtained ratings of voice preference. Two groups of 20 adults, 10 with no laryngeal pathology and 10 diagnosed with a laryngeal pathology, provided vowel samples from which formant frequency and amplitude measures were obtained. These samples were also judged by trained listeners for voice preference and several voice qualities. Using a multiple regression model, it was shown that judgments of voice preference, hoarseness, and nasality are based primarily on laryngeal features. One vocal tract parameter, namely the ratio of F3/F1 amplitude, was related to the perception of vocal effort.
Key Words: Voice quality, Voice preference, Hoarseness, Voice analyses.
Early descriptions of voice quality have relied on listener judgments to determine the presence and degree of hoarseness, breathiness, or roughness in voices (1,2). In 1985, Michel suggested that it was necessary to describe voice quality in terms of perceived function and then to verify perception by actual measurement of normative data (3). In doing so, the goal is not to obtain a one-to-one correspondence of perceptual to acoustic features but rather to determine how much information the acoustic analyses offer for interpreting the perceptual decisions of listeners. Acoustic correlates of perceptual decisions have been demonstrated by the present author (4), by Yumoto et al. (5), and by Arnold and Emanuel (6), to name just a few. These investigators have focused on the laryngeal measures of fundamental frequency and periodicity. However, there is relatively little information to describe the relationships among voice qualities, vocal preference, and supraglottal contributions of the vocal tract.
In 1986, Murry et al. (7) reported on the acoustic features of a vowel judged by two groups of listeners for overall voice preference. From ratings of voice preference found to have reliability coefficients of 0.88 or higher, a model of voice preference incorporating voice qualities and their acoustic correlates in normal and pathologic voices was derived. Four acoustic factors, identified with a multiple regression technique, made up this model: average jitter, harmonic-to-noise ratio, autocorrelation function, and standard deviation of the fundamental frequency. Figure 1 summarizes the results of those models. For the normal voices, overall voice preference was a function of the average jitter plus the standard deviation of the fundamental frequency. This two-factor model produced a prediction level for overall voice preference of 0.94. Adding other variables did not significantly increase the predictability of overall voice preference but did account for 99% of the variance. For the pathologic voices,
Address correspondence and reprint requests to Dr. T. Murry at Speech Research Laboratory, V.A. Medical Center, 3350 La Jolla Village Drive, San Diego, CA 92161, U.S.A. Presented at Care of the Professional Voice Symposium at The Juilliard School, New York, June 2, 1987.
a two-factor model resulted in a prediction level of 0.78. Adding two additional variables accounted for a 2% improvement in the prediction of voice preference. These data would imply that reliable judgments of overall preference of a normal voice depend on only two laryngeal acoustic factors, which account for 94% of the variability, whereas four factors are needed to account for 80% of the variability in pathologic voices. Because the acoustic laryngeal data account for most of the variability in these voices, it was hypothesized that vocal tract information would not significantly improve the prediction level of overall voice preference.
In the present study, an attempt was made to determine if the addition of vocal tract parameters to the above model would account for additional variability in specifying preferred voices. A single vowel was selected to provide relative independence from other possibly confounding parameters, such as reading rate, dialect, and phonetic differences, which have previously been shown to be related to perceptual voice ratings (8). The purposes of this study were: (a) to determine the relationship between vocal tract parameters and voices previously identified to be either high or low in preference, and (b) to determine the contribution of vocal tract parameters to models of voice preference.
[Figure 1. Axis: percent of variability explained by acoustic measures (25 to 100). Normal: OA = JITTER + SD(F0); OA = JITTER + SD(F0) + AUTOCOR + H/N. Pathologic: OA = AUTOCOR + SD(F0) + H/N + JITTER.]
FIG. 1. Normal and pathological models of voice preference based on four laryngeal acoustic measures: harmonic-to-noise ratio (H/N), average jitter, autocorrelation function, and standard deviation of the fundamental frequency (SD(F0)).
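The "percent of variability explained" comparisons above (a two-factor model versus a four-factor model) can be illustrated with a nested least-squares fit. The following is a minimal sketch under stated assumptions, not the analysis used in the study: the data are synthetic, the sample size `n = 40` is hypothetical, and the helper names `r_squared` and `incremental_f` are introduced here for illustration only.

```python
import numpy as np


def r_squared(X, y):
    """Proportion of variance in y explained by a least-squares fit on X (intercept added)."""
    A = np.column_stack([np.ones(len(y)), X])
    coef, *_ = np.linalg.lstsq(A, y, rcond=None)
    resid = y - A @ coef
    ss_res = float(resid @ resid)
    ss_tot = float(((y - y.mean()) ** 2).sum())
    return 1.0 - ss_res / ss_tot


def incremental_f(r2_reduced, r2_full, n, p_full, q):
    """F statistic for the R^2 gain from adding q predictors to reach p_full predictors."""
    return ((r2_full - r2_reduced) / q) / ((1.0 - r2_full) / (n - p_full - 1))


# Synthetic stand-in data (hypothetical, not the study's measurements).
rng = np.random.default_rng(0)
n = 40
# Columns stand in for: jitter, SD(F0), autocorrelation, H/N.
laryngeal = rng.normal(size=(n, 4))
# Preference is built to depend on the first two columns only, plus noise.
preference = 1.5 * laryngeal[:, 0] - 1.0 * laryngeal[:, 1] + rng.normal(scale=0.3, size=n)

r2_two = r_squared(laryngeal[:, :2], preference)   # two-factor model
r2_four = r_squared(laryngeal, preference)         # four-factor model
f_gain = incremental_f(r2_two, r2_four, n=n, p_full=4, q=2)

print(f"R^2, two factors:  {r2_two:.3f}")
print(f"R^2, four factors: {r2_four:.3f}")
print(f"F for the added two factors: {f_gain:.2f}")
```

Because the models are nested, the four-factor R² can never be lower than the two-factor R²; the F statistic indicates whether the gain is larger than would be expected from adding uninformative predictors.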
METHODS
Subjects
Two groups of 20 adults randomly chosen from a library of voice recordings served as the talkers. The first group consisted of normal men between the ages of 22 and 56 years with no history of voice disorders. The second group, called the pathologic group, consisted of men between the ages of 24 and 59 years with a confirmed diagnosis of laryngeal pathology. In each of the two groups, the five subjects receiving the highest and the five receiving the lowest overall ratings of voice preference were selected for this study.
The method for determining the high and low preference voices has been described in a previous report (9). To summarize, each speaker produced a sample of the vowel /a/ at a comfortable speaking level. From these samples, ratings were obtained from two groups of listeners, six speech pathologists and six vocal performers. The listeners were asked to rate the voices on six parameters: overall preference, pitch, breathiness, effort, hoarseness, and nasality. All ratings were obtained using a seven-point equal interval scale. Listeners were not given any definitions but were asked to use their expertise in making the ratings. The listeners heard each group of voices twice, and reliability coefficients were obtained from each judge for each group of voices on all six perceptual parameters. From the overall preference ratings of both groups of listeners, the five highest and five lowest normal and pathological talkers provided the acoustic data for this investigation.
To obtain the formant data for this study, the recordings of the 20 subjects were processed through a Hewlett-Packard Spectrum Analyzer. The system was programmed for FFT analysis of five 100 ms epochs for each vowel using a 15 Hz bandwidth. The resulting output of this process is shown in Fig. 2. From the spectral plots, the first three formant frequencies and their relative amplitudes were visually identified and marked.
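The epoch-based FFT analysis described above can be sketched in software as follows. This is a rough illustration under stated assumptions, not a reproduction of the analyzer's processing: the sampling rate, the Hann window, the synthetic test vowel, and the search bands used in place of visual peak marking are all hypothetical choices, as are the function names. A 100 ms Hann-windowed record has 10 Hz bin spacing and an equivalent noise bandwidth of 1.5 bins (15 Hz), which is consistent with the stated bandwidth, although the paper does not specify the window.

```python
import numpy as np

FS = 10_000        # sampling rate in Hz (assumed; not stated in the paper)
EPOCH_S = 0.100    # 100 ms epochs, as in the Methods
N_EPOCHS = 5       # five epochs per vowel, as in the Methods


def synth_vowel(f0=100.0, formants=((700.0, 1.0), (1200.0, 0.7), (2000.0, 0.2)), bw=80.0):
    """Hypothetical test signal: harmonics of f0 weighted by Gaussian bumps at the formants."""
    t = np.arange(int(round(FS * EPOCH_S * N_EPOCHS))) / FS
    x = np.zeros_like(t)
    for h in range(1, int(4000 / f0) + 1):
        f = h * f0
        amp = sum(g * np.exp(-0.5 * ((f - fc) / bw) ** 2) for fc, g in formants)
        x += amp * np.sin(2 * np.pi * f * t)
    return x


def averaged_spectrum_db(x):
    """Average the magnitude spectra of the epochs; return levels in dB re the largest bin."""
    n = int(round(FS * EPOCH_S))
    window = np.hanning(n)
    epochs = x[: n * N_EPOCHS].reshape(N_EPOCHS, n)
    mags = np.abs(np.fft.rfft(epochs * window, axis=1)).mean(axis=0)
    freqs = np.fft.rfftfreq(n, d=1.0 / FS)
    level_db = 20 * np.log10(np.maximum(mags, 1e-12) / mags.max())
    return freqs, level_db


def pick_formants(freqs, level_db, bands=((400, 1000), (1000, 1600), (1600, 2600))):
    """Take the strongest bin in each search band (a stand-in for visual marking)."""
    peaks = []
    for lo, hi in bands:
        idx = np.flatnonzero((freqs >= lo) & (freqs < hi))
        k = idx[np.argmax(level_db[idx])]
        peaks.append((float(freqs[k]), float(level_db[k])))
    return peaks


signal = synth_vowel()
freqs, level_db = averaged_spectrum_db(signal)
formant_peaks = pick_formants(freqs, level_db)
for i, (f, a) in enumerate(formant_peaks, start=1):
    print(f"F{i}: {f:.0f} Hz, {a:.2f} dB re spectral peak")
```

Fixed search bands are a simplification; in disordered voices with weak or irregular partials, as noted in the text, peak locations are much less stable, which is one reason visual inspection was used.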
Six amplitude-related measures were obtained for each vowel sample and were used to examine the vocal tract contribution to voice preference. The amplitude measures were selected because of the high degree of variability in the frequency data, especially in the pathological samples. The vocal tract data were subjected to correlation analysis with four perceptual parameters:
[Figure 2. Spectrum analyzer display, 0 to 4,000 Hz, bandwidth 15 Hz. Readout: F1, 670 Hz, -28.98; F2, 1210 Hz, -31.59; F3, 2010 Hz, -43.42 (relative amplitudes; display scale in dBV).]
FIG. 2. Spectral display of one subject showing the three formant locations (arrowheads) and listing their formant frequencies and relative amplitudes (upper right).
overall preference, and ratings of hoarseness, nasality, and effort. In addition to the correlation analysis, regression analyses were performed using laryngeal data obtained from these same talkers [and reported in a previous study (9)] and the vocal tract information from this study. The acoustic features were regressed onto each of the perceptual features.
RESULTS
Figure 3 displays four of the five voices with the highest overall preference ratings. Note the periodic nature of the partials throughout the spectrum. By contrast, Fig. 4 displays four of the five voices with the lowest overall preference ratings. There is a loss of periodic energy in the mid to upper frequency region of the spectrum, and fewer partials are identifiable in the low-preference voices. These two groups differ in the number of partials.
Four perceptual features (overall preference, hoarseness, effort, and nasality) were examined in relation to the six vocal tract amplitude measures. Only one vocal tract parameter was statistically related to a perceptual feature: the F3/F1 amplitude ratio was significantly correlated with the perception of effort for the normal control subjects. Thus, the data from listener judgments of voice quality using a sustained vowel suggest that vocal tract measures are not significantly related to hoarseness, nasality, or overall voice preference.
The relative contribution of the vocal tract parameters to voice preference and to the perception of hoarseness, effort, and nasality was examined next. For this study, three vocal tract parameters were selected: the amplitude ratios F2/F1, F3/F1, and F3/F2. The amplitude ratio data were selected because they showed less variability in the sampling procedure and also because previous studies have shown that vocal tract ratios are related to perceptual and acoustic measures of normal and pathological voices (10,11). The formant amplitude ratios were combined with the four previously obtained laryngeal variables. Multiple linear regression analyses were performed on the normal and pathologic data separately, and a regression coefficient (R²) was obtained for each group. In addition to obtaining a model for overall voice preference, models for each of the three other perceptual parameters were obtained for the normal and pathological voices. Figure 5A presents the model for overall preference for the normal and pathologic
[Figures 3 and 4. Panels: HIGH PREFERENCE (Fig. 3) and LOW PREFERENCE (Fig. 4). Horizontal axes: frequency, kHz.]
FIG. 3. Spectral plots of four of the five high-preference normal subjects. Frequency is shown on the abscissa and relative amplitude is shown on the ordinate.
FIG. 4. Spectral plots of four of the five low-preference pathologic subjects. Frequency is shown on the abscissa and relative amplitude is shown on the ordinate.
groups. Adding vocal tract information did not change the regression coefficient (R²) value for the normal voices. One percent of the variance remains unexplained in the regression equation. For the pathologic model, adding the vocal tract information increased the prediction by 18%, and only 2% of the variability remains unexplained. Thus, vocal tract information was not needed to increase the predictability of the normal group, whereas it increased the predictability of overall voice preference by 18% in the pathological group. This difference was statistically significant at the 0.05 level.
Figure 5B displays the multiple regression models for effort. Laryngeal variables accounted for 90% of the variability in the normal voices, the three vocal tract parameters accounted for 5%, and 5% remains unexplained in this model. Adding vocal tract information did not significantly increase the predictability of effort judgments for the normal voices. For the pathological voices, an additional 13% of the variance is explained when the vocal tract information is included. The increase differs significantly at the 0.05 level from the model using laryngeal features alone. One percent of the variance remains unexplained.
Figure 6A shows the multiple regression model for hoarseness. Laryngeal features account for 95%
[Figure 5. Panels A (overall preference) and B (effort), each shown for the NORMAL and PATHOLOGIC groups. Legend: laryngeal, vocal tract, unexplained.]
FIG. 5. A: Model of overall preference for the normal and pathologic subjects. The model was derived from four laryngeal and three formant ratio parameters. B: Model of vocal effort for the normal and pathologic subjects.
of the variability in the perceptual judgments of normal voices; adding vocal tract information increased the prediction by 3%. In the pathological population, adding vocal tract information to the prediction equation increased the variance accounted for by 5%, whereas 4% of the variance remains unexplained. The vocal tract information was not statistically significant at the 0.05 level for either the normal or the pathologic voices.
The multiple regression model for nasality is shown in Fig. 6B. For the normal subjects, laryngeal parameters account for 80% of the variance, and when the vocal tract information is added, an additional 7% of the variance is accounted for, with 13% remaining unexplained. The increase in variance accounted for was not statistically significant at the 0.05 level. For the pathological voices, the laryngeal features account for 99% of the variance.
DISCUSSION
The results of this study suggest that laryngeal parameters account for the overwhelming majority of the acoustic information used by listeners when making reliable judgments of hoarseness, effort, nasality, and overall voice preference from isolated vowel productions. Furthermore, the perceptual variability not accounted for by laryngeal parameters is only partially accounted for by the vocal tract parameters examined in this study.
From the ratings of overall voice preference, a
[Figure 6. Panels A (hoarseness) and B (nasality), each shown for the NORMAL and PATHOLOGIC groups. Legend: laryngeal, vocal tract, unexplained.]
FIG. 6. A: Model of hoarseness for the normal and pathologic subjects. B: Model of nasality for the normal and pathologic subjects.
model based on multiple regression was derived showing that four laryngeal features explained 99% of the variance associated with the voice preference judgments of normal voices. Adding vocal tract information to the model did not explain the remaining error variance. In the pathologic voices, 18% of the variance was explained using vocal tract information. The regression models for explaining judgments of hoarseness and effort in both talker groups required laryngeal and vocal tract information to account for 99% of the perceptual variability. However, the increase was significant only for predicting vocal effort in the pathological group. Nasality differed in the two groups; for normal voices, 5% of the variance was explained using vocal tract information, whereas laryngeal parameters explained 99% of the variance in pathological voices.
Thus, it appears that when listeners rate perceptual features of sustained vowels in normal voices, they use primarily laryngeal features. To a degree (more in the pathological voices than in the normal) they also make use of vocal tract parameters, and that information significantly increases the prediction of
overall voice preference and vocal effort for the pathological subjects.
The results of this study also confirm previous studies, which found that the perception of hoarseness is related to the harmonic-to-noise ratio in normal voices and not to any of the other laryngeal parameters. This finding may now be extended to vocal tract parameters, suggesting that hoarseness is specifically a laryngeal feature but that effort and nasality may be features of both the laryngeal and vocal tract subsystems. The harmonic-to-noise ratio as well as perceived hoarseness may be useful acoustic and psychoacoustic markers for monitoring changes in the performer because these two measures are primarily laryngeal and are not confounded by vocal tract information.
Unlike hoarseness, the perception of vocal effort is related to several laryngeal parameters and to at least one vocal tract parameter, namely, the ratio of F3/F1 amplitude. This perceptual feature, which has also been related to loudness and respiratory function, appears to be more than a parameter of voice quality. Vocal effort may be a function of the respiratory system, laryngeal system, and/or the shape and opening of the vocal tract. Vocal effort may be more appropriately represented as speech effort because it may involve more than the laryngeal mechanism.
Finally, nasality judgments appear to involve complex perceptual strategies in normal subjects. Together, the four laryngeal features and the three vocal tract amplitude features left 13% of the variance in the linear regression model unaccounted for.
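As a worked illustration of the formant amplitude ratios discussed above, the sketch below applies them to the readout values listed in Fig. 2. It assumes that the listed amplitudes are levels in dB, so that an amplitude ratio corresponds to a difference of levels; the paper does not state how the ratios were computed, and the names `FIG2_READOUT`, `amplitude_ratio_db`, and `db_to_linear` are introduced here for illustration only.

```python
# Readout values as listed in Fig. 2: (frequency in Hz, relative amplitude in dB).
FIG2_READOUT = {
    "F1": (670, -28.98),
    "F2": (1210, -31.59),
    "F3": (2010, -43.42),
}


def amplitude_ratio_db(readout, upper, lower):
    """Level of formant `upper` relative to formant `lower`, as a difference of dB levels."""
    return readout[upper][1] - readout[lower][1]


def db_to_linear(db):
    """Convert a level difference in dB to a linear amplitude ratio."""
    return 10 ** (db / 20.0)


# The three ratios named in the Results: F2/F1, F3/F1, and F3/F2.
ratios_db = {
    f"{upper}/{lower}": amplitude_ratio_db(FIG2_READOUT, upper, lower)
    for upper, lower in (("F2", "F1"), ("F3", "F1"), ("F3", "F2"))
}

for name, value in ratios_db.items():
    print(f"{name}: {value:.2f} dB (linear amplitude ratio {db_to_linear(value):.3f})")
```

For this one subject, the F3/F1 level difference is -14.44 dB, meaning the third formant peak is roughly one fifth the amplitude of the first.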
Acknowledgment: This study was supported in part by the Veterans Administration Merit Review Program. The author gratefully acknowledges the support and helpful suggestions of Michael Caligiuri, Ph.D.
REFERENCES
1. Shipp T, Huntington D. Some acoustic and perceptual factors in acute laryngitic hoarseness. J Speech Hear Disord 1965;30:350-9.
2. Van Riper C, Irwin J. Voice and articulation. Englewood Cliffs, New Jersey: Prentice Hall, 1958.
3. Michel J. The elusive nature of voice quality. In: Lawrence V, ed. Fourteenth symposium: care of the professional voice. New York: Voice Foundation, 1985:178-83.
4. Murry T. Psychophysical judgements of voice characteristics. In: Lawrence V, ed. Fourteenth symposium: care of the professional voice. New York: Voice Foundation, 1985:214-21.
5. Yumoto E, Gould WJ, Baer T. Harmonics-to-noise ratio as an index of the degree of hoarseness. J Acoust Soc Am 1982;71:1544-50.
6. Arnold KS, Emanuel F. Spectral noise levels and roughness severity ratings for vowels produced by male children. J Speech Hear Res 1979;22:613-26.
7. Murry T, Brown WS, Rothman H. Judgements of voice quality and preference: acoustic interpretations. In: Lawrence V, ed. Fifteenth symposium: care of the professional voice. New York: Voice Foundation, 1986.
8. Huntley R, Hollien H, Shipp T. Influences of listener characteristics on perceived age estimations. J Voice 1987;1:49-52.
9. Murry T, Brown WS Jr, Rothman H. Judgments of vocal quality and preference: acoustic interpretations. J Voice 1987;1:252-7.
10. Murry T, Singh S. Multidimensional analysis of male and female voices. J Acoust Soc Am 1980;68:1294-300.
11. Murry T, Singh S, Sargent M. Multidimensional classification of abnormal voice qualities. J Acoust Soc Am 1977;61:1630-5.