Journal of Voice
Vol. 9, No. 4, pp. 429-438 © 1995 Lippincott-Raven Publishers, Philadelphia
Measures of Vocal Function During Changes in Vocal Effort Level
Daniel Zaoming Huang, Fred D. Minifie, †Hideki Kasuya, and *Sarah Xiao Lin
Department of Speech and Hearing Sciences, University of Washington, and *Tiger Electronics, Inc., Seattle, Washington, U.S.A.; and †Department of Electrical and Electronic Engineering, Utsunomiya University, Utsunomiya, Japan
Summary: The purpose of this article is to present the results of a controlled study of the day-to-day variability of three acoustic parameters (jitter, shimmer, and normalized noise energy) and two electroglottographic parameters (contact quotient and contact quotient perturbation) for vowels produced at three vocal efforts (low, normal, high). Data were obtained with use of a bilinear interpolation pitch detection method. A repeated measures design required subjects to produce the vowels /æ/ and /a/ five times a day over 3 days at each vocal effort level. The jitter, shimmer, and normalized noise energy values from acoustic measures and the contact quotient and contact quotient perturbation values from electroglottographic measures varied significantly among the three vocal effort levels. The clinical implication of this finding is that vocal effort must be controlled in order to obtain consistent clinical measures. Furthermore, day-to-day variability must be taken into account if representative measures are to be obtained for clinical use.
Key Words: Jitter--Shimmer--Glottal noise--Perturbation--Contact quotient--Contact quotient perturbation--Vocal effort--Vocal function--Pitch period detection.
Accepted October 25, 1994.
Address correspondence and reprint requests to D. Z. Huang, Department of Speech and Hearing Sciences, University of Washington, JG-15, Seattle, WA 98195, U.S.A.

Acoustic and electroglottographic (EGG) measures provide a convenient and noninvasive way to evaluate laryngeal function (1-6). The three acoustic perturbation measures that have received the most attention in the literature as indicators of vocal function are cycle-to-cycle variations in fundamental period (jitter), cycle-to-cycle variations in peak-to-peak amplitude (shimmer), and normalized noise energy (NNE) (7,8). Four EGG measures also provide useful information about normal and pathological vocal function: contact quotient (CQ), the percentage of time the vocal folds are in contact (closed) during each cycle of vocal fold vibration; contact quotient perturbation (CQP), the cycle-to-cycle variation in CQ; cycle-to-cycle fundamental period variations from the EGG signal (EGG-jitter); and cycle-to-cycle peak-to-peak variations from the EGG signal (EGG-shimmer) (2,9,10). The usefulness of such measures as indicators of vocal function depends on their reliability and on their sensitivity to changes in vocalization. Previous studies have looked at intrasubject variability of vocal jitter in voice signals from day to day (11,12), the relationship of vocal jitter to voice intensity levels (13), vocal jitter changes with the aging voice (14), and differences in vocal jitter from vowel to vowel (15). Similar studies need to be done to indicate the relative stability of each of the acoustic and EGG measures used to evaluate vocal function. Accurate characterization of acoustic and EGG measures is essential not only in the evaluation of vocal pathologies, but also in the accurate modeling
of the voice source for speech synthesis (16). Two major questions about acoustic and EGG measures remain unresolved: (a) how do these measures change with changes in vocal effort level, and (b) what is the day-to-day variability in these measures? A better understanding of the variability of voice perturbation in normal speakers at different vocal efforts over time is therefore needed before acoustic and EGG measures can be used appropriately as clinical measures for voice assessment. There are three purposes for this study: (a) to introduce a bilinear interpolation pitch detection method, (b) to use the new pitch period detection method to evaluate the stability of acoustic and EGG measures of vocal function during changes in vocal effort level, and (c) to evaluate the stability of such measures from day to day. Therefore, before discussing the data from analyses of normal vowel production, this article describes the development of the bilinear interpolation algorithm and evaluates its accuracy for quasi-periodic signals.

METHOD OF ANALYSIS
Algorithm and terminology
If acoustic and EGG measures are to be used as indicators of laryngeal function, it is crucial to have an accurate cycle-to-cycle pitch period detection method. This is because nearly all aspects of acoustic and EGG measures are based on the accuracy of pitch detection. For example, since perturbation measures rely on the accurate identification of each pitch period, measures of jitter and shimmer are dependent on the precision of pitch period detection. Similarly, the detection of glottal noise requires an accurate pitch period marker in order to match adjacent waveshape cycles. Various fundamental frequency (F0) extraction methods have been summarized by Hess (17). They can be classified in two major categories: (a) event-detection methods, such as the peak-picking and zero-crossing methods; and (b) short-time averaging methods, such as autocorrelation, minimal distances, amplitude magnitude difference function, cepstral analysis, and harmonic compression. Milenkovic (18) found that greater reliability and accuracy could be obtained by matching the entire waveshape across adjacent cycles rather than by identifying isolated events, like zero-crossing and
peak-picking. Kasuya et al. (8) developed a rather accurate pitch detection method based on a cycle-to-cycle average magnitude difference function (AMDF). This pitch detection method compares the sampled data points from one cycle of the waveform with those from adjacent cycles. Since the measures of jitter, shimmer, NNE, CQ, and CQP are based on the cycle-to-cycle similarity of the waveform, accurate determination of the pitch period is of crucial importance.

This article presents a new method for determining pitch periods, from which jitter, shimmer, NNE, CQ, and CQP are measured. The method incorporates a bilinear interpolation procedure into the AMDF to evaluate the cycle-to-cycle waveform similarity in sustained vowel utterances. This method was selected on the basis of experiments with synthetic speech showing that it performs with greater accuracy than several other methods taken for comparison.

The method of bilinear interpolation of sample points on the AMDF is shown in Fig. 1. The pitch markers ($q_1, q_2, \ldots, q_{N-1}, q_N$ or $c_1, c_2, \ldots, c_{N-1}, c_N$), shown at the top of Fig. 1, are estimated with use of an automatic method based on zero crossings of the vowel waveform. The method then locates the pitch boundary on the basis of the AMDF to indicate the beginning of each pitch period. In this case, six points around a primary dip in the AMDF, shown at the bottom of Fig. 1, are separated into two groups: one group includes the minimum AMDF point, whereas the other includes the second-lowest point. A line is fitted to each group on the basis of the least mean square criterion. The point where the two lines cross is regarded as the real pitch boundary, the beginning of a real pitch period ($P_1, P_2, \ldots, P_{N-1}, P_N$), on which a pitch period sequence p(n), n = 1, 2, ..., N, is defined as the interval ($P_{n+1} - P_n$).

FIG. 1. Schematic illustration of the pitch detection method with bilinear interpolation of the average magnitude difference function (AMDF).

A peak-to-peak amplitude a(n), n = 1, 2, ..., N, is obtained by finding the peak-to-peak amplitude value within each pitch period. For a sequence x(n), n = 1, 2, ..., M, the perturbation quotient [PQ (%)] is, in general, defined as

$$\mathrm{PQ} = \frac{100}{M-k+1} \sum_{n=1}^{M-k+1} \left| 1 - \frac{k\,x(n+m-1)}{\sum_{j=1}^{k} x(n+j-1)} \right| \qquad (1)$$

where k is the length of the moving average (an odd integer > 1) and m = (k + 1)/2. In our system, k = 5 and m = 3. If x(n) is the pitch period p(n) of the acoustic signal, then PQ is the pitch period perturbation quotient (jitter), and if x(n) is the peak-to-peak amplitude a(n) of the acoustic signal, then PQ is the peak-to-peak amplitude perturbation quotient (shimmer). If x(n) is the contact quotient sequence of the EGG signal, then PQ is the contact quotient perturbation (CQP).

More than 50 cycles were used for each perturbation analysis, which exceeds the number of cycles recommended by Titze et al. (13). Only segments whose pitch periods stayed within ±10% of the mean pitch period (calculated in milliseconds) were analyzed. This criterion was used so that only very steady waveform segments would be analyzed for all subjects, thus minimizing variability due to selection of cycles for analysis. If no segment of at least 50 cycles could be found to fit this criterion, no perturbation values were computed for that vowel production. Fewer than 1% of the normal vowel prolongations were rejected under this criterion. The criterion is more problematic in analyses of pathological voices, because some abnormal voices have relatively few stable segments with pitch period fluctuations within 10% of the mean pitch period. Such extremely variant voices cannot be analyzed, because the nonlinear aberrations in perturbation measures yield results that cannot be interpreted meaningfully. The accuracy of the new pitch detection method is evaluated with synthesized steady vowels later in this article.

With respect to the EGG-jitter and EGG-shimmer, Haji et al. (19) found that the EGG-jitter
and EGG-shimmer were nearly equivalent to the jitter and shimmer obtained from the acoustic signal, so the EGG-jitter and EGG-shimmer data are not reported here. The CQ measure from the EGG signal provides unique information about the percentage of each cycle of vocal fold vibration during which the vocal folds are in contact (closed). This information is, for the most part, invisible to other available techniques (20). The CQP measure provides precise information about the rate, symmetry, and regularity of the vocal fold contact phase during vocal fold vibration (21). For these reasons, the CQ and CQP measures were chosen in our study. Rothenberg and Mahshie (21) suggested the use of variable baseline crossings with interpolation of a criterion level to demarcate the EGG contact phase. In the present study, a baseline at 25% of the peak-to-peak EGG amplitude of each wave is associated with the EGG minimal contact phase and is used for measuring the CQ and CQP.

The method of noise energy measurement used in this experiment provides more insight into perturbation measurement. The relative magnitude of noise included in the voice signal is evaluated with use of an acoustic measurement, NNE, described by Kasuya et al. (8). We have chosen to use the NNE measure because it can differentiate between normal and pathological voices more sensitively than does the harmonic-to-noise ratio (8,22). An adaptive comb filtering method is used in NNE for estimating vocal noise in normal and pathological voices. (This procedure was initially investigated for the enhancement of speech degraded by additive white noise.) The NNE (decibels) is given by the equation:

$$\mathrm{NNE} = 10 \log \frac{\sum_{n} w(n)^2}{\sum_{n} x(n)^2} + \mathrm{BL} \qquad (2)$$

where w(n) and x(n) are, respectively, an estimated vocal turbulent noise component and an original voice waveform, and BL is a constant that compensates for the amount of noise energy removed by the comb filter, as described by Kasuya et al. (8).
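The perturbation quotient of Eq. (1) can be sketched in a few lines of Python. This is a minimal illustration under stated assumptions, not the implementation used in the study: the function name `perturbation_quotient` and the use of NumPy are choices made here for illustration only.

```python
import numpy as np


def perturbation_quotient(x, k=5):
    """Perturbation quotient (%) of a sequence x, following Eq. (1).

    x may be a pitch period sequence (jitter), a peak-to-peak amplitude
    sequence (shimmer), or a contact quotient sequence (CQP).
    The function name and interface are illustrative assumptions.
    """
    x = np.asarray(x, dtype=float)
    if k < 3 or k % 2 == 0:
        raise ValueError("k must be an odd integer greater than 1")
    if x.size < k:
        raise ValueError("sequence is shorter than the moving-average length")
    m = (k + 1) // 2
    # Each row holds x(n), ..., x(n + k - 1) for n = 1, ..., M - k + 1.
    windows = np.lib.stride_tricks.sliding_window_view(x, k)
    centre = windows[:, m - 1]                 # x(n + m - 1)
    ratio = k * centre / windows.sum(axis=1)   # centre value over window mean
    return 100.0 * np.mean(np.abs(1.0 - ratio))


# Hypothetical pitch periods in milliseconds, for illustration only.
periods_ms = [8.33, 8.35, 8.31, 8.36, 8.32, 8.34, 8.33, 8.30, 8.35]
print(f"jitter = {perturbation_quotient(periods_ms):.3f} %")
```

With k = 5, each value is compared with the mean of the five-point window centred on it, so slow drifts in the sequence contribute little and only cycle-to-cycle deviations are counted.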
Test signals
The accuracy of the pitch detection method was tested with use of periodic synthesized signals without additive noise, which were produced by the following equation:

$$y(n) = \sum_{k=1}^{M} A(k) \sin\left[\frac{2\pi n k}{T} + \phi(k)\right] \qquad (3)$$

where A(k) is the amplitude of the k-th harmonic component, chosen to simulate the vowel /æ/ as in "bat," φ(k) is the phase of the k-th harmonic component, M is the number of harmonics, and T is the normalized pitch period (points). In our experiment, φ(k) = 0 and M = 23. T is defined by the following equation:

$$T = F_s \times P \qquad (4)$$
where $F_s$ is the sampling frequency (40 kHz) and P is the pitch period in seconds. For simulating a child voice, T is allowed to vary from 133 to 134 points in steps of 0.2, which corresponds to a change from 3.325 to 3.35 ms. For simulating a female voice, T is allowed to vary from 174 to 175 points, which corresponds to a change from 4.35 to 4.375 ms. Similarly, for simulating a male voice, T is allowed to vary from 333 to 334 points, corresponding to a change from 8.325 to 8.35 ms.

Accuracy of the pitch detection methods
In the present study, two interpolation methods were applied to the AMDF in order to determine the most precise method of pitch period extraction. The two methods employed were parabolic interpolation and interpolation with bilinear approximation. Measures obtained with use of these interpolation methods were compared to measures derived from the AMDF when no interpolation was employed. Results showing the accuracy of the pitch detection method, with and without interpolation, are provided in Table 1. Results obtained from test signals, which include constant pitch periods at both integer and noninteger numbers of sample points, allow us to draw the following conclusions about pitch period detection. First, the standard deviations of data obtained via the parabolic and bilinear interpolation methods were always smaller than the standard deviation when no interpolation was used. The second observation from Table 1 is that the bias of the interpolation methods was generally smaller than the bias obtained with the "no interpolation" method. Third, the bias of the bilinear interpolation method was always smaller than the bias obtained with the parabolic method. Thus, it appears clear from Table 1 that the
TABLE 1. Comparison of accuracy of the pitch extraction method with or without interpolation on the average magnitude difference function

Pitch period    No int.         Parabolic int.   Bilinear int.
(point)         Bias    SD      Bias    SD       Bias    SD
133.0           .032    .183    .001    .001     .000    .000
133.2           .007    .402    .077    .078     .004    .006
133.4           .013    .495    .069    .070     .001    .004
133.6           .013    .495    .065    .066     .005    .006
133.8           .007    .402    .073    .075     .009    .010
134.0           .032    .183    .001    .001     .000    .000
174.0           .032    .183    .001    .001     .000    .000
174.2           .020    .402    .072    .073     .012    .011
174.4           .013    .495    .064    .065     .010    .010
174.6           .013    .495    .070    .072     .007    .007
174.8           .007    .402    .078    .080     .004    .005
175.0           .032    .183    .003    .003     .000    .000
333.0           .032    .183    .001    .001     .000    .000
333.2           .027    .385    .076    .077     .001    .003
333.4           .021    .494    .067    .069     .001    .002
333.6           .014    .501    .066    .067     .002    .002
333.8           .007    .412    .074    .076     .002    .005
334.0           .035    .189    .001    .001     .000    .000
Mean            .020    .360    .048    .049     .003    .004

Bias is the difference of an average of measured pitch periods from the true value, and SD is the standard deviation of measured pitch period values. Int., interpolation.
bilinear method is more accurate than the parabolic method. In addition, Table 1 indicates that both interpolation methods are more accurate than the method with no interpolation.

In order to estimate the sensitivity of the bilinear interpolation method of pitch measurement to noise, white-noise signals were scaled appropriately and then added point-for-point to the periodic synthesized signals y(n) described above. As the signal-to-noise ratio (SNR) decreased, that is, as more white noise was added to the synthesized signals, jitter and the normalized root mean square (RMS) error of the pitch period clearly increased, as shown in Fig. 2. It is also clear that variations in jitter imposed by noise are relatively small when SNRs are >45 dB. These data support the use of bilinear interpolation of the AMDF as a pitch detection algorithm; it appears to be an accurate method for pitch period measurement. Therefore, in this study, pitch detection for both acoustic and EGG signals was obtained by incorporating bilinear interpolation of sample data points on the AMDF. Perturbation measures were obtained with use of a 5-point moving average procedure.
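The bilinear interpolation step can be illustrated with a short Python sketch applied to a test signal of the form of Eq. (3). Several details are assumptions made here and are not taken from the article: the harmonic amplitudes A(k) = 1/k² are a placeholder (the /æ/ amplitudes are not listed), the AMDF is averaged over a fixed window, a coarse bracket `lo`..`hi` for the period stands in for the zero-crossing pitch markers, and the second-lowest AMDF point is assumed to be a neighbour of the minimum. All function names are hypothetical.

```python
import numpy as np


def synth_vowel(T, n_samples, M=23):
    """Periodic test signal of Eq. (3) with phi(k) = 0.

    A(k) = 1/k**2 is a placeholder spectrum, not the /ae/ amplitudes
    used in the article.
    """
    n = np.arange(n_samples)
    k = np.arange(1, M + 1)[:, None]
    return (np.sin(2 * np.pi * n * k / T) / k**2).sum(axis=0)


def amdf(y, lags, window):
    """Mean of |y(n) - y(n + lag)| over the first `window` samples, per lag."""
    return np.array([np.mean(np.abs(y[:window] - y[lag:lag + window]))
                     for lag in lags])


def bilinear_period(y, lo, hi, window):
    """Fractional pitch period from two lines fitted around the AMDF dip.

    `lo` and `hi` bracket the expected period in samples; a coarse
    estimate (for example from zero crossings) is assumed to exist.
    """
    lags = np.arange(lo - 3, hi + 4)
    d = amdf(y, lags, window)
    i = 3 + int(np.argmin(d[3:-3]))   # lowest AMDF point within [lo, hi]
    # Assumption: the second-lowest point is the lower neighbour of the
    # minimum, so the true dip lies between these two samples.
    if d[i + 1] <= d[i - 1]:
        left, right = slice(i - 2, i + 1), slice(i + 1, i + 4)
    else:
        left, right = slice(i - 3, i), slice(i, i + 3)
    # Least-squares line through each group of three points.
    s1, b1 = np.polyfit(lags[left], d[left], 1)
    s2, b2 = np.polyfit(lags[right], d[right], 1)
    # The crossing point of the two lines is taken as the pitch period.
    return (b2 - b1) / (s1 - s2)


true_T = 333.4
y = synth_vowel(true_T, 2000)
estimate = bilinear_period(y, 328, 338, window=1600)
print(f"true period {true_T}, estimate {estimate:.3f}")
```

Two straight lines suit the AMDF because, near the true period, the function falls and rises almost linearly and forms a V-shaped dip; a parabola fitted to the same points rounds off the tip of the V, which is one plausible reading of the larger parabolic bias in Table 1.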
FIG. 2. Jitter, normalized root mean square (RMS) and bias errors as a function of signal-to-noise ratio (SNR, in dB) of synthesized signals with white noise.
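The noise condition behind Fig. 2 requires white noise scaled to a chosen SNR before it is added point-for-point to the synthesized signal. The article does not give the scaling formula, so the sketch below assumes Gaussian white noise and the usual power-ratio definition of SNR in decibels; the function name `add_white_noise` is hypothetical.

```python
import numpy as np


def add_white_noise(y, snr_db, rng):
    """Return y plus white Gaussian noise scaled to the requested SNR (dB).

    SNR is taken here as 10*log10(signal power / noise power), which is
    an assumption; the article only says the noise was scaled appropriately.
    """
    y = np.asarray(y, dtype=float)
    noise = rng.standard_normal(y.size)
    target_noise_power = np.mean(y**2) / 10 ** (snr_db / 10)
    gain = np.sqrt(target_noise_power / np.mean(noise**2))
    return y + gain * noise


rng = np.random.default_rng(0)
n = np.arange(4000)
clean = np.sin(2 * np.pi * n / 333.4)   # stand-in for the synthesized vowel
for snr_db in (10, 30, 50):
    noisy = add_white_noise(clean, snr_db, rng)
    achieved = 10 * np.log10(np.mean(clean**2) / np.mean((noisy - clean)**2))
    print(f"target {snr_db} dB, achieved {achieved:.2f} dB")
```

Feeding signals prepared this way to a pitch detector at a range of SNRs, and then computing jitter and pitch period error for each, would give curves of the kind summarized in Fig. 2.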
VOICE PERTURBATION MEASUREMENTS
The next investigation examined the influence of three vocal effort levels on the measures of jitter, shimmer, NNE, CQ, and CQP over time. In a repeated measures design, three male subjects produced the sustained vowels /æ/ and /a/ at three vocal efforts (low, normal, and high) five times a day. Recordings were made on every other day for 3 different days. Jitter, shimmer, NNE, CQ, and CQP were measured from each vowel sample. Although these acoustic and EGG measures may be easy to obtain, their usefulness in analysis of voice disorders and in measuring progress during therapy is contingent upon how these measures change during different vocalization conditions. Hence, our interest was focused on how these measures change during voice production at different vocal effort levels.
Subjects
Subjects were three healthy men with no history of voice disorders or present complaint of voice disorders. All subjects were in good health on each day of testing with no history of audiological, neurological, or chronic respiratory disease.

Stimuli
We manipulated vocal effort level by having subjects produce the vowels /æ/ and /a/ at "low," "normal," and "high" vocal levels. With use of a repeated measures design, each of three adult male normal talkers produced five replications of each vowel, at each vocal effort level, on each of 3 different days. These utterances were produced under two different conditions: (a) spontaneous vowel production, and (b) imitative vowel production.

Spontaneous vowel production
Each subject was first directed to sustain the vowel /æ/ as in "bat" five times, with each utterance lasting >3 s, at normal effort. Then, the subject was asked to repeat the sustained vowel five times again, with each production lasting >3 s, at a low vocal effort. Similarly, five replications of the vowel were obtained at high vocal effort. The same procedure was used to obtain tokens of the vowel /a/ at each of the three vocal effort levels.
Imitative vowel production
In order to determine whether talkers show greater variability in spontaneous vowel production than in production of imitated vowels (model matching), a vocal imitation condition was included. In this condition, each subject was instructed to sustain each vowel (/æ/ and /a/) in imitation of synthetically generated vowel tokens. The computer-synthesized vowel stimuli were presented via loudspeaker to each talker at the following intensity levels: low = 67 dB sound pressure level (SPL), normal = 72 dB SPL, and high = 75 dB SPL. These intensity levels were chosen because they represented approximate average levels monitored in pilot investigations during production of low, normal, and high vocal efforts of the spontaneous vowels /a/ and /æ/.

Recording
Each subject was seated in a soundproof room (IAC 1200) and comfortably positioned in a headrest so that a condenser microphone (SONY ECM22-P) was held at a constant microphone-to-mouth distance of 10 cm. Plate electrodes were placed over the thyroid lamina to obtain EGG signals (Synchro Voice Electroglottography). During the recording of both acoustic and EGG signals into a computer, no attempt was made to control F0 at any of the vocal efforts, although F0 remained reasonably constant throughout production of sustained vowels at each vocal effort level. F0 did not vary greatly from one vocal effort level to another.
Analysis of samples
Each vocal token produced by the subjects in this experiment was digitized at a sampling frequency of 22,050 Hz/channel, with an accuracy of 16 bits/sample, and analyzed by using the software Voice Evaluation and Therapy (VET 2.00, Tiger Electronics) (10). Only the middle portions of the vowel at each vocal effort were used for analysis. Following bilinear interpolation of the AMDF, jitter, shimmer, and NNE were obtained from the acoustic signals, and CQ and CQP from the EGG signals.

Results
The results of this experiment can be seen in the following series of figures. Figure 3 shows the results for the vowel /æ/ produced in a natural, spontaneous manner. Each of the bar graphs shows the means and standard deviations of the data obtained for each vocal effort level condition: low, normal, and high. For example, it can be seen in Fig. 3a that jitter decreases with increasing vocal effort level. Similarly, shimmer decreases with increasing vocal effort level (Fig. 3b). In Fig. 3c, the acronym NNE (8,23) represents normalized noise energy (glottal noise energy). This graph must be interpreted bearing in mind that noise energy is measured in relation to the amplitude of the harmonic energy in the vowel. Therefore, the negative values indicate how many decibels the noise energy lies below the signal energy (e.g., a smaller negative value indicates a larger amount of noise than does a larger negative value). Figure 3c shows that as vocal effort level is increased, the relative amount of noise in the vocalization decreases. In the CQ graph (Fig. 3d), it can be observed that at high vocal effort levels of phonation, the vocal folds are closed for a considerably longer percentage of each vocal cycle than during the normal and low vocal effort levels of phonation. This result is consistent with Flanagan's model of vocal fold vibration (24). In the CQP graph (Fig. 3e), we see that the percentage of CQP decreases with increasing vocal effort level. Figure 4(a-e) shows the data obtained from spontaneous productions of the vowel /a/. Although the
FIG. 3. Jitter, shimmer, normalized noise energy (NNE), contact quotient (CQ), and contact quotient perturbation (CQP) from a sustained vowel /æ/ as a function of three vocal efforts in the spontaneous production.
FIG. 4. Jitter, shimmer, normalized noise energy (NNE), contact quotient (CQ), and contact quotient perturbation (CQP) from a sustained vowel /a/ as a function of three vocal efforts in the spontaneous production.
values of the various measures may differ slightly from those obtained for the vowel /æ/, the patterns of change are rather similar. Figure 5 shows the vowel /æ/ produced in imitative response to target acoustic models produced by voice synthesis to simulate "average vocal effort levels." The target vowels for the low, normal, and high conditions were synthesized at 67, 72, and 75
FIG. 5. Jitter, shimmer, normalized noise energy (NNE), contact quotient (CQ), and contact quotient perturbation (CQP) from a sustained vowel /æ/ as a function of three vocal efforts in the imitative production.
FIG. 6. Jitter, shimmer, normalized noise energy (NNE), contact quotient (CQ), and contact quotient perturbation (CQP) from a sustained vowel /a/ as a function of three vocal efforts in the imitative production.
dB SPL, respectively. Similar patterns of change are observed in these imitative vocalizations as in the spontaneous vowel productions when vocal effort level changes. Figure 6 shows similar results for the /a/ vowel produced at different vocal effort levels in imitative response to the acoustic targets produced by vowel synthesis.

Shown in Table 2 are the results of 20 three-way analyses of variance comparing three vocal effort levels, on 3 different days, for three different talkers. Five replications of each vowel token in each condition by each talker provided internal variation for the analysis of variance, calculated via SYSTAT software. The differences resulting from these influences were evaluated in measures of jitter, shimmer, NNE, CQ, and CQP for each vowel (/a/ and /æ/) in the spontaneous and imitative modes of production. Perhaps the most important finding from this study concerns the effect of vocal effort level. Table 2 shows that in all cases, changes in vocal effort level caused statistically significant changes (p < 0.05) in the three acoustic measures and in both EGG measures.

TABLE 2. Results of repeated three-way analyses of variance (one for each acoustic and electroglottographic measure for each vowel)

Measure   Vowel   Effort level (L)   Day (D)   Subject (S)   L x D   D x S   L x S   L x D x S
Jitter    ah      *                  *         *             *       NS      *       NS
Jitter    ae      *                  *         NS            NS      NS      *       NS
Jitter    ahm     *                  NS        NS            NS      NS      NS      *
Jitter    aem     *                  NS        *             NS      NS      NS      NS
Shimmer   ah      *                  *         *             NS      NS      *       NS
Shimmer   ae      *                  *         *             NS      NS      *       NS
Shimmer   ahm     *                  NS        NS            NS      *       NS      *
Shimmer   aem     *                  *         *             NS      *       NS      *
NNE       ah      *                  *         NS            NS      NS      *       NS
NNE       ae      *                  *         *             NS      NS      NS      NS
NNE       ahm     *                  *         NS            NS      *       NS      *
NNE       aem     *                  *         *             NS      *       NS      *
CQ        ah      *                  NS        NS            NS      NS      *       *
CQ        ae      *                  *         NS            NS      *       NS      *
CQ        ahm     *                  NS        *             *       NS      NS      NS
CQ        aem     *                  *         *             NS      NS      NS      NS
CQP       ah      *                  NS        *             NS      NS      NS      NS
CQP       ae      *                  *         *             *       NS      NS      *
* *
* * NS * * * *
NS * NS
The "*" indicates a significant difference at the 0.05 level. ah, /a/ natural, spontaneous; ae, /æ/ natural, spontaneous; ahm, /a/ produced in imitation of a computer-synthesized /a/; aem, /æ/ produced in imitation of a computer-synthesized /æ/. The three parameters assessed in these repeated measures experiments were vocal effort level, day-to-day variability, and subject difference. NNE, normalized noise energy; CQ, contact quotient; CQP, contact quotient perturbation; NS, not significant.
DISCUSSION
In this paper, we have discussed the development of a computer program for the measurement of vocal pitch perturbation, peak-to-peak amplitude perturbation, normalized noise energy, contact quotient, and contact quotient perturbation (jitter, shimmer, NNE, CQ, and CQP, respectively) on the basis of pitch detection methods using interpolation of the AMDF. Both parabolic and bilinear interpolation of the AMDF were evaluated in pilot investigations. Both methods provide an obvious advantage for the estimation of pitch period when compared with the "no interpolation" condition. If a relatively low sampling rate is used, such as 11,025 Hz, interpolation will provide an even greater advantage over procedures that use no interpolation.

Our primary interest was to investigate the influence of vocal effort level during vowel production on jitter, shimmer, glottal noise, CQ, and CQP measurements and to study the day-to-day variability of these measures. We observed consistent effects of vocal effort level: the high vocal effort condition produced the lowest jitter, shimmer, NNE, and CQP values and the highest CQ values. As a result of our "every-other-day" sampling procedure for obtaining the jitter, shimmer, NNE, CQ, and CQP values associated with the three vocal efforts, we observed that, in most cases, these measures varied significantly from day to day. Two hypotheses may be proposed: either the actual vocal effort levels used by the talkers varied from day to day, or the subjects varied in the amount of perturbation produced at the same vocal effort level from day to day. The first hypothesis seems to be supported by the trend toward fewer significant day-to-day differences among measures of imitated vocalizations. The first conclusion to be drawn from these data is that it is very important to control vocal effort level when analyzing vocalizations for research or clinical purposes.
Widely variant measures of jitter, shimmer, NNE, CQ, and CQP can be obtained from the same talker merely by having the talker produce the same vowel at different levels of vocal effort. It appears that standardization of vocalization in terms of vocal effort level will be required before intra- and intersubject comparisons will be meaningful. We assume that the same conclusion would apply to vocalizations produced by healthy and pathological subjects. On the other hand, it appears reasonable to analyze only very steady utterances from subjects in order to get a better approximation of a speaker's typical perturbation values. This criterion may make it impossible to measure the vowel productions of some pathological speakers. As Titze et al. (13) have suggested, it appears best to use a voice sample of at least 20-30 cycles in duration when measuring jitter and shimmer in healthy speakers. A smaller sample provides an insufficient number of cycles for calculating perturbation data. Whether 20-30 cycles is sufficient for some, or all, pathological speakers is uncertain. What is clear is that a sufficiently long sample duration is needed to obtain a stable estimate of perturbation measures. Certainly, longer vowel duration is desirable, but at a cost of increased processing time.

The results of the present study suggest that more vowel repetitions are helpful in determining a speaker's typical production of a given vowel. Furthermore, it is our observation that the first vowel produced during a recording session usually yields the highest amount of variability, presumably because of psychological influences. Letting the subject acclimate to the recording situation during practice trials may be useful in obtaining stable performance during vowel production.

By way of suggestions for future research, it should be noted that recording tokens on a high-quality digital audiotape recorder, or digitizing them directly into a computer prior to analysis, appears to have a noticeable and salutary effect on the measures obtained. Poor-quality audio recordings introduce greater amounts of background noise that contribute to measured values of jitter, shimmer, and NNE. We believe future studies should employ direct recording into the computer to minimize the effects of audiotape noise, thereby improving the stability of the perturbation data reported.
CONCLUSIONS

The take-home message from this experiment is that if these acoustic and EGG measures are to be taken in the clinic and used to compare the patient's performance from one point in time to another, it is important to have the vocalizations produced at the same vocal effort level. Second, as Table 2 shows, in most cases there was variability in these measures from day to day. Thus, it may be important to obtain recordings from several days in order to obtain a good indication of "average" subject performance. Finally, it should be pointed out that this experiment was designed to investigate how these measures varied during vocalizations produced by normal talkers, under the prescribed conditions. It would be of considerable clinical importance to determine whether patients with voice disorders produce similar changes. Further investigations with both normal and pathological speakers should begin to provide an answer.

Acknowledgment: We would like to thank Dr. Y. Kikuchi at Tsukuba Junior College and Dr. Robert Orlikoff at Memphis State University for their suggestions regarding this research project.

REFERENCES

1. Aronson AE. Clinical voice disorders: an interdisciplinary approach. New York: Thieme-Stratton, 1980.
2. Baken RJ. Clinical measurement of speech and voice. San Diego: College-Hill, 1987.
3. Boone DR, McFarlane S. The voice and voice therapy. 4th ed. Englewood Cliffs, NJ: Prentice Hall, 1988.
4. Davis S. Computer evaluation of laryngeal pathology based on inverse filtering of speech. SCRL Monograph 1976:13.
5. Hirano M. Clinical examination of voice. Wien, New York: Springer-Verlag, 1981.
6. Hirano M. Objective evaluation of the human voice: clinical aspects. Folia Phoniatr (Basel) 1989;41:89-144.
7. Hirano M, Matsushita H, Hiki S. Acoustic analysis for voice disorders: a basic conception for the use of acoustic measurements for the diagnosis in voice disorders. Practical Otolaryngology (Kyoto) 1976;69:267-71.
8. Kasuya H, Ogawa S, Kikuchi Y. An adaptive comb filtering method as applied to acoustic analyses of pathological voice. Proceedings of International Conference on Acoustics, Speech, and Signal Processing 1986;1:669-72.
9. Huang Z. A review of speech analysis and synthesis system to evaluate pathological voices. Journal of Shanghai Television University 1988;10:87-91.
10. Huang Z, Minifie F, Lin X. An integrated clinical program for voice evaluation and therapy. Paper presented at American Speech-Language-Hearing Association Conference, San Antonio, Texas, November 1992.
11. Linville SE. Intraspeaker variability in fundamental frequency stability: an age-related phenomenon? J Acoust Soc Am 1988;83:741-5.
12. Higgins MB, Saxman JH. A comparison of intrasubject variation across sessions of three vocal frequency perturbation indices. J Acoust Soc Am 1989;86:911-6.
13. Titze I, Horii Y, Scherer R. Some technical considerations in voice perturbation measurements. J Speech Hear Res 1987;30:252-9.
14. Brown WS, Morris RJ, Micheal JF. Vocal jitter in young adult and aged female voice. J Voice 1989;3:113-9.
15. Orlikoff RF, Huang Z. Influence of vowel production on acoustic and electroglottographic perturbation measures. Paper presented at American Speech-Language-Hearing Association Conference, Atlanta, Georgia, November 1991.
16. Fant G. Voice source dynamics. Speech Transmission Labs Quarterly Progress Status Reports 1980;2-3:17-37.
17. Hess W. Pitch determination of speech signals. Berlin: Springer-Verlag, 1983.
18. Milenkovic P. Least mean square measures of voice perturbation. J Speech Hear Res 1987;30:529-38.
19. Haji T, Horiguchi S, Bear T, Gold WJ. Frequency and amplitude perturbation analysis of electroglottograph during sustained phonation. J Acoust Soc Am 1986;80:58-62.
20. Orlikoff RF, Baken RJ. Consideration of the relationship between the fundamental frequency of phonation and vocal jitter. Folia Phoniatr (Basel) 1990;42:31-40.
21. Rothenberg M, Mahshie JJ. Monitoring vocal fold abduction through vocal fold contact area. J Speech Hear Res 1988;31:338-51.
22. Kasuya H, Zue W, Endo Y. Measurements of laryngeal turbulent noise in pathological voice. Paper presented at American Speech-Language-Hearing Association Conference, Anaheim, California, November 1993.
23. Kasuya H, Ogawa S. Normalized noise energy as an acoustic measure to evaluate pathologic voice. J Acoust Soc Am 1986;80:1329-34.
24. Flanagan JL. Speech analysis, synthesis and perception. 2nd ed. New York, Berlin: Springer Verlag, 1972.