Time-shift, trial, and gender effects on vocal perturbation measures


Journal of Voice

Vol. 7, No. 4, pp. 326-336 © 1993 Raven Press, Ltd., New York

*†Mehdi Jafari, *†James A. Till, *Lauren F. Truesdell, and Cindy B. Law-Till

*Dysarthria Laboratory, Department of Audiology and Speech Pathology, VA Medical Center, Long Beach, and †Speech and Voice Laboratory, Department of Otolaryngology-Head and Neck Surgery, University of California, Irvine, California, U.S.A.

Summary: Three jitter and three shimmer measures were examined for (1) the stability of the measured values with respect to shifts in sample site (the starting point of the analysis window in the token) and (2) the effects of trial and gender. The perturbation measures, and their coefficients of variation across windows starting at five different but adjacent cycles, depended on sample size. Their variability with respect to shifts in sample site decreased asymptotically as sample size increased. The data showed no statistically significant trial effect except for APQ and no statistically significant gender effect except for absolute jitter. It is speculated that relatively long smoothing windows for shimmer (such as the 11-cycle window used for APQ) allow the effects of slow vocal modulations (e.g., tremor) to be accentuated, especially for lower-pitched speakers.

Key Words: Perturbation; Sampling stability; Trial and gender effects.

Accepted July 20, 1992. Address correspondence and reprint requests to Dr. M. Jafari at Speech Pathology (126), VA Medical Center, 5901 E. Seventh Street, Long Beach, CA 90822, U.S.A.

Voice production is a complex process involving both the central and peripheral nervous systems, as well as muscular coalitions. The short-term temporal and amplitude stability of the voice signal provides insight into laryngeal control processes. It also sheds light on the interactions between various biomechanical and aerodynamic factors.

Even the normal voice contains inherent temporal and amplitude perturbations, which arise from various factors. Titze et al. (1) enumerated possible sources of perturbation, such as:
(a) randomness in the action potentials of laryngeal muscles, which creates fluctuations in the muscle forces and in the configuration of the larynx;
(b) randomness in the distribution of mucus on the folds, and asymmetries in vocal fold structures;
(c) randomness in the flow emerging from the glottis; and
(d) irregularity in source-vocal tract interactions that stems from nonstationary articulatory configurations.

Pathologic voices demonstrate perturbations (aperiodicities or irregularities) on a larger and perhaps qualitatively different scale. Many investigations have shown that dysphonic voices tend to show larger than normal aperiodicity. Baken (2) maintains that both frequency and amplitude perturbations are sensitive enough to pathological changes in the phonatory process to warrant their use in evaluating vocal performance. He also notes that amplitude perturbation is at least as important as frequency perturbation in determining perceived hoarseness (3,4), and possibly even more important (5).

Nevertheless, the conventional perturbation measures (2) in their present form are probably not refined enough to serve as a standard for vocal performance. Hillenbrand (6) has suggested that longer-term sequential characteristics of pitch or amplitude change should be quantified and combined with the more conventional measures of perturbation. Pinto and Titze (7) have discussed voice perturbation measures within the framework of signal theory. They recommend unifying a variety of existing jitter, shimmer, and noise measures on the basis of common underlying perturbation functions and their derivatives. Furthermore, Titze (8) investigated techniques for separating neurologic jitter from other sources of irregularity in vocal-fold vibration. In addition, conventional perturbation measures lack a standard procedure and analysis methodology, and they fail to classify voice disorders reliably (9).

A number of investigators have considered the technical aspects of jitter and shimmer extraction (1,10-12). However, several methodological issues require further investigation, most prominently reliability and representativeness. If measures of jitter and shimmer are to be useful for clinical assessment, these two issues must be resolved acceptably. This implies the need to specify:
- the standard task(s);
- the number of tokens;
- the sample size (a specific amount of time or number of cycles);
- the sampling location (where in the token); and
- the method of analysis.

Perturbation values are known to depend on the analysis window size (1,11-13), but it is not yet clear what a sufficient sample size should be for the various measures. Based on data from two normal, trained subjects, Titze et al. (1) concluded that both jitter and shimmer stabilized once ≥20-30 cycles were included in a window. Deem et al. (11) tested 12 untrained, young, normal speakers and found that jitter values were stable for sample durations of ≥40 cycles. Karnell (12) tested 18 adults with symptoms of possible laryngeal dysfunction (none to moderate hoarseness). He concluded that jitter analysis may require 190 cycles, and shimmer analysis 130 cycles, before the values approach an asymptote. All of these studies used the same relative measures of perturbation: the average absolute difference between consecutive pitch periods or amplitudes, divided by the average period or amplitude. There are no published data regarding adequate sample size for other conventional perturbation measures.
Moreover, comparisons among alternative measures for the same speakers might suggest which perturbation measure achieves statistical stability most rapidly as sample size increases. Such a measure would allow shorter sampling periods to obtain perturbation values that validly represent the steady-state vocal behavior. The effects of sampling location within the steady-state portion of a token have not been studied. It is likely that, even within the steady-state portion, a change of analysis window location would affect the calculated perturbation values.


This effect may differ among measures because of their different mathematical formulations. Furthermore, the sampling location effect is likely to interact with the analysis window size. It therefore seems appropriate to investigate how window size affects the stability of perturbation values calculated from shifted analysis windows. Suppose minimum window size(s) could be found at which specific perturbation measures become relatively shift invariant. The reliability of those measures would then be significantly enhanced.

Titze et al. (1) addressed the question of how many tokens are needed to obtain a relative measure of perturbation. Their results, based on data from two normal subjects, suggested that several tokens of sustained /a/ were required to obtain valid results. For clinical applications, the number of tokens required is an important practical issue. Additional data are needed to establish the minimum number of tokens for the various measures of perturbation.

The effects of gender and fundamental frequency on jitter have been studied to various degrees; however, these effects remain largely unexplored for shimmer (2). Absolute jitter is known to be lower for women (14). Furthermore, several investigators have reported a considerable association between larger cycle-to-cycle differences and longer fundamental periods (15-19). The opposite is expected for relative measures of jitter, because their formulas divide absolute perturbation by the mean period (2).

This paper investigated two questions:
(a) the stability of selected perturbation measures with respect to changes in sample size (number of pitch cycles included) and shifts in sample site (starting point of the window in the token); and
(b) the effects of trial and gender on vocal perturbation.

Three different measures of jitter were studied: jitter ratio (JR) (18), period variability index (PVI) (20), and relative average perturbation (RAP) (17).
The three shimmer measures were dB Shimmer (21), amplitude variability index (AVI) (20), and amplitude perturbation quotient (APQ) (5).

METHODS

Table 1 shows the formulas for the three jitter and three shimmer measures studied.

Subjects and procedure

Five male subjects and five female subjects were studied:
- males: mean age = 30 years (SD = 9.5, range = 21-43); mean fundamental frequency (fo) = 112.1 Hz (SD = 16.0, range = 89.3-139.9 Hz);
- females: mean age = 27.4 years (SD = 3.8, range = 22-33); mean fo = 221.2 Hz (SD = 15.2, range = 194.6-242.9 Hz).

All subjects were native speakers of American English without a history of speech, hearing, or neuromotor disorder and without training in voice control. Each subject was seated comfortably in a sound-treated booth (IAC Model 40) and wore a head-mounted microphone (Telex PH-91). The microphone was placed 50 mm from the mouth at an azimuth of approximately 45°. A test session consisted of three 3-s prolongations of the vowel /a/, produced as steadily as possible at a comfortable pitch and effort level. There was a 1-min rest period between trials, and each subject practiced the task at least twice.

Instrumentation

The microphone signal was conditioned through a preamplifier (Rane MS1) and an anti-aliasing filter (Wavetek Brickwall Filter 752A, low-pass cutoff frequency = 4 kHz). It was then digitized at 40 kHz with 12-bit quantization (RC Electronics ISC-16) on an 80386 PC. The initial 500 ms after the start of phonation was omitted from analysis.

Pitch periods were extracted from the remaining data with the automated acoustic analysis software incorporated in the Casper speech measurement system (22). Pitch detection combined a peak-picking method with an adaptive autocorrelation technique, followed by interpolated zero-crossing determination. Because the results of this investigation rely crucially on correct cycle-to-cycle segmentation, all preliminary pitch detection markings were inspected visually. No errors in pitch marking were observed.

Analysis

TABLE 1. Selected jitter and shimmer measures and corresponding perturbation functions

JR (18):
$$\mathrm{JR} = \frac{\frac{1}{n-1}\sum_{i=1}^{n-1}\left|P_i - P_{i+1}\right|}{\bar{P}}$$

PVI (20): formula not recoverable from the extracted text.

RAP (17):
$$\mathrm{RAP} = \frac{\frac{1}{n-2}\sum_{i=2}^{n-1}\left|\frac{P_{i-1}+P_i+P_{i+1}}{3} - P_i\right|}{\bar{P}}$$

dB Shimmer (21):
$$\text{dB Shimmer} = \frac{1}{n-1}\sum_{i=1}^{n-1}\left|20\log_{10}\frac{A_{i+1}}{A_i}\right|$$

AVI (20): formula not recoverable from the extracted text.

APQ (5):
$$\mathrm{APQ} = \frac{\frac{1}{n-10}\sum_{i=6}^{n-5}\left|\frac{1}{11}\sum_{k=-5}^{5}A_{i+k} - A_i\right|}{\bar{A}}$$

Notation:
- $\bar{P} = (1/n)\sum_{i=1}^{n}P_i$; $\bar{A} = (1/n)\sum_{i=1}^{n}A_i$.
- $P_i$ = duration of the ith pitch period; $A_i$ = peak-to-peak amplitude of the ith pitch cycle.
- JR, RAP, and APQ are reported as percentages in Table 5.

Abbreviations: JR, jitter ratio; PVI, period variability index; RAP, relative average perturbation; AVI, amplitude variability index; APQ, amplitude perturbation quotient.
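The four Table 1 measures whose formulas survive can be sketched in Python as shown below. This is a minimal sketch, not the Casper implementation. It assumes Python 3.8+ and that the pitch periods P_i and peak-to-peak amplitudes A_i have already been extracted. The function names are hypothetical. PVI and AVI are omitted because their formulas are not recoverable here. JR, RAP, and APQ are returned as fractions; multiply by 100 for percentages.

```python
"""Sketch of Table 1 measures (JR, RAP, dB Shimmer, APQ) plus absolute jitter.

Assumption: `periods` holds consecutive pitch periods P_i (any time unit) and
`amps` holds the peak-to-peak amplitudes A_i of the same cycles. Function
names are illustrative, not taken from any published software.
"""
import math
from statistics import fmean


def absolute_jitter(periods):
    """Mean absolute difference between consecutive periods (same unit as input)."""
    return fmean(abs(a - b) for a, b in zip(periods, periods[1:]))


def jitter_ratio(periods):
    """JR as a fraction: absolute jitter divided by the mean period."""
    return absolute_jitter(periods) / fmean(periods)


def rap(periods):
    """RAP as a fraction: mean |3-point average - P_i| for i = 2..n-1, over mean period."""
    devs = [abs((periods[i - 1] + periods[i] + periods[i + 1]) / 3 - periods[i])
            for i in range(1, len(periods) - 1)]
    return fmean(devs) / fmean(periods)


def db_shimmer(amps):
    """dB Shimmer: mean |20 log10(A_(i+1) / A_i)| over consecutive cycles."""
    return fmean(abs(20 * math.log10(b / a)) for a, b in zip(amps, amps[1:]))


def apq(amps, width=11):
    """APQ as a fraction: mean |11-point average - A_i| for i = 6..n-5, over mean amplitude."""
    half = width // 2
    devs = [abs(fmean(amps[i - half:i + half + 1]) - amps[i])
            for i in range(half, len(amps) - half)]
    return fmean(devs) / fmean(amps)


if __name__ == "__main__":
    # Ratio of group means (absolute jitter in ms, mean fo in Hz) reported in
    # Results; this only approximates the group-mean JR.
    for label, fo_hz, abs_jitter_ms in (("F", 221.2, 0.0211), ("M", 112.1, 0.0357)):
        mean_period_ms = 1000.0 / fo_hz
        print(label, f"{100 * abs_jitter_ms / mean_period_ms:.2f}% relative jitter")
```

The demo divides the reported group-mean absolute jitter by the mean period. It gives roughly 0.47% for females and 0.40% for males. This illustrates why a smaller absolute jitter can still correspond to a larger relative jitter at a higher fo.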

Data analysis consisted of two parts:
(a) investigation of the sampling stability of jitter and shimmer measures with respect to changes in sample size (number of pitch cycles included) and shifts in sample site (starting point of the window in the token); and
(b) examination of the effects of trial and gender on vocal perturbation.

Stability with respect to the window starting point was considered important. A perturbation measure that changes markedly when a different but adjacent cycle begins the analysis window is neither stable nor clinically desirable.

To investigate sampling stability, jitter and shimmer values were calculated for sample sizes ranging from 10 to 100 cycles. For each sample size, five separate perturbation values were computed using five different starting points (cycles 1, 2, 3, 4, and 5). The variability of each perturbation measure was indicated by the coefficient of variation (CV) of the perturbation values for the five starting points. Analyses of variance (ANOVAs) were performed on the CVs for each measure to study the effects of gender, trial, sample starting point, and sample size on sampling stability. Post-hoc Newman-Keuls tests were used to examine the nature of significant effects.

In addition, ANOVAs were used to examine the effects of trial and gender on jitter and shimmer. Data for these analyses were derived from 80-cycle samples to provide relatively stable measures of jitter and shimmer. Repeated measures ANOVA was used to investigate trial and gender effects. In all cases, statistical significance was inferred at p < 0.05.

RESULTS AND DISCUSSION

Sampling stability

An initial gender × trial × sample size ANOVA was performed on the CVs across the different starting sites. For every measure, it showed no significant main effect of gender or trial and no three-way interaction. However, the CVs for JR and RAP produced significant gender × sample size interactions, indicating that the effect of sample size was not the same for each gender group. Therefore, for these two measures, additional repeated measures ANOVAs for sample size were performed for each gender group. Because there was no significant main effect or interaction for trial, the data were pooled across the three trials for these analyses. Table 2 summarizes the statistical results for the jitter measures.

TABLE 2. ANOVA and Newman-Keuls comparison results for CVs of jitter measures

JR: gender NS; trial NS; sample size F = 38.5, p < 0.01 (males: F = 43.1, p < 0.01; females: F = 17.9, p < 0.01); gender × sample size interaction F = 3.17, p < 0.01.
PVI: gender NS; trial NS; sample size F = 4.66, p < 0.01; gender × sample size interaction none.
RAP: gender NS; trial NS; sample size F = 44.7, p < 0.001 (males: F = 25.3, p < 0.01; females: F = 25.0, p < 0.01); gender × sample size interaction F = 5.52, p < 0.01.

Newman-Keuls comparison of CV means for different sample sizes:
JR, males: (a) 10-cycle ≠ others (p = 0.01); (b) 20-cycle ≠ others (p < 0.05); (c) ≥30 cycles not different.
JR, females: (a) 10-cycle ≠ others (p = 0.01); (b) 20-cycle ≠ samples >50 cycles (p = 0.05); (c) ≥30 cycles not different.
RAP, males: (a) 10-cycle ≠ 20-cycle (p < 0.05); (b) 10- and 20-cycle ≠ others (p < 0.05); (c) ≥30 cycles not different.
RAP, females: (a) 10-cycle ≠ others (p < 0.05); (b) 20-cycle ≠ samples >50 cycles (p = 0.05); (c) ≥30 cycles not different.
PVI, males and females: (a) 10-cycle ≠ 20-cycle (p < 0.05); (b) >20 cycles not different.

NS = not statistically significant (p > 0.05). No three-way interaction was significant.

JR

The main effect of sample size was significant for both gender groups.

For the male group, Newman-Keuls contrasts of the CV means revealed three main findings. First, both the 10- and 20-cycle conditions differed significantly from samples with ≥30 cycles. Second, the 10- and 20-cycle conditions were significantly different from each other. Third, there were no significant differences among conditions with ≥30 cycles.

Newman-Keuls testing of the CV means for the female group revealed a similar, but not identical, pattern. First, the 10-cycle condition was significantly different from all other sample sizes. Second, the 20-cycle condition was significantly different from conditions with >50 cycles. Third, there were no significant differences among conditions with ≥30 cycles.

In general, the CVs suggest that, for both males and females, the variability in JR across starting points decreases with increasing sample size. It stays almost constant for samples >60 cycles.

The trend is also evident in Fig. 1, which illustrates mean JR values for different starting points and sample sizes for male and female subjects. The JR traces c1 through c5 diverge for sample sizes <40 cycles, in marked contrast to their near equality for windows >60 cycles. Interestingly, with increasing sample size, JR values decrease for males (Fig. 1B) and increase for females (Fig. 1A). Figure 1 shows that for small samples JR was larger for males than for females, whereas the opposite was true for larger samples.

FIG. 1. Average JR values (Ci denotes window starting cycle); A: females (F). B: males (M). [Plot not reproduced; x-axis: sample size (cycles).]

PVI

Newman-Keuls tests of the CV means showed that the 10-cycle condition was significantly different from the 20-cycle condition and from conditions with >30 cycles. There were no significant differences among conditions with ≥20 cycles. The PVI CV showed a decreasing trend with increasing sample size.

Figure 2 illustrates the behavior of average PVI values for different starting points and sample sizes for male and female subjects. Figure 2B shows that, for males, PVI values for samples >50 cycles are almost equal regardless of window site. In samples of ≤50 cycles, PVI values increase with increasing sample size for both males and females (Fig. 2). For samples of 60-100 cycles, however, PVI shows a decreasing trend for males (Fig. 2B) but continues to show an increasing trend for females (Fig. 2A).

One reason for the discrepancy may be that the average fo for females was about twice that of males. Consequently, 100 pitch periods for females lasted, on average, about as long as 50 pitch periods for males. As Fig. 2 shows, over almost the same amount of time (50 cycles for males and 100 cycles for females), PVI values demonstrate a similar increasing trend.

FIG. 2. Average PVI values (Ci denotes window starting cycle); A: females (F). B: males (M). [Plot not reproduced; x-axis: sample size (cycles).]

RAP

The ANOVAs performed separately on the male and female groups showed significant main effects of sample size.

For the male group, Newman-Keuls tests of the CV means showed three findings. First, the RAP CVs for the 10- and 20-cycle conditions were significantly different from those for conditions with more cycles. Second, the 10- and 20-cycle conditions were significantly different from each other. Third, conditions with ≥30 cycles did not differ significantly from each other.

For the female subjects, Newman-Keuls testing revealed a pattern of differences among the CV means similar, but not identical, to that of the males. First, the 10-cycle condition was significantly different from larger sizes. Second, the 20-cycle condition was significantly different from conditions with >40 cycles. Third, conditions with ≥30 cycles did not differ significantly.

In general, the data indicated that, for both males and females, the variability in RAP across starting points decreases with increasing sample size. It stays almost constant for samples >40 cycles.

The same trend is demonstrated in Fig. 3, which illustrates mean RAP values for different starting points and sample sizes for male and female subjects. RAP values for starting cycles 1-5 diverge for windows of ≤50 cycles, in marked contrast to their near equality for larger sample sizes. As with JR, RAP values decreased with increasing sample size for males (Fig. 3B) and increased for females (Fig. 3A). Figure 3 suggests that for small samples RAP is larger for males than for females, and vice versa for larger samples.

FIG. 3. Average RAP values (Ci denotes window starting cycle); A: females (F). B: males (M). [Plot not reproduced; x-axis: sample size (cycles).]

dB Shimmer, APQ, and AVI

Table 3 summarizes the statistical results for the shimmer measures considered.

For dB Shimmer, Newman-Keuls testing of the differences among the CV means for the combined gender groups showed that:
(a) the 20-cycle condition was significantly different from larger sizes;
(b) the 30-cycle condition was significantly different from conditions with >40 cycles;
(c) the 40-cycle condition was significantly different from conditions with >70 cycles; and
(d) the means for conditions with ≥50 cycles did not differ significantly.

For APQ, Newman-Keuls testing of the CV means for the combined gender groups showed that:
(a) the 20- and 30-cycle conditions were significantly different from larger sizes; and
(b) the 40-cycle condition was significantly different from conditions with ≥60 cycles.
The means for conditions with ≥50 cycles did not differ significantly.

For AVI, Newman-Keuls testing showed that only the mean for the 40-cycle condition was significantly different from the means of larger samples. However, as Table 4 indicates, the variability in the AVI CVs is quite substantial.

In general, the shimmer results indicated that the variability in dB Shimmer and APQ across starting points decreased with increasing sample size. It stayed almost constant for samples >80 cycles. The behavior of the AVI CV was more erratic; it did not show a distinct decreasing trend with increasing sample size.

Figures 4-6 show the behavior of the three shimmer measures with increasing sample size. It appears that dB Shimmer and APQ for females tend to stabilize for samples >60 cycles. However, all three shimmer measures showed relatively large intersubject variability.

TABLE 3. ANOVA and Newman-Keuls comparison results for CVs of shimmer measures

dB Shimmer: gender NS; trial NS; sample size F = 17.3, p = 0.0001; gender × sample size interaction none.
AVI: gender NS; trial NS; sample size F = 3.03, p = 0.006; gender × sample size interaction none.
APQ: gender NS; trial NS; sample size F = 93.9, p = 0.0001; gender × sample size interaction none.

Newman-Keuls comparison of CV means for different sample sizes (males and females combined):
dB Shimmer: (a) 20-cycle ≠ others (p = 0.05); (b) 30-cycle ≠ larger samples (p = 0.05); (c) 40-cycle ≠ samples >70 cycles (p = 0.05); (d) ≥70 cycles not different.
AVI: 40-cycle ≠ others (p < 0.05).
APQ: (a) 20- and 30-cycle ≠ others (p < 0.05); (b) 40-cycle ≠ 60-cycle or larger (p = 0.05); (c) ≥50 cycles not different.

NS = not statistically significant (p > 0.05). No three-way interaction was significant.

FIG. 4. Behavior of dB Shimmer with increasing sample size (Ci denotes window starting cycle); A: females (F). B: males (M). [Plot not reproduced; x-axis: sample size (cycles).]

FIG. 5. Behavior of AVI with increasing sample size (Ci denotes window starting cycle); A: females (F). B: males (M). [Plot not reproduced; x-axis: sample size (cycles).]

Comparative sampling stability of measures with sample size

ANOVAs were performed on the CVs over window site shifts. For all measures except PVI (for females) and AVI, they indicated that the variation in site-shift CV is minimal if an adequate number of cycles (>60) is included in the analysis window.

Table 4 summarizes the CV statistics for all six measures across five starting points for sample sizes of 60-100 cycles. The results for PVI, dB Shimmer, AVI, and APQ are pooled across gender because ANOVA indicated no significant gender effect. According to Table 4, JR, RAP, dB Shimmer, and APQ show decreasing variability with respect to local sample site shifts as sample size increases. Among the measures considered, the mean CV is largest for PVI (7.0%) and AVI (13.4%).

The choice of the first cycle after the initial 500 ms was arbitrary; it could have been, for example, 500, 565, or 700 ms. It is therefore reasonable to extend the conclusions regarding sampling stability to the entire steady-state portion of a token. That is, if enough cycles are included, the variability of a perturbation measure due to changing the window location in a token will be minimized. This improves the repeatability and reliability of measurements. For most of the measures examined in this study, the variability due to a change in window location will be very small if the analysis window contains ≥60 pitch cycles.
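The sampling-stability procedure can be sketched as follows. It uses windows of 10-100 cycles, five starting cycles, and the CV of the resulting five values. This is an illustrative reconstruction under stated assumptions, not the authors' software:
- JR stands in for any Table 1 measure.
- The CV uses the sample standard deviation; the text does not say which definition was used.
- `shift_cv` and `stability_curve` are hypothetical names.
- The demo token is synthetic.

```python
"""Sketch: CV of a perturbation measure across windows starting at cycles 1-5,
for window sizes of 10-100 cycles (assumed sample-SD definition of CV)."""
import random
from statistics import fmean, stdev


def jitter_ratio(periods):
    """JR as a fraction: mean |P_i - P_(i+1)| divided by the mean period."""
    diffs = [abs(a - b) for a, b in zip(periods, periods[1:])]
    return fmean(diffs) / fmean(periods)


def shift_cv(periods, measure, size, n_starts=5):
    """CV of `measure` over windows of `size` cycles starting at cycles 1..n_starts.

    Uses the sample SD; raises ZeroDivisionError if the measure averages zero.
    """
    values = [measure(periods[s:s + size]) for s in range(n_starts)]
    return stdev(values) / fmean(values)


def stability_curve(periods, measure, sizes=range(10, 101, 10), n_starts=5):
    """Map each window size to the CV of `measure` across shifted start cycles."""
    needed = max(sizes) + n_starts - 1  # the last window starts at cycle n_starts
    if len(periods) < needed:
        raise ValueError(f"need at least {needed} cycles, got {len(periods)}")
    return {size: shift_cv(periods, measure, size, n_starts) for size in sizes}


if __name__ == "__main__":
    # Hypothetical token: 200 periods around 8.92 ms (about 112 Hz) with Gaussian jitter.
    rng = random.Random(0)
    token = [8.92 + rng.gauss(0.0, 0.03) for _ in range(200)]
    for size, cv in stability_curve(token, jitter_ratio).items():
        print(f"{size:3d} cycles: CV = {cv:.3f}")
```

Passing a different measure function (for example, an APQ implementation applied to amplitudes) reuses the same procedure. Note that the longest window plus four extra starting cycles must fit inside the analyzed portion of the token.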

FIG. 6. Behavior of APQ with increasing sample size (Ci denotes window starting cycle); A: females (F). B: males (M). [Plot not reproduced; x-axis: sample size (cycles).]
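The PVI discussion equates 100 female cycles with 50 male cycles. That equivalence, and the later claim that an 11-cycle window spans about 100 ms for males but about 50 ms for females, follow from the mean fo values reported in Methods. The helper names below are hypothetical. Treating fo as constant within a token is a simplifying assumption.

```python
"""Convert window sizes in cycles to durations using the mean fo values
reported in Methods (an idealization that ignores within-token fo drift)."""

MEAN_FO_HZ = {"male": 112.1, "female": 221.2}


def window_ms(n_cycles, fo_hz):
    """Duration in ms of n consecutive cycles at a constant fo."""
    return 1000.0 * n_cycles / fo_hz


def cycles_in(duration_ms, fo_hz):
    """Approximate (rounded) number of cycles in a given duration."""
    return round(duration_ms * fo_hz / 1000.0)


for n in (11, 50, 80, 100):
    m = window_ms(n, MEAN_FO_HZ["male"])
    f = window_ms(n, MEAN_FO_HZ["female"])
    print(f"{n:3d} cycles: male {m:6.1f} ms, female {f:6.1f} ms")

# Analyzed portion of a 3-s token after discarding the first 500 ms.
for sex, fo in MEAN_FO_HZ.items():
    print(sex, cycles_in(2500.0, fo), "cycles in 2.5 s")
```

With these means, 50 male cycles last about 446 ms and 100 female cycles about 452 ms. The analyzed 2.5 s of a token holds roughly 280 cycles for the average male and 553 for the average female.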

Effects of trial and gender on jitter and shimmer Perturbation values derived from 80-cycle samples from the three trials were chosen for further investigation of trial and gender effects. The 80cycle samples were chosen because the results of the previous section suggested that they would be stable. The group mean and standard deviation for each gender and trial appear in Table 5 for each measure considered. In addition, because the three jitter measures studied were relative measures, average absolute cycle-to-cycle period jitter was also calculated and analyzed. I n d e p e n d e n t gender by trial ANOVAs were performed on each perturbation measure. Table 6 summarizes the A N O V A results. There were no significant interactions of gender by trial for any of the measures considered, except for APQ. Therefore, the main effects found for all measures except APQ were interpretable without further analysis. Absolute jitter showed no trial effect, but did show significant gender effect. Mean absolute jitter was smaller for females (mean = 0.0211 ms, SD = 0.0035 ms) than for males (mean = 0.0357, SD = 0.013). This confirms results reported in other studies (2,14). The significant interaction of trial by gender for APQ required additional ANOVAs to examine the effects of trial for each gender group and the effect of gender for each

TABLE 4. Coefficients of variation across five starting points for sample sizes >50 cycles (pooled over three trials) Coefficient of variation Measure JR

60 F M

PVI

RAP

F and M

F M

APQ

F and M

dB Shimmer

F and M

AVI

F and M

70

80

90

100

Mean (SD) 0.018 (0.003)

0.019

0.020

0.020

0.018

0.013

(0.006)

(0.005)

(0.006)

(0.006)

(0.003)

0.020 (0.006) 0.103

0,016 (0.003) 0.050

0.012 (0.006) 0.101

0.012 (0.002) 0.060

0.008 (0.002) 0.035

(0.180) 0.020 (0.008)

(0.057) 0,020 (0.007)

(0.164) 0.020 (0.008)

(0.087) 0.016 (0.005)

(0.023) 0.013 (0.005)

0.018 (0.003) 0.015 (0.005)

0.024

0.017

0.012

0.013

0.010

(0.008)

(0.005)

(0.004)

(0.004)

(0.004)

0.020

0.020

0.016

0.015

0.015

(0.007)

(0.006)

(0.005)

(0.007)

(0.006)

0.021 (0.005) 0.088

0.0t7 (0.006) 0.161

0.013 (0.006) 0.085

0.013 (0.005) 0.295

0.011 (0.006) 0.043

(0.097)

(0.236)

(0.122)

(0.715)

(0.053)

0.014 (0.004) 0.070 (0.031)

0.017 (0.003) 0.015 (0.004) 0.134 (0.099)

Values are means (SD). See Table 1 for list of abbreviations. F, female; M, male. Journal of Voice, Vol. 7, No. 4, 1993

334

M. J A F A R I E T A L . TABLE 5. Perturbation values extracted from

80-cycle samples Measure

Trial 1 mean (SD)

Trial 2 mean (SD)

Trial 3 mean (SD)

JR (%) M F M and F

0.38 (0.07) 0.44 (0.10) 0.41 (0.09)

0.37 (0.10) 0.40 (0.07) 0.38 (0.08)

0.39 (0.13) 0.56 (0.20) 0.48 (0.18)

PVI M F M and F

0.52 (0.20) 0.46 (0.26) 0.49 (0.22)

0.55 (0.20) 0.35 (0.16) 0.45 (0.20)

0.43 (0.29) 0.50 (0.19) 0.46 (0.23)

RAP (%) M F M and F

0.21 (0.05) 0.26 (0.06) 0.23 (0.06)

0.20 (0.07) 0.24 (0.30) 0.22 (0.06)

0.22 (0.08) 0.34 (0.12) 0.28 (0.11)

dB shimmer M F M and F

0.22 (0.12) 0.14 (0.03) 0.18 (0.09)

0.17 (0.09) 0.12 (0.03) 0.15 (0.07)

0.16 (0.08) 0.14 (0.05) 0.15 (0.06)

APQ (%) M F M and F

2.54 (1.36) 1.27 (0.09) 1.90 (1.13)

1.99 (1.08) 1.18 (0.26) 1.59 (0.86)

1.73 (0.95) 1.31 (0.34) 1.52 (0.71)

AVI M F M and F

0.51 (0.76) 0.50 (0.20) 0.51 (0.52)

0.50 (0.51) 0.64 (0.20) 0.57 (0.37)

0.26 (0.44) 0.68 (0.35) 0.47 (0.43)

from the third trial. One-way ANOVA of APQ values comparing gender groups for each of the three trials showed no significant gender effect. SUMMARY

See Table 1 for list of abbreviations. M, male; F, female.

trial. The ANOVA comparing the means for each trial showed a significant trial effect for males and no trial effect for females. Newman-Keuls followup testing revealed that the mean APQ values for males in the first trial was significantly different

The findings of this study confirm that sample size is a significant factor in establishing repeatable procedures for voice perturbation measurement (1, 11,12). The perturbation measures and their coefficients of variation for windows starting from five different but adjacent cycles showed a dependence on sample size. Variability with regard to shifts in sample site decreased asymptotically with increasing size. The data suggested no statistically significant trial effect for the measures studied, except for APQ for the male group. The absence of trial effect in this experiment suggests that for normal speakers analysis of one sample of at least 80 pitch cycles may be sufficient to characterize the jitter and shimmer in their sustained vowels. However, more than one sample may be necessary to adequately characterize vocal perturbations in disordered speakers. Concerning the significant trial effect for APQ in the male group, it is possible to attribute this effect to the specific male subjects sampled. However, Table 5 shows a trend toward reduction of shimmer from trial 1 to trial 3 for all three measures of shimmer for the males. Furthermore, if we assume that both male and female participants in this experiment were relatively more vocally unsteady (e.g., due to nervousness) during the first recording, then one

TABLE 6. A N O V A results f o r perturbation measures extracted f r o m 80-cycle samples Sex Measure

Trial

Interaction

F(I,8)

p

F(2,16)

p

F(2,16)

p

dB shimmer AVI APQ Trial 1 Trial 2 Trial 3

1.26 0.55 2.80 4.35 2.63 0.85

0,29 0.48 0,13 0,07 0.14 0.38

3.38 0.37 4.13 . . .

0.06 0.69 0.04 a

2.37 1.54 4.49

0.12 0.24 0.03 a

RAP PVI JR Ab solute jitter

3.69 0.64 2.07 5.90

0.09 0,45 0.19 0,04 a

3.55 0.09 3.12 2.34

1.59 0.84 1.71 0.57

0.23 0.45 0.21 0.58

. . .

Trial effect APQ M F RAP, M and F a

0.053 0.91 0.07 0.13

. . .

Newman-Keuls results F(2,8) = 5.95, p = 0.03 F(2,8) = 0.36, p = 0.71 F(2,18) =..3.33, p = 0.06

Statistically significant. See Table 1 for list of abbreviations. M, male; F, female.

Journal of Voice, Vol. 7, No. 4, 1993

. . .

Trial 1 different from trial 3 (p = 0.05) ---

FACTORS AFFECTING VOCAL PERTURBATIONS

may present the following speculation. Lowfrequency vocal amplitude tremors (4--6 Hz) could more effectively influence APQ and the male (lower pitch) group than the other shimmer measures or the female (higher pitch) group. This is because APQ is calculated by comparing each cycle's peakto-peak amplitude to the average peak-to-peak amplitude over an l 1-cycle smoothing window surrounding the current cycle. In our case, 11 cycles would encompass - 1 0 0 ms for the male speakers, but only 50 ms for the females. On the other hand, a 4--6 Hz amplitude modulation (or tremor) would have a period of 250-166 ms. Therefore, potential difference between the average window amplitude and current cycle amplitude (mid-window point) would be greater if the averaging is over 100 ms rather than 50 ms. In a sense, long smoothing windows serve to filter out short-term variations (higher-frequency variations), but may be more susceptible to the influence of low-frequency modulations. To summarize, if it is assumed that slow vocal amplitude variations are independent of pitch frequency, then their effect on smoothing window amplitude average would be a function of fo. Namely, for a window composed of a fixed number of pitch cycles, longer fundamental periods imply longer time windows and consequently different window amplitude averages. The data showed no statistically significant gender effect, except for absolute jitter. Absolute jitter was smaller for females. However, analysis of absolute jitter values for all subjects showed a significant negative correlation (r = -0.75) between absolute jitter value and fo. It can be suggested that absolute jitter may in fact be a function of f0 (decreasing with increasing frequency) rather than gender per se. The results for JR, PVI, and RAP showed different patterns of change with increasing sample size for males and females. For both JR and RAP, the males showed decreasing perturbation with increasing sample size (Fig. 1B and 3B). 
In contrast, the females demonstrated increasing perturbation with increasing sample size (Figs. 1A and 3A). Therefore, within the range of measurements obtained in this experiment, the diverging trends of JR and RAP for males and females suggest that comparisons of perturbation values between genders may depend critically on window size. Also, if the optimum sample size is taken to be the one that minimizes the measured perturbation (12), then different optimal sample sizes for JR and RAP might be found for males and females. PVI values for windows of 60-100 cycles, on the other hand, decreased for males (Fig. 2B) but continued to increase for females (Fig. 2A). One reason for the discrepancy may be that the average f0 for females was twice that of males; therefore, 100 pitch periods for the average female lasted about as long as 50 pitch periods for the average male. Thus, investigating the temporal behavior of perturbation measures may require a proportionately larger number of pitch cycles for higher-pitched subjects. Also, when male and female perturbation measures are derived from a fixed sample duration (rather than a fixed number of cycles), the difference in fundamental periods may be noteworthy: an average female produces considerably more pitch cycles within a given token duration, which strengthens the averaging effect described by the central limit theorem for typical female subjects.

The male-female differences in the trend of perturbation behavior can be attributed to two related factors: (a) the voice source systems of males and females are physiologically different (23); and (b) if samples are measured in pitch periods, equal sample sizes for males and females span unequal time intervals. This implies that the steadiness of the system is compared under different structural conditions and over different durations.

Although only normal speakers were studied, the present findings have some limited implications for clinical practice. One clinical concern is where to begin sampling in the sustained vowel. Our results show that when analyses include about 40 or fewer cycles, some variance is introduced by the specific cycle at which the analysis starts. However, if larger sets (more than 80 cycles) are analyzed, the exact starting cycle makes little difference.
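The APQ speculation earlier in this section and the start-site results can be explored with a small simulation. The sketch below assumes a common 11-point APQ formulation (each cycle's absolute deviation from the 11-cycle average centered on it, averaged and divided by the mean amplitude), a hypothetical amplitude model with a 5 Hz sinusoidal tremor plus small random cycle-to-cycle variation, and a hypothetical helper `start_site_cv` that recomputes a measure from five adjacent starting cycles, in the spirit of this study's time-shift procedure. It does not reproduce the authors' analysis software, and the parameter values (f0 of 110 and 220 Hz, 5% tremor depth, 1% noise) are illustrative assumptions only.

```python
import math
import random
from statistics import mean, pstdev


def amplitude_train(f0_hz, n_cycles, tremor_hz=5.0, tremor_depth=0.05,
                    noise_sd=0.01, seed=0):
    """Hypothetical peak-to-peak amplitude for each glottal cycle.

    A slow sinusoidal tremor (fixed in hertz, so independent of f0) scales a
    unit amplitude; each cycle also gets small independent Gaussian variation.
    """
    rng = random.Random(seed)
    amps = []
    for i in range(n_cycles):
        t = i / f0_hz  # start time (s) of cycle i, assuming a steady f0
        tremor = 1.0 + tremor_depth * math.sin(2.0 * math.pi * tremor_hz * t)
        amps.append(tremor * (1.0 + rng.gauss(0.0, noise_sd)))
    return amps


def apq11(amps):
    """11-point amplitude perturbation quotient (%): mean absolute deviation of
    each cycle from the 11-cycle average centered on it, relative to the mean."""
    if len(amps) < 11:
        raise ValueError("APQ-11 needs at least 11 cycles")
    devs = [abs(amps[i] - mean(amps[i - 5:i + 6]))
            for i in range(5, len(amps) - 5)]
    return 100.0 * mean(devs) / mean(amps)


def start_site_cv(measure, seq, window, n_sites=5):
    """Coefficient of variation (%) of `measure` over windows of `window`
    cycles that begin at adjacent cycles 0, 1, ..., n_sites - 1."""
    if len(seq) < window + n_sites - 1:
        raise ValueError("sequence too short for the requested shifts")
    values = [measure(seq[s:s + window]) for s in range(n_sites)]
    return 100.0 * pstdev(values) / mean(values)


if __name__ == "__main__":
    for f0 in (110.0, 220.0):  # roughly typical male and female f0
        amps = amplitude_train(f0, 300)
        span_ms = 11 * 1000.0 / f0  # duration of the 11-cycle APQ window
        print(f"f0 = {f0:.0f} Hz: 11 cycles span {span_ms:.0f} ms, "
              f"APQ = {apq11(amps[:200]):.2f}%")
        for window in (20, 100):
            cv = start_site_cv(apq11, amps, window)
            print(f"  {window:>3}-cycle windows: start-site CV = {cv:.1f}%")
```

Under these assumptions, the same 5 Hz tremor raises APQ more at 110 Hz, where the 11-cycle window spans about 100 ms, than at 220 Hz, where it spans about 50 ms, and the start-site CV is larger for 20-cycle windows than for 100-cycle windows. Whether real vocal tremor behaves like this idealized sinusoid is itself an assumption.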
If similar principles operate for abnormal speakers, then, provided the sample size is adequate and onset/offset effects are avoided, the clinician can sample anywhere in the nominal "middle" of the sustained vowel. Of course, defining an adequate sample size for dysphonic speakers is not simple (12). Another clinical issue is how many trials to elicit. We did not find robust statistical evidence of trial-to-trial differences in the present study of normal speakers. However, for dysphonic speakers, the vocal representativeness of a single trial is likely to vary with etiology and pathophysiology. Some patients may have trial stability similar to that of the normal speakers we studied; if so, one trial may be sufficient. Others may exhibit more variation in voice quality and require analysis of multiple trials to characterize their specific dysphonia. Another clinical question is which measure(s) of perturbation to use. Our study was not designed to address this issue directly. However, the CVs shown in Table 4 suggest that PVI and AVI were both markedly more variable than the other measures under the experimental conditions studied. All other factors being equal, the more stable measures of perturbation would be clinically preferable. Finally, for certain perturbation measures, males and females showed dissimilar group trends as sample size increased (see Figs. 1-6). This reiterates the need for more data on, and better understanding of, gender differences in phonation. Future research may help determine whether separate sets of clinical normative values must be applied to male and female subjects.

In general, the reliability and discriminating power of vocal perturbation measures may be enhanced by refining them with a stochastic-physiologic model of vocal fold vibration patterns. In this way, normal random variations, as well as neurologic perturbations, may be separated from other components. Partitioning the total perturbation into its subcomponents allows a more detailed investigation of how various experimental conditions and vocal disorders affect the voice. Hillenbrand (24) investigated the sequential characteristics of naturally occurring voice signals (vowel /a/) obtained from normal subjects. From the Fourier transforms and autocorrelation functions of the pitch and amplitude data, he suggested that these sequences were similar, but not identical, to the 1/f distributions described by Mandelbrot (25).
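Hillenbrand's observation concerns the spectrum of the cycle-by-cycle pitch and amplitude sequences. One common way to summarize how close such a sequence is to 1/f behavior is to fit a straight line to its log-log periodogram and report the exponent beta in P(f) ~ 1/f^beta (beta near 0 for white noise, 1 for 1/f, 2 for a random walk). The sketch below is a minimal illustration of that idea under stated assumptions, not Hillenbrand's procedure; the function names (`periodogram`, `spectral_exponent`, `power_law_sequence`) are hypothetical, and a naive DFT is used so the example needs only the standard library.

```python
import cmath
import math
import random


def periodogram(x):
    """Power at DFT bins k = 1 .. N/2 - 1 of the mean-removed sequence x
    (naive O(N^2) DFT, adequate for a few hundred cycles)."""
    n = len(x)
    mu = sum(x) / n
    y = [v - mu for v in x]
    powers = []
    for k in range(1, n // 2):
        s = sum(y[t] * cmath.exp(-2j * math.pi * k * t / n) for t in range(n))
        powers.append((k, abs(s) ** 2 / n))
    return powers


def spectral_exponent(x):
    """Estimate beta in P(f) ~ 1 / f**beta from a least-squares line fitted to
    log power versus log frequency (beta = minus the fitted slope)."""
    pts = [(math.log(k), math.log(p)) for k, p in periodogram(x) if p > 0.0]
    mx = sum(a for a, _ in pts) / len(pts)
    my = sum(b for _, b in pts) / len(pts)
    sxy = sum((a - mx) * (b - my) for a, b in pts)
    sxx = sum((a - mx) ** 2 for a, _ in pts)
    return -sxy / sxx


def power_law_sequence(n, beta, seed=0):
    """Hypothetical 1/f**beta test sequence by spectral synthesis: one cosine
    per DFT bin k with amplitude k**(-beta/2) and a random phase."""
    rng = random.Random(seed)
    x = [0.0] * n
    for k in range(1, n // 2):
        amp = k ** (-beta / 2.0)
        phase = rng.uniform(0.0, 2.0 * math.pi)
        for t in range(n):
            x[t] += amp * math.cos(2.0 * math.pi * k * t / n + phase)
    return x


if __name__ == "__main__":
    n = 256  # e.g. 256 consecutive period (or amplitude) values
    rng = random.Random(1)
    white = [rng.gauss(0.0, 1.0) for _ in range(n)]
    walk, level = [], 0.0
    for step in white:  # cumulative sum of white noise = random walk
        level += step
        walk.append(level)
    for name, seq in (("white", white), ("1/f", power_law_sequence(n, 1.0)),
                      ("random walk", walk)):
        print(f"{name:>12}: beta = {spectral_exponent(seq):.2f}")
```

Applied to a measured period or amplitude sequence, a fitted beta between the white-noise and random-walk values would be one way to express "similar, but not identical" to 1/f behavior; a real analysis would also need to consider detrending, spectral windowing, and the reliability of the lowest-frequency bins.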
Furthermore, Titze (8) has pioneered techniques for separating neurologic jitter from other sources of irregularity in vocal fold vibration. More effort is needed to better understand the nature of voice perturbations, to incorporate the sequential properties of voice signals, and to develop standard measurement and analysis procedures that promote validity, reliability, and discriminating power.

Acknowledgment: This work was supported in part by the Rehabilitation Research and Development Service, Project C468-R, Department of Veterans Affairs, Washington, D.C.

REFERENCES

1. Titze IR, Horii Y, Scherer RC. Some technical considerations in voice perturbation measurements. J Speech Hear Res 1987;30:252-60.
2. Baken RJ. Clinical measurement of speech and voice. Boston: College-Hill Press, 1987.
3. Wendahl RW. Some parameters of auditory roughness. Folia Phoniatr (Basel) 1966a;18:26-32.
4. Wendahl RW. Laryngeal analog synthesis of jitter and shimmer auditory parameters of harshness. Folia Phoniatr (Basel) 1966b;18:98-108.
5. Takahashi H, Koike Y. Some perceptual dimensions and acoustical correlates of pathologic voices. Acta Otolaryngol Suppl (Stockh) 1975;338:1-24.
6. Hillenbrand J. Perception of aperiodicities in synthetically generated voices. J Acoust Soc Am 1988;83:2361-71.
7. Pinto NB, Titze IR. Unification of perturbation measures in speech signals. J Acoust Soc Am 1990;87:1278-89.
8. Titze IR. A model for neurologic sources of aperiodicity in vocal fold vibration. J Speech Hear Res 1991;34:460-72.
9. Ludlow CL, Bassich CJ, Conner NP, Coulter DC, Lee YJ. The validity of using phonatory jitter and shimmer to detect laryngeal pathology. In: Baer T, Sasaki C, Harris K, eds. Laryngeal function in phonation and respiration. Boston: College-Hill Press, 1987:492-508.
10. Doherty ET, Shipp T. Tape recorder effects on jitter and shimmer extraction. J Speech Hear Res 1988;31:485-90.
11. Deem JF, Manning WH, Knack JV, Matesich JS. The automatic extraction of pitch perturbation using microcomputers: some methodological considerations. J Speech Hear Res 1989;32:689-97.
12. Karnell MP. Laryngeal perturbation analysis: minimum length of analysis window. J Speech Hear Res 1991;34:544-8.
13. Jafari M. Acoustic characterization and modeling of human neuromotor speech performance toward pathology pattern recognition [Ph.D. dissertation]. University of Texas at Arlington/University of Texas Southwestern Medical Center at Dallas, 1989.
14. Schoentgen J. Jitter in sustained vowels and isolated sentences produced by dysphonic speakers. Speech Communication 1989;8:61-79.
15. Lieberman P. Some acoustic measures of the fundamental periodicity of normal and pathologic larynges. J Acoust Soc Am 1963;35:344-53.
16. Beckett RL. Pitch perturbation as a function of subjective vocal constriction. Folia Phoniatr (Basel) 1969;21:416-25.
17. Koike Y. Application of some acoustic measures for the evaluation of laryngeal dysfunction. Stud Phonol 1973;7:17-23.
18. Horii Y. Fundamental frequency perturbation observed in sustained phonation. J Speech Hear Res 1979;22:5-19.
19. Horii Y. Vocal shimmer in sustained phonation. J Speech Hear Res 1980;23:202-9.
20. Deal RE, Emanuel FW. Some waveform and spectral features of vowel roughness. J Speech Hear Res 1978;21:250-64.
21. Horii Y. Some statistical characteristics of voice fundamental frequency. J Speech Hear Res 1975;18:192-201.
22. Till JA. Computer-assisted speech evaluation: rationale and direction for the future. J Comput Users Speech Hear 1990;6:134-48.
23. Titze IR. Physiologic and acoustic differences between male and female voices. J Acoust Soc Am 1989;85:1699-707.
24. Hillenbrand J. A methodological study of perturbation and additive noise in synthetically generated voice signals. J Speech Hear Res 1987;30:448-61.
25. Mandelbrot B. The fractal geometry of nature. San Francisco: Freeman, 1983.