Acoustic characteristics of rough voice: Subharmonics



Journal of Voice Vol. 11, No. 1, pp. 40-47 © 1997 Lippincott-Raven Publishers, Philadelphia

*Koichi Omori, †Hisayoshi Kojima, *Rajesh Kakani, *David H. Slavit, and *Stanley M. Blaugrund

*Ames Vocal Dynamics Laboratory, Lenox Hill Hospital, New York, New York, U.S.A.; and †Department of Otolaryngology, Kyoto University Hospital, Kyoto, Japan

Summary: This study investigates the relationship between rough voice and the presence of subharmonics, which correspond to smaller but distinct peaks located between two consecutive harmonic peaks in the power spectrum. Spectrum analysis was undertaken in 389 pathologic voices, of which 20 had subharmonics. Although all 20 voices were perceived as rough, 8 had normal jitter and/or shimmer. The degree of roughness had a significant inverse relationship with the frequency of subharmonics. Sound samples with various types of subharmonics were synthesized by digital signal processing and analyzed perceptually. The power and frequency of subharmonics in the synthesized sounds also had significant relationships with the degree of roughness. Rough voice is acoustically characterized not only by jitter and shimmer but also by the presence of subharmonics in the power spectrum. Subharmonics are important acoustic properties for the objective evaluation of rough voices. Key Words: Rough voice--Spectrum analysis--Subharmonics--Sound synthesis--Perceptual analysis.

Hoarseness in a vowel sound originates either from fluctuation of vocal fold vibration or from turbulent airflow at the glottis. Rough voice corresponds to irregular vocal fold vibration, whereas breathy voice corresponds to turbulent noise arising at the glottis. A soft swelling or a mass imbalance of the vocal fold is usually the cause of a rough voice. Jitter and shimmer have generally been used as parameters of roughness (1-4). However, Omori et al. (5) reported a different acoustic abnormality in some pathologic voices with roughness, especially in cases of laryngeal polyp or polypoid degeneration: lower peaks between two consecutive harmonic peaks in the power spectrum. These lower peaks in the power spectrum are termed "subharmonics."

Titze (6) noted that a subharmonic frequency sometimes appears in asymmetric vocal fold vibration, which is evidence that periodicity is achieved only across every second or third cycle. This study investigates the relationship between rough voice and the existence of subharmonics in the power spectrum. In the first part of the study, the existence of subharmonics was determined by examining the power spectrum of various pathologic voices. In the second part, sound samples with subharmonics in the power spectrum but zero jitter and shimmer were synthesized and analyzed perceptually to determine whether subharmonics affect the perception of roughness.
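Why an alternation every second cycle appears as a subharmonic can be checked numerically. The sketch below is an illustration only, not part of the authors' analysis: it builds a hypothetical 200 Hz waveform (sampled at an assumed 8 kHz) from sine-shaped cycles, once with equal amplitudes and once with amplitudes alternating between 1.0 and 0.6, and evaluates the discrete Fourier transform at F0/2 = 100 Hz. The function names `cycle_train` and `dft_magnitude` are hypothetical.

```python
import cmath
import math


def cycle_train(f0, fs, n_cycles, amps):
    """Concatenate sine-shaped cycles at f0; cycle c is scaled by amps[c % len(amps)].

    fs / f0 is assumed to be an integer so that every cycle has the same length.
    """
    period = round(fs / f0)  # samples per cycle (40 for 200 Hz at 8 kHz)
    x = []
    for c in range(n_cycles):
        a = amps[c % len(amps)]
        x.extend(a * math.sin(2 * math.pi * i / period) for i in range(period))
    return x


def dft_magnitude(x, fs, freq):
    """Magnitude of the discrete Fourier transform of x evaluated at one frequency (Hz)."""
    return abs(sum(v * cmath.exp(-2j * math.pi * freq * n / fs) for n, v in enumerate(x)))


FS, F0 = 8000, 200.0
steady = cycle_train(F0, FS, 40, (1.0,))           # identical cycles: period = 1 cycle
alternating = cycle_train(F0, FS, 40, (1.0, 0.6))  # large/small cycles: period = 2 cycles

for name, x in (("steady", steady), ("alternating", alternating)):
    print(name, "F0/2:", round(dft_magnitude(x, FS, F0 / 2), 3),
          "F0:", round(dft_magnitude(x, FS, F0), 3))
```

The steady train has (numerically) zero magnitude at F0/2, because its true period is one cycle; the alternating train repeats only every two cycles, so its spectrum gains lines at odd multiples of F0/2, which is the pattern the paper calls subharmonics.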

Accepted February 14, 1996. Address correspondence and reprint requests to Dr. Koichi Omori, Department of Otolaryngology, Kyoto University Hospital, 54 Kawahara-cho, Shogoin, Sakyo-ku, Kyoto 606, Japan. This paper was presented at the 23rd Annual Symposium: Care of the Professional Voice, Philadelphia, June 10, 1994.

SUBJECTS AND METHODS

Spectrum analysis

We studied 389 subjects with pathologic voices. The distribution by sex and age is shown in Table 1. Each person was asked to phonate the vowel /a/ at a comfortable speaking level. A B&K 4006 condenser microphone and a Sony PCM2500A digital audio tape (DAT) recorder were used for data collection. Recordings were made in a sound-treated booth, and the distance from the microphone was set at 20 cm. The frequency response of the microphone was 20 Hz to 40 kHz. After the data were collected, each sample was checked for peak clipping to ensure that an appropriate recording level had been used. The data were stored on an optical disc through a 16-bit analog-to-digital converter at a sampling rate of 44.1 kHz. Data acquisition was done on a Macintosh IIx computer (Apple Computer Inc., Cupertino, CA, U.S.A.) with Sound Designer II sound editing software (Digidesign Inc., Menlo Park, CA, U.S.A.). The DAT recorder had a flat frequency response up to 20 kHz.

TABLE 1. Subjects

Age (years)   Male   Female
0-9           2      0
10-19         1      2
20-29         21     21
30-39         29     40
40-49         32     41
50-59         32     41
60-69         43     28
70-79         17     27
80-89         8      4
Total         185    204

A steady 185.8-msec portion containing 8,192 data points of each sustained vowel /a/ was selected for analysis. The power spectrum of the acoustic signal was then calculated by fast Fourier transform using a Hanning window. All peaks in the power spectrum between 0 Hz and 3 kHz were identified. The fundamental frequency (the frequency of the high peaks) and the existence, number, and frequency of subharmonics were determined. When the frequency interval between a lower peak and the adjacent high harmonic peak was one half, one third, or another integer fraction of the frequency interval between two consecutive high harmonic peaks, the lower peak was accepted as a subharmonic. The number of subharmonics was counted between two consecutive high harmonic peaks. The frequency of a subharmonic was defined as the frequency interval between a high harmonic peak and the adjacent lower subharmonic peak. In the example power spectrum of a rough voice in Fig. 1A, the fundamental frequency is 235 Hz, the number of subharmonics is one, and the frequency of the subharmonic is 118 Hz. In the waveform of Fig. 1B, the fundamental frequency is also 235 Hz, because one cycle of the waveform lasts 4.26 msec.

FIG. 1. A: Power spectrum of case 18. Black arrowheads indicate subharmonic peaks; white arrowheads indicate the original harmonic peaks. B: Waveform of case 18. One cycle (4.26 msec) corresponds to an F0 of 235 Hz.

Acoustic analysis of jitter and shimmer was also undertaken based on previous reports (1-3). Jitter and shimmer were calculated from the acoustic signals and expressed as the average difference in pitch and in amplitude, respectively, among 50 successive cycles, as a percentage of the average pitch and amplitude. All acoustic analyses were performed using software written by members of the Ames Vocal Dynamics Laboratory (7). Normative data for jitter and shimmer were obtained by analyzing 50 male and 50 female volunteers without laryngeal disease at the Ames Vocal Dynamics Laboratory, Lenox Hill Hospital.

Stored voice samples that had subharmonics in the power spectrum were played back through a 16-bit D/A converter (Digidesign) and a Roommate II speaker with a frequency response from 50 Hz to 15 kHz (Bose Co., Boston, MA, U.S.A.). These voice samples were judged perceptually by five experienced listeners (2 otolaryngologists, 3 speech pathologists). Each voice sample was presented twice. Roughness was scored 0, 1, 2, or 3, with 3 being the most rough and 0 being normal. For each voice sample, the roughness scores of the five listeners were averaged and classified into four categories, from score 0 through 3.
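The framing arithmetic can be made concrete: 8,192 samples at 44.1 kHz span 8192/44100 ≈ 185.8 msec, and the FFT bins are 44100/8192 ≈ 5.38 Hz apart. The following sketch computes a Hann-windowed (Hanning) power spectrum in dB and lists its local peaks below 3 kHz. It is a minimal illustration, not the Ames laboratory software (7): a small recursive radix-2 FFT is written out to keep it dependency-free (NumPy's `numpy.fft.rfft` would be the usual choice), and the peak threshold `floor_db` is an assumed parameter, since the paper does not say how peaks were selected. The test frame is hypothetical.

```python
import cmath
import math


def fft(x):
    """Recursive radix-2 Cooley-Tukey FFT; len(x) must be a power of two."""
    n = len(x)
    if n == 1:
        return [complex(x[0])]
    even, odd = fft(x[0::2]), fft(x[1::2])
    out = [0j] * n
    for k in range(n // 2):
        twiddled = cmath.exp(-2j * math.pi * k / n) * odd[k]
        out[k] = even[k] + twiddled
        out[k + n // 2] = even[k] - twiddled
    return out


def power_spectrum_db(frame, fs):
    """Hann-windowed power spectrum from 0 Hz to fs/2, in dB relative to the largest bin."""
    n = len(frame)
    window = [0.5 - 0.5 * math.cos(2 * math.pi * i / (n - 1)) for i in range(n)]
    spectrum = fft([s * w for s, w in zip(frame, window)])
    power = [abs(c) ** 2 for c in spectrum[: n // 2 + 1]]
    ref = max(power)
    levels = [10 * math.log10(p / ref) if p > 0 else -300.0 for p in power]
    freqs = [k * fs / n for k in range(n // 2 + 1)]
    return freqs, levels


def spectral_peaks(freqs, levels, fmax=3000.0, floor_db=-25.0):
    """Local maxima up to fmax whose level exceeds floor_db (an assumed threshold)."""
    peaks = []
    for k in range(1, len(levels) - 1):
        if freqs[k] > fmax:
            break
        rising = levels[k] >= levels[k - 1]
        falling = levels[k] > levels[k + 1]
        if rising and falling and levels[k] > floor_db:
            peaks.append((freqs[k], levels[k]))
    return peaks


# Hypothetical frame loosely modeled on case 18: equal-level harmonics of 235 Hz
# plus lines at odd multiples of 117.5 Hz that are 10 dB weaker.
FS, N, F0_HZ = 44_100, 8_192, 235.0
LINES = [(k * F0_HZ, 1.0) for k in range(1, 13)]
LINES += [((k + 0.5) * F0_HZ, 10 ** (-10 / 20)) for k in range(12)]
frame = [sum(a * math.sin(2 * math.pi * f * i / FS) for f, a in LINES) for i in range(N)]
freqs, levels = power_spectrum_db(frame, FS)
peaks = spectral_peaks(freqs, levels)
```

Each synthetic line yields one peak within half a bin of its true frequency; deciding which of those peaks are harmonics and which are subharmonics is a separate step.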

Sound synthesis and perceptual analysis

To confirm that subharmonics affect the perception of roughness, artificial /a/ sound samples that had subharmonics but zero jitter and shimmer were synthesized on a Macintosh IIcx computer (Apple Computer). Predefined power spectra were converted into sound waves by inverse fast Fourier transform using Alchemy software (Passport Designs, Inc., Half Moon Bay, CA, U.S.A.). The sampling rate was set at 10 kHz, and each sound wave contained 5,000 points (500 msec). The phase of the waveform was not fixed for the synthesized sounds. Synthesized sound samples were played back through a 16-bit D/A converter and a Bose Roommate II speaker, and the degree of roughness was judged by the five listeners using the method of perceptual analysis described above.

Two different methods were used to synthesize the sound samples. In method 1, the power of the subharmonics was varied while the frequency of the subharmonics was fixed at 75 Hz. The acoustic signal had a fundamental frequency of 150 Hz and included nine harmonics. At the 4th and 7th harmonics above the fundamental (750 Hz and 1,200 Hz), the power was set 10 dB higher than that of the other harmonics to produce the first and second formants of the vowel /a/. Fig. 2 shows the power spectra of the synthesized sound samples in method 1. Test number 1 had no subharmonics. Test numbers 2, 3, 4, and 5 had subharmonics with power levels of -20, -15, -10, and -5 dB relative to the original harmonics, respectively.

FIG. 2. Predefined power spectra of synthesized sounds in method 1 (panels 1-5: level in dB versus frequency, 0-1.5 kHz).

In method 2, the frequency of the subharmonics was varied while their power was fixed at -10 dB relative to the power level of the original harmonics. The fundamental frequency was set at 150 Hz with nine harmonics, and at the 4th and 7th harmonics (750 Hz and 1,200 Hz) the power was set 10 dB higher than that of the other harmonics. The frequency of the subharmonics was 75 Hz or 50 Hz. Fig. 3 shows the power spectra of the synthesized sound samples in method 2. Test number 1 had no subharmonics. Test number 2 had 75 Hz subharmonics, so one small peak occurs between consecutive high peaks. Test number 3 had 50 Hz subharmonics, so two small peaks occur between consecutive high peaks.

RESULTS

Spectrum analysis

Using spectrum analysis, subharmonics were identified in 20 of the 389 pathologic voices. The sex, age, and diagnosis of these 20 patients are shown in Table 2. In the 20 voices, the fundamental frequency ranged from 101 Hz to 272 Hz and the frequency of subharmonics ranged from 51 Hz to 132 Hz (Table 3). All 20 voices that had subharmonics were judged rough by perceptual analysis.
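The identification rule from Subjects and Methods can be written as a small classifier over spectral peak frequencies. In this sketch (the names `classify_peaks` and `summarize` are hypothetical), F0 is supplied by the caller, for example from the waveform period as in Fig. 1B. A peak within `tol`·F0 of a harmonic k·F0 counts as a harmonic; any other peak is accepted as a subharmonic when its distance to the nearest harmonic, divided by F0, is close to m/n for some n up to `n_max`. The values of `tol` and `n_max` are assumptions, since the paper gives no numeric tolerance, and measuring the distance to the ideal position k·F0 rather than to the measured harmonic peak is a simplification.

```python
from statistics import median


def _near_fraction(ratio, n_max, tol):
    """True if ratio lies within tol of m/n for some 2 <= n <= n_max and m >= 1."""
    for n in range(2, n_max + 1):
        m = round(ratio * n)
        if m >= 1 and abs(ratio - m / n) <= tol:
            return True
    return False


def classify_peaks(peak_freqs, f0, n_max=4, tol=0.04):
    """Split spectral peak frequencies (Hz) into harmonics and subharmonics of f0.

    Returns (harmonics, subharmonics); each subharmonic is a pair
    (peak frequency, interval to the nearest harmonic position k * f0).
    """
    harmonics, subharmonics = [], []
    for f in peak_freqs:
        k = max(1, round(f / f0))  # nearest harmonic number; the fundamental is k = 1
        interval = abs(f - k * f0)
        if interval <= tol * f0:
            harmonics.append(f)
        elif _near_fraction(interval / f0, n_max, tol):
            subharmonics.append((f, interval))
        # peaks matching neither test are ignored
    return harmonics, subharmonics


def summarize(peak_freqs, f0, **kwargs):
    """(number of subharmonics between f0 and 2*f0, median subharmonic frequency in Hz)."""
    _, subs = classify_peaks(peak_freqs, f0, **kwargs)
    count = sum(1 for f, _ in subs if f0 < f < 2 * f0)
    frequency = median(interval for _, interval in subs) if subs else None
    return count, frequency
```

With idealized peaks for case 18 (multiples of 117.5 Hz from 117.5 to 705 Hz, F0 = 235 Hz), `summarize` returns one subharmonic at 117.5 Hz; with peaks at multiples of 272/3 Hz from 272 to 816 Hz and F0 = 272 Hz (case 16), it returns two subharmonics at about 90.7 Hz.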

FIG. 3. Predefined power spectra of synthesized sounds in method 2 (panels 1-3: level in dB versus frequency, 0-1.5 kHz).
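The predefined line spectra of methods 1 and 2 (Figs. 2 and 3) can be sketched as follows. Several details are assumptions, since the paper gives only the parameters quoted in Subjects and Methods: the spectrum comprises the 150 Hz fundamental plus nine harmonics (150 to 1,500 Hz, matching the 0 to 1.5 kHz axes of the figures); subharmonic lines sit at every multiple of the subharmonic frequency that is not a harmonic, up to 1.5 kHz; levels are in dB relative to the unboosted harmonics; and phases are random, reflecting the statement that phase was not fixed. Direct additive synthesis (a sum of sinusoids) is swapped in for the inverse FFT performed in Alchemy, because it places each line at its exact frequency regardless of transform length. The names `line_spectrum` and `synthesize` are hypothetical.

```python
import math
import random

FS = 10_000                 # sampling rate (Hz), as in the paper
N_SAMPLES = 5_000           # 500 msec
F0 = 150.0                  # fundamental frequency (Hz)
N_LINES = 10                # fundamental + nine harmonics: 150 ... 1,500 Hz (assumed counting)
FORMANT_MULTIPLES = {5, 8}  # 750 Hz and 1,200 Hz, set 10 dB above the other harmonics


def line_spectrum(sub_freq=None, sub_db=-10.0):
    """Sorted (frequency_Hz, level_dB) lines for one test sound.

    sub_freq is assumed to be F0 / n (75 Hz or 50 Hz here); None means no subharmonics.
    """
    lines = [(k * F0, 10.0 if k in FORMANT_MULTIPLES else 0.0) for k in range(1, N_LINES + 1)]
    if sub_freq is not None:
        n = round(F0 / sub_freq)  # 2 for 75 Hz, 3 for 50 Hz
        for j in range(1, N_LINES * n):
            if j % n:  # skip multiples that coincide with a harmonic
                lines.append((j * sub_freq, sub_db))
    return sorted(lines)


def synthesize(lines, seed=0):
    """Sum of sinusoids with random phases, normalized to a peak amplitude of 1."""
    rng = random.Random(seed)
    components = [(f, 10 ** (db / 20), rng.uniform(0, 2 * math.pi)) for f, db in lines]
    x = [sum(a * math.sin(2 * math.pi * f * i / FS + ph) for f, a, ph in components)
         for i in range(N_SAMPLES)]
    peak = max(abs(v) for v in x)
    return [v / peak for v in x]


# Method 1: 75 Hz subharmonics at rising power; method 2: -10 dB subharmonics at 75 or 50 Hz.
METHOD_1 = [line_spectrum()] + [line_spectrum(75.0, db) for db in (-20.0, -15.0, -10.0, -5.0)]
METHOD_2 = [line_spectrum(), line_spectrum(75.0), line_spectrum(50.0)]
```

Calling `synthesize(METHOD_2[2])`, for example, yields 5,000 samples of a 150 Hz tone with two weak lines between each pair of harmonics, the configuration of test number 3 in method 2.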


TABLE 2. Patients whose voices had subharmonics in the power spectrum

Case no.  Sex  Age  Diagnosis
1         M    36   polyp
2         F    56   polyp
3         F    84   polyp
4         F    66   polyp
5         M    55   polyp
6         F    61   atrophy
7         F    66   nodule
8         F    45   polyp
9         F    67   polyp
10        M    55   cyst
11        M    66   polyp
12        M    81   atrophy
13        F    37   polyp
14        F    38   polyp
15        F    43   paralysis
16        F    37   polyp
17        M    46   polyp
18        F    50   paralysis
19        M    58   polyp
20        M    20   paralysis

M, male; F, female.

In 19 of the 20 cases, one subharmonic was identified between two consecutive high harmonic peaks; in these cases, the frequency of the subharmonic was exactly half the fundamental frequency. In case 16, two subharmonics were identified between two consecutive harmonic peaks. Figure 4 shows the power spectrum and acoustic waveform of case 16. In the spectrum display, the fundamental frequency is 272 Hz and the frequency of the subharmonics is 91 Hz (Fig. 4A). In the waveform of Fig. 4B, the fundamental frequency is also calculated as 272 Hz, because one cycle of the waveform lasts 3.68 msec.

Looking further at the original waveforms of the 20 cases with subharmonics, periodic modulation of peak amplitude was identified in four cases (10, 11, 12, and 15). Figure 5 shows the power spectrum and acoustic waveform of case 11. In Fig. 5A, the frequency interval between two consecutive high peaks is 144 Hz, as indicated by the white arrowheads. The frequency interval between a high peak and the adjacent low peak (black arrowhead) is 72 Hz; by our definition, the frequency of the subharmonic is therefore 72 Hz. In Fig. 5B, the shape of the acoustic waveform repeats every 6.94 msec, which corresponds to a frequency of 144 Hz. However, the largest amplitude peaks (black arrowheads) occur every other cycle, and the time between these maximal peaks is 13.89 msec, corresponding to a frequency of 72 Hz. This amplitude modulation seen in the acoustic waveform corresponds to the harmonics and subharmonics seen in the power spectrum.

Acoustic analysis of jitter and shimmer was undertaken in the 20 cases that had subharmonics in spectrum analysis. Of the 20 cases, 5 had normal jitter and 7 had normal shimmer (Table 3, where values within the normal range are marked with an asterisk). Figure 6 shows the relationship between the degree of roughness and the frequency of subharmonics. In cases with a subharmonic frequency below 100 Hz, roughness scores were 2 or 3. Analysis of variance (ANOVA) showed a statistically significant inverse relationship between the frequency of subharmonics and roughness scores (p < 0.01).

Sound synthesis and perceptual analysis

Table 4 shows the roughness scores given by the five listeners in method 1. As the power of the subharmonics increased, roughness was perceived to be worse by all listeners. ANOVA showed a significant relationship between roughness scores and the power of the subharmonics (p < 0.01). Table 5 shows the perceptual evaluation scores in method 2. As the frequency of the subharmonics became lower, roughness scores became worse. ANOVA showed a significant inverse relationship between roughness scores and the frequency of subharmonics (p < 0.01).

TABLE 3. Results of acoustic and perceptual analysis in cases with subharmonics

Case  F0 (Hz)  SH no.  SH freq. (Hz)  Roughness score  Jitter (%)  Shimmer (%)
1     128      1       64             2                0.82        3.78*
2     115      1       57             2                0.34*       5.78
3     132      1       66             2                1.46        11.37
4     240      1       120            1                0.49*       1.61*
5     129      1       65             2                2.05        7.16
6     207      1       103            1                7.00        32.06
7     168      1       84             2                1.17        8.14
8     218      1       109            1                2.11        3.78*
9     188      1       94             2                0.91        7.14
10    156      1       78             3                1.12        7.28
11    144      1       72             3                0.76        5.70
12    140      1       70             3                5.26        12.28
13    161      1       81             2                1.54        7.80
14    232      1       116            2                1.22        14.84
15    265      1       132            1                3.64        3.24*
16    272      2       91             2                0.53*       2.51*
17    101      1       51             2                0.45*       3.54*
18    235      1       118            2                0.51*       2.05*
19    122      1       61             2                2.12        6.33
20    172      1       86             2                0.85        5.83

F0, fundamental frequency; SH, subharmonics. Normal ranges: jitter 0.15-0.53% (male), 0.07-0.69% (female); shimmer 0.45-4.53% (male), 0.69-4.52% (female). * Within normal range.
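The jitter and shimmer measure described in Subjects and Methods, together with the normal-range check behind the asterisks in Table 3, can be sketched as follows. Assumptions: "pitch" is taken as the cycle period, differences are taken between successive cycles, and extracting cycle periods and peak amplitudes from the waveform (done by the laboratory's own software) is not shown. The ranges are the normative values in the footnote to Table 3; the function names are hypothetical.

```python
NORMAL_RANGES = {  # percent, from the footnote to Table 3
    ("jitter", "M"): (0.15, 0.53),
    ("jitter", "F"): (0.07, 0.69),
    ("shimmer", "M"): (0.45, 4.53),
    ("shimmer", "F"): (0.69, 4.52),
}


def perturbation_percent(values):
    """Mean absolute cycle-to-cycle difference as a percentage of the mean value."""
    if len(values) < 2:
        raise ValueError("need at least two cycles")
    diffs = [abs(b - a) for a, b in zip(values, values[1:])]
    return 100.0 * (sum(diffs) / len(diffs)) / (sum(values) / len(values))


def jitter_percent(periods):
    """Jitter from a sequence of cycle periods (e.g., 50 successive cycles)."""
    return perturbation_percent(periods)


def shimmer_percent(amplitudes):
    """Shimmer from a sequence of cycle peak amplitudes."""
    return perturbation_percent(amplitudes)


def within_normal(measure, sex, value):
    """True if a jitter or shimmer value (%) lies in the normative range for that sex."""
    low, high = NORMAL_RANGES[(measure, sex)]
    return low <= value <= high
```

Applied to the Table 3 values with the sexes from Table 2, `within_normal` flags 5 normal jitter values and 7 normal shimmer values, 8 cases in all, matching the counts reported above.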

DISCUSSION

Rough voice has been characterized acoustically by analysis of jitter and shimmer in a sustained vowel (1-4). However, the present study found several pathologic rough voices with normal jitter and shimmer. In these patients, subharmonics were seen in spectrum analysis even though jitter and shimmer were within the normal range. Furthermore, synthesized sound samples with subharmonics and zero jitter and shimmer were perceived as rough. We conclude, therefore, that jitter and shimmer do not always represent rough voice and that roughness can manifest as the presence of subharmonics in spectrum analysis.

In an experimental study using high-speed motion film, Tanabe et al. (8) reported that the vibratory patterns obtained with a glottal chink combined with a tension imbalance were complex dicrotic, tricrotic, or other quasiperiodic types with less phase lag between the two vocal folds. The glottis may close only momentarily. Vibratory patterns under this condition are quite unstable and sensitive to subglottic pressure, and the voice is rough in quality or may sometimes sound diplophonic. Titze (6) described how period and amplitude sometimes alternate between two states in asymmetric vocal fold vibration. This qualitative change in the behavior of a system is known as "bifurcation." In an analysis using correlograms, Imaizumi (9) reported that voice samples possessing multiplicative variations or modulations over several pitch periods were perceived as rough. In the present study, there was periodic amplitude modulation of the waveform in some cases that had subharmonics in the power spectrum, and the frequency of the subharmonics was one half, one third, or possibly another integer fraction of the fundamental frequency. These findings suggest that subharmonics in spectrum analysis may correspond to complex dicrotic or tricrotic vibration of the vocal folds.

The relationship between roughness and low pitch has been studied by several authors (4,10,11). Wendahl (4) demonstrated a relationship between the pitch of an audible, nonspeech signal and its perceived roughness. Emanuel and Smith (10) reported that perceived vowel roughness decreased as individual subjects raised their fundamental frequency. In an analysis of signal-to-noise (S/N) ratio, Kojima (11) described a correlation between roughness and the low-frequency noise component. In the present study, the frequency of the subharmonics had a significant inverse relationship with the degree of roughness in both the pathologic voices and the synthesized sounds. These results suggest that the low-frequency sound of the subharmonics causes the perceptual judgment of roughness.

This study demonstrates that rough voice is characterized acoustically by the presence of subharmonics in the power spectrum as well as by the jitter and shimmer of the fundamental frequency. Even in cases with normal jitter and shimmer, subharmonic analysis provides an objective evaluation of rough voices. Spectrum analysis evaluating harmonics and subharmonics is therefore recommended when acoustic assessment of a rough voice is required. Spectrum analysis using the fast Fourier transform is currently a standard technique in voice research laboratories. When subharmonics exist between consecutive harmonic peaks, their presence should be noted, and their specific frequency and relative power should be analyzed. Roughness is found in many types of dysphonia, such as vocal fold polyp, cyst, nodule, polypoid degeneration, atrophy, and paralysis, so subharmonic analysis can be used in a variety of clinical settings.

FIG. 4. A: Power spectrum of case 16. Black arrowheads indicate subharmonic peaks; white arrowheads indicate the original harmonic peaks. B: Waveform of case 16. One cycle (3.68 msec) corresponds to an F0 of 272 Hz.

FIG. 5. A: Power spectrum of case 11. White arrowheads indicate the high peaks with a regular frequency interval; black arrowheads indicate subharmonic peaks. B: Waveform of case 11. Large-amplitude (black arrowheads) and small-amplitude (white arrowheads) cycles alternate; the waveform shape repeats every 6.94 msec, and the largest peaks recur every 13.89 msec.

TABLE 4. Roughness scores in method 1 (sound sample, power of subharmonics)

Listener  No. 1 (none)  No. 2 (-20 dB)  No. 3 (-15 dB)  No. 4 (-10 dB)  No. 5 (-5 dB)
A         1             1               2               2               2
B         1             3               3               3               3
C         0             0               1               2               2
D         0             1               1               1               2
E         0             0               0               1               2

TABLE 5. Roughness scores in method 2 (sound sample, frequency of subharmonics)

Listener  No. 1 (none)  No. 2 (75 Hz)  No. 3 (50 Hz)
A         1             1              2
B         1             2              3
C         0             1              2
D         0             1              2
E         0             1              2

FIG. 6. Roughness scores and frequency of subharmonics (frequency of subharmonics, 0-140 Hz, plotted against roughness score, 0-3).
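The ANOVA on the data plotted in Fig. 6 can be sketched with the Table 3 values grouped by roughness score. The paper says only "analysis of variance," so a one-way design with subharmonic frequency as the dependent variable is an assumption, and the function name is hypothetical. The sketch computes the F statistic and its degrees of freedom only; a p-value would come from the F distribution with (k - 1, N - k) degrees of freedom, for example via `scipy.stats.f.sf`, or `scipy.stats.f_oneway` for the whole test.

```python
def one_way_anova_f(groups):
    """F statistic and (df_between, df_within) of a one-way ANOVA over lists of values."""
    all_values = [v for g in groups for v in g]
    n_total, k = len(all_values), len(groups)
    grand_mean = sum(all_values) / n_total
    means = [sum(g) / len(g) for g in groups]
    ss_between = sum(len(g) * (m - grand_mean) ** 2 for g, m in zip(groups, means))
    ss_within = sum((v - m) ** 2 for g, m in zip(groups, means) for v in g)
    df_between, df_within = k - 1, n_total - k
    f_stat = (ss_between / df_between) / (ss_within / df_within)
    return f_stat, (df_between, df_within)


# Subharmonic frequency (Hz) from Table 3, grouped by roughness score.
SUBHARMONIC_HZ_BY_SCORE = {
    1: [120, 103, 109, 132],                                    # cases 4, 6, 8, 15
    2: [64, 57, 66, 65, 84, 94, 81, 116, 91, 51, 118, 61, 86],  # cases 1-3, 5, 7, 9, 13, 14, 16-20
    3: [78, 72, 70],                                            # cases 10-12
}

f_stat, dof = one_way_anova_f(list(SUBHARMONIC_HZ_BY_SCORE.values()))
```

A one-way ANOVA only tests whether the group means differ; the "inverse" direction is read from the group means, which fall from the score-1 group to the score-3 group.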


CONCLUSIONS

Rough voice is characterized acoustically not only by jitter and shimmer but also by the presence of subharmonics in the power spectrum. The power and frequency of subharmonics had a significant relationship with the degree of roughness. Subharmonic analysis of the power spectrum provides an objective parameter for evaluating rough voices.

REFERENCES

1. Lieberman P. Some acoustic measures of the fundamental periodicity of normal and pathologic larynges. J Acoust Soc Am 1963;35:344-53.
2. Kitajima K, Gould W. Vocal shimmer in sustained phonation of normal and pathological voice. Ann Otol Rhinol Laryngol 1976;85:377-81.
3. Heiberger VL, Horii Y. Jitter and shimmer in sustained phonation. In: Lass NJ, ed. Speech and language: advances in basic research and practice, vol 7. New York: Academic Press, 1982:299-332.
4. Wendahl RW. Laryngeal analog synthesis of jitter and shimmer auditory parameters of harshness. Folia Phoniatr 1966;18:98-108.
5. Omori K, Kojima H, Fujita S, Nonomura M. Maximum entropy spectral analysis of hoarse voice. Jpn J Logop Phoniatr 1991;32:255-60.
6. Titze IR. Fluctuations and perturbations in vocal output. In: Titze IR, ed. Principles of voice production. New Jersey: Prentice Hall, 1994:279-306.
7. Shoji K, Regenbogen E, Yu JD, Blaugrund SM. High frequency power ratio of breathy voice. Laryngoscope 1992;102:267-71.
8. Tanabe M. Effect of asymmetrical tension on the voice and vibratory pattern of the vocal cords. Practica Otologica (Kyoto) 1976;69:67-88.
9. Imaizumi S. Acoustic measures of pathologic voice qualities--ROUGH. Ann Bull Research Institute of Logopedics and Phoniatrics Univ Tokyo 1985;19:179-90.
10. Emanuel FW, Smith WF. Pitch effects on vowel roughness and spectral noise. Journal of Phonetics 1974;2:247-53.
11. Kojima H. Objective evaluation of hoarseness and voice quality. Practica Otologica (Kyoto) 1986;79:1149-66.