Journal of Voice
Vol. 2, No. 2, pp. 127-131 © 1988 Raven Press, Ltd., New York
Acoustic Analysis of Hoarse Voice: A Preliminary Report
Tatsuya Fukazawa, Stanley M. Blaugrund, Ashraf El-Assuooty, and Wilbur J. Gould
Vocal Dynamics Laboratory, Lenox Hill Hospital, New York, New York, U.S.A.
Summary: Two quantitative parameters of hoarse voice quality (breathiness and strain) were introduced in an attempt to identify hoarse voice qualities more objectively. These two parameters plus jitter were used in the analysis of hoarseness in 39 patients. A significant preliminary finding of this study is that high breathiness combined with high strain suggests the presence of laryngeal carcinoma.
Key Words: Hoarseness--Auditory impression--Acoustic analysis.
For the purpose of our study, hoarse voice is defined as a pathological voice that has a "jarring" impression to the ear and has, acoustically, certain specific noise components. Pathological changes in the vocal cord can result from a variety of diseases such as laryngeal cancer, recurrent nerve palsy (RNP), and vocal cord polyps, to name a few. These disease entities are associated with characteristic voice qualities (1). To identify hoarse voice characteristics more accurately, the Japanese Society of Logopedics and Phoniatrics has adopted "rough," "breathy," "strained," and "asthenic" as its classification for the auditory impression of hoarseness, as shown in Table 1. These characteristics are defined as follows:
"Rough" is defined as an auditory impression made by irregular vibration of the vocal cord (2). A soft swelling or a mass imbalance of the vocal cord is usually the cause of a rough voice; vocal cord polyps are a typical example. Two important parameters for the evaluation of this auditory impression are pitch perturbation or "jitter" and amplitude perturbation or "shimmer." Pitch perturbation is used in this study as one of the parameters to analyze hoarseness.
"Breathy" is an auditory impression resulting from turbulent noise that arises within glottic air flow that exceeds the critical Reynolds number (2). The presence of any disease that results in the formation of a glottic chink during phonation can produce a breathy impression, unilateral RNP being a typical example.
"Strained" voice is associated with abnormally high tension of the vocal cord and has acoustically strong harmonics. A strained impression is found clinically in patients with laryngeal cancer and spastic dysphonia.
In contrast, "asthenic" voice is associated with low tension of the vocal cord and an auditory impression of vocal faintness. It occurs typically in cases of myasthenia gravis and RNP with a large gap.
This classification is usually applied with the following rating system: (R_i, B_j, S_k, A_l; 0 ≤ i, j, k, l ≤ 3; 0 = normal, 1 = slight, 2 = fair, 3 = extreme), e.g., (R1, B3, S2, A0).
In most cases, a well-trained speech scientist may subjectively suspect a specific abnormality from the auditory impression of hoarseness. The purpose of this study, however, is to establish quantitative parameters of voice qualities and to assess their applicability in the analysis of hoarse voices.
METHOD
Address correspondence and reprint requests to Dr. T. Fukazawa at Vocal Dynamics Laboratory, Lenox Hill Hospital, New York, NY 10021, U.S.A.
TABLE 1. R, B, S, A classification of voice qualities (Japanese Society of Logopedics and Phoniatrics)
Auditory impression    Acoustic features
Rough                  Jitter and/or shimmer
Breathy                Turbulent noise
Strained               Strong harmonics
Asthenic               Weak harmonics

The definition of pitch perturbation in this article is the same as that used by many researchers, i.e., the average of cycle-to-cycle change in frequency divided by the average frequency (3,4). Pitch was extracted by an algorithm that detects zero-crossing points after the glottic closure. The quantification of breathy and strained qualities needs a detailed explanation since there has been no consensus on these parameters.
For analysis of voice qualities, it is useful to observe three aspects of a voice. Figure 1 shows three waves sampled from the same part of a normal voice with different sampling rates and through different filter systems. The upper figure is a voice wave preemphasized by a high-pass filter with 6 dB/oct starting from 0 Hz up to 10 kHz and a cutoff frequency of 10 kHz. The sampling rate is 20 kHz. The high-frequency range, augmented up to 10 kHz, can be seen in this wave. The middle figure is the same voice wave sampled at a rate of 10 kHz without filtering. The lower figure is the same voice wave processed through a low-pass filter with -12 dB/oct, a corner frequency of 339 Hz, and a sampling rate of 10 kHz. It demonstrates pitch waves. Figure 2 shows the three aspects of a voice wave of a patient who had a glottic gap during phonation. The auditory impression was breathy and laryngeal
[Figure 1 panel annotations: Br = 35.21206, St = 3.67723; vowel /AH/; normal voice.]
FIG. 1. Three aspects of a normal voice wave. Top: A preemphasized voice wave of 25.6 ms. Middle: a voice wave of 51.2 ms sampled by a microphone. Bottom: a low-pass filtered voice wave of 51.2 ms. Br, breathiness; St, strain.
[Figure 2 panel annotations: Br = 163.1?0 (one digit illegible in the source), St = 6.93663; vowel /AH/; glottic gap.]
FIG. 2. An example of a breathy voice coming from a glottic gap during phonation. Many spikes can be observed in the top figure. Top: A preemphasized voice wave of 25.6 ms. Middle: a voice wave of 51.2 ms sampled by a microphone. Bottom: a low-pass filtered voice wave of 51.2 ms. Br, breathiness; St, strain.
friction noise could be heard. In the upper figure, the turbulent noise appeared as spikes on the high-pass filtered wave. This can be interpreted as a result of noise having a stronger spectral focus in the high-frequency region than the vowel signal. By taking this observation into account, we defined a parameter called "breathiness" (Br) as shown in Fig. 3 (left). Mathematically, breathiness is defined as follows:
$$\mathrm{Br} = \frac{\sum_j \left[ f(t_j) - 2f(t_{j-1}) + f(t_{j-2}) \right]^2}{\sum_j \left[ f(t_j) \right]^2} \times 100$$
where f(t) is the preemphasized voice wave. The numerator is proportional to the energy of the second derivative of f(t) and the denominator is the energy of f(t). Since the second-order differentiation is the same as a high-pass filter of +12 dB/oct, this parameter becomes large when there are many spikes on the voice wave.
Figure 4 is an example of a strained voice that was produced deliberately by one of the authors with strained glottis and high subglottic pressure. Strong harmonics are seen in the middle figure, which reflects the high tension of the glottis. In comparison with Fig. 1, it is clear that in a strained voice higher harmonics become stronger than in a normal voice. Therefore, we have defined a parameter called "strain" as the ratio between the energy of the voice wave and the low-pass filtered wave (see Fig. 3, right). This parameter becomes large in a strained voice that has strong harmonics.
A sustained vowel /a/ with a conversational intensity and pitch was recorded in a soundproof
[Figure 3: block diagrams. Left: the voice passes through a 6 dB/oct high-pass filter (HPF) to give f(t); Br is the energy of f''(t) divided by the energy of f(t), times 100. Right: the voice f(t) passes through a low-pass filter (LPF, f0 = 339 Hz) to give g(t); St is the energy of f(t) divided by the energy of g(t).]
FIG. 3. Left: Breathiness (Br) is defined as the ratio of energy between f'' and f. f is a preemphasized voice wave and f'' represents f(t_j) - 2f(t_{j-1}) + f(t_{j-2}). Right: Strain (St) is defined as the ratio of energy between f and g. f is a voice wave and g is a low-pass filtered (LPF) wave of f. HPF, high-pass filter.
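As a rough illustration of the two ratios in Fig. 3, the following Python sketch computes Br and St from a list of samples. It is not the authors' implementation. The function names are hypothetical, the 6 dB/oct preemphasis is approximated by a first difference, the -12 dB/oct low-pass filter is approximated by two cascaded one-pole filters with a 339 Hz corner, and both sums in Br are taken over the same index range; the article does not specify these details, so all of them are assumptions. The test signals are synthetic, not patient data.

```python
import math
import random


def preemphasize(x):
    """Approximate a +6 dB/oct high-pass filter with a first difference (assumption)."""
    return [x[i] - x[i - 1] for i in range(1, len(x))]


def breathiness(f):
    """Br = 100 * (energy of the second difference of f) / (energy of f)."""
    num = sum((f[j] - 2.0 * f[j - 1] + f[j - 2]) ** 2 for j in range(2, len(f)))
    den = sum(f[j] ** 2 for j in range(2, len(f)))
    return 100.0 * num / den


def one_pole_lowpass(x, corner_hz, fs_hz):
    """Single one-pole low-pass stage (-6 dB/oct above the corner frequency)."""
    a = math.exp(-2.0 * math.pi * corner_hz / fs_hz)
    y, prev = [], 0.0
    for v in x:
        prev = (1.0 - a) * v + a * prev
        y.append(prev)
    return y


def strain(x, corner_hz=339.0, fs_hz=10000.0):
    """St = (energy of the voice wave) / (energy of its low-pass filtered version)."""
    # Two cascaded one-pole stages stand in for the -12 dB/oct filter (assumption).
    g = one_pole_lowpass(one_pole_lowpass(x, corner_hz, fs_hz), corner_hz, fs_hz)
    return sum(v * v for v in x) / sum(v * v for v in g)


def tone(freq_hz, fs_hz, n):
    """Synthetic sine wave used only for demonstration."""
    return [math.sin(2.0 * math.pi * freq_hz * i / fs_hz) for i in range(n)]


rng = random.Random(0)  # fixed seed so the demonstration is repeatable

# 25.6 ms at 20 kHz: a clean tone versus the same tone with added noise.
clean_20k = tone(120.0, 20000.0, 512)
noisy_20k = [v + 0.05 * rng.uniform(-1.0, 1.0) for v in clean_20k]
br_clean = breathiness(preemphasize(clean_20k))
br_noisy = breathiness(preemphasize(noisy_20k))

# 51.2 ms at 10 kHz: a pure tone versus a tone with a strong high harmonic.
pure_10k = tone(120.0, 10000.0, 512)
rich_10k = [a + b for a, b in zip(pure_10k, tone(1200.0, 10000.0, 512))]
st_pure = strain(pure_10k)
st_rich = strain(rich_10k)
```

In this sketch the noisy signal yields the larger Br and the harmonic-rich signal yields the larger St, which matches the qualitative behavior described for the two parameters. Absolute values will differ from those reported in the article because the filters here are only approximations.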
booth for the later analysis. A mouth-to-microphone distance of ~15 cm was maintained. A 12-bit A/D converter was utilized and processing of the voice data was performed by a microcomputer. The lengths of the samples for calculation of jitter, breathiness, and strain were 256, 25.6, and 51.2 ms, respectively. The hoarse voices of 39 patients (11 polyps, 18 RNPs, and 10 laryngeal cancers) were analyzed for pitch perturbation, breathiness, and strain prior to therapy. Cancer cases consisted of 1 T1, 5 T2, and 4 T3 in the TNM classification of glottic cancer. Significance of the difference between means of each parameter in the three pathological groups was tested using the t test. Voices of 24 normal subjects (12 men and 12 women) were analyzed as controls.
RESULTS AND DISCUSSION
Figure 5 demonstrates the results of pitch perturbation analysis in which polyps, RNPs, and laryngeal cancers appear separately. The two broken
[Figure 4 panel annotations, partly illegible in the source: Br = ?3.84425, St = 1?.14338; vowel /AH/.]
lines show mean ± SD in the control study (mean = 0.887, SD = 0.325) and the area between them can be regarded as normal. The ranges of distribution of the three pathological disease entities are noted. However, among these three pathologies, RNPs do not exceed 5.0 and more than half of these patients fall within the normal or seminormal range. Mean pitch perturbation in cancer cases is statistically higher than that in RNP cases at the 0.05 level, though the overlap is large. Figure 6 shows the result of the breathiness analysis. The two broken lines are mean ± SD (mean = 27.0, SD = 15.0) in the control study. Mean values of the breathiness in polyp, RNP, and cancer cases are 138.3 (SD = 79.35), 264.0 (SD = 186.84), and 440.8 (SD = 197.97), respectively. Almost all patients have a breathiness value above the mean ± SD range of the control group. Statistically, the mean of the breathiness in 39 patients (mean = 273.9) is significantly larger than that of the 24 normal subjects at the 0.01 level. Thus, breathiness is a sensitive
FIG. 4. A strained voice produced with a deliberately strained glottis and high subglottic pressure. Strong harmonics appear in the middle figure. Top: A preemphasized voice wave of 25.6 ms. Middle: a voice wave of 51.2 ms sampled by a microphone. Bottom: a low-pass filtered voice wave of 51.2 ms. Br, breathiness; St, strain.
[Figure 5: scatter plot. Vertical axis: pitch perturbation (x100), with ticks at 5 and 10. Groups: vocal cord polyp, recurrent nerve palsy, laryngeal cancer.]
FIG. 5. Result of pitch perturbation analysis (jitter) of 31 patients. The two broken lines represent mean ± SD of 24 normal subjects. [Symbol illegible in the source] marks cases of more than 10.0 pitch perturbation.
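The pitch perturbation measure described under METHOD (the average cycle-to-cycle change in frequency divided by the average frequency) can be sketched in Python as follows. The function names and the sample values are hypothetical, and the factor of 100 is inferred from the axis label of Fig. 5. The zero-crossing pitch extraction step is not reproduced; the sketch assumes that per-cycle frequencies, or the start times of successive cycles, are already available.

```python
def pitch_perturbation(cycle_freqs_hz):
    """Mean absolute cycle-to-cycle frequency change over mean frequency, times 100."""
    if len(cycle_freqs_hz) < 2:
        raise ValueError("need at least two cycles")
    changes = [abs(b - a) for a, b in zip(cycle_freqs_hz, cycle_freqs_hz[1:])]
    mean_change = sum(changes) / len(changes)
    mean_freq = sum(cycle_freqs_hz) / len(cycle_freqs_hz)
    return 100.0 * mean_change / mean_freq


def periods_to_freqs(cycle_start_times_s):
    """Convert successive cycle start times (seconds) into per-cycle frequencies (Hz)."""
    return [1.0 / (t1 - t0)
            for t0, t1 in zip(cycle_start_times_s, cycle_start_times_s[1:])]


# Made-up frequency sequences for illustration only.
steady = [120.0] * 8
wobbly = [120.0, 121.0, 119.5, 120.5, 118.0, 121.5]
pp_steady = pitch_perturbation(steady)   # no cycle-to-cycle change
pp_wobbly = pitch_perturbation(wobbly)   # irregular vibration gives a larger value
```

A perfectly regular sequence gives zero, and the value grows as cycle-to-cycle variation grows relative to the mean frequency.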
[Figure 6: scatter plot of breathiness (Br; vertical axis, 0 to 700) for vocal cord polyp, recurrent nerve palsy, and laryngeal cancer cases.]
FIG. 6. Result of breathiness (Br) analysis of 31 patients. The two broken lines represent mean ± SD of 24 normal subjects. Mean breathiness of polyp cases was significantly lower than that of the other two pathologies at the 0.01 level.
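As a consistency check on the reported breathiness statistics, the overall patient mean can be recovered from the three group means given in the text, weighted by group size (11 polyps, 18 RNPs, 10 cancers). The Python sketch below only redoes this arithmetic; the variable names are illustrative.

```python
# (group size, mean breathiness) as reported in the text
groups = {
    "polyp": (11, 138.3),
    "RNP": (18, 264.0),
    "cancer": (10, 440.8),
}

n_total = sum(n for n, _ in groups.values())
overall_mean = sum(n * m for n, m in groups.values()) / n_total

# Upper edge of the control band: mean + SD = 27.0 + 15.0
control_upper = 27.0 + 15.0
all_groups_above = all(m > control_upper for _, m in groups.values())

print(n_total, round(overall_mean, 1))  # prints: 39 273.9
```

The weighted mean agrees with the value of 273.9 quoted for the 39 patients, and each group mean lies well above the upper edge of the control band.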
parameter for the detection of the three pathologies. It is noteworthy that polyp cases have a statistically lower mean value of breathiness than RNP or cancer cases (p = 0.025). This is in agreement with the fact that polyps contribute to the auditory impression of rough rather than breathy. Laryngeal cancers and RNPs usually give a breathy impression as one of their voice qualities; accordingly, they have larger breathiness values in our study than do polyps. Since both cancer and RNP have large breathiness values, the introduction of strain as a third parameter allows for differentiation between the two pathologies. Figure 7 shows the result of strain analysis. On the right side of the figure, the mean ± SD is shown separately for males and females in the control study. This parameter has apparent sex differences in its distribution, probably because of pitch differences between males and females. Males are represented by black and females by white circles in the figure. Mean values of the strain in polyp, RNP, and cancer cases are 4.11 (SD = 1.565), 2.35 (SD = 0.958), and 7.33 (SD = 5.273), respectively. As can be seen, most cases fall into the normal area. However, useful features of this parameter are as follows: (a) half of the cancer cases exceed the normal limit; (b) mean
[Figure 7: scatter plot of strain (St) for vocal cord polyp, recurrent nerve palsy, and laryngeal cancer cases.]
FIG. 7. Result of strain (St) analysis of 31 patients. Mean ± SD of 24 normal subjects are shown for males and females separately on the right side. Mean strain in recurrent nerve palsy cases is significantly lower than that in the other two pathologies at the 0.01 level.
strain in RNP cases is significantly smaller than that in the other groups at the 0.01 level. The first feature seems to reflect the strained quality of cancer voices and the second represents the asthenic quality of RNP voices. Each of the three parameters mentioned above has characteristic features. These features combined can be informative in the following two-dimensional analyses. Considering the high detectability of pathologies in the breathiness parameter, breathiness was always used as an axis. In Fig. 8, pitch perturbation is plotted against
[Figure 8: scatter plot. Vertical axis: breathiness (Br), 0 to 700. Horizontal axis: pitch perturbation (P.P.), with ticks at 5 and 10. Symbols distinguish cancer, recurrent nerve palsy, and polyp cases.]
FIG. 8. Two-dimensional analysis using breathiness (Br) and pitch perturbation (PP) of 31 patients. Five cancer cases are found in the high breathiness and high pitch perturbation area.
breathiness. The rectangle of broken lines represents the normal area of mean ± SD in the control study. It can be seen that vocal samples of laryngeal cancers and RNPs, both of which have high breathiness values, are partially separated by the statistical difference in pitch perturbation. Fifty percent of cancer cases fall in the high breathiness and high pitch perturbation area. In Fig. 9, strain is plotted against breathiness. In this figure, laryngeal cancers and RNPs are differentiated by the difference in strain, and 60% of the cancer cases are in the high breathiness and high strain area, apparently forming a group. This result agrees with the well-known fact that laryngeal cancer usually has an auditory impression of breathy and strained. High breathiness plus high strain seems to suggest the presence of laryngeal cancer.
[Figure 9: scatter plot. Vertical axis: breathiness (Br), 0 to 700. Horizontal axis: strain (St), with ticks at 5 and 10.]
FIG. 9. Two-dimensional analysis using breathiness (Br) and strain (St) of 31 patients. Six cancer cases are found in the high breathiness and high strain area.
CONCLUSION
In this study, we have shown that acoustic analysis may complement auditory impression of hoarseness. A combination of high breathiness and high strain is often found in voice samples of laryngeal cancer subjects. In some cases of laryngeal cancer, it is difficult to visualize true cords by any means. In such cases, the acoustic analysis can provide a complementary source of information, enabling one to plan further examination such as direct laryngoscopy. The applications of these quantitative parameters are useful not only in the differential diagnosis of cancer but also in the assessment of phonosurgery and speech therapy. This is a most promising application because our parameters can analyze many aspects of pathological voice qualities such as turbulent noise, vocal cord tension, and irregular vibration.
Acknowledgment: A part of this work was done at Kyoto University. T. Fukazawa is greatly indebted to Professor Iwao Honjo for his support of his research there.
REFERENCES
1. Isshiki N, Okamura H, Tanabe M, Morimoto M. Differential diagnosis of hoarseness. Folia Phoniatr 1969;21:9-19.
2. Isshiki N, Yanagihara N, Morimoto M. Approach to the objective diagnosis of hoarseness. Folia Phoniatr 1966;18:393-400.
3. Ramig LA, Scherer RC, Titze IR. Transcripts of the 14th symposium: care of the professional voice. New York: Voice Foundation, 1985:131-8.
4. Ringel RL, Chodzko-Zajko WJ. Vocal indices of biological age. J Voice 1987;1:31-7.