Journal of Phonetics (1986) 14, 483-488
Classification of voice qualities*

J. Wendler, A. Rauhut and H. Krüger

Phoniatrische Abteilung der HNO-Klinik, Bereich Medizin (Charité) der Humboldt-Universität, 1040 Berlin, Schumannstr. 20/21, D.D.R.

A total of 473 samples of normal and pathological voices were classified by a group of experienced listeners according to three factors of the Isshiki scale (degree of hoarseness, roughness and breathiness; four-point grading). The same material was processed by long-term average spectral analysis (LTAS). The results were related to each other using multivariate variance and discriminant analyses and error-estimating procedures regarding the share of correct reclassifications (based on the spectral data) into the subjective classes. The probability of correct reclassification reached a level of about 50%. Cluster analyses (four clusters) of the spectral data with no primary reference to the auditive classification demonstrated that each of the clusters covered samples from several auditive classes. Auditory re-evaluation by a second series of listening tests showed a very high level of concordance with the first tests. Reclassification programs regarding the clusters reached a probability of correctness ranging from 77 to 85%. Thus, LTAS data can be structured separately and may help to confirm classifications of individual clinicians. For the time being, groups of listeners provide the most reliable classification of voice qualities.

1. Introduction
Investigations by Frøkjaer-Jensen & Prytz (1976), Fritzell & Hammarberg (1977) and Gauffin & Sundberg (1977) encouraged us to try LTAS data along with auditive assessments to improve and to objectify our voice diagnostics. Our own first experiments, indeed, were rather promising. Both LTAS data (Wendler, Doherty & Hollien, 1980b) and auditive classifications (Wendler & Anders, 1980a; Anders, Hollien, Sonninen, Hurme & Wendler, 1982) seemed to provide us with useful, clinically applicable measures. However, with increasing experience we became aware that, regarding LTAS, we had overestimated the results of a preliminary study, and that, as Hirano (1981) put it, "further extensive basis and clinical research is required in order to obtain some algorithm for diagnostic purposes".

2. Material and methods
2.1. Auditive classification

Four hundred and seventy-three voice recordings of hi-fi quality formed the basis for our investigations. The samples included normal as well as pathological voices of 311 female and 162 male subjects. The recordings were presented via loudspeaker and judged by five experienced listeners (three phoniatricians and two logopedists) using a four-point grading with regard to degree of hoarseness (H) as a quantitative measure and degree of roughness (R) and breathiness (B) as qualitative assessments, with roughness corresponding to irregular vibrations and breathiness to incomplete closure of the vocal folds. A grade of 0 indicated normality, 1 slight, 2 moderate and 3 extreme deviation, as used in the Japanese scales (Isshiki & Takeuchi, 1970; GRBAS scale). Table I shows the results.

TABLE I. Auditive classification: R, roughness; B, breathiness; H, hoarseness; 0, normal; 1, slight; 2, moderate; 3, extreme

                   Males (n = 162)        Females (n = 311)
Classification     R      B      H        R      B      H
0                  34     96     49       66     199    97
1                  50     36     35       156    74     110
2                  51     22     47       74     29     75
3                  27     8      31       15     9      29

Part of the samples (60 female and 38 male voices) were auditively re-evaluated by the same group about 2 years after the first listening experiments. The two sets of auditive data were used for an appraisal of the reliability of this method of classification.

*Dedicated to Prof. Dr. med. Peter Biesalski on the occasion of his 70th birthday.
0095-4470/86/030483+06 $03.00/0 © 1986 Academic Press Inc. (London) Ltd.

2.2. LTAS

Tape recordings of ongoing speech (standard text "Krönungstag" by H. Kant) were analysed by means of a real-time analyser (EZA 01012, VEB Meßelektronik "Otto Schön" Dresden), using 25 1/3-octave filters in the range of 63 Hz to 12.5 kHz in combination with an averager NTA 512. A total of 200 sweeps of 25.6 ms duration each were taken, one every 0.6 s (analysing time of 2 min). One analysis was carried out with the unmanipulated continuous signal, another with the voiceless consonants eliminated. As there were no significant differences between these two variants, we based all further evaluations on the data from the continuous signals.

3. Statistical approaches and results
The spectral data (females separated from males) were treated statistically in two different ways: (a) with reference to auditive judgements (multivariate variance and discriminant analyses, reclassification into the auditive classes) and (b) with no reference to auditive judgements (cluster analysis, formation of four clusters, reclassification according to clusters).

3.1. Reclassification into auditive classes

The results obtained from multivariate variance and discriminant analyses (Ahrens & Läuter, 1974) of the spectral data were used for reclassification of the individual subjects into one of the four classes (0, 1, 2, 3) of the auditive categories H, R and B. As the statistical procedure for the reclassification, the so-called pi-method, a very critical error-estimating method, was used (Toussaint & Sharpe, 1975; Wernecke & Kalb, 1983). Table II shows one example of the results (H, males).
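
The reclassification step can be illustrated with a minimal sketch. It is not the authors' procedure: a nearest-class-mean rule stands in for the discriminant analysis, leave-one-out resampling stands in for the pi-method, and the three-band `spectra` with their `grades` are invented toy values. All function names are hypothetical. The output has the layout of a reclassification table: one row per auditive class, giving the percentage of its members assigned to each class.

```python
from collections import defaultdict


def centroid(vectors):
    """Component-wise mean of equal-length feature vectors."""
    return [sum(column) / len(vectors) for column in zip(*vectors)]


def squared_distance(a, b):
    """Squared Euclidean distance between two feature vectors."""
    return sum((x - y) ** 2 for x, y in zip(a, b))


def leave_one_out_table(samples, labels, classes):
    """Row-normalised reclassification table in percent.

    Each sample is held out in turn, class means are computed from the
    remaining samples, and the held-out sample is assigned to the class
    with the nearest mean. Rows are true classes, columns assigned ones.
    """
    counts = {true: {assigned: 0 for assigned in classes} for true in classes}
    for i, (sample, true) in enumerate(zip(samples, labels)):
        grouped = defaultdict(list)
        for j, (other, label) in enumerate(zip(samples, labels)):
            if j != i:
                grouped[label].append(other)
        means = {label: centroid(members) for label, members in grouped.items()}
        assigned = min(means, key=lambda label: squared_distance(sample, means[label]))
        counts[true][assigned] += 1
    table = {}
    for true in classes:
        total = sum(counts[true].values()) or 1  # avoid division by zero for empty classes
        table[true] = {a: 100.0 * counts[true][a] / total for a in classes}
    return table


# Toy data (assumption): levels in dB for three bands, three voices per grade.
spectra = [
    [60, 50, 30], [62, 51, 31], [61, 49, 33],  # grade 0
    [60, 50, 34], [63, 52, 36], [61, 51, 32],  # grade 1
    [58, 52, 40], [59, 53, 42], [60, 51, 37],  # grade 2
    [55, 54, 48], [56, 55, 47], [57, 53, 43],  # grade 3
]
grades = [0, 0, 0, 1, 1, 1, 2, 2, 2, 3, 3, 3]

table = leave_one_out_table(spectra, grades, classes=[0, 1, 2, 3])
for grade, row in table.items():
    print(grade, [round(row[a], 1) for a in (0, 1, 2, 3)])
```

Holding each voice out before it is classified keeps the estimate from being flattered by the sample's own contribution to its class mean, which is the concern that error-estimating procedures of this kind address.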

TABLE II. Probability of correct reclassification (%): four auditive classes related to spectral data, males, n = 157

H        0        1        2        3
0      50.0     40.91     9.09     0
1      39.47    26.32    28.95     5.26
2      17.78    15.56    53.33    13.33
3       0        6.67    36.67    56.67
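
The figures in Table II can be summarised with a few lines of arithmetic. The sketch below copies the table as printed and assumes, because each row sums to about 100%, that rows are the auditive classes and columns the classes assigned from the spectral data. It compares the diagonal with the chance level for four classes and splits the misclassifications into neighbouring and more distant classes. Variable names are illustrative.

```python
# Table II as printed (H, males), in percent.
# Assumption: rows are auditive classes 0-3, columns are assigned classes 0-3.
table_ii = [
    [50.0, 40.91, 9.09, 0.0],
    [39.47, 26.32, 28.95, 5.26],
    [17.78, 15.56, 53.33, 13.33],
    [0.0, 6.67, 36.67, 56.67],
]

# With four equally likely classes, a random assignment is right one time in four.
chance = 100.0 / len(table_ii)  # 25.0

# Diagonal entries: share of each class that is reclassified correctly.
correct = [table_ii[i][i] for i in range(len(table_ii))]

# Misclassifications one class away versus two or more classes away.
neighbouring = [
    sum(v for j, v in enumerate(row) if abs(i - j) == 1)
    for i, row in enumerate(table_ii)
]
distant = [
    sum(v for j, v in enumerate(row) if abs(i - j) > 1)
    for i, row in enumerate(table_ii)
]

for i in range(len(table_ii)):
    print(
        f"class {i}: correct {correct[i]:.2f}% "
        f"({correct[i] / chance:.2f} x chance), "
        f"neighbouring {neighbouring[i]:.2f}%, distant {distant[i]:.2f}%"
    )
```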

TABLE III. Probability of correct reclassification (%), 4 clusters, spectral data only, females, n = 306

Cluster      1        2        3        4
1          84.72     3.47     6.94     4.86
2           9.43    77.36    11.32     1.89
3           8.62     8.62    82.76     0
4          13.73     1.96     0       84.31
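
Table III concerns groups formed from the spectral data alone. As a hedged illustration of how such a grouping can be obtained, the sketch below runs a plain k-means partitioning on invented two-band spectra. The source does not state which clustering algorithm was used, so k-means is an assumption chosen for brevity, as are the function names, the deterministic start from the first k points, and the toy values.

```python
def squared_distance(a, b):
    """Squared Euclidean distance between two feature vectors."""
    return sum((x - y) ** 2 for x, y in zip(a, b))


def mean_vector(vectors):
    """Component-wise mean of equal-length feature vectors."""
    return [sum(column) / len(vectors) for column in zip(*vectors)]


def kmeans(points, k, max_iterations=100):
    """Partition points into k clusters; returns (labels, centres).

    The centres start at the first k points (an assumption made to keep
    the example deterministic), then assignment and centre update are
    alternated until the labels stop changing.
    """
    centres = [list(p) for p in points[:k]]
    labels = None
    for _ in range(max_iterations):
        new_labels = [
            min(range(k), key=lambda c: squared_distance(p, centres[c]))
            for p in points
        ]
        if new_labels == labels:
            break
        labels = new_labels
        for c in range(k):
            members = [p for p, lab in zip(points, labels) if lab == c]
            if members:  # keep the old centre if a cluster runs empty
                centres[c] = mean_vector(members)
    return labels, centres


# Toy spectra (assumption): two band levels per voice, four obvious groups.
points = [
    [0, 0], [10, 0], [0, 10], [10, 10],
    [1, 0], [11, 0], [0, 11], [10, 11],
]
labels, centres = kmeans(points, k=4)
print(labels)
print(centres)
```

No auditive grade enters the computation, which is the point of the comparison: the groups are defined by spectral similarity only and can afterwards be checked against the listeners' classes.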

With four classes, the probability of correct reclassification by chance would have been 25%. Even that low a value can be found in Table II. For the other classes, the probability was about twice as high as chance, i.e. about 50%. Most of the misclassifications were placed into neighbouring classes. This is true both for males and females and also for the categories R and B. Thus there seem to be some relations between spectral voice structures, as expressed in LTAS data, and auditive impressions of voice qualities, but these relations are not strong enough to be of use for clinical purposes in voice diagnostics.

3.2. Reclassification into clusters

Cluster analysis generally aims at an optimal structuring of a great number of elements by construction of homogeneous groups (Späth, 1975; Steinhausen & Langer, 1977). Each cluster should consist only of similar elements, whereas elements of different clusters should be as unlike as possible. This method exclusively follows mathematical-statistical and heuristic principles with no reference to any other criteria. Two cluster analyses were carried out, one for all female and one for all male subjects, each resulting in four clusters based only on the LTAS characteristics. This was done to find out whether the spectral data can be structured separately (with no regard to auditive classifications). Then variance and discriminant analyses and a reclassification program regarding the four clusters based on the LTAS data (again with the pi-method for error estimation) were performed. The results for the females are shown in Table III, with a probability of correct reclassification ranging from 77 to 85%. Using LTAS data, reclassification into objective clusters thus turned out to be much better than reclassification into subjective auditive groups. The same relations were found for the male subjects.

3.3. Cluster comparison regarding auditive components

In a further step we looked at the auditive classes gathered in the four clusters. Generally, the subjects forming one particular cluster came from different auditive classes. Table IV a-c shows at the bottom the auditive mean values for H, R and B in each cluster, calculated from the subjects constituting the particular clusters. We carried out a cluster-to-cluster comparison regarding the H, R and B components by means of t-tests. Significant differences are indicated in the same tables by α-values. With respect to H, most of the clusters, except for the neighbouring 1-2 and 3-4, could be separated significantly. Concerning R, the pairings 1-4, 2-3 and 2-4 revealed evident differences. The B comparison came out as the worst, with only one meaningful differentiation (2-3).

TABLE IV. Cluster-to-cluster comparison, 4 clusters, t-test, α-values, females, n = 60. (a) Roughness; (b) breathiness; (c) hoarseness.
[Table body not recoverable from the source. Legible entries: α-values of 0.1; further values 1.02, 0.87, 1.34, 1.35; 0.27, 0.45, 0.76, 0.96; 1.4, 1.59, 0.9, 0.87.]

Figure 1. Relations between first (AUD. CLASS. I) and second (AUD. CLASS. II) auditive classification (auditive classes 0, 1, 2, 3) expressed in correlation coefficients (r), α-values and linear regressions: females, clusters one to four. (a) Hoarseness: solid line, H, all subjects; broken lines, H1, H2, H3, H4, clusters 1-4. (b) Roughness: solid line, R, all subjects; broken lines, R1, R2, R3, R4, clusters 1-4. (c) Breathiness: solid line, B, all subjects; broken lines, B1, B2, B3, B4, clusters 1-4.

3.4. Auditive re-evaluation
The two sets of auditive data mentioned in Section 2.1 were compared for the 60 female subjects by means of correlation coefficients regarding the values of the factors H, R and B as represented in the group as a whole and, additionally, as represented in the four clusters. Figure 1(a)-(c) shows the results of the comparisons in terms of correlation coefficients, α-values and linear regressions. For all factors, repeated auditive judgements revealed a high degree of conformity as far as the whole group is concerned (α below 0.1%). When the results are split into the four clusters, there was evidently some uncertainty in assessing the R component. For H and B, even within the clusters, the auditive classifications were very close in the two sets. This means that the heterogeneous auditive composition of the spectral-data-based clusters cannot be explained by unreliability of the auditive judgements of voice qualities. On the contrary, for the time being there seems to be no better approach to clinical assessment of the voice sound than listening in groups. What has to be improved is the acoustic analysis.

4. Conclusions
The connections between LTAS data and auditive assessments of the voice, as known so far, are at present of no real use for clinical practice. But LTAS data of the voice in ongoing speech can be structured. Further investigations are advocated, at least for heuristic reasons. Auditive classifications considering H, R and B form a practicable basis of high reliability, in particular when they are carried out by groups of listeners. One goal in improving and objectifying voice diagnostics could be to determine acoustic measures which can serve as substitutes for the listener group, helping the individual clinician to make diagnostic decisions. LTAS data can certainly be one of these measures.

References

Ahrens, H. & Läuter, J. (1974) Mehrdimensionale Varianzanalyse. Berlin: Akademie-Verlag.
Anders, L. Ch., Hollien, H., Sonninen, A., Hurme, P. & Wendler, J. (1982) Heiserkeitsperzeption durch Hörer aus der DDR, den USA und Finnland. In Sprechwirkungsforschung, Sprecherziehung, Phonetik und Phonetikunterricht (E. Stock, editor), pp. 21-28. Wissenschaftliche Beiträge der Martin-Luther-Universität Halle 55 (F 40).
Fritzell, B. & Hammarberg, B. (1977) Clinical applications of acoustic voice analysis. Background and perceptual factors. In Proceedings 17th Congress of the International Association of Logopedics and Phoniatrics, I, pp. 477-487. Copenhagen.
Frøkjaer-Jensen, B. & Prytz, S. (1976) Registration of voice quality, Bruel and Kjaer Technical Review, 3, 3-17.
Gauffin, J. & Sundberg, J. (1977) Clinical applications of voice acoustics. Acoustical analysis, results and discussion. In Proceedings 17th Congress of the International Association of Logopedics and Phoniatrics, I, pp. 489-502. Copenhagen.
Hirano, M. (1981) Clinical examination of voice. In Disorders of human communication, vol. 5 (G. E. Arnold, F. Winckel & B. D. Wyke, editors). Wien, New York: Springer-Verlag.
Isshiki, N. & Takeuchi, Y. (1970) Factor analysis of hoarseness, Studia phonologica, 5, 37-44.
Späth, H. (1975) Cluster-Analyse-Algorithmen zur Objektklassifizierung. München.
Steinhausen, D. & Langer, K. (1977) Clusteranalysis. New York: Walter de Gruyter.
Toussaint, G. T. & Sharpe, P. M. (1975) An efficient method for estimating probability of misclassification applied to a problem in medical diagnosis, Computers in Biology and Medicine, 4, 178-269.
Wendler, J. & Anders, L. Ch. (1980a) Subjektive und objektive Kriterien in der phoniatrischen Stimmdiagnostik. In Proceedings 18th Congress of the International Association of Logopedics and Phoniatrics, I, pp. 61-65.
Wendler, J., Doherty, E. T. & Hollien, H. (1980b) Voice classification by means of long-term speech spectra. Folia phoniatrica, 32, 51-60.
Wernecke, K.-D. & Kalb, G. (1983) Further results in estimating the classification error in discriminance analysis, Biomedical Journal, 25, 247-258.