Journal of Phonetics (1986) 14, 489-492
Acoustic, perceptual and clinical studies of normal and dysphonic voice
Pertti Hurme
Department of Communication, University of Jyväskylä, 40100 Jyväskylä, Finland
and Aatto Sonninen
Department of Logopedics, University of Jyväskylä, 40100 Jyväskylä, Finland
Voice research in Jyväskylä during the last few years is reviewed. Some results from a number of studies are presented in terms of the variables investigated: background, clinical, perceptual and acoustic characteristics, and the relations between them. Acoustic studies of voice have been carried out by means of long-term average spectrum (LTAS) analysis.
1. Introduction
The present article surveys recent work by an interdisciplinary team at the University of Jyväskylä and at the Central Hospital of Central Finland on normal and dysphonic voice. The human voice and its quality can be investigated at various levels of observation (cf. Sonninen, 1970). At least five levels can be distinguished: (1) anatomical constraints in the voice and speech production system (e.g. vocal nodules, cleft palate), (2) long-term habitual adjustments of the voice and speech production system (in extreme cases, functional voice disorders), (3) subjective impression of one's own voice, (4) acoustic speech signal, including the long-term properties typical of a speaker, and (5) subjective impression of another person's voice, i.e. the auditory colouring characteristic of a speaker. It seems to the present authors that a multi-level approach to the study of voice could be highly fruitful. A summary of our voice studies, arranged on the basis of the variables investigated, is given below. The variables investigated include background information (e.g. smoking), clinical examination (e.g. vocal fold imbalance), perceptual ratings (e.g. breathiness, roughness) as well as acoustic parameters (e.g. long-term average speech spectra).
2. Results and discussion
2.1. Background and clinical variables
In a study of 124 normal and dysphonic individuals, Sonninen & Hurme (1982) investigated various background and clinical variables. To give an example of the results, statistical analyses indicated a relation between tobacco smoking and physical exhaustion (background variables) on the one hand and pathological surface of the vocal folds and glottal (vocal fold) imbalance (clinical variables) on the other.
0095-4470/86/030489+04 $03.00/0 © 1986 Academic Press Inc. (London) Ltd.
2.2. Perceptual variables
To elucidate the structure of Finnish voice terminology, a factor analysis of a number of terms used by non-specialists to describe normal and dysphonic voices was carried out (Sonninen, Lehtonen & Hurme, 1982). The following five factors emerged: goodness (general quality), nasality, hoarseness, darkness, tenseness. The terminology used to describe nasality is reviewed by Sonninen & Lehtonen (1982b).
Attempts have been made to investigate language/culture differences in rating voice samples. In the first study, German samples were rated by Finnish, German and American subjects (Anders, Hollien, Sonninen, Hurme & Wendler, 1982). The 40 samples had been collected from both normal and dysphonic individuals. No significant differences turned up in the ratings of phoniatricians/speech therapists as compared to non-specialists. On the other hand, the judgements varied according to the nationality of the subjects: on the average, the Finns judged the samples as more hoarse than the Germans, whereas the Americans found the voices less hoarse than the Germans. In another study, by Hurme & Sonninen (1985a), Finnish speech samples were rated by Finns and Germans using a modified GRBAS-scale (Isshiki & Takeuchi, 1970; cf. Hirano, 1981). When the results obtained in the general quality ratings (G-scale) are compared with those of Anders et al. (1982), a contradiction is apparent: the Germans rated the German samples (Anders et al., 1982) as less poor (better) than the Finns did, but the Germans rated the Finnish samples (Hurme & Sonninen, 1985a) as poorer than the Finns did. This discrepancy might result from differences in the perception of voice quality in the native language as compared with a foreign language.
However, the observed language differences should by no means be regarded as established. Even though both the Finns and the Germans used the GRBAS-scale, there is no guarantee that the scales were used identically by the Germans and the Finns.
2.3. Acoustic variables
The use of long-term average spectrum (LTAS) analysis of speech has been explored in several studies. Hurme (1980) showed that the sound pressure level of speech has a decisive effect on the average spectra: the fundamental frequency tends to dominate the spectra of samples produced with low SPL, whereas the three or four harmonics above F0 have approximately the same intensity as F0 in the spectra of samples produced with high SPL.
Hurme & Pirinen (1984) examined variation in LTAS. Among the sources of variation, that within individuals (reading the same text 10 times) was assessed by computing an average spectrum (and standard deviation) of the 10 average spectra accumulated for each production. Intra-individual differences were observed to be moderate in comparison with inter-individual differences, as expected. However, standard deviations in the average spectra for each individual were typically of the order of 2-3 dB, and in some frequency areas up to 5-6 dB. It was concluded that intra-individual differences in LTAS are considerable, but LTAS were more influenced by the following three factors: inter-individual differences, the SPL of speech (cf. above) and duration of speech sample.
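The procedure described above (one average spectrum per reading, then a mean and standard deviation over the repeated readings) can be sketched roughly as follows. This is only an illustration under stated assumptions, not the analysis system used in the studies cited: the source does not specify the spectrum analyser, window or frame settings, so the Hann window, the frame length of 1024 samples, the hop of 512 samples and the function names `ltas_db` and `repetition_stats` are all hypothetical choices made here.

```python
import numpy as np


def ltas_db(signal, sample_rate, frame_len=1024, hop=512):
    """Long-term average spectrum in dB (arbitrary reference).

    The power spectra of overlapping Hann-windowed frames are averaged
    over the whole sample. Frame length and hop are assumed values.
    """
    signal = np.asarray(signal, dtype=float)
    if signal.size < frame_len:
        raise ValueError("signal is shorter than one analysis frame")
    window = np.hanning(frame_len)
    starts = range(0, signal.size - frame_len + 1, hop)
    power = np.array(
        [np.abs(np.fft.rfft(signal[s:s + frame_len] * window)) ** 2
         for s in starts]
    )
    mean_power = power.mean(axis=0)
    freqs = np.fft.rfftfreq(frame_len, d=1.0 / sample_rate)
    # A small constant avoids log(0) in empty frequency bins.
    return freqs, 10.0 * np.log10(mean_power + 1e-12)


def repetition_stats(ltas_list):
    """Mean and sample standard deviation (dB) per frequency bin
    across the LTAS of repeated readings by one speaker."""
    stack = np.vstack(ltas_list)
    return stack.mean(axis=0), stack.std(axis=0, ddof=1)


if __name__ == "__main__":
    # Synthetic stand-in for 10 readings: a harmonic tone whose level
    # varies from reading to reading, plus a little noise.
    rng = np.random.default_rng(0)
    sr = 8000
    t = np.arange(2 * sr) / sr
    readings = []
    for _ in range(10):
        gain = rng.uniform(0.7, 1.3)
        tone = gain * (np.sin(2 * np.pi * 125 * t)
                       + 0.5 * np.sin(2 * np.pi * 250 * t))
        noisy = tone + 0.01 * rng.standard_normal(t.size)
        freqs, spectrum = ltas_db(noisy, sr)
        readings.append(spectrum)
    mean_db, sd_db = repetition_stats(readings)
    print("largest SD across bins (dB):", sd_db.max())
```

With real recordings, `sd_db` would give the per-frequency standard deviation that is reported above as typically 2-3 dB; the synthetic signals here only demonstrate the computation and say nothing about actual voices.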
An attempt has been made to develop a voice field analyser for clinical use (Sonninen & Lehtonen, 1982a; Sonninen, Hurme, Toivonen & Vilkman, 1985). A voice field (an amplitude/F0 description of connected speech) is formed in real time on the basis of cycle-to-cycle analysis of the speech signal with a micro-computer (EXORset 33). Our purpose is to use the voice field program in routine work to evaluate the results of speech therapy.
2.4. Perceptual and acoustic variables
The acoustic correlates of perceived nasality have been reviewed by Kyttä & Hurme (1982) and Sonninen & Lehtonen (1982b). The acoustic correlates of perceived breathiness and roughness have been investigated by Hurme & Sonninen (1985a). In this study, speech samples from 40 individuals were rated on the GRBAS-scale and analysed by means of LTAS (cf. above). The average spectra were normalized by setting the highest intensity peak (invariably in the 0-1 kHz area) as the reference and adjusting the spectra accordingly. In comparison with voices rated as normal or only slightly breathy, voices rated as extremely breathy show more energy above circa 2.5 kHz, but less below that frequency (i.e. in the region between F0 and 2.5 kHz). The former correlate is apparently related to the "turbulence component" present in (at least some) breathy voices (cf. Isshiki, Kitajima, Kojima & Harita, 1978), and the latter to the "weakness component" [weak harmonics (especially the second) above F0, cf. Ladefoged & Antoñanzas-Barroso, 1985] present in (at least some) breathy voices. Voices rated as extremely rough have on the average more high-frequency energy (above c. 3 kHz) than voices rated as normal or only slightly rough. However, perturbation measures are probably better adapted for the description of rough voices than spectral measures (cf. e.g. the article by Imaizumi, this issue, p. 457).
2.5. Clinical and acoustic variables
Groups of individuals with certain voice disorders (diagnosis groups) have also been investigated by means of LTAS analysis. In a study of 124 normal and dysphonic voices (Sonninen & Hurme, 1982) some preliminary acoustic differences were observed between several diagnostic groups. In a study of 40 normal and dysphonic voices (Hurme & Sonninen, 1985a) some diagnostic groups show characteristic acoustic patterns: (1) Laryngeal cancer vs. normal groups differ acoustically in two main areas: the former group shows more high-frequency energy (above c. 4 kHz) and the latter shows more energy in the 0.5-1.0 kHz area; (2) vocal fold paralysis vs. normal groups differ mainly in that the former group has less energy in the region above F0 (cf. Hurme & Sonninen, 1985b). In other words, the voices of the individuals with vocal fold paralysis are dominated by F0.
3. Conclusion
The border-line between normal and dysphonic voice quality is a recurring problem in our research endeavours. A normal voice is not necessarily a good voice, but neither does normal mean the same as "natural" in the sense of unrestricted variation possibilities. What is normal is constrained by culture and language. What is pathological in one language may be utilized to signal a linguistic distinction in another (cf. Ladefoged & Antoñanzas-Barroso, 1985).
The results of the acoustic analyses reported above have been collected by means of LTAS analysis only. The use of a wide variety of acoustic methods, including perturbation measures, appears mandatory. LTAS analysis in voice research has been duly criticized, for example in the papers by Kitzing, Löfqvist and Wendler (this issue). Our attempts at comparing the evaluations of normal and dysphonic voices across languages (Anders et al., 1982; Hurme & Sonninen, 1985a) are problematic. The use of a common scale (e.g. GRBAS) does not guarantee that these parameters mean the same to a German, an American or a Finn. We believe that the terminology should be standardized at least to some extent (cf. Hammarberg, 1985) on the basis of physiological investigations of phonation and of acoustic descriptions and perceptual ratings of normal and dysphonic voices (cf. Boves, 1984). In this work, international cooperation is essential.
References
Anders, L., Hollien, H., Sonninen, A., Hurme, P. & Wendler, J. (1982) Heiserkeitsperzeption durch Hörer aus der DDR, den USA und Finnland. In Sprechwirkungsforschung, Sprecherziehung, Phonetik und Phonetik-unterricht (E. Stock, editor), pp. 21-28, Wissenschaftliche Beiträge, 55. Halle-Wittenberg: Martin-Luther-Universität.
Boves, L. (1984) The phonetic basis of perceptual ratings of running speech, Dordrecht: Foris.
Hammarberg, B. (1985) Clinical routines for the perceptual-acoustic assessment of dysphonia, Phoniatric & Logopedic Progress Report, vol. 4, pp. 14-29. Stockholm: Huddinge University Hospital.
Hirano, M. (1981) Clinical examination of voice. Wien: Springer.
Hurme, P. (1980) Auto-monitored speech level and average speech spectrum. In Voice, speech and language: reports and reviews (P. Hurme, editor), pp. 119-127; Papers in Speech Research, vol. 2, Finland: University of Jyväskylä.
Hurme, P. & Pirinen, M. (1984) Keskiarvospektrien vaihtelusta (summary: Assessing variation in LTAS of speech). In Papers from the twelfth meeting of Finnish phoneticians-Joensuu 1984 (U. Ikonen & T. Tikka, editors), pp. 23-36; Studies in Language, vol. 1, Finland: University of Joensuu.
Hurme, P. & Sonninen, A. (1985a) Normal and disordered voice quality: listening tests and long-term spectrum analyses. In Papers in speech research, vol. 6 (P. Hurme, editor), pp. 49-72. Finland: Department of Communication, University of Jyväskylä.
Hurme, P. & Sonninen, A. (1985b) Acoustic correlates of pathological voices: individuals with paralysis of recurrent nerve. In Papers in speech research, vol. 6 (P. Hurme, editor), pp. 73-78. Finland: Department of Communication, University of Jyväskylä.
Kyttä, J. & Hurme, P. (1982) Acoustical aspects of nasality. In Vox humana. Studies presented to Aatto Sonninen on the occasion of his sixtieth birthday, December 24, 1982 (P. Hurme, editor), pp. 203-211; Papers in speech research, vol. 5. Finland: University of Jyväskylä.
Isshiki, N. & Takeuchi, Y. (1970) Factor analysis of hoarseness. Studia Phonologica, 5, 37-44. Kyoto.
Ladefoged, P. & Antoñanzas-Barroso, N. (1985) Computer measures of breathy voice quality, Working Papers in Phonetics (University of California, Los Angeles), 61, 79-86.
Sonninen, A. (1970) Phoniatric viewpoints on hoarseness, Acta Otolaryngologica, 236, 68-81.
Sonninen, A. & Hurme, P. (1982) Clinical and acoustic observations of normal and hoarse voices. In Phonetic aspects of normal and pathological speech (P. Hurme, editor), pp. 1-16; Papers in speech research, vol. 4. Finland: University of Jyväskylä.
Sonninen, A., Hurme, P., Toivonen, R. & Vilkman, E. (1985) Computer voice fields of connected speech. In Papers in speech research, vol. 6 (P. Hurme, editor), pp. 93-112. Finland: Department of Communication, University of Jyväskylä.
Sonninen, A. & Lehtonen, J. (1982a) Äänikenttämittausten sovelluksista. In Folia fennistica & linguistica (P. Sirviö, editor), pp. 311-321. Finland: Department of Finnish Language and General Linguistics, University of Tampere.
Sonninen, A. & Lehtonen, J. (1982b) Diagnosis of nasality. In Phonetic aspects of normal and pathological speech (P. Hurme, editor), pp. 27-35; Papers in speech research, vol. 4. Finland: University of Jyväskylä.
Sonninen, A., Lehtonen, J. & Hurme, P. (1982) Perzeptuelle und akustische Analyse der Heiserkeit. In Phonetic aspects of normal and pathological speech (P. Hurme, editor), pp. 17-26; Papers in speech research, vol. 4. Finland: University of Jyväskylä.