Computer Methods and Programs in Biomedicine, 31 (1992) 127-135 © 1992 Elsevier Science Publishers B.V. All rights reserved 0169-2607/92/$05.00
Section II. Systems and programs
Computer aided evaluation of phonetograms
F. Klingholz
Department of Phoniatrics, University of Munich, Munich, Germany

The phonetogram represents an area limited by piano and forte contours of the sound pressure levels along the vocal range. Phonetograms are used in the diagnosis of voice status. Apart from reference phonetograms and the extraction of single parameters, phonetograms have not been evaluated quantitatively in medical practice. The present computer program divided the phonetogram into subareas which were approximated by simple patterns (ellipses). The ellipse parameters were used to evaluate voice efficiency, to recognize voice categories, and to derive diagnostic comments.

Voice analysis; Voice disorder; Acoustic analysis; Phonetogram
1. Introduction
In the diagnosis of voice function, methods of very different kinds are available. However, most of them have drawbacks that limit their use in medical practice. For instance, electromyography requires anesthesia of the larynx and measures only the activity of a few selected laryngeal muscles. In electro- and photoglottography, the parameters describing glottal movement are not comparable between subjects. Aerodynamic measurements (pressure-flow characteristic) are invasive procedures that perturb the phonation process. High-speed photography and ultrasound scanning of the moving vocal folds are very complicated and expensive. Therefore, two types of methods have been developed: stroboscopic observation of the vocal fold vibration, and acoustic voice analysis. The stroboscopic picture allows only a subjective and qualitative judgment of the voice production. Acoustic data, however, are quantitative, and the subjects are not affected by the measurements.

For about 10 years, the registration of phonetograms has been widely used in European phoniatrics to characterize voice production. The phonetogram (Fig. 1) represents the sound pressure level (SPL) and the fundamental frequency (F) of sustained vowels produced as loudly (forte) and as softly (piano) as possible along the vocal range. The phonetogram therefore represents an area limited by the piano and forte contours. The quantitative evaluation of this two-dimensional curve, however, encounters difficulties. The present paper aims at a solution of the problem by approximating the phonetogram by simple patterns.

Fig. 1. Phonetogram of a mezzo-soprano (abscissa: fundamental frequency, kHz).

In phoniatrics, three groups of voices have to be distinguished: trained voices (singers), normal voices, and pathological voices. In organic voice disorders, the voice physiology is perturbed, and the phonetograms are more or less degenerated. These phonetograms are not very helpful in the evaluation of voice, except for the control of voice therapy. Phonetograms are most informative in the description of singers' voices, normal voices, and functional voice disorders (generally elevated or lowered tension in the laryngeal muscles, pressed phonation).

Correspondence: F. Klingholz, Department of Phoniatrics, University of Munich, Pettenkoferstr. 4a, D-8000 Munich 2, Germany.
2. Background

2.1. Measurement
A convention by The Union of European Phoniatrists [1] specified the measurement procedure for phonetograms. The conditions were established as follows: (a) the measuring environment should show "living-room acoustics" with noise levels lower than 40 dB(A); (b) the lip-to-microphone distance is fixed at 30 cm; (c) the vowel [a:] is to be produced for at least 2 s; (d) the sound pressure level has to be measured in dB(A); (e) the voice production should be "physiologically" acceptable; and (f) the pitch should follow a tone scale (C-major).

There are two ways of measuring. In off-line measurement, a sound pressure level meter for the SPL and a tone generator (e.g. a piano) to offer the pitch are used, and the SPL and pitch data are listed manually. In on-line measurement, an A/D converter, computer storage, and software for the determination of SPL and F are used. The second method has advantages: manual data input into the computer is avoided, and the acoustic signals are stored reproducibly, i.e., quality checks can be performed to select acceptable voice productions (e.g. by measuring the signal-to-noise ratio). In either case, the measurement yields the phonetographic contours of forte [F(n), SPL_f(n)] and of piano [F(n), SPL_p(n)], where F(n) is the fundamental frequency and n = 1, 2, 3, ... is the order of tones. The lowest and highest produced tones are F(n_min) and F(n_max), respectively.

2.2. Data processing
Two variants of the processing of phonetogram data have been used. (1) Phonetograms of subjects without any laryngeal disorder have been normalized with respect to the vocal range and averaged [2-4]. These standard phonetograms have been used as references for phonetograms to be characterized. (2) Parameters have been extracted from phonetograms to evaluate the voice function [5-7]. The parameters were the vocal range F(n_max) - F(n_min), the mean slope of the phonetogram, the maximum and minimum of SPL_f(n) and of SPL_p(n), and the maximum dynamics D(n), where D(n) = SPL_f(n) - SPL_p(n).

Since voice production is individual in nature, averaged phonetograms cannot be expected to serve as adequate references for individual voices. On the other hand, the extraction of parameters such as vocal range or maximum SPL does not require the measurement of the total phonetogram. Any processing of phonetogram data only makes sense if it takes advantage of the two-dimensionality of the representation, as shown in the present paper.

2.3. Physiological model
Different ranges of the phonetogram represent different phonatory modes. In a rough approach, two laryngeal mechanisms should be considered: phonation with high adductory activity (M. thyreoarytenoid, M. cricoarytenoid lateralis) in the low frequency range, and phonation with high activity of the tensor (M. cricothyroid) in the high frequency range. In the terms of voice teachers, these modes are called chest and head registers. The first mode dominates in forte rather than in piano, and vice versa for the second. Transitions between the mechanisms occur at medium pitch, where a mixed region (middle voice) exists.
In forte and piano, the transitions show discontinuities which are represented by valleys in the forte contour and peaks in the piano contour (other discontinuities can arise from an interaction of the glottal configuration and the sub- and supralaryngeal cavities [8,9]). Each mechanism might be considered as a mechanical system showing inefficiency at the lower and upper ends of its working range. At these limits, the inefficiency is indicated by minimum dynamics (Fig. 2). This means that an increased amount of energy dissipates in the mechanism, i.e., the maximum output is lowered (valley in forte), and a higher input power is needed to trigger the mechanism (peak in piano). It follows from Fig. 2 that the mechanisms represented by subareas of the phonetogram could be approximated by ellipses, where the ellipse parameters describe the phonetogram and evaluate it quantitatively. The three ellipses would represent chest, middle, and head voices. Although this model is very simple, it approximates most phonetograms sufficiently. Nevertheless, one should bear in mind that this description is only a rough approximation to the physiological background. However, as shown in Sections 4.3 and 5, the procedure aims at medical practice rather than at a scientific approach for fundamental research.
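The single parameters listed in Section 2.2 can be illustrated by a minimal Python sketch. The function name, the data layout (parallel lists indexed by the tone order n), and the sample values are assumptions made for illustration only; they are not taken from the program described in this paper. The mean slope is left out because its definition is not given here.

```python
def phonetogram_parameters(freq_hz, spl_forte, spl_piano):
    """Single parameters of a phonetogram (hypothetical helper).

    freq_hz   -- F(n), fundamental frequencies in Hz, in ascending order
    spl_forte -- SPL_f(n), forte contour in dB(A)
    spl_piano -- SPL_p(n), piano contour in dB(A)
    """
    if not freq_hz or not (len(freq_hz) == len(spl_forte) == len(spl_piano)):
        raise ValueError("contours must be non-empty and of equal length")

    # D(n) = SPL_f(n) - SPL_p(n)
    dynamics = [f - p for f, p in zip(spl_forte, spl_piano)]

    return {
        "vocal_range_hz": freq_hz[-1] - freq_hz[0],  # F(n_max) - F(n_min)
        "max_spl_forte": max(spl_forte),
        "min_spl_forte": min(spl_forte),
        "max_spl_piano": max(spl_piano),
        "min_spl_piano": min(spl_piano),
        "max_dynamics": max(dynamics),
    }


if __name__ == "__main__":
    # Invented sample contours, five tones of a scale.
    freq = [131.0, 147.0, 165.0, 175.0, 196.0]
    forte = [80.0, 85.0, 90.0, 88.0, 95.0]
    piano = [55.0, 56.0, 60.0, 62.0, 70.0]
    print(phonetogram_parameters(freq, forte, piano))
```

Note that each of these values depends on only one or two tones of the contours, which is the limitation of single parameters pointed out in Section 2.2.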
3. System description
The program was written in Turbo C and run on a Compaq 386 computer. The program read the phonetogram data from keyboard input; additionally, it needed information on the subject's sex and on whether the subject was a singer or a non-singer. The phonetogram data as well as the ellipse parameters were stored, so the program output could be regenerated from the stored data at any time. The analysis of a new voice took at most 3 min. Reproducing the program output for voices already analyzed took a few seconds. The output device was a LaserJet printer. Fig. 3 shows a block diagram of the program.
4. Program description
The data processing included four steps: location of the transitions between the subareas of the phonetogram; design of the ellipses; transformation of ellipse parameters to phonatory parameters; and utilization of the ellipse parameters for the characterization of voices.

Fig. 2. Top: minimum and maximum dynamics in the phonetogram of Fig. 1; bottom: approximation to the phonetogram of Fig. 1 (abscissa: fundamental frequency, kHz).
4.1. Location of transitions
Candidates for the transitions between the subareas were minima of SPL_f(n), maxima of SPL_p(n), and minima of D(n). If such a minimum or maximum had a relatively high magnitude and was close to a member of a set of reference transitions, it was chosen as a transition. The frequency coordinates of the transitions were F(n_1) for the lower transition and F(n_2) for the upper transition. Because the determination of the transitions was not absolutely reliable, the operator could modify the transitions manually according to the operator's own experience (see Section 4.2).
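The candidate search described above might be sketched in Python as follows. This is only an illustration under stated assumptions: the function names, the use of strict interior local extrema, and the tolerance in Hz around a reference transition are hypothetical. The magnitude criterion is omitted because it is not quantified in the text.

```python
def local_extrema(values, kind):
    """Indices of strict interior local minima ('min') or maxima ('max')."""
    sign = 1 if kind == "min" else -1
    return [
        i
        for i in range(1, len(values) - 1)
        if sign * values[i] < sign * values[i - 1]
        and sign * values[i] < sign * values[i + 1]
    ]


def transition_candidates(spl_forte, spl_piano):
    """Tone indices that are valleys in forte, peaks in piano, or minima of D(n)."""
    dynamics = [f - p for f, p in zip(spl_forte, spl_piano)]
    indices = set(local_extrema(spl_forte, "min"))
    indices |= set(local_extrema(spl_piano, "max"))
    indices |= set(local_extrema(dynamics, "min"))
    return sorted(indices)


def pick_transition(freq_hz, candidates, reference_hz, tolerance_hz):
    """Candidate closest to a reference transition, or None if none is near.

    reference_hz and tolerance_hz are assumed inputs; the reference
    transitions used by the original program are not listed in the paper.
    """
    near = [i for i in candidates if abs(freq_hz[i] - reference_hz) <= tolerance_hz]
    if not near:
        return None
    return min(near, key=lambda i: abs(freq_hz[i] - reference_hz))
```

Returning None when no candidate lies near a reference corresponds to the case in which the operator has to set the transition manually.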
Fig. 3. Block diagram of the phonetogram program. Stored data: (01) phonetogram data; (02) values of ellipse parameters; (03) reference frequencies of transitions; (04) reference data of efficiency; (05) coefficients of linear combinations; (06) coefficients of discriminant functions. Approximation step: variation of ellipse parameters with minimization of the mean square deviation and test of the restriction conditions.

4.2. Design of ellipses
Three boxes were constructed. Their vertical walls were located at the following frequency coordinates:
box 1: [F(n_min - 1) + F(n_min)]/2 and [F(n_1) + F(n_1 + 1)]/2,
box 2: [F(n_1 - 1) + F(n_1)]/2 and [F(n_2) + F(n_2 + 1)]/2,
box 3: [F(n_2 - 1) + F(n_2)]/2 and [F(n_max) + F(n_max + 1)]/2.
The horizontal walls were located at the minima of SPL_p(n) and the maxima of SPL_f(n) between these frequency coordinates (see Fig. 4).

Ellipses with a rotation of 0° or 90° were embedded into the boxes. While the ellipses were rotated (with the vertexes of the ellipses touching the box walls), the mean square deviation between SPL_f(n) or SPL_p(n) and the related ordinates of the ellipses was computed for each box. The rotation was stopped at the minimum mean square deviation. This ellipse configuration was considered as a first approximation to the phonetogram (Fig. 4). At this stage, the ellipse configuration and the phonetogram data were plotted on the monitor. The frequency coordinates of the transitions could be modified by a mouse-controlled cursor, which created a new version of the ellipse configuration.

Fig. 4. Division of the phonetogram into subareas (compare Fig. 1); top: boxes; middle: ellipses with 0° or 90° rotation; bottom: ellipses with optimal rotation.

To improve the first approximation, the five ellipse parameters (main axis, secondary axis, rotation, x and y coordinates of the central point) were varied to lower the mean square deviation. In this procedure, the value of each parameter was set to P - δP, P, and P + δP, where P was the value of the parameter determined in the first approximation stage, and δP/P amounted to 10%. Each ellipse was treated separately, i.e. 3^5 modifications per ellipse were tested for minimum mean square deviation. The following restrictions were set to avoid excessive growth of the ellipses:
(a) The frequency coordinates of the intersection points of the ellipses must approximate the frequency coordinates of the transitions.
(b) The frequency coordinate of the left vertex of the first ellipse must range between [F(n_min - 1) + F(n_min)]/2 and F(n_min). The frequency coordinate of the right vertex of the third ellipse must range between F(n_max) and [F(n_max) + F(n_max + 1)]/2.
(c) The frequency coordinates of the left and right vertexes of the ellipses must not exceed the frequency coordinates of the central points of their neighbouring ellipses.
(d) The distance between the sound pressure coordinates of the upper vertexes of the ellipses and the maximum sound pressure levels in the subareas must not exceed 3 dB(A).
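The variation of the five ellipse parameters can be sketched in Python as an exhaustive search over the 3^5 = 243 combinations of P - δP, P, P + δP. This is a minimal sketch, not the original Turbo C code. The parameter order (a, b, theta, cx, cy), the use of semi-axes, the handling of tones outside the horizontal extent of the ellipse, and the `feasible` predicate standing in for restrictions (a) to (d) are all assumptions.

```python
import itertools
import math


def ellipse_ordinates(x, a, b, theta, cx, cy):
    """Upper and lower ordinate of a rotated ellipse at abscissa x.

    a, b are the semi-axes, theta the rotation in radians, (cx, cy) the
    central point. Returns None if x lies outside the ellipse.
    """
    c, s = math.cos(theta), math.sin(theta)
    dx = x - cx
    # Quadratic qa*dy^2 + qb*dy + qc = 0 in dy = y - cy.
    qa = s * s / a**2 + c * c / b**2
    qb = 2.0 * dx * c * s * (1.0 / a**2 - 1.0 / b**2)
    qc = dx * dx * (c * c / a**2 + s * s / b**2) - 1.0
    disc = qb * qb - 4.0 * qa * qc
    if disc < 0.0:
        return None
    root = math.sqrt(disc)
    return cy + (-qb + root) / (2.0 * qa), cy + (-qb - root) / (2.0 * qa)


def mean_square_deviation(params, freq, spl_forte, spl_piano):
    """Mean square deviation between the contours and the ellipse ordinates."""
    total = 0.0
    for x, f, p in zip(freq, spl_forte, spl_piano):
        ords = ellipse_ordinates(x, *params)
        # Assumption: outside the ellipse both ordinates collapse to cy.
        upper, lower = ords if ords else (params[4], params[4])
        total += (f - upper) ** 2 + (p - lower) ** 2
    return total / (2 * len(freq))


def refine(params, freq, spl_forte, spl_piano, rel_step=0.10,
           feasible=lambda trial: True):
    """Test all 3^5 combinations of P - dP, P, P + dP with dP/P = rel_step.

    A parameter equal to zero (e.g. a rotation of 0) is not varied by a
    purely relative step; this follows the description literally.
    """
    grids = [(p - rel_step * p, p, p + rel_step * p) for p in params]
    best = tuple(params)
    best_err = mean_square_deviation(best, freq, spl_forte, spl_piano)
    for trial in itertools.product(*grids):
        if trial[0] <= 0 or trial[1] <= 0 or not feasible(trial):
            continue
        err = mean_square_deviation(trial, freq, spl_forte, spl_piano)
        if err < best_err:
            best, best_err = trial, err
    return best, best_err
```

Calling `refine` repeatedly, each time starting from the previous result, would correspond to repeating the procedure to improve the approximation.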
The procedure could be repeated to improve the approximation. However, when the mean deviation was lower than 3 dB(A), the program output did not depend on the quality of the approximation.

4.3. Transformation of ellipse parameters to phonatory parameters
The frequency coordinates of the left and right vertexes of the ellipses represented the vocal ranges of the subareas. The sound pressure coordinates of the upper and lower vertexes determined the maximum and minimum sound pressure levels in the subareas. The difference between the sound pressure coordinates of the ellipse contour at the central point of each ellipse represented the maximum dynamics in the subarea.

4.4. Utilization of ellipse parameters
The program printed phonetograms for visual inspection and judgment as well as for documentation in the clinical record. The ellipse parameters were used as variables in a classification of voices or in a grading of voice effects. In the case of singers, the ellipse parameters were used to recognize voice categories and technical problems in singing. The parameters were also used to evaluate the voice efficiency of singers and non-singers. All voices were inspected for their tendency to show functional voice disorders by means of the parameters.

4.4.1. Voice categories. There are six adult voice categories to be identified from phonetograms: bass, baritone, tenor, alto, mezzo-soprano, soprano. A finer subdivision of the voice categories could not be expected because variables other than fundamental frequency and sound pressure level (e.g. spectral characteristics) influence singing quality too. The classification of voices used linear discriminant analysis. In the learning phase, the discriminant functions were determined from ten reference singers per category, using as variables the frequency coordinates of the central points and the vocal ranges of the subareas. The reference singers were primarily classified by their teachers and by the singers themselves. In the program, the degree to which a singer belonged to a category was evaluated by the posterior probability of classification.

4.4.2. Voice efficiency. The wider the frequency range, the higher the maximum dynamics (dynamics in the central point of the ellipse), and the higher the maximum SPL_f(n) of a subarea, the more efficient a voice is in this frequency range. The parameters were measured in male and female groups of subjects without voice disorders to create reference values. The values of the parameters of a new voice were normalized relative to the reference data. The efficiency of the voice was then the mean of these ratios in percent.

4.4.3. Functional dysphonia. There are two types of functional dysphonia, which can be described by clinical features (tension of the vocal folds, amplitudes and vertical phasing of the glottal oscillation, etc.). Multiple correlations were computed between scores of the clinical features and values of the ellipse parameters (Fig. 5) for the recognition of the disorder types from the phonetograms. Linear combinations of different parameter sets for male and female subjects discriminated the types. The values of the linear combinations graded the disorders as not, slightly, or strongly manifested (output information).

4.4.4. Organic dysphonia. In some cases of organic dysphonia, the phonetograms described the clinical phenomenon very well: a shift of the phonetogram to lower frequencies in the virilization of a female voice, the absence of the lower subarea in the fistulous voice, and the absence of the upper subarea with laryngeal trauma.
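The transformation of Section 4.3 and the efficiency measure of Section 4.4.2 can be sketched in Python as follows. This is a hedged illustration: the function names, the parameter order (a, b, theta, cx, cy) with semi-axes a and b, the interpretation of the vertexes as the extreme points of the rotated ellipse, and the reference values in the usage example are assumptions. The actual reference data of the program are not given in the paper.

```python
import math


def phonatory_parameters(a, b, theta, cx, cy):
    """Phonatory parameters of one subarea from its ellipse (hypothetical helper)."""
    c, s = math.cos(theta), math.sin(theta)
    # Half extents of the rotated ellipse along the frequency and SPL axes.
    half_width = math.sqrt((a * c) ** 2 + (b * s) ** 2)
    half_height = math.sqrt((a * s) ** 2 + (b * c) ** 2)
    # Length of the vertical chord through the central point.
    central_chord = 2.0 / math.sqrt((s / a) ** 2 + (c / b) ** 2)
    return {
        "vocal_range": 2.0 * half_width,  # right vertex minus left vertex
        "max_spl": cy + half_height,      # upper vertex
        "min_spl": cy - half_height,      # lower vertex
        "max_dynamics": central_chord,    # dynamics in the central point
    }


def voice_efficiency(parameters, reference):
    """Mean ratio of the parameters to their reference values, in percent."""
    keys = ("vocal_range", "max_dynamics", "max_spl")
    ratios = [parameters[k] / reference[k] for k in keys]
    return 100.0 * sum(ratios) / len(ratios)


if __name__ == "__main__":
    subarea = phonatory_parameters(a=0.2, b=15.0, theta=0.0, cx=0.4, cy=80.0)
    # Invented reference values for illustration only.
    reference = {"vocal_range": 0.5, "max_dynamics": 30.0, "max_spl": 100.0}
    print(subarea)
    print(voice_efficiency(subarea, reference))
```

Averaging the ratios gives each of the three parameters the same weight; whether the original program weighted them differently is not stated.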
Fig. 5. Parameters for the evaluation of functional voice disorders. Male subjects: P,, P,, P4; female subjects: P,. P,, P4.
5. Samples of program runs
About 1000 subjects were analyzed by the phonetogram program. In this section, examples are given to demonstrate program outputs. The mean deviation between SPL_f(n) or SPL_p(n) and the related ordinates of the ellipses is specified to give an impression of the quality of the approximation. Each voice was characterized by the voice efficiency, the vocal range, the maximum sound pressure level, and the maximum dynamics in the subareas, and by the total vocal range. In the case of singers, the probability of the classification in a voice category as well as the pitch of the transitions were indicated. Short comments refer to diagnostic details.

Typical phonetographic configurations of four voice categories (bass, tenor, alto, soprano) are demonstrated in Fig. 6. Baritones and mezzo-sopranos showed patterns between the shown phonetograms. In most cases, the posterior classification probability of a voice was different from zero in the neighboring categories. When the classification by the teachers was compared with the classification performed by the present program, the error rate was less than 10%. Fig. 7 represents phonetograms of singers with technical voice problems, and Fig. 8 shows typical phonetograms of the types of functional dysphonia.

Fig. 6 (a-d). Phonetograms of different voice categories (abscissa: fundamental frequency, kHz).

Fig. 7 (a,b). Phonetograms in cases of problems of singing techniques. a. Overtensed phonation. b. Pressed phonation.

Fig. 8 (a,b). Phonetograms of functional dysphonia. a. Male hypofunctional dysphonia. b. Female hyperfunctional dysphonia.
Acknowledgements
This research was supported by the Deutsche Forschungsgemeinschaft (Project Kl 580/1-l743/87). Mr. Airainer was in charge of the identification of phonetographic features due to functional dysphonia.
References
[1] W. Seidner and H.K. Schutte, Empfehlung der UEP: Standardisierung der Stimmfeldmessung/Phonetographie, HNO-Praxis, 7 (1982) 305-307.
[2] R.F. Coleman, J.H. Mabis and J.K. Hinson, Fundamental frequency-sound pressure level profiles of adult male and female voices, J. Speech Hear. Res., 20 (1977) 197-204.
[3] M.F. Pedersen and H. Lindskov-Hansen, Computerized phonetograms for clinical use, Proceedings XXth Congress of IALP, 3-7 August 1986, Tokyo, pp. 170-171.
[4] H.-J. Schultz-Coulon and S. Asche, Das "Normstimmfeld" - ein Vorschlag, Sprache Stimme Gehör, 12 (1988) 5-8.
[5] W. Seidner, H. Krüger and K.-D. Wernecke, Numerische Auswertung spektraler Stimmfelder, Sprache Stimme Gehör, 9 (1985) 10-13.
[6] G. Dickopf, M. Flach, R. Koch and B. Kroemer, Varianzanalytische Untersuchungen zur Stimmfeldmessung, Folia phoniat., 40 (1988) 43-48.
[7] H.W. Eichel, Der Stimmfeldindex, ein Vorschlag zur quantitativen Auswertung des Stimmfeldes, Sprache Stimme Gehör, 12 (1988) 63-64.
[8] I.R. Titze, A framework for the study of vocal registers, J. Voice, 2 (1988) 183-194.
[9] P. Gramming and J. Sundberg, Spectrum factors relevant to phonetogram measurement, J. Acoust. Soc. Am., 83 (1988) 2352-2360.