Cepstrum analysis of pathologic voices


Journal of Phonetics (1986) 14, 501-507

Yasuo Koike
Department of Otolaryngology, The University of Tokushima School of Medicine, 3 Kuramoto, Tokushima, 770, Japan

The Cepstrum technique was applied both to the acoustic speech wave and to the residual signal derived from an inverse filter analysis, in order to study short-term perturbations in the periodicity of the voice source. The speech material was obtained from patients with various laryngeal pathologies and from normal adults as controls. The resulting data for the laryngeal patients clearly demonstrated various degrees of irregularity. Cepstrum analysis of the residue signal revealed a simpler form than that of the acoustic speech wave. The feasibility of applying the Cepstrum technique to the screening of laryngeal pathology is considered.

0095-4470/86/030501+07 $03.00/0 © 1986 Academic Press Inc. (London) Ltd.

1. Introduction

It is well known that various pathologies of the vocal fold, or of the surrounding structures, can change voice quality. This change, often designated as hoarseness, has been studied extensively in the hope that it might provide a useful clue for detecting the causative disorder in the larynx. Most early studies of this aspect of voice were based on auditory perception. These perceptual studies were largely subjective and, accordingly, resulted in considerable confusion of concepts and terms. The auditory perception of hoarseness may be influenced by many factors, perhaps even by extra-laryngeal ones.

Acoustic analysis of pathologic voices has also been carried out by many authors. Spectral analysis of pathologic voices, for example, has been performed by several investigators (Winckel, 1952; Nessel, 1960; Prytz, 1977; Wendler, Doherty & Hollien, 1980). Although the results of these studies varied widely, so that their proper interpretation is still under discussion, they strongly suggested that the acoustic signal can reveal the existence of a pathology in the diseased larynx.

Subsequently, detailed quantitative analyses of the variability in the fundamental period and peak amplitude of acoustic speech were performed by several researchers using digital processing. Lieberman (1963), for example, defined a "pitch perturbation factor" and indicated that it might be of use in detecting laryngeal pathology. An important facet of this study was that the variability of the fundamental period was accounted for statistically. A similar parametric statistical procedure was adopted by many authors. Hecker & Kreul (1971) used a "directional perturbation factor", which may be regarded as a non-parametric statistical method. Koike (1969) and Koike, Takahashi & Calcaterra (1977) applied techniques of time series analysis and demonstrated the feasibility of screening laryngeal pathology with such indices as the "frequency perturbation quotient" and the "amplitude perturbation quotient".

Although pitch period and amplitude can be measured quantitatively and accurately, it is not an easy task for a human operator to identify each pitch period, even with the aid of a digital computer. Many automated methods have therefore been employed (Crystal & Jackson, 1970; Horii, 1979; Ludlow, Coulter & Gentges, 1981). Most of these methods, however, were based on low-pass filtering combined with a peak detector, and a low-pass filter may modify the rapid fluctuations of the acoustic waveform that closely correspond to certain variations in the glottal area waveform (Koike et al., 1977). An improved means of processing these measures automatically, without sacrificing measurement accuracy, still seems to be needed.

The present study was therefore undertaken to examine the possibility of automatic screening of laryngeal pathology by means of acoustic signal processing. The assessment of the nature of the laryngeal pathology, or differential diagnosis, is beyond the scope of the present paper.

2. Method

The subjects of the present study consisted of ten selected patients with various laryngeal pathologies and ten normal adults as controls. Table I lists the pathologies of these patients along with their ages, sexes and the average fundamental frequencies of the sustained voice samples employed in the study. The subjects were instructed to sustain the vowel /a/ for approximately 4 s at a comfortable pitch level. A magnetic tape recording of the acoustic waveform was made for each subject with a conventional microphone (Electrovoice 666) and a magnetic tape recorder (Ampex 600). A stable portion of the vowel, about 1 s in duration, was taken for the analysis. This material was then digitized with an analog-to-digital converter at a sampling rate of 6500 samples/s with the aid of a small general-purpose digital computer (PDP-11). The stored data were then analysed with a Cepstrum analyser implemented on the computer. The analyser was designed to produce high-resolution spectra of the acoustic wave, and the logarithm of each consecutive amplitude spectrum was then used as the input to a second, similar spectrum analyser. The output of this second analyser is the power spectrum of the log spectrum, and is called a "Cepstrum".

TABLE I. Age, sex, average fundamental frequency, and laryngeal pathology of each pathologic subject.

Subject  Age  Sex  Pathology
P1       33   f    Vocal nodule
P2       16   m    Unilateral paralysis
P3       56   m    Hemilaryngectomized
P4       25   f    Vocal nodule
P5       39   f    Spastic dysphonia
P6       39   m    Vocal polyp
P7       77   m    Laryngeal papilloma
P8       28   f    Unilateral paralysis
P9       57   m    Glottic cancer
P10      64   m    Advanced cancer

Average Fo (Hz), eight values listed without subject alignment: 205, 175, 96, 196, 231, 115, 164, 189

The Cepstrum of a speech signal has a peak corresponding to the fundamental period of voiced speech, and it is widely employed for pitch detection. Detailed descriptions of this technique are available elsewhere (Noll, 1964; Oppenheim, 1969). Inverse filtering was also performed on the original speech waveforms in order to remove the effect of the supraglottal vocal tract; this procedure is also explained in detail elsewhere (Markel & Gray, 1973; Koike & Markel, 1975). The Cepstrum of the resultant wave, the "residue signal" derived from this filter, was also computed. The two types of Cepstra, i.e. the Cepstrum of the original acoustic wave and that of the residue signal, were compared in order to assess the influence of the vocal tract transmission characteristics on the Cepstral display.

3. Results

A typical Cepstrum of a normal voice is illustrated in Figure 1. The upper tracing shows the Cepstrum of the original acoustic speech waveform, and the lower tracing shows the Cepstrum of the residue signal for the same voice sample. The abscissa indicates "QUEFRENCY" in ms, and the ordinate the Cepstral amplitude. The waveforms displayed on the right side of the Cepstra show the Hamming window adopted. The most remarkable difference between the two Cepstra seems to be that the prominent positive peak observable at the leftmost end of the speech Cepstrum cannot be seen in the residue Cepstrum. The remaining structures of the two Cepstra appear to be almost identical. This was true of all the Cepstra for normal subjects, and of those for laryngeal patients as well.

Figure 1. The speech Cepstrum (a) and the residue Cepstrum (b) of a normal voice. The abscissa indicates "QUEFRENCY" in ms, and the ordinate the Cepstral amplitude. The waveforms on the right side of the Cepstra show the Hamming window adopted. The leftmost peak in the speech Cepstrum cannot be seen in the residue Cepstrum, but the remaining structure of the two Cepstra is identical. Two apparent pitch peaks can be identified.

The fundamental period of the analysed voice sample is indicated by two sharp peaks in the residue Cepstrum. Very similar peaks are also found in the speech Cepstrum. The regularity of the original voice sample is reflected in the distinctiveness of these peaks. The Cepstra for normal subjects invariably showed a similar sharp peak indicative of a regular fundamental period. It should be added that for most normal subjects two or more peaks could be identified at approximately equal intervals along the "QUEFRENCY" axis. Although this may depend on the Hamming window employed, this feature may provide an additional cue for determining the pitch period.

The voices of patients with laryngeal pathologies showed a variety of Cepstral patterns. Figure 2 shows the Cepstra for an advanced cancer case; no apparent pitch peak can be observed in either tracing. The Cepstra of a voice sample produced by a patient with moderately advanced laryngeal cancer are shown in Fig. 3. Here some pitch peaks can be identified, although they are much less distinctive than those for normal subjects. Most of the patients with hoarse voices, despite their varied laryngeal pathologies, yielded similar Cepstra, indicating irregularity in the pitch period of their voices.

Figure 2. The speech Cepstrum (a) and the residue Cepstrum (b) of an utterance produced by a patient (P10) with advanced laryngeal cancer. No apparent pitch peak is seen. A considerable amount of noise can be observed.
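The Cepstrum defined in the Method section (the power spectrum of the log amplitude spectrum of a Hamming-windowed frame) and the idea that regularity shows up as a distinctive pitch peak can be sketched in a few lines of NumPy. The 6500 samples/s rate and the Hamming window come from the paper; the 2.5 to 20 ms search range (roughly 400 down to 50 Hz), the 1e-12 floor inside the logarithm and the peak-to-median ratio called `prominence` are assumptions made for illustration. They are not the author's procedure, which judged peak distinctiveness by inspecting the Cepstral tracings.

```python
import numpy as np

FS = 6500  # sampling rate used in the study (samples/s)


def power_cepstrum(x, fs=FS):
    """Return (quefrency in ms, power cepstrum) of one Hamming-windowed frame.

    Stage 1: amplitude spectrum of the frame.  Stage 2: power spectrum of its
    logarithm.  For a real, even log spectrum the inverse FFT used here differs
    from a forward FFT only by a constant factor, so the peak positions agree.
    """
    n = len(x)
    amplitude = np.abs(np.fft.rfft(x * np.hamming(n)))
    log_spectrum = np.log(amplitude + 1e-12)  # assumed floor, avoids log(0)
    ceps = np.fft.irfft(log_spectrum, n) ** 2
    quefrency_ms = 1000.0 * np.arange(n) / fs
    return quefrency_ms, ceps


def pitch_peak(quefrency_ms, ceps, lo_ms=2.5, hi_ms=20.0):
    """Largest Cepstral peak in [lo_ms, hi_ms] and its peak-to-median ratio.

    The ratio is a hypothetical stand-in for the visual 'distinctiveness'
    of the pitch peak: a large value suggests a regular fundamental period.
    """
    idx = np.flatnonzero((quefrency_ms >= lo_ms) & (quefrency_ms <= hi_ms))
    k = idx[np.argmax(ceps[idx])]
    prominence = ceps[k] / np.median(ceps[idx])
    return quefrency_ms[k], prominence
```

For a 1024-sample frame (about 158 ms) of a steady synthetic vowel at 130 Hz, whose period is exactly 50 samples at this rate, the peak should fall near 7.7 ms with a large ratio, while a frame of white noise should give a far smaller ratio.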

Figure 3. The Cepstra of a voice produced by a patient (P9) with moderately advanced glottic cancer. Although pitch peaks can be identified in each of the two Cepstra, they are less apparent than those in the Cepstra of normal voices.

The Cepstra in Fig. 4, which are based on an utterance by a patient with unilateral recurrent nerve paralysis, show no apparent peak, though the patterns seem to disclose some regularity. At least, the Cepstral patterns in this figure seem to differ somewhat from those in Fig. 2. Considerable irregularity clearly exists in the pitch period, but its nature does not seem to be quite the same as that found in Fig. 2.

Figure 4. The Cepstra of a voice produced by a patient with unilateral recurrent nerve paralysis. No apparent pitch peak is seen. The pattern of noise distribution in these Cepstra seems to be different from that observable in Fig. 2.

4. Discussion

It seems justifiable to say that the Cepstrum analysis technique is useful in detecting the fundamental period of the acoustic speech wave produced by both normal and pathologic speakers. Although this method was originally designed to distinguish the voiced portions of the speech signal from unvoiced portions or silent intervals, and to measure the fundamental period when it exists, it also seems applicable to testing whether the laryngeal vibratory behavior is sufficiently regular. This would make the screening of irregular excitations feasible, which should in turn make it possible to detect laryngeal pathology, at least once the disease has developed to a certain degree, since irregularity in the fundamental period may be caused by laryngeal pathologies such as cancer.

Comparison of the speech Cepstrum with the residue Cepstrum for the same sample shows that Cepstral analysis is relatively independent of the vocal tract transmission characteristics, which can be effectively removed by inverse filtering. The vocal tract characteristics reside mainly at the very low-"QUEFRENCY" end of the Cepstrum. One measure of the effectiveness of the inverse filter is therefore how well the low quefrencies are cancelled in going from the Cepstrum of the original waveform to that of its inverse-filter residue signal. Because the residue Cepstrum is simpler in form than the speech Cepstrum, it was considered worthwhile to inverse-filter the pathologic voices before applying the Cepstrum technique, in order to minimize errors in subsequent processing.

The pitch information derived from the Cepstrum analysis seems to be quite reliable. It should therefore be feasible to combine this technique with other means of quantitative analysis to further improve the efficiency of screening.

The difference between the Cepstra for the laryngeal cancer case (Fig. 2) and those for the unilateral recurrent nerve paralysis patient (Fig. 4) may be attributable to a difference in the acoustic characteristics of the noise contained in the original voices. The Cepstral method is relatively noise-resistant. A narrow-band noise, in particular, would produce a single spectral peak rather than a periodic ripple in the log spectrum, and would therefore produce no pronounced peak in the Cepstrum. A wide-band noise, however, may result in some energy between the Cepstral peaks. The Cepstral difference mentioned above could therefore be due to a difference in the bandwidth of the noise in the original voices. This noise-related feature may also be of some use in refining acoustic procedures for the screening of laryngeal pathology.
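The inverse-filter check proposed in the Discussion, comparing how much low-quefrency energy survives in the Cepstrum of the residue signal against the Cepstrum of the original waveform, can be sketched as follows. The paper cites Markel & Gray (1973) for the inverse filter, so this sketch assumes the autocorrelation method of linear prediction solved by the Levinson-Durbin recursion. The predictor order of 8 (a common rule of thumb of about fs/1000 + 2 coefficients at 6500 samples/s), the Hamming window before the autocorrelation, the 2 ms low-quefrency cutoff and all function names are assumptions for illustration, not details given in the paper.

```python
import numpy as np

FS = 6500  # sampling rate used in the study (samples/s)


def lpc_autocorrelation(x, order=8):
    """Prediction-error polynomial a[0..order] with a[0] = 1.

    Autocorrelation method: Hamming-window the frame, take lags 0..order,
    then solve the normal equations with the Levinson-Durbin recursion.
    The order of 8 is an assumed default, not taken from the paper.
    """
    w = x * np.hamming(len(x))
    full = np.correlate(w, w, mode="full")
    r = full[len(w) - 1 : len(w) + order]  # autocorrelation at lags 0..order
    a = np.zeros(order + 1)
    a[0] = 1.0
    err = r[0]
    for i in range(1, order + 1):
        # Reflection coefficient for stage i.
        acc = r[i] + np.dot(a[1:i], r[i - 1 : 0 : -1])
        k = -acc / err
        # a_j <- a_j + k * a_{i-j} for j = 1..i-1, then a_i = k.
        a[1:i] = a[1:i] + k * a[i - 1 : 0 : -1]
        a[i] = k
        err *= 1.0 - k * k
    return a


def inverse_filter(x, a):
    """Residue signal e[n] = sum_k a[k] x[n-k], i.e. x passed through A(z)."""
    return np.convolve(x, a)[: len(x)]


def real_cepstrum(x):
    """Inverse FFT of the log amplitude spectrum of a Hamming-windowed frame."""
    n = len(x)
    spectrum = np.abs(np.fft.rfft(x * np.hamming(n)))
    return np.fft.irfft(np.log(spectrum + 1e-12), n)


def low_quefrency_energy(x, fs=FS, cutoff_ms=2.0):
    """Energy in cepstral coefficients 1..cutoff (assumed 2 ms cutoff).

    c[0] is skipped because it only reflects overall gain, which differs
    between a waveform and its residue.
    """
    c = real_cepstrum(x)
    m = int(round(cutoff_ms * fs / 1000.0))
    return float(np.sum(c[1 : m + 1] ** 2))
```

On a synthetic vowel (a 130 Hz pulse train passed through two resonances), the residue's low-quefrency energy should come out much smaller than that of the original frame, which is the cancellation the Discussion uses as a measure of inverse-filter effectiveness.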


The author is indebted to Dr David Broad for his suggestions and assistance. This study was supported, in part, by a grant in aid for Fundamental Scientific Research from the Education Ministry of Japan.

References

Crystal, T. & Jackson, C. (1970) Extracting and processing vocal pitch for laryngeal disorder detection, Journal of the Acoustical Society of America, 48(A), 118.
Hecker, M. & Kreul, E. (1971) Description of speech of patients with cancer of the vocal folds. Part 1: measures of fundamental frequency, Journal of the Acoustical Society of America, 49, 1275-1282.
Horii, Y. (1979) Fundamental frequency perturbation observed in sustained phonation, Journal of Speech and Hearing Research, 25, 5-19.
Koike, Y. (1969) Vowel amplitude modulations in patients with laryngeal diseases, Journal of the Acoustical Society of America, 45, 839-844.
Koike, Y. & Markel, J. (1975) Application of inverse filtering for detecting laryngeal pathology, Annals of Otology, Rhinology & Laryngology, 84, 117-124.
Koike, Y., Takahashi, H. & Calcaterra, T. (1977) Acoustic measures for detecting laryngeal pathology, Acta Oto-laryngologica, 84, 105-117.
Lieberman, P. (1963) Some acoustic measures of the fundamental periodicity of normal and pathologic larynges, Journal of the Acoustical Society of America, 35, 344-353.
Ludlow, C., Coulter, D. & Gentges, F. (1981) The differential sensitivity of measures of fundamental frequency perturbation to laryngeal neoplasms and neuropathologies. In Vocal fold physiology (D. M. Bless & J. H. Abbs, editors), pp. 127-134. New York: College-Hill.
Markel, J. & Gray, A. Jr. (1973) On autocorrelation equations as applied to speech analysis, IEEE Transactions on Audio and Electroacoustics, AU-21, 69-79.
Nessel, E. (1960) Über das Tonfrequenzspectrum der pathologisch veränderten Stimme, Acta Oto-laryngologica Supplementum, 157, 1-45.
Noll, A. (1964) Short-time spectrum and "Cepstrum" techniques for vocal-pitch detection, Journal of the Acoustical Society of America, 36, 296-302.
Oppenheim, A. (1969) Speech analysis-synthesis system based on homomorphic filtering, Journal of the Acoustical Society of America, 45, 458-465.
Prytz, S. (1977) Long-time average spectra (LTAS) analyses of normal and pathologic voices. In Proceedings of the 17th International Congress of Logopedics and Phoniatrics (N. Buch, editor), vol. I, pp. 459-475. Copenhagen: Special Paedagogisk Forlag.
Wendler, J., Doherty, E. T. & Hollien, H. (1980) Voice classification by means of long-term speech spectra, Folia Phoniatrica, 32, 51-60.
Winckel, F. (1952) Elektroakustische Untersuchungen an der menschliche Stimme, Folia Phoniatrica, 4, 93-113.