ELSEVIER
THE STATISTICAL ANALYSIS OF SPEECH ENVELOPES IN STUTTERERS AND NON-STUTTERERS
WIESŁAWA KUNISZYK-JÓŹKOWIAK
Institute of Physics, Marie Curie-Skłodowska University, Lublin, Poland
This work is a portion of a research project that is aimed at the development of an objective method for evaluating fluency. The present study focuses on acoustic characteristics of utterances that distinguish stuttered and fluent speech. Toward this end, we used sound envelopes of utterances by 30 stutterers and 30 non-stutterers, each 100 seconds in duration; the stutterers spoke under conditions of simultaneous auditory feedback (SAF) or with an echo (DAF). Specifically, the interdependence of phonation number (the number of times a given sound level was crossed) and sound intensity level was determined in this study. In addition, average distributions of phonation times and pause durations as a function of sound intensity level were examined. The resulting data suggest that differences in speech envelopes of stutterers and non-stutterers may be used to evaluate the degree of speech non-fluency. The obtained results support the hypothesis that the cause of stuttering is a lack of synchronization between laryngeal functioning and vocalizing activities. The result is that transitional states between vowels and consonants are either prolonged or more abrupt than normal. The present results suggest a disorder of timing in the stutterers' speech.
INTRODUCTION
Diagnosing stuttering, or evaluating the degree of non-fluency in stutterers' speech, is difficult. The difficulty is increased by the fact that the very concept of speech fluency is not univocal. Speech fluency is variously defined by researchers concerned with this problem (Adams, 1982, 1985; Adams and Runyan, 1981; Dalton and Hardcastle, 1977; Fillmore, 1979; Finn and Ingham, 1989; Hegde, 1978; Perkins, 1971, 1983; Starkweather, 1984, 1987). Evaluation of the severity of stuttering depends on the judgment of listeners, who identify non-fluent fragments of an utterance: e.g., incorrectly
Address correspondence to Wiesława Kuniszyk-Jóźkowiak, Institute of Physics, Marie Curie-Skłodowska University, Pl. Marie Curie-Skłodowskiej 1, PL-20-031 Lublin, Poland.
J. FLUENCY DISORD. 20(1995), 11-23 © 1995 by Elsevier Science Inc. 655 Avenue of the Americas, New York, NY 10010
0094-730X/95/$9.50 SSDI 0094-730X(94)00006-F
pronounced syllables, words, or phones. The absence of distortions characteristic of stuttering does not mean, however, that speech is fluent. There are some cases of so-called "hidden stuttering," when the speaker practices word avoidance. Also, there often are blockades in the stutterer's speech, which may be judged by the listeners as normal pauses in speaking. Because of this, the utterances of stutterers are currently evaluated not only from an acoustic aspect, but from a visual one as well (Coyle and Mallard, 1979; Seymour, Ruggerio, and McEneaney, 1983). It is hypothesized that the most objective evaluation of speech non-fluency would be made based on an acoustic analysis of the speech signal. The acoustic analysis of the stutterers' speech has been the subject of much research in recent years. Researchers have dealt with both portions of stutterers' speech that are considered as fluent (Borden, Baer, and Kenney, 1985; Di Simoni, 1974; Healey and Gutkin, 1984; Hillman and Gilbert, 1977; Kalveram and Jäncke, 1989; Metz, Conture, and Caruso, 1979; Metz, Samar, and Sacco, 1983; Pindzola, 1987; Prosek and Runyan, 1983; Zebrowski, Conture, and Cudahy, 1985; Zimmermann, 1980) and the disfluencies characteristic of stuttering (Howell and Vause, 1986; Howell and Williams, 1992). These studies concentrated on short fragments of utterances, the approximate duration of one fragment being one or a few phonemes. The subject of the present article is the analysis of the intensity envelopes of 100-second utterances of stutterers and non-stutterers regardless of their level of displayed fluency. The intensity envelope plays an important part in controlling the fluency of speech. It has been shown in some studies (e.g., Adamczyk and Kuniszyk-Jóźkowiak, 1987) that the envelopes of delayed or prolonged speech signals can be used to correct stutterers' speech.
Intensity variations can be made known to the speaker through the medium of sound, sight, or touch (Kuniszyk-Jóźkowiak and Adamczyk, 1988, 1989; Smołka and Adamczyk, 1992). With regard to this, it seems to be worth comparing the parameters of the speech envelopes of stutterers and non-stutterers. Further reasons for the analysis of longer utterances are as follows. A listener judges speech as fluent or non-fluent on the basis of sufficiently long fragments of an utterance. Fluent speech fragments of the approximate duration of one phoneme, syllable, word, or even a short sentence do not necessarily add up to an utterance that a listener would judge as fluent. The present article is a continuation of research described in previous articles (Kuniszyk-Jóźkowiak, 1991, 1992).
APPARATUS AND MEASUREMENT PROCEDURE
As shown in Figure 1, speech signals from a tape-recorder (1) were processed by a rectifier/integrator (2), converted to digital signals by an A/D converter (3), and stored (4) on floppy disks. The maximum levels of analogue signals fed to the converter were monitored on an oscilloscope (5). The dynamic range of the analogue signals was 40 dB. The conversion time of the analogue-to-digital converter was 5 msec. The speech envelopes were smoothed by totaling the squares of the amplitudes in 0.04-sec intervals, which can be mathematically presented as follows:

$$y[k] = \sum_{n=(k-1)\Delta n + 1}^{k\,\Delta n} x^{2}[n]$$

where:
y[k] - output signal sample number k
x[n] - input signal sample number n
Δn = 0.04 sec / t_p (t_p - signal sampling time)

Figure 1. Instrumental set up for rectification, integration, A/D conversion, and storage of the speech signal.

On the basis of the course of the speech envelopes, the number of times the signal amplitude exceeded the preset criterion was calculated; in this study this is called the number of phonations. For example, in the portion of the speech envelope shown in Figure 2, the number of phonations above the 35 dB level is 9. A computer program was developed that determined the statistical distribution of phonations and the pauses between them. It included the following procedures:
1. calculating the individual durations of phonations and pauses between them (in Figure 2, in the fragment of an utterance chosen as an example, the times of single phonations t1 ... t9 and the times of single pauses p1 ... p8 at the 35 dB level have been marked);
2. calculating the number of phonations and the number of pauses with durations ranging from Ti to Ti + 0.05 sec, where Ti changed within the range of 0 to 0.95 sec in steps of 0.05 sec;
3. presenting graphically the statistical distribution of numbers of phonations and numbers of pauses at various sound intensity levels;
4. averaging the numbers of phonations and the numbers of pauses for a given time period (Ti to Ti + 0.05 sec) and for a given sound intensity level.
The term phonation is operationally defined as "sound production".

Figure 2. Speech intensity envelope after rectification and integration showing instances of phonation exceeding 35 dB.
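The smoothing formula and procedures 1 and 2 can be illustrated with a short Python sketch. It is not the program used in the study: the function names, the use of NumPy, and the conversion to decibels (10 log10 of the smoothed value, with the envelope maximum placed at 40 dB to match the stated dynamic range) are assumptions made for illustration only.

```python
import numpy as np


def smooth_envelope(x, t_p, window_s=0.04):
    """Sum squared samples over consecutive, non-overlapping 0.04-s blocks.

    Follows y[k] = sum of x[n]^2 over the k-th block of dn samples,
    with dn = window_s / t_p. Samples that do not fill a last block are dropped.
    """
    dn = int(round(window_s / t_p))  # samples per block
    n_frames = len(x) // dn
    frames = np.asarray(x[: n_frames * dn], dtype=float).reshape(n_frames, dn)
    return (frames ** 2).sum(axis=1)


def envelope_db(y, max_db=40.0):
    """Express the envelope in dB, with its maximum at max_db (an assumption).

    Values below the assumed dynamic range are clipped to 0 dB.
    The envelope must contain at least one non-zero value.
    """
    y = np.asarray(y, dtype=float)
    floor = y.max() * 10 ** (-max_db / 10)
    return max_db + 10 * np.log10(np.maximum(y, floor) / y.max())


def run_durations(above, frame_s=0.04):
    """Procedure 1: durations (s) of phonations and of the pauses between them.

    `above` is a boolean array, True where the envelope exceeds the chosen level.
    Silence before the first and after the last phonation is not counted as a pause.
    """
    padded = np.concatenate(([False], np.asarray(above, dtype=bool), [False]))
    edges = np.diff(padded.astype(int))
    starts = np.flatnonzero(edges == 1)   # first frame of each phonation
    ends = np.flatnonzero(edges == -1)    # first frame after each phonation
    phonations = (ends - starts) * frame_s
    pauses = (starts[1:] - ends[:-1]) * frame_s
    return phonations, pauses


def duration_histogram(durations, bin_s=0.05, max_s=1.0):
    """Procedure 2: counts per 0.05-s duration class from 0 to 1 s.

    Durations longer than max_s fall outside the analysed range and are ignored.
    """
    edges = np.arange(0.0, max_s + bin_s / 2, bin_s)
    counts, _ = np.histogram(durations, bins=edges)
    return counts


if __name__ == "__main__":
    # Synthetic signal sampled every 5 ms: two bursts separated by silence.
    x = np.concatenate([np.zeros(8), np.ones(16), np.zeros(24), np.ones(8), np.zeros(8)])
    level = envelope_db(smooth_envelope(x, t_p=0.005))
    phonations, pauses = run_durations(level > 35.0)
    print("number of phonations above 35 dB:", len(phonations))
    print("phonation histogram:", duration_histogram(phonations))
    print("pause histogram:", duration_histogram(pauses))
```

Because the smoothed envelope has one value per 0.04 sec, all durations in this sketch are multiples of 0.04 sec, which is coarser than the 0.05-sec classes only near class boundaries.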
SUBJECTS
Stutterers
The group of stutterers subjected to speech analysis consisted of 23 men and seven women. Their ages ranged from 11 to 21 years. They described simple pictures in two experimental conditions: with simultaneous auditory feedback (SAF) and with an echo (DAF, delayed auditory feedback) of 0.1-sec delay. The stuttering severity, defined as the number of errors characteristic of stuttering (such as repetitions, insertions, blockades, etc.) per 100 syllables, ranged with SAF from 1 to 38 (the average value was 16 errors per 100 syllables). The stuttering severity when speaking under DAF decreased to a level ranging from 0 to 7.4 errors per 100 syllables in individual stutterers (average: 1.7 errors per 100 syllables). Factors such as age, sex, communication and language experiences, and social background were purposefully not taken into account in the selection of subjects whose utterances were analyzed. The intention of the research was to find the difference in speech parameters between stutterers' speech and fluent speech, independently of either the abovementioned characteristics or the type of utterance.
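The severity measure used for the stutterers (errors characteristic of stuttering per 100 syllables) is a simple normalization. A minimal sketch follows; the function name and the example counts are hypothetical and are not data from the study.

```python
def stuttering_severity(n_errors, n_syllables):
    """Errors characteristic of stuttering per 100 spoken syllables."""
    if n_syllables <= 0:
        raise ValueError("n_syllables must be positive")
    return 100.0 * n_errors / n_syllables


# Hypothetical example: 8 errors counted in a 50-syllable sample.
print(stuttering_severity(8, 50))
```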
Normal Speakers
The group of fluent speakers included 15 university students and teachers (seven women and eight men) who described the same pictures as the stutterers did. The analyses also included 100-second fragments of official speeches of 15 members of parliament (eight women and seven men) recorded from the radio. Utterances of the subjects lasted 100 sec each and were taped using a constant maximum sound level. Utterances of non-stutterers were treated by the researchers as the mean "pattern of fluent speech," regardless of the speaker and utterance type. For this reason, descriptions of simple illustrations by university students and teachers, as well as utterances by members of parliament during a session, were analyzed as examples of fluent speech. The speakers were randomly selected.
RESULTS
Figure 3 shows the dependence of the average phonation number on sound intensity level for 100-sec utterances by non-stutterers and stutterers. A definite maximum phonation number can be observed at the level of approximately 30 dB (10 dB below the maximum sound intensity level) in the utterances of non-stutterers. In the speech of stutterers (both speaking with SAF and with an echo) the phonation number is almost constant in a wide range of sound intensity levels (10-30 dB) and is much lower than that of non-stutterers. The increased number of "phonations" observed at the 0-10 dB level is probably caused by the occurrence of quiet sounds (at the level of noise) accompanying stuttering, loud breaths, and disturbances of the acoustic field as a result of joint movements. They do not disappear when speaking with an echo, although an increased fluency can be observed then. While speaking with an echo there is also a significant drop in the speech rate (the average speech rate drops from 4.0 syllables per second with SAF to 1.8 under the influence of DAF), which directly reduces the number of phonations in a given time period. The results shown in Figure 3 do not indicate how long single phonations and the pauses between them last. The statistical distributions of the times of phonations and pauses were made for the range of 10-40 dB. The sound intensity level was changed every 5 dB. The statistical analysis was conducted for the time period of 0-1 sec, using a 0.05-sec interval duration. The average statistical distributions (for 30 stutterers) of the numbers of phonations and pauses as a function of sound intensity level are given in
Figure 3. Average number of phonations and sound intensity levels of 100-second utterances by stutterers (SAF and DAF) and nonstutterers. [Legend: * stutterers (SAF), + stutterers (DAF), o nonstutterers. Horizontal axis: sound level [dB].]
Figures 4-7. The distributions in Figures 4 and 5 were made for the utterances under simultaneous auditory feedback (SAF), and the ones in Figures 6 and 7 for the utterances with an echo. The average statistical distributions of numbers of phonations and pauses for utterances of non-stutterers are displayed in Figures 8 and 9. The height of each bar in Figures 4 through 9 is equal to the average number of phonations or pauses in a given time period (from the utterances of 30 stutterers or 30 non-stutterers). In the utterances of stutterers speaking under SAF the maximum phonation numbers take on values of approximately 20 (at levels of 10, 15, and 20 dB) to about 40 (at the 30 dB level). In the utterances of non-stutterers the maximum phonation number increases along with the rise in the sound intensity level from about 10 (at 10 dB) to approximately 90 (at 30 dB), a nine-fold increase. Similar differences can be observed in the distributions of pauses between single phonations. In the speech of stutterers the maximum number of pauses is 40 for the 10, 15, 20, and 25 dB levels and then decreases, whereas in the speech of non-stutterers this value increases from about 40 at the 10 dB level to approximately 90 at 25 dB. It should also be pointed out
Figure 4. Frequency distributions of phonation durations at different levels of sound intensity of 30 stutterers under SAF. [Horizontal axis: phonation time [s].]
that a great number of long pauses occurred in the speech of stutterers (these lasted approximately 0.5-0.7 sec.). They do not occur in the non-stutterers' speech and they could be related to stuttering blocks. These pauses disappear while speaking under DAF, as can be seen in Figure 7. The statistical distributions of phonation times in the utterances of stutterers under DAF are characterized by a significant decrease in the phonation number as well as the shift of the maximum level toward long phonations. This is related to the decreased speaking rates. Just like in utterances with SAF the maximum number of phonations at the 30 dB level is approximately twice as great as at the 15 dB level.
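The level sweep behind Figure 3 and the averaging over speakers can be sketched in Python as follows. This is an illustration under stated assumptions, not the original program: each envelope is assumed to be already expressed in dB on the 0-40 dB scale, a phonation is counted as one upward crossing of the level, and all function names are hypothetical.

```python
import numpy as np


def count_phonations(level_db, threshold_db):
    """Number of times the envelope rises above threshold_db.

    A stretch that is already above the threshold at the first frame
    is counted as one phonation. The envelope must not be empty.
    """
    above = np.asarray(level_db, dtype=float) > threshold_db
    rising = above[1:] & ~above[:-1]
    return int(rising.sum()) + int(above[0])


def phonation_profile(level_db, thresholds=(10, 15, 20, 25, 30, 35, 40)):
    """Phonation number at each sound intensity level (5 dB steps by default)."""
    return np.array([count_phonations(level_db, t) for t in thresholds])


def average_profile(envelopes_db, thresholds=(10, 15, 20, 25, 30, 35, 40)):
    """Mean phonation number per level over a group of speakers."""
    profiles = [phonation_profile(e, thresholds) for e in envelopes_db]
    return np.mean(profiles, axis=0)


if __name__ == "__main__":
    # Two short made-up envelopes (dB), standing in for two speakers.
    speaker_a = [0, 20, 0, 36, 0, 36, 12]
    speaker_b = [38, 0, 38, 0]
    print(average_profile([speaker_a, speaker_b], thresholds=(10, 30)))
```

The same averaging applies to the duration histograms of Figures 4 through 9: the histogram of each speaker is computed at a given level, and the bar heights are the means over the group.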
DISCUSSION
As the data show, in the speech of stutterers, even at a level of 25 dB below the maximum, there is a definite division of phonations whose approximate duration is equal to the duration of single syllables. We can deduce from this that the interphonemic transitions (vowel-consonant) are sometimes longer and more abrupt in these utterances than in fluent speech. In fluent utterances there is a clear division of short phonations at the level of approximately 10 dB below the maximum level. This is the result of
Figure 5. Frequency distributions of pause durations at different levels of sound intensity of 30 stutterers under SAF. [Horizontal axis: pause duration [s].]
Figure 6. Frequency distributions of phonation durations at different levels of sound intensity of 30 stutterers under DAF. [Horizontal axis: phonation time [s].]
Figure 7. Frequency distributions of pause durations at different levels of sound intensity of 30 stutterers under DAF. [Horizontal axis: pause duration [s].]
Figure 8. Frequency distributions of phonation durations at different levels of sound intensity of nonstutterers. [Horizontal axis: phonation time [s].]
Figure 9. Frequency distributions of pause durations at different levels of sound intensity of nonstutterers.
the differences in amplitudes of vowel and consonant phonemes, which in Polish speech have an average value of about 10 dB. The difference in the amplitudes of vowel and consonant phonemes in fluent speech increases along with rising vocal effort (Frąckowiak-Richter, Kosiel, and Czajka, 1970). It could be assumed that the following feedback takes place in stutterers' speech: a disturbance in the transitional state between phonemes causes a greater effort in articulating speech sounds and, conversely, greater vocal effort leads to a more abrupt interphonemic transition. The vocal effort of stutterers speaking under the influence of an echo is probably lower than in speech under SAF. Interphonemic transitions, however, are longer because of slower speech. Speech is almost free of errors characteristic of stuttering, yet it cannot be called fluent as such. In utterances of stutterers speaking under SAF numerous long pauses occur. They are likely to be connected with blockades. They do not occur in the utterances of those using DAF or in the utterances of non-stutterers. The presented statistical distributions confirm the thesis that the cause of the occurrence of disfluencies in the speech of stutterers is the lack of synchronization between the functioning of the larynx and vocalizing activities. That is the reason why the transitional state between vowel and consonant is sometimes longer or sometimes more abrupt. The difference in the presented
characteristics between the speech of stutterers and fluent speech indicates a disorder of timing in stutterers. The differences in speech envelopes of stutterers and non-stutterers (shown in Figures 3-9) can be used in evaluating the degree of speech non-fluency in stutterers. In my opinion it would be useful to examine the speech envelope ranges during the course of therapy. The aim of speech therapy for stutterers (which is not always attainable) should be to obtain speech fluency that approximates the non-stutterers' fluency. It seems to be wrong that some therapists are satisfied with obtaining speech "free of disfluencies that are characteristic of stuttering" and do not care how much it differs from the speech of non-stutterers.
I wish to thank Prof. Bogdan Adamczyk for bringing the subject of this study to my attention and for his support. I also would like to thank Mieczysław Pawłowski, M.Sc., for his assistance with the measurements and Professor Klaas Bakker for his very considerable editorial advice.
REFERENCES
Adamczyk, B., and Kuniszyk-Jóźkowiak, W. (1987) Effect of echo and reverberation of a restricted capacity on the speech process. Folia Phoniatrica 39, 9-17.
Adams, M.R. (1982) Fluency, nonfluency, and stuttering in children. Journal of Fluency Disorders 7, 171-185.
Adams, M.R. (1985) The speech physiology of stutterers: Present status. Seminars in Speech and Language 6, 177-197.
Adams, M.R., and Runyan, C.M. (1981) Stuttering and fluency: Exclusive events or points on a continuum? Journal of Fluency Disorders 6, 197-218.
Borden, G.J., Baer, T., and Kenney, M.K. (1985) Onset of voicing in stuttered and fluent utterances. Journal of Speech and Hearing Research 28, 363-372.
Coyle, M., and Mallard, A.R. (1979) Word-by-word analysis of observer agreement utilizing audio and audiovisual techniques. Journal of Fluency Disorders 4, 23-28.
Dalton, P., and Hardcastle, W.J. (1977) Disorders of fluency. London, England: Edward Arnold.
Di Simoni, F.G. (1974) Preliminary study of certain timing relationships in the speech of stutterers. The Journal of the Acoustical Society of America 56(2), 695-696.
Fillmore, C.J. (1979) On fluency. In C.J. Fillmore, D. Kempler, W. S-Y Wang (Eds.), Individual differences in language ability and language behavior. New York: Academic Press, 85-102.
Finn, P., and Ingham, R.J. (1989) The selection of "fluent" samples in research on stuttering: Conceptual and methodological considerations. Journal of Speech and Hearing Research 32, 401-418.
Frąckowiak-Richter, L., Kosiel, U., and Czajka, J. (1970) The effect of voice effort on C/V intensity ratios. Speech Analysis and Synthesis, vol. II, 163-175.
Healey, E.C., and Gutkin, B. (1984) Analysis of stutterers' voice onset times and fundamental frequency contours during fluency. Journal of Speech and Hearing Research 27, 219-225.
Hegde, M.N. (1978) Fluency and fluency disorders: Their definition, measurement, and modification. Journal of Fluency Disorders 3, 51-71.
Hillman, R., and Gilbert, H.R. (1977) Voice onset time for voiceless stop consonants in the fluent reading of stutterers and nonstutterers. The Journal of the Acoustical Society of America 61(2), 610-611.
Howell, P., and Vause, L. (1986) Acoustic analysis and perception of vowels in stuttered speech. The Journal of the Acoustical Society of America 79(5), 1571-1579.
Howell, P., and Williams, M. (1992) Acoustic analysis and perception of vowels in children's and teenagers' stuttered speech. The Journal of the Acoustical Society of America 91(3), 1697-1706.
Kalveram, K.T., and Jäncke, L. (1989) Vowel duration and voice onset time for stressed and nonstressed syllables in stutterers under delayed auditory feedback condition. Folia Phoniatrica 41, 30-42.
Kuniszyk-Jóźkowiak, W. (1991) The possibility of acoustical evaluation of disfluency of speaking. Logopedia 18, 65-72 (in Polish).
Kuniszyk-Jóźkowiak, W. (1992) Distribution of phonation and pause durations in fluent speech and in stutterers speech. Archives of Acoustics 1, 7-17.
Kuniszyk-Jóźkowiak, W. (1992) The characteristics of speech envelopes in stutterers and nonstutterers. XXII World Congress IALP, Hannover, Congress Proceedings 95.
Kuniszyk-Jóźkowiak, W., and Adamczyk, B. (1988) Effect of auditory and tactile echo and reverberation on stuttering. Proceedings XV Congress Union European Phoniatricians, Erlangen, 103-105.
Kuniszyk-Jóźkowiak, W., and Adamczyk, B. (1989) The effect of tactile echo and tactile reverberation on the speech fluency of stutterers. International Journal of Rehabilitation Research 12(3), 312-317.
Metz, D.A., Conture, E.G., and Caruso, A. (1979) Voice onset time, frication, and aspiration during stutterers fluent speech. Journal of Speech and Hearing Research 22, 649-656.
Metz, D.E., Samar, V.J., and Sacco, P.R. (1983) Acoustic analysis of stutterers' fluent speech before and after therapy. Journal of Speech and Hearing Research 26, 531-536.
Perkins, W.H. (1971) Speech pathology: An applied behavioral science. Saint Louis: C.V. Mosby.
Perkins, W.H. (1983) The problem of definition: Commentary on "stuttering". Journal of Speech and Hearing Disorders 48, 247-249.
Pindzola, R.H. (1987) Durational characteristics of the fluent speech of stutterers and nonstutterers. Folia Phoniatrica 39, 90-97.
Prosek, R.A., and Runyan, Ch.M. (1983) Effects of segment and pause manipulations on the identification of treated stutterers. Journal of Speech and Hearing Research 26, 510-516.
Seymour, C.M., Ruggerio, A., and McEneaney, J. (1983) The identification of stuttering: Can you look and tell? Journal of Fluency Disorders 8, 215-220.
Smołka, E., and Adamczyk, B. (1992) Influence of visual echo and visual reverberation on speech fluency in stutterers. International Journal of Rehabilitation Research 15, 134-139.
Starkweather, C.W. (1984) On fluency. NSSLHA Journal 12, 30-37.
Starkweather, C.W. (1987) Fluency and stuttering. Englewood Cliffs, NJ: Prentice Hall.
Zebrowski, P., Conture, E., and Cudahy, E. (1985) Acoustic analyses of young stutterers' fluency: Preliminary observations. Journal of Fluency Disorders 10, 173-192.
Zimmermann, G. (1980) Articulatory dynamics of fluent utterances of stutterers and nonstutterers. Journal of Speech and Hearing Research 23, 95-107.
Manuscript received May 1993; revised October 1993; accepted November 1993.