Journal of Phonetics (1976) 4, 129-136
A spectrographic investigation of consonant-vowel transitions in the speech of deaf adults
Howard B. Rothman
Institute for Advanced Study of the Communication Processes, University of Florida, Gainesville, Florida, U.S.A.
Received 9th December 1975
Abstract:
A spectrographic investigation was carried out on the speech of normal-hearing and deaf speakers; the research attempted to answer questions concerning formant transitions, coarticulation and neutralization of vowels in the speech of deaf adults. Recordings were made of each subject reading a constant carrier phrase into which a changing medial monosyllable was embedded. The deaf subjects chosen for this investigation were judged to be above average in intelligibility (for deaf talkers, that is); they exhibited a profound bilateral hearing loss of early onset. Wide-band spectrograms were made of selected recorded stimulus items. The results indicated that the transitions of deaf speakers had a more restricted range and a slower rate of movement than did those of the normal talkers. Further, the speech of the deaf showed relatively few coarticulation effects; their vowel formants tended to be neutralized, and they tended to begin any given articulatory sequence in a manner similar to any other, regardless of context.
Introduction
General agreement exists among those who work with deaf individuals regarding the general characteristics of "deaf speech". These factors primarily include articulation errors (both consonants and vowels) and errors of rhythm; the more important rhythmic errors consist of durational distortions within and between both syllables and phonemes. However, since the exact nature of these factors is not clearly understood, it appears necessary that their acoustic and physiologic components be specified more precisely in order that clinicians and teachers of the deaf may deal with them more effectively. Research in this area (Calvert, 1961; Hood & Dixon, 1969; Hudgins & Numbers, 1942; John & Howarth, 1965; Showalter, 1961; Voelker, 1938) suggests that deaf speakers treat phonemes, syllables and words as isolated events rather than as integrated parts of an event of substantially greater magnitude. In addition, the vowel formants of deaf speakers tend to be "flat" and show little movement when compared to those of normal speakers (Potter, Kopp & Kopp, 1966; Showalter, 1961). The cited findings are substantiated by clinical observations which indicate that the intelligibility of deaf speech is often affected by the speaker's failure to combine relatively discrete and invariant articulatory responses into a continuously varying acoustic event, a characteristic almost always found in normal speech production. For example, the research on normal speech [see Delattre (1968); Delattre, Liberman & Cooper (1955); Lehiste
(1964); Liberman (1970); Lindblom (1963); Lisker (1957a); Ohman (1966)] has demonstrated that: (1) articulatory gestures associated with given sound segments characteristically vary as a function of phonological context (due, in part, to coarticulation); (2) acoustic segments and perceived phonological segments are not identical; and (3) a single acoustic segment carries information in parallel about preceding and succeeding phonological segments. Thus, as is well known, the articulation of individual segments is importantly related to, and affected by, the phonetic environments in which they occur. Based on the cited differences between normal and deaf speech, the present study sought to obtain comparative acoustic information on the effects of various phonetic environments on the speech of deaf speakers as compared to normal speakers. More specifically, consonant-vowel transitions were studied in order to specify the nature of formant transitions, coarticulation effects and neutralization of vowels in the speech of two classes of adult speakers: profoundly deaf and normal.
Method
Subjects
The experimental group consisted of four deaf male adults. Selection criteria were: (1) the presence of long-term (i.e. acquired prior to age three) profound bilateral hearing loss; (2) comparable oral-speech training backgrounds (to minimize interspeaker variability); and (3) the ability to articulate stimulus items adequately (to ensure above-average intelligibility of speech samples for a group of this type). Additional characteristics of the experimental group of subjects were: none recalled ever having heard speech, none had an awareness of "everyday" sounds, all failed to respond to sinusoids within the limits of the audiometer (i.e. 110 dB HL-air, 70 dB HL-bone),¹ all had received oral training from an early age, and all were college graduates who worked in professional positions. A control group also was utilized in the experiment. Selection criteria for this group of four normal male speakers were: no evidence of speech problems or hearing loss, and speech representative of a single dialect area (the Rocky Mountain region).
¹ One subject, who responded to white noise at 75 dB in the right ear, was judged to be reacting to vibration; another subject admitted to having heard loud explosions.
Selection of stimulus material
Stimulus items were chosen in order to maximize differences in formant transitions and consisted of minimally different pairs of monosyllabic nonsense words (CVC) embedded within the sentence "Take a ___ aside". Each of four initial consonants (/t, k, l, s/) was combined with each of three vowels (/i, a, u/) and the final consonant /t/. These combinations resulted in a set of 12 stimulus "key" words (e.g. toot, keet, lot and suit). The stimulus words were embedded in the constant or "carrier" sentence in order to provide a linguistically "real" situation in which the influence of the changing medial key word on the constant sentence frame could be investigated.
Procedure
Subjects were provided supervised practice in reading the stimulus phrases until each had demonstrated reasonable articulatory proficiency (as judged by a panel of three linguists). At that point, tape recordings were made of each subject reading each of the 12 sentences 10 times in a random order (a total of 120 sentences per subject). The recording environment was a sound-treated, shielded IAC room; equipment included an Altec 21D condenser microphone coupled to an Ampex 351 tape recorder. Five utterance-renditions
(per sentence for each subject) were chosen for analysis; that is, there were 60 renditions for each subject or a total of 480 utterances. Multiple renditions of each stimulus item were judged to be necessary in order to (1) obtain a "best" estimate of the spectrographic segment which represented a transition; (2) determine where a transition began or terminated; and (3) minimize the natural variability between productions of an utterance. Spectrograms of the selected acoustic data were made on a two-channel Kay Sonagraph, 7029A; a frequency range of 85-8000 Hz was employed. Tracings of the first three vowel formants/transitions were made from each spectrogram. In order to develop a composite for each subject, the tracings were first arrayed in 12 sets, one per key word (each set contained five tracings), and superimposed tracings were made of each array. By this process, an average tracing of the first three formants/transitions was obtained for each key word array; these materials were utilized in measurement.
Measurements
Durations of intervocalic closure events and vocalic events were measured, as were formant transition frequencies, in order to determine the influence of the variable key word on the constant sentence frame. Frequency measurements were made of the beginning, middle and end of each transition of F2 and F3. The transitions measured were: (1) between /ei/ and /k/ of "take", designated as T1; (2) between the /ə/ of the article and the medial key word immediately following it (T2); and (3) between the /ə/ following the key word and the /s/ of "aside" (T3). The results and discussion sections will deal only with the above relationships, as they were found to be most important.
Results
Since the intent of this study was to compare deaf speakers with normal speakers, the data were treated on a group basis. 
It should be remembered, however, that the findings presented below reflect the fact that the deaf subjects used in this experiment were above average in intelligibility and, consequently, should be presumed to exhibit articulatory behavior superior to most deaf individuals and perhaps more comparable to normal speakers. Moreover, in order to reduce intra-speaker variability, all utterances were subjected to narrow phonetic transcription (performed by three linguists); only those phrases judged to be most similar (perceptually) to each other were chosen for analysis.
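The counts given under Method (12 key words, 120 recorded sentences per subject, 60 analysed renditions per subject, 480 utterances in all) can be restated as a short sketch. This is only an illustration of the arithmetic: the names `key_words` and `reading_list`, the broad phonemic spellings such as "tit" or "kat", and the seeded shuffle are assumptions made here and are not part of the original procedure.

```python
import itertools
import random

# Segments as listed under "Selection of stimulus material".
INITIAL_CONSONANTS = ["t", "k", "l", "s"]
VOWELS = ["i", "a", "u"]
FINAL_CONSONANT = "t"
CARRIER = "Take a {} aside"

REPETITIONS_RECORDED = 10  # readings of each sentence per subject
REPETITIONS_ANALYSED = 5   # renditions kept per sentence per subject
SUBJECTS = 8               # four deaf and four normal-hearing speakers


def key_words():
    """Return the 12 CVC key words as broad phonemic strings (assumed spelling)."""
    return [c + v + FINAL_CONSONANT
            for c, v in itertools.product(INITIAL_CONSONANTS, VOWELS)]


def reading_list(seed=0):
    """Hypothetical randomised reading list: each sentence 10 times, shuffled."""
    rng = random.Random(seed)
    sentences = [CARRIER.format(w) for w in key_words()] * REPETITIONS_RECORDED
    rng.shuffle(sentences)
    return sentences


words = key_words()
recorded_per_subject = len(reading_list())
analysed_per_subject = len(words) * REPETITIONS_ANALYSED
analysed_total = analysed_per_subject * SUBJECTS

print(len(words), recorded_per_subject, analysed_per_subject, analysed_total)
```

The four printed figures correspond, in order, to the number of key words, the sentences recorded per subject, the renditions analysed per subject and the total number of analysed utterances.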
Coarticulation effects
Spectrograms for the deaf group showed relatively small right-to-left coarticulation effects related to the variable key word. This finding is illustrated in Fig. 1, which represents the range of second formant (F2) transition frequencies for both groups at the onset (a) and termination (c) of the transitions /ei/ to /k/ of "take" (T1), /ə/ to the several key words (T2), and /ə/ to /s/ of "aside" (T3). As seen in Fig. 1, position T2a shows a transition frequency range of 100 Hz for the deaf group, an indication that the schwa tends to be produced in such a way that it lacks the influence normally exerted by a following articulatory sequence. The range of 370 Hz seen for the normal group at this point (T2a) reflects the influence of the subsequent (variable) key words on the place of articulation of the /ə/. Further evidence for the absence of contextual (i.e. coarticulatory) effects in the speech of the deaf group can be found by examining position T2c (the termination of the transition of the schwa to the key words) and position T3a (the onset of the schwa to the /s/ of "aside"). At T2c, the range of transition frequencies for the deaf group is 315 Hz, which is in sharp contrast to the 970 Hz range for the normal group. The articulation of /ə/ for the
Figure 1. The range of frequencies, in Hz, for normal (N) and deaf (D) groups at the onset (a) and termination (c) of the transitions /ei/ to /k/ of "take" (T1), /ə/ to variable key word (T2) and /ə/ to /s/ in "aside" (T3).
normal group, then, appears to be maximally and differentially affected by the proximity of the key words. The key words contain the lip-spread vowel /i/ with a high second formant and the vowels /u/ and /a/ and the continuant /l/, all of which result in a lowered second formant. At position T3a, the normal group exhibits a range of 645 Hz while the deaf group shows a range of only 130 Hz. Therefore, the articulation of /ə/ in "aside" by the deaf group shows little of the normally occurring left-to-right effect that would be expected to result from the preceding variable key words. The most apparent effect of coarticulation in the normal group (and hence, the most striking difference between groups) occurs during the /ə/ production preceding the voiced continuant /l/, a segment requiring a high degree of continuous and highly controlled coordination by the articulators. Acoustically, a lowering of F2 and a raising of F3 are seen during normal /l/ phonation [Delattre (1968), Lisker (1957b) and Lehiste (1964)].

Table 1. Mean transition frequencies in Hz for F2 and F3 in the normal (N) and deaf (D) groups for the onset (a), midpoint (b) and termination (c) for the transitions of the /ə/ to /l/ (T2)

Key word  Formant   a: N    a: D    b: N    b: D    c: N    c: D
LEET      F2        1795    1645    1270    1530     940    1440
LEET      F3        2305    2190    2425    2175    2590    2190
LUTE      F2        1690    1635    1185    1520     945    1335
LUTE      F3        2325    2240    2410    2180    2520    2130
LOT       F2        1695    1665    1150    1490     860    1380
LOT       F3        2345    2230    2470    2175    2575    2140

Figure 2. Mean transition frequencies in Hz at the onset (a), midpoint (b) and termination (c) of T1, T2 and T3 for normal (filled circles) and deaf (open circles) groups for F2 (solid line) and F3 (broken line). The mean transition frequencies include the three /l/ utterances. The asterisk (*) indicates that one or more of the transition frequencies coincide with each other.

From the data presented in Table 1 and Fig. 2, it is apparent that the formant transition frequencies
of the deaf group do not follow the patterns found in the normal group. For example, during the production of the /ə/-/l/ sequences, the range of F2 transition frequencies for the normal group is 935 Hz (1795 Hz for LEET to 860 Hz for LOT) and always exhibits a negative slope; for F3 it is 285 Hz (2305-2590 Hz for LEET) and always exhibits a positive slope. In contrast, the deaf group's transition range for F2 and F3 is restricted and always exhibits a negative slope. The transition range for F2 is only 330 Hz (1665 Hz for LOT to 1335 Hz for LUTE); for F3 it is 100 Hz (2240-2130 Hz for LUTE) and exactly opposite in slope to the movement of F3 in the normal group. Therefore, it is evident from these results that the deaf have difficulty executing the /l/ sequence properly and that the articulatory context has little effect on their attempted production of the schwa-/l/ sequence.
Stereotyped articulation by the deaf group
Regardless of the context, the deaf group tends to begin any articulatory sequence in a manner similar to any other articulation sequence. For example, further examination of Table 1 and Fig. 2 will reveal that the range of transition frequencies exhibited by the normal group at the onset of the schwa-/l/ sequence (T2a) is approximately 105 Hz (1795 Hz for LEET to 1690 Hz for LUTE). In contrast, the range of transition frequencies for the deaf group is only approximately 30 Hz (1665 Hz for LOT to 1635 Hz for LUTE). Further, as Fig. 1 will demonstrate, the range of onset frequencies ("a" points) for all key words for the deaf group is restricted. This finding suggests articulatory stereotyping by the deaf group; i.e., the deaf tend to start each articulatory sequence in much the same way as they do any other, regardless of the context.
Restricted and neutralized vowel and transition frequencies of the deaf group
The transition frequencies for the deaf group show a restricted range of movement as compared to those of the normal group. 
Previous investigators (Potter et al., 1966; Showalter, 1961) also have reported a comparative lack of movement (or flatness) of F2. This characteristic may be seen graphically in Fig. 2. As stated previously, the continuant /l/ requires continuous movement of the articulators and, therefore, is expected to show the influence of a following vowel's second formant position. Moreover, regardless of context, the vowel and transition frequencies of the deaf group tend to be more neutralized than do those of normals.

Table 2. Mean F2 transition frequencies in Hz for the normal (N) and deaf (D) groups for the onsets (a points) of the transitions of /ei/ to /k/ of "take" (T1a), the /ə/ to the variable key words (T2a) and the /ə/ to /s/ of "aside" (T3a) when the key words are grouped for the vowels /i/, /u/ and /a/

Condition  Group   /i/     /u/     /a/
T1a        N       1920    1885    1891
T1a        D       1691    1664    1662
T2a        N       1918    1821    1824
T2a        D       1641    1615    1620
T3a        N       1848    1549    1321
T3a        D       1645    1644    1658

For example, Table 2 presents
evidence that there is little or no influence of the preceding variable key-word vowels on the production of the schwa-/s/ (of "aside") sequence in the deaf group. The onset of the F2 transition frequency for this sequence (T3a), following a lip-rounded vowel (/u/) and a low vowel (/a/), is higher for the deaf group than for the normal group. Following a high front vowel (/i/), the deaf group has a lower F2 transition frequency than does the normal group. In addition, the spread of frequencies across the transition onsets (T1a, T2a and T3a) for the deaf group is approximately 75 Hz (1691 Hz for /i/ to 1615 Hz for /u/) as compared to the normal group range of approximately 600 Hz (1920 Hz for /i/ to 1321 Hz for /a/). The data for the experimental group, therefore, indicate a tendency among deaf speakers to neutralize vowels and transition frequencies, and for their articulation patterns to remain relatively unaffected by the phonetic environment.
Discussion
Unlike previous investigations which researched points or force of articulation, the present study examined the linking of two or more articulatory gestures in deaf, as compared to normal, speakers. In order to determine group differences for formant transitions, coarticulation and neutralization, the effects of a changing medial monosyllable on the surrounding sentence frame were investigated. The manipulated segments, each representing an articulatory contrast and/or an articulatory extreme, were chosen in order to maximize differences in formant transitions and to help the deaf speakers achieve good representation of each vowel in the various contexts. For example, the vowels /i, a, u/ represent the articulatory extremes of the traditional physiological vowel chart and also represent contrasts in incisor separation (/a/), lip-rounding (/u/) and lip-spreading (/i/). The consonants provide alveolar and velar stop (/t, k/) and continuant (/l, s/) contrasts. The results of the research indicated that normally occurring coarticulation effects are minimized in the speech of the deaf group. One possible explanation for the lack of coarticulation in the speech of the deaf appears to be associated with the relatively longer discontinuities observed in their phonatory sequence, especially those related to the longer closure durations of the stop consonants. For example, the mean closure for the deaf group's /k/ of the word "take", before the schwa, is 105 ms as compared to 83 ms in the normal group. Moreover, when the key word follows the schwa and begins with either of the stop consonants (/t/ or /k/), there is a mean closure of 113 ms for the deaf group as compared to the 53 ms closure for the normal group. The relatively longer discontinuities in the phonatory sequence surrounding the schwa for the deaf group may allow the deaf speakers time to treat the schwa as a separate entity rather than as part of a series of interrelated articulatory events. 
Furthermore, the longer discontinuities provide the opportunity for the deaf speakers to concentrate on the production of the next speech segment which, in turn, may result in articulatory stereotyping. That is, given sufficient time, deaf speakers will tend to achieve some kinesthetic-tactile feedback by finding articulation landmarks. It is difficult to determine the cause-effect relationship between the lack of coarticulation effects and the discontinuities in the phonatory sequence of the deaf group. However, it seems obvious that these two characteristics are related, especially since the spectrographic data show both atypical discontinuities and reduced coarticulation in the stream of phonation of the deaf speakers. Further, these observations support those studies of deaf speech which have demonstrated that deaf speakers treat phonological segments, syllables and words as isolated events rather than as integral parts of longer interrelated articulatory events.
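The two characteristics related above, longer closures and reduced coarticulatory range, can be placed side by side using only the group figures quoted in the text. The sketch below is illustrative: the deaf-to-normal ratio is a summary introduced here and is not a statistic reported in the study, and the dictionary keys and function names are hypothetical labels.

```python
# Group values quoted in the Results and Discussion text.
# Closure durations are means in ms; F2 figures are ranges in Hz.
CLOSURE_MS = {
    "k_of_take": {"deaf": 105, "normal": 83},
    "key_word_initial_stop": {"deaf": 113, "normal": 53},
}
F2_RANGE_HZ = {
    "T2a": {"deaf": 100, "normal": 370},
    "T2c": {"deaf": 315, "normal": 970},
    "T3a": {"deaf": 130, "normal": 645},
}


def deaf_to_normal_ratio(entry):
    """Ratio of the deaf-group value to the normal-group value."""
    return entry["deaf"] / entry["normal"]


def summarise(table):
    """Map each label to its deaf/normal ratio, rounded to two decimals."""
    return {label: round(deaf_to_normal_ratio(entry), 2)
            for label, entry in table.items()}


closure_ratios = summarise(CLOSURE_MS)
range_ratios = summarise(F2_RANGE_HZ)

for label, ratio in closure_ratios.items():
    print(f"closure {label}: deaf/normal = {ratio}")
for label, ratio in range_ratios.items():
    print(f"F2 range {label}: deaf/normal = {ratio}")
```

On these figures every closure ratio exceeds 1 and every F2 range ratio falls well below 1, which is the joint pattern the paragraph describes; the sketch says nothing about which characteristic causes the other.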
Lack of contextual effects and restricted transition frequency movement also were apparent in sequences involving no discontinuities, e.g. the schwa-/l/ sequence. Both Lehiste (1964) and Delattre (1968) have demonstrated the characteristically great range of fluctuation in F2, due to an apparent anticipation of the position of the second formant of the following vowel. Further, these investigators have demonstrated that /l/ is distinguished by a rising F3, regardless of the following syllable nucleus. Thus, it is clear from the above data that the normal group followed the predicted pattern while the deaf group did not. Finally, the results of this investigation suggest that speech training of the deaf may be faulty in that it may fail to ameliorate the problems of deaf speech and, in fact, may intensify them. Individual sounds, regardless of how correctly they are articulated, account for only a part of the intelligibility of speech. Furthermore, speech sounds in phonetic context bear little relationship to their "allophones" in isolation. Unless there exists a proper relationship between the sounds in a sequence, human speech will be no more intelligible than that produced by a machine when the elements of the synthesized speech are strung together as isolated phonetic elements. By teaching the production of speech as a series of static events, e.g. stressing the articulation of isolated phonemes, a series of distortions is introduced into the speech process. It appears evident that once an adequate level of articulation for isolated phonemes has been established, emphasis should be placed on establishing appropriate speech rhythm. By developing and maintaining correct speech rhythm, the deaf should become more responsive to the normal carryover and anticipatory constraints of the articulatory system and the result should be a closer approximation to normal coarticulation.

The author wishes to express appreciation to Dr Dorothy A. 
Huntington and Dr Naomi Remen for their assistance with the project.

Requests for reprints should be addressed to Dr Howard B. Rothman, Institute for Advanced Study of the Communication Processes, ASB 63, University of Florida, Gainesville, Florida 32611.

References
Calvert, D. R. (1961). Some Acoustic Characteristics of the Speech of Profoundly Deaf Individuals. Ph.D. Dissertation, Stanford University.
Delattre, P. C. (1968). From acoustic cues to distinctive features. Phonetica 18, 198-230.
Delattre, P. C., Liberman, A. M. & Cooper, F. S. (1955). Acoustic loci and transitional cues for consonants. Journal of the Acoustical Society of America 27, 769-73.
Hood, R. B. & Dixon, R. F. (1969). Physical characteristics of speech rhythm of deaf and normal-hearing speakers. Journal of Communication Disorders 2, 20-28.
Hudgins, C. V. & Numbers, F. C. (1942). An investigation of the intelligibility of the speech of the deaf. Genetic Psychology Monographs 25, 289-92.
John, J. E. J. & Howarth, J. N. (1965). The effect of time distortions on the intelligibility of deaf children's speech. Language and Speech 8, 127-34.
Lehiste, I. (1964). Acoustical Characteristics of Selected English Consonants. The Hague: Mouton and Company.
Liberman, A. M. (1970). The grammars of speech and language. Cognitive Psychology 1, 301-23.
Lindblom, B. (1963). Spectrographic study of vowel reduction. Journal of the Acoustical Society of America 35, 1773-81.
Lisker, L. (1957a). Closure duration and intervocalic voiced-voiceless distinction in English. Language 33, 42-49.
Lisker, L. (1957b). Minimal cues for separating /w, r, l, j/ in intervocalic positions. Word 13, 257-67.
Ohman, S. E. G. (1966). Coarticulation in VCV utterances: Spectrographic measurements. Journal of the Acoustical Society of America 39, 151-68.
Potter, R. K., Kopp, G. A. & Kopp, H. G. (1966). Visible Speech. New York: Dover Publications.
Showalter, B. J. (1961). Some Acoustic Aspects of Diphthongs as Produced by Deaf and Normal Speakers. M.A. Thesis, Stanford University.
Voelker, C. H. (1938). An experimental study of the comparative rate of utterance of deaf and normal hearing speakers. American Annals of the Deaf 83, 274-84.