JOURNAL OF COMMUNICATION DISORDERS 9 (1976), 247-260
CONFUSIONS IN RECOGNIZING PHONEMES SPOKEN BY ESOPHAGEAL SPEAKERS: II. VOWELS AND DIPHTHONGS
ALAN C. NICHOLS
Department of Speech Pathology and Audiology, San Diego State University, San Diego, California 92182
A methodology was developed to determine the patterns of phonemic errors that listeners would make in receiving the vowels and diphthongs of monosyllables spoken by esophageal speakers. Analyses of the features preserved in the errors were performed. The identified phonemic error patterns and the features found to be poorly preserved in the errors were used to structure multiple-choice intelligibility practice materials for esophageal speakers.
Introduction
The intelligibility of esophageal speakers has been shown to be severely impaired (Amster et al., 1972; Hyman, 1955; McCrosky and Mulligan, 1963; Nichols, 1976; Shames, Font, and Matthews, 1963). Evidence to the contrary seems limited to studies of superior speakers and/or relatively easy test materials (Horii and Weinberg, 1975; Hubbard, 1972; Weinberg and Westerhouse, 1973). Because the intelligibility of the esophageal speaker is unlikely to improve with use over extended periods of time, and is quite likely to regress (Amster et al., 1972; Diedrich and Youngstrom, 1966), Nichols (1976) developed a methodology for the preparation of practice materials that would promote its maintenance or improvement.
The present report continues the exposition of the findings of the program that Nichols (1976) developed. It deals with the vowels and diphthongs, their intelligibilities, and the development of multiple-choice materials that may be used to facilitate practice with vowel and diphthong productions by esophageal speakers.
As an approach to the methodological problems of materials development, it was hypothesized that patterns of confusions would occur when esophageal speakers said monosyllables and normal listeners wrote down what they thought was said. The methodology was described as follows in an earlier report:
The phonemic characteristics of productions peculiar to esophageal speakers that tend to cause listener confusions were termed a patholect. The procedure under study involved the identification of reception errors made most frequently by listeners to esophageal speakers. These were, in turn, used to select the foils of minimally-contrasted four-word multiple-choice items to be used in intelligibility practice (Nichols, 1976).
© American Elsevier Publishing Company, Inc., 1976
Method¹
Speakers
There were 11 male speakers in the esophageal group. All were over 40 and two were in their 70’s. All but one were sufficiently fluent to articulate a three-word phrase without pause. They had had from 2 to 50 hours of group intelligibility practice with multiple-choice materials. Three had moderate hearing losses; four had slight losses. None had losses severe enough to indicate a need for amplification. All but one spoke variants of standard English. One used remnants of Black nonstandard English, but could and did produce standard English during the recording task of the present study.
Listeners
Group 1. The first group to listen to the experimental tapes included 38 lay persons, volunteer workers for the American Cancer Society, who were invited to participate in a demonstration of intelligibility training for esophageal speakers. In addition, there were 13 students from the Department of Speech Pathology and Audiology of the San Diego State University. The total group was predominantly female (45:6) and ranged in age from 19 to 62 years. All were normal speakers of variants of standard English with hearing within normal population limits. There were two Black and two Chicano students in the group who had minimal traces of dialect. The remainder were white.
Group 2. The second group to hear the tapes was a graduate seminar in voice pathology at the San Diego State University. All 26 members of the group were speakers of standard English with normal hearing. There were one Chicano and two Black students in the group, none of whom had any detectable dialectal traits. There was one male student.
In summary, the speaker and listener groups were representative of fluent esophageal speakers and some of the women who listen to them. Male listeners were not adequately represented, and questions of cross-dialectal communication remain open.
Words
The stimuli were taken from Moser’s (1969) lists of one-syllable words.
There were 20 lists of 20 words each prepared for the esophageal speakers. Twenty initiating and 20 terminating consonants and clusters were represented in each list. The 16 vowels and diphthongs of the General American dialect (Moser, 1969) were also represented in each 20-word list, with four duplications per list. The following summarizes the phonemes occurring in each list.
Initial consonants and clusters. These were BR,² CH, D, F, FL, G, GR, J, KL, L, M, N, P, R, S, SK, ST, SH, and T.
Vowels and diphthongs. HE, HIT, HAY, HECK, HAT, HOT, HAWK, HUT, HER, HOE, HOOK, WHO, HIGH, HOW, HOIST, and HUE were the vowels and diphthongs used.
Final consonants and clusters. These included DZ, K, KS, KT, LD, LZ, M, MZ, N, ND, NZ, P, PS, PT, RD, RZ, S, ST, T, and TS.
All vowels and diphthongs in Moser’s (1969) lists were included among the stimuli for each speaker. Due to word selection contingencies, balance was not maintained between the vowels and diphthongs. The actual number of speaker-listener interactions for each of these phonemic entities is shown in Tables 1 and 2. These will be discussed later.
For selection purposes, a word represented an initial, a vowel or diphthong, and a final. Thus, “jam” represented the initial /J/, the vowel in HAT, and the terminal /M/. For the most part, a word appeared only once in the speakers’ production lists.
¹ The following material reprises the exposition of method in Nichols (1976). Variations appropriate to a discussion of the vowel and diphthong results have been made.
Recording
Two speakers recorded one list each, and nine speakers recorded two lists each. They read the 20 words from 3 × 5 cards that were placed, at a deliberate rate, in a slot on a cardboard sheet. Three to four seconds were allowed to elapse between placements to provide time for listeners to write down the word spoken. The words “say” and “again” were printed on the cardboard on either side of the slot. Speakers were instructed to read each target monosyllable as the center word of a phrase, i.e., “say ‘word’ again.”
Speakers were seated in an audiometric testing room. A stand-mounted condenser microphone (AKG CK1) with essentially “flat” response from 50 to 15,000 Hz was placed 30 inches from the speaker’s mouth. The microphone fed a recording system that consisted of an Ampex mixer and amplifier (AM-10) and an Ampex tape deck (602).
Subsequent playback was made through an Ampex speaker (621). A calibration tone (1,000 Hz at 80 dB) from the audiometer associated with the room was recorded free-field at the beginning of the tape.
Playback
Playback procedures were the same for both groups, although 6 months separated the listening sessions of Group 1 and Group 2.
² Here, and throughout, Moser’s (1969) phonemic notation has been used.
The tapes were presented to the listeners in a large lecture hall. Calibration was performed with the loudspeaker placed on a table in the front of the room. The level check (80 dB) of the calibration tone was made with a Bruel and Kjaer sound level meter (2203, C Scale) placed on the writing leaf of a seat in the center of the sixth row of seats, 30 feet from the front wall and approximately one-third of the distance from the front to the back of the hall. The listeners were seated in the fifth, sixth, and seventh rows of seats, that is, around the calibrating meter. The ambient noise in the hall was found to be 55 ± 3 dB.³
The procedures were introduced to Group 1 by an esophageal speaker. He explained the intelligibility project, and in doing so, introduced many of the listeners to the esophageal voice. Next, the experimenter explained the task as follows.
You have before you a series of lined sheets. At the top of each sheet is a blank to enter the name of the speaker and the number of the list of words he is going to read. You will then hear the speaker saying a series of three-word phrases. All begin with the word “say” and end with the word “again.” Please write the middle word of each phrase on a line on the sheet before you. For example, when the speaker says “say ‘top’ again,” write down the word “top.” There are twenty words in each list. Are there any questions? Then we will begin.
Only the experimenter’s presentation was given to the listeners in Group 2.
The stimuli were arranged on four tapes, each of which presented the phrase productions of five esophageal speakers. As noted, the Group 1 listeners were recruited during a demonstration. The number of listeners per reel for Group 1 varied (due to late arrivals and early departures) as follows: Reel 1, 49; Reel 2, 50; Reel 3, 31; Reel 4, 30. Their write-down responses provided the data to be analyzed. The 26 members of Group 2 listened to and wrote the target words of all four tapes.
³ These procedures were designed to meet Nichols’ (1971) criticisms of a study of esophageal intelligibilities. The rationale is more fully developed in that criticism than is appropriate in the present report.
Results⁴
The data for the present report were derived from lists of the responses, one list for each group for each of the 400 words. The overall whole-word write-down intelligibility for these data was 36%, a value quite similar to that found by Amster (1973). Only the vowels and diphthongs have been analyzed and reported in the present report. The data for the two groups were analyzed separately and then combined. The reader should note whether it is the separate analyses or the combined data that are under discussion in each of the following sections.
⁴ The following material, with variations appropriate to an exposition of the vowel and diphthong findings, reprises the Results section of Nichols (1976).
Speaker by Listener Tallies
Speaker-listener tallies were done separately for each group and then combined. Responses of Group 1 to a speaker’s “say ‘peg’ again” might be as follows: egg, 20; peg, 5; peck, 3; pal, 3; ten, 2; had, 1; apple, 1; tennis, 1; no response, 8. The investigator had prepared a speaker-by-phoneme chart to tally responses to each of the 16 vowels and diphthongs. The data above were then entered on the sheet devoted to the vowel in HECK:
Speaker   HE    HIT   HAY   HECK   HAT   HOT   . . .   Polysyllables   No response
1.        ...   ...   ...   35     4     ...   . . .   2               8
The ellipses indicate phonemes that were not among the vowel or diphthong structures of the responses. This process was repeated for each of the speakers, and then done for the remaining vowels and diphthongs.
The reader will note that no attempt was made to analyze the polysyllables among the responses. The decision to forgo such analysis was based upon the observation that inherently arbitrary judgments would have to be made for many responses falling into this category; that is, the investigator could not know which syllable of the response corresponded to the stimulus syllable.
Because the hypothesis that anchored the study was that a group of speakers with a particular problem would have a particular pattern of errors, one further exclusion of data was performed, that of the idiosyncratic errors. That is, when a particular production of a vowel or diphthong evoked a response or responses unlike those evoked by any other production of that vowel or diphthong by the speaker or by other speakers, they were excluded from further consideration in the analysis. For example, if among the productions that included the vowel in HECK, only one production evoked responses that used the vowel in HE, such responses were regarded as the problem of a particular effort by a particular speaker, idiosyncratic, and not part of the patholect.⁵ They were not treated in further analyses. Only 3% of the responses fell into this class, but occasional speaker-by-response row entries as high as 23 were thus eliminated. The data excluded due to no-response, polysyllabic response, and idiosyncratic response accounted for 12% of the 26,400 total responses for the combined groups (7%, 2%, and 3%, respectively).
The data tallies for correct responses (phoneme intelligibility), errors of an analyzable form, and the “other” errors, which included the exclusions discussed in this section, are shown in Table 1. The table reports in terms of each of the vowels. It reveals, for example, that the vowel in HE was correctly received 71% of the time, in error 18% of the time, and involved in “other” errors (idiosyncratic, polysyllabic, or no-response) 11% of the time. The most intelligible (%C) vowel was the HAY (79%) and the least intelligible the HUE (52%). The mean intelligibility was 64% with a standard deviation of 8% and a standard error of the mean of 2%.
The reliability of the intelligibilities was estimated by computing the test-retest correlation (r) between the %C score for Group 1 for each of the 16 vowels and diphthongs and the comparable %C score for Group 2. This coefficient was moderately high (r = 0.78, df = 14, p < 0.01), indicating good stability from one set of responses to the other. The intelligibilities of the Group 2 scores were significantly higher than those of the Group 1 scores (t = 7.96, df = 15, p < 0.001). This latter result may be attributed to the more skilled listeners in Group 2, and their younger ears.
⁵ The reader should note that most speakers recorded two lists. Hence, an idiosyncratic error was not even characteristic of a particular speaker.
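The response total and the summary statistics quoted in this section can be checked with a few lines of arithmetic. The sketch below is illustrative only. It assumes that each of the four reels carried 100 words (five 20-word lists), that the standard deviation is the sample (n - 1) form, and that the correlation was evaluated with the usual t transformation of r. None of these details is stated in the text, and all names are hypothetical.

```python
import math
import statistics

# Listener counts per reel for Group 1, as reported in the Playback section.
GROUP1_LISTENERS_PER_REEL = [49, 50, 31, 30]
WORDS_PER_REEL = 5 * 20  # assumption: five 20-word lists on each reel
GROUP2_LISTENERS = 26
TOTAL_WORDS = 400

# %C for the 16 vowels and diphthongs, in the order HE, HIT, HAY, HECK, HAT,
# HOT, HAWK, HUT, HER, HOE, HOOK, WHO, HIGH, HOW, HOIST, HUE (Table 1).
PERCENT_CORRECT = [71, 69, 79, 58, 69, 53, 53, 65, 72, 71, 56, 62, 68, 66, 57, 52]


def total_responses():
    """Potential responses summed over both listener groups."""
    group1 = sum(n * WORDS_PER_REEL for n in GROUP1_LISTENERS_PER_REEL)
    group2 = GROUP2_LISTENERS * TOTAL_WORDS
    return group1 + group2


def summary(values):
    """Mean, sample standard deviation, and standard error of the mean."""
    mean = statistics.mean(values)
    sd = statistics.stdev(values)
    sem = sd / math.sqrt(len(values))
    return mean, sd, sem


def t_from_r(r, df):
    """t statistic for testing a Pearson r against zero."""
    return r * math.sqrt(df / (1.0 - r * r))


if __name__ == "__main__":
    mean, sd, sem = summary(PERCENT_CORRECT)
    print("total responses:", total_responses())
    print("mean %C:", round(mean), "SD:", round(sd), "SEM:", round(sem))
    print("t for r = 0.78, df = 14:", round(t_from_r(0.78, 14), 2))
```

Under these assumptions the reel counts give the 26,400 responses cited above, and the %C column rounds to the reported mean, standard deviation, and standard error. The t value for r = 0.78 exceeds 2.977, the two-tailed 0.01 critical value for 14 degrees of freedom.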
TABLE 1
Table of Statistics for the Response Data. The Numbers of Correct, Error, and “Other,” or Unscorable, Responses Are Shown. Also Shown Are Percentage Expressions of These Numbers in Terms of the Total Number of Responses to Each Phonemic Entity of the Study (NR), and the Test-Retest Reliabilities (r12) for the Error Types Derived from the Error Patterns for the Two Experimental Groups
Vowel    NR     Correct  %C   Errors  %E   “Other”  %U   r12ᵃ
HE       1431   1019     71   256     18   156      11   0.97
HIT      2748   1891     69   545     20   312      11   0.96
HAY      1954   1548     79   307     16    99       5   0.92
HECK     1924   1109     58   646     34   169       9   0.99
HAT      1601   1100     69   329     21   172      11   0.98
HOT      2319   1226     53   788     34   305      13   0.88
HAWK     1694    903     53   638     38   153       9   0.97
HUT      1628   1065     65   387     24   176      11   0.98
HER      1620   1169     72   242     15   209      13   0.56
HOE      1592   1137     71   279     18   176      11   0.78
HOOK     1370    763     56   409     30   198      14   0.86
WHO      1860   1161     62   476     26   223      12   0.76
HIGH     1691   1145     68   397     23   149       9   0.98
HOW      1483    976     66   291     20   216      15   0.98
HOIST     736    423     57   225     31    88      12   0.96
HUE       762    396     52   248     33   118      15   0.79
ᵃ All r12 coefficients were (with N = 15) significant beyond the 0.01 level of confidence with the exception of the r12 for HER responses. In this latter case, the value 0.56 is significant beyond the 0.05 level.
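Each percentage in Table 1 is the corresponding count divided by NR, and the r12 column can be pooled with Fisher's Z conversion, the procedure the text cites for the average test-retest coefficient. The sketch below assumes an unweighted mean of the Z values, which the text does not specify. The data layout and function names are illustrative, not the author's.

```python
import math

# (NR, correct, errors, other, r12) for each key word, copied from Table 1.
TABLE_1 = {
    "HE": (1431, 1019, 256, 156, 0.97),
    "HIT": (2748, 1891, 545, 312, 0.96),
    "HAY": (1954, 1548, 307, 99, 0.92),
    "HECK": (1924, 1109, 646, 169, 0.99),
    "HAT": (1601, 1100, 329, 172, 0.98),
    "HOT": (2319, 1226, 788, 305, 0.88),
    "HAWK": (1694, 903, 638, 153, 0.97),
    "HUT": (1628, 1065, 387, 176, 0.98),
    "HER": (1620, 1169, 242, 209, 0.56),
    "HOE": (1592, 1137, 279, 176, 0.78),
    "HOOK": (1370, 763, 409, 198, 0.86),
    "WHO": (1860, 1161, 476, 223, 0.76),
    "HIGH": (1691, 1145, 397, 149, 0.98),
    "HOW": (1483, 976, 291, 216, 0.98),
    "HOIST": (736, 423, 225, 88, 0.96),
    "HUE": (762, 396, 248, 118, 0.79),
}


def percentages(key_word):
    """Return (%C, %E, %U) for one row, rounded to whole percentages."""
    nr, correct, errors, other, _ = TABLE_1[key_word]
    return tuple(round(100 * count / nr) for count in (correct, errors, other))


def fisher_average(coefficients):
    """Average correlations by transforming to Fisher's Z and back."""
    z_values = [math.atanh(r) for r in coefficients]
    return math.tanh(sum(z_values) / len(z_values))


if __name__ == "__main__":
    for key_word in TABLE_1:
        print(key_word, percentages(key_word))
    r12_values = [row[4] for row in TABLE_1.values()]
    print("average r12:", round(fisher_average(r12_values), 2))
```

In every row the correct, error, and “other” counts sum to NR, and the unweighted Fisher average of the 16 coefficients rounds to the 0.94 reported in the text.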
Production-Reception Matrix
The combined data for the two groups were next cast into a production-reception (P-R) matrix in which the productions attempted by the esophageal speakers are identified with rows and the responses of the listeners are represented as phonemes in the columns. The P-R matrix might be construed as a “confusion matrix.” To do so would be to oversimplify the character and complications of the present procedure. In most confusion matrix studies, the response patterns have been induced by transmission distortions, i.e., noise, filtering, etc. The original production of the stimuli is done by “normal” speakers of a specifiable dialect. The errors are those of the listener. In the present situation, the errors were hypothesized to be due to the faulty productions of the speakers; their source, insofar as the procedure and analysis can insure, was the esophageal speech per se. The pattern of errors is, under this assumption, the patholect insofar as it is identifiable by the present methodology. The term P-R matrix differentiates this methodology from other studies of receptive confusions.
The P-R matrix for the present study is presented in Table 2. The correct receptions are shown in bold type. For example, the esophageal speakers’ HECK word productions (e.g., “bread”) led to 1109 correct receptions by the two groups of listeners. There were also 10 errors in which the reception was a word containing the vowel in HE, 108 errors in which words with the vowel in HIT were received instead of the intended production (“bid” instead of “bed”), 37 HAY errors of reception, etc.
The pattern of errors was tested for reliability by considering each of the listener groups’ error response sets as a “test” of the distribution. A test-retest reliability coefficient (r12) was then computed for the error types in each row.
The N for each computation was the number of potential error types, that is, the number of error vowels and diphthongs (15) found in the responses of the two groups. The r12 for the HE pattern of errors was 0.97, indicating very good stability of pattern from Group 1 to Group 2. The average r for these test-retest comparisons, computed by using Fisher’s Z conversion (McNemar, 1955, pp. 148-149), was 0.94, also significant beyond the 0.01 level of confidence. The range was 0.56 to 0.99, all values but one, the 0.56, indicating significance beyond the 0.01 level of confidence. The 0.56 was significant at the 0.05 level. It may be concluded that the pattern of errors in the P-R matrix is highly reliable on the whole, and possessed of good stability for every row (phonemic entity).
Certain of the errors have been designated as likely confusors for the purposes of preparing multiple-choice intelligibility practice materials. They are shown in bold italics in Table 2. For example, the vowel in HIT is a likely confusor for the vowel in HE when a word containing the latter is spoken by an esophageal speaker. The criteria for such designations were two: (1) the error accounted for at least 5% of the total analyzable errors, and (2) the error occurred in the responses to at least four productions. (From another point of view, the second criterion assured that at least two esophageal speakers had to produce a phoneme in such a way as to induce the particular error. If only two speakers induced an error in response to productions of a given phoneme, they had to do it consistently, both times they said words containing the phoneme.) These criteria were designed to preserve the generality of the confusors, to maximize their potential to model the patholect.
TABLE 2
A Production-Reception Matrix Showing the Pattern of Listeners’ Receptions of the Vowels and Diphthongs of Monosyllables Spoken by Esophageal Speakers. Designated Confusors for Productions (See Text for Designation Criteria) and the Number of Potential Receptions (NR) for Each Vowel or Diphthong Are Also Shown. Data Peculiar to One Production, Two Syllable Receptions, and No-Responses Are Not Shown in the Matrix
Productionsᵃ   Correct receptions   NR
HE             1019                 1431
HIT            1891                 2748
HAY            1548                 1954
HECK           1109                 1924
HAT            1100                 1601
HOT            1226                 2319
HAWK            903                 1694
HUT            1065                 1628
HER            1169                 1620
HOE            1137                 1592
HOOK            763                 1370
WHO            1161                 1860
HIGH           1145                 1691
HOW             976                 1483
HOIST           423                  736
HUE             396                  762
[The off-diagonal reception counts of the matrix are not legible in this copy, apart from the HECK entries cited in the text: HE, 10; HIT, 108; HAY, 37.]
ᵃ The key words contain the target vowels and diphthongs for productions and the vowels and diphthongs of the responses made by the listeners. Designated confusors for productions are indicated by bold italics. Correct receptions are shown with bold numerals.
Multiple-Choice Intelligibility Materials
At this point, it would be possible to develop intelligibility practice materials for esophageal speakers using the P-R matrix as a guide. Moser (1969) would be consulted to exhaust the language’s potential for the particular monosyllabic stimuli under study. For example, the list of potential HUT vowel stimuli might begin with the words BUCK and BLUNT and continue through several items to the word DONE. The list would then be reduced by retaining only those stimuli each of which had, among the minimally-contrasted pairings in which it participated, at least three potential confusors in the patholectic pattern: HAT, HOT, HAWK, HOOK. Among the examples already noted, BUCK has the words BACK, BOCK, BALK, and BOOK as potential minimal contrasts, while the word DONE has DAN, DON, and DAWN. There are no words that have syllabic nuclei among the potential confusors that preserve the BL-NT environment of the word BLUNT. A more elaborate discussion of the details of these procedures and the practical and theoretical principles underlying them may be found in Nichols (1976). The word BUCK and the confusors would be retained to provide for such items as:
BACK BUCK BALK BOOK
BOCK BOOK BUCK BACK
BUCK BOCK BACK BALK
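The two designation criteria and the item-building step described above can be sketched as follows. This is a minimal illustration and not the procedure actually used. The HECK counts and the 646 analyzable errors come from the text and Table 1, but the per-error production counts are hypothetical placeholders, since the number of productions behind each error is not reported. The function names, the seeded shuffle, and the use of every three-foil combination are likewise assumptions.

```python
import random
from itertools import combinations


def designate_confusors(error_counts, total_errors, productions_with_error,
                        min_share=0.05, min_productions=4):
    """Apply both criteria: share of analyzable errors and spread over productions."""
    designated = []
    for vowel, count in error_counts.items():
        share_ok = count / total_errors >= min_share
        spread_ok = productions_with_error.get(vowel, 0) >= min_productions
        if share_ok and spread_ok:
            designated.append(vowel)
    return designated


# HECK productions: reception errors cited in the text, out of 646 analyzable errors.
HECK_ERRORS = {"HE": 10, "HIT": 108, "HAY": 37}
HECK_TOTAL_ERRORS = 646
# Hypothetical: number of productions whose responses contained each error.
HECK_PRODUCTIONS = {"HE": 2, "HIT": 9, "HAY": 5}

# Minimal contrasts for candidate HUT stimuli, keyed by confusor key word.
HUT_CONFUSORS = ["HAT", "HOT", "HAWK", "HOOK"]
MINIMAL_CONTRASTS = {
    "BUCK": {"HAT": "BACK", "HOT": "BOCK", "HAWK": "BALK", "HOOK": "BOOK"},
    "DONE": {"HAT": "DAN", "HOT": "DON", "HAWK": "DAWN"},
    "BLUNT": {},
}


def retain_stimuli(contrasts, confusors, min_foils=3):
    """Keep stimuli that have at least min_foils minimal contrasts among the confusors."""
    return [word for word, foils in contrasts.items()
            if sum(1 for vowel in confusors if vowel in foils) >= min_foils]


def build_items(stimulus, foils, seed=0):
    """Build four-word items: the stimulus plus each combination of three foils."""
    rng = random.Random(seed)
    items = []
    for chosen in combinations(sorted(foils.values()), 3):
        item = [stimulus, *chosen]
        rng.shuffle(item)  # vary the position of the target word within the item
        items.append(item)
    return items


if __name__ == "__main__":
    print(designate_confusors(HECK_ERRORS, HECK_TOTAL_ERRORS, HECK_PRODUCTIONS))
    for word in retain_stimuli(MINIMAL_CONTRASTS, HUT_CONFUSORS):
        for item in build_items(word, MINIMAL_CONTRASTS[word]):
            print(" ".join(item))
```

With these inputs BUCK and DONE are retained and BLUNT is dropped. BUCK yields four possible items, three of which correspond to the word sets printed above.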
Feature Analysis
The methodology developed for the present program pursued the analysis of the P-R matrix further. Features derived from the work of Chomsky and Halle (1968) and Jacobson, Fant, and Halle (1969) were applied to the error patterns to determine whether particular acoustic characteristics of the phonemic entities studied were “preserved” in the error responses. The features employed in this analysis were ± vocalic, ± high, ± back, ± low, ± round, and ± tense.
An example of the procedure, the analysis of the error responses to productions of the vowel in HIT, yielded the following statistics: + vocalic, 86%; + high, 29%; - back, 75%; - low, 90%; - round, 77%; - tense, 54%. That is, of the errors, 86% were vocalic. Because the presence or absence of a feature was placed in this dichotomous frame of reference, it was possible to test the null hypothesis that the feature was not present at a greater than chance, or 50%, level (a one-tailed hypothesis) by the z test of arcsin transformations of proportions (Walker and Lev, 1953). The alpha level was set at 0.0005 since there were 96 proportions to be tested. This provided an experimentwise alpha of 0.047. In this context, the errors for HIT vowel productions were + vocalic, - back, - low, and - round, but not + high or - tense. The results of the application of this procedure to the error responses of all 16 syllabic nuclei are shown in Table 3.
Multiple-choice items that take into account the results of the feature analysis are illustrated by the following:
ILL EEL ALE YOU’LL
MUTE MATE MITT MEAT
HEED HID HUED HAYED
FAILED FIELD FUELED FILLED
The items contrast the - tense vowel in HIT with the + tense nuclei of HE, HAY, and HUE. Moser (1969) provides seven useful monosyllables (ill, fill, fills, filled, hid, mitt, and mitts) that have three such minimal contrasts. Items contrasting the - tense vowel in HECK with the + tense nuclei of HAY, HAT, and HIGH would provide for more practice of the contrast. Since the - tense feature was not present at greater than chance levels in any of the esophageal speakers’ productions of - tense vowels, the preparation of such practice materials would have substantial theoretical validity. The following set of contrasts provides for - tense vs. + tense items within the confusor patterns of the patholect:
Stimuli: Confusors
HIT: HE, HAY, HUE
HECK: HAY, HAT, HIGH
HUT: HAT, HOT, HAWK
HOOK: HAWK, HER, HOE, WHO
HOW: HAT, HOT, HAWK
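The significance test used in the feature analysis can be sketched in a few lines. The version below is an illustration under stated assumptions. It uses the one-sample form of the arcsine test, in which the transformed proportion 2·arcsin(√p) has variance 1/N, and it rounds the critical proportion up to the next whole percentage. Neither detail is spelled out in the text, and the function names are hypothetical.

```python
import math
from statistics import NormalDist

CHANCE = 0.5
ALPHA = 0.0005   # per-test alpha level stated in the text
N_TESTS = 96     # 16 syllabic nuclei x 6 features


def arcsine(p):
    """Arcsine (angular) transformation of a proportion."""
    return 2.0 * math.asin(math.sqrt(p))


def z_vs_chance(p, n):
    """z for an observed proportion p of n errors against the chance level."""
    return (arcsine(p) - arcsine(CHANCE)) * math.sqrt(n)


def is_preserved(p, n, alpha=ALPHA):
    """One-tailed test: is the feature present at a greater than chance level?"""
    return z_vs_chance(p, n) > NormalDist().inv_cdf(1.0 - alpha)


def critical_percent(n, alpha=ALPHA):
    """Smallest whole percentage that reaches significance for n errors."""
    z_crit = NormalDist().inv_cdf(1.0 - alpha)
    phi = arcsine(CHANCE) + z_crit / math.sqrt(n)
    return math.ceil(100.0 * math.sin(phi / 2.0) ** 2)


# Probability of at least one false rejection across all 96 tests.
EXPERIMENTWISE_ALPHA = 1.0 - (1.0 - ALPHA) ** N_TESTS

if __name__ == "__main__":
    # Proportions reported in the text for errors on HIT productions (N = 545).
    hit = {"vocalic": 0.86, "high": 0.29, "back": 0.75,
           "low": 0.90, "round": 0.77, "tense": 0.54}
    for feature, proportion in hit.items():
        print(feature, is_preserved(proportion, 545))
    print("critical % for N = 545:", critical_percent(545))
    print("experimentwise alpha:", round(EXPERIMENTWISE_ALPHA, 3))
```

Under these assumptions the HIT errors come out as significant for vocalic, back, low, and round but not for high or tense, as the text reports, and the experimentwise alpha rounds to 0.047. The same critical-percentage rule gives 61 for N = 256 and 56 for N = 788, the values listed with Table 3.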
TABLE 3
Preservations of Features in Listeners’ Errors in Receiving the Phonemic Productions of Esophageal Speakers. Errors for 16 Vowels and Diphthongs Were Analyzed in Terms of Six Feature Systems: ± Vocalic, ± High, ± Back, ± Low, ± Round, and ± Tense. Table Entries Are Percentages. Z Tests of the Significance of the Entries Where Chance = 50% Are Indicated (See Text)
Vowel or     Vocalic  High   Back   Low    Round  Tense  N (errors)  Critical %
diphthong
HE           98*      51     83*    96*    84*    41     256         61
HIT          86*      29     75*    90*    77*    54     545         58
HAY          98*      36     92*    81*    96*    40     307         60
HECK         93*      73*    89*    32     95*    23     646         57
HAT          90*      76*    77*    14     85*    26     329         60
HOT          80*      76*    62*    60*    66*    62*    788         56
HAWK         89*      79*    91*    57*    32     65*    638         57
HUT          91*      66*    57     39     59*    30     387         59
HER          100*     44     31     88*    36     40     242         61
HOE          87*      33     85*    63*    76*    68*    279         60
HOOK         99*      27     89*    77*    68*    26     409         59
WHO          92*      58*    64*    90*    68*    65*    476         58
HIGH         3        6      41     79*    93*    86*    397         59
HOW          3        9      79*    69*    20     21     291         60
HOIST        8        9      86*    59     32     73*    225         61
HUE          0        87*    41     100*   45     50     248         61
* Entry reaches the critical % (p < 0.0005).
Feature value   N    Preserved   %
+ vocalic       12   12          100
- vocalic        4    0            0
+ high           9    2           22
- high           7    5           71
+ back           9    7           78
- back           7    5           71
+ low            6    4           67
- low           10    8           80
+ round          8    3           38
- round          8    8          100
+ tense         11    6           55
- tense          5    0            0
Discussion
The present results bear no resemblance to previously reported intelligibilities for the vowels and diphthongs. Neither Fletcher’s (1953) % misinterpreted values nor Black’s (1952) report of the sounds that enhance and deter intelligibility had any significant predictive value when applied to the %C values shown in Table 1.
The intelligibilities of the vowels and diphthongs, while on the average higher than those of the initial consonants and clusters of the stimuli (64% vs. 53%), still confirm that severe impairments of articulatory ability may be observed among the population under study. The ability of the average esophageal speaker to communicate, then, must be attributed in substantial part to the listeners’ ability to make perceptual phonetic closure and to profit from redundancy and context. Such listener activities no doubt account for the common impression that esophageal speech shows no changes in distinctness (see, for example, Safran and Szende, 1973). While all speakers profit from these skills of the mentally active and motivated listener, the esophageal speaker must rely heavily upon them. When the listener is not perceptually active and motivated, communication breakdowns are liable to occur.
The problem of communication breakdown is particularly serious in the presence of noise. As Horii and Weinberg (1975) have shown, even superior esophageal speakers are more vulnerable to intelligibility disintegration in noise than the normal speaker. Nichols’ (1968) demonstration that the esophageal phonation itself contains considerable noise, and that superior speakers have less noise than average or poor speakers, must also be taken into account in the context of the present discussion. Outside of the usually quiet therapeutic situation, noise is ubiquitous. No home is free of competing noise sources, and “noise pollution” is an accompaniment of most jobs and social situations in which the esophageal speaker is expected to communicate. It would thus seem relevant to provide intelligibility practice in the presence of noise.
The utility of the materials developed by the present methodology must, of course, be tested.
Preliminary work with the 16 vowels and diphthongs did not prove to be successful (Pottinger, 1974). Concentration of practice with materials based upon feature analyses (such as the - tense array illustrated in the preceding section) may prove to be more effective. Both group practice, in which esophageal speakers serve alternately as speakers and as listeners, and individual practice, in which esophageal speakers record a list of words and then listen to the recording while responding within the multiple-choice format, have been carried out with the materials. Both applications appear useful (Nichols, 1976).
It may also be of interest to test the impact of vowel/diphthong intelligibility practice (and improvement) upon the intelligibility of the consonants. That vowel formant transition cues signal the presence of the consonants is a well-known fact. Fant’s (1970) discussion of this phenomenon provides a good review, but many others are available. Studies such as those of Stevens and Klatt (1974) and Klatt (1975) continue to provide evidence of the importance of the vowel formant transition to such phonetic factors as voicing, a vulnerable feature in esophageal speech intelligibility. They showed, for example, that “voicing onset time” had a trading relation to vowel formant transition time in effecting the ± voice distinction (Stevens and Klatt, 1974). Jacobson and Fant (1969, p. 57) have called attention to the importance of durational control to the + tense feature. Should ± tense items prove effective in improving speakers’ control of this feature, then improvement of the intelligibility of the stops may be effected. Similar effects upon the intelligibilities of other phonemes may also be possible. Research is needed to test this hypothesis.
This study was carried out under a grant from the American Cancer Society, California Division, to the San Diego State University Foundation. The contributions of Judy Nicks and Karen Wolfer as research assistants, Anne Pottinger, a graduate student, and the voice pathology seminar of Fall 1974 are gratefully acknowledged.
References
Amster, W. W. Letter and data sheet. March 28, 1973.
Amster, W. W., Love, R. J., Menzel, O. J., Sandler, J., Sculthorpe, W. B., Gross, F. M. Psychosocial factors and speech after laryngectomy. J. Commun. Dis., 1972, 5, 1-18.
Black, J. W. Accompaniments of word intelligibility. J. Speech Hearing Dis., 1952, 17, 409-418.
Chomsky, N., Halle, M. The sound pattern of English. New York: Harper, 1968.
Diedrich, W. M., Youngstrom, K. A. Alaryngeal speech. Springfield, Ill.: Charles C Thomas, 1966.
Fant, G. Analysis and synthesis of speech processes. In G. Malmberg (Ed.), Manual of phonetics. Amsterdam: North-Holland, 1970.
Horii, Y., Weinberg, B. Intelligibility characteristics of superior esophageal speech presented under various levels of masking noise. J. Speech Hearing Res., 1975, 18, 413-419.
Hubbard, D. J. A comparison of speech intelligibility between esophageal and normal speakers via three modes of presentation. Paper delivered to the ASHA convention, San Francisco, 1972.
Hyman, M. An experimental study of artificial larynx and esophageal speech. J. Speech Hearing Dis., 1955, 20, 291-299.
Jacobson, R., Fant, C. G. M. Tenseness and laxness. In R. Jacobson, C. G. M. Fant, and M. Halle, Preliminaries to speech analysis. Cambridge, Mass.: MIT Press, 1969.
Jacobson, R., Fant, C. G. M., Halle, M. Preliminaries to speech analysis. Cambridge, Mass.: MIT Press, 1969.
Klatt, D. Voice onset time, frication, and aspiration in word-initial consonant clusters. J. Speech Hearing Res., 1975, 18, 686-706.
McCrosky, R. L., Mulligan, M. The relative intelligibility of esophageal speech and artificial larynx speech. J. Speech Hearing Dis., 1963, 28, 37-41.
McNemar, Q. Psychological statistics (2nd ed.). New York: Wiley, 1955.
Moser, H. One syllable words. Columbus, Ohio: Merrill, 1969.
Nichols, A. C. Loudness and quality in esophageal speech and the artificial larynx. In J. C. Snidecor, Sr. Author, Speech rehabilitation of the laryngectomized (2nd ed.). Springfield, Ill.: Charles C Thomas, 1968.
Nichols, A. C. A note on Hoops and Noll’s “Relationship of selected acoustic variables to judgments of esophageal speech.” J. Commun. Dis., 1971, 4, 51-53.
Nichols, A. C. Confusions in recognizing phonemes spoken by esophageal speakers: I. Initial consonants and clusters. J. Commun. Dis., 1976, 9, 27-41.
Pottinger, A. M. Esophageal intelligibility training: vowels. M. A. Thesis, San Diego State University, 1974.
Saffran, A., Szende, T. Esophageal speech as a linguistic form of compensation. Folia Phoniatr., 1973, 25, 347-364.
Shames, G. H., Font, J., Matthews, J. Factors related to speech proficiency of the laryngectomized. J. Speech Hearing Dis., 1963, 28, 273-287.
Stevens, K. N., Klatt, D. H. Role of formant transitions in the voiced-voiceless distinction for stops. J. Acoust. Soc. Am., 1974, 55, 653-659.
Walker, H. M., Lev, J. Statistical inference. New York: Henry Holt, 1953.
Weinberg, B., Westerhouse, J. A study of pharyngeal speech. J. Speech Hearing Dis., 1973, 38, 111-118.