Confusions in recognizing phonemes spoken by esophageal speakers: II. vowels and diphthongs


JOURNAL OF COMMUNICATION DISORDERS 9 (1976), 247-260

ALAN C. NICHOLS

Department of Speech Pathology and Audiology, San Diego State University, San Diego, California 92182

A methodology was developed to determine the patterns of phonemic errors that listeners would make in receiving the vowels and diphthongs of monosyllables spoken by esophageal speakers. Analyses of the features preserved in the errors were performed. The identified phonemic error patterns and the features found to be poorly preserved in the errors were used to structure multiple-choice intelligibility practice materials for esophageal speakers.

Introduction

The intelligibility of esophageal speakers has been shown to be severely impaired (Amster et al., 1972; Hyman, 1955; McCrosky and Mulligan, 1963; Nichols, 1976; Shames, Font, and Matthews, 1963). Evidence to the contrary seems limited to studies of superior speakers and/or relatively easy test materials (Horii and Weinberg, 1975; Hubbard, 1972; Weinberg and Westerhouse, 1973). Because the intelligibility of the esophageal speaker is unlikely to improve with use over extended periods of time, and is quite likely to regress (Amster et al., 1972; Diedrich and Youngstrom, 1966), Nichols (1976) developed a methodology for the preparation of practice materials that would promote its maintenance or improvement.

The present report continues the exposition of the findings of the program that Nichols (1976) developed. It deals with the vowels and diphthongs, their intelligibilities, and the development of multiple-choice materials that may be used to facilitate practice with vowel and diphthong productions by esophageal speakers.

As an approach to the methodological problems of materials development, it was hypothesized that patterns of confusions would occur when esophageal speakers said monosyllables and normal listeners wrote down what they thought was said. The methodology was described as follows in an earlier report:

The phonemic characteristics of productions peculiar to esophageal speakers that tend to cause listener confusions were termed a patholect. The procedure under study involved the identification of reception errors made most frequently by listeners to esophageal speakers. These were, in turn, used to select the foils of minimally-contrasted four-word multiple-choice items to be used in intelligibility practice (Nichols, 1976).

© American Elsevier Publishing Company, Inc., 1976

Method [1]

[1] The following material reprises the exposition of method in Nichols (1976). Variations appropriate to a discussion of the vowel and diphthong results have been made.

Speakers

There were 11 male speakers in the esophageal group. All were over 40 and two were in their 70’s. All but one were sufficiently fluent to articulate a three-word phrase without pause. They had had from 2 to 50 hours of group intelligibility practice with multiple-choice materials. Three had moderate hearing losses; four had slight losses. None had losses severe enough to indicate a need for amplification. All but one spoke variants of standard English. One used remnants of Black nonstandard English, but could and did produce standard English during the recording task of the present study.

Listeners

Group 1. The first group to listen to the experimental tapes included 38 lay persons, volunteer workers for the American Cancer Society, who were invited to participate in a demonstration of intelligibility training for esophageal speakers. In addition, there were 13 students from the Department of Speech Pathology and Audiology of the San Diego State University. The total group was predominantly female (45:6) and ranged in age from 19 to 62 years. All were normal speakers of variants of standard English with hearing within normal population limits. There were two Black and two Chicano students in the group who had minimal traces of dialect. The remainder were white.

Group 2. The second group to hear the tapes was a graduate seminar in voice pathology at the San Diego State University. All 26 members of the group were speakers of standard English with normal hearing. There were one Chicano and two Black students in the group, none of whom had any detectable dialectal traits. There was one male student.

In summary, the speaker and listener groups were representative of fluent esophageal speakers and some of the women who listen to them. Male listeners were not adequately represented, and questions of cross-dialectal communication remain open.

Words

The stimuli were taken from Moser’s (1969) lists of one-syllable words. There were 20 lists of 20 words each prepared for the esophageal speakers. Twenty initiating and 20 terminating consonants and clusters were represented in each list. The 16 vowels and diphthongs of the General American dialect (Moser, 1969) were also represented in each 20-word list, with four duplications per list. The following summarizes the phonemes occurring in each list.

Initial consonants and clusters. These were BR [2], CH, D, F, FL, G, GR, J, KL, L, M, N, P, R, S, SK, ST, SH, and T.

Vowels and diphthongs. HE, HIT, HAY, HECK, HAT, HOT, HAWK, HUT, HER, HOE, HOOK, WHO, HIGH, HOW, HOIST, and HUE were the vowels and diphthongs used.

Final consonants and clusters. These included DZ, K, KS, KT, LD, LZ, M, MZ, N, ND, NZ, P, PS, PT, RD, RZ, S, ST, T, and TS.

[2] Here, and throughout, Moser’s (1969) phonemic notation has been used.

All vowels and diphthongs in Moser’s (1969) lists were included among the stimuli for each speaker. Due to word selection contingencies, balance was not maintained between the vowels and diphthongs. The actual number of speaker-listener interactions for each of these phonemic entities is shown in Tables 1 and 2. These will be discussed later.

For selection purposes, a word represented an initial, a vowel or diphthong, and a final. Thus, “jam” represented the initial /J/, the vowel in HAT, and the terminal /M/. For the most part, a word appeared only once in the speakers’ production lists.

Recording

Two speakers recorded one list each, and nine speakers recorded two lists each. They read the 20 words at a deliberate rate from 3 × 5 cards that were placed in a slot on a cardboard sheet. Three to four seconds were allowed to elapse between placements to provide time for listeners to write down the word spoken. The words “say” and “again” were printed on the cardboard on either side of the slot. Speakers were instructed to read each target monosyllable as the center word of a phrase, i.e., “say ‘word’ again.”

Speakers were seated in an audiometric testing room. A stand-mounted condenser microphone (AKG CK1) with essentially “flat” response from 50 to 15,000 Hz was placed 30 inches from the speaker’s mouth. The microphone was led through a recording system that consisted of an Ampex mixer and amplifier (AM-10) and Ampex tape deck (602). Subsequent playback was made through an Ampex speaker (621). A calibration tone, 1,000 Hz at 80 dB from the audiometer associated with the room, was recorded free-field at the beginning of the tape.

Playback

Playback procedures were the same for both groups, although 6 months separated the listening sessions of Group 1 and Group 2.

The tapes were presented to the listeners in a large lecture hall. Calibration was performed with the speaker placed on a table in the front of the room. The level check (80 dB) of the calibration tone was made with a Bruel and Kjaer sound level meter (2203, C Scale) placed on the writing leaf of a seat in the center of the sixth row of seats, 30 feet from the front wall and approximately one-third of the distance from the front to the back of the hall. The listeners were seated in the fifth, sixth, and seventh rows of seats, that is, around the calibrating meter. The ambient noise in the hall was found to be 55 ± 3 dB. [3]

The procedures were introduced to Group 1 by an esophageal speaker. He explained the intelligibility project, and in doing so, introduced many of the listeners to the esophageal voice. Next, the experimenter explained the task as follows.

You have before you a series of lined sheets. At the top of each sheet is a blank to enter the name of the speaker and the number of the list of words he is going to read. You will then hear the speaker saying a series of three-word phrases. All begin with the word “say” and end with the word “again.” Please write the middle word of each phrase on a line on the sheet before you. For example, when the speaker says “say ‘top’ again,” write down the word “top.” There are twenty words in each list. Are there any questions? Then we will begin.

Only the experimenter’s presentation was given to the listeners in Group 2.

The stimuli were arranged on four tapes, each of which presented the phrase productions of five esophageal speakers. As we noted, the Group 1 listeners were recruited during a demonstration. The number of listeners per reel for Group 1 varied (due to late arrivals and early departures) as follows: Reel 1, 49; Reel 2, 50; Reel 3, 31; Reel 4, 30. Their write-down responses provided the data to be analyzed. The 26 members of Group 2 listened and wrote the target words of all four tapes.

[3] These procedures were designed to meet Nichols’ (1971) criticisms of a study of esophageal intelligibilities. The rationale is more fully developed in that criticism than is appropriate in the present report.

Results [4]

[4] The following material, with variations appropriate to an exposition of the vowel and diphthong findings, reprises the Results section of Nichols (1976).

The data for the present report were derived from lists of the responses, one list for each group for each of the 400 words. The overall whole-word write-down intelligibility for these data was 36%, a value quite similar to that found by Amster (1973). Only the vowels and diphthongs have been analyzed and reported in the present report. The data for the two groups were analyzed separately and then combined. The reader should note whether it is the separate analyses or the combined data that are under discussion in each of the following sections.

Speaker by Listener Tallies

Speaker-listener tallies were done separately for each group and then combined. Responses of Group 1 to a speaker’s “say ‘peg’ again” might be as follows: egg, 20; peg, 5; peck, 3; pal, 3; ten, 2; had, 1; apple, 1; tennis, 1; no response, 8. The investigator had prepared a speaker by phoneme chart to tally responses to each of the 16 vowels and diphthongs. The data above were then entered on the sheet devoted to the vowel in HECK:

Speaker   HE    HIT   HAY   HECK   HAT   HOT   . . .   Polysyllables   No response
1         ...   ...   ...   35     4     ...   . . .   2               8

The ellipses indicate phonemes that were not among the vowel or diphthong structures of the responses. This process was repeated for each of the speakers, and then done for the remaining vowels and diphthongs.

The reader will note that no attempt was made to analyze the polysyllables among the responses. The decision to forgo such analysis was based upon the observation that inherently arbitrary judgments would have to be made for many responses falling into this category, i.e., the investigator could not know which syllable of the response corresponded to the stimulus syllable.

Because the hypothesis that anchored the study was that a group of speakers with a particular problem would have a particular pattern of errors, one further exclusion of data was performed, that of the idiosyncratic errors. That is, when a particular production of a vowel or diphthong evoked a response or responses unlike those evoked by any other production of that vowel or diphthong by the speaker or by other speakers, they were excluded from further consideration in the analysis. For example, if among the productions that included the vowel in HECK, only one production evoked responses that used the vowel in HE, such responses were regarded as the problem of a particular effort by a particular speaker, idiosyncratic, and not part of the patholect. [5] They were not treated in further analyses. Only 3% of the responses fell into this class, but occasional speaker-by-response row entries as high as 23 were thus eliminated. The data excluded due to no-response, polysyllabic response, and idiosyncratic response accounted for 12% of the 26,400 total responses for the combined groups (7%, 2%, and 3%, respectively).

[5] The reader should note that most speakers recorded two lists. Hence, an idiosyncratic error was not even characteristic of a particular speaker.

The data tallies for correct responses (phoneme intelligibility), errors of an analyzable form, and the “other” errors, which included the exclusions discussed in this section, are shown in Table 1. The table reports in terms of each of the vowels. It reveals, for example, that the vowel in HE was correctly received 71% of the time, in error 18% of the time, and involved in “other” errors (idiosyncratic, polysyllabic, or no-response) 11% of the time. The most intelligible (%C) vowel was the HAY (79%) and the least intelligible the HUE (52%). The mean intelligibility was 64% with a standard deviation of 8% and a standard error of the mean of 2%.

The reliability of the intelligibilities was estimated by computing the test-retest correlation (r) between the %C score for Group 1 for each of the 16 vowels and diphthongs and the comparable %C score for Group 2. This coefficient was moderately high (r = 0.78, df = 14, p < 0.01), indicating good stability from one set of responses to the other. The intelligibilities of the Group 2 scores were significantly higher than those of the Group 1 scores (t = 7.96, df = 15, p < 0.001). This latter result may be attributed to the more skilled listeners in Group 2, and their younger ears.
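The arithmetic behind these summary figures can be checked directly. The Python sketch below is an illustration added for the reader, not part of the original analysis. It assumes that the 400 words were spread evenly over the four reels (100 words per reel), takes the counts from Table 1, and uses variable names chosen here for convenience. The report does not say whether a sample or a population standard deviation was computed; the sketch uses the sample form, and both round to 8%.

```python
import statistics

# Listeners per reel: Group 1 varied, Group 2 had 26 listeners throughout.
GROUP1_LISTENERS = [49, 50, 31, 30]
GROUP2_LISTENERS = [26, 26, 26, 26]
WORDS_PER_REEL = 100  # assumption: 400 words spread evenly over four reels

total_responses = sum(
    WORDS_PER_REEL * (g1 + g2)
    for g1, g2 in zip(GROUP1_LISTENERS, GROUP2_LISTENERS)
)

# Table 1 counts per key word: (correct, analyzable errors, "other").
TABLE1 = {
    "HE": (1019, 256, 156), "HIT": (1891, 545, 312), "HAY": (1548, 307, 99),
    "HECK": (1109, 646, 169), "HAT": (1100, 329, 172), "HOT": (1226, 788, 305),
    "HAWK": (903, 638, 153), "HUT": (1065, 387, 176), "HER": (1169, 242, 209),
    "HOE": (1137, 279, 176), "HOOK": (763, 409, 198), "WHO": (1161, 476, 223),
    "HIGH": (1145, 397, 149), "HOW": (976, 291, 216), "HOIST": (423, 225, 88),
    "HUE": (396, 248, 118),
}


def percentages(correct, errors, other):
    """Return (%C, %E, %U) as whole percentages of NR."""
    nr = correct + errors + other
    return tuple(round(100 * count / nr) for count in (correct, errors, other))


rounded = {word: percentages(*counts) for word, counts in TABLE1.items()}

# Summary statistics over the 16 unrounded %C values.
pc = [100 * c / (c + e + o) for c, e, o in TABLE1.values()]
mean_pc = statistics.mean(pc)
sd_pc = statistics.stdev(pc)      # sample standard deviation (assumption)
sem_pc = sd_pc / len(pc) ** 0.5   # standard error of the mean

print(total_responses)
print(rounded["HE"], rounded["HUE"])
print(round(mean_pc), round(sd_pc), round(sem_pc))
```

Under those assumptions the design yields the 26,400 responses cited in the text, and the recomputed percentages round to the %C, %E, and %U entries of Table 1.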

TABLE 1
Table of Statistics for the Response Data. The Numbers of Correct, Errors, and “Other,” or Unscorable Responses, are Shown. Also Shown are Percentage Expressions of these Numbers in Terms of the Total Number of Responses to Each Phonemic Entity of the Study (NR), and the Test-Retest Reliabilities (r12) for the Error Types Derived from the Error Patterns for the Two Experimental Groups

Responses   NR     Correct   %C   Errors   %E   r12 (a)   “Other”   %U
HE          1431   1019      71   256      18   0.97      156       11
HIT         2748   1891      69   545      20   0.96      312       11
HAY         1954   1548      79   307      16   0.92      99        5
HECK        1924   1109      58   646      34   0.99      169       9
HAT         1601   1100      69   329      21   0.98      172       11
HOT         2319   1226      53   788      34   0.88      305       13
HAWK        1694   903       53   638      38   0.97      153       9
HUT         1628   1065      65   387      24   0.98      176       11
HER         1620   1169      72   242      15   0.56      209       13
HOE         1592   1137      71   279      18   0.78      176       11
HOOK        1370   763       56   409      30   0.86      198       14
WHO         1860   1161      62   476      26   0.76      223       12
HIGH        1691   1145      68   397      23   0.98      149       9
HOW         1483   976       66   291      20   0.98      216       15
HOIST       736    423       57   225      31   0.96      88        12
HUE         762    396       52   248      33   0.79      118       15

(a) All r12 coefficients were (with N = 15) significant beyond the 0.01 level of confidence with the exception of the r12 for HER responses. In this latter case, the value 0.56 is significant beyond the 0.05 level.
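The r12 column of Table 1 is summarized in the text by a single average obtained through Fisher’s Z conversion. A minimal sketch of that averaging is given here for illustration. It assumes an unweighted mean of the transformed coefficients, which is reasonable because every row rests on the same number of error types; the function and variable names are not from the original report.

```python
import math

# r12 column of Table 1, in key-word order.
R12 = {
    "HE": 0.97, "HIT": 0.96, "HAY": 0.92, "HECK": 0.99,
    "HAT": 0.98, "HOT": 0.88, "HAWK": 0.97, "HUT": 0.98,
    "HER": 0.56, "HOE": 0.78, "HOOK": 0.86, "WHO": 0.76,
    "HIGH": 0.98, "HOW": 0.98, "HOIST": 0.96, "HUE": 0.79,
}


def fisher_average(coefficients):
    """Average correlations by way of Fisher's z = atanh(r)."""
    z_values = [math.atanh(r) for r in coefficients]
    return math.tanh(sum(z_values) / len(z_values))


average_r = fisher_average(R12.values())
least_stable = min(R12, key=R12.get)

print(round(average_r, 2), least_stable)
```

Averaging in the z domain keeps the very high coefficients from being understated, which a plain arithmetic mean of the r values would do.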

Production-Reception Matrix

The combined data for the two groups were next cast into a production-reception (P-R) matrix in which the productions attempted by the esophageal speakers are identified with rows and the responses of the listeners are represented as phonemes in the columns. The P-R matrix might be construed as a “confusion matrix.” To do so would be to oversimplify the character and complications of the present procedure. In most confusion matrix studies, the response patterns have been induced by transmission distortions, i.e., noise, filtering, etc. The original production of the stimuli is done by “normal” speakers of a specifiable dialect. The errors are those of the listener. In the present situation, the errors were hypothesized to be due to the faulty productions of the speakers; their source, insofar as the procedure and analysis can insure, was the esophageal speech per se. The pattern of errors is, under this assumption, the patholect insofar as it is identifiable by the present methodology. The term P-R matrix differentiates this methodology from other studies of receptive confusions.

The P-R matrix for the present study is presented in Table 2. The correct receptions are shown in bold type. For example, the esophageal speakers’ HECK word productions (e.g., “bread”) led to 1109 correct receptions by the two groups of listeners. There were also 10 errors in which the reception was a word containing the vowel in HE, 108 errors in which words with the vowel in HIT were received instead of the intended production (“bid” instead of “bed”), 37 HAY errors of reception, etc.

The pattern of errors was tested for reliability by considering each of the listener groups’ error response sets as a “test” of the distribution. A test-retest reliability coefficient (r12) was then computed for the error types in each row. The n for each computation was the number of potential error types, that is, the number of error vowels and diphthongs (15) found in the responses of the two groups. The r12 for the HE pattern of errors was 0.97, indicating very good stability of pattern from Group 1 to Group 2. The average r for these test-retest comparisons, computed by using Fisher’s Z conversion (McNemar, 1955, pp. 148-149), was 0.94, also significant beyond the 0.01 level of confidence. The range was 0.56 to 0.99, all values but one, the 0.56, indicating significance beyond the 0.01 level of confidence. The 0.56 was significant at the 0.05 level. It may be concluded that the pattern of errors in the P-R matrix is highly reliable on the whole, and possessed of good stability for every row (phonemic entity).

Certain of the errors have been designated as likely confusors for the purposes of preparing multiple-choice intelligibility practice materials. They are shown in bold italics in Table 2. For example, the vowel in HIT is a likely confusor for the vowel in HE when a word containing the latter is spoken by an esophageal speaker. The criteria for such designations were two: (1) the error accounted for at least 5% of the total analyzable errors, and (2) the error occurred in the responses to at least four productions. (From another point of view, the second criterion assured that at least two esophageal speakers had to produce a phoneme in such a way as to induce the particular error. If only two speakers induced an error in response to productions of a given phoneme, they had to do it consistently, both times they said words containing the phoneme.) These criteria were designed to preserve the generality of the confusors, to maximize their potential to model the patholect.

TABLE 2
A Production-Reception Matrix Showing the Pattern of Listeners’ Receptions of the Vowels and Diphthongs of Monosyllables Spoken by Esophageal Speakers. Designated Confusors for Productions (See Text for Designation Criteria) and the Number of Potential Receptions (NR) for Each Vowel or Diphthong Are Also Shown. Data Peculiar to One Production, Two Syllable Receptions, and No-Responses Are Not Shown in the Matrix

[Matrix body not recoverable from this copy. Rows are the 16 productions and columns the 16 receptions, keyed HE (a) through HUE, with NR given for each production.]

Designated confusors for productions are indicated by bold italics. Correct receptions are shown with bold numerals.
(a) The key words contain the target vowels and diphthongs for productions and the vowels and diphthongs of the responses made by the listeners.

Multiple-Choice Intelligibility Materials

At this point, it would be possible to develop intelligibility practice materials for esophageal speakers using the P-R matrix as a guide. Moser (1969) would be consulted to exhaust the language’s potential for the particular monosyllabic stimuli under study. For example, the list of potential HUT vowel stimuli might begin with the words BUCK and BLUNT and continue through several items to the word DONE. The list would then be reduced by retaining only those stimuli each of which had, among the minimally-contrasted pairings in which it participated, at least three potential confusors in the patholectic pattern: HAT, HOT, HAWK, HOOK. Among the examples already noted, BUCK has the words BACK, BOCK, BALK, and BOOK as potential minimal contrasts, while the word DONE has DAN, DON, and DAWN. There are no words that have syllabic nuclei among the potential confusors that preserve the BL-NT environment of the word BLUNT. A more elaborate discussion of the details of these procedures and the practical and theoretical principles underlying them may be found in Nichols (1976). The word BUCK and the confusors would be retained to provide for such items as:

BACK   BUCK   BALK   BOOK
BOCK   BOOK   BUCK   BACK
BUCK   BOCK   BACK   BALK
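The retention rule and the assembly of four-word items described above can be sketched in a few lines of Python. This is an illustration only: the lexicon holds just the three example stimuli named in the text rather than Moser’s full lists, and the function names, the random seed, and the data layout are hypothetical.

```python
import random

# Designated confusors for the HUT vowel, as listed in the text.
HUT_CONFUSORS = ["HAT", "HOT", "HAWK", "HOOK"]

# Minimal contrasts named in the text: stimulus -> {confusor key word: word}.
CONTRASTS = {
    "BUCK": {"HAT": "BACK", "HOT": "BOCK", "HAWK": "BALK", "HOOK": "BOOK"},
    "DONE": {"HAT": "DAN", "HOT": "DON", "HAWK": "DAWN"},
    "BLUNT": {},  # no confusor preserves the BL-NT environment
}


def foils_for(stimulus):
    """Words that contrast minimally with the stimulus on a designated confusor."""
    pairs = CONTRASTS[stimulus]
    return [pairs[key] for key in HUT_CONFUSORS if key in pairs]


def retained(min_foils=3):
    """Keep only stimuli with at least three potential confusors."""
    return [s for s in CONTRASTS if len(foils_for(s)) >= min_foils]


def make_item(stimulus, rng, n_foils=3):
    """Build one four-word item: the stimulus plus three foils, shuffled."""
    words = [stimulus] + rng.sample(foils_for(stimulus), n_foils)
    rng.shuffle(words)
    return words


rng = random.Random(0)  # hypothetical seed, fixed only for repeatability
stimuli = retained()
items = [make_item("BUCK", rng) for _ in range(3)]

print(stimuli)
for item in items:
    print("   ".join(item))
```

Because BUCK has four available foils, repeated draws give items with different foil sets and orders, as in the three sample items printed above; DONE, with exactly three, always yields the same four words in varying order.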

Feature Analysis

The methodology developed for the present program pursued the analysis of the P-R matrix further. Features derived from the work of Chomsky and Halle (1968) and Jacobson, Fant, and Halle (1969) were applied to the error patterns to determine whether particular acoustic characteristics of the phonemic entities studied were “preserved” in the error responses. The features employed in this analysis were ± vocalic, ± high, ± back, ± low, ± round, and ± tense.

An example of the procedure, the analysis of the error responses to productions of the vowel in HIT, yielded the following statistics: + vocalic, 86%; + high, 29%; - back, 75%; - low, 90%; - round, 77%; - tense, 54%. That is, of the errors, 86% were vocalic. Because the presence or absence of a feature was placed in this dichotomous frame of reference, it was possible to test the null hypothesis that the feature was not present at a greater than chance, or 50%, level (a one-tailed hypothesis) by the z test of arcsin transformations of proportions (Walker and Lev, 1953). The alpha level was set at 0.0005 since there were 96 proportions to be tested. This provided an experimentwise alpha of 0.047. In this context, the errors for HIT vowel productions were + vocalic, - back, - low, and - round, but not + high or - tense. The results of the application of this procedure to the error responses of all 16 syllabic nuclei are shown in Table 3.

Multiple-choice items that take into account the results of the feature analysis are illustrated by the following:

ILL      EEL     ALE      YOU’LL
MUTE     MATE    MITT     MEAT
HEED     HID     HUED     HAYED
FAILED   FIELD   FUELED   FILLED

The items contrast the - tense vowel in HIT with the + tense nuclei of HE, HAY, and HUE. Moser (1969) provides seven useful monosyllables (ill, fill, fills, filled, hid, mitt, and mitts) that have three such minimal contrasts. Items contrasting the - tense vowel in HECK with the + tense nuclei of HAY, HAT, and HIGH would provide for more practice of the contrast. Since the - tense feature was not present at greater than chance levels in any of the esophageal speakers’ productions of - tense vowels, the preparation of such practice materials would have substantial theoretical validity. The following set of contrasts provides for - tense vs. + tense items within the confusor patterns of the patholect:

Stimuli: Confusors
HIT: HE, HAY, HUE
HECK: HAY, HAT, HIGH
HUT: HAT, HOT, HAWK
HOOK: HAWK, HER, HOE, WHO
HOW: HAT, HOT, HAWK
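The significance test used in the feature analysis can be illustrated with a short Python sketch. It is offered as one reading of the procedure, under two assumptions: the experimentwise alpha is taken as 1 - (1 - alpha)^96 for 96 independent tests, and the arcsin z statistic is taken in its usual single-proportion form, z = (2 arcsin √p - 2 arcsin √0.5) √n, with n the number of analyzable errors for the vowel. The function names are illustrative. The error counts are those of Table 1, and the HIT percentages are those quoted in the text.

```python
import math
from statistics import NormalDist

ALPHA = 0.0005   # per-test alpha stated in the text
N_TESTS = 96     # 16 syllabic nuclei x 6 features

experimentwise_alpha = 1 - (1 - ALPHA) ** N_TESTS
Z_CRIT = NormalDist().inv_cdf(1 - ALPHA)  # one-tailed critical z


def arcsin_z(p, n, p0=0.5):
    """z for an observed proportion p against chance p0, arcsin-transformed."""
    def phi(x):
        return 2 * math.asin(math.sqrt(x))
    return (phi(p) - phi(p0)) * math.sqrt(n)


def critical_percent(n):
    """Smallest whole percentage that exceeds chance at the stated alpha."""
    p_crit = 0.5 * (1 + math.sin(Z_CRIT / math.sqrt(n)))
    return math.ceil(100 * p_crit)


# Analyzable errors per key word (Errors column of Table 1).
N_ERRORS = {
    "HE": 256, "HIT": 545, "HAY": 307, "HECK": 646, "HAT": 329, "HOT": 788,
    "HAWK": 638, "HUT": 387, "HER": 242, "HOE": 279, "HOOK": 409, "WHO": 476,
    "HIGH": 397, "HOW": 291, "HOIST": 225, "HUE": 248,
}
critical = {word: critical_percent(n) for word, n in N_ERRORS.items()}

# Feature percentages reported in the text for errors to the HIT vowel.
HIT = {"+ vocalic": 86, "+ high": 29, "- back": 75,
       "- low": 90, "- round": 77, "- tense": 54}
preserved = [feature for feature, pct in HIT.items()
             if arcsin_z(pct / 100, N_ERRORS["HIT"]) > Z_CRIT]

print(round(experimentwise_alpha, 3))
print(critical["HIT"], preserved)
```

On this reading the HIT errors must reach 58% for a feature to count as preserved, which separates the four preserved features from + high (29%) and - tense (54%) exactly as the text reports.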

TABLE 3
Preservations of Features in Listeners’ Errors in Receiving the Phonemic Productions of Esophageal Speakers. Errors for 16 Vowels and Diphthongs Were Analyzed in Terms of Six Feature Systems: ± Vocalic, ± High, ± Back, ± Low, ± Round, and ± Tense. Table Entries Are Percentages. Z Tests of the Significance of the Entries Where Chance = 50% Are Indicated (See Text)

[The per-vowel feature percentages of the table body are not recoverable from this copy, apart from the HIT values quoted in the text. The rows that can be recovered follow; the feature labels of the summary rows are inferred from the order of the feature systems in the caption.]

Vowel or diphthong   N (errors)   *Critical % (p < 0.0005)
HE                   256          61
HIT                  545          58
HAY                  307          60
HECK                 646          57
HAT                  329          60
HOT                  788          56
HAWK                 638          57
HUT                  387          59
HER                  242          61
HOE                  279          60
HOOK                 409          59
WHO                  476          58
HIGH                 397          59
HOW                  291          60
HOIST                225          61
HUE                  248          61

Feature     N    Preserved   %
+ vocalic   12   12          100
- vocalic   4    0           0
+ high      9    2           22
- high      7    5           71
+ back      9    7           78
- back      7    5           71
+ low       6    4           67
- low       10   8           80
+ round     8    3           38
- round     8    8           100
+ tense     11   6           55
- tense     5    0           0

The present results bear no resemblance to previously reported intelligibilities for the vowels and diphthongs. Neither Fletcher’s (1953) % misinterpreted values nor Black’s (1952) report of the sounds that enhance and deter intelligibility had any significant predictive value when applied to the %C values shown in Table 1.

The intelligibilities of the vowels and diphthongs, while on the average higher than those of the initial consonants and clusters of the stimuli (64% vs. 53%), still confirm that severe impairments of articulatory ability may be observed among the population under study. The ability of the average esophageal speaker to communicate, then, must be attributed in substantial part to the listeners’ ability to make perceptual phonetic closure and to profit from redundancy and context. Such listener activities no doubt account for the common impression that esophageal speech shows no changes in distinctness (see, for example, Safran and Szende, 1973). While all speakers profit from these skills of the mentally active and motivated listener, the esophageal speaker must rely heavily upon them. When the listener is not perceptually active and motivated, communication breakdowns are liable to occur.

The problem of communication breakdown is particularly serious in the presence of noise. As Horii and Weinberg (1975) have shown, even superior esophageal speakers are more vulnerable to intelligibility disintegration in noise than the normal speaker. Nichols’ (1968) demonstration that the esophageal phonation itself contains considerable noise, and that superior speakers have less noise than average or poor speakers, must also be taken into account in the context of the present discussion. Outside of the usually quiet therapeutic situation, noise is ubiquitous. No home is free of competing noise sources, and “noise pollution” is an accompaniment of most jobs and social situations in which the esophageal speaker is expected to communicate. It would thus seem relevant to provide intelligibility practice in the presence of noise.

The utility of the materials developed by the present methodology must, of course, be tested. Preliminary work with the 16 vowels and diphthongs did not prove to be successful (Pottinger, 1974). Concentration of practice with materials based upon feature analyses (such as the - tense array illustrated in the preceding section) may prove to be more effective. Both group practice, in which esophageal speakers serve alternately as speakers and as listeners, and individual practice, in which esophageal speakers record a list of words and then listen to the recording while responding within the multiple-choice format, have been carried out with the materials. Both applications appear useful (Nichols, 1976).

It may also be of interest to test the impact of vowel/diphthong intelligibility practice (and improvement) upon the intelligibility of the consonants. That vowel formant transition cues signal the presence of the consonants is a well-known fact. Fant’s (1970) discussion of this phenomenon provides a good review, but many others are available. Studies such as those of Stevens and Klatt (1974) and Klatt (1975) continue to provide evidence of the importance of the vowel formant transition to such phonetic factors as voicing, a vulnerable feature in esophageal speech intelligibility. They showed, for example, that “voicing onset time” had a trading relation to vowel formant transition time in effecting the ± voice distinction (Stevens and Klatt, 1974). Jacobson and Fant (1969, p. 57) have called attention to the importance of durational control to the + tense feature. Should ± tense items prove effective in improving speakers’ control of this feature, then improvement of the intelligibility of the stops may be effected. Similar effects upon the intelligibilities of other phonemes may also be possible. Research is needed to test this hypothesis.

This study was carried out under a grant from the American Cancer Society, California Division, to the San Diego State University Foundation. The contributions of Judy Nicks and Karen Wolfer as research assistants, Anne Pottinger, a graduate student, and the voice pathology seminar of Fall 1974 are gratefully acknowledged.

References

Amster, W. W. Letter and data sheet. March 28, 1973.
Amster, W. W., Love, R. J., Menzel, O. J., Sandler, J., Sculthorpe, W. B., Gross, F. M. Psychosocial factors and speech after laryngectomy. J. Commun. Dis., 1972, 5, 1-18.
Black, J. W. Accompaniments of word intelligibility. J. Speech Hearing Dis., 1952, 17, 409-418.
Chomsky, N., Halle, M. The sound pattern of English. New York: Harper, 1968.
Diedrich, W. M., Youngstrom, K. A. Alaryngeal speech. Springfield, Ill.: Charles C Thomas, 1966.
Fant, G. Analysis and synthesis of speech processes. In G. Malmberg (Ed.), Manual of phonetics. Amsterdam: North-Holland, 1970.
Horii, Y., Weinberg, C. Intelligibility characteristics of superior esophageal speech presented under various levels of masking noise. J. Speech Hearing Res., 1975, 18, 413-419.
Hubbard, D. J. A comparison of speech intelligibility between esophageal and normal speakers via three modes of presentation. Paper delivered to the ASHA convention, San Francisco, 1972.
Hyman, M. An experimental study of artificial larynx and esophageal speech. J. Speech Hearing Dis., 1955, 20, 291-299.
Jacobson, R., Fant, C. G. M. Tenseness and laxness. In R. Jacobson, C. G. M. Fant, and M. Halle, Preliminaries to speech analysis. Cambridge, Mass.: MIT Press, 1969.
Jacobson, R., Fant, C. G. M., Halle, M. Preliminaries to speech analysis. Cambridge, Mass.: MIT Press, 1969.
Klatt, D. Voice onset time, frication, and aspiration in word-initial consonant clusters. J. Speech Hearing Res., 1975, 18, 686-706.
McCrosky, R. L., Mulligan, M. The relative intelligibility of esophageal speech and artificial larynx speech. J. Speech Hearing Dis., 1963, 28, 37-41.
McNemar, Q. Psychological statistics (2nd ed.). New York: Wiley, 1955.
Moser, H. One syllable words. Columbus, Ohio: Merrill, 1969.
Nichols, A. C. Loudness and quality in esophageal speech and the artificial larynx. In J. C. Snidecor, Sr. Author, Speech rehabilitation of the laryngectomized (2nd ed.). Springfield, Ill.: Charles C Thomas, 1968.
Nichols, A. C. A note on Hoops and Noll’s “Relationship of selected acoustic variables to judgments of esophageal speech.” J. Commun. Dis., 1971, 4, 51-53.
Nichols, A. C. Confusions in recognizing phonemes spoken by esophageal speakers: I. Initial consonants and clusters. J. Commun. Dis., 1976, 9, 27-41.
Pottinger, A. M. Esophageal intelligibility training: vowels. M. A. Thesis, San Diego State University, 1974.
Saffran, A., Szende, T. Esophageal speech as a linguistic form of compensation. Folia Phoniatr., 1973, 25, 347-364.
Shames, G. H., Font, J., Matthews, J. Factors related to speech proficiency of the laryngectomized. J. Speech Hearing Dis., 1963, 28, 273-287.
Stevens, K. N., Klatt, D. H. Role of formant transitions in the voiced-voiceless distinction for stops. J. Acoust. Soc. Am., 1974, 55, 653-659.
Walker, H. M., Lev, J. Statistical inference. New York: Henry Holt, 1953.
Weinberg, B., Westerhouse, J. A study of pharyngeal speech. J. Speech Hearing Dis., 1973, 38, 111-118.
