J. COMMUN. DISORD. 24 (1991), 40-50
JUDGING SPEECH COMMUNICATION EFFECTIVENESS IN ORAL CANCER PATIENTS

L. J. BOSMAN, J. M. W. FABERT, J. F. A. PRUYN, IVA, Institute for Social Scientific Research of the Catholic University of Brabant
M. A. SCHMIT JONGBLOED-TER PELKWIJK, M. F. DE BOER, Department of Head and Neck Surgery, the Dr. Daniel den Hoed Cancer Center, Rotterdam
H. W. VAN DEN BORNE, IVA, Institute for Social Scientific Research of the Catholic University of Brabant
P. C. DE JONG, Department of Head and Neck Surgery, the Dr. Daniel den Hoed Cancer Center, Rotterdam
R. M. RYCKMAN, Department of Psychology, University of Maine, Orono

An experiment was conducted to assess the effects of different types of sentences and recording methods on naive judges' evaluations of the speech communication effectiveness of four patients who had undergone surgery for oral cancer. As expected, judges understood patients better when they read meaningful rather than meaningless sentences and when their speech was evaluated under video rather than audio conditions. However, these general findings were qualified by the powerful influence of individual differences among patients. For example, whereas the intelligibility of three of the patients under the audio condition increased when the sentences being read were meaningful, one patient was poorly understood under that condition no matter what type of sentence he read. The results suggest that identifying the unique personality characteristics of patients that are related to their intelligibility merits serious consideration by both researchers and rehabilitation clinicians.
Address correspondence to L. J. Bosman, IVA, Institute for Social Scientific Research, P.O. Box 90153, NL-5000 LE Tilburg, The Netherlands.
© 1991 by Elsevier Science Publishing Co., Inc., 655 Avenue of the Americas, New York, NY 10010. 0021-9924/91/$3.50

INTRODUCTION
Part of the difficulty in building a consistent knowledge base from the literature on observer judgments of speech communication effectiveness in cancer patients whose speech abilities have declined is that measurement procedures vary considerably across investigations. With respect to the materials used to assess communication effectiveness, four approaches can be distinguished. First, subjects in some studies were required to read a list of words aloud (e.g., Klein, Wasserstrom, Sessions, Merson, and Ogura, 1977; Gibbs and Achterberg-Lawlis, 1979; Leonard and Gillis, 1982; Ryan, Gates, Cantu, and Hearne, 1982; Rizer, Schechter, and Coleman, 1984). Second, some investigators asked their patients to read aloud a number of sentences. For example, in studies by Clark and Stemple (1982) and Clark (1985) patients read meaningless sentences; in other studies patients read meaningful sentences (e.g., Amster, Love, Menzel, Sandler, Sculthorpe, and Gross, 1972; Hubbard and Kushner, 1980). Third, in a number of studies patients were asked to read aloud a standard text (e.g., Berry and Knight, 1975; Daou, Shultz, Remy, Turner Chan, and Attia, 1984; DiBartelo, 1971; Salmon, Kushner, and Knox, 1979; Teichgraeber, Bowman, and Goepfert, 1985). Fourth, in some investigations judgments were based on "spontaneous" conversation (e.g., Byles, Forner, and Stemple, 1985; Green and Hults, 1982; Olson and Shedd, 1978; Schumann, Laniado, and Carstens, 1981). Although some studies used more than one of these methods (e.g., Amster et al., 1972; Gibbs and Achterberg-Lawlis, 1979), no study directly compared meaningful with meaningless sentences. Furthermore, most studies did not report interrater reliability data for their measures of speech communication effectiveness (Pruyn et al., 1986). Adding to the complexity of the situation, studies also differ in the methods used to record the patients' speech. In the majority of studies only audio recordings were employed (e.g., Byles et al., 1985; Clark and Stemple, 1982; Gibbs and Achterberg-Lawlis, 1979; Kalb and Carpenter, 1981; Klein et al., 1977; Leonard and Gillis, 1982; Rizer et al., 1984; Teichgraeber et al., 1985).
In several others, only video recordings were used (e.g., Green and Hults, 1982; Knox and Anneberg, 1973; Salmon et al., 1979; Daou et al., 1984), and both methods were employed in only three studies (Berry and Knight, 1975; Hubbard and Kushner, 1980; Ryan et al., 1982). The present experiment has three major purposes: (1) to examine directly the impact of two different types of sentences (meaningless, meaningful) on communication effectiveness; (2) to assess the effects of video (visual plus audio cues) and audio recordings on the communication effectiveness of patients; and (3) to assess the interrater reliabilities of communication effectiveness judgments obtained under the various treatment conditions. Because meaningful sentences more closely reflect the kind of speech that listeners ordinarily experience, they should be easier to understand than meaningless sentences. We therefore expected higher communication effectiveness scores in the meaningful than in the meaningless sentence condition. Also, because visual speech cues have been found to enhance judgments of the communication effectiveness of cancer patients' speech (e.g., Berry and Knight, 1975; Hoops and Noll, 1971; Ryan et al., 1982), we anticipated that judges would understand the patients' speech better under video than under audio conditions. Finally, we explored whether interrater reliabilities differed depending on the type of sentences and the recording method used.
METHOD

Judges
Twenty-eight male and 33 female first-year sociology students at Tilburg University served as judges as part of their course requirements (ages ranged from 18 to 46 years; mean = 26.0).

Speakers
Speech samples were used from four patients who had been surgically treated for cancer of the oral cavity (cavum oris) at the Rotterdam Cancer Center Dr. Daniël den Hoed or at the Rotterdam Academic Hospital Dijkzigt. The samples were recorded 6 weeks after surgery.
PROCEDURE

Speech Samples
Each patient was asked to read aloud 30 sentences: 13 meaningful sentences (e.g., "The farmer has broken his arm," "It is raining during the whole day") (Plomp and Mimpen, 1979) and 17 meaningless sentences (e.g., "The low eye went the letter" and "The whole flower could the key") (Van Erp, 1985).¹ For each patient, two lists of sentences (one of each type) were chosen at random without replacement, so that every speaker read different sentences of comparable difficulty. All 30 sentences except the two practice sentences, which were listed first, were presented to the patient in random order. For this study, the speech samples of the four patients were copied consecutively onto one new videotape.

¹ Plomp and Mimpen (1979) designed 20 lists of 13 different sentences each. The sentences are in spoken Dutch, are short (8 or 9 syllables), are phonetically balanced, and are not difficult or confusing (according to 10 speech therapists). All lists are assumed to be of equal difficulty. Van Erp (1985) designed 10 lists of 17 different meaningless sentences each. These sentences have the following properties: the first two sentences are practice sentences (the same for all lists); each sentence has 7 to 10 syllables; the sentence structure is article, adjective, noun, verb (past tense), article, noun; the nouns are the most frequently used words of fewer than three syllables in spoken Dutch; and the lists are phonetically balanced. All lists are assumed to be identical in difficulty.
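The list selection and reading order described above can be sketched as follows. This is an illustration, not the authors' procedure: the list counts come from footnote 1, but the function names, the random seed, and the sentence placeholders are hypothetical, and reading "randomly assigned" as a shuffled reading order is an assumption.

```python
import random

# List counts from footnote 1; everything else here is hypothetical.
N_MEANINGFUL_LISTS = 20   # Plomp and Mimpen (1979): 20 lists of 13 sentences
N_MEANINGLESS_LISTS = 10  # Van Erp (1985): 10 lists of 17 sentences
N_PRACTICE = 2            # first two meaningless sentences are practice items


def assign_lists(patients, rng):
    """Give each patient one meaningful and one meaningless list, drawn
    without replacement so that no two patients share a list."""
    meaningful = rng.sample(range(1, N_MEANINGFUL_LISTS + 1), len(patients))
    meaningless = rng.sample(range(1, N_MEANINGLESS_LISTS + 1), len(patients))
    return {p: (a, b) for p, a, b in zip(patients, meaningful, meaningless)}


def reading_order(meaningful, meaningless, rng):
    """Practice sentences first, then the remaining 28 sentences shuffled."""
    practice = list(meaningless[:N_PRACTICE])
    rest = list(meaningless[N_PRACTICE:]) + list(meaningful)
    rng.shuffle(rest)
    return practice + rest


rng = random.Random(1991)  # fixed seed so the draw is reproducible
lists = assign_lists(["A", "B", "C", "D"], rng)
# Placeholder labels: M0-M12 meaningful, N0-N16 meaningless (N0, N1 = practice).
order = reading_order([f"M{i}" for i in range(13)],
                      [f"N{i}" for i in range(17)], rng)
print(lists)
print(order[:5])
```

Drawing with `rng.sample` rather than independent draws is what guarantees that no two speakers read the same list.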
Task
Half of the judges only heard the patient talking (audio condition), whereas the other half both saw and heard the patient (video condition). The judges were randomly assigned to these conditions. Within each condition, the students were divided between two classrooms. Each of the four classrooms was equipped with a video recorder and a TV set; in the audio condition the TV screen was covered. The students sat at a distance of two meters from the TV set. The judges were told that they were going to hear (or hear and see) a person speaking, that they would have to write out the sentences on the sheets in front of them, and that they would have to answer some questions.² The students in the video condition were asked to look at the patient and listen as he read the sentences aloud; the students in the audio condition simply heard the same sentences. All judges were asked to write out each sentence they had heard (or seen and heard). Finally, all judges answered the following questions (given in full in Appendix A): (1) To what extent could you hear what the person said? (2) How do you evaluate this person's speech? (3) To what extent is the speech intelligible? (4) To what extent is the person intelligible? (5) What percentage of the speech can you hear?
Designs for Analyses
To simplify the analyses of the five subjective variables (items 1-5), intercorrelations among the standardized scores on these variables were computed, yielding an average r of .74.³ Consequently, a composite measure of subjective communication effectiveness was computed for each judge by summing the scores across the five variables for each patient. The design for the analysis of this composite subjective variable was a 2 × 2 × 4 mixed analysis of variance (ANOVA), with judges' sex (male-female), method of recording (video-audio), and patients (A, B, C, D) as the independent variables. The first two independent variables were between-subjects factors and the last was a within-subject factor.

For the objective speech effectiveness analysis, it was first necessary to compute the percentage of words correctly understood in the meaningful and the meaningless conditions. The experimenters counted the number of correctly understood words in each of the 28 scored sentences (the first two sentences were excluded). Percentages were then calculated as the number of understood words divided by the total number of spoken words, multiplied by 100. The design for the analysis of the percentage of words correctly understood was a 2 × 2 × 2 × 4 mixed ANOVA, with judges' sex (male-female), method of recording (audio-video), type of sentences (meaningful-meaningless), and patients (A, B, C, D) as the independent variables. The first two variables were between-subjects factors, whereas the last two were within-subject factors.

² All students signed a declaration in which they promised to keep the information they received from the audio or video recordings confidential.
³ Subjective measures involve ratings by the judges; objective measures involve calculations by the experimenters.
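The two scores described above, the subjective composite and the objective percentage of words understood, can be sketched in Python. This is an illustration under stated assumptions, not the authors' procedure. The paper does not say how the five items were aligned in direction before summing; items 2 and 3 in Appendix A run from best (1) to worst, so the sketch flips their z-scores. Nor does it give the rule for counting a word as "correctly understood"; the sketch uses a simple case-insensitive word match. All function names are hypothetical.

```python
from collections import Counter
from itertools import combinations
from statistics import mean, pstdev

# Assumption: Appendix A items 2 and 3 are scored from best (1) to worst,
# so their z-scores are sign-flipped before correlating or summing.
REVERSED_ITEMS = {2, 3}


def zscores(xs):
    """Standardize a list of scores to mean 0 and (population) SD 1."""
    m, s = mean(xs), pstdev(xs)
    return [(x - m) / s for x in xs]


def pearson_r(x, y):
    """Pearson product-moment correlation of two equal-length lists."""
    mx, my = mean(x), mean(y)
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sxx = sum((a - mx) ** 2 for a in x)
    syy = sum((b - my) ** 2 for b in y)
    return sxy / (sxx * syy) ** 0.5


def aligned_z(items):
    """items: {item_number: [score per rating]} -> direction-aligned z-scores."""
    out = {}
    for k, xs in items.items():
        z = zscores(xs)
        out[k] = [-v for v in z] if k in REVERSED_ITEMS else z
    return out


def average_intercorrelation(items):
    """Mean Pearson r over all pairs of direction-aligned items."""
    z = aligned_z(items)
    return mean(pearson_r(z[a], z[b]) for a, b in combinations(sorted(z), 2))


def composite_scores(items):
    """Sum of aligned z-scores across items, one composite per rating."""
    z = aligned_z(items)
    return [sum(vals) for vals in zip(*z.values())]


def words_understood(target, transcript):
    """Hypothetical scoring rule: count target words that also appear in the
    judge's transcript (case-insensitive, each word matched at most once)."""
    t = Counter(target.lower().split())
    r = Counter(transcript.lower().split())
    return sum((t & r).values())


def percent_correct(pairs):
    """pairs: [(target, transcript)] for the scored sentences of one list.
    Returns 100 * understood words / spoken words."""
    understood = sum(words_understood(t, r) for t, r in pairs)
    spoken = sum(len(t.split()) for t, _ in pairs)
    return 100.0 * understood / spoken


pairs = [("The farmer has broken his arm", "the farmer has broken his leg")]
print(round(percent_correct(pairs), 1))  # 83.3
```

Pooling words across sentences before dividing, rather than averaging per-sentence percentages, follows the text's "number of understood words divided by the total number of spoken words".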
RESULTS
Interrater Reliability Analyses
The interrater reliabilities for the subjective and objective speech effectiveness measures under audio and video conditions are presented in Table 1. These reliabilities range from .74 to .91 and are satisfactory. A comparison of the reliability levels reveals that, in the audio condition, the reliabilities of the objective measurement procedures are significantly higher than that of the subjective measurement procedure [.74 vs. .87 (z = 1.988, p < .05) and .74 vs. .91 (z = 3.00, p < .01)]. In the video condition, the reliabilities of the subjective and objective procedures do not differ [.81 vs. .85 (z = .67, n.s.) and .81 vs. .88 (z = 1.29, n.s.)].
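The paper reports z statistics for the differences between reliability coefficients without naming the test. A common choice, assumed here, is Fisher's r-to-z comparison: each coefficient is transformed with atanh and the difference is divided by its standard error. The standard error is not reported either; the sketch uses 1/√27, a value inferred because it reproduces all four published z values (it would correspond, e.g., to two independent coefficients each based on 57 observations). Treat both choices as assumptions.

```python
import math

# Assumed standard error of the difference between Fisher-transformed
# coefficients. Not reported in the paper; inferred from the published z values.
SE_DIFF = 1 / math.sqrt(27)


def compare_reliabilities(r1, r2, se=SE_DIFF):
    """z statistic for the difference between two correlations (Fisher r-to-z)."""
    return (math.atanh(r2) - math.atanh(r1)) / se


def two_sided_p(z):
    """Two-sided p value of a standard normal z."""
    return math.erfc(abs(z) / math.sqrt(2))


for label, r1, r2 in [("audio", .74, .87), ("audio", .74, .91),
                      ("video", .81, .85), ("video", .81, .88)]:
    z = compare_reliabilities(r1, r2)
    print(f"{label}: {r1:.2f} vs {r2:.2f}  z = {z:.2f}  p = {two_sided_p(z):.3f}")
```

The printed z values round to 1.99, 3.00, 0.67 and 1.29, matching the text; the two-sided p values fall below .05 and .01 for the two audio comparisons and above .05 for the two video comparisons.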
Table 1. Interrater Reliability Indices for the Subjective and Objective Measures under Audio and Video Conditions

                                               Interrater reliability index
Measurement                                       Audio      Video
Subjective speech effectiveness measure           0.74       0.81
Objective speech effectiveness measures
  Meaningless sentences                           0.87       0.88
  Meaningful sentences                            0.91       0.85
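The paper does not state which interrater reliability index Table 1 reports. Its reference list includes Shrout and Fleiss (1979), which defines a family of intraclass correlations (ICCs); as an assumption, the sketch below computes two of them from a two-way ANOVA on a targets × raters matrix: ICC(2,1), the reliability of a single rater, and ICC(2,k), the reliability of the mean of k raters. The function name and data layout are hypothetical.

```python
from statistics import mean


def icc2(ratings):
    """ICC(2,1) and ICC(2,k) (Shrout and Fleiss, 1979) for a targets x raters
    matrix: a list of rows, one row per rated target, one column per rater."""
    n, k = len(ratings), len(ratings[0])
    grand = mean(x for row in ratings for x in row)
    row_means = [mean(row) for row in ratings]
    col_means = [mean(col) for col in zip(*ratings)]

    ss_total = sum((x - grand) ** 2 for row in ratings for x in row)
    ss_rows = k * sum((m - grand) ** 2 for m in row_means)  # between targets
    ss_cols = n * sum((m - grand) ** 2 for m in col_means)  # between raters
    ss_err = ss_total - ss_rows - ss_cols                   # residual

    bms = ss_rows / (n - 1)            # between-targets mean square
    jms = ss_cols / (k - 1)            # between-raters (judges) mean square
    ems = ss_err / ((n - 1) * (k - 1))  # error mean square

    single = (bms - ems) / (bms + (k - 1) * ems + k * (jms - ems) / n)
    average = (bms - ems) / (bms + (jms - ems) / n)
    return single, average
```

With many judges and few rated targets, the average-measures form ICC(2,k) is generally much higher than the single-rater form, so a report should state which one is meant.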
Table 2. Analysis of Variance of Subjective Communication Effectiveness Data for Judges' Sex, Method of Recording, and Individual Patients

Source                          df    Mean square        F        p
Between subjects
  Judges' sex (S)                1           2.00       .18     .670
  Method of recording (AV)       1         114.61     10.52     .002
  S × AV                         1           8.50       .78     .381
Within subjects
  Patients (P)                   3        1020.27    295.66     .000
  S × P                          3           2.54       .74     .532
  AV × P                         3          33.58      9.73     .000
  S × AV × P                     3           2.79       .81     .491
Analysis of the Subjective Speech Communication Effectiveness Data
An ANOVA was performed on the subjective measure data, with judges' sex, method of recording, and patients as independent variables. As Table 2 shows, differences among patients explain most of the variance. There is no significant main effect for judges' sex, but there is a significant effect for method of recording: as predicted, patients were better understood in the video than in the audio condition. This effect is qualified, however, by a significant interaction between patients and method of recording.
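Each F ratio in Table 2 is an effect's mean square divided by an error mean square; in a mixed design, the between-subjects effects are tested against one error term and the within-subject effects against another. Because F = MS_effect / MS_error, dividing each printed mean square by its F back-computes the implied error term, a quick consistency check on the table. The error mean squares are not reported in the paper, so the values this prints are derived from rounded table entries, not published figures; the dictionary layout and function name are hypothetical.

```python
# (mean square, F) pairs as printed in Table 2.
TABLE_2 = {
    "between": {"S": (2.00, .18), "AV": (114.61, 10.52), "S x AV": (8.50, .78)},
    "within": {"P": (1020.27, 295.66), "S x P": (2.54, .74),
               "AV x P": (33.58, 9.73), "S x AV x P": (2.79, .81)},
}


def implied_error_ms(ms, f):
    """Since F = MS_effect / MS_error, MS_error = MS_effect / F."""
    return ms / f


for stratum, rows in TABLE_2.items():
    for source, (ms, f) in rows.items():
        err = implied_error_ms(ms, f)
        print(f"{stratum:8s} {source:11s} implied MS_error = {err:6.2f}")
```

The between-subjects rows agree on an error term of about 10.9 (the judges' sex row drifts to about 11.1 because its F of .18 is heavily rounded), and the within-subject rows agree on about 3.45.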
Analysis of the Objective Speech Communication Effectiveness Data
An ANOVA was performed on the percentage of words correctly understood, with judges' sex (male-female), method of recording (video-audio), type of sentence (meaningful-meaningless), and patients (A, B, C, D) as independent variables. Table 3 shows that, once again, differences among patients explain most of the variance, and again there is no significant main effect for judges' sex. A significant main effect is found for method of recording: patients in the video condition were better understood than patients in the audio condition. Again, this effect is qualified by a significant patient by method of recording interaction. The analysis also reveals that, as predicted, meaningful sentences (M = 66.8%) were better understood than meaningless sentences (M = 59.8%); however, this effect is also qualified by a significant interaction between patients and type of sentence read. Finally, there is a three-way interaction among patients, method of recording, and type of sentence. The means for this interaction are presented in Figure 1. Judges understood meaningless sentences better under the video condition than under the audio condition for every patient. In contrast, there is no difference in judges' understanding of meaningful sentences read by patients A and B under audio and video conditions, whereas meaningful sentences read by patients C and D were better understood under video than under audio conditions. For patient D there is no difference in intelligibility between meaningful and meaningless sentences under the audio condition, whereas the two types of sentence do differ under the video condition.

Figure 1. Mean percentages of correctly understood words for patients, type of sentence, and method of recording. (Chart of the percentage of correctly understood words, axis from 20% to 100%, for meaningless and meaningful sentences under video and audio conditions, for patients A, B, C, and D.)
Table 3. Analysis of Variance of Percentages of Words Correctly Understood for Judges' Sex, Method of Recording, Type of Sentences, and Individual Patients

Source                          df    Mean square        F        p
Between subjects
  Judges' sex (S)                1          12.97      3.45     .069
  Method of recording (AV)       1         276.95     73.60     .000
  S × AV                         1            .80       .21     .647
Within subjects
  Type of sentence (T)           1         110.75    215.00     .000
  S × T                          1            .02       .05     .830
  AV × T                         1            .04       .09     .770
  S × AV × T                     1            .39       .75     .389
  Patients (P)                   3         780.34   1230.31     .000
  S × P                          3           1.74      2.75     .045
  AV × P                         3          17.61     27.76     .000
  S × AV × P                     3           1.50      2.37     .072
  T × P                          3          10.24     29.44     .000
  S × T × P                      3            .32       .91     .437
  AV × T × P                     3           8.35     24.01     .000
  S × AV × T × P                 3            .36      1.05     .373
DISCUSSION
The data of this experiment indicate that naive judges evaluate the speech of patients with satisfactory reliability, irrespective of the type of sentence being read and the recording method used. However, a comparison of the reliabilities obtained with subjective and objective measurement procedures reveals that reliability in the audio condition is lower when the subjective procedure is used. In the video condition, the choice of measurement procedure makes no difference; all procedures yield comparable reliabilities. Researchers who plan to use audio recordings in their studies might therefore prefer more objective measurement procedures.

In general, patients were more intelligible when they read meaningful rather than meaningless sentences. Furthermore, speech was understood better under video than under audio conditions. These general findings must be qualified, however, in light of the strong influence of individual differences among patients. Clearly, some patients were more intelligible than others. Whereas the intelligibility of three of the patients (A, B, C) under the audio condition increased when the sentences being read were meaningful rather than meaningless, patient D was poorly understood under that condition no matter what type of sentence he read. Interestingly, this patient's intelligibility was enhanced when he read meaningful sentences under the video condition, as was that of the other three patients. Consistent with much prior research, the addition of visual cues increases the intelligibility of speakers. Because reading meaningful sentences under video conditions (visual plus audio) most closely approximates the conditions patients are likely to encounter in everyday life, researchers might prefer to use such conditions in their investigations whenever possible. Researchers should also keep in mind that certain patients cannot be understood well under audio conditions. If a research sample studied with an audio technique includes such patients, the highly restricted variability in their scores might prevent investigators from confirming their hypotheses. Finally, the identification of the unique personal characteristics of patients that are directly related to their intelligibility warrants serious consideration both experimentally and in clinical decision making (cf. Kalb and Carpenter, 1981).

This study was made possible by a grant from the Netherlands Cancer Foundation "Nederlandse Kankerbestrijding."
APPENDIX A: OVERVIEW OF THE QUESTIONNAIRE FOR THE LISTENERS
The judges wrote out the words they heard from the list of sentences spoken by the patients and then answered the questions below.

1. To what extent could you hear what the person said?
   1. Very bad. 2. Bad. 3. Rather bad. 4. Rather good. 5. Good. 6. Very good.
2. How do you evaluate this person's speech?
   1. As very good. 2. As good. 3. As rather good. 4. As bad. 5. As very bad.
3. To what extent is the speech intelligible?
   1. Excellent.
   2. Good.
   3. Satisfactory, but some words are not intelligible.
   4. Some words are produced, but these are probably only intelligible for listeners who are accustomed to the speech of the person.
   5. Some noises are produced, but these are not intelligible.
   6. No sounds are produced.
4. To what extent is the person intelligible?
   1. Not at all. 2. Rather bad. 3. Rather good. 4. Very good.
5. What percentage of the speech can you hear?
REFERENCES
Amster, W. W., Love, R. J., Menzel, O. J., Sandler, J., Schulthorpe, W. B., and Gross, M. (1972). Psychosocial factors and speech after laryngectomy. J. Commun. Disord. 5:1-18.
Berry, R. A., and Knight, R. E. (1975). Auditory versus audio-visual intelligibility measurements of alaryngeal speech: A preliminary report. Perc. Motor Skills 40:915-918.
Byles, P. L., Forner, L. L., and Stemple, J. C. (1985). Communication apprehension in esophageal and tracheoesophageal speakers. J. Speech Hear. Disord. 50:114-119.
Clark, J. G., and Stemple, J. C. (1982). Assessment of three modes of alaryngeal speech with a synthetic sentence identification (SSI) task in varying message-to-competition ratios. J. Speech Hear. Res. 25:333-338.
Clark, J. G. (1985). Alaryngeal speech intelligibility and the older listener. J. Speech Hear. Disord. 50:60-65.
Daou, R. A., Shultz, J. R., Remy, H., Turner Chan, N., and Attia, E. L. (1984). Laryngectomee study: Clinical and radiologic correlates of esophageal voice. Otolaryngol. Head Neck Surg. 92:628-634.
DiBartelo, R. (1971). Psychological considerations in the attainment of esophageal speech. J. Surg. Oncol. 3:451-466.
Erp, van, A. J. M. (1985). De sociale evaluatie van gespleten-gehemelte-sprekers [The social evaluation of cleft-palate speakers]. Logoped. Foniatr. 57:314-321.
Fagel, W. P. F., Herpt, van, L. W. A., and Boves, L. (1983). Analysis of the perceptual qualities of Dutch speakers' voice and pronunciation. Speech Commun. 2:315-326.
Gibbs, H. W., and Achterberg-Lawlis, J. (1979). The spouse as facilitator for esophageal speech: A research perspective. J. Surg. Oncol. 11:89-94.
Green, G., and Hults, M. (1982). Preferences for three types of alaryngeal speech. J. Speech Hear. Disord. 47:141-145.
Hoops, H. R., and Noll, J. D. (1971). The effects of listener sophistication on judgments of esophageal speech. J. Commun. Disord. 4:250-260.
Hubbard, D. J., and Kushner, D. (1980). A comparison of speech intelligibility between esophageal and normal speakers via three modes of presentation. J. Speech Hear. Res. 23:909-916.
Kalb, M. B., and Carpenter, M. A. (1981). Individual speaker influence on relative intelligibility of esophageal speech and artificial larynx speech. J. Speech Hear. Disord. 46:77-80.
Klein, A. D., Wasserstrom, J. P., Sessions, D. G., Merson, R., and Ogura, J. H. (1977). Rehabilitation of partial laryngectomy patients. Trans. Am. Acad. Ophthalmol. Otolaryngol. 84:324-334.
Knox, A. W., and Anneberg, M. (1973). The effects of training in comprehension of electrolaryngeal speech. J. Commun. Disord. 6:110-120.
Leonard, R., and Gillis, R. (1982). Effects of a prosthetic tongue on vowel intelligibility and food management in a patient with total glossectomy. J. Speech Hear. Disord. 47:25-30.
Olson, M. L., and Shedd, D. P. (1978). Disability and rehabilitation in head and neck cancer patients after treatment. Head Neck Surg. 1:52-58.
Plomp, R., and Mimpen, A. M. (1979). Speech-reception threshold for sentences as a function of age and noise level. J. Acoust. Soc. Am. 66:1333-1342.
Pruyn, J. F. A., Jong de, P. C., Bosman, L. J., Poppel, van, J. W. M. J., Borne, van den, H. W., Ryckman, R. M., and Meij de, K. (1986). Psychosocial aspects of head and neck cancer: A review of the literature. Clin. Otolaryngol. 11:469-474.
Pruyn, J. F. A., Jong de, P. C., Bosman, L. J., Borne, van den, H. W., Poppel, van, J. W. M. J., and Ryckman, R. M. (1984). Carcinoom in het hoofd-halsgebied, een eerste orientatie [Carcinoma in the head and neck region: A first orientation]. IVA, Instituut voor sociaal-wetenschappelijk onderzoek van de katholieke hogeschool Tilburg en werkgroep hoofd-halstumoren R'dam.
Rizer, F. M., Schechter, G. L., and Coleman, R. F. (1984). Voice quality and intelligibility characteristics of the reconstructed larynx and pseudolarynx. Otolaryngol. Head Neck Surg. 92:635-638.
Ryan, W., Gates, G. A., Cantu, E., and Hearne, E. M. (1982). Current status of laryngectomee rehabilitation: III. Understanding of esophageal speech. Am. J. Otolaryngol. 3:91-96.
Salmon, S. J., Kushner, H., and Knox, A. W. (1979). Judgments by children and adults regarding communication skills of esophageal speakers. J. Commun. Disord. 12:95-101.
Schumann, K., Laniado, K., and Carstens, N. (1981). Funktionelle Ergebnisse nach Laryngektomie [Functional results after laryngectomy]. Laryngol. Rhinol. Otol. Grenzgebiete 60:378-380.
Shrout, P. E., and Fleiss, J. L. (1979). Intraclass correlations: Uses in assessing rater reliability. Psychol. Bull. 86:420-428.
Teichgraeber, J., Bowman, J., and Goepfert, H. (1985). New test series for the functional evaluation of oral cavity cancer. Head Neck Surg. 8:9-20.