PRODUCTION AND PERCEPTION OF FINAL CONSONANT VOICING IN SPEECH PRODUCED BY INEXPERIENCED SIGNERS DURING SIMULTANEOUS COMMUNICATION
SARAH B. D’AVANZO, TRISHA GRAZIANO, DALE EVAN METZ, and NICHOLAS SCHIAVETTI
Department of Communicative Disorders and Sciences, State University of New York, Geneseo, New York
ROBERT L. WHITEHEAD National Technical Institute for the Deaf, Rochester Institute of Technology, Rochester, New York
Simultaneous communication combines spoken and manual modes to produce each word of an utterance. This study investigated whether alterations in the temporal structure of speech produced by inexperienced signers during simultaneous communication influence the perception of final consonant voicing. Inexperienced signers recorded words that differed only in the voicing characteristic of the final consonant under two conditions: (1) speech alone and (2) simultaneous communication. The words were subsequently digitally edited to remove the final consonant and played to 20 listeners who, in a forced-choice paradigm, circled the word they thought they heard. Results indicated that accurate perception of final consonant voicing was not impaired by the changes in the temporal structure of speech that accompanied the inexperienced signers’ simultaneous communication. ©1998 by Elsevier Science Inc.
Educational Objectives: The reader will be able to describe (1) the effect that simultaneous communication has on vowel durations produced by persons who are inexperienced signers; (2) the effect that simultaneous communication has on the perception of final consonant voicing in speech produced by persons who are inexperienced signers; and (3) similarities between voiceless/voiced vowel duration ratios in the speech alone and simultaneous communication conditions for persons who are inexperienced signers.
KEY WORDS: Simultaneous communication; Speech perception; Sound identification; Stop consonants; Digital editing
Address correspondence to Dale Evan Metz, Ph.D.; Department of Communicative Disorders and Sciences, State University of New York, Geneseo, Geneseo, NY 14454. Telephone: (716) 245-5132; fax: (716) 245-5434; e-mail: .
J. COMMUN. DISORD. 31 (1998), 337–346 © 1998 by Elsevier Science Inc. All rights reserved. 655 Avenue of the Americas, New York, NY 10010
0021-9924/98/$19.00 PII S0021-9924(98)00007-0
338
D’AVANZO et al.
INTRODUCTION
Simultaneous communication (SC) combines speech and various forms of manually coded English (e.g., signs, fingerspelling) in an attempt to produce each word of an utterance in both spoken and manual modes (Akamatsu & Stewart, 1989; Marmor & Petitto, 1979; Maxwell & Bernstein, 1985; Strong & Charlson, 1987). Research findings have indicated that the overall temporal structure of speech produced by experienced signers is altered dramatically during SC. Notable temporal alterations include a slower speaking rate; increased sentence, word, and vowel durations; increased voice onset times and interword interval durations; and more pauses immediately after a signed word (Bellugi & Fischer, 1972; Huntington & Watton, 1984; Schiavetti, Whitehead, Metz, Whitehead, & Mignerey, 1996; Whitehead, Schiavetti, Whitehead, & Metz, 1995; Windsor & Fristoe, 1989, 1991). It is well established that preceding vowel durations play a pivotal role in the perception of final stop consonant voicing in spoken English (Hillenbrand, Ingrisano, Smith, & Flege, 1984; Hogan & Rozsypal, 1980; Kent & Read, 1992; Raphael, 1972; Wardrip-Fruin, 1982). Because preceding vowel durations provide acoustic cues for final stop consonant perception, it is reasonable to suggest that the increased vowel durations associated with SC may influence final stop consonant perception. This possibility led Metz, Schiavetti, Lessler, Lawe, Whitehead, and Whitehead (in press) to investigate the potential effects of temporally altered speech produced during SC on listeners’ perception of final consonant voicing. Metz et al. (in press) had experienced signers produce words that ended in both voiced and voiceless consonants under two experimental conditions: (1) speech alone, and (2) SC.
The words produced by the experienced signers were digitally edited to remove the final consonant, and the edited words were played to listeners who were instructed to identify the voicing characteristic of the final consonant. The findings indicated that, despite considerable lengthening of vowel durations preceding both voiced and voiceless final consonants in the SC condition, listeners identified the voicing characteristic of the final consonants equally accurately in both experimental conditions. Whitehead, Schiavetti, Metz, and Farrell (1997) have shown that inexperienced signers exhibit greater durational increases than do experienced signers (Whitehead et al., 1995) on the same temporal aspects of speech. For example, Whitehead et al. (1995) reported average vowel durations of approximately 188.75 ms for experienced signers, whereas Whitehead et al. (1997) reported average vowel durations of 371.25 ms for inexperienced signers. In addition, both the experienced and the inexperienced signers’ vowel durations preceding voiced consonants were longer than their vowel durations preceding voiceless consonants, consistent with English voicing rules. The average magnitude of the difference between the vowel durations preceding voiced and voiceless consonants for the inexperienced signers, however, was substantially less than that exhibited by the experienced signers. This difference was 88 ms for the experienced signers but only 50 ms for the inexperienced signers, suggesting a potential blurring of the vowel duration contrast between voiced and voiceless final consonants.
It is likely that many of the communicative interactions that deaf and hard of hearing children have during their language acquisition years will be with persons who are not experienced signers (e.g., parents, teachers in mainstream classrooms, friends, and peers). Given that inexperienced signers tend to protract their speech during SC considerably more than experienced signers do, and given their reduced durational differences between vowels preceding voiced and voiceless consonants, it is reasonable to ask whether the accurate perception of final consonant voicing in speech produced by inexperienced signers might be compromised during SC. Accordingly, the purpose of the present study was to systematically replicate and extend the Metz et al. (in press) study by using inexperienced rather than experienced signers. The specific question asked in this study was: Do changes in the temporal structure of speech produced by inexperienced signers influence listeners’ perception of final consonant voicing?
METHOD
Speakers
Ten female undergraduate students in the last week of an introductory sign language course (15-week semester) served as the speakers in this study. These speakers had no previous experience with SC. The course focused on the development of expressive and receptive fingerspelling and signing, and on sign language concepts. Approximately 300 signs for English words were introduced during the course. English was the first language of all the speakers, and they reported no previous or current speech, language, or hearing problems.
Speech Stimuli
The speech stimuli consisted of six pairs of English CVC experimental words (see Appendix A). The words in each pair differed only in the voicing characteristic of the final consonant (e.g., hit vs. hid; but vs. bud). Each target word was embedded in the carrier phrase “I can say _______” and presented to the speakers in one of two random orders on 3 × 5 flash cards.
Recording Procedures
The speakers produced each sentence, with its embedded experimental word, under two conditions: (1) speech alone, and (2) speech combined with signed English and fingerspelling, or SC. The speakers were instructed to speak the sentences aloud at a comfortable rate and loudness level in both the speech alone and SC conditions. Additionally, for the SC condition, the speakers were instructed to sign all the words of the carrier phrase and to fingerspell the experimental word. The rationale for having the speakers fingerspell the experimental word was twofold: (1) adding fingerspelling made the task more representative of typical SC, and (2) some of the experimental words, such as “bud” and “hud,” do not have conventional signs, so the instruction to fingerspell every experimental word provided uniformity of manual production across speakers. The experimental words were arranged in two random orders, one for SC and one for speech alone, and the order of experimental conditions (speech alone vs. SC) was randomized for each speaker. Speech samples for both conditions were obtained in a sound-attenuating room with a microphone (Shure SM48), which was placed 30 cm from the speaker’s mouth and connected to a Tascam 202 MKII tape recorder.
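The stimulus and ordering scheme described in the Speech Stimuli and Recording Procedures sections can be sketched in Python as follows. This is a minimal illustration under stated assumptions, not the authors' procedure: the helper `build_session`, the use of Python's `random` module, and the seed value are hypothetical; only the six word pairs (Appendix A), the carrier phrase, one random word order per condition, and the per-speaker randomization of condition order come from the text.

```python
import random

# Six voiced/voiceless CVC pairs from Appendix A.
PAIRS = [("hid", "hit"), ("bid", "bit"), ("bad", "bat"),
         ("had", "hat"), ("bud", "but"), ("hud", "hut")]
CARRIER = "I can say {}"


def build_session(seed):
    """Hypothetical helper: one shuffled sentence list per condition plus a
    randomized condition order for a single speaker."""
    rng = random.Random(seed)
    words = [word for pair in PAIRS for word in pair]  # all 12 experimental words
    conditions = ["speech alone", "SC"]
    rng.shuffle(conditions)  # order of conditions randomized per speaker
    session = {}
    for condition in conditions:
        shuffled = words[:]
        rng.shuffle(shuffled)  # an independent random word order for each condition
        session[condition] = [CARRIER.format(word) for word in shuffled]
    return session, conditions


if __name__ == "__main__":
    session, order = build_session(seed=1)
    for condition in order:
        print(condition, session[condition][:3])
```

Seeding a private `random.Random` instance per speaker keeps each speaker's session reproducible without touching global random state.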
Digital Editing Procedures
The audio recordings were low-pass filtered at 8 kHz, digitized with 16-bit precision at a 20-kHz sampling rate by a Kay Elemetrics Computerized Speech Laboratory (CSL) system (4300B), and stored on disk. The digital representation of each test sentence was displayed on the graphics terminal, and the CVC experimental word was identified and its duration measured. A cursor was then placed and stored at the onset of acoustic energy associated with the release of the initial consonant of each experimental word. A second cursor was placed and stored at the zero-crossing (to avoid introducing audible clicks) preceding the last cycle of the experimental word’s vowel. These two cursors demarcated the CV syllable portion of the CVC experimental word; the last two vowel cycles and the final consonant’s closure phase and release were eliminated. The duration of the CV syllable portion of the experimental word was measured, and the CV portion was then submitted to the CSL’s D-to-A processor. The line output of the D-to-A processor was connected to a high-quality tape recorder (Tascam 202 MKII). Separate listening tapes were made for each speaker, each comprising a random order of the CV syllable portions of that speaker’s experimental words from both conditions (speech alone and SC).
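The second-cursor placement can be illustrated with a short Python sketch that snaps a proposed cut point back to the nearest preceding zero-crossing and keeps only the samples before it. This is an assumption-laden illustration, not the CSL's editing algorithm: the function name `cut_at_zero_crossing`, the sign-change definition of a zero-crossing, and the synthetic 200-Hz test tone are hypothetical. Only the 20-kHz sampling rate comes from the text; its 10-kHz Nyquist frequency lies above the 8-kHz low-pass cutoff, which is why the filter prevents aliasing.

```python
import math


def cut_at_zero_crossing(samples, cut_index):
    """Truncate `samples` at the last zero-crossing at or before `cut_index`.

    A crossing at index k means samples[k - 1] and samples[k] have opposite
    signs, or samples[k] is exactly zero. Returns samples[:k], so the kept
    segment ends just before the crossing; returns [] if none is found.
    """
    start = min(cut_index, len(samples) - 1)
    for k in range(start, 0, -1):
        prev, cur = samples[k - 1], samples[k]
        if cur == 0 or prev * cur < 0:
            return samples[:k]
    return []


FS = 20_000  # sampling rate reported in the text (20 kHz)
F0 = 200.0   # hypothetical fundamental frequency of a synthetic "vowel"
PERIOD = int(FS / F0)  # 100 samples per cycle at these settings

# Ten cycles of a phase-shifted sine stand in for a digitized vowel.
vowel = [math.sin(2 * math.pi * F0 * n / FS + 0.3) for n in range(10 * PERIOD)]

# Propose a cut inside the last cycle, then snap back to a crossing.
edited = cut_at_zero_crossing(vowel, len(vowel) - PERIOD + 37)
```

Because the kept segment ends on a near-zero sample, the abrupt end of the edited waveform carries little energy, which is the click-avoidance rationale the text gives.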
CV Syllable and Vowel Duration Measurement Reliability
Intrajudge reliability of the durational measurements was assessed by remeasuring the entire data corpus of two randomly selected speakers using the procedures described above. The intrajudge reliability coefficients for the CV and V segment duration measurements were .986 and .981, respectively, for the speech alone condition, and .989 and .972 for the SC condition. The average magnitude differences between the original and replicate measurements of the CV and V segments were 1.22 ms and 2.53 ms, respectively, for the speech alone condition, and .025 ms and 4.48 ms for the SC condition.
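The two reliability statistics above can be computed as sketched below. The text does not name the type of reliability coefficient, so this sketch assumes a Pearson product-moment correlation between original and replicate measurements; the duration values are hypothetical and are not the study's data.

```python
import math


def pearson_r(x, y):
    """Pearson product-moment correlation between two equal-length lists."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sxx = sum((a - mx) ** 2 for a in x)
    syy = sum((b - my) ** 2 for b in y)
    return sxy / math.sqrt(sxx * syy)


def mean_abs_difference(x, y):
    """Average magnitude of original-minus-replicate differences (input units)."""
    return sum(abs(a - b) for a, b in zip(x, y)) / len(x)


# Hypothetical vowel durations in ms (NOT the study's data).
original = [250.0, 262.5, 301.0, 198.4, 220.0]
replicate = [251.5, 260.0, 303.0, 197.0, 221.0]

r = pearson_r(original, replicate)
mad = mean_abs_difference(original, replicate)
```

The two numbers answer different questions: r reflects whether the remeasurements preserve the rank and spread of the originals, while the mean absolute difference expresses the typical disagreement in milliseconds.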
Listeners
Twenty undergraduate students in communicative disorders served as listeners. All listeners passed a hearing screening at 20 dB HL (American National Standards Institute, 1989) at .5, 1, and 2 kHz; spoke English as their first language; and were unaware of the nature of the recordings they would hear.
Listening Procedures
The 20 listeners were assembled into five separate listening groups in a sound-attenuating room. The recordings of the speakers were played to the listeners, who were seated in groups of five in chairs arranged along an arc 2 meters from the center of the playback speaker. Individual response sheets were made for each speaker that listed, in pairs, each experimental word and its foil (e.g., bit/bid) in the order read by that speaker. The word actually read by the speaker was placed randomly in either the first or second position of the pair. The words were presented in the sound field with the peak vowel intensity of each sample set at approximately 70 dB SPL (C-scale of a B & K 2204 sound level meter positioned 2 meters in front of the loudspeaker) by adjusting beforehand the input recording level of the tape recorder that received the CSL D-to-A signal. Listeners were instructed to circle one of the words in each pair. The percentage of correct word recognition (i.e., correct perception of final consonant voicing) was computed for both the speech alone and SC conditions.
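The scoring step can be sketched as pooling each listener's circled responses into a percent-correct score per speaker and condition. The tuple layout, the helper name `percent_correct_by_speaker`, and the simulated 75% hit rate are hypothetical; the design sizes (12 words, 2 conditions, 10 speakers, 20 listeners) come from the text.

```python
import random
from collections import defaultdict


def percent_correct_by_speaker(judgments):
    """Percent correct per (speaker, condition), pooled over words and listeners.

    `judgments` is an iterable of (speaker, condition, is_correct) tuples,
    one tuple per listener response.
    """
    totals = defaultdict(int)
    hits = defaultdict(int)
    for speaker, condition, is_correct in judgments:
        totals[(speaker, condition)] += 1
        hits[(speaker, condition)] += int(is_correct)
    return {key: 100.0 * hits[key] / totals[key] for key in totals}


N_WORDS, N_SPEAKERS, N_LISTENERS = 12, 10, 20
CONDITIONS = ("speech alone", "SC")

# Simulated responses with a hypothetical 75% hit rate (NOT the study's data).
rng = random.Random(0)
judgments = [
    (speaker, condition, rng.random() < 0.75)
    for speaker in range(N_SPEAKERS)
    for condition in CONDITIONS
    for _word in range(N_WORDS)
    for _listener in range(N_LISTENERS)
]
scores = percent_correct_by_speaker(judgments)
```

The full design yields 12 × 2 × 10 × 20 = 4,800 responses, and the resulting 20 scores form 10 matched speech alone/SC pairs, one pair per speaker, which is the structure a paired comparison requires.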
RESULTS
Syllable and Vowel Durations
Table 1 shows the average edited CV syllable durations, the average vowel durations, and the voiceless-to-voiced vowel duration ratios for both speaking conditions. For the speech alone condition, the edited CV syllables from words that had ended with final voiced consonants were on average 36.10 ms longer than those that had ended with final voiceless consonants, and the vowel preceding the voiced consonant was 60.20 ms longer than the vowel preceding the voiceless consonant. For the SC condition, the edited CV syllables from words that had ended with final voiced consonants were on average 82.40 ms longer than those that had ended with final voiceless consonants, and the vowel preceding the voiced consonant was 67.50 ms longer than the vowel preceding the voiceless consonant. The vowel duration ratio (voiceless/voiced) was .77 for the speech alone condition and .79 for the SC condition.

Table 1. Means and Standard Deviations (in Parentheses) of the CV Syllable Durations and Vowel Durations for Words Ending with Voiced and Voiceless Consonants for Speech Produced during the Speech Alone and Simultaneous Communication (SC) Conditions

                      CV syllable duration (ms)     Vowel duration (ms)           Vowel duration ratio
Speaking condition    Voiced         Voiceless      Voiced         Voiceless      (voiceless/voiced)
Speech alone          320.4 (37.7)   284.3 (33.3)   260.4 (36.5)   200.2 (30.6)   0.77
SC                    397.6 (49.4)   315.2 (43.5)   321.6 (15.5)   254.1 (37.6)   0.79

Note. Vowel duration ratios (voiceless/voiced) for both experimental conditions are shown in the right column.
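The differences and ratios reported above follow directly from the Table 1 means, and the short Python check below recomputes them. The dictionary layout and the `summarize` helper are illustrative choices, not part of the study; the numbers are the Table 1 means.

```python
# Table 1 means (ms) as (voiced, voiceless) pairs for each measure and condition.
TABLE1 = {
    "speech alone": {"cv": (320.4, 284.3), "vowel": (260.4, 200.2)},
    "SC": {"cv": (397.6, 315.2), "vowel": (321.6, 254.1)},
}


def summarize(row):
    """Voiced-minus-voiceless differences and the voiceless/voiced vowel ratio."""
    voiced_cv, voiceless_cv = row["cv"]
    voiced_v, voiceless_v = row["vowel"]
    return {
        "cv_difference_ms": round(voiced_cv - voiceless_cv, 2),
        "vowel_difference_ms": round(voiced_v - voiceless_v, 2),
        "vowel_ratio": round(voiceless_v / voiced_v, 2),  # voiceless / voiced
    }


for condition, row in TABLE1.items():
    print(condition, summarize(row))
```

The computed values (36.1 and 82.4 ms for the CV syllables, 60.2 and 67.5 ms for the vowels, ratios of 0.77 and 0.79) match those in the text. Although the absolute differences grow under SC, the ratio barely moves because both vowel categories lengthen roughly in proportion.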
Listener Judgments
The listeners made 4,800 judgments in this study (12 experimental words × 2 conditions × 10 speakers × 20 listeners). For each speaker in each condition, an average percentage of correct final consonant recognition was computed across all listeners. Listeners correctly identified the voicing characteristic of the deleted final consonant for 77.5% of the edited CV syllables produced during the speech alone condition and for 74.88% of those produced during the SC condition. A paired t-test indicated that this difference was not significant (t(9) = −1.04, p = .323).
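The paired t-test compares the 10 per-speaker percentages across the two conditions, giving n − 1 = 9 degrees of freedom. The sketch below computes the statistic from first principles; the helper names are hypothetical, and the critical value 2.262 is the standard two-tailed table value for α = .05 with 9 df. The sign of t depends only on which condition is passed first.

```python
import math

T_CRIT_DF9 = 2.262  # two-tailed critical t for alpha = .05 with 9 degrees of freedom


def paired_t(a, b):
    """Paired-samples t statistic (mean of a - b over its standard error) and df."""
    if len(a) != len(b) or len(a) < 2:
        raise ValueError("need two matched lists with at least two pairs")
    diffs = [x - y for x, y in zip(a, b)]
    n = len(diffs)
    mean_d = sum(diffs) / n
    sd_d = math.sqrt(sum((d - mean_d) ** 2 for d in diffs) / (n - 1))
    return mean_d / (sd_d / math.sqrt(n)), n - 1


def significant_at_05_df9(t):
    """Compare |t| with the tabled critical value; valid only for df = 9."""
    return abs(t) > T_CRIT_DF9


# The statistic reported in the text, t(9) = -1.04, falls short of the critical value.
reported_is_significant = significant_at_05_df9(-1.04)
```

Applied to per-speaker scores such as those produced by the percent-correct sketch, `paired_t(sc_scores, speech_alone_scores)` returns a negative t whenever SC accuracy is lower on average. A library routine would also supply the exact p value, which this sketch does not compute.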
DISCUSSION
The results of the present investigation demonstrated that perception of final consonant voicing was not impaired by the durational changes accompanying the typically slower speech of SC produced by inexperienced signers. These results are consistent with the findings of Metz et al. (in press) regarding the perception of final consonant voicing in speech produced by experienced signers. A comparison of Table 1 of this article with Table 2 of Metz et al. (in press) reveals that the inexperienced signers exhibited longer CV syllable and vowel durations than the experienced signers in both experimental conditions. Inspection of these tables further reveals that the ratios of the vowel durations preceding voiced and voiceless consonants were approximately the same in both studies for both the SC and speech alone conditions. Thus, although speech was slowed during SC in both studies, the relative durations of vowels preceding voiced and voiceless consonants were unaffected, allowing preceding vowel duration to remain intact as an acoustic cue for final consonant perception. The present study and the Metz et al. (in press) study addressed only one specific perceptual task, perception of final consonant voicing, and this task was somewhat difficult for listeners, as indicated by the 25% to 30% error rates of our listeners in both studies. This difficulty is important because the experimental method isolated a redundant acoustic cue to consonant identification (i.e., vowel duration) that is carried by the preceding vowel; the results therefore indicate that altered speech rate alone did not produce deleterious carry-over effects on this important acoustic redundancy. Although some authors have criticized SC for the speech rate changes evidenced in this mode (Huntington & Watton, 1984), this is the second study to address a specific perceptual consequence of the rate alteration. As Metz et al. (in press) stated previously, if the speech rate changes typically found in SC did have deleterious perceptual effects, then serious reconsideration of the continued use of SC would be necessary. This study, once again, addressed only one of many possible perceptual consequences of slowed speech rate during SC and found no deleterious perceptual consequence. More research is needed to examine other perceptual consequences of speech produced during SC in order to determine its viability as a method for communicating with young deaf and hard of hearing persons.
A portion of this research was conducted at the National Technical Institute for the Deaf in the course of an agreement between the Rochester Institute of Technology and the United States Department of Education. Part of this research was supported by funds from the Geneseo Foundation provided through the Research Council. We are grateful to Dr. Bruce Godsave of the School of Education, State University of New York at Geneseo, and the students in his introductory sign language class for their participation in this experiment.
REFERENCES
Akamatsu, C.T., & Stewart, D.A. (1989). The role of fingerspelling in simultaneous communication. Sign Language Studies, 65, 361–374.
American National Standards Institute. (1989). Specifications for Audiometers. New York, NY: ANSI.
Bellugi, U., & Fischer, S. (1972). A comparison of sign language and spoken language. Cognition, 1, 173–200.
Chen, M. (1970). Vowel length variation as a function of the voicing of the consonant environment. Phonetica, 22, 129–159.
Hillenbrand, J., Ingrisano, D.R., Smith, B.L., & Flege, J.E. (1984). Perception of the voiced-voiceless contrast in syllable-final stops. Journal of the Acoustical Society of America, 76, 18–26.
Hogan, J.T., & Rozsypal, A.J. (1980). Evaluation of vowel duration as a cue for the voicing distinction in the following word-final consonants. Journal of the Acoustical Society of America, 67, 1764–1771.
Huntington, A., & Watton, F. (1984). Language and interaction in the education of hearing-impaired children (Part 2). Journal of the British Association of Teachers of the Deaf, 8(5), 137–144.
Kent, R.D., & Read, C. (1992). The acoustic analysis of speech. San Diego, CA: Singular.
Marmor, G.S., & Petitto, L. (1979). Simultaneous communication in the classroom: How well is English grammar represented? Sign Language Studies, 23, 99–136.
Maxwell, M., & Bernstein, M.E. (1985). The synergy of sign and speech in simultaneous communication. Applied Psycholinguistics, 6, 63–81.
Metz, D.E., Schiavetti, N., Lessler, A., Lawe, Y., Whitehead, R.L., & Whitehead, B. (in press). Production and perception of speech produced during simultaneous communication. Journal of Communication Disorders, 30, 495–505.
Raphael, L. (1972). Preceding vowel duration as a cue to the perception of the voicing characteristics of word-final consonants in English. Journal of the Acoustical Society of America, 51, 1296–1303.
Schiavetti, N., Whitehead, R.L., Metz, D.E., Whitehead, B.H., & Mignerey, M. (1996). Voice onset time in speech produced during simultaneous communication. Journal of Speech and Hearing Research, 38, 565–572.
Strong, M., & Charlson, E.S. (1987). Simultaneous communication: Are teachers attempting an impossible task? American Annals of the Deaf, 132, 376–382.
Wardrip-Fruin, C. (1982). On the status of phonetic cues to phonetic categories: Preceding vowel duration as a cue to voicing in final stop consonants. Journal of the Acoustical Society of America, 71, 187–195.
Whitehead, R.L., Schiavetti, N., Whitehead, B.H., & Metz, D.E. (1995). Temporal characteristics of speech produced during simultaneous communication. Journal of Speech and Hearing Research, 38, 1014–1024.
Whitehead, R.L., Schiavetti, N., Metz, D.E., & Farrell, T. (1997). Simultaneous communication in beginning signers—Part I: Temporal characteristics of speech. Paper presented at the Annual Convention of the Acoustical Society of America, San Diego, CA.
Windsor, J., & Fristoe, M. (1989). Key word signing: Listeners’ classification of signed and spoken narratives. Journal of Speech and Hearing Disorders, 54, 374–382.
Windsor, J., & Fristoe, M. (1991). Key word signing: Perceived and acoustic differences between signed and spoken narratives. Journal of Speech and Hearing Research, 34, 260–268.

Appendix A. Voicing Perception During Simultaneous Communication Stimulus Words

Voiced final consonant    Voiceless final consonant
Hid                       Hit
Bid                       Bit
Bad                       Bat
Had                       Hat
Bud                       But
Hud                       Hut
CONTINUING EDUCATION
Production and Perception of Final Consonant Voicing in Speech Produced by Inexperienced Signers During Simultaneous Communication
QUESTIONS
1. As used in this study, simultaneous communication (SC) involved:
a. Speech combined with American Sign Language
b. Speech combined with signed English
c. Speech combined with signed English and fingerspelling
d. Speech combined with key word signing
e. Speech combined with fingerspelling
2. Compared to experienced signers, inexperienced signers:
a. Greatly lengthen their vowel durations during SC
b. Greatly shorten their vowel durations during SC
c. Exhibit vowel durations that are comparable to those of experienced signers
d. Exhibit a high degree of variability in their vowel durations
e. None of the above
3. In the present article, final consonant voicing cues were removed by:
a. Mixing speech with masking noise
b. Low-pass filtering speech
c. Digitally editing speech
d. a and b
e. b and c
4. The results reported in this article indicate that:
a. Final consonant voicing was perceived better in speech alone than in SC
b. Final consonant voicing was perceived better in SC than in speech alone
c. Final consonant voicing perception was essentially the same in SC and in speech alone
d. Final consonant voicing was not perceived in speech alone
e. Final consonant voicing was not perceived in SC
5. The results reported in this article indicate that for inexperienced signers:
a. Vowel duration ratios preceding voiceless/voiced consonants were longer in SC than in speech alone
b. Vowel duration ratios preceding voiceless/voiced consonants were longer in speech alone than in SC
c. Vowel duration ratios preceding voiceless/voiced consonants were the same in SC and in speech alone
d. Vowel duration ratios preceding voiceless/voiced consonants could not be calculated in SC
e. Vowel duration ratios preceding voiceless/voiced consonants could not be calculated in speech alone