Neuropsychologia, 1971, Vol. 9, pp. 425 to 430. Pergamon Press. Printed in England

LATERALITY EFFECTS IN VOICE RECOGNITION*

D. G. DOEHRING and B. N. BARTHOLOMEUS

McGill University, Montreal, Canada

* This research was supported by Grant MA-1652 from the Medical Research Council of Canada and National Health Grant 604-7-729 from the Canadian Department of National Health and Welfare.

(Received 21 July 1971)

Abstract—Dichotic voice recognition was assessed in three groups of 16 right-handed subjects by a procedure in which a sample voice was presented binaurally and the subject was required to indicate which of two dichotically-presented voices was that of the sample speaker. Consonants, vowels, nonsense syllables, and words were used as speech stimuli. Significant right-ear superiority occurred for subjects who responded by naming the ear in which the matching speaker was heard and for subjects who circled the stimulus spoken by the matching speaker on a response sheet, but no significant ear difference was found for subjects who orally repeated the utterance of the matching speaker. Overall accuracy of recognition tended to be greatest with words and nonsense syllables and least with vowels and consonants as speech stimuli. There was no significant interaction between ears and speech stimuli. It was concluded that the right-ear superiority in voice recognition by ear naming and visual choice responses is associated with left-hemisphere dominance for voice recognition.

1. INTRODUCTION

PREVIOUS investigations involving dichotic stimulus presentation have indicated that perception of consonants [1], digits, words, and nonsense syllables [2], and backward speech sounds [3] is mediated primarily by the left or dominant hemisphere, whereas music [2] and environmental sounds [4] are processed primarily by the right or nondominant hemisphere. Left-hemisphere superiority in such studies appears to be associated with the decoding of acoustic patterns into sets of linguistic features [1] rather than with such variables as conceptual content, meaningfulness, and familiarity [3].

Voice recognition or speaker identification has received little attention from investigators of cerebral lateralization. Since vocal utterances normally consist of speech sounds, voice perception might be a verbal ability mediated primarily by the left hemisphere. However, speaker recognition involves non-linguistic parameters such as speed, pitch, and nasality [5], and could conceivably be a nonverbal ability analogous to face recognition, which is impaired by right-hemisphere lesions [6-8].

Previous observations lead to conflicting conclusions regarding the cerebral lateralization of voice recognition. It has been noted that voice recognition usually serves as the basic mode of compensation for prosopagnosia (impaired face recognition) resulting from right-hemisphere lesions [9-11]. This would imply that voice recognition is spared by right-hemisphere lesions that impair face recognition. However, GRENIER [12] tested patients who did not differ markedly in face recognition and found that voice recognition was more impaired by right-hemisphere lesions than by left-hemisphere lesions.


In an investigation of voice recognition in normal adults, DOEHRING and ROSS [13] tested each ear separately by a matching-to-sample procedure in which the subject was required to indicate which of a sequence of three voices speaking a nonsense syllable matched the speaker of a sample vowel. For subjects who made spoken responses, a trend towards higher right-ear scores was observed, but the ear difference did not reach statistical significance.

In the present study, cerebral lateralization of voice recognition was investigated by a dichotic matching-to-sample procedure in which a sample voice was presented binaurally and the subject was required to indicate which of two dichotically-presented voices was that of the sample speaker. Three modes of response and four types of verbal stimuli were employed, and the effect of voice familiarity was also examined.

2. METHOD

2.1. Subjects

Subjects were three groups of 16 right-handed young adults with normal hearing for pure tones. Groups 1 (ear-naming response) and 2 (spoken repetition response) contained 10 males and 6 females, and Group 3 (visual choice response) contained 8 males and 8 females. The mean ages of the three groups were 21.8, 21.8, and 20.6 yr, respectively.

2.2. Speakers and speech stimuli

Pairs of binaurally-presented tape-recorded digits were prepared in which the voice of each of seven female speakers was paired with itself three times and with the voice of each of six other speakers once. On the basis of "same" or "different" responses to the tape by five judges, six of the speakers whose voices were identified with over 90 per cent accuracy were selected as distinguishable speakers for the dichotic voice recognition test.

The four types of stimuli used were vowels, consonants combined with the vowel /a/, nonsense syllables, and meaningful words. The vowels were three front (/i/, /e/, /æ/) and three back (/o/, /u/, /a/) vowels, and the consonants were one nasal /n/, one lateral /l/, two plosive /b/, /k/ and two fricative /v/, /s/ consonants. The six nonsense syllables /bʌp/, /nɪk/, /giz/, /fɛt/, /kæs/ and /dæd/ were of intermediate association value [14] and differed in the initial, medial and final phonemes, as did the six meaningful words (gaze, beam, lick, sun, fed, and pat).

2.3. Voice recognition test

Voice recognition was assessed on four 36-trial sub-tests where each trial consisted of the binaural presentation of a sample voice followed three seconds later by the dichotic presentation of two voices, one of which was the sample speaker. The intertrial interval was six seconds. On each sub-test, the speech stimuli and speakers were randomized, with the restriction that each of the six speech stimuli was presented as a sample six times, each time spoken by a different speaker. The dichotic stimuli were selected randomly with the restriction that each pair would be composed of two different stimuli spoken by two different voices. On about 17 per cent of the trials, one of the dichotic stimuli was the same as the sample stimulus. Selection of the ear to which the matching voice would be presented on each trial was made according to sequences devised by FELLOWS [15], with the restriction that the matching voice was presented 18 times to each ear.

In recording the sub-tests, speech stimuli were spoken into two Sony F-25 microphones located inside a sound-treated room, and recorded on a Sony TC-252 stereophonic tape recorder outside the room.
All stimuli spoken by one speaker were recorded on the appropriate channels at their pre-determined places in the sub-tests before proceeding to the next speaker. The recording level was adjusted to maintain a constant reading on the VU meter for all speakers. Synchronization of dichotic sounds and timing of intervals between binaural and dichotic sounds and between trials was achieved by means of an indicator light activated by inaudible pulses that had been pre-recorded on the tape. Onset of the last of a series of one-second lights served as a ready signal, with stimuli spoken as soon as the light went out. Each speaker was given a script indicating the required speech stimuli and the number of illuminations of the indicator light before each stimulus was to be spoken. A binaural recording was also made of each speaker reading a 36-word prose passage, six vowels, six consonants, six nonsense words, and six meaningful words, none of which was the same as any of the test stimuli.

2.4. Response requirements and familiarization training

Different modes of response were required of the three experimental groups, in an effort to vary the amount of attention to the explicitly verbal content of the speech stimuli. Subjects of Group 1 named the ear at which the matching voice was heard; subjects of Group 2 repeated the stimulus spoken by the matching voice; and subjects of Group 3 circled the speech stimulus spoken by the matching voice on a response sheet on which the matching and non-matching dichotic stimuli had been randomly assigned to the left and right positions on the page. Presumably the greatest attention to verbal content was required of Group 2 and the least attention was required of Group 1.
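Returning briefly to the trial structure of Section 2.3: its constraints fully determine the shape of a sub-test, and it can be helpful to see them written out procedurally. The following is a minimal sketch in Python; the names are hypothetical, and a simple shuffled ear assignment stands in for the FELLOWS [15] sequences used in the actual study.

```python
import random

STIMULI = ["gaze", "beam", "lick", "sun", "fed", "pat"]  # one sub-test's six stimuli
SPEAKERS = list(range(6))                                # six distinguishable voices

def build_subtest(seed=0):
    """Generate one 36-trial sub-test under the constraints of Section 2.3."""
    rng = random.Random(seed)
    # Each stimulus serves as the sample six times, each time spoken by a
    # different speaker: all 36 (stimulus, speaker) combinations, shuffled.
    samples = [(stim, spk) for stim in STIMULI for spk in SPEAKERS]
    rng.shuffle(samples)
    # The matching voice is presented 18 times to each ear (shuffled here
    # in place of the Fellows sequences).
    ears = ["right"] * 18 + ["left"] * 18
    rng.shuffle(ears)

    trials = []
    for (sample_stim, sample_spk), match_ear in zip(samples, ears):
        # Matching channel: the sample speaker, saying any stimulus. A uniform
        # choice repeats the sample stimulus on about 1 trial in 6 (~17%),
        # matching the incidence reported in the text.
        match_stim = rng.choice(STIMULI)
        # Non-matching channel: a different stimulus spoken by a different voice.
        foil_stim = rng.choice([s for s in STIMULI if s != match_stim])
        foil_spk = rng.choice([v for v in SPEAKERS if v != sample_spk])
        foil_ear = "left" if match_ear == "right" else "right"
        trials.append({
            "sample": (sample_stim, sample_spk),
            match_ear: (match_stim, sample_spk),
            foil_ear: (foil_stim, foil_spk),
            "correct_ear": match_ear,
        })
    return trials

trials = build_subtest()
assert len(trials) == 36
assert sum(t["correct_ear"] == "right" for t in trials) == 18
```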


Half the subjects in Groups 1 and 2 were given preliminary familiarization training by binaural presentation of each speaker reading a prose passage, and the binaural presentation before each sub-test of each speaker reading six stimuli of the type included on the sub-test.

2.5. Testing procedure

All subjects were tested individually in a sound-treated room. Test stimuli were presented at a comfortable listening level through Echo H-606 stereophonic earphones. The experimenter remained in the room to instruct subjects before each sub-test, operate the tape recorder, and record the ear-naming and repetition responses. Testing was completed in one session lasting approximately 45 min. Within each group, 8 subjects received the sub-tests in the order Vowels, Consonants, Nonsense Syllables, and Words, and the remaining 8 subjects received the order Consonants, Vowels, Words, and Nonsense Syllables. As a control for possible differences in synchronization and recording level between the two channels of the tape recording, half of the 8 subjects who received each order of presentation wore the headset with a marked earphone on the right ear and the other half wore the marked earphone on the left ear, thus reversing the ears to which dichotic stimuli were presented.

3. RESULTS

The percentage of correctly identified voices is shown in Table 1 as a function of ears, groups, and types of speech stimuli, with differences between ears and combined results for the two ears also shown. Right-ear superiority occurred for all speech stimuli in Groups 1 (ear naming) and 3 (visual choice) and for vowels and consonants in Group 2 (spoken repetition). The largest right-ear differences tended to occur with the visual choice method of response.

Table 1. Mean percentage of correct voice identifications by the three groups as a function of ears and types of speech stimulus, with differences between ears (R−L) and combined results for the two ears (R+L) also shown

Group                         Ear   Vowels  Consonants  Nonsense syllables  Words  Combined
1. Ear naming (N=16)          R      74.0     75.0            81.6           81.3     78.0
                              L      72.2     68.1            74.3           78.8     73.4
                              R−L     1.8      6.9             7.3            2.5      4.6
                              R+L    73.1     71.6            78.0           80.1     75.7
2. Spoken repetition (N=16)   R      72.9     68.4            65.6           70.5     69.3
                              L      69.4     63.5            70.8           75.7     69.9
                              R−L     3.5      4.9            −5.2           −5.2     −0.6
                              R+L    71.2     66.0            68.2           73.1     69.6
3. Visual choice (N=16)       R      79.2     77.4            83.3           87.8     81.9
                              L      71.5     65.3            76.4           82.6     74.0
                              R−L     7.7     12.1             6.9            5.2      7.9
                              R+L    75.4     71.4            79.9           85.2     77.5
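Two of Table 1's columns are derived from the ear scores: R−L is the ear difference, and the "R+L" combined score reproduces the printed values when read as the mean of the two ear percentages (e.g. (74.0 + 72.2)/2 = 73.1). A short check in Python, using the Group 1 row as transcribed above:

```python
# Group 1 (ear naming) scores transcribed from Table 1.
right = {"vowels": 74.0, "consonants": 75.0, "nonsense syllables": 81.6, "words": 81.3}
left  = {"vowels": 72.2, "consonants": 68.1, "nonsense syllables": 74.3, "words": 78.8}

for stim in right:
    diff = right[stim] - left[stim]            # R - L column (ear advantage)
    combined = (right[stim] + left[stim]) / 2  # "R + L" column (mean of the ears)
    print(f"{stim:18s}  R-L = {diff:4.1f}  combined = {combined:5.2f}")
```

The output agrees with the printed R−L and R+L entries up to the one-decimal rounding used in the table.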

The relative magnitude of ear difference as a function of types of speech stimuli varied markedly among groups. For Group 1, the ear differences for consonants and nonsense syllables were about equally large, with much smaller differences for vowels and words. For Group 2 there was a moderate amount of right-ear superiority for vowels and consonants, but a larger amount of left-ear superiority for nonsense syllables and words. The largest amount of right-ear superiority observed in any of the groups occurred for consonants in Group 3, with smaller but still sizeable ear differences for the remaining three types of speech stimuli in this group.

Relative accuracy of recognition for combined ears as a function of speech stimuli was quite consistent among the three groups, with all three groups most accurate in voice recognition for words and least accurate for consonants. Overall accuracy was about the same in Groups 1 and 3. Group 2 was less accurate than the other two groups for each type of speech stimulus, probably because this response procedure did not involve a 0.50 probability of chance accuracy.

A separate analysis of variance was calculated for each group, in which the effects of ears, speech stimuli, and the interaction of ears and speech stimuli were assessed. For Group 1 (ear naming) there was a significant difference between ears [F(1, 15) = 5.61, p < 0.05] and among speech stimuli [F(3, 45) = 3.76, p < 0.025]. Further analysis of differences among speech stimuli by the Newman-Keuls Test revealed a significantly greater accuracy of recognition for words as compared with both vowels and consonants (p < 0.01), and for nonsense syllables as compared with vowels (p < 0.05) and consonants (p < 0.01). For Group 2 (spoken repetition) there were no significant ear or speech stimulus effects. For Group 3 (visual choice) there was a significant difference between ears [F(1, 15) = 11.26, p < 0.005] and among speech stimuli [F(3, 45) = 15.52, p < 0.005]. Further analysis by the Newman-Keuls Test revealed significantly greater accuracy of recognition for words as compared with the other three types of stimuli (p < 0.01), for nonsense syllables as compared with both consonants and vowels (p < 0.01), and for vowels as compared with consonants (p < 0.05). The interaction of ears and speech stimuli did not approach significance for any of the groups.

Familiarization training did not result in greater accuracy or larger ear differences. Analyses of variance of familiarization and ear effects were carried out for Groups 1 and 2. For neither group did the effect of familiarization or the interaction between familiarization and ears approach statistical significance.

For Group 2 (spoken repetition), an incorrect response could consist of something other than the stimulus spoken by the non-matching speaker. The most common incorrect response was, in fact, the stimulus spoken by the non-matching voice. This constituted about 90 per cent of the errors for vowels and consonants and 70 per cent of the errors for nonsense syllables and words. Many of the remaining errors involved the correct response distorted by a substitution or inclusion of one phoneme which was not presented to either ear. These partially distorted correct responses occurred at an approximately equal rate for each ear, and would not, therefore, have altered the pattern of observed ear differences had they been scored as correct responses.

4. DISCUSSION

Voice recognition by ear naming and visual choice responses was significantly more accurate in the right ear. This finding is clearly indicative of left-hemisphere superiority for voice perception. It might therefore be inferred that linguistic interpretation of speech stimuli is inherent in speaker identification. Such being the case, it is rather puzzling that significant right-ear superiority did not occur with oral repetition of the stimulus spoken by the matching speaker. The necessity for reproducing speech sounds that are not in themselves definitive of speaker identity may have interfered with the judgement of whatever parameters are critical for voice recognition, perhaps in the manner postulated by BROOKS [16].
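A note on the form of the statistics reported in the Results: with 16 subjects per group, an ear effect tested within subjects has 1 and 15 degrees of freedom, matching the reported F(1, 15) values. The sketch below shows the shape of such a one-within-factor analysis; the paper's raw data are not reported, so the per-subject scores here are illustrative only, and for two conditions this F is equivalent to the square of a paired t statistic.

```python
import numpy as np

def rm_anova_1way(scores: np.ndarray):
    """One-way repeated-measures ANOVA on a (subjects x conditions) array."""
    n, k = scores.shape
    grand = scores.mean()
    ss_cond = n * ((scores.mean(axis=0) - grand) ** 2).sum()   # between conditions
    ss_subj = k * ((scores.mean(axis=1) - grand) ** 2).sum()   # between subjects
    ss_total = ((scores - grand) ** 2).sum()
    ss_error = ss_total - ss_cond - ss_subj                    # condition x subject residual
    df_cond, df_error = k - 1, (k - 1) * (n - 1)
    return (ss_cond / df_cond) / (ss_error / df_error), df_cond, df_error

# Hypothetical per-subject ear scores for one group of 16 subjects
# (illustrative only; the study's raw data are not given in the paper).
rng = np.random.default_rng(1)
left = rng.normal(73, 8, size=16)
right = left + rng.normal(4.6, 6, size=16)   # built-in mean right-ear advantage
F, df1, df2 = rm_anova_1way(np.column_stack([right, left]))
print(f"F({df1}, {df2}) = {F:.2f}")          # same form as the reported F(1, 15)
```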
Overall accuracy of voice recognition was greater for words and nonsense syllables than for vowels and consonants, probably because of the larger number of phonemes sampled in the former [17]. The greater accuracy for words as compared with nonsense syllables and for vowels as compared with consonants cannot be as readily explained.


It should be noted, however, that the left ear made a larger contribution than the right ear to the differences in question. More specific information regarding the possible differential use of lateralized cues in voice recognition might be obtained by a more thorough investigation of ear differences relative to different types of speech stimuli.

The lack of effect of familiarization training on either accuracy or ear differences in voice recognition is consistent with the absence of practice effect found by DOEHRING and ROSS [13] on alternate forms of a voice recognition test, and also with KIMURA'S [2] finding that laterality effects in audition are unrelated to stimulus familiarity. It should be noted, however, that the dichotic tasks differed from normal speaker recognition in the sense that relative, rather than absolute, identifications were required.

The results of the present study suggest that voice recognition is a verbal ability primarily mediated by the left hemisphere in right-handed subjects, and are thus in accordance with the previously cited clinical reports that voice recognition is normal in patients with face recognition deficits associated with right-hemisphere lesions, and also with the neurophysiological model of auditory perception proposed by KONORSKI [18]. However, it is not clear to what extent the present findings are dependent on the particular stimuli selected, and whether right-ear superiority might also be observed with nonverbal vocalizations. Further research should be directed not only towards verification of hemispheric specialization for this type of auditory perception, but also towards gaining further insight into the factors that differentiate verbal and nonverbal abilities.

Acknowledgement—The writers wish to thank HARRIET EMERSON, SANDRA FINKELSTEIN and ROBERT KROLL for assistance during the study.

REFERENCES

1. STUDDERT-KENNEDY, M. and SHANKWEILER, D. Hemispheric specialization for speech perception. J. acoust. Soc. Am. 48, 579-594, 1970.
2. KIMURA, D. Functional asymmetry of the brain in dichotic listening. Cortex 3, 163-178, 1967.
3. KIMURA, D. and FOLB, S. Neural processing of backward speech sounds. Science 161, 395-396, 1968.
4. CURRY, F. K. W. A comparison of left-handed and right-handed subjects on verbal and nonverbal dichotic listening tasks. Cortex 3, 343-352, 1967.
5. STEVENS, K. N., WILLIAMS, C. E., CARBONELL, J. R. and WOODS, B. Speaker authentication and identification: A comparison of spectrographic and auditory presentations of speech material. J. acoust. Soc. Am. 44, 1596-1607, 1968.
6. WARRINGTON, E. K. and JAMES, M. An experimental investigation of facial recognition in patients with unilateral cerebral lesions. Cortex 3, 317-326, 1967.
7. BENTON, A. L. and VAN ALLEN, M. W. Impairment of facial recognition in patients with cerebral disease. Trans. Am. Neurol. Assoc. 93, 38-42, 1968.
8. DE RENZI, E., SCOTTI, G. and SPINNLER, H. Perceptual and associative disorders of visual recognition. Neurology 19, 634-642, 1969.
9. BORNSTEIN, B. and KIDRON, D. P. Prosopagnosia. J. Neurol. Neurosurg. Psychiat. 22, 124-131, 1959.
10. BEYN, E. S. and KNYAZEVA, G. R. The problem of prosopagnosia. J. Neurol. Neurosurg. Psychiat. 25, 154-163, 1962.
11. HECAEN, H. and ANGELERGUES, R. Agnosia for faces (prosopagnosia). Arch. Neurol. 7, 92-100, 1962.
12. GRENIER, D. La prosopagnosie et l'agnosie des voix. Unpublished M.A. Thesis, Université de Montréal, 1969.
13. DOEHRING, D. G. and ROSS, R. W. Voice recognition by matching to sample. J. Psycholing. Res., in press, 1971.
14. HILGARD, E. R. Methods and procedures in the study of learning. In Handbook of Experimental Psychology, S. S. STEVENS (Editor). John Wiley, New York, 1951.
15. FELLOWS, B. J. Chance stimulus sequences for discrimination tasks. Psychol. Bull. 67, 87-92, 1967.
16. BROOKS, L. R. Spatial and verbal components of the act of recall. Can. J. Psychol. 22, 349-368, 1968.
17. BRICKER, P. D. and PRUZANSKY, S. Effects of stimulus content and duration on talker identification. J. acoust. Soc. Am. 40, 1441-1449, 1966.
18. KONORSKI, J. Integrative Activity of the Brain, pp. 128-131. University of Chicago Press, Chicago, 1967.


Résumé—Dichotic voice recognition was studied in three groups of 16 right-handed subjects. The procedure consisted of first presenting a sample voice binaurally; the subject then had to indicate which of two subsequently presented voices was that of the sample speaker. The verbal stimuli comprised consonants, vowels, nonsense syllables and words. A significant right-ear superiority appeared in subjects whose response was to name the ear in which the sample speaker's voice was heard, as well as in subjects who had to respond by marking on the response sheet the stimulus spoken by the speaker whose voice was to be recognized. By contrast, no significant difference was found in subjects who had to repeat that stimulus orally. In general, recognition accuracy tended to be greater with words and nonsense syllables, and lower with vowels and consonants. No significant interaction between ears and speech stimuli was observed. It is concluded that the right-ear superiority in voice recognition with ear-naming or visual choice responses is associated with left-hemisphere dominance for voice recognition.

Zusammenfassung—Dichotic voice recognition was tested in three groups of 16 right-handed subjects. A sample voice was presented to both ears, and the subject was asked to pick out the sample voice from two dichotically presented voices. Consonants, vowels, nonsense syllables and words were used as stimuli. A significant right-ear superiority was found in recognizing a speaker's voice; no significant difference between the right and left ear was found when the speaker's words were repeated. Accuracy was greatest for words and nonsense syllables and least for vowels and consonants. There was no significant interaction between ears and speech stimuli. It was concluded that a right-ear superiority for voice recognition exists with ear naming and with visual response choice, which is consistent with left-hemisphere dominance for voice recognition.