BRAIN AND LANGUAGE 3, 209-228 (1976)

An Analysis of Speech Perception in Word Deafness
ELEANOR M. SAFFRAN, OSCAR S. M. MARIN, AND GRACE H. YENI-KOMSHIAN

Department of Neurology, Baltimore City Hospitals, and Departments of Neurology and Otolaryngology, The Johns Hopkins University School of Medicine, Baltimore, Maryland
A patient with a rather pure word deafness showed extreme suppression of right ear signals under dichotic conditions, suggesting that speech signals were being processed in the right hemisphere. Systematic errors in the identification and discrimination of natural and synthetic stop consonants further indicated that speech sounds were not being processed in the normal manner. Auditory comprehension improved considerably, however, when the range of speech stimuli was limited by contextual constraints. Possible implications for the mechanism of word deafness are discussed.
At some point in the processing of auditory speech signals, the operations performed by the two temporal lobes must begin to diverge. Evidence from dichotic listening experiments suggests that hemispheric specialization may begin with the extraction of phonetic information, and that this process is most likely lateralized on the left (Studdert-Kennedy & Shankweiler, 1970; Studdert-Kennedy, Shankweiler & Pisoni, 1973). We would therefore expect to find deficits of phonetic perception with injuries of the left temporal lobe, but in most of these cases any disturbances at the phonetic level are compounded by deficits at other levels of language function. The patient with "pure word deafness," or "auditory verbal agnosia," who is impaired only in the perception of speech sounds, is rare.¹ A dozen or so such cases have been reported in the literature, but the nature of the perceptual disturbance is not yet well understood. The disorder typically begins as (or progresses into) a Wernicke's aphasia, with difficulties in expressing as well as understanding both written and spoken language.
As expressive functions recover, the deficit becomes restricted to the perception of speech sounds. Reading ability is preserved and it is possible to communicate with the patient by writing. Hearing is often found normal upon audiometric examination and non-speech sounds can be recognized without difficulty. Classically, the syndrome has been attributed to a deep lesion in the left temporal lobe which is thought to isolate the auditory association area from primary auditory input (Lichtheim, 1885). The presumed locus of the lesion, in the region of Wernicke's area, readily accounts for the more general aphasic involvement at onset. The most straightforward explanation of the speech perception problem would seem to be that phonetic processing mechanisms are damaged or disengaged by a limited, focal lesion. However, bilateral lesions or peripheral sensory deficits have been found in some cases (Goldstein, 1974), and more general auditory deficiencies have been held responsible for the phonetic disturbance in others (Albert & Bear, 1974).

In recent years, psychological studies of speech perception have begun to provide some understanding of how speech sounds are processed by the human brain. The use of synthetic speech stimuli has made it possible to fractionate these complex and variable auditory signals and to identify acoustic parameters that are relevant for phonemic discrimination; dichotic listening techniques have been used to localize processing operations in one cerebral hemisphere or the other; and several theoretical approaches to speech perception have been advanced (see Studdert-Kennedy, 1974, for a review). As yet, there has been little attempt to relate these developments to central disorders of speech perception. This case study is an effort in that direction.

We gratefully acknowledge the use of synthetic speech stimuli originally recorded at the Haskins Laboratories. We also thank Drs. Myrna Schwartz and Bruce Hamill for helpful comments on earlier drafts of this paper. Dr. Yeni-Komshian was supported by NIH Research Grant No. 09994 from the National Institute of Neurological Diseases and Stroke. Address correspondence to Dr. Eleanor M. Saffran, Department of Neurology, Baltimore City Hospitals, 4940 Eastern Avenue, Baltimore, Maryland 21224.

¹ The syndrome is, in fact, so rare that some authors have questioned its existence (Marie, 1906).

CASE DESCRIPTION
The patient is a right-handed 37-year-old male with an eight year history of withdrawal seizures and peripheral neuropathy due to alcohol abuse. He suffered a massive pulmonary embolism one year prior to this study and continued to have episodes of thrombophlebitis which were only partially controlled by anticoagulation. Six months before the present admission the patient developed acute swelling and pain in the lower extremities. On this occasion, the peripheral problem was associated with the sudden onset of aphasic symptomatology, including poor comprehension and garbled speech. The language problem was not explored until the next hospital admission six months later, when the patient was admitted during a convulsive episode. At this point, a neurological examination was performed and was considered normal except for slight motor asymmetries suggestive of left hemisphere involvement and language difficulties described in more detail below. A brain scan showed a slightly increased
uptake in the sylvian area of the left side, but the left carotid angiogram and electroencephalogram were normal. The overall neurological picture was interpreted as compatible with an occlusive vascular lesion affecting the territory of the left middle cerebral artery.

When the language problem was evaluated about six months after onset, most of the expressive difficulties had cleared, leaving only a very mild dysnomia and occasional phonemic paraphasias. Reading was intact. Spelling was poor. Auditory comprehension and repetition were disproportionately impaired. The severity of the disturbance was evident in a speech perception test in which the patient had to identify familiar monosyllabic words, selected to reflect phoneme frequency in English; his performance on this simple multiple choice test was only 68% correct. Although acutely aware of his problem, the patient was unable to describe his auditory experience beyond the frequent complaint that speech sounds did not "register," or that they would not "come up." Once he made the intriguing remark that "it's as if there were a bypass somewhere, and my ears were not connected to my voice." Imitation of speech sounds was attempted only after repeated presentation or if the stimulus set had been constrained in some way. Conversation was also difficult for the patient unless he himself provided the topic or was otherwise informed what it was. Lip-reading facilitated comprehension and we therefore took care to prevent it in the experiments reported below.

The audiogram showed no evidence of peripheral hearing loss which could explain the perceptual deficit. Both ears were within the normal range, with the left ear relatively better in the frequency range above 4 kHz. Speech reception thresholds were also within the normal range, 20 db SPL in the right ear and 10 db in the left.

Comparable difficulty with nonspeech sounds could not be demonstrated. The patient identified a variety of recorded sounds, including musical instruments and environmental noises such as typing, keys jingling, telephones ringing, etc. He could determine the gender of a recorded voice and whether the language spoken was English or "foreign."

Attempts to assess musical abilities were largely unsuccessful. The patient denied any knowledge of music, but was nonetheless able to identify a few familiar melodies and the sounds of many instruments. He did well on the rhythm test of the Seashore Measure of Musical Talents, a task which involved a simple same-different judgement; other tests in the Seashore battery demand more difficult discriminative responses (e.g., whether a pitch is higher or lower than the previous one) and the patient would not cooperate in performing them.
In brief, then, these results are consistent with the diagnosis of a relatively pure word deafness, with only minimal involvement of other language functions, as a consequence of embolic infarction of the left temporal lobe. The perceptual disorder was further explored in three directions: (1) dichotic listening tests to examine the lateralization of speech perception mechanisms; (2) systematic studies of phonemic identification and discrimination; and (3) an examination of contextual effects in word recognition.

DICHOTIC LISTENING STUDIES

Although both temporal lobes receive input from both ears, there is evidence that the contralateral pathway is more effective. Perhaps the best demonstration comes from dichotic experiments in patients with sections of the corpus callosum. Under dichotic conditions, the split brain patient is usually able to report only the right ear stimulus, although he can report left ear signals when they are presented monaurally (Milner, Taylor & Sperry, 1968; Sparks & Geschwind, 1968). Comparable extinction effects have been observed in hemispherectomy patients (Cullen, Thompson, Hughes, Berlin & Samson, 1974). While these data suggest that the ipsilateral pathway is relatively ineffective under dichotic conditions, normal subjects are nevertheless able to perceive both stimuli when listening dichotically, with one ear accorded only a slight advantage. It has therefore been proposed that the callosal pathway is the primary source of ipsilateral input under dichotic conditions (Sparks & Geschwind, 1968).

With a few more assumptions, the dichotic listening model can be applied to word deafness. If the left para-acoustic area has been deprived of both direct and transcallosal auditory input (Lichtheim, 1885), processing of incoming signals should be restricted to the right temporal lobe. There, ipsilateral right ear signals will presumably be suppressed by competing input to the left ear; and, because there are no direct connections between primary sensory areas (Geschwind, 1965), and the left auditory association cortex is essentially deafferented by the lesion, callosal input from the right ear will also be relatively ineffective. The word deaf patient should therefore show a very strong left ear effect under dichotic conditions. Note that while there is a general tendency for the normal right ear advantage for verbal materials to diminish with left hemisphere lesions (Schulhoff & Goodglass, 1969; Zurif & Ramier, 1972), we are here predicting a more specific and more powerful effect: that like the split brain and hemispherectomy patients, the word deaf patient should have difficulty perceiving the right ear stimulus at all under dichotic competition.
METHODS

The patient performed two dichotic tasks. One, the Dichotic Names Test, consisted of fifty pairs drawn from a list of twelve monosyllabic names: Ben, Bill, Bob, Chuck, Dick, Doug, Jack, Ken, Pat, Ted, Tim and Tom. Names sharing the same consonant or vowel were not paired, so that the stimulus presented to one ear was maximally different from the stimulus to the other ear. Since our concern here was with ear differences, the patient was familiarized with the stimuli prior to testing to assure a reasonably high level of monaural performance. It will be recalled that audiometric testing had indicated that the patient's speech reception threshold was 20 db for the right ear and 10 db for the left. We tried to correct for this discrepancy by raising the intensity of the right ear stimuli relative to the left ear stimuli. Thus the Dichotic Names Test was given at several different intensity combinations. The same test was also given monaurally at 65 db SPL. Dichotic testing was carried out in a sound-treated room, where the stimuli were presented through stereophonic headphones. For the dichotic test, the patient was given an answer sheet which listed four choices for each trial. He was informed that two different stimuli were present on each trial and was urged to try to underline two names on the answer sheet.

The second dichotic task, designed to examine vowel differences, was composed of dichotic pairs of monosyllabic English words. The discrepancy between the two stimuli was only in the vocalic portion of the syllable (e.g., "boot" vs. "bait"), and there is generally no ear advantage under dichotic listening conditions in which no masking noise is present (Weiss & House, 1973). The patient was given 53 dichotic pairs and was instructed to underline two choices out of five possibilities, e.g., "boot, boat, bait, beat, bat". The intensity of the dichotic stimuli was about 70 db SPL in each ear.
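The relative sensation levels used in reporting the dichotic results below follow directly from these thresholds. As a rough illustration (a minimal sketch in Python; the names are ours and are not part of the original study), the sensation level for each ear is simply the presentation level in db SPL minus that ear's speech reception threshold.

    # A minimal sketch of the sensation-level arithmetic, assuming the speech
    # reception thresholds reported above (20 db SPL right ear, 10 db left ear).
    # The function and variable names are illustrative, not from the original study.

    THRESHOLD_DB_SPL = {"right": 20, "left": 10}

    def sensation_level(ear, presentation_db_spl):
        """Sensation level = presentation level (db SPL) minus the ear's threshold."""
        return presentation_db_spl - THRESHOLD_DB_SPL[ear]

    # Example: the R = 85, L = 65 condition of the Dichotic Names Test.
    right_sl = sensation_level("right", 85)   # 65 db above threshold
    left_sl = sensation_level("left", 65)     # 55 db above threshold
    print(right_sl - left_sl)                 # 10  ->  "R > L, 10 db" in Table 1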
RESULTS AND DISCUSSION
On the Dichotic Names Test, the patient was 100% correct when the stimuli were presented monaurally to the right and left ears. On dichotic stimulation, however, the patient essentially reported only the stimuli presented to the left ear, even when the sensation level in the right ear was presumably 10 db higher (Table 1).² The patient would acknowledge hearing only a single stimulus and could not be persuaded to indicate a second choice on the answer sheet. When the discrepancy between the two stimuli was as much as 30 db (20 db in sensation level), the patient began to indicate that he was now aware of a second stimulus which he could not identify, and complained that this "noise" was interfering with the task. Even in this case the left ear scores are almost twice as high as the right ear scores. It is interesting to note the close parallel between the right ear scores and the error scores; that is, as the right ear intensity is increased, the error scores go up.

TABLE 1
RESULTS OF THE DICHOTIC NAMES TEST

Percent in each response category (relative intensities in db SPL)

Stimulus level       Sensation level     Right ear    Left ear    Error
R = 70, L = 65       R < L, 5 db              8           84          8
R = 75, L = 65       R = L                    0           92          8
R = 85, L = 65       R > L, 10 db             8           82         10
R = 95, L = 65       R > L, 20 db            24           48         24

The results on the vowel test are similar, namely, a strong left ear advantage and an apparent lack of awareness that two different stimuli were present. The percentage in each response category was: left ear, 70%; right ear, 9%; errors, 21%.

As predicted, therefore, we find almost total suppression of the right ear stimulus under dichotic conditions, although the patient is able to perceive the right ear signal monaurally. A similar finding in another case of word deafness has recently been reported by Albert and Bear (1974). These results suggest that word deaf patients process speech sounds largely in the right hemisphere. With respect to current theories of speech perception, this finding has some interesting implications.

A distinction has been made between "auditory" and "phonetic" stages in the processing of speech sounds (Studdert-Kennedy, 1974). The auditory process extracts characteristics like pitch and intensity from speech and non-speech sounds alike; when the stimulus is verbal, this information is further subjected to a phonetic analysis in which portions of the signal are identified as particular phonetic features. While the line between auditory and phonetic processes is still to be drawn, there is evidence that phonetic mechanisms are localized largely, if not entirely, in the left hemisphere (Studdert-Kennedy, Shankweiler & Pisoni, 1973; Wood, 1975). If so, and if our interpretation of the dichotic data is correct, it is possible that the aberrant speech perception in word deafness reflects an auditory analysis performed by the right hemisphere, rather than some malfunction of the phonetic processor in the left: that is, that speech perception has been arrested at a pre-phonetic level.

² It is interesting that the patient remained unaware of the right ear stimulus even when it was his own name (Ben).

PERCEPTION OF PHONEMES
To get a more systematic picture of the disturbance, we looked at the patient’s perception of a limited class of speech sounds. Since most of the recent work on auditory speech perception has focused on the stop consonants /b p d t g k/, we concentrated our efforts on this group of phonemes.
In articulatory terms, the stops can be described by combinations of two parameters or "distinctive features" (Table 2): place of articulation (labial, alveolar or velar) and voicing (vocal cords vibrating or silent). Acoustic correlates of these features have been determined by experimentation with synthetic speech signals (Liberman, Cooper, Shankweiler, & Studdert-Kennedy, 1967). The fact that perceptual errors and biases occurring under a variety of experimental conditions can be explained in terms of phonetic features argues strongly in favor of their "psychological reality"; that is, that they are extracted from the speech signal during the perceptual process (e.g., Blumstein, 1974).

In attempting to describe the patient's perceptual problem within this framework, we were especially concerned with the following kinds of questions: How does the patient identify and discriminate phonemes? Can his errors be explained in terms of distinctive feature relationships? Will he perceive the stop consonants "categorically," as normal subjects do, or will he be more sensitive to the acoustic variability within phonemic categories? These questions were addressed in a series of experiments on the identification and discrimination of stop consonants, using both natural and synthetic speech sounds.

IDENTIFICATION OF NATURAL STOP CONSONANTS
In the first set of experiments, the patient was asked to identify stop consonants which appeared as initial phonemes in CV or CVC syllables with the vowels /i e ɛ a ɔ o u/. This information would yield evidence of any systematic perceptual confusion.

METHOD

Since a written response format was used, the syllables were chosen so that the vowel sound could always be clearly represented in the orthography. Thus it was necessary to use a terminal consonant in the /ɛ/ series (e.g., "ben"). For the other vowels, CV syllables were sufficient (e.g., for /b/, "bee, bay, ba, baw, bo, boo"). In most instances, the test consisted of 30 trials, with each of the six stops represented five times, in random order, in combination with the same vowel. The patient was seated in a quiet room directly behind the experimenter. He was given an answer sheet listing the six syllable choices (e.g., "bay, pay, day, tay, gay, kay") and was instructed to underline the syllable heard on each trial. The stimuli were clearly enunciated in a normal speaking voice by a female native speaker of English. The /a/ series was given six times over a three month period, the /o/ series three times, and the others twice.

TABLE 2
ARTICULATORY FEATURES OF THE STOP CONSONANTS

Manner of articulation    Front (bilabial)    Middle (alveolar)    Back (velar)
Voiced                          /b/                  /d/                /g/
Unvoiced                        /p/                  /t/                /k/
RESULTS AND DISCUSSION
Of particular interest in these results are the indications of perceptual variability on the one hand, and of certain regularities on the other: while it is clear that many of the constancies essential to normal speech perception are lacking, the error patterns often seem to be quite systematic.

Table 3 lists the frequency of correct identification of each stop consonant as a function of the following vowel. Unlike normals, the patient's perception of the consonant is apparently altered by vocalic context. The formant transitions which indicate place of articulation do, in fact, vary with changes in the following vowel (Delattre, Liberman & Cooper, 1955); phonetic encoding is thought to remove this variability so that acoustically different versions of the same phoneme are perceived as the same, i.e., "categorically" (Liberman, Mattingly & Turvey, 1972). If the patient is processing speech in an auditory mode, he might indeed be more sensitive to these acoustic differences. To test this hypothesis adequately would require more systematic manipulation of acoustic parameters than was possible here. However, some features of the patient's performance are at least suggestive of the importance of auditory (as opposed to phonetic) factors in explaining his deficit. From Table 3, for example, there is some indication that the stops are relatively more intelligible when combined with vowels produced at the front of the mouth (/i e ɛ a/) than at the back (/ɔ o u/). The formants which comprise the signal become rather compressed as the vowel moves backward (Delattre, Liberman & Cooper, 1955), and perhaps more difficult for an auditory processor to resolve.
TABLE 3
INTELLIGIBILITY OF STOP CONSONANTS AS A FUNCTION OF THE FOLLOWING VOWEL

Percent identified correctly

            i     e     ɛ     a     ɔ     o     u    mean
b          60    40   100    90    30    40    10     62
d          20     0    30    38    10     7    40     24
g          20    10    20    38    10    20     0     19
p         100    10    70    28    50    40    50     58
t         100    60    80    41    80    67    60     78
k          50    90    80    86    70    73    50     83
mean       58    35    63    54    42    41    35
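The front-versus-back comparison just described can be checked directly against the column means of Table 3; the following minimal sketch (Python; the variable names are ours) simply averages the two groups of columns.

    # Column means from Table 3 (percent correct), keyed by the following vowel.
    column_means = {"i": 58, "e": 35, "ɛ": 63, "a": 54, "ɔ": 42, "o": 41, "u": 35}

    front_vowels = ["i", "e", "ɛ", "a"]
    back_vowels = ["ɔ", "o", "u"]

    front_mean = sum(column_means[v] for v in front_vowels) / len(front_vowels)  # 52.5
    back_mean = sum(column_means[v] for v in back_vowels) / len(back_vowels)     # about 39.3
    print(front_mean, back_mean)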
In general, the voiceless stops /p t k/ are more intelligible than the voiced, with the exception of /b/. Across vowels, performance is most variable for the bilabials /b/ and /p/, a consequence, perhaps, of the fact that the patient's name (Ben) begins with /b/. Note, however, that while "ben", a stimulus in the /ɛ/ series, is always identified correctly, there is no consistent superiority for other common words in the syllable lists (e.g., from Table 3, "pen", "tea", and "pea" appear to be highly intelligible while "key", "day", "pay" and "go" are not).

Space does not permit inclusion of the complete set of confusion matrices from these experiments. Instead, types of errors are summarized in Table 4. Note particularly the large number of place errors with alveolar stimuli and the strong tendency to identify the voiced consonants as unvoiced; for the most part, these results reflect a disposition to label /d/, /g/, and often /t/ and /p/, as /k/. In general, voicing errors somewhat exceed place errors,³ a difference which is probably underestimated since there are three possible assignments for place and only two for voicing.

More detailed evidence for these trends can be found in Table 5, the confusion matrix for the /a/ series. Almost 46% of the errors are misassignments to the category /ka/; another 20% are false recognitions of /ba/, which, like /ka/, is itself highly intelligible. Among the stops, /b/ and /k/ can be thought of as articulatory opposites: the voiced stop /b/ is produced at the front of the mouth and /k/, which is unvoiced, is produced at the back. It is interesting to consider, therefore, that there may have been some simplification of speech perception along articulatory lines. It is also intriguing to note similar phonemic preferences in child language and cross-language data: in acquisition, /k/ and /b/ appear earlier than /g/ and /p/; and in languages which lack one or more of the stops, /g/ and /p/ are likely to be the ones omitted (Ferguson, 1975).

³ This is in contrast to the preponderance of place errors made by normal subjects in experiments where the stimulus is partially masked by acoustic noise (Miller & Nicely, 1955).
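The classification of errors summarized in Table 4 follows mechanically from the feature assignments of Table 2: a response that changes the voicing value of the stimulus counts as a voicing error, one that changes its place of articulation counts as a place error, and a single response may count as both. The sketch below (Python; the names, and the restriction to two rows of Table 5, are ours) illustrates that bookkeeping, including the share of errors that are /ka/ responses.

    # Place and voicing features for the six stops (Table 2).
    FEATURES = {
        "b": ("bilabial", "voiced"),   "d": ("alveolar", "voiced"),   "g": ("velar", "voiced"),
        "p": ("bilabial", "unvoiced"), "t": ("alveolar", "unvoiced"), "k": ("velar", "unvoiced"),
    }

    def error_types(stimulus, response):
        """Classify a misidentification as a voicing error, a place error, or both."""
        kinds = set()
        if FEATURES[stimulus][1] != FEATURES[response][1]:
            kinds.add("voicing")
        if FEATURES[stimulus][0] != FEATURES[response][0]:
            kinds.add("place")
        return kinds

    # Two rows of the natural-speech confusion matrix (Table 5); entries are
    # percentages of responses, so the tallies below are in percentage points,
    # not raw response counts.
    confusions = {
        "d": {"b": 10, "d": 38, "g": 14, "p": 3, "t": 7, "k": 24},
        "g": {"b": 7,  "d": 3,  "g": 38, "p": 3, "t": 7, "k": 41},
    }

    voicing = place = ka_responses = total_errors = 0
    for stim, row in confusions.items():
        for resp, pct in row.items():
            if resp == stim:
                continue                      # correct identifications are not errors
            total_errors += pct
            if resp == "k":
                ka_responses += pct
            kinds = error_types(stim, resp)
            if "voicing" in kinds:
                voicing += pct
            if "place" in kinds:
                place += pct

    print(voicing, place, round(100 * ka_responses / total_errors))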
IDENTIFICATION OF SYNTHETIC SPEECH SOUNDS

The use of synthetic speech stimuli has made it possible to identify those components of the complex acoustic signal which are essential for the identification and discrimination of speech sounds. It has been shown, for example, that voice onset time (VOT), the interval between the release of the burst (air pressure) and the onset of laryngeal vibration, is critical for the distinction between stop consonants sharing the same place of articulation, that is, between /b/ and /p/, /d/ and /t/, and /g/ and /k/. By delaying the onset of laryngeal vibration relative to the release of the burst, abrupt discontinuities in perception at delays of 20-40 msec have been demonstrated (Liberman, Delattre & Cooper, 1958; Abramson & Lisker, 1965).
TABLE 4
TYPES OF ERRORS IN STOP CONSONANT IDENTIFICATIONᵃ
(Voicing errors and place errors tabulated separately for the voiced and unvoiced bilabial, alveolar, and velar stops.)

ᵃ Total voicing errors: 177; total place errors: 154.
Stimuli on one side of the critical VOT are perceived as voiced and indistinguishable from each other; stimuli on the other side are perceived as voiceless and likewise indistinguishable from each other. This effect supports the notion of "categorical perception": that speech sounds can be discriminated only to the extent that they are identified as members of different phonemic categories (Liberman, Cooper, Shankweiler & Studdert-Kennedy, 1967).

Our experiments with natural speech sounds suggested that the patient was having particular difficulty with the voicing distinction. The tendency to label voiced stops as voiceless might, for example, be explained as a shift of the category boundary along the VOT continuum. It was therefore of interest to examine the categorization of synthetic stop consonants over a range of VOTs.

METHOD

Tapes of VOT continua for /ba-pa/, /da-ta/ and /ga-ka/, originally synthesized at the Haskins Laboratories (see Caramazza, Yeni-Komshian, Zurif & Carbone, 1973, for a description of these materials), were spliced and all of the stimuli randomized to yield a test series comparable in format to the experiments with natural speech sounds.
TABLE 5
CONFUSION MATRIX FOR NATURAL STOP CONSONANTS WITH /a/

Percent in each response category

Stimulus     ba    da    ga    pa    ta    ka
ba           90     0     3     7     0     0
da           10    38    14     3     7    24
ga            7     3    38     3     7    41
pa           41     3     0    28    10    17
ta            0    10     7     0    41    41
ka            0     3     3     0     7    86
There were 37 stimuli from each of the three continua, ranging in VOT from -150 to +150 msec. VOT increments occurred in 10 msec steps, except for the transition stage of -10 to +50 msec, where the interval was reduced to 5 msec. The stimuli were presented binaurally at 5 sec intervals through a stereo headset. The answer sheet and instructions were similar to those for the natural speech experiments. This test was given on two different occasions. Data obtained from three normal subjects were used to determine category boundaries for the purpose of scoring the results: for /ba-pa/, VOTs greater than +25 msec were called /pa/; for /da-ta/, VOTs greater than +30 msec were called /ta/; for /ga-ka/, VOTs greater than +40 msec were called /ka/.
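As a concrete illustration of how responses to the synthetic stimuli were scored against these boundaries, the following minimal sketch (Python; the function name and dictionary are ours) maps a VOT value on each continuum onto the category a normal listener would be expected to report.

    # Category boundaries (msec of VOT) estimated from the three normal subjects.
    VOICING_BOUNDARY_MSEC = {"ba-pa": 25, "da-ta": 30, "ga-ka": 40}

    def expected_label(continuum, vot_msec):
        """Return the voiced member below the boundary, the voiceless member above it."""
        voiced, voiceless = continuum.split("-")
        return voiceless if vot_msec > VOICING_BOUNDARY_MSEC[continuum] else voiced

    # Example: a +35 msec stimulus is scored as /pa/ on the bilabial continuum
    # but as /ga/ on the velar continuum, where the boundary lies at +40 msec.
    print(expected_label("ba-pa", 35), expected_label("ga-ka", 35))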
RESULTS AND DISCUSSION
The results for the two test sessions using randomized tapes were comparable and were combined to yield the confusion matrix in Table 6. In general, the trends are similar to what we have seen for natural speech, occasionally (as with /da/) with some exaggeration. If anything, there is an even stronger bias toward voiceless responses, especially toward /ka/. VOT has some influence on voicing judgments, but, particularly with alveolar and velar stimuli, the effect is small. In general, more voiced than unvoiced stimuli are judged as voiced, and vice versa: i.e., for /ba/, 88% of the responses are voiced; for /da/, 28%; for /ga/, 30%; for /pa/, 37%; for /ta/, 4%; for /ka/, 20%. The errors are distributed all over the VOT continuum, however, and nowhere is there evidence of a sharp demarcation between voiced and voiceless responses. We cannot, therefore, explain the bias toward voiceless responses as a displacement of the normal category boundary to a different locus on the VOT continuum. If this were so, we could attribute the voicing problem to processing by feature detectors which had been "set" at the wrong level. The evidence does not support this interpretation and suggests, rather, that phonetic mechanisms are operating either extremely noisily or perhaps not at all.
TABLE 6
CONFUSION MATRIX FOR SYNTHETIC STOP CONSONANTS WITH /a/

Percent in each response category

Stimulus     ba    da    ga    pa    ta    ka
ba           86     0     2     2     7     2
da           10     6    12     6    16    50
ga            4     2    24     2     8    60
pa           30     0     7    47     7    10
ta            4     0     0     4    25    67
ka            8     4     8     0     8    71
DISCRIMINATION OF STOP CONSONANTS
Because subjects are normally unable to discriminate between acoustically different speech sounds which belong to the same phonemic category (but see Pisoni & Tash, 1974, for some qualifications), speech perception has been termed "categorical" in nature. Does the patient behave similarly? Or can he discriminate between, say, /ga/ and /ka/ on an acoustic basis even though he tends to identify both as /ka/? Most of the research on phonetic discrimination has been based on an ABX paradigm: two different sounds are given in succession, and the subject must determine whether a third stimulus is identical to either the first or the second. This procedure was deemed too difficult for the patient. Instead, we used an AX paradigm in which the patient simply had to indicate whether two successive stimuli were the same or different.

METHOD

The stimuli were enunciated by the same female experimenter as in the identification experiments described above. Each stimulus from the /a/ series was paired five times with itself and twice with each of the other syllables, with the members of each pair balanced for order of presentation. The interstimulus interval was approximately one second. The order of presentation of the pairs was random. This test was given on seven different occasions over a three month period.
RESULTS AND DISCUSSION
The results, which are summarized in Table 7, suggest that the patient has difficulty discriminating between the same stimuli which had been confused in the identification task (cf. Table 5), that is, between /pa/ and /ba/, /ga/ and /ka/, and /ta/ and /ka/. The indiscriminability of /da/ and /ta/ also conforms with the identification data, where /da/ and /ta/ were both labeled as /ka/. Thus, like normals, the patient is unable to discriminate between phonemes classified within the same phonemic category; what is unusual about his performance is the nature of his categories.

As in the identification experiments, the patient seems to have more difficulty with voicing than with place cues. Overall, pairs differing in voicing were discriminated only at chance (.45); those differing only in place were somewhat better discriminated (.78); and those differing in both place and voicing were almost always perceived as different (.90).
TABLE 7
PROBABILITY OF DISCRIMINATING MEMBERS OF STOP CONSONANT PAIRS AS DIFFERENT

             ba     da     ga     pa     ta     ka
ba          .00   1.00    .86    .43   1.00   1.00
da                 .00    .50    .93    .57    .64
ga                        .00   1.00    .86    .36
pa                               .00   1.00    .79
ta                                      .00    .50
ka                                             .00
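The grouped probabilities quoted above (.45, .78, and .90) are simply the means of the Table 7 entries for pairs differing in voicing only, in place only, or in both features. A minimal sketch of that computation follows (Python; the pair values are transcribed from Table 7, and the names are ours).

    # Discrimination probabilities for the fifteen different-stop pairs (Table 7).
    pairs = {
        ("b", "d"): 1.00, ("b", "g"): .86, ("b", "p"): .43, ("b", "t"): 1.00, ("b", "k"): 1.00,
        ("d", "g"): .50,  ("d", "p"): .93, ("d", "t"): .57, ("d", "k"): .64,
        ("g", "p"): 1.00, ("g", "t"): .86, ("g", "k"): .36,
        ("p", "t"): 1.00, ("p", "k"): .79,
        ("t", "k"): .50,
    }

    PLACE = {"b": "bilabial", "p": "bilabial", "d": "alveolar",
             "t": "alveolar", "g": "velar", "k": "velar"}
    VOICED = {"b", "d", "g"}

    def relation(x, y):
        """Feature relation between the members of a pair (cf. Table 2)."""
        if PLACE[x] == PLACE[y]:
            return "voicing only"
        return "place only" if (x in VOICED) == (y in VOICED) else "place and voicing"

    groups = {}
    for (x, y), p in pairs.items():
        groups.setdefault(relation(x, y), []).append(p)

    for name, values in groups.items():
        print(name, sum(values) / len(values))
    # approximately .45 for voicing only, .78 for place only, .90 for place and voicing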
From dichotic experiments in normal subjects, there is some indication that the right hemisphere can distinguish the place cues (the second formant transitions) which are heard as "chirps" when isolated from the rest of the speech signal (Liberman, 1974). We do not know whether it is also sensitive to the voicing cue, which involves the discrimination of small temporal differences. However, it is frequently suggested that temporal analysis requires left hemisphere specialization (e.g., Carmon & Nachson, 1971), and it is possible that voicing distinctions are especially vulnerable when the analysis is performed by the right temporal lobe.

The most remarkable feature of the patient's performance is, however, to be found in Table 8: discriminability is often strongly determined by order of presentation. This unusual state of affairs holds true for all of the low-discriminability (i.e., p < .8) pairs in Table 7. The effect seems to depend consistently on /ka/: in those pairs of which /ka/ is a member, discriminability is greater when /ka/ comes first; in other low-discriminability pairs (as in /da-ta/), discriminability improves when the stimulus more frequently labelled as /ka/ (see Table 5) comes first.
TABLE 8
EFFECT OF ORDER OF PRESENTATION ON THE DISCRIMINABILITY OF STOP CONSONANTS

Order        Probability of discriminating as different
ba, pa                        .43
pa, ba                        .57
pa, ka                        .56
ka, pa                       1.00
da, ga                        .29
ga, da                        .71
da, ta                        .29
ta, da                        .86
da, ka                        .29
ka, da                       1.00
ta, ka                        .14
ka, ta                        .86
ga, ka                        .00
ka, ga                        .71
How can we account for this unusual effect of order on discriminability? We will argue that the effect has something to do with memory, and in turn with the encoding of /ka/. To discriminate successive stimuli, the first stimulus must be stored briefly in order to compare it with the second; thus the order effect may indicate that /ka/ has some mnestic advantage over the other syllables in the set. While it would seem possible to discriminate successive stimuli like /ga/ and /ka/ purely on the basis of their auditory characteristics, there is evidence that encoding in this form is very transitory (Pisoni, 1973); it is thought that to endure, the stops must be encoded categorically (Liberman, Mattingly & Turvey, 1972). The mnestic advantage of /k/ for this patient suggests, therefore, that he encodes this phoneme somewhat differently than the other stops. That /k/ might have some special status for the patient was suggested earlier by his performance in the identification tasks, where /k/ was not only highly intelligible (Tables 5 and 6), but also seemed to evoke more rapid and less equivocal responses than the other stimuli. The reason for its salience, however, and the nature of its representation remain obscure.⁴

⁴ It cannot be explained in terms of importance in English, where the poorly discriminated /d/ carries a much higher information load (Denes, 1963). Possibly, like his performance with "ben", the superiority of /k/ is another idiosyncratic result of personal history. It might also be related to the acquisition and cross-language data (Ferguson, 1975) cited above.

We assume, therefore, that /k/ "registers" for the patient, while most other speech sounds do not, because it engages some categorical representation. Whether this encoding is, in the usual sense, "phonetic" and dependent on distinctive feature analysis is not clear; conceivably it could be based on auditory parameters alone. Whatever the nature of its representation, other stop consonants, /g/ and /t/ in particular, must have some capacity to evoke categorical /k/; this would account for order effects among non-/k/ pairs in the discrimination experiments (Table 8), as well as for phonemic misclassification (Table 5). It is also conceivable that the strong bias toward voicelessness in the identification experiments reflects a disposition to categorize other stimuli as /k/ rather than a systematic error in voicing discrimination per se.

To summarize, we have demonstrated that the patient classifies and discriminates stop consonants very abnormally. The error patterns are quite systematic and an explanation in terms of random noise in the phonetic apparatus does not seem satisfactory. It is more likely that some other mechanism, lawful but inadequate, is responsible for the deviant perception of speech sounds.
We are left, therefore, with the hypothesis suggested earlier, that speech perception might be arrested at a pre-phonetic level.⁵

⁵ Several preliminary studies were carried out with other classes of speech sounds. As with the stops, the patient made some systematic errors with fricatives and sibilants; with these stimuli, however, voiceless consonants tended to be identified as voiced (/f/, for example, as /v/). Vowels seemed to be easier to identify than consonants. Unfortunately, we were unable to enlist the patient's cooperation long enough to complete these experiments.

THE EFFECT OF CONTEXT ON SPEECH COMPREHENSION
There is a good deal of evidence that language comprehension "is not exclusively outside-in", and that "the listener makes an active contribution to what he hears and understands" (Wanner, 1973, p. 164). Context not only improves the intelligibility of noisy signals (Miller, Heise, & Lichten, 1951); it is powerful enough to induce the perception of phonemes that have actually been deleted from the acoustic stimulus (Warren, 1970). These phenomena provide strong support for the notion of "analysis-by-synthesis" (Neisser, 1967): that perception is partially an act of construction on the part of the perceiver and is not wholly determined by the stimulus event.

Like normals under noisy conditions, the word-deaf patient can use contextual information to compensate, in part, for his phonemic deficit. We videotaped a remarkable sequence in which the patient gets completely lost each time the questioning shifts from his smoking habits to his work experience, to the circumstances of his early life, but is able to respond appropriately once he grasps the general topic of the conversation. Similar anecdotal observations have been reported by others who have studied these patients (Hemphill & Stengel, 1940; Klein & Harper, 1952).

The patient's behavior in test situations also showed evidence of contextual effects. While he was unable to repeat any of the words in a randomly selected list upon first presentation, he performed significantly better than chance in a multiple choice test using similar items. In multiple choice recognition tests, he was more successful when the set of choices was given him just before rather than immediately after the auditory stimulus. Repetition performance also improved considerably after the patient was shown a list of the relevant items. The effects of context are further demonstrated in the experiments below.

THE EFFECT OF CATEGORY CUES ON REPETITION
The object of this experiment was to determine whether category constraints could improve the patient's performance on a repetition task.
METHOD

Two lists of 20 familiar mono- and bisyllabic nouns were constructed. For the category condition, five words were chosen from each of four categories (animals, vehicles, furniture, fruits). Twenty words similar in phonetic composition and frequency of occurrence were chosen for the control condition. For the category list, item presentation was blocked by category. Half of the control list was given first, then all of the category list, then the remainder of the control list. The experimenter repeated each word up to 10 times until the patient repeated it correctly. If he failed to respond correctly on the tenth presentation, he was shown a card with the correct word. He was never informed of the relationships in the category condition.
RESULTS

The mean number of repetitions required to elicit the correct response in the category condition was 4.3; the mean number of repetitions required in the control condition was 6.4.⁶ Within the category list, performance was considerably better on the fifth word of a particular category than on the first (3.0 repetitions needed as opposed to 6.7 on the first word). Providing context, by restricting words to categories, clearly improves performance on this task.

⁶ The difference between these two conditions is somewhat underestimated; the procedure allowed only 10 repetitions, and the maximum error score was reached more often in the control than in the category condition.

THE EFFECTS OF EMBEDDING WORDS IN SENTENCES

In a noisy background, normal subjects are able to identify words more easily when they are embedded in a sentence than when the same words are presented in isolation (Miller, Heise & Lichten, 1951). We performed a similar experiment with this patient, but without the superposition of acoustic noise.

METHOD

Twenty-five familiar monosyllabic words beginning with a stop consonant were embedded in 4-6 word sentences. The same items were also presented as isolated words. After each sentence (or word) was read to the patient, he was shown a list of three words, and was asked to underline the one he had just heard. The distracters were two rhyming words beginning with other stop consonants (e.g., for "The boy sailed a boat," the choices were "boat, goat, coat"). In most cases (20/25), the target word came at the end of the sentence. Half of the sentences and half of the words were given in two sessions one week apart, with no duplication of the target words in the same test session. The order of the two conditions was counterbalanced across sessions.
RESULTS

The patient identified 12 of the isolated and 19 of the sentence-embedded words correctly. Most of the errors in the sentence condition (4/6) occurred with words which appeared in the middle rather than at the end of the sentence.
DISCUSSION
Embedding words in a context provided either by a sentence or a semantic category considerably improved their intelligibility. We think it unlikely, however, that the patient was decoding whole sentences accurately, since he performed very poorly on measures of sentence comprehension (e.g., Kessel, 1970). Detection of another word or two, or information provided by the intonation contour of the sentence, may have been sufficient to constrain the response.

The patient's ability to implement contextual cues suggests that "analysis-by-synthesis" mechanisms play some role in his limited comprehension of spoken language. Decoding normally begins, however, with a "preliminary analysis of the signal . . . a kind of processing which, by definition, is not analysis-by-synthesis" (Neisser, 1967, p. 164). This stage is so deficient in word deafness that it does not generate the information on which analysis-by-synthesis can build; the patient is unable to hazard even a reasonable guess about what he hears. But if other constraints are supplied along with the speech signal, synthetic operations can apparently compensate, in part, for deficits at the phonetic level.

Whether word recognition of this sort is carried out in the right hemisphere or requires transfer of auditory information to the left is not clear. Albert and Bear (1974) have reported evoked potential data which suggest that the left temporal lobe can receive transcallosal auditory input in word deafness, although it may transmit none.⁷ If so, lexical operations might be performed by left hemisphere mechanisms on auditory signals supplied by the right; without phonetic mediation, presumably, and therefore highly dependent on contextual information.

⁷ Note that this finding does not necessarily contradict the classical mechanism for word deafness, which isolates Wernicke's area from both direct and transcallosal "primary" input. There is no reason to believe that callosal fibers from the whole auditory association cortex in the right hemisphere must all be affected by what must be a rather restricted lesion of the left temporal lobe.

Recently, Albert and Bear (1974) have looked at word deafness as a problem in the temporal resolution of auditory stimuli, rather than, as we have viewed it, as a specific phonetic impairment. Their patient had an abnormal fusion threshold for auditory stimuli, and a comprehension deficit that improved at slow presentation rates. We did not vary presentation rate in our own study. However, we should not have found words in sentences to be more intelligible than isolated words if temporal resolution were the major impediment to language comprehension. The beneficial effect of a slow presentation rate may reflect increased reliance on constructive processes in word recognition.
In any case, sensitivity to the temporal parameters of the speech signal is not limited to word deafness: similar effects of presentation rate have been found in patients with specific deficits of auditory short term memory (Warrington & Shallice, 1969); there is also evidence that aphasics in general benefit from artificial extension of transient components of the speech signal (Tallal & Newcombe, 1975). Thus, while there may very well be a temporal resolution deficit in word deafness, it does not seem a sufficient explanation for the severe disorder of speech perception.

CONCLUDING COMMENTS

We would like to suggest, then, that the perceptual deficit in word deafness is the result of an arrest of speech processing at a pre-phonetic level, a hypothesis that is in accord with both the classical anatomical conception of verbal agnosia and contemporary theories of speech perception. Compelled, despite the phonetic impairment, to try to make sense out of what he hears, the patient obviously resorts to other capacities which remain to him: very likely, some combination of auditory parameters, contextual cues and higher level linguistic processes. It may therefore be easier to determine which operations are lacking in word deafness than to give a complete explanation for the deviant perceptual results. In any case, evidence gathered from a single patient can be exceptional and the conclusions we have drawn must be weighed accordingly. We hope that others who have the opportunity to study this rare syndrome will see fit to repeat and extend our observations, particularly in the direction of more systematic manipulation of auditory parameters than was possible here.

REFERENCES

Abramson, A., & Lisker, L. 1965. Voice onset time in stop consonants: Acoustic analysis and synthesis. Fifth International Congress of Acoustics, Liege, 1-44.
Albert, M. L., & Bear, D. 1974. Time to understand: A case study of word deafness with reference to the role of time in auditory comprehension. Brain, 97, 373-384.
Blumstein, S. E. 1974. The use and theoretical implications of the dichotic technique for investigating distinctive features. Brain and Language, 1, 337-350.
Caramazza, A., Yeni-Komshian, G. H., Zurif, E. B., & Carbone, E. 1973. The acquisition of a new phonological contrast: The case of stop consonants in French-English bilinguals. Journal of the Acoustical Society of America, 54, 421-428.
Carmon, A., & Nachson, I. 1971. Effect of unilateral brain damage on perception of temporal order. Cortex, 7, 411-418.
Cullen, J. K., Thompson, C. L., Hughes, L. F., Berlin, C. I., & Samson, D. S. 1974. The effects of varied acoustic parameters on performance in dichotic speech perception tasks. Brain and Language, 1, 307-322.
Delattre, P. C., Liberman, A. M., & Cooper, F. S. 1955. Acoustic loci and transitional cues for consonants. Journal of the Acoustical Society of America, 27, 769-773.
Denes, P. B. 1963. On the statistics of spoken English. Journal of the Acoustical Society of America, 35, 892-904.
Ferguson, C. A. 1975. Sound patterns in language acquisition. Paper presented at the 26th Annual Georgetown Round Table, March, 1975.
Geschwind, N. 1965. Disconnexion syndromes in animals and men. Brain, 88, 237-294.
Goldstein, M. N. 1974. Auditory agnosia for speech ("Pure word deafness"): A historical review with current implications. Brain and Language, 1, 195-204.
Hemphill, R. E., & Stengel, E. 1940. A study on pure word-deafness. Journal of Neurology and Psychiatry, 3, 251-262.
Kessel, F. S. 1970. The role of syntax in children's comprehension from ages six to twelve. Monographs of the Society for Research in Child Development, No. 139.
Klein, R., & Harper, J. 1956. The problem of agnosia in the light of a case of pure word deafness. Journal of Mental Science, 102, 112-120.
Liberman, A. M. 1974. The specialization of the language hemisphere. In F. O. Schmitt and F. G. Worden (Eds.), The neurosciences. Cambridge, Mass.: M.I.T. Press. Pp. 43-56.
Liberman, A. M., Cooper, F., Shankweiler, D. P., & Studdert-Kennedy, M. 1967. Perception of the speech code. Psychological Review, 74, 431-461.
Liberman, A. M., Delattre, P. C., & Cooper, F. S. 1958. Some cues for the distinction between voiced and voiceless stops in initial position. Language and Speech, 1, 153-167.
Liberman, A. M., Mattingly, I. G., & Turvey, M. T. 1972. Language codes and memory codes. In A. W. Melton and E. Martin (Eds.), Coding processes in human memory. New York: Winston. Pp. 307-334.
Lichtheim, M. L. 1885. On aphasia. Brain, 7, 433-484.
Marie, P. 1906. What to think about subcortical aphasias (pure aphasia). Semaine Medicale, 26, 493-500. In M. F. Cole & M. Cole (Eds. and Transl.), Pierre Marie's papers on speech disorders. New York: Hafner. Pp. 75-102.
Miller, G. A., Heise, G. A., & Lichten, W. 1951. The intelligibility of speech as a function of the context of the test materials. Journal of Experimental Psychology, 41, 329-335.
Miller, G. A., & Nicely, P. 1955. An analysis of perceptual confusions among some English consonants. Journal of the Acoustical Society of America, 27, 338-352.
Milner, B., Taylor, C., & Sperry, R. W. 1968. Lateralized suppression of dichotically presented digits after commissural section in man. Science, 161, 184-186.
Neisser, U. 1967. Cognitive psychology. New York: Appleton-Century-Crofts.
Pisoni, D. B. 1973. Auditory and phonetic memory codes in the discrimination of consonants and vowels. Perception and Psychophysics, 13, 253-260.
Pisoni, D. B., & Tash, J. 1974. Reaction time to comparisons within and across phonetic categories. Perception and Psychophysics, 15, 285-290.
Schulhoff, C., & Goodglass, H. 1969. Dichotic listening, side of brain injury, and cerebral dominance. Neuropsychologia, 7, 149-160.
Sparks, R., & Geschwind, N. 1968. Dichotic listening in man after section of neocortical commissures. Cortex, 4, 3-16.
Studdert-Kennedy, M. 1974. The perception of speech. In T. Sebeok (Ed.), Current trends in linguistics, Vol. 12. The Hague: Mouton.
Studdert-Kennedy, M., & Shankweiler, D. 1970. Hemispheric specialization for speech perception. Journal of the Acoustical Society of America, 48, 579-594.
Studdert-Kennedy, M., Shankweiler, D., & Pisoni, D. 1972. Auditory and phonetic processes in speech perception: Evidence from a dichotic study. Cognitive Psychology, 3, 455-466.
Tallal, P., & Newcombe, F. 1975. Impairment of auditory perception and language comprehension in residual dysphasia. Manuscript submitted for publication.
Wanner, E. 1973. Do we understand sentences from the outside-in or from the inside-out? Daedalus, 102, 163-184.
Warren, R. M. 1970. Perceptual restoration of missing speech sounds. Science, 167, 392-393.
Warrington, E. K., & Shallice, T. 1969. The selective impairment of auditory verbal short-term memory. Brain, 92, 885-906.
Weiss, M., & House, A. 1973. Perception of dichotically presented vowels. Journal of the Acoustical Society of America, 53, 51-58.
Wood, C. C. 1975. Auditory and phonetic levels of processing in speech perception: Neurophysiological and information processing analyses. Journal of Experimental Psychology: Human Perception and Performance, 104, 3-20.
Zurif, E. B., & Ramier, A. M. 1972. Some effects of unilateral brain damage on the perception of dichotically presented phoneme sequences and digits. Neuropsychologia, 10, 103-110.