Brain Research, 252 (1982) 353-365
Elsevier Biomedical Press
Speech Evoked Activity in the Auditory Radiations and Cortex of the Awake Monkey

MITCHELL STEINSCHNEIDER, JOSEPH AREZZO and HERBERT G. VAUGHAN, Jr.*

Departments of Neuroscience and Neurology and the Rose F. Kennedy Center for Research in Mental Retardation and Human Development, Albert Einstein College of Medicine, 1300 Morris Park Avenue, Bronx, NY 10461 (U.S.A.)
(Accepted May 18th, 1982)

Key words: auditory radiations - auditory cortex - multiple unit activity - speech sounds - linguistic features
To determine whether phonetic features of human speech are reflected in activity patterns of the auditory cortex and its thalamic afferents, concurrent recordings of multiple unit activity (MUA) and averaged evoked potentials (AEP) to 3 synthetic syllables, /da/, /ba/ and /ta/, were performed in awake monkeys. Using clicks, responses from thalamocortical axons and cortical cells were differentiated on the basis of their response latency, spatial distribution, and relationships to AEP components. Voice onset time was reflected in MUA time-locked to consonant release and voicing onset, and phase-locked to the syllables' fundamental frequency. Place of articulation was reflected in discriminative 'on' and phase-locked responses occurring to the formant transitions of the syllables. Duration of the voiced formant transitions was represented by an accentuation of the phase-locked responses occurring during this period. Activity of thalamocortical fibers and cortical cells differed. Thalamocortical fibers were more responsive to speech sounds and more frequently responded with a phase-locked response pattern. Cortical cells responded with sustained activity to a greater degree. Responses to identical portions of the vowels were biased by the preceding consonant. The spatial extent and timing of the responses demonstrate that speech sounds are processed along parallel, but not synchronous, channels. Relevance to human psychoacoustical phenomena is discussed.

INTRODUCTION

In recent years, increasing interest has been focused on neural responses to speech sounds. Work has mainly concentrated on how lower levels of the auditory system encode vowel-like sounds aa,34,aT-39, 54. Systematic investigations of the neurophysiological responses to consonants, which carry most of the information in speech 9, have been more limited.
* To whom requests for reprints should be addressed.
0006-8993/82/0000-0000/$02.75 © 1982 Elsevier Biomedical Press

In contrast, psychoacoustic research into the mechanisms of consonant perception has been extensive. It has led to the delineation of specific acoustic parameters that participate in the discrimination of the stop consonants (/b,p,d,t,g,k/) in syllable-initial position, including: duration of the formant transitions 26, shape of the syllables' short-term spectrum at stimulus onset7,s,16, 4a, direction and extent of the second and third formant transitions17, zS, and voice onset time (VOT)2S, 44. To
better understand how the CNS encodes these psychoacoustically important variables, it is necessary to examine their neurophysiological correlates within the auditory system. Because the detailed intracranial recording necessary for this analysis is not feasible in humans, neurophysiological investigations of speech sound processing must be performed in animals. The use of animal models for evaluating some of the processes involved in speech perception is justified by data demonstrating that both monkeys and chinchillas discriminate stop consonant-vowel (CV) syllables that vary along the dimensions of place of articulation and VOT in a manner similar to humans 23,24, 40,49. Our analysis has focused on the auditory cortical fields, as these regions are essential for the proper decoding of speech. Damage to the transverse gyri or to their afferent pathways produces varying degrees of auditory agnosia and aphasia a,19-2t,a°,85,a6. Bilateral auditory cortical lesions in monkeys destroy their ability to discriminate human speech sounds TM and other complex acoustic signals TM, while sparing simpler auditory discriminations. Analysis of the electrophysiological activity of auditory cortex elicited by speech sounds is therefore critical to establish correlations between speech variables important for decoding speech and the corresponding neural responses.

In this study we examined the responses of auditory thalamocortical fibers and cortical cells to human speech sounds that varied in their place of articulation and VOT. Recordings were made in the awake monkey, using concurrent recordings of averaged evoked potentials (AEP) and multiple unit activity (MUA). We intended to define relationships between the acoustic properties of the stimuli and the neural responses, and to detect differences in response properties of afferent fibers and neurons in auditory cortex.

METHODS

Three adult macaques (1 male M. mulatta and 2 male M. fascicularis), weighing between 4.5 and 6.0 kg, were studied. Surgery, performed under general anesthesia using aseptic techniques, began with the removal of about 2 cm² of skull overlying the estimated location of auditory cortex. This hole accommodated a rectangular matrix of adjacently placed 18-gauge stainless-steel tubing in 2 monkeys, or a Plexiglas plate embedded with short lengths of 20-gauge tubing in the other. The tubes served as guides for the movable intracortical electrodes and were positioned vertically with respect to stereotaxic coordinates. The tube matrix was embedded in a mound of dental acrylic, which was anchored to the skull by 4 stainless-steel bolts. Bars were also incorporated into the acrylic for holding the animals' heads rigid during the recording sessions. Depth electrodes were multicontact electrodes
with up to 8 contacts spaced 300-400 µm apartL. Each channel had an impedance of 0.2-0.5 MΩ at 1 kHz. Brain potentials were amplified by differential amplifiers, with a frequency response down 3 dB at 1 Hz and 3 kHz. Unity-gain headstage amplification was also employed. The reference was generally an occipital bone electrode, though linked mastoids were sometimes used. The AEPs were averaged on-line by a Nicolet Med 80 computer with a minimum sampling rate of 2 kHz. The potentials and a timing pulse were also recorded on a 7-track tape recorder (bandpass 0-5 kHz) for subsequent off-line analysis. For analysis of MUA, this recorded activity was high-pass filtered above 500 Hz (roll-off 24 dB/octave), full-wave rectified, and averaged by the computer (sampling rate = 8200 Hz). When single units were isolated in the high-pass filtered data, post-stimulus time histograms (PSTHs) were constructed (bin width = 280 µs).

Stimuli were clicks and synthetically produced CV syllables. Condensation clicks were generated by 100 µs positive square-wave pulses. The synthetic syllables /da/, /ba/ and /ta/, produced at Haskins Laboratories, are shown in Fig. 1. Stimuli were approximately 166 ms in duration. The shape and phase of the sound pressure waves for /da/ and /ba/ were nearly identical. VOT was 0 ms for both. These two stimuli differed only in their second and third formant transitions, which lasted 40 ms. /Ta/ differed from /da/ in two ways. The VOT increased from 0 ms to 80 ms. The starting first formant frequency was raised from 200 Hz to 526 Hz, a modification that greatly enhanced its unvoiced quality. The first formant transition lasted 30 ms. Second and third formant starting frequencies were 1835 Hz and 3439 Hz for /da/ and /ta/, and 621 Hz and 2029 Hz for /ba/. All formants changed linearly to their steady-state values, which were: first formant, 817 Hz; second formant, 1181 Hz; and third formant, 2632 Hz.
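The formant specification above can be written out as piecewise-linear tracks. The minimal sketch below uses only the onset and steady-state frequencies stated in the text. It rests on a few assumptions:

- The 30 ms first-formant transition applies to all three syllables, and /ba/ shares the 200 Hz F1 onset of /da/.
- Time is measured from consonant release.
- The aspiration and 80 ms VOT of /ta/ are not modelled.

The names `ONSET_HZ`, `TRANSITION_MS` and `formant_hz` are hypothetical.

```python
# Steady-state ("vowel") formant frequencies shared by all three syllables, in Hz.
STEADY_STATE_HZ = {"F1": 817.0, "F2": 1181.0, "F3": 2632.0}

# Formant onset frequencies in Hz, as stated in the Methods.
# Assumption: /ba/ shares the 200 Hz F1 onset of /da/.
ONSET_HZ = {
    "da": {"F1": 200.0, "F2": 1835.0, "F3": 3439.0},
    "ba": {"F1": 200.0, "F2": 621.0, "F3": 2029.0},
    "ta": {"F1": 526.0, "F2": 1835.0, "F3": 3439.0},
}

# Transition durations in ms; applying F1 = 30 ms to every syllable is an assumption.
TRANSITION_MS = {"F1": 30.0, "F2": 40.0, "F3": 40.0}


def formant_hz(syllable: str, formant: str, t_ms: float) -> float:
    """Centre frequency of one formant at t_ms after consonant release.

    Hypothetical helper: the formant glides linearly from its onset value to
    the steady-state value over its transition, then stays flat.
    """
    start = ONSET_HZ[syllable][formant]
    end = STEADY_STATE_HZ[formant]
    duration = TRANSITION_MS[formant]
    if t_ms <= 0.0:
        return start
    if t_ms >= duration:
        return end
    return start + (end - start) * t_ms / duration


if __name__ == "__main__":
    # /da/ and /ba/ differ only in F2 and F3, and only during the first 40 ms.
    for t in (0.0, 20.0, 40.0, 100.0):
        gap = formant_hz("da", "F2", t) - formant_hz("ba", "F2", t)
        print(f"t = {t:5.1f} ms  F2(/da/) - F2(/ba/) = {gap:7.1f} Hz")
```

Under these assumptions, the F2 difference between /da/ and /ba/ shrinks linearly from 1214 Hz at release to 0 Hz at 40 ms. This is the only stretch of the stimuli in which the two syllables differ acoustically.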
Fig. 1. Physical characteristics of the syllables used in this study. The upper half of the figure depicts the center formant frequencies of the syllables vs time. /Da/ and /ba/ differed in their second and third formant transitions, which lasted 40 ms. /Ta/ differed from /da/ by an increase in the VOT from 0 to 80 ms (vertical dotted line), and by a frequency cutback in the first formant transition, which lasted 30 ms. The lower half of the figure depicts the sound pressure waves of the syllables. The periodic waveforms of /da/ and /ba/ are very similar, and differ predominantly during the formant transitions. The waveform of /ta/ differs from the others by containing an initial aperiodic segment, corresponding to the 80 ms VOT. The initial 20 ms of the waveforms are also shown separately, illustrating these differences.

These stimuli were digitized at a sampling rate of 12 kHz and transferred with a timing pulse to a tape recorder for presentation. The final versions were highly discriminable by human listeners. Stimuli were delivered to the subjects via dynamic headphones, which were positioned snugly against the animals' ears. Care was taken to ensure that field artifacts did not contaminate the neural responses. All stimuli were presented to the ear contralateral to the recording sites at an interstimulus interval of 658 ms, at a peak intensity of 80 dB SPL.

Recording sessions were conducted in a sound-attenuated chamber and lasted about 2 h. The animals, seated in a primate chair with head fixed and upper limbs restrained, maintained a relaxed but alert state. The dura was pierced with a sharpened probe, and the electrodes were introduced into the brain through the lumens of the guide tubes using a microdrive. Positioning of the electrodes was guided by on-line analysis of the intracortical depth patterns in the click-evoked AEP. Presentation of the speech sounds began when the multicontact electrode array straddled the plane of inversion of the early cortical AEP components to click. After the initial presentation of speech sounds, the electrode array was repositioned 150-200 µm deeper, and the stimuli were readministered. This process was continued until the polarity-inverted AEPs showed a major amplitude decrement, indicating that the electrode had completely traversed the auditory cortex. Typically, 200 presentations of each stimulus were used to generate the averages. Epochs containing muscle-related activity or electrical transients were automatically rejected by the averaging program. Fifteen to 35 electrode passes were taken in each monkey over a period of 2-3 months. At the end of the recording series, the animals were painlessly sacrificed and perfused through the aortic arch with glutaraldehyde.
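The MUA derivation described in the Methods can be sketched digitally as below. Its steps are:

1. High-pass filtering above 500 Hz with a 24 dB/octave roll-off.
2. Full-wave rectification.
3. Averaging over about 200 presentations, after automatic rejection of artifact-laden epochs.

This is not the authors' implementation. Their filtering was applied to tape-recorded signals before averaging at 8200 Hz, whereas this sketch works on already-digitized epochs. It uses SciPy's 4th-order Butterworth high-pass, whose asymptotic roll-off is 24 dB/octave. The amplitude-threshold rejection rule and the `REJECT_UV` value are hypothetical stand-ins for the averaging program's unspecified criteria.

```python
import numpy as np
from scipy.signal import butter, sosfilt

FS_HZ = 8200.0        # off-line sampling rate used for MUA averaging in the Methods
HIGHPASS_HZ = 500.0   # MUA band starts above 500 Hz
REJECT_UV = 500.0     # hypothetical artifact threshold in µV (not given in the paper)


def mua_average(epochs, fs=FS_HZ, cutoff=HIGHPASS_HZ, reject=REJECT_UV):
    """Average the multiple unit activity envelope across stimulus presentations.

    epochs: array of shape (n_presentations, n_samples) of wideband potentials
    (µV), each row time-locked to stimulus onset.
    Returns (mean rectified high-pass signal, number of epochs kept).
    """
    epochs = np.asarray(epochs, dtype=float)
    # Hypothetical rejection rule: drop epochs whose raw peak exceeds the threshold,
    # standing in for the averaging program's rejection of muscle/transient artifacts.
    keep = np.max(np.abs(epochs), axis=1) <= reject
    if not np.any(keep):
        raise ValueError("all epochs were rejected")
    # 4th-order Butterworth high-pass: asymptotic roll-off of 24 dB/octave.
    sos = butter(4, cutoff, btype="highpass", fs=fs, output="sos")
    highpassed = sosfilt(sos, epochs[keep], axis=-1)
    # Full-wave rectify each epoch, then average across presentations.
    return np.abs(highpassed).mean(axis=0), int(keep.sum())


if __name__ == "__main__":
    rng = np.random.default_rng(0)
    n_trials, n_samples = 200, 820                    # 200 presentations, 100 ms epochs
    t = np.arange(n_samples) / FS_HZ
    slow_aep = 100.0 * np.sin(2 * np.pi * 10.0 * t)   # low-frequency "AEP" component
    epochs = rng.normal(0.0, 1.0, (n_trials, n_samples)) + slow_aep
    epochs[:, 200:300] += 50.0 * np.sin(2 * np.pi * 2000.0 * t[200:300])  # unit burst
    epochs[0, 400] = 1000.0                           # one artifact-laden epoch
    envelope, n_kept = mua_average(epochs)
    print(n_kept, envelope[210:300].mean() / envelope[500:800].mean())
```

Rectifying before averaging matters here. Spike activity that is not phase-locked to the stimulus would cancel in a plain average. After rectification it survives as a positive envelope, while the slow AEP is removed by the high-pass stage.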
Tissue was blocked, sectioned, and histologically examined to reconstruct the electrode tracks and to identify selected recording sites, which had been marked by iron deposition.

RESULTS

The data presented are based on the analysis of 70 electrode passes taken through auditory koniocortex and the surrounding auditory fields at locations where MUA to click stimulation was observed. Although speech sounds elicit responses in all auditory fields, the activity is less widespread than that evoked by clicks. Syllable-evoked responses occur in 68% of the sites activated by clicks, whereas over 97% of sites responsive to syllables are also activated by clicks.
Differentiation of thalamocortical axons from cortical cells

The MUA recorded within the cortex contains discharges from thalamocortical axons and cortical cells. Contributions from each of these sources can be differentiated on the basis of three criteria:
(a) onset latency to click stimulation;
(b) the distribution of MUA within and below the superior temporal plane (STP);
(c) the spatial and temporal relationships of MUA to click-evoked AEP components.

At 80 dB SPL, responses with onset latencies of 4.3-7.0 ms originate from thalamocortical fibers. This activity represents the earliest responses within the STP. It can be recorded throughout the middle and lower cortical laminae, as well as in the white matter beneath the auditory cortex. It occurs prior to the onset of the cortical AEP components, and is coincident with a positive potential that precedes the initial cortical activity.

MUA originating from cortical cells possesses different characteristics. It is confined to the gray matter of the auditory cortex, and begins within 1 ms after the onset of the initial cortical AEP component. This activity is maximal within laminae III and IV, the intracortical depths where the inverting AEP components undergo their greatest rate of change in amplitude. Frequently, MUA representing responses of cortical neurons is preceded at the same site by thalamocortical activity. At these locations, it is difficult to ascertain the relative contributions of each of these neural populations to the responses.
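Latency criterion (a) and AEP-timing criterion (c) above can be expressed as a simple decision rule. The sketch below is a hypothetical illustration, not the authors' procedure. Its onset detector finds the first run of samples exceeding the pre-stimulus mean by `k` standard deviations, and both the detector and its parameters are assumptions. The spatial criterion (b), the presence of activity in white matter beneath the STP, is not modelled.

```python
import numpy as np

THALAMOCORTICAL_WINDOW_MS = (4.3, 7.0)  # click-evoked onset latencies at 80 dB SPL
CORTICAL_LAG_MS = 1.0                   # cortical MUA starts within 1 ms of the AEP onset


def onset_latency_ms(t_ms, envelope, k=3.0, min_run=3):
    """First post-stimulus time at which the MUA envelope exceeds the
    pre-stimulus mean + k * SD for min_run consecutive samples, else None.
    The detector and its parameters are assumptions, not from the paper."""
    t_ms = np.asarray(t_ms, dtype=float)
    envelope = np.asarray(envelope, dtype=float)
    baseline = envelope[t_ms < 0.0]
    threshold = baseline.mean() + k * baseline.std()
    above = (envelope > threshold) & (t_ms >= 0.0)
    run = 0
    for i, flag in enumerate(above):
        run = run + 1 if flag else 0
        if run == min_run:
            return float(t_ms[i - min_run + 1])
    return None


def classify_source(mua_onset_ms, aep_onset_ms):
    """Hypothetical rule combining criteria (a) and (c) for click-evoked MUA."""
    if mua_onset_ms is None:
        return "no response"
    lo, hi = THALAMOCORTICAL_WINDOW_MS
    # Thalamocortical: onset in the 4.3-7.0 ms window and before the cortical AEP.
    if lo <= mua_onset_ms <= hi and mua_onset_ms < aep_onset_ms:
        return "thalamocortical"
    # Cortical: onset within 1 ms after the initial cortical AEP component.
    if aep_onset_ms <= mua_onset_ms <= aep_onset_ms + CORTICAL_LAG_MS:
        return "cortical"
    return "unclassified"
```

As the text notes, sites where cortical MUA is preceded by thalamocortical activity cannot be cleanly separated. A rule like this would label such a site by whichever onset it detects first.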
Temporal response patterns reflect speech parameters

Within all auditory cortical fields, the articulatory parameters of VOT and place of articulation are represented in the responses of both thalamocortical axons and cortical cells elicited by /da/, /ba/ and /ta/. Differential activity reflecting these speech parameters occurs within two kinds of response patterns. The first consists of transient responses time-locked to the onset of the voiced and unvoiced speech segments. The second consists of oscillatory responses phase-locked to the syllables' fundamental frequency. Additionally, the phase-locked response pattern reflects the duration of the voiced formant transitions.
Discharge patterns reflect VOT
Discharge patterns of thalamocortical fibers and cortical cells mirror the VOT of the CV syllables. The 'on' response pattern reflects the VOT by displaying phasic activity bursts at the onset of the periodic and aperiodic segments of the syllables (Fig. 2). /Da/ and /ba/ are voiced throughout their duration, and the thalamocortical fiber responses to them consist of single bursts near stimulus onset. MUA to /ta/ differs from the other responses by exhibiting two bursts separated by the 80 ms VOT. The first burst is elicited by the onset of the unvoiced speech segment, while the second is elicited by the onset of the voiced segment. Similar responses reflecting the VOT of alveolar CV syllables are shown in the MUA of thalamocortical axons to /da/ and /ta/ at site A in Fig. 3.

Site B in Fig. 3, 200 µm below A, illustrates the manner in which the phase-locked response pattern reflects syllable VOT. The phase-locked responses to /da/ and /ba/ begin near stimulus onset and continue throughout their duration. In contrast, the oscillatory response to /ta/ begins only after an initial burst time-locked to consonant release, delayed by an interval that equals the VOT. The activity of single units also reflects the VOT in their phase-locked responses to stimulus fundamental frequency (Fig. 4). The PSTHs of this thalamocortical axon exhibit oscillatory responses that continue throughout /da/ and /ba/ and are delayed by the VOT of /ta/. Additionally, the response to /ta/ contains a time-locked burst to the onset of its unvoiced segment and an accentuation of the initial phase-locked response.
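PSTHs like those in Fig. 4 pool spike times from all presentations into fixed bins (280 µs in this study) and report raw counts, matching the Fig. 4 caption's "number of spikes re 200 stimulus presentations". The minimal NumPy sketch below assumes spike times have already been extracted as milliseconds relative to stimulus onset; spike detection itself is not shown. The function name `psth`, its default 0-250 ms window and the toy spike times are illustrative choices.

```python
import numpy as np

BIN_MS = 0.28  # 280 µs bin width, as in the Methods


def psth(trials, window_ms=(0.0, 250.0), bin_ms=BIN_MS):
    """Pooled post-stimulus time histogram.

    trials: sequence of spike-time arrays (ms re stimulus onset), one per presentation.
    Returns (left bin edges in ms, spike counts summed over presentations).
    """
    start, stop = window_ms
    n_bins = int(round((stop - start) / bin_ms))
    edges = start + bin_ms * np.arange(n_bins + 1)
    spikes = (np.concatenate([np.asarray(t, dtype=float) for t in trials])
              if len(trials) else np.empty(0))
    counts, _ = np.histogram(spikes, bins=edges)
    return edges[:-1], counts


if __name__ == "__main__":
    # Toy /ta/-like pattern: spikes at consonant release, then firing that
    # resumes after the 80 ms VOT (made-up values for illustration only).
    trials = [[1.0, 81.0], [81.1], [1.05, 90.0]]
    left, counts = psth(trials)
    for i in np.flatnonzero(counts):
        print(f"{left[i]:7.2f} ms: {counts[i]} spikes")
```

Because counts are summed rather than divided by the number of presentations, histograms built this way are directly comparable only when the number of presentations is the same. That holds for the 200-presentation PSTHs shown here.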
Discharge patterns reflect place of articulation

Activity of thalamocortical axons and cortical cells also reflects the consonants' place of articulation. This is manifested by discharges occurring selectively to the formant transition regions of either the bilabial or the alveolar consonants. The 'on' response pattern demonstrating this consonant differentiation is illustrated in the thalamocortical axon MUA of Fig. 5. The alveolar CV syllables, /da/ and /ta/, elicit short-latency responses, while the bilabial CV syllable, /ba/, does not. Responses at site A in Fig. 3 display a similar differentiation. Phase-locked response patterns reflect the place of articulation by exhibiting selective discharges to the periodic stimulus peaks occurring in the formant transitions. The PSTHs of a single cortical afferent fiber illustrate this response (Fig. 6). The first phase-locked peak in the response to /da/ has no corresponding peak in the response to /ba/ (arrows), whereas the remainder of the activity is very similar. Thus, the responses are differentiated when the acoustic parameters of /da/ and /ba/ are most dissimilar, but not when the stimuli are identical.

Fig. 2. MUA from a single location to clicks and the 3 syllables. Response latency to clicks indicates that the activity emanates from thalamocortical axons. 'On' responses to the syllables reflect their VOT. Responses to /da/ and /ba/, the stimuli with 0 ms VOTs, contain single bursts near stimulus onset. The discharges evoked by /ta/ differ from the other responses by containing two bursts separated by the 80 ms VOT. The first burst is elicited by consonant release, the second by voicing onset. Arrows mark stimulus onset.

Fig. 3. Thalamocortical axon MUA from two sites separated vertically by 200 µm and recorded during a single electrode pass. Activity at site A exhibits 'on' responses that reflect VOT and place of articulation. The response to /da/ is a single burst evoked by voicing onset. The response to /ta/ consists of two bursts separated by 80 ms and time-locked to consonant release and voicing onset. Unlike the alveolar CV syllables, /da/ and /ta/, the bilabial CV syllable, /ba/, fails to elicit a response, thus exhibiting differentiated activity that reflects the consonant place of articulation. Activity at site B contains phase-locked responses to the syllable fundamental frequency that also reflect the VOT. The phase-locked responses to /da/ and /ba/ are maintained throughout the stimuli. The periodic activity to /ta/, in contrast, follows a burst at stimulus onset after an interval that equals the VOT. Just as two thalamocortical axon volleys occur in the click-evoked MUA at site B, two phase-locked response peaks are evoked by each pitch period of the syllables.

Fig. 5. Place of articulation reflected in MUA from thalamocortical fibers recorded at a single site. The alveolar CV syllables, /da/ and /ta/, elicit short-latency 'on' responses, while the bilabial CV syllable, /ba/, does not.

Discharge patterns reflect formant transition duration

The duration of formant transitions is one further acoustic parameter that is reflected in responses to voiced CV syllables. It is manifested by an increased amplitude of the phase-locked response to the stimulus peaks during the formant transitions. Fig. 7 depicts the simultaneously recorded activity of thalamocortical fibers separated vertically by 1.2 mm, and illustrates this finding. The response to /ba/ at site A consists of 3 peaks phase-locked to the initial peaks of the stimulus. These peaks demarcate the duration of the first formant transition. A similar response is seen at site B to /da/, where the first 4 phase-locked peaks are much larger than the remaining peaks. These peaks demarcate the period of the second and third formant transitions. Besides being sensitive to the voiced formant transitions, the responses are also differentiated according to the consonants' place of articulation. At site A, the initial response to /ba/ is much larger than that elicited by /da/ and /ta/. Activity at site B is also sensitive to the place of articulation. In this case, however, the locus is more responsive to the acoustic parameters of /da/ during the formant transitions than to those of /ba/.
Activity of thalamocortical fibers and cortical cells differs

Under the experimental conditions of this study, cortical cells appear to be less responsive to speech sounds than their thalamic input. In 15% of the electrode penetrations through auditory cortex, MUA of thalamocortical fiber origin was identified without responses of the adjacent cortical cells. Temporal response patterns of thalamocortical fibers and cortical cells also differ:

- Phase-locked activity to the syllable fundamental frequency is more prominent in thalamocortical axons. Of syllable-evoked thalamocortical fiber responses, 44% are phase-locked to stimulus periodicity, compared with 20% of cortical responses. Moreover, when cortical phase-locked activity is present, it is usually less robust than that of thalamocortical axons.
- Sustained activity, on the other hand, is more pronounced in cortical MUA. Whereas 20% of cortical responses are sustained, only 3% of thalamocortical axon responses display this pattern.
- 'On' responses are only slightly more common in cortical cells (52% vs 45%). Qualitatively, however, the cortical 'on' responses are of longer duration and dominate the total responses to a greater degree than the 'on' responses of thalamocortical axons.
- 'Off' responses are equally probable in both populations (8%).

Fig. 4. Single unit responses reflect the VOT and phase-lock to the syllable fundamental frequency. PSTHs recorded from a single thalamocortical fiber. Periodic activity occurs throughout /da/ and /ba/, and is delayed by the VOT for /ta/. An 'on' response to consonant release initiates the activity to /ta/. Note that the first phase-locked peak is present to both /da/ and /ba/ (arrows, compare with Fig. 6). Number of spikes re 200 stimulus presentations.

Fig. 6. Place of articulation is reflected in the phase-locked responses of this single thalamocortical axon. The first peak in the PSTH to /da/ has no analogous peak in the response to /ba/ (arrows, compare with Fig. 4), whereas the remaining peaks are very similar. Responses also reflect stimulus VOT. Note the pronounced reduction in firing between the response peaks.

Fig. 7. MUA from thalamocortical axons at two sites separated vertically by 1.2 mm and recorded simultaneously. Duration of voiced formant transitions is reflected by accentuation of phase-locked responses during this period. The response to /ba/ at site A contains 3 peaks phase-locked to the initial peaks of the stimulus, demarcating the duration of the first formant transition. Similar activity is seen in the response to /da/ at site B. There, the first 4 peaks are much larger than the remaining activity and delineate the duration of the higher formant transitions. Activity is also differentiated according to place of articulation. At site A, the response to /ba/ is greater than those to /da/ and /ta/. At site B, the early activity to /da/ is greater than that to /ba/. Later phase-locked activity to the 3 speech sounds differs between the voiced and unvoiced CV syllables, illustrating that different consonants can alter responses to identical vowel portions of syllables.

Fig. 8, which depicts MUA at 4 progressively
deeper sites in an electrode pass through the posterior margin of auditory koniocortex, illustrates these differences. Sites A and D, as well as B and C, were recorded simultaneously.

- Site A: syllable-elicited responses of cortical cells located in lamina III display weak phase-locking to the stimulus periodicity. This oscillatory activity is superimposed upon prominent 'on' and sustained responses. Separation between the first and third response peaks to /ta/ reflects the delayed VOT. 'Off' responses are also present.
- Site B: the cortical MUA, located in upper lamina IV 200 µm below A, exhibits low-amplitude 'on' and sustained responses that only weakly reflect the VOT.
- Site C: thalamocortical axon MUA in lower lamina IV, 400 µm below B, consists of robust phase-locked responses. These reflect the VOT and occur 5.0 ms before the corresponding peaks at site A. Except for the burst to consonant release of /ta/, no 'on', sustained, or 'off' responses are seen.

Fig. 8. MUA at 4 progressively deeper sites in an electrode pass through posterior auditory koniocortex illustrates response pattern differences between thalamocortical fibers and cortical cells (see text for description). Thalamocortical axon responses at site D reflect consonant place of articulation, as the early response to /ba/ is larger than those to /da/ and /ta/. Activity during the later, acoustically identical portions of /a/ also differs across syllables, illustrating effects of preceding consonants on responses to the following vowel. Note the double-peaked response at site C to clicks and to each pitch period of the syllables.
Discharge patterns are influenced by preceding stimulus features

Fig. 8 also illustrates that responses to identical acoustic regions of the syllables can be influenced by the VOT and place of articulation of preceding consonants. This is observed in the later periodic portions of the syllables, where the acoustic parameters of /da/, /ba/ and /ta/ are the same. The thalamocortical axon MUA at site D, located in the white matter 600 µm below C, displays phase-locking to the fundamental frequency of /ba/. This response is greatly attenuated for /da/, with almost no response to the periodic portion of /ta/. The responses at site D were recorded simultaneously with those at site A, where activity is elicited by all 3 syllables. Differences due to changes in responsiveness over time can therefore be ruled out. Physically identical periodic events thus do not necessarily elicit the same phase-locked responses. Instead, the responses are influenced by the place of articulation and voicing parameters immediately preceding them. Similar effects are seen at site B in Fig. 7.
Speech parameters are processed in parallel

Responses reflecting the speech parameters of VOT, place of articulation, and voiced formant transition duration occur at multiple loci. These loci lie in the rostral portions of the thalamocortical radiations and in the auditory cortex. This distribution suggests that acoustic speech parameters are encoded and processed in parallel by multiple neuronal elements. The parallel encoding is clearly expressed in the responses of thalamocortical axons, which function as parallel lines of input into the cortex. The phase-locked responses of thalamocortical axons illustrated in this report were recorded from one monkey. The recording sites encompassed over 21 mm² of white matter beneath the koniocortex, prokoniocortex, and lateral and caudal parakoniocortex.

The responses also vary in their timing, suggesting that the parallel processing does not involve synchronous activation of the thalamic and cortical neurons. Temporal dispersion is observed in the MUA at site B of Fig. 3, whose double-peaked response to click contains two discrete thalamocortical axon volleys. Similarly, two phase-locked response peaks occur to each syllable pitch period, with average latencies of 8.2 ± 0.7 ms and 11.0 ± 1.0 ms after the syllables' rarefaction peaks. A similar response occurs in the thalamocortical axon MUA at site C in Fig. 8. The isolated thalamocortical axons of Figs. 4 and 6 also display phase-locked responses, with mean latencies of 7.7 ± 0.9 ms and 10.8 ± 0.7 ms, respectively. As these single unit responses also reflect VOT and place of articulation, it follows that these parameters are likewise encoded along parallel, but non-synchronous, channels.

DISCUSSION

The results of this study demonstrate that stimulus parameters that play a role in the differential perception of stop CV syllables are expressed in the temporal patterning of activity within the auditory radiations and cortex. These are sites necessary for speech decoding. Perceptually significant parameters that are reflected in the neural responses include fundamental frequency, VOT, place of articulation, and voiced formant transition duration. Temporal activity patterns expressing these linguistic parameters are determined by the configuration of the syllables' sound pressure waves. Acoustic transients are accentuated by the neural responses, thus reflecting both syllable fundamental frequency and VOT. Differential frequency components present in the initial segments of the syllables modify these responses to reflect consonant place of articulation and voiced formant transition duration.

The temporal pattern of responses in medial geniculate (MG) and auditory cortical cells to animal vocalizations has previously been shown to depend on the pattern of acoustic transientsl0,al, 50. The present study confirms these findings and extends this line of research to human speech sounds, for which the acoustic features involved in perceptual discrimination have been demonstrated. Thus, by utilizing human vocalizations, it is possible to examine correlations between perceptually relevant acoustic parameters and the evoked activity in the auditory system.
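The Results quote latencies for the two phase-locked peaks per pitch period: 8.2 ± 0.7 ms and 11.0 ± 1.0 ms after the rarefaction peaks. These summarize delays from stimulus peaks to response peaks. The "±" is presumably a standard deviation; the paper does not say. The minimal sketch below assumes that stimulus and response peak times have already been extracted. It assigns each response component by a latency window, and the windows used here are hypothetical. The paper does not describe its measurement procedure, and the peak times in the example are made up.

```python
from statistics import mean, stdev


def peak_latencies(stim_peaks_ms, resp_peaks_ms, window_ms):
    """For each stimulus rarefaction peak, return the shortest delay to a
    response peak whose delay lies within window_ms (inclusive).
    The windowing rule is an assumption, not the paper's method."""
    lo, hi = window_ms
    latencies = []
    for s in stim_peaks_ms:
        delays = [r - s for r in resp_peaks_ms if lo <= r - s <= hi]
        if delays:
            latencies.append(min(delays))
    return latencies


def summarize(latencies):
    """Mean and sample standard deviation of a list of latencies (ms)."""
    return mean(latencies), stdev(latencies)


if __name__ == "__main__":
    stim = [0.0, 10.0, 20.0, 30.0]               # made-up rarefaction peaks
    resp = [8.0, 11.0, 18.4, 21.2, 28.2, 40.8]   # made-up MUA peaks
    for name, window in (("early", (6.5, 9.5)), ("late", (9.5, 12.5))):
        m, sd = summarize(peak_latencies(stim, resp, window))
        print(f"{name}: {m:.1f} +/- {sd:.1f} ms")
```

A window-based assignment is needed because the later component can exceed one pitch period. Simply pairing each response peak with the most recent stimulus peak would then mislabel it.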
Stimulus fundamental frequency, an acoustic feature which participates in the differentiation of voiced from unvoiced consonants27, is reflected by responses phase-locked to the amplitude-modulated (AM) structure of the syllables, and has previously been reported in cortical AEPs and MUA az. This study confirms the previous findings and establishes that phase-locking to speech sounds occurs in both
isolated thalamocortical fibers and cortical MUA. Temporally periodic signals can elicit low-frequency pitch percepts equal to the frequency of periodicity. Examples include AM white noise 6, click trainsla, 14, and electrically pulsed stimulation of the auditory nerve 2a. It is therefore possible that the temporal encoding of speech sound fundamental frequency plays a role in the pitch perception of human speech.

VOT, a major cue for distinguishing voiced from unvoiced consonants27,2s, 44, is reflected in responses time-locked to three kinds of transient: consonant release, voicing onset, and the peaks in the syllables' AM structure. These response patterns are seen throughout the auditory system. This demonstrates that the encoding of temporally significant speech sound features is initiated in peripheral response patterns and maintained as the speech signal is processed at progressively more rostral auditory centers. In auditory nerve fibers, voiced CV syllables elicit phase-locked responses to their AM structure that persist throughout their duration 11. For unvoiced CV syllables, phase-locking is delayed by the VOT interval and preceded by a burst to the consonant release2L. Similar patterns are seen in the inferior colliculus 47, 4s and, as this report demonstrates, in thalamocortical fibers and cortical cells as well.

Consonant place of articulation is reflected by differential responses occurring to the formant transitions. This is the segment of speech that both acoustically and perceptually distinguishes the stop consonants varying along this phonetic dimensionT,16,17,25,43. 'On' responses may be present for one syllable and not the other. Likewise, the initial phase-locked response peaks may be present for one syllable and not the other. The only acoustic differences between /da/ and /ba/ are the frequency characteristics of the second and third formant transitions. The differential responses must therefore be due to these different frequency components.
Our data do not allow us to state whether it is the differing spectral content or the direction of frequency change in the formant transitions of /da/ and /ba/, or both, that determines the discriminative consonant responses. Prior work has shown that responses of MG and auditory cortical units are sensitive to both stimulus parameters when incorporated into pure and frequency-modulated tones1,5,15, and into animal vocalizations10,46.
Duration of voiced formant transitions, a factor involved in differentiating stop consonants from semivowels and vowels26, is also reflected in responses time-locked to the acoustic transients of the syllables. Phase-locked response peaks during the formant transitions are accentuated in amplitude relative to the peaks occurring in the steady-state portions of the syllables. This pattern has previously been reported for speech-like signals in the cochlear nucleus of the guinea pig34.

Acoustic events related to VOT and place of articulation of stop consonants in syllable-initial position alter the activity evoked by the following vowel, as responses to identical segments of /a/ differ according to the preceding consonant. A similar effect has been seen in the rat cochlear nucleus, where single unit activity to vowel sounds depends upon the vowel's position relative to others in a stimulus train37. These findings suggest that excitatory and inhibitory synaptic events evoked by phonemes with specific acoustic parameters persist long enough to bias the output to later speech sounds. Similar effects are seen in human acoustic perception. Vowels coarticulated with consonants are more intelligible than isolated vowel sounds, even though vowel formant frequencies and durations in the steady state are relatively constant in both conditions45. Non-target variations in preceding stop consonants interfere with perception of following vowels in a two-choice selective attention task53.
Using the same experimental technique, it has been shown that, in general, non-target acoustic manipulations affect the perception of other acoustic and phonetic parameters4,31,51,52. The interference by non-target acoustic variables with the processing of acoustic and phonetic target parameters, as well as the improvement in reaction time when both dimensions covaried redundantly as targets, led Wood51 to propose a hybrid serial-parallel processing model for acoustic stimuli, wherein an initial acoustic analyzer projects its output to parallel systems for further acoustic and phonetic analysis. Within the framework of this model, the recording of speech-evoked responses in non-human primate thalamocortical fibers and cortical cells samples the activity of this initial analyzer, furnishing an indication of the type of information
which would be imparted to the phonetic analyzer in the human. If these assumptions are valid, then this initial acoustic analysis has already transformed the neural responses to speech into a form which reflects linguistic parameters. In the present study, cortical cells were found to be less responsive to speech sounds than thalamocortical axons. This finding might be due to the passive stimulation paradigm employed here, which has been shown to lead to weaker and more labile auditory cortical activation when compared to responses of cortical units in monkeys performing an auditory discrimination task32. This possibility dictates caution in interpreting the informational significance of activity patterns in sensory cortex of nonbehaving animals. The temporal arrangement of speech-evoked activity represents just one organizational constraint placed upon the neural encoding of speech sounds in sub-human primates. A cardinal feature of our findings is that processing of phonetic stimuli occurs in parallel over extensive regions of the STP. Temporal response patterns to speech, as well as their correlations with phonetic parameters, must therefore be examined in the context of the cytoarchitectonic and tonotopic organizations of the MG and auditory cortex. Furthermore, it is important to examine the effects on cortical responses when speech sounds are used as conditional cues for differential behavioral activity.

ACKNOWLEDGEMENTS

The authors gratefully acknowledge the technical assistance of J. Barna and C. Freeman, and thank T. Halwes of Haskins Laboratories for his aid in constructing the speech stimuli. This research was supported in part by NIH Training Grant 5T32GM7288 from NIGMS and Grants HD 01799 and MH 06723 from the USPHS.

REFERENCES

1 Allon, N., Yeshurun, Y. and Wollberg, Z., Responses of single cells in the medial geniculate body of awake squirrel monkeys, Exp. Brain Res., 41 (1981) 222-232.
2 Barna, J., Arezzo, J. and Vaughan, H. G., Jr., A new multielectrode array for the simultaneous recording of field potentials and unit activity, Electroenceph. clin. Neurophysiol., 52 (1981) 494-496.
3 Barrett, A., A case of pure word-deafness with autopsy, J. nerv. ment. Dis., 37 (1910) 73-92.
4 Blechner, M., Day, R. and Cutting, J., Processing two dimensions of nonspeech stimuli: the auditory-phonetic distinction reconsidered, J. exp. Psychol., Human Percept. Perform., 2 (1976) 257-266.
5 Brugge, J. and Merzenich, M., Responses of neurons in auditory cortex of the macaque monkey to monaural and binaural stimulation, J. Neurophysiol., 36 (1973) 1138-1158.
6 Burns, E. and Viemeister, N., Nonspectral pitch, J. acoust. Soc. Amer., 60 (1976) 863-869.
7 Chang, S. and Blumstein, S., The role of onsets in perception of stop place of articulation: effects of spectral and temporal discontinuity, J. acoust. Soc. Amer., 70 (1981) 39-44.
8 Cole, R. and Scott, B., The phantom in the phoneme: invariant cues for stop consonants, Percept. Psychophys., 15 (1974) 101-107.
9 Cooper, F., Acoustics in human communication: evolving ideas about the nature of speech, J. acoust. Soc. Amer., 68 (1980) 18-21.
10 Creutzfeldt, O., Hellweg, F.-C. and Schreiner, C., Thalamocortical transformation of responses to complex auditory stimuli, Exp. Brain Res., 39 (1980) 87-104.
11 Delgutte, B., Representation of speech-like sounds in the discharge patterns of auditory-nerve fibers, J. acoust. Soc. Amer., 68 (1980) 843-857.
12 Dewson, J., Pribram, K. and Lynch, J., Effects of ablations of temporal cortex upon speech sound discrimination in the monkey, Exp. Neurol., 24 (1969) 579-591.
13 Flanagan, J. and Guttman, N., On the pitch of periodic pulses, J. acoust. Soc. Amer., 32 (1960) 1308-1319.
14 Flanagan, J. and Guttman, N., Pitch of periodic pulses without fundamental component, J. acoust. Soc. Amer., 32 (1960) 1319-1328.
15 Funkenstein, H. and Winter, P., Responses to acoustic stimuli of units in the auditory cortex of awake squirrel monkeys, Exp. Brain Res., 18 (1973) 464-488.
16 Halle, M., Hughes, G. and Radley, J.-P., Acoustic properties of stop consonants, J. acoust. Soc. Amer., 29 (1957) 107-116.
17 Harris, K., Hoffman, H., Liberman, A., Delattre, P. and Cooper, F., Effects of third-formant transitions on the perception of the voiced stop consonants, J. acoust. Soc. Amer., 30 (1958) 122-126.
18 Hupfer, K., Jürgens, U. and Ploog, D., The effect of superior temporal lesions on the recognition of species-specific calls in the squirrel monkey, Exp. Brain Res., 30 (1977) 75-87.
19 Jerger, J., Weikers, N., Sharbrough, F. and Jerger, S., Bilateral lesions of the temporal lobe: a case study, Acta oto-laryng., Suppl. 258 (1969) 1-51.
20 Kanshepolsky, J., Kelly, J. and Waggener, J., A cortical auditory disorder: clinical, audiologic and pathologic aspects, Neurology, 23 (1973) 699-705.
21 Kertesz, A., Lesk, D. and McCabe, P., Isotope localization of infarcts in aphasia, Arch. Neurol., 34 (1977) 590-601.
22 Kiang, N. Y. S. and Moxon, E., Physiological considerations in artificial stimulation of the inner ear, Ann. Otol., 81 (1972) 714-730.
23 Kuhl, P. and Miller, J., Speech perception by the chinchilla: identification functions for synthetic VOT stimuli, J. acoust. Soc. Amer., 63 (1978) 905-917.
24 Kuhl, P., Discrimination of speech by nonhuman animals: basic auditory sensitivities conducive to the perception of speech-sound categories, J. acoust. Soc. Amer., 70 (1981) 340-349.
25 Liberman, A., Delattre, P., Cooper, F. and Gerstman, L., The role of consonant-vowel transitions in the perception of the stop and nasal consonants, Psychol. Monogr., 68 (1954) 1-13.
26 Liberman, A., Delattre, P., Gerstman, L. and Cooper, F., Tempo of frequency change as a cue for distinguishing classes of speech sounds, J. exp. Psychol., 52 (1956) 127-137.
27 Massaro, D. and Cohen, M., The contribution of fundamental frequency and voice onset time to the /zi/-/si/ distinction, J. acoust. Soc. Amer., 60 (1976) 704-717.
28 Massaro, D. and Oden, G., Evaluation and integration of acoustic features in speech perception, J. acoust. Soc. Amer., 67 (1980) 996-1013.
29 Merzenich, M., Michelson, R., Pettit, C., Schindler, R. and Reid, M., Neural encoding of sound sensation evoked by electrical stimulation of the acoustic nerve, Ann. Otol., 82 (1973) 486-502.
30 Michel, F. and Peronnet, F., A case of cortical deafness: clinical and electrophysiological data, Brain and Language, 10 (1980) 367-377.
31 Miller, J. L., Interactions in processing segmental and suprasegmental features of speech, Percept. Psychophys., 24 (1978) 175-180.
32 Miller, J. M., Sutton, D., Pfingst, B., Ryan, A. and Beaton, R., Single cell activity in the auditory cortex of rhesus monkeys: behavioral dependency, Science, 177 (1972) 449-451.
33 Moore, T. and Cashin, J., Jr., Response patterns of cochlear nucleus neurons to excerpts from sustained vowels, J. acoust. Soc. Amer., 56 (1974) 1565-1576.
34 Moore, T. and Cashin, J., Jr., Response of cochlear nucleus neurons to synthetic speech, J. acoust. Soc. Amer., 59 (1976) 1443-1449.
35 Mott, F., Bilateral lesion of the auditory cortical centre: complete deafness and aphasia, Brit. med. J., 2 (1907) 310-315.
36 Oppenheimer, D. and Newcombe, F., Clinical and anatomic findings in a case of auditory agnosia, Arch. Neurol., 35 (1978) 712-719.
37 Rupert, A., Caspary, D. and Moushegian, G., Response characteristics of cochlear nucleus neurons to vowel sounds, Ann. Otol., 86 (1977) 37-48.
38 Sachs, M. and Young, E., Encoding of steady-state vowels in the auditory nerve: representation in terms of discharge rate, J. acoust. Soc. Amer., 66 (1979) 470-479.
39 Sachs, M. and Young, E., Effects of nonlinearities on speech encoding in the auditory nerve, J. acoust. Soc. Amer., 68 (1980) 858-875.
40 Sinnott, J., Beecher, M., Moody, D. and Stebbins, W., Speech sound discrimination by monkeys and humans, J. acoust. Soc. Amer., 60 (1976) 687-695.
41 Sovijärvi, A., Detection of natural complex sounds by cells in the primary auditory cortex of the cat, Acta physiol. scand., 93 (1975) 318-335.
42 Steinschneider, M., Arezzo, J. and Vaughan, H. G., Jr., Phase-locked cortical responses to a human speech sound and low-frequency tones in the monkey, Brain Research, 198 (1980) 75-84.
43 Stevens, K. and Blumstein, S., Invariant cues for place of articulation in stop consonants, J. acoust. Soc. Amer., 64 (1978) 1358-1368.
44 Stevens, K. and Klatt, D., Role of formant transitions in the voiced-voiceless distinction for stops, J. acoust. Soc. Amer., 55 (1974) 653-659.
45 Strange, W., Verbrugge, R., Shankweiler, D. and Edman, T., Consonant environment specifies vowel identity, J. acoust. Soc. Amer., 60 (1976) 213-224.
46 Symmes, D., Alexander, G. and Newman, J., Neural processing of vocalizations and artificial stimuli in the medial geniculate body of squirrel monkey, Hear. Res., 3 (1980) 133-146.
47 Watanabe, T. and Sakai, H., Responses of the collicular auditory neurons to human speech, Proc. Jap. Acad., 49 (1973) 291-296.
48 Watanabe, T. and Sakai, H., Responses of the cat's collicular auditory neuron to human speech, J. acoust. Soc. Amer., 64 (1978) 333-337.
49 Waters, R. and Wilson, W., Jr., Speech perception by rhesus monkeys: the voicing distinction in synthesized labial and velar stop consonants, Percept. Psychophys., 19 (1976) 285-289.
50 Wollberg, Z. and Newman, J., Auditory cortex of squirrel monkey: response patterns of single cells to species-specific vocalizations, Science, 175 (1972) 212-214.
51 Wood, C., Parallel processing of auditory and phonetic information in speech perception, Percept. Psychophys., 15 (1974) 501-508.
52 Wood, C., Auditory and phonetic levels of processing in speech perception: neurophysiological and information-processing analyses, J. exp. Psychol., 104 (1975) 3-20.
53 Wood, C. and Day, R., Failure of selective attention to phonetic segments in consonant-vowel syllables, Percept. Psychophys., 17 (1975) 346-350.
54 Young, E. and Sachs, M., Representation of steady-state vowels in the temporal aspects of the discharge patterns of populations of auditory nerve fibers, J. acoust. Soc. Amer., 66 (1979) 1381-1403.