Hearing Research 165 (2002) 68–84 www.elsevier.com/locate/heares
Temporal integration in the human auditory cortex as represented by the development of the steady-state magnetic field
Bernhard Roß a,b,*, Terence W. Picton b, Christo Pantev b
a Institute of Experimental Audiology, Münster University Hospital, Münster, Germany
b Rotman Research Institute for Neuroscience, Baycrest Centre for Geriatric Care, 3560 Bathurst Street, Toronto, ON, Canada M6A 2E1
Received 20 August 2001; accepted 18 December 2001
Abstract
The threshold for detecting amplitude modulation (AM) decreases with increasing duration of the AM sound up to several hundred milliseconds. If the auditory evoked steady-state response (SSR) to AM sound is an electrophysiological correlate of AM processing in the human brain, the development of the SSR should follow this course of temporal integration. Magnetoencephalographic recordings of the SSR to 40 Hz AM tone-bursts were compared with responses to non-modulated tone-bursts at inter-stimulus intervals (ISIs) of 3, 1, and 0.5 s. Both types of stimuli elicited a transient gamma-band response (GBR), an N1 wave, and a sustained field (SF) during stimulus presentation. The AM stimulus evoked an additional 40 Hz SSR. The N1 amplitude was strongly reduced with shortened ISI, whereas the amplitudes of SSR, GBR, and SF were little affected by the ISI. Magnetic source-localization procedures estimated the generators of the early GBR, the SSR, and the SF to be anterior and medial to the sources of the N1. The sources of the SSR were in primary auditory cortex and separate from GBR sources. The SSR amplitude increased monotonically over a 200 ms period beginning about 40 ms after stimulus onset. The time course of the SSR phase reliably measured the duration of this transition to the steady state. At stimulus offset the SSR ceased within 50 ms. These results indicate that the primary auditory cortex responds immediately to stimulus changes and integrates stimulus features over a period of about 200 ms. © 2002 Elsevier Science B.V. All rights reserved.
Key words: Temporal integration; Auditory steady-state response; Primary auditory cortex; Amplitude modulation; Middle latency response; Gamma band
* Corresponding author. Tel.: +1 (416) 785 2500 ext. 3387; Fax: +1 (416) 785 2862. E-mail address: [email protected] (B. Roß).

1. Introduction
In normal hearing subjects the auditory threshold decreases with increasing stimulus duration (Hughes, 1946). For pure tones the change in hearing threshold shows an exponential characteristic of the form

$$I/I_0 = \frac{1}{1 - e^{-t/\tau}},$$

where $t$ is the time, $I$ the intensity at threshold and $I_0$ the intensity at threshold for a very long stimulus (Plomp and Bouman, 1959). The time constant $\tau$, which characterizes the time course of the process, varies (between 50 and several hundred milliseconds) with the experimental design and the way in which the duration is measured. Longer time constants of temporal integration were found for tones of lower frequencies, e.g. 160 ms for 125 Hz, 83 ms for 1000 Hz and 52 ms for 4000 Hz (Watson and Gengel, 1969). In contrast, Florentine et al. (1988) found neither a significant frequency effect nor the horizontal segment of the temporal integration function that the exponential model describes. From a review of 20 studies of this phenomenon, Gerken et al. (1990) concluded that the power function model given by $I\,t^m = \text{constant}$ gives a better quantitative description. In this equation $I$ and $t$ have the same meaning as in the exponential model. The exponent $m$ is measured as the slope of the change in intensity with duration plotted on log–log coordinates, $m = 1$ corresponding to a 10 dB threshold decrease for a decade (or ten-fold) increase in duration, with normal values between 6 and
0378-5955/02/$ – see front matter © 2002 Elsevier Science B.V. All rights reserved. PII: S0378-5955(02)00285-X
HEARES 3825 16-5-02
B. Roß et al. / Hearing Research 165 (2002) 68–84
10 dB/decade for normal hearing subjects. Both equations indicate that the change in the threshold with increasing duration becomes vanishingly small after several hundred milliseconds, but they differ in the curvature with which the asymptote is approached. The lowered threshold with increasing duration indicates the integration of the auditory signal over time (Florentine et al., 1988). On the other hand, since the human auditory system can detect acoustical signal changes lasting only a few milliseconds, processes must exist that integrate with significantly shorter time constants. The temporal modulation transfer function (TMTF) indicating the time resolution of the auditory system has a time constant of about 2.5 ms (Bacon and Viemeister, 1985; Viemeister, 1979). The time constants necessary for temporal integration and time resolution differ by two orders of magnitude. This discrepancy was termed the 'resolution–integration' paradox by de Boer (1985). Viemeister and Wakefield (1991) attempted to resolve this paradox with a model of 'multiple looks'. They proposed that a small integration time constant of about 3 ms can explain the large temporal integration time constant of several hundred milliseconds if the auditory system samples the input signal at a high rate and stores these samples in a short-term memory. These samples can be accessed in parallel and processed selectively. Viemeister and Wakefield (1991) compared the detection threshold for a pair of pulses with the threshold for a single pulse as a function of the separation time between the pulses. For a delay of 1 ms, the thresholds were 4 dB lower than those for single pulses, which agreed with the expected threshold shift of 3 dB if the energies of both pulses were completely combined. The threshold increased to a level of 1.5 dB below the threshold for the single pulse for increased separation of stimuli, but then remained constant for pulse intervals from 5 to 200 ms.
Changes in threshold resulting from integration over time took place only during the first 5 ms. Furthermore, the threshold for a pulse pair separated by 100 ms was not affected by masking noise inserted between the pulses. Therefore, no long-term integration appears to occur. The model of multiple looks explains the threshold shifts for stimuli of long duration, like tone-bursts or noise-bursts, through a later mechanism that stores (and evaluates) the outputs from the multiple looks. Nevertheless, this proposal leaves a number of questions still open: How do succeeding samples contribute to signal detection? How long is the sampling element? How is it shaped? If all samples contribute equally to detection, the theory of multiple looks predicts a threshold decrease of 5 dB per decade increase in duration. In order to explain the actual slope of −10 dB/decade the model requires increasing weights for each look at least over the first 200
ms of stimulus duration. A small, but significant, effect of time on the detection contribution by individual temporal segments of the signal was observed by Buus (1999). In his study the stimuli were sinusoidally amplitude-modulated (AM) tones. The temporal integration function did not differ from those obtained for steady tones, so the listener's detection strategies were assumed to be consistent for modulated and steady tones. Sheft and Yost (1990) investigated the threshold for AM detection as a function of duration in the time range from one period of AM to 400 ms at modulation frequencies ranging from 5 to 640 Hz. The observed band-pass-like TMTFs showed a maximum in the 20–40 Hz range. The threshold decreased with increased duration for the investigated time range. The resulting integration characteristics, which showed the modulation depth in dB at threshold as a function of the logarithm of AM duration, were straight lines. At 40 Hz modulation frequency the slopes of these characteristics were 7.46 dB per decade of duration, if the AM signal was presented as a burst. A steeper slope of 9.36 dB/decade was observed when the AM occurred within a continuously presented wide-band noise carrier. Similar results of 9.30 dB per decade of duration were obtained with 500 ms of carrier signal preceding the AM. The effect of stimulus duration on discrimination of modulation rate (Lee, 1994) as well as modulation depth (Lee and Bacon, 1997) is most pronounced below a critical duration which equals five periods of AM. Above that duration, the threshold for detection of changes in modulation rate or depth decreased only slightly or remained constant. Lee and Bacon (1997) discussed the effect of stimulus duration in the context of the model of multiple looks. If a pair of AM periods represents a single look, the number of looks is given by the number of cycles minus one.
Under the assumptions that individual looks are mutually independent, that each look contributes equally to discrimination, and that the looks are optimally combined, the model provided a reasonable prediction of the change in AM depth discrimination threshold as a function of stimulus duration. Dau et al. (1997) found a decreasing threshold for increasing duration of AM similar to the phenomenon of threshold shift for increasing duration of tone-bursts. At a modulation frequency of 40 Hz the threshold for modulation detection decreased by 12 dB when the duration of AM was increased from 25 to 400 ms (3 dB per doubling of duration). Although Dau et al. (1997) agreed that the models of multiple looks were able to explain the observed relation between the duration of modulation and threshold shift, they preferred a model of an optimal detector integrating over a variable time. Temporal integration, more generally described as a combination of auditory information over time, seems to be a fundamental ability of the auditory system. For
the listener this is advantageous not only at detection thresholds but also at higher intensities. Therefore, the phenomenon of temporal integration should be reflected in electrophysiological correlates which are measurable at suprathreshold intensities. Onishi and Davis (1968) showed an increasing amplitude of the N1 deflection of slow auditory evoked potentials with increasing sound duration up to 30 ms. Alain et al. (1997) investigated this effect in more detail and found N1 amplitude enhancement up to a stimulus duration of 72 ms. Different components of the N1 response showed different integration time constants. Larger time constants for low stimulus frequencies have also been reported (Alain et al., 1997). Similar integration time constants were found in a magnetoencephalographic (MEG) study by Gage and Roberts (2000) using stimuli of constant intensity and constant energy. They concluded that the N1 amplitude is more likely to depend on stimulus duration than on energy integration. However, the transient N1 peak with latency around 100 ms cannot reflect auditory integration over longer time intervals. To overcome this limitation pairs of stimuli have been used. In general, response amplitudes diminish if the time interval between the succeeding stimuli is shortened. However, Loveless et al. (1996) reported an increased N1 amplitude to the second stimulus of a pair of short tone-bursts when the interval between the stimuli was shorter than 300 ms, indicating a short-term facilitation process that has a time course similar to that of temporal integration. Several other studies have replicated these findings for both magnetic (McEvoy et al., 1997) and electrical recordings (Budd and Mitchie, 1994). Temporal integration in the auditory system has also been demonstrated in studies of the mismatch negativity (MMN). Sussman et al. (1999) presented within a series of standard stimuli an infrequent pair of deviant stimuli, which were different in two stimulus parameters.
They observed a single MMN peak when the onsets of deviants were separated by 150 ms and two separate MMN peaks if the separation was 300 ms. Several other studies with the MMN have shown that this phenomenon links stimuli together over periods of about 300 ms (Winkler et al., 1998; Yabe et al., 1997). One problem with these studies of the N1 and the MMN is that the responses are transient and do not allow one to follow integration over time. Better candidates for studying the phenomenon of temporal integration seem to be the cortical responses in the gamma-frequency band (electroencephalographic (EEG) activity with frequencies above 24 Hz, mainly in the 30–50 Hz region) and especially the steady-state response (SSR) because of their high temporal resolution. The auditory SSR was reported for the first time as a response to clicks presented at a rate of 40 Hz (Galambos
et al., 1981). Their interpretation of this type of driven oscillatory brain activity was that the short inter-stimulus interval (ISI) does not allow the activated brain structures to return to their resting state. Successive middle latency responses (MLRs) therefore overlap linearly and form the SSR. AM tones could also elicit an SSR with a maximum response amplitude at modulation frequencies near 40 Hz (Kuwada et al., 1986; Picton et al., 1987; Rees et al., 1986; Roß et al., 2000). The auditory SSR can be evoked with AM tones even at modulation depths as small as 5% (Picton et al., 1987; Roß et al., 2000), which is near the behavioral threshold for modulation detection. In a neuromagnetic study, Hari et al. (1989) demonstrated SSR to wide-band stimuli such as clicks and periodically gated noise. They discussed the functional meaning of the SSR in relation to the neural mechanisms involved in periodicity processing. Evidence against the hypothesis of linear superimposition of MLRs has come from simulation studies and the observation of different recovery time courses of SSR and MLR after anesthesia (Santarelli and Conti, 1999). In the frequency domain the SSR is represented by a single peak of constant amplitude and phase. Amplitude fluctuations of the SSR have been observed in relation to attention, arousal or the state of anesthesia (Galambos and Makeig, 1988; Linden et al., 1985; Plourde and Picton, 1990). However, the time course of the SSR amplitude during the transition from quiet into the steady state has not yet been reported. If the SSR represents the processing of AM in the auditory system, then the time course of the SSR at the onset of the AM sound may reflect the process of auditory temporal integration. This study investigated the temporal dynamics of the magnetically recorded auditory SSR.
In order to measure the development of the SSR we had to distinguish it from several other processes occurring at the onset of a sound, in particular the N1–P2 waves, the MLR, the transient gamma-band response (GBR) (Jacobson and Fitzgerald, 1997; Pantev, 1995), and the sustained field (SF), which is an activity of constant amplitude lasting for the stimulus duration (Picton et al., 1978; Pantev et al., 1994). Our goals were to examine the time course of the developing SSR in order to see whether this relates to the timing of temporal integration, and to gain some deeper insight into the generator mechanisms of the SSR and its relation to other auditory evoked fields.
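The psychoacoustic threshold models reviewed in this introduction can be compared numerically. The following is a minimal sketch, not taken from the cited studies: the function names are hypothetical, and the multiple-looks function assumes that detectability grows with the square root of the number of equally weighted, independent looks, which is one common reading of the 5 dB/decade prediction.

```python
import math


def exponential_threshold_db(t_ms, tau_ms):
    """Threshold in dB re the long-duration threshold I0,
    for I/I0 = 1 / (1 - exp(-t/tau))."""
    return -10.0 * math.log10(1.0 - math.exp(-t_ms / tau_ms))


def power_law_threshold_db(t_ms, m, t_ref_ms):
    """Threshold change in dB re the threshold at t_ref,
    for I * t**m = constant (slope of -10*m dB per decade)."""
    return -10.0 * m * math.log10(t_ms / t_ref_ms)


def energy_summation_gain_db(n_pulses):
    """Threshold improvement if the energies of n pulses are fully combined."""
    return 10.0 * math.log10(n_pulses)


def multiple_looks_gain_db(n_looks):
    """Threshold improvement for n equally weighted, independent looks.
    ASSUMPTION: sqrt(n) growth of detectability."""
    return 10.0 * math.log10(math.sqrt(n_looks))


def db_per_doubling(total_db, t_start_ms, t_end_ms):
    """Average threshold change per doubling of duration."""
    return total_db / math.log2(t_end_ms / t_start_ms)


if __name__ == "__main__":
    for t in (10, 50, 100, 200, 500):
        print(t, round(exponential_threshold_db(t, 100.0), 2),
              round(power_law_threshold_db(t, 0.8, 10.0), 2))
```

Under these assumptions, a ten-fold increase in the number of looks yields 5 dB, two fully combined pulses yield about 3 dB, and a 12 dB change between 25 and 400 ms corresponds to 3 dB per doubling.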
2. Materials and methods
2.1. Methods
The first part of this study reanalyzed MEG data
which had been acquired in a previous study with a different objective (Engelien et al., 2000). The stimuli used in this study were AM tone-bursts and non-modulated tone-bursts. Both stimuli evoked slow transient cortical responses and SFs. In addition, the AM stimulus elicited an SSR. The SSR could be disentangled from the transient responses by comparing the responses to the modulated tones with the responses to the non-modulated tones. Since the N1–P2 complex of the slow cortical responses arises in the same latency range as the onset of the auditory SSR, we performed a second experiment using shorter ISIs to reduce the amplitudes of slow transient evoked responses.

2.2. Subjects
Nine right-handed subjects (five female, four male) of ages 20–31 years (mean age 25 years) participated in the first experiment. A second group of nine subjects (three female, six male) of ages 23–31 years (mean age 26 years) served for the second experiment. None of them participated in both experiments. Subjects had no history of otological or neurological disorders and had normal audiological status (air conduction thresholds no more than 10 dB HL at octave frequencies between 0.25 and 8 kHz). The study was reviewed by the Ethics Commission of the Medical Faculty of the University of Münster. Informed consent was obtained from each subject after the nature of the study was explained to the subjects in accordance with the principles of the Declaration of Helsinki.

2.3. Stimulation
The AM stimulus was generated by multiplication of a sinusoidal carrier with frequency $f_c$ and amplitude $a$ by a shifted cosine function with the modulation frequency $f_m$, yielding the desired signal $y(t)$:

$$y(t) = a \sin(2\pi f_c t) \cdot 0.5\,\bigl(1 - \cos(2\pi f_m t)\bigr) \qquad (1)$$
With this definition the modulation depth is set to the maximum, which means that the signal envelope fluctuates periodically between zero and twice the amplitude $a$. The amplitude spectrum of this AM signal consists of a peak of amplitude $a$ at the carrier frequency $f_c$ and two peaks of half that amplitude at $f_c - f_m$ and $f_c + f_m$. The modulation frequency was chosen to be close to 40 Hz. The different values of 39 Hz in the first and 40 Hz in the second experiment were chosen for technical reasons, and no effect on the SSR amplitude was expected. An AM tone-burst consisted of 20 periods of the modulation frequency (500 ms duration at 40 Hz and 512 ms at 39 Hz). In the second experiment tone-bursts of 250 ms duration corresponding to 10 periods of 40 Hz
modulation were also used. The envelope of the non-modulated tone-burst was derived from the AM tone-burst by keeping the amplitude constant between the first and the last modulation maximum. Examples of the stimulus waveforms are shown in Fig. 1. In the first experiment a carrier frequency of 250 Hz was used because a previous study showed the largest SSR amplitudes at low carrier frequencies (Roß et al., 2000). For the repeated measurements in the second experiment a carrier frequency of 1000 Hz was used in order to investigate whether the time course of the SSR depends on the carrier frequency. The stimuli were presented through a magnetically silent delivery system consisting of speakers (1″ compression driver, Renkus-Heinz Inc.) that were mounted outside the magnetically shielded room and were connected to a silicon ear piece through 6.3 m of echoless plastic tubing (16 mm inner diameter). The transfer characteristic of this system deviated less than ±10 dB in amplitude between 200 and 6000 Hz. Since the stimulus was defined in a narrow frequency band (80 Hz), the stimuli were not distorted by the frequency characteristic of the sound delivery system. The transmission delay of about 19 ms was compensated by an appropriate shift of the trigger signal. Before carrying out the experimental measurements, both the signal spectrum of the stimulus and its correct timing were checked by means of a 2 cm³ ear simulator (Brüel & Kjær model 4157) that was equipped with a 1/2″ condenser microphone (Brüel & Kjær model 4134) and connected to the silicon ear piece at the end of the sound delivery system. For AM tones these measurements assured correct modulation phase and modulation depth. The stimuli were presented monaurally to the subject's right ear. In the first experiment, the stimulus intensity was 60 dB referred to the individual's sensation threshold (dB SL).
For this purpose, the subject's hearing threshold was measured prior to each experimental session by applying the relevant stimulus type at an ISI of 1 s through the sound delivery system. In the second experiment the stimulus intensity was increased to 70 dB SL in order to partially compensate for the smaller SSR amplitude when stimulating at 1000 Hz instead of at 250 Hz. In the first experiment 1024 AM tone-bursts and 512 non-modulated tone-bursts at an ISI of 3 s were presented in one session. For the second experiment, two experimental sessions, one with modulated and the other with non-modulated tone-bursts, were carried out on different days. One thousand stimuli were presented at an ISI of 3 s, 1000 stimuli at an ISI of 1 s, and 2000 stimuli at an ISI of 0.5 s. The stimuli were presented in blocks of about 4 min duration. Between blocks the technician conversed with the subject in order to keep her or him in an alert state.
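The stimulus construction of Eq. 1 can be sketched in a few lines. This is an illustrative reconstruction only: the function names and the 48 kHz audio sampling rate are assumptions that do not come from the study, and the sketch covers the modulated burst but not the derived non-modulated envelope.

```python
import math


def burst_duration_ms(fm_hz, n_periods):
    """Duration of a burst spanning n_periods of the modulation frequency."""
    return 1000.0 * n_periods / fm_hz


def spectral_components_hz(fc_hz, fm_hz):
    """Lower sideband, carrier, and upper sideband of the AM signal."""
    return (fc_hz - fm_hz, fc_hz, fc_hz + fm_hz)


def am_tone_burst(fc_hz, fm_hz, n_periods, amplitude=1.0, fs_hz=48000.0):
    """Sample y(t) = a*sin(2*pi*fc*t) * 0.5*(1 - cos(2*pi*fm*t)) as in Eq. 1.
    ASSUMPTION: fs_hz is a hypothetical audio sampling rate."""
    n_samples = int(round(n_periods / fm_hz * fs_hz))
    samples = []
    for n in range(n_samples):
        t = n / fs_hz
        carrier = math.sin(2.0 * math.pi * fc_hz * t)
        envelope = 0.5 * (1.0 - math.cos(2.0 * math.pi * fm_hz * t))
        samples.append(amplitude * carrier * envelope)
    return samples


if __name__ == "__main__":
    burst = am_tone_burst(fc_hz=1000.0, fm_hz=40.0, n_periods=20)
    print(len(burst), burst_duration_ms(40.0, 20))
    print(spectral_components_hz(1000.0, 40.0))
```

Because the shifted cosine starts at zero, the burst begins and ends at a modulation minimum, and the three spectral components span a band of twice the modulation frequency, which is the 80 Hz band mentioned above.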
2.4. Data acquisition
Recordings were performed in a magnetically and acoustically shielded room. The subjects rested in a right lateral position on an air mattress, with their head lying on a mold to permit stable fixation throughout the whole experimental session. The MEG was recorded with a 37-channel neuromagnetometer (MAGNES, 4D-Neuro-Imaging Inc.). The detection coils of the neuromagnetometer were arranged in a circular concave array with a diameter of 144 mm and a spherical radius of 122 mm. The distance between the centers of the coils was 22 mm and the coil diameter was 20 mm. The sensors were configured as first-order axial gradiometers with a baseline of 50 mm. The spectral density of the intrinsic noise of each channel was between 5 and 7 fT/√Hz in the frequency range above 1 Hz. The sensor array was placed over the left temporal plane, centered over a point about 1.5 cm superior to the position T3 of the 10–20 system for electrode placement, as close to the subject's head as possible. A sensor-position indicator system determined the spatial locations of the sensors relative to the head and indicated if head movements occurred during the recordings. No head movements sufficient to require discarding of data were observed in the study. During the MEG session, subjects watched cartoon videos that were projected via fiber-optic cable onto a non-magnetic display. The subjects were instructed to stay in a relaxed state to reduce the influence of myogenic activity on the MEG signals, and their compliance was verified by video-monitoring. The reason for maintaining wakefulness was that sleep substantially decreases SSR amplitudes (Jerger et al., 1986; Linden et al., 1985). The magnetic field data of 37 channels were band-pass filtered from 0.1 to 200 Hz prior to sampling at the rate of 520 Hz and digitization with 16 bit resolution. When stimulating with an ISI of 3 s, stimulus-related data epochs of 1100 ms length were recorded including a 300 ms pre-stimulus interval.
At higher stimulus rates, the data were recorded continuously.

2.5. Data analysis
For averaging stimulus-related epochs of the magnetic field data the epoch duration was 1100 ms, beginning 300 ms before and ending 300 ms after the 500 ms stimulus. If the magnetic field amplitude exceeded 3.5 pT, the corresponding epoch was rejected as artifact-contaminated. In order to separate the different response components two band-pass filters were applied to the averaged data. A 24–60 Hz band-pass extracted the GBR and SSR. A second filter passing between 1
and 24 Hz gave the low-frequency P1–N1–P2 response and the SF. Source analysis based on the model of a single moving equivalent current dipole (ECD) in a spherical volume conductor was applied to the measured field distribution. Source localizations were estimated for each sampled data point in a head-based Cartesian coordinate system. The origin was set to the midpoint of the medial–lateral axis (y axis) between the acoustic meatuses of the left and right ear. The posterior–anterior axis (x axis) ran between the nasion and the origin, and the inferior–superior axis (z axis) ran through the origin perpendicular to the x–y plane. Estimates of the source parameters were accepted for further evaluation only if both the goodness of fit of the field of the estimated ECD to the measured magnetic field was greater than 96% and the distance of the ECD to the midsagittal plane was greater than 3 cm. The median values across all reliable source coordinates in the time interval around a magnetic field maximum and all repeated measurements were used as an estimate for the individual source coordinates. This analysis was applied to the 24–60 Hz band-pass filtered data, and source coordinates of the MLR waves Na, Pa, Nb, Pb, and the SSR were estimated separately. In addition, from the 24 Hz low-pass filtered data the SF and the N1 source coordinates were estimated. No anatomical information from magnetic resonance imaging was available for the subjects. In order to reduce the effect of inter-individual anatomical variations the N1 source coordinates were used as individual landmarks. The source coordinates of MLR, SF and SSR were expressed as distances from these landmarks. With the individual results of magnetic source localization a spatial filter was constructed that can collapse the time series of the 37 MEG sensors into a single waveform of magnetic dipole moment.
The method called source-space projection is based on the linear relationship that exists between each current source $q(R)$ at the position $R$ in the brain and the magnetic field $b_i(r)$ which is measured with the $i$th sensor at position $r$ outside the head. This relation is given by the equation $b(r) = L(r, R) \cdot q(R)$ (Hämäläinen et al., 1993). The leadfield matrix $L(r, R)$ depends on the source position and the sensor position, as well as upon the properties of the volume conductor and the sensors. The described relation maps each current source into a multidimensional (37 in our case) signal space (Ilmoniemi et al., 1987). The method of source-space projection performs the reversed mapping. The resulting output signal is a single waveform that would be seen by a virtual sensor (Robinson and Rose, 1992) that responds maximally to the region of interest in the brain. Therefore, the method is spatially sensitive, and the contribution of spontaneous brain activity from other regions to the resulting signal is reduced. In addition, uncorrelated system noise is reduced. A typical enhancement of the signal-to-noise ratio by a factor of two for the detection of the auditory SSR at 40 Hz was demonstrated (Roß et al., 2000). Such a filter is the dot product of the measured field $b(t)$ with a weighting vector $W(q(R))$ (Robinson, 1989). In this study, the pseudo-inverse $L^{-1}(r, R)$ of the leadfield matrix and the estimated orientation $q(R)/\lVert q(R)\rVert$ of the underlying source were used to define the filter $W$. Two separate spatial filters were calculated for each subject. One was based on the N1 source coordinates and directions and was applied to the low-pass filtered data. The second, which used the SSR source coordinates, was applied to the 24–60 Hz band-pass filtered response data. The polarity of the resulting dipole moment waveforms was defined such that the deflection around 100 ms, which is the magnetic counterpart of the slow cortical evoked N1 potential, has a negative polarity.

In order to study the time course of the amplitude and phase of the response at the modulation frequency, a modified version of a Gabor filter (Sinkkonen et al., 1995) was applied to the response signals. The signals were convolved with windowed short sections of sine and cosine functions. From the resulting real and imaginary parts of the convolution product at each data point, the amplitude and phase were calculated as a function of time. A three-term Blackman–Harris window was used, which is defined on a shorter time interval than the Gabor function. The kernel duration was 30 ms in the time domain. The bandwidth was 16 Hz centered around 40 Hz. Both measures were obtained between the −3 dB points and extended to 42 ms and 22 Hz at −6 dB. The time–bandwidth product, defined at the −3 dB points, was 1.5 times the optimal value of the Gabor filter.
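The windowed sine and cosine convolution described above can be sketched as a complex demodulation. This is an assumption-laden illustration and not the authors' implementation: the window coefficients are the commonly tabulated three-term Blackman–Harris values, the function names are hypothetical, and the kernel length of 53 samples (about 100 ms at 520 Hz, chosen so that the −3 dB width of the window is close to 30 ms) is a guess. The sketch references the phase to a cosine at the analysis frequency running in absolute time, so it returns the phase difference to that reference directly.

```python
import cmath
import math


def blackman_harris3(n_taps):
    """Three-term Blackman-Harris window.
    ASSUMPTION: commonly tabulated coefficients, not taken from the study."""
    a0, a1, a2 = 0.42323, 0.49755, 0.07922
    return [a0
            - a1 * math.cos(2.0 * math.pi * k / (n_taps - 1))
            + a2 * math.cos(4.0 * math.pi * k / (n_taps - 1))
            for k in range(n_taps)]


def amplitude_phase(signal, fs_hz, f0_hz, n_taps=53):
    """Amplitude and phase (re a cosine at f0_hz) around each sample.
    Returns one (amplitude, phase_rad) pair per fully covered sample."""
    window = blackman_harris3(n_taps)
    norm = sum(window)
    half = n_taps // 2
    result = []
    for n in range(half, len(signal) - half):
        acc = 0j
        for k, w in enumerate(window):
            idx = n - half + k
            t = idx / fs_hz  # absolute time of this sample
            # The real part correlates with the cosine, the imaginary with the sine.
            acc += w * signal[idx] * cmath.exp(-2j * math.pi * f0_hz * t)
        z = 2.0 * acc / norm
        result.append((abs(z), cmath.phase(z)))
    return result


if __name__ == "__main__":
    fs = 520.0
    test = [1.5 * math.cos(2.0 * math.pi * 40.0 * n / fs) for n in range(260)]
    amp, phase = amplitude_phase(test, fs, 40.0)[100]
    print(round(amp, 3), round(phase, 3))
```

The kernel length trades time resolution against frequency selectivity: a shorter kernel tracks the onset more sharply but passes more activity from neighbouring frequencies.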
The analysis resulted in amplitude and phase values $a(t)$ and $\varphi(t)$ for each data point. The difference between the response phase and the stimulus envelope phase, $\Delta\varphi(t) = \varphi(t) - 2\pi f_m t$, is equivalent to a time delay between stimulus and response. Its mean value in the steady-state time interval (300–500 ms at 500 ms stimulus duration and 200–250 ms at 250 ms stimulus duration) was adjusted to zero prior to statistical evaluation to reduce the effect of inter-individual variations on the group statistics. For similar reasons, the individual amplitude characteristics $a(t)$ were scaled to one in the same time interval. Amplitude and phase characteristics were calculated individually and were grand averaged across the group of nine subjects. The 95% confidence limits of the grand averages were obtained with the method of bootstrap resampling (Davison and Hinkley, 1997; Noreen, 1989).
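The normalization and group statistics described in this section can be sketched as follows. The function names are hypothetical, and the percentile form of the bootstrap with 2000 resamples and a fixed seed is an assumption, since the text does not state which bootstrap variant was used.

```python
import random


def phase_to_delay_ms(delta_phase_deg, fm_hz):
    """Time delay equivalent to a phase difference at the modulation frequency."""
    return delta_phase_deg / 360.0 / fm_hz * 1000.0


def scale_to_steady_state(values, times_ms, t_start_ms, t_end_ms):
    """Scale an amplitude characteristic to a mean of one in the steady-state interval."""
    ref = [v for v, t in zip(values, times_ms) if t_start_ms <= t <= t_end_ms]
    mean_ref = sum(ref) / len(ref)
    return [v / mean_ref for v in values]


def bootstrap_ci(samples, n_resamples=2000, alpha=0.05, seed=0):
    """Percentile bootstrap confidence limits of the mean.
    ASSUMPTION: percentile method; resample count and seed are illustrative."""
    rng = random.Random(seed)
    n = len(samples)
    means = sorted(
        sum(rng.choice(samples) for _ in range(n)) / n
        for _ in range(n_resamples)
    )
    lower = means[int(alpha / 2.0 * n_resamples)]
    upper = means[int((1.0 - alpha / 2.0) * n_resamples) - 1]
    return lower, upper


if __name__ == "__main__":
    # Hypothetical scaled amplitudes of nine subjects at one latency.
    group = [0.82, 0.95, 0.88, 1.02, 0.79, 0.91, 0.97, 0.85, 0.93]
    print(bootstrap_ci(group))
    print(round(phase_to_delay_ms(60.0, 39.0), 2))
```

Applying `bootstrap_ci` at every latency to the nine individual values gives confidence limits of the kind drawn around a grand-average characteristic.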
3. Results
3.1. Experiment 1
Time-series data of the magnetic source strength for all nine subjects in the first experiment are displayed in Fig. 1. For each single subject, the response to AM tone-bursts is shown in the upper trace and the response to non-modulated tone-bursts in the lower trace of the pair. All signals were band-pass filtered between 24 and 60 Hz to show only the gamma-band portion of the response. In the case of AM stimulation, an SSR occurs for all subjects in the form of oscillations at the modulation frequency with roughly constant amplitude in the time range from 200 to 500 ms. This response could be clearly distinguished from the spontaneous background activity in all subjects. The individual amplitudes varied over a wide range, with the maximum amplitude (S3) being about four times the minimum amplitude (S4). During the non-modulated tone-burst, the evoked activity in the gamma band did not exceed the level of residual noise within the time interval 200–500 ms after stimulus onset. However, in the first 100 ms after stimulus onset the response signals to both types of stimuli contain a number of irregular oscillations, which represent the transient GBR. Although the SSR was very similar in morphology across subjects, the GBR response pattern in the first 100 ms varied greatly from one subject to the next. The transient GBR to the constant-amplitude tone-burst returns to the level of the general background at 100 ms after stimulus onset. The response to the AM stimulus shows a burst of irregular gamma activity followed by a depression in this activity near 100 ms and, in the interval between 100 and 200 ms, regular oscillations with increasing amplitude. After the end of the stimulus at 500 ms, the SSR decayed rapidly (reaching baseline levels within 50–100 ms), and a gamma-band off-response of smaller amplitude than the on-response was observed in most subjects.
The grand average waveforms across the nine subjects are plotted with thick lines in Fig. 2b for the AM stimulus and Fig. 2d for the tone-burst stimulus. As expected from the latency variations seen in Fig. 1, the time-domain average underestimates the transient response in the first 100 ms interval. Therefore, a second averaging method was applied which correctly estimated the mean response amplitude regardless of individual phase variations. First, a Hilbert transform was performed on the individual responses. This allowed the calculation of the instantaneous amplitude and phase over time, which were averaged separately and resulted in time courses of the mean amplitude and phase. Waveforms which were reconstructed from the mean amplitude and mean phase are shown in Fig. 2b,d
Immediately after stimulus onset both characteristics resemble each other. Therefore, the amplitude di¡erence between the responses to the AM-burst and the tone-burst (plotted in Fig. 2e) shows only little activity in the ¢rst 100 ms. The amplitude-di¡erence characteristic shows a steady rise for the ¢rst 250 ms. It keeps constant until the stimulus ends and declines suddenly in the 30 ms after stimulus o¡set. In addition, the time courses of the amplitude and phase at the modulation frequency of 39 Hz were obtained using the modi¢ed Gabor ¢lter. The resulting characteristics are illustrated in Fig. 3c,d in comparison with the stimulus waveform (Fig. 3a) and the time-domain grand average of the response signal (Fig. 3b). Because the amplitude characteristics were scaled relative to the interval from 300 to 500 ms the curve runs within a narrow band of con¢dence limits over this latency range. In the latency range from 100 to 200 ms the narrow con¢dence limits suggest similar time courses of the amplitude among the nine subjects. In contrast, in the ¢rst 100 ms the wide con¢dence limits re£ect large variations between subjects in the ampli-
Fig. 1. SSR to AM tone-burst (upper trace of each pair) and transient GBR to constant-amplitude tone-burst (lower traces) for nine subjects (S1–S9) band-pass filtered between 24 and 60 Hz.
with thin lines. All amplitudes show higher values compared to the time-domain average because individual responses with different phases, which cancelled out partially in the average, contributed in full to the mean amplitude after Hilbert transform. Especially in the pre-stimulus interval, non-coherent activity was reduced in the time average, whereas the mean instantaneous amplitude did not diminish. In the AM response four distinct latency intervals can be distinguished. In the first 100 ms transient gamma-band activity occurs. From 100 to 200 ms the amplitude of oscillation at the modulation frequency increases monotonically. Around 250 ms after stimulus onset, the amplitude of the oscillations reaches a steady-state level, which then continues until the end of the stimulus. The SSR activity then returns to the resting state within less than 50 ms. The response to stimulation with tone-bursts of constant amplitude (Fig. 2d) shows a transient GBR lasting approximately 100 ms following both the onset and offset of the stimulus. Other than these responses the waveform contains only the residual noise. In Fig. 2e the time courses of the instantaneous amplitude of responses to both types of stimuli are shown.
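The difference between the two averaging methods can be illustrated with a minimal Python sketch. It is not the analysis code used in this study: the three simulated "subjects", their phase offsets of −60°, 0° and 60°, the 320 Hz sampling rate and the helper `analytic_signal` are assumptions chosen only for illustration, and the Hilbert transform is obtained here from a plain discrete Fourier transform.

```python
import cmath
import math

def analytic_signal(x):
    """Return the analytic signal of a real sequence using a plain O(N^2) DFT.

    The imaginary part is the Hilbert transform of x; abs() of each sample
    gives the instantaneous amplitude.
    """
    n = len(x)
    spectrum = [sum(x[t] * cmath.exp(-2j * math.pi * k * t / n) for t in range(n))
                for k in range(n)]
    for k in range(n):
        if k == 0 or (n % 2 == 0 and k == n // 2):
            gain = 1.0          # keep DC and Nyquist unchanged
        elif k < (n + 1) // 2:
            gain = 2.0          # double the positive frequencies
        else:
            gain = 0.0          # remove the negative frequencies
        spectrum[k] *= gain
    return [sum(spectrum[k] * cmath.exp(2j * math.pi * k * t / n) for k in range(n)) / n
            for t in range(n)]

# Hypothetical data: three "subjects" with unit-amplitude 40 Hz responses
# that differ only in phase (assumed values, not measured ones).
FS = 320.0                      # assumed sampling rate in Hz
N = 64                          # 64 samples = 8 full cycles of 40 Hz
phases_deg = [-60.0, 0.0, 60.0]
subjects = [[math.cos(2 * math.pi * 40.0 * t / FS + math.radians(p)) for t in range(N)]
            for p in phases_deg]

# Method 1: average the waveforms, then take the amplitude of the average.
time_average = [sum(s[t] for s in subjects) / len(subjects) for t in range(N)]
amp_of_average = [abs(z) for z in analytic_signal(time_average)]

# Method 2: take each subject's instantaneous amplitude, then average.
individual_amps = [[abs(z) for z in analytic_signal(s)] for s in subjects]
mean_amp = [sum(a[t] for a in individual_amps) / len(subjects) for t in range(N)]

print(round(amp_of_average[10], 3))   # 0.667: reduced by partial phase cancellation
print(round(mean_amp[10], 3))         # 1.0: individual amplitudes contribute in full
```

In this toy case the time-domain average loses one-third of the amplitude purely because of the phase spread, whereas the mean instantaneous amplitude is unaffected.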
Fig. 2. Grand average GBR and SSR waveforms across nine subjects. (a) AM tone-burst stimulus. (b) GBR and SSR to AM tone-burst stimulus (thin line: time-domain average, thick line: grand average using Hilbert transform). (c) Tone-burst stimulus. (d) Like b, GBR to non-modulated tone-burst. (e) Instantaneous amplitudes of GBR and SSR obtained from Hilbert transform, at AM stimulation (thin line), tone-burst stimulation (thick line) and difference between both amplitude characteristics (thick line).
which are spaced by the period length of AM, are aligned to the phase in the 300–500 ms latency interval. Between 120 and 300 ms the response signal crosses zero before the grid lines. The phase difference decreases linearly over 60°, which corresponds to a 4.3 ms increase of the response latency. An SSR is defined by a constant phase shift between stimulus and response signal and a constant response amplitude (Regan, 1989). These requirements are fulfilled if both the scaled amplitude value of 1.0 and zero phase lie inside the corresponding confidence intervals. For the amplitude this is the case at about 205 ms (marked by an arrow in Fig. 3c). Another frequently used definition for the end of a transition is the 90% value. This is marked by a horizontal grid line around 200 ms in Fig. 3c. The grand average amplitude characteristic crosses the 90% value at a latency of 210 ms. The confidence limits reach the 90% value at 180 and 240 ms. The phase achieves the steady-state criterion about 100 ms later than the amplitude at 305 ms (marked by an arrow in Fig. 3d). The end of the transition interval is more sharply defined for the phases than for the amplitudes.
3.2. Experiment 2
Fig. 3. Amplitude and phase characteristics of the 39 Hz GBR and SSR. (a) AM stimulus. (b) Response waveform grand averaged across all subjects (vertical grid lines are spaced by a stimulus period of 1/39 s and are aligned to mean zero phase in the 300–500 ms time interval). (c) Mean 39 Hz amplitude (thick line) scaled to unity in the 300–500 ms latency interval, and 95% confidence limits of the grand average (thin lines). (d) Mean phase difference between response and stimulus signal (thick line) shifted to zero in the 300–500 ms latency interval, and 95% confidence limits of the grand average (thin lines). The inset shows the sine and cosine versions of the kernel function used for convolution.
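The scaling to unity and the 90% criterion shown in Fig. 3c can be made concrete with a short Python sketch. The amplitude time course below is a hypothetical linear ramp from 40 to 240 ms in arbitrary units, not the measured data, and the function names are introduced only for this illustration.

```python
def scale_to_steady_state(times_ms, amplitude, start_ms=300.0, end_ms=500.0):
    """Divide the amplitude by its mean in the steady-state reference interval."""
    ref = [a for t, a in zip(times_ms, amplitude) if start_ms <= t <= end_ms]
    mean_ref = sum(ref) / len(ref)
    return [a / mean_ref for a in amplitude]

def crossing_latency_ms(times_ms, scaled, level=0.9):
    """Return the first latency at which the scaled amplitude reaches `level`,
    using linear interpolation between neighbouring samples."""
    for i in range(1, len(scaled)):
        lo, hi = scaled[i - 1], scaled[i]
        if lo < level <= hi:
            frac = (level - lo) / (hi - lo)
            return times_ms[i - 1] + frac * (times_ms[i] - times_ms[i - 1])
    return None

# Hypothetical amplitude: zero until 40 ms, linear rise until 240 ms, then constant.
times = [float(t) for t in range(0, 501, 5)]
amp = [2.0 * min(max((t - 40.0) / 200.0, 0.0), 1.0) for t in times]

scaled = scale_to_steady_state(times, amp)
latency = crossing_latency_ms(times, scaled)
print(latency)   # close to 220 ms for this assumed ramp
```

Applying the same two steps to the lower and upper confidence limits instead of the mean would give the range of crossing latencies rather than a single value.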
tude of the transient evoked GBR. Fig. 3d shows the time course of the phase difference between the stimulus envelope and the response signal. The grand averages of the amplitudes and phases which were measured for the individual subjects are drawn with thick lines and the 95% confidence intervals for the mean with thin lines. In the first 100 ms the confidence interval covers a range of 360°, indicating an arbitrary phase relation between individual response signals and the modulation signal. Therefore, the phase characteristics were discarded in this latency range. In contrast, from about 120 ms until the end of the stimulating burst the mean phase lies within narrow confidence limits of less than 15°. In the time range of 300–500 ms the confidence limits do not exceed ±10°. In the latency interval from 100 to 300 ms the response phase precedes its final value in the 300–500 ms range. This preceding phase can be observed also in the response waveform pictured in Fig. 3b. In this figure the vertical grid lines,
3.2.1. Example of individual response waveforms
Fig. 4 shows the waveforms obtained from a single subject in the second experiment, which manipulated the ISI (defined as the interval between stimulus onsets). For an ISI of 1 or 3 s the stimuli lasted 500 ms (Fig. 4a). For an ISI of 500 ms the duration was shortened to 250 ms (Fig. 4b). As in the first experiment, the envelope slopes of the non-modulated tone-burst were identical to the first rising and the last falling slope of an AM burst. However, the amplitude was kept constant in between. The 1–24 Hz low-pass filtered response waveforms to stimuli with an ISI of 3 s, shown with thick lines in Fig. 4g (AM stimulus) and in Fig. 4h (non-modulated stimulus), contained large N1–P2 waves. When the ISI was shortened to 1 s (Fig. 4e,f), the N1–P2 amplitude decreased by a factor of about three. When the ISI was decreased to 500 ms (Fig. 4c,d), the N1–P2 complex became barely recognizable. The N1–P2 peaks to tone-burst stimulation showed slightly shorter latencies than to AM stimuli. With decreased ISI the N1–P2 latencies increased slightly. The amplitude and latency of the P1 wave, the earliest component of the slow transient response, both increased with decreasing ISI. The latency changed from 40 ms at 3 s ISI to 75 ms at 0.5 s ISI. Unlike the transient responses, the sustained response, observed for both the modulated and the non-modulated stimulus, was unaffected by ISI (Fig. 4c–h).
The 1–60 Hz wide-band filtered waveforms, superimposed with thin lines on the low-pass filtered response (Fig. 4c–h), demonstrate the effect of the AM stimulus. Whereas the response to the non-modulated tone-burst shows high-frequency activity only in the first 100 ms after the stimulus onset, the response to the AM stimulus shows continuous oscillations throughout the stimulus. An enlarged view of the response in the gamma-frequency band is given in Fig. 4i–o after band-pass filtering (24–60 Hz). These waveforms of magnetic dipole moment were obtained from source-space projection based on the SSR source coordinates. For both the low-pass and wide-band filtered waveforms in Fig. 4c–h, source-space projection based on N1 source coordinates was used. In contrast to the slow transient response the amplitude of the SSR does not vary with
Fig. 5. The grand averages of 1–24 Hz band-pass filtered waveforms at AM-burst (thick lines) and non-modulated tone-burst (thin lines) stimulation with ISIs of 0.5, 1.0, and 3.0 s show the slow transient and sustained evoked responses.
the ISI. For all three ISIs investigated in this experiment, the SSR amplitude also reaches its final value at about 200 ms after stimulus onset. Even though the configuration of the transient GBR waveform varies with the different ISIs, the general amplitude of this response is little affected by the ISI. The GBRs in the first 100 ms interval after stimulus onset are remarkably similar for the two types of stimuli. However, as in the first experiment (Fig. 1), there were large variations in waveform morphology among subjects.
Fig. 4. Example of a single subject’s transient response waveforms and SSR to modulated and non-modulated tone-burst using various stimulus timings. (a) AM tone-burst stimulus of 500 ms duration (ISI = 1 or 3 s). (b) AM stimulus of 250 ms duration (ISI = 500 ms). Waveforms c–h were band-pass filtered between 1 and 24 Hz. (c) Response signal to 250 ms AM tone-burst (ISI = 500 ms). (d) Response signal to 250 ms non-modulated tone-burst (ISI = 500 ms). (e) Like c, duration = 500 ms, ISI = 1 s. (f) Like e, non-modulated tone-burst. (g) Like c, duration = 500 ms, ISI = 3 s. (h) Like g, non-modulated tone-burst. The GBR and SSR waveforms i–o were band-pass filtered between 24 and 60 Hz and are shown in the same order as the waveforms c–h.
3.2.2. Grand average response waveforms
The 1–24 Hz band-pass filtered grand average responses to AM-bursts (thick lines) and non-modulated tone-bursts (thin lines) are shown in Fig. 5. The dipole moment waveforms shown in this figure were calculated with the method of source-space projection based on the estimated source coordinates of the SF. This focuses the view onto the sustained response. Effectively the amplitude was increased 1.3 times by this procedure because the SF sources were estimated to be 9 mm deeper than N1 in the brain, whereas no significant change in the amplitude ratio between SF and N1 was observed. The SF amplitudes had the same size for non-modulated tone-burst and AM-burst stimulation at the ISI of 0.5 and 1.0 s and for the non-modulated stimulus at 3 s ISI. The AM-burst at 3 s ISI evoked a larger SF than the non-modulated tone-burst (P = 0.012 for the difference of mean SF amplitude in the 450–500 ms interval). Further differences between the responses at 3 s ISI were longer peak latencies for N1 and P2 and smaller P2 amplitude for the AM-burst (N1 was 7 ms later, P = 0.00001; P2 was 14 ms later, P = 0.004; P2 was 34% smaller, P = 0.0003). The waveforms at 1 s ISI showed the same tendency but only the
N1 latency difference was significant (N1 was 12 ms later in the AM condition, P = 0.017). No significant differences between the two stimulus types were found for the low-pass filtered waveforms at the ISI of 0.5 s. In order to prevent the partial cancellation of non-aligned waves when averaging across subjects, a method of parametric averaging was applied to the 24–60 Hz filtered response data. Individual amplitudes and latencies of succeeding positive and negative peaks were determined. For corresponding peaks the mean amplitude and mean latency were calculated separately. For graphical representation, the waveform between the measured peaks was interpolated using cosine functions. Fig. 6a shows an example of the different results from conventional time-domain averaging and parametric averaging. The square symbols denote the measured peaks and the thin line connecting these points represents the interpolated waveform. The superimposed thick line shows the average response waveform. For latencies longer than 120 ms both methods lead to a similar result. In contrast, significant differences between methods are obvious in the first 100 ms latency interval. Whereas corresponding peak latencies are similar for both methods, the peak amplitudes obtained by time-domain averaging are much smaller. The greatest difference occurred at a latency of 64 ms, where the peak was almost completely absent in the time-domain average. The results of parametric averaging across the nine subjects investigated are summarized in Fig. 6b–k. For each of the three ISIs the graphs show the transient response to the non-modulated tone-burst and to the AM tone-burst. The 95% confidence intervals for the mean peak amplitudes and latencies are indicated by the dimensions of the error boxes. During the first 75 ms after stimulus onset none of the peak amplitudes of the difference response differed significantly from zero. The confidence limits for the peak latencies before 50
ms are smaller than between 50 and 100 ms (Fig. 6k). This reflects a synchrony between subjects for the early transient responses and larger inter-individual latency variability for later peaks. After 100 ms the SSR develops and the confidence limits for the latency decrease. From 200 ms until the end of the stimulus the confidence limits of the mean latency are less than 1 ms for each SSR peak. In the 100–200 ms interval a regression line through the four positive peaks or the corresponding negative peaks crosses the x axis before 60 ms. The mean intersection point for the three ISIs is 37 ms. The peak latencies of the four positive and negative deflections within the latency range of 20–70 ms, shown in Table 1, did not significantly differ with changes in the ISI, or between the modulated and non-modulated stimuli. These four peaks are similar to the MLR peaks. For comparison, MLR latencies which were reported from EEG studies using clicks (Picton et al., 1974) or tone-impulses with 10 ms rise time (Lane et al., 1971) as well as from MEG studies with tones of 12.5 ms rise time (Borgmann et al., 2001) are given at the bottom of Table 1. The peak latencies obtained with tone-burst stimulation are the same as those of the MLR evoked by tones with 10–12 ms rise time. Because of the close relationship between the MLR and the transient GBR in the latency range from 20 to 70 ms, these waves are termed MLR components in subsequent analysis.
3.2.3. Amplitude and phase
The time courses of the amplitude and phase of the response at the modulation frequency are shown in Fig. 7. Fig. 7a demonstrates the reproducibility between the first and the second experiment. In both experiments AM tone-bursts with a duration equivalent to 20 periods of the modulation frequency were presented at an ISI of 3 s, but the carrier frequency was 250 Hz in the first and 1000 Hz in the second experiment.
The amplitudes across each group of subjects were scaled relative to the mean value in the time range of 300–500 ms after
Table 1
Latencies of the first four peaks of the transient GBR in response to non-modulated tone-bursts or AM-bursts with ISIs of 0.5, 1.0, and 3.0 s
Stimulus                            MLR peak latency (ms)
Peak polarity:                      Neg.     Pos.     Neg.     Pos.
Tone-burst, ISI = 3.0 s             22       33       46       58
AM-burst, ISI = 3.0 s               23       35       47       59
Tone-burst, ISI = 1.0 s             22       34       46       56
AM-burst, ISI = 1.0 s               22       33       47       59
Tone-burst, ISI = 0.5 s             25       37       49       62
AM-burst, ISI = 0.5 s               22       35       47       59
Mean peak latency:                  22 ± 2   35 ± 3   47 ± 4   59 ± 5
MLR component:                      Na       Pa       Nb       Pb
EEG data (Picton et al., 1974)      18       30       40       50
EEG data (Lane et al., 1971)        23       35       45       63
MEG data (Borgmann et al., 2001)    22       33       47       63
MLR peak latencies as reported in the literature are listed at the bottom.
trast, the amplitude of the transient response during the first 100 ms decreases monotonically from 1.5 nAm at 3 s ISI to 1.0 nAm at 500 ms ISI. For all ISIs the amplitude rises with the same slope from 100 to 200 ms. At 3 s ISI the rising slope of the amplitude is delayed by about 20 ms. As in the first experiment using an ISI of 3 s, the final transition to steady-state amplitude occurs between 200 and 250 ms for all three ISIs in the second experiment. However, this transition is not precisely timed. The time course of the phase (Fig. 7c) exhibits a well-defined transition into the steady state at 260 ms for all three ISIs. This point was reached about 50 ms earlier than in the first experiment. After 140 ms the transient response is strongly reduced, and the SSR phase is measured reliably. The SSR phase precedes the phase in the 300–500 ms interval. At 140 ms the phase difference is 35° (2.5 ms) at 0.5 s ISI, 51° (3.5 ms) at 1.0 s ISI, and 62° (4.3 ms) at 3.0 s ISI. These points are marked with square symbols in Fig. 7c. During the following 120 ms the response phase changes linearly to its final value in the 250–300 ms latency interval. The direction of the phase shift during the transition interval corresponds to increased SSR latency.
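The correspondence between phase difference and latency follows from the modulation period, since one full cycle of 360° spans one period. A minimal Python sketch of this conversion is given below. The function name `phase_to_latency_ms` is hypothetical, and the 40 Hz modulation frequency used for the second experiment is an assumption inferred from 20 modulation periods in a 500 ms burst.

```python
def phase_to_latency_ms(phase_deg, modulation_hz):
    """Convert a phase difference at the modulation frequency into a latency shift.

    One full cycle (360 degrees) corresponds to one modulation period.
    """
    period_ms = 1000.0 / modulation_hz
    return phase_deg / 360.0 * period_ms

# First experiment: 60 degrees at a modulation frequency of 39 Hz.
print(round(phase_to_latency_ms(60.0, 39.0), 1))   # 4.3

# Second experiment, assuming 40 Hz modulation (20 periods in 500 ms).
for phase in (35.0, 51.0, 62.0):
    print(phase, round(phase_to_latency_ms(phase, 40.0), 2))
```

Under the 40 Hz assumption the three converted values come out within about 0.1 ms of the latencies quoted in the text.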
Fig. 6. Summary of parametrically averaged GBR and developing SSR. (a) Across nine subjects grand average response waveform for AM tone-bursts (500 ms duration, 3 s ISI, thick line) in comparison with averaged peak amplitudes and latencies (square symbols). The thin line shows the reconstructed waveform connecting the peaks with cosine functions. (b) Parametrically averaged transient response to non-modulated tone-burst (250 ms duration, 500 ms ISI). (c) Onset of response to AM stimulus of the same timing structure. (d) Difference waveform between c and b, the rectangular boxes denote the 95% confidence interval of the averaged peak amplitudes and latencies. (e–g) Like b–d, 500 ms duration, 1 s ISI. (h–k) Like e–g, ISI = 3 s. (l) Onset of non-modulated tone-burst stimulus. (m) AM stimulus signal 250 Hz carrier frequency, 40 Hz modulation frequency.
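The parametric averaging summarized in Fig. 6 can be sketched in a few lines of Python. This is an illustrative sketch only: the peak values for the three "subjects" are invented, the function names are hypothetical, and the half-cosine segment between neighbouring peaks is one plausible reading of "interpolated using cosine functions".

```python
import math

def parametric_average(peaks_per_subject):
    """Average corresponding peaks across subjects.

    peaks_per_subject: one list per subject of (latency_ms, amplitude) tuples,
    with the peaks in the same order for every subject. Latencies and
    amplitudes are averaged separately for each peak.
    """
    n_subjects = len(peaks_per_subject)
    n_peaks = len(peaks_per_subject[0])
    mean_peaks = []
    for p in range(n_peaks):
        mean_lat = sum(s[p][0] for s in peaks_per_subject) / n_subjects
        mean_amp = sum(s[p][1] for s in peaks_per_subject) / n_subjects
        mean_peaks.append((mean_lat, mean_amp))
    return mean_peaks

def cosine_interpolate(mean_peaks, t_ms):
    """Waveform value at t_ms, joining successive peaks with half-cosine segments."""
    for (t0, a0), (t1, a1) in zip(mean_peaks, mean_peaks[1:]):
        if t0 <= t_ms <= t1:
            x = (t_ms - t0) / (t1 - t0)
            return 0.5 * (a0 + a1) + 0.5 * (a0 - a1) * math.cos(math.pi * x)
    raise ValueError("t_ms lies outside the range of measured peaks")

# Hypothetical peak measurements (latency in ms, amplitude in nAm) for three subjects.
subjects = [
    [(21.0, -1.0), (33.0, 1.2), (45.0, -0.9), (57.0, 0.8)],
    [(23.0, -1.4), (35.0, 1.0), (47.0, -1.1), (60.0, 1.0)],
    [(22.0, -1.2), (37.0, 1.4), (49.0, -1.0), (63.0, 0.9)],
]
mean_peaks = parametric_average(subjects)
print(mean_peaks[0])                          # mean latency and amplitude of the first peak
print(cosine_interpolate(mean_peaks, 28.5))   # value midway between the first two mean peaks
```

Because each peak is measured at its own latency in every subject before averaging, latency jitter between subjects no longer reduces the averaged peak amplitude.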
stimulus onset. From 120 ms after stimulus onset until the end of the stimulus the amplitudes match almost perfectly. The offset slope of the amplitude displays a shift in time between the two experiments because the stimulus duration was 512 ms in the first experiment and 500 ms in the second experiment. The dependence of the grand average amplitudes upon the ISI is shown in Fig. 7b. In contrast to Fig. 7a the SSR amplitudes are not normalized. Although the amplitude characteristics show some fluctuations, it is obvious that the mean steady-state amplitude is independent of the ISI. Even at the shortest investigated ISI of 500 ms the SSR amplitude reached the same final value of 2.6 nAm as at the longest ISI of 3 s. In con-
Fig. 7. Amplitude and phase characteristics at the modulation frequency. (a) Comparison of response amplitudes obtained from the first (dash-dotted line) and second experiment (solid line) using 500 ms AM tone-bursts at an ISI of 3 s. (b) Response amplitude using AM tone-bursts of various timing parameters. (c) Characteristics of phase difference between response and stimulus signal.
Fig. 8. Grand averages across nine subjects of individual distances between source locations of MLR and SSR and source location of N1 wave in a head-based Cartesian coordinate system. The ellipses denote the 95% confidence regions of the grand averages.
3.2.4. Source localization
The results of the source analysis are summarized in Fig. 8. In order to reduce the effect of individual anatomical variations in this figure, the different source coordinates are given relative to the location of the corresponding N1 source for each subject (for the 3 s ISI recording). Fig. 8 shows the grand average source coordinates across subjects as well as the 95% confidence regions, which were obtained from bootstrap resampling. The confidence region of the SSR source location is smaller than that of each MLR component, because for each wave of the SSR multiple source estimations were feasible whereas the transient MLR waves occurred only once in the response signal. The measured source locations are all significantly different from the location of the N1 source. The sources of SSR were estimated to be 4.5 mm anterior, 6 mm medial, and 4.5 mm inferior to the sources of the N1 response. The SSR source locations are significantly different from all MLR source locations in the inferior–superior direction, but the confidence regions overlap in the two other directions. The confidence regions of all MLR source coordinates overlap in all three directions and cannot be separated significantly. The MLR source coordinates lay within a small cubic volume of 5 mm edge lengths centered about 7 mm anterior, 7 mm medial, and 2 mm superior to the sources of the N1 response. The SF sources were estimated to be 5 mm anterior and 9 mm medial to the sources of the N1 response, and the confidence intervals overlapped with both the SSR and MLR sources.
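How a bootstrap confidence interval for a mean source offset can be obtained is sketched below in Python. This is a simplified, one-axis percentile bootstrap under stated assumptions, not the procedure used to draw the confidence ellipses in Fig. 8: the nine offsets are invented values, and the function name, resample count and seed are hypothetical choices.

```python
import random

def bootstrap_mean_ci(values, n_resamples=2000, alpha=0.05, seed=0):
    """Percentile bootstrap confidence interval for the mean of `values`."""
    rng = random.Random(seed)           # fixed seed so the sketch is reproducible
    n = len(values)
    means = []
    for _ in range(n_resamples):
        # Resample the subjects with replacement and store the resampled mean.
        sample = [values[rng.randrange(n)] for _ in range(n)]
        means.append(sum(sample) / n)
    means.sort()
    lower = means[int((alpha / 2.0) * n_resamples)]
    upper = means[int((1.0 - alpha / 2.0) * n_resamples) - 1]
    return lower, upper

# Hypothetical anterior offsets (mm) of a source relative to each subject's N1 source.
anterior_mm = [3.0, 6.5, 4.0, 5.5, 2.5, 7.0, 4.5, 3.5, 4.0]
mean_offset = sum(anterior_mm) / len(anterior_mm)
lower, upper = bootstrap_mean_ci(anterior_mm)
print(mean_offset, lower, upper)
```

If the resulting interval excludes zero, the mean offset along that axis would be judged significantly different from the N1 location at the chosen level.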
3.2.5. Superimposition of MLR
The linear superimposition of periodically evoked transient responses is a common explanation for the SSR. Such a model assumes that all modulation periods evoke identical transient responses. Successive responses are delayed by the modulation period and may partially overlap. The sum of all delayed responses finally forms the SSR. In this study single transient responses were obtained from tone-burst stimulation and served as sample functions for the linear superimposition model. This choice implicitly assumes that the stimulus onset is the effective stimulus which is periodically repeated and that the falling slopes of the AM stimulus contribute little to the response. The SSR resulting from this model simulation was compared to the SSR evoked with AM-bursts. An individual auditory evoked response to the non-modulated tone-burst (Fig. 9a) is shown in Fig. 9b after applying a 24–60 Hz band-pass filter to the signal. The transient response waveform consists of two pairs of negative and positive deflections within the first 100 ms after stimulus onset. Replications of the sample function, each delayed by multiples of 25 ms, are shown
Fig. 9. Reconstruction of SSR waveform by linear superimposition of shifted transient response. (a) Tone-burst stimulus. (b) Single subject’s transient response in the gamma-frequency band. (c–g) Shifted versions of the transient response according to the periodicity of the modulation frequency shown in half scale compared to the original version in a. (h) Compound response signal from superimposition of shifted sample functions c–g. (i) Response signal to AM tone-burst stimulation obtained from the same subject. (k) AM tone-burst stimulus.
in Fig. 9c–g. The superimposition of 20 consecutive sample functions results in a compound response signal (Fig. 9h). After about 50 ms of transition interval, the modeled response signal oscillates with constant amplitude at the rate of 40 Hz. For comparison, the experimentally obtained response signal to AM tone-burst stimulation is displayed in Fig. 9i. The transient response during the first 70 ms resembles that of the response to the non-modulated tone-burst almost perfectly. Consequently, during these first 70 ms no essential difference between real and artificial response waveform can be ascertained. In contrast, the dramatic drop of response amplitude at 100 ms and recovery 200 ms after stimulus onset do not appear in the artificial response waveform. In the steady-state period (330–500 ms) the actual response shows a phase difference of 60° from the modeled response.
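The linear superimposition model described above amounts to summing delayed copies of one transient response. A minimal Python sketch follows; the damped 40 Hz oscillation used as the sample function is a hypothetical stand-in for a measured transient response, and the 1 kHz sampling rate and the function name `superimpose` are assumptions made for illustration.

```python
import math

FS_HZ = 1000        # assumed sampling rate: one sample per millisecond
PERIOD = 25         # delay between copies in samples (25 ms, i.e. 40 Hz)
N_COPIES = 20       # 20 consecutive sample functions

# Hypothetical sample function: a damped 40 Hz oscillation lasting 100 ms.
transient = [math.sin(2 * math.pi * 40.0 * t / FS_HZ) * math.exp(-t / 30.0)
             for t in range(100)]

def superimpose(sample, period, n_copies):
    """Sum n_copies of `sample`, each delayed by a multiple of `period` samples."""
    total = [0.0] * (period * (n_copies - 1) + len(sample))
    for k in range(n_copies):
        for t, value in enumerate(sample):
            total[k * period + t] += value
    return total

compound = superimpose(transient, PERIOD, N_COPIES)

# At 10 ms only the first copy contributes; later, four copies overlap at every
# instant and the modeled signal repeats exactly from one period to the next.
print(round(compound[10], 3))
print(abs(compound[210] - compound[235]) < 1e-12)   # True
```

In such a model the transition to the steady state lasts only as long as it takes for the overlapping copies to fill up, which is why it cannot reproduce a slow rise over about 200 ms.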
4. Discussion
The main goal of these experiments was to examine the development of the auditory SSR evoked by AM sound, in order to determine whether the time course of this development is related to auditory temporal integration. This goal could only be addressed after disentangling the auditory SSR from other auditory responses to the onset of the AM tone. The time course of the responses in the gamma-frequency band obtained in both experiments showed a deep depression of activity around 100 ms after stimulus onset (Figs. 2b and 9i). This is the peak latency of the N1 wave. Typically the N1 amplitude reaches 20–40 times the amplitude of the GBR. A possible explanation of the response pattern would therefore be that the SSR developed immediately but was then transiently inhibited by the processes underlying the N1 activity. In order to investigate this possible explanation, shorter ISIs were used in the second experiment. Decreasing the ISI decreased the N1 amplitude, with almost no N1 activity occurring at an ISI of 0.5 s. In contrast, the pattern of the GBR did not change significantly with ISI (Figs. 4 and 7). The hypothesis that the gamma-band activity is inhibited by the N1 activity has to be rejected. If the GBR cannot be considered as a continuous response that is attenuated during the N1, then the initial portion of the response is a different process from the later sustained oscillations. This is the transient GBR, which occurs when either a modulated tone-burst or a non-modulated tone-burst stimulus is presented. Within a single subject these responses show almost identical peak latencies and amplitudes for the two kinds of stimuli, although they show noticeable variations between subjects. The components of the
GBR are well defined by their peak latencies. The peaks of the transient MLR are generally identical with the waves of the early GBR, with the first four waves being equivalent to the MLR components Na, Pa, Nb and Pb. In our study the peak latencies were about 4.5 ms longer for Na and Pa, about 7 ms longer for Nb and about 9 ms longer for Pb than the latencies obtained in EEG studies with rapid rising stimuli (Picton et al., 1974). The reason for these prolongations is likely the long 12.5 ms rise time of the stimulus used in this study. Lane et al. (1971), in an EEG study with tones of 10 ms rise time, and Borgmann et al. (2001), in a recent MEG study with 12.5 ms stimulus rise time, reported MLR peak latencies which resemble the latencies of the early peaks of the GBR found in this study (Table 1). Thus, the data presented here provide further evidence for the close relationship between MLR and transient evoked GBR, as was proposed by Basar et al. (1987). However, the GBR is not identical with the transient MLR. At peak latencies later than 50 ms the GBR is more variable, and the signal is better defined by its spectral attributes. Additionally, individual responses may contain non-phase-locked gamma-band activity, which is not seen in the averaged signal. The GBR and MLR can also be distinguished according to their variation with ISI. The significant increase of the GBR at an ISI longer than 1 s, reported by Makeig (1990), was not observed in the present data. Even though the transient GBR varies in amplitude and latency between subjects, the response configuration remains relatively stable within an individual subject. This means that the transient GBR at the onset of the response to the AM stimulus can be eliminated by subtracting the response to a non-modulated tone from the response to the modulated tone. No significant response activity remains in the first 80 ms of the difference signal.
This result was confirmed for different ISI values. Although the spectral content of the transient GBR is centered around 40 Hz, which equals the modulation frequency of the AM stimulus, the transient GBR does not closely follow the modulation. The frequency of the oscillations in the first 100 ms is not exactly equal to the modulation frequency and the phase relationship between the response and the signal changes with time. The oscillatory activity builds up between 80 and 250 ms after stimulus onset and continues as an SSR develops a stable frequency and a stable phase relationship to the stimulus. The transient GBR of the first 80–100 ms can thus be clearly distinguished from the SSR. Further arguments for the functional differences of the transient GBR and the SSR can be deduced from the different behavior of the responses in relation to stimulus parameters. The amplitude of the transient GBR was reduced by one-third when the ISI was shortened from 3 to 0.5 s. This dependency with regard to the ISI is similar to that previously reported by Pantev (1995) and Pantev et al. (1993). In contrast, the amplitude of the simultaneously recorded SSR did not change with ISI variation. Within the steady-state time interval, the 95% confidence limits for the peak latency obtained from the group of nine subjects were significantly smaller than 1 ms. This points to a highly reproducible phase relation between stimulus and response signal common to all subjects. In contrast, during the first 100 ms the phase of the response signal to the AM stimulus resembles the phase of the transient GBR to non-modulated tone-burst instead of being phase coupled to the AM stimulus. The distinction between GBR and SSR was also demonstrated by the repetitive superimposition of the transient response waveform, which did not reproduce the phase relation between the stimulus and the SSR. Furthermore, MEG source analysis determined separate source locations for the GBR and the SSR. Because magnetic resonance images were not available for the subjects who participated in this study, the estimated source locations could not be overlaid on the corresponding individual anatomical structures. Instead, the source locations of SSR and GBR were compared with the sources of the N1 response that was estimated from the slow evoked cortical activity in response to the tone-bursts presented with long ISI (Engelien, 2000). Both the sources of the GBR and the sources of the SSR were shifted in anterior and medial directions compared to the N1 source location, which is consistent with previous results (Pantev et al., 1996). Both the GBR and the SSR are most likely generated in primary areas of the auditory cortex. The source locations of the first four peaks of transient GBR within the latency interval between 20 and 60 ms, which might be related to the MLR components Na, Pa, Nb and Pb, were found within a cube of 5 mm edges.
The signal-to-noise ratio of the MLR did not allow the locations of the different MLR components to be distinguished, although it was possible to separate SSR from MLR source locations. The underlying model of a single ECD may not completely explain these two different cortical activities. The estimated sources indicate only the centers of activation and not their extent. Nevertheless, the clearly separate sources indicate that different neuronal structures must be responsible for the transient and steady-state activity. In a positron emission tomography study (Griffiths et al., 1998) and recently in a functional magnetic resonance imaging study (Griffiths et al., 2001) the processing of regular temporal structures in acoustical signals with periodicities between 10 and 100 ms has been localized to the primary auditory cortex. In a patient, who acquired word deafness following a focal lesion
of the primary auditory cortex, Phillips and Farmer (1990) showed that the primary auditory cortex has a special role in processing auditory events in the time range of tens of milliseconds. From these results and from the results of the current study, the SSR appears to be involved in processing the time structure of AM sounds in the auditory cortex. Mäkelä and Hari (1987) argued that the SF may also reflect the processing of the 40 Hz rhythm. In our study, an enhanced SF to the AM stimulus in comparison with the non-modulated tone-burst was found at an ISI of 3 s, but not at shorter ISIs. Furthermore, AM significantly affected N1 and P2 peak latencies and P2 amplitude. Although the stimulus intensity and stimulus onset were carefully matched between both stimulus types, the N1 latencies were significantly longer for the AM signal. This may reflect the higher complexity of the AM stimulus. The development of the SF may reflect the integration of sound energy without regard to its modulation. However, it is extremely difficult to view this development because of the superimposed N1–P2 complex. Also it is difficult to subtract this out using brief tones as a control since the N1–P2 also integrates sound energy over a short period of time. After subtracting the transient response evoked by the non-modulated tone-burst from the response to the AM tone-burst, the difference waveform shows a response at the frequency of modulation. The amplitude of this response increases linearly from a beginning at about 40 ms to reach a stable value between 200 and 250 ms after stimulus onset. The linear onset slope lasting about 200 ms is in distinct contrast to the sudden decay of the steady-state activity after the end of the stimulus. Thus, the onset and the offset of the SSR seem to reflect different processes in the underlying neural system.
Increasing amplitude of the averaged response during SSR onset results either from higher synchronization between the stimulus and the activated neurons or from an increased number of neurons involved in response generation. Furthermore, the decreasing phase of the response signal during the transition interval corresponds to increasing latency. Since this latency increased by at least 4 ms, it might reflect more complex processing. If the oscillatory SSR is related to AM, the time courses of both the amplitude and the phase of the SSR reflect a progressive enhancement of AM processing within the cortex. We therefore suggest that the development of the SSR reflects auditory temporal integration at the level of the primary auditory cortex.

Most psychoacoustical investigations have not used stimulus durations beyond about 1 s. Whereas the exponential model clearly describes an asymptotic approach to a final threshold, the power function model does not indicate any end to the integration (although it
does become very slow in terms of actual rather than exponential time). Plomp and Bouman (1959) increased the stimulus duration up to 10 s, and the listener's detection performance improved up to about 1–2 s. The duration of the rising SSR slope of about 200 ms agrees with the main changes of psychoacoustic temporal integration, but does not suggest any integration beyond that period. In principle, the perception of temporal structures is a hierarchical process (Pöppel, 1997). It is therefore unlikely that temporal integration ranging from milliseconds to seconds is physiologically realized in a single stage. Even during the first 200 ms after stimulus onset, different processing strategies can occur, because threshold characteristics showed a slope of −10 dB/decade in the first 100 ms and −5 dB/decade for longer stimulus durations (Green et al., 1957; Buus, 1999).

Both the time course of the SSR in the transition interval and the characteristic of sensitivity improvement approach their final values smoothly at the end of the auditory integration interval. This behavior limits how exactly the duration of the integration interval can be determined. With the SSR, the phase relation between the stimulus envelope and the response waveform can be used as an additional indicator that the steady-state condition has been reached. During the first 300 ms the phase of the response signal precedes its steady-state value, and the phase difference decreases linearly with time. At the end of the integration interval the phase characteristic exhibits a well-defined bend into a horizontal line. It is assumed that the interval in which the SSR builds up is related to temporal integration after stimulus onset. The duration of this interval can be determined more precisely from the phase characteristic than from the time course of the SSR amplitude.
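One way to make the "bend into a horizontal line" operational is to fit a linear ramp followed by a plateau and take the fitted breakpoint as the end of the transition. The sketch below is an illustrative assumption, not the procedure used in this study: the helper names `phase_to_latency_ms` and `find_bend`, the synthetic phase values, and the noise level are all hypothetical. It also shows the phase-to-latency conversion implied above, in which 4 ms at a 40 Hz modulation rate corresponds to 57.6° of phase.

```python
import numpy as np

def phase_to_latency_ms(phase_deg, f_mod):
    """Convert a phase difference in degrees at f_mod (Hz) to milliseconds."""
    return phase_deg / 360.0 / f_mod * 1000.0

def find_bend(t, phase):
    """Least-squares fit of 'linear ramp, then constant'; returns the bend time.

    Model: phase(t) = plateau + slope * min(t - t_bend, 0).
    For a fixed t_bend the model is linear in (plateau, slope), so each
    candidate is solved by ordinary least squares and the best one is kept.
    """
    best_err, best_t = np.inf, t[0]
    for t_bend in t[2:-2]:
        basis = np.column_stack([np.ones_like(t), np.minimum(t - t_bend, 0.0)])
        coef, *_ = np.linalg.lstsq(basis, phase, rcond=None)
        err = np.sum((basis @ coef - phase) ** 2)
        if err < best_err:
            best_err, best_t = err, t_bend
    return best_t

# Synthetic phase lead relative to the steady state (hypothetical values):
# 57.6 degrees at 50 ms, shrinking linearly to zero at 300 ms, plus noise.
rng = np.random.default_rng(0)
t = np.arange(0.05, 0.6, 0.005)
true_bend = 0.300
lead = 57.6 * np.clip((true_bend - t) / 0.250, 0.0, None)
phase = lead + rng.normal(0.0, 1.0, t.size)

bend = find_bend(t, phase)
print(f"estimated bend at {bend * 1000:.0f} ms")
print(f"57.6 deg at 40 Hz = {phase_to_latency_ms(57.6, 40.0):.1f} ms")
```

A sharp breakpoint is easier to locate than the point at which a smoothly saturating amplitude curve "reaches" its plateau, which is the practical argument for using the phase characteristic.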
Viemeister (1979) reported a higher detection threshold when the AM stimuli were presented as bursts separated by silent periods than when the carrier was continuous. This effect was interpreted as interference of the stimulus onset with the detection of modulation: the stimulus onset masks the first few cycles of the AM. Sheft and Yost (1990) stated that the AM stimulus onset introduces another modulation, which may interfere with modulation detection analogously to modulation masking (Yost et al., 1989). To account for this onset effect in the multiple-looks model, in which the duration of each look corresponds to the AM period, they attributed smaller weights to the first periods of the modulating signal. The transient GBR, which is related to the stimulus onset and precedes the developing SSR, might be an electrophysiological correlate of the 'onset insufficiency' of AM detection (Sheft and Yost, 1990). During the first 100 ms the onset response
is dominant, and after its decay the response to the AM becomes the most pronounced brain activity.

In conclusion, these results provide an instantaneous time course of activation of the human primary auditory cortex at the onset of an AM stimulus. This time course reflects auditory temporal integration in the sense of the auditory system's ability to combine information over time. In previously used methods based on the N1 response or the MMN, the time course of the temporal integration characteristic was reconstructed step by step from sequentially repeated measurements with different stimulus timings. In contrast, the SSR method allows the exact determination of the time needed to reach the steady state, which is equivalent to the end of the integration interval.

The ability to measure the duration of the auditory integration interval is important and may lead to further investigations of temporal processing of auditory information. Florentine et al. (1988) reported that listeners with cochlear impairment show less temporal integration than normal listeners. Psychiatric patients also showed different integration of auditory input over time (Babkoff et al., 1980).

The actual physiological mechanisms of temporal integration are not well understood. Relatively simple neural networks can show intrinsic oscillations and can synchronize with the rhythm of the AM of the auditory input (for a review see Jefferys et al., 1996). May and Tiitinen (2001) demonstrated that recurrent connections of excitatory and inhibitory cells within a cortical microcolumn can explain the cortical representation of periodic sound structures. Such a neuronal model acts like a sharply tuned band-pass filter. The more narrowly this band-pass filter is tuned, the longer are the rise and decay times of its output in response to a sustained input. This model might be able to explain the slowly rising onset of the SSR. However, our experimental data have shown different SSR rise and decay times.
Dau et al. (1997) also modeled auditory processing of AM with a relatively tuned filter, which cannot explain a transition interval of some 100 ms duration. Despite its long integration time, the human cortex responds immediately to changes of the periodic input. Therefore, any neural model for the generation of the auditory SSR would have to be more complex than a simple filter. Further experimental work will be necessary to characterize this temporal behavior.
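The argument that a simple tuned filter cannot produce different rise and decay times can be illustrated numerically. The sketch below is not the model of May and Tiitinen (2001) or of Dau et al. (1997). It assumes a generic one-pole complex resonator centered on 40 Hz, with hypothetical bandwidths of 3.5 and 7 Hz, and measures the 10% to 90% rise time and the 90% to 10% decay time of its output envelope for a gated 40 Hz input.

```python
import numpy as np

def envelope(f0, bandwidth, fs, gate):
    """Normalized output envelope of a one-pole complex resonator.

    The resonator y[n] = pole * y[n-1] + x[n] is driven by a gated complex
    tone at its center frequency f0; bandwidth (Hz) sets the pole radius.
    """
    n = np.arange(gate.size)
    x = gate * np.exp(2j * np.pi * f0 * n / fs)
    pole = np.exp(-np.pi * bandwidth / fs + 2j * np.pi * f0 / fs)
    y = np.zeros(gate.size, dtype=complex)
    acc = 0j
    for i in range(gate.size):
        acc = pole * acc + x[i]
        y[i] = acc
    env = np.abs(y)
    return env / env.max()

def rise_and_decay(env, fs, n_on):
    """10-90% rise time during the burst and 90-10% decay time after it (s)."""
    on, off = env[:n_on], env[n_on:]
    rise = (np.argmax(on >= 0.9) - np.argmax(on >= 0.1)) / fs
    decay = (np.argmax(off <= 0.1) - np.argmax(off <= 0.9)) / fs
    return rise, decay

fs, f0, n_on = 1000.0, 40.0, 1000           # 1 s burst followed by 1 s silence
gate = np.concatenate([np.ones(n_on), np.zeros(n_on)])

rise_narrow, decay_narrow = rise_and_decay(envelope(f0, 3.5, fs, gate), fs, n_on)
rise_wide, decay_wide = rise_and_decay(envelope(f0, 7.0, fs, gate), fs, n_on)
print(f"3.5 Hz bandwidth: rise {rise_narrow:.3f} s, decay {decay_narrow:.3f} s")
print(f"7.0 Hz bandwidth: rise {rise_wide:.3f} s, decay {decay_wide:.3f} s")
```

In this linear filter, rise and decay times are governed by the same time constant and scale inversely with bandwidth. A rise of about 200 ms combined with an offset within 50 ms, as observed here, therefore cannot be obtained by tuning the bandwidth alone.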
Acknowledgements

This work was supported by grants from the Deutsche Forschungsgemeinschaft (No. Pa392/7-2) and the Canadian Institutes for Health Research.
References

Alain, C., Woods, D., Covarrubias, D., 1997. Activation of duration-sensitive auditory cortical fields in humans. Electroencephalogr. Clin. Neurophysiol. 104, 531–539.
Babkoff, H., Sutton, S., Zubin, J., Har-Even, D., 1980. A comparison of psychiatric patients and normal controls on the integration of auditory stimuli. Psychiatry Res. 3, 163–178.
Bacon, S., Viemeister, N., 1985. Temporal modulation transfer functions in normal-hearing and hearing-impaired listeners. Audiology 24, 117–134.
Basar, E., Basar-Eroglu, C., Greitschus, F., 1987. The association between 40 Hz-EEG and the middle latency response of the auditory evoked potential. Int. J. Neurosci. 33, 103–117.
Borgmann, C., Roß, B., Draganova, R., Pantev, C., 2001. Human auditory middle latency responses: influence of stimulus type and intensity. Hear. Res. 158, 57–64.
Budd, T., Mitchie, P., 1994. Facilitation of the N1 peak of the auditory ERP at short stimulus intervals. NeuroReport 5, 2513–2516.
Buus, S., 1999. Temporal integration and multiple looks, revisited: weights as a function of time. J. Acoust. Soc. Am. 105, 2466–2475.
Dau, T., Kollmeier, B., Kohlrausch, A., 1997. Modelling auditory processing of amplitude modulation. II. Spectral and temporal integration. J. Acoust. Soc. Am. 102, 2906–2919.
Davison, A., Hinkley, D., 1997. Bootstrap Methods and their Application. Cambridge University Press, Cambridge.
de Boer, E., 1985. Auditory time constants: A paradox? In: Michelsen, A. (Ed.), Time Resolution in Auditory Systems. Springer, Berlin, pp. 141–158.
Engelien, A., Schulz, M., Roß, B., Arolt, V., Pantev, C., 2000. A combined functional in vivo measure for primary and secondary auditory cortices. Hear. Res. 148, 153–160.
Florentine, M., Fastl, H., Buus, S., 1988. Temporal integration in normal hearing, cochlear impairment, and impairment simulated by masking. J. Acoust. Soc. Am. 84, 195–203.
Gage, N., Roberts, T., 2000. Temporal integration: reflections in the M100 of the auditory evoked field. NeuroReport 11, 2723–2726.
Galambos, R., Makeig, S., 1988. Dynamic changes in steady-state responses. In: Basar, E. (Ed.), Dynamics of Sensory and Cognitive Processing by the Brain. Springer, Berlin, pp. 103–122.
Galambos, R., Makeig, S., Talmachoff, P., 1981. A 40-Hz auditory potential recorded from the human scalp. Proc. Natl. Acad. Sci. USA 78, 2643–2647.
Gerken, G., Bhat, V., Hutchison-Clutter, M., 1990. Auditory temporal integration and the power function model. J. Acoust. Soc. Am. 88, 767–778.
Green, D., Birdsall, T., Tanner, W., 1957. Signal detection as a function of signal intensity and duration. J. Acoust. Soc. Am. 29, 523–531.
Griffiths, T., Büchel, C., Frackowiak, R., Patterson, R., 1998. Analysis of temporal structure in sound by the human brain. Nature Neurosci. 1, 422–427.
Griffiths, T., Uppenkamp, S., Johnsrude, I., Josephs, O., Patterson, R., 2001. Encoding of temporal regularity of sound in the human brainstem. Nature Neurosci. 4, 633–637.
Hämäläinen, M., Hari, R., Ilmoniemi, R.J., Knuutila, J., Lounasmaa, O., 1993. Magnetoencephalography – theory, instrumentation, and applications to noninvasive studies of the working human brain. Rev. Mod. Phys. 65, 413–497.
Hari, R., Hämäläinen, M., Joutsiniemi, S., 1989. Neuromagnetic steady-state responses to auditory stimuli. J. Acoust. Soc. Am. 86, 1033–1039.
Hughes, J., 1946. The thresholds of audition for short periods of stimulation. Proc. R. Soc. London B133, 486–490.
Ilmoniemi, R.J., Williamson, S.J., Hostetler, W.E., 1987. New method for the study of spontaneous brain activity. In: Atsumi, K., Kotani, M., Ueno, S., Katila, T., Williamson, S.J. (Eds.), Biomagnetism '87. Tokyo Denki University Press, Tokyo, pp. 182–185.
Jacobson, G., Fitzgerald, M., 1997. Auditory evoked gamma band potential in normal subjects. J. Am. Acad. Audiol. 8, 44–52.
Jefferys, J., Traub, R., Whittington, M., 1996. Neuronal networks for induced '40 Hz' rhythms. Trends Neurosci. 19, 202–208.
Jerger, J., Chmiel, R., Frost, J.D.J., Coker, N., 1986. Effect of sleep on the auditory steady state evoked potential. Ear Hear. 7, 240–245.
Kuwada, S., Batra, R., Maher, V.L., 1986. Scalp potentials of normal and hearing-impaired subjects in response to sinusoidally amplitude-modulated tones. Hear. Res. 21, 179–192.
Lane, R., Kupperman, G., Goldstein, R., 1971. Early components of the averaged electroencephalic response in relation to rise–decay time and duration of pure tones. J. Speech Hear. Res. 14, 408–415.
Lee, J., 1994. Amplitude modulation rate discrimination with sinusoidal carriers. J. Acoust. Soc. Am. 96, 2140–2147.
Lee, J., Bacon, S., 1997. Amplitude modulation depth discrimination of a sinusoidal carrier: effect of stimulus duration. J. Acoust. Soc. Am. 101, 3688–3693.
Linden, R., Campbell, K., Hamel, G., Picton, T., 1985. Human auditory steady state evoked potentials during sleep. Ear Hear. 6, 167–174.
Loveless, N., Levänen, S., Jousmäki, V., Sams, M., Hari, R., 1996. Temporal integration in auditory sensory memory: neuromagnetic evidence. Electroencephalogr. Clin. Neurophysiol. 100, 220–228.
Makeig, S., 1990. A dramatic increase in the auditory middle latency response at very slow rates. In: Brunia, C., Gaillard, A., Kok, A. (Eds.), Psychophysiological Brain Research. Tilburg University Press, Tilburg, pp. 56–60.
Mäkelä, J., Hari, R., 1987. Evidence for cortical origin of the 40 Hz auditory evoked response in man. Electroencephalogr. Clin. Neurophysiol. 66, 539–546.
May, P., Tiitinen, H., 2001. Human cortical processing of auditory events over time. NeuroReport 12, 573–577.
McEvoy, L., Levänen, S., Loveless, N., 1997. Temporal characteristics of auditory sensory memory: neuromagnetic evidence. Psychophysiology 34, 308–316.
Noreen, E., 1989. Computer Intensive Methods for Testing Hypotheses. Wiley, New York.
Onishi, S., Davis, H., 1968. Effects of duration and rise time of tonebursts on evoked V-potentials. J. Acoust. Soc. Am. 44, 582–591.
Pantev, C., 1995. Evoked and induced gamma-band activity of the human cortex. Brain Topogr. 7, 321–330.
Pantev, C., Elbert, T., Makeig, S., Hampson, S., Eulitz, C., Hoke, M., 1993. The relationship of transient and steady-state auditory evoked fields. Electroencephalogr. Clin. Neurophysiol. 88, 389–396.
Pantev, C., Eulitz, C., Elbert, T., Hoke, M., 1994. The auditory evoked sustained field: origin and frequency dependence. Electroencephalogr. Clin. Neurophysiol. 90, 82–90.
Pantev, C., Roberts, L.E., Elbert, T., Roß, B., Wienbruch, C., 1996. Tonotopic organization of the sources of human auditory steady-state responses. Hear. Res. 101, 62–74.
Phillips, D., Farmer, M., 1990. Acquired word deafness and the temporal grain of sound representation in the primary auditory cortex. Behav. Brain Res. 40, 85–94.
Picton, T.W., Hillyard, S.A., Krausz, H.I., Galambos, R., 1974. Human auditory evoked potentials. I. Evaluation of components. Electroencephalogr. Clin. Neurophysiol. 36, 179–190.
Picton, T.W., Woods, D.L., Proulx, G.B., 1978. Human auditory sustained potentials. I. The nature of the response. Electroencephalogr. Clin. Neurophysiol. 45, 186–197.
Picton, T.W., Skinner, C., Champagne, S., Kellet, A., Maiste, A., 1987. Potentials evoked by the sinusoidal modulation of the amplitude or frequency of a tone. J. Acoust. Soc. Am. 82, 165–178.
Plomp, R., Bouman, A., 1959. Relation between hearing threshold and duration for tone pulses. J. Acoust. Soc. Am. 31, 749–758.
Plourde, G., Picton, T.W., 1990. Human auditory steady state responses during general anesthesia. Anesth. Analg. 71, 460–468.
Pöppel, E., 1997. A hierarchical model of temporal perception. Trends Cogn. Sci. 1, 56–61.
Rees, A., Green, G., Kay, R., 1986. Steady-state evoked responses to sinusoidally amplitude-modulated sounds recorded in man. Hear. Res. 23, 123–133.
Regan, D., 1989. Human Brain Electrophysiology: Evoked Potentials and Evoked Magnetic Fields in Science and Medicine. Elsevier, New York.
Robinson, S.E., 1989. Theory and properties of lead field synthesis analysis. In: Williamson, S., Hoke, M., Stroink, G., Kotani, M. (Eds.), Advances in Biomagnetism. Plenum Press, New York, pp. 599–602.
Robinson, S.E., Rose, D.F., 1992. Current source image estimation by spatially filtered MEG. In: Hoke, M., Erné, S., Okada, Y., Romani, G. (Eds.), Biomagnetism: Clinical Aspects. Excerpta Medica, Amsterdam, pp. 761–765.
Roß, B., Borgmann, C., Draganova, R., Roberts, L., Pantev, C., 2000. A high-precision magnetoencephalographic study of human auditory steady-state responses to amplitude-modulated tones. J. Acoust. Soc. Am. 108, 679–691.
Santarelli, R., Conti, G., 1999. Generation of auditory steady-state responses: linearity assessment. Scand. Audiol. 28 (Suppl. 51), 23–32.
Sheft, S., Yost, A., 1990. Temporal integration in amplitude modulation detection. J. Acoust. Soc. Am. 88, 796–805.
Sinkkonen, J., Tiitinen, H., Näätänen, R., 1995. Gabor filters: an informative way for analyzing event-related brain activity. J. Neurosci. Methods 56, 99–104.
Sussman, E., Winkler, I., Ritter, W., Alho, K., Näätänen, R., 1999. Temporal integration of auditory stimulus deviance as reflected by the mismatch negativity. Neurosci. Lett. 264, 161–164.
Viemeister, N., 1979. Temporal modulation transfer functions based upon modulation thresholds. J. Acoust. Soc. Am. 66, 1364–1380.
Viemeister, N., Wakefield, G., 1991. Temporal integration and multiple looks. J. Acoust. Soc. Am. 90, 858–865.
Watson, C., Gengel, R., 1969. Signal duration and signal frequency in relation to auditory sensitivity. J. Acoust. Soc. Am. 46, 989–997.
Winkler, I., Czigler, I., Jaramillo, M., Paavilainen, P., Näätänen, R., 1998. Temporal constraints of auditory event synthesis: evidence from ERPs. NeuroReport 9, 495–499.
Yabe, H., Tervaniemi, M., Reinikainen, K., Näätänen, R., 1997. Temporal window of integration revealed by MMN to sound omission. NeuroReport 8, 1971–1974.
Yost, W., Sheft, S., Opie, J., 1989. Modulation interference in detection and discrimination of amplitude modulation. J. Acoust. Soc. Am. 86, 2138–2147.