Neuroscience Letters 329 (2002) 29–32 www.elsevier.com/locate/neulet
The influence of critical bands on neuromagnetic fields evoked by speech stimuli in humans
Klaus Mathiak a,b,*, Ingo Hertrich a, Werner Lutzenberger b, Hermann Ackermann a
a Department of Neurology, Universität Tübingen, Otfried-Müller-Strasse 47, 72076 Tübingen, Germany
b MEG Center, Universität Tübingen, Otfried-Müller-Strasse 47, 72076 Tübingen, Germany
Received 5 March 2002; received in revised form 16 May 2002; accepted 16 May 2002
Abstract
The various classes of speech sounds differ in their configuration of acoustic features. Vowels are characterized by specific local maxima of the spectral energy distribution (formants). Using whole-head magnetoencephalography and an oddball design, the impact of variation of the first (F1) and second (F2) formant on the evoked N1m component (latency about 100 ms) was studied. F1 changes yielded N1m enhancements that grew in parallel with the spectral distance between standard and deviant stimuli. By contrast, F2 shifts gave rise to a non-linear relationship: the N1m effect flattened out beyond a range of two Bark. This frequency range accords with the critical-band characteristics of the peripheral and central auditory system. The differences in early neuronal encoding of the two formants relate to the predominant role of F2 in the encoding of stop consonants.
© 2002 Elsevier Science Ireland Ltd. All rights reserved.
Keywords: Speech perception; Vowel space; Magnetoencephalography; Critical bands; Mismatch negativity; Formants
Any series of discrete auditory events reliably evokes middle- and long-latency electroencephalographic (EEG) responses, provided that the interval between successive stimuli exceeds a short minimal duration (about 300 ms [12]). The activity pattern comprises, among others, a positive peak about 50 ms after stimulus onset (P50) and a subsequent negative one (latency approx. 100 ms, N1). Evoked magnetic fields exhibit deflections at similar latencies, denoted P50m and N1m, respectively. Deviant stimuli randomly interspersed within a sequence of homogeneous acoustic events (oddball design) elicit an enhanced response: the difference between the response to the deviant and the response to the standard is termed mismatch negativity (MMN, derived from the EEG) or mismatch field (MMNm, its magnetoencephalography (MEG) counterpart). Dipole source analyses indicate that both N1m and MMNm are generated at the level of the supratemporal plane, but that they exhibit different topographic distributions [2]. These mismatch responses are considered correlates of early cognitive operations, namely sensory memory processes [12].
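The oddball logic and the deviant-minus-standard difference can be sketched in a few lines of Python on synthetic data. This is a minimal illustration assuming NumPy; the sequence parameters (450 trials, 90 rare events drawn from eight deviant types, 548 ms sweeps at 250 Hz) follow the first experiment described below, while the function names, the toy N1-like waveform and the noise level are hypothetical choices made only for illustration. Note that the study itself compares each deviant with the last standard preceding it, whereas this sketch simply averages all standards.

```python
import numpy as np


def make_oddball_sequence(n_trials, n_rare, n_deviant_types, rng):
    """Stimulus code per trial: 0 = frequent standard, 1..n_deviant_types = rare deviant."""
    codes = np.zeros(n_trials, dtype=int)
    # Choose the positions of the rare events at random, without repetition.
    rare_pos = rng.choice(n_trials, size=n_rare, replace=False)
    # Give each rare event one of the deviant types, drawn uniformly.
    codes[rare_pos] = rng.integers(1, n_deviant_types + 1, size=n_rare)
    return codes


def difference_wave(epochs, deviant_mask, standard_mask):
    """Mismatch-style difference: mean deviant epoch minus mean standard epoch."""
    return epochs[deviant_mask].mean(axis=0) - epochs[standard_mask].mean(axis=0)


rng = np.random.default_rng(0)
codes = make_oddball_sequence(n_trials=450, n_rare=90, n_deviant_types=8, rng=rng)

# Synthetic epochs (trials x samples): 548 ms at 250 Hz gives 137 samples,
# with time zero at stimulus onset after a 148 ms baseline.
fs, n_samples, baseline_s = 250.0, 137, 0.148
t = np.arange(n_samples) / fs - baseline_s
n1_shape = -np.exp(-((t - 0.100) / 0.020) ** 2)   # toy negative deflection at 100 ms
gain = np.where(codes > 0, 1.5, 1.0)[:, None]     # toy assumption: deviants evoke a larger N1
epochs = gain * n1_shape + 0.1 * rng.standard_normal((codes.size, n_samples))

diff = difference_wave(epochs, codes > 0, codes == 0)
print("rare fraction:", (codes > 0).mean())  # 0.2
print("latency of most negative difference (s):", t[np.argmin(diff)])
```

Passing boolean masks rather than a single deviant code keeps the helper usable for pooling all deviants, as here, or for selecting one deviant type at a time.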
* Corresponding author. Tel.: +49-7071-29-87708; fax: +49-7071-29-5706. E-mail address: [email protected] (K. Mathiak).
Within some limits, the various classes of speech sounds (phonemes) of any language can be characterized by rather specific features of the acoustic signal. Vowels differ in their formant structure, i.e. in the location of the maxima of the spectral energy distribution, which reflect the filter characteristics of the vocal tract. Perception of these sounds depends predominantly on the first (F1) and second (F2) formants. Stop consonants preceding a vowel, as in /ba/, are cued by an upward or downward shift of the spectral energy distribution (formant transients). A variety of studies have shown mismatch responses to be sensitive to both dynamic and steady-state aspects of acoustic speech parameters [1,16,17]. Comparing vowels with consonants, Diesch and Luce [3] found the vowel mismatch response, first, to have a shorter latency and a higher amplitude and, second, to be generated at a more anterior or lateral location. These findings were considered to reflect better discriminability or slower auditory decay of vocalic as compared with consonantal contrasts. In accordance with these data, previous work by our group found the syllable /bi/ within a series of vowels /i/ to elicit a typical mismatch response peaking at about 170 ms [7–9]. Furthermore, transient rise time and MMNm varied in parallel: the longer the transient rise time, the higher the MMNm amplitude. Apparently, deviance detection depends quantitatively on the spectral distance between rare and frequent stimuli.
0304-3940/02/$ - see front matter © 2002 Elsevier Science Ireland Ltd. All rights reserved. PII: S0304-3940(02)00572-4
K. Mathiak et al. / Neuroscience Letters 329 (2002) 29–32
In order to investigate further the influence of spectral dissimilarity on N1m within an oddball design [13], two experiments were performed that systematically varied the center frequency of F1 and F2 of the rare events. The first experiment used a synthesized vowel /u/ (F0 = 100 Hz, i.e. the pitch level of a male voice; F1 = 250, F2 = 650, F3 = 2500, F4 = 3600, F5 = 4500 Hz; duration = 90 ms; see Fig. 1) as the frequent stimulus. A series of eight deviants was created by increasing the second formant of /u/ in steps of 9% each (deviant #1: F2 = 709 Hz, #2: F2 = 773 Hz, …, #8: F2 = 1300 Hz; compare /y/ in Fig. 1). The vowel stimuli were generated by additive synthesis: in short, each formant was modeled as an amplitude- and frequency-modulated sinusoid phase-locked to the fundamental frequency [4]. In order to test the effect of F2 decreases and of F1 variation on the response amplitudes, a second experiment was performed with the vowel /e/ as the standard stimulus (F0 = 126, F1 = 600, F2 = 2000, F3 = 3400, F4 = 4600 Hz) and two sets of seven deviants extending either in F1 from 300 to 504 Hz or in F2 from 1000 to 1682 Hz.
Fig. 1. (A) Upper panel: Stylized spectrograms of the vowels /u/ and /e/, including the fundamental frequency (green dashed line), the first and second formants (blue), and the higher formants (green solid lines). (B) Lower panel: Vowel space with typical German vowel configurations [6] across the first (horizontal axis) and second formant (vertical axis). Triangles denote frequent (filled) and deviant (open) vowels in Experiment 1; circles refer to Experiment 2.
Twelve paid right-handed native speakers of standard German (age 26–40 years, median = 31 years; seven females) participated in the first experiment and 13 subjects (age 19–40 years, median = 26 years; seven females) in the second. On clinical examination, the volunteers showed unimpaired hearing sensitivity (250–4000 Hz, ±10 dB). None of them had a history of relevant audiological or neurological disorders. Auditory evoked magnetic fields were recorded with a 151-channel whole-head gradiometer (CTF System Inc., Vancouver, Canada) within an electromagnetically shielded room (anti-aliasing filter = 80 Hz, sampling rate = 250 Hz). Each of the three blocks of the first experiment comprised 450 sweeps (sweep length = 548 ms, pre-stimulus baseline = 148 ms, inter-stimulus interval (ISI) = 610 ms). Within each block, 90 rare events were randomly selected from the set of deviants (= 20% of the events; each deviant occurred about 34 times). Apart from a shorter ISI (365 ms) and sweep length (360 ms), the four blocks of the second experiment had the same design (i.e. each deviant occurred about 23 times). All sweeps from a given subject were averaged after rejection of eyeblink artifacts. Source analysis assumed a simple spherical head model and two tangential dipole currents. The time point of maximum global field power within 70–140 ms post-trigger was used for dipole fitting. The dipole orientation toward the frontal pole was set so as to yield a negative moment at the time of the N1m peak. The individual lead-field functions were calculated from dipole position and orientation. Projection of the data onto the subspace spanned by these two dipole components provided a measure of the time course of dipole strength.
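The final step above, projecting the sensor data onto two fixed dipoles to obtain their moment time courses, is commonly implemented as a least-squares (pseudoinverse) projection. The sketch below assumes exactly that estimator, which the text does not spell out, and uses NumPy with a random, hypothetical lead-field matrix in place of one computed from a spherical head model; the 151 channels and 137 samples (548 ms at 250 Hz) mirror the recording parameters above.

```python
import numpy as np


def dipole_time_courses(fields, leadfield):
    """Least-squares projection of sensor data onto fixed dipoles.

    fields:    (n_channels, n_times) measured magnetic field
    leadfield: (n_channels, n_dipoles) field pattern of each unit-strength dipole
    returns:   (n_dipoles, n_times) estimated dipole moments
    """
    # pinv(L) @ B minimizes ||B - L Q||^2 independently at every time point.
    return np.linalg.pinv(leadfield) @ fields


rng = np.random.default_rng(1)
n_channels, n_times = 151, 137

# Hypothetical lead fields for a left and a right dipole. A real analysis would
# compute them from the fitted dipole positions/orientations in a head model.
leadfield = rng.standard_normal((n_channels, 2))

# Simulated moment time courses for the two dipoles, plus a little sensor noise.
phase = np.linspace(0.0, np.pi, n_times)
true_moments = np.vstack([np.sin(phase), -0.5 * np.sin(2 * phase)])
fields = leadfield @ true_moments + 0.01 * rng.standard_normal((n_channels, n_times))

moments = dipole_time_courses(fields, leadfield)
print("max abs error:", np.abs(moments - true_moments).max())
```

With a noise-free field the projection returns the simulated moments to numerical precision; with additive sensor noise the error stays small because each dipole estimate pools information from all channels.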
The difference wave between the averaged responses to the deviants and to the last stimulus of the preceding series of standards was taken as the difference curve. In order to minimize noise within the averages, the data were weighted by the inverse of the prestimulus variance. The N1m amplitude change was quantified by weighting the difference deflections with a triangular window centered at the N1m peak, defined as the point of maximal negativity of the dipole moment (rise and fall times = 20 ms each; see Fig. 2). For statistical analysis, the independent variable, spectral distance between vowel formants, was defined as the difference between the respective center frequencies f on a modified Bark scale, $z = 21.4\,\log_{10}(4.37 f + 1)$ ([11], pp. 161–205). This scale is isodistant with respect to masking bandwidth as obtained from perception experiments. The measured MEG signal represents the sum of multiple signal and noise sources at individual locations. Consequently, intra-subject variability was treated as an additive error term, which allows for robust linear statistics. A Kolmogorov–Smirnov test rejected the assumption of a normal distribution of the parameters across the group (ks = 0.19 and 0.11, P < 0.01). Therefore, the inter-subject variation
of individual configurations was evaluated by means of nonparametric group analysis (sign test).
Fig. 2a displays the time course of the computed dipole moments of a representative subject during the first experiment. In two volunteers, the power peak emerged at a latency of 88 ms. Since this activity exhibited a positive moment, i.e. a dipole orientation toward the frontal pole, it must be assumed to reflect the P50m component preceding the N1m field. The remaining subjects showed a power peak within 104–132 ms after the trigger, reflecting a negative dipole moment. Possibly owing to the shorter ISI of the second experiment, all subjects in that experiment exhibited their maximum global power peaks within the time range of the P50m wave, which was selected for dipole fitting. The negative maximum of the calculated dipole moment provided a measure of N1m. Both experiments revealed that the N1m amplitude increase depended on spectral formant distance (sign test: 0 in 12 and 0 in 13, P < 0.01 for each condition). As concerns F2, the N1m measure increased within a range of about two Bark but flattened out significantly for larger frequency changes (smaller linear trend: 1 in 12 and 2 in 13, P < 0.02). Remarkably, the second experiment disclosed a different behavior for F1, namely a continuous increase of N1m amplitude even beyond the range of two Bark (2 in 13, P < 0.02; Fig. 2b). Taken together, variation of the second formant revealed a non-linear relationship between N1m increase and spectral distance.
Fig. 2. (A) Upper panel: Time course of the calculated dipole moments in a representative subject (solid line = left hemisphere, dashed line = right hemisphere). The vertical line in the small triangle indicates the N1m peak elicited 100 ms after stimulus onset. Note that the /y/-like deviants (green line) elicited a much larger N1m amplitude than the averaged response to the frequent /u/ (blue line; last stimulus prior to the deviant). (B) Lower panel: N1m amplitude increase plotted against formant distance. Robust group average of robust intra-individual regression estimators for the left (solid line) and right hemisphere (dashed line) across changes of the first formant (F1 down; blue) and of the down-shifted (F2 down; green) and up-shifted second formant (F2 up; magenta).
The filter bank model assumes that the basilar membrane of the inner ear acts as an array of overlapping bandpass filters on an incoming auditory signal ([11], pp. 161–205). Consequently, the simultaneous masking of a tone by noise is restricted to the area of filter overlap, i.e. to a limited frequency range termed the critical band (CB). On the Bark scale, CB masking extends across two units. Animal experiments have revealed similar characteristics of spectral encoding at the level of the auditory cortex ([11], pp. 75–121). On these grounds, it is quite conceivable that a deviant vowel whose center frequency lies within two Bark of the standard undergoes forward masking, in the sense that the response to a stimulus is attenuated by the preceding ones. Furthermore, the degree of masking should depend on the spectral distance between rare and frequent events. Admittedly, forward masking effects reported in the literature extend across just a few tens of milliseconds ([11], pp. 194–199). These data, however, stem from psychoacoustic experiments and do not exclude prolonged forward masking or adaptation under conditions of repetitive stimulus presentation.
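The distance measure and the critical-band argument can be made concrete with a short Python sketch. It implements the modified Bark formula quoted in the methods, assuming f is given in kHz as in the usual form of this formula, and applies it to the Experiment 1 F2 series (650 Hz raised in 9% steps; the computed frequencies may differ by a few Hz from the rounded values in the text). The adaptation model is a deliberately crude, hypothetical illustration: rectangular filters two units wide on the z scale, so that the unadapted fraction of the deviant's filter grows linearly with distance and saturates at two units. It reproduces the F2 plateau but, by construction, not the continued F1 increase reported above. Finally, a sign test on the reported counts is computed one-sided, which is an assumption since the text does not state the sidedness.

```python
import math


def bark(f_khz):
    """Modified Bark scale from the text: z = 21.4 * log10(4.37 * f + 1), f assumed in kHz."""
    return 21.4 * math.log10(4.37 * f_khz + 1.0)


def toy_release_from_adaptation(delta_z, band_width=2.0):
    """Hypothetical toy model: rectangular filters `band_width` units wide on the z scale.

    The standard adapts the part of the deviant's filter it overlaps; the
    unadapted fraction grows linearly with distance and saturates at 1.
    """
    overlap = max(0.0, band_width - abs(delta_z))
    return 1.0 - overlap / band_width


def sign_test_one_sided(n_against, n_total):
    """P(at most n_against of n_total signs disagree) under a fair coin."""
    return sum(math.comb(n_total, k) for k in range(n_against + 1)) / 2 ** n_total


# Experiment 1: standard /u/ with F2 = 650 Hz, eight deviants raised in 9% steps.
f2_standard = 0.650
f2_deviants = [f2_standard * 1.09 ** k for k in range(1, 9)]
distances = [bark(f) - bark(f2_standard) for f in f2_deviants]
releases = [toy_release_from_adaptation(d) for d in distances]
for f, d, r in zip(f2_deviants, distances, releases):
    print(f"F2 = {1000 * f:6.0f} Hz  distance = {d:4.2f}  toy release = {r:4.2f}")

# Sign-test counts quoted in the results: (subjects against the trend, group size).
for n_against, n_total in [(0, 12), (0, 13), (1, 12), (2, 13)]:
    print(n_against, n_total, round(sign_test_one_sided(n_against, n_total), 4))
```

In this toy series the distance passes two units between the third and fourth deviant, after which the modeled release no longer changes; a graded filter shape instead of a rectangular one would round off that corner without removing the plateau.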
It can thus be expected that adaptation processes in frequency-selective neuronal assemblies develop in parallel with spectral dissimilarity across a frequency range of about two Bark but reach a plateau outside the domain of CB overlap. Salient acoustic components of communication or orientation sounds often parallel specific functional characteristics of the central auditory system. For instance, in echolocating bats, the second harmonic of the complex sounds used for target detection (at about 61 kHz) represents the predominant acoustic cue. This spectral domain is over-represented at the level of the auditory cortex, and the respective neurons are characterized by very sharp frequency tuning [14]. In a similar vein, the greater significance of CBs in the N1 time domain for F2 as compared with F1 might be important for speech sound perception. Sussman and colleagues [15] suggested that stop consonants are cued by parameters of the second rather than the first formant ('locus equation theory'). The onset of F2 usually lies outside the two-Bark range of its steady-state component and, thus, might be unmasked and detected faster than F1 onset. In line with previous studies [3], the present investigation found the difference wave between responses to standard and deviant vowels within the N1m time domain. The
observed effects reflected CBs and, thus, point to a tonotopic organization of the neuronal substrate of this N1m enhancement. By contrast, the classical model of mismatch negativity assumes that rare stimuli elicit mismatch activity independently of the N1 deflection or its magnetic analogue [12]. However, recent studies suggest a more dynamic concept of deviance detection. As concerns frequency changes, adaptation and lateral inhibition might account for the mismatch response [10]. Thus, the combination of delay and inhibition of distinct N1 or N1m components can explain a significant part of the pre-attentive responses to deviant sounds [5]. The F2 shifts considered here extended across vowel category boundaries. Indeed, processing of phonetic categories has been reported to be bound to later MMNm components [17]. Nevertheless, it cannot be excluded that category boundaries within the vowel space (see Fig. 1) contribute to the observed plateau effects. The relationship between CBs and categorical perception of vowels remains to be studied.
This study was supported by the German Research Foundation (SFB 550/B1 and SPP-ZIZAS Ac 55/5-1). The authors thank Maike Borutta and Dirk Mooshammer for technical assistance and Nell Zink for reviewing the manuscript for style.
[1] Aaltonen, O., Niemi, P., Nyrke, T. and Tuhkanen, M., Event-related brain potentials and the perception of a phonetic continuum, Biol. Psychol., 24 (1987) 197–207.
[2] Csépe, V., Pantev, C., Hoke, M., Ross, B. and Hampson, S., Mismatch field to tone pairs: neuromagnetic evidence for temporal integration at the sensory level, Electroenceph. clin. Neurophysiol., 104 (1997) 1–9.
[3] Diesch, E. and Luce, T., Magnetic mismatch fields elicited by vowels and consonants, Exp. Brain Res., 116 (1997) 139–152.
[4] Hertrich, I. and Ackermann, H., A vowel synthesizer based on formant sinusoids modulated by fundamental frequency, J. Acoust. Soc. Am., 106 (1999) 2988–2990.
[5] Jääskeläinen, I.P., Ahveninen, J., Bonmassar, G., May, P., Ilmoniemi, R.J., Levanen, S., Lin, F.-H., Stufflebeam, J., Melcher, J., Dale, A.M., Tiitinen, H. and Belliveau, J.W., Differential post-stimulus inhibition of N1 activity underlies mismatch response generation at the human auditory cortex, Proceedings of the 10th Scientific Meeting of the International Society for Magnetic Resonance in Medicine, 10 (2002) 1471.
[6] Kohler, K.J., Einführung in die Phonetik des Deutschen, Schmidt Verlag, Berlin, 1995.
[7] Mathiak, K., Hertrich, I., Lutzenberger, W. and Ackermann, H., Pre-attentive processing of consonant-vowel syllables at the supratemporal plane: a whole-head magnetoencephalography study, Brain Res. Cogn. Brain Res., 8 (1999) 251–257.
[8] Mathiak, K., Hertrich, I., Lutzenberger, W. and Ackermann, H., Encoding of temporal speech features (formant transients) during binaural and dichotic stimulus application: a whole-head magnetencephalography study, Brain Res. Cogn. Brain Res., 10 (2000) 125–131.
[9] Mathiak, K., Hertrich, I., Lutzenberger, W. and Ackermann, H., Neuronal correlates of duplex perception: a whole-head magnetencephalography study, NeuroReport, 12 (2001) 501–506.
[10] May, P., Tiitinen, H., Ilmoniemi, R.J., Nyman, G., Taylor, J.G. and Naatanen, R., Frequency change detection in human auditory cortex, J. Comput. Neurosci., 6 (1999) 99–120.
[11] Moore, B.C.J., Hearing, Academic Press, San Diego, 1995.
[12] Näätänen, R., Attention and Brain Function, Erlbaum, Mahwah, NJ, 1992.
[13] Poeppel, D., Phillips, C., Yellin, E., Rowley, H.A., Roberts, T.P. and Marantz, A., Processing of vowels in supratemporal auditory cortex, Neurosci. Lett., 221 (1997) 145–148.
[14] Suga, N. and Manabe, T., Neural basis of amplitude-spectrum representation in auditory cortex of the mustached bat, J. Neurophysiol., 47 (1982) 225–255.
[15] Sussman, H.M., Fruchter, D., Hilbert, J. and Sirosh, J., Linear correlates in the speech signal: the orderly output constraint, Behav. Brain Sci., 21 (1999) 241–259.
[16] Titova, N. and Näätänen, R., Preattentive voice discrimination by the human brain as indexed by the mismatch negativity, Neurosci. Lett., 308 (2001) 63–65.
[17] Winkler, I., Lehtokoski, A., Alku, P., Vainio, M., Czigler, I., Csépe, V., Aaltonen, O., Raimo, I., Alho, K., Lang, H., Iivonen, A. and Näätänen, R., Pre-attentive detection of vowel contrasts utilizes both phonetic and auditory memory representations, Brain Res. Cogn. Brain Res., 7 (1999) 357–369.