Clinical Neurophysiology 123 (2012) 2273–2280
Visual distance cues modulate neuromagnetic auditory N1m responses

Christian F. Altmann a,b,*, Masao Matsuhashi a, Mikhail Votinov a,b, Kazuhiro Goto a,b, Tatsuya Mima a, Hidenao Fukuyama a

a Human Brain Research Center, Graduate School of Medicine, Kyoto University, Kyoto 606-8507, Japan
b Career-Path Promotion Unit for Young Life Scientists, Kyoto University, Kyoto 606-8501, Japan
Article info

Article history: Accepted 2 April 2012. Available online 16 May 2012.

Keywords: Auditory loudness perception; Ventriloquist illusion; Magnetoencephalography; Distance perception; Loudness constancy
Highlights

- Auditory distance perception can rely on sound intensity as a cue and can be additionally modulated by vision.
- We found that the N1m MEG component generated in auditory cortex was enhanced when a sound was paired with a distant compared to a close visual cue.
- This result suggests an audio–visual interaction at an early stage in the auditory cortex, possibly related to cue integration for auditory distance processing.
Abstract

Objective: Auditory distance judgment relies on several acoustic cues and can be modulated by visual information. Sound intensity serves as one such cue, as it decreases with increasing distance. In this magnetoencephalography (MEG) experiment, we tested whether N1m MEG responses, previously described to scale with sound intensity, are modulated by visual distance cues.

Methods: We recorded behavioral and MEG data from 15 healthy, normal-hearing participants. Noise bursts at different sound pressure levels were paired with synchronous visual cues at different distances. We hypothesized that noise paired with far visual cues would be perceived as louder and result in increased N1m amplitudes compared to a pairing with close visual cues. This might be based on a compensation for visually induced distance when processing loudness.

Results: Psychophysically, we observed no significant modulation of loudness judgments by visual cues. However, N1m MEG responses at about 100 ms after stimulus onset were significantly stronger with distal compared to proximal visual cues in the left auditory cortex.

Conclusions: Our results suggest an audio–visual interaction at an early stage in the left auditory cortex, possibly related to cue integration for auditory distance processing.

Significance: Sound distance processing could prove a promising model system for the investigation of intra-modal and cross-modal integration principles.

© 2012 International Federation of Clinical Neurophysiology. Published by Elsevier Ireland Ltd. All rights reserved.
1. Introduction

The perception of auditory distance has so far often been neglected in neuroscience in favor of other elements of spatial localization, in particular judgment of sound source azimuth and elevation. However, in the absence of vision, auditory cues can provide important information about the distance of an object and help to react adequately to looming dangers. For example, Voss et al. (2004) have demonstrated that both early- and late-onset blind individuals outperformed sighted individuals in discriminating sound source distances at about 3–4 m from the listener. This result suggests that visual deficits can be compensated by audition even at later stages in life and even at distances that are beyond reaching distance. Moreover, separation in distance of multiple speech sources can improve intelligibility by spatial unmasking (Brungart and Simpson, 2002). For horizontal separation, this effect has been shown to underlie advantages for children that have been simultaneously rather than sequentially implanted with bilateral cochlear implants (Chadha et al., 2011). When testing in noisy, reverberant conditions, speech source distance has been demonstrated to affect sentence recognition by simulated cochlear implant users (Whitmal and Poissant, 2009).

* Corresponding author at: Career-Path Promotion Unit for Young Life Scientists, Kyoto University, Yoshida Konoe-cho, Sakyo-ku, Kyoto 606-8501, Japan. Tel.: +81 75 753 9295; fax: +81 75 753 9281. E-mail address: [email protected] (C.F. Altmann).
1388-2457/$36.00 © 2012 International Federation of Clinical Neurophysiology. Published by Elsevier Ireland Ltd. All rights reserved.
http://dx.doi.org/10.1016/j.clinph.2012.04.004
The cues that are used for ranging auditory objects are manifold (Mershon and King, 1975; Zahorik, 2002): two major acoustic cues are sound intensity and the direct-to-reverberant energy ratio. Sound intensity is related to distance following an inverse-square law when the sound source is point-like under anechoic or free-field conditions. That means that when the distance to a source doubles, sound intensity is reduced to one fourth, i.e., by about 6 dB. However, to take advantage of this cue, the listener has to store an internal model of the typical sound source power of an object, compare it with the sound intensity reaching the ears, and infer the distance to the object. To build up an internal model of typical sound power, the listener needs familiarity with the sound. Indeed, previous psychophysical studies have shown improved sound-ranging capabilities for familiar sounds, in particular speech (Coleman, 1962; McGregor et al., 1985; Philbeck and Mershon, 2002). However, in the absence of familiarity it is still possible to range an object using reverberation cues when available. Specifically, the direct-to-reverberant ratio decreases with increasing distance, because the direct sound is attenuated according to the inverse-square law, whereas the reflected sound energy can be approximated by a diffuse sound field (Zahorik, 2002). Interestingly, when reverberation cues are present, a phenomenon termed loudness constancy can be observed (Zahorik and Wightman, 2001): when subjects are instructed to estimate the perceived loudness of a sound at its source positioned at different distances, the reported loudness levels are constant across distance, even though the sound intensity at the ears depends on the source distance.
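The inverse-square relationship described above can be illustrated numerically. The following minimal Python sketch (not part of the original study; function name is ours) computes the level drop for a point source under free-field conditions:

```python
import math

def spl_drop_db(d_near: float, d_far: float) -> float:
    """Level drop in dB when moving from d_near to d_far away from a
    point source under anechoic/free-field conditions. Intensity
    follows 1/d^2, so the drop is 10*log10((d_far/d_near)^2),
    i.e. 20*log10(d_far/d_near)."""
    return 20.0 * math.log10(d_far / d_near)

# Doubling the distance reduces intensity to one fourth,
# a drop of about 6 dB:
print(round(spl_drop_db(1.0, 2.0), 2))  # prints 6.02
```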
Thus, the subjects possibly take into account the source distance to estimate the sound power at the source, but given that they misjudged the distances in a separate psychophysical experiment, it has been argued that they might rely on the reverberation energy for their loudness judgments. Alternatively, distance cues may be taken into account for loudness constancy, but they cannot be reproduced accurately when subjects are asked to estimate distance. To clarify how loudness constancy is achieved by our brain, one could construct a situation in which a distance cue leads to modulation of loudness perception or loudness-related neural markers. A promising approach could be to determine object distance by visual cues. For sound localization in the horizontal plane, the ventriloquist illusion describes the capture of sound position by visual cues (Bertelson and de Gelder, 2003). This case of audio–visual integration has been shown to occur at a relatively early stage, within 200 ms after stimulus onset, since it is reflected in the mismatch negativity (Colin et al., 2002; Stekelenburg et al., 2004). A strong form of visual capture of auditory source location in depth has been described in anechoic environments (Gardner, 1968): when subjects have to judge the sound source from an array of loudspeakers directly in front of them at different distances, they believe the sound originates from the closest speaker even though the farthest speaker is active. Moreover, when subjects are visually presented with a dummy loudspeaker and are instructed to judge the loudness of fixed noise sounds emitted from a hidden source, apparent loudness increases with the distance of the dummy loudspeaker (Mershon et al., 1981). In this experiment, we aimed to test whether visual distance cues can affect loudness processing. Possibly, the distance of an object determined by a visual cue "capturing" the sound is taken into account when loudness levels are to be judged.
Thus, a distant sound would appear louder than a sound with the same actual sound pressure level but appearing closer to the subject, similar to the experiments by Mershon et al. (1981). Furthermore, we were interested in whether mass neural activity in the auditory cortex reflects such an integration of visual distance and auditory loudness information. A possible candidate component is the N1, which has been reported to depend on stimulus intensity by previous
EEG and MEG studies. More specifically, early EEG investigations have shown a linear amplitude increase of the N1–P2 complex when sound intensity increases (Beagley and Knight, 1967; Kaskey et al., 1980; Rapin et al., 1966; Hegerl et al., 1994). A more recent study that compared EEG with MEG recordings observed a linear increase of the EEG N1 amplitude with intensity, but a saturation above 80 dB sound pressure level for the MEG N1m (Neukirch et al., 2002). The linear increase of the N1m amplitude with sound pressure has been replicated in the intensity range of 30–70 dB in a recent MEG study, in particular for pure tones with lower frequencies (<2000 Hz; Soeta and Nakagawa, 2009). The loudness dependence of the EEG N1 component has been shown to correlate with the extent (number of significantly activated voxels) of fMRI activation in a combined EEG/fMRI experiment (Mulert et al., 2005). We used magnetoencephalography (MEG), which has the advantage of high temporal resolution, to measure the evoked magnetic field and to test whether the N1m, the magnetic counterpart of the N1 potential, is modulated by visual distance cues. We hypothesized that the N1m amplitude would depend on sound intensity and that a visually more distant object would additionally increase the N1m amplitude.
2. Materials and methods

2.1. Subjects

Sixteen healthy right-handed subjects participated in the MEG and psychophysical experiments. All subjects had normal hearing abilities and no history of otological, neurological or psychiatric disease, as indicated by self-report. One participant was excluded from further analysis due to increased MEG system noise levels on that particular day. Thus, data from 15 subjects (11 male) were analyzed, whose average age was 26.6 (range: 20–39) years. Each subject gave written informed consent to participate in the study. The experiments were performed in accordance with the ethical standards laid down in the Declaration of Helsinki of 1964 and the guidelines approved by the local ethics committee of the Graduate School of Medicine and Faculty of Medicine, Kyoto University.
2.2. Stimuli

The experiment consisted of three parts: first, binaural sensation levels were measured for the employed sound stimuli; second, the actual MEG experiment was conducted; and third, a psychophysical loudness estimation experiment was performed.

The auditory stimuli consisted of 50 ms of bandpass-filtered (250–4000 Hz) noise, sampled at 44.1 kHz and shaped by rising and falling 2.5 ms ramps. To avoid conflict between auditory and visual distance cues, the sounds did not contain any reverberation. Sounds were presented binaurally via air-conducting tubes with insert earphones (E-A-R-tone 3A, Aearo Corporation, Indianapolis, USA). As shown in Fig. 1a, during the psychophysical loudness judgment, the noise was presented at five different sound pressure levels, namely 42, 48, 54, 60, and 66 dB above the subjects' individual sensation level (dB SL). In the MEG experiment, a subset of these levels was presented, namely 48, 54, and 60 dB SL. The noise sounds were paired with visual distance cues consisting of red lights located 60, 120 and 240 cm from the listeners' ear-to-ear axis. The red-light stimuli (peak wavelength: 660 nm) were generated by three TOTX173 (Toshiba Corporation, Tokyo, Japan) fiber-optic transmitting modules outside the MEG shielded room and guided into the shielded room with fiber-optic cables of equal length. The visible diameter of the glass fiber was 1 mm.
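A stimulus of this kind can be approximated in a few lines of numpy. This is an illustrative sketch only, not the authors' stimulus code; the FFT-domain bandpass and the raised-cosine ramp shape are assumptions about implementation details not given in the text:

```python
import numpy as np

def make_noise_burst(fs=44100, dur=0.050, band=(250.0, 4000.0),
                     ramp=0.0025, seed=0):
    """50 ms of noise at 44.1 kHz, bandpass-filtered to 250-4000 Hz,
    with 2.5 ms rising and falling ramps (raised cosine here)."""
    rng = np.random.default_rng(seed)
    n = int(round(fs * dur))                      # 2205 samples
    x = rng.standard_normal(n)
    # crude FFT-domain bandpass: zero all bins outside the band
    spec = np.fft.rfft(x)
    freqs = np.fft.rfftfreq(n, 1.0 / fs)
    spec[(freqs < band[0]) | (freqs > band[1])] = 0.0
    x = np.fft.irfft(spec, n)
    # raised-cosine onset/offset ramps
    nr = int(round(fs * ramp))                    # 110 samples
    env = np.ones(n)
    env[:nr] = 0.5 * (1.0 - np.cos(np.pi * np.arange(nr) / nr))
    env[-nr:] = env[:nr][::-1]
    return x * env

burst = make_noise_burst()
```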
Fig. 1. Experimental methods. (a) This sketch depicts the experimental setup. During the psychophysical experiment, noise at five different sound pressure levels (42, 48, 54, 60, 66 dB SL) was paired with a visual distance cue given by LEDs positioned 60, 120 or 240 cm from the subject's ear-to-ear line. During the MEG experiment, only a subset of sound stimuli (48, 54, 60 dB SL) was used and paired with the visual distance cues as indicated by the colored lines (blue, black, red) connecting sounds and visual distances. (b) The temporal trial structure is shown for an experimental trial and a catch trial, which afforded a button press. The gray bar indicates the occurrence of a noise burst; the red bars depict the occurrence of the visual distance cue. At the beginning of each trial, the visual distance cue was shown for 1 s. Then, the visual distance cue was turned off for 50 ms, and the 50 ms sound stimulus was presented during this period. At the end of the sound, the visual distance cue was turned on again until the next trial began and the location of the visual distance cue was changed. (c) Results for the psychophysical loudness estimation plotted in arbitrary units according to the free-modulus magnitude estimation procedure. Error bars depict standard error of the mean.
2.3. Procedure

During all parts of the experiment, subjects were seated in the darkened MEG shielded room. Prior to the MEG experiment, the subjects' binaural sensation level for the noise stimuli was evaluated. To this end, the 50% sensation threshold was tested adaptively with a weighted up-down method described previously (Kaernbach, 1991). The step size started at 16 dB and was then halved at each reversal, that is, whenever a series of correct trials was interrupted by an incorrect trial or vice versa. The step size was halved until it reached 1 dB, after which it remained constant for ten further reversals. The sensation threshold was determined as the median of the last eight reversals.

The audio–visual MEG experiment consisted of five to six experimental runs with a duration of 6 min 40 s each. Nine subjects additionally underwent two blocks of a visual-only experiment for which the sounds were muted. Each run consisted of 150 sound presentations, and the inter-stimulus interval from onset to onset ranged randomly from 2.5 to 2.8 s. As depicted in Fig. 1b, each trial started with the visual distance cue, which was shown for 1 s. Then, the visual distance cue was turned off for 50 ms, in synchrony with the presentation of the sound stimulus. After that, the visual distance cue was turned on again until the next trial began and the location of the visual distance cue was changed. Thus, the onset of the sound was synchronized with the offset of the visual stimulus, and sound offset was synchronized with LED onset. The rationale for this approach was that an audiovisual onset–onset pairing could have led to sudden vergence eye movements phase-locked to stimulus onset, contaminating the MEG evoked fields with ocular artifacts. Furthermore, presentation of the visual cue well before the sound stimulus allows for a stable visual distance percept at the time when the sound is introduced.

During the MEG experiment, subjects were instructed to attend to the visual cues and to report visual targets, which occurred in catch trials only, by button presses. As a visual target, the LED light was turned off twice rather than only once. Catch trials occurred with a probability of 10% and were discarded from the MEG analysis.

Following the MEG experiment, subjects underwent a loudness estimation task on five different sound pressure levels paired with three different visual cue distances. Each combination of sound pressure level and visual distance was presented ten times, resulting in 150 judgments. To ensure that subjects viewed the visual stimuli, they were additionally instructed to report visual targets verbally in 20 additional catch trials. The loudness estimation followed a free-modulus magnitude estimation procedure (Stevens, 1975). Specifically, the subjects were instructed to verbally assign a number to the loudness of the different stimuli. The response range was chosen freely by the subjects.

To normalize the obtained estimation scores across subjects, a procedure similar to that described by Cho et al. (2002) was employed. In more detail, each single response was converted to its logarithm. Then, for each subject and condition, the arithmetic mean across the logarithmic loudness estimation values was calculated. From these values, the arithmetic mean across conditions was subtracted for each individual subject. Next, we averaged these values across subjects and added the grand mean across subjects and conditions to this value. Finally, we reversed the logarithm operation on the group mean values. The arithmetic mean of logarithmic judgment values corresponds to geometric averaging of the raw values, which has been argued to be preferable when subjects match numbers to stimuli (Stevens and Guirao, 1962).
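The normalization procedure just described can be summarized in code. The sketch below is an illustrative reconstruction (array layout and function name are our assumptions), operating on each subject's mean log estimate per condition:

```python
import numpy as np

def normalize_estimates(scores):
    """scores: array of shape (n_subjects, n_conditions) holding each
    subject's mean raw magnitude estimate per condition.
    1) take logarithms, 2) remove each subject's mean across
    conditions (free modulus), 3) average across subjects and
    re-anchor at the grand mean, 4) reverse the logarithm."""
    log_s = np.log10(scores)
    centered = log_s - log_s.mean(axis=1, keepdims=True)
    group = centered.mean(axis=0) + log_s.mean()
    # mean of logs = log of the geometric mean, so the back-transformed
    # values are geometric averages of the raw estimates
    return 10.0 ** group
```

Because the subject offsets cancel, two subjects whose estimates differ only by a multiplicative factor contribute identically to the group curve.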
2.4. MEG acquisition and data analysis

The auditory evoked magnetic fields were recorded with a 306-channel whole-head MEG system (Vectorview; Elekta Neuromag Oy, Finland), which contained 102 sensor triplets consisting of two planar gradiometers and one magnetometer each. The exact head position with respect to the sensors was determined by measuring signals from four indicator coils attached to the scalp. In addition, three head landmarks (nasion and bilateral preauricular points) and the subject's head shape were recorded with a spatial digitizer (Polhemus Inc., Colchester, USA) before the MEG experiment. These data served for coregistration with the individual structural MRI data obtained with a 0.2 T MR scanner (Signa Profile, GE HealthCare, Waukesha, USA).

The MEG data were recorded with a bandpass filter of 0.10–200 Hz and a sampling rate of 600 Hz. Horizontal and vertical electrooculograms were recorded simultaneously. Epochs in which the electrooculogram peak-to-peak amplitude exceeded 150 µV were excluded from averaging to avoid artifacts from blinks and excessive eye movements. The analysis period was 600 ms, including a prestimulus baseline of 100 ms. The acquired data were low-pass filtered with a cut-off frequency of 40 Hz. After artifact rejection, on average about 114 valid trials (range 77–151) per condition per subject remained for event-related averaging.

We analyzed the evoked magnetic fields at the sensor level by calculating the vector sum magnitudes of the planar gradiometer pairs. Vector sum magnitudes were computed by squaring and summing the MEG signals of each gradiometer pair and then calculating the square root of this sum (Bonte et al., 2006; Altmann et al., 2008). For each hemisphere, we averaged the vector sum magnitude across 23 gradiometer pairs covering the left and the right temporal lobe, respectively (for a similar approach see, for example, Tarkiainen et al., 2003; Altmann et al., 2008).
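The vector sum computation is a sample-wise Euclidean norm of the two orthogonal gradient signals of a pair. A minimal numpy sketch (function and argument names are illustrative, not from the study's analysis code):

```python
import numpy as np

def vector_sum(grad_a, grad_b):
    """Vector sum magnitude of one planar gradiometer pair:
    square the two orthogonal gradient signals, sum them, and
    take the square root, sample by sample."""
    return np.sqrt(grad_a ** 2 + grad_b ** 2)

def hemisphere_vsm(pairs):
    """Average the vector sum magnitude across the 23 temporal-lobe
    gradiometer pairs of one hemisphere.
    pairs: array of shape (n_pairs, 2, n_samples)."""
    return vector_sum(pairs[:, 0], pairs[:, 1]).mean(axis=0)
```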
The peak amplitude and latency of the N1m component were determined for each subject and hemisphere by finding the maximal vector sum magnitude within a 60-ms window centered on 100 ms after stimulus onset. To statistically test the influence of sound pressure level and visual distance cues, we restricted the analysis to four conditions (48 dB – 60 cm; 60 dB – 60 cm; 48 dB – 240 cm; 60 dB – 240 cm) and excluded the 54 dB – 120 cm condition. We compared the N1m peak amplitudes and latencies for these four conditions by means of a two-way repeated-measures ANOVA with sound pressure level (48 and 60 dB) and visual distance cue (60 and 240 cm) as factors.

To characterize the cortical generators of the MEG responses, dipole source estimation was conducted with BESA 5.2 software (MEGIS Software, Gräfelfing, Germany). Two dipole sources, located in the bilateral superior temporal lobes, were used to model the evoked magnetic field for each individual subject and condition. For eight subjects, we modeled the bilateral MEG responses with two non-symmetrical source dipoles. However, seven subjects exhibited weak N1m responses in one of the hemispheres, which would have led to implausible dipole positions outside the auditory cortex; for these subjects we therefore modeled the MEG responses with a set of symmetrical dipole locations. Dipole source locations and orientations were estimated within the 20-ms time window prior to the maximal global field power of the N1m response, which peaked at around 100 ms after sound onset. The source locations were transformed into the standard brain coordinate space described by Talairach and Tournoux (1988). To compare dipole moments across conditions, source waveforms were calculated for each subject and condition. Similar to the sensor-level analysis, we determined the N1m peak amplitude for each condition and each individual subject as the MEG peak response within a time range of 70–130 ms after stimulus onset.
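The windowed peak search described above (maximal vector sum magnitude within a 60-ms window centered on 100 ms) can be sketched as follows; names and the return convention are illustrative:

```python
import numpy as np

def n1m_peak(vsm, times, center=0.100, halfwidth=0.030):
    """Return (peak amplitude, peak latency) of the vector sum
    magnitude `vsm` within a window of +/- `halfwidth` seconds
    around `center`. `times` holds the sample times in seconds."""
    mask = (times >= center - halfwidth) & (times <= center + halfwidth)
    idx = np.flatnonzero(mask)
    peak = idx[np.argmax(vsm[idx])]
    return vsm[peak], times[peak]
```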
The N1m peak dipole moments and latencies were statistically tested with a repeated-measures ANOVA with factors sound pressure level (48 and 60 dB) and visual
distance cue (60 and 240 cm). To test for interactions between the left and right hemispheres, we additionally conducted a three-way repeated-measures ANOVA with factors hemisphere (left and right), sound pressure level (48 and 60 dB SL) and visual distance cue (60 and 240 cm). For all statistical tests, the significance threshold was set to α = 0.05.

3. Results

3.1. Behavioral performance

Behavioral data were collected in two parts of the experiment: first, in a psychophysical experiment in which the subjects were instructed to estimate sound loudness; second, during the MEG experiment, in which the subjects were instructed to respond to visual targets in the catch trials with a button press.

As shown in Fig. 1c, the estimated loudness increased with sound pressure level. A repeated-measures ANOVA with factors sound pressure level (42, 48, 54, 60, and 66 dB SL) and visual distance cue (60, 120, and 240 cm) showed a significant effect of sound pressure level (F[4,56] = 48.37; p < 0.001), but no significant influence of the visual distance cue (F[2,28] < 1; p = 0.44) and no significant interaction (F[8,112] < 1; p = 0.76). Thus, the listeners reported increased perceptual loudness with increasing sound pressure level without an influence of visual distance.

Subjects performed the visual target detection task during the MEG experiment at an average hit rate of 65% (48 dB SL – 60 cm: 71%; 48 dB SL – 240 cm: 59%; 60 dB SL – 60 cm: 68%; 60 dB SL – 240 cm: 62%). A repeated-measures ANOVA on the hit rates with factors sound pressure level (48 and 60 dB SL) and visual distance cue (60 and 240 cm) showed no significant effect of sound pressure level (F[1,14] < 1; p = 0.99), but a significant influence of the visual target distance (F[1,14] = 5.93; p < 0.05). In particular, subjects' visual detection was significantly lower for the far compared to the close visual targets.
There was no significant interaction between the factors sound pressure level and visual distance (F[1,14] < 1; p = 0.16).

3.2. Sensor level MEG results

To investigate the effect of visual distance cues on auditory evoked responses, we determined the vector sum magnitudes averaged across left and right hemisphere gradiometer sensor pairs, depicted in Fig. 2. In a repeated-measures ANOVA with factors sound pressure level (48 and 60 dB SL) and visual distance cue (60 and 240 cm), the N1m peak amplitudes for these sensor pairs showed a significant main effect of sound pressure in the right (F[1,14] = 6.03; p < 0.05) but not in the left hemisphere (F[1,14] < 1; p = 0.43). The main effect of the visual distance cue was not significant in the right hemisphere (F[1,14] = 4.05; p = 0.06) but was significant in the left hemisphere (F[1,14] = 10.2; p < 0.01). The interaction between the factors sound pressure level and visual distance cue was not significant in either hemisphere (right hemisphere: F[1,14] < 1; p = 0.56; left hemisphere: F[1,14] < 1; p = 0.39).

To assess the unimodal influence of visual distance cues on neuromagnetic responses in the temporal lobe, nine subjects underwent a visual-only control experiment. Results of this control experiment at the sensor level and the comparison with the audiovisual conditions are shown in Supplementary Fig. S1. At the time of the N1m component, the evoked magnetic fields for the far compared to close visual cues were stronger only in the audiovisual condition in the left hemisphere (p < 0.001, Supplementary Fig. S1a), but not in the visual-only condition (Supplementary Fig. S1b). The magnetic fields evoked across occipital sensors also showed no significant differences between far and close visual cues in the visual-only condition (Supplementary Fig. S1c). Furthermore, to depict
differences between male and female subjects, we present the evoked magnetic fields in the audiovisual conditions separated by sex in Supplementary Fig. S2.

The N1m component peaked at 105 ms after stimulus onset in the right and at 108 ms in the left hemisphere. There was no significant latency difference between conditions (p > 0.3 for the left and right hemispheres, apart from a trend for the factor sound pressure in the right hemisphere: F[1,14] = 3.42; p = 0.09). Thus, N1m responses in the right hemisphere increased with increasing sound pressure level. More interestingly, we observed significantly stronger N1m responses in the left hemisphere when a sound was paired with a far compared to a close visual cue.

Fig. 2. MEG sensor level analysis. (a) Group-averaged (n = 15) time courses of the vector sum magnitudes averaged across left and right temporal sensors, respectively. The insets in the left and right graphs show the positions of the gradiometer sensor pairs for which the vector sum magnitudes were averaged. (b) Topography of the N1m peak responses for the gradiometer sensor pairs. A: anterior; R: right; L: left. (c) Peak N1m vector sum magnitudes averaged across all subjects. Error bars depict standard error of the mean. For the error bars to depict within- rather than between-subject variability, we calculated the standard error of the mean based on the peak N1m amplitudes adjusted for subject effects. This adjustment was achieved by subtracting the mean across conditions within a subject from the peak N1m amplitude value for each condition within a subject and adding the grand mean across all conditions and subjects.
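The within-subject error-bar adjustment described in the Fig. 2 caption (subtract each subject's mean across conditions, add back the grand mean) can be written compactly. This is an illustrative sketch, not the authors' analysis code:

```python
import numpy as np

def within_subject_adjust(amps):
    """amps: (n_subjects, n_conditions) peak N1m amplitudes.
    Remove between-subject offsets while preserving the grand mean,
    so that variability across subjects reflects condition effects."""
    return amps - amps.mean(axis=1, keepdims=True) + amps.mean()

def within_subject_sem(amps):
    """Standard error of the mean per condition, computed on the
    subject-adjusted amplitudes."""
    adj = within_subject_adjust(amps)
    return adj.std(axis=0, ddof=1) / np.sqrt(adj.shape[0])
```

If every subject shows the same condition difference plus an individual offset, the adjusted standard error collapses to zero, which is the intended behavior of within-subject error bars.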
3.3. Dipole sources

To characterize the cortical generators of the N1m component, we modeled the evoked magnetic field with bilateral dipole sources (see Fig. 3) and transformed the locations to the coordinate system defined by Talairach and Tournoux (1988). The positions of these dipoles, latencies and amplitudes are shown in Table 1. Averaged across conditions, the dipoles were located in the superior temporal lobe at coordinates of [x, y, z] = [48, −19, 12] mm ± [5, 8, 5] SD in the right and [x, y, z] = [−50, −23, 11] mm ± [6, 8, 5] SD in the left hemisphere. The variance explained by the two-dipole model within the fitted N1m interval was on average 79.1%.
Fig. 3. Dipole source analysis. (a) In this graph we superimpose the group-averaged N1m dipole source locations for the different conditions onto the normalized structural MR image of a representative subject. The inner circles mark the mean dipole source location for each condition, the outer ellipses depict ± one standard deviation across subjects for the x, y and z-coordinates. The z and y-coordinates below the MR images describe the slice position in terms of the Talairach and Tournoux (1988) coordinate system. (b) Peak N1m dipole source moments averaged across all subjects (n = 15). Error bars depict standard error of the mean corrected for between-subject variability as in Fig. 2c.
Table 1
Mean ± SD of peak latency, peak amplitude and source coordinates in the Talairach and Tournoux (1988) coordinate system of the estimated N1m sources for the different conditions and the left and right hemispheres.

Left hemisphere    Latency (ms)   Amplitude (nA m)   x (mm)    y (mm)    z (mm)
48 dB–60 cm        108 ± 14       21.9 ± 11.7        −52 ± 7   −24 ± 9   10 ± 6
48 dB–240 cm       106 ± 12       26.0 ± 14.0        −49 ± 7   −23 ± 8   10 ± 6
54 dB–120 cm       107 ± 13       25.2 ± 13.4        −50 ± 7   −22 ± 8   10 ± 5
60 dB–60 cm        102 ± 14       22.4 ± 13.4        −51 ± 6   −23 ± 9   11 ± 6
60 dB–240 cm       105 ± 15       26.9 ± 11.8        −49 ± 7   −22 ± 8   13 ± 6

Right hemisphere   Latency (ms)   Amplitude (nA m)   x (mm)    y (mm)    z (mm)
48 dB–60 cm        107 ± 18       25.7 ± 10.6        49 ± 7    −19 ± 9   12 ± 6
48 dB–240 cm       101 ± 10       28.6 ± 17.2        48 ± 6    −18 ± 8   12 ± 6
54 dB–120 cm       100 ± 12       30.2 ± 14.5        48 ± 6    −18 ± 7   11 ± 5
60 dB–60 cm        101 ± 15       27.2 ± 10.7        49 ± 6    −19 ± 9   12 ± 6
60 dB–240 cm       100 ± 12       31.3 ± 15.1        48 ± 6    −18 ± 9   13 ± 6
In a repeated-measures ANOVA with factors sound pressure level (48 and 60 dB SL) and visual distance cue (60 and 240 cm), the N1m peak amplitudes showed a significant main effect of sound pressure in the right (F[1,14] = 4.82; p < 0.05) but not in the left hemisphere (F[1,14] < 1; p = 0.70). The main effect of the visual distance cue was not significant in the right hemisphere (F[1,14] = 1.76; p = 0.20) but was significant in the left hemisphere (F[1,14] = 9.61; p < 0.01). The interaction between sound pressure and visual distance was not significant (right hemisphere: F[1,14] < 1; p = 0.56; left hemisphere: F[1,14] < 1; p = 0.77). For the peak latencies, we did not observe significant main effects of sound pressure (right hemisphere: F[1,14] = 1.61; p = 0.23; left hemisphere: F[1,14] = 2.77; p = 0.12) or visual distance cue (right hemisphere: F[1,14] = 2.88; p = 0.11; left hemisphere: F[1,14] < 1; p = 0.50). However, there was a significant interaction between sound pressure and visual distance cue in the right (F[1,14] = 5.06; p < 0.05) but not in the left hemisphere (F[1,14] = 2.23; p = 0.16). Additionally, we tested for interaction effects between hemispheres in a three-way repeated-measures ANOVA with factors hemisphere (left and right), sound pressure level (48 and 60 dB SL) and visual distance cue (60 and 240 cm). For the N1m peak source moments, we did not observe significant main effects of hemisphere (F[1,14] = 1.12; p = 0.31) or sound pressure level
(F[1,14] = 1.48; p = 0.24), but a marginally significant effect for the visual distance cue (F[1,14] = 4.16; p = 0.06). There was no significant interaction (hemisphere × sound pressure level: F[1,14] < 1; p = 0.44; hemisphere × visual distance cue: F[1,14] < 1; p = 0.64). For the peak latencies, the analysis did not reveal a significant main effect for hemisphere (F[1,14] = 1.12; p = 0.31), but a marginally significant effect for sound pressure level (F[1,14] = 4.20; p = 0.06), and no effect for the visual distance cue (F[1,14] = 1.43; p = 0.25). The interactions between hemisphere, sound pressure level and visual distance cue were not significant (hemisphere × sound pressure level: F[1,14] < 1; p = 0.99; hemisphere × visual distance cue: F[1,14] = 2.87; p = 0.11). In sum, similar to the sensor-level results, we found an increase of N1m source moments with sound pressure in the right and with visual distance in the left hemisphere.
4. Discussion

The aim of the present experiment was to test whether visual distance cues modulate auditory loudness processing in a manner consistent with distance compensation to achieve loudness constancy. We did not observe a modulation of psychophysical loudness judgments by visual distance, but N1m responses in the left auditory cortex were significantly stronger when a noise burst was paired with a far compared to a close visual stimulus. Possibly, the N1m response amplifications at about 100 ms after stimulus onset reflect a cortical compensatory process related to constant loudness perception across different distances.

If we assume that the observed N1m amplifications are related to distance compensation, the question arises as to which neurophysiological mechanisms give rise to this phenomenon and how it helps to localize and process distant auditory objects. In the visual domain, a previous functional magnetic resonance imaging (fMRI) experiment (Murray et al., 2006) manipulated the perceived angular size of an object by introducing depth cues. In that study, the area of V1 activated by the object was related to its perceived angular size, not its retinal extent. This indicates that primary visual cortex represents objects not only in terms of their retinal properties, but that retinal size and distance are integrated at this early stage of the visual processing hierarchy. Similarly, based on our results we propose that distance information is integrated with sound loudness at relatively early stages in auditory cortex. Recent evidence for direct anatomical connections between low-level visual and caudal auditory cortex in the macaque monkey (Falchier et al., 2010) points to a possible route for the integration of sound loudness and visual distance.
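For intuition on the magnitude of the compensation such a mechanism would have to perform: in the free field, the level of a point source falls by 20·log10(d/d_ref) dB, i.e. about 6 dB per doubling of distance, so the 60 cm and 240 cm cue distances used here would, taken at face value, correspond to roughly a 12 dB difference. A minimal illustrative sketch (the function name is ours, not from the study):

```python
import math

def free_field_level_drop_db(d_ref_cm, d_cm):
    """Inverse-square-law level drop in dB when moving a point source
    from distance d_ref_cm to d_cm in the free field: 20 * log10(d / d_ref)."""
    return 20.0 * math.log10(d_cm / d_ref_cm)

# The two visual-cue distances used in the study: two doublings, ~12 dB.
drop = free_field_level_drop_db(60, 240)
print(drop)
```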
Several different mechanisms have been proposed to underlie the increase of evoked potentials with increasing sound intensity (Schadow et al., 2007): an event-related potential may grow because more neurons respond to the sound, because interneural synchronization increases, or because responses become more tightly locked to stimulus onset. An increase in the number of responding neurons could be mediated by frequency-selective neurons whose bandwidth increases at higher sound intensities. Specifically, neurophysiological studies of cat primary auditory cortex have described neurons with wider response frequency ranges at higher intensities (Sutter, 2000). Alternatively, auditory cortical neurons with a monotonic intensity-firing rate relationship may contribute to the enhanced MEG response at higher sound intensities. Such monotonic and non-monotonic neurons have been investigated in previous studies in the cat and have been shown to form clusters within primary auditory cortex (Schreiner et al., 1992; Sutter and Schreiner, 1995). To clarify the neural mechanism that mediates the audiovisual interaction observed in this study, invasive recordings in animal models might offer valuable information. Graziano et al. (1999) recorded from multimodal neurons in the ventral premotor area of macaque monkeys and found sensitivity for close distances (<30 cm), suggesting preferential tuning for nearby space. However, this type of tuning may be related to the control of head and arm movements rather than to a representation of object distance in the near and far field (Moore and King, 1999). Psychophysical studies that provide a quantitative model of sound distance perception based on direct-to-reverberant sound energy and adaptation of the listener to sound and environmental properties (Bronkhorst and Houtgast, 1999) suggest that tuning to auditory distance in the mammalian brain may be possible (Moore and King, 1999). The clearest evidence for such tuning has been provided for echolocating bats, which possess neurons in the superior colliculus tuned to echo delay, which in turn is related to target distance (Valentine and Moss, 1997). It is unclear whether distance tuning is present in the non-echolocating mammalian brain, but the results of the present study may provide a starting point for investigating this question further.

In the current study, the effects of visual distance cues were significant in the left hemisphere and the effects of sound intensity in the right. Previous MEG research has suggested a special role for the right supratemporal plane in distance perception (Mathiak et al., 2003). At first glance this appears to contradict our results, but Mathiak et al. (2003) showed that the mismatch response to distance deviants defined by sound amplitude is right-lateralized. Similarly, in our study the amplitude of anechoic sounds modulated the N1m responses predominantly in the right hemisphere. Possibly, the neural circuits involved in distance perception are arranged bilaterally and show different lateralization depending on the distance cue (vision, amplitude).
So far, we have discussed the N1m amplitude modulation by visual distance cues mainly in the context of loudness perception. However, an alternative explanation is conceivable: previous functional imaging studies have shown that stimulation of one sensory modality can modulate activity in sensory cortices of another modality. Specifically, a human fMRI study presenting subjects with visual (checkerboard), auditory (noise burst) and audiovisual stimuli showed decreased BOLD responses in auditory cortex during visual stimulation and vice versa (Laurienti et al., 2002). Furthermore, an fMRI study that subdivided the auditory cortex into core and belt areas and stimulated acoustically (pure tones) and audiovisually (pure tones with preceding red LEDs) described decreased BOLD responses in core regions for audiovisual compared to auditory stimulation (Lehmann et al., 2006). During the present MEG experiment, the distal visual cues were weaker in intensity and smaller than the close targets, resulting behaviorally in lower hit rates for the far targets. Thus, the increased auditory N1m responses might be interpreted as a decreased suppressive influence from visual areas. However, in that case we would have expected larger evoked magnetic fields for the close compared to the far visual cues, for which we found no clear evidence (see Supplementary Fig. S1c). Nevertheless, further research is needed to disentangle visual stimulus intensity/size from perceived distance in order to clarify whether the audio–visual interaction observed in this study is based on distance processing or on suppression from visual areas.

An apparent problem with our findings is the divergence between the behavioral effects and the neuromagnetic responses with respect to visual modulation. If the N1m amplitude modulation by visual cues is related to loudness constancy, an accompanying behavioral effect would have been expected.
A possible explanation could be that cognitive factors counteracted the behavioral effects: in the present study, subjects were aware that the sound stimuli were not generated by the synchronously blinking LEDs, and they might have based their loudness judgments on this knowledge. However, the data from this study cannot clearly speak for or against this argument. Thus, to clarify whether the N1m amplitude increases are indeed related to distance perception, further experiments are needed that demonstrate these effects in the presence of a behavioral modulation by distance cues. For example, in a future experiment the visual distance cue could be replaced by a reverberation cue (see, for example, Alais and Carlile, 2005). A modulation of MEG responses according to loudness constancy, paired with a perceptual effect, would strengthen the notion that auditory cortical activity reflects sound distance perception.

5. Conclusions

In summary, when pairing noise sounds with different visual distance cues, we observed N1m amplitude increases for far compared to close distances. Our results suggest that visual distance cues affect sound processing at an early stage in the human auditory cortex. With its reliance on several auditory and non-auditory object properties, sound distance processing holds promise as a fruitful model system for the investigation of intra-modal and cross-modal integration principles.

Acknowledgements

This study was partly supported by the Special Coordination Fund for Promoting Science and Technology to C.F.A. from the Ministry of Education, Culture, Sports, Science and Technology (MEXT) of Japan, a Grant-in-Aid for Young Scientists (B) 23730701 to C.F.A. from the Japan Society for the Promotion of Science, the Strategic Research Program for Brain Sciences (SRPBS) to T.M. from MEXT of Japan, and a Grant-in-Aid for Scientific Research (C) 21613003 to T.M. from the Japan Society for the Promotion of Science.

Appendix A. Supplementary data

Supplementary data associated with this article can be found, in the online version, at http://dx.doi.org/10.1016/j.clinph.2012.04.004.

References

Alais D, Carlile S. Synchronizing to real events: subjective audiovisual alignment scales with perceived auditory depth and speed of sound. Proc Natl Acad Sci USA 2005;102:2244–7.
Altmann CF, Nakata H, Noguchi Y, Inui K, Hoshiyama M, Kaneoke Y, et al. Temporal dynamics of adaptation to natural sounds in the human auditory cortex. Cereb Cortex 2008;18:1350–60.
Beagley HA, Knight JJ. Changes in auditory evoked response with intensity. J Laryngol Otol 1967;81:861–73.
Bertelson P, de Gelder P. The psychology of multimodal perception. In: Spence C, Driver J, editors. Crossmodal space and crossmodal attention. Oxford: Oxford UP; 2003.
Bonte M, Parviainen T, Hytönen K, Salmelin R. Time course of top-down and bottom-up influences of syllable processing in the auditory cortex. Cereb Cortex 2006;16:115–23.
Bronkhorst AW, Houtgast T. Auditory distance perception in rooms. Nature 1999;397:517–20.
Brungart DS, Simpson BD. The effects of spatial separation in distance on the informational and energetic masking of a nearby speech signal. J Acoust Soc Am 2002;112:664–76.
Chadha NK, Papsin BC, Jiwani S, Gordon KA. Speech detection in noise and spatial unmasking in children with simultaneous versus sequential bilateral cochlear implants. Otol Neurotol 2011;32:1057–64.
Cho G, Kim C, Casali JG. Sensory evaluation of fabric touch by free modulus magnitude estimation. Fiber Polym 2002;3:169–73.
Coleman PD. Failure to localize the source distance of an unfamiliar sound. J Acoust Soc Am 1962;34:345–6.
Colin C, Radeau M, Soquet A, Dachy B, Deltenre P. Electrophysiology of spatial scene analysis: the mismatch negativity (MMN) is sensitive to the ventriloquism illusion. Clin Neurophysiol 2002;113:507–18.
Falchier A, Schroeder CE, Hackett TA, Lakatos P, Nascimento-Silva S, Ulbert I, et al. Projection from visual areas V2 and prostriata to caudal auditory cortex in the monkey. Cereb Cortex 2010;20:1529–38.
Gardner MB. Proximity image effect in sound localization. J Acoust Soc Am 1968;43:163.
Graziano MSA, Reiss LAJ, Gross CG. A neuronal representation of the location of nearby sounds. Nature 1999;397:428–30.
Hegerl U, Gallinat J, Mrowinski D. Intensity dependence of auditory evoked dipole source activity. Int J Psychophysiol 1994;17:1–13.
Kaernbach C. Simple adaptive testing with the weighted up–down method. Percept Psychophys 1991;49:227–9.
Kaskey GB, Salzman LF, Klorman R, Pass HL. Relationships between stimulus intensity and amplitude of visual and auditory event related potentials. Biol Psychol 1980;10:115–25.
Laurienti PJ, Burdette JH, Wallace MT, Yen Y, Field AS, Stein BE. Deactivation of sensory-specific cortex by cross-modal stimuli. J Cogn Neurosci 2002;14:420–9.
Lehmann C, Herdener M, Esposito F, Hubl D, di Salle F, Scheffler K, et al. Differential patterns of multisensory interactions in core and belt areas of human auditory cortex. Neuroimage 2006;31:294–300.
Mathiak K, Hertrich I, Kincses WE, Riecker A, Lutzenberger W, Ackermann H. The right supratemporal plane hears the distance of objects: neuromagnetic correlates of virtual reality. Neuroreport 2003;14:307–11.
McGregor P, Horn AG, Todd MA. Are familiar sounds ranged more accurately? Percept Mot Skills 1985;61:1082.
Mershon DH, King E. Intensity and reverberation as factors in the auditory perception of egocentric distance. Percept Psychophys 1975;18:409–15.
Mershon DH, Desaulniers DH, Kiefer SA, Amerson TL, Mills JT. Perceived loudness and visually-determined distance. Perception 1981;10:531–43.
Moore DR, King AJ. Auditory perception: the near and far of sound localization. Curr Biol 1999;9:361–3.
Mulert C, Jäger L, Propp S, Karch S, Störmann S, Pogarell O, et al. Sound level dependence of the primary auditory cortex: simultaneous measurement with 61-channel EEG and fMRI. Neuroimage 2005;28:49–58.
Murray SO, Boyaci H, Kersten D. The representation of perceived angular size in human primary visual cortex. Nat Neurosci 2006;9:429–34.
Neukirch M, Hegerl U, Kötitz R, Dorn H, Gallinat U, Herrmann WM. Comparison of the amplitude/intensity function of the auditory evoked N1m and N1 components. Neuropsychobiology 2002;45:41–8.
Philbeck JW, Mershon DH. Knowledge about typical source output influences perceived auditory distance. J Acoust Soc Am 2002;111:1980–3.
Rapin I, Schimmel H, Tourk LM, Krasnegor NA, Pollak C. Evoked responses to clicks and tones of varying intensity in waking adults. Electroencephalogr Clin Neurophysiol 1966;21:335–44.
Schadow J, Lenz D, Thaerig S, Busch NA, Fründ I, Herrmann CS. Stimulus intensity affects early sensory processing: sound intensity modulates auditory evoked gamma-band activity in human EEG. Int J Psychophysiol 2007;65:152–61.
Schreiner CE, Mendelson JR, Sutter ML. Functional topography of cat primary auditory cortex: representation of tone intensity. Exp Brain Res 1992;92:105–22.
Soeta Y, Nakagawa S. Sound level-dependent growth of N1m amplitude with low and high-frequency tones. Neuroreport 2009;20:548–52.
Stekelenburg JJ, Vroomen J, de Gelder B. Illusory sound shifts induced by the ventriloquist illusion evoke the mismatch negativity. Neurosci Lett 2004;357:163–6.
Stevens SS. Psychophysics: introduction to its perceptual, neural and social prospects. New York: Wiley; 1975.
Stevens SS, Guirao M. Loudness, reciprocality, and partition scales. J Acoust Soc Am 1962;34:1466–71.
Sutter ML, Schreiner CE. Topography of intensity tuning in cat primary auditory cortex: single-neuron versus multiple-neuron recordings. J Neurophysiol 1995;73:190–204.
Sutter ML. Shapes and level tolerances of frequency tuning curves in primary auditory cortex: quantitative measures and population codes. J Neurophysiol 2000;84:1012–25.
Talairach J, Tournoux P. Co-planar stereotaxic atlas of the human brain: 3-dimensional proportional system: an approach to cerebral imaging. New York: Thieme; 1988.
Tarkiainen A, Helenius P, Salmelin R. Category-specific occipitotemporal activation during face perception in dyslexic individuals: an MEG study. Neuroimage 2003;19:1194–204.
Valentine DE, Moss CF. Spatially selective auditory responses in the superior colliculus of the echolocating bat. J Neurosci 1997;17:1720–33.
Voss P, Lassonde M, Gougoux F, Fortin M, Guillemot JP, Lepore F. Early and late-onset blind individuals show supra-normal auditory abilities in far-space. Curr Biol 2004;14:1734–8.
Whitmal NA, Poissant SF. Effects of source-to-listener distance and masking on perception of cochlear implant processed speech in reverberant rooms. J Acoust Soc Am 2009;126:2556–69.
Zahorik P, Wightman FL. Loudness constancy with varying sound source distance. Nat Neurosci 2001;4:78–83.
Zahorik P. Assessing auditory distance perception using virtual acoustics. J Acoust Soc Am 2002;111:1832–46.