BRAIN RESEARCH 1220 (2008) 246–255
available at www.sciencedirect.com
www.elsevier.com/locate/brainres
Research Report
Peripheral and central aspects of auditory across-frequency processing
Stephan M.A. Ernst⁎, Jesko L. Verhey
AG Neurosensorik, Institut für Physik, Carl von Ossietzky Universität Oldenburg, 26111 Oldenburg, Germany
ARTICLE INFO
Article history:
Accepted 4 August 2007
Available online 14 August 2007
Keywords:
Auditory system
Across-frequency process
Peripheral processing
Suppression
Central auditory processing
Comodulation
ABSTRACT
Many natural sounds such as, e.g., speech show common level fluctuations across frequency. It is generally assumed that the auditory system uses this spectro-temporal information to group the frequency components into auditory objects although the exact physiological mechanism is still not fully understood. The aim of the present study is to disentangle the relative contribution of peripheral and central aspects of this across-frequency processing using psychophysical experiments and modelling. The study focuses on two different psychophysical phenomena which are thought to be related to the ability to compare information across frequency: comodulation masking release (CMR), i.e., a release from masking of a sinusoidal signal due to the addition of a comodulated off-frequency masker component to the masker component at the signal frequency, and comodulation detection difference (CDD), i.e., the reduced ability of the auditory system to detect a masked signal if masker and signal share the same envelope. The comparison between model predictions and experimental results indicates that a considerable amount of these effects can be accounted for by peripheral processing alone. This is confirmed by experimental results with confounding across-frequency information about the grouping of the different frequencies into auditory objects. © 2007 Elsevier B.V. All rights reserved.
1. Introduction
A common property of the auditory processing in mammals is the tuning of neural responses to specific frequencies in sounds which is found at all levels of the auditory system (e.g., Simmons et al., 1996). This tuning implies that information is processed within a limited frequency range only. However, the auditory system is also able to compare temporal information across a large frequency range. For example, neurons that integrate across several octaves are already found at the level of the cochlear nucleus, i.e., the first central processing site for acoustic information (Winter and Palmer, 1995; Jiang et al., 1996). This integration of information may help to differentiate between sounds of different sound sources in natural environments, since many natural sounds are characterised by events of short duration, i.e., they show contiguous level fluctuations in different frequency regions (Florentine et al., 1996; Nelken et al., 1999; Singh and Theunissen, 2003).
One psychoacoustical effect related to the ability of the auditory system to compare information across frequency is comodulation masking release (CMR). Comodulation masking release describes the reduced detrimental influence of a masker centred at the frequency of the sinusoidal signal when an additional off-frequency masker component with the same
⁎ Corresponding author. E-mail address: [email protected] (S.M.A. Ernst).
0006-8993/$ – see front matter © 2007 Elsevier B.V. All rights reserved. doi:10.1016/j.brainres.2007.08.013
level fluctuations as the on-frequency masker is presented simultaneously (Hall et al., 1984; Verhey et al., 2003, see also top panels of Fig. 1). Physiological correlates of CMR have been found at different levels of the auditory pathway. Nelken et al. (1999) showed that a signal added to amplitude-modulated noise can markedly reduce the response of neurons of the primary auditory cortex of cats to the level fluctuations of the masker. This phenomenon was referred to as “envelope locking suppression”. In a similar vein, Verhey et al. (1999) modelled psychoacoustical data of one type of CMR experiments using the reduction of the modulation depth due to the addition of the signal as an additional cue. Las et al. (2005) showed a larger “envelope locking suppression” in the cortical region A1 of the cat than on lower stages (inferior colliculus, IC, and medial geniculate body, MGB) of the auditory system suggesting a gradual segregation of signal from modulated noise along the auditory pathway. In contrast to this high-level process, Pressnitzer et al. (2001) proposed a physiological mechanism on an early stage of the auditory system. They argued that broadband inhibition on the level of the ventral part of the cochlear nucleus (VCN) may be the physiological mechanism underlying CMR. The role of wideband inhibition has also been investigated by Klump and colleagues in the auditory forebrain of the European starling (Nieder and Klump, 2001; Hofer and Klump, 2003). They failed to observe CMR in the population response when the off-frequency maskers were positioned in the inhibitory sidebands of the units. However, a subpopulation of single units or multiunits was able to show CMR of similar magnitude to the CMR measured behaviourally in the same species (Langemann and Klump, 2001). Neuert et al. (2004) measured responses of single
Fig. 1 – Schematic representation of the spectrogram for the reference (RF) and comodulated (CM) condition of CMR experiments (top panels) and the uncorrelated (UN) and CM condition of CDD experiments (bottom panels). The masker components of the stimulus are shown in grey and the signal in red.
units from the dorsal cochlear nucleus (DCN) of the guinea pig in which wideband inhibition is particularly pronounced. Their results were consistent with the hypothesis by Pressnitzer et al. (2001) that wideband inhibition plays a role in across-channel processing at an early stage of the auditory pathway.
The broadly tuned inhibitory cells of the cochlear nucleus show excitatory regions that extend over several octaves (Winter and Palmer, 1995; Jiang et al., 1996), suggesting that psychoacoustical CMR may also be observed over such large frequency ranges. In line with this hypothesis, Cohen (1991) and Ernst and Verhey (2005) measured CMR over a three-octave range. Recently, Ernst and Verhey (2006) and Verhey and Ernst (2007) showed a CMR that extended over an even larger frequency range. All psychoacoustical studies showing a CMR over a large frequency range have in common that they found a large CMR only for off-frequency maskers (flanking bands) at levels which were substantially higher than the level of the masker component centred at the signal frequency (signal-centred band). For the low-level signal-centred band and the frequency range considered in Ernst and Verhey (2006), the level and frequency range for a masking release due to comodulation is similar to that found in psychoacoustical suppression data. In psychoacoustical suppression experiments, a threshold reduction for a sinusoidal signal presented after the offset of the masker is observed when, in addition to a sinusoidal masker centred at the signal frequency, a second sinusoid with a different frequency and at a higher level is presented that suppresses the masker component at the signal frequency (Houtgast, 1974; Shannon, 1976; Yasin and Plack, 2007). A physiological correlate of this two-tone suppression experiment was observed at the level of the basilar membrane (e.g., Cooper, 1996; Geisler and Nuttall, 1997).
The response of the basilar membrane to a low-level sinusoid with a frequency equal to the characteristic frequency can be suppressed when a second tone with a different frequency and at a higher level is presented simultaneously (e.g., Ruggero et al., 1992). Ernst and Verhey (2006) showed that a model simulating psychoacoustical two-tone suppression data also predicts part of the CMR, suggesting that part of the CMR may already be accounted for by processes on the basilar membrane.
A psychoacoustical effect related to CMR is comodulation detection difference (CDD, cf. lower panels of Fig. 1). The term CDD refers to the effect that the ability to detect a noise band (signal band) in the presence of another noise band (masker band) is reduced when masker and signal are comodulated (Cohen and Schubert, 1987). A behavioural study in the hooded crow showed a similar CDD in birds as in humans (Jensen, 2007). It has been hypothesised that the same mechanisms underlie CDD and CMR (Cohen, 1991). Recent studies found physiological correlates of CDD at the level of the auditory cortex in the starling (Buschermöhle et al., 2006). Although this correlate of CDD has been found at a central level of the auditory system, Buschermöhle et al. (2006) showed that a model incorporating peripheral filtering and compression (characteristics of the basilar membrane) was able to predict the physiological data. Borrill and Moore (2002) also presented experimental results of psychoacoustical CDD experiments and simulations that supported an explanation based on properties of the basilar membrane. They offered an
explanation for CDD that was based on spread of excitation on the basilar membrane rather than on perceptual grouping at a more central level of the auditory system. In contrast, Hall et al. (2006) argued on the basis of psychoacoustical data with onset asynchrony between masker and signal that CDD is at least partly due to across-channel processes (see also section on peripheral aspects of CDD experiments below).
In summary, across-frequency processes have been proposed at the level of the cochlea as well as at higher levels of the auditory system. In the following, peripheral contributions to across-frequency processes are defined as cochlear processes. All other processes are based on neural interactions such as wideband inhibition, and will be referred to as processes of central stages of the auditory pathway. The aim of the present study is to shed some light on the relative contributions of these peripheral and central stages to two psychoacoustical across-frequency effects, CMR and CDD, using experimental data and model predictions. For the simulations, a model was used which accounts for physiological (Meddis et al., 2001) and psychoacoustical suppression data (Plack et al., 2002). Ernst and Verhey (2006) have shown that a slightly modified version of this suppression model is able to predict CMR over a four-octave range. They used a high-level off-frequency masker and a low-level on-frequency masker centred at 2 kHz. The first part of the present study investigates if CMR can also be predicted for other signal frequencies and if the filter characteristics of the model are realistic for such large spectral distances between signal and masker components. While the first part uses a model to quantify peripheral contributions to CMR, the second part of the present study uses an experimental paradigm proposed by Dau et al.
(2005) to differentiate between peripheral and central aspects of CMR for masker configurations comparable to those used in Ernst and Verhey (2006) and in the first part of the present study. The aim of the last experimental part of the present study is to investigate if the same model that was used to predict CMR can be applied to CDD.
2. Contribution of peripheral processes to comodulation masking release
The model used in Ernst and Verhey (2006) to simulate CMR is based on a model by Plack et al. (2002). Plack et al. (2002) showed that the model was able to predict psychoacoustical two-tone-suppression data. However, the model parameters used in Plack et al. (2002) were tuned to experimental data obtained for different signal frequencies and smaller frequency ranges than considered in the CMR study by Ernst and Verhey (2006). Thus, in the present study, CMR is measured for different signal frequencies and compared to model predictions. The CMR is determined by measuring thresholds for the comodulated (CM) condition, where the masker consists of the signal-centred band and a comodulated flanking band, and for the reference (RF) condition, where the signal is masked by the signal-centred band only. In order to investigate if the influence of the flanking band at the place of the signal frequency is modelled properly, thresholds of the signal in the presence of the flanking band alone are measured in addition to the threshold for the CM and the RF condition.
2.1. Suppression model
The model structure used in the present study is identical to the model described in Ernst and Verhey (2006). The first stage of the model is a combined outer and middle ear filter as used in Breebaart et al. (2001). A low-level noise is added to the output of the filter to simulate the threshold in quiet. The following two stages of the model, the dual-resonance-nonlinear (DRNL) filter (Meddis et al., 2001) and the temporal window, are essentially implemented in the same way as proposed in Plack et al. (2002). The DRNL filter is divided into a linear and a nonlinear pathway. The linear pathway consists of a gammatone filter followed by a low-pass filter. The nonlinear pathway consists of a gammatone filter, a compressive nonlinearity and a second gammatone filter. The nonlinear pathway has a gain relative to the linear pathway. The input is processed in parallel through both pathways and then added. In general, all DRNL parameters were taken from Plack et al. (2002). The output is squared and then passed through the temporal window. The window comprises three exponential functions, one to describe backward masking and two to describe forward masking. All parameters for the temporal window were taken from Oxenham (2001), which showed the best fit between his data and predictions with the same compression values for the nonlinearity as used by Plack et al. (2002). The decision variable is the quotient of the maximum amplitude of the whole temporal window output from the masker plus signal interval and the maximum amplitude of the two masker-only intervals. To determine a threshold the decision variable has to exceed the parameter k. The same procedure as in the experiment was used to determine the threshold with the model. The final threshold estimate was taken as the mean of 5 threshold estimates.
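The decision stage described above can be sketched in a few lines of Python. This is a minimal illustration only, not the authors' implementation: the function names are hypothetical, the inputs are assumed to be the temporal-window outputs of the three intervals of a trial, and taking the larger of the two masker-only maxima as the denominator is one reading of the description. The value of k is left as a free parameter because it is not stated in the text.

```python
import numpy as np

def decision_variable(signal_interval, masker_intervals):
    """Quotient of the peak temporal-window output in the masker-plus-signal
    interval and the peak output of the masker-only intervals.
    Assumption: the denominator is the larger of the two masker-only maxima."""
    peak_signal = np.max(signal_interval)
    peak_masker = max(np.max(m) for m in masker_intervals)
    return peak_signal / peak_masker

def signal_detected(signal_interval, masker_intervals, k):
    """The signal counts as detected when the decision variable exceeds k
    (k is a model parameter whose value is not given in the text)."""
    return bool(decision_variable(signal_interval, masker_intervals) > k)

# Toy temporal-window outputs (arbitrary units, purely illustrative).
signal_out = np.array([0.1, 0.9, 0.2])
masker_outs = [np.array([0.1, 0.3]), np.array([0.2, 0.45])]
print(decision_variable(signal_out, masker_outs))
```

In a full simulation, the same adaptive procedure as in the experiment would call such a decision function once per trial.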
2.2. Methods
Stimuli were generated digitally with a sampling rate of 44.1 kHz, converted to analogue signals (RME ADI-8 DS) and amplified (Tucker-Davis HB7). The stimuli were presented to both ears through Sennheiser HDA200 headphones with free-field equalisation (ISO 389-8, 2004). The frequency of the sinusoidal signal was either 1, 2, 4 or 8 kHz. The signal duration was 250 ms including 50-ms raised-cosine ramps and the signal was temporally centred in the masker. The masker duration was 500 ms including 50-ms raised-cosine ramps. Depending on the masking condition, the masker was composed of one or two 20-Hz wide noise bands. Each noise band was created in the time domain by multiplying a sinusoidal carrier on a sample-by-sample basis with a 10-Hz low-pass-filtered noise extending down to 2 Hz. For each stimulus presentation, new noise bands were computed. The signal was always in phase with the sinusoidal carrier of the masker.
The signal threshold was determined for three masking conditions. In the reference (RF) condition, the masker consisted of a single noise band centred at the signal frequency (signal-centred band). In the comodulated (CM) condition, the masker consisted of the signal-centred band and a flanking band with the same envelope, obtained by using the same low-pass filtered noise for signal-centred band and flanking band. The level for the signal-centred band was 33 dB SPL and the level of
the flanking band was 78 dB SPL. The centre frequency of the flanking band was always three octaves below the signal frequency. In the third masking condition, thresholds were measured for the signal in the presence of the flanking band alone (FB-only condition). In addition to the three masking conditions, thresholds in quiet were measured for sinusoidal signals at octave frequencies from 0.125 to 8.0 kHz. A three-alternative, forced-choice (3-AFC) procedure with adaptive signal-level adjustment was used to determine the threshold of the sinusoidal signal. Intervals in a trial were separated by gaps of 500 ms. The sinusoidal signal was added to one of these intervals which was randomly selected for each trial. Subjects had to indicate which of the intervals contained the signal. Visual feedback was provided after each response. Signal level was adjusted according to a two-down, one-up rule to estimate the 70.7% detection threshold (Levitt, 1971). The initial step size was 8 dB. After every second reversal, the step size was halved until a step size of 2 dB was reached. The run was then continued for another six reversals. From the level of these last six reversals, the mean was calculated and used as an estimate of the threshold. Four threshold estimates were collected for each condition. The final threshold value for that condition was taken as the mean of the four threshold estimates. Eight normal hearing subjects participated in the experiment varying in age from 22 to 27 years. All subjects had thresholds ≤15 dB HL (ISO 8253-1, 1989) at octave frequencies from 0.125 to 8.0 kHz. They had practice trials in CMR experiments before collecting the data.
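The adaptive track described above (two-down, one-up rule, step size halved after every second reversal from 8 dB down to 2 dB, threshold taken as the mean of the last six reversal levels) can be sketched as follows. This is a minimal illustration under stated assumptions: the function name and the `respond` callback are hypothetical, and the listener is replaced by a deterministic toy observer, so the 3-AFC guessing rate is not modelled.

```python
def two_down_one_up(respond, start_level, initial_step=8.0,
                    final_step=2.0, n_final_reversals=6):
    """Run one adaptive track. `respond(level)` returns True for a correct
    response. Returns the mean level at the last `n_final_reversals`
    reversals, all of which occur at the final step size."""
    level, step = start_level, initial_step
    correct_in_row = 0
    direction = 0              # -1: track moving down, +1: moving up, 0: no move yet
    reversals_at_step = 0      # reversals counted at the current (large) step size
    final_reversal_levels = []
    while len(final_reversal_levels) < n_final_reversals:
        if respond(level):
            correct_in_row += 1
            if correct_in_row < 2:
                continue       # a second correct response is needed before moving down
            correct_in_row = 0
            new_direction = -1
        else:
            correct_in_row = 0
            new_direction = +1
        if direction != 0 and new_direction != direction:
            if step > final_step:
                reversals_at_step += 1
                if reversals_at_step == 2:   # halve the step after every second reversal
                    step = max(step / 2.0, final_step)
                    reversals_at_step = 0
            else:
                final_reversal_levels.append(level)
        direction = new_direction
        level += new_direction * step
    return sum(final_reversal_levels) / len(final_reversal_levels)

# Toy observer (assumption): always correct at or above 30 dB, never below.
estimate = two_down_one_up(lambda level: level >= 30.0, start_level=50.0)
print(estimate)
```

With a real listener or a model front end, `respond` would present one 3-AFC trial at the given signal level and report whether the chosen interval was the signal interval.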
2.3. Results
The upper panels of Fig. 2 show the mean measured thresholds averaged across the individual data of the eight subjects (left) and model predictions (right) for the three masking conditions (RF: downward pointing triangles, CM: upward pointing triangles, FB-only: squares) together with the thresholds in quiet (crosses) as a function of the signal frequency. The lower panels show the amount of CMR, i.e., the difference between the thresholds for the RF and CM conditions. In general, individual masked thresholds are very similar across subjects as indicated by the small interindividual standard deviations of the mean. For all signal frequencies measured, thresholds for the FB-only condition are 4 to 6 dB above the corresponding thresholds in quiet, indicating some influence of the flanking band at the cochlear place most sensitive to the signal. For all signal frequencies, the threshold for the RF condition is about 36 dB SPL. Thresholds for the CM condition are always below those for the RF condition, indicating a masking release (CMR) due to the presence of the flanking band (lower left panel). At a signal frequency of 1 kHz, the CMR is about 6 dB. It increases slightly towards higher frequencies; the largest CMR of 10 dB is found for the 8-kHz signal.
Model predictions (right panels of Fig. 2) for the different conditions are shown in the same way as the measured data. A slight modification of the model described by Ernst and Verhey (2006) was necessary in order to obtain realistic simulated thresholds for the FB-only condition of the present study. The
Fig. 2 – Mean measured thresholds (top left panel) and CMR calculated as the difference between RF- and CM-threshold (bottom left panel) for eight subjects and corresponding model predictions (right panels) as a function of the signal frequency. The level of the signal-centred band was 33 dB SPL. The data for the different conditions are indicated by different symbols. In addition, the dotted line indicates predicted thresholds of the original model by Ernst and Verhey (2006). Error bars indicate plus and minus one standard deviation. The overall shape of the simulated data matches the experimental data. For most of the data points, simulated and measured thresholds differ by less than the interindividual standard deviation. However, for the 2, 4 and 8 kHz signals, the simulated thresholds for the CM condition are too low. The simulated CMR is equal to or even higher than the measured CMR, indicating that peripheral processes are sufficient to account for the magnitude of CMR in this experiment.
filter of the linear pathway and the second filter of the nonlinear pathway of the DRNL filter were fourth-order gammatone filters instead of the second-order gammatone filters used in Ernst and Verhey (2006) and Plack et al. (2002). The original model predicted thresholds for the FB-only condition (dotted line) which were, on average, 16 dB higher than the measured thresholds. The simulated thresholds of the modified model are on average 15 dB below the predictions of the original model, i.e., in quantitative agreement with the experimental data. The average simulated threshold for the RF condition is 36 dB SPL, i.e., the same as the average measured threshold for this condition. For the signal frequency of 1 kHz, the threshold for the CM condition is 6 dB lower than that for the RF condition, i.e., the simulated CMR is the same as the measured masking release. For the other signal frequencies, the model overestimates the CMR by up to 8 dB.
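The two quantities that organise this experiment, the flanking-band centre frequency three octaves below the signal and the CMR as the RF-minus-CM threshold difference, amount to simple arithmetic. The sketch below uses hypothetical helper names; the CM threshold of 30 dB SPL in the example is not a reported value but the figure implied by the reported RF threshold of about 36 dB SPL and a CMR of about 6 dB at 1 kHz.

```python
def flanking_band_centre_hz(signal_hz, octaves_below=3):
    """Centre frequency of a flanking band placed a given number of
    octaves below the signal frequency."""
    return signal_hz / 2 ** octaves_below

def cmr_db(rf_threshold_db, cm_threshold_db):
    """CMR as the threshold difference between the RF and CM conditions."""
    return rf_threshold_db - cm_threshold_db

for signal_hz in (1000, 2000, 4000, 8000):
    print(signal_hz, flanking_band_centre_hz(signal_hz))

# Implied example at 1 kHz: RF about 36 dB SPL, CM about 30 dB SPL.
print(cmr_db(36.0, 30.0))
```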
3. Contribution of central processes to comodulation masking release
Dau et al. (2005) attempted to distinguish between peripheral and central processes by combining different across-frequency processes. A common onset causes a binding of frequency components into auditory objects, whereas onset asynchrony results in a dissociation of the frequency components. For example, Darwin and Ciocca (1992) showed different changes in pitch perception when a mistuned harmonic was added to a harmonic complex tone depending on whether the additional tone started before or simultaneously with the complex tone. Dau et al. (2005) argued that if CMR is a central process, onset asynchrony between the masker components may reduce CMR, because the comodulated bands are not grouped into one object. In agreement with this hypothesis, Grose and Hall (1993) found a largely reduced CMR for flanking bands starting before the signal-centred band. Dau et al. (2005) extended the study of Grose and Hall by using different spectral separations between the masker components. They found a largely reduced CMR for flanking bands starting before the signal-centred band only for well separated masker components. This finding further supports the hypothesis that across-channel interactions are central processes. The present study investigates if central processes are also involved in the paradigm used in Ernst and Verhey (2006) and in the present study with only one flanking band and large level and frequency differences relative to the signal-centred band.
3.1. Methods
Stimuli were presented to both ears through Sennheiser HD580 headphones. All other parts of the experimental setup and the measurement procedure were the same as used in the previous experiment. The frequency of the sinusoidal signal was 2 kHz. Signal duration was 250 ms including 50-ms raised-cosine ramps and the signal was temporally centred in the masker. As in the first experiment, thresholds were measured in two masking conditions which differed in the masker spectrum: (i) the RF condition, where the masker consisted of the 20-Hz wide multiplied noise centred at the signal frequency and (ii) the CM condition where the masker consisted of the
signal-centred band and a flanking band with the same temporal envelope. In addition to these two spectral conditions, two gating conditions were used. In a synchronous condition, signal, signal-centred band and, in the case of the CM condition, the flanking band were gated on and off synchronously. In a fringe condition, signal and signal-centred band were gated on and off simultaneously while the onset of the 500-ms flanking band was 125 ms before that of the signal-centred band. The flanking band was centred either two or three octaves below the signal frequency. For both flanking band positions, the two highest levels of Ernst and Verhey (2006) were used, i.e., 60 and 70 dB for a flanking band two octaves below the signal frequency and 70 and 80 dB for a flanking band centred three octaves below the signal frequency. The level of the signal-centred band was 20 dB SPL. In these conditions, Ernst and Verhey (2006) showed that a suppression model predicted a substantial CMR. Eleven normal hearing subjects participated in the experiment, varying in age from 23 to 30 years. All subjects had thresholds ≤15 dB HL (ISO 8253-1, 1989) at octave frequencies from 0.125 to 8.0 kHz. They had practice trials in CMR experiments before collecting the data.
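The comodulated multiplied-noise maskers used in both CMR experiments can be sketched as below: one low-pass noise (2 to 10 Hz) serves as the common modulator and is multiplied with sinusoidal carriers at the signal frequency and three octaves below it. This is an illustrative sketch, not the authors' stimulus code. The brick-wall FFT filtering, the fixed random seed and the function names are assumptions, and raised-cosine ramps, level calibration and the fringe timing are omitted.

```python
import numpy as np

FS = 44100  # sampling rate in Hz, as stated in the Methods

def lowpass_noise(duration_s, f_low=2.0, f_high=10.0, fs=FS, rng=None):
    """Gaussian noise restricted to [f_low, f_high] Hz by zeroing FFT bins
    (brick-wall filtering is an assumption; the filter type is not stated)."""
    rng = np.random.default_rng() if rng is None else rng
    n = int(round(duration_s * fs))
    spectrum = np.fft.rfft(rng.standard_normal(n))
    freqs = np.fft.rfftfreq(n, d=1.0 / fs)
    eps = 1e-9  # tolerance so that bins exactly at the band edges are kept
    spectrum[(freqs < f_low - eps) | (freqs > f_high + eps)] = 0.0
    return np.fft.irfft(spectrum, n)

def multiplied_noise_band(modulator, centre_hz, fs=FS):
    """Multiply a sinusoidal carrier sample by sample with the modulator."""
    t = np.arange(len(modulator)) / fs
    return modulator * np.sin(2.0 * np.pi * centre_hz * t)

rng = np.random.default_rng(0)             # fixed seed for reproducibility only
modulator = lowpass_noise(0.5, rng=rng)    # 500-ms masker
signal_centred = multiplied_noise_band(modulator, 2000.0)
flanking = multiplied_noise_band(modulator, 2000.0 / 2 ** 3)  # three octaves below
```

Because both bands share the same modulator, their envelopes are identical; drawing a separate modulator for each band would give uncorrelated envelopes instead.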
3.2. Results
Fig. 3 shows mean CMR averaged across the 11 subjects for the synchronous condition (open symbols) and for the fringe condition (filled symbols) for the two frequency separations between flanking band and signal-centred band. For all combinations of level and centre frequency of the flanking band, the CMR is similar (difference <1.5 dB) for the fringe and the synchronous condition. That means that the CMR was not eliminated by introducing a gating asynchrony between the signal-centred band and the flanking band, in contrast to the findings of Dau et al. (2005). The failure to find an effect of onset asynchrony on the CMR with high-level flanking bands supports the hypothesis that a large proportion of the CMR in those masking conditions is due to across-frequency processes at a peripheral level of the auditory system.
The discrepancies between the findings of the present study and the results of Dau et al. (2005) might be due to differences in the stimulus parameters. Dau et al. (2005) used more flanking bands with the same on- and offsets. This may have enhanced the impression of two objects (flanking bands and signal-centred band) in the fringe condition. In addition, the present study investigated CMR with levels of the flanking band that were considerably higher than the level of the signal-centred band. For these large level differences and flanking-band frequencies below the signal frequency, suppression is likely to contribute to CMR (Oxenham and Plack, 1998). In contrast, Dau et al. (2005) used masker components with the same level.
4. Relative contributions of peripheral and central processes to comodulation detection difference
Hall et al. (2006) investigated whether CDD could be accounted for better in terms of energetic masking at a peripheral level of the auditory system or in terms of perceptual fusion based on
Fig. 3 – Mean measured CMR for 11 subjects for two centre frequencies of the flanking bands for a synchronous condition (open symbols), where all masker components were gated on and off synchronously and for the fringe condition (filled symbols), where the flanking band was gated on before and gated off after the signal-centred band. The level of the signal-centred band was 20 dB SPL. The level of the flanking band was 60, 70 or 80 dB and is indicated by different symbols. Error bars indicate plus and minus one standard error. The CMR for the synchronous and the fringe conditions is very similar (difference smaller than 1.5 dB), indicating that onset asynchrony has a negligible effect on the CMR. This result further supports the hypothesis that CMR with high-level off-frequency maskers and low-level on-frequency maskers can be explained by considering only peripheral processes.
a central across-frequency process. They used a similar paradigm as in the previous section for CMR. Hall et al. (2006) determined CDD for multiple masker bands which were either gated synchronously with the signal band or had a leading temporal fringe of 200 ms. Three of the eight listeners showed a large CDD which was decreased in the fringe condition, supporting the hypothesis that part of the CDD is at least for those subjects due to a central fusion process. However, the majority of the listeners showed little effect of the onset of the masker bands relative to that of the signal band on the magnitude of CDD (see also Moore and Borrill, 2002). This result indicates that peripheral aspects may also play an important role. The suppression model described in the present study predicts a CDD of 5 dB for the masking paradigm used in Hall et al. (2006) which is equal to the average measured CDD observed for the listeners without a fringe effect. To clarify the relevance of suppression in CDD experiments, three experiments from the literature are modelled which show the influence of the three main parameters on the magnitude of CDD: (i) number of masker bands, (ii) spectral distance between signal band and masker bands and (iii) level of the masker bands. The left panels of Fig. 4A show the average data (top) and CDD (bottom) of a study by McFadden (1987) in which the number and position of the masker bands were varied. Thresholds of a signal band with a centre frequency of 2.5 kHz are shown for a condition without masker bands (0), for three conditions with two masker bands and for two conditions with
four masker bands. The masker bands were positioned symmetrically around the signal band. The frequency separation between the signal and masker bands for the condition with only two masker bands was either 200 Hz (2n), 500 Hz (2m) or 1000 Hz (2b). For the condition with four masker bands, the frequency separation between adjacent bands was either 200 Hz (4n) or 500 Hz (4m). Each noise band was 100-Hz wide and had a level of approximately 70 dB SPL. The signal band was either comodulated with the masker bands (CM condition) or the masker bands and the signal band had different envelopes (uncorrelated condition). For both conditions, all masker bands had identical envelopes. Therefore the uncorrelated condition will be referred to as co-uncorrelated (CU) in the following. McFadden found that the CDD in the conditions with two masker bands decreases from 6 dB for the broadest frequency separation to approximately 2 dB for a separation of 200 Hz (lower left panel of Fig. 4A). The CDD for the conditions with four masker bands was always larger than the CDD for the corresponding condition with the two nearest masker bands. The right panels of Fig. 4A show the corresponding model predictions. Predicted thresholds for the CM condition (upward pointing triangles) agree quantitatively with the experimental results. Thresholds for the CU condition are slightly higher than found in the measured data. Thus, the predicted CDD is on average about 3 dB smaller than found in McFadden (1987). In general, the influence of frequency separation and number of masker bands on the magnitude of CDD is predicted by the suppression model, i.e., the model predicts a decrease in CDD as the frequency separation increases and a larger CDD for four bands than for the nearest two masker bands.
Fig. 4B shows data (left) and model predictions (right) for an experiment conducted by Cohen and Schubert (1987).
They measured the amount of CDD in the presence of a masker band centred at 1 kHz for various centre frequencies of the signal band in the range from 0.2 kHz up to 6 kHz. The level of the masker band was 73 dB SPL. Both the signal band and the masker band had a bandwidth of 100 Hz. In general, Cohen and Schubert (1987) found a larger CDD for centre frequencies of the signal band above the centre frequency of the masker band (1000 Hz) than for centre frequencies of the signal band below 1000 Hz. The maximum CDD of 11 dB was found for a signal band centred at 2 kHz, i.e., with a masker band one octave below the signal band. The model predicts the asymmetry for upper and lower masker bands (right panels of Fig. 4B). However, for the signal band centred at 0.8 kHz, the model predicts a CDD that is substantially larger than the CDD found by Cohen and Schubert (1987). The overestimation of the CDD for the centre frequency is presumably related to the overestimation for suppressors above the signal frequency (Plack et al., 2002). A comparable effect was observed in the simulation of CMR (Ernst and Verhey, 2006). The predicted CDD is also slightly larger than the measured CDD for signal bands with frequencies of 4 and 6 kHz indicating that the amount of suppression within the model may be too large in those masking conditions. The predicted thresholds for centre frequencies of the signal band close to the centre frequency of the masker band are considerably higher than the experimental results. A model only using the masker energy in the filter at the signal
frequency (power spectrum model, Fletcher, 1940) predicts thresholds for the UN condition that are similar to the predictions of the current model, indicating that other processes are responsible for the lower measured thresholds for the UN condition for small spectral separations. A similar effect has been observed in spectral masking patterns. Derleth and Dau (2000) showed that a model using modulation as an additional cue predicts thresholds that are up to 10 to 15 dB lower than those of a power-spectrum model. Fig. 4C shows data (left) from Borrill and Moore (2002) and model predictions (right) on the effect of the level of the masker bands on the magnitude of CDD. Masker bands were centred at 0.9 and 2.1 kHz. The centre frequency of the signal band was 1.5 kHz. Each noise band was 20 Hz wide and the spectrum level of the masker bands was either 45 or 65 dB. Thresholds of the CM condition (upward-pointing triangles) and two uncorrelated conditions are shown. Apart from the CU condition (downward-pointing triangles) as used in McFadden (1987), Borrill and Moore measured thresholds for an all-uncorrelated (AU) condition (circles), where all masker bands had different envelopes. For both levels of the masker bands, the lowest mean measured thresholds were obtained for the CU condition. Thresholds for the AU condition were slightly higher than those for the CU condition and the highest thresholds were obtained for the CM condition. Thus, the CDD calculated as the threshold difference between the CM and the AU conditions (circles) was slightly smaller than the CDD using the CU condition as the reference (downward-pointing triangles) as in McFadden (1987). In general, the measured data show large interindividual differences. As a consequence, only the CDD at the higher level of the masker bands was significant. In agreement with the data, the model predicts the highest thresholds for the CM condition and the lowest for the CU condition. 
The average predicted CDD for all conditions is in good agreement with the measured data. However, the model overestimates the CDD (CM-CU) for the lower level.
Fig. 4 – Mean measured thresholds (upper left panels) and CDD (lower left panels) and model predictions (right panels) for three main parameters influencing the magnitude of CDD. (A) Number of masker bands (McFadden, 1987). (B) Signal frequency (Cohen and Schubert, 1987). (C) Spectrum level of the masker band (Borrill and Moore, 2002). Upward-pointing triangles represent the thresholds for the CM condition. Downward-pointing triangles represent the thresholds for the CU condition (upper panels) and the CDD calculated as the difference between the CM and CU thresholds (lower panels). Circles represent the thresholds for the AU condition (upper panels) and the CDD calculated as the threshold difference between the CM and AU conditions (lower panels). Error bars indicate plus and minus one standard deviation. The model predictions are in qualitative agreement with the experimental data. The simulated CDD as a function of the number of masker bands is, however, about 3 dB smaller than in the data, indicating that central processes may contribute to CDD in this type of experiment. For the other parameters, the modelled difference between the correlated and all-uncorrelated conditions is similar to the measured CDD, emphasising the important role of peripheral processes in CDD experiments.
Fig. 5 shows data (open symbols) from Borrill and Moore (2002) and model predictions (filled symbols) for two additional conditions: (i) upper-correlated (UC) condition (diamonds), where the upper masker band was correlated with the signal
Fig. 5 – Mean thresholds (open symbols) and model predictions (filled symbols) for two conditions as a function of the spectrum level of the masker. Diamonds indicate thresholds for the upper-correlated (UC) condition, where the upper masker band was correlated with the signal band while the lower band was uncorrelated. Triangles indicate thresholds for the lower-correlated (LC) condition, where the lower masker band was correlated with the signal band and the upper was uncorrelated. The experimental data are averaged across the individual data shown in Borrill and Moore (2002). Error bars indicate plus and minus one standard deviation. Given the large interindividual differences, model predictions are in reasonable agreement with the data for the LC condition and for the UC condition at the lower level. The simulated thresholds for the UC condition at the higher level are too high, indicating that the model is unable to predict the asymmetry in the role of the upper and lower band in CDD experiments at this level.
band while the lower band was uncorrelated and (ii) lower-correlated (LC) condition (triangles), where the lower masker band was correlated with the signal band and the upper was uncorrelated. Measured thresholds for the two conditions are similar for the low masker level. At the higher masker level, thresholds for the LC condition are considerably higher than those for the UC condition, i.e., only the correlation of the lower band is important at the higher masker level. The predicted thresholds of the LC condition are similar to the data. However, in contrast to the experimental results, the model predicts higher thresholds for the UC condition than for the LC condition for both levels of the masker bands. This is presumably again a consequence of the overestimated suppression for frequencies above the centre frequency of the signal. A similar trend was already observed for the predicted CDD of the experiment by Cohen and Schubert (1987, see Fig. 4B). For a signal frequency of 0.8 kHz, i.e., a comparable frequency ratio between signal and masker band, the predicted CDD was considerably higher than the measured CDD.
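The CDD values discussed in this section are threshold differences between the comodulated condition and an uncorrelated reference condition. The following is a minimal sketch of that bookkeeping. The function name `cdd` and the threshold values are hypothetical and serve only as an illustration; they are not data from any of the cited studies.

```python
def cdd(threshold_cm_db, threshold_ref_db):
    """Return the CDD in dB: CM threshold minus the reference (CU or AU) threshold."""
    return threshold_cm_db - threshold_ref_db


# Hypothetical signal thresholds in dB SPL (illustrative only, not measured data).
thresholds_db = {"CM": 62.0, "CU": 56.0, "AU": 57.5}

cdd_cu = cdd(thresholds_db["CM"], thresholds_db["CU"])  # reference: co-uncorrelated
cdd_au = cdd(thresholds_db["CM"], thresholds_db["AU"])  # reference: all-uncorrelated

print(f"CDD (CM-CU): {cdd_cu:.1f} dB")
print(f"CDD (CM-AU): {cdd_au:.1f} dB")
```

With thresholds ordered as in the data of Borrill and Moore (2002), i.e., lowest for CU, slightly higher for AU and highest for CM, the CDD relative to the AU reference comes out smaller than the CDD relative to the CU reference.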
5. Discussion
The model simulates several aspects of CDD and CMR. This and the data for the fringe condition indicate that peripheral processes may play an important role in these effects. The
model uses suppression as the peripheral mechanism to account for CDD and CMR. Suppression within the model results from compression followed by bandpass filtering. In the CM condition of CMR experiments, the flanking band is the suppressor (Ernst and Verhey, 2006). The flanking band and the signal-centred band are compressed together by the nonlinearity of the DRNL filter. The subsequent bandpass filter largely attenuates the flanking band whereas the signal-centred band is essentially unaffected by the filter. Since the signal-centred band is compressed together with the flanking band, its level at the output of the second filter is lower than it would be if it were presented alone as in the RF condition. The strength of the suppression fluctuates over time due to the inherent envelope fluctuations of the narrow-band noise. Obviously, when the target signal is present in a CM condition it is also attenuated. However, since the flanking band and the signal-centred band are comodulated, the flanking band largely removes the signal-centred band, whereas the reduction in the average amplitude of the signal is comparatively small. This results in a larger signal-to-masker ratio for the CM condition than for the RF condition. In the CDD experiments, the masker consists of an off-frequency masker only. This off-frequency masker suppresses the signal. Suppression will be more effective when the fluctuation in the strength of the suppression is correlated with the level fluctuations of the target signal, i.e., in the CM condition, than when the suppressing masker and the suppressed target signal are uncorrelated. As mentioned in the introduction, previous studies have already modelled CDD on the basis of peripheral processes other than suppression. Borrill and Moore (2002) proposed a model on the basis of the excitation of the masker bands at the place of the basilar membrane tuned to the centre frequency. This excitation varies over time due to intrinsic envelope
Fig. 6 – Predicted CDD of the experiment in Cohen and Schubert (1987, see Fig. 4B) for a modified model as a function of the centre frequency of the signal band. As proposed in Buschermöhle et al. (2006), the model only uses a gammatone filter followed by a compression. The model fails to predict the data, indicating that the second filter after the compression within the DRNL filter, which is responsible for suppression within the original version of the model, is crucial for the prediction of CDD.
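The role of the filter after the compression can be illustrated with a minimal sketch, under strong simplifying assumptions, of how compression followed by frequency-selective filtering produces suppression. It is not the DRNL implementation used in this study: sinusoids replace the noise bands, the evaluation of a single Fourier component stands in for the second bandpass filter, and the parameters `a`, `b` and `c` of the broken-stick nonlinearity as well as all frequencies and amplitudes are illustrative assumptions.

```python
import cmath
import math


def broken_stick(x, a=1.0, b=0.1, c=0.25):
    """Broken-stick nonlinearity: linear at low amplitudes, compressive above the knee.

    The parameter values are illustrative assumptions, not fitted model parameters.
    """
    mag = abs(x)
    return math.copysign(min(a * mag, b * mag ** c), x)


def component_amplitude(samples, fs, freq):
    """Amplitude of the sinusoidal component at freq (stand-in for a narrow bandpass filter)."""
    n = len(samples)
    acc = sum(s * cmath.exp(-2j * math.pi * freq * i / fs) for i, s in enumerate(samples))
    return 2.0 * abs(acc) / n


fs = 16000                       # sampling rate in Hz
n = fs                           # 1 s of signal, so both tones complete whole cycles
f_probe, f_supp = 1000.0, 1300.0

probe = [1.0 * math.sin(2 * math.pi * f_probe * i / fs) for i in range(n)]
suppressor = [10.0 * math.sin(2 * math.pi * f_supp * i / fs) for i in range(n)]

# Probe compressed alone versus compressed together with the stronger suppressor.
alone = component_amplitude([broken_stick(p) for p in probe], fs, f_probe)
paired = component_amplitude(
    [broken_stick(p + s) for p, s in zip(probe, suppressor)], fs, f_probe
)

suppression_db = 20.0 * math.log10(alone / paired)
print(f"probe component reduced by {suppression_db:.1f} dB")
```

In this sketch the component at the probe frequency is smaller when the probe passes through the nonlinearity together with the stronger off-frequency tone than when it is compressed alone. Selecting the probe component only after the compression is what exposes this reduction.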
fluctuation of the masker. They argued that in the CM condition the signal-to-noise ratio does not change over time since the excitation due to the signal varies in the same way as the masker excitation. In contrast, in the uncorrelated condition, there are certain moments in time where the signal level is high while the masker level is low, i.e., where the signal-to-noise ratio is high. The model of Borrill and Moore (2002) uses those short time windows with a high signal-to-noise ratio to account for the difference in threshold between the CM and the uncorrelated conditions. The decision strategy is comparable to listening in the valleys of the masker envelope, which was proposed by Buus (1985) as a possible mechanism underlying CMR. Data with different levels of the masker bands shown in Figs. 4C and 5 bolstered the hypothesis that excitation is the peripheral process underlying CDD. In the framework of the excitation model by Borrill and Moore, the higher influence of the correlation of the lower masker band (LC condition) can be accounted for by the shallower low-frequency slope of the auditory filter. The increased asymmetry in the filter shape towards higher levels is also realized in the present DRNL filter by an increasing contribution of the linear pathway to the output of the DRNL filter. However, the suppression model fails to predict the higher influence of the correlation of the lower masker band (see Fig. 5), which was supposed to be due to the shallower low-frequency side of the filter at higher levels. This indicates that the asymmetry in filter shape (and, thus, the linear pathway) has a negligible effect on the predictions of the present model. Another peripheral model for CDD was proposed by Buschermöhle et al. (2006). They used only peripheral filtering followed by compression to predict the physiological correlate of CDD. 
Both processing stages are also included in the DRNL filter used in the present study, where they are, however, followed by a further bandpass filter in the nonlinear pathway and complemented by an additional linear pathway. In order to test the influence of peripheral filtering and compression alone, the nonlinear filter stage of the present model was reduced to the first gammatone filter and the broken-stick nonlinearity of the nonlinear pathway. Since the filter after the nonlinearity is crucial for the prediction of suppression (Plack et al., 2002; Ernst and Verhey, 2006), the modified model should not predict CDD if suppression is the only reason for the ability of the complete model to predict CDD. Fig. 6 shows the simulated thresholds with the modified model together with the data in Cohen and Schubert (1987). The modified model fails to predict CDD, indicating the importance of suppression within the present model framework. The amount of the simulated CMR and CDD is in some cases too small (cf., e.g., Fig. 4A). This may indicate that central across-frequency processes contribute to the masking release. In other cases, the magnitudes of the predicted CMR and CDD are too large. The mismatch does not necessarily contradict the idea of suppression as a mechanism underlying CDD and CMR. It could either show the influence of detrimental across-channel effects at a central level of the auditory pathway or the inaccuracy of the model in simulating suppression in some masking conditions. Further experiments determining suppression and one of the effects (CMR or CDD) in the same subjects and with the same frequency-level combinations for
the masker are needed to quantify the exact contribution of suppression to the masking release.
REFERENCES
Borrill, S.J., Moore, B.C.J., 2002. Evidence that comodulation detection differences depend on within-channel mechanisms. J. Acoust. Soc. Am. 111, 309–319.
Breebaart, J., van de Par, S., Kohlrausch, A., 2001. Binaural processing model based on contralateral inhibition: I. Model structure. J. Acoust. Soc. Am. 110, 1074–1088.
Buschermöhle, M., Feudel, U., Klump, G.M., Bee, M.A., Freund, J.A., 2006. Signal detection enhanced by comodulated noise. Fluctuation and Noise Letters (FNL), vol. 6, pp. L339–L347.
Buus, S., 1985. Release from masking caused by envelope fluctuations. J. Acoust. Soc. Am. 78, 1958–1965.
Cohen, M.F., 1991. Comodulation masking release over a three octave range. J. Acoust. Soc. Am. 90, 1381–1384.
Cohen, M.F., Schubert, E.D., 1987. The effect of cross-spectrum correlation on the detectability of a noise band. J. Acoust. Soc. Am. 81, 721–723.
Cooper, N.P., 1996. Two-tone suppression in cochlear mechanics. J. Acoust. Soc. Am. 99, 3087–3098.
Darwin, C.J., Ciocca, V., 1992. Grouping in pitch perception: effects of onset asynchrony and ear of presentation of a mistuned component. J. Acoust. Soc. Am. 91, 3381–3390.
Dau, T., Ewert, S.D., Oxenham, A.J., 2005. Effects of concurrent and sequential streaming in comodulation masking release. In: Pressnitzer, D., de Cheveigne, A., McAdams, S., Collet, L. (Eds.), Physiology, Psychoacoustics and Models. Springer, New York, pp. 335–341.
Derleth, R.P., Dau, T., 2000. On the role of envelope fluctuation processing in spectral masking. J. Acoust. Soc. Am. 108, 285–296.
Ernst, S.M.A., Verhey, J.L., 2005. Comodulation masking release over a three octave range. Acta Acustica united with Acustica, vol. 91, pp. 998–1006.
Ernst, S.M.A., Verhey, J.L., 2006. Role of suppression and retro-cochlear processes in comodulation masking release. J. Acoust. Soc. Am. 120, 3843–3852.
Fletcher, H., 1940. Auditory patterns. Rev. Mod. Phys. 12, 47–65.
Florentine, M., Buus, S., Poulsen, T., 1996. Temporal integration of loudness as a function of level. J. Acoust. Soc. Am. 99, 1633–1644.
Geisler, C.D., Nuttall, A.L., 1997. Two-tone suppression of basilar membrane vibrations in the base of the guinea pig cochlea using “lowside” suppressors. J. Acoust. Soc. Am. 102, 430–440.
Grose, J.H., Hall, J.W., 1993. Comodulation masking release: is comodulation sufficient? J. Acoust. Soc. Am. 93, 2896–2902.
Hall, J.W., Haggard, M.P., Fernandes, M.A., 1984. Detection in noise by spectro-temporal pattern analysis. J. Acoust. Soc. Am. 76, 50–56.
Hall, J.W., Buss, E., Grose, J.H., 2006. Comodulation detection differences for fixed-frequency and roved-frequency maskers. J. Acoust. Soc. Am. 119, 1021–1028.
Houtgast, T., 1974. Lateral suppression in hearing: a psychophysical study on the ear's capability to preserve and enhance spectral contrast. Ph.D. dissertation (Academische Pers. B. V., Amsterdam).
Hofer, S.B., Klump, G.M., 2003. Within- and across-channel processing in auditory masking: a physiological study in the songbird forebrain. J. Neurosci. 23, 5732–5739.
International Standards Organization, 1989. Acoustics—Audiometric Test Methods – Part 1: Basic Pure Tone Air and Bone Conduction Threshold Audiometry. ISO 8253-1, 1989. ISO, Geneva.
International Standards Organization, 2004. Acoustics—Reference Zero for the Calibration of Audiometric Equipment – Part 8: Reference Equivalent Threshold Sound Pressure Levels for Pure Tones and Circumaural Earphones. ISO 389-8, 2004. ISO, Geneva.
Jensen, K.J., 2007. Comodulation detection differences in the hooded crow (Corvus corone cornix), with direct comparison to human subjects. J. Acoust. Soc. Am. 121, 1783–1789.
Jiang, D., Palmer, A.R., Winter, I.M., 1996. The frequency extent of two tone facilitation in onset units in the ventral cochlear nucleus. J. Neurophysiol. 75, 380–396.
Langemann, U., Klump, G.M., 2001. Signal detection in amplitude-modulated maskers: I. Behavioural auditory thresholds in a songbird. Eur. J. Neurosci. 13, 1025–1032.
Las, L., Stern, E.A., Nelken, I., 2005. Representation of tone in fluctuating maskers in the ascending auditory system. J. Neurosci. 25, 1503–1513.
Levitt, H., 1971. Transformed up-down procedures in psychoacoustics. J. Acoust. Soc. Am. 49, 467–477.
McFadden, D., 1987. Comodulation detection differences using noiseband signals. J. Acoust. Soc. Am. 81, 1519–1527.
Meddis, R., O'Mard, L.P.O., Lopez-Poveda, E.A., 2001. A computational algorithm for computing non-linear auditory frequency selectivity. J. Acoust. Soc. Am. 109, 2852–2861.
Moore, B.C.J., Borrill, S.J., 2002. Tests of a within-channel account of comodulation detection differences. J. Acoust. Soc. Am. 112, 2099–2109.
Nelken, I., Rotman, Y., Yosef, O.B., 1999. Responses of auditory-cortex neurons to structural features of natural sounds. Nature 397, 154–157.
Neuert, V., Verhey, J.L., Winter, I.M., 2004. Responses of dorsal cochlear nucleus neurons to signals in the presence of modulated maskers. J. Neurosci. 24, 5789–5797.
Nieder, A., Klump, G.M., 2001. Signal detection in amplitude-modulated maskers: II. Processing in the songbird's auditory forebrain. Eur. J. Neurosci. 13, 1033–1044.
Oxenham, A.J., 2001. Forward masking: adaptation or integration. J. Acoust. Soc. Am. 109, 732–741.
Oxenham, A.J., Plack, C.J., 1998. Suppression and the upward spread of masking. J. Acoust. Soc. Am. 104, 3500–3510.
Plack, C.J., Oxenham, A.J., Drga, V., 2002. Linear and nonlinear processes in temporal masking. Acta Acustica united with Acustica, vol. 88, pp. 348–358.
Pressnitzer, D., Meddis, R., Delahaye, R., Winter, I.M., 2001. Physiological correlates of comodulation masking release in the mammalian ventral cochlear nucleus. J. Neurosci. 21, 6377–6386.
Ruggero, M.A., Robles, L., Rich, N.C., 1992. Two-tone suppression in the basilar membrane of the cochlea: mechanical basis of auditory-nerve rate suppression. J. Neurophysiol. 68, 1087–1099.
Shannon, R.V., 1976. Two-tone unmasking and suppression in a forward-masking situation. J. Acoust. Soc. Am. 59, 1460–1470.
Singh, N.C., Theunissen, F.E., 2003. Modulation spectra of natural sounds and ethological theories of auditory processing. J. Acoust. Soc. Am. 114, 3394–3411.
Simmons, J.A., Saillant, P.A., Ferragamo, M.J., Haresign, T., Dear, S.P., Fritz, J., McMullen, T.A., 1996. Auditory computations for biosonar target imaging in bats. In: Hawkins, H.L., McMullen, T.A., Popper, A.N., Fay, R.R. (Eds.), Auditory Computation. Springer, New York.
Verhey, J.L., Ernst, S.M.A., 2007. Role of peripheral nonlinearities in comodulation masking release. In: Kollmeier, B., et al. (Ed.), Hearing—From Sensory Processing to Perception. Springer, New York.
Verhey, J.L., Dau, T., Kollmeier, B., 1999. Within-channel cues in comodulation masking release (CMR) experiments and model predictions using a modulation-filterbank model. J. Acoust. Soc. Am. 106, 2733–2745.
Verhey, J.L., Pressnitzer, D., Winter, I.M., 2003. The psychophysics and physiology of comodulation masking release. Exp. Brain Res. 153, 405–417.
Winter, I.M., Palmer, A.R., 1995. Level dependence of cochlear nucleus onset unit responses and facilitation by second tones or broadband noise. J. Neurophysiol. 73, 141–159.
Yasin, I., Plack, C.J., 2007. The effects of low- and high-frequency suppressors on psychophysical estimates of basilar-membrane compression and gain. J. Acoust. Soc. Am. 121, 2832–2841.