Hearing Research 211 (2006) 74–84 www.elsevier.com/locate/heares
Research paper
Masking release for consonant features in temporally fluctuating background noise
Christian Füllgrabe a,*, Frédéric Berthommier b, Christian Lorenzi a,c
a Laboratoire de Psychologie Expérimentale – UMR CNRS 8581, Institut de Psychologie, Université René Descartes – Paris 5, 71 Avenue Vaillant, 92774 Boulogne-Billancourt, France
b Institut de la Communication Parlée, UPRESA CNRS 5009, INPG, 46 Avenue Viallet, 38031 Grenoble, France
c Institut Universitaire de France, France
Received 5 September 2005; received in revised form 5 September 2005; accepted 14 September 2005
Available online 8 November 2005
Abstract
Consonant identification was measured for normal-hearing listeners using Vowel–Consonant–Vowel stimuli that were either unprocessed or spectrally degraded to force listeners to use temporal-envelope cues. Stimuli were embedded in a steady-state or fluctuating noise masker and presented at a fixed signal-to-noise ratio. Fluctuations in the maskers were obtained by applying sinusoidal modulation to: (i) the amplitude of the noise (1st-order SAM masker) or (ii) the modulation depth of a 1st-order SAM noise (2nd-order SAM masker). The frequencies of the amplitude variation fm and the depth variation fm′ were systematically varied. Consistent with previous studies, identification scores obtained with unprocessed speech were highest in an 8-Hz, 1st-order SAM masker. Reception of voicing and manner also peaked around fm = 8 Hz, while the reception of place of articulation was maximal at a higher frequency (fm = 32 Hz). When 2nd-order SAM maskers were used, identification scores and received information for each consonant feature were found to be independent of fm′. They decreased progressively with increasing carrier modulation frequency fm, and ranged between those obtained with the steady-state and the 1st-order SAM maskers. Finally, the results obtained with spectrally degraded speech were similar across all types of maskers, although an 8% improvement in the reception of voicing was observed for modulated maskers with fm < 64 Hz compared to the steady-state masker. These data provide additional evidence that listeners take advantage of temporal minima in fluctuating background noises, and suggest that: (i) minima of different durations are required for an optimal reception of the three consonant features and (ii) complex (i.e., 2nd-order) envelope fluctuations in background noise do not degrade speech identification by interfering with speech-envelope processing.
© 2005 Elsevier B.V. All rights reserved.
Keywords: Speech perception; Background noise; Masking release; Modulation masking; 1st-order modulation; 2nd-order modulation
Abbreviations: ANOVA, analysis of variance; fm, 1st-order modulation frequency; fm′, 2nd-order modulation frequency; SAM, sinusoidal amplitude modulation; SD, standard deviation; SNR, signal-to-noise ratio; SRT, speech reception threshold; Tukey HSD test, Tukey Honestly Significant Difference test
* Corresponding author. Present address: Department of Experimental Psychology, University of Cambridge, Downing Street, Cambridge CB2 3EB, United Kingdom. Tel.: +44 122 376 5283. E-mail address: [email protected] (C. Füllgrabe).
0378-5955/$ - see front matter © 2005 Elsevier B.V. All rights reserved. doi:10.1016/j.heares.2005.09.001
1. Introduction
Over the last decades, many studies have compared speech identification in steady-state and temporally fluctuating backgrounds presented at the same level (e.g., Miller and Licklider, 1950; Duquesnoy, 1983; Festen and Plomp, 1990; Takahashi and Bacon, 1992; Gustafsson and Arlinger, 1994; Nelson et al., 2003; Qin and Oxenham, 2003). These studies demonstrated that, in normal-hearing listeners, speech identification performance and speech reception thresholds (SRTs) were better in fluctuating than in steady-state backgrounds. At least six different mechanisms or
C. Füllgrabe et al. / Hearing Research 211 (2006) 74–84
factors seem to be involved in this so-called “masking-release” effect.
1.1. Dip listening
Masking release is greater for square-wave than for speech-envelope modulation of a background noise (Bacon et al., 1998). It increases with the modulation depth of the modulated noise (e.g., Howard-Jones and Rosen, 1993a; Gustafsson and Arlinger, 1994) and the optimal amplitude modulation frequencies for observing masking release range from about 10 to 25 Hz (e.g., Miller and Licklider, 1950; Gustafsson and Arlinger, 1994; Kwon and Turner, 2001; Nelson et al., 2003) depending somewhat upon the speech material. Moreover, masking release increases when introducing spectral dips in a steady-state or temporally fluctuating noise, and when the width of those spectral dips is increased (Peters et al., 1998). Finally, masking release is weaker or abolished for multi-talker babble maskers with a relatively flat temporal envelope, and it is reduced if the lower level portions of speech are masked by a spectrally shaped steady-state noise (e.g., Eisenberg et al., 1995; Bacon et al., 1998). These findings strongly suggest that listeners are able to take advantage of relatively short temporal minima in the fluctuating background to detect speech cues, a capacity often referred to as “listening-in-the-dips” or “listening-in-the-valleys”. Clearly, this capacity requires a certain degree of temporal resolution (i.e., an ability to follow the background fluctuations in order to extract speech cues during the background valleys) and spectral resolution (i.e., an ability to access parts of the speech spectrum that are not masked or are less masked by the background).
Consistent with this notion, the decrease in masking release observed for modulation frequencies greater than 30 Hz (and thus, for background valleys shorter than about 17 ms) can be attributed to forward masking – an important factor limiting temporal resolution – smoothing out or “filling in” the background valleys (e.g., Festen, 1993; Dubno et al., 2002).
1.2. Modulation masking/modulation interference
A recent speech-perception study reveals that dip listening is counterbalanced by the perceptual interference of temporal modulations in the fluctuating background with the auditory processing of temporal modulations in the speech envelope (Kwon and Turner, 2001). However, this perceptual interference does not seem to follow the general characteristics reported in previous psychoacoustical studies on modulation masking or modulation detection interference (e.g., Bacon and Grantham, 1989; Houtgast, 1989; Yost et al., 1989; Ewert and Dau, 2000; Millman et al., 2002): the modulation frequencies of speech and the fluctuating background do not allow prediction of the perceptual interference, and the amount of perceptual interference decreases as speech and fluctuating background get closer in the audio-frequency domain (Kwon and Turner, 2001).
This suggests that the mechanisms supposed to be involved in modulation masking, modulation interference, and modulation tuning [e.g., modulation filters as proposed by Dau et al. (1997)] may be different from those involved in masking release. For instance, fluctuations in the background may cause some form of perceptual illusion of attentional origin leading listeners to process the background as part of the speech stimulus (e.g., Kwon and Turner, 2001; Nelson et al., 2003).
1.3. Auditory grouping
Additional studies reveal that grouping mechanisms (e.g., Bregman, 1990) may be involved in masking release since the latter disappears in case of degraded spectral resolution and loss of temporal fine-structure information (as encountered in cochlear implant users or when using noise-vocoded speech) (e.g., Nelson et al., 2003; Qin and Oxenham, 2003; Zeng et al., 2005). The fact that under these conditions speech identification is similar for fluctuating and steady-state backgrounds suggests that, when spectral resolution is sufficiently fine and temporal fine-structure information is available, normal-hearing listeners do take advantage of fine-structure cues and the incoherence of envelope cues across the spectrum to segregate speech from background.
1.4. Perceptual restoration
Earlier work demonstrated that human listeners perceive missing phonemes by using the redundancies in speech stimuli occurring at the acoustic, phonetic, phonological, and/or lexical level (e.g., Warren, 1970). It is very likely that the ability to reconstruct speech or, in other words, to infer the complete spectro-temporal structure of speech from incomplete information, plays a role in this situation.
For instance, Howard-Jones and Rosen (1993b) demonstrated that, to a certain extent, normal-hearing listeners are able to “patch together” information in different broad frequency regions at different times to achieve a release from masking (a process termed “uncomodulated glimpsing” by the authors). Moreover, speech identification for normal-hearing listeners is very robust for speech subjected to periodic interruption by silence or by noise segments (e.g., Miller and Licklider, 1950; Powers and Speaks, 1973; Powers and Wilcox, 1977; Nelson and Jin, 2004).
1.5. Informational masking
Recent work by Summers and Molis (2004) indicated that competing speech yields more masking than time-reversed speech containing temporal fluctuations of equal magnitude. This suggests that informational masking, presumably resulting from competitive processing of linguistic information within the speech masker, may reduce the amount of masking release [however, no difference between
forward and backward speech maskers was found by Duquesnoy (1983) and Festen and Plomp (1990)].
1.6. Across-critical band processes
Although comodulation in the background masker appears to be crucial for masking release to occur, the contribution of true across-frequency processing of masker fluctuations (i.e., comodulation masking release, Hall et al., 1988) is relatively modest, and estimated to produce about a 1 dB change in SRT (Festen, 1993).
The present study is particularly concerned with two issues. Firstly, the studies listed above have investigated masking release for speech in temporally fluctuating backgrounds in terms of identification performance or SRT. As recently pointed out by Munson et al. (2003), using such dependent measures has the advantage of providing summary estimates of phoneme perception by collapsing performance across different phonemes. However, the evident drawback of such an approach is to obscure any systematic pattern of phoneme misperception. Alone or in combination, the six mechanisms or factors supposed to be involved in masking release may not yield identical patterns of phoneme-perception errors as a function of background fluctuation frequency. The present study therefore examines release from masking both quantitatively and qualitatively by characterizing masked speech identification in terms of overall identification performance and patterns of information transmission, that is, reception of consonant features such as voicing, place of articulation, and manner. According to previous studies listed above, identification performance should peak at a rather low background fluctuation frequency (between 10 and 25 Hz), whereas optimal reception of voicing, place, and manner information may be expected over a larger range of background fluctuation frequencies depending on the spectral or temporal nature of the acoustic cues upon which the different features are based.
For example, perception of temporal-envelope cues in the range of hundreds of milliseconds (as used to receive manner) should require longer dips in the background noise, while perception of spectral and fast fine-structure cues (as used to receive place or voicing) should be optimal for shorter dips in the background noise.
Secondly, the present study extends the investigation of masking release by using a continuum of simple and complex background noise modulators, referred to as 1st- and 2nd-order modulators, in an attempt to disentangle the contributions of dip listening and modulation masking to masking release. In the past, this has proved to be extremely difficult, since sinusoidal amplitude modulation (SAM) promotes masking or interference with speech envelope processing as well as allows dip listening (Kwon and Turner, 2001). Recent psychoacoustical studies have demonstrated that modulation masking (of 10–13 dB) may occur in conditions where the physical modulation components of the fluctuating masker are remote in the
modulation spectrum from the target modulation (e.g., Sheft and Yost, 1997; Moore et al., 1999; Lorenzi et al., 2001b; Verhey et al., 2003; Sek and Moore, 2004; Füllgrabe et al., 2005). Such a form of masking can be obtained with so-called 2nd-order SAM stimuli (Lorenzi et al., 2001a), which have slow sinusoidal variation in the modulation depth of a “carrier” (or 1st-order) SAM stimulus (as shown in the lower left panel of Fig. 1). Here, fm and fm′ correspond to the frequency of the carrier SAM and the frequency of the depth variation, respectively. In the time domain, such variations in depth yield a “beat” in the temporal envelope of the 2nd-order SAM stimulus at frequency fm′. As indicated in the lower right panel of Fig. 1, the sinusoidal modulation in depth generates two additional components (i.e., sidebands) at frequencies fm − fm′ and fm + fm′ in the modulation spectrum of the 2nd-order SAM stimulus. However, there is no spectral component at the envelope beat frequency fm′. The upper panels of Fig. 1 show the envelope waveform and modulation spectrum of an 8-Hz, 1st-order SAM stimulus for comparison. The envelope of an unmodulated noise masker is also shown in the upper left panel (straight dashed line). Psychoacoustical studies assessing auditory sensitivity to 2nd-order SAM stimuli have shown that 2nd-order SAM may be perceptually as salient as 1st-order SAM (Lorenzi et al., 2001b; Füllgrabe and Lorenzi, 2003). They also revealed that: (i) 1st-order SAM detection performance decreased when 2nd-order masking modulations were introduced near the 1st-order modulation frequency (e.g., Füllgrabe et al., 2005) and conversely, (ii) 2nd-order SAM detection thresholds increased when 1st-order masking modulations were presented near the envelope beat frequency (Lorenzi et al., 2001b).
In these studies, the physical modulation components of the 1st- and 2nd-order SAM stimuli were too far apart to yield modulation masking effects as shown previously by Bacon and Grantham (1989) and Houtgast (1989). To date, the mechanism underlying these masking effects has not been fully elucidated. Masking effects may result either from: (i) a similarity between the signal and masker temporal envelope patterns (Moore et al., 1999), (ii) an interference between the signal modulation and a modulation distortion component of frequency fm′ generated by some nonlinear mechanism(s) along the auditory pathway in response to the masker (Shofner et al., 1996; Sheft and Yost, 1997; Moore et al., 1999; Verhey et al., 2003; Sek and Moore, 2004; Füllgrabe et al., 2005), or (iii) an across-modulation-channel interference process (Ewert et al., 2002; Sek and Moore, 2003).
This leads to the suggestion that a masker (lower panels of Fig. 1) with slow 2nd-order envelope fluctuations (e.g., below 16 Hz) carried by fast 1st-order envelope fluctuations (e.g., above 100 Hz) should not allow release from masking since dip listening is precluded when the dips in the masker background noise are shorter than approximately 5 ms [as shown by Gustafsson and Arlinger (1994)]. However, this type of noise with complex fluctuations
[Fig. 1 graphic. Left panels: envelope waveform (Amplitude vs. Time in s). Right panels: modulation spectrum (Log spectrum in dB vs. Modulation frequency in Hz). Top row: 1st-order SAM stimulus; bottom row: 2nd-order SAM stimulus. Panel annotations: “100% dip” and “0% depth (unmodulated)”.]
Fig. 1. Envelope waveforms and modulation spectra of a 1st- (top row) and 2nd-order (bottom row) SAM stimulus used as maskers in the present study. The 1st-order SAM stimulus is 100% amplitude modulated at 8 Hz. In case of the 2nd-order SAM stimulus, the modulation frequency of the carrier SAM fm and the modulation frequency of the sinusoidal depth variation fm′ are 128 and 8 Hz, respectively. The modulation depths m and m′ are fixed to 50%; therefore, stimulus modulation depth varies cyclically between 0 and 100%. Note that there is no energy in the modulation spectrum at the 2nd-order modulation frequency fm′ of 8 Hz. However, a salient envelope beat of 8 Hz can be heard. For comparison, the envelope of an unmodulated stimulus is shown in the upper left panel by a dashed line. Arrows indicate: (i) “dips” (i.e., temporal minima where stimulus modulation depth is 100%) in the envelope of both SAM masker stimuli that listeners presumably “listen into” to extract speech cues and (ii) unmodulated portions of the 2nd-order SAM masker (i.e., with a modulation depth of 0%) for which the SNR is identical to that of the unmodulated masker.
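The relationship between the envelopes and their modulation spectra described in the caption of Fig. 1 can be checked numerically. The following is a minimal sketch, not the authors' code: it builds both envelopes from the parameters given in the caption and evaluates a discrete Fourier transform at a few modulation frequencies. The envelope sampling rate, the 1-s duration, the zero starting phase, and all function names are assumptions made for illustration.

```python
import cmath
import math

FS = 1024        # envelope sampling rate in Hz (assumed for this sketch)
DURATION = 1.0   # seconds; gives 1-Hz spacing between DFT bins (assumed)


def first_order_envelope(fm, m=1.0, phase=0.0):
    """Envelope 1 + m*sin(2*pi*fm*t + phase) of a 1st-order SAM masker."""
    n = int(FS * DURATION)
    return [1.0 + m * math.sin(2 * math.pi * fm * k / FS + phase)
            for k in range(n)]


def second_order_envelope(fm, fm_prime, m_prime=0.5, phase=0.0):
    """Envelope whose modulation depth varies sinusoidally at fm_prime."""
    n = int(FS * DURATION)
    env = []
    for k in range(n):
        t = k / FS
        # Depth of the carrier SAM swings between 0.5 - m_prime and 0.5 + m_prime.
        depth = 0.5 + m_prime * math.sin(2 * math.pi * fm_prime * t + phase)
        env.append(1.0 + depth * math.sin(2 * math.pi * fm * t + phase))
    return env


def component_amplitude(env, freq):
    """Amplitude of the sinusoidal component of env at freq (Hz)."""
    n = len(env)
    acc = sum(x * cmath.exp(-2j * math.pi * freq * k / FS)
              for k, x in enumerate(env))
    return 2.0 * abs(acc) / n


# Parameters of the 2nd-order SAM stimulus in Fig. 1: fm = 128 Hz, fm' = 8 Hz.
env2 = second_order_envelope(fm=128, fm_prime=8)
for f in (8, 120, 128, 136):
    print(f, "Hz:", round(component_amplitude(env2, f), 3))
```

Under these assumptions the carrier component at 128 Hz has amplitude 0.5, the two sidebands at 120 and 136 Hz have amplitude 0.25 each (about 6 dB below the carrier), and the amplitude at the 8-Hz beat frequency is zero to within rounding error. This follows from the product-to-sum identity sin(a)·sin(b) = ½[cos(a − b) − cos(a + b)].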
should yield some degree of “pure” modulation masking produced by any or a combination of the three mechanisms listed above. For instance, some speech envelope components may be confused with the 2nd-order envelope fluctuations of the masker. Such a modulation-masking effect should be reflected by a decrease in speech-identification scores below those obtained with a steady-state noise masker. Moreover, the deleterious effect of the 2nd-order masker modulations should mainly be observed when the frequency of the 2nd-order modulation masker falls within the range of low modulation frequencies (i.e., 4–16 Hz) known to be important for speech identification (e.g., Houtgast and Steeneken, 1985; Drullman et al., 1994a,b; Shannon et al., 1995).
The present study aimed to further examine masking release and test the above suggestions by studying the effects of 1st- and 2nd-order temporal envelope fluctuations in a noise masker on consonant identification. Identification performance and patterns of information transmission were measured for young normal-hearing listeners in three types of background that differed only in their temporal envelope: steady-state speech-shaped noise, speech-shaped noise modulated by a 1st-order SAM with modulation frequencies ranging from 2 to 256 Hz, and speech-shaped
noise modulated by a 2nd-order SAM. When the masker modulator was a 2nd-order SAM, the frequency of the 1st-order SAM fm, acting as a carrier, was fixed at a value between 4 and 256 Hz, and the frequency of the 2nd-order SAM fm′ was systematically varied from 2 Hz to one or two octaves below fm. The speech stimuli used in these experiments were either left intact (unprocessed) or spectrally degraded (processed). Such a spectral degradation was intended to force listeners to identify speech using mainly temporal envelope cues, and therefore to highlight any potential masking effect between the 2nd-order modulation masker and speech modulation components.
2. Experiment 1 – unprocessed stimuli
2.1. Method
2.1.1. Listeners
Eight native French speakers participated in this experiment. Their ages ranged from 20 to 31 years (mean age = 23 years; SD = 3 years). All listeners had audiometric thresholds less than 20 dB HL at octave frequencies between 250 and 8000 Hz, and no history of hearing difficulty. Listeners were university students who were paid
for their services. All listeners were fully informed about the goal of the present study and provided written consent before their participation.¹
2.1.2. Stimuli
Forty-eight Vowel–Consonant–Vowel (VCV) stimuli were recorded digitally via a 16-bit A/D converter at a 44.1-kHz sampling frequency and equalized in rms power. These speech stimuli consisted of three exemplars of 16 /a/-C-/a/ utterances (C = /p,t,k,b,d,g,f,s,ʃ,m,n,r,l,v,z,ʒ/) read by a female native French speaker in quiet. For the masked test conditions, a speech-shaped noise (3-dB cutoff frequency: 140 Hz; rolloff: 6 dB/octave), refreshed in each interval, was added to each utterance. This noise masker was either:
(i) Steady (i.e., unmodulated).
(ii) Modulated by a sine-wave modulator; the expression describing this 1st-order SAM masker m1(t) was
$$m_1(t) = [1 + m \sin(2\pi f_m t + \phi)]\, n(t), \qquad (1)$$
where n(t) represents the noise carrier. The modulation depth m was fixed at 1 (i.e., 100%) and the 1st-order modulation frequency fm was 2, 4, 8, 16, 32, 64, 128, or 256 Hz. The starting phase of the modulation φ was randomized in each interval.
(iii) Modulated by a 2nd-order SAM modulator with a carrier modulation frequency fm. The modulation depth of the carrier SAM was sinusoidally modulated at a given 2nd-order modulation frequency fm′. The expression describing the 2nd-order SAM masker m2(t) was
$$m_2(t) = [1 + [0.5 + m' \sin(2\pi f_m' t + \phi)] \sin(2\pi f_m t + \phi)]\, n(t), \qquad (2)$$
where m′ is the modulation depth of modulation-depth variation, fixed at 0.5 (i.e., 50%), and φ represents the starting phase of modulations, randomized in each interval. As indicated in Eq. (2), the modulation depth of the carrier SAM varied periodically between 0% and 100% at fm′. The carrier modulation frequency fm was 4, 8, 16, 64, or 256 Hz. The 2nd-order modulation frequency fm′ was 2 Hz when fm = 4 Hz, 2 or 4 Hz when fm = 8 Hz, 2, 4, or 8 Hz when fm = 16 Hz, 2, 4, 8, 16, or 32 Hz when fm = 64 Hz, and 2, 4, 8, 16, 32, or 64 Hz when fm = 256 Hz.
In all experimental conditions, the noise masker was added to each utterance at a 15 dB (rms) signal-to-noise ratio (SNR). This SNR was determined in a preliminary experiment so as to yield consonant identification performance of about 50–60% correct in the presence of the steady-state noise masker. In each interval, the utterance and
1 This study was carried out in accordance with the Declaration of Helsinki.
the noise were of identical duration (mean duration = 648 ms; SD = 46 ms). All noise maskers were shaped using a raised-cosine function with 10-ms rise/fall times. Each stimulus was presented diotically to the listener via headphones (Sennheiser HD 565) and overall levels were calibrated to produce an average output level of 70 dB(A) for continuous speech in noise. More specifically, output levels at the headphones were measured using an artificial ear (Brüel & Kjær, Type 4153). Pseudo “continuous” speech segments were generated by repeating the different VCV utterances (with 10-ms inter-stimulus intervals) for about 10 s.
2.2. Procedure
Listeners were tested individually in a sound-attenuating booth using a single-interval, 16-alternative procedure without feedback. A PC controlled the course of the experiment. Each listener was instructed to identify the presented consonant. The 16 possible responses were displayed orthographically on a computer screen, and the listener entered his/her choice by selecting one response using a computer mouse. In each experimental run, a set of 48 VCV stimuli was presented in random order. At the end of each experimental run, the percentage of correct identification was calculated and a 16 × 16 confusion matrix was compiled. Prior to data collection, listeners received practice for a total of 1.5 h to familiarize them with the speech material and different masking conditions. Each listener was then tested in four 2-h sessions on nonconsecutive days. In a typical experimental session, the listener first completed an experimental run in quiet followed by the steady-state masker condition. The remaining 25 conditions were then completed in pseudo-random order alternating 1st- and 2nd-order SAM maskers.
2.3. Results
Mean identification scores across listeners obtained in quiet (filled star) and for steady-state (asterisk), 1st-order SAM (filled circles), and 2nd-order SAM (open symbols) noise maskers are presented in Fig. 2.
For the 1st-order SAM masker condition, identification performance is presented as a function of the 1st-order modulation frequency fm. For the 2nd-order SAM masker condition, identification performance is presented as a function of the 2nd-order SAM frequency fm′ for each carrier modulation frequency fm, as given in the figure key. Chance level always corresponds to 6.25% correct responses (1/16). For each experimental condition (quiet, steady-state masker, 1st-order SAM masker, 2nd-order SAM masker), a group matrix was constructed by summing the individual confusion matrices, yielding 96 presentations per consonant (8 listeners × 3 exemplars × 4 sessions). The specific reception of three speech features was evaluated by information transmission analysis (Miller and Nicely, 1955) on the aggregate confusion matrices. The results for the reception of
Fig. 2. Mean identification scores for unprocessed speech presented in quiet (filled star), or within a steady-state (asterisk), 1st-order SAM (filled circles), or 2nd-order SAM (open symbols) noise masker. For the 1st-order SAM masker condition, performance is plotted as a function of the 1st-order modulation frequency fm. For the 2nd-order SAM masker condition, performance is plotted as a function of the 2nd-order modulation frequency fm′ for each carrier modulation frequency fm, as given by the figure key. Error bars represent ± one standard deviation about the mean across listeners.
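The scoring described in the Procedure (a percentage of correct identification and a 16 × 16 confusion matrix per run, later summed into a group matrix) can be sketched as follows. This is an illustrative sketch rather than the software used in the study; the ASCII consonant labels (with "S" and "Z" standing in for the two post-alveolar fricatives), the function names, and the (presented, response) trial format are assumptions.

```python
# ASCII stand-ins for the 16 consonants (labels are an assumption of this sketch).
CONSONANTS = ["p", "t", "k", "b", "d", "g", "f", "s",
              "S", "m", "n", "r", "l", "v", "z", "Z"]
INDEX = {c: i for i, c in enumerate(CONSONANTS)}

CHANCE_LEVEL = 100.0 / len(CONSONANTS)  # 6.25% for 16 alternatives


def confusion_matrix(trials):
    """Count responses: rows are presented consonants, columns are responses."""
    size = len(CONSONANTS)
    matrix = [[0] * size for _ in range(size)]
    for presented, response in trials:
        matrix[INDEX[presented]][INDEX[response]] += 1
    return matrix


def percent_correct(matrix):
    """Percentage of trials on the main diagonal of a confusion matrix."""
    total = sum(sum(row) for row in matrix)
    correct = sum(matrix[i][i] for i in range(len(matrix)))
    return 100.0 * correct / total


def sum_matrices(matrices):
    """Element-wise sum of individual matrices, giving a group matrix."""
    size = len(matrices[0])
    return [[sum(m[i][j] for m in matrices) for j in range(size)]
            for i in range(size)]


# Toy run: 3 exemplars per consonant, all correct except /b/ reported as /m/.
run = [(c, "m" if c == "b" else c) for c in CONSONANTS for _ in range(3)]
matrix = confusion_matrix(run)
print(percent_correct(matrix))  # 45 of 48 trials correct
```

Summing the matrices of all listeners and sessions with `sum_matrices` gives the kind of aggregate matrix on which the feature analysis is carried out.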
voicing (a), place of articulation (b), and manner (c) as a function of experimental condition are given in Fig. 3.
2.3.1. First-order SAM maskers
As shown in Fig. 2, all listeners identified VCV stimuli perfectly when presented in quiet, but scores reached only 57% correct identification when the steady-state noise masker was added. The results obtained for a 1st-order SAM masker showed a clear release from masking: while identification performance in the presence of a very rapidly fluctuating noise (fm = 256 Hz) was almost identical to that obtained with a steady-state noise, identification performance increased progressively with decreasing modulation frequency of the SAM masker down to 8 Hz (i.e., with increasing duration of the temporal dips). In agreement with previous studies using different speech material and masker parameters (Gustafsson and Arlinger, 1994; Nelson et al., 2003), maximum performance (here corresponding to 35% masking release) was obtained for fm of 8 and 16 Hz, and performance tended to decrease again as the masker fluctuations got slower (fm < 8 Hz). A one-way repeated-measures analysis of variance (ANOVA) followed by Tukey HSD tests was used to estimate the significance of the differences in identification scores across the different (unmodulated and modulated) noise
Fig. 3. Percentage of transmitted information of unprocessed speech for voicing (a), place of articulation (b), and manner (c) in the presence of a steady-state noise masker or as a function of the 1st- or 2nd-order modulation frequency, fm or fm′, of a fluctuating noise masker.
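The percentage of transmitted information plotted in Fig. 3 comes from an information transmission analysis in the sense of Miller and Nicely (1955): the confusion matrix is collapsed by feature value, and the transmitted information is expressed relative to the input entropy. The sketch below is a generic illustration, not the analysis code used in the study; the function name and the toy four-consonant example are assumptions, and the feature assignment actually used for the 16 consonants is not given in the text.

```python
import math


def relative_transmitted_information(matrix, labels, feature_of):
    """Return T(x;y) / H(x) for one feature.

    matrix[i][j] counts presentations of labels[i] answered as labels[j];
    feature_of maps each label to its feature value (e.g. voiced/unvoiced).
    At least two feature values must occur among the presented stimuli.
    """
    values = sorted(set(feature_of.values()))
    pos = {v: k for k, v in enumerate(values)}
    n_val = len(values)

    # Collapse the consonant matrix into a feature-by-feature matrix.
    collapsed = [[0.0] * n_val for _ in range(n_val)]
    for i, stim in enumerate(labels):
        for j, resp in enumerate(labels):
            collapsed[pos[feature_of[stim]]][pos[feature_of[resp]]] += matrix[i][j]

    total = sum(sum(row) for row in collapsed)
    p_in = [sum(row) / total for row in collapsed]
    p_out = [sum(collapsed[i][j] for i in range(n_val)) / total
             for j in range(n_val)]

    # Mutual information between presented and reported feature values (bits).
    transmitted = 0.0
    for i in range(n_val):
        for j in range(n_val):
            p_ij = collapsed[i][j] / total
            if p_ij > 0:
                transmitted += p_ij * math.log2(p_ij / (p_in[i] * p_out[j]))

    input_entropy = -sum(p * math.log2(p) for p in p_in if p > 0)
    return transmitted / input_entropy


# Toy example: place confusions occur, but voicing is never confused.
labels = ["p", "b", "t", "d"]
voicing = {"p": "unvoiced", "b": "voiced", "t": "unvoiced", "d": "voiced"}
toy = [[6, 0, 4, 0],
       [0, 7, 0, 3],
       [5, 0, 5, 0],
       [0, 2, 0, 8]]
print(relative_transmitted_information(toy, labels, voicing))
```

In the toy matrix every error stays within the same voicing class, so the relative transmitted information for voicing is 1 (i.e., 100%), even though only 26 of the 40 responses are correct. This is the sense in which feature reception can dissociate from overall identification scores.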
masker conditions. An arcsine transform was applied to the proportions of correct responses in order to make the identification scores follow a normal distribution. The analysis showed a significant main effect of condition [F(8, 56) = 135.8, p < 0.001]. Post hoc comparisons indicated that identification scores obtained with fm = 256 Hz and a steady-state noise did not differ significantly (p = 0.97), whereas all other fm yielded significantly better identification scores than the steady-state noise (all p < 0.001). Moreover, performance at 8 Hz differed significantly from performance obtained for the other fm (all p < 0.005), except when fm = 16 Hz (p = 0.72), confirming the visual impression of maximum masking release at these two modulation frequencies.
Overall, these data replicate previous results obtained by Gustafsson and Arlinger (1994) using Swedish sentences in either a steady-state or a 100%, 1st-order SAM speech-shaped noise masker presented at a 15 dB SNR. They found maximum release from masking of about 50% for
fm between 10 and 20 Hz. The general characteristics of masking release are therefore very consistent across the two studies despite differences in terms of speech material and language (French VCVs vs. Swedish sentences), with the exception that masking release in the present study was still observed at fm = 128 Hz. This discrepancy may be due to differences in the experimental paradigm: a closed-set format was used here while an open-set format was used by Gustafsson and Arlinger (1994).
Interestingly, Fig. 3 reveals that the introduction of (1st-order) temporal fluctuations in the background noise (filled circles) does not yield identical improvements in the reception of the three speech features. Consistent with the identification scores, the reception of voicing (a) and of manner (c) peaks at fm = 8 Hz, whereas highest reception of place of articulation (b) is found for fm = 32 Hz. It is also noteworthy that, despite a modest peak at 8 Hz, the reception of voicing remains relatively constant over the whole range of low fm and only decreases beyond 32 Hz, while the reception of place and manner decreases below and above their respective peaks.
Detailed analysis of the confusion matrices reveals that the perception of the fricatives /f,s,ʃ,v,z,ʒ/ is relatively unaffected by the introduction of 1st-order envelope fluctuations in the masking noise. Overall, confusions mainly occur between plosives, nasals, and liquids of identical place of articulation for fm < 64 Hz. The release from masking corresponds mainly to a reduction in the confusion between the plosives /b,d,g/ on the one hand, and the nasals /m,n/ and the liquid /l/ on the other hand.
2.3.2. Second-order SAM maskers
The results obtained for 2nd-order SAM maskers show that: (i) identification scores systematically fall between those obtained with a steady-state and the 1st-order SAM masker, (ii) for a given 1st-order (carrier) modulation frequency fm, identification scores are not affected by changes in the 2nd-order modulation frequency fm′, (iii) when averaged over the different values of fm′, identification performance is best for a carrier modulation frequency of 8 Hz and decreases for lower and higher values of fm, consistent with the data obtained with a 1st-order SAM masker, and (iv) identification performance for fm = 256 Hz (on average 58% correct) is very close to that obtained in the steady-state noise condition (57%).
For each carrier modulation frequency (except the lowest fm, for which only one value of fm′ was used), a one-way repeated-measures ANOVA was conducted on the arcsine transformed proportions of correct responses obtained in the presence of a 2nd-order SAM masker, with fm′ (2–6 levels) as the within-subjects factor. Analyses for fm = 8–64 Hz show no significant main effect of 2nd-order modulation frequency fm′ (all F non-significant at p > 0.17). However, when fm = 256 Hz, the main effect of 2nd-order modulation frequency fm′ was just significant [F(5, 35) = 2.58, p = 0.043]. Given the general absence of a significant main effect of 2nd-order modulation frequency, a second one-way repeated-measures ANOVA was performed
to determine the statistical significance of the differences between average identification performance for the various carrier modulation frequencies. This analysis, which included identification scores from the steady-state noise condition to quantify masking release, showed a significant main effect of carrier modulation frequency [F(5, 35) = 147.5, p < 0.001]. Subsequent post hoc comparisons revealed that: (i) identification scores in the presence of a 4-Hz carrier modulation were not significantly different from those obtained with fm = 8 (p = 0.058) and 16 Hz (p = 0.66) and (ii) identification scores in the steady-state noise condition differed significantly from those measured when a 2nd-order SAM masker was used (all p < 0.001), except when fm = 256 Hz (p = 0.96).

Inspection of the information transmission analysis for conditions involving 2nd-order SAM maskers (open symbols) indicates no systematic differences across speech features as a function of fm or fm0. Detailed analysis of confusion matrices revealed that, for each value of fm0, a decrease in fm from 256 to 4–8 Hz was mainly associated with a progressive reduction in the confusion between the plosives /p,t,b,d,g/ on the one hand, and the nasals /m,n/ and the liquid /l/ on the other hand. Moreover, the general pattern of confusions was relatively unaffected by changes in fm0.

3. Experiment 2 – processed stimuli

The above experiment did not provide any clear indication of a deleterious effect of the 2nd-order SAM masker on consonant perception due to modulation masking: average identification scores measured in the presence of a 2nd-order SAM masker with fm = 256 Hz were never lower than those measured with a steady-state noise.2 However, such a deleterious effect may have been obscured by the fact that listeners could also rely on spectral cues to achieve speech identification, thereby minimizing the potential interference caused by the 2nd-order SAM masker on auditory processing of the speech envelope. A second experiment therefore replicated the above experiment with spectrally degraded VCV stimuli, produced using a 4-channel noise vocoder identical to that described in Shannon et al. (1995). Spectral degradation was intended to force listeners to rely mainly on temporal-envelope cues when identifying the VCV stimuli, and therefore to allow a more direct examination of the modulation-masking effect.

2 It is noteworthy that one listener showed consistently worse identification scores in the presence of the 2nd-order SAM masker with fm = 256 Hz than in the steady-state noise condition. On average, this modulation masking effect was only about 4%, but increased to 7% at fm = 32 Hz.

3.1. Method

3.1.1. Listeners

Seven native French speakers aged 21–28 years (mean age = 24 years; SD = 2 years) were tested. One listener had participated in the previous experiment, but none was familiar with the noise-vocoded stimuli. All listeners had audiometric thresholds less than 20 dB HL at octave frequencies between 250 and 8000 Hz, and no history of hearing difficulty. Listeners were university students who were paid for their services. All listeners were fully informed about the goal of the present study and provided written consent before their participation.3

3.1.2. Stimuli and procedure

Spectrally degraded speech stimuli were constructed from the set of unprocessed speech stimuli used in Experiment 1. Each digitized signal was lowpass filtered at 5 kHz (1st-order Butterworth filter). The signal was then split into four broad frequency bands (3rd-order elliptical IIR filters): 20–800, 800–1500, 1500–2500, and 2500–5000 Hz. Adjacent filters overlapped at the point at which the output from each filter was 15 dB down from the level in the passband. In each frequency band, the envelope was extracted by half-wave rectification and lowpass filtering at 500 Hz (1st-order Butterworth filter) of the bandpass filtered signal. The resulting envelope was used to modulate a white noise. The modulated noise was then frequency-limited by filtering with the same bandpass filters as used to create the original analysis band. Finally, the resulting modulated noises from each band were combined and lowpass filtered at 5 kHz. In each experimental condition, a steady-state or fluctuating speech-shaped noise masker was added to the processed speech stimulus at 0 dB (rms) SNR (i.e., speech processing was performed before the addition of the noise masker). This SNR was determined in a preliminary experiment so as to yield identification performance of at least 13% correct for each listener when the speech-shaped noise was steady (each listener was initially trained between 3 and 5 h in this experimental condition).
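As a rough illustration of the envelope-extraction stage described in the stimulus processing above (half-wave rectification followed by 1st-order lowpass filtering at 500 Hz), the following Python sketch processes the output of one analysis band. It is not the authors' implementation: the function name `extract_envelope`, the 16-kHz sampling rate, and the bilinear-transform design of the 1st-order Butterworth filter are assumptions made here for illustration. The bandpass analysis filters and the noise-carrier stage are omitted.

```python
import math

def extract_envelope(band_signal, fs=16000.0, cutoff_hz=500.0):
    """Half-wave rectify a band signal, then lowpass filter it.

    fs is an assumed sampling rate (not stated in this section).
    The lowpass is a 1st-order Butterworth filter obtained with the
    bilinear transform; its gain at 0 Hz is exactly 1.
    """
    k = math.tan(math.pi * cutoff_hz / fs)
    b = k / (1.0 + k)            # feedforward coefficient (b0 == b1)
    a = (k - 1.0) / (k + 1.0)    # feedback coefficient
    envelope = []
    prev_x = 0.0
    prev_y = 0.0
    for sample in band_signal:
        x = max(sample, 0.0)     # half-wave rectification
        y = b * x + b * prev_x - a * prev_y
        envelope.append(y)
        prev_x, prev_y = x, y
    return envelope
```

In a full vocoder of the kind described, the returned envelope would then multiply a white-noise carrier sample by sample before the band is refiltered and summed with the other bands.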
The 13%-correct criterion, approximately twice the chance level, was adopted to avoid a floor effect that would have made it impossible to observe any modulation masking. The apparatus, procedure, background noises, and presentation level were identical to those described in Experiment 1, except that only three carrier modulation frequencies of the 2nd-order SAM masker were tested (fm = 4, 16, or 256 Hz).

3 This study was carried out in accordance with the Declaration of Helsinki.

3.1.3. Results

Mean identification scores across listeners are presented in Fig. 4 for the different noise maskers. The reception of voicing (a), place of articulation (b), and manner (c) was evaluated by information transmission analysis on the aggregate confusion matrices, and is presented in Fig. 5. Fig. 4 shows that listeners identified spectrally degraded consonants in the presence of a steady-state noise (asterisk) at levels substantially above chance performance, on average about 38% correct. This score is compatible with previous results obtained by Lorenzi et al. (1999) in very similar experimental conditions (amount of training, SNR, speech material). The large inter-listener variability is due to relatively low identification performance (between 11 and 24% depending on the experimental condition) for two listeners and performance exceeding 60% correct for the listener who had already participated in Experiment 1. The remaining four listeners showed homogeneous identification scores in the range of 35–40%.

Fig. 4. Mean identification scores for processed (i.e., spectrally degraded) speech presented within a steady-state (asterisk), 1st-order SAM (filled circles), or 2nd-order SAM (open symbols) noise masker. Otherwise as in Fig. 2, except that only three carrier modulation frequencies (fm = 4, 16, or 256 Hz) were used in the 2nd-order SAM masker condition.

In contrast to results for unprocessed speech, identification performance for severely spectrally degraded speech was similar in all experimental conditions; that is, steady-state, 1st-order SAM, and 2nd-order SAM noise maskers produced the same effects on consonant identification. The absence of masking release is in line with data recently reported by Nelson et al. (2003) and Nelson and Jin (2004) with 4-channel noise-vocoded sentences. One-way repeated-measures ANOVAs were performed to estimate the significance of the differences in identification scores across 1st-order modulation frequencies in the case of the 1st-order SAM masker or across 2nd-order modulation frequencies for carrier modulation frequencies of 16 and 256 Hz. An arcsine transform was applied to the proportions of correct responses. Analyses showed no significant main effect of 1st- or 2nd-order modulation frequency (all F < 1, NS), except for a carrier modulation frequency of 16 Hz, where the main effect of 2nd-order modulation frequency was just significant [F(2, 12) = 3.99, p = 0.047]. Finally, a one-way repeated-measures ANOVA was conducted to determine the statistical significance of differences between the five masking conditions (with scores being averaged over fm or fm0 for the 1st- and 2nd-order SAM noises, respectively). There was no significant difference between results for the different maskers [F(4, 24) < 1, NS].

Fig. 5 shows that the introduction of 1st- and 2nd-order temporal-envelope fluctuations in the background noise did not yield any substantial change in the reception of the three speech features: overall, the reception of voicing, place, and manner appears roughly similar for all types of noise masker. Nevertheless, the reception of voicing (a) improved by about 8% relative to the steady-state noise condition for 1st-order SAM maskers below 64 Hz and for 2nd-order SAM maskers with low carrier modulation frequencies, replicating somewhat the pattern of results observed for unprocessed speech. A very modest improvement of about 5% is also apparent in the reception of manner (c) with increasing 1st- and 2nd-order modulation frequency. Detailed inspection of the confusion matrices revealed that the observed increases in transmitted information were mainly due to a reduction in the confusion between: (i) the voiced and unvoiced fricatives /ʒ/ and /ʃ/ and (ii) the plosives /p,b/ and the fricative /f/ on the one hand, and the nasals /m,n/ on the other hand.

Fig. 5. Percentage of transmitted information of spectrally degraded speech received for voicing (a), place of articulation (b), and manner (c) in the presence of a steady-state noise masker or as a function of the 1st- or 2nd-order modulation frequency, fm or fm0, of a fluctuating noise masker.

4. Summary and discussion

4.1. Masking release with spectral cues
In agreement with previous studies, identification of unprocessed VCV stimuli was typically better for noises with (1st-order) sinusoidal temporal-envelope fluctuations than for a steady-state noise, demonstrating that listeners were able to take advantage of temporal minima in the noise background as short as 4 ms (i.e., with fm = 128 Hz) to extract speech cues; maximum masking release of 35% occurred for a modulation frequency of 8 Hz. Identification performance was further characterized using information transmission analysis. This approach may contribute to a better understanding of the release from masking by showing that this phenomenon mainly corresponded to a reduction in the confusion between plosives, nasals, and liquids. In addition, although temporal minima or "glimpses" in the noise of about 63 ms (corresponding to fm = 8 Hz) yielded maximum consonant identification scores, minima of different durations were required for optimal reception of voicing [125–16 ms (fm = 4–32 Hz)], place of articulation [16 ms (fm = 32 Hz)], and manner [63 ms (fm = 8 Hz)]. These estimates are compatible with certain time frames known to be important for speech perception. For instance, dynamic formant transitions last about 30 ms (Calliope, 1989), and correspond to an important acoustic cue for place of articulation (e.g., Hazan and Rosen, 1991). Moreover, temporal modulation periods critical for speech identification in noise range between 31 and 125 ms (Drullman et al., 1994a,b). Taken together, these data suggest that short glimpses of about 16 ms are optimal for speech perception mechanisms using mainly spectral or temporal fine-structure information (as in the case of place of articulation), while much longer glimpses of about 63 ms are optimal for mechanisms using mainly temporal-envelope information (as in the case of manner).
Voicing may then represent an intermediate case involving both spectral and temporal information (i.e., place, temporal fine-structure, and envelope information).

4.2. Masking release without spectral cues

Severely degrading the spectral resolution of the VCV stimuli using a 4-channel speech vocoder yielded almost identical identification performance for the steady-state and fluctuating noise conditions. Despite this apparent lack of masking release in the identification scores, an improvement of 8% was observed with slow 1st-order modulation frequencies for the reception of voicing, while the reception of manner was very modestly increased at modulation frequencies above 2 Hz. Slight as these effects may be, they illustrate the potential advantage of using information transmission analysis in addition to summary scores of phoneme perception.

The absence of a release from masking when the background was a 1st-order SAM noise is somewhat surprising. However, this finding is broadly in agreement with the outcome of studies conducted with patients showing poor frequency selectivity or limited access to spectral cues (e.g., Gustafsson and Arlinger, 1994; Eisenberg et al., 1995; Bacon et al., 1998; Peters et al., 1998; Nelson et al., 2003): moderately to severely hearing-impaired listeners and cochlear-implant users consistently show substantially less masking release for fluctuating maskers than normal-hearing listeners. The present finding is also consistent with recent data obtained from normal-hearing listeners using similar speech-processing schemes (Nelson et al., 2003; Nelson and Jin, 2004; Qin and Oxenham, 2003; Zeng et al., 2005). These authors proposed that, when spectral resolution is severely impoverished and/or no temporal fine-structure information is available, listeners cannot use pitch cues and the incoherence of envelope cues across the spectrum to segregate speech from the background. However, the ability to distinguish between signal and masker is crucial in order to benefit from the temporal minima in the masker.

In contrast to Kwon and Turner (2001), the present data did not show any "direct" evidence for modulation masking in terms of poorer identification performance in the modulated than in the unmodulated noise condition. This discrepancy may be due to the fact that broadband noise and speech stimuli were used here, whereas Kwon and Turner (2001) found masking when the bandlimited speech and masker stimuli were remote in audio frequency.
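To make concrete how information transmission analysis can reveal feature-level effects that summary scores hide, the following Python sketch computes relative transmitted information for one feature from a confusion matrix. It assumes the standard measure associated with Miller and Nicely (1955), namely the mutual information between stimulus and response categories divided by the stimulus entropy. The function name, the dictionary-based data layout, and the feature labels are hypothetical, and this is not the authors' analysis code.

```python
import math

def relative_transmitted_info(confusions, feature_of):
    """Relative transmitted information (0 to 1) for one feature.

    confusions: dict mapping (stimulus, response) -> count.
    feature_of: dict mapping each consonant to its feature category
                (hypothetical labels such as 'voiced' / 'unvoiced').
    """
    # Collapse the consonant confusion matrix onto feature categories.
    joint = {}
    for (stim, resp), count in confusions.items():
        key = (feature_of[stim], feature_of[resp])
        joint[key] = joint.get(key, 0) + count
    total = sum(joint.values())

    # Marginal probabilities of stimulus and response categories.
    p_stim, p_resp = {}, {}
    for (s, r), count in joint.items():
        p_stim[s] = p_stim.get(s, 0.0) + count / total
        p_resp[r] = p_resp.get(r, 0.0) + count / total

    # Mutual information between stimulus and response categories (bits).
    transmitted = 0.0
    for (s, r), count in joint.items():
        if count > 0:
            p = count / total
            transmitted += p * math.log2(p / (p_stim[s] * p_resp[r]))

    # Normalize by the stimulus entropy (bits).
    stim_entropy = -sum(p * math.log2(p) for p in p_stim.values() if p > 0)
    return transmitted / stim_entropy if stim_entropy > 0 else 0.0
```

With this measure, a listener who confuses /p/ with /t/ on every trial scores 0% correct on those consonants yet still receives voicing perfectly, because both are unvoiced. This is the kind of effect that percent-correct scores alone cannot show.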
4.3. Masking release with 2nd-order masker modulations

For both unprocessed and processed speech, noise backgrounds with 2nd-order envelope fluctuations did not yield more masking than steady-state noise backgrounds; this did not change when the 2nd-order modulations fell within the 4- to 16-Hz range critical for speech perception. Therefore, it is possible that a 10–13 dB modulation-masking effect caused by a 2nd-order modulation masker, as reported in previous psychoacoustical studies, is not sufficient to yield degradation in the processing of speech-envelope components. The data obtained with 2nd-order SAM maskers and processed speech stimuli indicated that identification scores were not influenced by the temporal characteristics of the masker. Similar results were obtained with and without 2nd-order SAM. It is likely that, as in the conditions using 1st-order SAM maskers, listeners were unable to use 2nd-order envelope cues to segregate the speech from the background. However, as for 1st-order SAM maskers, an improvement of 8% was observed with slow 2nd-order modulation frequencies for the reception of voicing, confirming the potential advantage of using information transmission analysis.

Finally, the data obtained with 2nd-order SAM maskers and unprocessed speech stimuli indicated that identification scores and information transmission were influenced by two main factors: (i) the frequency of the 1st-order (carrier) SAM (here performance and reception of the three speech features followed results obtained for a 1st-order SAM masker, with highest scores occurring for fm between 8 and 32 Hz) and (ii) the modulation depth of the 1st-order (carrier) SAM, which increases and decreases cyclically at a slow frequency fm0 (for each value of fm, consonant identification and consonant-feature reception measured for a 2nd-order SAM masker were always lower than or equal to those measured for a 1st-order SAM masker of identical modulation frequency). In line with the outcome of previous studies investigating masking release with temporally complex backgrounds (e.g., Gustafsson and Arlinger, 1994; Bacon et al., 1998), this study suggests that: (i) identification performance in background noise showing simple (1st-order) or complex (2nd-order) envelope fluctuations is strongly determined by the ability to glimpse into the noise valleys, and (ii) the duration and number of these valleys are crucial factors for the level of identification performance and consonant-feature reception.

Acknowledgements

This research was supported by a MENRT grant to C. Füllgrabe, a grant from the Institut Universitaire de France to C. Lorenzi, and a PICTI grant to F. Berthommier and C. Lorenzi. The authors thank the Editor-in-Chief A.R. Møller and B.C.J. Moore for very helpful comments on an earlier version of the manuscript.

References

Bacon, S.P., Grantham, D.W., 1989. Modulation masking patterns: effects of modulation frequency, depth, and phase. J. Acoust. Soc. Am. 85, 2575–2580.
Bacon, S.P., Opie, J.M., Montoya, D.Y., 1998. The effects of hearing loss and noise masking on the masking release for speech in temporally complex background. J. Speech Lang. Hear. Res. 41, 549–563.
Bregman, A.S., 1990. Auditory Scene Analysis. MIT Press, Cambridge, MA.
Calliope, 1989. La parole et son traitement automatique. Masson, Paris.
Dau, T., Kollmeier, B., Kohlrausch, A., 1997. Modeling auditory processing of amplitude modulation. I. Modulation detection and masking with narrow-band carriers. J. Acoust. Soc. Am. 102, 2892–2905.
Drullman, R., Festen, J.M., Plomp, R., 1994a. Effect of temporal envelope smearing on speech reception. J. Acoust. Soc. Am. 95, 1053–1064.
Drullman, R., Festen, J.M., Plomp, R., 1994b. Effect of reducing slow temporal modulations on speech reception. J. Acoust. Soc. Am. 95, 2670–2680.
Dubno, J.R., Horwitz, A.R., Ahlstrom, J.B., 2002. Benefit of modulated maskers for speech recognition by younger and older adults with normal hearing. J. Acoust. Soc. Am. 111, 2897–2907.
Duquesnoy, A.J., 1983. Effect of a single interfering noise or speech source upon the binaural sentence intelligibility of aged persons. J. Acoust. Soc. Am. 74, 739–743.
Eisenberg, L.S., Dirks, D.D., Bell, T.S., 1995. Speech recognition in amplitude-modulated noise of listeners with normal and listeners with impaired hearing. J. Speech Lang. Hear. Res. 38, 222–233.
Ewert, S.D., Dau, T., 2000. Characterizing frequency selectivity for envelope fluctuations. J. Acoust. Soc. Am. 108, 1181–1196.
Ewert, S.D., Verhey, J.L., Dau, T., 2002. Spectro-temporal processing in the envelope frequency domain. J. Acoust. Soc. Am. 112, 2921–2931.
Festen, J.M., 1993. Contributions of comodulation masking release and temporal resolution to the speech-reception threshold masked by an interfering noise. J. Acoust. Soc. Am. 94, 1295–1300.
Festen, J.M., Plomp, R., 1990. Effects of fluctuating noise and interfering speech on the speech reception threshold for impaired and normal hearing. J. Acoust. Soc. Am. 88, 1725–1736.
Füllgrabe, C., Lorenzi, C., 2003. The role of envelope beat cues in the detection and discrimination of second-order amplitude modulation. J. Acoust. Soc. Am. 113, 49–52.
Füllgrabe, C., Moore, B.C.J., Demany, L., Ewert, S.D., Sheft, S., Lorenzi, C., 2005. Modulation masking with 2nd-order modulators. J. Acoust. Soc. Am. 117, 2158–2168.
Gustafsson, H.Å., Arlinger, S.D., 1994. Masking of speech by amplitude-modulated noise. J. Acoust. Soc. Am. 95, 518–529.
Hall, J.W., Grose, J.H., Haggard, M.P., 1988. Comodulation masking release for multicomponent signals. J. Acoust. Soc. Am. 83, 677–686.
Hazan, V., Rosen, S., 1991. Individual variability in the perception of cues to place contrasts in initial stops. Percept. Psychophys. 49, 187–200.
Houtgast, T., 1989. Frequency selectivity in amplitude-modulation detection. J. Acoust. Soc. Am. 85, 1676–1680.
Houtgast, T., Steeneken, H.J.M., 1985. A review of the MTF concept in room acoustics and its use for estimating speech intelligibility in auditoria. J. Acoust. Soc. Am. 77, 1069–1077.
Howard-Jones, P.A., Rosen, S., 1993a. The perception of speech in fluctuating noise. Acustica 78, 258–272.
Howard-Jones, P.A., Rosen, S., 1993b. Uncomodulated glimpsing in "checkerboard" noise. J. Acoust. Soc. Am. 93, 2915–2922.
Kwon, B.J., Turner, C.W., 2001. Consonant identification under maskers with sinusoidal modulation: masking release or modulation interference? J. Acoust. Soc. Am. 110, 1130–1140.
Lorenzi, C., Soares, C., Vonner, T., 2001a. Second-order temporal modulation transfer functions. J. Acoust. Soc. Am. 110, 1030–1038.
Lorenzi, C., Berthommier, F., Apoux, F., Bacri, N., 1999. Effects of envelope expansion on speech recognition. Hear. Res. 136, 131–138.
Lorenzi, C., Simpson, M.I.G., Millman, R.E., Griffiths, T.D., Woods, W.P., Rees, A., Green, G.G.R., 2001b. Second-order modulation detection thresholds for pure-tone and narrow-band noise carriers. J. Acoust. Soc. Am. 110, 2470–2478.
Miller, G.A., Licklider, J.C.R., 1950. The intelligibility of interrupted speech. J. Acoust. Soc. Am. 22, 167–173.
Miller, G.A., Nicely, P.E., 1955. Analysis of perceptual confusions among some English consonants. J. Acoust. Soc. Am. 27, 338–352.
Millman, R.E., Lorenzi, C., Apoux, F., Füllgrabe, C., Green, G.G.R., Bacon, S.P., 2002. Effect of duration on amplitude-modulation masking. J. Acoust. Soc. Am. 111, 2551–2554.
Moore, B.C.J., Sek, A., Glasberg, B.R., 1999. Modulation masking produced by beating modulators. J. Acoust. Soc. Am. 106, 908–918.
Munson, B., Donaldson, G.S., Allen, S.L., Collison, E.A., Nelson, D.A., 2003. Patterns of phoneme perception errors by listeners with cochlear implants as a function of overall speech perception ability. J. Acoust. Soc. Am. 113, 925–935.
Nelson, P.B., Jin, S.-H., 2004. Factors affecting speech understanding in gated interference: cochlear implant users and normal-hearing listeners. J. Acoust. Soc. Am. 115, 2286–2294.
Nelson, P.B., Jin, S.-H., Carney, A.E., Nelson, D.A., 2003. Understanding speech in modulated interference: cochlear implant users and normal-hearing listeners. J. Acoust. Soc. Am. 113, 961–968.
Peters, R.W., Moore, B.C.J., Baer, T., 1998. Speech reception thresholds in noise with and without spectral and temporal dips for hearing-impaired and normally hearing people. J. Acoust. Soc. Am. 103, 577–587.
Powers, G.L., Speaks, C., 1973. Intelligibility of temporally interrupted speech. J. Acoust. Soc. Am. 54, 661–667.
Powers, G.L., Wilcox, J.C., 1977. Intelligibility of temporally interrupted speech with and without intervening noise. J. Acoust. Soc. Am. 61, 195–199.
Qin, M.K., Oxenham, A.J., 2003. Effects of simulated cochlear-implant processing on speech reception in fluctuating maskers. J. Acoust. Soc. Am. 114, 446–454.
Sek, A., Moore, B.C.J., 2003. Testing the concept of a modulation filter bank: the audibility of component modulation and detection of phase change in three-component modulators. J. Acoust. Soc. Am. 113, 2801–2811.
Sek, A., Moore, B.C.J., 2004. Estimation of the level and phase of the simple distortion tone in the modulation domain. J. Acoust. Soc. Am. 116, 3031–3037.
Shannon, R.V., Zeng, F., Kamath, V., Wygonski, J., Ekelid, M., 1995. Speech recognition with primarily temporal cues. Science 270, 303–304.
Sheft, S., Yost, W.A., 1997. Modulation detection interference with two-component masker modulators. J. Acoust. Soc. Am. 102, 1106–1112.
Shofner, W.P., Sheft, S., Guzman, S.J., 1996. Responses of ventral cochlear nucleus units in the chinchilla to amplitude modulation by low-frequency, two-tone complexes. J. Acoust. Soc. Am. 99, 3592–3605.
Summers, V., Molis, M.R., 2004. Speech recognition in fluctuating and continuous maskers: effects of hearing loss and presentation level. J. Speech Lang. Hear. Res. 47, 245–256.
Takahashi, G.A., Bacon, S.P., 1992. Modulation detection, modulation masking, and speech understanding in noise in the elderly. J. Speech Hear. Res. 35, 1410–1421.
Verhey, J.L., Ewert, S.D., Dau, T., 2003. Modulation masking produced by complex tone modulators. J. Acoust. Soc. Am. 114, 2135–2146.
Warren, R.M., 1970. Perceptual restoration of missing speech sounds. Science 167, 392–393.
Yost, W.A., Sheft, S., Opie, J., 1989. Modulation interference in detection and discrimination of amplitude modulation. J. Acoust. Soc. Am. 86, 2138–2147.
Zeng, F.-G., Nie, K., Stickney, G.S., Kong, Y.-Y., Vongphoe, M., Bhargave, A., Wei, C., Cao, K., 2005. Speech recognition with amplitude and frequency modulations. Proc. Natl. Acad. Sci. USA 102, 2293–2298.