Hearing Research
Hearing Research 225 (2007) 11–24
www.elsevier.com/locate/heares
Research paper
Auditory stream segregation of tone sequences in cochlear implant listeners ☆
Huw R. Cooper *, Brian Roberts
School of Psychology, University of Birmingham, Edgbaston, Birmingham B15 2TT, UK
Psychology, School of Life and Health Sciences, Aston University, Birmingham B4 7ET, UK
Received 3 August 2006; received in revised form 30 October 2006; accepted 27 November 2006
Available online 24 January 2007
Abstract
Previous claims that auditory stream segregation occurs in cochlear implant listeners are based on limited evidence. In experiment 1, eight listeners heard tones presented in a 30-s repeating ABA-sequence, with frequencies matching the centre frequencies of the implant’s 22 electrodes. Tone A always stimulated electrode 11 (centre of the array); tone B stimulated one of the others. Tone repetition times (TRTs) from 50 to 200 ms were used. Listeners reported when they heard one or two streams. The proportion of time that each sequence was reported as segregated was consistently greater with increased electrode separation. However, TRT had no significant effect, and the perceptual reversals typical of normal-hearing listeners rarely occurred. The results may reflect channel discrimination rather than stream segregation. In experiment 2, six listeners performed a pitch-ranking task using tone pairs (reference = electrode 11). Listeners reported which tone was higher in pitch (or brighter in timbre) and their confidence in the pitch judgement. Similarities were observed in the individual pattern of results for reported segregation and pitch discrimination. Many implant listeners may show little or no sign of automatic stream segregation owing to the reduced perceptual space within which sounds can differ from one another.
© 2006 Elsevier B.V. All rights reserved.
Keywords: Auditory grouping; Stream segregation; Pitch discrimination; Cochlear implants
☆ Summaries of this research were presented at the British Society of Audiology, Short Papers on Experimental Studies of Hearing and Deafness, University College London, September 2004 (pp. 140–142 in book of abstracts) and at the Conference on Implantable Auditory Prostheses, Asilomar, Pacific Grove, CA, August 2005 (p. 119 in book of abstracts).
* Corresponding author. Address: Psychology, School of Life and Health Sciences, Aston University, Birmingham B4 7ET, UK. Tel.: +44 121 627 8106; fax: +44 121 627 8914. E-mail address: [email protected] (H.R. Cooper).
0378-5955/$ - see front matter © 2006 Elsevier B.V. All rights reserved. doi:10.1016/j.heares.2006.11.010

1. Introduction
The process of auditory stream segregation and its properties have been well documented and investigated in normal-hearing listeners (e.g., Miller and Heise, 1950; Bregman and Campbell, 1971; Van Noorden, 1975; Bregman, 1990). Sequential grouping has been described as the process by which the auditory system groups together sounds that are similar to one another in preference to
sounds that follow one another immediately in time (Bregman, 1990). When listening to sequences of sounds, the sounds may be grouped together and so perceived as emanating from a single source (fusion or integration), or perceived as separate auditory streams that originate from distinct sources (fission or stream segregation). In normal hearing, the most obviously demonstrable cues for stream segregation are frequency separation and rate of presentation (e.g., Bregman and Campbell, 1971; Van Noorden, 1975). For sequences of pure tones alternating between high and low frequency, sounds heard as similar in pitch tend to be grouped together perceptually. As the frequency separation between alternating tones increases, so does the probability of hearing two distinct auditory streams. The rate of presentation is inversely related to the time between onsets of consecutive tones (tone repetition time, or TRT). For a fixed frequency separation, a smaller TRT (faster rate) is associated with an increased tendency towards stream segregation.
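The two cues described above can be made concrete with a little arithmetic. The following is a minimal sketch, not part of the original study: it converts a TRT into a presentation rate and expresses the separation between two tone frequencies in semitones, using the standard equal-tempered relation of 12 semitones per octave. The function names are hypothetical and chosen only for illustration.

```python
import math


def presentation_rate_hz(trt_ms: float) -> float:
    """Tones per second for a given tone repetition time (onset to onset, in ms)."""
    return 1000.0 / trt_ms


def separation_semitones(f_low_hz: float, f_high_hz: float) -> float:
    """Frequency separation in semitones (12 semitones per octave)."""
    return 12.0 * math.log2(f_high_hz / f_low_hz)
```

For instance, a TRT of 100 ms corresponds to 10 tone onsets per second within a run of consecutive tones, and two tones an octave apart (e.g., 1000 and 2000 Hz) are separated by 12 semitones.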
Van Noorden (1975) made a distinction between the fission boundary and the temporal coherence boundary (see Fig. 1). If the frequency separation between the tones is below the fission boundary, the sounds are grouped together and a single stream is always heard – it is impossible to hear two distinct streams. If the separation is above the temporal coherence boundary, stream segregation occurs automatically – it is impossible to hear a single integrated stream. In contrast to the temporal coherence boundary, the fission boundary shows relatively little dependence on TRT. Therefore, the range of frequency separations for which a pure-tone sequence may be perceived either as integrated or segregated increases for longer TRTs (i.e., slower rates).

Fig. 1. The effect of frequency separation and tone repetition time (TRT) on the perceptual organization of a repeating sequence consisting of pure tones of lower and higher frequency. From: Temporal Coherence in the Perception of Tone Sequences, by Van Noorden (1975). Adapted with the permission of the author.

Bregman (1990) refers to the automatic and pre-attentive fission of a sequence of sounds into two or more perceptual streams as primitive stream segregation. He makes an important distinction between this process and the voluntary formation of an auditory stream from a subset of acoustic elements. The latter process, which he refers to as schema-based selection, is generally assumed to require selective attention and hence to carry an appreciable cognitive load.

The perception of a pure-tone sequence is ambiguous for a combination of frequency separation and TRT that falls between the temporal coherence and fission boundaries. What is heard may ‘flip’ spontaneously between segregation and integration – even when the listener is not trying to hear the sequence in a particular way. Thus, listeners are likely to press the key frequently in a task in which they are required to monitor their perception continuously, because the number of streams heard may change every few seconds (Anstis and Saida, 1985). This perceptual instability is analogous to the ambiguous figure/ground phenomenon well known in the visual domain (e.g., the vase-faces illusion).

Another well established property of sequential stream segregation is that it is cumulative. The auditory system appears to begin listening to a sequence of sounds with a bias towards hearing the input as a single stream, but then gradually accumulates evidence over a period of seconds, which may lead to stream segregation (Bregman, 1978; Anstis and Saida, 1985).

More generally, perceptual grouping is clearly important when listening to speech originating from a single speaker in the presence of competing speech or other background sounds, a situation found very commonly in everyday listening environments (Cherry, 1953). In this difficult situation, the listener’s auditory system needs to make use of perceptual characteristics of the target speaker’s voice (such as voice pitch, accent, timbre, etc.) in order to separate it from competing sounds.

1.1. Perceptual grouping in cochlear implant listeners
Cochlear implant listeners clearly need to deal with the same listening conditions as people with normal hearing; the ability of the auditory system to assign sounds perceptually to different sources is relevant and important, irrespective of whether the listener is hearing via electrical stimulation or through natural means. As the vast majority of implant users have only one implant, and are therefore listening monaurally, they do not have access to any binaural cues that might aid perceptual segregation of concurrent or sequential sounds. Also, the quality of signal available to an implant user is clearly reduced substantially when compared to normal hearing.
There is wide variability in the performance achieved by cochlear implant listeners; the very best ‘star’ performers can achieve up to 100% correct speech recognition in quiet, while at the other extreme some implant users can achieve only very limited open-set speech recognition. The ability of implant listeners to understand speech is often badly affected by competing background sounds, and this is a major limitation of the benefit that implants can provide. For example, Nelson and Jin (2004) showed that competing speech can impair performance in implant listeners even at a favourable S/N ratio of +16 dB (more favourable than many real-life listening situations). Implant listeners also demonstrate poor perception and appreciation of music (e.g., Kong et al., 2004). These limitations of the ability of implant listeners to hear well in challenging listening conditions may be due in part to spatial interaction between cochlear implant channels, even when neuronal survival is good. Forward masking and other studies have shown that the neural populations stimulated by different electrodes overlap and that the degree of overlap varies between listeners (Throckmorton and Collins, 2002; Chatterjee and Shannon, 1998). This
spread of excitation across and between channels leads to reduced spectral resolution and a high degree of spectral ‘smearing’, which increases the implant listener’s susceptibility to noise (Fu and Nogaki, 2004). The speech-processing strategies used by modern cochlear implants use input filters for individual channels that are typically too broad to resolve individual harmonics, and the spread of current across the cochlea would further increase the ‘mixing’ of harmonics (Moore and Carlyon, 2005). The simplest form of perceptual grouping that can be explored in implant listeners is the perceptual organization of sequences of pure tones. These stimuli have the advantage of allowing an exploration of the perceptual effects of reduced signal quality whilst avoiding the complication of temporal overlap between individual sounds. Thus far, very few studies of stream segregation in cochlear implant listeners have been reported. An exception is the study of Chatterjee and Galvin (2002). These authors used repeating patterns of loudness-matched stimuli composed of two or three different tones and varied the tonotopic distance (electrode pair separation) between these tones. Before each test sequence, listeners heard a ‘‘preview’’ sequence with a rhythm corresponding to a subset of the tones and they were asked to judge whether or not they could hear this rhythm within the test sequence. A high proportion of positive responses was taken as evidence that the particular subset of elements could be heard out as a separate perceptual stream. Chatterjee and Galvin (2002) reported the results for a TRT of 100 ms (tone duration = 50 ms, inter-tone interval = 50 ms) and found that the proportion of positive responses increased with tonotopic distance, as would be expected for normal-hearing listeners. Ratings of the degree of perceived segregation showed a similar dependence on tonotopic distance. 
On the basis of their results, Chatterjee and Galvin (2002) suggested that sequential stream segregation was achievable in cochlear implant listeners. However, their experimental design did not allow for a definitive conclusion that automatic stream segregation was taking place; i.e., that the test sequence breaks into two perceptual streams that cannot voluntarily be recombined by the listener (Bregman, 1990). First, they did not report the effects of varying the rate of presentation for their sequences of stimuli. This is important because, as noted earlier, the temporal coherence boundary is known to be strongly dependent on the TRT (Van Noorden, 1975). Second, their task involved the selection of a subset of elements from a larger sequence. It is known from studies of normal-hearing listeners that it is possible to focus attention on a subset of tones in a sequence when they differ in frequency by only a few semitones (e.g., Cusack and Roberts, 2000). Also, in contrast with the temporal coherence boundary, the fission boundary shows little dependence on the TRT (Van Noorden, 1975). In short, the ability of an implant listener to select a subset of elements from a mixture does not of itself indicate that primitive stream segregation has taken place.
This distinction is an important one because automatic stream segregation provides, for listeners with normal hearing, a basic structuring of the auditory input that can be further refined by attentional processes. In the absence of primitive stream segregation, cochlear implant listeners would always have to rely on schema-based processing to select out a particular subset of acoustic elements as a separate perceptual stream. Attentional resources are limited, and so listeners who rely mainly (or exclusively) on schema-based selection to partition sound mixtures would be at a considerable disadvantage in complex listening environments.

The aim of the experiments reported here was firstly to demonstrate, if possible, more definitive evidence of automatic stream segregation in cochlear implant listeners. In order to be convincing, evidence is needed that any reported stream segregation depends not only on frequency separation but also on rate of presentation, and that some combinations of separation and rate lead to ambiguity in the percepts produced, as found in normal hearing. Secondly, we investigated pitch ranking across the implant electrode array and compared performance on this task with the results of the stream segregation task. The rationale behind experiment 2 was to investigate whether a greater tendency towards reporting a segregated percept in experiment 1 is associated with more reliable (and more confident) pitch judgements.

2. Overview of methods
In order to provide evidence of auditory stream segregation, three of the well-documented features of this phenomenon were manipulated:

1. The effect of frequency separation. In normal hearing, increased frequency separation increases the probability of two distinct auditory streams being heard. In implant listeners, channel separation can be manipulated such that increased separation would be expected to give rise to more stream segregation (with the caveats discussed in more detail below).
2. The effect of rate of presentation on the probability of stream segregation. The temporal coherence boundary (Van Noorden, 1975) has been shown in normal-hearing listeners to increase with increasing TRT (i.e., reduced rate of presentation). This means that, within a certain range, TRT should affect the probability of alternating tones being reported as segregated if primitive stream segregation is taking place.
3. For frequency separations lying between the temporal coherence boundary and the fission boundary, the percept heard should be ambiguous, such that it can ‘flip’ randomly between integration (one stream) and segregation (two streams). In a task in which the listener is required to respond every time (s)he perceives a change in organization from one to the other, this should typically lead to several key-presses over the course of 30 s.
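The key-press task in the third point lends itself to a simple scoring rule. The sketch below is illustrative only and is not the authors' software: it assumes that every sequence starts in the ‘integrated’ state (as the procedure for experiment 1 describes), that each key-press toggles the reported percept, and that presses outside the 30-s window are ignored. The function name and argument names are hypothetical.

```python
def score_sequence(press_times_s, duration_s=30.0):
    """Return (number of key-presses, proportion of time reported as segregated).

    Assumption: the sequence starts 'integrated' and each press toggles the state.
    """
    # Keep only presses inside the sequence, in chronological order.
    presses = sorted(t for t in press_times_s if 0.0 <= t < duration_s)
    segregated = False      # default percept at sequence onset
    segregated_time = 0.0
    last_change = 0.0
    for t in presses:
        if segregated:
            segregated_time += t - last_change
        segregated = not segregated
        last_change = t
    # Account for the interval between the last press and the end of the sequence.
    if segregated:
        segregated_time += duration_s - last_change
    return len(presses), segregated_time / duration_s
```

Under these assumptions, a single press 3 s into a 30-s sequence yields one key-press and a segregated proportion of 0.9, whereas no press at all yields a proportion of 0.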
It should be noted that, for cochlear implant listeners, it is not possible to make the assumption that differences in frequency are encoded purely as differences in pitch. Although pitch is the dominant perceptual attribute of sounds when stimuli delivered to different electrodes are balanced for loudness, there may be multiple percepts that change with electrode location (Collins and Throckmorton, 2000). Cochlear implant users report various other attributes of the sounds heard as different electrodes in their implant are stimulated, including changes in timbre such as an increased quality of ‘sharpness’ or ‘squeakiness’ as more basal electrodes are stimulated. Moore and Carlyon (2005) suggest that the percepts conveyed through electrical stimulation do not meet a strict definition of musical pitch, and that what happens when place of stimulation is varied in a cochlear implant may be similar to the changes in pitch that can be reported by normal-hearing listeners responding to a shift in the spectral shaping of noise components, i.e. a change in timbral brightness. Another constraint on the perception of place pitch by cochlear implant listeners is the design of their speech processors, in which input sounds are filtered into a number of channels (22 channels in the Nucleus CI24 implant used by the participants in this study). Each channel is defined by a specific frequency allocation with upper and lower boundaries, and sound energy falling into each filter leads to stimulation on the electrode allocated to that channel. Note that the frequency allocation for a particular electrode rarely matches the characteristic frequencies for the area
of the cochlea that it stimulates, and that the channel-to-place mapping varies between individual implant users, depending on insertion depth. A pure tone at a frequency falling within the frequency allocation for a particular channel will cause stimulation of one electrode only, and complex sounds made up of multiple frequencies will only cause stimulation of those electrodes allocated to the channels within which the components fall. If two tones have different frequencies that both fall within a single channel, they could not be expected to be discriminable through place pitch alone. In normal use of a typical speech processing strategy, the pulse rate is not varied between electrodes and so the only cue available to the listener for pitch is place. To ensure that different stimuli result in stimulation on different channels in our experiments, their frequencies were selected to correspond to the centre frequency of each channel in the experimental speech processor used (see Table 1). The speech processor was programmed so that it could deliver stimulation on only one channel at a time (in normal use 6 electrodes or more are activated, selected from the full set of available electrodes). The ‘Map’ used in our experiments was programmed with the number of ‘maxima’ (active channels) set to 1, using the ‘Advanced Combination Encoder’ (ACE) strategy and Nucleus R126 programming software. For all listeners, the pulse rate used in this study was 900 pulses per second and the pulse width used was 25 µs.

3. Experiment 1
Table 1
Frequency characteristics of the stimuli and their relation to the implant channels

Channel/electrode   Lower frequency   Upper frequency   Channel centre frequency and frequency
number              boundary (Hz)     boundary (Hz)     of the pure tone stimuli (Hz)
1                   6938              7938              7438
2                   6063              6938              6500
3                   5313              6063              5688
4                   4688              5313              5000
5                   4063              4688              4375
6                   3563              4063              3813
7                   3063              3563              3313
8                   2688              3063              2875
9                   2313              2688              2500
10                  2063              2313              2188
11                  1813              2063              1938
12                  1563              1813              1688
13                  1313              1563              1438
14                  1188              1313              1250
15                  1063              1188              1125
16                  938               1063              1000
17                  813               938               875
18                  688               813               750
19                  563               688               625
20                  438               563               500
21                  313               438               375
22                  188               313               250

Channels 1 and 22 correspond to the most basal and the most apical channels, respectively.
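The relation between tone frequency and stimulated channel can be illustrated with a short lookup over the Table 1 boundaries. This is a minimal sketch rather than a model of the speech processor: the function name is hypothetical, and treating the lower boundary as inclusive and the upper boundary as exclusive is an assumption made here, since the source does not state how frequencies lying exactly on a boundary are handled.

```python
# Channel boundaries and centre frequencies (Hz) transcribed from Table 1.
# Index 0 is channel 1 (most basal); index 21 is channel 22 (most apical).
LOWER = [6938, 6063, 5313, 4688, 4063, 3563, 3063, 2688, 2313, 2063, 1813,
         1563, 1313, 1188, 1063, 938, 813, 688, 563, 438, 313, 188]
UPPER = [7938, 6938, 6063, 5313, 4688, 4063, 3563, 3063, 2688, 2313, 2063,
         1813, 1563, 1313, 1188, 1063, 938, 813, 688, 563, 438, 313]
CENTRE = [7438, 6500, 5688, 5000, 4375, 3813, 3313, 2875, 2500, 2188, 1938,
          1688, 1438, 1250, 1125, 1000, 875, 750, 625, 500, 375, 250]


def channel_for_frequency(freq_hz):
    """Return the 1-based channel whose band contains freq_hz, or None.

    Assumption: lower boundary inclusive, upper boundary exclusive.
    """
    for index, (low, high) in enumerate(zip(LOWER, UPPER)):
        if low <= freq_hz < high:
            return index + 1
    return None
```

With these values, every centre frequency maps back to its own channel (1938 Hz falls in channel 11), while two different tones such as 1900 and 2000 Hz both fall in channel 11 and so would not be expected to differ in place of stimulation.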
Table 2
Temporal characteristics of the stimuli

TRT (ms)   Onset ramp (ms)   Plateau (ms)   Offset ramp (ms)   Within-triplet silent   Between-triplet silent
                                                                intervals (ms)          intervals (ms)
50         10                20             10                 10                      60
100        10                40             10                 40                      140
150        10                70             10                 60                      210
200        10                100            10                 80                      280

3.1. Method
In this experiment, sequences of non-overlapping pure tones were employed, thus enabling the investigation of sequential grouping. Similar stimuli to those described by Van Noorden (1975) were used, i.e. a repeating ABA–ABA–. . . sequence of pure tones, where the frequency of tone A remains constant throughout the experiment but that of tone B is varied.

3.1.1. Stimuli
Characteristics of the stimuli are summarized in Tables 1 and 2. As mentioned above, all stimuli used were pure tones at the frequencies corresponding to the centre frequencies of the frequency allocations for each of the 22 channels programmed into the speech processor ‘Map’
used throughout both experiments. Although spectral ‘splatter’ theoretically should not be a risk for stimuli presented via electrical stimulation, 10-ms rise and fall times were used for each tone. In order to investigate the effect of rate of presentation on reported stream segregation, four different TRTs were used: 50, 100, 150, and 200 ms (see Table 2). These values exceed the range of TRTs used by Van Noorden (1975), which ranged from 60 to 150 ms (see Fig. 1). He showed an approximately two-fold increase in the temporal coherence boundary (from around 6 to 12 semitones) for a change in TRT from 100 to 150 ms, thus demonstrating a strong effect of rate on primitive stream segregation in normal-hearing listeners. In order to verify that the selected channels were indeed activated as intended by the pure tone stimuli described, all experimental stimuli were routed through the experimental speech processor and the output was analysed using a ‘dummy’ cochlear implant within the manufacturer’s computer interface, which allows a stimulus frame-by-frame listing of the output of the implant transmitter coil and the generation of an ‘Electrodogram’ which illustrates visually the output on each electrode over a selected time window (see Fig. 2). 3.1.2. Loudness balancing Prior to the experiment, comfortable and equal loudness levels were measured for each listener for each of the pure
tones using the implant manufacturer’s programming software. Individual electrodes were set at the maximum comfortable loudness level or ‘C level’ (the highest stimulus level that remained comfortable). This was achieved by a ‘sweep’ across channels, as many times as required to ensure equal and comfortable loudness for all the stimuli. Adjustments were made to individual levels as necessary. All stimuli were then presented at these levels. Stimuli were delivered via a computer-controlled sound card (16-bit resolution, 20-kHz sampling frequency) to the external input socket of the speech processor, via an electrically isolated adaptor cable (as supplied by the manufacturer for use with the Nucleus implant system). The control knob on the adaptor cable and the sensitivity control on the experimental speech processor were set to maximum, thus ensuring that stimulation always occurred at the appropriate C levels. The speech-processor microphone was disconnected during the experiment, and so the listener was isolated from all external sounds.

Fig. 2. An example ‘‘electrodogram’’ displaying the output of the experimental speech processor and transmitter coil for a time window of approximately 2.2 s. The output was taken from within an experimental run comprising ABA triplets with a TRT of 200 ms. The y-axis shows electrode number and the x-axis shows time elapsed in µs. Only a range of electrode numbers that encompasses those stimulated in this experimental run is shown (electrodes 11 and 7 for tones A and B, respectively). The filled rectangles illustrate when electrical stimulation was present on each electrode.

Table 3
Demographic and clinical details of the listeners

Listener   Age   Gender   Open set speech recognition score        Pulse rate (pulses per second)   Pulse width (µs)
                          (BKB sentences in quiet, % correct)      in normal daily use              in normal daily use
L1         42    F        80                                       1200                             25
L2         48    F        76                                       250                              25
L3         61    F        91                                       1200                             25
L4         33    F        88                                       250                              25
L5         58    M        48                                       250                              25
L6         25    M        98                                       250                              25
L7         35    F        100                                      250                              25
L8         52    F        79                                       900                              25

3.1.3. Listeners
Listeners were eight post-lingually deafened adults (6 female, 2 male) who were experienced users of the Nucleus CI24 cochlear implant. All had full electrode insertions. Demographic and clinical data for the participants are shown in Table 3, including their open set (sound only) speech recognition score for sentences (recorded within 6
months of the experiment). Two of the listeners (L3 and L7) were tested using only two of the four TRTs (100 and 150 ms). 3.1.4. Procedure Listeners were presented with 30-s sequences of alternating stimuli in a repeating ABA–ABA-. . . format, where A corresponds to a stimulus applied to electrode 11 (in the middle of the electrode array, corresponding to a pure tone at 1938 Hz) and B corresponds to a stimulus on one of the other electrodes, i.e. 1 to 10 and 12 to 22 (N.B.: higher numbers denote more apically placed electrodes). Some larger electrode separations were not included in the test set, because the area of most interest was the smaller electrode separations, located on either side of electrode 11. The electrode used for tone B was selected at random between presentation sequences, but did not vary within each 30-s sequence. In total, five repetitions of each ABA combination were used. This combination of task and sequence structure has previously been found to provide a sensitive and reliable measure of stream segregation in normal-hearing listeners (Cusack and Roberts, 1999, 2004; Roberts et al., 2002). 3.1.5. Task and responses The purpose of the experiment was explained to the listeners; they were instructed both verbally and in writing. Each listener was seated in front of a computer screen and keyboard. They were instructed to press the spacebar on the keyboard whenever their perception of the sequence changed from an ‘integrated’ (1 stream) to a ‘segregated’ (2 streams) percept or vice versa. It was explained to the listeners that the ‘integrated’ percept corresponded with a ‘galloping’ rhythm, which would disappear for the ‘segregated’ percept. All listeners received training and practice on the task prior to experimental runs. The training used stimuli that were chosen to demonstrate the two ‘extremes’ of possible percepts, i.e. 
(a) alternating tones presented to electrodes close together, intended to evoke a clearly integrated percept and (b) alternating tones stimulating electrodes that were widely spatially separated, intended to evoke a clearly segregated percept. Care was taken to ensure that listeners understood the task, were able to respond appropriately, and that any task learning
that was evident was essentially complete before the experimental runs began. The computer screen displayed their current choice. At the beginning of each 30-s sequence, the screen always read ‘Integrated’. This was based on the finding by Bregman (1978) that the default percept is integration, and that the tendency for stream segregation builds up over time. The computer recorded the timing of each key-press and from these calculated both the total number of key-presses for the sequence and the proportion of time for which the sequence was heard as segregated. 3.2. Results and discussion Fig. 3 shows the mean segregation reported for each electrode for tone B across all 8 listeners. Overall, a strong effect of electrode number on reported segregation can be seen, and this effect is broadly symmetrical on either side of electrode 11. A within-subjects ANOVA was performed that took into account combinations of electrode number and TRT for which data was not available for all listeners.
Fig. 3. Results for experiment 1 showing reported segregation in percent (mean data) for all eight listeners. Not all separations between electrode 11 and electrodes near the ends of the electrode array were tested for TRTs of 50 and 200 ms. Data collected in these regions of limited sampling are displayed as isolated points.
This analysis showed a highly significant effect of electrode separation [F(13, 65) = 12.145, p < 0.001], but there was no significant effect of TRT [F(3, 15) = 0.394, p = 0.759]. Also,
there was no significant interaction between electrode separation and TRT [F(39, 195) = 1.040, p = 0.416]. The proportion of time that listeners reported hearing a segregated
Fig. 4. Reported segregation results for individual listeners in experiment 1.
percept clearly increased as the spatial separation between tone A and tone B increased in terms of electrode number, and the lowest reported segregation was observed when tone B was on electrode 10 or 12 (both immediately adjacent to tone A on electrode 11). Although there were differences in reported segregation for different TRTs at greater electrode separations, it should be noted that there was no systematic effect of varying TRT towards the basal or apical ends of the array. Fig. 4 shows individual results for each listener and reveals interesting variations. For example, L6 showed a clear tendency to report a segregated percept most of the time, with significant reductions in reported segregation seen only when tone B stimulated electrode 10 or 12, both adjacent to electrode 11. In contrast, at intermediate TRTs (100 and 150 ms), L5 hardly ever reported segregation when tone B stimulated one of the electrodes in the range 7–14, which corresponds to 3–4 electrodes either side of electrode 11. Inspection of the patterns of key-presses recorded for each listener also reveals considerable variability. Fig. 5 shows the number of key-presses averaged over TRTs of 100 and 150 ms (the TRT values tested in all listeners), because key-press activity did not vary systematically with rate. Most listeners showed a clear tendency towards making only a single key-press, indicating just one change in percept throughout the 30-s listening period. Analysis of 5-s intervals throughout the 30-s period indicated that, when a single key-press occurred, it was almost invariably within the first 5 s of the sequence. Given that the default screen display was ‘integrated’, this suggests that listeners often tended towards a single judgement of whether they could hear two distinct sounds and, if not, they did not press the space-bar at all. This pattern of responses is not consistent with the perceptual reversals that are usually
found in streaming tasks using long sequences (e.g., Anstis and Saida, 1985; Roberts et al., 2002). One listener (L4) did display some relationship between electrode separation and key-pressing activity: she showed a clear tendency towards more key-presses for smaller electrode separations, i.e. for values of tone B corresponding to electrodes between 9 and 16. Indeed, when tone B stimulated electrode 14, the mean number of key-presses for L4 peaked at 6.5. For greater electrode separations (i.e., for values of tone B corresponding to electrode 17 or above, or to electrode 8 or below) her typical mean number of key-presses was low (between 1 and 2). These results could imply greater ambiguity and perceptual instability for smaller electrode separations, at least for those electrodes more apical to number 11. However, it is notable that L4 did not show any systematic relationship between % segregation and rate of presentation. Indeed, for small electrode separations, % segregation was lowest for the most rapid sequences (50-ms TRT) for this listener. This is the opposite of the typical finding in normal-hearing listeners. The absence of a clear effect of the rate of presentation of tones on the tendency towards reporting a segregated percept, combined with limited evidence of perceptual reversals from the key-press data, indicates that the responses of the cochlear implant listeners in this task were very different from those of normal-hearing listeners. Although it cannot be ruled out entirely, it seems unlikely that electrical stimulation would change such basic characteristics of a cognitive process like automatic stream segregation, were it actually taking place in our cochlear implant listeners. Instead, the results for our listeners may have reflected simple channel discrimination judgements rather than auditory stream segregation. 
Listeners tend to try to follow instructions and, in the absence of any automatic stream segregation, they may instead have interpreted the task as one of reporting when they could hear more than one distinct pitch in the sequence. If this were the case, then little or no effect of the rate of stimulus presentation would be expected over the range used. In order to investigate this further, experiment 2 employed a pitch-discrimination task. If our interpretation of the results of experiment 1 is correct, namely that the data provide a measure of channel discrimination rather than primitive stream segregation, then overall performance on a more straightforward pitch task, and also individual differences in performance, should correlate with the corresponding segregation scores obtained in experiment 1.

4. Experiment 2
Fig. 5. Key-press results for individual listeners in experiment 1. For electrode positions near to the basal and apical ends of the array, the results are shown only for every other electrode number.
In this experiment, listeners judged the pitch (or sharpness/brightness) of each of the other electrodes in the electrode array in comparison with electrode 11. This allowed for direct comparison with the results obtained in experiment 1.
4.1. Method

4.1.1. Stimuli, task, and procedure

Stimuli were pure tones with identical frequencies to those used in experiment 1 (see Table 1). Each tone had a total duration of 120 ms including 10-ms rise and fall times (this corresponds to the tones used in the 200-ms TRT condition in experiment 1). This duration was chosen as it was considered sufficiently long to enable an optimal subjective pitch judgement. As before, all stimuli were equalized for loudness before the experimental trials were run, using the procedure described earlier. Listeners were presented with two consecutive pure tones, with a 1-s silence between them. One of the stimuli, randomly the first or second presented, was always a 1938-Hz pure tone that stimulated channel 11 (tone A). The other stimulus (tone B) was selected at random to stimulate any one of the other electrodes (1–10 or 12–22). The listeners were instructed to listen to both sounds, and then indicate which of the sounds (first or second) sounded higher pitched, or brighter/sharper in timbre, by pressing either key 1 or key 2. No feedback was provided after each response. After responding in this way, listeners were asked to indicate how confident they were in their choice, on a scale from 1 to 5, by pressing the appropriate key. The values of these keys corresponded to these descriptions: 1 = Very unsure, really guessing; 2 = A little bit confident; 3 = Moderately confident; 4 = Very confident, but not certain; 5 = Absolutely confident. In total, ten repetitions of each electrode pairing, in both possible orders, were presented (in random order) to each listener. Both pitch ranking and confidence ratings were recorded.

4.1.2. Listeners

Listeners were 6 post-lingually deafened adults who were experienced users of the Nucleus CI24 cochlear implant. All 6 had participated in experiment 1. The other two listeners from experiment 1 (L3 and L8) were unavailable.

4.2. Results and discussion

Fig. 6 shows the pitch-ranking results for all 6 listeners. Considerable variability is apparent across listeners in the slope of the function describing electrode number vs. pitch ranking in relation to electrode 11. As expected, all the curves converged on a probability of 50% when both stimuli were on electrode 11. Most listeners show the expected monotonic relationship between electrode number and perceived pitch: higher electrode numbers (more apical to electrode 11) are more likely to elicit a lower pitch (and so the probability of being reported as higher in pitch than electrode 11 falls). For the lower electrode numbers (more basal than electrode 11) the converse is true. In a perfect situation, every electrode higher in number than 11 would always be reported as lower pitched than 11, and all those lower than 11 would always be reported as higher pitched
Fig. 6. Pitch ranking results for individual listeners in experiment 2.
than 11. For some listeners (e.g., L6), this is essentially the case. However, the situation is less clear-cut for a significant number of other listeners, for whom the responses suggest much more ambiguity in their pitch perceptions. In particular, L2 displayed seemingly quite random responses in her pitch judgements, with no clear relationship to channel number. L4 showed the expected trend when responding to more basal electrodes, i.e. generally reporting them as higher in pitch than electrode 11, while for more apical electrodes her responses were far more ambiguous. Similar variability can be seen in the confidence rating scores shown in Fig. 7. Some listeners (e.g., L6) reported a high degree of confidence in the vast majority of their pitch judgements, which only fell to near 1 (corresponding to guessing) when the spatial separation from electrode 11 was small or absent. Others (e.g., L2) required greater electrode separations to be confident in their pitch judgements. The overall mean confidence score (collapsed across all electrode separations) ranged across listeners from 3.6 to 4.6. Generally, those listeners who reported the highest confidence showed the sharpest tuning in their pitch ranking judgements. However, L2, who showed apparently rather disordered pitch perception (as discussed above), reported quite high confidence in her judgements except for small electrode separations. The variance in confidence scores may partly reflect individual differences in overall confidence in responses to psychophysical tasks. Nelson et al. (1995) reported very large variability in the performance of cochlear implant listeners in a pitch ranking task in which pairs of electrodes were stimulated sequentially and listeners responded according to which stimulus was higher in pitch or ‘sharper’. Listeners who demonstrated steeper electrode ranking functions may possess a
Fig. 7. Reported confidence ratings for individual listeners in experiment 2.
larger population of surviving neural elements, including functional cell bodies in the spiral ganglion that remain functionally organized tonotopically. Moore and Carlyon (2005) interpret the data reported in the Nelson et al. (1995) study to suggest that the threshold value for a median implant listener was about 1.2 mm along the basilar membrane, corresponding to an approximately 21% change in acoustic frequency based on Greenwood’s (1990) equation. In contrast, the frequency difference limen for a normal-hearing listener can be as low as 0.2%.

5. Comparison of segregation and pitch ranking judgements

The results of experiment 1 suggest that our implant listeners experienced automatic stream segregation only rarely, if at all, even when tones A and B stimulated widely separated electrode pairs. Instead, it is proposed that implant listeners may often have defaulted to judging how many pitches (or timbral brightnesses) they heard in a test sequence. If this is so, then a relationship might be anticipated between a tendency towards reporting a segregated percept in experiment 1 and discriminating pitch reliably (and with confidence) in experiment 2. The data for the six individual listeners who took part in both experiments were transformed to facilitate qualitative and quantitative comparisons of the different judgements. For each channel separation, the transformations used were as follows: (i) reported segregation scores were averaged for TRTs of 100 and 150 ms (the values tested in all listeners); (ii) the percentage of responses where each electrode was reported as higher in pitch than electrode 11 was converted into an absolute difference from chance (50%) and re-scaled from 0 to 100 (i.e., absolute difference × 2); (iii) the confidence ratings were re-scaled from the 1 to 5 range that listeners used into a 0 to 100 scale (i.e., 25 × (rating − 1)). An absolute difference from chance was used as an approximate measure of discriminability when transforming the pitch rankings, because it was assumed that the key factor is electrode discriminability rather than the perceived direction of the pitch difference. Furthermore, a monotonic relationship between electrode number and perceived pitch height cannot be assumed for all implant listeners. Fig. 8 shows the relationship between reported segregation, pitch discrimination, and the confidence of listeners in their pitch rankings. In general, with the exception of L1 in the basal direction, there is a similarity in the pattern of individual listeners’ results across these three measures. For example, L6 showed a strong tendency towards reporting segregation except for those electrodes closest to electrode 11. He also showed nearly perfect pitch ranking performance in experiment 2 and reported the highest overall confidence (mean = 4.6). In comparison, L5 showed broader tuning on all three measures. The overall relationship between reported segregation and confidence in pitch ranking is closer than between reported segregation and pitch discrimination. For example, L4 reported high segregation and high confidence in her pitch judgements when tone B stimulated electrodes towards the apical end of the array, despite her poor discrimination of pitch for these electrodes relative to electrode 11. Linear regressions providing correlations between the reported segregation scores and (a) the transformed pitch rankings and (b) the transformed confidence ratings for each listener were calculated (Table 4). A separate correlation was calculated for each of the two directions of electrode separation, i.e. basal (electrodes 1–10) and apical (electrodes 12–22). Five out of the 12 correlations calculated between reported segregation and pitch ranking were significant and positive.
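Transformations (ii) and (iii) and the R² statistic can be sketched in Python as follows. This is an illustrative sketch under stated assumptions rather than the analysis code used in the study: the function names are hypothetical, and R² is computed as the squared Pearson correlation, which equals the coefficient of determination for a simple linear regression with an intercept. The associated p values are not computed here.

```python
from typing import Sequence


def pitch_discriminability(percent_higher: float) -> float:
    """Transformation (ii): absolute difference from chance (50%), times 2.

    Maps a 'reported higher than electrode 11' percentage onto 0-100,
    ignoring the direction of the reported pitch difference.
    """
    return abs(percent_higher - 50.0) * 2.0


def confidence_to_percent(rating: float) -> float:
    """Transformation (iii): re-scale a 1-5 confidence rating to 0-100."""
    return 25.0 * (rating - 1.0)


def r_squared(x: Sequence[float], y: Sequence[float]) -> float:
    """Squared Pearson correlation between two equal-length sequences."""
    if len(x) != len(y) or len(x) < 2:
        raise ValueError("x and y must have the same length (at least 2)")
    n = len(x)
    mean_x = sum(x) / n
    mean_y = sum(y) / n
    sxx = sum((xi - mean_x) ** 2 for xi in x)
    syy = sum((yi - mean_y) ** 2 for yi in y)
    sxy = sum((xi - mean_x) * (yi - mean_y) for xi, yi in zip(x, y))
    if sxx == 0.0 or syy == 0.0:
        # With no variance in one measure the correlation is undefined.
        raise ValueError("R squared is undefined when a variable is constant")
    return (sxy * sxy) / (sxx * syy)
```

For example, an electrode reported as higher in pitch on 95% of trials and one reported as higher on 5% of trials both receive a discriminability score of 90, reflecting the choice to treat discriminability, not direction, as the quantity of interest.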
Nearly all (11/12) of the correlations between reported segregation and the confidence ratings were significant and positive. These findings confirm the observation above that there is a closer relationship between reported segregation and confidence in pitch rankings than with pitch discrimination per se. It should be noted that, for L6, the correlation between his reported segregation scores and transformed pitch rankings was low and not significant, despite his similar and high scores on these two measures. This was presumably the result of a ceiling effect, because many of the values for both measures were close to 100%. In contrast, L6 showed a very high and significant correlation between his reported segregation scores and confidence ratings. The overall relationship between the measures obtained in experiments 1 and 2 is broadly consistent with our proposal that the results of experiment 1 primarily reflect the channel discrimination abilities of our listeners, rather than the effects of automatic stream segregation. However, we acknowledge that correlation does not prove causation, and that there is an alternative explanation of this relationship that merits consideration. Specifically, the correspondence between reported segregation and pitch ranking
Fig. 8. Comparison of individual results for reported segregation, transformed pitch rankings, and transformed confidence ratings.
may have arisen simply because implant listeners who are more capable of perceptually mapping pitch are also better able to segregate different frequencies into separate perceptual streams. Although this account cannot be ruled out, we suggest that two observations are more compatible with our interpretation. First, as already discussed, our implant listeners have shown an absence of the rate and ambiguity effects characteristic of streaming studies using normal-hearing listeners. Second, three of the six listeners who took part in both experiments (L2, L4, and L6) showed an exceptionally close parallel between reported segregation and their transformed confidence ratings for pitch ranking, with large and highly significant correlations between the two measures (see Table 4 and Fig. 8). Indeed, the data sets almost overlap for two of these listeners. This indicates that these listeners were reporting segregation
close to the point at which they began to become confident in their pitch rankings. If normal-hearing listeners behaved in this way, then frequency separations in the region of 1% would lead to reports of segregated percepts.

6. General discussion

Both sensory and cognitive factors influence behaviour in a perceptual task, and the relationship between these factors is a complex one. It is undoubtedly the case that there are constraints on the sensory information available from a cochlear implant – there is a lack of periodicity information and limited channel discrimination. However, the consequences of these constraints for auditory stream segregation in cochlear implant listeners may be indirect. Specifically, the implant may well provide enough sensory
Table 4
Correlations between reported segregation scores and (a) transformed pitch scores and (b) transformed confidence scores

Listener | Direction of electrode separation | (a) Pitch scores: R² | (a) p value | (b) Confidence scores: R² | (b) p value
L1 | Basal  | 0.025  | 0.664  | 0.341  | 0.076
L1 | Apical | 0.531* | 0.011  | 0.495* | 0.016
L2 | Basal  | 0.001  | 0.943  | 0.834* | <0.001
L2 | Apical | 0.398* | 0.038  | 0.895* | <0.001
L4 | Basal  | 0.851* | <0.001 | 0.937* | <0.001
L4 | Apical | 0.034  | 0.588  | 0.889* | <0.001
L5 | Basal  | 0.411* | 0.046  | 0.628* | 0.006
L5 | Apical | 0.835* | <0.001 | 0.830* | <0.001
L6 | Basal  | 0.034  | 0.612  | 0.955* | <0.001
L6 | Apical | 0.041  | 0.550  | 0.563* | 0.008
L7 | Basal  | 0.254  | 0.138  | 0.596* | 0.009
L7 | Apical | 0.229  | 0.137  | 0.853* | <0.001

Separate correlations were calculated for each direction of electrode separation (i.e., basal, electrodes 1–10, and apical, electrodes 12–22). Correlations that are statistically significant (p < 0.05) are marked with an asterisk.
information to discriminate two subsets of sounds in a sequence, but this is not necessarily sufficient to support their automatic segregation by cognitive grouping mechanisms. Moore and Gockel (2002) concluded, in their review of sequential grouping, that the extent to which stream segregation occurs is directly related to the degree of perceptual difference between successive sounds. One might postulate that sensory constraints are likely to reduce the perceptual space available to implant listeners when compared with normal-hearing listeners, such that the degree of perceptual difference between successive sounds that stimulate different electrodes will be reduced. Given that the degree of perceptual difference required to trigger automatic stream segregation is much larger than that required simply to discriminate between stimuli, a more limited perceptual space within which successive sounds can differ may reduce the ability of implant listeners to achieve stream segregation. If this conjecture is correct, then implant listeners would have to rely more heavily on attentional mechanisms to be able to select a subset of sounds from a sequence. In Bregman’s (1990) terminology, this would mean that implant listeners would have to rely more on schema-based grouping than primitive grouping, compared with normal-hearing listeners.

Hong and Turner (2006) have recently presented the results of a temporal discrimination task designed to provide a performance measure of stream segregation in cochlear implant listeners. The rationale for the use of such a task is that perceptual properties of auditory events are computed within but not across streams. Therefore, stream segregation should lead to worse performance on a task requiring judgements of the relative timing of sounds (e.g., Warren et al., 1969; Bregman and Campbell, 1971). Hong and Turner’s first experiment was based on the task introduced by Roberts et al. (2002), in which listeners hear
in succession on each trial two sequences of rapidly alternating pure tones (A and B). The rhythm remains isochronous throughout in one sequence; in the other it begins as isochronous but becomes progressively irregular after several AB cycles. The extent of the delay applied to tone B was varied using an adaptive staircase (Levitt, 1971) and the task of the listener was to identify the irregular interval. By introducing the delay on tone B only after several AB cycles, sufficient time was allowed for the strength of stream segregation to build up before the two sequences began to differ (Bregman, 1978; Anstis and Saida, 1985). Consistent with the idea that stream segregation occurs in cochlear implant listeners, Hong and Turner (2006) observed a strong and progressive increase in temporal discrimination thresholds as the frequency separation between tones A and B was increased. However, this finding is not conclusive in itself, because thresholds for detecting a temporal gap between two isolated pure tones rise for normal-hearing listeners as the frequency separation is increased (see, e.g., Grose et al., 2001). Similarly, gap detection thresholds rise as the physical separation of stimulated electrode pairs is increased in cochlear implant listeners (Hanekom and Shannon, 1998). Therefore, in their second experiment, Hong and Turner measured the extent to which detection of the delay on tone B was dependent on frequency separation in the context of an isolated ABA triplet, for which there was insufficient time for a significant build-up in the tendency for stream segregation. The dependence observed was shallower than in the first experiment, which supports their conclusion that the cochlear implant listeners in their first experiment were experiencing automatic stream segregation.

The reason for the discrepancy between our results and those of Hong and Turner (2006) is unclear, but several observations merit note.
First, stimulus delivery was very different in the two studies; ours by direct stimulation of
the implant when configured to activate only one electrode at a time and theirs indirectly via the free field. Second, all our listeners were fitted with the Nucleus CI24 implant, whereas their listeners were a mixture of Nucleus (22 channels) and Clarion (16 channels) implant users. Third, the methods used to assess stream segregation are very different in the two studies; ours based on direct report and theirs inferred from the predicted consequences of stream segregation. Nonetheless, these two approaches have been shown previously to produce compatible outcomes in normal-hearing listeners (Roberts et al., 2002) and in hearing-impaired listeners (Stainsby et al., 2004). Differences in frequency region do not appear to be an issue, as one of the base frequencies used by Hong and Turner (tone A = 2000 Hz) was very similar to ours (tone A = 1938 Hz). However, it should be remembered that the implant channel-to-place mapping varies between individuals. Finally, as in many studies requiring access to limited clinical populations, participant sampling may have been an important factor. We used eight listeners in our experiment 1; Hong and Turner also used eight listeners in their temporal discrimination task with long sequences. However, only three listeners took part in the short-sequence task that is key to sustaining the conclusion that stream segregation influenced judgements of the longer sequences. Further research is needed to resolve the discrepancy between these two studies.

In conclusion, we suggest that there is still relatively little evidence to indicate that automatic stream segregation makes a substantial contribution to the perceptual experience of most cochlear implant listeners. Rather, many of these listeners may have to rely mainly on schema-based selection to hear out a subset of acoustic elements from a sequence as a separate perceptual stream.
Such a reliance on attentional mechanisms would inevitably limit the ability of cochlear implant listeners to cope with complex listening environments, particularly given that the sensory information that they receive from their implants is already impoverished. There are clear pitfalls in providing a convincing demonstration of stream segregation in cochlear-implant listeners, as illustrated by the two experiments we have reported here. Of the three properties of stream segregation that were investigated (i.e., the effect of frequency separation, the effect of rate, and evidence of perceptual ambiguity), only the first was apparent in our data. Without the other two effects, it is not possible to conclude that a task is providing a genuine measure of stream segregation. Indeed, it may be that many implant listeners will show little or no sign of automatic stream segregation, owing to the reduced perceptual space within which sounds can differ from one another.

Acknowledgement

This work was supported by the ‘Hear and Now’ Trust Fund (University Hospital Birmingham, UK).
References

Anstis, S., Saida, S., 1985. Adaptation to auditory streaming of frequency-modulated tones. J. Exp. Psychol.: Hum. Percept. Perform. 11, 257–271.
Bregman, A.S., 1978. Auditory streaming is cumulative. J. Exp. Psychol.: Hum. Percept. Perform. 4, 380–387.
Bregman, A.S., 1990. Auditory Scene Analysis: The Perceptual Organization of Sound. MIT Press, Cambridge, MA.
Bregman, A.S., Campbell, J., 1971. Primary auditory stream segregation and the perception of order in rapid sequences of tones. J. Exp. Psychol. 89, 244–249.
Chatterjee, M., Galvin, J.J., 2002. Auditory streaming in cochlear implant listeners. J. Acoust. Soc. Am. 111, 2429 [abstract].
Chatterjee, M., Shannon, R.V., 1998. Forward masked excitation patterns in multielectrode electrical stimulation. J. Acoust. Soc. Am. 103, 2565–2572.
Cherry, E.C., 1953. Some experiments on the recognition of speech, with one and with two ears. J. Acoust. Soc. Am. 25, 975–979.
Collins, L.M., Throckmorton, C.S., 2000. Investigating perceptual features of electrode stimulation via a multidimensional scaling paradigm. J. Acoust. Soc. Am. 108, 2353–2365.
Cusack, R., Roberts, B., 1999. Effects of similarity in bandwidth on the auditory sequential streaming of two-tone complexes. Perception 28, 1281–1289.
Cusack, R., Roberts, B., 2000. Effects of differences in timbre on sequential grouping. Percept. Psychophys. 62, 1112–1120.
Cusack, R., Roberts, B., 2004. Effects of differences in the pattern of amplitude envelopes across harmonics on auditory stream segregation. Hear. Res. 193, 95–104.
Fu, Q-J., Nogaki, G., 2004. Noise susceptibility of cochlear implant users: the role of spectral resolution and smearing. J. Assoc. Res. Otol. 6, 19–27.
Greenwood, D.D., 1990. A cochlear frequency-position function for several species – 29 years later. J. Acoust. Soc. Am. 87, 2592–2605.
Grose, J.H., Hall, J.W., Buss, E., Hatch, D., 2001. Gap detection for similar and dissimilar gap markers. J. Acoust. Soc. Am. 109, 1587–1595.
Hanekom, J.J., Shannon, R.V., 1998. Gap detection as a measure of electrode interaction in cochlear implants. J. Acoust. Soc. Am. 104, 2372–2384.
Hong, R.S., Turner, C.W., 2006. Pure-tone auditory stream segregation and speech perception in noise in cochlear implant recipients. J. Acoust. Soc. Am. 120, 360–374.
Kong, Y-Y., Cruz, R., Ackland Jones, J., Zeng, F-G., 2004. Music perception with temporal cues in acoustic and electric hearing. Ear Hear. 25, 173–185.
Levitt, H., 1971. Transformed up-down methods in psychoacoustics. J. Acoust. Soc. Am. 49, 467–477.
Miller, G.A., Heise, G.A., 1950. The trill threshold. J. Acoust. Soc. Am. 22, 637–638.
Moore, B.C.J., Carlyon, R.P., 2005. Perception of pitch by people with cochlear hearing loss and by cochlear implant users. In: Plack, C.J., Oxenham, A.J. (Eds.), Springer Handbook of Auditory Research: Pitch Perception. Springer, New York.
Moore, B.C.J., Gockel, H., 2002. Factors influencing sequential stream segregation. Acta Acust. Acust. 88, 320–333.
Nelson, P.B., Jin, S.H., 2004. Factors affecting speech understanding in gated interference: cochlear implant users and normal-hearing listeners. J. Acoust. Soc. Am. 115, 2286–2294.
Nelson, D.A., van Tassell, D.J., Schroder, A.C., Soli, S., Levine, S., 1995. Electrode ranking of “place pitch” and speech recognition in electrical hearing. J. Acoust. Soc. Am. 98, 1987–1999.
Roberts, B., Glasberg, B.R., Moore, B.C.J., 2002. Primitive stream segregation of tone sequences without differences in fundamental frequency or passband. J. Acoust. Soc. Am. 112, 2074–2085.
Stainsby, T.H., Moore, B.C.J., Glasberg, B.R., 2004. Auditory streaming based on temporal structure in hearing-impaired listeners. Hear. Res. 192, 119–130.
Throckmorton, C.S., Collins, L.M., 2002. The effect of channel interactions on speech recognition in cochlear implant subjects: predictions from an acoustic model. J. Acoust. Soc. Am. 112, 285–296.
Van Noorden, L.P.A.S., 1975. Temporal coherence in the perception of tone sequences. Doctoral thesis, Eindhoven University of Technology, The Netherlands.
Warren, R.M., Obusek, C.J., Farmer, R.M., Warren, R.P., 1969. Auditory sequence: confusion of patterns other than speech and music. Science 164, 586–587.