Intentional switching in auditory selective attention: Exploring attention shifts with different reverberation times



Hearing Research xxx (2017) 1–8

Contents lists available at ScienceDirect

Hearing Research journal homepage: www.elsevier.com/locate/heares

Josefa Oberem a, *, Julia Seibold b, Iring Koch b, Janina Fels a

a Institute of Technical Acoustics, Medical Acoustics Group, RWTH Aachen University, Kopernikusstraße 5, 52074 Aachen, Germany
b Institute of Psychology, RWTH Aachen University, Jägerstraße 17, 52066 Aachen, Germany

Article info

Article history: Received 6 July 2017; received in revised form 11 December 2017; accepted 18 December 2017; available online xxx

Keywords: Auditory selective attention; Task switching; Reverberation; Binaural hearing

Abstract

Using a well-established binaural-listening paradigm, the ability to intentionally switch auditory selective attention was examined under anechoic, low reverberation (0.8 s) and high reverberation (1.75 s) conditions. Twenty-three young, normal-hearing subjects were tested in a within-subject design to analyze influences of the reverberation times. Spoken word pairs by two speakers were presented simultaneously to subjects from two of eight azimuth positions. The stimuli consisted of a single number word (i.e., 1 to 9), followed by either the direction “UP” or “DOWN” in German. Guided by a visual cue prior to auditory stimulus onset indicating the position of the target speaker, subjects were asked to identify whether the target number was numerically smaller or greater than five and to categorize the direction of the second word. Switch costs (i.e., reaction time differences between a position switch of the target relative to a position repetition) were larger under the high reverberation condition. Furthermore, the error rates were highly dependent on reverberant energy, and reverberation interacted with the congruence effect (i.e., stimuli spoken by target and distractor may evoke the same answer (congruent) or different answers (incongruent)), indicating larger congruence effects under higher reverberation times. © 2017 Elsevier B.V. All rights reserved.

* Corresponding author. E-mail addresses: [email protected] (J. Oberem), [email protected] (J. Seibold), [email protected] (I. Koch), [email protected] (J. Fels).

1. Introduction

Studies on auditory selective attention were first introduced by Cherry (1953), and the topic has since been analyzed with several dichotic [Broadbent (1958); Pashler (1999); Ihlefeld and Shinn-Cunningham (2008); Bronkhorst (2015); Koch et al. (2011)] and binaural [Best et al. (2007, 2010); Kidd et al. (2005b); Allen et al. (2009); Oberem et al. (2014)] paradigms. In real-life scenes, reverberant energy distorts the signal [Nábělek and Robinson (1982); Darwin and Hukin (2000a); Lavandier and Culling (2008)], and it is therefore of interest how auditory selective attention is affected by reverberant energy.

Using an attention task in which subjects were asked to repeat four consecutive digits spoken by a target speaker always positioned in front, in the presence of two other distracting speakers located to the sides, Ruggles and Shinn-Cunningham (2011) varied the amount of reverberant energy from “anechoic (RT60 = 0 s), intermediate reverberation (RT60 = 0.4 s) to high reverberation (RT60 = 3 s)”. They reported a great impact on performance when adding reverberation; the difference in performance between the anechoic (60–80% correct) and the intermediate reverberation condition (40–50% correct) was especially noteworthy. On account of these results they concluded that reverberant energy interferes with spatial selective attention.

Similar reverberation times were analyzed by Culling et al. (2003), who measured Speech Reception Thresholds (SRTs) under anechoic (RT60 = 0 s) and reverberant (RT60 = 0.4 s) conditions. Target and distractor were collocated in front of the subject or spatially separated (−60°/+60°). SRTs were found to be significantly lower under anechoic conditions, which was reconfirmed by Lavandier and Culling (2007). The reverberant energy also interacted with the location of target and distractor, indicating no improvement in SRT for spatially separated speakers in the reverberant condition. Contradictory to that were findings by Kidd et al. (2005b). They reported that the effect of reverberation was greater when target and masker were spatially separated rather than collocated at the

https://doi.org/10.1016/j.heares.2017.12.013 0378-5955/© 2017 Elsevier B.V. All rights reserved.

Please cite this article in press as: Oberem, J., et al., Intentional switching in auditory selective attention: Exploring attention shifts with different reverberation times, Hearing Research (2017), https://doi.org/10.1016/j.heares.2017.12.013


same position. Instead of using simulated reverberation times, Kidd and colleagues changed the reverberation of the laboratory by mounting foam and Plexiglas on the walls. Further findings were that the amount of masking increased as reverberation times increased and that these acoustic differences also significantly affected the performance in the speech identification task.

Related to the cited investigations [Ruggles and Shinn-Cunningham (2011); Culling et al. (2003); Kidd et al. (2005b)], Darwin and Hukin (2000b) explored the effect of reverberation (RT60 = 0.4 s) on the ability of listeners to maintain their attention to one speaker across time. Using a paradigm with minimal intelligibility requirements, it was found that reverberant energy significantly affected inter-aural time differences (ITDs). The use of ITDs was impaired by reverberation, and maintaining attention to the target was therefore more difficult. However, natural prosody and vocal-tract size differences between talkers, two further cues for selective attention, were not affected by reverberation.

In the present study, a paradigm focusing on the intentional switching of auditory attention, rather than on maintaining the listener's attention to a single source, was used. In contrast to the paradigms of the cited studies, the present paradigm makes it possible to analyze both the reaction times of the participants and their error rates. First introduced by Koch et al. (2011), the paradigm has been used and tested [Lawo et al. (2014); Lawo and Koch (2015)] with dichotic reproduction and is by now well established [Bronkhorst (2015)]. Koch et al. (2011) explicitly examined endogenous, voluntary attention switches; cued attention switches therefore referred to the target's gender and its location (e.g., the target's gender switched between trials: in the preceding trial the target was a male speaker on the left side and in the following trial the target was a female speaker on the right side). The main finding was that a cued switch of the relevant target resulted in worse performance than cued repetitions of the relevant target's speaker gender and location [Lawo et al. (2014); Lawo and Koch (2015)].

Furthermore, the role of attentional control in the processing of task-irrelevant information in auditory attention switching has been explored by the authors. The participants' task was always to categorize the relevant number word presented by the target speaker as smaller than or greater than five and to press the corresponding response button. The two stimuli presented in one trial could be congruent (both number words smaller than five or both greater than five) and therefore call for the same response, or they could be incongruent (one digit smaller and one greater than five) and therefore call for different responses. The “congruency effect” [Kiesel et al. (2010)], showing that participants respond faster in congruent trials than in incongruent trials, was confirmed [Koch et al. (2011)], suggesting some processing of irrelevant information (i.e., of the distractor's speech).

The paradigm was also extended into a binaural version to reproduce more realistic scenarios [Oberem et al. (2014, 2017a,b); Fels et al. (2016)]. For this purpose, a scene with more combinations of the speakers' locations than in the dichotic set-up was provided in an anechoic chamber with different binaural reproduction methods. Besides a set-up with real loudspeakers, head-related transfer functions (HRTFs) were used to present binaural stimuli via headphones. In the investigation by Oberem et al. (2014) it was found that a required switch of the attention focus yielded longer reaction times and higher error rates than a repetition of the target's location, and that both also depended on the target's location itself.

In the present investigation participants had to categorize one of two binaurally presented pairs of a number word and a direction word according to numerical size and direction. The target and distracting speaker were always positioned in one out of eight

different positions around the listener but never collocated [Fels et al. (2016)]. As outcome measures, reaction times as well as error rates were observed. Inspired by the findings of Ruggles and Shinn-Cunningham (2011), reverberation times in three levels were simulated: anechoic (RT60 = 0 s), low reverberation (RT60 = 0.8 s, comparable to an acoustically untreated classroom, instead of RT60 = 0.4 s, comparable to a damped recording room) and high reverberation (RT60 = 1.75 s, comparable to an auditorium, instead of RT60 = 3 s, comparable to a medium-sized church). The underlying room model was also designed with comparable dimensions; however, walls were not set to be parallel and the listener was not positioned in the center of the room, to prevent unwanted acoustical effects due to nodal points or echoes [Hartmann (1983); Rakerd and Hartmann (1985); Giguère and Abel (1993)] (cf. Section 2.4).

It was postulated that reverberant energy would increase reaction times and error rates in the present investigation, based on the cited findings [Ruggles and Shinn-Cunningham (2011); Culling et al. (2003); Kidd et al. (2005a); Darwin and Hukin (2000b)]. Furthermore, Ruggles and Shinn-Cunningham (2011) showed how maintaining auditory selective attention on a single sound source in the presence of interfering sources is degraded by reverberant energy. These findings led to the hypothesis of increased reaction times and error rates for repetition trials (i.e., trials in which a listener is asked to focus on the same direction in two consecutive trials; cf. Section 2.6) under increased reverberation in the present investigation. Since it was known that reverberation degrades ITD information, which results in blurred localization information [Ruggles and Shinn-Cunningham (2011)], it was predicted that localizing a new sound source and focusing attention on that source would also degrade with increasing reverberation times. This is the case for switch trials, where the listener has to switch his/her attention to a new spatial position between trials (cf. Section 2.6). Spatial separation turned out to be beneficial under increasing reverberation times in findings by Kidd et al. (2005a); however, Culling et al. (2003) reported an opposite effect. Therefore, in this investigation special attention is paid to the spatial location of target and distractor as well as their angular separation.

2. Methods

2.1. Participants

A total of 23 paid (8 Euro) students aged between 19 and 34 years (mean age: 23.8 ± 3.4 years) participated in the experiment. Subjects were divided almost equally into male (12) and female (11) listeners. Listeners were screened to ensure that they had normal hearing (within 20 dB HL) for frequencies between 250 Hz and 10 kHz via pure-tone ascending standard audiometry. All listeners could be considered non-expert listeners since they had never participated in a listening test on auditory selective attention.

2.2. Stimulus material

Speech material was recorded under anechoic conditions with two male and two female professional, native German speakers. The hardware used, a large-diaphragm condenser microphone TLM170 by Neumann and a Zoom H6 Handy Recorder (both: cardioid directivity pattern), allowed recordings with a frequency range from 70 Hz to 20 kHz. The stimuli consisted of a single spoken digit (1–9, excluding 5) which was followed by one of two German disyllabic direction words (“UP”, in German “OBEN”, and “DOWN”, in German “UNTEN”) (e.g., the combined stimulus could be “Four


Table 1. Reverberation times in seconds for the three simulated rooms in octave bands.

Frequency in Hz   63    125   250   500   1k    2k    4k    8k
Anechoic          0     0     0     0     0     0     0     0
Low               0.5   0.6   0.8   0.8   0.8   0.8   0.6   0.5
High              2.5   2.2   2.0   1.8   1.7   1.5   1.2   0.8

Fig. 1. Visual cue with target cued in the direction front-left.

Up” or “Seven Down”). With a time-stretching algorithm that maintains the original frequency distribution of the recording [Institute of Technical Acoustics, RWTH Aachen (2012)], all stimuli (digits and direction words separately) were shortened or stretched to 600 ms (max. modification of length: 32%). Therefore, stimuli started and ended synchronously when presented at the same time. The total length of a stimulus was therefore 1200 ms. The loudness of the recorded stimuli was adjusted according to DIN 45631 [German Standard (2010)].

2.3. Room setup

The listening tests took place in a hearing booth (l × w × h = 2.3 × 2.3 × 1.98 m³) to ensure a quiet environment during the test. Lights were turned off during the listening test to direct the focus to the auditory sense rather than the visual sense [Blauert (1997)].

2.4. Synthesis and reproduction of stimuli

The modeled room had a total volume of 137 m³ with a quadrangular ground area [RAVEN, Schröder (2012)]. The walls were not parallel and had different lengths (front: 6.1 m, right: 7.5 m, back: 6.0 m, left: 7.6 m). The height of the room was set to a constant value of 3 m. The listener was located slightly off the center of the room with a sitting height of 1.3 m (listener's position: l = 2.9 m, w = 3.2 m, h = 1.3 m). The source positions were located in a circular arrangement around the listener, each at a distance of 1.8 m from the listener. Absorption coefficients in the room model were changed to achieve three levels of reverberation: anechoic (RT60 = 0 s), low reverberation (RT60 = 0.8 s), and high reverberation (RT60 = 1.75 s). Reverberation times for octave bands from 63 Hz to 8 kHz are listed in Table 1. The absorption coefficients for the three conditions were adjusted according to common building materials for floor, wall and ceiling.
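The relation between the octave-band values in Table 1 and the nominal RT60 values can be illustrated with a short sketch. The text does not state how the single-number RT60 was derived; averaging the 500 Hz and 1 kHz octave bands is an assumption made here, chosen because it is a common convention and because it reproduces the stated 0.8 s and 1.75 s. The Sabine estimate of the equivalent absorption area is likewise only a rough statistical approximation added for orientation and is not the geometric simulation performed in RAVEN. The names `mid_band_rt` and `sabine_absorption_area` are hypothetical.

```python
# Octave-band reverberation times in seconds, copied from Table 1.
BANDS_HZ = [63, 125, 250, 500, 1000, 2000, 4000, 8000]
RT_TABLE = {
    "anechoic": [0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0],
    "low": [0.5, 0.6, 0.8, 0.8, 0.8, 0.8, 0.6, 0.5],
    "high": [2.5, 2.2, 2.0, 1.8, 1.7, 1.5, 1.2, 0.8],
}


def mid_band_rt(condition, bands=(500, 1000)):
    """Mean reverberation time over the given octave bands.

    Assumption: the nominal RT60 is the mean of the 500 Hz and 1 kHz bands.
    """
    values = [RT_TABLE[condition][BANDS_HZ.index(band)] for band in bands]
    return sum(values) / len(values)


def sabine_absorption_area(volume_m3, rt_s):
    """Equivalent absorption area A = 0.161 * V / T in m^2 (Sabine's formula)."""
    if rt_s <= 0:
        raise ValueError("Sabine's formula needs a positive reverberation time")
    return 0.161 * volume_m3 / rt_s


if __name__ == "__main__":
    for condition in ("low", "high"):
        rt = mid_band_rt(condition)
        area = sabine_absorption_area(137.0, rt)  # 137 m^3 room volume from the text
        print(f"{condition}: RT60 = {rt:.2f} s, absorption area = {area:.1f} m^2")
```

Under these assumptions, the low and high reverberation conditions correspond to roughly 27.6 m² and 12.6 m² of equivalent absorption area in the 137 m³ room.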


Binaural room impulse responses (BRIRs) were calculated with RAVEN based on the simulated room model as well as HRTFs of an artificial head, measured in an anechoic chamber [Schröder (2012)]. The dummy head is a mannequin produced at the Institute of Technical Acoustics, RWTH Aachen University, with a simple torso and a detailed ear geometry [Schmitz (1995); Minnaar et al. (2001)]. Open headphones (Sennheiser HD 600) were used for the binaural reproduction. As shown by Masiero and Fels (2011), a robust headphone equalization is especially important when headphones are taken off or repositioned on the head during the experiment. Hence, eight headphone transfer functions (HpTFs) were measured with MATLAB, a Focusrite Scarlett 2i2 and Sennheiser KE3 microphones placed at the entrance of the blocked ear canal using exponential sweeps (frequency range: 70 Hz to 20 kHz, bit depth: 24 bit, sampling rate: 44.1 kHz, total excitation length: 7.5 s, no averaging). After every measurement the subject was asked to reposition the headphones. Measurements were averaged and the final headphone equalization was realized as a minimum-phase filter [Masiero and Fels (2011)]. The convolution of stimuli with BRIRs and the equalization was done off-line using MATLAB. All binaural stimuli were stored separately in wave format.

The binaural reproduction was static, meaning HRTFs were not adapted in real time in case of head movements. Recent investigations [e.g., McAnally and Martin (2014)] have shown that listeners' head movements have a great impact on localization. However, static and dynamic reproduction were compared using the presented paradigm [Oberem et al. (2017b)], and the findings did not show a significant difference between reproduction methods. Therefore, the authors chose static reproduction for the present investigation.

2.5. Experimental procedure

The original paradigm was first introduced by Koch et al. (2011) and developed to analyze the intentional switching in auditory selective attention using dichotic listening [Bronkhorst (2015); Oberem et al. (2014); Lawo et al. (2014)]. Stimuli and response options were extended to make it possible to analyze the intentional switching of auditory selective attention in realistic environments [Fels et al. (2016)]. Each trial consisted of two simultaneously presented stimuli. These stimuli were spoken by two speakers of opposite sex. The target and the distractor were located in two different directions (out of eight possible, cf. Fig. 1). The subjects' task was to focus on the target speaker and ignore the distractor. To distinguish between target and distractor, the target speaker's direction was cued in advance. Hence, a visual cue highlighting the target's direction was shown on a monitor (15 inch screen, 0.7 m distance). The visual cue consisted of a sketch of all directions indicating the target direction with a filled dot (cf. Fig. 1). Stimuli of both speakers consisted of a single spoken digit (1–9, excluding 5) which was followed by a German disyllabic direction word for “UP” or “DOWN” (e.g., the combined stimulus could be “Eight Up” or “One Down”).

The participants' task was a two-step process. First, the participant had to categorize the relevant digit presented by the target speaker as smaller or larger than five. Second, the direction word had to be cognitively processed in order to press the corresponding response button. There were four response possibilities given in a square arrangement, to be pressed by the index fingers and middle fingers of both hands (clockwise: “>5 + up” to be pressed by the right index finger, “>5 + down” by the right middle finger, “<5 + down” by the left middle finger, “<5 + up” by the left index finger).
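The response mapping described above can be sketched as a small function. This is an illustration only: the function names and the tuple representation of a stimulus are hypothetical, and the congruency helper follows the definition given in Section 2.6.4 (same digit category and identical direction word), which is equivalent to target and distractor calling for the same response button.

```python
def response_button(digit, direction):
    """Return (hand, finger) for a stimulus, following the mapping in the text.

    digit:     1-9 excluding 5 (smaller than five -> left hand, greater -> right hand)
    direction: "UP" -> index finger, "DOWN" -> middle finger
    """
    if digit == 5 or not 1 <= digit <= 9:
        raise ValueError("digit must be in 1-9, excluding 5")
    if direction not in ("UP", "DOWN"):
        raise ValueError("direction must be 'UP' or 'DOWN'")
    hand = "left" if digit < 5 else "right"
    finger = "index" if direction == "UP" else "middle"
    return hand, finger


def is_congruent(target, distractor):
    """True if target and distractor (digit, direction) call for the same button."""
    return response_button(*target) == response_button(*distractor)


if __name__ == "__main__":
    print(response_button(8, "UP"))              # "Eight Up"
    print(response_button(1, "DOWN"))            # "One Down"
    print(is_congruent((2, "UP"), (4, "UP")))    # same category, same direction
    print(is_congruent((2, "UP"), (4, "DOWN")))  # same category, other direction
```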


Fig. 2. Target's spatial position: Median plane (Front; Back), Diagonal planes (Front-Left; Front-Right; Back-Left; Back-Right), Frontal plane (Left; Right).

Fig. 3. Spatial angle between target and distractor: 45°, 90°, 135° and 180°. Exemplary for the distractor's position being in front.

Therefore, the categories smaller than five and greater than five were mapped to the left-hand buttons and the right-hand buttons at the front side of a controller. Furthermore, the direction word presented by the target indicated whether the index finger (in case the direction word was “UP”) or the middle finger (in case the direction word was “DOWN”) had to be used.

Each trial started with a visual cue presented on the monitor in front of the subject. After a cue-stimulus interval (CSI) of 500 ms the two acoustic stimuli (target and distractor) were presented simultaneously. The visual cue remained on the screen until the subject responded to the acoustic target. The response-cue interval (RCI) between the response and the next cue was also set to 500 ms. In case of an error, visual feedback (“Fehler!”, German for “error”) was displayed for 500 ms, delaying the onset of the next cue.

In total, 576 trials divided into six blocks of 96 trials each were separated by short breaks (2 min each, except 5 min between the third and fourth block). The experimental blocks were preceded by two training blocks. The first training block (10 trials) presented the target's speech only to give the subject the opportunity to become familiar with the input device. Another 40 trials were presented in the second anechoic training block, also including the distractor's speech as in the experimental blocks. The total duration of the

Fig. 4. Reaction times (in ms) as a function of reverberation time, target's position and attention switch (RT × TP × AS). In the interest of clarity the factor congruency is not shown. Error bars indicate standard errors.


experiment did not exceed 70 min including the audiometry. The three different reverberation conditions were changed blockwise. Conditions were assigned to block numbers according to a Latin square design. Within a block, the location of the target speaker was repeated or changed with equal probability. The location of the distracting speaker was changed in every trial. Changes of speakers' positions were assigned randomly. Furthermore, trials were counterbalanced over combinations of digits. The speakers of the stimuli were assigned randomly.

2.6. Experimental design

Two analyses were conducted. The independent variables of the first analysis were reverberation time (RT: anechoic vs. low reverberation vs. high reverberation), target's position (TP: median plane vs. diagonal plane vs. frontal plane), auditory attention switch (AS: switch vs. repetition) and congruency (C: congruent vs. incongruent) as within-subject variables. In the second analysis, a closer look was taken at the spatial arrangement of target and distractor. Therefore, the independent variables were reverberation time (RT: anechoic vs. low reverberation vs. high reverberation), target's position (TP: median plane vs. diagonal plane vs. frontal plane) and spatial angle between target and distractor (ANG: 45° vs. 90° vs. 135° vs. 180°) as within-subject variables. Dependent variables were reaction times and error rates in both analyses.

2.6.1. Reverberation time

The reverberation time of the room was varied between an anechoic condition and two reverberant conditions, one with fairly low and one with rather high reverberation times (0 s vs. 0.8 s vs. 1.75 s). Further details about the synthesis and reproduction of the stimuli are given in Section 2.4.

2.6.2. Spatial position of target

Furthermore, the effect of the target's position (TP) was studied. There were eight possible positions for the target (cf. Fig. 1). These eight positions were categorized into three different classes (cf. Fig. 2). The classes were designed with respect to the planes of the head-related coordinate system [Blauert (1997)] as well as the results of preceding studies [Oberem et al. (2017a)]. The first class included all positions on the median plane (Front; Back) and was therefore called “Median”. The second class comprised all positions on the inter-aural axis, i.e., on the frontal plane (Left; Right), and was called “Frontal”. The third class, named “Diagonal”, included all other possible spatial positions, which were located at 45° to the defined planes of the head-related coordinate system.
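The classification of the eight positions, and the spatial angle between target and distractor used as factor ANG in the second analysis, can be sketched as follows. The azimuth values (0° in front, increasing clockwise in 45° steps) and the function names are assumptions for illustration; the text only names the positions and their classes.

```python
# Assumed azimuths in degrees: 0 = front, increasing clockwise in 45-degree steps.
AZIMUTH_DEG = {
    "Front": 0, "Front-Right": 45, "Right": 90, "Back-Right": 135,
    "Back": 180, "Back-Left": 225, "Left": 270, "Front-Left": 315,
}


def position_class(position):
    """Map one of the eight positions to 'Median', 'Frontal' or 'Diagonal'."""
    azimuth = AZIMUTH_DEG[position]
    if azimuth % 180 == 0:
        return "Median"    # Front, Back
    if azimuth % 180 == 90:
        return "Frontal"   # Left, Right (on the inter-aural axis)
    return "Diagonal"      # 45 degrees off the median and frontal planes


def spatial_angle(target, distractor):
    """Smallest angle in degrees between target and distractor positions."""
    diff = abs(AZIMUTH_DEG[target] - AZIMUTH_DEG[distractor]) % 360
    return min(diff, 360 - diff)


if __name__ == "__main__":
    print(position_class("Back-Left"), spatial_angle("Front", "Front-Left"))
```

Because target and distractor are never collocated, `spatial_angle` can only return 45, 90, 135 or 180 for valid trials.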

Fig. 5. Error rate (in %) as a function of reverberation time, target's position and congruency (RT × TP × C). In the interest of clarity the factor attention switch is not shown. Error bars indicate standard errors.


2.6.3. Auditory attention switch

Auditory attention switch (AS) referred to the target's spatial position in two consecutive trials. The target's spatial position could either be repeated from one trial to the next (e.g., front - front) or switched between trials (e.g., left - back). The distractor's position was switched between all trials.

2.6.4. Congruency

Congruency (C) referred to the stimuli of target and distractor within one trial. The variable had two different levels (congruent vs. incongruent). A trial was considered congruent when the target's digit and the distractor's digit belonged to the same category (both digits smaller than 5 or both greater than 5, e.g., 2 and 4, 6 and 9) and the direction word was identical. In case the digits belonged to different categories (one digit smaller and one greater than 5, e.g., 1 and 7, 8 and 3) and/or the direction word was not identical, the trial was considered incongruent.

2.6.5. Spatial angle between target and distractor

In a second analysis, the effect of the angle between the target's position and the distractor's position (ANG) was studied. There were four possible angles between the target and the distractor since they could never be collocated. With respect to the defined positions for target and distractor (cf. Fig. 2), the angles were 45°, 90°, 135° and 180°, as shown in Fig. 3.

3. Results

For the analysis of reaction times and error rates, the training sequence was removed from the data. The first trial in every block and every trial with a reaction time deviating by more than ±3 standard deviations from the individual's mean reaction time were also excluded from the analysis (244 excluded trials of 13,248 trials in total). Additionally, for the analysis of reaction times, every trial with an error and the following trial were eliminated since these trials could not validly be defined as switch or repeat trials (2985 excluded trials of 13,248 trials in total).
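The exclusion rules described above can be sketched for a single participant as follows. This is a minimal illustration under stated assumptions: the mean and the sample standard deviation are computed over all of one participant's trials (the text does not say whether this was done per condition), the first-trial-per-block rule is omitted for brevity, and the function names are hypothetical.

```python
from statistics import mean, stdev

# 23 participants x 6 blocks x 96 trials, as stated in the text.
TOTAL_TRIALS = 23 * 6 * 96


def rt_outlier_mask(reaction_times_ms, n_sd=3.0):
    """Flag trials whose RT deviates more than n_sd SDs from the participant's mean."""
    m = mean(reaction_times_ms)
    sd = stdev(reaction_times_ms)  # sample SD; the choice of estimator is an assumption
    return [abs(rt - m) > n_sd * sd for rt in reaction_times_ms]


def error_and_post_error_mask(correct):
    """Flag every error trial and the trial directly following it."""
    mask = [False] * len(correct)
    for i, was_correct in enumerate(correct):
        if not was_correct:
            mask[i] = True
            if i + 1 < len(correct):
                mask[i + 1] = True
    return mask


if __name__ == "__main__":
    print(TOTAL_TRIALS)
    print(error_and_post_error_mask([True, False, True, True, False]))
```

The post-error trial is dropped because, after an error, it is unclear which position the listener actually attended to, so the trial cannot be labelled as a switch or a repetition.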
Reaction times and error rates were submitted to two separate four-way within-subject analyses of variance (ANOVAs) with the factors reverberation time (RT), target's position (TP), auditory attention switch (AS) and congruency (C). A Kolmogorov-Smirnov test was used to test for normality (p > .05). In case Mauchly's test indicated that the assumption of sphericity was violated for the effect of the spatial target position variable (with three levels), the Huynh-Feldt correction was applied.

In reaction times, there was a non-significant trend towards smaller reaction times in the anechoic condition (anechoic: 1812 ms vs. low rev.: 1871 ms vs. high rev.: 1866 ms) [RT: F < 1]. The main effect of the target's position [TP: F(1.51, 33.13) = 32.96, p < .001] was significant. Reaction times were largest for trials where the target was positioned in the median plane and smallest for trials where the target was positioned in the frontal plane (Median: 1964 ms vs. Diagonal: 1865 ms vs. Frontal: 1719 ms). The interaction of reverberation time and position was not significant [RT × TP: F(3.42, 75.28) = 1.38, p = .25]. The main effect of the attention switch was significant and indicated larger reaction times for switches than for repetitions (1900 ms vs. 1799 ms) [AS: F(1, 22) = 17.11, p < .001] (cf. Fig. 4). The switch costs (reaction time differences between switch trials and repetition trials) amounted on average to 101 ms. The ANOVA yielded a significant interaction of reverberation time and attention switch [RT × AS: F(2, 44) = 3.45, p = .04], indicating greater switch costs for anechoic conditions than for reverberant conditions (anechoic: 156 ms vs. low rev.: 82 ms vs. high rev.: 63 ms).
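The switch-cost arithmetic can be made explicit with a short worked example. The numbers are the means reported above; the function name is hypothetical, and averaging the three per-condition costs without weights is an assumption that only serves as a plausibility check.

```python
def switch_cost(rt_switch_ms, rt_repeat_ms):
    """Switch cost: mean RT on switch trials minus mean RT on repetition trials."""
    return rt_switch_ms - rt_repeat_ms


# Overall means reported in the text (switch vs. repetition).
overall_cost_ms = switch_cost(1900, 1799)

# Per-condition switch costs as reported for the RT x AS interaction.
reported_costs_ms = {"anechoic": 156, "low": 82, "high": 63}
mean_condition_cost_ms = sum(reported_costs_ms.values()) / len(reported_costs_ms)

if __name__ == "__main__":
    print(overall_cost_ms, round(mean_condition_cost_ms, 1))
```

The unweighted mean of the three per-condition costs is about 100 ms, in line with the overall switch cost of 101 ms up to rounding of the reported means.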


Table 2. Reaction times (in ms) and error rates (in %) for different target positions and different angles between target and distractor.

Position   Angle   Reaction Time   Error Rate
Median     45°     1981            13.4
Median     90°     1917            11.8
Median     135°    1916            12.1
Median     180°    2033            27.1
Diagonal   45°     1918            21.1
Diagonal   90°     1860            19.4
Diagonal   135°    1870            10.7
Diagonal   180°    1744            8.3
Frontal    45°     1960            23.3
Frontal    90°     1738            9.3
Frontal    135°    1605            8.7
Frontal    180°    1613            6.0

Post-hoc tests show a significant difference between repetitions of the anechoic and the high reverberation condition (p < :05). The two-way interaction of the target's position and the attention switch, as well as the three-way interaction with reverberation time was not significant ½TP  AS : Fð2; 44Þ ¼ 2:96; p ¼ :06, ½RT  TP  AS : F < 1. There was a non-significant trend towards greater reaction times for incongruent trials than for congruent trials (incongruent: 1885 ms vs. congruent: 1814 ms) ½C : Fð1; 22Þ ¼ 1:38; p ¼ :18. All two-way, three-way and the four-way interactions including congruency did not turn out to be significant ½RT  C : F < 1, ½TP  C : Fð2; 44Þ ¼ 1:00; p ¼ :37, ½AS  C : F < 1, ½RT  TP  C : Fð4; 88Þ ¼ 1:20; p ¼ :32, ½RT  AS  C : Fð1:71; 37:55Þ ¼ 2:17; p ¼ :14, ½TP  AS  C : F < 1, ½RT  TP  AS  C : F < 1. Summarizing, the effect of attention switch turned out to be significant and also interacted with the reverberation time. Furthermore, the position of the target speaker was a significant main effect. In error rates, there was a significant main effect of reverberation time ½RT : Fð2; 44Þ ¼ 3:94; p ¼ :03, indicating smaller error rates in anechoic conditions than in reverberant conditions (anechoic: 12:4% vs. low rev.: 13:4% vs. high rev.: 14:8%). The main effect of the target's position ½TP : Fð2; 44Þ ¼ 23:87; p < :001 was significant. Error rates were highest for trials where the target was positioned in the median plane and smallest for trials where the target was positioned in the frontal plane (Median: 15:9% vs. Diagonal: 14:9% vs. Frontal: 9:8%). The interaction of reverberation time and target's position was not significant ½RT  TP : F < 1. The main effect of congruency was significant ½C : Fð1; 22Þ ¼ 231:76; p < :001 indicating greater error rates for incongruent trials than for congruent trials (incongruent: 23:4% vs. congruent: 3:7%). 
The ANOVA yielded a significant interaction of reverberation time and congruency ½RT  C : Fð2; 44Þ ¼ 5:38; p < :01, indicating increasing error rates in incongruent trials with increasing reverberation time, while congruent trials lead to the same error rate for all tested reverberation times (congruent: z3:7%) (anechoic: 21:1% vs. low rev.: 23:1% vs. high rev.: 26:1%)(cf. Fig. 5). The interaction of the target's position and the congruency turned out to be significant ½TP  C : Fð2; 44Þ ¼ 27:74; p < :001. The congruency effect is greatest in Median plane and smallest in Frontal plane (difference in error rates between congruent and incongruent trials: Median: 26:0% vs. Diagonal: 20:4% vs. Frontal: 12:9%). The ANOVA yielded no significant three-way interaction of reverberation time, target's position and congruency ½RT  TP  C : F < 1. No significant attention switch effect was found in error rates ½AS : F < 1. All two-way, three-way and the four-way interactions including attention switch did not turn out to be significant

½RT  AS : F < 1, ½TP  AS : Fð2; 44Þ ¼ 2:38; p ¼ :11, ½AS C : F < 1, ½RT  TP  AS : Fð4; 88Þ ¼ 2:32; p ¼ :86, ½RT  AS  C : F < 1, ½TP AS  C : F < 1, ½RT  TP  AS  C : Fð4; 88Þ ¼ 1:58; p ¼ :19. Error rates are displayed in Fig. 5 as a function of reverberation time, target's position and congruency. For purpose of clarity and since attention switch did not yield a significant main effect or any significant interaction, the variable of attention switch is not included in the figure. Summarizing, the main effect of reverberation time was significant. Furthermore, the effect of congruency turned out to be significant and also interacted with the reverberation time. Another interaction was found between congruency and position of the target speaker, which was also a significant main effect. In a second analysis the spatial arrangement between target and distractor was analyzed. Therefore reaction times and the error rates were submitted to two separate 3-waywithin-subject analysis of variances with the variables of reverberation time (RT), target's position (TP) and spatial angle between target and distractor (ANG). In reaction times, there was no significant effect of reverberation ½RT : F < 1. The main effect of target's position ½TP : Fð2; 44Þ ¼ 28:43; p < :001 was significant. Reaction times were largest for trials where the target was positioned in the median plane and smallest for trials where the target was positioned in the frontal plane (Median: 1962 ms vs. Diagonal: 1848 ms vs. Frontal: 1729 ms). The interaction between reverberation time and position was not significant ½RT  TP : Fð3:31; 72:88Þ ¼ 1:00; p > :40. The main effect of the spatial angle between target and distractor was significant ½ANG : Fð3; 66Þ ¼ 13:25; p < :001, indicating highest reaction times for adjacent speakers (spatial angle of 45 between target and distractor) (1953 ms) and significantly lower reaction times for trials with a spatial separation of 90 (1839 ms). 
When target and distractor were located in two separate quadrants (i.e., 135° and 180°), reaction times were again significantly lower (1797 ms and 1798 ms). The interaction between reverberation time and spatial angle was not significant [RT × ANG: F < 1]. The two-way interaction of position and spatial angle was significant [TP × ANG: F(3.29, 72.46) = 9.64, p < .001]. For a spatial angle of 45°, reaction times did not differ significantly between target's positions (Median: 1981 ms vs. Diagonal: 1960 ms vs. Frontal: 1918 ms). However, for a spatial angle of 180°, reaction times differed significantly between target's positions (Median: 2033 ms vs. Diagonal: 1744 ms vs. Frontal: 1613 ms). All reaction times are displayed in Table 2. The three-way interaction was not significant [RT × TP × ANG: F(8.95, 196.90) = 1.14, p = .34].

In error rates, there was no significant effect of reverberation [RT: F(2, 46) = 2.25, p = .12]. The main effect of the target's position [TP: F(2, 46) = 10.60, p < .001] was significant. Error rates were largest for trials where the target was positioned in the median plane and smallest for trials where the target was positioned in the frontal plane (Median: 16.1% vs. Diagonal: 14.9% vs. Frontal: 11.8%). The interaction between reverberation time and position was not significant [RT × TP: F < 1]. The main effect of the spatial angle between target and distractor was significant [ANG: F(3, 69) = 20.60, p < .001], indicating highest error rates for adjacent speakers (spatial angle of 45° between target and distractor; 19.3%) and significantly lower error rates for trials with a spatial separation of 180° (13.8%). Error rates for trials where target and distractor were separated by 90° were comparable (13.5%). The lowest error rates were found for a spatial separation of 135°, amounting to 10.5%. The interaction between reverberation time and spatial angle was not significant [RT × ANG: F(6, 138) = 2.17, p = .053].
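As a reading aid for the F statistics reported for the reaction-time analyses, the following minimal sketch shows how the degrees of freedom of a within-subject effect follow from the number of factor levels and the number of subjects. The function name `rm_anova_df` is hypothetical, the sample size of 23 is taken from the abstract, and reading the fractional values as the result of a sphericity correction (with the factor `eps` back-calculated from the reported values) is an assumption, not a detail stated in this section.

```python
from math import prod

def rm_anova_df(levels, n_subjects, epsilon=1.0):
    """Return (df_effect, df_error) for a fully within-subject effect.

    levels     -- number of levels of each factor involved in the effect
    n_subjects -- number of participants
    epsilon    -- sphericity correction factor (1.0 means uncorrected)
    """
    effect = prod(k - 1 for k in levels)
    return effect * epsilon, effect * (n_subjects - 1) * epsilon

N = 23  # sample size given in the abstract

df_tp = rm_anova_df([3], N)        # target's position, 3 levels
df_ang = rm_anova_df([4], N)       # spatial angle, 4 levels
df_rt_tp = rm_anova_df([3, 3], N)  # reverberation time x target's position

# Assumption: the reported F(3.31, 72.88) reflects a correction factor,
# back-calculated here from the reported error degrees of freedom.
eps = 72.88 / df_rt_tp[1]
df_corrected = rm_anova_df([3, 3], N, eps)
```

Under these assumptions, the uncorrected values come out as (2, 44), (3, 66) and (4, 88), and a correction factor of roughly 0.83 reproduces the fractional degrees of freedom reported for the RT × TP interaction in reaction times.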

Please cite this article in press as: Oberem, J., et al., Intentional switching in auditory selective attention: Exploring attention shifts with different reverberation times, Hearing Research (2017), https://doi.org/10.1016/j.heares.2017.12.013


The two-way interaction of position and spatial angle was significant [TP × ANG: F(3.79, 87.22) = 41.25, p < .001]. Findings in error rates matched those of reaction times. For a spatial angle of 45°, error rates did not vary widely between target's positions (Median: 13.4% vs. Diagonal: 21.1% vs. Frontal: 23.3%). However, for a spatial angle of 180°, error rates differed significantly between target's positions (Median: 27.1% vs. Diagonal: 8.3% vs. Frontal: 6.0%). All error rates are displayed in Table 2. The three-way interaction was not significant [RT × TP × ANG: F(8.95, 196.90) = 1.19, p = .29]. In summary, the second analysis showed significant main effects of target's position and of the spatial angle between target and distractor, as well as their interaction, in reaction times and error rates.

4. Discussion

In line with Kidd et al. (2005a) as well as Darwin and Hukin (2000b), reverberation significantly degraded performance in error rates, by about 2.4% between the anechoic and high reverberation conditions. In addition to error rates, and in contrast to the cited investigations, reaction times were also analyzed; here, only a non-significant trend towards greater reaction times for higher reverberant energy was found. However, the most important finding of this study was the interaction between reverberation and attention switch in reaction times. In accordance with previous findings [Koch et al. (2011); Lawo et al. (2014); Lawo and Koch (2015); Oberem et al. (2014); Fels et al. (2016)], the attention switch effect was significant in reaction times, amounting to switch costs of about 100 ms. These switch costs depended strongly on the reverberation time, with a maximum difference of 93 ms between the anechoic and the high reverberation condition (switch cost, anechoic = 156 ms; switch cost, high reverberation = 63 ms).
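The switch-cost arithmetic can be made explicit with a minimal sketch. It assumes that switch costs are computed as the mean reaction time on switch trials minus the mean reaction time on repetition trials, in line with the definition given in the abstract. The function name `switch_cost` and the four example trials are hypothetical and not data from the study; only the two reported switch costs (156 ms and 63 ms) are taken from the text.

```python
from statistics import mean

def switch_cost(trials):
    """Mean RT on switch trials minus mean RT on repetition trials (ms).

    trials -- iterable of (trial_type, reaction_time_ms) pairs, where
              trial_type is either "switch" or "repeat"
    """
    switch = [rt for kind, rt in trials if kind == "switch"]
    repeat = [rt for kind, rt in trials if kind == "repeat"]
    return mean(switch) - mean(repeat)

# Hypothetical trials for illustration only (not data from the study).
example_trials = [
    ("repeat", 1700), ("repeat", 1760),
    ("switch", 1880), ("switch", 1900),
]
example_cost = switch_cost(example_trials)  # 1890 - 1730 = 160 ms

# Switch costs reported in the text, in ms.
reported = {"anechoic": 156, "high reverberation": 63}
reduction = reported["anechoic"] - reported["high reverberation"]  # 93 ms
```

The last line reproduces the 93 ms difference between the anechoic and the high reverberation condition.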
This decrease in switch costs with increasing reverberation was driven by repetition trials, whose reaction times increased with increasing reverberation. While reaction times under the anechoic condition amounted on average to 1734 ms, they were up to 100 ms greater under the high reverberation condition. Reaction times for switch trials did not differ significantly between reverberation conditions (max. difference: 23 ms). Consequently, intentionally switching auditory selective attention was so demanding in itself that additional reverberant energy did not make the task more difficult with regard to reaction times. Independent of switch or repetition trials, a significant interaction of reverberation and congruency was observed in error rates. The congruence effect was mainly reflected in error rates, amounting to a difference of about 20% between congruent and incongruent trials (i.e., stimuli of target and distractor call for the same response or for different responses, respectively). The interaction with reverberant energy showed how the task of focusing on and processing the target speaker while ignoring the speech of the distracting speaker was influenced by reverberation. A difference of up to 5% in error rates between the anechoic and the high reverberation condition was observed for incongruent trials, while error rates for congruent trials were not affected by the reverberant energy. The congruence effect can be taken as an implicit performance measure of attending to task-relevant information and filtering out irrelevant information [Koch et al. (2011)]; hence, the conclusion was drawn that reverberation significantly affected attending to the target and filtering out the interferer. To take a closer look at the mismatch between the results of Culling et al. (2003) and Kidd et al. (2005a), the separation angle between target and distractor was analyzed as a function of the target's position in space. In reaction times and in error rates there was no


significant interaction between reverberation and the separation angle of the speakers. Therefore, the results of Culling et al. (2003), who found a significant improvement for spatially separated talkers under anechoic conditions but no improvement under reverberant conditions, could not be confirmed. However, despite differences in the method of creating different reverberation times (modeled room vs. real room), the present findings were in agreement with the statement of Kidd and colleagues that “the large spatial advantage […] appears to be unaffected by a significant degradation of the inter-aural time and level difference cues [which] suggests that the role of spatial separation of the sources in the perceptual segregation of images is robust” [Kidd et al. (2005a)]. Regardless of reverberation, a significant interaction between the target's spatial position and the separation angle of target and distractor was found in reaction times and error rates. The target's position significantly influenced performance, yielding the greatest error rates and reaction times in the median plane and the lowest values in the frontal plane [Oberem et al. (2014); Fels et al. (2016)]. This effect was based on different degrees of difficulty in localization, especially in the extreme cases of the median plane (ITD and ILD information vanish) and the frontal plane (ITD and ILD information are maximal). The significant main effect of spatial angle between talkers yielded the best performance when target and distractor were separated by 180° and the worst performance for adjacent speakers (45°) [Middlebrooks and Onsan (2012); Zurek (1993); Yost (1997); Bronkhorst (2015)]. The most notable result in the interaction of these variables was that reaction times and error rates were greatest when target and distractor were both positioned in the median plane (Position: Median, Angle: 180°) compared to all other combinations.
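The statement that interaural time differences (ITD) vanish in the median plane and are maximal for sources at the side can be illustrated with a minimal sketch. It uses Woodworth's spherical-head approximation, which is not part of this study and is swapped in here purely for illustration; the function name `woodworth_itd`, the head radius of 8.75 cm and the speed of sound of 343 m/s are assumed values. Interaural level differences (ILD) are not modeled.

```python
import math

def woodworth_itd(azimuth_deg, head_radius_m=0.0875, speed_of_sound=343.0):
    """Approximate ITD in seconds for a distant source (rigid-sphere model).

    azimuth_deg -- source azimuth between 0 (straight ahead, median plane)
                   and 90 (directly to one side)
    """
    theta = math.radians(azimuth_deg)
    return head_radius_m / speed_of_sound * (theta + math.sin(theta))

itd_median = woodworth_itd(0)   # source in the median plane
itd_diagonal = woodworth_itd(45)
itd_side = woodworth_itd(90)    # source directly to the side
```

Under these assumptions the ITD is exactly zero for a source straight ahead and grows monotonically to roughly 0.66 ms for a source directly to the side, which matches the ordering of localization difficulty described above.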
Fundamentals of binaural hearing explain localization challenges due to vanishing ITD and ILD, but it was remarkable that this effect could also be observed in reaction times and error rates of an experiment focusing on auditory selective attention.

5. Conclusions

Reverberation has a detrimental effect on reaction times when attention is maintained on one source at a constant spatial location. Intentionally switching attention to a sound source at a different spatial location, however, is in itself so demanding that additional reverberant energy has no further impact. Furthermore, the human ability to ignore, or rather not to process, the content of a distracting source is significantly influenced by reverberation. Even though the effects of the target's and the distractor's spatial positions and their separation angle were not influenced by reverberation, it became clear that localization cues (ITD, ILD) have a relevant effect on performance (reaction times and error rates) in tasks of auditory selective attention.

Acknowledgments

The authors are grateful for the funding provided by the DFG (Deutsche Forschungsgemeinschaft, FE1168/1-2 and KO2045/11-2). Special thanks go to Nora Schürhoff, who assisted with data collection.

References

Allen, K., Alais, D., Carlile, S., 2009. Speech intelligibility reduces over distance from an attended location: evidence for an auditory spatial gradient of attention. Percept. Psychophys. 71 (1), 164–173.
Best, V., Gallun, F.J., Carlile, S., Shinn-Cunningham, B.G., 2007. Binaural interference and auditory grouping. J. Acoust. Soc. Am. 121 (2), 1070–1076.
Best, V., Shinn-Cunningham, B.G., Ozmeral, E.J., Kopčo, N., 2010. Exploring the


benefit of auditory spatial continuity. J. Acoust. Soc. Am. 127 (6), EL258–EL264.
Blauert, J., 1997. Spatial Hearing - the Psychophysics of Human Sound Localization, second ed. MIT Press, Cambridge MA, pp. 11–12, 155–158.
Broadbent, D.E., 1958. Perception and Communication. Pergamon, Oxford.
Bronkhorst, A.W., 2015. The cocktail-party problem revisited: early processing and selection of multi-talker speech. Atten. Percept. Psychophys. 77 (5), 1465–1487.
Cherry, E.C., 1953. Some experiments on the recognition of speech, with one and two ears. J. Acoust. Soc. Am. 25 (5), 975–979.
Culling, J.F., Hodder, K.I., Toh, C.Y., 2003. Effects of reverberation on perceptual segregation of competing voices. J. Acoust. Soc. Am. 114 (5), 2871–2876.
Darwin, C.J., Hukin, R.W., 2000a. Effectiveness of spatial cues, prosody, and talker characteristics in selective attention. J. Acoust. Soc. Am. 107 (2), 970–977.
Darwin, C.J., Hukin, R.W., 2000b. Effects of reverberation on spatial, prosodic, and vocal-tract size cues to selective attention. J. Acoust. Soc. Am. 108 (1), 335–342.
Fels, J., Oberem, J., Koch, I., 2016. Examining auditory selective attention in realistic, natural environments with an optimized paradigm. Proc. Meet. Acoust. 28 (1), 50001.
German Standard, 03.2010. Calculation of Loudness Level and Loudness from the Sound Spectrum - Zwicker Method - Amendment 1: Calculation of the Loudness of Time-variant Sound.
Giguère, C., Abel, S.M., 1993. Sound localization: effects of reverberation time, speaker array, stimulus frequency, and stimulus rise/decay. J. Acoust. Soc. Am. 94 (2), 769–776.
Hartmann, W.M., 1983. Localization of sound in rooms. J. Acoust. Soc. Am. 74 (5), 1380–1391.
Ihlefeld, A., Shinn-Cunningham, B.G., 2008. Spatial release from energetic and informational masking in a selective speech identification task. J. Acoust. Soc. Am. 123 (6), 4369–4379.
Institute of Technical Acoustics, RWTH Aachen, 2012. Ita toolbox.org.
Commit-ID = f1b7328cc3ee226197116d61046ea4754bde97d4.
Kidd, G., Arbogast, T.L., Mason, C.R., Gallun, F.J., 2005a. The advantage of knowing where to listen. J. Acoust. Soc. Am. 118 (6), 3804–3815.
Kidd, G., Mason, C.R., Brughera, A., Hartmann, W.M., 2005b. The role of reverberation in release from masking due to spatial separation of sources for speech identification. Acta Acust. United Acust. 91 (3), 526–536.
Kiesel, A., Steinhauser, M., Wendt, M., Falkenstein, M., Jost, K., Philipp, A.M., Koch, I., 2010. Control and interference in task switching - a review. Psychol. Bull. 136 (5), 849–874.
Koch, I., Lawo, V., Fels, J., Vorländer, M., 2011. Switching in the cocktail party: exploring intentional control of auditory selective attention. J. Exp. Psychol. Hum. Percept. Perform. 37 (4), 1140–1147.
Lavandier, M., Culling, J.F., 2007. Speech segregation in rooms: effects of reverberation on both target and interferer. J. Acoust. Soc. Am. 122 (3), 1713–1723.
Lavandier, M., Culling, J.F., 2008. Speech segregation in rooms: monaural, binaural,

and interacting effects of reverberation on target and interferer. J. Acoust. Soc. Am. 123 (4), 2237–2248.
Lawo, V., Fels, J., Oberem, J., Koch, I., 2014. Intentional attention switching in dichotic listening: exploring the efficiency of nonspatial and spatial selection. Q. J. Exp. Psychol. 67 (10), 2010–2024.
Lawo, V., Koch, I., 2015. Attention and action: the role of response mappings in auditory attention switching. J. Cognit. Psychol. 27 (2), 194–206.
McAnally, K.I., Martin, R.L., 2014. Sound localization with head movement: implications for 3-D audio displays. Front. Neurosci. 8 (210), 1–6.
Masiero, B., Fels, J., 2011. Perceptually robust headphone equalization for binaural reproduction. In: 130th Audio Engineering Society Convention, p. 8388. New York, NY, US.
Middlebrooks, J.C., Onsan, Z.A., 2012. Stream segregation with high spatial acuity. J. Acoust. Soc. Am. 132 (6), 3896–3911.
Minnaar, P., Olesen, S.K., Christensen, F., Møller, H., 2001. Localization with binaural recordings from artificial and human heads. J. Audio Eng. Soc. 49, 323–336.
Nábělek, A.K., Robinson, P.K., 1982. Monaural and binaural speech perception in reverberation for listeners of various ages. J. Acoust. Soc. Am. 71 (5), 1242–1248.
Oberem, J., Koch, I., Fels, J., 2017a. Intentional switching in auditory selective attention: exploring age-related effects in a spatial setup requiring speech perception. Acta Psychol. 177, 36–43.
Oberem, J., Lawo, V., Koch, I., Fels, J., 2014. Intentional switching in auditory selective attention: exploring different binaural reproduction methods in an anechoic chamber. Acta Acust. United Acust. 100 (6), 1139–1148.
Oberem, J., Seibold, J., Koch, I., Fels, J., 2017b. Exploring influences on auditory selective attention by a static and a dynamic binaural reproduction. In: Fortschritte der Akustik: DAGA 2017, pp. 1154–1155.
Pashler, H.E., 1999. The Psychology of Attention. MIT Press, pp. 39–53.
Rakerd, B., Hartmann, W.M., 1985.
Localization of sound in rooms, II: the effects of a single reflecting surface. J. Acoust. Soc. Am. 78 (2), 524–533.
Ruggles, D., Shinn-Cunningham, B.G., 2011. Spatial selective auditory attention in the presence of reverberant energy: individual differences in normal-hearing listeners. J. Assoc. Res. Otolaryngol. 12, 395–405.
Schmitz, A., 1995. Ein neues digitales Kunstkopfmesssystem (a new digital measurement system for artificial heads). Acta Acust. United Acust. 81 (4), 416–420.
Schröder, D., 2012. Physically Based Real-Time Auralization of Interactive Virtual Environments. Ph.D. thesis. RWTH Aachen University, Aachen, Germany, pp. 183–194.
Yost, W.A., 1997. The cocktail party problem: forty years later. In: Gilkey, R.H., Anderson, T.R. (Eds.), Binaural and Spatial Hearing in Real and Virtual Environments. Lawrence Erlbaum Associates, Publishers, Mahwah, New Jersey, USA, pp. 329–347.
Zurek, P.M., 1993. Binaural advantages and directional effects in speech intelligibility. In: Acoustical Factors Affecting Hearing Aid Performance, 2, pp. 255–275.
