Switching of auditory attention in “cocktail-party” listening: ERP evidence of cueing effects in younger and older adults

Brain and Cognition 111 (2017) 1–12
Stephan Getzmann *, Julian Jasny, Michael Falkenstein
Leibniz Research Centre for Working Environment and Human Factors, D-44139 Dortmund, Germany

Article info

Article history: Received 7 April 2016; Revised 28 June 2016; Accepted 13 September 2016

Keywords: Aging; Speech perception; Attention; Cueing; Event-related potentials

Abstract

Verbal communication in a "cocktail-party situation" is a major challenge for the auditory system. In particular, changes in target speaker usually result in declined speech perception. Here, we investigated whether speech cues indicating a subsequent change in target speaker reduce the costs of switching in younger and older adults. We employed event-related potential (ERP) measures and a speech perception task in which sequences of short words were simultaneously presented by four speakers. Changes in target speaker were either unpredictable or semantically cued by a word within the target stream. Cued changes resulted in a smaller performance decline than uncued changes in both age groups. The ERP analysis revealed shorter latencies of the change-related N400 and late positive complex (LPC) after cued changes, suggesting an acceleration of context updating and attention switching. Thus, both younger and older listeners used semantic cues to prepare for changes in the speaker setting.

© 2016 Elsevier Inc. All rights reserved.

* Corresponding author at: Leibniz Research Centre for Working Environment and Human Factors, Ardeystraße 67, D-44139 Dortmund, Germany. E-mail address: [email protected] (S. Getzmann). http://dx.doi.org/10.1016/j.bandc.2016.09.006 0278-2626/© 2016 Elsevier Inc. All rights reserved.

1. Introduction

Speech comprehension in the presence of noise or concurrent talkers is one of the most challenging tasks for the auditory system. Older adults in particular have difficulties in the so-called cocktail-party situation (Burke & Shafto, 2008), and often fail to understand what has been said in a multi-speaker conversation (for review, see Wingfield & Stine-Morrow, 2000). These deficits are partly based on age-related changes in peripheral hearing (e.g., presbycusis) and in central auditory processing (Humes & Dubno, 2010). In addition, speech perception deficits have been related to declines in general cognitive abilities, such as working memory capacity, inhibitory control, and information processing speed (Van der Linden et al., 1999; for review, see Schneider, Pichora-Fuller, & Daneman, 2010). According to the inhibition deficit hypothesis (Hasher & Zacks, 1988), for example, reduced inhibitory control might lead to attentional deficits, resulting in greater distraction by concurrent auditory stimuli in older than in younger adults.

In general, successful speech perception in a cocktail-party situation depends on auditory scene analysis, consisting of auditory object formation and segregation (Bregman, 1994), and on focusing auditory attention on the speaker of interest (for review, see Shinn-Cunningham, 2008). Auditory scene analysis and focusing of attention appear to be (at least partly) object-based (for review, see Alain & Arnott, 2000). According to the object-based hypothesis of auditory attention, an auditory object is defined as the coherent whole of a group of sounds emanating from a single source. Thus, using perceptual grouping principles (Bregman, 1994), we can selectively focus our attention on perceptual objects derived from the auditory scene.

There is evidence that the processes underlying auditory scene analysis and attentional orienting are affected by aging. In particular, concurrent stream segregation, i.e., the ability to segregate speech sounds from concurrently presented sounds, appears to be impaired (for review, see Alain, Dyson, & Snyder, 2006), whereas sequential auditory scene analysis is apparently preserved in older adults (Snyder & Alain, 2007). Using behavioral and electrophysiological measures, Snyder and Alain (2005) found age-related differences in concurrent vowel segregation, indicating that older adults were less able than younger ones to use vocal differences between talkers to separate their speech streams. Older adults also needed more time for stream segregation in the presence of concurrent speech than younger ones (Getzmann & Näätänen, 2015). Furthermore, using event-related potential (ERP) measures and a dynamic multi-talker speech perception task, it could be demonstrated that attentional control (i.e., the switching of attention between different speakers) was delayed in older adults (Getzmann, Wascher, & Falkenstein, 2015a).

A number of imaging studies have examined the neural substrates of auditory attention in speech processing. These studies revealed that the initiation of an auditory attentional shift resulted in


cortical activity changes mainly in the superior parietal lobule (SPL), the temporo-parietal junction (TPJ), and some regions of the prefrontal cortex (PFC) (e.g., Larson & Lee, 2014; Shomstein & Yantis, 2006). Notably, these regions are also involved in goal-directed attentional control in vision (Corbetta & Shulman, 2002; Serences & Yantis, 2006), suggesting a supramodal mechanism of attention shifting (Corbetta, Patel, & Shulman, 2008). Interestingly, different neural mechanisms of attentional control (i.e., the search for a task-relevant auditory object in a complex auditory scene) and attentional selection (i.e., the focusing of attention toward that object of interest and the inhibition of concurrent sound) have been observed: While attentional control activated a left-dominant, fronto-parietal network, mainly comprising the inferior frontal gyrus (IFG), SPL, and intraparietal sulcus (IPS), attentional selection recruited a network of cortical areas including the bilateral superior temporal gyrus (STG), superior temporal sulcus (STS), and IPS (Alho, Rinne, Herron, & Woods, 2014; Hill & Miller, 2010). In particular, the left IPS is assumed to be an integrative, multi-modal center that coordinates attention. Comparing the brain networks processing (uncued) involuntary and (cued) voluntary auditory attention shifts revealed that the areas most selectively involved in cued shifting included the superior/posterior IPS, SPL, and some precentral areas, indicating these regions' involvement in top-down/voluntary attentional control (Huang, Belliveau, Tengshe, & Ahveninen, 2012; Rossi, Huang, Furtak, Belliveau, & Ahveninen, 2014). Taken together, these neurophysiological findings indicate a distributed cortical network of attention shifting, with the involvement of different areas depending on stimulus- and task-relevant factors.
There is evidence that auditory scene analysis and focusing of attention are time-consuming (Cusack, Deeks, Aikman, & Carlyon, 2004; Larson & Lee, 2013a; Shinn-Cunningham & Best, 2008; for electrophysiological evidence, see Kerlin, Shahin, & Miller, 2010; for a review, see Fritz, Elhilali, David, & Shamma, 2007). Using ERP measures, it has been demonstrated that the attention-dependent process that integrates successive tones within streams takes several seconds to build up (Snyder, Alain, & Picton, 2006). Accordingly, continuity in the auditory scenery results in more efficient object selection and focusing of attention toward the features of a relevant source, whereas changes in the scenery (e.g., switches in speaker locations and voices) require renewed object formation/segregation and attentional re-focusing (Best, Ozmeral, Kopco, & Shinn-Cunningham, 2008). Moreover, deliberate switches from one speaker of interest to another, previously unattended speaker have been found to produce switch costs, indicated by increased error rates and reaction times after a change, relative to the pre-change level (Koch, Lawo, Fels, & Vorländer, 2011). This was also demonstrated in two recent studies in which the listening costs associated with shifts in spatial attention were tested in dynamic multi-speaker environments. Using conversational turn-taking in a sentence recall task, a pronounced decrease in word recall was found when a target speaker switched location while talking, relative to a non-switching condition (Lin & Carlile, 2015). Significant correlations between switching performance and individual cognitive measures were found, indicating that conversational tracking requires cognitive resources. In particular, working memory, required for maintaining task-relevant information and extracting meaning in adverse listening conditions (Baddeley, 2003; Engle & Kane, 2004), appeared to be essential for speech perception in conversational turn-taking.
Using electrophysiological measures, we recently investigated the effect of changes in speaker settings on the speech perception of younger and older adults (Getzmann, Hanenberg, Lewald, Falkenstein, & Wascher, 2015b). In a simulated stock-market scenario, sequences of short words (combinations of company names and values) were simultaneously presented by four speakers at different locations in space, and the participants responded to the value of a target company. While continuity in the target speaker's voice and location resulted in improved performance, an unexpected and unpredictable change in target speaker caused an increase in error rates. This held even more for older participants, who needed more time to re-achieve the pre-change level of performance. The analysis of the event-related potentials (ERPs) before and after a change revealed an N400diff and a late positive complex (LPCdiff) over parietal areas in both age groups. A similar pattern of activation has recently been reported in a multi-talker word-pair semantic categorization task in which younger and middle-aged adults responded to an attended stream of words while ignoring competing speech from a different location (Davis & Jerger, 2014; Davis, Jerger, & Martin, 2013). These components have been interpreted as correlates of increased evaluation of stimulus meaning and context updating, as well as attention switching, elicited by a change of the target speaker (Davis & Jerger, 2014; Getzmann, Hanenberg et al., 2015b). Similar patterns of N400 and LPC are also usually observed after language switching, i.e., when speech information switches from one language to another, which has been related to increased processing at the level of the lexico-semantic system and to updating or reanalysis processes (e.g., Moreno, Federmeier, & Kutas, 2002; Van Der Meij, Cuetos, Carreiras, & Barber, 2011). Taken together, changes in speaker settings in multi-speaker environments appear to be critical events in speech perception for younger and older adults that are associated with increased cortical processing. It has to be noted, however, that changes in speaker settings are usually not completely unexpected for the listener.
In a realistic conversation, a change from one speaker of interest to another may be indicated by modulations in prosodic features and intonation, as well as by the semantic content. For example, the structure of an interrogative sentence typically indicates a subsequent change between speakers, i.e., from the questioner to the potentially answering person. Likewise, the semantic end of a sentence might indicate that another speaker will probably continue the conversation. Hence, the question arises whether listeners actively use this information to prepare a switch in attention. Such preparation has the potential to improve object re-selection and attentional re-focusing, and may thereby reduce the switch costs after a change. In the present study, we investigated the effects of cueing an upcoming change in speaker setting on the speech perception of younger and older adults. We therefore modified our previous stock-price monitoring task and introduced a condition in which an upcoming change of the target speaker was indicated by a cue. This verbal cue was specified by task-relevant semantic content, i.e., by a keyword consisting of a specific company value. By comparing performance in speech perception after cued vs. uncued changes, we tested whether cueing information helps listeners manage dynamic multi-speaker situations, and whether younger and older adults differ in the use of cueing information. The analysis of the ERPs focused on the change-related components observed in previous studies (Davis & Jerger, 2014; Davis et al., 2013; Getzmann, Hanenberg et al., 2015b). In particular, we tested whether cued changes are associated with differences in N400diff and LPCdiff amplitudes and latencies. In addition, we analyzed possible effects of cueing and age on the ERPs that are typically elicited by speech onset. Here, the most pronounced effects were expected for the N2 component.
The N2 is usually assumed to reflect inhibitory control and the suppression of irrelevant information (Folstein & Van Petten, 2008), and has been found to be reduced during speech perception in older adults (Getzmann, Lewald, & Falkenstein, 2014; Getzmann, Wascher et al., 2015a). Finally, we investigated whether younger and older adults differ in cue processing by analyzing the ERPs elicited by the cue stimulus.


2. Methods

2.1. Participants

A total of 22 younger (11 female, mean age 24.8 years, age range 21–35 years) and 22 older (11 female, mean age 64.6 years, age range 57–72 years) adults participated in the study. The younger subjects were recruited from local colleges, and the older subjects were recruited via advertisements in newspapers of the city of Dortmund (Germany). All participants reported being right-handed and healthy, free of medication during the experimental sessions, and without any history of neurological, psychiatric, or chronic somatic problems. All subjects underwent pure-tone audiometry (Oscilla USB 330; Inmedico, Lystrup, Denmark) at 125–8000 Hz. The audiograms of all listeners indicated normal hearing below 4000 Hz (thresholds better than 35 dB hearing loss). However, at 4000 Hz we measured a mild (26–40 dB HL: 9 subjects) to moderate (41–70 dB HL: 4 subjects) presbycusis in the older group (Fig. 1C), and a between-group t-test of the mean audiometric thresholds (averaged across 125, 250, 500, 1000, 2000, and 4000 Hz) indicated elevated thresholds in the older participants relative to the younger ones (22.7 dB vs. 9.0 dB; t(42) = 7.69; p < 0.001). Before the experiment started, participants gave their written informed consent. The study conformed to the Code of Ethics of the World Medical Association (Declaration of Helsinki) and was approved by the local Ethics Committee of the Leibniz Research Centre for Working Environment and Human Factors, Dortmund, Germany.

2.2. Apparatus, stimuli and task

The experiment took place in a video-controlled, electrically shielded, sound-proof and sound-absorbing room (for details, see Getzmann, Hanenberg et al., 2015b). The participants sat in a vertically adjustable armchair. The position of the head was held constant by a chin rest. In front of the participants, a semicircular array of four broad-band loudspeakers (SC 5.9, Visaton, Haan, Germany) was mounted at a distance of 1.5 m from the center of

Fig. 1. (A) Schematic illustration of the simulated stock-price monitoring scenario with four loudspeakers at different locations in space. (B) Four different speakers simultaneously presented sequences of short company names and numbers. The participants responded to the value of a target company (here "Bosch", in bold print), while all other company names were irrelevant and should be ignored. Location and speaker of the target company were kept constant for a variable number of trials and then changed according to a pseudo-random scheme. The change was either cued (by a specific value of the target company, here "1", marked by a circle) or uncued. Sequences preceding (Pre) and following a change trial (Post1) were analyzed. (C) Hearing levels of the younger and older participants. Error bars are standard errors across participants (younger: N = 22; older: N = 22).


the head (Fig. 1A). The loudspeakers were arranged at ear level in the horizontal plane and were located at −45°, −15° (left), +15°, and +45° (right). All speech stimuli were digitally recorded in a sound-proof and anechoic environment using a freestanding microphone (MCE 91, Beyerdynamic, Heilbronn, Germany) and a mixing console (1202-VLZ PRO, Mackie, Woodinville, WA; sampling rate 48 kHz). They were processed offline using CoolEdit 2000 (Syntrillium Software Co., Phoenix, AZ, USA) and converted to analogue form via a computer-controlled external soundcard (Terrasoniq TS88 PCI, TerraTec Electronic, Nettetal, Germany). A red light-emitting diode (diameter 3 mm) located at a frontal (0°) position served as a steady fixation light. The speech stimuli consisted of eight one- to two-syllable names of companies (Audi, Bosch, Deutz, Eon, Gerri, Merck, Rhön, Tui) and ten one- to two-syllable German numerals (Eins [1], Zwei [2], Drei [3], Vier [4], Fünf [5], Sechs [6], Sieben [7], Acht [8], Neun [9], Zehn [10]), spoken by two male and two female monolingual native German speakers of young and middle age. The speakers did not have any dialect or speech disorders. The fundamental frequencies of their voices were 123 Hz and 126 Hz for the male speakers, and 162 Hz and 171 Hz for the female speakers. The duration of each speech stimulus (company names and numerals) was 500 ms, and the presentation level was about 65 dB(A). Word pairs consisting of a company's name and a numeral simulating its stock price (e.g., "Bosch—eins" ["Bosch—one"] or "Deutz—acht" ["Deutz—eight"]) were presented to the participants. The company names and numerals were separated by a 300-ms silent interval. Each company name was randomly combined with one of the ten numerals.
Participants were instructed to attend to a target company (either "Bosch" or "Deutz", balanced across participants), and to press the lower button of a keypad when the value of the target company was less than or equal to five (values "1", "2", "3", "4", "5"; each in 10% of trials), and the upper button when the value was greater than or equal to six (values "6", "7", "8", "9", "10"; each in 10% of trials), using the index and middle fingers of the dominant hand. If participants were not able to identify the target value, they were instructed to guess, i.e., to press either button. The target company and the following numerals were spoken by the same person and presented by the same loudspeaker for a variable number of trials. Concurrent company names paired with different numerals were simultaneously spoken by three other persons and presented by the three other loudspeakers (Fig. 1B). This concurrent speech information had to be ignored. A dynamic multi-speaker scenario was created by randomly changing the speakers and locations of the concurrent companies from trial to trial such that within each trial (a) all four loudspeakers were active, (b) each company name and value occurred only once, and (c) there were always two numerals within the range 1–5 and two within 6–10. After three, four, five, or six subsequent trials (4.5 trials on average), the speaker of the target company (voice and location) changed according to a pseudo-randomized scheme. There were two change conditions: In the cued condition, the change in target speaker and location was indicated by the value of the target company (either "1" or "10", balanced across participants). Thus, whenever the value of the target company was "1" (or "10"), the participant knew that the target speaker would change in the next trial.
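The trial constraints (a)–(c) can be made concrete with a short sketch. This is illustrative only, not the authors' stimulus software; the function name and structure are hypothetical:

```python
import random

COMPANIES = ["Audi", "Bosch", "Deutz", "Eon", "Gerri", "Merck", "Rhoen", "Tui"]
LOW, HIGH = list(range(1, 6)), list(range(6, 11))  # values 1-5 and 6-10

def make_trial(target_company, target_value):
    """Build one trial: four loudspeakers, each with a unique company/value pair.

    Constraints (as described in the Methods):
      (a) all four loudspeakers are active,
      (b) each company name and each value occurs only once,
      (c) exactly two values fall in 1-5 and two in 6-10.
    """
    # Three unique distractor companies, excluding the target.
    distractors = random.sample([c for c in COMPANIES if c != target_company], 3)
    # Choose distractor values so that, together with the target value,
    # two values lie in 1-5 and two in 6-10.
    if target_value <= 5:
        values = random.sample([v for v in LOW if v != target_value], 1) \
               + random.sample(HIGH, 2)
    else:
        values = random.sample(LOW, 2) \
               + random.sample([v for v in HIGH if v != target_value], 1)
    random.shuffle(values)
    pairs = [(target_company, target_value)] + list(zip(distractors, values))
    random.shuffle(pairs)  # random assignment to the four loudspeaker positions
    return pairs           # list index = loudspeaker position

trial = make_trial("Bosch", 1)  # e.g. one cued-change trial with target value "1"
```

Drawing the distractor values this way guarantees constraint (c) for any target value, because the target always contributes exactly one value to its own range.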
In the uncued condition, changes occurred after all other numbers, i.e., after "2" to "10" when "1" was the cue, and after "1" to "9" when "10" was the cue. The session began with a practice block (about 20 trials) to familiarize the participants with the task. Here, sequences of target companies and numbers were presented in silence, and examples of cued and uncued changes were given. Thereafter, a total of 912 trials was presented in two blocks, each lasting about 25 min. Among these trials, a total of 180 changes was presented, 90 of which were cued and 90 uncued. The two

blocks were separated by a short rest break. Each trial lasted 3.3 s: the duration of the word pairs was 1.3 s (500-ms company name, 300-ms silence, 500-ms numeral), leaving 2 s after the end of the numeral for the response. No feedback was given at any time during the experiment. In order to minimize possible effects of eye movements, the participants were instructed to focus on the central fixation light and to avoid any eye movements toward the target speaker position. The timing of the stimuli and the recording of the participants' responses were controlled by custom-written software.

2.3. Data recording and analysis

2.3.1. Behavioral data

The rates of correct responses (i.e., the percentage of responses in which a participant pressed the lower button when the value of the target company was less than or equal to five, and the upper button when the value was greater than or equal to six) were computed for the younger and older subjects, and analyzed for trials preceding a change (Pre) and following a change (Post1, Post2, Post3). The rates of correct responses were subjected to a three-way ANOVA with between-subjects factor Age (younger, older) and within-subjects factors Sequence (Pre, Post1, Post2, Post3) and Cue (cued, uncued). In addition, the effect of changes in speaker setting was assessed by subjecting the relative differences in correct responses before and after a change ((Post1 − Pre)/Pre × 100%) to a two-way ANOVA (Age, Cue). Given that response accuracy was favored over response speed, response times were not analyzed. Levene's tests were used to assess the homogeneity of variance, and the degrees of freedom were adjusted if variances were unequal. Only significant differences in homogeneity of variance between the groups were reported. Effect sizes were computed using the partial eta-squared coefficient (ηp²) to provide a more accurate interpretation of the practical significance of the findings.

2.3.2. EEG data

The continuous EEG was sampled at 2048 Hz using 64 electrodes and a BioSemi amplifier (Active Two; BioSemi, Amsterdam, Netherlands). Electrode positions were based on the international 10-10 system. The amplifier bandpass was 0.01–140 Hz. Horizontal and vertical eye positions were recorded by electrooculography (EOG) using six electrodes positioned around both eyes. Two additional electrodes were placed on the left and right mastoids. Electrode impedance was kept below 10 kΩ. The raw data were downsampled offline to 1000 Hz, digitally band-pass filtered (cut-off frequencies 0.5 and 25 Hz; slopes 48 dB/octave), and re-referenced to the linked mastoid electrodes. The data were segmented into 3200-ms stimulus-locked epochs covering the period from −100 to 3100 ms relative to speech onset. To avoid overlapping sequences, the epochs were chosen to be shorter than the trial duration. Data were then corrected for ocular artifacts using the procedure of Gratton, Coles, and Donchin (1983). Individual epochs exceeding a maximum-minimum difference of 200 µV and a maximum voltage step of 50 µV per sampling point were excluded from further analysis (automatic artifact rejection as implemented in the BrainVision Analyzer software, Version 1.05; Brain Products, Gilching, Germany). The remaining epochs were baseline-corrected to a 100-ms pre-stimulus window relative to speech onset. Trials containing correct responses were averaged for each participant. The ERP analysis focused on the pre- and post-change trials (Pre and Post1) and was divided into three parts. Firstly, peaks of the different ERP deflections were defined as the maximum positivity or negativity within particular 100-ms latency windows of the specific waveforms (P1: 10–110 ms at FCz; N1: 60–160 ms at Cz; P2: 145–245 ms at FCz; N2: 245–345 ms at Cz; relative to the onset


of the company names). The latency windows were centered on the latencies of the grand-average ERPs (averaged across all younger and older participants). ERP peak latencies were measured at electrode positions chosen to be commensurate with previous knowledge of the topographical scalp distribution of specific ERPs (for review, see Barrett, Neshige, & Shibasaki, 1987; Friedman, Simpson, & Hamberger, 1993; Lovrich, Novick, & Vaughan, 1988; Näätänen & Picton, 1987; Smith, Michalewski, Brent, & Thompson, 1980). The ERP latencies and amplitudes were subjected to three-way ANOVAs with between-subjects factor Age (younger, older) and within-subjects factors Sequence (Pre, Post1) and Cue (uncued, cued). Secondly, to delineate the change-related N400diff and LPCdiff components, difference waveforms were calculated by subtracting the Pre-change ERPs from the Post1-change ERPs. The amplitudes of the N400diff and LPCdiff were determined for each participant as the mean value of a 20-ms period centered at the individual peak latencies. In order to test potential differences in topography, the amplitudes of the N400diff and LPCdiff were analyzed within an array of 3 × 3 electrodes around the electrode position of maximal activation, resulting in two additional within-subjects factors, Frontality and Laterality, around the posterior electrode Pz (CP3, P3, PO3, CPz, Pz, POz, CP4, P4, PO4). The amplitudes of the N400diff and LPCdiff were subjected to four-way ANOVAs (Age, Cue, Frontality, and Laterality), and the latencies were subjected to two-way ANOVAs (Age, Cue). Finally, to investigate the processing of the cue indicating a subsequent change, the ERPs elicited by the cue word in Pre sequences were analyzed. These trials were baseline-corrected to a 100-ms pre-stimulus window relative to the number onset, i.e., to the onset of the critical information.
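The automatic rejection criteria from Section 2.3.2 and the windowed peak picking described above can be sketched in NumPy. This is a minimal illustration, assuming epochs stored as microvolt arrays at the downsampled 1000-Hz rate; it is not the BrainVision Analyzer implementation:

```python
import numpy as np

FS = 1000           # Hz, sampling rate after downsampling
EPOCH_START = -100  # ms, epoch onset relative to speech onset

def reject_artifacts(epochs, max_range_uv=200.0, max_step_uv=50.0):
    """Drop epochs exceeding a 200-uV max-min range or a 50-uV voltage
    step between successive sampling points (criteria from the Methods).
    epochs: array of shape (n_trials, n_samples), in microvolts."""
    range_ok = (epochs.max(axis=1) - epochs.min(axis=1)) <= max_range_uv
    step_ok = np.abs(np.diff(epochs, axis=1)).max(axis=1) <= max_step_uv
    return epochs[range_ok & step_ok]

def peak_latency(erp, win_ms, polarity):
    """Peak latency (ms re: speech onset) within a 100-ms window,
    e.g. win_ms=(245, 345) for the N2 at Cz.
    polarity: +1 for positive deflections (P1, P2), -1 for negative (N1, N2)."""
    i0 = int((win_ms[0] - EPOCH_START) * FS / 1000)
    i1 = int((win_ms[1] - EPOCH_START) * FS / 1000)
    idx = np.argmax(polarity * erp[i0:i1]) + i0   # most extreme point in window
    return idx * 1000 / FS + EPOCH_START
```

For example, an averaged waveform with its most negative point at 300 ms after speech onset yields `peak_latency(erp, (245, 345), -1) == 300`, which would then enter the latency ANOVAs.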
The ERPs were analyzed separately for the cued condition (i.e., for the 90 trials in which the number was equal to the cue value "1" or "10") and for the uncued condition (i.e., for the 90 trials in which all other numbers preceded a change trial). The ERP deflections (P1: 20–120 ms at FCz; N1: 70–170 ms at Cz; P2: 155–255 ms at FCz; N2: 250–550 ms at Cz; relative to the onset of the numbers) were subjected to two-way ANOVAs (Age and Cue). In order to control for the differences in audiometric thresholds between younger and older participants (see Section 2.1), the individual hearing level was used as a covariate in all ANOVAs (e.g., Howell, 2012).

2.3.3. EOG data

Accurate visual fixation on the central LED was assessed offline by analyzing the EOG data across the time window of 0–1000 ms after speech onset, covering the time range of the ERPs to speech onset (see Section 3.2) and the change-related ERPs (see Section 3.3). Data of the two horizontal EOG channels (one at the left eye and one at the right eye) were averaged such that gaze directions to the right (left) would result in positive (negative) amplitude values. In a first analysis, these values were averaged for each participant across trials and cue conditions, separately for each target position, and subjected to a two-way ANOVA with between-subjects factor Age and within-subjects factor Position (−45°, −15°, +15°, +45°). There was no main effect of Position (F(3,126) = 2.55) and no interaction of Age by Position (F(3,126) = 0.53; both p > 0.05; both ηp² < 0.06). Also, post-hoc t-tests did not indicate significant differences between target positions (all p > 0.05). In a second analysis, we tested whether the participants performed eye movements after a change in target position. We therefore computed the differences in EOG amplitudes between Post1 and Pre trials, separately for left-to-right (LR) and right-to-left (RL) changes in target positions.
In order to increase statistical power, we averaged across all possible LR and RL changes and both cue conditions, and subjected the EOG amplitudes to a two-way ANOVA with between-subjects factor Age and within-subjects factor Direction


(LR, RL). Again, there was no main effect of Direction (F(1,42) = 0.02) and no interaction of Age by Direction (F(1,42) = 0.86; both p > 0.05; both ηp² < 0.02).

3. Results

3.1. Performance

The rate of correct responses decreased after a change in target speaker settings in younger and older subjects, and subsequently re-approached the pre-change level (Fig. 2A). The decrease in performance was less pronounced when the change was cued. Accordingly, the ANOVA of the accuracy scores indicated a significant Sequence by Cue interaction (F(3,126) = 7.5; p < 0.001; ηp² = 0.15), besides a main effect of Sequence (F(3,126) = 71.0; p < 0.001; ηp² = 0.63). Older participants performed slightly worse than the younger ones (78.1% vs. 82.1%), but the main effect of Age (F(1,42) = 3.2; p = 0.083; ηp² = 0.07) did not reach statistical significance. Also, there were no interactions of Age by Cue or Age by Cue by Sequence (both p > 0.05; both ηp² < 0.03). Additional Bonferroni-corrected t-tests indicated that the difference in the rate of correct responses between the cued and uncued conditions in Post1 trials was significant in the younger (t(21) = 4.95; p < 0.001) and in the older group (t(21) = 2.40; p < 0.05). In order to circumvent known problems with submitting accuracy scores to parametric tests such as an ANOVA (e.g., Studebaker, McDaniel, & Sherbecoe, 1995), an additional analysis of the RAU-transformed accuracy data (Studebaker, 1985) was conducted. This ANOVA also indicated a significant Sequence by Cue interaction (F(3,126) = 4.5; p < 0.01; ηp² = 0.10) and a main effect of Sequence (F(3,126) = 64.2; p < 0.001; ηp² = 0.61), but only a marginal main effect of Age (F(1,42) = 3.7; p = 0.06; ηp² = 0.08) and no interactions of Age by Cue or Age by Cue by Sequence (both p > 0.05; both ηp² < 0.03).
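For reference, the rationalized arcsine (RAU) transform of Studebaker (1985) and the relative switch-cost measure from Section 2.3.1 can be sketched as follows. This is an illustration using the commonly cited form of the transform, not the authors' analysis code:

```python
import math

def rau(n_correct, n_total):
    """Rationalized arcsine transform (Studebaker, 1985).
    Maps a correct-response count to rationalized arcsine units, which
    stabilizes variance near floor and ceiling before applying an ANOVA."""
    theta = (math.asin(math.sqrt(n_correct / (n_total + 1)))
             + math.asin(math.sqrt((n_correct + 1) / (n_total + 1))))
    return 146.0 / math.pi * theta - 23.0

def switch_cost_percent(post1, pre):
    """Relative change in correct responses after a speaker change:
    (Post1 - Pre) / Pre * 100%."""
    return (post1 - pre) / pre * 100.0
```

Mid-range scores are left nearly unchanged (50 of 100 correct maps to roughly 50 RAU), while scores near 0% or 100% are stretched outward, which is why the RAU-based ANOVA serves as a check on the raw-accuracy ANOVA.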
In order to further analyze the effect of cueing on the decrease in performance in younger and older participants, the percentage change in correct responses in Post1 trials (relative to the pre-change level) was computed and submitted to a two-way ANOVA (Age, Cue). There was a main effect of Cue (F(1,42) = 16.0; p < 0.001; ηp² = 0.28), resulting from a less pronounced decrease in performance after cued than after uncued changes (−8.3% vs. −15.7%; Fig. 2B). However, there was no main effect of Age and no Age by Cue interaction (both p > 0.05; both ηp² < 0.03). Given the differences in audiometric thresholds both between the younger and older groups and within the older group (see Section 2.1), it was additionally tested whether performance in speech perception was related to basic hearing abilities. Pearson correlations were therefore computed, separately for younger and older participants, between the individual audiometric thresholds on the one hand and, on the other hand, the rates of correct responses (averaged across Pre, Post1, Post2, and Post3 trials and cue conditions) and the change-related declines in correct responses (Post1 minus Pre trials, averaged across cue conditions). There were no significant correlations, neither for the younger group (correct responses: r = 0.07; decline in correct responses: r = 0.16) nor for the older group (correct responses: r = 0.32; decline in correct responses: r = 0.41; all p > 0.05; Bonferroni-corrected). Thus, performance in speech perception and switch costs were evidently not correlated with the participants' basic hearing abilities.

3.2. ERPs to speech onset

The onset of the company names produced a typical fronto-central P1-N1-P2 complex (Fig. 3A). Especially the younger group

6

S. Getzmann et al. / Brain and Cognition 111 (2017) 1–12

Younger

Older

90

Correct Responses [%]

Correct Responses [%]

A 85 80 75 70 65 Pre

Post1

Change in Correct Responses [%]

85 80 75 70 65 Pre

Post2 Post3

Sequence

B

90

Post1

Post2 Post3 Sequence

uncued cued

20 10 0

uncued cued

-10 -20 -30 -40

Younger

Older

Fig. 2. (A) Rates of correct responses of younger and older adults for Pre, Post1, Post2, and Post3 sequences and for cued and uncued changes. Error bars are standard errors across participants (N = 22). (B) Relative changes in the rate of correct responses of younger and older adults (means and individual values) in Post1 sequences (relative to pre-change levels) for cued and uncued changes.

A

Younger Company

Cz

Older Company

Number

Number

P2

P2

P1

P1

N2 N1

N2 N1

+

Pre

2 µV

Post1 cued uncued

200 ms

B Younger

Older 350 Latency [ms]

Latency [ms]

350 325 300

300 275

275 0

325

0 Pre

Post1

Pre

Post1

uncued cued Fig. 3. (A) Grand-average ERPs of younger and older adults at Cz plotted as a function of time relative to the speech onset in Pre and Post1 sequences and for cued and uncued changes. (B) N2 latencies (means and individual values) of younger and older participants, shown for Pre and Post1 sequences and for cued and uncued changes.

7

S. Getzmann et al. / Brain and Cognition 111 (2017) 1–12

showed a second negative peak (N2) that appeared to be reduced in the older group. About 800 ms after the company onset, the number onset elicited a second complex of ERPs that was not further analyzed here. In the following sections, these ERP components are compared for cued and uncued changes, and between the younger and older groups.

P1. There were no main effects of Age, Cue, or Sequence on P1 amplitude or latency, and no interactions (all p > 0.05; all ηp² < 0.06).

N1. The N1 was larger for cued than uncued changes (cued: −3.8 µV vs. uncued: −3.3 µV; F(1,41) = 6.33; p < 0.05; ηp² = 0.13). There were no effects of Age or Sequence on N1 amplitude, and no interactions. Nor were there any effects on N1 latency (all p > 0.05; all ηp² < 0.07).

P2. There were no main effects of Age, Cue, or Sequence on P2 amplitude, and no interactions (all p > 0.05; all ηp² < 0.03). The P2 latency was reduced after a change (198 ms vs. 206 ms; F(1,41) = 7.92; p < 0.01; ηp² = 0.16). There were no main effects of Age or Cue, and no interactions, on P2 latency (all p > 0.05; all ηp² < 0.07).

N2. The N2 amplitude was increased after a change (−1.9 µV vs. −0.9 µV; F(1,41) = 18.62; p < 0.001; ηp² = 0.31). There was no effect of Age or Cue, and no interactions, on N2 amplitude (all p > 0.05; all ηp² < 0.06). The N2 latency was shorter in younger than older participants (295 ms vs. 308 ms; F(1,41) = 5.57; p < 0.05; ηp² = 0.12), and in the cued than uncued condition (298 ms vs. 305 ms; F(1,41) = 4.58; p < 0.05; ηp² = 0.10). A tendency toward a Cue by Sequence interaction (F(1,41) = 3.63; p = 0.064; ηp² = 0.08) suggested that the N2 latency was reduced after a cued change. This reduction in N2 latency after a cued change was modulated by Age, according to an Age by Cue by Sequence interaction (F(1,41) = 15.67; p < 0.001; ηp² = 0.28): While the older group showed a decreased N2 latency only after a cued change (and even an increase after an uncued change), the younger group showed a decrease after both cued and uncued changes (Fig. 3B).
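The change-related components (N400diff, LPCdiff) reported below are derived from Post1-minus-Pre difference waveforms, with peaks measured in component-specific latency windows. A minimal sketch of this kind of computation — the sampling rate, baseline, and search window here are illustrative assumptions, not the parameters of the authors' actual pipeline:

```python
import numpy as np

FS = 500        # assumed sampling rate [Hz]
BASELINE = 0.2  # assumed pre-stimulus interval [s] included in each epoch

def grand_diff(post1: np.ndarray, pre: np.ndarray) -> np.ndarray:
    """Difference waveform: mean Post1 ERP minus mean Pre ERP.

    `post1` and `pre` are (n_trials, n_samples) epochs from one
    electrode, time-locked to speech onset.
    """
    return post1.mean(axis=0) - pre.mean(axis=0)

def peak_latency(diff: np.ndarray, lo: float, hi: float,
                 negative: bool = True) -> float:
    """Latency [s after onset] of the peak within a search window,
    e.g. the negative deflection around 380 ms for an N400-like wave."""
    a = int((BASELINE + lo) * FS)
    b = int((BASELINE + hi) * FS)
    window = diff[a:b]
    idx = window.argmin() if negative else window.argmax()
    return (a + int(idx)) / FS - BASELINE
```

Subtracting the averaged pre-change waveform removes activity common to both trial types, so only change-related activity survives in the difference wave.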

3.3. Change-related ERPs

The difference waveform (Post1 minus Pre) revealed two deflections (Fig. 4): a negative one peaking around 380 ms (N400diff) and a positive one peaking around 800 ms (LPCdiff).

Fig. 4. Grand-average difference waveforms (Post1 minus Pre sequences) of younger and older adults at Pz plotted as a function of time relative to the speech onset, and topographies of the change-related ERPs (N400diff, LPCdiff) for cued and uncued changes. Grey rectangles mark the latency ranges in which – based on individual, single-subject data – amplitudes of N400diff and LPCdiff were computed.

N400diff. The N400diff had a centro-parietal topography, and the overall amplitude was smaller when the change was cued than uncued (−1.6 µV vs. −2.1 µV; F(1,41) = 4.37; p < 0.05; ηp² = 0.10; Fig. 5A). This cue-based reduction was especially pronounced over left parietal areas (Cue by Laterality interaction: F(2,82) = 4.88; p < 0.01; ηp² = 0.11; Cue by Frontality interaction: F(2,82) = 3.60; p < 0.05; ηp² = 0.08). Also, the N400diff latency was much shorter when the change was cued (349 ms vs. 410 ms; F(1,41) = 14.40; p < 0.001; ηp² = 0.26; Fig. 5A). There were no main effects of Age and no Age by Cue interactions (all p > 0.05; all ηp² < 0.05).

LPCdiff. The LPCdiff had a parietal topography in the younger participants and a centro-parietal topography in the older ones (Age by Frontality interaction: F(2,82) = 4.93; p < 0.01; ηp² = 0.11; Fig. 4). Also, the LPCdiff appeared to be left-lateralized in the younger group, but non-lateralized in the older group. However, the Age by Laterality interaction failed to reach clear statistical significance (p > 0.05; ηp² < 0.05). There was no effect of Cue on LPCdiff amplitude, but the LPCdiff latency was profoundly reduced in the cued condition (752 ms vs. 842 ms; F(1,41) = 7.83; p < 0.01; ηp² = 0.16; Fig. 5B). There were no main effects of Age and no Age by Cue interactions (all p > 0.05; all ηp² < 0.05).

Fig. 5. Amplitudes and latencies (means and individual values) of the change-related N400diff (A) and LPCdiff (B) of younger and older participants, shown for cued and uncued changes. The amplitudes were averaged across a parietal electrode array.

3.4. ERPs to cue onset

The onset of the number in Pre trials also elicited a complex of ERPs (Fig. 6) that were subjected to two-way ANOVAs (Age, Cue). These indicated a shorter P1 latency in older than younger participants (61 ms vs. 72 ms; F(1,41) = 10.10; p < 0.05; ηp² = 0.20). Also, older participants showed a greater N1 amplitude (−4.6 µV vs. −1.5 µV; F(1,41) = 9.86; p < 0.005; ηp² = 0.19) and a tendency toward a shorter N1 latency (116 ms vs. 128 ms; F(1,41) = 3.38; p = 0.073; ηp² = 0.08) than the younger ones. However, there were no Age by Cue interactions on any ERP amplitude or latency (all p > 0.05; all ηp² < 0.09).

Fig. 6. Grand-average ERPs of younger and older adults at Cz plotted as a function of time relative to the number onset in Pre sequences, shown separately for conditions in which the subsequent change was cued or uncued.

3.5. Summary of ERP results

In sum, the ERP analysis indicated (a) a shorter P1 latency and a larger and slightly earlier N1 to the cue words in older than younger participants, (b) an increase in N2 amplitude and a decrease in P2 latency after a change, and (c) a decrease in N2 latency after cued changes. In particular, the N2 latency of the older participants was reduced only after cued changes, whereas the N2 latency of the younger participants was reduced after cued and uncued changes. The N400diff and LPCdiff latencies were reduced, and the N400diff amplitude was smaller, after a cued than an uncued change. Finally, cue words elicited a smaller N2 than non-cues.

4. Discussion

A change in target speaker impaired performance in the present speech comprehension task in younger and older adults. This was indicated by a decrease in the rate of correct responses by about 16%, relative to the pre-change level. The decline in performance is in accordance with our previous results (Getzmann, Hanenberg et al., 2015b) and indicates the costs of attention switching as proposed by Koch et al. (2011). The decline is also in line with a recent study in which listening costs associated with conversational turn-taking were investigated in a sentence-recall task in younger adults (Lin & Carlile, 2015). Relative to a no-switch condition, shifts in speaker position within a sentence decreased word-recall performance by about 11%. These listening costs have been attributed to shifts in spatial attention. They appear to be associated with working memory capacity, relevant for maintaining goal-relevant information and extracting meaning from dynamic multi-speaker conversations (Lin & Carlile, 2015).

The present study revealed that speech comprehension in multi-talker situations was markedly improved when the change was announced by a cue word included in the target stream. In fact, the performance decline was reduced by about 47%, relative to the uncued condition. Effects of cueing on auditory processing have been reported in a number of previous studies. They are usually related to the so-called “spotlight model” of attention, in which auditory spatial attention can be allocated to a particular region or item in space, where the processing of stimuli is enhanced relative to stimuli presented away from the attended location (Allen, Alais, & Carlile, 2009; Mondor & Zatorre, 1995). Speech intelligibility benefits from knowing where and when to listen (Kidd, Arbogast, Mason, & Gallun, 2005; Kitterick, Bailey, & Summerfield, 2010). On the other hand, performance in a dynamic listening task decreased as a function of the spatial distance from the expected target location (Brungart & Simpson, 2007). Also, shifts in stimulus location or a change in the attended-to target have been shown to result in performance costs (Best, Gallun, Mason, Kidd, & Shinn-Cunningham, 2010; Best et al., 2008), in line with the assumption that auditory attention is object-based and that the representation of an auditory object builds up over time (e.g., Best et al., 2008).

The present data showed that cueing profoundly reduces the decline in speech comprehension after a change in speaker settings. However, they also indicated limitations in the preparation of switches in auditory attention, given that even cued changes clearly reduced performance. As proposed by Koch et al. (2011), this might suggest a considerable degree of “inertia” in the intentional control of auditory selection criteria. It should be noted, however, that the cues in the present study did not contain any information about the future location or voice of the target speaker, but rather unspecifically indicated a subsequent switch. The benefit in performance observed with cued changes clearly indicates that knowledge of a future change in target speaker setting allows for efficient preparation of the switch, even without knowing where and to whom to listen. In this regard, it should be noted that in the present task both voice and location of the target speaker changed.
While this scenario corresponds well to a realistic conversational turn-taking, it would also be important to distinguish between the effects of ongoing changes in voice or location in future research.

4.1. Cueing and change-related ERPs

By subtracting pre-change waveforms from post-change waveforms, processes of speech perception and comprehension common to both pre- and post-change trials were eliminated, and switch-related ERPs could be revealed. As in our previous study (Getzmann, Hanenberg et al., 2015b), a biphasic pattern of activation was observed over parietal scalp areas, consisting of the N400diff and LPCdiff components. The “classical” N400 is traditionally linked to the semantic processing of linguistic material (such as word recognition, phonological analysis, or integration of a word's meaning into the preceding context; for review, see Kutas & Federmeier, 2011), and has been shown to be linked to expectancies under degraded speech perception (e.g., Strauss, Kotz, & Obleser, 2013). In the present context, the N400diff could be related to the detection of the mismatch between the current and the expected linguistic input that occurred when the target speaker had changed: Instead of the expected target word (i.e., the name of the target company, spoken by the expected speaker at the expected location), a different word (i.e., a different company name) was perceived. In line with this hypothesis, it has been shown that the repeated presentation of words reduces the N400, whereas a violation of the repetition increases it. Moreover, there is evidence that the N400 amplitude depends on expectancies, being larger the more the current linguistic input differs from the expected input (for review, see Kutas, Van Petten, & Kluender, 2006). The mismatch between the expected target speaker voice/target location and the current speaker setting in the present experimental paradigm could thus elicit the N400diff, reflecting an enhanced phonological analysis of the linguistic input.

The N400diff was followed by the LPCdiff. The LPC in general is assumed to reflect processes of decision-making and target selection (Kok, 1997; Picton, 1992) as well as context updating and evaluation of stimulus meaning (Juottonen, Revonsuo, & Lang, 1996). In the present context, the LPCdiff could reflect increased stimulus evaluation and context updating after a change in target speaker. Interestingly, neuroimaging studies revealed that parietal brain areas, in particular the posterior parietal cortex (PPC; Corbetta et al., 2008; Shomstein & Yantis, 2006; for review, see Shomstein, 2012) and the right TPJ (e.g., Larson & Lee, 2014; Salmi, Rinne, Koistinen, Salonen, & Alho, 2009; for review, see Lee, Larson, Maddox, & Shinn-Cunningham, 2014), play an important role in voluntary attention switching, in both vision and audition (Corbetta et al., 2008). When listeners switched attention between two streams of different voices or between the two ears, activation in the PPC was higher than in a non-switch condition (Shomstein & Yantis, 2006). Similarly, the right TPJ was more engaged in switched than non-switched trials (Larson & Lee, 2013b; for review, see Lee et al., 2014). Using a cued attention-switching task, it could further be demonstrated that the right TPJ plays a specific role in switching of auditory spatial attention (Larson & Lee, 2014). On the other hand, left-lateralized areas (mainly the left IPS) have been proposed to act as a feature-independent integrative center that coordinates selective attention in complex auditory scenes (Hill & Miller, 2010). Although it is clear that ERP topographies cannot directly be related to underlying brain structures, the parietal N400diff and LPCdiff observed in the present study could reflect electrophysiological correlates of such processes.
Cortical source localization revealed a widely distributed set of brain regions co-active with the scalp-recorded N400 and LPC, including the anterior medial temporal lobe, middle and superior temporal areas, inferior temporal areas, and prefrontal areas (for reviews, see Kutas & Federmeier, 2011; Lau, Phillips, & Poeppel, 2008) and temporal-parietal brain areas (for review, see Polich, 2007), respectively. Thus, right temporo-parietal and left parietal areas could act as a filter for incoming stimuli by suppressing attention switching to distractors (Shulman, Astafiev, McAvoy, d’Avossa, & Corbetta, 2007), and by triggering the spatial reorientation of the focus of attention to the new speaker and location of interest (Corbetta & Shulman, 2002). Assuming similar processes to be active after a change in speaker setting, the activation over parietal areas observed in the present study could reflect a shift of attention toward the new target speaker and location, as indicated by the LPCdiff. Cueing affected both the N400diff and the LPCdiff. The decrease in N400diff and LPCdiff latencies suggested that the change-related processes were accelerated when the change was cued. The decrease in N400diff amplitude in the cued condition could result from a reduction in the perceived mismatch between the expected and the current linguistic input, given that the target word was no longer expected at the current location when the change was cued. In line with this interpretation, the N400 to an unexpected input is usually smaller when the probability (and hence expectancy) for the deviant input is increased (for review, see Kutas et al., 2006). In particular, the N400 is smaller, the more predictable a word is, indicating that the difficulty of meaning activation/integration processes is reduced (e.g., Kutas & Federmeier, 2000; Molinaro, Conrad, Barber, & Carreiras, 2010). 
Interestingly, while many different experimental manipulations have been found to affect the amplitude of the N400, only a few factors alter the latency of the N400 component (Federmeier & Laszlo, 2009), as did the cueing in the present speech perception task. On the other hand, it should be noted that the effect size was rather small, and that the effect of cueing on N400diff amplitude should therefore be treated with caution.



Besides the change-related N400diff and LPCdiff components, an effect of cueing was also found on the N1 to speech onset, which was greater in the cued than the uncued condition. Assuming the N1 to be a correlate of early attentional processes (Hillyard, Hink, Schwent, & Picton, 1973; Schneider, Beste, & Wascher, 2012; for a review, Eimer, 2014), one could speculate that the increase in N1 amplitude reflects early preparatory processing of the change. However, given that the effect of cueing on N1 did not clearly differ between pre-change and post-change sequences, this interpretation was not supported. Irrespective of cueing, the N2 amplitude was increased after a change, similar to our previous study (Getzmann, Hanenberg et al., 2015b). The N2 usually reflects control processes (for review, see Folstein & Van Petten, 2008) and conflict processing or inhibitory control of irrelevant information (e.g., Bertoli, Smurzynski, & Probst, 2005; Falkenstein, Hoormann, & Hohnsbein, 2002). The increase in N2 amplitude after a change could therefore reflect enhanced control processes and, in particular, enhanced inhibitory control of irrelevant information. In line with this interpretation, a combined ERP and fMRI study found a greater fronto-central negativity in the N2 time range after successful detection of auditory targets in a divided-attention paradigm (Sabri, Humphries, Binder, & Liebenthal, 2013).

4.2. Cueing and aging

Neither the decrease in performance after a change nor the benefit of cueing depended on age, suggesting that younger and older adults used the cue to prepare change processing to the same extent. In line with this assumption, there was no significant interaction of Age and Cue in the ERPs to the cue words. There were, however, age-related differences in the ERPs to the cue word: The older participants showed shorter P1 latencies, (slightly) shorter N1 latencies, and larger N1 amplitudes than the younger ones.
These effects are in line with previous studies (e.g., Getzmann et al., 2014; Getzmann, Hanenberg et al., 2015b) and can be interpreted as a compensatory increase in the allocation of early attentional resources (e.g., Yordanova, Kolev, Hohnsbein, & Falkenstein, 2004). Assuming an early-selection model of attention, the N1 could reflect a sensory gating mechanism of attention that facilitates the further processing of a relevant stimulus (Luck et al., 1994; for a review, see Eimer, 2014). The greater N1 of the older participants therefore suggests stronger early attentional processing. This can be interpreted in accordance with the decline-compensation hypothesis (Schneider et al., 2010; Wingfield & Grossman, 2006), which assumes that additional brain activity during task performance in older adults could help counteract age-related neurocognitive deficits (Cabeza, Anderson, Locantore, & McIntosh, 2002). However, it should be noted that the age-related N1 effects in the present study did not depend on cueing, suggesting that the extra brain activity of the older group was not confined to the change-predicting cue words, but also occurred when the words were non-informative.

On the other hand, the N2 latency was reduced after changes, relative to pre-change trials. Most importantly, while younger participants exhibited this N2 latency reduction for both cued and uncued trials, in the older group the reduction was only seen after cued changes. In contrast, the N2 latency of older participants was even increased after an uncued change. This suggests that older adults needed the cue to prepare for the change, while younger adults did not. Evidently, older participants used the cue more than the younger ones to accelerate the processes reflected by the N2, i.e., control processes such as the inhibitory control of irrelevant information (for review, Folstein & Van Petten, 2008). The relative delay

observed in uncued changes did not affect their behavioral performance, however. In this regard, the large variability in N2 latency should also be noted (cf. Fig. 3): While some of the older participants showed a strong decrease in N2 latency in cued relative to uncued changes, others evidently did not. This suggests that individual participants differed profoundly in inhibitory control when the change was cued, and that further influencing factors might play a role.

Finally, there were no differences between younger and older participants in the effect of cueing on N400diff and LPCdiff. The more pronounced frontal topography of the LPCdiff in the older group, which has also been observed in previous studies (Davis & Jerger, 2014; Getzmann, Hanenberg et al., 2015b), occurred irrespective of cueing. A more pronounced frontal activation is typically found in older populations (for review, see Friedman, Kazmerski, & Fabiani, 1997), in accordance with the PASA model (posterior-anterior shift with aging; Davis, Dennis, Fleck, Daselaar, & Cabeza, 2008). The same was true for the slightly more bilateral topography of the LPCdiff in the older group, which is consistent with the phenomenon that brain activation during cognitive performance tends to be less lateralized in older than in younger adults, in accordance with the HAROLD model (hemispheric asymmetry reduction in older adults; Cabeza et al., 2002). These changes in the topography of the LPCdiff are consistent with functional neuroimaging results indicating age-related differences in neural activation that compensate for speech comprehension declines in older adults (for review, see Wingfield & Grossman, 2006).
For example, contrasting brain activation in younger adults and high-performing older adults (“good comprehenders”) revealed that the latter showed increased activity in dorsal inferior frontal and right temporal-parietal regions, probably to compensate for the reduced left temporal-parietal activation they showed relative to the young adults (Grossman et al., 2002).

An important issue in this context is the potential effect of peripheral hearing loss. As could be expected, the older group had overall reduced hearing abilities (especially at higher frequencies), as indicated by pure-tone audiometry. Reduced hearing abilities are a serious problem for older persons, because these declines at the sensory level can interact with deficits of higher-order central auditory processing and declines in general cognitive abilities (e.g., CHABA, 1988; for a review, see Burke & Shafto, 2008). In the present study, the differences in basic hearing abilities between the two age groups were accounted for by using the individual hearing loss as a covariate in the ERP analyses. Also, there were no significant correlations of hearing level and performance, neither for the overall rate of correct responses nor for the change-related decline in correct responses. Thus, in the present task, cocktail-party listening and switching between speakers evidently did not depend on basic hearing abilities. This assumption is corroborated by results of a related study, in which a similar multi-speaker speech comprehension task was employed and in which the group of older subjects was divided into high- and low-performing subgroups (Getzmann, Wascher et al., 2015a): No significant differences in hearing loss were found between the two subgroups, suggesting that cocktail-party listening was not directly related to basic hearing functions.
While there is further evidence that individual performance in speech-in-noise perception is not directly related to basic hearing abilities (Getzmann, Hanenberg et al., 2015b; Ruggles, Bharadwaj, & Shinn-Cunningham, 2011), the interplay of age, hearing loss, and speech-in-noise perception is still a matter of intense debate. In this regard, it should be noted that in the present study all participants reported that they clearly understood all spoken words in a single-speech training session, indicating that the intensity level of the speech stimuli was adequate for both age groups. The impact of basic hearing abilities might become more prominent when the intensity level is reduced.


5. Conclusion

Changes in the relevant speaker in multi-speaker environments reduced speech comprehension and triggered change-related cognitive processes, comprising an enhanced phonological analysis of the linguistic input, context updating, and switching of attention, as indicated by the N400diff and LPCdiff. A semantic cue indicating a subsequent change reduced the decline in performance and accelerated the change-related cognitive processes. Younger and older adults benefited to the same extent from cueing at the behavioral level. At the neurophysiological level, however, younger adults showed an acceleration of inhibitory control even without explicit cueing, whereas older adults depended more on a cue indicating a subsequent change, as indicated by differences in N2 latency.

Disclosure statement

All authors disclose no actual or potential conflicts of interest, including any financial, personal, or other relationships with other people or organizations that could inappropriately influence (bias) their work.

Acknowledgements

The authors are grateful to Peter Dillmann for technical assistance, to Christina Hannenberg and Lukas Labisch for their help in running the experiments, and to two anonymous reviewers for very constructive comments on a previous version of the manuscript. This work was funded by a grant from the Deutsche Forschungsgemeinschaft (DFG GE 1920/3-1).

References

Alain, C., & Arnott, S. R. (2000). Selectively attending to auditory objects. Frontiers in Bioscience, 5, D202–D212.
Alain, C., Dyson, B. J., & Snyder, J. S. (2006). Aging and the perceptual organization of sounds: A change of scene? In P. M. Conn (Ed.), Handbook of models for human aging (pp. 759–770). New York, NY: Academic Press.
Alho, K., Rinne, T., Herron, T. J., & Woods, D. L. (2014). Stimulus-dependent activations and attention-related modulations in the auditory cortex: A meta-analysis of fMRI studies. Hearing Research, 307, 29–41.
Allen, K., Alais, D., & Carlile, S. (2009). Speech intelligibility reduces over distance from an attended location: Evidence for an auditory spatial gradient of attention. Attention, Perception, & Psychophysics, 71, 164–173.
Baddeley, A. (2003). Working memory: Looking back and looking forward. Nature Reviews Neuroscience, 4, 829–839.
Barrett, G., Neshige, R., & Shibasaki, H. (1987). Human auditory and somatosensory event-related potentials: Effects of response condition and age. Electroencephalography and Clinical Neurophysiology, 66, 409–419.
Bertoli, S., Smurzynski, J., & Probst, R. (2005). Effects of age, age-related hearing loss, and contralateral cafeteria noise on the discrimination of small frequency changes: Psychoacoustic and electrophysiological measures. Journal of the Association for Research in Otolaryngology, 6, 207–222.
Best, V., Gallun, F. J., Mason, C. R., Kidd, G., Jr., & Shinn-Cunningham, B. G. (2010). The impact of noise and hearing loss on the processing of simultaneous sentences. Ear and Hearing, 31, 213–220.
Best, V., Ozmeral, E. J., Kopco, N., & Shinn-Cunningham, B. G. (2008). Object continuity enhances selective auditory attention. Proceedings of the National Academy of Sciences of the United States of America, 105, 13174–13178.
Bregman, A. S. (1994). Auditory scene analysis: The perceptual organization of sound. Cambridge, Massachusetts: MIT Press.
Brungart, D., & Simpson, B. (2007). Cocktail party listening in a dynamic multitalker environment. Perception & Psychophysics, 69, 79–91.
Burke, D. M., & Shafto, M. A. (2008). Language and aging. In F. I. M. Craik & T. A. Salthouse (Eds.), The handbook of aging and cognition (pp. 373–443). New York, NY: Psychology Press.
Cabeza, R., Anderson, N. D., Locantore, J. K., & McIntosh, A. R. (2002). Aging gracefully: Compensatory brain activity in high-performing older adults. NeuroImage, 17, 1394–1402.
Committee on Hearing, Bioacoustics and Biomechanics (CHABA), Working Group on Speech Understanding and Aging (1988). Speech understanding and aging. Journal of the Acoustical Society of America, 83, 859–893.
Corbetta, M., Patel, G., & Shulman, G. L. (2008). The reorienting system of the human brain: From environment to theory of mind. Neuron, 58, 306–324.


Corbetta, M., & Shulman, G. L. (2002). Control of goal-directed and stimulus-driven attention in the brain. Nature Reviews Neuroscience, 3, 201–215.
Cusack, R., Deeks, J., Aikman, G., & Carlyon, R. P. (2004). Effects of location, frequency region, and time course of selective attention on auditory scene analysis. Journal of Experimental Psychology: Human Perception and Performance, 30, 643–656.
Davis, S. W., Dennis, N. A., Fleck, M. S., Daselaar, S. M., & Cabeza, R. (2008). Que PASA? The posterior–anterior shift in aging. Cerebral Cortex, 18, 1201–1209.
Davis, T. M., & Jerger, J. (2014). The effect of middle age on the late positive component of the auditory event-related potential. Journal of the American Academy of Audiology, 25, 199–209.
Davis, T. M., Jerger, J., & Martin, J. (2013). Electrophysiological evidence of augmented interaural asymmetry in middle-aged listeners. Journal of the American Academy of Audiology, 24, 159–173.
Eimer, M. (2014). The time course of spatial attention: Insights from event-related brain potentials. In A. C. Nobre & S. Kastner (Eds.), The Oxford handbook of attention (pp. 289–317). Oxford, UK: Oxford University Press.
Engle, R. W., & Kane, M. J. (2004). Executive attention, working memory capacity, and a two-factor theory of cognitive control. Psychology of Learning and Motivation, 44, 145–199.
Falkenstein, M., Hoormann, J., & Hohnsbein, J. (2002). Inhibition-related ERP components: Variation with modality, age, and time-on-task. Journal of Psychophysiology, 16, 167–175.
Federmeier, K. D., & Laszlo, S. (2009). Time for meaning: Electrophysiology provides insights into the dynamics of representation and processing in semantic memory. In B. H. Ross (Ed.), Psychology of learning and motivation (pp. 1–44). Burlington: Academic Press.
Folstein, J. R., & Van Petten, C. (2008). Influence of cognitive control and mismatch on the N2 component of the ERP: A review. Psychophysiology, 45, 152–170.
Friedman, D., Kazmerski, V., & Fabiani, M. (1997). An overview of age-related changes in the scalp distribution of P3b. Electroencephalography and Clinical Neurophysiology, 104, 498–513.
Friedman, D., Simpson, G., & Hamberger, M. (1993). Age-related changes in scalp topography to novel and target stimuli. Psychophysiology, 30, 383–396.
Fritz, J. B., Elhilali, M., David, S. V., & Shamma, S. A. (2007). Auditory attention – Focusing the searchlight on sound. Current Opinion in Neurobiology, 17, 437–455.
Getzmann, S., Hanenberg, C., Lewald, J., Falkenstein, M., & Wascher, E. (2015). Effects of age on electrophysiological correlates of speech processing in a dynamic “cocktail-party” situation. Frontiers in Neuroscience, 9, 341.
Getzmann, S., Lewald, J., & Falkenstein, M. (2014). Using auditory pre-information to solve the cocktail-party problem: Electrophysiological evidence for age-specific differences. Frontiers in Neuroscience, 8, 413.
Getzmann, S., & Näätänen, R. (2015). The mismatch negativity (MMN) as a measure of auditory stream segregation in a simulated “cocktail-party” scenario: Effect of age. Neurobiology of Aging, 36, 3029–3037.
Getzmann, S., Wascher, E., & Falkenstein, M. (2015). What does successful speech-in-noise perception in aging depend on? Electrophysiological correlates of high and low performance in older adults. Neuropsychologia, 70, 43–57.
Gratton, G., Coles, M. G. H., & Donchin, E. (1983). A new method for off-line removal of ocular artifact. Electroencephalography and Clinical Neurophysiology, 55, 468–484.
Grossman, M., Cooke, A., DeVita, C., Alsop, D., Detre, J., Chen, W., & Gee, J. (2002). Age-related changes in working memory during sentence comprehension: An fMRI study. NeuroImage, 15, 302–317.
Hasher, L., & Zacks, R. T. (1988). Working memory, comprehension, and aging: A review and a new view. In G. H. Bower (Ed.), The psychology of learning and motivation (pp. 193–225). New York, NY: Academic Press.
Hill, K. T., & Miller, L. M. (2010). Auditory attentional control and selection during cocktail party listening. Cerebral Cortex, 20, 583–590.
Hillyard, S. A., Hink, R. F., Schwent, V. L., & Picton, T. W. (1973). Electrical signs of selective attention in the human brain. Science, 182, 177–180.
Howell, D. C. (2012). Statistical methods for psychology. Belmont: Cengage Wadsworth.
Huang, S., Belliveau, J. W., Tengshe, C., & Ahveninen, J. (2012). Brain networks of novelty-driven involuntary and cued voluntary auditory attention shifting. PLoS ONE, 7, e44062.
Humes, L. E., & Dubno, J. R. (2010). Factors affecting speech understanding in older adults. In S. Gordon-Salant, R. D. Frisina, A. N. Popper, & R. R. Fay (Eds.), The aging auditory system (pp. 211–257). New York, NY: Springer.
Juottonen, K., Revonsuo, A., & Lang, H. (1996). Dissimilar age influences on two ERP waveforms (LPC and N400) reflecting semantic context effect. Cognitive Brain Research, 4, 99–107.
Kerlin, J. R., Shahin, A. J., & Miller, L. M. (2010). Attentional gain control of ongoing cortical speech representations in a cocktail party. Journal of Neuroscience, 30, 620–628.
Kidd, G., Arbogast, T. L., Mason, C. R., & Gallun, F. J. (2005). The advantage of knowing where to listen. Journal of the Acoustical Society of America, 118, 3804–3815.
Kitterick, P. T., Bailey, P. J., & Summerfield, A. Q. (2010). Benefits of knowing who, where, and when in multi-talker listening. Journal of the Acoustical Society of America, 127, 2498–2508.
Koch, I., Lawo, V., Fels, J., & Vorländer, M. (2011). Switching in the cocktail party: Exploring intentional control of auditory selective attention. Journal of Experimental Psychology: Human Perception and Performance, 37, 1140–1147.
Kok, A. (1997). Event-related-potential (ERP) reflections of mental resources: A review and synthesis. Biological Psychology, 45, 19–56.
Kutas, M., & Federmeier, K. D. (2000). Electrophysiology reveals semantic memory use in language comprehension. Trends in Cognitive Sciences, 4, 463–470.
Kutas, M., & Federmeier, K. D. (2011). Thirty years and counting: Finding meaning in the N400 component of the event-related brain potential (ERP). Annual Review of Psychology, 62, 621–647.
Kutas, M., Van Petten, C., & Kluender, R. (2006). Psycholinguistics electrified II. In M. Traxler & M. A. Gernsbacher (Eds.), Handbook of psycholinguistics (pp. 659–724). New York, NY: Elsevier.
Larson, E., & Lee, A. K. C. (2013a). Influence of preparation time and pitch separation in switching of auditory attention between streams. Journal of the Acoustical Society of America, 134, EL165–EL171.
Larson, E., & Lee, A. K. C. (2013b). The cortical dynamics underlying effective switching of auditory spatial attention. NeuroImage, 64, 365–370.
Larson, E., & Lee, A. K. C. (2014). Switching auditory attention using spatial and non-spatial features recruits different cortical networks. NeuroImage, 84, 681–687.
Lau, E. F., Phillips, C., & Poeppel, D. (2008). A cortical network for semantics: (De)constructing the N400. Nature Reviews Neuroscience, 9, 920–933.
Lee, A. K., Larson, E., Maddox, R. K., & Shinn-Cunningham, B. G. (2014). Using neuroimaging to understand the cortical mechanisms of auditory selective attention. Hearing Research, 307, 111–120.
Lin, G., & Carlile, S. (2015). Costs of switching auditory spatial attention in following conversational turn-taking. Frontiers in Neuroscience, 9, 124.
Lovrich, D., Novick, B., & Vaughan, H. G., Jr. (1988). Topographic analysis of auditory event-related potentials associated with acoustic and semantic processing. Electroencephalography and Clinical Neurophysiology, 71, 40–54.
Luck, S. J., Hillyard, S. A., Mouloua, M., Woldorff, M. G., Clark, V. P., & Hawkins, H. L. (1994). Effect of spatial cueing on luminance detectability: Psychophysical and electrophysiological evidence for early selection. Journal of Experimental Psychology: Human Perception and Performance, 20, 887–904.
Molinaro, N., Conrad, M., Barber, H. A., & Carreiras, M. (2010). On the functional nature of the N400: Contrasting effects related to visual word recognition and contextual semantic integration. Cognitive Neuroscience, 1, 1–7.
Mondor, T., & Zatorre, R. (1995). Shifting and focusing auditory spatial attention. Journal of Experimental Psychology: Human Perception and Performance, 21, 387–409.
Moreno, E. M., Federmeier, K. D., & Kutas, M. (2002). Switching languages, switching palabras (words): An electrophysiological study of code switching. Brain and Language, 80, 188–207.
Näätänen, R., & Picton, T. W. (1987). The N1 wave of the human electric and magnetic response to sound: A review and an analysis of the component structure. Psychophysiology, 24, 375–425.
Picton, T. W. (1992). The P300 wave of the human event-related potential. Journal of Clinical Neurophysiology, 9, 465–479.
Polich, J. (2007). Updating P300: An integrative theory of P3a and P3b. Clinical Neurophysiology, 118, 2128–2148.
Rossi, S., Huang, S., Furtak, S. C., Belliveau, J. W., & Ahveninen, J. (2014). Functional connectivity of dorsal and ventral frontoparietal seed regions during auditory orienting. Brain Research, 1583, 159–168.
Ruggles, D., Bharadwaj, H., & Shinn-Cunningham, B. G. (2011). Normal hearing is not enough to guarantee robust encoding of suprathreshold features important in everyday communication. Proceedings of the National Academy of Sciences of the United States of America, 108, 15516–15521.
Sabri, M., Humphries, C., Binder, J. R., & Liebenthal, E. (2013). Neural events leading to and associated with detection of sounds under high processing load. Human Brain Mapping, 34, 587–597.
Salmi, J., Rinne, T., Koistinen, S., Salonen, O., & Alho, K. (2009). Brain networks of bottom-up triggered and top-down controlled shifting of auditory attention. Brain Research, 1286, 155–164.
Schneider, D., Beste, C., & Wascher, E. (2012). On the time course of bottom-up and top-down processes in selective visual attention: An EEG study. Psychophysiology, 49, 1492–1503.
Schneider, B. A., Pichora-Fuller, M. K., & Daneman, M. (2010). Effects of senescent changes in audition and cognition on spoken language comprehension. In S. Gordon-Salant, R. D. Frisina, A. N. Popper, & R. R. Fay (Eds.), The aging auditory system (pp. 167–210). New York, NY: Springer.
Serences, J. T., & Yantis, S. (2006). Selective visual attention and perceptual coherence. Trends in Cognitive Sciences, 10, 38–45.
Shinn-Cunningham, B. G. (2008). Object-based auditory and visual attention. Trends in Cognitive Sciences, 12, 182–186.
Shinn-Cunningham, B. G., & Best, V. (2008). Selective attention in normal and impaired hearing. Trends in Amplification, 12, 283–299.
Shomstein, S. (2012). Cognitive functions of the posterior parietal cortex: Top-down and bottom-up attentional control. Frontiers in Integrative Neuroscience, 6, 38.
Shomstein, S., & Yantis, S. (2006). Parietal cortex mediates voluntary control of spatial and nonspatial auditory attention. Journal of Neuroscience, 26, 435–439.
Shulman, G. L., Astafiev, S. V., McAvoy, M. P., d'Avossa, G., & Corbetta, M. (2007). Right TPJ deactivation during visual search: Functional significance and support for a filter hypothesis. Cerebral Cortex, 17, 2625–2633.
Smith, D. B., Michalewski, H. J., Brent, G. A., & Thompson, L. W. (1980). Auditory averaged evoked potentials and aging: Factors of stimulus, task and topography. Biological Psychology, 11, 135–151.
Snyder, J. S., & Alain, C. (2005). Age-related changes in neural activity associated with concurrent vowel segregation. Cognitive Brain Research, 24, 492–499.
Snyder, J. S., & Alain, C. (2007). Sequential auditory scene analysis is preserved in normal aging adults. Cerebral Cortex, 17, 501–512.
Snyder, J. S., Alain, C., & Picton, T. W. (2006). Effects of attention on neuroelectric correlates of auditory stream segregation. Journal of Cognitive Neuroscience, 18, 1–13.
Strauss, A., Kotz, S. A., & Obleser, J. (2013). Narrowed expectancies under degraded speech: Revisiting the N400. Journal of Cognitive Neuroscience, 25, 1383–1395.
Studebaker, G. A. (1985). A "rationalized" arcsine transformation. Journal of Speech, Language, and Hearing Research, 28, 455–462.
Studebaker, G. A., McDaniel, D. M., & Sherbecoe, R. L. (1995). Evaluating relative speech recognition performance using the proficiency factor and rationalized arcsine differences. Journal of the American Academy of Audiology, 6, 173.
Van der Linden, M., Hupet, M., Feyereisen, P., Schelstraete, M.-A., Bestgen, Y., Bruyer, R., ... Seron, X. (1999). Cognitive mediators of age-related differences in language comprehension and verbal memory performance. Aging, Neuropsychology, and Cognition, 6, 32–55.
Van Der Meij, M., Cuetos, F., Carreiras, M., & Barber, H. A. (2011). Electrophysiological correlates of language switching in second language learners. Psychophysiology, 48, 44–54.
Wingfield, A., & Grossman, M. (2006). Language and the aging brain: Patterns of neural compensation revealed by functional brain imaging. Journal of Neurophysiology, 96, 2830–2839.
Wingfield, A., & Stine-Morrow, E. A. L. (2000). Language and speech. In F. I. M. Craik & T. A. Salthouse (Eds.), The handbook of aging and cognition (pp. 359–416). Mahwah, NJ: Lawrence Erlbaum Associates.
Yordanova, J., Kolev, V., Hohnsbein, J., & Falkenstein, M. (2004). Sensorimotor slowing with ageing is mediated by a functional dysregulation of motor-generation processes: Evidence from high-resolution event-related potentials. Brain, 127, 351–362.