Oscillatory profiles of positive, negative and neutral feedback stimuli during adaptive decision making


Peng Li, Travis E. Baker, Chris Warren, Hong Li

PII: S0167-8760(16)30121-0
DOI: 10.1016/j.ijpsycho.2016.06.018
Reference: INTPSY 11130

To appear in: International Journal of Psychophysiology

Received date: 20 October 2015
Revised date: 22 June 2016
Accepted date: 30 June 2016

Please cite this article as: Li, Peng, Baker, Travis E., Warren, Chris, Li, Hong, Oscillatory profiles of positive, negative and neutral feedback stimuli during adaptive decision making, International Journal of Psychophysiology (2016), doi: 10.1016/j.ijpsycho.2016.06.018

This is a PDF file of an unedited manuscript that has been accepted for publication. As a service to our customers we are providing this early version of the manuscript. The manuscript will undergo copyediting, typesetting, and review of the resulting proof before it is published in its final form. Please note that during the production process errors may be discovered which could affect the content, and all legal disclaimers that apply to the journal pertain.

ACCEPTED MANUSCRIPT

Running head: Neutral feedback

Peng Li1, Travis E. Baker2, Chris Warren3 and Hong Li1*

1 Brain Function and Psychological Science Research Center, Shenzhen University, Shenzhen, China
2 Department of Neurology and Neurosurgery, Montreal Neurological Institute, McGill University, Montreal, Canada
3 Department of Psychology, Leiden University, Leiden, The Netherlands

*Corresponding author: Hong Li, email address: [email protected] Postal address: No 3688, Nanhai Road, Nanshan District, Shenzhen, China, 518060; Tel.: +86 13509494932; fax: +86 0755 26958874.

Abstract

The electrophysiological response to positive and negative feedback during reinforcement learning has been well documented over the past two decades, yet little is known about the neural response to uninformative events that often follow our actions. To address this issue, we recorded the electroencephalogram (EEG) during a time-estimation task using both informative (positive and negative) and uninformative (neutral) feedback. In the time-frequency domain, uninformative feedback elicited significantly less induced beta-gamma activity than informative feedback. This result suggests that beta-gamma activity is particularly sensitive to feedback that can guide behavioral adjustments, consistent with other work. In contrast, neither theta nor delta activity was sensitive to the difference between negative and neutral feedback, though both frequencies discriminated between positive and non-positive (neutral or negative) feedback. Interestingly, in the time domain, we observed a linear relationship in the amplitude of the feedback-related negativity (neutral > negative > positive), a component of the event-related brain potential thought to index a specific kind of reinforcement learning signal called a reward prediction error. Taken together, these results suggest that the reinforcement learning system treats neutral feedback as a special case, providing valuable information about the electrophysiological measures used to index the cognitive function of frontal midline cortex.

Key words: theta, beta-gamma, feedback-related negativity, neutral feedback, reinforcement learning

Introduction

Our ability to predict and evaluate the consequences of our actions is fundamental to adaptive decision making. Reinforcement learning (RL) theory holds that if an action is followed by positive feedback then that action will have a greater probability of being performed again, whereas if an action is followed by negative feedback then that action will have a lesser probability of being performed again (i.e., Thorndike’s Law of Effect: Catania, 1999). In everyday life, however, not all of our actions are followed by such binary consequences; many are followed instead by uninformative events. In fact, the term neutral operants has long been used by RL theorists to describe responses from the environment that neither increase nor decrease the probability of a behavior being repeated (Skinner, 1938). While observations of electrophysiological activity over frontal midline cortex have motivated a wealth of experimental and theoretical analyses of RL, it remains unclear how uninformative feedback is ultimately processed during trial-and-error learning tasks.

Over the last decade, both time domain and time-frequency domain analyses of electrophysiological recordings have been increasingly used in research concerned with neural processes that differentiate performance feedback indicating positive outcomes (e.g., monetary gain, correct feedback) from negative outcomes (e.g., monetary loss, error feedback) (Weinberg, Riesel, and Proudfit, 2014). In the time domain, event-related brain potential (ERP) studies have revealed a negative-going deflection in the ERP that peaks over frontal-central recording sites approximately 250 ms following feedback presentation. This feedback-locked ERP component, termed the feedback-related negativity (FRN), is typically enhanced following unexpected task-relevant events (e.g., negative feedback, errors) and is reduced or absent following positive feedback (see Footnote 1). Interestingly, the few FRN studies examining neutral feedback have produced largely mixed results (Holroyd, Hajcak, & Larsen, 2006; Kujawa et al., 2013; Huang & Yu, 2014; Yu & Zhou, 2006). In particular, studies report either larger FRNs to neutral feedback compared to negative and positive feedback (Muller et al. 2005; Nieuwenhuis et al. 2005; Kujawa et al., 2013; Huang & Yu, 2014) or comparable FRN amplitudes between neutral and negative feedback (Holroyd, Hajcak, & Larsen, 2006) (see Footnote 2).

Footnote 1: Recent evidence suggests that the difference in FRN amplitude between reward and error trials results from a positive-going deflection, the reward positivity (Rew-P), elicited by reward feedback (see Holroyd et al. 2008; Warren and Holroyd, 2012; Baker and Holroyd, 2013; Proudfit, 2015). Because the Rew-P typically occurs during the time-range of the FRN and P300, the difference-wave method is commonly used to isolate the reward positivity from other ERP components by taking the difference between the ERPs to positive and negative feedback. For the purpose of this study, we focused our analysis on condition-specific ERP effects by measuring the amplitudes of the FRN elicited by neutral, negative, and positive feedback.

Footnote 2: It is interesting to note that across 5 experiments reported in Holroyd et al. (2006), the authors detailed that neutral feedback stimuli elicited larger FRNs than did the negative feedback stimuli, but the difference was not statistically significant (see Figure 1 and 2, Holroyd, Hajcak, & Larsen, 2006).

Although neutral feedback has yet to be investigated in the time-frequency domain, electroencephalogram (EEG) oscillations in the theta frequency range (4-8 Hz) recorded over frontal midline areas of the scalp have been associated with outcome processing (Cavanagh & Frank, 2014; Cavanagh et al., 2012), as well as other cognitive processes related to effort, attention and motivation (for reviews see Hsieh & Ranganath, 2014; Mitchell et al., 2008). Notably, more frontal midline theta power is observed following negative feedback compared to positive feedback, suggesting that this signal reflects an error-driven learning mechanism consistent with principles of reinforcement learning (Cavanagh et al., 2010). However, others have argued that frontal midline theta reflects sensitivity to important cognitive events in general rather than to errors in particular (Cavanagh et al., 2012), and signals the deployment of control (Cavanagh & Frank, 2014). Furthermore, power in the delta (1-4 Hz) and beta-gamma (20-30 Hz) frequency ranges has been shown to increase following positive feedback compared to negative feedback (Bernat et al., 2011; Cohen et al., 2007; Hajihosseini et al., 2012; Marco-Pallarès et al., 2008). In particular, recent work has indicated that feedback-locked delta band activity appears to be specific to surprising rewards, but does not predict associated behavioral adjustments (Cavanagh, 2014). By contrast, feedback-locked beta-gamma activity is thought to reflect a salience signal and has been associated with ongoing adjustments of behavior (HajiHosseini & Holroyd, 2015) and cognitive demand (Chen et al., 2012; Gilbert & Sigman, 2007; Lee et al., 2003).

While decomposing feedback-related EEG and ERPs into spectral quantities has provided a thorough understanding of the cognitive processes underlying RL, much of the existing research on these time-frequency components characterizes feedback-locked oscillatory activity (delta, theta, beta-gamma) as total spectral power. Unfortunately, this approach does not capture all the information available in these signals because total power within a given frequency band consists of the stimulus phase-locked part of the EEG that gives rise to the ERP, called the “evoked” power, and the non-stimulus-phase-locked part of the EEG that is invisible in the ERP, called the “induced” power (Tallon-Baudry & Bertrand, 1999). Importantly, current thinking holds that these activities reflect different cognitive processes, such that evoked activity reflects bottom-up neural activity, whereas induced activity is thought to reflect top-down modulation (Tallon-Baudry & Bertrand, 1999). Indeed, it was recently demonstrated that total theta power is equally sensitive to outcome valence and outcome probability; however, evoked theta power was mainly sensitive to outcome valence whereas induced theta power was mainly sensitive to outcome probability (Hajihosseini & Holroyd, 2013). The role of delta and beta-gamma band phase dynamics in feedback processing remains unknown. The difference in dominant frequencies (delta, theta, and beta-gamma) between negative and positive feedback could provide a deeper understanding of the phase dynamics (evoked vs induced) at play during trial-by-trial RL.

Furthermore, it is also important to consider the relationship between time domain and time-frequency domain measures. For example, a relationship between feedback-related delta activity and the amplitude of the P300 component has been demonstrated (e.g., Bernat et al., 2007; Cavanagh & Frank, 2014), possibly suggesting that the evoked portion of delta following feedback may contribute to feedback-related differences observed in the amplitude of the P300. However, this relationship has never been formally tested. Further, theta oscillations and the FRN have been extensively studied in parallel, decades-long literatures. Feedback-induced theta power and the FRN occur at about the same time (200-400 ms post feedback) and share the same scalp location (over the frontal midline), suggesting a functional relationship between these two phenomena. In particular, converging evidence across multiple methodologies indicates that the anterior cingulate cortex (ACC) is the source of both frontal midline theta oscillations (Cavanagh & Frank, 2014) and the FRN (Holroyd & Yeung, 2012). Importantly, recent examinations of theta power and the FRN have provided a nuanced account of their relationship (Hajihosseini & Holroyd, 2013). Under this account, unexpected, task-relevant events elicit an ACC-dependent control process that manifests in the frequency domain as theta oscillations over frontal-central areas of the scalp (Cavanagh & Frank, 2014). In the time domain, the “evoked” portion of this theta activity that is consistent in phase across trials gives rise to the FRN (Hajihosseini & Holroyd, 2013; see also Yeung et al., 2004). Although both measures provide valuable information about cognitive function of frontal midline cortex, it has recently been argued that FRN amplitude is specifically sensitive to dopamine reinforcement learning signals whereas evoked theta power reflects the ACC response to unexpected events.

Given the relationship between theta oscillations and FRN, it is perhaps surprising that neutral feedback has not yet been investigated in the time-frequency domain. Furthermore, because of the inconsistency in FRN studies examining neutral operants, the account of the functional role of the ACC in the cognitive processes that underlie reinforcement learning remains incomplete. Thus, in order to further our understanding of the cognitive processes underlying informative and non-informative feedback during RL, the electrophysiological response to the good, the bad, and the neutral needs to be further characterized. In the present study, we present a harmonious application of both ERP (i.e., FRN) and time-frequency (i.e., evoked and induced delta, theta, and beta-gamma power) approaches in an attempt to elucidate the discrete aspects of the electrophysiological dynamics between positive, negative, and neutral feedback (Holroyd, HajiHosseini, & Baker, 2012).

Methods

Participants and Procedure

Nineteen undergraduate students (eight males) aged 18–23 years participated in the experiment for monetary compensation. All participants had normal or corrected-to-normal vision, were right-handed and had no neurological or psychological disorders. Two subjects were excluded from the final analysis due to their poor behavioral performance. The study was approved by the local ethics committee. Participants were asked to perform a time estimation task (e.g., Miltner et al., 1997) that included neutral feedback. They were required to press the spacebar following a cue (a 1500 Hz sound that lasted 50 ms) to indicate that their estimate of one second had elapsed. Following their response, a feedback stimulus appeared on the screen indicating whether their estimation was correct (positive feedback, win 5 cents; a circle with a check mark), incorrect (negative feedback, 0 cents; a circle with a cross mark) or the feedback was absent (neutral feedback, either 5 or 0 cents; a circle with nothing inside). Note that participants were not told immediately whether or not they had earned money on neutral-feedback trials, but they received the money for correct responses (in total, 50% of the trials) at the end of the experiment. Participants were told that if their reaction time was within the time window from 900 ms to 1100 ms, they would receive positive feedback; otherwise they would get negative feedback. However, this time window narrowed by 10 ms if they responded correctly on the previous trial and widened by 10 ms if they responded incorrectly on the previous trial. For 1/3 of negative-feedback trials and 1/3 of positive-feedback trials, the appropriate feedback was randomly replaced with neutral feedback. Of the 288 trials, participants received 28% neutral feedback, 35% positive feedback and 37% negative feedback in total.
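The adaptive feedback rule described above can be sketched in a few lines of Python. This is only an illustration under stated assumptions, not the task code used in the study: the function name `run_staircase`, the choice to apply the 10 ms step to each edge of the window (the text does not say whether the step applies per edge or to the total width), the lower bound on the window, and the independent per-trial replacement probability of 1/3 are all hypothetical.

```python
import random

def run_staircase(response_times_ms, target_ms=1000.0, half_width_ms=100.0,
                  step_ms=10.0, neutral_prob=1 / 3, seed=0):
    """Simulate the adaptive feedback window for a sequence of response times.

    Returns the feedback shown on each trial and the final half-width.
    Assumption: the window is symmetric around the 1-s target and the
    10 ms step is applied to each edge.
    """
    rng = random.Random(seed)
    shown_feedback = []
    for rt in response_times_ms:
        correct = abs(rt - target_ms) <= half_width_ms
        outcome = "positive" if correct else "negative"
        # Assumption: each informative outcome is independently replaced
        # by neutral feedback with probability neutral_prob.
        shown = "neutral" if rng.random() < neutral_prob else outcome
        shown_feedback.append(shown)
        # The window tracks actual performance, not the feedback shown.
        if correct:
            half_width_ms = max(step_ms, half_width_ms - step_ms)  # narrow
        else:
            half_width_ms += step_ms                                # widen
    return shown_feedback, half_width_ms


feedback, final_half_width = run_staircase([1000, 1000, 1200, 1000],
                                           neutral_prob=0.0)
print(feedback, final_half_width)
```

A staircase of this kind tends to hold accuracy near 50%, which matches the statement that about half of the trials were correct.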

EEG Acquisition

The electroencephalogram (EEG) was recorded at 64 scalp sites using tin electrodes mounted in an elastic cap (Brain Products, München, Germany), with a ground electrode placed on the frontal midline and references placed on the left and right mastoids. Vertical electrooculograms (EOGs) were recorded supra-orbitally and infra-orbitally relative to the left eye. The horizontal EOG was recorded as the difference in activity from the right versus the left orbital rim. The impedance of all electrodes was kept below 10 kΩ. The EEG and EOG were amplified using a 0.05–100 Hz bandpass and continuously digitized at 500 Hz/channel for offline analysis. The EEG data were further filtered offline (0.1–40 Hz bandwidth) for ERP analysis. Then, ocular artifacts were corrected using the eye movement correction algorithm described by Gratton, Coles, and Donchin (1983). Trials with EOG artifacts (mean EOG voltage exceeding 80 μV) and those contaminated by artifacts due to amplifier clipping or peak-to-peak deflections exceeding 80 μV were excluded from averaging. Less than 5% of trials were rejected after preprocessing in each of the three conditions.

Time-frequency analysis

To extract time-frequency information from EEG data associated with feedback stimulus presentation, 2-s epochs centered on feedback onset were extracted from the single-trial data. The EEG epochs were convolved with a complex 7-cycle Morlet wavelet using custom-written Matlab routines (see Marco-Pallares et al., 2008; HajiHosseini & Holroyd, 2013) that implement the method described by Lachaux, Rodriguez, Martinerie and Varela (1999). Changes in power over time (squared amplitude of the convolution between the signal and the wavelet) in the 1 to 40 Hz frequency range were computed for each single trial and averaged for each subject and condition before creating grand averages across subjects. The relative change in the power for each condition was determined by averaging the baseline activity (100 ms prestimulus) across time for each frequency and then subtracting the average from each data point following stimulus presentation for the corresponding frequency. For each subject, the total power was calculated as the average value of time-frequency power across single trials, the evoked power was determined directly from the averaged ERPs, and the induced power was then obtained by subtracting the evoked power from the total power (e.g., evoked theta power from total theta power; HajiHosseini & Holroyd, 2013; Behroozmand et al., 2015). For statistical purposes, the mean induced and evoked power was obtained within a 200 ms window following the onset of the feedback stimulus (delta [1-3 Hz], 300-500 ms; theta [4-8 Hz], 200-400 ms; beta-gamma [20-40 Hz], 350-550 ms; cf. Cunillera et al., 2012; HajiHosseini & Holroyd, 2015). All our analyses were restricted to channel FCz.
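The decomposition into total, evoked and induced power can be illustrated with a short NumPy sketch. It is not the authors' Matlab routines. The function names, the wavelet width convention (temporal standard deviation of n_cycles / (2πf)), the truncation at four standard deviations, the unit-energy normalization, and the simulated data are all assumptions made for illustration. Power is baseline-corrected by subtraction, as described in the text, at a single frequency.

```python
import numpy as np

def morlet_power(signal, sfreq, freq, n_cycles=7.0):
    """Squared amplitude of the convolution of `signal` with a complex Morlet wavelet."""
    sigma_t = n_cycles / (2.0 * np.pi * freq)         # assumed width convention
    half = int(4.0 * sigma_t * sfreq)                 # truncate at +/- 4 s.d.
    if 2 * half + 1 > signal.size:
        raise ValueError("epoch too short for a wavelet at this frequency")
    t = np.arange(-half, half + 1) / sfreq
    wavelet = np.exp(2j * np.pi * freq * t) * np.exp(-t**2 / (2.0 * sigma_t**2))
    wavelet /= np.sqrt(np.sum(np.abs(wavelet) ** 2))  # unit energy (assumption)
    return np.abs(np.convolve(signal, wavelet, mode="same")) ** 2


def total_evoked_induced(trials, times, sfreq, freq, baseline=(-0.1, 0.0)):
    """Baseline-corrected total, evoked and induced power at one frequency.

    trials: array of shape (n_trials, n_samples)
    times:  seconds relative to feedback onset, shape (n_samples,)
    """
    base = (times >= baseline[0]) & (times < baseline[1])
    # Total power: average of the single-trial power.
    total = np.mean([morlet_power(tr, sfreq, freq) for tr in trials], axis=0)
    # Evoked power: power of the trial average (the ERP).
    evoked = morlet_power(trials.mean(axis=0), sfreq, freq)
    total = total - total[base].mean()
    evoked = evoked - evoked[base].mean()
    # Induced power: what remains after removing the evoked part.
    return total, evoked, total - evoked


# Simulated example: a 6 Hz burst whose phase varies randomly across trials,
# so it should mostly cancel in the ERP and show up as induced power.
sfreq = 500.0
times = np.arange(1000) / sfreq - 1.0                 # 2-s epoch centered on feedback
rng = np.random.default_rng(0)
burst = np.exp(-((times - 0.3) ** 2) / (2 * 0.1 ** 2))
trials = np.array([burst * np.cos(2 * np.pi * 6 * times + rng.uniform(0, 2 * np.pi))
                   for _ in range(40)])
total, evoked, induced = total_evoked_induced(trials, times, sfreq, freq=6.0)
window = (times >= 0.2) & (times <= 0.4)              # theta window from the text
print(induced[window].mean(), evoked[window].mean())
```

When every trial is identical (perfect phase locking), the trial average equals each trial, so total and evoked power coincide and induced power is zero. That limiting case is a useful sanity check on any implementation of this subtraction.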

ERP analysis

The FRN was quantified by first segmenting the EEG into 800 ms epochs time-locked to the feedback stimulus, including a 200 ms baseline preceding the feedback. After baseline correction, FRN amplitude was evaluated for each participant and feedback condition (positive, negative, neutral) using a base-to-peak algorithm described in Holroyd, Hajcak, & Larsen (2006; see also Holroyd, Nieuwenhuis, Yeung, & Cohen, 2003; Holroyd, Pakzad-Vaezi, & Krigolson, 2008), as follows. First, the most positive voltage within a 160 to 260 ms window following feedback presentation was taken as the base of the FRN. Then, the sample with the most negative value within a time window starting from the base of the FRN to 400 ms after feedback presentation was taken as the peak. FRN amplitude was calculated as the difference between these base and peak amplitudes. The algorithm assigned 0 μV where no N200 was detected. FRN values at FCz were analyzed with a one-way analysis of variance (ANOVA).

Results
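As a rough illustration of the base-to-peak FRN measure described under ERP analysis in the Methods, the following Python sketch scores a single averaged waveform. The function name and the synthetic waveform are hypothetical. The sign convention (peak minus base, so that a negativity yields a negative amplitude) is an assumption chosen to match the negative FRN values reported below. So is the reading that "no N200 detected" corresponds to no sample after the base falling below it, in which case the amplitude is 0 μV.

```python
import numpy as np

def frn_base_to_peak(erp_uv, times_ms, base_win=(160.0, 260.0), peak_end=400.0):
    """Base-to-peak FRN amplitude for one averaged waveform.

    erp_uv:   baseline-corrected ERP at one channel (microvolts)
    times_ms: sample times relative to feedback onset (milliseconds)
    Returns (amplitude, base latency, peak latency).
    """
    # Base: most positive sample between 160 and 260 ms.
    base_candidates = np.flatnonzero((times_ms >= base_win[0]) &
                                     (times_ms <= base_win[1]))
    base_idx = base_candidates[np.argmax(erp_uv[base_candidates])]
    # Peak: most negative sample from the base latency up to 400 ms.
    peak_candidates = np.flatnonzero((times_ms >= times_ms[base_idx]) &
                                     (times_ms <= peak_end))
    peak_idx = peak_candidates[np.argmin(erp_uv[peak_candidates])]
    # Peak minus base is <= 0 by construction; it is exactly 0 when the
    # waveform never drops below the base (no negativity found).
    amplitude = erp_uv[peak_idx] - erp_uv[base_idx]
    return amplitude, times_ms[base_idx], times_ms[peak_idx]


# Synthetic 800 ms epoch at 500 Hz with a 200 ms prestimulus baseline.
times_ms = np.arange(-200, 600, 2)
erp = np.zeros(times_ms.size)
erp[times_ms == 200] = 5.0    # positive base
erp[times_ms == 300] = -3.0   # negative peak
print(frn_base_to_peak(erp, times_ms))
```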

Trial-to-trial behavioral adjustment

The raw reaction time (RT) does not provide any more information than accuracy in this kind of time estimation task. However, the absolute value of the change in RT (ΔRT) between trial N and trial N+1 gives a measure of the behavioral adjustment following each of the three types of feedback, and controls for slow changes in performance over time (Dutilh, van Ravenzwaaij, Nieuwenhuis, van der Maas, Forstmann, & Wagenmakers, 2012). A one-way ANOVA was conducted on these data with feedback type (positive, negative, neutral) as the independent variable. As shown in Fig. 1A, the main effect of feedback valence was significant, F(2, 34) = 90.9, p < .001, η2 = .85. Follow-up pairwise comparisons revealed that ΔRT following negative feedback (230±37 ms) was significantly larger than ΔRT following neutral feedback (178±43 ms), t(16) = 8.95, p < .001, and ΔRT following neutral feedback was significantly larger than ΔRT following positive feedback (144±27 ms), t(16) = 5.05, p < .001.
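The ΔRT measure can be made concrete with a minimal sketch. The function name and the toy data are hypothetical. The grouping follows the text's description: the absolute RT change from trial N to trial N+1 is attributed to the feedback shown on trial N.

```python
from collections import defaultdict

def mean_abs_rt_change(rts_ms, feedback):
    """Mean |RT[n+1] - RT[n]|, grouped by the feedback shown on trial n."""
    changes = defaultdict(list)
    for n in range(len(rts_ms) - 1):
        changes[feedback[n]].append(abs(rts_ms[n + 1] - rts_ms[n]))
    return {fb: sum(vals) / len(vals) for fb, vals in changes.items()}


# Toy data for one participant (illustrative values only).
rts = [1000, 1010, 900, 1100, 1080]
fb = ["positive", "negative", "neutral", "positive", "negative"]
print(mean_abs_rt_change(rts, fb))
```

The last trial contributes no ΔRT because it has no following trial.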

ERP results

Fig. 1B illustrates the ERPs elicited by the neutral, negative, and positive feedback. A one-way ANOVA on FRN amplitude with feedback condition (negative, neutral, positive) as a factor revealed a main effect of feedback (see Footnote 3), F(2, 32) = 21.09, p < .001, η2 = .57. Pairwise comparisons revealed that the FRN following neutral feedback (M = -7.80 μV, SEM = 1.04) was significantly larger than the FRN following both negative feedback (M = -6.23 μV, SEM = 0.94), t(16) = 2.12, p < .05, and positive feedback (M = -1.76 μV, SEM = 0.39), t(16) = 5.36, p < .001. Consistent with previous research, the FRN following negative feedback was larger than the FRN following positive feedback, t(16) = 4.52, p < .001. The difference waves between negative and positive and between neutral and positive feedback, and the corresponding scalp distributions, are shown in Fig. 1C and Fig. 1D.

Footnote 3: Note that the pattern of results is the same if the FRN is measured by a mean amplitude approach from 220-320 ms: the main effect of valence is significant, F(1.2, 19.9) = 24.08, p < .001, η2 = .60. Pairwise comparisons further showed that neutral feedback elicited a significantly larger FRN (M = 8.63 μV, SEM = 1.4) than negative feedback (M = 10.88 μV, SEM = 1.62, p < .01). Negative feedback also elicited a larger FRN than positive feedback (M = 16.46 μV, SEM = 2.21, p < .001).

Time-frequency results

Delta (1-3 Hz). Figure 2 shows the time-frequency results for both induced and evoked power across each feedback condition. Figure 3 highlights the effect of feedback on delta activity. A two-way ANOVA on delta activity with Power (induced, evoked) and Feedback (positive, negative, neutral) revealed a main effect of Feedback, F(2, 32) = 8.81, p = .001, η2 = .36. Post-hoc analyses indicated that delta activity overall was characterized by greater power for positive feedback (M = .18 dB, SEM = .02) than negative (M = .11 dB, SEM = .02), t(16) = 3.96, p < .001, and neutral feedback (M = .10 dB, SEM = .02), t(16) = 3.97, p < .001. No difference in delta power was observed between negative and neutral feedback, t(16) = 0.55, p = .59. The main effect of Power did not reach significance, F(1, 16) < 1, p = .38, η2 = .05. There was no significant interaction between Power and Feedback, F(2, 32) = 2.2, p = .13, η2 = .12.

Theta (4-8 Hz). As shown in Fig. 2, a two-way ANOVA on theta activity with Power (induced, evoked) and Feedback (positive, negative, neutral) revealed a main effect of Power, F(1, 16) = 9.08, p < .01, η2 = .36, a main effect of Feedback, F(2, 32) = 4.48, p < .05, η2 = .22, and an interaction between Power and Feedback, F(2, 32) = 9.94, p < .005, η2 = .38 (Fig. 2A). Post-hoc analyses indicated that theta was characterized by greater induced power (M = .60 dB, SEM = .09) than evoked power (M = .36 dB, SEM = .06), t(16) = 3.01, p < .01. With regard to Feedback, theta activity overall was characterized by reduced power for positive feedback (M = .41 dB, SEM = .07) relative to negative feedback (M = .55 dB, SEM = .09), t(16) = 2.68, p < .02, and neutral feedback (M = .48 dB, SEM = .07), t(16) = 2.56, p < .03. Theta power following negative feedback did not differ from that following neutral feedback, t(16) = 1.26, p = .23. More importantly, there was a significant interaction between Power and Feedback: post-hoc tests indicated that evoked power following positive feedback (M = .16 dB, SEM = .03) was significantly reduced compared to negative (M = .49 dB, SEM = .10; t(16) = 3.82, p = .002) and neutral (M = .42 dB, SEM = .06; t(16) = 5.20, p < .001) feedback. No difference was observed between evoked power following neutral and negative feedback, t(16) = 1.32, p = .21. By contrast, induced power following positive feedback (M = .65 dB, SEM = .12) was marginally larger than that following neutral feedback (M = .54 dB, SEM = .09), t(16) = 1.96, p = .07. No other differences were detected (p > .05).

Beta-gamma (20-40 Hz). A two-way ANOVA on beta-gamma activity with Power (induced, evoked) and Feedback (positive, negative, neutral) revealed a main effect of Feedback, F(2, 32) = 8.38, p = .001, η2 = .34, and an interaction between Power and Feedback, F(2, 32) = 8.41, p = .001, η2 = .35. The interaction (Fig. 3A and Fig. 3B) indicates that induced power following neutral feedback (M = -.043 dB, SEM = .039) was significantly reduced compared to negative (M = .064 dB, SEM = .040), t(16) = 2.74, p < .02, and positive (M = .179 dB, SEM = .069), t(16) = 3.47, p = .003, feedback. Induced beta-gamma power following positive feedback was larger than that following negative feedback, but the difference was only marginally significant, t(16) = 2.05, p = .057. No significant difference was observed between the three feedback conditions for evoked power (all p > .05). The main effect of Power was also not significant, F(1, 16) = 2.65, p = .12, η2 = .14. Post-hoc analyses indicated that beta-gamma activity overall was characterized by greater power for positive feedback (M = .18 dB, SEM = .07) than negative (M = .06 dB, SEM = .04), t(16) = 2.11, p = .051, and neutral (M = -.04 dB, SEM = .04), t(16) = 3.19, p < .005, feedback. Finally, negative feedback elicited a stronger beta-gamma response than neutral feedback, t(16) = 2.65, p < .02.
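Readers who want to relate the reported F statistics to the reported effect sizes may find the following sketch useful. It assumes that the η2 values in this section are partial eta squared, which the manuscript does not state explicitly. The helper name is hypothetical. Under that assumption, the effect size can be recovered from F and its degrees of freedom as F·df_effect / (F·df_effect + df_error).

```python
def partial_eta_squared(f_value, df_effect, df_error):
    """Partial eta squared recovered from an F statistic and its degrees of freedom."""
    return (f_value * df_effect) / (f_value * df_effect + df_error)


# Values taken from the ANOVAs reported above.
print(round(partial_eta_squared(21.09, 2, 32), 2))  # FRN, main effect of feedback
print(round(partial_eta_squared(9.08, 1, 16), 2))   # theta, main effect of Power
print(round(partial_eta_squared(8.38, 2, 32), 2))   # beta-gamma, main effect of Feedback
```

Small discrepancies in the second decimal place are expected because the F values themselves are rounded in the text.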

Discussion

The purpose of this study was to apply both a time-frequency and an ERP analysis to the electrophysiological activity elicited by neutral feedback during an RL paradigm. Our ERP analysis revealed that neutral feedback elicited a larger FRN than negative feedback, replicating some previous work (Huang & Yu, 2014; Yu & Zhou, 2006), but inconsistent with other work (Holroyd, Hajcak, & Larsen, 2006). Our time-frequency analysis revealed three oscillatory frequency bands that were sensitive to our feedback manipulation: theta, delta, and beta-gamma.

In our time-frequency analysis, the theta and delta bands exhibited a similar pattern of sensitivity to feedback, whereby both frequency bands distinguished positive feedback from each of negative and neutral feedback, but did not distinguish between negative and neutral feedback. This similarity is in line with work from Bernat and colleagues, demonstrating that the time-domain FRN is produced by differences in both theta and delta power (Bernat et al., 2007; Bernat et al., 2011; Bernat et al., 2015). However, we observed differences in the time-domain FRN between neutral and negative feedback, suggesting an additional layer to the FRN that cannot be explained solely by theta and delta power. In contrast to theta and delta activity, beta-gamma activity differentiated neutral feedback from informative feedback, and thus we speculate that the neutral-feedback FRN is produced by an interaction of two effects on EEG oscillations: valence effects (theta and delta), and informative versus uninformative feedback effects (beta-gamma).

Perhaps our most important finding is that induced beta-gamma power dissociated uninformative feedback (i.e., neutral feedback) from informative feedback. Interestingly, the lack of induced beta-gamma to neutral feedback coincided with the pattern of behavioral adjustments following feedback. In particular, subjects did not adjust their behavior systematically following neutral feedback, but tended to keep their response pattern following positive feedback, and made large adjustments in response time following negative feedback. The beta-gamma range has been associated with behavioral adjustments in other work (e.g., Cunillera et al., 2012; Van de Vijver, Ridderinkhof, & Cohen, 2011), and has been shown to be more sensitive to valence in trial-and-error learning tasks that require ongoing adjustments of behavior (such as the time-estimation task) (HajiHosseini & Holroyd, 2015). Notably, it has been suggested that (total) beta-gamma activity represents a “motivational value signal” (HajiHosseini, Rodriguez-Fornells, & Marco-Pallares, 2012, p. 1683), and that beta-gamma increases are representative of increased attentional resources being applied to particularly important events. In this context, our findings support the idea that induced beta-gamma reflects a manifestation of a motivational value signal that energizes behavioral adjustments when feedback is meaningful, but that is absent when feedback provides no task-relevant information. Therefore, based on previous literature and on our beta-gamma effects, we speculate that induced beta-gamma activity reflects a motivational response to feedback (HajiHosseini & Holroyd, 2015) and active inhibition/disinhibition of motor commands (Cavanagh, 2014; Engel and Fries, 2010), possibly highlighting a top-down modulatory role during RL. The nature of induced delta and beta-gamma may provide further insight into the neural computations supporting RL.

An influential theory holds that the FRN is produced by the impact of phasic increases and

D

decreases in dopamine activity coding for positive and negative reward prediction error signals (RPE) on ACC (RPE-ACC theory; Holroyd & Coles, 2002; Holroyd et al., 2008). RPEs, which indicate when events are “better” or “worse” than expected, constitute the learning term in powerful reinforcement learning algorithms (Sutton & Barto, 1998), and substantial evidence over the past decade has confirmed that the FRN reflects an RPE signal (for reviews see Walsh & Anderson, 2012; Sambrook & Goslin, 2015). The RPE-ACC theory holds that FRN amplitude is regulated up and down by dopamine RPE signals conveyed to ACC. In particular, positive dopamine RPE signals conveyed to the ACC following unexpected positive feedback suppress the production of the FRN, whereas negative dopamine RPE signals enhance FRN amplitude (Holroyd et al., 2008; see Proudfit, 2015 for review). The FRN has been shown to distinguish categorically between positive and negative RPEs, showing a more negative voltage for the latter. The RPE-ACC theory requires the FRN to show two further properties beyond this categorical distinction: it should be sensitive to both the prior likelihood of reward (likely, unlikely) and outcome magnitude (how much better or worse than the expected value an outcome is). With this in mind, the larger FRN following neutral feedback suggests that the size of the negative RPE varied as a function of outcome magnitude, outcome likelihood, or possibly both. With regard to an effect of magnitude, it is possible that participants came to view neutral feedback as “much” worse than expected because neutral feedback gave no usable information. Alternatively, it is also possible that feedback likelihood modulated the size of the negative RPE following neutral feedback, resulting in a larger FRN amplitude. Because of the dynamics of trial-and-error learning tasks, subjects may have come to expect positive or negative feedback to follow their responses and categorized uninformative feedback as a relatively rare event (1/3 of trials vs. 2/3 of trials giving informative feedback).
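The dependence of the RPE on both likelihood and magnitude can be illustrated with a minimal sketch, assuming a simple delta-rule formulation in the spirit of Sutton & Barto (1998). The function names, the numeric outcome coding, and the probabilities below are hypothetical and chosen only for illustration; they are not the model or the values used in the present study.

```python
def expected_value(reward_probability, reward_magnitude):
    """Expected value of an option that pays `reward_magnitude` with the
    given probability and 0 otherwise (an illustrative assumption)."""
    return reward_probability * reward_magnitude


def prediction_error(outcome, expectation):
    """Reward prediction error: positive when the outcome is better than
    expected, negative when it is worse."""
    return outcome - expectation


def update_expectation(expectation, outcome, learning_rate=0.1):
    """Delta-rule update: move the expectation a fraction of the way
    toward the observed outcome. Returns (new_expectation, rpe)."""
    delta = prediction_error(outcome, expectation)
    return expectation + learning_rate * delta, delta


# Likelihood: the same outcome (0) yields a larger negative RPE when
# reward was likely (p = 0.8) than when it was unlikely (p = 0.2).
rpe_when_reward_likely = prediction_error(0.0, expected_value(0.8, 1.0))
rpe_when_reward_unlikely = prediction_error(0.0, expected_value(0.2, 1.0))

# Magnitude: with the same expectation, an outcome that falls further
# below it (-1 rather than 0) yields a larger negative RPE.
rpe_small_shortfall = prediction_error(0.0, expected_value(0.5, 1.0))
rpe_large_shortfall = prediction_error(-1.0, expected_value(0.5, 1.0))
```

Under these assumptions, a more negative RPE could arise either because an outcome is valued as further below expectation or because a better outcome was more strongly expected, which corresponds to the two accounts of the neutral-feedback FRN considered above.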

We note two potential limitations of our study. First, our analysis was based on data from 17 subjects. Although this sample size is on par with or larger than that of most ERP studies, it is small relative to the recent trend toward larger samples in ERP research. Second, our neutral feedback stimulus was relatively distinctive compared to the feedback in the positive and negative conditions, which could provoke a low-level salience reaction in our subjects that would not be seen for negative or positive feedback. That being said, the effects we report occurred later in the EEG than components typically associated with low-level physical features of stimuli, such as the N1 (Luck, Heinze, Mangun, & Hillyard, 1990).

Conclusion

To the best of our knowledge, time-frequency and time-domain analyses have yet to be used together to investigate neutral operants during RL. Using neutral feedback during RL, we present further evidence of the role of induced beta-gamma oscillations as a motivational signal for behavioral adjustment (HajiHosseini, Rodriguez-Fornells, & Marco-Pallares, 2012), and argue that the FRN cannot be explained as an effect of theta oscillatory activity alone (Cavanagh & Frank, 2014). Our results suggest that the FRN also contains power from other frequency bands, including delta (Bernat et al., 2011; Bernat et al., 2015) and beta-gamma. Lastly, these findings motivate further study of the role of these electrophysiological responses to neutral feedback in understanding individual differences (e.g., Hirsh & Inzlicht, 2008; Gu, Ge, Jiang, & Luo, 2010; Li et al., 2015) and psychopathology (e.g., Proudfit, 2015; Baker et al., 2011; Morris et al., 2008) associated with RL. For instance, according to the “aberrant salience” hypothesis, schizophrenia patients attribute salience to otherwise neutral environmental stimuli, and those stimuli may ultimately appear meaningful and evoke delusional mood in patients (for review, see Deserno et al., 2013). These issues are ripe for future investigation.

Acknowledgment

This work was supported by the National Natural Science Foundation of China (NSFC31300872 & 81171289) and the MOE (Ministry of Education in China) Project of Humanities and Social Sciences (13YJC190013).


References

Tallon-Baudry, C., & Bertrand, O. (1999). Oscillatory gamma activity in humans and its role in object representation. Trends in Cognitive Sciences, 3, 151–162.

Behroozmand, R., Ibrahim, N., Korzyukov, O., Robin, D. A., & Larson, C. R. (2015). Functional role of delta and theta band oscillations for auditory feedback processing during vocal pitch motor control. Frontiers in Neuroscience, 9.

Baker, T. E., Stockwell, T., Barnes, G., & Holroyd, C. B. (2011). Individual differences in substance dependence: At the intersection of brain, behaviour and cognition. Addiction Biology, 16, 458–466.

Bernat, E. M., Malone, S. M., Williams, W. J., Patrick, C. J., & Iacono, W. G. (2007). Decomposing delta, theta, and alpha time–frequency ERP activity from a visual oddball task using PCA. International Journal of Psychophysiology, 64(1), 62–74.

Bernat, E. M., Nelson, L. D., & Baskin-Sommers, A. R. (2015). Time-frequency theta and delta measures index separable components of feedback processing in a gambling task. Psychophysiology, 52(5), 626–637.

Bernat, E. M., Nelson, L. D., Steele, V. R., Gehring, W. J., & Patrick, C. J. (2011). Externalizing psychopathology and gain–loss feedback in a simulated gambling task: Dissociable components of brain response revealed by time-frequency analysis. Journal of Abnormal Psychology, 120(2), 352–364.

Catania, A. C. (1999). Thorndike's legacy: Learning, selection, and the law of effect. Journal of the Experimental Analysis of Behavior, 72(3), 425–428.

Cavanagh, J. F., Figueroa, C. M., Cohen, M. X., & Frank, M. J. (2012). Frontal theta reflects uncertainty and unexpectedness during exploration and exploitation. Cerebral Cortex, 22, 2575–2586.

Cavanagh, J. F., & Frank, M. J. (2014). Frontal theta as a mechanism for cognitive control. Trends in Cognitive Sciences, 18(8), 414–421.

Cavanagh, J. F., Frank, M. J., Klein, T. J., & Allen, J. J. (2010). Frontal theta links prediction errors to behavioral adaptation in reinforcement learning. NeuroImage, 49(4), 3198–3209.

Chen, C. C., Kiebel, S. J., Kilner, J. M., Ward, N. S., Stephan, K. E., Wang, W. J., & Friston, K. J. (2012). A dynamic causal model for evoked and induced responses. NeuroImage, 59(1), 340–348.

Cohen, M. X., Elger, C. E., & Ranganath, C. (2007). Reward expectation modulates feedback-related negativity and EEG spectra. NeuroImage, 35, 968–978.

Cunillera, T., Fuentemilla, L., Periañez, J., Marco-Pallarès, J., Krämer, U. M., Càmara, E., ... & Rodríguez-Fornells, A. (2012). Brain oscillatory activity associated with task switching and feedback processing. Cognitive, Affective, & Behavioral Neuroscience, 12(1), 16–33.

Deserno, L., Boehme, R., Heinz, A., & Schlagenhauf, F. (2013). Reinforcement learning and dopamine in schizophrenia: Dimensions of symptoms or specific features of a disease group? Frontiers in Psychiatry, 4, 1–16.

Gilbert, C. D., & Sigman, M. (2007). Brain states: Top-down influences in sensory processing. Neuron, 54(5), 677–696.

Gratton, G., Coles, M. G., & Donchin, E. (1983). A new method for offline removal of ocular artifact. Electroencephalography and Clinical Neurophysiology, 55, 468–484.

Gu, R., Ge, Y., Jiang, Y., & Luo, Y. J. (2010). Anxiety and outcome evaluation: The good, the bad and the ambiguous. Biological Psychology, 85, 200–206.

HajiHosseini, A., & Holroyd, C. B. (2013). Frontal midline theta and N200 amplitude reflect complementary information about expectancy and outcome evaluation. Psychophysiology, 50, 550–562.

HajiHosseini, A., & Holroyd, C. B. (2015). Sensitivity of frontal beta oscillations to reward valence but not probability. Neuroscience Letters, 602, 99–103.

HajiHosseini, A., Rodríguez-Fornells, A., & Marco-Pallarés, J. (2012). The role of beta-gamma oscillations in unexpected rewards processing. NeuroImage, 60(3), 1678–1685.

Hirsh, J. B., & Inzlicht, M. (2008). The devil you know: Neuroticism predicts neural response to uncertainty. Psychological Science, 19, 962–967.

Holroyd, C. B., Hajcak, G., & Larsen, J. T. (2006). The good, the bad and the neutral: Electrophysiological responses to feedback stimuli. Brain Research, 1105, 93–101.

Holroyd, C. B., HajiHosseini, A., & Baker, T. E. (2012). ERPs and EEG oscillations, best friends forever: Comment on Cohen et al. Trends in Cognitive Sciences, 16(4), 192.

Holroyd, C. B., & Coles, M. G. H. (2002). The neural basis of human error processing: Reinforcement learning, dopamine, and the error-related negativity. Psychological Review, 109, 679–709.

Holroyd, C. B., Pakzad-Vaezi, K. L., & Krigolson, O. E. (2008). The feedback correct-related positivity: Sensitivity of the event-related brain potential to unexpected positive feedback. Psychophysiology, 45(5), 688–697.

Holroyd, C. B., & Yeung, N. (2012). Motivation of extended behaviors by anterior cingulate cortex. Trends in Cognitive Sciences, 16(2), 122–128.

Hsieh, L. T., & Ranganath, C. (2014). Frontal midline theta oscillations during working memory maintenance and episodic encoding and retrieval. NeuroImage, 85, 721–729.

Huang, Y., & Yu, R. (2014). The feedback-related negativity reflects “more or less” prediction error in appetitive and aversive conditions. Frontiers in Neuroscience, 8.

Kujawa, A., Smith, E., Luhmann, C., & Hajcak, G. (2013). The feedback negativity reflects favorable compared to nonfavorable outcomes based on global, not local, alternatives. Psychophysiology, 50(2), 134–138.

Lachaux, J. P., Rodriguez, E., Martinerie, J., & Varela, F. J. (1999). Measuring phase synchrony in brain signals. Human Brain Mapping, 8, 194–208.

Lee, K. H., Williams, L. M., Breakspear, M., & Gordon, E. (2003). Synchronous gamma activity: A review and contribution to an integrative neuroscience model of schizophrenia. Brain Research Reviews, 41(1), 57–78.

Li, P., Song, X., Wang, J., Zhou, X., Li, J., Lin, F., et al. (2015). Reduced sensitivity to neutral feedback versus negative feedback in subjects with mild depression: Evidence from event-related potentials study. Brain and Cognition, 100, 15–20.

Luck, S. J., Heinze, H. J., Mangun, G. R., & Hillyard, S. A. (1990). Visual event-related potentials index focused attention within bilateral stimulus arrays. II. Functional dissociation of P1 and N1 components. Electroencephalography and Clinical Neurophysiology, 75(6), 528–542.

Marco-Pallares, J., Cucurell, D., Cunillera, T., García, R., Andrés-Pueyo, A., Münte, T. F., & Rodríguez-Fornells, A. (2008). Human oscillatory activity associated to reward processing in a gambling task. Neuropsychologia, 46, 241–248.

Miltner, W. H. R., Braun, C. H., & Coles, M. G. H. (1997). Event-related brain potentials following incorrect feedback in a time-estimation task: Evidence for a “generic” neural system for error detection. Journal of Cognitive Neuroscience, 9, 788–798.

Mitchell, D. J., McNaughton, N., Flanagan, D., & Kirk, I. J. (2008). Frontal-midline theta from the perspective of hippocampal “theta”. Progress in Neurobiology, 86(3), 156–185.

Morris, S. E., Heerey, E. A., Gold, J. M., & Holroyd, C. B. (2008). Learning-related changes in brain activity following errors and performance feedback in schizophrenia. Schizophrenia Research, 99, 274–285.

Proudfit, G. H. (2015). The reward positivity: From basic research on reward to a biomarker for depression. Psychophysiology.

Sambrook, T. D., & Goslin, J. (2015). A neural reward prediction error revealed by a meta-analysis of ERPs using great grand averages. Psychological Bulletin, 141(1), 213–235.

Skinner, B. F. (1938). The behavior of organisms: An experimental analysis. New York: Appleton-Century.

Sutton, R. S., & Barto, A. G. (1998). Reinforcement learning: An introduction. Cambridge: MIT Press.

Van de Vijver, I., Ridderinkhof, K. R., & Cohen, M. X. (2011). Frontal oscillatory dynamics predict feedback learning and action adjustment. Journal of Cognitive Neuroscience, 23(12), 4106–4121.

Walsh, M. M., & Anderson, J. R. (2012). Learning from experience: Event-related potential correlates of reward processing, neural adaptation, and behavioral choice. Neuroscience & Biobehavioral Reviews, 36(8), 1870–1884.

Warren, C. M., & Holroyd, C. B. (2012). The impact of deliberative strategy dissociates ERP components related to conflict processing vs. reinforcement learning. Frontiers in Neuroscience, 6(43), 1–17.

Weinberg, A., Riesel, A., & Proudfit, G. H. (2014). Show me the money: The impact of actual rewards and losses on the feedback negativity. Brain and Cognition, 87, 134–139.

Yeung, N., Bogacz, R., Holroyd, C. B., & Cohen, J. D. (2004). Detection of synchronized oscillations in the electroencephalogram: An evaluation of methods. Psychophysiology, 41(6), 822–832.

Yu, R., & Zhou, X. (2006). Brain potentials associated with outcome expectation and outcome evaluation. Neuroreport, 17, 1649.


Figure captions

Fig. 1 (A) Bars show the average change in RT across trials for each condition; error bars represent the standard error. (B) Grand average ERP at FCz associated with neutral (green line), negative (red line), and positive (blue line) feedback. (C) Difference wave between the neutral and positive conditions (Neutral-Positive DW) and difference wave between the negative and positive conditions (Negative-Positive DW). (D) Scalp distributions of the difference waves.

Fig. 2 Time-frequency representations: (A) Induced power following positive (left panel), negative (middle panel) and neutral (right panel) feedback; (B) Evoked power following positive (left panel), negative (middle panel) and neutral (right panel) feedback. All data were recorded at channel FCz.

Fig. 3 The time course of power change in the delta, theta and beta-gamma bands for positive, negative and neutral feedback: (A) The time course of the change in induced delta (left panel), induced theta (middle panel), and induced beta-gamma (right panel) power associated with neutral (green solid line), negative (red solid line), and positive (blue solid line) feedback; (B) The time course of the change in evoked delta (left panel), evoked beta-gamma (middle panel) and evoked theta (right panel) power associated with neutral (green dashed line), negative (red dashed line), and positive (blue dashed line) feedback. All data were recorded at channel FCz.

Figure 1

Figure 2

Figure 3

Highlights

We investigated the electrophysiological response to neutral feedback.

A larger FRN was observed following neutral feedback than following negative or positive feedback.

Theta was equal for negative and neutral feedback but smaller for positive feedback.

Beta-gamma discriminated between informative and uninformative feedback.