Reward magnitude and timing in pigeons


Behavioural Processes 86 (2011) 359–363

Contents lists available at ScienceDirect

Behavioural Processes journal homepage: www.elsevier.com/locate/behavproc

Short report

Elliot A. Ludvig a,∗, Fuat Balci b, Marcia L. Spetch c

a Department of Computing Science, University of Alberta, Edmonton, Alberta, Canada
b Princeton Neuroscience Institute, Princeton University, Princeton, NJ 08544, United States
c Department of Psychology, University of Alberta, Edmonton, Alberta, Canada

Article info

Article history:
Received 24 August 2010
Received in revised form 11 December 2010
Accepted 7 January 2011

Keywords: Motivation; Interval timing; Reward magnitude; Peak procedure; Fixed intervals; Pigeons

Abstract

We investigated the interaction of motivation and timing by manipulating the expected reward magnitude during a peak procedure. Four pigeons were tested with three different reward magnitudes, operationalized as duration of food access. Each stimulus predicted a different reward magnitude on a 5 s fixed-interval schedule. Trials with different reward magnitudes were randomly intermingled in a session. Most pigeons responded less often and started responding later on peak trials when a smaller reward was expected, but showed no differences in response termination or peak times. Reward magnitude was independently corroborated through unreinforced choice trials, when pigeons chose between the three stimuli presented simultaneously. These results contribute to a growing body of evidence that the expected reward magnitude influences the decision to start anticipatory responding in tasks where the reward becomes available after a fixed interval, but does not alter peak times, nor the decision to stop responding on peak trials. © 2011 Elsevier B.V. All rights reserved.

Early theories of timing supposed that motivational factors, such as reward magnitude or satiety, should have no influence on timing (e.g., Gibbon, 1977). Initial data supported this position (Roberts, 1981; Hatten and Shull, 1983), but more recent results have revealed that motivational manipulations do alter responding on timing procedures (e.g., Balci et al., 2010a,b; Galtress and Kirkpatrick, 2009, 2010; Ludvig et al., 2007; McClure et al., 2009). For example, decreased reward magnitude (Galtress and Kirkpatrick, 2009; Grace and Nevin, 2000; Ludvig et al., 2007), pre-feeding (Plowright et al., 2000), and increasing satiety during a session (Balci et al., 2010b) all increase the time to initiate responding in the peak procedure, while having more muted effects on the remainder of the response curve. The studies with reward magnitude have mostly involved changing reward magnitude for an extended period, across multiple sessions (e.g., Galtress and Kirkpatrick, 2009; Ludvig et al., 2007), thereby producing differences in the overall reward rates across conditions. Here, we evaluate dynamic changes in timed responding on the peak procedure by intermingling three different reward magnitudes in a session.

The peak procedure is the most prominent method for evaluating timing in animals (Roberts, 1981). There are two types of trials in a peak procedure. On rewarded trials, pigeons are reinforced for the first response emitted after a fixed amount of time has elapsed since stimulus onset. On peak trials, no reward is available and the stimulus remains present considerably longer than on rewarded trials. Typically, average response rates on peak trials increase until around the time the reward is usually available and decrease afterward. Individual peak trials can be approximated as a three-state process, where an initial pause is followed by a burst of responding, followed by another pause (Church et al., 1994; Gallistel et al., 2004a). This molecular approach allows for independent estimates of the time to start and stop responding on individual trials.

In this paper, we extended the recent results on the interaction of motivation and timing by using three reward magnitudes within a session of the peak procedure in pigeons. Stimuli were different colours for the three different reward magnitudes. This design allowed us to change the local expectation of reward without altering the overall reward rate. As a result, there should not be any overall changes in attention or arousal that could differentially affect the levels of expected reward magnitude. In addition, we incorporated choice trials that allowed for independent corroboration that the reward magnitudes led to different preferences. Finally, we explored performance on a shorter (5 s) fixed interval than earlier studies. Based on previous results (e.g., Ludvig et al., 2007; Galtress and Kirkpatrick, 2009), we hypothesized that smaller reward magnitudes would lead to later start times on peak trials, with limited changes later in the trial.

∗ Corresponding author at: Princeton University, Princeton Neuroscience Institute, 3-N-12 Green Hall, Princeton, NJ 08542, United States. Tel.: +1 609 849 8879. E-mail addresses: [email protected], [email protected] (E.A. Ludvig). 0376-6357/$ – see front matter © 2011 Elsevier B.V. All rights reserved. doi:10.1016/j.beproc.2011.01.003


E.A. Ludvig et al. / Behavioural Processes 86 (2011) 359–363

1. Methods

1.1. Subjects

Four adult pigeons (Columba livia), two Silver Kings and two Racers, were obtained from local suppliers (numbered P5, P18, P35, and P299). Pigeons were housed in standard cages and given unlimited access to water and grit. The light cycle was 12:12, and all testing occurred during the light portion of the cycle. Pigeons were maintained at 85–90% of ad libitum body weight by post-session feeding in their home cage. Sessions were run 6 days a week. All pigeons had prior experience with operant conditioning tasks, but not with the stimuli used, nor with similar timing procedures.

1.2. Apparatus

Testing occurred in custom-made operant chambers. The chambers were 44 cm high, 32 cm deep, and 74 cm wide (inside dimensions). At the front, there was a Carroll Touch infrared touch frame (Elo Touch Systems, Inc., Menlo Park, CA) to record pecks, which was mounted in front of a 15 in. LCD monitor that displayed all stimuli. On either side of the touchscreen/monitor, there were solenoid-type bird feeders that, when raised, provided access to pigeon feed, which was delivered randomly through one of the feeders on a trial. Photocells in the feeder trough detected the presence of the pigeon’s head and allowed precise control of the duration of feeder access. Stimuli were presented and data recorded by computers running in an adjoining room. A fan provided masking noise and adequate ventilation.

1.3. Procedure

Phase 1: Pre-training. Pigeons were initially autoshaped to peck at the three different colour stimuli (red, green, and yellow) and two additional stimuli (black-and-white dotted and striped patterns) by rewarding the first response on that stimulus, or, in the absence of responding, after 60 s. On each trial, only one stimulus was present, and there were up to 60 trials per session. All stimuli were 3.8 cm × 3.8 cm squares (113 × 113 pixels), and the stimuli appeared in the center of the screen.
The red stimulus was rewarded with 0.5 s of grain access, yellow with 1.5 s, and green with 4.5 s of access, except for pigeon P5, who had these reward durations reduced by 40%, following session 22 of pre-training, because it repeatedly failed to eat in the latter half of sessions. Once the pigeons responded reliably to all the stimuli, they were exposed to an FI 5 s schedule with each of the 3 colour stimuli. On this schedule, the first peck on the stimulus after 5 s was followed by a reward of the appropriate duration (i.e., magnitude). Occasional (20%) FI trials were followed immediately by a “choice” probe before the reward, during which the two different (noncolour) stimuli appeared on the sides of the initial stimulus, and the reward duration was determined by that choice. For the final 12 sessions of this phase, these choice trials were eliminated, and unreinforced equidistant choice trials were interspersed among the FI trials instead. On these equidistant probes, all 3 colour stimuli were simultaneously presented in locations surrounding the usual stimulus location. Location was counterbalanced across choice trials. The first peck to any of the 3 stimuli was recorded as the choice, the stimuli disappeared, and no reinforcement was given. This pre-training phase lasted from 67 to 70 sessions.

Phase 2: Peak testing. Pigeons received 35–45 sessions of peak testing, and only data from the final 10 sessions for each pigeon were analyzed to ensure that stable responding had been reached. Sessions of peak testing consisted of 60 trials. The first 6 trials always consisted of 2 FI trials with each of the 3 stimuli. The remaining 54 trials consisted of an additional 8 FI and 6 peak trials with each of the 3 stimuli, and 12 of the equidistant, unreinforced choice probes, all randomly intermixed. Peak trials were not reinforced and lasted from 20 to 40 s (uniformly distributed). Inter-trial intervals of 30 to 50 s, uniformly distributed, separated all trials. Pigeon P299 became ill in its final (36th) test session; data from this test session have been discarded. Data from the penultimate session for pigeon P18 were lost due to a hardware failure and are not included in the analysis.

Data analysis. To estimate start and stop times on individual trials, we used a relative-likelihood change-point algorithm that finds statistically reliable changes in response rates (for details, see Gallistel et al., 2004a; Balci et al., 2009). In short, the approach assumes that inter-response times (IRTs) are exponentially distributed. The algorithm works by examining each successive IRT on a trial and, for each data point, computing the relative likelihood that all IRTs up to and including that data point come from the same distribution or from two different distributions. A user-specified decision criterion (the Bayes factor) determines the sensitivity of the change-point algorithm for finding transitions; we used a Bayes factor of 10, which adequately characterized the current dataset. Stricter criteria did not qualitatively change our results. Start times were defined as the first positive change point in a trial, and stop times were defined as the first negative change point. Pair-wise comparisons of start, stop, peak (midpoint of start and stop), and wait times (time to first response) across different reward magnitude conditions were conducted for individual pigeons using a Mann–Whitney U-test. We followed up these analyses with single-subject permutation tests with 10,000 iterations per test, using the difference between the medians.
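The change-point idea described above can be illustrated with a minimal Python sketch. This is not the algorithm of Gallistel et al. (2004a) or the authors' code: it assumes strictly positive IRTs, uses a maximized likelihood ratio in place of a true Bayes factor, and the names `exp_loglik`, `first_change_point`, and `min_seg` are hypothetical.

```python
import math


def exp_loglik(irts):
    """Maximized log-likelihood of IRTs under a single exponential rate."""
    n = len(irts)
    rate = n / sum(irts)           # maximum-likelihood rate estimate
    return n * math.log(rate) - n  # rate * sum(irts) simplifies to n


def first_change_point(irts, criterion=10.0, min_seg=2):
    """Return (split_index, direction) for the first reliable rate change.

    Scans successive IRTs; at each point compares a one-rate model with the
    best two-rate split of the data seen so far. A change is declared when
    the likelihood ratio exceeds `criterion` (assumed stand-in for the
    Bayes factor of 10 mentioned in the text). Returns None if no change.
    """
    log_criterion = math.log(criterion)
    for end in range(2 * min_seg, len(irts) + 1):
        window = irts[:end]
        one_rate = exp_loglik(window)
        best_split, best_gain = None, 0.0
        for split in range(min_seg, end - min_seg + 1):
            gain = (exp_loglik(window[:split])
                    + exp_loglik(window[split:]) - one_rate)
            if gain > best_gain:
                best_split, best_gain = split, gain
        if best_split is not None and best_gain > log_criterion:
            before = sum(window[:best_split]) / best_split
            after = sum(window[best_split:]) / (end - best_split)
            # Shorter IRTs after the split mean a higher response rate.
            direction = "positive" if after < before else "negative"
            return best_split, direction
    return None


# Hypothetical trial: four long IRTs followed by rapid pecking.
irts = [4.0] * 4 + [0.25] * 8
print(first_change_point(irts))
```

In this sketch a "positive" change corresponds to a start time and a "negative" change to a stop time; a full analysis would continue scanning after the first change point to find the next one.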
Choice proportions were compared by chi-squared tests for each subject, followed by pairwise comparisons using one-tailed binomial tests. The Holm–Bonferroni method (with alpha of .05) was used to correct for multiple comparisons.
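The permutation test on medians and the Holm–Bonferroni correction described above can be sketched as follows. This is an illustration under stated assumptions rather than the authors' analysis code: the two-sided comparison, the add-one p-value correction, the fixed random seed, and the function names `permutation_test_median` and `holm_bonferroni` are all assumptions.

```python
import random
from statistics import median


def permutation_test_median(a, b, n_iter=10_000, seed=0):
    """Two-sided permutation p-value for the difference between medians."""
    rng = random.Random(seed)  # fixed seed is an assumption, for repeatability
    observed = abs(median(a) - median(b))
    pooled = list(a) + list(b)
    hits = 0
    for _ in range(n_iter):
        rng.shuffle(pooled)
        diff = abs(median(pooled[:len(a)]) - median(pooled[len(a):]))
        if diff >= observed:
            hits += 1
    # Add-one correction keeps the estimated p-value above zero.
    return (hits + 1) / (n_iter + 1)


def holm_bonferroni(p_values, alpha=0.05):
    """Return a list of booleans: True where the null is rejected."""
    m = len(p_values)
    order = sorted(range(m), key=lambda i: p_values[i])
    reject = [False] * m
    for rank, i in enumerate(order):
        # Step-down threshold: alpha / (m - rank) for the rank-th smallest p.
        if p_values[i] <= alpha / (m - rank):
            reject[i] = True
        else:
            break  # once one test fails, all larger p-values fail too
    return reject


# Hypothetical start times (s) for two reward conditions.
low = [2.9, 3.4, 3.1, 3.8, 3.3, 3.6]
high = [1.8, 2.1, 1.9, 2.4, 2.0, 2.2]
print(permutation_test_median(low, high))
print(holm_bonferroni([0.01, 0.04, 0.03]))
```

The step-down structure is what distinguishes Holm–Bonferroni from a plain Bonferroni correction: the smallest p-value faces the strictest threshold, and each subsequent one a slightly more lenient threshold.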

2. Results

Fig. 1 shows the average normalized response rates on peak trials. For all 4 pigeons, responding increased earlier in the trial for the largest reward magnitude. These differences were not as consistent later in the trial, where responding tailed off for the larger reward magnitude more quickly (P18), more slowly (P5, P35), or at around the same rate (P299), depending on the pigeon. There was some tendency towards a bimodal response distribution with a sharp early peak, followed by a lower, later peak (most notable for P18 and P35).

To quantify these observations, we extracted start and stop times for each trial from the single-trial analyses. Fig. 2A–C depict the start, stop, and wait times for the different reward magnitudes for each pigeon. For three subjects (P5, P18, P299), start and wait times were significantly delayed during the low-reward stimulus as compared to both the medium-reward and high-reward stimuli (all ps < .02). For the fourth pigeon (P35), the lower magnitude did not reliably affect start or wait times (though the pattern in Fig. 1 resembles the other pigeons), but did produce earlier stop times than the two larger magnitudes (both ps < .03). For all pigeons, there were no reliable differences in peak times (not shown) nor between the two larger reward magnitudes on any of the measures (all ps > .05).

The choice proportions further suggest that pigeons were indeed sensitive to the reward magnitudes as predicted by the different stimuli. Fig. 2D shows that all four pigeons chose the stimulus that cued the largest reward magnitude more frequently than the stimulus that cued the smallest reward magnitude (all ps < .05). There was not, however, a graded preference curve, as the medium reward magnitude was always chosen in equal proportion with either the higher (3 pigeons) or lower (pigeon P18) reward magnitude.


Fig. 1. Time course of the average normalized response rate on peak trials for the three different reward magnitudes. The larger, top panel depicts the group average, and the bottom panels show data from the 4 individual pigeons. In all cases, responding seems to grow more quickly to the largest reward than to the other two reward magnitudes.
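One way a normalized response-rate curve of this kind could be computed is sketched below. The text does not specify the normalization, so dividing by the maximum binned rate, the 1 s bin width, and the function name `normalized_rate_curve` are all assumptions made for illustration.

```python
def normalized_rate_curve(trials, trial_len=20, bin_width=1.0):
    """Average pecks per second in each time bin, scaled so the peak is 1.

    `trials` is a list of trials, each a list of peck times in seconds
    measured from stimulus onset.
    """
    n_bins = int(trial_len / bin_width)
    counts = [0] * n_bins
    for peck_times in trials:
        for t in peck_times:
            b = int(t // bin_width)
            if 0 <= b < n_bins:  # ignore pecks outside the analysis window
                counts[b] += 1
    rates = [c / (len(trials) * bin_width) for c in counts]
    peak = max(rates)
    # Normalizing by the maximum is an assumption, not the paper's stated method.
    return [r / peak for r in rates] if peak > 0 else rates


# Hypothetical peck times from two short peak trials.
curve = normalized_rate_curve([[1.2, 4.5, 4.9, 5.1], [4.2, 5.5]], trial_len=8)
print(curve)
```

Scaling each curve to its own maximum makes the shapes comparable across reward magnitudes even when overall response rates differ.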

Fig. 2. (A) Median start times, (B) median stop times, and (C) median wait times for individual pigeons as a function of reward magnitude. Error bars denote 95% confidence intervals for the median. The difference in stop times for pigeon P35 between the low and medium rewards was not significant with the Mann–Whitney test, only with the single-subject permutation test. (D) Choice proportion for individual pigeons and different stimuli. Error bars denote the standard error of the binomial distribution. * = p < .05, Med = Medium.

3. Discussion

Our results corroborate the recent pattern of results observed with motivational effects on timing (e.g., Balci et al., 2010a,b; Galtress and Kirkpatrick, 2009; Ludvig et al., 2007). Most pigeons started responding later with the low reward magnitude, but there was a less consistent effect on stop and peak times. This pattern emerged even though different reward magnitudes were intermingled in a single session, thus keeping the overall reward rate the same across the different conditions.

The two higher reward magnitudes did not have any consistent differential effects on the timing or choice measures. This seeming equivalence of the longer durations of food access is reminiscent of the phenomenon of “duration neglect”, wherein people and rats do not prefer a longer, yet otherwise equivalent, reward (e.g., Fredrickson and Kahneman, 1993; Shizgal, 1999).

In terms of timing processes, one potential explanation for this dataset would be that an internal clock is speeded up when expecting a larger reward. This account would correctly predict the earlier start times with the larger reward, but would also predict earlier stop and peak times, an effect not observed in our data. Similarly, if attention to time were affected by reward magnitude manipulations (e.g., Galtress and Kirkpatrick, 2009, 2010), then there should also be equivalent effects on both start and stop times. Perhaps the best timing-based explanation for these twin results is to suppose separate thresholds for initiating and terminating timed responding, which are independently modulated (Gallistel et al., 2004b; Taylor et al., 2007). Reward magnitude would thus influence the start threshold, but not the stop threshold (Balci et al., 2010a,b; Ludvig et al., 2007), resulting in earlier start times, but not stop times.

Another possibility is that arrival of the cue for the larger reward magnitude makes pigeons more aroused and thus more likely to emit a burst of untimed responding early in the trial. This increased frequency of untimed early bursts would thus make it appear as though pigeons started their timed responding earlier. The bimodal responding exhibited by some pigeons with the high reward magnitude supports this possibility (see Fig. 1). This hypothesis is congruent with earlier observations that start times reflect a mixture of timed and untimed bouts of responding (Matell and Portugal, 2007), whereas stop times track the expected time to reward more closely (Gallistel et al., 2004b). This idea ties in nicely with the known roles of dopamine in learning about rewards and locomotor activity (e.g., Schultz et al., 1997; Wise, 2004). Cues for larger reward magnitudes typically elicit larger phasic dopamine bursts (Tobler et al., 2005), and higher tonic dopamine levels produce hyperactivity (e.g., Balci et al., 2010a; Zhuang et al., 2001). If larger dopamine bursts also lead to a transient increase in activity levels, then that would provide a potential neural mechanism for an increase in untimed bursts early in the trial. Thus, this interpretation suggests that perhaps motivation and timing are indeed independent after all.

Acknowledgement

Preparation of this manuscript was supported in part by NSERC Discovery Grant #38861 to MLS and by National Institute of Mental Health grant P50 MH062196 and Air Force Office of Scientific Research Grant FA9550-07-1-0537 (FB).

References

Balci, F., Gallistel, C.R., Allen, B.D., Frank, K., Gibson, J., Brunner, D., 2009. Acquisition of peak responding: what is learned? Behav. Process. 80, 67–75.
Balci, F., Ludvig, E.A., Abner, R., Zhuang, X., Poon, P., Brunner, B.D., 2010a. Motivational effects on interval timing in dopamine transporter (DAT) knockdown mice. Brain Res. 1325, 89–99.
Balci, F., Ludvig, E.A., Brunner, D., 2010b. Within-session modulation of timed anticipatory responding: when to start responding. Behav. Process. 85, 204–206.
Church, R.M., Meck, W.H., Gibbon, J., 1994. Application of scalar timing theory to individual trials. J. Exp. Psychol. Anim. Behav. Process. 20, 135–155.

Fredrickson, B.L., Kahneman, D., 1993. Duration neglect in retrospective evaluations of affective episodes. J. Pers. Soc. Psychol. 65, 45–55.
Gallistel, C.R., Balsam, P.D., Fairhurst, S., 2004a. The learning curve: implications of a quantitative analysis. Proc. Natl. Acad. Sci. 101, 13124–13131.
Gallistel, C.R., King, A., McDonald, R., 2004b. Sources of variability and systematic error in mouse timing behavior. J. Exp. Psychol. Anim. Behav. Process. 30, 3–16.
Galtress, T., Kirkpatrick, K., 2009. Reward value effects on timing in the peak procedure. Learn. Motiv. 40, 109–131.
Galtress, T., Kirkpatrick, K., 2010. Reward magnitude effects on temporal discrimination. Learn. Motiv. 41, 108–124.
Gibbon, J., 1977. Scalar expectancy theory and Weber’s law in animal timing. Psychol. Rev. 84, 279–325.
Grace, R.C., Nevin, J.A., 2000. Response strength and temporal control in fixed-interval schedules. Anim. Learn. Behav. 28, 313–331.
Hatten, J.L., Shull, R.L., 1983. Pausing on fixed-interval schedules: effects of the prior feeder duration. Behav. Anal. Lett. 3, 101–111.
Ludvig, E.A., Conover, K., Shizgal, P., 2007. The effects of reinforcer magnitude on timing in rats. J. Exp. Anal. Behav. 87, 201–218.
Matell, M.S., Portugal, G.S., 2007. Impulsive responding on the peak-interval procedure. Behav. Process. 74, 198–208.
McClure, E.A., Saulsgiver, K.A., Wynne, C.D., 2009. Manipulating pre-feed, density of reinforcement, and extinction produces disruption in the location variation of a temporal discrimination task in pigeons. Behav. Process. 82, 85–89.
Plowright, C.M.S., Church, D., Behnke, P., Silverman, A., 2000. Time estimation by pigeons on a fixed interval: the effect of pre-feeding. Behav. Process. 52, 43–48.
Roberts, S., 1981. Isolation of an internal clock. J. Exp. Psychol. Anim. Behav. Process. 7, 242–268.
Schultz, W., Dayan, P., Montague, P.R., 1997. A neural substrate of prediction and reward. Science 275, 1593–1599.
Shizgal, P., 1999. On the neural computation of utility: implications from studies of brain stimulation reward. In: Kahneman, D., Diener, E., Schwarz, N. (Eds.), Well-Being: The Foundations of Hedonic Psychology. Russell Sage Foundation, New York, pp. 502–526.
Taylor, K.M., Horvitz, J.C., Balsam, P.D., 2007. Amphetamine affects the start of responding in the peak interval timing task. Behav. Process. 74, 168–175.
Tobler, P.N., Fiorillo, C.D., Schultz, W., 2005. Adaptive coding of reward value by dopamine neurons. Science 307, 1642–1645.
Wise, R.A., 2004. Dopamine, learning and motivation. Nat. Rev. Neurosci. 5, 1–12.
Zhuang, X., Oosting, R.S., Jones, S.R., Gainetdinov, R.R., Miller, G.W., Caron, M.G., et al., 2001. Hyperactivity and impaired response habituation in hyperdopaminergic mice. Proc. Natl. Acad. Sci. U.S.A. 98, 1982–1987.