Behavioural Processes 75 (2007) 220–224
Short communication
Species differences between rats and pigeons in choices with probabilistic and delayed reinforcers
James E. Mazur ∗
Psychology Department, Southern Connecticut State University, New Haven, CT 06515, United States
Abstract
An adjusting-delay procedure was used to study rats’ choices with probabilistic and delayed reinforcers, and to compare them with previous results from pigeons. A left lever press led to a 5-s delay signaled by a light and a tone, followed by a food pellet on 50% of the trials. A right lever press led to an adjusting delay signaled by a light followed by a food pellet on 100% of the trials. In some conditions, the light and tone for the probabilistic reinforcer were present only on trials that delivered food. In other conditions, the light and tone were present on all trials that the left lever was chosen. Similar studies with pigeons [Mazur, J.E., 1989. Theories of probabilistic reinforcement. J. Exp. Anal. Behav. 51, 87–99; Mazur, J.E., 1991. Choice with probabilistic reinforcement: effects of delay and conditioned reinforcers. J. Exp. Anal. Behav. 55, 63–77] found that choice of the probabilistic reinforcer increased dramatically when the delay-interval stimuli were omitted on no-food trials, but this study found no such effect with the rats. In other conditions, the probability of food was varied, and comparisons to previous studies with pigeons indicated that rats showed greater sensitivity to decreasing reinforcer probabilities. The results support the hypothesis that rats’ choices in these situations depend on the total time between a choice response and a reinforcer, whereas pigeons’ choices are strongly influenced by the presence of delay-interval stimuli.
© 2007 Elsevier B.V. All rights reserved.
Keywords: Probabilistic reinforcers; Delayed reinforcers; Delay-interval stimuli
∗ Tel.: +1 203 392 6876; fax: +1 203 392 6805. E-mail address: [email protected].
0376-6357/$ – see front matter © 2007 Elsevier B.V. All rights reserved. doi:10.1016/j.beproc.2007.02.004

Previous studies have shown that when pigeons choose between reinforcers that vary in both delay and probability, their choices can be predicted using the following hyperbolic equation:

$$V = \sum_{i=1}^{n} P_i \left( \frac{A}{1 + K D_i} \right) \tag{1}$$

V is the value or strength of a reinforcer that could be delivered after any one of n possible delays, Pi is the probability that a delay of Di seconds will occur, A is a measure of the amount of reinforcement, and K is a decay parameter that determines how quickly V decreases with increases in Di. In a series of experiments (Mazur, 1989, 1991, 1995; Mazur and Romano, 1992), I have used an adjusting-delay procedure to test the predictions of this equation. The procedure gives animals a choice between a standard alternative, which has a constant delay (e.g., a 5-s delay, followed by food on 20% of the trials) and an adjusting alternative (e.g., a delay that varies over trials, followed by food on 100% of the trials). The delay for the adjusting alternative is systematically increased and decreased over trials, depending on the animal’s choices, in order to estimate an indifference point (a delay at which the two alternatives are chosen about equally often).

Besides demonstrating the effects of reinforcer delay and probability on choice, this research has found that the presence or absence of distinctive stimuli during the delay intervals can have a large effect on the pigeons’ indifference points. In one experiment (Mazur, 1989), the standard and adjusting alternatives were associated with red and green keylights, and red and green houselights were lit during the delays before food. In red-present conditions, a peck on the red (standard) key was always followed by a 5-s delay with red houselights, and then food was delivered on 20% of the trials. However, in red-absent conditions, whereas the 5-s red houselights were present on the standard trials that ended with food, they were omitted on the standard trials that ended without food, and a peck on the red key led only to the white houselights associated with the intertrial interval (ITI), which remained on until the next trial began. Indifference points averaged about 17 s in red-present conditions and about 7 s in red-absent conditions, indicating a much stronger preference for the standard alternative when the red houselights were omitted on standard trials without food.

To account for these results, Mazur (1989) proposed that choice depended on the strengths of the conditioned reinforcers
that preceded food (the red and green keylights and houselights), and that Di in Eq. (1) should include only the time spent in the presence of these conditioned reinforcers, not the total time between a response and a food delivery. With a 20% chance of food, an average of five standard trials would occur per food delivery. Therefore, in the red-present conditions, the red stimuli associated with the standard alternative would be present for an average of 30 s per food delivery (because with response latencies of about 1 s, there would be an average of five 1-s keylight presentations and five 5-s houselight presentations per food delivery). In red-absent conditions, the red stimuli would be present for an average of only 10 s per food delivery (five 1-s keylight presentations but only one 5-s houselight presentation per food delivery). According to this reasoning, the red conditioned reinforcers were stronger in the red-absent conditions because food was delivered more frequently in their presence than in the red-present conditions, which is why the pigeons showed a much stronger preference for the standard alternative in the red-absent conditions. This notion that the strength of a conditioned reinforcer is inversely related to its duration has been suggested by a number of other writers (e.g., Fantino, 1977; Vaughan, 1985). Another finding from this research was that varying the duration of the ITI had no effect on the pigeons’ indifference points (Mazur, 1989). This finding was consistent with the view that only time spent in the presence of the delay-interval stimuli (the red keylight and houselights) should be counted as part of Di , not time spent in the ITI (because the red stimuli were not present during the ITIs). However, in similar experiments with rats, Mazur (2005) did not find any systematic changes when the delay-interval stimulus (a light above the response lever) was omitted on trials without food. 
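The stimulus-exposure arithmetic in this account can be checked with a short sketch. The function name and its parameters are illustrative and do not come from the original studies; the sketch assumes, as the text does, a response latency of about 1 s and an average of 1/p standard trials per food delivery.

```python
def stimulus_seconds_per_food(p_food, latency_s, delay_s, stimuli_on_all_trials):
    """Mean time (s) the standard-alternative stimuli are present per food delivery.

    Hypothetical helper, for illustration only.
    """
    trials_per_food = 1.0 / p_food            # e.g. 5 trials at a 20% chance of food
    keylight_s = trials_per_food * latency_s  # keylight is lit on every trial until the peck
    if stimuli_on_all_trials:
        # "Present" conditions: the delay stimulus appears on every standard trial.
        delay_stimulus_s = trials_per_food * delay_s
    else:
        # "Absent" conditions: the delay stimulus appears only on the trial that ends with food.
        delay_stimulus_s = delay_s
    return keylight_s + delay_stimulus_s


red_present = stimulus_seconds_per_food(0.2, 1.0, 5.0, stimuli_on_all_trials=True)
red_absent = stimulus_seconds_per_food(0.2, 1.0, 5.0, stimuli_on_all_trials=False)
print(red_present, red_absent)
```

With the values given in the text, the two calls should reproduce the figures of roughly 30 s and 10 s of red-stimulus exposure per food delivery.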
Another difference from the pigeons’ results was that the duration of the ITI did affect the rats’ choices: preference for the probabilistic reinforcer decreased with longer ITIs. Mazur suggested that pigeons and rats may differ in how they are affected by the stimuli that precede probabilistic reinforcers: rats may be sensitive to the total time between a response and food delivery, regardless of what stimuli are present or absent. Another possible explanation of the species differences, however, was simply that the light above the lever was not a salient stimulus for the rats.

The purposes of the present experiment were (1) to collect more data from rats, with a variety of delay and probability combinations, that could be compared to data previously obtained from pigeons, and (2) to determine if the presence of delay-interval stimuli could affect the rats’ preferences if they were more salient. Therefore, the delay-interval stimuli for the probabilistic reinforcer included both a light above the lever and a tone.

1. Method

1.1. Subjects
Four Sprague–Dawley rats, about 7 months old at the start of the experiment, were maintained at approximately 80% of their free-feeding weights.
1.2. Apparatus
The experimental chamber was a modular test chamber for rats, 30.5 cm long, 24 cm wide, and 21 cm high. The sidewalls and top of the chamber were Plexiglas, and the front and back walls were aluminum. The floor consisted of steel rods, 0.48 cm in diameter and 1.6 cm apart, center to center. The front wall had two retractable response levers, 11 cm apart, 6 cm above the floor, 4.8 cm long, and extending 1.9 cm into the chamber. Centered in the front wall was a non-retractable lever with the same dimensions, 11.5 cm above the floor. A force of approximately 0.25 N was required to operate each lever, and when a lever was active, each effective response produced a feedback click. Above each lever was a 2-W white stimulus light, 2.5 cm in diameter. A pellet dispenser delivered 45-mg food pellets into a receptacle through a square 5.1 cm opening in the center of the front wall. A 2-W white houselight was mounted at the top center of the rear wall. A Sonalert tone generator (2900 Hz) was mounted behind the rear wall of the chamber. The chamber was enclosed in a sound-attenuating box containing a ventilation fan. All stimuli were controlled and responses recorded by an IBM-compatible personal computer using the Medstate programming language.

1.3. Procedure
The rats had previously participated in conditions with an adjusting-delay procedure similar to the one used in this experiment, so no additional training was necessary. The experiment consisted of nine conditions, which were divided into three phases.

1.3.1. Phase 1 (Conditions 1–3)
Every session lasted for 64 trials or for 60 min, whichever came first. Within a session, each block of four trials consisted of two forced trials followed by two choice trials. At the start of each trial, the houselight was turned off, the light above the center lever was lit, and a response on this lever was required to begin the choice period.
On choice trials, after a response on the center lever, the light above this lever was turned off, the two front levers were extended into the chamber, and the lights above the two side levers were turned on. A single response on the left lever constituted a choice of the standard alternative, and a single response on the right lever constituted a choice of the adjusting alternative. If the adjusting (right) lever was pressed during the choice period, the two side levers were retracted, only the light above the right lever remained on, and there was a delay of adjusting duration (as explained below). At the end of the adjusting delay, the light above the right lever was turned off, one food pellet was delivered, and the chamber was dark for 1 s. Then the houselight was turned on, and an ITI began. For all adjusting and standard trials, the duration of the ITI was set so that the total time from a choice response to the start of the next trial was 50 s. If the standard (left) lever was pressed during the choice period, the two side levers were retracted, and there was a 5-s delay during which the light above the left lever was lit and
a 2900-Hz tone was on. At the end of the standard delay, the light and tone were turned off, and on a certain percentage of the trials, a food pellet was delivered and the chamber was dark for 1 s. Then the houselight was turned on, and the ITI began. If no food pellet was delivered on a standard trial, the chamber did not turn dark, and the houselight was turned on immediately after the 5-s standard delay. Food pellets were delivered on 100, 50, and 25% of standard trials in Conditions 1–3, respectively.

The procedure on forced trials was the same as on choice trials, except that only one lever, left or right, was extended into the chamber after a response on the center lever, and only the stimulus light above that lever was lit. A response on this lever led to the same sequence of events as on choice trials. Of every two forced trials, one involved the left lever and the other the right lever. The temporal order of these two types of trials varied randomly.

After every two choice trials, the duration of the adjusting delay might be changed. If the subject chose the standard lever on both trials, the adjusting delay was decreased by 1 s. If the subject chose the adjusting lever on both choice trials, the adjusting delay was increased by 1 s (up to a maximum of 35 s). If the subject chose each lever on one trial, no change was made. In all three cases, this adjusting delay remained in effect for the next block of four trials. At the start of the first session of a condition, the adjusting delay was 0 s. At the start of later sessions of the same condition, the adjusting delay was determined by the above rules as if it were a continuation of the preceding session.

1.3.2. Phase 2 (Conditions 4–6)
The purpose of this phase was to determine whether the absence of the delay-interval light and tone on trials without food would affect preference for the standard alternative.
The procedure in Condition 5 was the same as for Condition 2 (50% of the standard trials ended with food, after a 5-s delay with the light and tone on). Conditions 4 and 6 were similar, except for the following change. On standard trials without food, the light above the left lever was turned off immediately after a response on this lever (along with the light above the right lever if it was a choice trial), and no tone was presented. Instead, the houselight was turned on, and the ITI began. On standard trials with food, the light above the left lever and the tone were on during the 5-s delay that preceded food. Conditions in which the light above the left lever and the tone were presented for 5 s on every standard trial will be called stimuli-present conditions. Conditions in which the light and tone were only presented on standard trials with food will be called stimuli-absent conditions.

1.3.3. Phase 3 (Conditions 7–9)
The purpose of this phase was to examine the effects of decreasing the reinforcement percentage when the standard delay was zero. The procedure was the same as in Phase 1, except that there was no delay (and therefore no lever light or tone) after a left lever press. A choice of the left lever led either to the delivery of a food pellet and 1-s blackout, or to the onset of the white houselight and the start of the ITI. Food pellets were delivered on 50, 25, and 10% of standard trials in Conditions 7–9, respectively.
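The rule for changing the adjusting delay, described under Phase 1 and used in all three phases, can be sketched as follows. This is a minimal illustration and not the Medstate program used in the experiment: the function and constant names are hypothetical, and the floor of 0 s on the delay is an assumption (the text states only the 35-s maximum and the 0-s starting value).

```python
MAX_ADJUSTING_DELAY_S = 35.0  # maximum stated in the procedure
STEP_S = 1.0                  # step size stated in the procedure
MIN_ADJUSTING_DELAY_S = 0.0   # assumption: the delay cannot go below 0 s


def update_adjusting_delay(delay_s, choices):
    """Return the adjusting delay for the next block of four trials.

    `choices` holds the two choice-trial outcomes of the current block,
    each either "standard" or "adjusting".
    """
    if len(choices) != 2:
        raise ValueError("each block contains exactly two choice trials")
    adjusting_count = sum(1 for c in choices if c == "adjusting")
    if adjusting_count == 2:
        # Adjusting lever chosen twice: lengthen the adjusting delay.
        return min(delay_s + STEP_S, MAX_ADJUSTING_DELAY_S)
    if adjusting_count == 0:
        # Standard lever chosen twice: shorten the adjusting delay.
        return max(delay_s - STEP_S, MIN_ADJUSTING_DELAY_S)
    # One choice of each lever: no change.
    return delay_s


delay = 0.0  # starting value for the first session of a condition
for block in (["adjusting", "adjusting"], ["adjusting", "standard"], ["standard", "standard"]):
    delay = update_adjusting_delay(delay, block)
```

Because the delay carries over between sessions of the same condition, a simulation would pass the final value of one session as the starting value of the next.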
1.3.4. Stability criteria
Condition 1 lasted for a minimum of 12 sessions, Conditions 2 through 6 for a minimum of 20 sessions, and Conditions 7 through 9 for a minimum of 15 sessions. After the minimum number of sessions, a condition was terminated for each subject individually when several stability criteria were met. To assess stability, each session was divided into two 32-trial blocks, and for each block the mean adjusting delay was calculated. The results from the first two sessions of a condition were not used, and the condition was terminated when the following criteria were met, using the data from all subsequent sessions:
(a) Neither the highest nor the lowest single-block mean of a condition could occur in the last six blocks of a condition.
(b) The mean adjusting delay across the last six blocks could not be the highest or the lowest six-block mean of the condition.
(c) The mean delay of the last six blocks could not differ from the mean of the preceding six blocks by more than 10% or by more than 1 s (whichever was larger).

2. Results
All data analyses were based on the results from the six half-session blocks that satisfied the stability criteria in each condition. For each subject and each condition, the mean adjusting delay from these six half-session blocks was used as a measure of the indifference point.

Fig. 1 shows the mean adjusting delays from the four conditions in which the reinforcement probability was 50% for the standard alternative (Conditions 2, 4, 5, and 6). In stimuli-present conditions, the light and tone were on during the 5-s delay on all standard trials, but in stimuli-absent conditions, the light and tone were omitted on trials without food. Fig. 1 shows that the presence or absence of the light and tone on trials without food had no systematic effects on the mean adjusting delays. There was no systematic effect of the delay-interval stimuli for any of the four rats.

Fig. 1. Mean adjusting delays are shown for the stimuli-present conditions and the stimuli-absent conditions in which 50% of the standard trials ended with food.

For each rat, Fig. 2 shows the indifference points from all the stimuli-present conditions. The bottom panel compares the group means to the results from two experiments with pigeons that used similar procedures (Mazur, 1989, 1991). Reinforcement percentage and delay had the expected effects: indifference points were longer with longer standard delays and with smaller reinforcement probabilities. The overall pattern of results for the rats was similar to that for the pigeons, except that with smaller reinforcement percentages the indifference points from the two species diverged, with longer indifference points for the rats than the pigeons.

Fig. 2. Mean adjusting delays are shown for each rat from conditions with different standard reinforcement percentages and delays. The bottom panel compares the group means to data from similar studies with pigeons (Mazur, 1989, 1991).

A comparison of the results from the rats in this experiment and pigeons in previous studies reveals both some similarities and some differences. The addition of the tone during the standard delay (so that the delay interval included both the
tone and the light above the left lever) did not change the outcome obtained with rats by Mazur (2005). That is, there were no systematic differences between the stimuli-present and stimuli-absent conditions (Fig. 1). This finding is distinctly different from the results obtained with pigeons, where the indifference points (mean adjusting delays) were much shorter in stimuli-absent conditions, indicating a stronger preference for the probabilistic reinforcer when the delay-interval stimuli were omitted on trials without food (Mazur, 1989, 1991). It was this result from pigeons that led Mazur (1989) to propose that only time spent in the presence of distinctive delay-interval stimuli (which might be called conditioned reinforcers) should be
counted as part of Di in the hyperbolic equation. The results from several other experiments were consistent with this interpretation of Di, and the equation has been able to account for a variety of results from pigeons (Mazur, 1991, 1995; Mazur and Romano, 1992). However, based on the different results from rats (particularly the failure to find any differences between stimuli-present and stimuli-absent conditions), Mazur (2005) proposed that rats might be sensitive to the entire delay between a choice response and the eventual delivery of food (including the ITIs in cases where food was not delivered until several trials later). When all of this time, including ITI duration, was included as part of Di, the hyperbolic equation made fairly accurate predictions for the rats’ indifference points in Mazur (2005). The lack of a difference between stimuli-present and stimuli-absent conditions in the present experiment is consistent with the hypothesis that rats’ choices depend on the total time between a choice response and food delivery, not on the presence or absence of delay-interval stimuli.

The results from the different reinforcement percentage and delay combinations showed patterns that were qualitatively similar to those obtained in previous experiments with pigeons. That is, both species showed sensitivity to reinforcer percentages and delays (Fig. 2). For both species, indifference points increased at an accelerating rate as reinforcement percentages decreased. However, with the smaller reinforcement percentages, the indifference points for the rats were greater than those for the pigeons. Mazur (1991) showed that Eq. (1) could describe quite accurately the data from pigeons shown in Fig. 2, but only if Di did not include ITI durations (when the colored houselights and keylights were not present). If Di does include ITI durations, Eq. (1) predicts longer indifference points, especially when reinforcement percentages are low.
It is therefore possible that the quantitative differences between rats and pigeons shown in Fig. 2 may be attributable to differences in sensitivity to the ITI. There are, of course, other possible explanations for the quantitative differences between species shown in Fig. 2, but these differences are consistent with the proposition that the rats were sensitive to the ITIs but the pigeons were not.
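The claim that including ITI time in Di lengthens the predicted indifference points can be illustrated with a rough sketch of Eq. (1). Several simplifications here are assumptions and not part of the original analyses: K = 1 and A = 1 are placeholder values, response latencies are ignored, every trial until food is treated as a standard trial, and the 50-s cycle from the present procedure is used for both cases. The function names are hypothetical.

```python
def standard_value(p_food, delay_s, cycle_s, include_iti, k=1.0, amount=1.0, max_trials=500):
    """Eq. (1) for the standard alternative (illustrative sketch).

    Food arrives on the j-th standard trial with probability p(1-p)^(j-1).
    """
    value = 0.0
    for j in range(1, max_trials + 1):
        prob = p_food * (1.0 - p_food) ** (j - 1)
        if include_iti:
            # Total time from the first choice response to food.
            d = (j - 1) * cycle_s + delay_s
        else:
            # Only time spent in the presence of the delay-interval stimuli.
            d = j * delay_s
        value += prob * amount / (1.0 + k * d)
    return value


def indifference_delay(value, k=1.0, amount=1.0):
    """Adjusting delay D at which amount / (1 + k*D) equals `value`."""
    return (amount / value - 1.0) / k


for pct in (100, 50, 25):
    p = pct / 100.0
    with_iti = indifference_delay(standard_value(p, 5.0, 50.0, include_iti=True))
    without_iti = indifference_delay(standard_value(p, 5.0, 50.0, include_iti=False))
    print(pct, round(with_iti, 1), round(without_iti, 1))
```

Under these assumptions the two versions agree when food is certain and diverge as the reinforcement percentage falls, which is the qualitative pattern described above; the specific numbers depend on the placeholder parameters and should not be read as fits to the data.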
Although this research suggests that there may be a difference between rats and pigeons in their sensitivities to delay-interval stimuli, the reasons for such a difference are not clear. The experiments with pigeons (Mazur, 1989, 1991) used keypeck responses, and, as the extensive literature on autoshaping has shown, keypecking is strongly influenced by Pavlovian contingencies. It is therefore not too surprising that the presence versus absence of distinctive visual stimuli during the delay intervals had large effects on the pigeons’ choice responses. It would be interesting to see whether pigeons would continue to show differences between stimuli-absent and stimuli-present conditions if some other response, such as treadle pressing, were used. If they did, this would strengthen the case for a broad-based species difference. If not, this would suggest that their choice behavior, at least in this situation, is heavily dependent on the particular response topography that is used.

Acknowledgments
This research was supported by Grant MH 38357 from the National Institute of Mental Health. I thank Maureen Lapointe-Ryan, Michael Lejeune, and Brian Wallace for their assistance in this research.

References
Fantino, E., 1977. Conditioned reinforcement: choice and information. In: Honig, W.K., Staddon, J.E.R. (Eds.), Handbook of Operant Behavior. Prentice Hall, Englewood Cliffs, NJ, pp. 313–339.
Mazur, J.E., 1989. Theories of probabilistic reinforcement. J. Exp. Anal. Behav. 51, 87–99.
Mazur, J.E., 1991. Choice with probabilistic reinforcement: effects of delay and conditioned reinforcers. J. Exp. Anal. Behav. 55, 63–77.
Mazur, J.E., 1995. Conditioned reinforcement and choice with delayed and uncertain primary reinforcers. J. Exp. Anal. Behav. 63, 139–150.
Mazur, J.E., 2005. Effects of reinforcer probability, delay, and response requirements on the choice of rats and pigeons: possible species differences. J. Exp. Anal. Behav. 83, 263–279.
Mazur, J.E., Romano, A., 1992. Choice with delayed and probabilistic reinforcers: effects of variability, time between trials, and conditioned reinforcers. J. Exp. Anal. Behav. 58, 513–525.
Vaughan, W., 1985. Choice: a local analysis. J. Exp. Anal. Behav. 43, 383–405.