Behavioural Processes 157 (2018) 279–285
The influence of outcome delay on suboptimal choice
Margaret A. McDevitt a,⁎, Jeffrey M. Pisklak b, Marcia Spetch b, Roger Dunn c
a Department of Psychology, McDaniel College, 2 College Hill, Westminster, MD, 21157, United States
b University of Alberta, Canada
c San Diego State University, United States
ARTICLE INFO
Keywords: Choice; Suboptimal behavior; Conditioned reinforcement; Preference; Concurrent chain; Key peck; Pigeons
ABSTRACT
Under certain conditions pigeons will choose an option that provides less probable food over one that provides more probable food. This suboptimal choice behavior occurs when the outcomes are delayed and stimuli during the delay differentially signal the upcoming outcomes on the suboptimal alternative, but not the optimal alternative. The present study assessed whether duration of the outcome delay affects pigeons’ suboptimal preference. Pigeons chose between a suboptimal alternative that provided food 20% of the time and an optimal alternative that provided food 80% of the time. Stimuli presented during the delays signaled the outcomes on the suboptimal alternative, but not on the optimal alternative. The outcome delays were 5 s in some conditions and 20 s in others. The results of two experiments demonstrate that behavior is generally more suboptimal when the outcome delays are longer but tends to stay relatively suboptimal if subjects experience the long delay condition before the short delay condition. The finding that behavior is more suboptimal with longer delays to the outcomes is consistent with the view that pigeons’ suboptimal choice is influenced by both conditioned and primary reinforcement and is inconsistent with the view that suboptimal choice is influenced solely by signal value.
1. Introduction
Pigeons, like other animals, typically show preferences that can be described as optimal in that they tend to prefer alternatives that provide food in larger amounts, after shorter delays, or with greater probability over alternatives that provide food in smaller amounts, after longer delays, or with lower probability. A notable exception is found with the suboptimal choice task developed by Kendall (1974). In this task, pigeons were presented with a choice between an optimal alternative that always ended in food after a delay and a suboptimal alternative that, after an equivalent delay, ended in food only half of the time. Thus, the optimal alternative provided double the primary reinforcement compared to the suboptimal alternative. When the delay stimuli on the suboptimal alternative did not differentially signal whether food would be delivered on that trial, pigeons generally preferred the optimal alternative, as expected. However, when the suboptimal alternative provided differential stimuli during the delay that signaled whether or not food would be delivered at the end of that trial, pigeons instead preferred the suboptimal alternative. Since Kendall’s (1974) initial demonstration, the suboptimal choice task has been studied extensively in varied forms (for reviews, see McDevitt et al., 2016, and Zentall, 2016). A consistently critical feature in suboptimal preference, despite variations in other procedural details,
is the presence of signals on the suboptimal alternative. In fact, pigeons’ choices are more optimal if the stimuli on the suboptimal alternative become uninformative by no longer differentially signaling food and no food outcomes (Dunn and Spetch, 1990). The information provided by the signals is not useful in the sense that getting the information does not allow the pigeon to alter the outcome of the trial. While this shift in preference may, in part, reflect a general psychological drive to seek information that occurs across species and can be characterized as curiosity in humans (see Kidd and Hayden, 2015), the present study investigates the local effects of reinforcement on the level of suboptimal choice. The finding that suboptimal preference depends on the presence of signals has more often been interpreted as indicative of the role of the signals as conditioned reinforcers, and explanations offered for the phenomenon rely on conditioned reinforcement (e.g., Cunningham and Shahan, 2018; Mazur, 1995; Spetch et al., 1990). However, these explanations differ in precisely how conditioned reinforcement is conceptualized and applied to the suboptimal choice task, and whether other variables, such as primary reinforcement, also influence suboptimal choice. One such explanation, the hyperbolic-decay model (HDM), was originally used to describe how the value of a reinforcer decreases as delay increases and has also been applied to suboptimal preference
⁎ Corresponding author: M.A. McDevitt.
https://doi.org/10.1016/j.beproc.2018.10.008 Received 14 August 2018; Received in revised form 12 October 2018; Accepted 17 October 2018 Available online 28 October 2018 0376-6357/ © 2018 Elsevier B.V. All rights reserved.
Fig. 1. Suboptimal choice procedure in which the optimal alternative provided food 80% of the time and the suboptimal alternative provided food only 20% of the time. Keylight colors in the terminal links were correlated with outcomes on the suboptimal alternative, but not correlated with outcomes on the optimal alternative. A single peck was required in the initial link, and the terminal-link duration was either 5 or 20 s, depending on the condition. The side and stimulus assignments were counterbalanced across subjects.
should reduce the impact of the conditioned reinforcement, causing behavior to become more optimal. Dunn and Spetch (1990) and Spetch et al. (1990) assessed the effect of manipulating the initial-link (IL) and TL schedules, respectively, in procedures similar to that used by Kendall (1974). Their results were consistent with the SiGN hypothesis, and a reanalysis of these and other studies led Cunningham and Shahan (2018, p. 13) to conclude that there is “reasonable evidence for a positive relation between suboptimal choice and TL duration and a negative relation between suboptimal choice and IL duration.” More recent demonstrations of suboptimal choice have explored probabilities of primary reinforcement other than those used in Kendall's (1974) original procedure and have generated alternative explanations. For example, Zentall et al. (2015) presented one group of pigeons, Group 50/75, with a choice between a standard suboptimal alternative (i.e., it provided differential signals and ended in food 50% of the time) and an optimal alternative that ended in food 75% of the time (but did not provide differential signals). A second group of pigeons, Group 25/75, were presented with the same task, except the probability of food on the signaled suboptimal alternative was reduced to 25%. The two groups showed, on average, similar levels of preference for the suboptimal alternative, despite the difference in the relative probability of food delivery. This finding led Zentall et al. to conclude that the predictive value of the food signal is the sole contributor to suboptimal choice. That is, in both groups, the food signals on the suboptimal alternative signaled equivalent 10-s delays to food delivery, and thus were of equivalent value. The signals on the more optimal alternative provided less conditioned reinforcement because the stimuli present during the delay were associated with both food and no food outcomes. 
If suboptimal choice is driven by only the value of the contingent conditioned reinforcers, the same level of preference should be obtained in each group regardless of variation in the rate of primary reinforcement on the suboptimal alternative. The results of other studies have also been consistent with the notion that the predictive value of signals determines choice behavior (e.g., Stagner et al., 2012), but they do not rule out alternative explanations. Zentall’s (2016) conception of conditioned reinforcement does not include a mechanism for the influence of temporal variables, as long as the TL delays are equivalent for the alternatives. Thus, both HDM and Zentall’s approach predict no effect of varying the TL duration equally on both the optimal and suboptimal alternatives. The SiGN model, in contrast, predicts that preference will be more suboptimal with longer TLs because the effect of primary reinforcement will be diminished by the longer delay. Although this prediction of the SiGN model has been supported by studies in which the choice was between 50% and 100% reinforcement, to date, no studies have specifically evaluated the influence of temporal variables in Zentall’s modified procedure in which the optimal alternative is unsignaled and does not always end in food (as described above). Because temporal manipulations offer the
(e.g., Mazur, 1991, 1995, 2001). HDM assumes that conditioned reinforcement is the sole contributor to the value of an alternative, and that the value of the conditioned reinforcement is a function of the amount of time spent in the presence of stimuli associated with food. That is, the longer the time spent in the presence of stimuli associated with food, the lower the value of the corresponding alternative. As applied to the present procedure, a suboptimal alternative is more valuable when the outcomes are signaled because the time spent in the presence of stimuli not associated with food is excluded from the calculation of value. The left side of Fig. 1 shows a suboptimal alternative, which leads to either a green or white stimulus. Because the white stimulus is never followed by food, the time spent in the presence of that stimulus would not be included in the HDM calculation of value. If both the green and white stimuli were sometimes followed by food and sometimes not, the time spent in the presence of both stimuli would be included in the calculation of value, thereby reducing the estimate of the value of that alternative. Thus, choice should shift towards a suboptimal alternative when the procedure switches from unsignaled to signaled. Spetch and colleagues (Dunn and Spetch, 1990; McDevitt et al., 1997; Pisklak et al., 2015; McDevitt et al., 2016; Spetch et al., 1990), influenced by Fantino’s Delay Reduction Theory (1969), conceptualized conditioned reinforcement in terms of the degree to which the signal for food after a choice indicates a reduction in the waiting time to food. Unlike HDM, the Signal for Good News (SiGN) hypothesis assumes suboptimal choice is the net result of both primary reinforcement and conditioned reinforcement effects.
In Kendall’s (1974) task, the “good news” signal for food on the suboptimal alternative functions as a conditioned reinforcer because its presentation improves the situation from an overall 50% chance of food (at the time the choice response was made) to certain food. It is important to note that this view also assumes that the signal for no food does not have a comparable opposite (i.e., punishing) effect on the choice response. Primary reinforcement, in contrast, always favors the optimal alternative. Thus, according to the SiGN hypothesis, in a signaled suboptimal choice task, conditioned and primary reinforcement oppose each other. Suboptimal choice occurs when a suboptimal alternative provides an immediate conditioned reinforcer (the signal for food, shown as the green stimulus in Fig. 1) that is strong enough to counter the influence of the greater (but delayed) primary reinforcement on the optimal alternative. Because of this competition between primary and conditioned reinforcement, the SiGN hypothesis makes qualitative predictions in conditions in which temporal variables are manipulated. A discrete-trial choice procedure coupled with a long delay phase should result in the most suboptimal behavior. As the delay phase is lengthened by increasing the terminal-link (TL) time, the relative impact of primary reinforcement should decrease, causing behavior to become more suboptimal. Conversely, lengthening the choice (or initial-link) phase
second condition. In addition to the change in TL duration, the side assignments were reversed (i.e., if the suboptimal alternative was on the left in the first condition, it was on the right in the second). The TL stimuli remained in the same locations as in the first condition but became associated with the other alternative during the second condition. Each session lasted for 65 min and was composed of blocks of three trials. Within each block, one trial presented a choice between the suboptimal and optimal alternatives (i.e., both circle stimuli were presented simultaneously), and the other two were forced-exposure (FE) trials in which only one alternative was presented. Thus, every block of three trials consisted of one choice trial, one suboptimal FE trial, and one optimal FE trial. The order of the three trials within each block was randomly determined, and a 5-s intertrial interval separated each trial.
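The trial structure described in the Procedure can be sketched in code. The following is a minimal illustration only, not the MED-PC program used in the experiments: the function and label names (`make_block`, `terminal_link`, `food_signal`, `color_a`, and so on) are hypothetical, and outcomes are sampled independently on each trial, which is an assumption because the text does not say how the 80%/20% proportions were scheduled.

```python
import random

# Food probabilities taken from the procedure description.
P_FOOD = {"optimal": 0.8, "suboptimal": 0.2}


def make_block(rng):
    """Return one block of three trials in random order:
    one choice trial and one forced-exposure trial per alternative."""
    trials = ["choice", "forced_suboptimal", "forced_optimal"]
    rng.shuffle(trials)
    return trials


def terminal_link(alternative, rng):
    """Sample the terminal-link stimulus and the outcome for one trial.

    Assumption: outcomes are drawn independently on every trial.
    """
    food = rng.random() < P_FOOD[alternative]
    if alternative == "suboptimal":
        # Signaled: the stimulus is perfectly correlated with the outcome.
        stimulus = "food_signal" if food else "no_food_signal"
    else:
        # Unsignaled: either color is equally likely, regardless of outcome.
        stimulus = rng.choice(["color_a", "color_b"])
    return stimulus, food


rng = random.Random(1)
block = make_block(rng)
stimulus, food = terminal_link("suboptimal", rng)
```

On the suboptimal side the stimulus fully determines the outcome, whereas on the optimal side the stimulus carries no information about it; this is the only difference between the two branches of `terminal_link`.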
possibility to distinguish between some of the existing models of suboptimal choice, it is important to determine whether the TL effect found in the previous studies will hold across this variation in procedure. The present studies employed the suboptimal choice task shown in Fig. 1, consisting of a signaled suboptimal alternative that provided food 20% of the time and an unsignaled optimal alternative that provided food 80% of the time. A single response was required during the choice phase. One group of pigeons began the task with short (5 s) TLs and a second group began with long (20 s) TLs. After preference was established, the TL duration assignments were switched, and preference was reestablished. Although it is important to note that any choice of the 20% food alternative represents suboptimal behavior, based on the results of Spetch et al. (1990), we expect that pigeons’ behavior will be more suboptimal with 20 s TLs than with 5 s TLs.
2. Experiment 1
2.2. Results
2.1. Method
Statistical analyses were conducted with R 3.5.0 using the ez and BayesFactor packages (Lawrence, 2016; Morey and Rouder, 2018; R Core Team, 2018). A mixed Analysis of Variance modeling the effect of TL duration, order, and their interaction on the mean proportion of suboptimal choices made during the last four sessions of each condition was conducted. A corresponding generalized eta-squared effect size (η̂²) and Bayes Factor (BF10) are reported for each tested effect. All Bayes Factors are tested against a null model that included only the intercept and subject as an additive random effect. Means are reported with corresponding 95% confidence intervals and all significant effects are reported at p < .05. Fig. 2 shows the development of preference for the suboptimal alternative for the two groups of subjects across both TL duration conditions of Experiment 1. Fig. 3 shows the results for each bird in the two order groups, presented as the mean choice proportion for the suboptimal alternative (averaged over the last four sessions in each TL condition). Overall, preference for the suboptimal alternative was more extreme when the TL duration was 20 s (M = 0.83, 95% CI [0.6, 1.05]) than when it was 5 s (M = 0.42, [0.07, 0.77]). Corroborating this, statistical analyses revealed a significant main effect of TL duration, F(1, 4) = 14.48, p = .019, η̂² = 0.61, BF10 = 2.90. No significant main effect of order was observed, F(1, 4) = 0.80, p = .421, η̂² = 0.10, BF10 = 0.58. However, birds receiving the short TL duration first (shown in the bottom portion of Fig. 2) reversed suboptimal preference across the two TL duration conditions (M = 0.18 [−0.22, 0.57] and M = 0.96 [0.87, 1.06] for the Short and Long TL conditions, respectively) while birds receiving the long TL duration first
2.1.1. Subjects
The subjects were six adult ex-racing pigeons with experience in a variety of experimental procedures and were cared for in accordance with McDaniel College’s Animal Care Guidelines. They were maintained at approximately 85% of their free-feeding weights by grain obtained during experimental sessions and immediate post-session feedings when necessary. The pigeons were housed in individual cages under a 12-hr light/dark cycle, with water and grit freely available.
2.1.2. Apparatus
Six operant chambers (approximately 360 mm wide, 320 mm long, and 350 mm high) were used. Three translucent response keys, 25 mm in diameter, were mounted on the front intelligence panel 260 mm above the floor and 72.5 mm apart. The center key was never used in these experiments. Each side key could be illuminated from the rear by standard IEE 28-V 12-stimulus projectors. A 28-V 1-W miniature lamp, located 87.5 mm above the center response key, provided general chamber illumination for the duration of each session. Directly below the center key and 95 mm above the floor was an opening (57 mm high by 50 mm wide) that provided access to a solenoid-operated grain hopper filled with mixed grain. When activated, the food hopper was raised for 4 s and illuminated from above with white light by a 28-V 1-W miniature lamp. A computer and a MED-PC interface, located in an adjacent room, controlled experimental events.
2.1.3. Procedure
The basic procedure is shown in Fig. 1. The optimal alternative was presented on one side key and consisted of a circle stimulus that, when pecked once, was replaced with a color keylight (e.g., blue or red, which were presented equally often). After the TL delay, food was presented 80% of the time. The other 20% of the trials terminated after the TL delay with a blackout, during which the houselight was turned off for 4 s.
The optimal alternative was unsignaled, in that the colors of the terminal-link keylights did not differentially signal the trial outcomes. The suboptimal alternative was presented on the opposite side key, and also consisted of a circle stimulus, that when pecked once, was replaced with a color keylight (e.g., green or white). One color (e.g., green) was presented on 20% of the trials, and always terminated with food after the TL delay. The other color (e.g., white) was presented on the remaining 80% of the trials, and always terminated with a 4-s blackout after the TL delay. The side location of the alternatives and the stimulus assignments were counterbalanced across subjects. In the first condition, the terminal-link duration was 5 s for three subjects and 20 s for three subjects. After 30 sessions were completed, the durations were switched between the two groups, and an additional 30 sessions were completed in the
Fig. 2. Mean choice proportions, in blocks of three sessions, for the suboptimal alternative for the two groups of birds in Experiment 1. In one group, the terminal-link duration was 5 s in the first condition and 20 s in the second (Short to Long TL). The other group (Long to Short TL) received the conditions in the reverse order.
3. Experiment 2
In order to further substantiate the results of Experiment 1, Experiment 2 used a similar procedure with a second set of birds but modified the switch between conditions. Instead of reversing side assignments, the side and stimulus contingencies from the first condition remained in effect for the second condition, such that only the TL duration was changed (i.e., either lengthened or shortened) in the second condition.
3.1. Method
3.1.1. Subjects and apparatus
The subjects were eight adult pigeons with experiences similar to those of the pigeons in Experiment 1. They were housed and maintained as described in Experiment 1. Two operant chambers, similar to the ones described in Experiment 1, were used.
3.1.2. Procedure
The basic procedure was similar to that in Experiment 1 (as shown in Fig. 1). The only differences were in the session duration (which was reduced to 40 min), the number of sessions in each condition, and the condition reversal procedure. In condition one, the terminal-link duration was 5 s for four birds and 20 s for four birds. After 18 sessions, the TL durations were switched between the two groups, but all other contingencies remained the same. The second condition continued for 28 additional sessions.
3.2. Results
All statistical analyses were conducted as per Experiment 1. Fig. 4 shows the development of preference for the suboptimal alternative for the two groups of subjects across both TL durations of Experiment 2. Fig. 5 shows the results for each bird, presented as the mean choice proportion for the suboptimal alternative averaged over the last four sessions at each TL condition for birds in the two order groups.
Although the main effect of TL duration was smaller than that observed in Experiment 1, preference for the suboptimal alternative, collapsed across all birds, was again significantly greater when the TL duration was 20 s (M = 0.58 [0.35, 0.80]) than when it was 5 s (M = 0.42 [0.14, 0.69]), F(1, 6) = 6.91, p = .039, η̂² = 0.09, BF10 = 0.82. As in Experiment 1, no significant main effect of order was observed, F(1, 6) = 0.01, p = .940, η̂² < 0.01, BF10 = 0.57, but there was a significant interaction between TL duration and order, F(1, 6) = 19.68, p = .004, η̂² = 0.23, BF10 = 4.51, with suboptimal preference dramatically
Fig. 3. Mean choice proportions for the suboptimal alternative for each bird in Experiment 1 in conditions in which the terminal-link duration was short (5 s) or long (20 s). Means are the average of the last four sessions in each condition.
showed no systematic difference across the two TL conditions (shown in the top portion of Fig. 2; M = 0.69 [0.08, 1.30] and M = 0.67 [0.01, 1.33] for the Long and Short TL conditions, respectively). These findings suggest an interaction between the order in which a condition was run and the TL duration itself. Consistent with this observation, statistical analyses revealed a significant interaction between TL duration and order, F(1, 4) = 13.08, p = .022, η̂² = 0.58, BF10 = 14.98.
2.3. Discussion
The results of Experiment 1 corroborate previous studies of suboptimal choice with pigeons (see reviews by McDevitt et al., 2016, and Zentall, 2016), and also show that suboptimal choice occurred more frequently with 20-s than with 5-s TL durations. The difference in preference obtained with the two TL durations was pronounced when the short TL duration occurred first, with all three birds in that group reversing preference after the TL duration was extended. Preference was more variable for the birds that received the long TL duration first, and they also failed to demonstrate any consistent difference when the TL duration was shortened. Thus, it is possible that once suboptimal behavior is generated, it does not readily shift to optimal behavior. Some of the earlier studies using Kendall’s (1974) procedure have been criticized for inter-subject variability (Zentall, 2016), but some later studies reported large individual differences as well (e.g., Gipson et al., 2009; Laude et al., 2012; Smith et al., 2016). Although individual differences have often been attributed to stimulus or side preferences, it is possible that order effects may also have been a factor, especially in initial investigations such as Dunn and Spetch (1990) and Spetch et al. (1990) that consisted of numerous conditions in which variables were manipulated in a within-subject design.
Fig. 4. Mean choice proportions, in blocks of two sessions, for the suboptimal alternative for the two groups of birds in Experiment 2. In one group, the terminal-link duration was 5 s in the first condition and 20 s in the second (Short to Long TL). The other group (Long to Short TL) received the conditions in the reverse order.
Fig. 6. Mean choice proportions for the suboptimal alternative for all birds in Experiments 1 and 2. Terminal-link duration was short (5 s) or long (20 s). The top panel shows only the first condition for each subject, the bottom panel shows the mean for all conditions. Means are the average of the last four sessions in each condition. Error bars show SE.
4. General discussion
The suboptimal choice procedure employed in the present studies differed from the procedure used in early investigations (Dunn and Spetch, 1990; Kendall, 1974; McDevitt et al., 1997; Spetch et al., 1990) in several ways. The suboptimal alternative provided less frequent primary reinforcement (20% instead of 50%) as did the optimal alternative (80% instead of 100%). In addition, reinforcers on the optimal alternative were not signaled by the TL stimuli presented during the delay period. Although similar procedures have since been employed in many studies (see Zentall, 2016 for a review), temporal effects had not yet been studied within this procedure. The results reported here extend the generality of Spetch et al.’s (1990) findings demonstrating the influence of TL duration on suboptimal preference. In addition to the significant main effect of TL duration within each experiment, a between-subjects comparison of the first assessment of preference for all subjects in both experiments shows the suboptimal alternative was chosen more frequently when TLs were long (see the top panel in Fig. 6), t(12) = 2.55, p = .025, 95% CI [0.05, 0.59], d = 1.28, BF10 = 2.76. In addition, subjects that experienced the short TL first consistently preferred the optimal alternative and reversed to preference for the suboptimal alternative when TLs were extended. The bottom panel of Fig. 6 shows the mean choice proportion for the suboptimal alternative in all conditions, averaged across all subjects in both experiments. These results are consistent with the SiGN hypothesis (McDevitt et al., 2016) and the notion that primary and conditioned reinforcement are jointly responsible for suboptimal choice behavior. The SiGN hypothesis predicts that, as the TL duration is
Fig. 5. Mean choice proportions for the suboptimal alternative for each bird in Experiment 2 in conditions in which the terminal-link duration was short (5 s) or long (20 s). Means are the average of the last four sessions in each condition.
reversing across TL conditions for pigeons that received the short TL duration first (M = 0.27 [−0.09, 0.63] and M = 0.70 [0.33, 1.08] for the Short and Long TL conditions, respectively) but not for pigeons that received the long TL duration first (M = 0.45 [0.02, 0.88] and M = 0.56 [−0.05, 1.17] for the Long and Short TL conditions, respectively).
3.3. Discussion
The results of Experiment 2 were consistent with those of Experiment 1. Suboptimal choice occurred more frequently with the long TL duration than with the short TL duration. Furthermore, preferences reversed when TL duration was changed for subjects that received the short TL duration first, but not for subjects that received the long TL duration first. Thus, both the effects of TL duration and the interaction between order and TL duration were replicated, even though the side and stimulus contingencies were maintained in the transition between conditions in Experiment 2.
predictions similar to those derived from the approach taken by Zentall and his colleagues (Zentall et al., 2015). According to RRM, the failure to maximize food in suboptimal choice paradigms is the result of an adaptive process minimizing attention devoted to uninformative stimuli and lost opportunities for food over the long run. Formally, the RRM predicts a higher reinforcement rate value for the suboptimal alternative than the optimal, with a larger difference occurring for the short (5 s) than the long (20 s) TL duration. To facilitate comparison of the two conditions, we make a straightforward assumption – based on the prevalence of matching relations in choice research (de Villiers, 1977; Herrnstein, 1961) – that choices in each condition will be distributed according to the relative rate at which they are reinforced, as predicted by the RRM’s rate values. Based on this assumption, the model predicts near equivalent proportions of responding to the suboptimal alternative across the short and long conditions, a prediction not supported by the results of the present study. At present, theoretical views that account for temporal influences, such as the SiGN hypothesis (McDevitt et al., 2016), the information-theoretic approach (Cunningham and Shahan, 2018), and the associability decay model (Daniels and Sanabria, 2018), appear to be more useful in understanding the dynamics of suboptimal preference. In future research, we will determine whether the effect of IL schedule (Dunn and Spetch, 1990) also replicates within this variant of the suboptimal choice procedure. One unexpected but interesting result that emerged in these studies was that the order of the TL conditions had a large effect on suboptimal choice, and this was true regardless of the method used in switching conditions.
In Experiment 1, the side and stimulus associations were reversed when the TL durations were reversed, but in Experiment 2, the contingencies remained the same when the TL durations were reversed. The results of both experiments are consistent with the possibility that once suboptimal behavior has been established (i.e., in the long TL condition), preference is less responsive to a manipulation (shortening the TL) that otherwise would have produced more optimal behavior. On the other hand, when optimal responding is generated initially, it appears quite sensitive to the influence of extending the TL duration, and behavior consistently becomes suboptimal. Future research is needed in order to determine if this differential order effect is restricted to manipulations involving TL duration, or if, once established, suboptimal preference in general is harder to reverse than optimal preference. This finding could be important in applied settings in which treatments are applied to less than optimal behaviors. In summary, this research indicates that pigeons’ suboptimal choice cannot be explained by a single mechanism such as stimulus value. Here we have shown that delay duration and order of conditions both affect suboptimal choice. The importance of temporal information is consistent with the SiGN model (McDevitt et al., 2016), the information-theoretic approach (Cunningham and Shahan, 2018), and the associability decay model (Daniels and Sanabria, 2018). Further research is needed to test other predictions of these models and to determine the generality of temporal parameter effects on suboptimal choice in other procedures or species (e.g., Blanchard et al., 2015; Chow et al., 2017).
extended, the relative influence of the primary reinforcement will be diminished. Given that primary reinforcement always favors the optimal alternative, longer TL delays should result in greater choice of the suboptimal alternative. The tendency to make more suboptimal choices with longer TLs is also consistent with two very recent models of suboptimal choice. First, a temporal information-theoretic approach to conditioned reinforcement proposed by Cunningham and Shahan (2018) posits that the effectiveness of a conditioned reinforcer is determined by the information conveyed about when to next expect a primary reinforcer. Conceptually similar to Delay Reduction Theory, when a putative conditioned reinforcer signals a greater reduction in delay to primary reinforcement relative to the average delay to primary reinforcement (i.e., when the signal is more temporally informative), a conditioned reinforcing effect is predicted. In addition, it is assumed that the informativeness of the signal competes with an organism’s tendency to match relative rates of responding across alternatives to the relative rates of primary reinforcement on those alternatives. Thus, suboptimal choice, driven by conditioned reinforcement, should be greater when primary reinforcement is more delayed via longer TLs. Second, the tendency to make more suboptimal choices with longer TLs is specifically predicted by a recent associability decay model of paradoxical choice (Daniels and Sanabria, 2018). According to this model, delay discounting of the primary rewards (resulting from longer TLs) affects choice of the non-informative (optimal) alternative more than the informative (suboptimal) alternative because the outcomes that follow the non-informative TLs are unpredictable, leading to high TL associability, whereas those that follow the informative TLs are fully predicted, leading to low TL associability. 
This model makes the untested prediction that TL duration should have the opposite effect in the suboptimal choice task based on differences in magnitude (e.g., Zentall and Stagner, 2011) rather than probability. In the magnitude procedure, the outcomes following the TLs on the suboptimal alternative are unpredictable, whereas those on the optimal alternative are smaller but certain; thus, associability, and hence sensitivity to delay discounting, is higher for the suboptimal alternative. However, Dunn and Spetch (1990) and Spetch et al. (1990) found increased choice of the suboptimal alternative with longer TLs when the outcome on the optimal alternative was certain (100%), which does not seem to be predicted by this model.
The greater choice of the suboptimal alternative with longer TLs shown in the present studies also appears inconsistent with HDM (Mazur, 1991, 1995, 2001), which estimates conditioned reinforcement value as a function of the time spent in the presence of stimuli associated with reinforcement. When TL increases, relatively more time is accrued on the optimal alternative because every TL is counted, and thus HDM would seem to suggest that preference should increase for the suboptimal alternative. However, the impact on the calculated value of the alternatives is very slight, which suggests that preference should remain roughly the same despite changes in the TL duration. It should be noted, though, that it is not clear how value translates into predictions of preference. Despite that uncertainty, HDM’s estimates of the value of the two alternatives with 5-s TLs in the present study are nearly identical, and thus the preference for the optimal alternative observed in the first short TL conditions of Experiments 1 and 2 is inconsistent with HDM.
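As a rough illustration of the HDM argument above, the following Python sketch computes hyperbolic values for the two alternatives at the 5-s and 20-s TLs. It is not the authors' calculation. It assumes Mazur's hyperbolic form V = A / (1 + K·D), an arbitrary K = 1, a unit food amount, and that only terminal-link time spent in the presence of food-associated stimuli counts toward D (initial links and intertrial intervals are ignored). All function names are hypothetical.

```python
def hyperbolic_value(delay_s, amount=1.0, k=1.0):
    """Hyperbolic discounting, V = A / (1 + K * D). K = 1 is an assumed value."""
    return amount / (1.0 + k * delay_s)


def signaled_value(tl_s, k=1.0):
    """Suboptimal (signaled) alternative.

    Assumption: only the food signal counts toward D, and it always
    ends in food after exactly one terminal link (TL).
    """
    return hyperbolic_value(tl_s, k=k)


def unsignaled_value(tl_s, p_food, k=1.0, max_trials=200):
    """Optimal (unsignaled) alternative.

    Assumption: every TL counts toward D, so food arrives after n TLs
    with probability p * (1 - p) ** (n - 1). The sum is truncated at
    max_trials, by which point the remaining terms are negligible.
    """
    total = 0.0
    for n in range(1, max_trials + 1):
        prob = p_food * (1.0 - p_food) ** (n - 1)
        total += prob * hyperbolic_value(n * tl_s, k=k)
    return total


def relative_value_suboptimal(tl_s, p_optimal=0.8, k=1.0):
    """Value of the suboptimal alternative as a share of the summed values."""
    v_sub = signaled_value(tl_s, k=k)
    v_opt = unsignaled_value(tl_s, p_optimal, k=k)
    return v_sub / (v_sub + v_opt)


for tl in (5, 20):
    print(tl, round(relative_value_suboptimal(tl), 3))
```

Under these assumptions the relative value of the suboptimal alternative rises only marginally from the 5-s to the 20-s TL, which is the sense in which HDM would predict roughly constant preference. How such values map onto choice proportions is left open here, as it is in the text.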
The differences in the degree of suboptimal choice as a function of TL duration are also inconsistent with the theoretical approach taken by Zentall and colleagues (e.g., Zentall, 2016; Zentall et al., 2015), which assumes that only the value of the conditioned reinforcer affects suboptimal preference. According to that view, given that the signal for food on the suboptimal alternative predicts the same delay as the food stimuli on the optimal alternative, the degree of preference should not change as a function of the TL duration.
Other recent work has conceptualized suboptimal choice paradigms in terms of contemporary accounts of optimal foraging theory. The reinforcement rate model (RRM, Vasconcelos et al., 2018) makes
Author note
We thank Brittany Sears, Sydney Palmer, Maxwell Seigel, Davon Ingram, Benjamin Carl, and the Spring 2018 students in PSY 2201 Psychology of Learning for assistance with data collection.

References
Blanchard, T.C., Hayden, B.Y., Bromberg-Martin, E.S., 2015. Orbitofrontal cortex uses distinct codes for different choice attributes in decisions motivated by curiosity. Neuron 85 (3), 602–614. https://doi.org/10.1016/j.neuron.2014.12.050.
Chow, J.J., Smith, A.P., Wilson, A.G., Zentall, T.R., Beckmann, J.S., 2017. Suboptimal choice in rats: incentive salience attribution promotes maladaptive decision-making. Behav. Brain Res. 320, 244–254. https://doi.org/10.1016/j.bbr.2016.12.013.
Cunningham, P.J., Shahan, T.A., 2018. Suboptimal choice, reward-predictive signals, and temporal information. J. Exp. Psychol. Anim. Learn. Cogn. 44 (1), 1–22. https://doi.org/10.1037/xan0000160.
Daniels, C.W., Sanabria, F., 2018. An associability decay model of paradoxical choice. J. Exp. Psychol. Anim. Learn. Cogn. 44 (3), 258–271. https://doi.org/10.1037/xan0000179.
deVilliers, P.A., 1977. Choice in concurrent schedules and a quantitative formulation of the law of effect. In: Honig, W.K., Staddon, J.E.R. (Eds.), Handbook of Operant Behavior. Prentice Hall, Englewood Cliffs, NJ, pp. 233–287.
Dunn, R., Spetch, M.L., 1990. Choice with uncertain outcomes: conditioned reinforcement effects. J. Exp. Anal. Behav. 53 (2), 201–218. https://doi.org/10.1901/jeab.1990.53-201.
Fantino, E., 1969. Choice and rate of reinforcement. J. Exp. Anal. Behav. 12 (5), 723–730. https://doi.org/10.1901/jeab.1969.12-723.
Gipson, C.D., Alessandri, J.J.D., Miller, H.C., Zentall, T.R., 2009. Preference for 50% reinforcement over 75% reinforcement by pigeons. Learn. Behav. 37 (4), 289–298. https://doi.org/10.3758/LB.37.4.289.
Herrnstein, R.J., 1961. Relative and absolute strength of response as a function of frequency of reinforcement. J. Exp. Anal. Behav. 4, 267–272. https://doi.org/10.1901/jeab.1961.4-267.
Kendall, S.B., 1974. Preference for intermittent reinforcement. J. Exp. Anal. Behav. 21 (3), 463–473. https://doi.org/10.1901/jeab.1974.21-463.
Kidd, C., Hayden, B.Y., 2015. The psychology and neuroscience of curiosity. Neuron 88 (3), 449–460. https://doi.org/10.1016/j.neuron.2015.09.010.
Laude, J.R., Pattison, K.F., Zentall, T.R., 2012. Hungry pigeons make suboptimal choices, less hungry pigeons do not. Psychon. Bull. Rev. 19, 884–891. https://doi.org/10.3758/s13423-012-0282-2.
Lawrence, M.A., 2016. Ez: Easy Analysis and Visualization of Factorial Experiments. R Package Version 4.4-0. https://CRAN.R-project.org/package=ez.
Mazur, J.E., 1991. Choice with probabilistic reinforcement: effects of delay and conditioned reinforcers. J. Exp. Anal. Behav. 55 (1), 63–77. https://doi.org/10.1901/jeab.1991.55-63.
Mazur, J.E., 1995. Conditioned reinforcement and choice with delayed and uncertain primary reinforcers. J. Exp. Anal. Behav. 63 (2), 139–150. https://doi.org/10.1901/jeab.1995.63-139.
Mazur, J.E., 2001. Hyperbolic value addition and general models of animal choice. Psychol. Rev. 108 (1), 96–112. https://doi.org/10.1037/0033-295X.108.1.96.
McDevitt, M.A., Dunn, R.M., Spetch, M.L., Ludvig, E.A., 2016. When good news leads to bad choices. J. Exp. Anal. Behav. 105 (1), 23–40. https://doi.org/10.1002/jeab.192.
McDevitt, M.A., Spetch, M.L., Dunn, R., 1997. Contiguity and conditioned reinforcement in probabilistic choice. J. Exp. Anal. Behav. 68 (3), 317–327. https://doi.org/10.1901/jeab.1997.68-317.
Morey, R.D., Rouder, J.N., 2018. BayesFactor: Computation of Bayes Factors for Common Designs. R Package Version 0.9.12-4.2. https://CRAN.R-project.org/package=BayesFactor.
Pisklak, J.M., McDevitt, M.A., Dunn, R.M., Spetch, M.L., 2015. When good pigeons make bad decisions: choice with probabilistic delays and outcomes. J. Exp. Anal. Behav. 104 (3), 241–251. https://doi.org/10.1002/jeab.177.
R Core Team, 2018. R: A Language and Environment for Statistical Computing. R Foundation for Statistical Computing, Vienna, Austria. https://www.R-project.org/.
Smith, A.P., Bailey, A.R., Chow, J.J., Beckmann, J.S., Zentall, T.R., 2016. Suboptimal choice in pigeons: stimulus value predicts choice over frequencies. PLoS One 11, e0159336. https://doi.org/10.1371/journal.pone.0159336.
Spetch, M., Belke, T., Barnet, R., Dunn, R., Pierce, W., 1990. Suboptimal choice in a percentage-reinforcement procedure: effects of signal condition and terminal-link length. J. Exp. Anal. Behav. 53 (2), 219–234. https://doi.org/10.1901/jeab.1990.53-219.
Stagner, J.P., Laude, J.R., Zentall, T.R., 2012. Pigeons prefer discriminative stimuli independently of the overall probability of reinforcement and of the number of presentations of the conditioned reinforcer. J. Exp. Psychol. Anim. Behav. Process. 38 (4), 446–452. https://doi.org/10.1037/a0030321.
Vasconcelos, M., Machado, A., Pandeirada, J.N.S., 2018. Ultimate explanations and suboptimal choice. Behav. Processes 152, 63–72. https://doi.org/10.1016/j.beproc.2018.03.023.
Zentall, T.R., 2016. Resolving the paradox of suboptimal choice. J. Exp. Psychol. Anim. Learn. Cogn. 1–14. https://doi.org/10.1037/xan0000085.
Zentall, T.R., Laude, J.R., Stagner, J.P., Smith, A.P., 2015. Suboptimal choice by pigeons: evidence that the value of the conditioned reinforcer rather than its frequency determines choice. Psychol. Rec. 65 (2), 223–229. https://doi.org/10.1007/s40732-015-0119-2.
Zentall, T.R., Stagner, J., 2011. Maladaptive choice behaviour by pigeons: an animal analogue and possible mechanism for gambling (sub-optimal human decision-making behaviour). Proc. B: Biol. Sci./R. Soc. 278, 1203–1208. https://doi.org/10.1098/rspb.2010.1607.